commit 7ae1f5508d9a33fd58ed3059bd2d569961e3b8bd upstream.
The exception handler is broken for unaligned memory acceses with fldw
and fstw instructions, because it trashes or uses randomly some other
floating point register than the one specified in the instruction word
on loads and stores.
The instruction "fldw 0(addr),%fr22L" (and the other fldw/fstw
instructions) encode the target register (%fr22) in the rightmost 5 bits
of the instruction word. The 7th rightmost bit of the instruction word
defines if the left or right half of %fr22 should be used.
While processing unaligned address accesses, the FR3() define is used to
extract the offset into the local floating-point register set. But the
calculation in FR3() was buggy, so that for example instead of %fr22,
register %fr12 [((22 * 2) & 0x1f) = 12] was used.
This bug has been since forever in the parisc kernel and I wonder why it
wasn't detected earlier. Interestingly I noticed this bug just because
the libime debian package failed to build on *native* hardware, while it
successfully built in qemu.
This patch corrects the bitshift and masking calculation in FR3().
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 74de14fe05dd6b151d73cb0c73c8ec874cbdcde6 ]
When CONFIG_XPA is enabled, Clang warns:
arch/mips/mm/tlbex.c:629:24: error: converting the result of '<<' to a boolean; did you mean '(1 << _PAGE_NO_EXEC_SHIFT) != 0'? [-Werror,-Wint-in-bool-context]
if (cpu_has_rixi && !!_PAGE_NO_EXEC) {
^
arch/mips/include/asm/pgtable-bits.h:174:28: note: expanded from macro '_PAGE_NO_EXEC'
# define _PAGE_NO_EXEC (1 << _PAGE_NO_EXEC_SHIFT)
^
arch/mips/mm/tlbex.c:2568:24: error: converting the result of '<<' to a boolean; did you mean '(1 << _PAGE_NO_EXEC_SHIFT) != 0'? [-Werror,-Wint-in-bool-context]
if (!cpu_has_rixi || !_PAGE_NO_EXEC) {
^
arch/mips/include/asm/pgtable-bits.h:174:28: note: expanded from macro '_PAGE_NO_EXEC'
# define _PAGE_NO_EXEC (1 << _PAGE_NO_EXEC_SHIFT)
^
2 errors generated.
_PAGE_NO_EXEC can be '0' or '1 << _PAGE_NO_EXEC_SHIFT' depending on the
build and runtime configuration, which is what the negation operators
are trying to convey. To silence the warning, explicitly compare against
0 so the result of the '<<' operator is not implicitly converted to a
boolean.
According to its documentation, GCC enables -Wint-in-bool-context with
-Wall but this warning is not visible when building the same
configuration with GCC. It appears GCC only warns when compiling C++,
not C, although the documentation makes no note of this:
https://godbolt.org/z/x39q3brxf
Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca829e05d3d4f728810cc5e4b468d9ebc7745eb3 ]
On 64-bit, calling jump_label_init() in setup_feature_keys() is too
late because static keys may be used in subroutines of
parse_early_param() which is again subroutine of early_init_devtree().
For example booting with "threadirqs":
static_key_enable_cpuslocked(): static key '0xc000000002953260' used before call to jump_label_init()
WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xfc/0x120
...
NIP static_key_enable_cpuslocked+0xfc/0x120
LR static_key_enable_cpuslocked+0xf8/0x120
Call Trace:
static_key_enable_cpuslocked+0xf8/0x120 (unreliable)
static_key_enable+0x30/0x50
setup_forced_irqthreads+0x28/0x40
do_early_param+0xa0/0x108
parse_args+0x290/0x4e0
parse_early_options+0x48/0x5c
parse_early_param+0x58/0x84
early_init_devtree+0xd4/0x518
early_setup+0xb4/0x214
So call jump_label_init() just before parse_early_param() in
early_init_devtree().
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
[mpe: Add call trace to change log and minor wording edits.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220726015747.11754-1-zhouzhouyi@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 446cda1b21d9a6b3697fe399c6a3a00ff4a285f5 ]
Since commit 4bf4f42a2f ("powerpc/kbuild: Set default generic
machine type for 32-bit compile"), when building a 32 bits kernel
with a bi-arch version of GCC, or when building a book3s/32 kernel,
the option -mcpu=powerpc is passed to GCC at all time, relying on it
being eventually overriden by a subsequent -mcpu=xxxx.
But when building the same kernel with a 32 bits only version of GCC,
that is not done, relying on gcc being built with the expected default
CPU.
This logic has two problems. First, it is a bit fragile to rely on
whether the GCC version is bi-arch or not, because today we can have
bi-arch versions of GCC configured with a 32 bits default. Second,
there are some versions of GCC which don't support -mcpu=powerpc,
for instance for e500 SPE-only versions.
So, stop relying on this approximative logic and allow the user to
decide whether he/she wants to use the toolchain's default CPU or if
he/she wants to set one, and allow only possible CPUs based on the
selected target.
Reported-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d4df724691351531bf46d685d654689e5dfa0d74.1657549153.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f1901110a89b0e2e13adb2ac8d1a7102879ea98 ]
Currently, almost all archs (x86, arm64, mips...) support fast call
of crash_kexec() when "regs && kexec_should_crash()" is true. But
RISC-V not, it can only enter crash system via panic(). However panic()
doesn't pass the regs of the real accident scene to crash_kexec(),
it caused we can't get accurate backtrace via gdb,
$ riscv64-linux-gnu-gdb vmlinux vmcore
Reading symbols from vmlinux...
[New LWP 95]
#0 console_unlock () at kernel/printk/printk.c:2557
2557 if (do_cond_resched)
(gdb) bt
#0 console_unlock () at kernel/printk/printk.c:2557
#1 0x0000000000000000 in ?? ()
With the patch we can get the accurate backtrace,
$ riscv64-linux-gnu-gdb vmlinux vmcore
Reading symbols from vmlinux...
[New LWP 95]
#0 0xffffffe00063a4e0 in test_thread (data=<optimized out>) at drivers/test_crash.c:81
81 *(int *)p = 0xdead;
(gdb)
(gdb) bt
#0 0xffffffe00064d5c0 in test_thread (data=<optimized out>) at drivers/test_crash.c:81
#1 0x0000000000000000 in ?? ()
Test code to produce NULL address dereference in test_crash.c,
void *p = NULL;
*(int *)p = 0xdead;
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220606082308.2883458-1-xianting.tian@linux.alibaba.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a9f743ceead60ed454c46fbc3085ee9a79cbebb ]
We should call of_node_put() for the reference 'uctl_node' returned by
of_get_parent() which will increase the refcount. Otherwise, there will
be a refcount leak bug.
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2310c74d418deca0f1d749c45f1f43162510f51 ]
On kprobe registration kernel allocate one insn_slot for new kprobe,
but it forget to reclaim the insn_slot on unregistration, leading to a
potential leakage.
Reported-by: Chen Guokai <chenguokai17@mails.ucas.ac.cn>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dda520d07b95072a0b63f6c52a8eb566d08ea897 ]
QEMU has a -no-reboot option, which halts instead of reboots when the
guest asks to reboot. This is invaluable when used with
CONFIG_PANIC_TIMEOUT=-1 (and panic_on_warn), because it allows panics
and warnings to be caught immediately in CI. Implement this in UML too,
by way of a basic setup param.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8d48562a2729742f767b0fdd994d6b2a56a49c63 upstream.
The recent change to get_phb_number() causes a DEBUG_ATOMIC_SLEEP
warning on some systems:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by swapper/1:
#0: c157efb0 (hose_spinlock){+.+.}-{2:2}, at: pcibios_alloc_controller+0x64/0x220
Preemption disabled at:
[<00000000>] 0x0
CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0-yocto-standard+ #1
Call Trace:
[d101dc90] [c073b264] dump_stack_lvl+0x50/0x8c (unreliable)
[d101dcb0] [c0093b70] __might_resched+0x258/0x2a8
[d101dcd0] [c0d3e634] __mutex_lock+0x6c/0x6ec
[d101dd50] [c0a84174] of_alias_get_id+0x50/0xf4
[d101dd80] [c002ec78] pcibios_alloc_controller+0x1b8/0x220
[d101ddd0] [c140c9dc] pmac_pci_init+0x198/0x784
[d101de50] [c140852c] discover_phbs+0x30/0x4c
[d101de60] [c0007fd4] do_one_initcall+0x94/0x344
[d101ded0] [c1403b40] kernel_init_freeable+0x1a8/0x22c
[d101df10] [c00086e0] kernel_init+0x34/0x160
[d101df30] [c001b334] ret_from_kernel_thread+0x5c/0x64
This is because pcibios_alloc_controller() holds hose_spinlock but
of_alias_get_id() takes of_mutex which can sleep.
The hose_spinlock protects the phb_bitmap, and also the hose_list, but
it doesn't need to be held while get_phb_number() calls the OF routines,
because those are only looking up information in the device tree.
So fix it by having get_phb_number() take the hose_spinlock itself, only
where required, and then dropping the lock before returning.
pcibios_alloc_controller() then needs to take the lock again before the
list_add() but that's safe, the order of the list is not important.
Fixes: 0fe1e96fef0a ("powerpc/pci: Prefer PCI domain assignment via DT 'linux,pci-domain' and alias")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220815065550.1303620-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd0c153daad135d0ec1a53c5dbe6936a724d6ae1 upstream.
If we use the ancient SysV syscall ABI, we'd better have tell the
kernel how to claim that a negative return value is a success.
Use ->orig_r2 for that - it's inaccessible via ptrace, so it's
a fair game for changes and it's normally[*] non-negative on return
from syscall. Set to -1; syscall is not going to be restart-worthy
by definition, so we won't interfere with that use either.
[*] the only exception is rt_sigreturn(), where we skip the entire
messing with r1/r2 anyway.
Fixes: 82ed08dd1b ("nios2: Exception handling")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d631bd58fe0ea3e3350212e23c9aba1fb606514 upstream.
sys_foo() returns -512 (aka -ERESTARTSYS) => do_signal() sees
512 in r2 and 1 in r1.
sys_foo() returns 512 => do_signal() sees 512 in r2 and 0 in r1.
The former is restart-worthy; the latter obviously isn't.
Fixes: b53e906d25 ("nios2: Signal handling support")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25ba820ef36bdbaf9884adeac69b6e1821a7df76 upstream.
all checks done before letting the tracer modify the register
state are worthless...
Fixes: 82ed08dd1b ("nios2: Exception handling")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45ec746c65097c25e77d24eae8fee0def5b6cc5d upstream.
fill the gaps in there with sys_ni_syscall, as everyone does...
Fixes: 82ed08dd1b ("nios2: Exception handling")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88e0a74902f894fbbc55ad3ad2cb23b4bfba555c upstream.
Commit c164fbb40c43f("x86/mm: thread pgprot_t through
init_memory_mapping()") mistakenly used __pgprot() which doesn't respect
__default_kernel_pte_mask when setting PUD mapping.
Fix it by only setting the one bit we actually need (PSE) and leaving
the other bits (that have been properly masked) alone.
Fixes: c164fbb40c ("x86/mm: thread pgprot_t through init_memory_mapping()")
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ac19ead0dfbabd8e0bfc731f507cfb0b95d6c99 upstream.
When returning from the compare function the u64 is truncated to an
int. This results in a loss of the high nybble[1] in the event select
and its sign if that nybble is in use. Switch from using a result that
can end up being truncated to a result that can only be: 1, 0, -1.
[1] bits 35:32 in the event select register and bits 11:8 in the event
select.
Fixes: 7ff775aca48ad ("KVM: x86/pmu: Use binary search to check filtered events")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220517051238.2566934-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ec37d1cbe17d8189d9562178d8b29167fe1c31a upstream
When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for
irqchip_in_kernel() so normally SynIC irqs should never be set. It is,
however, possible for a misbehaving VMM to write to SYNIC/STIMER MSRs
causing erroneous behavior.
The immediate issue being fixed is that kvm_irq_delivery_to_apic()
(kvm_irq_delivery_to_apic_fast()) crashes when called with
'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Ghinea <stefan.ghinea@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 98defd2e17803263f49548fea930cfc974d505aa ]
MSR_CORE_PERF_GLOBAL_CTRL is introduced as part of Architecture PMU V2,
as indicated by Intel SDM 19.2.2 and the intel_is_valid_msr() function.
So in the absence of global_ctrl support, all PMCs are enabled as AMD does.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220509102204.62389-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93255bf92939d948bc86d81c6bb70bb0fecc5db1 ]
Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits
as reserved if there is no guest vPMU. The nVMX VM-Entry consistency
checks do not check for a valid vPMU prior to consuming the masks via
kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask
to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the
actual MSR load will be rejected by intel_is_valid_msr()).
Fixes: f5132b0138 ("KVM: Expose a version 2 architectural PMU to a guests")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c985527dd8d283e786ad7a67e532ef7f6f00fac ]
The mask value of fixed counter control register should be dynamic
adjusted with the number of fixed counters. This patch introduces a
variable that includes the reserved bits of fixed counter control
registers. This is a generic code refactoring.
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95b065bf5c431c06c68056a03a5853b660640ecc ]
The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.
Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().
Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a755753903a40d982f6dd23d65eb96b248a2577a ]
Once MSR_IA32_PERF_CAPABILITIES is changed via vmx_set_msr(), the
value should not be changed by cpuid(). To ensure that the new value
is kept, the default initialization path is moved to intel_pmu_init().
The effective value of the MSR will be 0 if PDCM is clear, however.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7d855c2aff2d511fd60ee2e356134c4fb394799 ]
Inject a #UD if L1 attempts VMXON with a CR0 or CR4 that is disallowed
per the associated nested VMX MSRs' fixed0/1 settings. KVM cannot rely
on hardware to perform the checks, even for the few checks that have
higher priority than VM-Exit, as (a) KVM may have forced CR0/CR4 bits in
hardware while running the guest, (b) there may incompatible CR0/CR4 bits
that have lower priority than VM-Exit, e.g. CR0.NE, and (c) userspace may
have further restricted the allowed CR0/CR4 values by manipulating the
guest's nested VMX MSRs.
Note, despite a very strong desire to throw shade at Jim, commit
70f3aac964 ("kvm: nVMX: Remove superfluous VMX instruction fault checks")
is not to blame for the buggy behavior (though the comment...). That
commit only removed the CR0.PE, EFLAGS.VM, and COMPATIBILITY mode checks
(though it did erroneously drop the CPL check, but that has already been
remedied). KVM may force CR0.PE=1, but will do so only when also
forcing EFLAGS.VM=1 to emulate Real Mode, i.e. hardware will still #UD.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216033
Fixes: ec378aeef9 ("KVM: nVMX: Implement VMXON and VMXOFF")
Reported-by: Eric Li <ercli@ucdavis.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2fe3cd4604ac87c587db05d41843d667dc43815 ]
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4(). This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.
Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.
Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics. This will be remedied in a future patch.
Fixes: 5e1746d620 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 311a06593b9a3944a63ed176b95cb8d857f7c83b ]
Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps. SVM obviously does not set X86_FEATURE_VMX.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a447e38a7fadb2e554c3942dda183e55cccd5df0 ]
Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, via kvm_cpu_caps. X86_FEATURE_VMX is set in kvm_cpu_caps
(by vmx_set_cpu_caps()), if and only if 'nested' is true.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3a9e4146a6f79f19430bca3f2a4d6ebaaffe36b ]
Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, i.e. in cr4_guest_rsvd_bits. This fixes a bug where KVM
incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
before KVM_SET_CPUID{,2}.
Fixes: 5e1746d620 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Reported-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92dcd3d31843fbe1a95d880dc912e1f6beac6632 ]
In order to be able to experiment with suspend in UML, add the
minimal work to be able to suspend (s2idle) an instance of UML,
and be able to wake it back up from that state with the USR1
signal sent to the main UML process.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2368048bf5c2ec4b604ac3431564071e89a0bc71 ]
Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR. The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case. Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.
Fixes: 890ca9aefa ("KVM: Add MCE support")
Fixes: 9ffd986c6e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3a2ba42cbd0b669ce3837ba400905f93dd06c79f upstream.
The bitops compile-time optimization series revealed one more
problem in olpc-xo1-sci.c:send_ebook_state(), resulted in GCC
warnings:
arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state':
arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
83 | if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state)
| ^~
arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning
Despite this code working as intended, this redundant double
negation of boolean value, together with comparing to `char`
with no explicit conversion to bool, makes compilers think
the author made some unintentional logical mistakes here.
Make it the other way around and negate the char instead
to silence the warnings.
Fixes: d2aa37411b ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches")
Cc: stable@vger.kernel.org # 3.5+
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6cfcdda8cbe81eaf821c897369a65fec987b404 upstream.
AMD's "Technical Guidance for Mitigating Branch Type Confusion,
Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
Privileged Mode Entry / SMT Safety" says:
Similar to the Jmp2Ret mitigation, if the code on the sibling thread
cannot be trusted, software should set STIBP to 1 or disable SMT to
ensure SMT safety when using this mitigation.
So, like already being done for retbleed=unret, and now also for
retbleed=ibpb, force STIBP on machines that have it, and report its SMT
vulnerability status accordingly.
[ bp: Remove the "we" and remove "[AMD]" applicability parameter which
doesn't work here. ]
Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit de979c83574abf6e78f3fa65b716515c91b2613d ]
With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes
an empty object file.
With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the
GNU assembler doesn't generate a symbol table for empty object files and
objtool fails with the following error when a valid symbol table cannot
be found:
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table
To prevent this from happening, build thunk_$(BITS).o only if
CONFIG_PREEMPTION is enabled.
BugLink: https://bugs.launchpad.net/bugs/1911359
Fixes: 320100a5ff ("x86/entry: Remove the TRACE_IRQS cruft")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Ys/Ke7EWjcX+ZlXO@arighi-desktop
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 625395c4a0f4775e0fe00f616888d2e6c1ba49db ]
GCC-12 started triggering a new warning:
arch/x86/mm/numa.c: In function ‘cpumask_of_node’:
arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress]
916 | if (node_to_cpumask_map[node] == NULL) {
| ^~
node_to_cpumask_map is of type cpumask_var_t[].
When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a
pointer for dynamic allocation, else to an array of one element. The
"wicked game" can be checked on line 700 of include/linux/cpumask.h.
The original code in debug_cpumask_set_cpu() and cpumask_of_node() were
probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y
(i.e. dynamic allocation) in mind, checking if the cpumask was available
via a direct NULL check.
When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning
while compiling the kernel.
Fix that by using cpumask_available(), which does the NULL check when
CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever
such checks are made.
Conditional definitions of cpumask_available() can be found along with
the definition of cpumask_var_t. Check the cpumask.h reference mentioned
above.
Fixes: c032ef60d1 ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t")
Fixes: de2d9445f1 ("x86: Unify node_to_cpumask_map handling between 32 and 64bit")
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4b39e88b42d13366b831270306326b5c20971ca ]
The recent change to the PHB numbering logic has a logic error in the
handling of "ibm,opal-phbid".
When an "ibm,opal-phbid" property is present, &prop is written to and
ret is set to zero.
The following call to of_alias_get_id() is skipped because ret == 0.
But then the if (ret >= 0) is true, and the body of that if statement
sets prop = ret which throws away the value that was just read from
"ibm,opal-phbid".
Fix the logic by only doing the ret >= 0 check in the of_alias_get_id()
case.
Fixes: 0fe1e96fef0a ("powerpc/pci: Prefer PCI domain assignment via DT 'linux,pci-domain' and alias")
Reviewed-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220802105723.1055178-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df5d4b616ee76abc97e5bd348e22659c2b095b1c ]
of_get_next_parent() returns a node pointer with refcount incremented,
we should use of_node_put() on it when not need anymore.
Add missing of_node_put() in the error path to avoid refcount leak.
Fixes: ce21b3c964 ("[CELL] add support for MSI on Axon-based Cell systems")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220605065129.63906-1-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 255b650cbec6849443ce2e0cdd187fd5e61c218c ]
of_find_node_by_path() returns a node pointer with
refcount incremented, we should use of_node_put() on it when done.
Add missing of_node_put() to avoid refcount leak.
Fixes: eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220605053225.56125-1-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ac059dacffa8ab2f7798f20e4bd3333890c541c ]
of_find_node_by_path() returns remote device nodepointer with
refcount incremented, we should use of_node_put() on it when done.
Add missing of_node_put() to avoid refcount leak.
Fixes: 0afacde3df ("[POWERPC] spufs: allow isolated mode apps by starting the SPE loader")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220603121543.22884-1-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0fe1e96fef0a5c53b4c0d1500d356f3906000f81 ]
Other Linux architectures use DT property 'linux,pci-domain' for
specifying fixed PCI domain of PCI controller specified in Device-Tree.
And lot of Freescale powerpc boards have defined numbered pci alias in
Device-Tree for every PCIe controller which number specify preferred PCI
domain.
So prefer usage of DT property 'linux,pci-domain' (via function
of_get_pci_domain_nr()) and DT pci alias (via function
of_alias_get_id()) on powerpc architecture for assigning PCI domain to
PCI controller.
Fixes: 63a72284b1 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220706102148.5060-2-pali@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9be013b2a9ecb29b5168e4b9db0e48ed53acf37c ]
Commit 0e00a8c9fd ("powerpc: Allow CPU selection also on PPC32")
enlarged the CPU selection logic to PPC32 by removing depend to
PPC64, and failed to restrict that depend to E5500_CPU and E6500_CPU.
Fortunately that got unnoticed because -mcpu=8540 will override the
-mcpu=e500mc64 or -mpcu=e6500 as they are ealier, but that's
fragile and may no be right in the future.
Add back the depend PPC64 on E5500_CPU and E6500_CPU.
Fixes: 0e00a8c9fd ("powerpc: Allow CPU selection also on PPC32")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8abab4888da69ff78b73a56f64d9678a7bf684e9.1657549153.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc306186a130c6d9feb0aabc1c71b8ed1674a3bf ]
Virtual addresses of vmcore_info and os_info members are
wrongly passed to copy_oldmem_kernel(), while the function
expects physical address of the source. Instead, __pa()
macro should have been applied.
Yet, use of __pa() macro could be somehow confusing, since
copy_oldmem_kernel() may treat the source as an offset, not
as a direct physical address (that depens from the oldmem
availability and location).
Fix the virtual vs physical address confusion and make the
way the old lowcore is read consistent across all sources.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 890005a7d98f7452cfe86dcfb2aeeb7df01132ce ]
commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear
pending PMI before resetting an overflown PMC") added a new
function "pmi_irq_pending" in hw_irq.h. This function is to check
if there is a PMI marked as pending in Paca (PACA_IRQ_PMI).This is
used in power_pmu_disable in a WARN_ON. The intention here is to
provide a warning if there is PMI pending, but no counter is found
overflown.
During some of the perf runs, below warning is hit:
WARNING: CPU: 36 PID: 0 at arch/powerpc/perf/core-book3s.c:1332 power_pmu_disable+0x25c/0x2c0
Modules linked in:
-----
NIP [c000000000141c3c] power_pmu_disable+0x25c/0x2c0
LR [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0
Call Trace:
[c000000baffcfb90] [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0 (unreliable)
[c000000baffcfc10] [c0000000003e2f8c] perf_pmu_disable+0x4c/0x60
[c000000baffcfc30] [c0000000003e3344] group_sched_out.part.124+0x44/0x100
[c000000baffcfc80] [c0000000003e353c] __perf_event_disable+0x13c/0x240
[c000000baffcfcd0] [c0000000003dd334] event_function+0xc4/0x140
[c000000baffcfd20] [c0000000003d855c] remote_function+0x7c/0xa0
[c000000baffcfd50] [c00000000026c394] flush_smp_call_function_queue+0xd4/0x300
[c000000baffcfde0] [c000000000065b24] smp_ipi_demux_relaxed+0xa4/0x100
[c000000baffcfe20] [c0000000000cb2b0] xive_muxed_ipi_action+0x20/0x40
[c000000baffcfe40] [c000000000207c3c] __handle_irq_event_percpu+0x8c/0x250
[c000000baffcfee0] [c000000000207e2c] handle_irq_event_percpu+0x2c/0xa0
[c000000baffcff10] [c000000000210a04] handle_percpu_irq+0x84/0xc0
[c000000baffcff40] [c000000000205f14] generic_handle_irq+0x54/0x80
[c000000baffcff60] [c000000000015740] __do_irq+0x90/0x1d0
[c000000baffcff90] [c000000000016990] __do_IRQ+0xc0/0x140
[c0000009732f3940] [c000000bafceaca8] 0xc000000bafceaca8
[c0000009732f39d0] [c000000000016b78] do_IRQ+0x168/0x1c0
[c0000009732f3a00] [c0000000000090c8] hardware_interrupt_common_virt+0x218/0x220
This means that there is no PMC overflown among the active events
in the PMU, but there is a PMU pending in Paca. The function
"any_pmc_overflown" checks the PMCs on active events in
cpuhw->n_events. Code snippet:
<<>>
if (any_pmc_overflown(cpuhw))
clear_pmi_irq_pending();
else
WARN_ON(pmi_irq_pending());
<<>>
Here the PMC overflown is not from active event. Example: When we do
perf record, default cycles and instructions will be running on PMC6
and PMC5 respectively. It could happen that overflowed event is currently
not active and pending PMI is for the inactive event. Debug logs from
trace_printk:
<<>>
any_pmc_overflown: idx is 5: pmc value is 0xd9a
power_pmu_disable: PMC1: 0x0, PMC2: 0x0, PMC3: 0x0, PMC4: 0x0, PMC5: 0xd9a, PMC6: 0x80002011
<<>>
Here active PMC (from idx) is PMC5 , but overflown PMC is PMC6(0x80002011).
When we handle PMI interrupt for such cases, if the PMC overflown is
from inactive event, it will be ignored. Reference commit:
commit bc09c219b2 ("powerpc/perf: Fix finding overflowed PMC in interrupt")
Patch addresses two changes:
1) Fix 1 : Removal of warning ( WARN_ON(pmi_irq_pending()); )
We were printing warning if no PMC is found overflown among active PMU
events, but PMI pending in PACA. But this could happen in cases where
PMC overflown is not in active PMC. An inactive event could have caused
the overflow. Hence the warning is not needed. To know pending PMI is
from an inactive event, we need to loop through all PMC's which will
cause more SPR reads via mfspr and increase in context switch. Also in
existing function: perf_event_interrupt, already we ignore PMI's
overflown when it is from an inactive PMC.
2) Fix 2: optimization in clearing pending PMI.
Currently we check for any active PMC overflown before clearing PMI
pending in Paca. This is causing additional SPR read also. From point 1,
we know that if PMI pending in Paca from inactive cases, that is going
to be ignored during replay. Hence if there is pending PMI in Paca, just
clear it irrespective of PMC overflown or not.
In summary, remove the any_pmc_overflown check entirely in
power_pmu_disable. ie If there is a pending PMI in Paca, clear it, since
we are in pmu_disable. There could be cases where PMI is pending because
of inactive PMC ( which later when replayed also will get ignored ), so
WARN_ON could give false warning. Hence removing it.
Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220522142256.24699-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e70cbd11b03889c92462cf52edb2bd023c798fa ]
Initialising the hwrng struct with zeros causes a
compile-time sparse warning:
$ ARCH=um make -j10 W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
...
CHECK arch/um/drivers/random.c
arch/um/drivers/random.c:31:31: sparse: warning: Using plain integer as NULL pointer
Fix the warning by not initialising the hwrng struct
with zeros as it is initialised anyway during module
init.
Fixes: 72d3e093afae ("um: random: Register random as hwrng-core device")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 628ccfc8f5f79dd548319408fcc53949fe97b258 ]
The 'pdev' and 'netdev' need to be released in error cases of
iss_net_configure().
Change the return type of iss_net_configure() to void, because it's
not used.
Fixes: 7282bee787 ("[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 8")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>