commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream.
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114 ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[gpiccoli: Backport the patch due to differences in the trees. The main change
between 5.10.y and 5.15.y is due to renaming the fixup function, by
commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()").
Following 2 commits cause divergence in the diffs too (in the removed lines):
cd072dab453a ("x86/fault: Add a helper function to sanitize error code")
d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey")
Finally, there is context adjustment in the processor.h file.]
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3fb0fdb3bbe7aed495109b3296b06c2409734023 ]
On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage. It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work. Supporting both variants is a
maintenance and testing mess.
Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.
Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable. This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
(That name is special. We could use any symbol we want for the
%fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
name other than __stack_chk_guard.)
Forcibly disable stackprotector on older compilers that don't support
the new options and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.
Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.
[ bp: Massage commit message. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
Stable-dep-of: e3f269ed0acc ("x86/pm: Work around false positive kmemleak report in msr_build_context()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 77245f1c3c6495521f6a3af082696ee2f8ce3921 upstream.
Under certain circumstances, an integer division by 0 which faults, can
leave stale quotient data from a previous division operation on Zen1
microarchitectures.
Do a dummy division 0/1 before returning from the #DE exception handler
in order to avoid any leaks of potentially sensitive data.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit: fb3bd914b3ec28f5fb697ac55c4846ac2d542855
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0dd9245aa9e25a697181f6085692272c9ec61bc ]
The kernel caches each CPU's feature bits at boot in an x86_capability[]
structure. However, the capabilities in the BSP's copy can be turned off
as a result of certain command line parameters or configuration
restrictions, for example the SGX bit. This can cause a mismatch when
comparing the values before and after the microcode update.
Another example is X86_FEATURE_SRBDS_CTRL which gets added only after
microcode update:
# --- cpuid.before 2023-01-21 14:54:15.652000747 +0100
# +++ cpuid.after 2023-01-21 14:54:26.632001024 +0100
# @@ -10,7 +10,7 @@ CPU:
# 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
# 0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
# 0x00000006 0x00: eax=0x000027f7 ebx=0x00000002 ecx=0x00000001 edx=0x00000000
# - 0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002400
# + 0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002e00
^^^
and which proves for a gazillionth time that late loading is a bad bad
idea.
microcode_check() is called after an update to report any previously
cached CPUID bits which might have changed due to the update.
Therefore, store the cached CPU caps before the update and compare them
with the CPU caps after the microcode update has succeeded.
Thus, the comparison is done between the CPUID *hardware* bits before
and after the upgrade instead of using the cached, possibly runtime
modified values in BSP's boot_cpu_data copy.
As a result, false warnings about CPUID bits changes are avoided.
[ bp:
- Massage.
- Add SRBDS_CTRL example.
- Add kernel-doc.
- Incorporate forgotten review feedback from dhansen.
]
Fixes: 1008c52c09 ("x86/CPU: Add a microcode loader callback")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-3-ashok.raj@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab31c74455c64e69342ddab21fd9426fcbfefde7 ]
Add a parameter to store CPU capabilities before performing a microcode
update so that CPU capabilities can be compared before and after update.
[ bp: Massage. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-2-ashok.raj@intel.com
Stable-dep-of: c0dd9245aa9e ("x86/microcode: Check CPU capabilities after late microcode update correctly")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1efd0ff4bd16e8bb8607ba566b03f2024a830bb ]
SEV-ES guests require properly setup task register with which the TSS
descriptor in the GDT can be located so that the IST-type #VC exception
handler which they need to function properly, can be executed.
This setup needs to happen before attempting to load microcode in
ucode_cpu_init() on secondary CPUs which can cause such #VC exceptions.
Simplify the machinery by running that exception setup from a new function
cpu_init_secondary() and explicitly call cpu_init_exception_handling() for
the boot CPU before cpu_init(). The latter prepares for fixing and
simplifying the exception/IST setup on the boot CPU.
There should be no functional changes resulting from this patch.
[ tglx: Reworked it so cpu_init_exception_handling() stays seperate ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/87k0o6gtvu.ffs@nanos.tec.linutronix.de
Stable-dep-of: c0dd9245aa9e ("x86/microcode: Check CPU capabilities after late microcode update correctly")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit b968e84b509da593c50dc3db679e1d33de701f78 upstream.
Since commit c8137ace56 ("x86/iopl: Restrict iopl() permission
scope") it's possible to emulate iopl(3) using ioperm(), except for
the CLI/STI usage.
Userspace CLI/STI usage is very dubious (read broken), since any
exception taken during that window can lead to rescheduling anyway (or
worse). The IOPL(2) manpage even states that usage of CLI/STI is highly
discouraged and might even crash the system.
Of course, that won't stop people and HP has the dubious honour of
being the first vendor to be found using this in their hp-health
package.
In order to enable this 'software' to still 'work', have the #GP treat
the CLI/STI instructions as NOPs when iopl(3). Warn the user that
their program is doing dubious things.
Fixes: a24ca99768 ("x86/iopl: Remove legacy IOPL option")
Reported-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org # v5.5+
Link: https://lkml.kernel.org/r/20210918090641.GD5106@worktop.programming.kicks-ass.net
Signed-off-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66c1b6d74cd7035e85c426f0af4aede19e805c8a upstream.
Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED.
It was moved to asm/processor.h by b9d989c721 ("x86/asm: Move the
thread_info::status field to thread_struct"), then later 37a8f7c383
("x86/asm: Move 'status' from thread_struct to thread_info") moved the
'status' field back but TS_COMPAT was forgotten.
Preparatory patch to fix the COMPAT case for get_nr_restart_syscall()
Fixes: 609c19a385 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
Pull x86 SEV-ES support from Borislav Petkov:
"SEV-ES enhances the current guest memory encryption support called SEV
by also encrypting the guest register state, making the registers
inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared
between the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
so in order for that exception mechanism to work, the early x86 init
code needed to be made able to handle exceptions, which, in itself,
brings a bunch of very nice cleanups and improvements to the early
boot code like an early page fault handler, allowing for on-demand
building of the identity mapping. With that, !KASLR configurations do
not use the EFI page table anymore but switch to a kernel-controlled
one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly separate
from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and
behind static keys to minimize the performance impact on !SEV-ES
setups.
Work by Joerg Roedel and Thomas Lendacky and others"
* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
x86/sev-es: Check required CPU features for SEV-ES
x86/efi: Add GHCB mappings when SEV-ES is active
x86/sev-es: Handle NMI State
x86/sev-es: Support CPU offline/online
x86/head/64: Don't call verify_cpu() on starting APs
x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
x86/realmode: Setup AP jump table
x86/realmode: Add SEV-ES specific trampoline entry point
x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
x86/sev-es: Handle #DB Events
x86/sev-es: Handle #AC Events
x86/sev-es: Handle VMMCALL Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle RDTSC(P) Events
...
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more. To properly
handle the TASK_SIZE_MAX checking for 4 vs 5-level page tables on
x86 a new alternative is introduced, which just like the one in
entry_64.S has to use the hardcoded virtual address bits to escape
the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
page tables are enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
At least for 64-bit this moves them closer to some of the defines
they are based on, and it prepares for using the TASK_SIZE_MAX
definition from assembly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Current usage of thread.debugreg6 is convoluted at best. It starts life as
a copy of the hardware DR6 value, but then various bits are cleared and
set.
Replace this with a new variable thread.virtual_dr6 that is initialized to
0 when DR6 is read and only gains bits, at the same time the actual (on
stack) dr6 value which is read from the hardware only gets bits cleared.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20200902133201.415372940@infradead.org
Pull x86 fsgsbase from Thomas Gleixner:
"Support for FSGSBASE. Almost 5 years after the first RFC to support
it, this has been brought into a shape which is maintainable and
actually works.
This final version was done by Sasha Levin who took it up after Intel
dropped the ball. Sasha discovered that the SGX (sic!) offerings out
there ship rogue kernel modules enabling FSGSBASE behind the kernels
back which opens an instantanious unpriviledged root hole.
The FSGSBASE instructions provide a considerable speedup of the
context switch path and enable user space to write GSBASE without
kernel interaction. This enablement requires careful handling of the
exception entries which go through the paranoid entry path as they
can no longer rely on the assumption that user GSBASE is positive (as
enforced via prctl() on non FSGSBASE enabled systemn).
All other entries (syscalls, interrupts and exceptions) can still just
utilize SWAPGS unconditionally when the entry comes from user space.
Converting these entries to use FSGSBASE has no benefit as SWAPGS is
only marginally slower than WRGSBASE and locating and retrieving the
kernel GSBASE value is not a free operation either. The real benefit
of RD/WRGSBASE is the avoidance of the MSR reads and writes.
The changes come with appropriate selftests and have held up in field
testing against the (sanitized) Graphene-SGX driver"
* tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/fsgsbase: Fix Xen PV support
x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
selftests/x86/fsgsbase: Add a missing memory constraint
selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE
selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
Documentation/x86/64: Add documentation for GS/FS addressing mode
x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
x86/entry/64: Introduce the FIND_PERCPU_BASE macro
x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
x86/process/64: Use FSBSBASE in switch_to() if available
x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
...
save_fsgs_for_kvm() is invoked via
vcpu_enter_guest()
kvm_x86_ops.prepare_guest_switch(vcpu)
vmx_prepare_switch_to_guest()
save_fsgs_for_kvm()
with preemption disabled, but interrupts enabled.
The upcoming FSGSBASE based GS safe needs interrupts to be disabled. This
could be done in the helper function, but that function is also called from
switch_to() which has interrupts disabled already.
Disable interrupts inside save_fsgs_for_kvm() and rename the function to
current_save_fsgs() so it can be invoked from other places.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200528201402.1708239-7-sashal@kernel.org
Add cpu feature conditional FSGSBASE access to the relevant helper
functions. That allows to accelerate certain FS/GS base operations in
subsequent changes.
Note, that while possible, the user space entry/exit GSBASE operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.
To preserve the SWAPGS entry mechanism introduce __[rd|wr]gsbase_inactive()
helpers. Note, for Xen PV, paravirt hooks can be added later as they might
allow a very efficient but different implementation.
[ tglx: Massaged changelog, convert it to noinstr and force inline
native_swapgs() ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-5-sashal@kernel.org
Pull objtool updates from Ingo Molnar:
"There are a lot of objtool changes in this cycle, all across the map:
- Speed up objtool significantly, especially when there are large
number of sections
- Improve objtool's understanding of special instructions such as
IRET, to reduce the number of annotations required
- Implement 'noinstr' validation
- Do baby steps for non-x86 objtool use
- Simplify/fix retpoline decoding
- Add vmlinux validation
- Improve documentation
- Fix various bugs and apply smaller cleanups"
* tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
objtool: Enable compilation of objtool for all architectures
objtool: Move struct objtool_file into arch-independent header
objtool: Exit successfully when requesting help
objtool: Add check_kcov_mode() to the uaccess safelist
samples/ftrace: Fix asm function ELF annotations
objtool: optimize add_dead_ends for split sections
objtool: use gelf_getsymshndx to handle >64k sections
objtool: Allow no-op CFI ops in alternatives
x86/retpoline: Fix retpoline unwind
x86: Change {JMP,CALL}_NOSPEC argument
x86: Simplify retpoline declaration
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
objtool: Add support for intra-function calls
objtool: Move the IRET hack into the arch decoder
objtool: Remove INSN_STACK
objtool: Make handle_insn_ops() unconditional
objtool: Rework allocating stack_ops on decode
objtool: UNWIND_HINT_RET_OFFSET should not check registers
objtool: is_fentry_call() crashes if call has no destination
x86,smap: Fix smap_{save,restore}() alternatives
...
The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.
Expand the MBM CPUID enumeration properties to include the MBM
counter width. The previously undefined EAX output register contains,
in bits [7:0], the MBM counter width encoded as an offset from
24 bits. Enumerating this property is only specified for Intel
CPUs.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
Teach objtool a little more about IRET so that we can avoid using the
SAVE/RESTORE annotation. In particular, make the weird corner case in
insn->restore go away.
The purpose of that corner case is to deal with the fact that
UNWIND_HINT_RESTORE lands on the instruction after IRET, but that
instruction can end up being outside the basic block, consider:
if (cond)
sync_core()
foo();
Then the hint will land on foo(), and we'll encounter the restore
hint without ever having seen the save hint.
By teaching objtool about the arch specific exception frame size, and
assuming that any IRET in an STT_FUNC symbol is an exception frame
sized POP, we can remove the use of save/restore hints for this code.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.631224674@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 cleanups from Ingo Molnar:
"This topic tree contains more commits than usual:
- most of it are uaccess cleanups/reorganization by Al
- there's a bunch of prototype declaration (--Wmissing-prototypes)
cleanups
- misc other cleanups all around the map"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/mm/set_memory: Fix -Wmissing-prototypes warnings
x86/efi: Add a prototype for efi_arch_mem_reserve()
x86/mm: Mark setup_emu2phys_nid() static
x86/jump_label: Move 'inline' keyword placement
x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
kill uaccess_try()
x86: unsafe_put-style macro for sigmask
x86: x32_setup_rt_frame(): consolidate uaccess areas
x86: __setup_rt_frame(): consolidate uaccess areas
x86: __setup_frame(): consolidate uaccess areas
x86: setup_sigcontext(): list user_access_{begin,end}() into callers
x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
x86: ia32_setup_rt_frame(): consolidate uaccess areas
x86: ia32_setup_frame(): consolidate uaccess areas
x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
x86/alternatives: Mark text_poke_loc_init() static
x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()
x86/mm: Drop pud_mknotpresent()
x86: Replace setup_irq() by request_irq()
x86/configs: Slightly reduce defconfigs
...
Pull x86 MPX removal from Dave Hansen:
"MPX requires recompiling applications, which requires compiler
support. Unfortunately, GCC 9.1 is expected to be be released without
support for MPX. This means that there was only a relatively small
window where folks could have ever used MPX. It failed to gain wide
adoption in the industry, and Linux was the only mainstream OS to ever
support it widely.
Support for the feature may also disappear on future processors.
This set completes the process that we started during the 5.4 merge
window when the MPX prctl()s were removed. XSAVE support is left in
place, which allows MPX-using KVM guests to continue to function"
* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
x86/mpx: remove MPX from arch/x86
mm: remove arch_bprm_mm_init() hook
x86/mpx: remove bounds exception code
x86/mpx: remove build infrastructure
x86/alternatives: add missing insn.h include
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).
This removes all the remaining (dead at this point) MPX handling
code remaining in the tree. The only remaining code is the XSAVE
support for MPX state which is currently needd for KVM to handle
VMs which might use MPX.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill
the capabilities during IA32_FEAT_CTL MSR initialization.
Make the VMX capabilities dependent on IA32_FEAT_CTL and
X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
possibly support VMX, or when /proc/cpuinfo is not available.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
Add a VMX-specific variant of X86_FEATURE_* flags, which will eventually
supplant the synthetic VMX flags defined in cpufeatures word 8. Use the
Intel-defined layouts for the major VMX execution controls so that their
word entries can be directly populated from their respective MSRs, and
so that the VMX_FEATURE_* flags can be used to define the existing bit
definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE
flag when adding support for a new hardware feature.
The majority of Intel's (and compatible CPU's) VMX capabilities are
enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't
naturally provide any insight into the virtualization capabilities of
VMX enabled CPUs. Commit
e38e05a858 ("x86: extended "flags" to show virtualization HW feature
in /proc/cpuinfo")
attempted to address the issue by synthesizing select VMX features into
a Linux-defined word in cpufeatures.
Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic
because there is no sane way for a user to query the capabilities of
their platform, e.g. when trying to find a platform to test a feature or
debug an issue that has a hardware dependency. Lack of reporting is
especially problematic when the user isn't familiar with VMX, e.g. the
format of the MSRs is non-standard, existence of some MSRs is reported
by bits in other MSRs, several "features" from KVM's point of view are
enumerated as 3+ distinct features by hardware, etc...
The synthetic cpufeatures approach has several flaws:
- The set of synthesized VMX flags has become extremely stale with
respect to the full set of VMX features, e.g. only one new flag
(EPT A/D) has been added in the the decade since the introduction of
the synthetic VMX features. Failure to keep the VMX flags up to
date is likely due to the lack of a mechanism that forces developers
to consider whether or not a new feature is worth reporting.
- The synthetic flags may incorrectly be misinterpreted as affecting
kernel behavior, i.e. KVM, the kernel's sole consumer of VMX,
completely ignores the synthetic flags.
- New CPU vendors that support VMX have duplicated the hideous code
that propagates VMX features from MSRs to cpufeatures. Bringing the
synthetic VMX flags up to date would exacerbate the copy+paste
trainwreck.
Define separate VMX_FEATURE flags to set the stage for enumerating VMX
capabilities outside of the cpu_has() framework, and for adding
functional usage of VMX_FEATURE_* to help ensure the features reported
via /proc/cpuinfo is up to date with respect to kernel recognition of
VMX capabilities.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-10-sean.j.christopherson@intel.com
There are three problems with the current layout of the doublefault
stack and TSS. First, the TSS is only cacheline-aligned, which is
not enough -- if the hardware portion of the TSS (struct x86_hw_tss)
crosses a page boundary, horrible things happen [0]. Second, the
stack and TSS are global, so simultaneous double faults on different
CPUs will cause massive corruption. Third, the whole mechanism
won't work if user CR3 is loaded, resulting in a triple fault [1].
Let the doublefault stack and TSS share a page (which prevents the
TSS from spanning a page boundary), make it percpu, and move it into
cpu_entry_area. Teach the stack dump code about the doublefault
stack.
[0] Real hardware will read past the end of the page onto the next
*physical* page if a task switch happens. Virtual machines may
have any number of bugs, and I would consider it reasonable for
a VM to summarily kill the guest if it tries to task-switch to
a page-spanning TSS.
[1] Real hardware triple faults. At least some VMs seem to hang.
I'm not sure what's going on.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 64-bit doublefault handler is much nicer than the 32-bit one.
As a first step toward unifying them, make the 64-bit handler
self-contained. This should have no effect no functional effect
except in the odd case of x86_64 with CONFIG_DOUBLEFAULT=n in which
case it will change the logging a bit.
This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit
kernels. It didn't do anything useful -- CONFIG_DOUBLEFAULT=n
didn't actually disable doublefault handling on x86_64.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After the following commit:
05b042a194: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
'struct cpu_entry_area' has to be Kconfig invariant, so that we always
have a matching CPU_ENTRY_AREA_PAGES size.
This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct:
111e7b15cf: ("x86/ioperm: Extend IOPL config to control ioperm() as well")
Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of
cpu_entry_area by two pages, triggering the assert:
./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE
Simplify the Kconfig dependencies and make cpu_entry_area constant
size on 32-bit kernels again.
Fixes: 05b042a194: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 iopl updates from Ingo Molnar:
"This implements a nice simplification of the iopl and ioperm code that
Thomas Gleixner discovered: we can implement the IO privilege features
of the iopl system call by using the IO permission bitmap in
permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space
if they change the interrupt flag.
This implements that feature, with testing facilities and related
cleanups"
[ "Simplification" may be an over-statement. The main goal is to avoid
the cli/sti of iopl by effectively implementing the IO port access
parts of iopl in terms of ioperm.
This may end up not workign well in case people actually depend on
cli/sti being available, or if there are mixed uses of iopl and
ioperm. We will see.. - Linus ]
* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/ioperm: Fix use of deprecated config option
x86/entry/32: Clarify register saving in __switch_to_asm()
selftests/x86/iopl: Extend test to cover IOPL emulation
x86/ioperm: Extend IOPL config to control ioperm() as well
x86/iopl: Remove legacy IOPL option
x86/iopl: Restrict iopl() permission scope
x86/iopl: Fixup misleading comment
selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
x86/ioperm: Share I/O bitmap if identical
x86/ioperm: Remove bitmap if all permissions dropped
x86/ioperm: Move TSS bitmap update to exit to user work
x86/ioperm: Add bitmap sequence number
x86/ioperm: Move iobitmap data into a struct
x86/tss: Move I/O bitmap data into a seperate struct
x86/io: Speedup schedule out of I/O bitmap user
x86/ioperm: Avoid bitmap allocation if no permissions are set
x86/ioperm: Simplify first ioperm() invocation logic
x86/iopl: Cleanup include maze
x86/tss: Fix and move VMX BUILD_BUG_ON()
x86/cpu: Unify cpu_init()
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
Pull x86 cpu and fpu updates from Ingo Molnar:
- math-emu fixes
- CPUID updates
- sanity-check RDRAND output to see whether the CPU at least pretends
to produce random data
- various unaligned-access across cachelines fixes in preparation of
hardware level split-lock detection
- fix MAXSMP constraints to not allow !CPUMASK_OFFSTACK kernels with
larger than 512 NR_CPUS
- misc FPU related cleanups
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Align the x86_capability array to size of unsigned long
x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long
x86/umip: Make the comments vendor-agnostic
x86/Kconfig: Rename UMIP config parameter
x86/Kconfig: Enforce limit of 512 CPUs with MAXSMP and no CPUMASK_OFFSTACK
x86/cpufeatures: Add feature bit RDPRU on AMD
x86/math-emu: Limit MATH_EMULATION to 486SX compatibles
x86/math-emu: Check __copy_from_user() result
x86/rdrand: Sanity-check RDRAND output
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Use XFEATURE_FP/SSE enum values instead of hardcoded numbers
x86/fpu: Shrink space allocated for xstate_comp_offsets
x86/fpu: Update stale variable name in comment
If iopl() is disabled, then providing ioperm() does not make much sense.
Rename the config option and disable/enable both syscalls with it. Guard
the code with #ifdefs where appropriate.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
cruft dealing with the (e)flags based IOPL mechanism.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts)
Acked-by: Andy Lutomirski <luto@kernel.org>
The access to the full I/O port range can be also provided by the TSS I/O
bitmap, but that would require to copy 8k of data on scheduling in the
task. As shown with the sched out optimization TSS.io_bitmap_base can be
used to switch the incoming task to a preallocated I/O bitmap which has all
bits zero, i.e. allows access to all I/O ports.
Implementing this allows to provide an iopl() emulation mode which restricts
the IOPL level 3 permissions to I/O port access but removes the STI/CLI
permission which is coming with the hardware IOPL mechansim.
Provide a config option to switch IOPL to emulation mode, make it the
default and while at it also provide an option to disable IOPL completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Add a globally unique sequence number which is incremented when ioperm() is
changing the I/O bitmap of a task. Store the new sequence number in the
io_bitmap structure and compare it with the sequence number of the I/O
bitmap which was last loaded on a CPU. Only update the bitmap if the
sequence is different.
That should further reduce the overhead of I/O bitmap scheduling when there
are only a few I/O bitmap users on the system.
The 64bit sequence counter is sufficient. A wraparound of the sequence
counter assuming an ioperm() call every nanosecond would require about 584
years of uptime.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No point in having all the data in thread_struct, especially as upcoming
changes add more.
Make the bitmap in the new struct accessible as array of longs and as array
of characters via a union, so both the bitmap functions and the update
logic can avoid type casts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the non hardware portion of I/O bitmap data into a seperate struct for
readability sake.
Originally-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is no requirement to update the TSS I/O bitmap when a thread using it is
scheduled out and the incoming thread does not use it.
For the permission check based on the TSS I/O bitmap the CPU calculates the memory
location of the I/O bitmap by the address of the TSS and the io_bitmap_base member
of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the
offset to an address outside of the TSS limit.
If an I/O instruction is issued from user space the TSS limit causes #GP to be
raised in the same was as valid I/O bitmap with all bits set to 1 would do.
This removes the extra work when an I/O bitmap using task is scheduled out
and puts the burden on the rare I/O bitmap users when they are scheduled
in.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The x86_capability array in cpuinfo_x86 is of type u32 and thus is
naturally aligned to 4 bytes. But, set_bit() and clear_bit() require the
array to be aligned to size of unsigned long (i.e. 8 bytes on 64-bit
systems).
The array pointer is handed into atomic bit operations. If the access is
not aligned to unsigned long then the atomic bit operations can end up
crossing a cache line boundary, which causes the CPU to do a full bus lock
as it can't lock both cache lines at once. The bus lock operation is heavy
weight and can cause severe performance degradation.
The upcoming #AC split lock detection mechanism will issue warnings for
this kind of access.
Force the alignment of the array to unsigned long. This avoids the massive
code changes which would be required when converting the array data type to
unsigned long.
[ tglx: Rewrote changelog so it contains information WHY this is required ]
Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-4-tony.luck@intel.com
TSX Async Abort (TAA) is a side channel vulnerability to the internal
buffers in some Intel processors similar to Microachitectural Data
Sampling (MDS). In this case, certain loads may speculatively pass
invalid data to dependent operations when an asynchronous abort
condition is pending in a TSX transaction.
This includes loads with no fault or assist condition. Such loads may
speculatively expose stale data from the uarch data structures as in
MDS. Scope of exposure is within the same-thread and cross-thread. This
issue affects all current processors that support TSX, but do not have
ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.
On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
using VERW or L1D_FLUSH, there is no additional mitigation needed for
TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
disabling the Transactional Synchronization Extensions (TSX) feature.
A new MSR IA32_TSX_CTRL in future and current processors after a
microcode update can be used to control the TSX feature. There are two
bits in that MSR:
* TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
Transactional Memory (RTM).
* TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
disabled with updated microcode but still enumerated as present by
CPUID(EAX=7).EBX{bit4}.
The second mitigation approach is similar to MDS which is clearing the
affected CPU buffers on return to user space and when entering a guest.
Relevant microcode update is required for the mitigation to work. More
details on this approach can be found here:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
The TSX feature can be controlled by the "tsx" command line parameter.
If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
deployed. The effective mitigation state can be read from sysfs.
[ bp:
- massage + comments cleanup
- s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
- remove partial TAA mitigation in update_mds_branch_idle() - Josh.
- s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>