Remove the wp_works_ok member of struct cpuinfo_x86. It's an
optimization back from Linux v0.99 times where we had no fixup support
yet and did the CR0.WP test via special code in the page fault handler.
The < 0 test was an optimization to not do the special casing for each
NULL ptr access violation but just for the first one doing the WP test.
Today it serves no real purpose as the test no longer needs special code
in the page fault handler and the only call side -- mem_init() -- calls
it just once, anyway. However, Xen pre-initializes it to 1, to skip the
test.
Doing the test again for Xen should be no issue at all, as even the
commit introducing skipping the test (commit d560bc6157 ("x86, xen:
Suppress WP test on Xen")) mentioned it being ban aid only. And, in
fact, testing the patch on Xen showed nothing breaks.
The pre-fixup times are long gone and with the removal of the fallback
handling code in commit a5c2a893db ("x86, 386 removal: Remove
CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway.
So just get rid of the "optimization" and do the test unconditionally.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/1486933932-585-3-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull KVM updates from Paolo Bonzini:
"4.11 is going to be a relatively large release for KVM, with a little
over 200 commits and noteworthy changes for most architectures.
ARM:
- GICv3 save/restore
- cache flushing fixes
- working MSI injection for GICv3 ITS
- physical timer emulation
MIPS:
- various improvements under the hood
- support for SMP guests
- a large rewrite of MMU emulation. KVM MIPS can now use MMU
notifiers to support copy-on-write, KSM, idle page tracking,
swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
also supported, so that writes to some memory regions can be
treated as MMIO. The new MMU also paves the way for hardware
virtualization support.
PPC:
- support for POWER9 using the radix-tree MMU for host and guest
- resizable hashed page table
- bugfixes.
s390:
- expose more features to the guest
- more SIMD extensions
- instruction execution protection
- ESOP2
x86:
- improved hashing in the MMU
- faster PageLRU tracking for Intel CPUs without EPT A/D bits
- some refactoring of nested VMX entry/exit code, preparing for live
migration support of nested hypervisors
- expose yet another AVX512 CPUID bit
- host-to-guest PTP support
- refactoring of interrupt injection, with some optimizations thrown
in and some duct tape removed.
- remove lazy FPU handling
- optimizations of user-mode exits
- optimizations of vcpu_is_preempted() for KVM guests
generic:
- alternative signaling mechanism that doesn't pound on
tsk->sighand->siglock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
x86/paravirt: Change vcp_is_preempted() arg type to long
KVM: VMX: use correct vmcs_read/write for guest segment selector/base
x86/kvm/vmx: Defer TR reload after VM exit
x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
x86/kvm/vmx: Simplify segment_base()
x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
x86/kvm/vmx: Don't fetch the TSS base from the GDT
x86/asm: Define the kernel TSS limit in a macro
kvm: fix page struct leak in handle_vmon
KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
KVM: Return an error code only as a constant in kvm_get_dirty_log()
KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
KVM: x86: remove code for lazy FPU handling
KVM: race-free exit from KVM_RUN without POSIX signals
KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
KVM: Support vCPU-based gfn->hva cache
KVM: use separate generations for each address space
...
Historically, the entire TSS + io bitmap structure was cacheline
aligned, but commit ca241c7503 ("x86: unify tss_struct") changed it
(presumably inadvertently) so that the fixed-layout hardware part is
cacheline-aligned and the io bitmap is after the padding. This wastes
24 bytes (the hardware part should be 104 bytes, but this pads it to
128 bytes) and, serves no purpose, and causes sizeof(struct
x86_hw_tss) to have a confusing value.
Drop the pointless alignment.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than open-coding the kernel TSS limit in set_tss_desc(), make
it a real macro near the TSS layout definition.
This is purely a cleanup.
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit:
a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")
restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.
Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.
That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.
When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.
Reported-by: Yves Dionne <yves.dionne@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 idle updates from Ingo Molnar:
"There were two bigger changes in this development cycle:
- remove idle notifiers:
32 files changed, 74 insertions(+), 803 deletions(-)
These notifiers were of questionable value and the main usecase,
the i7300 driver, was essentially unmaintained and can be removed,
plus modern power management concepts don't need the callback - so
use this golden opportunity and get rid of this opaque and fragile
callback from a latency sensitive code path.
(Len Brown, Thomas Gleixner)
- improve the AMD Erratum 400 workaround that used high overhead MSR
polling in the idle loop (Borisla Petkov, Thomas Gleixner)"
* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Remove empty idle.h header
x86/amd: Simplify AMD E400 aware idle routine
x86/amd: Check for the C1E bug post ACPI subsystem init
x86/bugs: Separate AMD E400 erratum and C1E bug
x86/cpufeature: Provide helper to set bugs bits
x86/idle: Remove enter_idle(), exit_idle()
x86: Remove x86_test_and_clear_bit_percpu()
x86/idle: Remove is_idle flag
x86/idle: Remove idle_notifier
i7300_idle: Remove this driver
Pull x86 asm updates from Ingo Molnar:
"The main changes in this development cycle were:
- a large number of call stack dumping/printing improvements: higher
robustness, better cross-context dumping, improved output, etc.
(Josh Poimboeuf)
- vDSO getcpu() performance improvement for future Intel CPUs with
the RDPID instruction (Andy Lutomirski)
- add two new Intel AVX512 features and the CPUID support
infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela,
He Chen)
- more copy-user unification (Borislav Petkov)
- entry code assembly macro simplifications (Alexander Kuleshov)
- vDSO C/R support improvements (Dmitry Safonov)
- misc fixes and cleanups (Borislav Petkov, Paul Bolle)"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
scripts/decode_stacktrace.sh: Fix address line detection on x86
x86/boot/64: Use defines for page size
x86/dumpstack: Make stack name tags more comprehensible
selftests/x86: Add test_vdso to test getcpu()
x86/vdso: Use RDPID in preference to LSL when available
x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl()
x86/cpufeatures: Enable new AVX512 cpu features
x86/cpuid: Provide get_scattered_cpuid_leaf()
x86/cpuid: Cleanup cpuid_regs definitions
x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants
x86/unwind: Ensure stack grows down
x86/vdso: Set vDSO pointer only after success
x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE
x86/unwind: Detect bad stack return address
x86/dumpstack: Warn on stack recursion
x86/unwind: Warn on bad frame pointer
x86/decoder: Use stderr if insn sanity test fails
x86/decoder: Use stdout if insn decoder test is successful
mm/page_alloc: Remove kernel address exposure in free_reserved_area()
x86/dumpstack: Remove raw stack dump
...
Reorganize the E400 detection now that we have everything in place:
switch the CPUs to broadcast mode after the LAPIC has been initialized
and remove the facilities that were used previously on the idle path.
Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine
because alternatives have been applied when the actual detection happens,
so the static switching does not take effect and the test will stay
false. Use boot_cpu_has_bug() instead which is definitely an improvement
over the RDMSR and the cpumask handling.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sparse populated CPUID leafs are collected in a software provided leaf to
avoid bloat of the x86_capability array, but there is no way to rebuild the
real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID
leaf from the CPU. While this is possible it is problematic as it does not
take software disabled features into account. If a feature is disabled on
the host it should not be exposed to a guest either.
Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered
cpuid table information and the active CPU features.
[ tglx: Rewrote changelog ]
Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Piotr Luc <Piotr.Luc@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fpu updates from Ingo Molnar:
"The main x86 FPU changes in this cycle were:
- a large series of cleanups, fixes and enhancements to re-enable the
XSAVES instruction on Intel CPUs - which is the most advanced
instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)
- Add FPU tracepoints for the FPU state machine (Dave Hansen)"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Do not BUG_ON() in early FPU code
x86/fpu/xstate: Re-enable XSAVES
x86/fpu/xstate: Fix fpstate_init() for XRSTORS
x86/fpu/xstate: Return NULL for disabled xstate component address
x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
x86/fpu/xstate: Fix XSTATE component offset print out
x86/fpu/xstate: Fix PTRACE frames for XSAVES
x86/fpu/xstate: Fix supervisor xstate component offset
x86/fpu/xstate: Align xstate components according to CPUID
x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
x86/fpu: Add tracepoints to dump FPU state at key points
Pull x86 boot updates from Ingo Molnar:
"The biggest changes in this cycle were:
- prepare for more KASLR related changes, by restructuring, cleaning
up and fixing the existing boot code. (Kees Cook, Baoquan He,
Yinghai Lu)
- simplifly/concentrate subarch handling code, eliminate
paravirt_enabled() usage. (Luis R Rodriguez)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/KASLR: Clarify purpose of each get_random_long()
x86/KASLR: Add virtual address choosing function
x86/KASLR: Return earliest overlap when avoiding regions
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
x86/boot: Add missing file header comments
x86/KASLR: Initialize mapping_info every time
x86/boot: Comment what finalize_identity_maps() does
x86/KASLR: Build identity mappings on demand
x86/boot: Split out kernel_ident_mapping_init()
x86/boot: Clean up indenting for asm/boot.h
x86/KASLR: Improve comments around the mem_avoid[] logic
x86/boot: Simplify pointer casting in choose_random_location()
x86/KASLR: Consolidate mem_avoid[] entries
x86/boot: Clean up pointer casting
x86/boot: Warn on future overlapping memcpy() use
x86/boot: Extract error reporting functions
x86/boot: Correctly bounds-check relocations
x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
x86/boot: Fix "run_size" calculation
x86/boot: Calculate decompression size during boot not build
...
Pull x86 asm updates from Ingo Molnar:
"This is another big update. Main changes are:
- lots of x86 system call (and other traps/exceptions) entry code
enhancements. In particular the complex parts of the 64-bit entry
code have been migrated to C code as well, and a number of dusty
corners have been refreshed. (Andy Lutomirski)
- vDSO special mapping robustification and general cleanups (Andy
Lutomirski)
- cpufeature refactoring, cleanups and speedups (Borislav Petkov)
- lots of other changes ..."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
x86/cpufeature: Enable new AVX-512 features
x86/entry/traps: Show unhandled signal for i386 in do_trap()
x86/entry: Call enter_from_user_mode() with IRQs off
x86/entry/32: Change INT80 to be an interrupt gate
x86/entry: Improve system call entry comments
x86/entry: Remove TIF_SINGLESTEP entry work
x86/entry/32: Add and check a stack canary for the SYSENTER stack
x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
x86/entry: Vastly simplify SYSENTER TF (single-step) handling
x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
x86/entry/32: Restore FLAGS on SYSEXIT
x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
selftests/x86: In syscall_nt, test NT|TF as well
x86/asm-offsets: Remove PARAVIRT_enabled
x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
uprobes: __create_xol_area() must nullify xol_mapping.fault
x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
...
thread_saved_pc() reads stack of a potentially running task.
This can cause false KASAN stack-out-of-bounds reports,
because the running task concurrently poisons and unpoisons
own stack.
The same happens in get_wchan(), and get get_wchan() was fixed
by using READ_ONCE_NOCHECK(). Do the same here.
Example KASAN report triggered by sysrq-t:
BUG: KASAN: out-of-bounds in sched_show_task+0x306/0x3b0 at addr ffff880043c97c18
Read of size 8 by task syz-executor/23839
[...]
page dumped because: kasan: bad access detected
[...]
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff813e7a26>] sched_show_task+0x306/0x3b0
[<ffffffff813e7bf4>] show_state_filter+0x124/0x1a0
[<ffffffff82d2ca00>] fn_show_state+0x10/0x20
[<ffffffff82d2cf98>] k_spec+0xa8/0xe0
[<ffffffff82d3354f>] kbd_event+0xb9f/0x4000
[<ffffffff843ca8a7>] input_to_handler+0x3a7/0x4b0
[<ffffffff843d1954>] input_pass_values.part.5+0x554/0x6b0
[<ffffffff843d29bc>] input_handle_event+0x2ac/0x1070
[<ffffffff843d3a47>] input_inject_event+0x237/0x280
[<ffffffff843e8c28>] evdev_write+0x478/0x680
[<ffffffff817ac653>] __vfs_write+0x113/0x480
[<ffffffff817ae0e7>] vfs_write+0x167/0x4a0
[<ffffffff817b13d1>] SyS_write+0x111/0x220
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: glider@google.com
Cc: kasan-dev@googlegroups.com
Cc: kcc@google.com
Cc: linux-kernel@vger.kernel.org
Cc: ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adding the rtc platform device in non-privileged Xen PV guests causes
an IRQ conflict because these guests do not have legacy PIC and may
allocate irqs in the legacy range.
In a single VCPU Xen PV guest we should have:
/proc/interrupts:
CPU0
0: 4934 xen-percpu-virq timer0
1: 0 xen-percpu-ipi spinlock0
2: 0 xen-percpu-ipi resched0
3: 0 xen-percpu-ipi callfunc0
4: 0 xen-percpu-virq debug0
5: 0 xen-percpu-ipi callfuncsingle0
6: 0 xen-percpu-ipi irqwork0
7: 321 xen-dyn-event xenbus
8: 90 xen-dyn-event hvc_console
...
But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.
genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
We can avoid this problem by realizing that unprivileged PV guests (both
Xen and lguests) are not supposed to have rtc_cmos device and so
adding it is not necessary.
Privileged guests (i.e. Xen's dom0) do use it but they should not have
irq conflicts since they allocate irqs above legacy range (above
gsi_top, in fact).
Instead of explicitly testing whether the guest is privileged we can
extend pv_info structure to include information about guest's RTC
support.
Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: vkuznets@redhat.com
Cc: xen-devel@lists.xenproject.org
Cc: konrad.wilk@oracle.com
Cc: stable@vger.kernel.org # 4.2+
Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 sigcontext header cleanups from Ingo Molnar:
"This series reorganizes and cleans up various aspects of the main
sigcontext UAPI headers, such as unifying the data structures and
updating/adding lots of comments to explain all the ABI details and
quirks. The headers can now also be built in user-space standalone"
* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Clean up too long lines
x86/headers: Remove <asm/sigcontext.h> references on the kernel side
x86/headers: Remove direct sigcontext32.h uses
x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
x86/headers: Make sigcontext pointers bit independent
x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
x86/headers: Unify register type definitions between 32-bit compat and i386
x86/headers: Use ABI types consistently in sigcontext*.h
x86/headers: Separate out legacy user-space structure definitions
x86/headers: Clean up and better document uapi/asm/sigcontext.h
x86/headers: Clean up uapi/asm/sigcontext32.h
x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
Pull x86 boot updates from Ingo Molnar:
"The main x86 bootup related changes in this cycle were:
- more boot time optimizations. (Len Brown)
- implement hex output to allow the debugging of early bootup
parameters. (Kees Cook)
- remove obsolete MCA leftovers. (Paolo Pisati)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
x86/smpboot: Remove SIPI delays from cpu_up()
x86/smpboot: Remove udelay(100) when polling cpu_callin_map
x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
x86/boot: Obsolete the MCA sys_desc_table
x86/boot: Add hex output for debugging