Commit Graph

14941 Commits

Author SHA1 Message Date
Dou Liyang
a2510d156e x86/apic: Split local APIC timer setup from the APIC setup
apic_bsp_setup() sets up the local APIC, I/O APIC and APIC timer.

The local APIC and I/O APIC setup belongs to interrupt delivery mode
setup. Setting up the local APIC timer for booting CPU is another job
and has nothing to do with interrupt delivery mode setup.

Split local APIC timer setup from the APIC setup, keep it in the original
position for SMP and UP kernel for now.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-4-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:14 +02:00
Dou Liyang
4b1669e8d1 x86/apic: Prepare for unifying the interrupt delivery modes setup
There are three places which initialize the interrupt delivery modes:

1) init_bsp_APIC() which is called early might setup the through-local-APIC
   virtual wire mode on non SMP systems.

2) In an SMP-capable system, native_smp_prepare_cpus() tries to switch to
   symmetric I/O model.

3) In UP system with UP_LATE_INIT=y, the local APIC and I/O APIC are set up
   in smp_init().

There is no technical reason to make these initializations at random places
and run the kernel with the potentially wrong mode through the early boot
stage, but it has a problematic side effect: The late switch to symmetric
I/O mode causes dump-capture kernel to hang when the kernel command line
option 'notsc' is active.

Provide a new function to unify that three positions. Preparatory patch to
initialize an interrupt mode directly.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-3-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:14 +02:00
Dou Liyang
0114a8e877 x86/apic: Construct a selector for the interrupt delivery mode
There are quite some switches which are used to determine the final
interrupt delivery mode, as shown below:

1) Kconfig: CONFIG_X86_64; CONFIG_X86_LOCAL_APIC; CONFIG_x86_IO_APIC
2) Command line options: disable_apic; skip_ioapic_setup
3) CPU Capability: boot_cpu_has(X86_FEATURE_APIC)
4) MP table: smp_found_config
5) ACPI: acpi_lapic; acpi_ioapic; nr_ioapic

These switches are disordered and scattered and there are also some
dependencies between them. These make the code difficult to maintain and
read.

Construct a selector to unify them into a single function, then, Use this
selector to get an interrupt delivery mode directly.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-2-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:14 +02:00
Sean Fu
7d7099433d x86/sysfs: Fix off-by-one error in loop termination
An off-by-one error in loop terminantion conditions in
create_setup_data_nodes() will lead to memory leak when
create_setup_data_node() failed.

Signed-off-by: Sean Fu <fxinrong@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1505090001-1157-1-git-send-email-fxinrong@gmail.com
2017-09-25 09:36:16 +02:00
Eric Biggers
814fb7bb7d x86/fpu: Don't let userspace set bogus xcomp_bv
On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
set a task's extended state (xstate) or "FPU" registers.  ptrace() can
set them for another task using the PTRACE_SETREGSET request with
NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
In either case, registers can be set to any value, but the kernel
assumes that the XSAVE area itself remains valid in the sense that the
CPU can restore it.

However, in the case where the kernel is using the uncompacted xstate
format (which it does whenever the XSAVES instruction is unavailable),
it was possible for userspace to set the xcomp_bv field in the
xstate_header to an arbitrary value.  However, all bits in that field
are reserved in the uncompacted case, so when switching to a task with
nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault.  This
caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit.  In
addition, since the error is otherwise ignored, the FPU registers from
the task previously executing on the CPU were leaked.

Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
the uncompacted case, and returning an error otherwise.

The reason for validating xcomp_bv rather than simply overwriting it
with 0 is that we want userspace to see an error if it (incorrectly)
provides an XSAVE area in compacted format rather than in uncompacted
format.

Note that as before, in case of error we clear the task's FPU state.
This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
better to return an error before changing anything.  But it seems the
"clear on error" behavior is fine for now, and it's a little tricky to
do otherwise because it would mean we couldn't simply copy the full
userspace state into kernel memory in one __copy_from_user().

This bug was found by syzkaller, which hit the above-mentioned
WARN_ON_FPU():

    WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
    RIP: 0010:__switch_to+0x5b5/0x5d0
    RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
    RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
    RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
    RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
    R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
    FS:  00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
    Call Trace:
    Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f

Here is a C reproducer.  The expected behavior is that the program spin
forever with no output.  However, on a buggy kernel running on a
processor with the "xsave" feature but without the "xsaves" feature
(e.g. Sandy Bridge through Broadwell for Intel), within a second or two
the program reports that the xmm registers were corrupted, i.e. were not
restored correctly.  With CONFIG_X86_DEBUG_FPU=y it also hits the above
kernel warning.

    #define _GNU_SOURCE
    #include <stdbool.h>
    #include <inttypes.h>
    #include <linux/elf.h>
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/uio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int pid = fork();
        uint64_t xstate[512];
        struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };

        if (pid == 0) {
            bool tracee = true;
            for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
                tracee = (fork() != 0);
            uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
            asm volatile("   movdqu %0, %%xmm0\n"
                         "   mov %0, %%rbx\n"
                         "1: movdqu %%xmm0, %0\n"
                         "   mov %0, %%rax\n"
                         "   cmp %%rax, %%rbx\n"
                         "   je 1b\n"
                         : "+m" (xmm0) : : "rax", "rbx", "xmm0");
            printf("BUG: xmm registers corrupted!  tracee=%d, xmm0=%08X%08X%08X%08X\n",
                   tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
        } else {
            usleep(100000);
            ptrace(PTRACE_ATTACH, pid, 0, 0);
            wait(NULL);
            ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
            xstate[65] = -1;
            ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
            ptrace(PTRACE_CONT, pid, 0, 0);
            wait(NULL);
        }
        return 1;
    }

Note: the program only tests for the bug using the ptrace() system call.
The bug can also be reproduced using the rt_sigreturn() system call, but
only when called from a 32-bit program, since for 64-bit programs the
kernel restores the FPU state from the signal frame by doing XRSTOR
directly from userspace memory (with proper error checking).

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> [v3.17+]
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 0b29643a58 ("x86/xsaves: Change compacted format xsave area header")
Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-25 09:26:32 +02:00
kbuild test robot
4f8cef59ba x86/fpu: Fix boolreturn.cocci warnings
arch/x86/kernel/fpu/xstate.c:931:9-10: WARNING: return of 0/1 in function 'xfeatures_mxcsr_quirk' with return type bool

 Return statements in functions returning bool should use true/false instead of 1/0.

Generated by: scripts/coccinelle/misc/boolreturn.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Link: http://lkml.kernel.org/r/20170306004553.GA25764@lkp-wsm-ep1
Link: http://lkml.kernel.org/r/20170923130016.21448-23-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:35 +02:00
Rik van Riel
0852b37417 x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake CPUs
On Skylake CPUs I noticed that XRSTOR is unable to deal with states
created by copyout_from_xsaves() if the xstate has only SSE/YMM state, and
no FP state. That is, xfeatures had XFEATURE_MASK_SSE set, but not
XFEATURE_MASK_FP.

The reason is that part of the SSE/YMM state lives in the MXCSR and
MXCSR_FLAGS fields of the FP state.

Ensure that whenever we copy SSE or YMM state around, the MXCSR and
MXCSR_FLAGS fields are also copied around.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170210085445.0f1cc708@annuminas.surriel.com
Link: http://lkml.kernel.org/r/20170923130016.21448-22-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar
99dc26bda2 x86/fpu: Remove struct fpu::fpregs_active
The previous changes paved the way for the removal of the
fpu::fpregs_active state flag - we now only have the
fpu::fpstate_active and fpu::last_cpu fields left.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-21-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar
6cf4edbe05 x86/fpu: Decouple fpregs_activate()/fpregs_deactivate() from fpu->fpregs_active
The fpregs_activate()/fpregs_deactivate() are currently called in such a pattern:

	if (!fpu->fpregs_active)
		fpregs_activate(fpu);

	...

	if (fpu->fpregs_active)
		fpregs_deactivate(fpu);

But note that it's actually safe to call them without checking the flag first.

This further decouples the fpu->fpregs_active flag from actual FPU logic.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-20-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar
f1c8cd0176 x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active
We want to simplify the FPU state machine by eliminating fpu->fpregs_active,
and we can do that because the two state flags (::fpregs_active and
::fpstate_active) are set essentially together.

The old lazy FPU switching code used to make a distinction - but there's
no lazy switching code anymore, we always switch in an 'eager' fashion.

Do this by first changing all substantial uses of fpu->fpregs_active
to fpu->fpstate_active and adding a few debug checks to double check
our assumption is correct.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar
b6aa85558d x86/fpu: Split the state handling in fpu__drop()
Prepare fpu__drop() to use fpu->fpregs_active.

There are two distinct usecases for fpu__drop() in this context:
exit_thread() when called for 'current' in exit(), and when called
for another task in fork().

This patch does not change behavior, it only adds a couple of
debug checks and structures the code to make the ->fpregs_active
change more obviously correct.

All the complications will be removed later on.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-18-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar
a10b6a16cd x86/fpu: Make the fpu state change in fpu__clear() scheduler-atomic
Do this temporarily only, to make it easier to change the FPU state machine,
in particular this change couples the fpu->fpregs_active and fpu->fpstate_active
states: they are only set/cleared together (as far as the scheduler sees them).

This will be removed by later patches.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-17-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar
b3a163081c x86/fpu: Simplify fpu->fpregs_active use
The fpregs_active() inline function is pretty pointless - in almost
all the callsites it can be replaced with a direct fpu->fpregs_active
access.

Do so and eliminate the extra layer of obfuscation.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar
6d7f7da553 x86/fpu: Flip the parameter order in copy_*_to_xstate()
Make it more consistent with regular memcpy() semantics, where the destination
argument comes first.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-15-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar
7b9094c688 x86/fpu: Remove 'kbuf' parameter from the copy_user_to_xstate() API
No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-14-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar
59dffa4edb x86/fpu: Remove 'ubuf' parameter from the copy_kernel_to_xstate() API
No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-13-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar
79fecc2b75 x86/fpu: Split copy_user_to_xstate() into copy_kernel_to_xstate() & copy_user_to_xstate()
Similar to:

  x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user()

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-12-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:32 +02:00
Ingo Molnar
8c0817f4a3 x86/fpu: Simplify __copy_xstate_to_kernel() return values
__copy_xstate_to_kernel() can only return 0 (because kernel copies cannot fail),
simplify the code throughout.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-11-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:32 +02:00
Ingo Molnar
6ff15f8db7 x86/fpu: Change 'size_total' parameter to unsigned and standardize the size checks in copy_xstate_to_*()
'size_total' is derived from an unsigned input parameter - and then converted
to 'int' and checked for negative ranges:

	if (size_total < 0 || offset < size_total) {

This conversion and the checks are unnecessary obfuscation, reject overly
large requested copy sizes outright and simplify the underlying code.

Reported-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-10-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:32 +02:00
Ingo Molnar
56583c9a14 x86/fpu: Clarify parameter names in the copy_xstate_to_*() methods
Right now there's a confusing mixture of 'offset' and 'size' parameters:

 - __copy_xstate_to_*() input parameter 'end_pos' not not really an offset,
   but the full size of the copy to be performed.

 - input parameter 'count' to copy_xstate_to_*() shadows that of
   __copy_xstate_to_*()'s 'count' parameter name - but the roles
   are different: the first one is the total number of bytes to
   be copied, while the second one is a partial copy size.

To unconfuse all this, use a consistent set of parameter names:

 - 'size' is the partial copy size within a single xstate component
 - 'size_total' is the total copy requested
 - 'offset_start' is the requested starting offset.
 - 'offset' is the offset within an xstate component.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-9-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:32 +02:00
Ingo Molnar
8a5b731889 x86/fpu: Remove the 'start_pos' parameter from the __copy_xstate_to_*() functions
'start_pos' is always 0, so remove it and remove the pointless check of 'pos < 0'
which can not ever be true as 'pos' is unsigned ...

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-8-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar
becb2bb72f x86/fpu: Clean up the parameter definitions of copy_xstate_to_*()
Remove pointless 'const' of non-pointer input parameter.

Remove unnecessary parenthesis that shows uncertainty about arithmetic operator precedence.

Clarify copy_xstate_to_user() description.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-7-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar
d7eda6c99c x86/fpu: Clean up parameter order in the copy_xstate_to_*() APIs
Parameter ordering is weird:

  int copy_xstate_to_kernel(unsigned int pos, unsigned int count, void *kbuf, struct xregs_state *xsave);
  int copy_xstate_to_user(unsigned int pos, unsigned int count, void __user *ubuf, struct xregs_state *xsave);

'pos' and 'count', which are attributes of the destination buffer, are listed before the destination
buffer itself ...

List them after the primary arguments instead.

This makes the code more similar to regular memcpy() variant APIs.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-6-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar
a69c158fb3 x86/fpu: Remove 'kbuf' parameter from the copy_xstate_to_user() APIs
The 'kbuf' parameter is unused in the _user() side of the API, remove it.

This simplifies the code and makes it easier to think about.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-5-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar
4d981cf2d9 x86/fpu: Remove 'ubuf' parameter from the copy_xstate_to_kernel() APIs
The 'ubuf' parameter is unused in the _kernel() side of the API, remove it.

This simplifies the code and makes it easier to think about.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-4-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar
f0d4f30a7f x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user()
copy_xstate_to_user() is a weird API - in part due to a bad API inherited
from the regset APIs.

But don't propagate that bad API choice into the FPU code - so as a first
step split the API into kernel and user buffer handling routines.

(Also split the xstate_copyout() internal helper.)

The split API is a dumb duplication that should be obviously correct, the
real splitting will be done in the next patch.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-3-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:30 +02:00
Ingo Molnar
656f083116 x86/fpu: Rename copyin_to_xsaves()/copyout_from_xsaves() to copy_user_to_xstate()/copy_xstate_to_user()
The 'copyin/copyout' nomenclature needlessly departs from what the modern FPU code
uses, which is:

 copy_fpregs_to_fpstate()
 copy_fpstate_to_sigframe()
 copy_fregs_to_user()
 copy_fxregs_to_kernel()
 copy_fxregs_to_user()
 copy_kernel_to_fpregs()
 copy_kernel_to_fregs()
 copy_kernel_to_fxregs()
 copy_kernel_to_xregs()
 copy_user_to_fregs()
 copy_user_to_fxregs()
 copy_user_to_xregs()
 copy_xregs_to_kernel()
 copy_xregs_to_user()

I.e. according to this pattern, the following rename should be done:

  copyin_to_xsaves()    -> copy_user_to_xstate()
  copyout_from_xsaves() -> copy_xstate_to_user()

or, if we want to be pedantic, denote that that the user-space format is ptrace:

  copyin_to_xsaves()    -> copy_user_ptrace_to_xstate()
  copyout_from_xsaves() -> copy_xstate_to_user_ptrace()

But I'd suggest the shorter, non-pedantic name.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-2-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:30 +02:00
Andy Lutomirski
4ba55e65f4 x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
initial_page_table loaded into CR3.  Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3.  This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.

Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 72c0098d92 ("x86/mm: Reinitialize TLB state on hotplug and resume")
Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17 18:59:09 +02:00
Andy Lutomirski
b8b7abaed7 x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
Otherwise we might have the PCID feature bit set during cpu_init().

This is just for robustness.  I haven't seen any actual bugs here.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cba4671af7 ("x86/mm: Disable PCID on 32-bit kernels")
Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17 18:59:08 +02:00
Linus Torvalds
c44d1ac0c3 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single fix addressing the missing CP8 feature bit in CPUID for a
  range of AMD ZEN models/mask revisions"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/AMD: Fix erratum 1076 (CPB bit)
2017-09-17 08:05:54 -07:00
Linus Torvalds
9db59599ae Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
 - PPC bugfixes
 - RCU splat fix
 - swait races fix
 - pointless userspace-triggerable BUG() fix
 - misc fixes for KVM_RUN corner cases
 - nested virt correctness fixes + one host DoS
 - some cleanups
 - clang build fix
 - fix AMD AVIC with default QEMU command line options
 - x86 bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
  kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
  kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
  kvm,mips: Fix potential swait_active() races
  kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
  kvm: Serialize wq active checks in kvm_vcpu_wake_up()
  kvm,x86: Fix apf_task_wake_one() wq serialization
  kvm,lapic: Justify use of swait_active()
  kvm,async_pf: Use swq_has_sleeper()
  sched/wait: Add swq_has_sleeper()
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: trace events: update list of exit reasons
  KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
  KVM: X86: Don't block vCPU if there is pending exception
  KVM: SVM: Add irqchip_split() checks before enabling AVIC
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: x86: fix clang build
  ...
2017-09-15 15:43:55 -07:00
Davidlohr Bueso
a0cff57bb2 kvm,x86: Fix apf_task_wake_one() wq serialization
During code inspection, the following potential race was seen:

CPU0   	    		    	     	CPU1
kvm_async_pf_task_wait			apf_task_wake_one
					  [L] swait_active(&n->wq)
  [S] prepare_to_swait(&n.wq)
  [L] if (!hlist_unhahed(&n.link))
	schedule()			  [S] hlist_del_init(&n->link);

Properly serialize swait_active() checks such that a wakeup is
not missed.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 16:57:12 +02:00
Borislav Petkov
f7f3dc00f6 x86/cpu/AMD: Fix erratum 1076 (CPB bit)
CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-15 11:30:53 +02:00
Christoph Hellwig
6faadbbb7f dmi: Mark all struct dmi_system_id instances const
... and __initconst if applicable.

Based on similar work for an older kernel in the Grsecurity patch.

[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
2017-09-14 11:59:30 +02:00
K. Y. Srinivasan
213ff44ae4 x86/hyper-V: Allocate the IDT entry early in boot
Allocate the hypervisor callback IDT entry early in the boot sequence.

The previous code would allocate the entry as part of registering the handler
when the vmbus driver loaded, and this caused a problem for the IDT cleanup
that Thomas is working on for v4.15.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: olaf@aepfle.de
Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13 11:02:26 +02:00
Juergen Gross
87930019c7 x86/paravirt: Remove no longer used paravirt functions
With removal of lguest some of the paravirt functions are no longer
needed:

	->read_cr4()
	->store_idt()
	->set_pmd_at()
	->set_pud_at()
	->pte_update()

Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13 10:55:15 +02:00
Andy Lutomirski
c7ad5ad297 x86/mm/64: Initialize CR4.PCIDE early
cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs.  It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.

Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems.  First, we'd crash in the trampoline code.  That's
fixable, and I tried that.  It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.

This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate.  This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.

Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().

( Yes, this is ugly.  I think we should have improved mmu_cr4_features
  to actually control CR4 during secondary bootup, but that would be
  fairly intrusive at this stage. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c922 ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13 09:54:43 +02:00
Linus Torvalds
680352bda5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixes: dead code removal, plus a SME memory encryption fix on
  32-bit kernels that crashed Xen guests"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Remove unused and undefined __generic_processor_info() declaration
  x86/mm: Make the SME mask a u64
2017-09-12 11:34:39 -07:00
Linus Torvalds
dd198ce714 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "Life has been busy and I have not gotten half as much done this round
  as I would have liked. I delayed it so that a minor conflict
  resolution with the mips tree could spend a little time in linux-next
  before I sent this pull request.

  This includes two long delayed user namespace changes from Kirill
  Tkhai. It also includes a very useful change from Serge Hallyn that
  allows the security capability attribute to be used inside of user
  namespaces. The practical effect of this is people can now untar
  tarballs and install rpms in user namespaces. It had been suggested to
  generalize this and encode some of the namespace information
  information in the xattr name. Upon close inspection that makes the
  things that should be hard easy and the things that should be easy
  more expensive.

  Then there is my bugfix/cleanup for signal injection that removes the
  magic encoding of the siginfo union member from the kernel internal
  si_code. The mips folks reported the case where I had used FPE_FIXME
  me is impossible so I have remove FPE_FIXME from mips, while at the
  same time including a return statement in that case to keep gcc from
  complaining about unitialized variables.

  I almost finished the work to get make copy_siginfo_to_user a trivial
  copy to user. The code is available at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

  But I did not have time/energy to get the code posted and reviewed
  before the merge window opened.

  I was able to see that the security excuse for just copying fields
  that we know are initialized doesn't work in practice there are buggy
  initializations that don't initialize the proper fields in siginfo. So
  we still sometimes copy unitialized data to userspace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Introduce v3 namespaced file capabilities
  mips/signal: In force_fcr31_sig return in the impossible case
  signal: Remove kernel interal si_code magic
  fcntl: Don't use ambiguous SIG_POLL si_codes
  prctl: Allow local CAP_SYS_ADMIN changing exe_file
  security: Use user_namespace::level to avoid redundant iterations in cap_capable()
  userns,pidns: Verify the userns for new pid namespaces
  signal/testing: Don't look for __SI_FAULT in userspace
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11 18:34:47 -07:00
Dou Liyang
e2329b4252 x86/cpu: Remove unused and undefined __generic_processor_info() declaration
The following revert:

  2b85b3d229 ("x86/acpi: Restore the order of CPU IDs")

... got rid of __generic_processor_info(), but forgot to remove its
declaration in mpspec.h.

Remove the declaration and update the comments as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1505101403-29100-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-11 08:16:37 +02:00
Alexey Dobriyan
9b130ad5bb treewide: make "nr_cpu_ids" unsigned
First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Linus Torvalds
57e88b43b8 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes include various Hyper-V optimizations such as faster
  hypercalls and faster/better TLB flushes - and there's also some
  Intel-MID cleanups"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
  x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
  x86/platform/intel-mid: Make several arrays static, to make code smaller
  MAINTAINERS: Add missed file for Hyper-V
  x86/hyper-v: Use hypercall for remote TLB flush
  hyper-v: Globalize vp_index
  x86/hyper-v: Implement rep hypercalls
  hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT
  x86/hyper-v: Introduce fast hypercall implementation
  x86/hyper-v: Make hv_do_hypercall() inline
  x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set
  x86/platform/intel-mid: Make 'bt_sfi_data' const
  x86/platform/intel-mid: Make IRQ allocation a bit more flexible
  x86/platform/intel-mid: Group timers callbacks together
2017-09-07 09:25:15 -07:00
Andy Lutomirski
1c9fe4409c x86/mm: Document how CR4.PCIDE restore works
While debugging a problem, I thought that using
cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be
helpful.  It turns out to be counterproductive.

Add a comment documenting how this works.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 20:12:57 -07:00
Andy Lutomirski
72c0098d92 x86/mm: Reinitialize TLB state on hotplug and resume
When Linux brings a CPU down and back up, it switches to init_mm and then
loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
of masking off the ASID bits in CR3.

This can result in some confusion in the TLB handling code.  If we
bring a CPU down and back up with any ASID other than 0, we end up
with the wrong ASID active on the CPU after resume.  This could
cause our internal state to become corrupt, although major
corruption is unlikely because init_mm doesn't have any user pages.
More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
in the next context switch.  The result of *that* is a failure to
resume from suspend with probability 1 - 1/6^(cpus-1).

Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 10af6235e0 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 20:12:57 -07:00
Linus Torvalds
53ac64aac9 Merge tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
 "These include a usual ACPICA code update (this time to upstream
  revision 20170728), a fix for a boot crash on some systems with
  Thunderbolt devices connected at boot time, a rework of the handling
  of PCI bridges when setting up device wakeup, new support for Apple
  device properties, support for DMA configurations reported via ACPI on
  ARM64, APEI-related updates, ACPI EC driver updates and assorted minor
  modifications in several places.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20170728
     including:
      * Alias operator handling update (Bob Moore).
      * Deferred resolution of reference package elements (Bob Moore).
      * Support for the _DMA method in walk resources (Bob Moore).
      * Tables handling update and support for deferred table
        verification (Lv Zheng).
      * Update of SMMU models for IORT (Robin Murphy).
      * Compiler and disassembler updates (Alex James, Erik Schmauss,
        Ganapatrao Kulkarni, James Morse).
      * Tools updates (Erik Schmauss, Lv Zheng).
      * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv
        Zheng, Shao Ming).

   - Rework the initialization of non-wakeup GPEs with method handlers
     in order to address a boot crash on some systems with Thunderbolt
     devices connected at boot time where we miss an early hotplug event
     due to a delay in GPE enabling (Rafael Wysocki).

   - Rework the handling of PCI bridges when setting up ACPI-based
     device wakeup in order to avoid disabling wakeup for bridges
     prematurely (Rafael Wysocki).

   - Consolidate Apple DMI checks throughout the tree, add support for
     Apple device properties to the device properties framework and use
     these properties for the handling of I2C and SPI devices on Apple
     systems (Lukas Wunner).

   - Add support for _DMA to the ACPI-based device properties lookup
     code and make it possible to use the information from there to
     configure DMA regions on ARM64 systems (Lorenzo Pieralisi).

   - Fix several issues in the APEI code, add support for exporting the
     BERT error region over sysfs and update APEI MAINTAINERS entry with
     reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit
     Agrawal, Tony Luck, Yazen Ghannam).

   - Fix a potential initialization ordering issue in the ACPI EC driver
     and clean it up somewhat (Lv Zheng).

   - Update the ACPI SPCR driver to extend the existing XGENE 8250
     workaround in it to a new platform (m400) and to work around an
     Xgene UART clock issue (Graeme Gregory).

   - Add a new utility function to the ACPI core to support using ACPI
     OEM ID / OEM Table ID / Revision for system identification in
     blacklisting or similar and switch over the existing code already
     using this information to this new interface (Toshi Kani).

   - Fix an xpower PMIC issue related to GPADC reads that always return
     0 without extra pin manipulations (Hans de Goede).

   - Add statements to print debug messages in a couple of places in the
     ACPI core for easier diagnostics (Rafael Wysocki).

   - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun
     Guo).

   - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).

   - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver
     (Alex Hung).

   - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
     Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
     Ronald Tschalär, Sumeet Pawnikar)"

* tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
  ACPI / APEI: Suppress message if HEST not present
  intel_pstate: convert to use acpi_match_platform_list()
  ACPI / blacklist: add acpi_match_platform_list()
  ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
  ACPI: make device_attribute const
  ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region
  ACPI: APEI: fix the wrong iteration of generic error status block
  ACPI / processor: make function acpi_processor_check_duplicates() static
  ACPI / EC: Clean up EC GPE mask flag
  ACPI: EC: Fix possible issues related to EC initialization order
  ACPI / PM: Add debug statements to acpi_pm_notify_handler()
  ACPI: Add debug statements to acpi_global_event_handler()
  ACPI / scan: Enable GPEs before scanning the namespace
  ACPICA: Make it possible to enable runtime GPEs earlier
  ACPICA: Dispatch active GPEs at init time
  ACPI: SPCR: work around clock issue on xgene UART
  ACPI: SPCR: extend XGENE 8250 workaround to m400
  ACPI / LPSS: Don't abort ACPI scan on missing mem resource
  mailbox: pcc: Drop uninformative output during boot
  ACPI/IORT: Add IORT named component memory address limits
  ...
2017-09-05 12:45:03 -07:00
Linus Torvalds
bafb0762cb Merge tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc driver update for 4.14-rc1.

  Lots of different stuff in here, it's been an active development cycle
  for some reason. Highlights are:

   - updated binder driver, this brings binder up to date with what
     shipped in the Android O release, plus some more changes that
     happened since then that are in the Android development trees.

   - coresight updates and fixes

   - mux driver file renames to be a bit "nicer"

   - intel_th driver updates

   - normal set of hyper-v updates and changes

   - small fpga subsystem and driver updates

   - lots of const code changes all over the driver trees

   - extcon driver updates

   - fmc driver subsystem upadates

   - w1 subsystem minor reworks and new features and drivers added

   - spmi driver updates

  Plus a smattering of other minor driver updates and fixes.

  All of these have been in linux-next with no reported issues for a
  while"

* tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits)
  ANDROID: binder: don't queue async transactions to thread.
  ANDROID: binder: don't enqueue death notifications to thread todo.
  ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
  ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl
  ANDROID: binder: push new transactions to waiting threads.
  ANDROID: binder: remove proc waitqueue
  android: binder: Add page usage in binder stats
  android: binder: fixup crash introduced by moving buffer hdr
  drivers: w1: add hwmon temp support for w1_therm
  drivers: w1: refactor w1_slave_show to make the temp reading functionality separate
  drivers: w1: add hwmon support structures
  eeprom: idt_89hpesx: Support both ACPI and OF probing
  mcb: Fix an error handling path in 'chameleon_parse_cells()'
  MCB: add support for SC31 to mcb-lpc
  mux: make device_type const
  char: virtio: constify attribute_group structures.
  Documentation/ABI: document the nvmem sysfs files
  lkdtm: fix spelling mistake: "incremeted" -> "incremented"
  perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file
  nvmem: include linux/err.h from header
  ...
2017-09-05 11:08:17 -07:00
Linus Torvalds
24e700e291 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This update provides:

   - Cleanup of the IDT management including the removal of the extra
     tracing IDT. A first step to cleanup the vector management code.

   - The removal of the paravirt op adjust_exception_frame. This is a
     XEN specific issue, but merged through this branch to avoid nasty
     merge collisions

   - Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
     disabled the TSC DEADLINE timer in CPUID.

   - Adjust a debug message in the ioapic code to print out the
     information correctly"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  x86/idt: Fix the X86_TRAP_BP gate
  x86/xen: Get rid of paravirt op adjust_exception_frame
  x86/eisa: Add missing include
  x86/idt: Remove superfluous ALIGNment
  x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
  x86/idt: Remove the tracing IDT leftovers
  x86/idt: Hide set_intr_gate()
  x86/idt: Simplify alloc_intr_gate()
  x86/idt: Deinline setup functions
  x86/idt: Remove unused functions/inlines
  x86/idt: Move interrupt gate initialization to IDT code
  x86/idt: Move APIC gate initialization to tables
  x86/idt: Move regular trap init to tables
  x86/idt: Move IST stack based traps to table init
  x86/idt: Move debug stack init to table based
  x86/idt: Switch early trap init to IDT tables
  x86/idt: Prepare for table based init
  x86/idt: Move early IDT setup out of 32-bit asm
  x86/idt: Move early IDT handler setup to IDT code
  x86/idt: Consolidate IDT invalidation
  ...
2017-09-04 17:43:56 -07:00
Linus Torvalds
f57091767a Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache quality monitoring update from Thomas Gleixner:
 "This update provides a complete rewrite of the Cache Quality
  Monitoring (CQM) facility.

  The existing CQM support was duct taped into perf with a lot of issues
  and the attempts to fix those turned out to be incomplete and
  horrible.

  After lengthy discussions it was decided to integrate the CQM support
  into the Resource Director Technology (RDT) facility, which is the
  obvious choise as in hardware CQM is part of RDT. This allowed to add
  Memory Bandwidth Monitoring support on top.

  As a result the mechanisms for allocating cache/memory bandwidth and
  the corresponding monitoring mechanisms are integrated into a single
  management facility with a consistent user interface"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/intel_rdt: Turn off most RDT features on Skylake
  x86/intel_rdt: Add command line options for resource director technology
  x86/intel_rdt: Move special case code for Haswell to a quirk function
  x86/intel_rdt: Remove redundant ternary operator on return
  x86/intel_rdt/cqm: Improve limbo list processing
  x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
  x86/intel_rdt: Modify the intel_pqr_state for better performance
  x86/intel_rdt/cqm: Clear the default RMID during hotcpu
  x86/intel_rdt: Show bitmask of shareable resource with other executing units
  x86/intel_rdt/mbm: Handle counter overflow
  x86/intel_rdt/mbm: Add mbm counter initialization
  x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
  x86/intel_rdt/cqm: Add CPU hotplug support
  x86/intel_rdt/cqm: Add sched_in support
  x86/intel_rdt: Introduce rdt_enable_key for scheduling
  x86/intel_rdt/cqm: Add mount,umount support
  x86/intel_rdt/cqm: Add rmdir support
  x86/intel_rdt: Separate the ctrl bits from rmdir
  x86/intel_rdt/cqm: Add mon_data
  x86/intel_rdt: Prepare for RDT monitor data support
  ...
2017-09-04 13:56:37 -07:00
Linus Torvalds
b1b6f83ac9 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "PCID support, 5-level paging support, Secure Memory Encryption support

  The main changes in this cycle are support for three new, complex
  hardware features of x86 CPUs:

   - Add 5-level paging support, which is a new hardware feature on
     upcoming Intel CPUs allowing up to 128 PB of virtual address space
     and 4 PB of physical RAM space - a 512-fold increase over the old
     limits. (Supercomputers of the future forecasting hurricanes on an
     ever warming planet can certainly make good use of more RAM.)

     Many of the necessary changes went upstream in previous cycles,
     v4.14 is the first kernel that can enable 5-level paging.

     This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
     default.

     (By Kirill A. Shutemov)

   - Add 'encrypted memory' support, which is a new hardware feature on
     upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
     RAM to be encrypted and decrypted (mostly) transparently by the
     CPU, with a little help from the kernel to transition to/from
     encrypted RAM. Such RAM should be more secure against various
     attacks like RAM access via the memory bus and should make the
     radio signature of memory bus traffic harder to intercept (and
     decrypt) as well.

     This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
     by default.

     (By Tom Lendacky)

   - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
     hardware feature that attaches an address space tag to TLB entries
     and thus allows to skip TLB flushing in many cases, even if we
     switch mm's.

     (By Andy Lutomirski)

  All three of these features were in the works for a long time, and
  it's coincidence of the three independent development paths that they
  are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
  x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
  x86/mm: Use pr_cont() in dump_pagetable()
  x86/mm: Fix SME encryption stack ptr handling
  kvm/x86: Avoid clearing the C-bit in rsvd_bits()
  x86/CPU: Align CR3 defines
  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
  acpi, x86/mm: Remove encryption mask from ACPI page protection type
  x86/mm, kexec: Fix memory corruption with SME on successive kexecs
  x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
  x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
  x86/mm: Allow userspace have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
  x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
  x86/mm/dump_pagetables: Fix printout of p4d level
  x86/mm/dump_pagetables: Generalize address normalization
  x86/boot: Fix memremap() related build failure
  ...
2017-09-04 12:21:28 -07:00
Linus Torvalds
e0a195b522 Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spinlock update from Ingo Molnar:
 "Convert an NMI lock to raw"

[ Clarification: it's not that the lock itself is NMI-safe, it's about
  NMI registration called from RT contexts  - Linus ]

* 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Use raw lock
2017-09-04 11:13:56 -07:00