Commit Graph

13291 Commits

Author SHA1 Message Date
Borislav Petkov
7f709d0c32 x86/microcode: Collect CPU info on resume
We need to reread the CPU's microcode revision after resume because
applied microcode gets "forgotten" depending on the sleep state.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-9-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
6b14b81899 x86/microcode: Issue the debug printk on resume only on success
Move it after the patch application function which also checks whether
we were successful.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
b3763a672d x86/microcode/amd: Hand down the CPU family
Will be needed in a following patch.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
058dc49803 x86/microcode: Export the microcode cache linked list
It will be used by both drivers so move it to core.c.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-6-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
f61337d984 x86/microcode/intel: Simplify generic_load_microcode()
Make it return the ucode_state directly instead of assigning to a state
variable and jumping to an out: label.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
5879a26752 x86/microcode: Move driver authors to CREDITS
They're not active anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
777284b66f x86/microcode: Run the AP-loading routine only on the application processors
cpu_init() is run also on the BSP (in addition to the APs):

 x86_64_start_kernel
 |-> x86_64_start_reservations
 |-> start_kernel
 |-> trap_init
 |-> cpu_init
 |-> load_ucode_ap
 ...

but we run the AP (Application Processors) microcode loading routine
there too even though we have a BSP-specific routine for that:
load_ucode_bsp().

Which is unnecessary. So let's limit the AP microcode loading routine to
the APs only.

Remove a useless comment while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
59c6f278bd x86/cpu: Get rid of the show_msr= boot option
It is useless as it dumps the MSRs too early BUT(!) we do set MSRs later too.
Also, it dumps only BSP MSRs as it gets called only for CPU 0.

And the MSR range array would need constant updating anyway, and so on
and so on...

Oh, and we have msr.ko and msr-tools which are the much better solution
anyway. So off it goes...

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:48:50 +02:00
Borislav Petkov
62a67e123e x86/cpu: Merge bugs.c and bugs_64.c
Should be easier when following boot paths. It probably is a left over
from the x86 unification eons ago.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:48:50 +02:00
Borislav Petkov
d54ff31dd8 x86/cpu: Remove the printk format specifier in "CPU0: "
We're using a literal, move it into the string.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:48:49 +02:00
Arnd Bergmann
d320b9a5bd x86/quirks: Hide maybe-uninitialized warning
gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap
uses uninitialized data when CONFIG_PCI is not set:

  arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’:
  arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized]

However, the function is also not called in this configuration, so we
can avoid the warning by moving the existing #ifdef to cover it as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/20161024153325.2752428-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:45:13 +02:00
Josh Poimboeuf
7fbe6ac024 x86/unwind: Fix empty stack dereference in guess unwinder
Vince Waver reported the following bug:

  WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
  CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
  Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
  Call Trace:
   <NMI>  ? dump_stack+0x46/0x59
   ? __warn+0xd5/0xee
   ? vmalloc_fault+0x58/0x1f0
   ? __do_page_fault+0x6d/0x48e
   ? perf_log_throttle+0xa4/0xf4
   ? trace_page_fault+0x22/0x30
   ? __unwind_start+0x28/0x42
   ? perf_callchain_kernel+0x75/0xac
   ? get_perf_callchain+0x13a/0x1f0
   ? perf_callchain+0x6a/0x6c
   ? perf_prepare_sample+0x71/0x2eb
   ? perf_event_output_forward+0x1a/0x54
   ? __default_send_IPI_shortcut+0x10/0x2d
   ? __perf_event_overflow+0xfb/0x167
   ? x86_pmu_handle_irq+0x113/0x150
   ? native_read_msr+0x6/0x34
   ? perf_event_nmi_handler+0x22/0x39
   ? perf_ibs_nmi_handler+0x4a/0x51
   ? perf_event_nmi_handler+0x22/0x39
   ? nmi_handle+0x4d/0xf0
   ? perf_ibs_handle_irq+0x3d1/0x3d1
   ? default_do_nmi+0x3c/0xd5
   ? do_nmi+0x92/0x102
   ? end_repeat_nmi+0x1a/0x1e
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
   <EOE> ^A4---[ end trace 632723104d47d31a ]---
  BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
  kernel stack overflow (page fault): 0000 [#1] SMP
  ...

The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty.  The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.

Add a check in the guess unwinder to deal with an empty stack.  (The
frame pointer unwinder already has such a check.)

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7c7900f897 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:36:43 +02:00
Sinan Kaya
f1caa61df2 ACPI/PCI: pci_link: penalize SCI correctly
Ondrej reported that IRQs stopped working in v4.7 on several
platforms.  A typical scenario, from Ondrej's VT82C694X/694X, is:

ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI

We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.

We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.

Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().

Fixes: 103544d869 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org
Reported-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-24 14:18:14 +02:00
Linus Torvalds
3e9679a365 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three fixes, a hw-enablement and a cross-arch fix/enablement change:

   - SGI/UV fix for older platforms

   - x32 signal handling fix

   - older x86 platform bootup APIC fix

   - AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS
     (Multiply Accumulation Single precision instructions) enablement.

   - move thread_info back into x86 specific code, to make life easier
     for other architectures trying to make use of
     CONFIG_THREAD_INFO_IN_TASK_STRUCT=y"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/smp: Don't try to poke disabled/non-existent APIC
  sched/core, x86: Make struct thread_info arch specific again
  x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
  x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
  x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
  x86/vmware: Skip timer_irq_works() check on VMware
2016-10-22 09:58:49 -07:00
Ville Syrjälä
ff8560512b x86/boot/smp: Don't try to poke disabled/non-existent APIC
Apparently trying to poke a disabled or non-existent APIC
leads to a box that doesn't even boot. Let's not do that.

No real clue if this is the right fix, but at least my
P3 machine boots again.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dyoung@redhat.com
Cc: kexec@lists.infradead.org
Cc: stable@vger.kernel.org
Fixes: 2a51fe083e ("arch/x86: Handle non enumerated CPU after physical hotplug")
Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-22 10:47:54 +02:00
Sebastian Andrzej Siewior
d2d9c4a3d0 x86/apic: Get rid of "warning: 'acpi_ioapic_lock' defined but not used"
kbuild test robot reported this against the -RT tree:

|>> arch/x86/kernel/acpi/boot.c:90:21: warning: 'acpi_ioapic_lock' defined but not used [-Wunused-variable]
|    static DEFINE_MUTEX(acpi_ioapic_lock);
|                        ^
|   include/linux/mutex_rt.h:27:15: note: in definition of macro 'DEFINE_MUTEX'
|     struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
                  ^~~~~~~~~
which is also true (as in non-used) for !RT but the compiler does not
emit a warning.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161021084449.32523-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 11:09:14 +02:00
Alexey Makhalov
cf11372949 x86/vmware: Read tsc_khz only once at boot time
Re-factor the vmware platform setup code to query the hypervisor for tsc
frequency only once during boot. Since the VMware hypervisor guarantees
constant TSC, calibrate_tsc now uses the saved value.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161020050211.GA25304@amakhalov-virtual-machine
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-21 10:12:11 +02:00
Josh Poimboeuf
6fa81a12b3 x86/dumpstack: Print orig_ax in __show_regs()
The value of regs->orig_ax contains potentially useful debugging data:
For syscalls it contains the syscall number.  For interrupts it contains
the (negated) vector number.  To reduce noise, print it only if it has a
useful value (i.e., something other than -1).

Here's what it looks like for a write syscall:

  RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940
  RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940
  RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001
  ...

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 09:26:05 +02:00
Josh Poimboeuf
1141c3e39c x86/dumpstack: Fix duplicate RIP address display in __show_regs()
The RIP address is shown twice in __show_regs().  Before:

  RIP: 0010:[<ffffffff81070446>]  [<ffffffff81070446>] native_write_msr+0x6/0x30

After:

  RIP: 0010:[<ffffffff81070446>] native_write_msr+0x6/0x30

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b3fda66f36761759b000883b059cdd9a7649dcc1.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 09:26:04 +02:00
Josh Poimboeuf
3b3fa11bc7 x86/dumpstack: Print any pt_regs found on the stack
Now that we can find pt_regs registers on the stack, print them.  Here's
an example of what it looks like:

  Call Trace:
   <IRQ>
   [<ffffffff8144b793>] dump_stack+0x86/0xc3
   [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
   [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
   [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
   [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
  RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
  RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
  RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
  RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
  RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
  R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
   <EOI>
   [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
   [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
   [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
   ...
   [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
   [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
  RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
  RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
  RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
  R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
  R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
   [<ffffffff8145b043>] ? __clear_user+0x23/0x70
   [<ffffffff8145b0fb>] clear_user+0x2b/0x40
   [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
   [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
   [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
   [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
   [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
   [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
   [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
  RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
  RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
  RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
  RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
  RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
  R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
  R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5cc2c512ec82cfba00dd22467644d4ed751a48c0.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 09:26:04 +02:00
Josh Poimboeuf
79439d8e15 x86/dumpstack: Print stack identifier on its own line
show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
newline so that any stack address printed after it will appear on the
same line.  That causes the first stack address to be vertically
misaligned with the rest, making it visually cluttered and slightly
confusing:

  Call Trace:
   <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

It will look worse once we start printing pt_regs registers found in the
middle of the stack:

  <IRQ> RIP: 0010:[<ffffffff8189c6c3>]  [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
  RSP: 0018:ffff88007876f720  EFLAGS: 00000206
  RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
  ...

Improve readability by adding a newline to the stack name:

  Call Trace:
   <IRQ>
   [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI>
   [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

Now that "continued" lines are no longer needed, we can also remove the
hack of using the empty string (aka KERN_CONT) and replace it with
KERN_DEFAULT.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 09:26:04 +02:00
Josh Poimboeuf
acb4608ad1 x86/unwind: Create stack frames for saved syscall registers
The entry code doesn't encode the pt_regs pointer for syscalls.  But the
pt_regs are always at the same location, so we can add a manual check
for them.

A later patch prints them as part of the oops stack dump.  They could be
useful, for example, to determine the arguments to a system call.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 09:26:04 +02:00
Josh Poimboeuf
946c191161 x86/entry/unwind: Create stack frames for saved interrupt registers
With frame pointers, when a task is interrupted, its stack is no longer
completely reliable because the function could have been interrupted
before it had a chance to save the previous frame pointer on the stack.
So the caller of the interrupted function could get skipped by a stack
trace.

This is problematic for live patching, which needs to know whether a
stack trace of a sleeping task can be relied upon.  There's currently no
way to detect if a sleeping task was interrupted by a page fault
exception or preemption before it went to sleep.

Another issue is that when dumping the stack of an interrupted task, the
unwinder has no way of knowing where the saved pt_regs registers are, so
it can't print them.

This solves those issues by encoding the pt_regs pointer in the frame
pointer on entry from an interrupt or an exception.

This patch also updates the unwinder to be able to decode it, because
otherwise the unwinder would be broken by this change.

Note that this causes a change in the behavior of the unwinder: each
instance of a pt_regs on the stack is now considered a "frame".  So
callers of unwind_get_return_address() will now get an occasional
'regs->ip' address that would have previously been skipped over.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 09:26:03 +02:00
Dmitry Safonov
ed1e7db33c x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
The recent introduction of SA_X32/IA32 sa_flags added a check for
user_64bit_mode() into sigaction_compat_abi(). user_64bit_mode() is true
for native 64-bit processes and x32 processes.

Due to that the function returns w/o setting the SA_X32_ABI flag for X32
processes. In consequence the kernel attempts to deliver the signal to the
X32 process in native 64-bit mode causing the process to segfault.

Remove the check, so the actual check for X32 mode which sets the ABI flag
can be reached. There is no side effect for native 64-bit mode.

[ tglx: Rewrote changelog ]

Fixes: 6846351052 ("x86/signal: Add SA_{X32,IA32}_ABI sa_flags")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Link: http://lkml.kernel.org/r/CAJwJo6Z8ZWPqNfT6t-i8GW1MKxQrKDUagQqnZ%2B0%2B697%3DMyVeGg@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 13:05:15 +02:00
Josh Poimboeuf
e728f61ce0 x86/boot: Move the _stext marker to before the boot code
When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses for the head code between the _text and
_stext markers.  Among other things, this can cause the unwinder to
behave incorrectly when unwinding to x86 head code.

Head code is text code too, so mark it as such.  This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/789cf978866420e72fa89df44aa2849426ac378d.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:24 +02:00
Josh Poimboeuf
22dc391865 x86/boot: Fix the end of the stack for idle tasks
Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right below their saved pt_regs,
regardless of which syscall was used to enter the kernel.  That creates
a nice convention which makes it straightforward to identify the end of
the stack, which can be useful for the unwinder to verify the stack is
sane.

However, the boot CPU's idle "swapper" task doesn't follow that
convention.  Fix that by starting its stack at a sizeof(pt_regs) offset
from the end of the stack page.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81aee3beb6ed88e44f1bea6986bb7b65c368f77a.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:23 +02:00
Josh Poimboeuf
595c1e645d x86/boot/64: Put a real return address on the idle task stack
The frame at the end of each idle task stack has a zeroed return
address.  This is inconsistent with real task stacks, which have a real
return address at that spot.  This inconsistency can be confusing for
stack unwinders.  It also hides useful information about what asm code
was involved in calling into C.

Make it a real address by using the side effect of a call instruction to
push the instruction pointer on the stack.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f59593ae7b15d5126f872b0a23143173d28aa32d.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:23 +02:00
Josh Poimboeuf
a9468df5ad x86/boot/64: Use a common function for starting CPUs
There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64().  They're identical except for the
stack setup.  Combine the common parts into a shared start_cpu()
function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1d692ffa62fcb3cc835a5b254e953f2d9bab3549.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:23 +02:00
Josh Poimboeuf
b9b1a9c363 x86/boot/smp/32: Fix initial idle stack location on 32-bit kernels
On 32-bit kernels, the initial idle stack calculation doesn't take into
account the TOP_OF_KERNEL_STACK_PADDING, making the stack end address
inconsistent with other tasks on 32-bit.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6cf569410bfa84cf923902fc4d628444cace94be.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:23 +02:00
Josh Poimboeuf
6616a147a7 x86/boot/32: Fix the end of the stack for idle tasks
The frame at the end of each idle task stack is inconsistent with real
task stacks, which have a stack frame header and a real return address
before the pt_regs area.  This inconsistency can be confusing for stack
unwinders.  It also hides useful information about what asm code was
involved in calling into C.

Fix that by changing the initial code jumps to calls.  Also add infinite
loops after the calls to make it clear that the calls don't return, and
to hang if they do.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2588f34b6fbac4ae6f6f9ead2a78d7f8d58a6341.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:23 +02:00
Josh Poimboeuf
1b00255f32 x86/entry/32, x86/boot/32: Use local labels
Add the local label prefix to all non-function named labels in head_32.S
and entry_32.S.  In addition to decluttering the symbol table, it also
will help stack traces to be more sensible.  For example, the last
reported function in the idle task stack trace will be startup_32_smp()
instead of is486().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14f9f7afd478b23a762f40734da1a57c0c273f6e.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:15:22 +02:00
Tejun Heo
8bc4a04455 Merge branch 'for-4.9' into for-4.10 2016-10-19 12:12:40 -04:00
Linus Torvalds
63ae602cea Merge branch 'gup_flag-cleanups'
Merge the gup_flags cleanups from Lorenzo Stoakes:
 "This patch series adjusts functions in the get_user_pages* family such
  that desired FOLL_* flags are passed as an argument rather than
  implied by flags.

  The purpose of this change is to make the use of FOLL_FORCE explicit
  so it is easier to grep for and clearer to callers that this flag is
  being used.  The use of FOLL_FORCE is an issue as it overrides missing
  VM_READ/VM_WRITE flags for the VMA whose pages we are reading
  from/writing to, which can result in surprising behaviour.

  The patch series came out of the discussion around commit 38e0885465
  ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
  which addressed a BUG_ON() being triggered when a page was faulted in
  with PROT_NONE set but having been overridden by FOLL_FORCE.
  do_numa_page() was run on the assumption the page _must_ be one marked
  for NUMA node migration as an actual PROT_NONE page would have been
  dealt with prior to this code path, however FOLL_FORCE introduced a
  situation where this assumption did not hold.

  See

      https://marc.info/?l=linux-mm&m=147585445805166

  for the patch proposal"

Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.

[ This branch was rebased recently to add a few more acked-by's and
  reviewed-by's ]

* gup_flag-cleanups:
  mm: replace access_process_vm() write parameter with gup_flags
  mm: replace access_remote_vm() write parameter with gup_flags
  mm: replace __access_remote_vm() write parameter with gup_flags
  mm: replace get_user_pages_remote() write/force parameters with gup_flags
  mm: replace get_user_pages() write/force parameters with gup_flags
  mm: replace get_vaddr_frames() write/force parameters with gup_flags
  mm: replace get_user_pages_locked() write/force parameters with gup_flags
  mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
  mm: remove write/force parameters from __get_user_pages_unlocked()
  mm: remove write/force parameters from __get_user_pages_locked()
  mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
2016-10-19 08:39:47 -07:00
Piotr Luc
8214899342 x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
AVX512_4VNNIW  - Vector instructions for deep learning enhanced word
variable precision.
AVX512_4FMAPS - Vector instructions for deep learning floating-point
single precision.

These new instructions are to be used in future Intel Xeon & Xeon Phi
processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new
instructions are supported by a processor.

The spec can be found in the Intel Software Developer Manual (SDM) or in
the Instruction Set Extensions Programming Reference (ISE).

Define new feature flags to enumerate the new instructions in /proc/cpuinfo
accordingly to CPUID bits and add the required xsave extensions which are
required for proper operation.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161018150111.29926-1-piotr.luc@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-19 17:37:13 +02:00
Renat Valiullin
854dd54245 x86/vmware: Skip timer_irq_works() check on VMware
The timer_irq_works() boot check may sometimes fail in a VM, when
the Host is overcommitted or when the Guest is running nested.

Since the intended check is unnecessary on VMware's virtual
hardware, by-pass it.

Signed-off-by: Renat Valiullin <rvaliullin@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161013184539.GA11497@rvaliullin-vm
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-19 17:36:33 +02:00
Lorenzo Stoakes
f307ab6dce mm: replace access_process_vm() write parameter with gup_flags
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:31:25 -07:00
Linus Torvalds
0832881425 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes, plus hw-enablement changes:

   - fix persistent RAM handling
   - remove pkeys warning
   - remove duplicate macro
   - fix debug warning in irq handler
   - add new 'Knights Mill' CPU related constants and enable the perf bits"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add Knights Mill CPUID
  perf/x86/intel/rapl: Add Knights Mill CPUID
  perf/x86/intel: Add Knights Mill CPUID
  x86/cpu/intel: Add Knights Mill to Intel family
  x86/e820: Don't merge consecutive E820_PRAM ranges
  pkeys: Remove easily triggered WARN
  x86: Remove duplicate rtit status MSR macro
  x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt()
2016-10-18 09:59:04 -07:00
Ingo Molnar
4d69f155d5 Merge tag 'v4.9-rc1' into x86/fpu, to resolve conflict
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 13:04:34 +02:00
Rik van Riel
c474e50711 x86/fpu: Split old_fpu & new_fpu handling into separate functions
By moving all of the new_fpu state handling into switch_fpu_finish(),
the code can be simplified some more.

This gets rid of the prefetch, but given the size of the FPU register
state on modern CPUs, and the amount of work done by __switch_to()
inbetween both functions, the value of a single cache line prefetch
seems somewhat dubious anyway.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1476447331-21566-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 11:38:41 +02:00
Rik van Riel
317b622cb2 x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state()
The __{fpu,cpu}_invalidate_fpregs_state() functions can only be used
to invalidate a resource they control.  Document that, and change
the API a little bit to reflect that.

Go back to open coding the fpu_fpregs_owner_ctx write in the CPU
hotplug code, which should be the exception, and move __kernel_fpu_begin()
to this API.

This patch has no functional changes to the current code.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1476447331-21566-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 11:38:31 +02:00
Ingo Molnar
1d33369db2 Merge tag 'v4.9-rc1' into x86/urgent, to pick up updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 11:31:39 +02:00
Dan Williams
23446cb66c x86/e820: Don't merge consecutive E820_PRAM ranges
Commit:

  917db484dc ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")

... fixed up the broken manipulations of max_pfn in the presence of
E820_PRAM ranges.

However, it also broke the sanitize_e820_map() support for not merging
E820_PRAM ranges.

Re-introduce the enabling to keep resource boundaries between
consecutive defined ranges. Otherwise, for example, an environment that
boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhang Yi <yizhan@redhat.com>
Cc: linux-nvdimm@lists.01.org
Fixes: 917db484dc ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 11:16:48 +02:00
Dmitry Vyukov
9f7d416c36 kprobes: Unpoison stack in jprobe_return() for KASAN
I observed false KSAN positives in the sctp code, when
sctp uses jprobe_return() in jsctp_sf_eat_sack().

The stray 0xf4 in shadow memory are stack redzones:

[     ] ==================================================================
[     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
[     ] Read of size 1 by task syz-executor/18535
[     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
[     ] flags: 0x1fffc0000000000()
[     ] page dumped because: kasan: bad access detected
[     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
[     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
[     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
[     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
[     ] Call Trace:
[     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
[     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
[     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
[     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
[     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
[     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
[     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
[     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
[     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
[     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
[     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
[     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
[     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
[     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
[     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
[     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
[     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
[     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
[ ... ]
[     ] Memory state around the buggy address:
[     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]                    ^
[     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ] ==================================================================

KASAN stack instrumentation poisons stack redzones on function entry
and unpoisons them on function exit. If a function exits abnormally
(e.g. with a longjmp like jprobe_return()), stack redzones are left
poisoned. Later this leads to random KASAN false reports.

Unpoison stack redzones in the frames we are going to jump over
before doing actual longjmp in jprobe_return().

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: kasan-dev@googlegroups.com
Cc: surovegin@google.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 11:02:31 +02:00
Dmitry Vyukov
9254139ad0 kprobes: Avoid false KASAN reports during stack copy
Kprobes save and restore raw stack chunks with memcpy().
With KASAN these chunks can contain poisoned stack redzones,
as the result memcpy() interceptor produces false
stack out-of-bounds reports.

Use __memcpy() instead of memcpy() for stack copying.
__memcpy() is not instrumented by KASAN and does not lead
to the false reports.

Currently there is a spew of KASAN reports during boot
if CONFIG_KPROBES_SANITY_TEST is enabled:

[   ] Kprobe smoke test: started
[   ] ==================================================================
[   ] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr ffff88085259fba8
[   ] Read of size 64 by task swapper/0/1
[   ] page:ffffea00214967c0 count:0 mapcount:0 mapping:          (null) index:0x0
[   ] flags: 0x2fffff80000000()
[   ] page dumped because: kasan: bad access detected
[...]

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kasan-dev@googlegroups.com
[ Improved various details. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 10:58:59 +02:00
Linus Torvalds
84d69848c9 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - EXPORT_SYMBOL for asm source by Al Viro.

   This does bring a regression, because genksyms no longer generates
   checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
   working on a patch to fix this.

   Plus, we are talking about functions like strcpy(), which rarely
   change prototypes.

 - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
   Piggin

 - fixdep speedup by Alexey Dobriyan.

 - preparatory work by Nick Piggin to allow architectures to build with
   -ffunction-sections, -fdata-sections and --gc-sections

 - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

 - fix for filenames with colons in the initramfs source by me.

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
  initramfs: Escape colons in depfile
  ppc: there is no clear_pages to export
  powerpc/64: whitelist unresolved modversions CRCs
  kbuild: -ffunction-sections fix for archs with conflicting sections
  kbuild: add arch specific post-link Makefile
  kbuild: allow archs to select link dead code/data elimination
  kbuild: allow architectures to use thin archives instead of ld -r
  kbuild: Regenerate genksyms lexer
  kbuild: genksyms fix for typeof handling
  fixdep: faster CONFIG_ search
  ia64: move exports to definitions
  sparc32: debride memcpy.S a bit
  [sparc] unify 32bit and 64bit string.h
  sparc: move exports to definitions
  ppc: move exports to definitions
  arm: move exports to definitions
  s390: move exports to definitions
  m68k: move exports to definitions
  alpha: move exports to actual definitions
  x86: move exports to actual definitions
  ...
2016-10-14 14:26:58 -07:00
Wanpeng Li
1ec6ec14a2 x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt()
===============================
 [ INFO: suspicious RCU usage. ]
 4.8.0+ #24 Not tainted
 -------------------------------
 ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
 
 other info that might help us debug this:
 
 RCU used illegally from idle CPU!
 rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 no locks held by swapper/1/0.
 
  [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
  [<ffffffff9d06f860>] native_write_msr+0x20/0x30
  [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
  [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
  [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0

Reschedule interrupt may be called in cpu idle state. This causes lockdep 
check warning above. 

Add irq_enter/exit() in smp_reschedule_interrupt(), irq_enter() tells the RCU 
subsystems to end the extended quiescent state, so the following trace call in 
ack_APIC_irq() works correctly.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/1476409733-5133-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-14 14:14:20 +02:00
Linus Torvalds
6b25e21fa6 Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Core:
   - Fence destaging work
   - DRIVER_LEGACY to split off legacy drm drivers
   - drm_mm refactoring
   - Splitting drm_crtc.c into chunks and documenting better
   - Display info fixes
   - rbtree support for prime buffer lookup
   - Simple VGA DAC driver

  Panel:
   - Add Nexus 7 panel
   - More simple panels

  i915:
   - Refactoring GEM naming
   - Refactored vma/active tracking
   - Lockless request lookups
   - Better stolen memory support
   - FBC fixes
   - SKL watermark fixes
   - VGPU improvements
   - dma-buf fencing support
   - Better DP dongle support

  amdgpu:
   - Powerplay for Iceland asics
   - Improved GPU reset support
   - UVD/VEC powergating support for CZ/ST
   - Preinitialised VRAM buffer support
   - Virtual display support
   - Initial SI support
   - GTT rework
   - PCI shutdown callback support
   - HPD IRQ storm fixes

  amdkfd:
   - bugfixes

  tilcdc:
   - Atomic modesetting support

  mediatek:
   - AAL + GAMMA engine support
   - Hook up gamma LUT
   - Temporal dithering support

  imx:
   - Pixel clock from devicetree
   - drm bridge support for LVDS bridges
   - active plane reconfiguration
   - VDIC deinterlacer support
   - Frame synchronisation unit support
   - Color space conversion support

  analogix:
   - PSR support
   - Better panel on/off support

  rockchip:
   - rk3399 vop/crtc support
   - PSR support

  vc4:
   - Interlaced vblank timing
   - 3D rendering CPU overhead reduction
   - HDMI output fixes

  tda998x:
   - HDMI audio ASoC support

  sunxi:
   - Allwinner A33 support
   - better TCON support

  msm:
   - DT binding cleanups
   - Explicit fence-fd support

  sti:
   - remove sti415/416 support

  etnaviv:
   - MMUv2 refactoring
   - GC3000 support

  exynos:
   - Refactoring HDMI DCC/PHY
   - G2D pm regression fix
   - Page fault issues with wait for vblank

  There is no nouveau work in this tree, as Ben didn't get a pull
  request in, and he was fighting moving to atomic and adding mst
  support, so maybe best it waits for a cycle"

* tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits)
  drm/crtc: constify drm_crtc_index parameter
  drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next
  drm/i915/guc: Unwind GuC workqueue reservation if request construction fails
  drm/i915: Reset the breadcrumbs IRQ more carefully
  drm/i915: Force relocations via cpu if we run out of idle aperture
  drm/i915: Distinguish last emitted request from last submitted request
  drm/i915: Allow DP to work w/o EDID
  drm/i915: Move long hpd handling into the hotplug work
  drm/i915/execlists: Reinitialise context image after GPU hang
  drm/i915: Use correct index for backtracking HUNG semaphores
  drm/i915: Unalias obj->phys_handle and obj->userptr
  drm/i915: Just clear the mmiodebug before a register access
  drm/i915/gen9: only add the planes actually affected by ddb changes
  drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED
  drm/i915/bxt: Fix HDMI DPLL configuration
  drm/i915/gen9: fix the watermark res_blocks value
  drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations
  drm/i915/gen9: minimum scanlines for Y tile is not always 4
  drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
  drm/i915/kbl: KBL also needs to run the SAGV code
  ...
2016-10-11 18:12:22 -07:00
Thomas Garnier
0549a3c02e kdump, vmcoreinfo: report memory sections virtual addresses
KASLR memory randomization can randomize the base of the physical memory
mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
(VMEMMAP_START).  Adding these variables on VMCOREINFO so tools can easily
identify the base of each memory section.

Link: http://lkml.kernel.org/r/1471531632-23003-1-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eugene Surovegin <surovegin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Hidehiro Kawai
0ee59413c9 x86/panic: replace smp_send_stop() with kdump friendly version in panic path
Daniel Walker reported problems which happens when
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).

In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online.  As the result, for x86, kdump
routines fail to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a weak
function, and it just call smp_send_stop().  Architecture codes should
override it so that kdump can work appropriately.  This patch only
provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.

NOTES:

- Right solution would be to place crash_smp_send_stop() before
  __crash_kexec() invocation in all cases and remove smp_send_stop(), but
  we can't do that until all architectures implement own
  crash_smp_send_stop()

- crash_smp_send_stop()-like work is still needed by
  machine_crash_shutdown() because crash_kexec() can be called without
  entering panic()

Fixes: f06e5153f4 (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Reported-by: Daniel Walker <dwalker@fifo99.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: "Steven J. Hill" <steven.hill@cavium.com>
Cc: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:32 -07:00
Jason Cooper
9c6f0902a5 x86: use simpler API for random address requests
Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can simplify
the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Link: http://lkml.kernel.org/r/20160803233913.32511-3-jason@lakedaemon.net
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:32 -07:00