Commit Graph

14569 Commits

Author SHA1 Message Date
Andy Lutomirski
33a2f1a6c4 x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
get_stack_info() doesn't currently know about the SYSENTER stack, so
unwinding will fail if we entered the kernel on the SYSENTER stack
and haven't fully switched off.  Teach get_stack_info() about the
SYSENTER stack.

With future patches applied that run part of the entry code on the
SYSENTER stack and introduce an intentional BUG(), I would get:

  PANIC: double fault, error_code: 0x0
  ...
  RIP: 0010:do_error_trap+0x33/0x1c0
  ...
  Call Trace:
  Code: ...

With this patch, I get:

  PANIC: double fault, error_code: 0x0
  ...
  Call Trace:
   <SYSENTER>
   ? async_page_fault+0x36/0x60
   ? invalid_op+0x22/0x40
   ? async_page_fault+0x36/0x60
   ? sync_regs+0x3c/0x40
   ? sync_regs+0x2e/0x40
   ? error_entry+0x6c/0xd0
   ? async_page_fault+0x36/0x60
   </SYSENTER>
  Code: ...

which is a lot more informative.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.392711508@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:54 +01:00
Andy Lutomirski
1a79797b58 x86/entry/64: Allocate and enable the SYSENTER stack
This will simplify future changes that want scratch variables early in
the SYSENTER handler -- they'll be able to spill registers to the
stack.  It also lets us get rid of a SWAPGS_UNSAFE_STACK user.

This does not depend on CONFIG_IA32_EMULATION=y because we'll want the
stack space even without IA32 emulation.

As far as I can tell, the reason that this wasn't done from day 1 is
that we use IST for #DB and #BP, which is IMO rather nasty and causes
a lot more problems than it solves.  But, since #DB uses IST, we don't
actually need a real stack for SYSENTER (because SYSENTER with TF set
will invoke #DB on the IST stack rather than the SYSENTER stack).

I want to remove IST usage from these vectors some day, and this patch
is a prerequisite for that as well.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.312726423@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:53 +01:00
Andy Lutomirski
4f3789e792 x86/irq/64: Print the offending IP in the stack overflow warning
In case something goes wrong with unwind (not unlikely in case of
overflow), print the offending IP where we detected the overflow.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.231677119@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:53 +01:00
Andy Lutomirski
6669a69260 x86/irq: Remove an old outdated comment about context tracking races
That race has been fixed and code cleaned up for a while now.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.150551639@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:53 +01:00
Josh Poimboeuf
b02fcf9ba1 x86/unwinder: Handle stack overflows more gracefully
There are at least two unwinder bugs hindering the debugging of
stack-overflow crashes:

- It doesn't deal gracefully with the case where the stack overflows and
  the stack pointer itself isn't on a valid stack but the
  to-be-dereferenced data *is*.

- The ORC oops dump code doesn't know how to print partial pt_regs, for the
  case where if we get an interrupt/exception in *early* entry code
  before the full pt_regs have been saved.

Fix both issues.

http://lkml.kernel.org/r/20171126024031.uxi4numpbjm5rlbr@treble

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.071425003@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:52 +01:00
Andy Lutomirski
d3a0910401 x86/unwinder/orc: Dont bail on stack overflow
If the stack overflows into a guard page and the ORC unwinder should work
well: by construction, there can't be any meaningful data in the guard page
because no writes to the guard page will have succeeded.

But there is a bug that prevents unwinding from working correctly: if the
starting register state has RSP pointing into a stack guard page, the ORC
unwinder bails out immediately.

Instead of bailing out immediately check whether the next page up is a
valid check page and if so analyze that. As a result the ORC unwinder will
start the unwind.

Tested by intentionally overflowing the task stack.  The result is an
accurate call trace instead of a trace consisting purely of '?' entries.

There are a few other bugs that are triggered if the unwinder encounters a
stack overflow after the first step, but they are outside the scope of this
fix.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150604.991389777@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:52 +01:00
Boris Ostrovsky
e17f823453 x86/entry/64/paravirt: Use paravirt-safe macro to access eflags
Commit 1d3e53e862 ("x86/entry/64: Refactor IRQ stacks and make them
NMI-safe") added DEBUG_ENTRY_ASSERT_IRQS_OFF macro that acceses eflags
using 'pushfq' instruction when testing for IF bit. On PV Xen guests
looking at IF flag directly will always see it set, resulting in 'ud2'.

Introduce SAVE_FLAGS() macro that will use appropriate save_fl pv op when
running paravirt.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20171204150604.899457242@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:52 +01:00
Will Deacon
3382290ed2 locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
[ Note, this is a Git cherry-pick of the following commit:

    506458efaf ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:57:15 +01:00
Rudolf Marek
f2dbad36c5 x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
[ Note, this is a Git cherry-pick of the following commit:

    2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD")

  ... for easier x86 PTI code testing and back-porting. ]

The latest AMD AMD64 Architecture Programmer's Manual
adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).

If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
/ FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.

Signed-Off-By: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:55:02 +01:00
Ingo Molnar
e5d77a73f3 Merge commit 'upstream-x86-virt' into WIP.x86/mm
Merge a minimal set of virt cleanups, for a base for the MM isolation patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:50:01 +01:00
Ingo Molnar
650400b2cc Merge branch 'upstream-x86-selftests' into WIP.x86/pti.base
Conflicts:
	arch/x86/kernel/cpu/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:04:28 +01:00
Ingo Molnar
0fd2e9c53d Merge commit 'upstream-x86-entry' into WIP.x86/mm
Pull in a minimal set of v4.15 entry code changes, for a base for the MM isolation patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 12:58:53 +01:00
Lucas De Marchi
33aa69ed8a x86/gpu: add CFL to early quirks
CFL was missing from intel_early_ids[]. The PCI ID needs to be there to
allow the memory region to be stolen, otherwise we could have RAM being
arbitrarily overwritten if for example we keep using the UEFI framebuffer,
depending on how BIOS has set up the e820 map.

Fixes: b056f8f3d6 ("drm/i915/cfl: Add Coffee Lake PCI IDs for S Skus.")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: <stable@vger.kernel.org> # v4.13+ 0890540e21 drm/i915: add GT number to intel_device_info
Cc: <stable@vger.kernel.org> # v4.13+ 41693fd523 drm/i915/kbl: Change a KBL pci id to GT2 from GT1.5
Cc: <stable@vger.kernel.org> # v4.13+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213200425.2954-1-lucas.demarchi@intel.com
2017-12-15 13:18:00 -08:00
Linus Torvalds
e53000b1ed Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix the s2ram regression related to confusion around segment
     register restoration, plus related cleanups that make the code more
     robust

   - a guess-unwinder Kconfig dependency fix

   - an isoimage build target fix for certain tool chain combinations

   - instruction decoder opcode map fixes+updates, and the syncing of
     the kernel decoder headers to the objtool headers

   - a kmmio tracing fix

   - two 5-level paging related fixes

   - a topology enumeration fix on certain SMP systems"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
  x86/decoder: Fix and update the opcodes map
  x86/power: Make restore_processor_context() sane
  x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
  x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
  x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
  x86/build: Don't verify mtools configuration file for isoimage
  x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
  x86/boot/compressed/64: Print error if 5-level paging is not supported
  x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
2017-12-15 12:14:33 -08:00
Andy Lutomirski
c739f930be x86/espfix/64: Fix espfix double-fault handling on 5-level systems
Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems
was wrong, and it resulted in panics due to unhandled double faults.
Use P4D_SHIFT instead, which is correct on 4-level and 5-level
machines.

This fixes a panic when running x86 selftests on 5-level machines.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1d33b21956 ("x86/espfix: Add support for 5-level paging")
Link: http://lkml.kernel.org/r/24c898b4f44fdf8c22d93703850fb384ef87cfdc.1513035461.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15 16:58:53 +01:00
Ingo Molnar
76523de619 Merge tag 'v4.15-rc3' into perf/core, to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12 13:25:54 +01:00
Pravin Shedge
81bf665d00 x86/headers: Remove duplicate #includes
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: boris.ostrovsky@oracle.com
Cc: geert@linux-m68k.org
Cc: jgross@suse.com
Cc: linux-efi@vger.kernel.org
Cc: luto@kernel.org
Cc: matt@codeblueprint.co.uk
Cc: thomas.lendacky@amd.com
Cc: tim.c.chen@linux.intel.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1513024951-9221-1-git-send-email-pravin.shedge4linux@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12 11:32:24 +01:00
Matthew Auld
3b51b6f31a x86/early-quirks: replace the magical increment start values
Replace the magical +2, +9 etc. with +MB, which is far easier to read.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-4-matthew.auld@intel.com
2017-12-12 12:30:19 +02:00
Matthew Auld
55f56fc460 x86/early-quirks: export the stolen region as a resource
We duplicate the stolen discovery code in early-quirks and in i915,
however if we just export the region as a resource from early-quirks we
can nuke the duplication.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-3-matthew.auld@intel.com
2017-12-12 12:30:18 +02:00
Joonas Lahtinen
6f9fa996c9 x86/early-quirks: Extend Intel graphics stolen memory placement to 64bit
To give upcoming SKU BIOSes more flexibility in placing the Intel
graphics stolen memory, make all variables storing the placement or size
compatible with full 64 bit range.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-2-matthew.auld@intel.com
2017-12-12 12:30:08 +02:00
Yonghong Song
e7ed9d9bd0 uprobes/x86: Emulate push insns for uprobe on x86
Uprobe is a tracing mechanism for userspace programs.
Typical uprobe will incur overhead of two traps.
First trap is caused by replaced trap insn, and
the second trap is to execute the original displaced
insn in user space.

To reduce the overhead, kernel provides hooks
for architectures to emulate the original insn
and skip the second trap. In x86, emulation
is done for certain branch insns.

This patch extends the emulation to "push <reg>"
insns. These insns are typical in the beginning
of the function. For example, bcc
in https://github.com/iovisor/bcc repo provides
tools to measure funclantency, detect memleak, etc.
The tools will place uprobes in the beginning of
function and possibly uretprobes at the end of function.
This patch is able to reduce the trap overhead for
uprobe from 2 to 1.

Without this patch, uretprobe will typically incur
three traps. With this patch, if the function starts
with "push" insn, the number of traps can be
reduced from 3 to 2.

An experiment was conducted on two local VMs,
fedora 26 64-bit VM and 32-bit VM, both 4 processors
and 4GB memory, booted with latest tip repo (and this patch).
The host is MacBook with intel i7 processor.

The test program looks like:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <sys/time.h>

  static void test() __attribute__((noinline));
  void test() {}
  int main() {
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < 1000000; i++) {
      test();
    }
    gettimeofday(&end, NULL);

    printf("%ld\n", ((end.tv_sec * 1000000 + end.tv_usec)
                     - (start.tv_sec * 1000000 + start.tv_usec)));
    return 0;
  }

The program is compiled without optimization, and
the first insn for function "test" is "push %rbp".
The host is relatively idle.

Before the test run, the uprobe is inserted as below for uprobe:
  echo 'p <binary>:<test_func_offset>' > /sys/kernel/debug/tracing/uprobe_events
  echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
and for uretprobe:
  echo 'r <binary>:<test_func_offset>' > /sys/kernel/debug/tracing/uprobe_events
  echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable

Unit: microsecond(usec) per loop iteration

x86_64          W/ this patch   W/O this patch
uprobe          1.55            3.1
uretprobe       2.0             3.6

x86_32          W/ this patch   W/O this patch
uprobe          1.41            3.5
uretprobe       1.75            4.0

You can see that this patch significantly reduced the overhead,
50% for uprobe and 44% for uretprobe on x86_64, and even more
on x86_32.

Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20171201001202.3706564-1-yhs@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-11 18:42:11 +01:00
Borislav Petkov
03dd604e1d x86/apic: Remove local var in flat_send_IPI_allbutself()
No code changed:

  # arch/x86/kernel/apic/apic_flat_64.o:

   text    data     bss     dec     hex filename
   1838     624       0    2462     99e apic_flat_64.o.before
   1838     624       0    2462     99e apic_flat_64.o.after

md5:
   aa2ae687d94bc4534f86ae6865dabd6a  apic_flat_64.o.before.asm
   42148da76ba8f9a236c33f8803bd2a6b  apic_flat_64.o.after.asm

md5 sum is different due to asm output offsets changing.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171211115444.26577-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-11 14:47:16 +01:00
Prarit Bhargava
947134d9b0 x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
Documentation/x86/topology.txt defines smp_num_siblings as "The number of
threads in a core".  Since commit bbb65d2d36 ("x86: use cpuid vector 0xb
when available for detecting cpu topology") smp_num_siblings is the
maximum number of threads in a core.  If Simultaneous MultiThreading
(SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as
expected.

Use topology_max_smt_threads(), which contains the active numer of threads,
in the __max_logical_packages calculation.

On a single socket, single core, single thread system __max_smt_threads has
not been updated when the __max_logical_packages calculation happens, so its
zero which makes the package estimate fail. Initialize it to one, which is
the minimum number of threads on a core.

[ tglx: Folded the __max_smt_threads fix in ]

Fixes: b4c0a7326f ("x86/smpboot: Fix __max_logical_packages estimate")
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Prarit Bhargava <prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jakub Kicinski <kubakici@wp.pl>
Cc: netdev@vger.kernel.org
Cc: "netdev@vger.kernel.org"
Cc: Clark Williams <williams@redhat.com>
Link: https://lkml.kernel.org/r/20171204164521.17870-1-prarit@redhat.com
2017-12-07 10:28:22 +01:00
Linus Torvalds
dd53a4214d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:

 - make CR4 handling irq-safe, which bug vmware guests ran into

 - don't crash on early IRQs in Xen guests

 - don't crash secondary CPU bringup if #UD assisted WARN()ings are
   triggered

 - make X86_BUG_FXSAVE_LEAK optional on newer AMD CPUs that have the fix

 - fix AMD Fam17h microcode loading

 - fix broadcom_postcore_init() if ACPI is disabled

 - fix resume regression in __restore_processor_context()

 - fix Sparse warnings

 - fix a GCC-8 warning

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Change time() prototype to match __vdso_time()
  x86: Fix Sparse warnings about non-static functions
  x86/power: Fix some ordering bugs in __restore_processor_context()
  x86/PCI: Make broadcom_postcore_init() check acpi_disabled
  x86/microcode/AMD: Add support for fam17h microcode loading
  x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
  x86/idt: Load idt early in start_secondary
  x86/xen: Support early interrupts in xen pv guests
  x86/tlb: Disable interrupts when changing CR4
  x86/tlb: Refactor CR4 setting and shadow write
2017-12-06 17:47:29 -08:00
Colin Ian King
d553d03f70 x86: Fix Sparse warnings about non-static functions
Functions x86_vector_debug_show(), uv_handle_nmi() and uv_nmi_setup_common()
are local to the source and do not need to be in global scope, so make them
static.

Fixes up various sparse warnings.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Cc: travis@sgi.com
Link: http://lkml.kernel.org/r/20171206173358.24388-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06 19:32:58 +01:00
Tom Lendacky
f4e9b7af0c x86/microcode/AMD: Add support for fam17h microcode loading
The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06 12:27:24 +01:00
Rudolf Marek
e3811a3f74 x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
The latest AMD AMD64 Architecture Programmer's Manual
adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).

If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
/ FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.

Signed-off-by: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06 12:27:13 +01:00
Yazen Ghannam
c8a4364c33 x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems
The McaIntrCfg register (MSRC000_0410), previously known as CU_DEFER_ERR,
is used on SMCA systems to set the LVT offset for the Threshold and
Deferred error interrupts.

This register was used on non-SMCA systems to also set the Deferred
interrupt type in bits 2:1. However, these bits are reserved on SMCA
systems.

Only set MSRC000_0410[2:1] on non-SMCA systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171120162646.5210-1-Yazen.Ghannam@amd.com
2017-12-04 20:38:44 +01:00
Xie XiuQi
e085ac7a6d x86/MCE: Extend table to report action optional errors through CMCI too
According to the Intel SDM Volume 3B (253669-063US, July 2017), action
optional (SRAO) errors can be reported either via MCE or CMC:

  In cases when SRAO is signaled via CMCI the error signature is
  indicated via UC=1, PCC=0, S=0.

  Type(*1)	UC	EN	PCC	S	AR	Signaling
  ---------------------------------------------------------------
  UC		1	1	1	x	x	MCE
  SRAR		1	1	0	1	1	MCE
  SRAO		1	x(*2)	0	x(*2)	0	MCE/CMC
  UCNA		1	x	0	0	0	CMC
  CE		0	x	x	x	x	CMC

  NOTES:
  1. SRAR, SRAO and UCNA errors are supported by the processor only
     when IA32_MCG_CAP[24] (MCG_SER_P) is set.
  2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.

And there is a description in 15.6.2 UCR Error Reporting and Logging, for
bit S:

  S (Signaling) flag, bit 56 - Indicates (when set) that a machine check
  exception was generated for the UCR error reported in this MC bank...
  When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
  was not signaled via a machine check exception and instead was reported
  as a corrected machine check (CMC).

So merge the two cases and just remove the S=0 check for SRAO in
mce_severity().

[ Borislav: Massage commit message.]

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Chen Wei <chenwei68@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1511575548-41992-1-git-send-email-xiexiuqi@huawei.com
2017-12-04 20:38:44 +01:00
Tom Lendacky
18c71ce9c8 x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature
Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature.  SEV is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR).  Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04 10:57:23 -06:00
Chunyu Hu
55d2d0ad2f x86/idt: Load idt early in start_secondary
On a secondary, idt is first loaded in cpu_init() with load_current_idt(),
i.e. no exceptions can be handled before that point.

The conversion of WARN() to use UD requires the IDT being loaded earlier as
any warning between start_secondary() and load_curren_idt() in cpu_init()
will result in an unhandled @UD exception and therefore fail the bringup of
the CPU.

Install the IDT handlers right in start_secondary() before calling cpu_init().

[ tglx: Massaged changelog ]

Fixes: 9a93848fe7 ("x86/debug: Implement __WARN() using UD0")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: rostedt@goodmis.org
Cc: luto@kernel.org
Link: https://lkml.kernel.org/r/1511792499-4073-1-git-send-email-chuhu@redhat.com
2017-11-28 08:15:40 +01:00
Al Viro
b146e2ce80 x86: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:20:00 -05:00
Rafael J. Wysocki
57044031b0 ACPI / PM: Make it possible to ignore the system sleep blacklist
The ACPI code supporting system transitions to sleep states uses
an internal blacklist to apply special handling to some machines
reported to behave incorrectly in some ways.

However, some entries of that blacklist cover problematic as well as
non-problematic systems, so give the users of the latter a chance to
ignore the blacklist and run their systems in the default way by
adding acpi_sleep=nobl to the kernel command line.

For example, that allows the users of Dell XPS13 9360 systems not
affected by the issue that caused the blacklist entry for this
machine to be added by commit 71630b7a83 (ACPI / PM: Blacklist Low
Power S0 Idle _DSM for Dell XPS13 9360) to use suspend-to-idle with
the Low Power S0 Idle _DSM interface which in principle should be
more energy-efficient than S3 on them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-27 01:44:24 +01:00
Linus Torvalds
02fc87b117 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
 - topology enumeration fixes
 - KASAN fix
 - two entry fixes (not yet the big series related to KASLR)
 - remove obsolete code
 - instruction decoder fix
 - better /dev/mem sanity checks, hopefully working better this time
 - pkeys fixes
 - two ACPI fixes
 - 5-level paging related fixes
 - UMIP fixes that should make application visible faults more debuggable
 - boot fix for weird virtualization environment

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/decoder: Add new TEST instruction pattern
  x86/PCI: Remove unused HyperTransport interrupt support
  x86/umip: Fix insn_get_code_seg_params()'s return value
  x86/boot/KASLR: Remove unused variable
  x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
  x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
  x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
  x86/pkeys/selftests: Fix protection keys write() warning
  x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
  x86/mpx/selftests: Fix up weird arrays
  x86/pkeys: Update documentation about availability
  x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
  x86/smpboot: Fix __max_logical_packages estimate
  x86/topology: Avoid wasting 128k for package id array
  perf/x86/intel/uncore: Cache logical pkg id in uncore driver
  x86/acpi: Reduce code duplication in mp_override_legacy_irq()
  x86/acpi: Handle SCI interrupts above legacy space gracefully
  x86/boot: Fix boot failure when SMP MP-table is based at 0
  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
  x86/selftests: Add test for mapping placement for 5-level paging
  ...
2017-11-26 14:11:54 -08:00
Nadav Amit
9d0b62328d x86/tlb: Disable interrupts when changing CR4
CR4 modifications are implemented as RMW operations which update a shadow
variable and write the result to CR4. The RMW operation is protected by
preemption disable, but there is no enforcement or debugging mechanism.

CR4 modifications happen also in interrupt context via
__native_flush_tlb_global(). This implementation does not affect a
interrupted thread context CR4 operation, because the CR4 toggle restores
the original content and does not modify the shadow variable.

So the current situation seems to be safe, but a recent patch tried to add
an actual RMW operation in interrupt context, which will cause subtle
corruptions.

To prevent that and make the CR4 handling future proof:

 - Add a lockdep assertion to __cr4_set() which will catch interrupt
   enabled invocations

 - Disable interrupts in the cr4 manipulator inlines

 - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
   from __switch_to_xtra() where interrupts are already disabled and
   performance matters.

All other call sites are not performance critical, so the extra overhead of
an additional local_irq_save/restore() pair is not a problem. If new call
sites care about performance then the necessary _irqsoff() variants can be
added.

[ tglx: Condensed the patch by moving the irq protection inside the
  	manipulator functions. Updated changelog ]

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Luck <tony.luck@intel.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: nadav.amit@gmail.com
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
2017-11-25 13:28:43 +01:00
Bjorn Helgaas
fd2fa6c18b x86/PCI: Remove unused HyperTransport interrupt support
There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left.  Remove the unused entry point and all the
supporting code.

See 8b955b0ddd ("[PATCH] Initial generic hypertransport interrupt
support").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-pci@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-23 20:18:18 +01:00
Borislav Petkov
e2a5dca753 x86/umip: Fix insn_get_code_seg_params()'s return value
In order to save on redundant structs definitions
insn_get_code_seg_params() was made to return two 4-bit values in a char
but clang complains:

  arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char'
	  changes value from 132 to -124 [-Wconstant-conversion]
                  return INSN_CODE_SEG_PARAMS(4, 8);
                  ~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
  ./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS'
  #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))

Those two values do get picked apart afterwards the opposite way of how
they were ORed so wrt to the LSByte, the return value is the same.

But this function returns -EINVAL in the error case, which is an int. So
make it return an int which is the native word size anyway and thus fix
the clang warning.

Reported-by: Kees Cook <keescook@google.com>
Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ricardo.neri-calderon@linux.intel.com
Link: https://lkml.kernel.org/r/20171123091951.1462-1-bp@alien8.de
2017-11-23 20:17:59 +01:00
Ricardo Neri
fd11a6496e x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
Print a rate-limited warning when a user-space program attempts to execute
any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR
and SMSW).

This is useful, because when CONFIG_X86_INTEL_UMIP=y is selected and
supported by the hardware, user space programs that try to execute such
instructions will receive a SIGSEGV signal that they might not expect.

In the specific cases for which emulation is provided (instructions SGDT,
SIDT and SMSW in protected and virtual-8086 modes), no signal is
generated. However, a warning is helpful to encourage updates in such
programs to avoid the use of such instructions.

Warnings are printed via a customized printk() function that also provides
information about the program that attempted to use the affected
instructions.

Utility macros are defined to wrap umip_printk() for the error and warning
kernel log levels.

While here, replace an existing call to the generic rate-limited pr_err()
with the new umip_pr_err().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1511233476-17088-1-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-21 08:13:43 +01:00
Linus Torvalds
e75080f185 Merge tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull two power management fixes from Rafael Wysocki:
 "This is the change making /proc/cpuinfo on x86 report current CPU
  frequency in "cpu MHz" again in all cases and an additional one
  dealing with an overzealous check in one of the helper routines in the
  runtime PM framework"

* tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / runtime: Drop children check from __pm_runtime_set_status()
  x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
2017-11-17 14:49:25 -08:00
Prarit Bhargava
b4c0a7326f x86/smpboot: Fix __max_logical_packages estimate
A system booted with a small number of cores enabled per package
panics because the estimate of __max_logical_packages is too low.

This occurs when the total number of active cores across all packages is
less than the maximum core count for a single package. e.g.:

  On a 4 package system with 20 cores/package where only 4 cores are
  enabled on each package, the value of __max_logical_packages is
  calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.

Calculate __max_logical_packages after the cpu enumeration has completed.
Use the boot cpu's data to extrapolate the number of packages.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: He Chen <he.chen@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Piotr Luc <piotr.luc@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
2017-11-17 16:22:31 +01:00
Andi Kleen
30bb981185 x86/topology: Avoid wasting 128k for package id array
Analyzing large early boot allocations unveiled the logical package id
storage as a prominent memory waste. Since commit 1f12e32f4c
("x86/topology: Create logical package id") every 64-bit system allocates a
128k array to convert logical package ids.

This happens because the array is sized for MAX_LOCAL_APIC which is always
32k on 64bit systems, and it needs 4 bytes for each entry.

This is fairly wasteful, especially for the common case of having only one
socket, which uses exactly 4 byte out of 128K.

There is no user of the package id map which is performance critical, so
the lookup is not required to be O(1). Store the logical processor id in
cpu_data and use a loop based lookup.

To keep the mapping stable accross cpu hotplug operations, add a flag to
cpu_data which is set when the CPU is brought up the first time. When the
flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on
subsequent bringups.

[ tglx: Rename the flag to 'initialized', use proper pointers instead of
  	repeated cpu_data(x) evaluation and massage changelog. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: He Chen <he.chen@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Piotr Luc <piotr.luc@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
2017-11-17 16:22:30 +01:00
Vikas C Sajjan
4ee2ec1b12 x86/acpi: Reduce code duplication in mp_override_legacy_irq()
The new function mp_register_ioapic_irq() is a subset of the code in
mp_override_legacy_irq().

Replace the code duplication by invoking mp_register_ioapic_irq() from
mp_override_legacy_irq().

Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: kkamagui@gmail.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/1510848825-21965-3-git-send-email-vikas.cha.sajjan@hpe.com
2017-11-17 15:30:33 +01:00
Vikas C Sajjan
252714155f x86/acpi: Handle SCI interrupts above legacy space gracefully
Platforms which support only IOAPIC mode, pass the SCI information above
the legacy space (0-15) via the FADT mechanism and not via MADT.

In such cases mp_override_legacy_irq() which is invoked from
acpi_sci_ioapic_setup() to register SCI interrupts fails for interrupts
greater equal 16, since it is meant to handle only the legacy space and
emits error "Invalid bus_irq %u for legacy override".

Add a new function to handle SCI interrupts >= 16 and invoke it
conditionally in acpi_sci_ioapic_setup().

The code duplication due to this new function will be cleaned up in a
separate patch.

Co-developed-by: Sunil V L <sunil.vl@hpe.com>
Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com>
Signed-off-by: Sunil V L <sunil.vl@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Abdul Lateef Attar <abdul-lateef.attar@hpe.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: kkamagui@gmail.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/1510848825-21965-2-git-send-email-vikas.cha.sajjan@hpe.com
2017-11-17 15:30:33 +01:00
Tom Lendacky
ac5292e9a2 x86/boot: Fix boot failure when SMP MP-table is based at 0
When crosvm is used to boot a kernel as a VM, the SMP MP-table is found
at physical address 0x0. This causes mpf_base to be set to 0 and a
subsequent "if (!mpf_base)" check in default_get_smp_config() results in
the MP-table not being parsed.  Further into the boot this results in an
oops when attempting a read_apic_id().

Add a boolean variable that is set to true when the MP-table is found.
Use this variable for testing if the MP-table was found so that even a
value of 0 for mpf_base will result in continued parsing of the MP-table.

Fixes: 5997efb967 ("x86/boot: Use memremap() to map the MPF and MPC data")
Reported-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: regression@leemhuis.info
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171106201753.23059.86674.stgit@tlendack-t1.amdoffice.net
2017-11-17 15:30:33 +01:00
Linus Torvalds
051089a2ee Merge tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
 "Xen features and fixes for v4.15-rc1

  Apart from several small fixes it contains the following features:

   - a series by Joao Martins to add vdso support of the pv clock
     interface

   - a series by Juergen Gross to add support for Xen pv guests to be
     able to run on 5 level paging hosts

   - a series by Stefano Stabellini adding the Xen pvcalls frontend
     driver using a paravirtualized socket interface"

* tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
  xen/pvcalls: fix potential endless loop in pvcalls-front.c
  xen/pvcalls: Add MODULE_LICENSE()
  MAINTAINERS: xen, kvm: track pvclock-abi.h changes
  x86/xen/time: setup vcpu 0 time info page
  x86/xen/time: set pvclock flags on xen_time_init()
  x86/pvclock: add setter for pvclock_pvti_cpu0_va
  ptp_kvm: probe for kvm guest availability
  xen/privcmd: remove unused variable pageidx
  xen: select grant interface version
  xen: update arch/x86/include/asm/xen/cpuid.h
  xen: add grant interface version dependent constants to gnttab_ops
  xen: limit grant v2 interface to the v1 functionality
  xen: re-introduce support for grant v2 interface
  xen: support priv-mapping in an HVM tools domain
  xen/pvcalls: remove redundant check for irq >= 0
  xen/pvcalls: fix unsigned less than zero error check
  xen/time: Return -ENODEV from xen_get_wallclock()
  xen/pvcalls-front: mark expected switch fall-through
  xen: xenbus_probe_frontend: mark expected switch fall-throughs
  xen/time: do not decrease steal time after live migration on xen
  ...
2017-11-16 13:06:27 -08:00
Kirill A. Shutemov
1e0f25dbf2 x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
In case of 5-level paging, the kernel does not place any mapping above
47-bit, unless userspace explicitly asks for it.

Userspace can request an allocation from the full address space by
specifying the mmap address hint above 47-bit.

Nicholas noticed that the current implementation violates this interface:

  If user space requests a mapping at the end of the 47-bit address space
  with a length which causes the mapping to cross the 47-bit border
  (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
  below and above.

Sanity check the mmap address hint so that start and end of the resulting
vma are on the same side of the 47-bit border. If that's not the case fall
back to the code path which ignores the address hint and allocate from the
regular address space below 47-bit.

To make the checks consistent, mask out the address hints lower bits
(either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can
push them up to the next boundary.

[ tglx: Moved the address check to a function and massaged comment and
  	changelog ]

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com
2017-11-16 11:43:11 +01:00
Levin, Alexander (Sasha Levin)
4675ff05de kmemcheck: rip it out
Fix up makefiles, remove references, and git rm kmemcheck.

Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
75f296d93b kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Convert all allocations that used a NOTRACK flag to stop using it.

Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Levin, Alexander (Sasha Levin)
4950276672 kmemcheck: remove annotations
Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow).  KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
  Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Rafael J. Wysocki
7d5905dc14 x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
After commit 890da9cf09 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration.  That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".

However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.

Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.

Fixes: 890da9cf09 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2017-11-15 19:46:50 +01:00