Commit Graph

14712 Commits

Author SHA1 Message Date
mike.travis@hpe.com
6c66350d0a x86/tsc: Provide a means to disable TSC ART
On systems where multiple chassis are reset asynchronously, and thus
the TSC counters are started asynchronously, the offset needed to
convert to TSC to ART would be different.  Disable ART in that case
and rely on the TSC counters to supply the accurate time.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: https://lkml.kernel.org/r/20171012163202.289397994@stormcage.americas.sgi.com
2017-10-16 22:50:37 +02:00
mike.travis@hpe.com
41e7864ab5 x86/tsc: Drastically reduce the number of firmware bug warnings
Prior to the TSC ADJUST MSR being available, the method to set TSC's in
sync with each other naturally caused a small skew between cpu threads.
This was NOT a firmware bug at the time so introducing a whole avalanche
of alarming warning messages might cause unnecessary concern and customer
complaints. (Example: >3000 msgs in a 32 socket Skylake system.)

Simply report the warning condition, if possible do the necessary fixes,
and move on.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: https://lkml.kernel.org/r/20171012163202.175062400@stormcage.americas.sgi.com
2017-10-16 22:50:36 +02:00
mike.travis@hpe.com
9514ececa5 x86/tsc: Skip TSC test and error messages if already unstable
If the TSC has already been determined to be unstable, then checking
TSC ADJUST values is a waste of time and generates unnecessary error
messages.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: https://lkml.kernel.org/r/20171012163202.060777495@stormcage.americas.sgi.com
2017-10-16 22:50:36 +02:00
mike.travis@hpe.com
341102c3ef x86/tsc: Add option that TSC on Socket 0 being non-zero is valid
Add a flag to indicate and process that TSC counters are on chassis
that reset at different times during system startup.  Therefore which
TSC ADJUST values should be zero is not predictable.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Andrew Banman <andrew.abanman@hpe.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: https://lkml.kernel.org/r/20171012163201.944370012@stormcage.americas.sgi.com
2017-10-16 22:50:36 +02:00
Thomas Gleixner
9c48c0965b x86/idt: Initialize early IDT before cr4_init_shadow()
Moving the early IDT setup out of assembly code breaks the boot on first
generation 486 systems.

The reason is that the call of idt_setup_early_handler, which sets up the
early handlers was added after the call to cr4_init_shadow().

cr4_init_shadow() tries to read CR4 which is not available on those
systems. The accessor function uses a extable fixup to handle the resulting
fault. As the IDT is not set up yet, the cr4 read exception causes an
instantaneous reboot for obvious reasons.

Call idt_setup_early_handler() before cr4_init_shadow() so IDT is set up
before the first exception hits.

Fixes: 87e81786b1 ("x86/idt: Move early IDT setup out of 32-bit asm")
Reported-and-tested-by:  Matthew Whitehead <whiteheadm@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710161210290.1973@nanos
2017-10-16 20:47:37 +02:00
Colin Ian King
e6fc454b77 x86/cpu/intel_cacheinfo: Remove redundant assignment to 'this_leaf'
The 'this_leaf' variable is assigned a value that is never
read and it is updated a little later with a newer value,
hence we can remove the redundant assignment.

Cleans up the following Clang warning:

  Value stored to 'this_leaf' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20171015160203.12332-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-16 09:25:18 +02:00
Linus Torvalds
e7a36a6ec9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A landry list of fixes:

   - fix reboot breakage on some PCID-enabled system

   - fix crashes/hangs on some PCID-enabled systems

   - fix microcode loading on certain older CPUs

   - various unwinder fixes

   - extend an APIC quirk to more hardware systems and disable APIC
     related warning on virtualized systems

   - various Hyper-V fixes

   - a macro definition robustness fix

   - remove jprobes IRQ disabling

   - various mem-encryption fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Do the family check first
  x86/mm: Flush more aggressively in lazy TLB mode
  x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping
  x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors
  x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c
  x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing
  x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures
  x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs
  x86/unwind: Disable unwinder warnings on 32-bit
  x86/unwind: Align stack pointer in unwinder dump
  x86/unwind: Use MSB for frame pointer encoding on 32-bit
  x86/unwind: Fix dereference of untrusted pointer
  x86/alternatives: Fix alt_max_short macro to really be a max()
  x86/mm/64: Fix reboot interaction with CR4.PCIDE
  kprobes/x86: Remove IRQ disabling from jprobe handlers
  kprobes/x86: Set up frame pointer in kprobe trampoline
2017-10-14 15:26:38 -04:00
Linus Torvalds
7b764cedcb Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Ingo Molnar:
 "A boot parameter fix, plus a header export fix"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Hide mca_cfg
  RAS/CEC: Use the right length for "cec_disable"
2017-10-14 15:19:11 -04:00
Borislav Petkov
1f161f67a2 x86/microcode: Do the family check first
On CPUs like AMD's Geode, for example, we shouldn't even try to load
microcode because they do not support the modern microcode loading
interface.

However, we do the family check *after* the other checks whether the
loader has been disabled on the command line or whether we're running in
a guest.

So move the family checks first in order to exit early if we're being
loaded on an unsupported family.

Reported-and-tested-by: Sven Glodowski <glodi1@arcor.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.11..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14 12:55:40 +02:00
Josh Poimboeuf
11af847446 x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*'
Rename the unwinder config options from:

  CONFIG_ORC_UNWINDER
  CONFIG_FRAME_POINTER_UNWINDER
  CONFIG_GUESS_UNWINDER

to:

  CONFIG_UNWINDER_ORC
  CONFIG_UNWINDER_FRAME_POINTER
  CONFIG_UNWINDER_GUESS

... in order to give them a more logical config namespace.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14 10:12:12 +02:00
Len Brown
616dd5872e x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping
SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode
version number than stepping-4.  Linux needs to know this stepping-3
specific version number to also enable the TSC_DEADLINE on stepping-3.

The steppings and ucode versions are documented in the SKX BIOS update:
https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txt

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com
2017-10-12 17:10:11 +02:00
Paolo Bonzini
cc6afe2240 x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors
Commit 594a30fb12 ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled
due to Errata" on CPUs without the feature", 2017-08-30) was also about
silencing the warning on VirtualBox; however, KVM does expose the TSC
deadline timer, and it's virtualized so that it is immune from CPU errata.

Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs:

     [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                    please update microcode to version: 0xb2 (or later)

Even if you had a hypervisor that does _not_ virtualize the TSC deadline
and rather exposes the hardware one, it should be the hypervisors task
to update microcode and possibly hide the flag from CPUID.  So just
hide the message when running on _any_ hypervisor, not just those that
do not support the TSC deadline timer.

The older check still makes sense, so keep it.

Fixes: bd9240a18e ("x86/apic: Add TSC_DEADLINE quirk due to errata")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com
2017-10-12 17:10:10 +02:00
Thomas Gleixner
02edee152d x86/apic/vector: Ignore set_affinity call for inactive interrupts
The core interrupt code can call the affinity setter for inactive
interrupts under certain circumstances.

For inactive intererupts which use managed or reservation mode this is a
pointless exercise as the activation will assign a vector which fits the
destination mask.

Check for this and return w/o going through the vector assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-12 12:58:15 +02:00
Thomas Gleixner
331b57d148 Merge branch 'irq/urgent' into x86/apic
Pick up core changes which affect the vector rework.
2017-10-12 11:02:50 +02:00
Josh Poimboeuf
d4a2d031dd x86/unwind: Disable unwinder warnings on 32-bit
x86-32 doesn't have stack validation, so in most cases it doesn't make
sense to warn about bad frame pointers.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a69658760800bf281e6353248c23e0fa0acf5230.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 12:49:49 +02:00
Josh Poimboeuf
99bd28a49b x86/unwind: Align stack pointer in unwinder dump
When printing the unwinder dump, the stack pointer could be unaligned,
for one of two reasons:

- stack corruption; or

- GCC created an unaligned stack.

There's no way for the unwinder to tell the difference between the two,
so we have to assume one or the other.  GCC unaligned stacks are very
rare, and have only been spotted before GCC 5.  Presumably, if we're
doing an unwinder stack dump, stack corruption is more likely than a
GCC unaligned stack.  So always align the stack before starting the
dump.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 12:49:49 +02:00
Josh Poimboeuf
5c99b692cf x86/unwind: Use MSB for frame pointer encoding on 32-bit
On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings
like:

  WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000

And also there were some stack dumps with a bunch of unreliable '?'
symbols after an apic_timer_interrupt symbol, meaning the unwinder got
confused when it tried to read the regs.

The cause of those issues is that, with GCC 4.8 (and possibly older),
there are cases where GCC misaligns the stack pointer in a leaf function
for no apparent reason:

  c124a388 <acpi_rs_move_data>:
  c124a388:       55                      push   %ebp
  c124a389:       89 e5                   mov    %esp,%ebp
  c124a38b:       57                      push   %edi
  c124a38c:       56                      push   %esi
  c124a38d:       89 d6                   mov    %edx,%esi
  c124a38f:       53                      push   %ebx
  c124a390:       31 db                   xor    %ebx,%ebx
  c124a392:       83 ec 03                sub    $0x3,%esp
  ...
  c124a3e3:       83 c4 03                add    $0x3,%esp
  c124a3e6:       5b                      pop    %ebx
  c124a3e7:       5e                      pop    %esi
  c124a3e8:       5f                      pop    %edi
  c124a3e9:       5d                      pop    %ebp
  c124a3ea:       c3                      ret

If an interrupt occurs in such a function, the regs on the stack will be
unaligned, which breaks the frame pointer encoding assumption.  So on
32-bit, use the MSB instead of the LSB to encode the regs.

This isn't an issue on 64-bit, because interrupts align the stack before
writing to it.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 12:49:48 +02:00
Josh Poimboeuf
62dd86ac01 x86/unwind: Fix dereference of untrusted pointer
Tetsuo Handa and Fengguang Wu reported a panic in the unwinder:

  BUG: unable to handle kernel NULL pointer dereference at 000001f2
  IP: update_stack_state+0xd4/0x340
  *pde = 00000000

  Oops: 0000 [#1] PREEMPT SMP
  CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be67 #592
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
  task: bb0b53c0 task.stack: bb3ac000
  EIP: update_stack_state+0xd4/0x340
  EFLAGS: 00010002 CPU: 0
  EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570
  ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690
  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: fffe0ff0 DR7: 00000400
  Call Trace:
   ? unwind_next_frame+0xea/0x400
   ? __unwind_start+0xf5/0x180
   ? __save_stack_trace+0x81/0x160
   ? save_stack_trace+0x20/0x30
   ? __lock_acquire+0xfa5/0x12f0
   ? lock_acquire+0x1c2/0x230
   ? tick_periodic+0x3a/0xf0
   ? _raw_spin_lock+0x42/0x50
   ? tick_periodic+0x3a/0xf0
   ? tick_periodic+0x3a/0xf0
   ? debug_smp_processor_id+0x12/0x20
   ? tick_handle_periodic+0x23/0xc0
   ? local_apic_timer_interrupt+0x63/0x70
   ? smp_trace_apic_timer_interrupt+0x235/0x6a0
   ? trace_apic_timer_interrupt+0x37/0x3c
   ? strrchr+0x23/0x50
  Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83
  EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f
  CR2: 00000000000001f2
  ---[ end trace 0d147fd4aba8ff50 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

On x86-32, after decoding a frame pointer to get a regs address,
regs_size() dereferences the regs pointer when it checks regs->cs to see
if the regs are user mode.  This is dangerous because it's possible that
what looks like a decoded frame pointer is actually a corrupt value, and
we don't want the unwinder to make things worse.

Instead of calling regs_size() on an unsafe pointer, just assume they're
kernel regs to start with.  Later, once it's safe to access the regs, we
can do the user mode check and corresponding safety check for the
remaining two regs.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5ed8d8bb38 ("x86/unwind: Move common code into update_stack_state()")
Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 12:49:47 +02:00
Juergen Gross
9043442b43 locking/paravirt: Use new static key for controlling call of virt_spin_lock()
There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
has the downside of falling back to unfair test and set scheme for
qspinlocks due to virt_spin_lock() detecting the virtualized
environment.

Add a static key controlling whether virt_spin_lock() should be
called or not. When running on bare metal set the new key to false.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: hpa@zytor.com
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170906173625.18158-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 11:50:12 +02:00
Andy Lutomirski
924c6b900c x86/mm/64: Fix reboot interaction with CR4.PCIDE
Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set.  (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.)  The result is a GPF and
a hang instead of a reboot.

I didn't catch this in testing because neither my computer nor my VM
reboots this way.  I can trigger it with reboot=bios, though.

Fixes: 660da7c922 ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org
2017-10-09 13:31:04 +02:00
Kees Cook
92bb6cb140 x86/mce: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adjust sanity-check WARN to make sure
the triggering timer matches the current CPU timer.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20171005005425.GA23950@beast
2017-10-05 14:34:55 +02:00
Borislav Petkov
262e681183 x86/mce: Hide mca_cfg
Now that lguest is gone, put it in the internal header which should be
used only by MCA/RAS code.

Add missing header guards while at it.

No functional change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
2017-10-05 14:23:06 +02:00
Jithu Joseph
3916a4135c x86/intel_rdt: Remove redundant assignment
The assignment to the 'files' variable is immediately overwritten
in the following line. Remove the older assignment, which was meant
specifially for creating control groups files.

Fixes: c7d9aac613 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: tony.luck@intel.com
Cc: vikas.shivappa@intel.com
Link: https://lkml.kernel.org/r/1507157337-18118-1-git-send-email-jithu.joseph@intel.com
2017-10-05 13:20:32 +02:00
Colin Ian King
5fd88b60e1 x86/intel_rdt/cqm: Make integer rmid_limbo_count static
rmid_limbo_count is local to the source and does not need to be in global
scope, so make it static.

Cleans up sparse warning:
symbol 'rmid_limbo_count' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20171002145931.27479-1-colin.king@canonical.com
2017-10-05 13:20:32 +02:00
Boqun Feng
a2b7861bb3 kvm/x86: Avoid async PF preempting the kernel incorrectly
Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
schedule() to reschedule in some cases.  This could result in
accidentally ending the current RCU read-side critical section early,
causing random memory corruption in the guest, or otherwise preempting
the currently running task inside between preempt_disable and
preempt_enable.

The difficulty to handle this well is because we don't know whether an
async PF delivered in a preemptible section or RCU read-side critical section
for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
are both no-ops in that case.

To cure this, we treat any async PF interrupting a kernel context as one
that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
the schedule() path in that case.

To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
so that we know whether it's called from a context interrupting the
kernel, and the parameter is set properly in all the callsites.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-04 18:28:53 +02:00
Masami Hiramatsu
b664d57f39 kprobes/x86: Remove IRQ disabling from jprobe handlers
Jprobes actually don't need to disable IRQs while calling
handlers, because of how we specify the kernel interface in
Documentation/kprobes.txt:

-----
 Probe handlers are run with preemption disabled.  Depending on the
 architecture and optimization state, handlers may also run with
 interrupts disabled (e.g., kretprobe handlers and optimized kprobe
 handlers run without interrupt disabled on x86/x86-64).
-----

So let's remove IRQ disabling from jprobes too.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150701508194.32266.14458959863314097305.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-03 19:11:48 +02:00
Josh Poimboeuf
ee213fc72f kprobes/x86: Set up frame pointer in kprobe trampoline
Richard Weinberger saw an unwinder warning when running bcc's opensnoop:

  WARNING: kernel stack frame pointer at ffff99ef4076bea0 in opensnoop:2008 has bad value 0000000000000008
  unwind stack type:0 next_sp:          (null) mask:0x2 graph_idx:0
  ...
  ffff99ef4076be88: ffff99ef4076bea0 (0xffff99ef4076bea0)
  ffff99ef4076be90: ffffffffac442721 (optimized_callback +0x81/0x90)
  ...

A lockdep stack trace was initiated from inside a kprobe handler, when
the unwinder noticed a bad frame pointer on the stack.  The bad frame
pointer is related to the fact that the kprobe optprobe trampoline
doesn't save the frame pointer before calling into optimized_callback().

Reported-and-tested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/7aef2f8ecd75c2f505ef9b80490412262cf4a44c.1507038547.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-03 19:11:27 +02:00
Jean Delvare
a1652bb8a0 x86/boot: Spell out "boot CPU" for BP
It's not obvious to everybody that BP stands for boot processor. At
least it was not for me. And BP is also a CPU register on x86, so it
is ambiguous. Spell out "boot CPU" everywhere instead.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-03 18:41:23 +02:00
Linus Torvalds
368f89984b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This contains the following fixes and improvements:

   - Avoid dereferencing an unprotected VMA pointer in the fault signal
     generation code

   - Fix inline asm call constraints for GCC 4.4

   - Use existing register variable to retrieve the stack pointer
     instead of forcing the compiler to create another indirect access
     which results in excessive extra 'mov %rsp, %<dst>' instructions

   - Disable branch profiling for the memory encryption code to prevent
     an early boot crash

   - Fix a sparse warning caused by casting the __user annotation in
     __get_user_asm_u64() away

   - Fix an off by one error in the loop termination of the error patch
     in the x86 sysfs init code

   - Add missing CPU IDs to various Intel specific drivers to enable the
     functionality on recent hardware

   - More (init) constification in the numachip code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Use register variable to get stack pointer value
  x86/mm: Disable branch profiling in mem_encrypt.c
  x86/asm: Fix inline asm call constraints for GCC 4.4
  perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
  perf/x86/intel/rapl: Add missing CPU IDs
  perf/x86/msr: Add missing CPU IDs
  perf/x86/intel/cstate: Add missing CPU IDs
  x86: Don't cast away the __user in __get_user_asm_u64()
  x86/sysfs: Fix off-by-one error in loop termination
  x86/mm: Fix fault error path using unsafe vma pointer
  x86/numachip: Add const and __initconst to numachip2_clockevent
2017-10-01 13:55:32 -07:00
Linus Torvalds
42057e1825 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "Mixed bugfixes. Perhaps the most interesting one is a latent bug that
  was finally triggered by PCID support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm/x86: Handle async PF in RCU read-side critical sections
  KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
  KVM: VMX: use cmpxchg64
  KVM: VMX: simplify and fix vmx_vcpu_pi_load
  KVM: VMX: avoid double list add with VT-d posted interrupts
  KVM: VMX: extract __pi_post_block
  KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception
  KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
2017-09-29 12:18:55 -07:00
Vlastimil Babka
77072f09ea x86/stacktrace: Avoid recording save_stack_trace() wrappers
The save_stack_trace() and save_stack_trace_tsk() wrappers of
__save_stack_trace() add themselves to the call stack, and thus appear in the
recorded stacktraces. This is redundant and wasteful when we have limited space
to record the useful part of the backtrace with e.g. page_owner functionality.

Fix this by making sure __save_stack_trace() is noinline (which matches the
current gcc decision) and bumping the skip in the wrappers
(save_stack_trace_tsk() only when called for the current task). This is similar
to what was done for arm in 3683f44c42 ("ARM: stacktrace: avoid listing
stacktrace functions in stacktrace") and is pending for arm64.

Also make sure that __save_stack_trace_reliable() doesn't get this problem in
the future by marking it __always_inline (which matches current gcc decision),
per Josh Poimboeuf.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929092335.2744-1-vbabka@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29 19:44:03 +02:00
Andrey Ryabinin
196bd485ee x86/asm: Use register variable to get stack pointer value
Currently we use current_stack_pointer() function to get the value
of the stack pointer register. Since commit:

  f5caf621ee ("x86/asm: Fix inline asm call constraints for Clang")

... we have a stack register variable declared. It can be used instead of
current_stack_pointer() function which allows to optimize away some
excessive "mov %rsp, %<dst>" instructions:

 -mov    %rsp,%rdx
 -sub    %rdx,%rax
 -cmp    $0x3fff,%rax
 -ja     ffffffff810722fd <ist_begin_non_atomic+0x2d>

 +sub    %rsp,%rax
 +cmp    $0x3fff,%rax
 +ja     ffffffff810722fa <ist_begin_non_atomic+0x2a>

Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer
and use it instead of the removed function.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29 19:39:44 +02:00
Boqun Feng
b862789aa5 kvm/x86: Handle async PF in RCU read-side critical sections
Sasha Levin reported a WARNING:

| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
...
| CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
| 1.10.1-1ubuntu1 04/01/2014
| Call Trace:
...
| RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
| RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
| RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
| RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
| RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
| R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
| R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
| __schedule+0x201/0x2240 kernel/sched/core.c:3292
| schedule+0x113/0x460 kernel/sched/core.c:3421
| kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
| do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
| async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
| RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
| RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
| RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
| RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
| RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
| R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
| R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
| vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
| sprintf+0xbe/0xf0 lib/vsprintf.c:2386
| proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
| get_link fs/namei.c:1047 [inline]
| link_path_walk+0x1041/0x1490 fs/namei.c:2127
...

This happened when the host hit a page fault, and delivered it as in an
async page fault, while the guest was in an RCU read-side critical
section.  The guest then tries to reschedule in kvm_async_pf_task_wait(),
but rcu_preempt_note_context_switch() would treat the reschedule as a
sleep in RCU read-side critical section, which is not allowed (even in
preemptible RCU).  Thus the WARN.

To cure this, make kvm_async_pf_task_wait() go to the halt path if the
PF happens in a RCU read-side critical section.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29 17:05:17 +02:00
Colin Ian King
79761ce80a x86/apic: Fix spelling mistake: "symmectic" -> "symmetric"
Trivial fix to spelling mistakes in pr_info messages

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/20170927102223.31920-1-colin.king@canonical.com
2017-09-28 12:22:40 +02:00
Josh Poimboeuf
2704fbb672 x86/head: Add unwind hint annotations
Jiri Slaby reported an ORC issue when unwinding from an idle task.  The
stack was:

    ffffffff811083c2 do_idle+0x142/0x1e0
    ffffffff8110861d cpu_startup_entry+0x5d/0x60
    ffffffff82715f58 start_kernel+0x3ff/0x407
    ffffffff827153e8 x86_64_start_kernel+0x14e/0x15d
    ffffffff810001bf secondary_startup_64+0x9f/0xa0

The ORC unwinder errored out at secondary_startup_64 because the head
code isn't annotated yet so there wasn't a corresponding ORC entry.

Fix that and any other head-related unwinding issues by adding unwind
hints to the head code.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/78ef000a2f68f545d6eef44ee912edceaad82ccf.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:39:04 +02:00
Josh Poimboeuf
e93db75a00 x86/boot: Annotate verify_cpu() as a callable function
verify_cpu() is a callable function.  Annotate it as such.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/293024b8a080832075312f38c07ccc970fc70292.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:39:03 +02:00
Josh Poimboeuf
015a2ea547 x86/head: Fix head ELF function annotations
These functions aren't callable C-type functions, so don't annotate them
as such.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/36eb182738c28514f8bf95e403d89b6413a88883.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:39:03 +02:00
Josh Poimboeuf
a8b88e84d1 x86/head: Remove unused 'bad_address' code
It's no longer possible for this code to be executed, so remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/32a46fe92d2083700599b36872b26e7dfd7b7965.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:39:03 +02:00
Josh Poimboeuf
17270717e8 x86/head: Remove confusing comment
This comment is actively wrong and confusing.  It refers to the
registers' stack offsets after the pt_regs has been constructed on the
stack, but this code is *before* that.

At this point the stack just has the standard iret frame, for which no
comment should be needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a3c267b770fc56c9b86df9c11c552848248aace2.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:39:02 +02:00
Masami Hiramatsu
a19b2e3d78 kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes
Kkprobes don't need to disable IRQs if they are called from the
ftrace/jump trampoline code, because Documentation/kprobes.txt says:

  -----
  Probe handlers are run with preemption disabled.  Depending on the
  architecture and optimization state, handlers may also run with
  interrupts disabled (e.g., kretprobe handlers and optimized kprobe
  handlers run without interrupt disabled on x86/x86-64).
  -----

So let's remove IRQ disabling from those handlers.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581534039.32348.11331736206004264553.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:25:50 +02:00
Masami Hiramatsu
5bb4fc2d86 kprobes/x86: Disable preemption in ftrace-based jprobes
Disable preemption in ftrace-based jprobe handlers as
described in Documentation/kprobes.txt:

  "Probe handlers are run with preemption disabled."

This will fix jprobes behavior when CONFIG_PREEMPT=y.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581530024.32348.9863783558598926771.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:23:04 +02:00
Masami Hiramatsu
9a09f261a4 kprobes/x86: Disable preemption in optprobe
Disable preemption in optprobe handler as described
in Documentation/kprobes.txt, which says:

  "Probe handlers are run with preemption disabled."

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581525942.32348.6359217983269060829.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:23:04 +02:00
Masami Hiramatsu
cd52edad55 kprobes/x86: Move the get_kprobe_ctlblk() into irq-disabled block
Since get_kprobe_ctlblk() accesses per-cpu variables
which calls smp_processor_id(), it must be called under
preempt-disabled or irq-disabled.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150581517952.32348.2655896843219158446.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:23:03 +02:00
Masami Hiramatsu
a8976fc84b kprobes/x86: Remove addressof() operators
The following commit:

  54a7d50b92 ("x86: mark kprobe templates as character arrays, not single characters")

changed optprobe_template_* to arrays, so we can remove the addressof()
operators from those symbols.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150304469798.17009.15886717935027472863.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:23:03 +02:00
Masami Hiramatsu
63fef14fc9 kprobes/x86: Make insn buffer always ROX and use text_poke()
Make insn buffer always ROX and use text_poke() to write
the copied instructions instead of set_memory_*().
This makes instruction buffer stronger against other
kernel subsystems because there is no window time
to modify the buffer.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150304463032.17009.14195368040691676813.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28 09:23:03 +02:00
Tony Luck
cfd0f34e4c x86/intel_rdt: Add diagnostics when making directories
Mostly this is about running out of RMIDs or CLOSIDs. Other
errors are various internal errors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/027cf1ffb3a3695f2d54525813a1d644887353cf.1506382469.git.tony.luck@intel.com
2017-09-27 12:10:11 +02:00
Tony Luck
94457b36e8 x86/intel_rdt: Add diagnostics when writing the cpus file
Can't add a cpu to a monitor group unless it belongs to parent
group. Can't delete cpus from the default group.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/757a869a25e9fc1b7a2e9bc43e1159455c1964a0.1506382469.git.tony.luck@intel.com
2017-09-27 12:10:11 +02:00
Tony Luck
29e74f35b2 x86/intel_rdt: Add diagnostics when writing the tasks file
About the only tricky case is trying to move a task into a monitor
group that is a subdirectory of a different control group. But cover
the simple cases too.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/f1841cce6a242aed37cb926dee8942727331bf78.1506382469.git.tony.luck@intel.com
2017-09-27 12:10:10 +02:00
Tony Luck
c377dcfbee x86/intel_rdt: Add diagnostics when writing the schemata file
Save helpful descriptions of what went wrong when writing a
schemata file.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/9d6cef757dc88639c8ab47f1e7bc1b081a84bb88.1506382469.git.tony.luck@intel.com
2017-09-27 12:10:10 +02:00
Tony Luck
9b3a7fd0f5 x86/intel_rdt: Add framework for better RDT UI diagnostics
Commands are given to the resctrl file system by making/removing
directories, or by writing to files.  When something goes wrong
the user is generally left wondering why they got:

   bash: echo: write error: Invalid argument

Add a new file "last_cmd_status" to the "info" directory that
will give the user some better clues on what went wrong.

Provide functions to clear and update last_cmd_status which
check that we hold the rdtgroup_mutex.

[ tglx: Made last_cmd_status static and folded back the hunk from patch 3
  	which replaces the open coded access to last_cmd_status with the
  	accessor function ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/edc4e0e9741eee89bba569f0021b1b2662fd9508.1506382469.git.tony.luck@intel.com
2017-09-27 12:10:10 +02:00