Commit Graph

13291 Commits

Author SHA1 Message Date
Wei Yang
3ec979658e x86/e820: Fix very large 'size' handling boundary condition
The (start, size) tuple represents a range [start, start + size - 1],
which means "start" and "start + size - 1" should be compared to see
whether the range overflows.

For example, a range with (start, size):

	(0xffffffff fffffff0, 0x00000000 00000010)

represents

	[0xffffffff fffffff0, 0xffffffff ffffffff]

... would be judged overflow in the original code, while actually it is not.

This patch fixes this and makes sure it still works when size is zero.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1471657213-31817-1-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 09:11:14 +02:00
Josh Poimboeuf
5a8ff54c26 x86/dumpstack: Remove unnecessary stack pointer arguments
When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>d
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1694e2e955e3b9a73a3c3d5ba2634344014dd550.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:58:40 +02:00
Josh Poimboeuf
4b8afafbe7 x86/dumpstack: Add get_stack_pointer() and get_frame_pointer()
The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments.  Create helper functions to
do that instead.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f448914885a35f333fe04da1b97a6c2cc1f80974.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:58:40 +02:00
Josh Poimboeuf
d438f5fda3 x86/dumpstack: Make printk_stack_address() more generally useful
Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().

Specifically:

- printk_stack_address()'s 'data' argument is always used as the log
  level string.  Make that explicit.

- Call touch_nmi_watchdog().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fbe0db05bacf66d337c162edbf61450d0cff1e2.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:58:40 +02:00
Andy Lutomirski
6271cfdfc0 x86/mm: Improve stack-overflow #PF handling
If we get a page fault indicating kernel stack overflow, invoke
handle_stack_overflow().  To prevent us from overflowing the stack
again while handling the overflow (because we are likely to have
very little stack space left), call handle_stack_overflow() on the
double-fault stack.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6d6cf96b3fb9b4c9aa303817e1dc4de0c7c36487.1472603235.git.luto@kernel.org
[ Minor edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:47:20 +02:00
Ingo Molnar
2b3061c77c Merge branch 'x86/mm' into x86/asm, to unify the two branches for simplicity
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:41:52 +02:00
Dou Liyang
c291b01515 x86/apic: Fix num_processors value in case of failure
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.

Only increase num_processors once we are sure we don't fail.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:11:03 +02:00
Carlos Santa
8d9c20e1d1 drm/i915: Remove .is_mobile field from platform struct
As recommended by Ville Syrjala removing .is_mobile field from the
platform struct definition for vlv and hsw+ GPUs as there's no need to
make the distinction in later hardware anymore. Keep it for older GPUs
as it is still needed for ilk-ivb.

Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2016-09-07 16:07:07 -07:00
Sebastian Andrzej Siewior
9a20ea4b4c x86/kvm: Convert to hotplug state machine
Install the callbacks via the state machine. The online & down callbacks are
invoked on the target CPU so we can avoid using smp_call_function_single().
local_irq_disable() is used because smp_call_function_single() used to invoke
the function with interrupts disabled.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160818125731.27256-15-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:25 +02:00
Juergen Gross
47ae4b05d0 virt, sched: Add generic vCPU pinning support
Add generic virtualization support for pinning the current vCPU to a
specified physical CPU. As this operation isn't performance critical
(a very limited set of operations like BIOS calls and SMIs is expected
to need this) just add a hypervisor specific indirection.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Douglas_Warzecha@dell.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: david.vrabel@citrix.com
Cc: hpa@zytor.com
Cc: jdelvare@suse.com
Cc: jeremy@goop.org
Cc: linux@roeck-us.net
Cc: pali.rohar@gmail.com
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1472453327-19050-3-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05 13:52:38 +02:00
Tony Luck
ffb173e657 x86/mce: Drop X86_FEATURE_MCE_RECOVERY and the related model string test
We now have a better way to determine if we are running on a cpu that
supports machine check recovery. Free up this feature bit.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d5db39e08d46cf1012d94d3902275d08ba931926.1472754712.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05 11:47:31 +02:00
Tony Luck
9a6fb28a35 x86/mce: Improve memcpy_mcsafe()
Use the mcsafe_key defined in the previous patch to make decisions on which
copy function to use. We can't use the FEATURE bit any more because PCI
quirks run too late to affect the patching of code. So we use a static key.

Turn memcpy_mcsafe() into an inline function to make life easier for
callers. The assembly code that actually does the copy is now named
memcpy_mcsafe_unrolled()

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/bfde2fc774e94f53d91b70a4321c85a0d33e7118.1472754712.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05 11:47:31 +02:00
Tony Luck
3637efb008 x86/mce: Add PCI quirks to identify Xeons with machine check recovery
Each Xeon includes a number of capability registers in PCI space that
describe some features not enumerated by CPUID.

Use these to determine that we are running on a model that can recover from
machine checks. Hooks for Ivybridge ... Skylake provided.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Boris Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/abf331dc4a3e2a2d17444129bc51127437bcf4ba.1472754711.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05 11:47:31 +02:00
Borislav Petkov
cc2187a6e0 x86/microcode/AMD: Fix load of builtin microcode with randomized memory
We do not need to add the randomization offset when the microcode is
built in.

Reported-and-tested-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160904093736.GA11939@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05 10:38:56 +02:00
Linus Torvalds
9ca581b50d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single fix for an AMD erratum so machines without a BIOS fix work"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/AMD: Apply erratum 665 on machines without a BIOS fix
2016-09-04 08:45:41 -07:00
Emanuel Czirai
d199299675 x86/AMD: Apply erratum 665 on machines without a BIOS fix
AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.

[ Borislav: Wrote commit message. ]

Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yaowu Xu <yaowu@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-02 20:42:28 +02:00
Steven Rostedt
15301a5707 x86/paravirt: Do not trace _paravirt_ident_*() functions
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:

  _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.

This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.

In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue.  But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:

 mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
 mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
 NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
 Modules linked in: e1000e
 CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
 RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
 RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
 FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
 Call Trace:
   _raw_spin_lock+0x27/0x30
   handle_pte_fault+0x13db/0x16b0
   handle_mm_fault+0x312/0x670
   __do_page_fault+0x1b1/0x4e0
   do_page_fault+0x22/0x30
   page_fault+0x28/0x30
   __vfs_read+0x28/0xe0
   vfs_read+0x86/0x130
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xa8
 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b

Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-02 09:40:47 -07:00
Thomas Gleixner
0cb7bf61b1 Merge branch 'linus' into smp/hotplug
Apply upstream changes to avoid conflicts with pending patches.
2016-09-01 18:33:46 +02:00
Brian Gerst
ffcb043ba5 sched/x86: Fix thread_saved_pc()
thread_saved_pc() was using a completely bogus method to get the return
address.  Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored.  Now with the
frame of a sleeping thread well defined, this can be implemented correctly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-7-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:31:51 +02:00
Brian Gerst
616d24835e sched/x86: Pass kernel thread parameters in 'struct fork_frame'
Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-restored registers
of 'struct fork_frame'.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-6-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:31:50 +02:00
Brian Gerst
0100301bfd sched/x86: Rewrite the switch_to() code
Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm.  This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to().  It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:31:41 +02:00
Brian Gerst
7b32aeadbc sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame
Add 'struct inactive_task_frame', which defines the layout of the stack for
a sleeping process.  For now, the only defined field is the BP register
(frame pointer).

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:27:41 +02:00
Brian Gerst
163630191e sched/x86/64, kgdb: Clear GDB_PS on 64-bit
switch_to() no longer saves EFLAGS, so it's bogus to look for it on the
stack.  Set it to zero like 32-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:27:40 +02:00
Brian Gerst
4e047aa7f2 sched/x86/32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
Match 64-bit and set gdb_regs[GDB_PC] to zero.  thread.ip is always the
same point in the scheduler (except for newly forked processes), and will
be removed in a future patch.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:27:40 +02:00
Josh Poimboeuf
13e25bab7e x86/dumpstack/ftrace: Don't print unreliable addresses in print_context_stack_bp()
When function graph tracing is enabled, print_context_stack_bp() can
report return_to_handler() as an unreliable address, which is confusing
and misleading: return_to_handler() is really only useful as a hint for
debugging, whereas print_context_stack_bp() users only care about the
actual 'reliable' call path.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c51aef578d8027791b38d2ad9bac0c7f499fde91.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:15 +02:00
Josh Poimboeuf
6f727b84e2 x86/dumpstack/ftrace: Mark function graph handler function as unreliable
When function graph tracing is enabled for a function, its return
address on the stack is replaced with the address of an ftrace handler
(return_to_handler).

Currently 'return_to_handler' can be reported as reliable.  That's not
ideal, and can actually be misleading.  When saving or dumping the
stack, you normally only care about what led up to that point (the call
path), rather than what will happen in the future (the return path).

That's especially true in the non-oops stack trace case, which isn't
used for debugging.  For example, in a perf profiling operation,
reporting return_to_handler() in the trace would just be confusing.

And in the oops case, where debugging is important, "unreliable" is also
more appropriate there because it serves as a hint that graph tracing
was involved, instead of trying to imply that return_to_handler() was
the real caller.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f8af15749c7d632d3e7f815995831d5b7f82950d.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:15 +02:00
Josh Poimboeuf
471bd10f5e ftrace/x86: Implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
Use the more reliable version of ftrace_graph_ret_addr() so we no longer
have to worry about the unwinder getting out of sync with the function
graph ret_stack index, which can happen if the unwinder skips any frames
before calling ftrace_graph_ret_addr().

This fixes this issue (and several others like it):

  $ cat /proc/self/stack
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffff81293f28>] SyS_read+0x58/0xc0
  [<ffffffff818af97c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
  [<ffffffffffffffff>] 0xffffffffffffffff

  $ echo function_graph > /sys/kernel/debug/tracing/current_tracer

  $ cat /proc/self/stack
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff810394cc>] print_context_stack+0xfc/0x100
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff8103891b>] dump_trace+0x12b/0x350
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff818b2428>] return_to_handler+0x0/0x27
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffffffffffff>] 0xffffffffffffffff

Enabling function graph tracing causes the stack trace to change in two
ways:

First, the real call addresses are confusingly interspersed with
'return_to_handler' addresses.  This issue will be fixed by the next
patch.

Second, the stack trace is offset by two frames, because the unwinder
skipped the first two frames and got out of sync with the ret_stack
index.  This patch fixes this issue.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a6d623e36f8d08f9a17bd74d804d201177a23afd.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:15 +02:00
Josh Poimboeuf
408fe5de2f x86/dumpstack/ftrace: Convert dump_trace() callbacks to use ftrace_graph_ret_addr()
Convert print_context_stack() and print_context_stack_bp() to use the
arch-independent ftrace_graph_ret_addr() helper.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/56ec97cafc1bf2e34d1119e6443d897db406da86.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Josh Poimboeuf
9a7c348ba6 ftrace: Add return address pointer to ftrace_ret_stack
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.  Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Andy Lutomirski
e37e43a497 x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
This allows x86_64 kernels to enable vmapped stacks by setting
HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y
high level Kconfig option.

There are a couple of interesting bits:

First, x86 lazily faults in top-level paging entries for the vmalloc
area.  This won't work if we get a page fault while trying to access
the stack: the CPU will promote it to a double-fault and we'll die.
To avoid this problem, probe the new stack when switching stacks and
forcibly populate the pgd entry for the stack when switching mms.

Second, once we have guard pages around the stack, we'll want to
detect and handle stack overflow.

I didn't enable it on x86_32.  We'd need to rework the double-fault
code a bit and I'm concerned about running out of vmalloc virtual
addresses under some workloads.

This patch, by itself, will behave somewhat erratically when the
stack overflows while RSP is still more than a few tens of bytes
above the bottom of the stack.  Specifically, we'll get #PF and make
it to no_context and them oops without reliably triggering a
double-fault, and no_context doesn't know about stack overflows.
The next patch will improve that case.

Thank you to Nadav and Brian for helping me pay enough attention to
the SDM to hopefully get this right.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c88f3e2920b18e6cc621d772a04a62c06869037e.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:11:42 +02:00
Wei Jiangang
5035da4199 x86/apic: Update comment about disabling processor focus
Fix references to discarded end_level_ioapic_irq().

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1471576957-12961-2-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 11:24:33 +02:00
Wei Jiangang
384d9fe374 x86/smpboot: Check APIC ID before setting up default routing
This is not a bugfix, but code optimization.

If the BSP's APIC ID in local APIC is unexpected,
a kernel panic will occur and the system will halt.
That means no need to enable APIC mode, and no reason
to set up the default routing for APIC.

The combination of default_setup_apic_routing() and
apic_bsp_setup() are used to enable APIC mode.
They two should be kept together, rather than being
separated by the codes of checking APIC ID.
Just like their usage in APIC_init_uniprocessor().

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1471576957-12961-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 11:24:33 +02:00
Borislav Petkov
556b672368 x86/entry: Remove outdated comment about SYSCALL targets
The comment probably meant some old AMD64 incarnation which most likely
never saw the light of day. STAR and LSTAR are two different registers
and STAR sets CS/SS(DS) selectors for *all* modes, not only 32-bit.

So simply remove that comment.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160823172356.15879-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 11:20:31 +02:00
Wanpeng Li
2e63ad4bd5 x86/apic: Do not init irq remapping if ioapic is disabled
native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping		  

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-24 09:45:40 +02:00
Jessica Yu
d4c3e6e1b1 livepatch/x86: apply alternatives and paravirt patches after relocations
Implement arch_klp_init_object_loaded() for x86, which applies
alternatives/paravirt patches. This fixes the order in which relocations
and alternatives/paravirt patches are applied.

Previously, if a patch module had alternatives or paravirt patches,
these were applied first by the module loader before livepatch can apply
per-object relocations. The (buggy) sequence of events was:

(1) Load patch module
(2) Apply alternatives and paravirt patches to patch module
    * Note that these are applied to the new functions in the patch module
(3) Apply per-object relocations to patch module when target module loads.
    * This clobbers what was written in step 2

This lead to crashes and corruption in general, since livepatch would
overwrite or step on previously applied alternative/paravirt patches.
The correct sequence of events should be:

(1) Load patch module
(2) Apply per-object relocations to patch module
(3) Apply alternatives and paravirt patches to patch module

This is fixed by delaying paravirt/alternatives patching until after
relocations are applied. Any .altinstructions or .parainstructions
sections are prefixed with ".klp.arch.${objname}" and applied in
arch_klp_init_object_loaded().

Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-08-18 23:41:55 +02:00
Josh Poimboeuf
4950d6d48a x86/dumpstack: Remove 64-byte gap at end of irq stack
There has been a 64-byte gap at the end of the irq stack for at least 12
years.  It predates git history, and I can't find any good reason for
it.  Remove it.  What's the worst that could happen?

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14f9281c5475cc44af95945ea7546bff2e3836db.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:33 +02:00
Josh Poimboeuf
72b4f6a5e9 x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access
On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used.  So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '&regs->sp'.

But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack.  In that case, instead of
'&regs->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.

kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer.  So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.

Note that it's probably outside of kernel_stack_pointer()'s scope to be
switching stacks at all.  The x86_64 version of this function doesn't do
it, and it would be better for the caller to do it if necessary.  But
that's a patch for another day.  This just fixes the original intent.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0788aa6a23 ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Link: http://lkml.kernel.org/r/472453d6e9f6a2d4ab16aaed4935f43117111566.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:31 +02:00
Josh Poimboeuf
ae952ffdfd x86/head: Remove useless zeroed word
This zeroed word has no apparent purpose, so remove it.

Brian Gerst says:

  "FYI the word used to be the SS segment selector for the LSS
   instruction, which isn't needed in 64-bit mode."

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b056855c295bbb3825b97c1e9f7958539a4d6cf2.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:30 +02:00
Josh Poimboeuf
6225f3232a x86/dumpstack: Remove extra brackets around "<EOE>"
When starting the dump of an exception stack, it shows "<<EOE>>" instead
of "<EOE>".  print_trace_stack() already adds brackets, no need to add
them again.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/77f185fd5b81845869b400aa619415458df6b6cc.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:29 +02:00
Josh Poimboeuf
b32f96c75d x86/asm/head: Rename 'stack_start' -> 'initial_stack'
The 'stack_start' variable is similar in usage to 'initial_code' and
'initial_gs': they're all stored in head_64.S and they're all updated by
SMP and ACPI suspend before starting a CPU.

Rename it to 'initial_stack' to be consistent with the others.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87063d773a3212051b77e17b0ee427f6582a5050.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:29 +02:00
Josh Poimboeuf
bf255bdaad x86/dumpstack: Remove show_trace()
There are a bewildering array of options for dumping the stack.
Simplify things a little by removing show_trace(), which is unused.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fe02292eac9d409001ec0cf6d06f90ced242570d.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:27 +02:00
Jiri Olsa
7b0501b1e7 x86/smp: Fix __max_logical_packages value setup
Frank reported kernel panic when he disabled several cores in BIOS
via following option:

  Core Disable Bitmap(Hex)   [0]

with number 0xFFE, which leaves 16 CPUs in system (out of 48).

The kernel panic below goes along with following messages:

 smpboot: Max logical packages: 2^M
 smpboot: APIC(0) Converting physical 0 to logical package 0^M
 smpboot: APIC(20) Converting physical 1 to logical package 1^M
 smpboot: APIC(40) Package 2 exceeds logical package map^M
 smpboot: CPU 8 APICId 40 disabled^M
 smpboot: APIC(60) Package 3 exceeds logical package map^M
 smpboot: CPU 12 APICId 60 disabled^M
 ...
 general protection fault: 0000 [#1] SMP^M
 Modules linked in:^M
 CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
 Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
 task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000^M
 RIP: 0010:[<ffffffff81014d54>]  [<ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
 ...
  [<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
  [<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
  [<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
  [<ffffffff81002190>] do_one_initcall+0x50/0x190^M
  [<ffffffff810ab193>] ? parse_args+0x293/0x480^M
  [<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
  [<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
  [<ffffffff816dc19e>] kernel_init+0xe/0x110^M
  [<ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
  [<ffffffff816dc190>] ? rest_init+0x80/0x80^M

The reason for the panic is wrong value of __max_logical_packages,
which lets logical_package_map uninitialized and the uncore code
relying on this map being properly initialized (maybe we should
add some safety checks there as well).

The __max_logical_packages is computed as:

  DIV_ROUND_UP(total_cpus, ncpus);
  - ncpus being number of cores

With above BIOS setup we get total_cpus == 16 which set
__max_logical_packages to 2 (ncpus is 12).

Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.

The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.

After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.

If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.

The freeze is because lot of init code like uncore/rapl/cqm
depends on having maximum logical package value set to allocate
their data, so we can't change it later on.

Prarit Bhargava tested the patch and confirms that it solves
the problem:

  From dmidecode:
          Core Count: 24
          Core Enabled: 24
          Thread Count: 48

Orig kernel boot log:

 [    0.464981] smpboot: Max logical packages: 19
 [    0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3

1.  nr_cpus=8, should stop enumerating in package 0:

 [    0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.539596] smpboot: Max logical packages: 19

2.  max_cpus=8, should still enumerate all packages:

 [    0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
 [    0.550524] smpboot: Max logical packages: 19

3.  nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
    package 2:

 [    0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.539368] smpboot: Max logical packages: 19

4.  maxcpus=49, should still enumerate all packages:

 [    0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
 [    0.549624] smpboot: Max logical packages: 19

5.  kdump (nr_cpus=1) works as well.

Reported-by: Frank Ramsay <framsay@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 10:14:48 +02:00
Borislav Petkov
88b2f63402 x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Similar to:

  efaad554b4 ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")

... fix microcode loading from the initrd on AMD by adding the
randomization offset to the microcode patch container within the initrd.

Reported-and-tested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 10:06:49 +02:00
Daniel Vetter
cc9263874b Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because too many conflicts, and also we need to get at the
latest struct fence patches from Gustavo. Requested by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-08-15 10:41:47 +02:00
Baoquan He
31b02dd718 x86/apic, ACPI: Fix incorrect assignment when handling apic/x2apic entries
By pure accident the bug makes no functional difference, because the only
expression where we are using these values is (!count && !x2count), in which
the variables are interchangeable, but it makes sense to fix the bug
nevertheless.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470986507-24191-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:53:44 +02:00
Baoquan He
6de421198c x86/apic, ACPI: Remove the repeated lapic address override entry parsing
The ACPI MADT has a 32-bit field providing lapic address at which
each processor can access its lapic information. MADT also contains
an optional entry to provide a 64-bit address to override the 32-bit
one. However the current code does the lapic address override entry
parsing twice. One is in early_acpi_boot_init() because AMD NUMA need
get boot_cpu_id earlier. The other is in acpi_boot_init() which parses
all MADT entries.

So in this patch we remove the repeated code in the 2nd part.

Meanwhile print lapic override entry information like other MADT entry,
this will be added to boot log.

This patch is not supposed to change any runtime behavior, other than
improving kernel messages.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:53:37 +02:00
Baoquan He
a91bf718db x86/mm/numa: Open code function early_get_boot_cpu_id()
Previously early_acpi_boot_init() was called in early_get_boot_cpu_id()
to get the value for boot_cpu_physical_apicid. Now early_acpi_boot_init()
has been taken out and moved to setup_arch(), the name of
early_get_boot_cpu_id() doesn't match its implementation anymore, and
only the getting boot-time SMP configuration code was left.

So in this patch we open code it.

Also move the smp_found_config check into default_get_smp_config to
simplify code, because both early_get_smp_config() and get_smp_config()
call x86_init.mpparse.get_smp_config().

Also remove the redundent CONFIG_X86_MPPARSE #ifdef check when we call
early_get_smp_config().

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:51:54 +02:00
Linus Torvalds
01ea443982 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This is bigger than usual - the reason is partly a pent-up stream of
  fixes after the merge window and partly accidental.  The fixes are:

   - five patches to fix a boot failure on Andy Lutomirsky's laptop
   - four SGI UV platform fixes
   - KASAN fix
   - warning fix
   - documentation update
   - swap entry definition fix
   - pkeys fix
   - irq stats fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
  x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
  x86/boot: Rework reserve_real_mode() to allow multiple tries
  x86/boot: Defer setup_real_mode() to early_initcall time
  x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
  x86/boot: Run reserve_bios_regions() after we initialize the memory map
  x86/irq: Do not substract irq_tlb_count from irq_call_count
  x86/mm: Fix swap entry comment and macro
  x86/mm/kaslr: Fix -Wformat-security warning
  x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
  x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
  x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
  x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
  x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
  x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
  x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
  x86/mm: Disable preemption during CR3 read+write
  x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
  x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
  x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
2016-08-12 14:31:10 -07:00
Linus Torvalds
3bc6d8c155 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Misc fixes: a /dev/rtc regression fix, two APIC timer period
  calibration fixes, an ARM clocksource driver fix and a NOHZ
  power use regression fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
  x86/timers/apic: Inform TSC deadline clockevent device about recalibration
  x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
  timers: Fix get_next_timer_interrupt() computation
  clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
2016-08-12 13:55:06 -07:00
Denys Vlasenko
68187872c7 uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
Since instruction decoder now supports EVEX-encoded instructions, two fixes
are needed to correctly handle them in uprobes.

Extended bits for MODRM.rm field need to be sanitized just like we do it
for VEX3, to avoid encoding wrong register for register-relative access.

EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
paranoid here: proper encoding for register-relative access
should have EVEX.x = 1.

Secondly, we should fetch vex.vvvv for EVEX too.
This is now super easy because instruction decoder populates
vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v4.1+
Fixes: 8a764a875f ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12 08:29:24 +02:00