Yazen Ghannam
e5d6a126d4
x86/mce/AMD: Pass the bank number to smca_get_bank_type()
...
Pass the bank number to smca_get_bank_type() since that's all we need.
Also, we should compare the bank number to MAX_NR_BANKS (size of the
smca_banks array) not the number of bank types. Bank types are reused
for multiple banks, so the number of types can be different from the
number of banks in a system and thus we could return an invalid bank
type.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: <stable@vger.kernel.org > # 4.14.x
Cc: <stable@vger.kernel.org > # 4.14.x: 11cf887728
x86/MCE/AMD: Define a function to get SMCA bank type
Cc: <stable@vger.kernel.org > # 4.14.x: c6708d50f1
x86/MCE: Report only DRAM ECC as memory errors on AMD systems
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-6-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:54 +01:00
Borislav Petkov
4b1e84276a
x86/mce/AMD: Collect error info even if valid bits are not set
...
The MCA banks log error info into MCA_ADDR, MCA_MISC0, and MCA_SYND even
if the corresponding valid bits are not set:
"Error handlers should save the values in MCA_ADDR, MCA_MISC0,
and MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."
Do so by setting those bits so that code down the MCE processing path
doesn't need to be changed.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:54 +01:00
Borislav Petkov
b2fbf6f282
x86/mce: Issue the 'mcelog --ascii' message only on !AMD
...
mcelog cannot decode AMD MCEs.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:53 +01:00
Borislav Petkov
0993394664
x86/mce: Convert 'struct mca_config' bools to a bitfield
...
... to save space when future flags are added.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:53 +01:00
Borislav Petkov
a189c03235
x86/mce: Put private structures and definitions into the internal header
...
... because they don't need to be exported outside of MCE.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:53 +01:00
Josh Poimboeuf
9fbcc57aa1
extable: Make init_kernel_text() global
...
Convert init_kernel_text() to a global function and use it in a few
places instead of manually comparing _sinittext and _einittext.
Note that kallsyms.h has a very similar function called
is_kernel_inittext(), but its end check is inclusive. I'm not sure
whether that's intentional behavior, so I didn't touch it.
Suggested-by: Jason Baron <jbaron@akamai.com >
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com >
Acked-by: Peter Zijlstra <peterz@infradead.org >
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/4335d02be8d45ca7d265d2f174251d0b7ee6c5fd.1519051220.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 16:54:06 +01:00
Kirill A. Shutemov
39b9552281
x86/mm: Optimize boot-time paging mode switching cost
...
By this point we have functioning boot-time switching between 4- and
5-level paging mode. But naive approach comes with cost.
Numbers below are for kernel build, allmodconfig, 5 times.
CONFIG_X86_5LEVEL=n:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17308719.892691 task-clock:u (msec) # 26.772 CPUs utilized ( +- 0.11% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,993,164 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,614,978,867,455 cycles:u # 2.520 GHz ( +- 0.01% )
39,371,534,575,126 stalled-cycles-frontend:u # 90.27% frontend cycles idle ( +- 0.09% )
28,363,350,152,428 instructions:u # 0.65 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
6,316,784,066,413 branches:u # 364.948 M/sec ( +- 0.00% )
250,808,144,781 branch-misses:u # 3.97% of all branches ( +- 0.01% )
646.531974142 seconds time elapsed ( +- 1.15% )
CONFIG_X86_5LEVEL=y:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17411536.780625 task-clock:u (msec) # 26.426 CPUs utilized ( +- 0.10% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,868,663 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,865,909,056,301 cycles:u # 2.519 GHz ( +- 0.01% )
39,740,130,365,581 stalled-cycles-frontend:u # 90.59% frontend cycles idle ( +- 0.05% )
28,363,358,997,959 instructions:u # 0.65 insn per cycle
# 1.40 stalled cycles per insn ( +- 0.00% )
6,316,784,937,460 branches:u # 362.793 M/sec ( +- 0.00% )
251,531,919,485 branch-misses:u # 3.98% of all branches ( +- 0.00% )
658.886307752 seconds time elapsed ( +- 0.92% )
The patch tries to fix the performance regression by using
cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in
all hot code paths. These will statically patch the target code for
additional performance.
CONFIG_X86_5LEVEL=y + the patch:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17381990.268506 task-clock:u (msec) # 26.907 CPUs utilized ( +- 0.19% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,862,625 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,697,726,320,051 cycles:u # 2.514 GHz ( +- 0.03% )
39,480,408,690,401 stalled-cycles-frontend:u # 90.35% frontend cycles idle ( +- 0.05% )
28,363,394,221,388 instructions:u # 0.65 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
6,316,794,985,573 branches:u # 363.410 M/sec ( +- 0.00% )
251,013,232,547 branch-misses:u # 3.97% of all branches ( +- 0.01% )
645.991174661 seconds time elapsed ( +- 1.19% )
Unfortunately, this approach doesn't help with text size:
vmlinux.before .text size: 8190319
vmlinux.after .text size: 8200623
The .text section is increased by about 4k. Not sure if we can do anything
about this.
Signed-off-by: Kirill A. Shuemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 10:19:18 +01:00
Kirill A. Shutemov
b9952ec787
x86/xen: Allow XEN_PV and XEN_PVH to be enabled with X86_5LEVEL
...
With boot-time switching between paging modes, XEN_PV and XEN_PVH can be
boot into 4-level paging mode.
Tested-by: Juergen Gross <jgross@suse.com >
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Reviewed-by: Juergen Gross <jgross@suse.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 10:19:18 +01:00
Peter Zijlstra
bd89004f63
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
...
The objtool retpoline validation found this indirect jump. Seeing how
it's on CPU bringup before we run userspace it should be safe, annotate
it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 09:05:03 +01:00
David Woodhouse
dd84441a79
x86/speculation: Use IBRS if available before calling into firmware
...
Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.
Block preemption while IBRS is set, although in practice the call sites
already had to be doing that.
Ignore hpwdt.c for now. It's taking spinlocks and calling into firmware
code, from an NMI handler. I don't want to touch that with a bargepole.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk >
Reviewed-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1519037457-7643-2-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:38:33 +01:00
Jan Beulich
6262b6e78c
x86/IO-APIC: Avoid warning in 32-bit builds
...
Constants wider than 32 bits should be tagged with ULL.
Signed-off-by: Jan Beulich <jbeulich@suse.com >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/5A8AF23F02000078001A91E5@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:33:40 +01:00
Baoquan He
bee3204ec3
x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
...
Currently the kdump kernel becomes very slow if 'noapic' is specified.
Normal kernel doesn't have this bug.
Kernel parameter 'noapic' is used to disable IO-APIC in system for
testing or special purpose. Here the root cause is that in kdump
kernel LAPIC is disabled since commit:
522e664644
("x86/apic: Disable I/O APIC before shutdown of the local APIC")
In this case we need set up through-local-APIC on boot CPU in
setup_local_APIC().
In normal kernel the legacy irq mode is enabled by the BIOS. If
it is virtual wire mode, the local-APIC has been enabled and set as
through-local-APIC.
Though we fixed the regression introduced by commit 522e664644
,
to further improve robustness set up the through-local-APIC mode
explicitly, do not rely on the default boot IRQ mode.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-7-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:46 +01:00
Baoquan He
51b146c572
x86/apic: Rename variables and functions related to x86_io_apic_ops
...
The names of x86_io_apic_ops and its two member variables are
misleading:
The ->read() member is to read IO_APIC reg, while ->disable()
which is called by native_disable_io_apic()/irq_remapping_disable_io_apic()
is actually used to restore boot IRQ mode, not to disable the IO-APIC.
So rename x86_io_apic_ops to 'x86_apic_ops' since it doesn't only
handle the IO-APIC, but also the local APIC.
Also rename its member variables and the related callbacks.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-6-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:45 +01:00
Baoquan He
50374b96d2
x86/apic: Remove the (now) unused disable_IO_APIC() function
...
No one uses it anymore.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-5-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:45 +01:00
Baoquan He
339b2ae0cd
x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump
...
This is a regression fix.
Before, to fix erratum AVR31, the following commit:
522e664644
("x86/apic: Disable I/O APIC before shutdown of the local APIC")
... moved the lapic_shutdown() call to after disable_IO_APIC() in the reboot
and kexec/kdump code paths.
This introduced the following regression: disable_IO_APIC() not only clears
the IO-APIC, but it also restores boot IRQ mode by setting the
LAPIC/APIC/IMCR, calling lapic_shutdown() after disable_IO_APIC() will
disable LAPIC and ruin the possible virtual wire mode setting which
the code has been trying to do all along.
The consequence is that a KVM guest kernel always prints the warning below
during kexec/kdump as the kernel boots up:
[ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
[ ........]
[ 0.001000] Call Trace:
[ 0.001000] apic_bsp_setup+0x56/0x74
[ 0.001000] x86_late_time_init+0x11/0x16
[ 0.001000] start_kernel+0x3c9/0x486
[ 0.001000] secondary_startup_64+0xa5/0xb0
[ ........]
[ 0.001000] masked ExtINT on CPU#0
To fix this, just call clear_IO_APIC() to stop the IO-APIC where
disable_IO_APIC() was called, and call restore_boot_irq_mode() to
restore boot IRQ mode before a reboot or a kexec/kdump jump.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: stable@vger.kernel.org
Cc: uobergfe@redhat.com
Fixes: commit 522e664644
("x86/apic: Disable I/O APIC before shutdown of the local APIC")
Link: http://lkml.kernel.org/r/20180214054656.3780-4-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:45 +01:00
Baoquan He
3c9e76dbea
x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y
...
Split following patches disable_IO_APIC() will be broken up into
clear_IO_APIC() and restore_boot_irq_mode().
These two functions will be called separately where they are needed
to fix a regression introduced by:
522e664644
("x86/apic: Disable I/O APIC before shutdown of the local APIC").
While the CONFIG_KEXEC_JUMP=y code doesn't call lapic_shutdown() before jump
like kexec/kdump, so it's not impacted by commit 522e664644
.
Hence here change clear_IO_APIC() as public, and replace disable_IO_APIC()
with clear_IO_APIC() and restore_boot_irq_mode() to keep CONFIG_KEXEC_JUMP=y
code unchanged in essence. No functional change.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-3-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:44 +01:00
Baoquan He
ce279cdc04
x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC()
...
This is a preparation patch. Split out the code which restores boot
irq mode from disable_IO_APIC() into the new restore_boot_irq_mode()
function.
No functional changes.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-2-bhe@redhat.com
[ Build fix for !CONFIG_IO_APIC and rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:29 +01:00
Prarit Bhargava
63e708f826
x86/xen: Calculate __max_logical_packages on PV domains
...
The kernel panics on PV domains because native_smp_cpus_done() is
only called for HVM domains.
Calculate __max_logical_packages for PV domains.
Fixes: b4c0a7326f
("x86/smpboot: Fix __max_logical_packages estimate")
Signed-off-by: Prarit Bhargava <prarit@redhat.com >
Tested-and-reported-by: Simon Gaiser <simon@invisiblethingslab.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: x86@kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com >
Cc: Prarit Bhargava <prarit@redhat.com >
Cc: Kate Stewart <kstewart@linuxfoundation.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Vitaly Kuznetsov <vkuznets@redhat.com >
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Signed-off-by: Juergen Gross <jgross@suse.com >
2018-02-17 09:40:45 +01:00
Borislav Petkov
42ca8082e2
x86/CPU: Check CPU feature bits after microcode upgrade
...
With some microcode upgrades, new CPUID features can become visible on
the CPU. Check what the kernel has mirrored now and issue a warning
hinting at possible things the user/admin can do to make use of the
newly visible features.
Originally-by: Ashok Raj <ashok.raj@intel.com >
Tested-by: Ashok Raj <ashok.raj@intel.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Reviewed-by: Ashok Raj <ashok.raj@intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180216112640.11554-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 08:43:55 +01:00
Borislav Petkov
1008c52c09
x86/CPU: Add a microcode loader callback
...
Add a callback function which the microcode loader calls when microcode
has been updated to a newer revision. Do the callback only when no error
was encountered during loading.
Tested-by: Ashok Raj <ashok.raj@intel.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Reviewed-by: Ashok Raj <ashok.raj@intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180216112640.11554-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 08:43:55 +01:00
Borislav Petkov
3f1f576a19
x86/microcode: Propagate return value from updating functions
...
... so that callers can know when microcode was updated and act
accordingly.
Tested-by: Ashok Raj <ashok.raj@intel.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Reviewed-by: Ashok Raj <ashok.raj@intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180216112640.11554-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 08:43:55 +01:00
Kirill A. Shutemov
6f9dd32971
x86/mm: Support boot-time switching of paging modes in the early boot code
...
Early boot code should be able to initialize page tables for both 4- and
5-level paging modes.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:48 +01:00
Kirill A. Shutemov
9b46a051e4
x86/mm: Initialize vmemmap_base at boot-time
...
vmemmap area has different placement depending on paging mode.
Let's adjust it during early boot accodring to machine capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:48 +01:00
Kirill A. Shutemov
a7412546d8
x86/mm: Adjust vmalloc base and size at boot-time
...
vmalloc area has different placement and size depending on paging mode.
Let's adjust it during early boot accodring to machine capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:47 +01:00
Kirill A. Shutemov
4fa5662b6b
x86/mm: Initialize 'page_offset_base' at boot-time
...
For 4- and 5-level paging we have different 'page_offset_base'.
Let's initialize it at boot-time accordingly to machine capability.
We also have to split __PAGE_OFFSET_BASE into two constants -- for 4-
and 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:47 +01:00
Kirill A. Shutemov
b16e770bfa
x86/mm: Initialize 'pgdir_shift' and 'ptrs_per_p4d' at boot-time
...
Switching between paging modes requires the folding of the p4d page table level
when we only have 4 paging levels, which means we need to adjust 'pgdir_shift'
and 'ptrs_per_p4d' during early boot according to paging mode.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:47 +01:00
Kirill A. Shutemov
4c2b4058ab
x86/mm: Initialize 'pgtable_l5_enabled' at boot-time
...
'pgtable_l5_enabled' indicates which paging mode we are using. We need to
initialize it at boot-time according to machine capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:46 +01:00
Dou Liyang
b753a2b79a
x86/apic: Make setup_local_APIC() static
...
This function isn't used outside of apic.c, so let's mark it static.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: bhe@redhat.com
Cc: ebiederm@xmission.com
Link: http://lkml.kernel.org/r/20180214062554.21020-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:39:11 +01:00
Linus Torvalds
e525de3ab0
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:
- /proc/kcore vsyscall related fixes
- LTO fix
- build warning fix
- CPU hotplug fix
- Kconfig NR_CPUS cleanups
- cpu_has() cleanups/robustification
- .gitignore fix
- memory-failure unmapping fix
- UV platform fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()
2018-02-14 17:31:51 -08:00
Linus Torvalds
d4667ca142
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
2018-02-14 17:02:15 -08:00
Gustavo A. R. Silva
24dbc6000f
x86/cpu: Change type of x86_cache_size variable to unsigned int
...
Currently, x86_cache_size is of type int, which makes no sense as we
will never have a valid cache size equal or less than 0. So instead of
initializing this variable to -1, it can perfectly be initialized to 0
and use it as an unsigned variable instead.
Suggested-by: Thomas Gleixner <tglx@linutronix.de >
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Addresses-Coverity-ID: 1464429
Link: http://lkml.kernel.org/r/20180213192208.GA26414@embeddedor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:53 +01:00
Dan Carpenter
9de29eac8d
x86/spectre: Fix an error message
...
If i == ARRAY_SIZE(mitigation_options) then we accidentally print
garbage from one space beyond the end of the mitigation_options[] array.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@suse.de >
Cc: David Woodhouse <dwmw@amazon.co.uk >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: KarimAllah Ahmed <karahmed@amazon.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: kernel-janitors@vger.kernel.org
Fixes: 9005c6834c
("x86/spectre: Simplify spectre_v2 command line parsing")
Link: http://lkml.kernel.org/r/20180214071416.GA26677@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:53 +01:00
Jia Zhang
b399151cb4
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
...
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com >
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:52 +01:00
Andy Lutomirski
1299ef1d88
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
...
flush_tlb_single() and flush_tlb_one() sound almost identical, but
they really mean "flush one user translation" and "flush one kernel
translation". Rename them to flush_tlb_one_user() and
flush_tlb_one_kernel() to make the semantics more obvious.
[ I was looking at some PTI-related code, and the flush-one-address code
is unnecessarily hard to understand because the names of the helpers are
uninformative. This came up during PTI review, but no one got around to
doing it. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org >
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Dave Hansen <dave.hansen@intel.com >
Cc: Eduardo Valentin <eduval@amazon.com >
Cc: Hugh Dickins <hughd@google.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Kees Cook <keescook@google.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Linux-MM <linux-mm@kvack.org >
Cc: Rik van Riel <riel@redhat.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Will Deacon <will.deacon@arm.com >
Link: http://lkml.kernel.org/r/3303b02e3c3d049dc5235d5651e0ae6d29a34354.1517414378.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:52 +01:00
Peter Zijlstra
3b3a371cc9
x86/debug: Use UD2 for WARN()
...
Since the Intel SDM added an ModR/M byte to UD0 and binutils followed
that specification, we now cannot disassemble our kernel anymore.
This now means Intel and AMD disagree on the encoding of UD0. And instead
of playing games with additional bytes that are valid ModR/M and single
byte instructions (0xd6 for instance), simply use UD2 for both WARN() and
BUG().
Requested-by: Linus Torvalds <torvalds@linux-foundation.org >
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Acked-by: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180208194406.GD25181@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:50 +01:00
Kirill A. Shutemov
162434e7f5
x86/mm: Make MAX_PHYSADDR_BITS and MAX_PHYSMEM_BITS dynamic
...
For boot-time switching between paging modes, we need to be able to
adjust size of physical address space at runtime.
As part of making physical address space size variable, we have to make
X86_5LEVEL dependent on SPARSEMEM_VMEMMAP. !SPARSEMEM_VMEMMAP
configuration doesn't build with variable MAX_PHYSMEM_BITS.
For !SPARSEMEM_VMEMMAP SECTIONS_WIDTH depends on MAX_PHYSMEM_BITS:
SECTIONS_WIDTH
SECTIONS_SHIFT
MAX_PHYSMEM_BITS
And SECTIONS_WIDTH is used on pre-processor stage, it doesn't work if it's
dyncamic. See include/linux/page-flags-layout.h.
Effect on kernel image size:
text data bss dec hex filename
8628393 4734340 1368064 14730797 e0c62d vmlinux.before
8628892 4734340 1368064 14731296 e0c820 vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:15 +01:00
Kirill A. Shutemov
c65e774fb3
x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable
...
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628091 4734304 1368064 14730459 e0c4db vmlinux.before
8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov
e626e6bb0d
x86/mm: Introduce 'pgtable_l5_enabled'
...
The new flag would indicate what paging mode we are in.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov
eedb92abb9
x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y
...
We need to be able to adjust virtual memory layout at runtime to be able
to switch between 4- and 5-level paging at boot-time.
KASLR already has movable __VMALLOC_BASE, __VMEMMAP_BASE and __PAGE_OFFSET.
Let's re-use it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:13 +01:00
Dou Liyang
ccf5355d05
x86/apic: Simplify init_bsp_APIC() usage
...
Since CONFIG_X86_64 selects CONFIG_X86_LOCAL_APIC, the following
condition:
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC)
is equivalent to:
#if defined(CONFIG_X86_LOCAL_APIC)
... and we can eliminate that #ifdef by providing an empty
init_bsp_APIC() stub in the !CONFIG_X86_LOCAL_APIC case.
Also add some comments to explain why we call init_bsp_APIC().
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: mroos@linux.ee
Cc: ville.syrjala@linux.intel.com
Link: http://lkml.kernel.org/r/20180117073748.23905-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 17:30:38 +01:00
Dou Liyang
afed7d1720
x86/x2apic: Mark set_x2apic_phys_mode() as __init
...
set_x2apic_phys_mode() is only called as part of early_param()
initialization - so mark it as __init.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180117034543.26723-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 17:24:36 +01:00
Tony Luck
fd0e786d9d
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
...
In the following commit:
ce0fa3e56a
("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.
But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.
Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(
There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.
Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
1) there is a real error
2) memory_failure() succeeds.
All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.
Signed-off-by: Tony Luck <tony.luck@intel.com >
Cc: Andrew Morton <akpm@linux-foundation.org >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@suse.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Dave <dave.hansen@intel.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Robert (Persistent Memory) <elliott@hpe.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org #v4.14
Fixes: ce0fa3e56a
("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 16:25:06 +01:00
mike.travis@hpe.com
c25d99d20b
x86/platform/UV: Fix GAM Range Table entries less than 1GB
...
The latest UV platforms include the new ApachePass NVDIMMs into the
UV address space. This has introduced address ranges in the Global
Address Map Table that are less than the previous lowest range, which
was 2GB. Fix the address calculation so it accommodates address ranges
from bytes to exabytes.
Signed-off-by: Mike Travis <mike.travis@hpe.com >
Reviewed-by: Andrew Banman <andrew.banman@hpe.com >
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Russ Anderson <russ.anderson@hpe.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 14:15:45 +01:00
Cao jin
a06cc94f3f
x86/build: Drop superfluous ALIGN from the linker script
...
ALIGN(8) is superfluous since macro TEXT_TEXT already has one.
bonus cleanups:
- indentation fix
- spaces -> tab.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180208063857.15197-1-caoj.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 13:08:46 +01:00
Masayoshi Mizuma
295cc7eb31
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
...
When a physical CPU is hot-removed, the following warning messages
are shown while the uncore device is removed in uncore_pci_remove():
WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
uncore_pci_remove+0xf1/0x110
...
CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
...
Call Trace:
pci_device_remove+0x36/0xb0
device_release_driver_internal+0x145/0x210
pci_stop_bus_device+0x76/0xa0
pci_stop_root_bus+0x44/0x60
acpi_pci_root_remove+0x1f/0x80
acpi_bus_trim+0x54/0x90
acpi_bus_trim+0x2e/0x90
acpi_device_hotplug+0x2bc/0x4b0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x141/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130
When uncore_pci_remove() runs, it tries to get the package ID to
clear the value of uncore_extra_pci_dev[].dev[] by using
topology_phys_to_logical_pkg(). The warning messesages are
shown because topology_phys_to_logical_pkg() returns -1.
arch/x86/events/intel/uncore.c:
static void uncore_pci_remove(struct pci_dev *pdev)
{
...
phys_id = uncore_pcibus_to_physid(pdev->bus);
...
pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
uncore_extra_pci_dev[pkg].dev[i] = NULL;
break;
}
}
WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
topology_phys_to_logical_pkg() tries to find
cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
arch/x86/kernel/smpboot.c:
int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
int cpu;
for_each_possible_cpu(cpu) {
struct cpuinfo_x86 *c = &cpu_data(cpu);
if (c->initialized && c->phys_proc_id == phys_pkg)
return c->logical_proc_id;
}
return -1;
}
However, the phys_proc_id was already set to 0 by remove_siblinginfo()
when the CPU was offlined.
So, topology_phys_to_logical_pkg() cannot find the correct
logical_proc_id and always returns -1.
As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
messages are shown.
What is worse is that the bogus 'pkg' index results in two bugs:
- We dereference uncore_extra_pci_dev[] with a negative index
- We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
This should not cause any problems, because ->phys_proc_id is not
used after it is hot-removed and it is re-set while hot-adding.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: yasu.isimatu@gmail.com
Cc: <stable@vger.kernel.org >
Fixes: 30bb981185
("x86/topology: Avoid wasting 128k for package id array")
Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 12:47:28 +01:00
Ingo Molnar
21e433bdb9
x86/speculation: Clean up various Spectre related details
...
Harmonize all the Spectre messages so that a:
dmesg | grep -i spectre
... gives us most Spectre related kernel boot messages.
Also fix a few other details:
- clarify a comment about firmware speculation control
- s/KPTI/PTI
- remove various line-breaks that made the code uglier
Acked-by: David Woodhouse <dwmw@amazon.co.uk >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 09:03:08 +01:00
David Woodhouse
f208820a32
Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
...
This reverts commit 64e16720ea
.
We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 08:59:00 +01:00
David Woodhouse
d37fc6d360
x86/speculation: Correct Speculation Control microcode blacklist again
...
Arjan points out that the Intel document only clears the 0xc2 microcode
on *some* parts with CPUID 506E3 (INTEL_FAM6_SKYLAKE_DESKTOP stepping 3).
For the Skylake H/S platform it's OK but for Skylake E3 which has the
same CPUID it isn't (yet) cleared.
So removing it from the blacklist was premature. Put it back for now.
Also, Arjan assures me that the 0x84 microcode for Kaby Lake which was
featured in one of the early revisions of the Intel document was never
released to the public, and won't be until/unless it is also validated
as safe. So those can change to 0x80 which is what all *other* versions
of the doc have identified.
Once the retrospective testing of existing public microcodes is done, we
should be back into a mode where new microcodes are only released in
batches and we shouldn't even need to update the blacklist for those
anyway, so this tweaking of the list isn't expected to be a thing which
keeps happening.
Requested-by: Arjan van de Ven <arjan.van.de.ven@intel.com >
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1518449255-2182-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-13 08:58:59 +01:00
Linus Torvalds
a9a08845e9
vfs: do bulk POLL* -> EPOLL* replacement
...
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2018-02-11 14:34:03 -08:00
Borislav Petkov
c80c5ec1b2
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
...
The following commit:
7b6061627e
("x86: do not use print_symbol()")
... introduced a new build warning on 32-bit x86:
arch/x86/kernel/cpu/mcheck/mce.c:237:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
pr_cont("{%pS}", (void *)m->ip);
^
Fix the type mismatch between the 'void *' expected by %pS and the mce->ip
field which is u64 by casting to long.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-kernel@vger.kernel.org
Fixes: 7b6061627e
("x86: do not use print_symbol()")
Link: http://lkml.kernel.org/r/20180210145314.22174-1-bp@alien8.de
[ Cleaned up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-11 11:37:39 +01:00