Juergen Gross
038bac2b02
x86/acpi: Add a new x86_init_acpi structure to x86_init_ops
...
Add a new struct x86_init_acpi to x86_init_ops. For now it contains
only one init function to get the RSDP table address.
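A minimal sketch of the new hook, assuming the callback returns the physical RSDP address as a u64:

  struct x86_init_acpi {
          u64 (*get_root_pointer)(void);  /* physical RSDP address, or 0 if unknown */
  };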
Signed-off-by: Juergen Gross <jgross@suse.com >
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Eric Biederman <ebiederm@xmission.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Kees Cook <keescook@chromium.org >
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: boris.ostrovsky@oracle.com
Cc: lenb@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20180219100906.14265-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-26 08:43:20 +01:00
Ingo Molnar
3f7df3efeb
Merge tag 'v4.16-rc3' into x86/mm, to pick up fixes
...
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-26 08:41:15 +01:00
Linus Torvalds
c23a757591
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes:
- UAPI data type correction for hyperv
- correct the cpu cores field in /proc/cpuinfo on CPU hotplug
- return proper error code in the resctrl file system failure path to
avoid silent subsequent failures
- correct a subtle accounting issue in the new vector allocation code
which went unnoticed for a while and caused suspend/resume
failures"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
x86/topology: Fix function name in documentation
x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
x86/apic/vector: Handle vector release on CPU unplug correctly
genirq/matrix: Handle CPU offlining proper
x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
2018-02-25 16:58:55 -08:00
Radim Krčmář
fe2a3027e7
KVM: x86: fix backward migration with async_PF
...
Guests on new hypervisors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
bit when enabling async_PF, but this bit is reserved on old hypervisors,
which results in a failure upon migration.
To avoid breaking different cases, we are checking for CPUID feature bit
before enabling the feature and nothing else.
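A hedged sketch of the guest-side check described above (the surrounding code is illustrative, names taken from the async_PF UAPI):

  u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

  pa |= KVM_ASYNC_PF_ENABLED;
  if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
          pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;  /* only if the host advertises it */
  wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);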
Fixes: 52a5c155cf
("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
Cc: <stable@vger.kernel.org >
Reviewed-by: Wanpeng Li <wanpengli@tencent.com >
Reviewed-by: David Hildenbrand <david@redhat.com >
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com >
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com >
2018-02-24 01:43:48 +01:00
Sebastian Ott
f75e4924f0
kvm: fix warning for non-x86 builds
...
Fix the following sparse warning by moving the prototype
of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h .
CHECK arch/s390/kvm/../../../virt/kvm/kvm_main.c
arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
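A sketch of the shared declaration, assuming the usual (kvm, start, end) signature:

  /* in include/linux/kvm_host.h, visible to all architectures */
  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                              unsigned long start,
                                              unsigned long end);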
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com >
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com >
2018-02-24 01:43:47 +01:00
Daniel Borkmann
a493a87f38
bpf, x64: implement retpoline for tail call
...
Implement a retpoline [0] for the BPF tail call JIT'ing that converts
the indirect jump via jmp %rax that is used to make the long jump into
another JITed BPF image. Since this is subject to speculative execution,
we need to control the transient instruction sequence here as well
when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
The latter aligns also with what gcc / clang emits (e.g. [1]).
JIT dump after patch:
# bpftool p d x i 1
0: (18) r2 = map[id:1]
2: (b7) r3 = 0
3: (85) call bpf_tail_call#12
4: (b7) r0 = 2
5: (95) exit
With CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
33: cmp %edx,0x24(%rsi)
36: jbe 0x0000000000000072 |*
38: mov 0x24(%rbp),%eax
3e: cmp $0x20,%eax
41: ja 0x0000000000000072 |
43: add $0x1,%eax
46: mov %eax,0x24(%rbp)
4c: mov 0x90(%rsi,%rdx,8),%rax
54: test %rax,%rax
57: je 0x0000000000000072 |
59: mov 0x28(%rax),%rax
5d: add $0x25,%rax
61: callq 0x000000000000006d |+
66: pause |
68: lfence |
6b: jmp 0x0000000000000066 |
6d: mov %rax,(%rsp) |
71: retq |
72: mov $0x2,%eax
[...]
* relative fall-through jumps in error case
+ retpoline for indirect jump
Without CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
33: cmp %edx,0x24(%rsi)
36: jbe 0x0000000000000063 |*
38: mov 0x24(%rbp),%eax
3e: cmp $0x20,%eax
41: ja 0x0000000000000063 |
43: add $0x1,%eax
46: mov %eax,0x24(%rbp)
4c: mov 0x90(%rsi,%rdx,8),%rax
54: test %rax,%rax
57: je 0x0000000000000063 |
59: mov 0x28(%rax),%rax
5d: add $0x25,%rax
61: jmpq *%rax |-
63: mov $0x2,%eax
[...]
* relative fall-through jumps in error case
- plain indirect jump as before
[0] https://support.google.com/faqs/answer/7625886
[1] a31e654fa1
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net >
Signed-off-by: Alexei Starovoitov <ast@kernel.org >
2018-02-22 15:31:42 -08:00
Yazen Ghannam
68627a697c
x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
...
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.
This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:
WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
...
Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.
Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.
Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.
Don't try to find the block address on reserved banks.
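A hedged sketch of the new enumeration entry (table layout assumed from context):

  /* match on HWID 0x00 / MCATYPE 0x0, i.e. a read-as-zero MCA_IPID */
  { SMCA_RESERVED, HWID_MCATYPE(0x00, 0x0), 0x0 },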
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: <stable@vger.kernel.org > # 4.14.x
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:54 +01:00
Borislav Petkov
a189c03235
x86/mce: Put private structures and definitions into the internal header
...
... because they don't need to be exported outside of MCE.
Signed-off-by: Borislav Petkov <bp@suse.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tony Luck <tony.luck@intel.com >
Cc: linux-edac <linux-edac@vger.kernel.org >
Link: http://lkml.kernel.org/r/20180221101900.10326-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 17:00:53 +01:00
Ingo Molnar
d72f4e29e6
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
...
firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.
Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...
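A rough sketch of the macro form, assuming the IBRS-for-firmware MSR write helper from this series (the _end() counterpart mirrors it with the MSR cleared and preempt_enable()):

  #define firmware_restrict_branch_speculation_start()                  \
  do {                                                                  \
          preempt_disable();                                            \
          alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,     \
                                X86_FEATURE_USE_IBRS_FW);               \
  } while (0)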
Cc: David Woodhouse <dwmw@amazon.co.uk >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 16:54:03 +01:00
Kirill A. Shutemov
39b9552281
x86/mm: Optimize boot-time paging mode switching cost
...
By this point we have functioning boot-time switching between 4- and
5-level paging mode. But the naive approach comes with a cost.
Numbers below are for kernel build, allmodconfig, 5 times.
CONFIG_X86_5LEVEL=n:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17308719.892691 task-clock:u (msec) # 26.772 CPUs utilized ( +- 0.11% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,993,164 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,614,978,867,455 cycles:u # 2.520 GHz ( +- 0.01% )
39,371,534,575,126 stalled-cycles-frontend:u # 90.27% frontend cycles idle ( +- 0.09% )
28,363,350,152,428 instructions:u # 0.65 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
6,316,784,066,413 branches:u # 364.948 M/sec ( +- 0.00% )
250,808,144,781 branch-misses:u # 3.97% of all branches ( +- 0.01% )
646.531974142 seconds time elapsed ( +- 1.15% )
CONFIG_X86_5LEVEL=y:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17411536.780625 task-clock:u (msec) # 26.426 CPUs utilized ( +- 0.10% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,868,663 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,865,909,056,301 cycles:u # 2.519 GHz ( +- 0.01% )
39,740,130,365,581 stalled-cycles-frontend:u # 90.59% frontend cycles idle ( +- 0.05% )
28,363,358,997,959 instructions:u # 0.65 insn per cycle
# 1.40 stalled cycles per insn ( +- 0.00% )
6,316,784,937,460 branches:u # 362.793 M/sec ( +- 0.00% )
251,531,919,485 branch-misses:u # 3.98% of all branches ( +- 0.00% )
658.886307752 seconds time elapsed ( +- 0.92% )
The patch tries to fix the performance regression by using
cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in
all hot code paths. These will statically patch the target code for
additional performance.
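A hedged sketch of the idea: make the check a boot-time-patched CPU feature test instead of a variable load (exact guard macros omitted):

  #define pgtable_l5_enabled      cpu_feature_enabled(X86_FEATURE_LA57)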
CONFIG_X86_5LEVEL=y + the patch:
Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
17381990.268506 task-clock:u (msec) # 26.907 CPUs utilized ( +- 0.19% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
331,862,625 page-faults:u # 0.019 M/sec ( +- 0.01% )
43,697,726,320,051 cycles:u # 2.514 GHz ( +- 0.03% )
39,480,408,690,401 stalled-cycles-frontend:u # 90.35% frontend cycles idle ( +- 0.05% )
28,363,394,221,388 instructions:u # 0.65 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
6,316,794,985,573 branches:u # 363.410 M/sec ( +- 0.00% )
251,013,232,547 branch-misses:u # 3.97% of all branches ( +- 0.01% )
645.991174661 seconds time elapsed ( +- 1.19% )
Unfortunately, this approach doesn't help with text size:
vmlinux.before .text size: 8190319
vmlinux.after .text size: 8200623
The .text section is increased by about 4k. Not sure if we can do anything
about this.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 10:19:18 +01:00
Kirill A. Shutemov
92e1c5b3f7
x86/mm: Redefine some of page table helpers as macros
...
This is preparation for the next patch, which would change
pgtable_l5_enabled to be cpu_feature_enabled(X86_FEATURE_LA57).
The change makes a few helpers in paravirt.h dependent on
cpu_feature_enabled() definition from cpufeature.h.
And cpufeature.h is dependent on paravirt.h.
Let's re-define some of the helpers as macros to break this dependency loop.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 10:19:18 +01:00
Peter Zijlstra
3010a0663f
x86/paravirt, objtool: Annotate indirect calls
...
Paravirt emits indirect calls which get flagged by objtool's retpoline
checks; annotate them away because all of these indirect calls will be
patched out before we start userspace.
This patching happens through alternative_instructions() ->
apply_paravirt() -> pv_init_ops.patch() which will eventually end up
in paravirt_patch_default(). This function _will_ write direct
alternatives.
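Roughly, assuming the ANNOTATE_RETPOLINE_SAFE marker introduced in this series, the paravirt call template gains an annotation:

  #define PARAVIRT_CALL                         \
          ANNOTATE_RETPOLINE_SAFE               \
          "call *%c[paravirt_opptr];"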
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 09:05:03 +01:00
Peter Zijlstra
9e0e3c5130
x86/speculation, objtool: Annotate indirect calls/jumps for objtool
...
Annotate the indirect calls/jumps in the CALL_NOSPEC/JUMP_NOSPEC
alternatives.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-21 09:05:03 +01:00
David Woodhouse
dd84441a79
x86/speculation: Use IBRS if available before calling into firmware
...
Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.
Block preemption while IBRS is set, although in practice the call sites
already had to be doing that.
Ignore hpwdt.c for now. It's taking spinlocks and calling into firmware
code, from an NMI handler. I don't want to touch that with a bargepole.
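A hedged sketch of the call-site pattern; fw_call() stands in for an arbitrary firmware entry point:

  firmware_restrict_branch_speculation_start();   /* set IBRS, preemption off */
  status = fw_call(args);                         /* hypothetical firmware entry */
  firmware_restrict_branch_speculation_end();     /* clear IBRS, preemption back on */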
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk >
Reviewed-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1519037457-7643-2-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:38:33 +01:00
David Woodhouse
d1c99108af
Revert "x86/retpoline: Simplify vmexit_fill_RSB()"
...
This reverts commit 1dde7415e9. By putting
the RSB filling out of line and calling it, we waste one RSB slot for
returning from the function itself, which means one fewer actual function
call we can make if we're doing the Skylake abomination of call-depth
counting.
It also changed the number of RSB stuffings we do on vmexit from 32,
which was correct, to 16. Let's just stop with the bikeshedding; it
didn't actually *fix* anything anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1519037457-7643-4-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:38:26 +01:00
Jan Beulich
f2f18b16c7
x86/LDT: Avoid warning in 32-bit builds with older gcc
...
BUG() doesn't always imply "no return", and hence should be followed by
a return statement even if that's obviously (to a human) unreachable.
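A generic illustration of the pattern (hypothetical return value), not the exact hunk:

  default:
          BUG();
          /* unreachable to a human, but older 32-bit GCC still wants it: */
          return 0;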
Signed-off-by: Jan Beulich <jbeulich@suse.com >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/5A8AF2AA02000078001A91E9@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:33:40 +01:00
Jan Beulich
700b7c5409
x86/asm: Improve how GEN_*_SUFFIXED_RMWcc() specify clobbers
...
Commit:
df3405245a
("x86/asm: Add suffix macro for GEN_*_RMWcc()")
... introduced "suffix" RMWcc operations, adding bogus clobber specifiers:
For one, on x86 there's no point explicitly clobbering "cc".
In fact, with GCC properly fixed, this results in an overlap being detected by
the compiler between outputs and clobbers.
Furthermore, it seems bad practice to me to have the clobber specification
and the use of the clobbered register(s) disconnected - the clobber that a
particular invocation needs should rather be specified at the invocation
site of the GEN_{UN,BIN}ARY_SUFFIXED_RMWcc() macros.
Drop the "cc" clobber altogether and move the "cx" one to refcount.h.
Signed-off-by: Jan Beulich <jbeulich@suse.com >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Kees Cook <keescook@chromium.org >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/5A8AF1F802000078001A91E1@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:33:39 +01:00
Jan Beulich
842cef9113
x86/mm: Fix {pmd,pud}_{set,clear}_flags()
...
Just like pte_{set,clear}_flags() their PMD and PUD counterparts should
not do any address translation. This was outright wrong under Xen
(causing a dead boot with no useful output on "suitable" systems), and
produced needlessly more complicated code (even if just slightly) when
paravirt was enabled.
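A sketch of the corrected PMD helper, mirroring what pte_set_flags() already does (no __pmd()/paravirt translation; the PUD variant is analogous):

  static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
  {
          pmdval_t v = native_pmd_val(pmd);

          return native_make_pmd(v | set);  /* no address translation */
  }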
Signed-off-by: Jan Beulich <jbeulich@suse.com >
Reviewed-by: Juergen Gross <jgross@suse.com >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/5A8AF1BB02000078001A91C3@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 09:33:39 +01:00
KarimAllah Ahmed
894266466a
x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
...
... since u64 has a hidden header dependency that was not there before
using it (i.e. it breaks our VMM build).
Also, __u64 is the right way to expose data types through UAPI.
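A generic, hypothetical example of the rule:

  #include <linux/types.h>

  struct example_uapi_data {      /* hypothetical UAPI structure */
          __u64 value;            /* not u64: kernel-internal types aren't exported */
  };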
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de >
Acked-by: Thomas Gleixner <tglx@linutronix.de >
Cc: Haiyang Zhang <haiyangz@microsoft.com >
Cc: K. Y. Srinivasan <kys@microsoft.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephen Hemminger <sthemmin@microsoft.com >
Cc: devel@linuxdriverproject.org
Fixes: 93286261
("x86/hyperv: Reenlightenment notifications support")
Link: http://lkml.kernel.org/r/1519112391-23773-1-git-send-email-karahmed@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-20 08:54:47 +01:00
Baoquan He
51b146c572
x86/apic: Rename variables and functions related to x86_io_apic_ops
...
The names of x86_io_apic_ops and its two member variables are
misleading:
The ->read() member is used to read an IO-APIC register, while ->disable(),
which is called by native_disable_io_apic()/irq_remapping_disable_io_apic(),
is actually used to restore boot IRQ mode, not to disable the IO-APIC.
So rename x86_io_apic_ops to 'x86_apic_ops' since it doesn't only
handle the IO-APIC, but also the local APIC.
Also rename its member variables and the related callbacks.
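A hedged sketch of the renamed ops structure:

  struct x86_apic_ops {
          unsigned int    (*io_apic_read)(unsigned int apic, unsigned int reg);
          void            (*restore)(void);       /* restore boot IRQ mode */
  };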
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-6-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:45 +01:00
Baoquan He
50374b96d2
x86/apic: Remove the (now) unused disable_IO_APIC() function
...
No one uses it anymore.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-5-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:45 +01:00
Baoquan He
3c9e76dbea
x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y
...
In the following patches disable_IO_APIC() will be split into
clear_IO_APIC() and restore_boot_irq_mode().
These two functions will be called separately where they are needed
to fix a regression introduced by:
522e664644
("x86/apic: Disable I/O APIC before shutdown of the local APIC").
The CONFIG_KEXEC_JUMP=y code, however, doesn't call lapic_shutdown() before
the jump like kexec/kdump does, so it's not impacted by commit 522e664644.
Hence make clear_IO_APIC() public here, and replace disable_IO_APIC() with
clear_IO_APIC() plus restore_boot_irq_mode() to keep the CONFIG_KEXEC_JUMP=y
code unchanged in essence. No functional change.
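A hedged sketch of the kexec-jump call sites after this split:

  clear_IO_APIC();              /* mask/clear IO-APIC entries, as before  */
  restore_boot_irq_mode();      /* what disable_IO_APIC() also used to do */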
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-3-bhe@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:44 +01:00
Baoquan He
ce279cdc04
x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC()
...
This is a preparation patch. Split out the code which restores boot
irq mode from disable_IO_APIC() into the new restore_boot_irq_mode()
function.
No functional changes.
Signed-off-by: Baoquan He <bhe@redhat.com >
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: douly.fnst@cn.fujitsu.com
Cc: joro@8bytes.org
Cc: prarit@redhat.com
Cc: uobergfe@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-2-bhe@redhat.com
[ Build fix for !CONFIG_IO_APIC and rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 11:47:29 +01:00
Prarit Bhargava
63e708f826
x86/xen: Calculate __max_logical_packages on PV domains
...
The kernel panics on PV domains because native_smp_cpus_done() is
only called for HVM domains.
Calculate __max_logical_packages for PV domains.
Fixes: b4c0a7326f
("x86/smpboot: Fix __max_logical_packages estimate")
Signed-off-by: Prarit Bhargava <prarit@redhat.com >
Tested-and-reported-by: Simon Gaiser <simon@invisiblethingslab.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: x86@kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com >
Cc: Prarit Bhargava <prarit@redhat.com >
Cc: Kate Stewart <kstewart@linuxfoundation.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Vitaly Kuznetsov <vkuznets@redhat.com >
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Signed-off-by: Juergen Gross <jgross@suse.com >
2018-02-17 09:40:45 +01:00
Borislav Petkov
1008c52c09
x86/CPU: Add a microcode loader callback
...
Add a callback function which the microcode loader calls when microcode
has been updated to a newer revision. Do the callback only when no error
was encountered during loading.
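A hedged sketch of the caller side (the exact call site and error handling are assumptions):

  if (!err)                     /* microcode successfully updated */
          microcode_check();    /* new callback: re-evaluate CPUID-derived state */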
Tested-by: Ashok Raj <ashok.raj@intel.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Reviewed-by: Ashok Raj <ashok.raj@intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180216112640.11554-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 08:43:55 +01:00
Borislav Petkov
3f1f576a19
x86/microcode: Propagate return value from updating functions
...
... so that callers can know when microcode was updated and act
accordingly.
Tested-by: Ashok Raj <ashok.raj@intel.com >
Signed-off-by: Borislav Petkov <bp@suse.de >
Reviewed-by: Ashok Raj <ashok.raj@intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180216112640.11554-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-17 08:43:55 +01:00
Kirill A. Shutemov
6657fca06e
x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y
...
All pieces of the puzzle are in place and we can now allow booting with
CONFIG_X86_5LEVEL=y on a machine without LA57 support.
The kernel will detect that LA57 is missing and fold the p4d level at runtime.
Update the documentation and the Kconfig option description to reflect the
change.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:49 +01:00
Kirill A. Shutemov
91f606a8fa
x86/mm: Replace compile-time checks for 5-level paging with runtime-time checks
...
This patch converts the CONFIG_X86_5LEVEL compile-time checks into runtime
checks for p4d folding.
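A hedged illustration of the pattern (not the exact call sites): a compile-time #ifdef CONFIG_X86_5LEVEL guard becomes a runtime test:

  if (pgtable_l5_enabled) {
          /* 5-level-only p4d handling, previously under #ifdef CONFIG_X86_5LEVEL */
  }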
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:49 +01:00
Kirill A. Shutemov
98219dda2a
x86/mm: Fold p4d page table layer at runtime
...
Change page table helpers to fold p4d at runtime.
The logic is the same as in <asm-generic/pgtable-nop4d.h>.
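A sketch of the runtime fold, following <asm-generic/pgtable-nop4d.h>:

  static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
  {
          if (!pgtable_l5_enabled)
                  return (p4d_t *)pgd;    /* p4d folded into pgd */
          return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);
  }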
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:48 +01:00
Kirill A. Shutemov
9b46a051e4
x86/mm: Initialize vmemmap_base at boot-time
...
vmemmap area has different placement depending on paging mode.
Let's adjust it during early boot according to the machine's capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:48 +01:00
Kirill A. Shutemov
a7412546d8
x86/mm: Adjust vmalloc base and size at boot-time
...
vmalloc area has different placement and size depending on paging mode.
Let's adjust it during early boot according to the machine's capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:47 +01:00
Kirill A. Shutemov
4fa5662b6b
x86/mm: Initialize 'page_offset_base' at boot-time
...
For 4- and 5-level paging we have different 'page_offset_base'.
Let's initialize it at boot-time according to the machine's capability.
We also have to split __PAGE_OFFSET_BASE into two constants -- for 4-
and 5-level paging.
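A hedged sketch of the boot-time choice between the two new constants:

  page_offset_base = pgtable_l5_enabled ? __PAGE_OFFSET_BASE_L5
                                        : __PAGE_OFFSET_BASE_L4;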
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@suse.de >
Cc: Dan Williams <dan.j.williams@intel.com >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: David Woodhouse <dwmw2@infradead.org >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:48:47 +01:00
Dou Liyang
b753a2b79a
x86/apic: Make setup_local_APIC() static
...
This function isn't used outside of apic.c, so let's mark it static.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: bhe@redhat.com
Cc: ebiederm@xmission.com
Link: http://lkml.kernel.org/r/20180214062554.21020-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-16 10:39:11 +01:00
Linus Torvalds
e525de3ab0
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:
- /proc/kcore vsyscall related fixes
- LTO fix
- build warning fix
- CPU hotplug fix
- Kconfig NR_CPUS cleanups
- cpu_has() cleanups/robustification
- .gitignore fix
- memory-failure unmapping fix
- UV platform fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()
2018-02-14 17:31:51 -08:00
Linus Torvalds
d4667ca142
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
2018-02-14 17:02:15 -08:00
Gustavo A. R. Silva
24dbc6000f
x86/cpu: Change type of x86_cache_size variable to unsigned int
...
Currently, x86_cache_size is of type int, which makes no sense as we
will never have a valid cache size equal to or less than 0. So instead of
initializing this variable to -1, it can perfectly well be initialized to 0
and used as an unsigned variable instead.
Suggested-by: Thomas Gleixner <tglx@linutronix.de >
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Addresses-Coverity-ID: 1464429
Link: http://lkml.kernel.org/r/20180213192208.GA26414@embeddedor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:53 +01:00
Jia Zhang
b399151cb4
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
...
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com >
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:52 +01:00
Andy Lutomirski
1299ef1d88
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
...
flush_tlb_single() and flush_tlb_one() sound almost identical, but
they really mean "flush one user translation" and "flush one kernel
translation". Rename them to flush_tlb_one_user() and
flush_tlb_one_kernel() to make the semantics more obvious.
[ I was looking at some PTI-related code, and the flush-one-address code
is unnecessarily hard to understand because the names of the helpers are
uninformative. This came up during PTI review, but no one got around to
doing it. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org >
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Dave Hansen <dave.hansen@intel.com >
Cc: Eduardo Valentin <eduval@amazon.com >
Cc: Hugh Dickins <hughd@google.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Kees Cook <keescook@google.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Linux-MM <linux-mm@kvack.org >
Cc: Rik van Riel <riel@redhat.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Will Deacon <will.deacon@arm.com >
Link: http://lkml.kernel.org/r/3303b02e3c3d049dc5235d5651e0ae6d29a34354.1517414378.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:52 +01:00
Peter Zijlstra
ea00f30128
x86/speculation: Add <asm/msr-index.h> dependency
...
Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.
Reported-by: Joe Konno <joe.konno@linux.intel.com >
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: arjan@linux.intel.com
Cc: bp@alien8.de
Cc: dan.j.williams@intel.com
Cc: dave.hansen@linux.intel.com
Cc: dwmw2@infradead.org
Cc: dwmw@amazon.co.uk
Cc: gregkh@linuxfoundation.org
Cc: hpa@zytor.com
Cc: jpoimboe@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Fixes: 20ffa1caec
("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Link: http://lkml.kernel.org/r/20180213132819.GJ25201@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:51 +01:00
Dan Williams
be3233fbfc
x86/speculation: Fix up array_index_nospec_mask() asm constraint
...
Allow the compiler to handle @size as an immediate value or memory
directly rather than allocating a register.
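A hedged sketch of the constraint after the fix ("g" allows an immediate, memory operand or register for @size):

  asm volatile ("cmp %1,%2; sbb %0,%0;"
                : "=r" (mask)
                : "g" (size), "r" (index)
                : "cc");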
Reported-by: Linus Torvalds <torvalds@linux-foundation.org >
Signed-off-by: Dan Williams <dan.j.williams@intel.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/151797010204.1289.1510000292250184993.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:50 +01:00
Peter Zijlstra
3b3a371cc9
x86/debug: Use UD2 for WARN()
...
Since the Intel SDM added a ModR/M byte to UD0 and binutils followed
that specification, we can no longer disassemble our kernel.
This also means Intel and AMD disagree on the encoding of UD0. And instead
of playing games with additional bytes that are valid ModR/M and single
byte instructions (0xd6 for instance), simply use UD2 for both WARN() and
BUG().
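For reference, UD2 has a fixed two-byte encoding, so an assembler-version-independent definition is possible (a sketch):

  #define ASM_UD2       ".byte 0x0f, 0x0b"      /* UD2: same encoding on Intel and AMD */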
Requested-by: Linus Torvalds <torvalds@linux-foundation.org >
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Acked-by: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Josh Poimboeuf <jpoimboe@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180208194406.GD25181@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:50 +01:00
Josh Poimboeuf
2b5db66862
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
...
By default, objtool assumes that a UD2 is a dead end. This is mainly
because GCC 7+ sometimes inserts a UD2 when it detects a divide-by-zero
condition.
Now that WARN() is moving back to UD2, annotate the code after it as
reachable so objtool can follow the code flow.
Reported-by: Borislav Petkov <bp@alien8.de >
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com >
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Arjan van de Ven <arjan@linux.intel.com >
Cc: Brian Gerst <brgerst@gmail.com >
Cc: Denys Vlasenko <dvlasenk@redhat.com >
Cc: H. Peter Anvin <hpa@zytor.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: kbuild test robot <fengguang.wu@intel.com >
Link: http://lkml.kernel.org/r/0e483379275a42626ba8898117f918e1bf661e40.1518130694.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-15 01:15:49 +01:00
Kirill A. Shutemov
09e61a779e
x86/mm: Make __VIRTUAL_MASK_SHIFT dynamic
...
For boot-time switching between paging modes, we need to be able to
adjust virtual mask shifts.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628892 4734340 1368064 14731296 e0c820 vmlinux.before
8628966 4734340 1368064 14731370 e0c86a vmlinux.after
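A hedged sketch of the now-dynamic definition (47 for 4-level, 56 for 5-level paging):

  #define __VIRTUAL_MASK_SHIFT  (pgtable_l5_enabled ? 56 : 47)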
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:15 +01:00
Kirill A. Shutemov
162434e7f5
x86/mm: Make MAX_PHYSADDR_BITS and MAX_PHYSMEM_BITS dynamic
...
For boot-time switching between paging modes, we need to be able to
adjust size of physical address space at runtime.
As part of making physical address space size variable, we have to make
X86_5LEVEL dependent on SPARSEMEM_VMEMMAP. !SPARSEMEM_VMEMMAP
configuration doesn't build with variable MAX_PHYSMEM_BITS.
For !SPARSEMEM_VMEMMAP SECTIONS_WIDTH depends on MAX_PHYSMEM_BITS:
SECTIONS_WIDTH
SECTIONS_SHIFT
MAX_PHYSMEM_BITS
And SECTIONS_WIDTH is used at the pre-processor stage; it doesn't work if it's
dynamic. See include/linux/page-flags-layout.h.
Effect on kernel image size:
text data bss dec hex filename
8628393 4734340 1368064 14730797 e0c62d vmlinux.before
8628892 4734340 1368064 14731296 e0c820 vmlinux.after
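A hedged sketch of the runtime definition (46 bits for 4-level paging, 52 for 5-level; the exact values are an assumption from the existing configurations):

  #define MAX_PHYSMEM_BITS      (pgtable_l5_enabled ? 52 : 46)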
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:15 +01:00
Kirill A. Shutemov
c65e774fb3
x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable
...
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628091 4734304 1368064 14730459 e0c4db vmlinux.before
8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov
5c7919bb19
x86/mm: Make LDT_BASE_ADDR dynamic
...
LDT_BASE_ADDR has different value in 4- and 5-level paging
configurations.
We need to make it dynamic in preparation for boot-time switching
between paging modes.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov
e626e6bb0d
x86/mm: Introduce 'pgtable_l5_enabled'
...
The new flag would indicate what paging mode we are in.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:14 +01:00
Kirill A. Shutemov
eedb92abb9
x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y
...
We need to be able to adjust virtual memory layout at runtime to be able
to switch between 4- and 5-level paging at boot-time.
KASLR already has movable __VMALLOC_BASE, __VMEMMAP_BASE and __PAGE_OFFSET.
Let's re-use it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:13 +01:00
Kirill A. Shutemov
02390b87a9
mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
...
With boot-time switching between paging modes we will have a variable
MAX_PHYSMEM_BITS.
Let's use the maximum value possible for the CONFIG_X86_5LEVEL=y
configuration to define the zsmalloc data structures.
The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
It also suits well for handling the PAE special case.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Reviewed-by: Nitin Gupta <ngupta@vflare.org >
Acked-by: Minchan Kim <minchan@kernel.org >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:13 +01:00
Kirill A. Shutemov
b83ce5ee91
x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52
...
__PHYSICAL_MASK_SHIFT is used to define the mask that helps to extract
physical address from a page table entry.
Although real physical address space available may differ between
machines, it's safe to use 52 as __PHYSICAL_MASK_SHIFT. Unused bits
above log2(MAXPHYADDR) up to bit 51 are reserved and must be 0.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com >
Cc: Andy Lutomirski <luto@amacapital.net >
Cc: Borislav Petkov <bp@suse.de >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2018-02-14 13:11:13 +01:00