Introduce FS/GS base access functionality via <asm/fsgsbase.h>,
not yet used by anything directly.
Factor out task_seg_base() from x86/ptrace.c and rename it to
x86_fsgsbase_read_task() to make it part of the new helpers.
This will allow us to enhance FSGSBASE support and eventually enable
the FSBASE/GSBASE instructions.
An "inactive" GS base refers to a base saved at kernel entry
and being part of an inactive, non-running/stopped user-task.
(The typical ptrace model.)
Here are the new functions:
x86_fsbase_read_task()
x86_gsbase_read_task()
x86_fsbase_write_task()
x86_gsbase_write_task()
x86_fsbase_read_cpu()
x86_fsbase_write_cpu()
x86_gsbase_read_cpu_inactive()
x86_gsbase_write_cpu_inactive()
As an advantage of the unified namespace we can now see all FS/GSBASE
API use in the kernel via the following 'git grep' pattern:
$ git grep x86_.*sbase
[ mingo: Wrote new changelog. ]
Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-3-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit()
to register/unregister the "fpu" template. But these functions don't
exist anymore, causing a build error. Remove the calls to them.
Fixes: 944585a64f ("crypto: x86/aes-ni - remove special handling of AES in PCBC mode")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When building a 32-bit config which has the above MFD item as module
but OLPC_XO1_PM is enabled =y - which is bool, btw - the kernel fails
building with:
ld: arch/x86/platform/olpc/olpc-xo1-pm.o: in function `xo1_pm_remove':
/home/boris/kernel/linux/arch/x86/platform/olpc/olpc-xo1-pm.c:159: undefined reference to `mfd_cell_disable'
ld: arch/x86/platform/olpc/olpc-xo1-pm.o: in function `xo1_pm_probe':
/home/boris/kernel/linux/arch/x86/platform/olpc/olpc-xo1-pm.c:133: undefined reference to `mfd_cell_enable'
make: *** [Makefile:1030: vmlinux] Error 1
Force MFD_CS5535 to y if OLPC_XO1_PM is enabled.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20181005131750.GA5366@zn.tnic
When SME is enabled, the memory is encrypted in the first kernel. In
this case, SME also needs to be enabled in the kdump kernel, and we have
to remap the old memory with the memory encryption mask.
The case of concern here is if SME is active in the first kernel,
and it is active too in the kdump kernel. There are four cases to be
considered:
a. dump vmcore
It is encrypted in the first kernel, and needs be read out in the
kdump kernel.
b. crash notes
When dumping vmcore, the people usually need to read useful
information from notes, and the notes is also encrypted.
c. iommu device table
It's encrypted in the first kernel, kdump kernel needs to access its
content to analyze and get information it needs.
d. mmio of AMD iommu
not encrypted in both kernels
Add a new bool parameter @encrypted to __ioremap_caller(). If set,
memory will be remapped with the SME mask.
Add a new function ioremap_encrypted() to explicitly pass in a true
value for @encrypted. Use ioremap_encrypted() for the above a, b, c
cases.
[ bp: cleanup commit message, extern defs in io.h and drop forgotten
include. ]
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kexec@lists.infradead.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com
Cc: bhelgaas@google.com
Cc: baiyaowei@cmss.chinamobile.com
Cc: tiwai@suse.de
Cc: brijesh.singh@amd.com
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: jroedel@suse.de
Link: https://lkml.kernel.org/r/20180927071954.29615-2-lijiang@redhat.com
Ingo writes:
"perf fixes:
- fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver
- fix a CPU event enumeration bug in the x86 AMD PMU driver
- fix a perf ring-buffer corruption bug when using tracepoints
- fix a PMU unregister locking bug"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
perf/ring_buffer: Prevent concurent ring buffer access
perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0
perf/core: Fix perf_pmu_unregister() locking
vgetcyc() is full of barriers, so fetching values out of the vvar
page before vgetcyc() for use after vgetcyc() results in poor code
generation. Put vgetcyc() first to avoid this problem.
Also, pull the tv_sec division into the loop and put all the ts
writes together. The old code wrote ts->tv_sec on each iteration
before the syscall fallback check and then added in the offset
afterwards, which forced the compiler to pointlessly copy base->sec
to ts->tv_sec on each iteration. The new version seems to generate
sensible code.
Saves several cycles. With this patch applied, the result is faster
than before the clock_gettime() rewrite.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/3c05644d010b72216aa286a6d20b5078d5fae5cd.1538762487.git.luto@kernel.org
Paolo writes:
"KVM changes for 4.19-rc7
x86 and PPC bugfixes, mostly introduced in 4.19-rc1."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: nVMX: fix entry with pending interrupt if APICv is enabled
KVM: VMX: hide flexpriority from guest when disabled at the module level
KVM: VMX: check for existence of secondary exec controls before accessing
KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
KVM: x86: fix L1TF's MMIO GFN calculation
tools/kvm_stat: cut down decimal places in update interval dialog
KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
KVM: x86: never trap MSR_KERNEL_GS_BASE
On OLPC XO-1, the RTC is discovered via device tree from the arch
initcall. Don't let the PC platform register another one from its device
initcall, it's not going to work:
sysfs: cannot create duplicate filename '/devices/platform/rtc_cmos'
CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #12
Hardware name: OLPC XO/XO, BIOS OLPC Ver 1.00.01 06/11/2014
Call Trace:
dump_stack+0x16/0x18
sysfs_warn_dup+0x46/0x58
sysfs_create_dir_ns+0x76/0x9b
kobject_add_internal+0xed/0x209
? __schedule+0x3fa/0x447
kobject_add+0x5b/0x66
device_add+0x298/0x535
? insert_resource_conflict+0x2a/0x3e
platform_device_add+0x14d/0x192
? io_delay_init+0x19/0x19
platform_device_register+0x1c/0x1f
add_rtc_cmos+0x16/0x31
do_one_initcall+0x78/0x14a
? do_early_param+0x75/0x75
kernel_init_freeable+0x152/0x1e0
? rest_init+0xa2/0xa2
kernel_init+0x8/0xd5
ret_from_fork+0x2e/0x38
kobject_add_internal failed for rtc_cmos with -EEXIST, don't try to
register things with the same name in the same directory.
platform rtc_cmos: registered platform RTC device (no PNP device found)
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181004160808.307738-1-lkundrak@v3.sk
For historical reasons, the AES-NI based implementation of the PCBC
chaining mode uses a special FPU chaining mode wrapper template to
amortize the FPU start/stop overhead over multiple blocks.
When this FPU wrapper was introduced, it supported widely used
chaining modes such as XTS and CTR (as well as LRW), but currently,
PCBC is the only remaining user.
Since there are no known users of pcbc(aes) in the kernel, let's remove
this special driver, and rely on the generic pcbc driver to encapsulate
the AES-NI core cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The code flow for the vclocks is convoluted as it requires the vclocks
which can be invalidated separately from the vsyscall_gtod_data sequence to
store the fact in a separate variable. That's inefficient.
Restructure the code so the vclock readout returns cycles and the
conversion to nanoseconds is handled at the call site.
If the clock gets invalidated or vclock is already VCLOCK_NONE, return
U64_MAX as the cycle value, which is invalid for all clocks and leave the
sequence loop immediately in that case by calling the fallback function
directly.
This allows to remove the gettimeofday fallback as it now uses the
clock_gettime() fallback and does the nanoseconds to microseconds
conversion in the same way as it does when the vclock is functional. It
does not make a difference whether the division by 1000 happens in the
kernel fallback or in userspace.
Generates way better code and gains a few cycles back.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.657928937@linutronix.de
It's desired to support more clocks in the VDSO, e.g. CLOCK_TAI. This
results either in indirect calls due to the larger switch case, which then
requires retpolines or when the compiler is forced to avoid jump tables it
results in even more conditionals.
To avoid both variants which are bad for performance the high resolution
functions and the coarse grained functions will be collapsed into one for
each. That requires to store the clock specific base time in an array.
Introcude struct vgtod_ts for storage and convert the data store, the
update function and the individual clock functions over to use it.
The new storage does not longer use gtod_long_t for seconds depending on 32
or 64 bit compile because this needs to be the full 64bit value even for
32bit when a Y2038 function is added. No point in keeping the distinction
alive in the internal representation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.324679401@linutronix.de
Commit b5861e5cf2 introduced a check on
the interrupt-window and NMI-window CPU execution controls in order to
inject an external interrupt vmexit before the first guest instruction
executes. However, when APIC virtualization is enabled the host does not
need a vmexit in order to inject an interrupt at the next interrupt window;
instead, it just places the interrupt vector in RVI and the processor will
inject it as soon as possible. Therefore, on machines with APICv it is
not enough to check the CPU execution controls: the same scenario can also
happen if RVI>vPPR.
Fixes: b5861e5cf2
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As of commit 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls
have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
whereas previously KVM would allow a nested guest to enable
VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware. That is,
KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
(always) allow setting it when kvm-intel.flexpriority=0, and may even
initially allow the control and then clear it when the nested guest
writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
functional issues.
Hide the control completely when the module parameter is cleared.
reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return early from vmx_set_virtual_apic_mode() if the processor doesn't
support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
which reside in SECONDARY_VM_EXEC_CONTROL. This eliminates warnings
due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
on processors without secondary exec controls.
Remove the similar check for TPR shadowing as it is incorporated in the
flexpriority_enabled check and the APIC-related code in
vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.
Reported-by: Gerhard Wiesinger <redhat@wiesinger.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As described in:
77b0bf55bc: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
In this patch we wrap the paravirt call section tricks in a macro,
to hide it from GCC.
The effect of the patch is a more aggressive inlining, which also
causes a size increase of kernel.
text data bss dec hex filename
18147336 10226688 2957312 31331336 1de1408 ./vmlinux before
18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819)
The number of static text symbols (non-inlined functions) goes down:
Before: 40053
After: 39942 (-111)
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20181003213100.189959-8-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This patch increases the kernel size:
text data bss dec hex filename
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before
18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755)
But enables more aggressive inlining (and probably better branch decisions).
The number of static text symbols in vmlinux is much lower:
Before: 40218
After: 40053 (-165)
The assembly code gets harder to read due to the extra macro layer.
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block - i.e. to macrify the affected block.
As a result GCC considers the inline assembly block as a single instruction.
This patch handles the LOCK prefix, allowing more aggresive inlining:
text data bss dec hex filename
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845)
This is the reduction in non-inlined functions:
Before: 40286
After: 40218 (-68)
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This patch allows GCC to inline simple functions such as __get_seccomp_filter().
To no-one's surprise the result is that GCC performs more aggressive (read: correct)
inlining decisions in these senarios, which reduces the kernel size and presumably
also speeds it up:
text data bss dec hex filename
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958)
16 fewer static text symbols:
Before: 40302
After: 40286 (-16)
these got inlined instead.
Functions such as kref_get(), free_user(), fuse_file_get() now get inlined. Hurray!
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-5-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
In the case of objtool the resulting borkage can be significant, since all the
annotations of objtool are discarded during linkage and never inlined,
yet GCC bogusly considers most functions affected by objtool annotations
as 'too large'.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This increases the kernel size slightly:
text data bss dec hex filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)
The number of static text symbols (i.e. non-inlined functions) is reduced:
Before: 40321
After: 40302 (-19)
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-sparse@vger.kernel.org
Link: http://lkml.kernel.org/r/20181003213100.189959-4-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using macros in inline assembly allows us to work around bugs
in GCC's inlining decisions.
Compile macros.S and use it to assemble all C files.
Currently only x86 will use it.
Background:
The inlining pass of GCC doesn't include an assembler, so it's not aware
of basic properties of the generated code, such as its size in bytes,
or that there are such things as discontiuous blocks of code and data
due to the newfangled linker feature called 'sections' ...
Instead GCC uses a lazy and fragile heuristic: it does a linear count of
certain syntactic and whitespace elements in inlined assembly block source
code, such as a count of new-lines and semicolons (!), as a poor substitute
for "code size and complexity".
Unsurprisingly this heuristic falls over and breaks its neck whith certain
common types of kernel code that use inline assembly, such as the frequent
practice of putting useful information into alternative sections.
As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
are effectively disabled for inlined functions that make use of such asm()
blocks, because GCC thinks those sections of code are "large" - when in
reality they are often result in just a very low number of machine
instructions.
This absolute lack of inlining provess when GCC comes across such asm()
blocks both increases generated kernel code size and causes performance
overhead, which is particularly noticeable on paravirt kernels, which make
frequent use of these inlining facilities in attempt to stay out of the
way when running on baremetal hardware.
Instead of fixing the compiler we use a workaround: we set an assembly macro
and call it from the inlined assembly block. As a result GCC considers the
inline assembly block as a single instruction. (Which it often isn't but I digress.)
This uglifies and bloats the source code - for example just the refcount
related changes have this impact:
Makefile | 9 +++++++--
arch/x86/Makefile | 7 +++++++
arch/x86/kernel/macros.S | 7 +++++++
scripts/Kbuild.include | 4 +++-
scripts/mod/Makefile | 2 ++
5 files changed, 26 insertions(+), 3 deletions(-)
Yay readability and maintainability, it's not like assembly code is hard to read
and maintain ...
We also hope that GCC will eventually get fixed, but we are not holding
our breath for that. Yet we are optimistic, it might still happen, any decade now.
[ mingo: Wrote new changelog describing the background. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In resctrl filesystem, mount options exist to enable L3/L2 CDP and MBA
Software Controller features if the platform supports them:
mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
But currently only "cdp" option is displayed in /proc/mounts. "cdpl2" and
"mba_MBps" options are not shown even when they are active.
Before:
# mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
# grep resctrl /proc/mounts
/sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0
After:
# mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
# grep resctrl /proc/mounts
/sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1536796118-60135-1-git-send-email-fenghua.yu@intel.com
Linus recently observed that if we did not worry about the padding
member in struct siginfo it is only about 48 bytes, and 48 bytes is
much nicer than 128 bytes for allocating on the stack and copying
around in the kernel.
The obvious thing of only adding the padding when userspace is
including siginfo.h won't work as there are sigframe definitions in
the kernel that embed struct siginfo.
So split siginfo in two; kernel_siginfo and siginfo. Keeping the
traditional name for the userspace definition. While the version that
is used internally to the kernel and ultimately will not be padded to
128 bytes is called kernel_siginfo.
The definition of struct kernel_siginfo I have put in include/signal_types.h
A set of buildtime checks has been added to verify the two structures have
the same field offsets.
To make it easy to verify the change kernel_siginfo retains the same
size as siginfo. The reduction in size comes in a following change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rework the defintion of struct siginfo so that the array padding
struct siginfo to SI_MAX_SIZE can be placed in a union along side of
the rest of the struct siginfo members. The result is that we no
longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>