This patch provides APEI arch-specific bits for ARM64
Meanwhile,
(1) Move HEST type (ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) checking to
a generic place.
(2) Select HAVE_ACPI_APEI when EFI and ACPI is set on ARM64, because
arch_apei_get_mem_attribute is using efi_mem_attributes() on
ARM64.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Fu Wei <fu.wei@linaro.org>
[ Fu Wei: improve && upstream ]
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
0-day testing encountered a NULL pointer dereference in a cpumask access
from tsc_store_and_check_tsc_adjust().
This happens when the function is called on the boot CPU and the topology
masks are not yet available due to CPUMASK_OFFSTACK=y.
Add a NULL pointer check for the mask pointer. If NULL it's safe to assume
that the CPU is the boot CPU and the first one in the package.
Fixes: 8b223bc7ab ("x86/tsc: Store and check TSC ADJUST MSR")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is done to simplify the kexec_add_buffer argument list.
Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer.
In addition, change the type of kexec_buf.buffer from char * to void *.
There is no particular reason for it to be a char *, and the change
allows us to get rid of 3 existing casts to char * in the code.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the missing return statement to the inline stub
tsc_store_and_check_tsc_adjust() and add the other stubs to make a
SMP=y,TSC=n build happy.
While at it, remove the unused variable from the UP variant of
tsc_store_and_check_tsc_adjust().
Fixes: commit ba75fb646931 ("x86/tsc: Sync test only for the first cpu in a package")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Final 4.10 updates:
- fine-tune fb flushing and tracking (Chris Wilson)
- refactor state check dumper code for more conciseness (Tvrtko)
- roll out dev_priv all over the place (Tvrkto)
- finally remove __i915__ magic macro (Tvrtko)
- more gvt bugfixes (Zhenyu&team)
- better opregion CADL handling (Jani)
- refactor/clean up wm programming (Maarten)
- gpu scheduler + priority boosting for flips as first user (Chris
Wilson)
- make fbc use more atomic (Paulo)
- initial kvm-gvt framework, but not yet complete (Zhenyu&team)
* tag 'drm-intel-next-2016-11-21' of git://anongit.freedesktop.org/git/drm-intel: (127 commits)
drm/i915: Update DRIVER_DATE to 20161121
drm/i915: Skip final clflush if LLC is coherent
drm/i915: Always flush the dirty CPU cache when pinning the scanout
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
drm/i915: Check that each request phase is completed before retiring
drm/i915: i915_pages_create_for_stolen should return err ptr
drm/i915: Enable support for nonblocking modeset
drm/i915: Be more careful to drop the GT wakeref
drm/i915: Move frontbuffer CS write tracking from ggtt vma to object
drm/i915: Only dump dp_m2_n2 configuration when drrs is used
drm/i915: don't leak global_timeline
drm/i915: add i915_address_space_fini
drm/i915: Add a few more sanity checks for stolen handling
drm/i915: Waterproof verification of gen9 forcewake table ranges
drm/i915: Introduce enableddisabled helper
drm/i915: Only dump possible panel fitter config for the platform
drm/i915: Only dump scaler config where supported
drm/i915: Compact a few pipe config debug lines
drm/i915: Don't log pipe config kernel pointer and duplicated pipe name
drm/i915: Dump FDI config only where applicable
...
If the first CPU of a package comes online, it is necessary to test whether
the TSC is in sync with a CPU on some other package. When a deviation is
observed (time going backwards between the two CPUs) the TSC is marked
unstable, which is a problem on large machines as they have to fall back to
the HPET clocksource, which is insanely slow.
It has been attempted to compensate the TSC by adding the offset to the TSC
and writing it back some time ago, but this never was merged because it did
not turn out to be stable, especially not on older systems.
Modern systems have become more stable in that regard and the TSC_ADJUST
MSR allows us to compensate for the time deviation in a sane way. If it's
available allow up to three synchronization runs and if a time warp is
detected the starting CPU can compensate the time warp via the TSC_ADJUST
MSR and retry. If the third run still shows a deviation or when random time
warps are detected the test terminally fails.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134018.048237517@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When entering idle, it's a good oportunity to verify that the TSC_ADJUST
MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
detected, emit a warning and restore it to the previous value.
This is especially important for machines, which mark the TSC reliable
because there is no watchdog clocksource available (SoCs).
This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
goes idle, but adding a timer to do the check periodically is not an option
either. On a machine, which has this issue, the check triggeres right
during boot, so there is a decent chance that the sysadmin will notice.
Rate limit the check to once per second and warn only once per cpu.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in a two aspects:
1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the
cycles spent by storing the TSC value at SMM entry and restoring it at
SMM exit. On affected machines the TSCs run slowly out of sync up to the
point where the clocksource watchdog (if available) detects it.
The TSC_ADJUST MSR allows to detect the TSC modification before that and
eventually restore it. This is also important for SoCs which have no
watchdog clocksource and therefore TSC wreckage cannot be detected and
acted upon.
2) All threads in a package are required to have the same TSC_ADJUST
value. Broken BIOSes break that and as a result the TSC synchronization
check fails.
The TSC_ADJUST MSR allows to detect the deviation when a CPU comes
online. If detected set it to the value of an already online CPU in the
same package. This also allows to reduce the number of sync tests
because with that in place the test is only required for the first CPU
in a package.
In principle all CPUs in a system should have the same TSC_ADJUST value
even across packages, but with physical CPU hotplug this assumption is
not true because the TSC starts with power on, so physical hotplug has
to do some trickery to bring the TSC into sync with already running
packages, which requires to use an TSC_ADJUST value different from CPUs
which got powered earlier.
A final enhancement is the opportunity to compensate for unsynced TSCs
accross nodes at boot time and make the TSC usable that way. It won't
help for TSCs which run apart due to frequency skew between packages,
but this gets detected by the clocksource watchdog later.
The first step toward this is to store the TSC_ADJUST value of a starting
CPU and compare it with the value of an already online CPU in the same
package. If they differ, emit a warning and adjust it to the reference
value. The !SMP version just stores the boot value for later verification.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Power management suspend/resume tracing (ab)uses the RTC to store
suspend/resume information persistently. As a consequence the RTC value is
clobbered when timekeeping is resumed and tries to inject the sleep time.
Commit a4f8f6667f ("timekeeping: Cap array access in timekeeping_debug")
plugged a out of bounds array access in the timekeeping debug code which
was caused by the clobbered RTC value, but we still use the clobbered RTC
value for sleep time injection into kernel timekeeping, which will result
in random adjustments depending on the stored "hash" value.
To prevent this keep track of the RTC clobbering and ignore the invalid RTC
timestamp at resume. If the system resumed successfully clear the flag,
which marks the RTC as unusable, warn the user about the RTC clobber and
recommend to adjust the RTC with 'ntpdate' or 'rdate'.
[jstultz: Fixed up pr_warn formating, and implemented suggestions from Ingo]
[ tglx: Rewrote changelog ]
Originally-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1480372524-15181-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts aesni (including fpu) over to the skcipher
interface. The LRW implementation has been removed as the generic
LRW code can now be used directly on top of the accelerated ECB
implementation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds xts helpers that use the skcipher interface rather
than blkcipher. This will be used by aesni_intel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When removing a sub directory/rdtgroup by rmdir or umount, closid in a
task in the sub directory is set to default rdtgroup's closid which is 0.
If the task is running on a CPU, the PQR_ASSOC MSR is only updated
when the task runs through a context switch. Up to the context switch,
the task runs with the wrong closid.
Make the change immediately effective by invoking a smp function call on
all CPUs which are running moved task. If one of the affected tasks was
moved or scheduled out before the function call is executed on the CPU the
only damage is the extra interruption of the CPU.
[ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In x86's include/asm/Kbuild three entries are appended to the genhdr-y make
variable:
genhdr-y += unistd_32.h
genhdr-y += unistd_64.h
genhdr-y += unistd_x32.h
The same entries are also appended to that variable in
include/uapi/asm/Kbuild. So commit:
10b63956fc ("UAPI: Plumb the UAPI Kbuilds into the user header installation and checking")
... removed these three entries from include/asm/Kbuild. But, apparently, some
merge conflict resolution re-added them.
The net effect is, in short, that the genhdr-y make variable contains these
file names twice and, as a consequence, that the corresponding headers get
installed twice. And so the build prints:
INSTALL usr/include/asm/ (65 files)
... while in reality only 62 files are installed in that directory.
Nothing breaks because of all that, but it's a good idea to finally remove
these unneeded entries nevertheless.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1480077707-2837-1-git-send-email-pebolle@tiscali.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Single-stepping through head_64.S made me look at the fixmap page PTEs
fixup loop:
So we're going through the whole level2_fixmap_pgt 4K page, looking at
whether PAGE_PRESENT is set in those PTEs and add the delta between
where we're compiled to run and where we actually end up running.
However, if that delta is 0 (most cases) we go through all those 512
PTEs for no reason at all. Oh well, we add 0 but that's no reason to me.
Skipping that useless fixup gives us a boot speedup of 0.004 seconds in
my guest. Not a lot but considering how cheap it is, I'll take it. Here
is the printk time difference:
before:
...
[ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized
[ 0.013590] Calibrating delay loop (skipped), value calculated using timer frequency..
8027.17 BogoMIPS (lpj=16054348)
[ 0.017094] pid_max: default: 32768 minimum: 301
...
after:
...
[ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized
[ 0.009587] Calibrating delay loop (skipped), value calculated using timer frequency..
8026.86 BogoMIPS (lpj=16053724)
[ 0.013090] pid_max: default: 32768 minimum: 301
...
For the other two changes converting naked numbers to defines:
# arch/x86/kernel/head_64.o:
text data bss dec hex filename
1124 290864 4096 296084 48494 head_64.o.before
1124 290864 4096 296084 48494 head_64.o.after
md5:
87086e202588939296f66e892414ffe2 head_64.o.before.asm
87086e202588939296f66e892414ffe2 head_64.o.after.asm
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161125111448.23623-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull KVM fixes from Radim Krčmář:
"Four fixes for bugs found by syzkaller on x86, all for stable"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: check for pic and ioapic presence before use
KVM: x86: fix out-of-bounds accesses of rtc_eoi map
KVM: x86: drop error recovery in em_jmp_far and em_ret_far
KVM: x86: fix out-of-bounds access in lapic
On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package. In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.
To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency. In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmertic
packing approach. At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.
The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function. The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be
bigger that the maximal number of VCPUs, resulting in out-of-bounds
access.
Found by syzkaller:
BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...]
Write of size 1 by task a.out/27101
CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905
[...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495
[...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86
[...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360
[...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222
[...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235
[...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670
[...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668
[...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999
[...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: af1bae5497 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.
Found by syzkaller:
WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] panic+0x1b7/0x3a3 kernel/panic.c:179
[...] __warn+0x1c4/0x1e0 kernel/panic.c:542
[...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
[...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
[...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
[...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
[...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
[...] complete_emulated_io arch/x86/kvm/x86.c:6870
[...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
[...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
[...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Update the I/O interception support to add the kvm_fast_pio_in function
to speed up the in instruction similar to the out instruction.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
AMD hardware adds two additional bits to aid in nested page fault handling.
Bit 32 - NPF occurred while translating the guest's final physical address
Bit 33 - NPF occurred while translating the guest page tables
The guest page tables fault indicator can be used as an aid for nested
virtualization. Using V0 for the host, V1 for the first level guest and
V2 for the second level guest, when both V1 and V2 are using nested paging
there are currently a number of unnecessary instruction emulations. When
V2 is launched shadow paging is used in V1 for the nested tables of V2. As
a result, KVM marks these pages as RO in the host nested page tables. When
V2 exits and we resume V1, these pages are still marked RO.
Every nested walk for a guest page table is treated as a user-level write
access and this causes a lot of NPFs because the V1 page tables are marked
RO in the V0 nested tables. While executing V1, when these NPFs occur KVM
sees a write to a read-only page, emulates the V1 instruction and unprotects
the page (marking it RW). This patch looks for cases where we get a NPF due
to a guest page table walk where the page was marked RO. It immediately
unprotects the page and resumes the guest, leading to far fewer instruction
emulations when nested virtualization is used.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Commit:
90954e7b94 ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag")
changed the coredumping code to construct the elf coredump file according
to register set size - and that's good: if binary crashes with 32-bit code
selector, generate 32-bit ELF core, otherwise - 64-bit core.
That was made for restoring 32-bit applications on x86_64: we want
32-bit application after restore to generate 32-bit ELF dump on crash.
All was quite good and recently I started reworking 32-bit applications
dumping part of CRIU: now it has two parasites (32 and 64) for seizing
compat/native tasks, after rework it'll have one parasite, working in
64-bit mode, to which 32-bit prologue long-jumps during infection.
And while it has worked for my work machine, in VM with
!CONFIG_X86_X32_ABI during reworking I faced that segfault in 32-bit
binary, that has long-jumped to 64-bit mode results in dereference
of garbage:
32-victim[19266]: segfault at f775ef65 ip 00000000f775ef65 sp 00000000f776aa50 error 14
BUG: unable to handle kernel paging request at ffffffffffffffff
IP: [<ffffffff81332ce0>] strlen+0x0/0x20
[...]
Call Trace:
[] elf_core_dump+0x11a9/0x1480
[] do_coredump+0xa6b/0xe60
[] get_signal+0x1a8/0x5c0
[] do_signal+0x23/0x660
[] exit_to_usermode_loop+0x34/0x65
[] prepare_exit_to_usermode+0x2f/0x40
[] retint_user+0x8/0x10
That's because we have 64-bit registers set (with according total size)
and we're writing it to elf_thread_core_info which has smaller size
on !CONFIG_X86_X32_ABI. That lead to overwriting ELF notes part.
Tested on 32-, 64-bit ELF crashes and on 32-bit binaries that have
jumped with 64-bit code selector - all is readable with gdb.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: 90954e7b94 ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf fixes from Ingo Molnar:
"Six fixes for bugs that were found via fuzzing, and a trivial
hw-enablement patch for AMD Family-17h CPU PMUs"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Allow only a single PMU/box within an events group
perf/x86/intel: Cure bogus unwind from PEBS entries
perf/x86: Restore TASK_SIZE check on frame pointer
perf/core: Fix address filter parser
perf/x86: Add perf support for AMD family-17h processors
perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC
perf/core: Do not set cpuctx->cgrp for unscheduled cgroups
Intel Xeons from Ivy Bridge onwards support a processor identification
number set in the factory. To the user this is a handy unique number to
identify a particular CPU. Intel can decode this to the fab/production
run to track errors. On systems that have it, include it in the machine
check record. I'm told that this would be helpful for users that run
large data centers with multi-socket servers to keep track of which CPUs
are seeing errors.
Boris:
* Add some clarifying comments and spacing.
* Mask out [63:2] in the disabled-but-not-locked case
* Call the MSR variable "val" for more readability.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20161123114855.njguoaygp3qnbkia@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.
The smp_call_function_single() is dropped because the ONLINE callback is
invoked on the target CPU since commit 1cf4f629d9 ("cpu/hotplug: Move
online calls to hotplugged cpu"). smp_call_function_single() invokes the
invoked function with interrupts disabled, but this calling convention is
not preserved as the MSR is not modified by anything else than this code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: rt@linuxtronix.de
Link: http://lkml.kernel.org/r/20161117183541.8588-21-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>