Commit 432c6bacbd ("MIPS: Use per-mm page to execute branch delay slot
instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS
macro from do_dsemulret, leading to the ds_emul file in debugfs always
returning zero even though we perform delay slot emulations.
Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 432c6bacbd ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14301/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch fixes the possibility of a deadlock when bringing up
secondary CPUs.
The deadlock occurs because the set_cpu_online() is called before
synchronise_count_slave(). This can cause a deadlock if the boot CPU,
having scheduled another thread, attempts to send an IPI to the
secondary CPU, which it sees has been marked online. The secondary is
blocked in synchronise_count_slave() waiting for the boot CPU to enter
synchronise_count_master(), but the boot cpu is blocked in
smp_call_function_many() waiting for the secondary to respond to it's
IPI request.
Fix this by marking the CPU online in cpu_callin_map and synchronising
counters before declaring the CPU online and calculating the maps for
IPIs.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reported-by: Justin Chen <justinpopo6@gmail.com>
Tested-by: Justin Chen <justinpopo6@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: stable@vger.kernel.org # v4.1+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14302/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When faulting on some trap, the kernel currently reports in dmesg:
do_page_fault() command='perl' type=6 address=0xbe400403 in libcrypt-2.24.so[f9086000+9000]
vm_start = 0x00922000, vm_end = 0x00aed000
With this change the trap type additionally gets reported as human readable
string which makes it simpler to recognize the type of problem:
do_page_fault() command='perl' type=6 address=0xbe400403 in libcrypt-2.24.so[f9086000+9000]
trap #6: Instruction TLB miss fault, vm_start = 0x00922000, vm_end = 0x00aed000
Signed-off-by: Helge Deller <deller@gmx.de>
Pull perf fixes from Thomas Gleixner:
"Three fixlets for perf:
- add a missing NULL pointer check in the intel BTS driver
- make BTS an exclusive PMU because BTS can only handle one event at
a time
- ensure that exclusive events are limited to one PMU so that several
exclusive events can be scheduled on different PMU instances"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Limit matching exclusive events to one PMU
perf/x86/intel/bts: Make it an exclusive PMU
perf/x86/intel/bts: Make sure debug store is valid
Pull locking fixes from Thomas Gleixner:
"Two smallish fixes:
- use the proper asm constraint in the Super-H atomic_fetch_ops
- a trivial typo fix in the Kconfig help text"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()
Pull EFI fixes from Thomas Gleixner:
"Two fixes for EFI/PAT:
- a 32bit overflow bug in the PAT code which was unearthed by the
large EFI mappings
- prevent a boot hang on large systems when EFI mixed mode is enabled
but not used"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Only map RAM into EFI page tables if in mixed-mode
x86/mm/pat: Prevent hang during boot when mapping pages
In case of error, the function platform_device_register_simple()
returns ERR_PTR() and never returns NULL. The NULL test in the
return value check must therefor be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Vadim Pasternak <vadimp@mellanox.com>
Cc: platform-driver-x86@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus reported the following objtool warning:
kernel/signal.o: warning: objtool: .altinstr_replacement+0x54: call without frame pointer save/setup
The warning is valid. It's caused by the fact that gcc placed the call
instruction in alternative_call_2()'s inline asm before the frame
pointer setup, which breaks frame pointer convention and can result in a
bad stack trace.
Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160923214939.j5o7c67nhepzmh3t@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull arm64 fixes from Catalin Marinas:
"A couple of last-minute arm64 fixes for 4.8:
- Fix secondary CPU to NUMA node assignment
- Fix kgdb breakpoint insertion in read-only text sections (when
CONFIG_DEBUG_RODATA or CONFIG_DEBUG_SET_MODULE_RONX are enabled)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kgdb: handle read-only text / modules
arm64: Call numa_store_cpu_info() earlier.
In the mipsr2_decoder() function, used to emulate pre-MIPSr6
instructions that were removed in MIPSr6, the init_fpu() function is
called if a removed pre-MIPSr6 floating point instruction is the first
floating point instruction used by the task. However, init_fpu()
performs varous actions that rely upon not being migrated. For example
in the most basic case it sets the coprocessor 0 Status.CU1 bit to
enable the FPU & then loads FP register context into the FPU registers.
If the task were to migrate during this time, it may end up attempting
to load FP register context on a different CPU where it hasn't set the
CU1 bit, leading to errors such as:
do_cpu invoked from kernel context![#2]:
CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G D 4.7.0-00424-g49b0c82 #2
task: 838e4000 ti: 88d38000 task.ti: 88d38000
$ 0 : 00000000 00000001 ffffffff 88d3fef8
$ 4 : 838e4000 88d38004 00000000 00000001
$ 8 : 3400fc01 801f8020 808e9100 24000000
$12 : dbffffff 807b69d8 807b0000 00000000
$16 : 00000000 80786150 00400fc4 809c0398
$20 : 809c0338 0040273c 88d3ff28 808e9d30
$24 : 808e9d30 00400fb4
$28 : 88d38000 88d3fe88 00000000 8011a2ac
Hi : 0040273c
Lo : 88d3ff28
epc : 80114178 _restore_fp+0x10/0xa0
ra : 8011a2ac mipsr2_decoder+0xd5c/0x1660
Status: 1400fc03 KERNEL EXL IE
Cause : 1080002c (ExcCode 0b)
PrId : 0001a920 (MIPS I6400)
Modules linked in:
Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0)
Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338
808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000
004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28
808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000
00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20
...
Call Trace:
[<80114178>] _restore_fp+0x10/0xa0
[<8011a2ac>] mipsr2_decoder+0xd5c/0x1660
[<8010de18>] do_ri+0x90/0x6b8
[<80105c20>] ret_from_exception+0x0/0x10
Fix this by disabling preemption around the call to init_fpu(), ensuring
that it starts & completes on one CPU.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b0a668fb20 ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6")
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.0+
Patchwork: https://patchwork.linux-mips.org/patch/14305/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Erratum A-008585 says that the ARM generic timer counter "has the
potential to contain an erroneous value for a small number of core
clock cycles every time the timer value changes". Accesses to TVAL
(both read and write) are also affected due to the implicit counter
read. Accesses to CVAL are not affected.
The workaround is to reread TVAL and count registers until successive
reads return the same value. Writes to TVAL are replaced with an
equivalent write to CVAL.
The workaround is to reread TVAL and count registers until successive reads
return the same value, and when writing TVAL to retry until counter
reads before and after the write return the same value.
The workaround is enabled if the fsl,erratum-a008585 property is found in
the timer node in the device tree. This can be overridden with the
clocksource.arm_arch_timer.fsl-a008585 boot parameter, which allows KVM
users to enable the workaround until a mechanism is implemented to
automatically communicate this information.
This erratum can be found on LS1043A and LS2080A.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Scott Wood <oss@buserror.net>
[will: renamed read macro to reflect that it's not usually unstable]
Signed-off-by: Will Deacon <will.deacon@arm.com>
A new accessor for gic_write_bpr1 is added to arch_gicv3.h in 4.9,
whilst the CP15 accessors are redifined in a separate branch.
This leads to a horrible clash, where the new accessor ends up with
a crap "asm volatile" definition.
Work around this by carrying our own definition of gic_write_bpr1,
creating a small conflict which will be obvious to resolve.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull "few minor fixes for omap dts files for v4.9 merge window"
Few fixes for omap dts files for v4.9 merge window. Let's also add
the tilcdc quirks:
- Fix typo with recent beagleboard-x15 changes for mmc2_pins_default
- Add am335x blue-and-red-wiring quirk as specified in the binding in
Documentation/devicetree/bindings/display/tilcdc/tilcdc.txt. Also
fix up the whitespace formatting for am335x-evmsk.
- Fix for recent igepv5 power button for GPIO_ACTIVE_LOW. Also fix
up the whitespace formatting for the button
* tag 'omap-for-v4.9/dt-pt3-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: omap5-igep0050.dts: Use tabs for indentation
ARM: dts: Fix igepv5 power button GPIO direction
ARM: dts: am335x-evmsk: Add blue-and-red-wiring -property to lcdc node
ARM: dts: am335x-evmsk: Whitespace cleanup of lcdc related nodes
ARM: dts: am335x-evm: Add blue-and-red-wiring -property to lcdc node
ARM: dts: am57xx-beagle-x15-common: Fix wrong pinctrl selection for mmc2
Return value of class_create should be considered in module init function.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
According to commit ef158bdf83
("mtd: Remove unused symbol CONFIG_MTDRAM_ABS_POS")
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
We unlocked a few lines earlier so we can delete these unlocks.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Add set_dev_domain_options() to set PCI domain-specific options as devices
are added. The first usage is to request exclusive userspace control of
PCIe hotplug indicators in VMD domains.
Devices in a VMD domain use PCIe hotplug Attention and Power Indicators in
a non-standard way; tell pciehp to ignore the indicators so userspace can
control them via the sysfs "attention" file.
To determine whether a bus is within a VMD domain, add a bool to the
pci_sysdata structure that the VMD driver sets during initialization.
[bhelgaas: changelog]
Requested-by: Kapil Karkra <kapil.karkra@intel.com>
Tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This file was only including module.h for exception table related
functions. We've now separated that content out into its own file
"extable.h" so now move over to that and avoid all the extra header
content in module.h that we don't really need to compile this file.
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: linux-cris-kernel@axis.com
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
The Kconfig CONFIG_ETRAX_AXISFLASHMAP_MTD0WHOLE
does not exist for crisv10, so remove it.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
commit d47529b2e9
"gpio: don't include module.h in shared driver header"
removed <linux/module.h> from the <linux/gpio/driver.h> header.
It seems arch/arm/mach-omap2/board-rx51-peripherals.c
is using __initdata_or_module from <linux/module.h> through
<linux/gpio.h> to <linux/gpio/driver.h>, so break this dependency
so that we get a clean compile.
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Lindgren <tony@atomide.com>
Fixes: d47529b2e9 ("gpio: don't include module.h in shared driver header")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Handle read-only cases when CONFIG_DEBUG_RODATA (4.0) or
CONFIG_DEBUG_SET_MODULE_RONX (3.18) are enabled by using
aarch64_insn_write() instead of probe_kernel_write() as introduced by
commit 2f896d5866 ("arm64: use fixmap for text patching") in 4.0.
Fixes: 11d91a770f ("arm64: Add CONFIG_DEBUG_SET_MODULE_RONX support")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online. Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.
When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes. The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array. This results in:
[ 0.881970] Unable to handle kernel paging request at virtual address fffffb1008b926a4
[ 1.970095] pgd = fffffc00094b0000
[ 1.973530] [fffffb1008b926a4] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[ 1.982610] Internal error: Oops: 96000004 [#1] SMP
[ 1.987541] Modules linked in:
[ 1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: G W 4.8.0-rc6-preempt-vol+ #9
[ 1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[ 2.005159] task: fffffe0fe89cc300 task.stack: fffffe0fe8b8c000
[ 2.011158] PC is at try_to_wake_up+0x194/0x34c
[ 2.015737] LR is at try_to_wake_up+0x150/0x34c
[ 2.020318] pc : [<fffffc00080e7468>] lr : [<fffffc00080e7424>] pstate: 600000c5
[ 2.027803] sp : fffffe0fe8b8fb10
[ 2.031149] x29: fffffe0fe8b8fb10 x28: 0000000000000000
[ 2.036522] x27: fffffc0008c63bc8 x26: 0000000000001000
[ 2.041896] x25: fffffc0008c63c80 x24: fffffc0008bfb200
[ 2.047270] x23: 00000000000000c0 x22: 0000000000000004
[ 2.052642] x21: fffffe0fe89d25bc x20: 0000000000001000
[ 2.058014] x19: fffffe0fe89d1d00 x18: 0000000000000000
[ 2.063386] x17: 0000000000000000 x16: 0000000000000000
[ 2.068760] x15: 0000000000000018 x14: 0000000000000000
[ 2.074133] x13: 0000000000000000 x12: 0000000000000000
[ 2.079505] x11: 0000000000000000 x10: 0000000000000000
[ 2.084879] x9 : 0000000000000000 x8 : 0000000000000000
[ 2.090251] x7 : 0000000000000040 x6 : 0000000000000000
[ 2.095621] x5 : ffffffffffffffff x4 : 0000000000000000
[ 2.100991] x3 : 0000000000000000 x2 : 0000000000000000
[ 2.106364] x1 : fffffc0008be4c24 x0 : ffffff0ffffada80
[ 2.111737]
[ 2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfffffe0fe8b8c020)
[ 2.120102] Stack: (0xfffffe0fe8b8fb10 to 0xfffffe0fe8b90000)
[ 2.125914] fb00: fffffe0fe8b8fb80 fffffc00080e7648
.
.
.
[ 2.442859] Call trace:
[ 2.445327] Exception stack(0xfffffe0fe8b8f940 to 0xfffffe0fe8b8fa70)
[ 2.451843] f940: fffffe0fe89d1d00 0000040000000000 fffffe0fe8b8fb10 fffffc00080e7468
[ 2.459767] f960: fffffe0fe8b8f980 fffffc00080e4958 ffffff0ff91ab200 fffffc00080e4b64
[ 2.467690] f980: fffffe0fe8b8f9d0 fffffc00080e515c fffffe0fe8b8fa80 0000000000000000
[ 2.475614] f9a0: fffffe0fe8b8f9d0 fffffc00080e58e4 fffffe0fe8b8fa80 0000000000000000
[ 2.483540] f9c0: fffffe0fe8d10000 0000000000000040 fffffe0fe8b8fa50 fffffc00080e5ac4
[ 2.491465] f9e0: ffffff0ffffada80 fffffc0008be4c24 0000000000000000 0000000000000000
[ 2.499387] fa00: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000040
[ 2.507309] fa20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.515233] fa40: 0000000000000000 0000000000000000 0000000000000000 0000000000000018
[ 2.523156] fa60: 0000000000000000 0000000000000000
[ 2.528089] [<fffffc00080e7468>] try_to_wake_up+0x194/0x34c
[ 2.533723] [<fffffc00080e7648>] wake_up_process+0x28/0x34
[ 2.539275] [<fffffc00080d3764>] create_worker+0x110/0x19c
[ 2.544824] [<fffffc00080d69dc>] alloc_unbound_pwq+0x3cc/0x4b0
[ 2.550724] [<fffffc00080d6bcc>] wq_update_unbound_numa+0x10c/0x1e4
[ 2.557066] [<fffffc00080d7d78>] workqueue_online_cpu+0x220/0x28c
[ 2.563234] [<fffffc00080bd288>] cpuhp_invoke_callback+0x6c/0x168
[ 2.569398] [<fffffc00080bdf74>] cpuhp_up_callbacks+0x44/0xe4
[ 2.575210] [<fffffc00080be194>] cpuhp_thread_fun+0x13c/0x148
[ 2.581027] [<fffffc00080dfbac>] smpboot_thread_fn+0x19c/0x1a8
[ 2.586929] [<fffffc00080dbd64>] kthread+0xdc/0xf0
[ 2.591776] [<fffffc0008083380>] ret_from_fork+0x10/0x50
[ 2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
[ 2.603464] ---[ end trace 58c0cd36b88802bc ]---
[ 2.608138] Kernel panic - not syncing: Fatal exception
Fix by moving call to numa_store_cpu_info() for all CPUs into
smp_prepare_cpus(), which happens before wq_numa_init(). Since
smp_store_cpu_info() now contains only a single function call,
simplify by removing the function and out-lining its contents.
Suggested-by: Robert Richter <rric@kernel.org>
Fixes: 1a2db30034 ("arm64, numa: Add NUMA support for arm64 platforms.")
Cc: <stable@vger.kernel.org> # 4.7.x-
Signed-off-by: David Daney <david.daney@cavium.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Tested-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Run kvm-unit-tests/eventinj.flat in L1:
Sending NMI to self
After NMI to self
FAIL: NMI
This test scenario is to test whether VMM can handle NMI IDT-vectoring info correctly.
At the beginning, L2 writes LAPIC to send a self NMI, the EPT page tables on both L1
and L0 are empty so:
- The L2 accesses memory can generate EPT violation which can be intercepted by L0.
The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is
recorded in vmcs02's IDT-vectoring info.
- L0 walks L1's EPT12 and L0 sees the mapping is invalid, it injects the EPT violation into L1.
The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since
it is a nested vmexit.
- L1 receives the EPT violation, then fixes its EPT12.
- L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exits to L0.
- L0 emulates VMRESUME which is called from L1, then return to L2.
L0 merges the requirement of vmcs12's IDT-vectoring info and injects it to L2 through
vmcs02.
- The L2 re-executes the fault instruction and cause EPT violation again.
- Since the L1's EPT12 is valid, L0 can fix its EPT02
- L0 resume L2
The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info
is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry
event injection since it is caused by EPT02's EPT violation.
However, vmx_inject_nmi() refuses to inject NMI from IDT-vectoring info if vCPU is in
guest mode, this patch fix it by permitting to inject NMI from IDT-vectoring if it is
the L0's responsibility to inject NMI from IDT-vectoring info to L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used
to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case
on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f95
x86, apicv: add virtual x2apic support) disable virtual x2apic mode
completely if w/o APICv, and the author also told me that windows guest
can't enter into x2apic mode when he developed the APICv feature several
years ago. However, it is not truth currently, Interrupt Remapping and
vIOMMU is added to qemu and the developers from Intel test windows 8 can
work in x2apic mode w/ Interrupt Remapping enabled recently.
This patch enables TPR shadow for virtual x2apic mode to boost
windows guest in x2apic mode even if w/o APICv.
Can pass the kvm-unit-test.
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_fmt+0x4f/0x60
? prepare_to_swait+0x39/0xa0
? prepare_to_swait+0x39/0xa0
__might_sleep+0x7e/0x80
__gfn_to_pfn_memslot+0x156/0x480 [kvm]
gfn_to_pfn+0x2a/0x30 [kvm]
gfn_to_page+0xe/0x20 [kvm]
kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
? _raw_spin_unlock_irqrestore+0x36/0x80
vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
kvm_vcpu_check_block+0x12/0x60 [kvm]
kvm_vcpu_block+0x94/0x4c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/4230:
#0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]
stack backtrace:
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
dump_stack+0x99/0xd0
lockdep_rcu_suspicious+0xe7/0x120
gfn_to_memslot+0x12a/0x140 [kvm]
gfn_to_pfn+0x12/0x30 [kvm]
gfn_to_page+0xe/0x20 [kvm]
kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
? _raw_spin_unlock_irqrestore+0x36/0x80
vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
kvm_vcpu_check_block+0x12/0x60 [kvm]
kvm_vcpu_block+0x94/0x4c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0xfd/0x210
? __lock_is_held+0x54/0x70
do_vfs_ioctl+0x96/0x6a0
? __fget+0x11c/0x210
? __fget+0x5/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x81/0x220
entry_SYSCALL64_slow_path+0x25/0x25
These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat
The nested preemption timer is based on hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the new check_nested_events
hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE)
if need to yield pCPU and w/o holding srcu read lock when accesses memslots,
both can be in nested preemption timer evaluation path which results in
the warning above.
This patch fix it by leveraging request bit to async reload APIC access
page before vmentry in order to avoid to reload directly during the nested
preemption timer evaluation, it is safe since the vmcs01 is loaded and
current is nested vmexit.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The same logic appears twice and should probably be pulled out into a
function.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com>
[mpe: Rename to tm_flush_hash_page() and move comment into the function]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CLR_TOP32() is defined as blank. Last useful instance of CLR_TOP32()
was removed by commit 40ef8cbc6d ("powerpc: Get 64-bit configs to
compile with ARCH=powerpc") in 2005.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CPUs like the 8xx, _PAGE_RW hence _PAGE_WRITE is defined
as 0 and _PAGE_RO has to be set when a page is not writable
_PAGE_RO is defined by default in pte-common.h, however BOOK3S/64
doesn't include that file so _PAGE_RO has to be defined explicitly
in book3s/64/pgtable.h
Fixes: a7b9f671f2 ("powerpc32: adds handling of _PAGE_RO")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In eeh_handle_special_event(), eeh_pe_bus_get() is called before calling
eeh_report_failure() on every device under a PE. If a PE was missing a
bus for some reason, the error would occur before reporting failure, even
though eeh_report_failure() doesn't require a bus.
Fix this by moving the bus retrieval and error check after the
eeh_report_failure() calls.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the PE used in pnv_eeh_reset() is that of a VF,
pnv_eeh_reset_vf_pe() is used. Unlike the other reset functions called
in pnv_eeh_reset(), the VF reset doesn't require a bus, and if a bus was
missing the function would error out before resetting the VF PE.
To avoid this, reorder the VF reset function to occur before finding and
checking the bus.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.
Fix this by checking NULL everywhere eeh_pe_bus_get() is called.
Fixes: 8a6b1bc70d ("powerpc/eeh: EEH core to handle special event")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we originally added the ability to split the exception vectors from
the kernel (commit 1f6a93e4c3 ("powerpc: Make it possible to move the
interrupt handlers away from the kernel" 2008-09-15)), the LOAD_HANDLER() macro
used an addi instruction to compute the offset of the common handler
from the kernel base address.
Using addi meant the handler had to be within 32K of the kernel base
address, due to the addi instruction taking a signed immediate value.
That necessitated creating a trampoline for the system call handler,
because system_call_common (in entry64.S) is not linked within 32K of
the kernel base address.
Later in commit 61e2390ede ("powerpc: Make load_hander handle upto 64k
offset" 2012-11-15) we changed LOAD_HANDLER to take a 64K offset, by
changing it to use ori.
Although system_call_common is not in head_64.S or exceptions-64s.S, it
is included in head-y, which causes it to be linked early in the kernel
text, so in practice it ends up below 64K. Additionally if it can't be
placed below 64K the linker will fail to build with a "relocation
truncated to fit" error.
So remove the trampoline.
Newer toolchains are able to work out that the ori in LOAD_HANDLER only
takes a 16 bit offset, and so they generate a 16 bit relocation. Older
toolchains (binutils 2.22 at least) are not so smart, so we have to add
the @l annotation to tell the assembler to generate a 16 bit relocation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The 0xf80 hv_facility_unavailable trampoline branches to the 0xf60
handler. This works because they both do the same thing, but it should
be fixed.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On EEH events the kernel will print a dump of relevant registers.
If EEH is unavailable (i.e. CONFIG_EEH is disabled, a new platform
doesn't have EEH support, etc) this information isn't readily available.
Add a new debugfs handler to trigger a PHB register dump, so that this
information can be made available on demand.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The only difference is now the TCE table check which doesn't need
to be ifdef'ed out, it will basically do nothing on BookE (it is
only useful for ancient IBM machines).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we turn the MMU off after copying the image, and we make
sure there is no overlap between the hash table and the target pages
in that case.
That doesn't work for Radix however. In that case, the page tables
are scattered and we can't really enforce that the target of the
image isn't overlapping one of them.
So instead, let's turn the MMU off before copying the image in radix
mode. Thankfully, in radix mode, even under a hypervisor, we know we
don't have the same kind of RMA limitations that hash mode has.
While at it, also turn the MMU off early when using hash in non-LPAR
mode, that way we can get rid of the collision check completely.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>