These are not local timer interrupts but IPIs. It's good to be able
to see how timer offloading is behaving, so split these out into
their own category.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The broadcast tick recipient can call tick_receive_broadcast rather
than re-running the full timer interrupt.
It does not have to check for the next event time, because the sender
already determined the timer has expired. It does not have to test
irq_work_pending, because that's a direct decrementer interrupt and
does not go through the clock events subsystem. And it does not have
to read PURR because that was removed with the previous patch.
This results in no code size change, but both the decrementer and
broadcast path lengths are reduced.
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs.
Currently this is done by reading PURR in context switch and timer
interrupt, and storing that into a per-CPU variable. These are summed
to provide the value.
This does not work with all timer schemes (e.g., NO_HZ_FULL), and it
is sub-optimal for performance because it reads the PURR register on
every context switch, although that's been difficult to distinguish
from noise in the context_switch microbenchmark.
This patch implements the sum by calling a function on each CPU that
reads that CPU's PURR and adds it to the total.
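Here is an illustrative userspace sketch of that approach. The simulated CPU array, `cpu_get_purr()` and `sum_purr()` are hypothetical stand-ins. The plain loop replaces what the kernel would presumably do with a cross-CPU call such as `on_each_cpu()`. The atomic add matters because, on real hardware, the per-CPU callbacks can run concurrently.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: a fixed number of simulated CPUs, each with a PURR value. */
#define NR_CPUS_SIM 4

static const uint64_t simulated_purr[NR_CPUS_SIM] = { 100, 250, 75, 575 };

/* Runs "on" one CPU: read that CPU's PURR and add it to the shared sum. */
static void cpu_get_purr(int cpu, _Atomic uint64_t *sum)
{
	atomic_fetch_add(sum, simulated_purr[cpu]);
}

/* Caller side: invoke the callback for every CPU, then report the total. */
static uint64_t sum_purr(void)
{
	_Atomic uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		cpu_get_purr(cpu, &sum);
	return atomic_load(&sum);
}
```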
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Book3S minimum supported ISA version now requires mtmsrd L=1. This
instruction does not require bits other than RI and EE to be supplied,
so __hard_irq_enable() and __hard_irq_disable() do not have to read
the kernel_msr from paca.
Interrupt entry code already relies on L=1 support.
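The following small model of `mtmsrd` with L=1 shows why the paca read becomes unnecessary. It assumes the Linux `MSR_EE`/`MSR_RI` masks from asm/reg.h, and the function name is hypothetical. Only EE and RI are copied from the source register, so enabling or disabling interrupts can pass a constant instead of a value derived from kernel_msr.

```c
#include <stdint.h>

/* Masks as defined in Linux asm/reg.h: MSR_EE is bit 15, MSR_RI is bit 1. */
#define MSR_EE (1ULL << 15)
#define MSR_RI (1ULL << 1)

/*
 * Hypothetical model of "mtmsrd rS,1": EE and RI are taken from rS,
 * every other MSR bit keeps its current value.
 */
static uint64_t mtmsrd_l1(uint64_t msr, uint64_t rs)
{
	const uint64_t mask = MSR_EE | MSR_RI;

	return (msr & ~mask) | (rs & mask);
}
```

Under this model, `mtmsrd_l1(msr, MSR_EE | MSR_RI)` enables interrupts and `mtmsrd_l1(msr, MSR_RI)` hard-disables them, regardless of what else is set in the MSR.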
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This check does not catch IRQ soft mask bugs, but this option is
slightly more suitable than TRACE_IRQFLAGS.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is a branch with a mixture of mm, x86 and powerpc commits all
relating to some minor cross-arch pkeys consolidation. The x86/mm
changes have been reviewed by Ingo & Dave Hansen and the tree has been
in linux-next for some weeks without issue.
We ended up with an ugly conflict between fixes and next in ftrace.h
involving multiple nested ifdefs, and the automatic resolution is
wrong. So merge fixes into next so we can fix it up.
Fix the crash below on Book3E 64. pgtable_page_dtor() expects a
struct page * argument.
Also call the destructor correctly on non-Book3S platforms. This frees
up the split PTL locks if we had allocated them before.
Call Trace:
.kmem_cache_free+0x9c/0x44c (unreliable)
.ptlock_free+0x1c/0x30
.tlb_remove_table+0xdc/0x224
.free_pgd_range+0x298/0x500
.shift_arg_pages+0x10c/0x1e0
.setup_arg_pages+0x200/0x25c
.load_elf_binary+0x450/0x16c8
.search_binary_handler.part.11+0x9c/0x248
.do_execveat_common.isra.13+0x868/0xc18
.run_init_process+0x34/0x4c
.try_to_run_init_process+0x1c/0x68
.kernel_init+0xdc/0x130
.ret_from_kernel_thread+0x58/0x7c
Fixes: 702346768 ("powerpc/mm/nohash: Remove pte fragment dependency from nohash")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the guest kernel doesn't handle the TAR facility unavailable
interrupt and always runs with the TAR bit on. PR KVM enables TAR
lazily. TAR is not a frequently used register and it is not included
in the SVCPU struct.
Due to the above, the checkpointed TAR value might be bogus.
To solve this issue, we make the TAR bit in vcpu->arch.fscr consistent
with shadow_fscr when TM is enabled.
At the end of emulating treclaim., the correct TAR value needs to be
loaded into the register if the FSCR_TAR bit is on.
At the beginning of emulating trechkpt., TAR needs to be flushed so
that the right TAR value can be copied into tar_tm.
Tested with:
tools/testing/selftests/powerpc/tm/tm-tar
tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar (remove DSCR/PPR
related testing).
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This patch adds host emulation when guest PR KVM executes "trechkpt.",
which is a privileged instruction and will trap into host.
We first copy the vCPU's current register state into its TM checkpoint
state, then call kvmppc_restore_tm_pr() to do trechkpt. with the
updated checkpoint values.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently the kernel doesn't use transactional memory.
There is an issue for privileged state in the guest:
tbegin/tsuspend/tresume/tabort TM instructions can change the MSR TM
bits without trapping into the PR host. So the following code will
produce a false mfmsr result:
tbegin <- MSR bits update to Transaction active.
beq <- failover handler branch
mfmsr <- still read MSR bits from magic page with
transaction inactive.
It is not an issue for non-privileged guest state since its mfmsr is
not patched with magic page and will always trap into the PR host.
This patch makes tbegin always fail for privileged state in the
guest, so that the above issue is prevented. This is benign since the
(guest) kernel currently doesn't initiate transactions.
Test case:
https://github.com/justdoitqd/publicFiles/blob/master/test_tbegin_pr.c
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
mfspr/mtspr on the TM SPRs (TEXASR/TFIAR/TFHAR) are non-privileged
instructions and can be executed by a PR KVM guest in problem state
without trapping into the host. We only emulate mtspr/mfspr of
TEXASR/TFIAR/TFHAR in guest PR=0 state.
When emulating mtspr to the TM SPRs in guest PR=0 state, the emulation
result needs to be visible to guest PR=1 state. That is, the TM SPR
value should be loaded into the actual registers.
We already flush the TM SPRs into the vcpu when switching out of the
CPU, and load them when switching back.
This patch corrects the mfspr()/mtspr() emulation for TM SPRs so that
the source/destination is the real TM SPRs.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The transactional memory checkpoint area save/restore is triggered
when the vCPU's QEMU process is switched out of/into a CPU, i.e.
at kvmppc_core_vcpu_put_pr() and kvmppc_core_vcpu_load_pr().
MSR TM active state is determined by the TS bits:
active: 10 (transactional) or 01 (suspended)
inactive: 00 (non-transactional)
We don't "fake" TM functionality for the guest. We "sync" the guest
virtual MSR TM active state (10 or 01) with the shadow MSR. That is to
say, we don't emulate a transactional guest with a TM inactive MSR.
The TM SPRs (TFIAR/TFHAR/TEXASR) are already supported by
commit 9916d57e64 ("KVM: PPC: Book3S PR: Expose TM registers").
Math register support (FPR/VMX/VSX) will be done in a subsequent
patch.
Whether the TM context needs to be saved/restored can be determined
by the kvmppc_get_msr() TM active state:
* TM active - save/restore TM context
* TM inactive - no need to do so and only save/restore
TM SPRs.
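The sketch below shows the TS decoding and the save/restore decision described above. The bit positions follow the Linux `MSR_TS_T`/`MSR_TS_S` definitions (bits 34 and 33). The enum and `tm_action_for()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR[TS] field, bit positions as in Linux asm/reg.h. */
#define MSR_TS_T    (1ULL << 34)  /* TS = 10: transactional */
#define MSR_TS_S    (1ULL << 33)  /* TS = 01: suspended */
#define MSR_TS_MASK (MSR_TS_T | MSR_TS_S)

/* TM is active when either TS bit is set (10 or 01); 00 means inactive. */
static bool msr_tm_active(uint64_t msr)
{
	return (msr & MSR_TS_MASK) != 0;
}

/* Hypothetical: what to save/restore for a guest with this MSR value. */
enum tm_save_action { SAVE_TM_SPRS_ONLY, SAVE_FULL_TM_CONTEXT };

static enum tm_save_action tm_action_for(uint64_t guest_msr)
{
	return msr_tm_active(guest_msr) ? SAVE_FULL_TM_CONTEXT
					: SAVE_TM_SPRS_ONLY;
}
```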
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently the __kvmppc_save/restore_tm() APIs can only be invoked from
assembly functions. This patch adds C function wrappers for them so
that they can be safely called from C functions.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This merges in the ppc-kvm topic branch of the powerpc repository
to get some changes on which future patches will depend, in particular
some new exports and TEXASR bit definitions.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The toc field in the mod_arch_specific struct isn't actually used
anywhere, so remove it.
Also the ftrace-specific fields are now common between 32-bit and
64-bit, so simplify the struct definition a bit by moving them out of
the __powerpc64__ #ifdef.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull KVM fixes from Radim Krčmář:
"PPC:
- Close a hole which could possibly lead to the host timebase getting
out of sync.
- Three fixes relating to PTEs and TLB entries for radix guests.
- Fix a bug which could lead to an interrupt never getting delivered
to the guest, if it is pending for a guest vCPU when the vCPU gets
offlined.
s390:
- Fix false negatives in VSIE validity check (Cc stable)
x86:
- Fix time drift of VMX preemption timer when a guest uses LAPIC
timer in periodic mode (Cc stable)
- Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
migration from hosts that don't need retpoline mitigation (Cc
stable)
- Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
CPUID.OSXSAVE (Cc stable)
- Report correct RIP after Hyper-V hypercall #UD (introduced in
-rc6)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix #UD address of failed Hyper-V hypercalls
kvm: x86: IA32_ARCH_CAPABILITIES is always supported
KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
x86/kvm: fix LAPIC timer drift when guest uses periodic mode
KVM: s390: vsie: fix < 8k check for the itdba
KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change
KVM: PPC: Book3S HV: Make radix clear pte when unmapping
KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
Add the missing prototype for rh_dump_blk(). This fixes the following
warning, treated as an error with W=1:
arch/powerpc/lib/rheap.c:740:6: error: no previous prototype for ‘rh_dump_blk’ [-Werror=missing-prototypes]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function prototypes were declared within a `#ifdef CONFIG_PPC_LITE5200`
block which would prevent them from being visible when compiling
`mpc52xx_pm.c`. Move the prototypes outside of the `#ifdef` block to fix
the following warnings treated as errors with W=1:
arch/powerpc/platforms/52xx/mpc52xx_pm.c:58:5: error: no previous prototype for ‘mpc52xx_pm_prepare’ [-Werror=missing-prototypes]
arch/powerpc/platforms/52xx/mpc52xx_pm.c:113:5: error: no previous prototype for ‘mpc52xx_pm_enter’ [-Werror=missing-prototypes]
arch/powerpc/platforms/52xx/mpc52xx_pm.c:181:6: error: no previous prototype for ‘mpc52xx_pm_finish’ [-Werror=missing-prototypes]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The pmac_pfunc_base_install() prototype was declared in powermac/smp.c
because the function was used there. Move it to the pmac_pfunc.h header
so that it is also visible in pfunc_base.c. This fixes the following
warning, treated as an error with W=1:
arch/powerpc/platforms/powermac/pfunc_base.c:330:12: error: no previous prototype for ‘pmac_pfunc_base_install’ [-Werror=missing-prototypes]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Trivial fix to remove the following sparse warnings:
arch/powerpc/kernel/module_32.c:112:74: warning: Using plain integer as NULL pointer
arch/powerpc/kernel/module_32.c:117:74: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:1155:28: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:1230:20: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:1385:36: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:1752:23: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:2084:19: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:2110:32: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:2167:19: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:2183:19: warning: Using plain integer as NULL pointer
drivers/macintosh/via-pmu.c:277:20: warning: Using plain integer as NULL pointer
arch/powerpc/platforms/powermac/setup.c:155:67: warning: Using plain integer as NULL pointer
arch/powerpc/platforms/powermac/setup.c:247:27: warning: Using plain integer as NULL pointer
arch/powerpc/platforms/powermac/setup.c:249:27: warning: Using plain integer as NULL pointer
arch/powerpc/platforms/powermac/setup.c:252:37: warning: Using plain integer as NULL pointer
arch/powerpc/mm/tlb_hash32.c:127:21: warning: Using plain integer as NULL pointer
arch/powerpc/mm/tlb_hash32.c:148:21: warning: Using plain integer as NULL pointer
arch/powerpc/mm/tlb_hash32.c:44:21: warning: Using plain integer as NULL pointer
arch/powerpc/mm/tlb_hash32.c:57:21: warning: Using plain integer as NULL pointer
arch/powerpc/mm/tlb_hash32.c:87:21: warning: Using plain integer as NULL pointer
arch/powerpc/kernel/btext.c:160:31: warning: Using plain integer as NULL pointer
arch/powerpc/kernel/btext.c:167:22: warning: Using plain integer as NULL pointer
arch/powerpc/kernel/btext.c:274:21: warning: Using plain integer as NULL pointer
arch/powerpc/kernel/btext.c:285:31: warning: Using plain integer as NULL pointer
arch/powerpc/include/asm/hugetlb.h:204:16: warning: Using plain integer as NULL pointer
arch/powerpc/mm/ppc_mmu_32.c:170:21: warning: Using plain integer as NULL pointer
arch/powerpc/platforms/powermac/pci.c:1227:23: warning: Using plain integer as NULL pointer
arch/powerpc/platforms/powermac/pci.c:65:24: warning: Using plain integer as NULL pointer
Also use the `--fix` command line option of `scripts/checkpatch.pl --strict`
to remove the following:
CHECK: Comparison to NULL could be written "!dispDeviceBase"
#72: FILE: arch/powerpc/kernel/btext.c:160:
+ if (dispDeviceBase == NULL)
CHECK: Comparison to NULL could be written "!vbase"
#80: FILE: arch/powerpc/kernel/btext.c:167:
+ if (vbase == NULL)
CHECK: Comparison to NULL could be written "!base"
#89: FILE: arch/powerpc/kernel/btext.c:274:
+ if (base == NULL)
CHECK: Comparison to NULL could be written "!dispDeviceBase"
#98: FILE: arch/powerpc/kernel/btext.c:285:
+ if (dispDeviceBase == NULL)
CHECK: Comparison to NULL could be written "strstr"
#117: FILE: arch/powerpc/kernel/module_32.c:117:
+ if (strstr(secstrings + sechdrs[i].sh_name, ".debug") != NULL)
CHECK: Comparison to NULL could be written "!Hash"
#130: FILE: arch/powerpc/mm/ppc_mmu_32.c:170:
+ if (Hash == NULL)
CHECK: Comparison to NULL could be written "Hash"
#143: FILE: arch/powerpc/mm/tlb_hash32.c:44:
+ if (Hash != NULL) {
CHECK: Comparison to NULL could be written "!Hash"
#152: FILE: arch/powerpc/mm/tlb_hash32.c:57:
+ if (Hash == NULL) {
CHECK: Comparison to NULL could be written "!Hash"
#161: FILE: arch/powerpc/mm/tlb_hash32.c:87:
+ if (Hash == NULL) {
CHECK: Comparison to NULL could be written "!Hash"
#170: FILE: arch/powerpc/mm/tlb_hash32.c:127:
+ if (Hash == NULL) {
CHECK: Comparison to NULL could be written "!Hash"
#179: FILE: arch/powerpc/mm/tlb_hash32.c:148:
+ if (Hash == NULL) {
ERROR: space required after that ';' (ctx:VxV)
#192: FILE: arch/powerpc/platforms/powermac/pci.c:65:
+ for (; node != NULL;node = node->sibling) {
CHECK: Comparison to NULL could be written "node"
#192: FILE: arch/powerpc/platforms/powermac/pci.c:65:
+ for (; node != NULL;node = node->sibling) {
CHECK: Comparison to NULL could be written "!region"
#201: FILE: arch/powerpc/platforms/powermac/pci.c:1227:
+ if (region == NULL)
CHECK: Comparison to NULL could be written "of_get_property"
#214: FILE: arch/powerpc/platforms/powermac/setup.c:155:
+ if (of_get_property(np, "cache-unified", NULL) != NULL && dc) {
CHECK: Comparison to NULL could be written "!np"
#223: FILE: arch/powerpc/platforms/powermac/setup.c:247:
+ if (np == NULL)
CHECK: Comparison to NULL could be written "np"
#226: FILE: arch/powerpc/platforms/powermac/setup.c:249:
+ if (np != NULL) {
CHECK: Comparison to NULL could be written "l2cr"
#230: FILE: arch/powerpc/platforms/powermac/setup.c:252:
+ if (l2cr != NULL) {
CHECK: Comparison to NULL could be written "via"
#243: FILE: drivers/macintosh/via-pmu.c:277:
+ if (via != NULL)
CHECK: Comparison to NULL could be written "current_req"
#252: FILE: drivers/macintosh/via-pmu.c:1155:
+ if (current_req != NULL) {
CHECK: Comparison to NULL could be written "!req"
#261: FILE: drivers/macintosh/via-pmu.c:1230:
+ if (req == NULL || pmu_state != idle
CHECK: Comparison to NULL could be written "!req"
#270: FILE: drivers/macintosh/via-pmu.c:1385:
+ if (req == NULL) {
CHECK: Comparison to NULL could be written "!pp"
#288: FILE: drivers/macintosh/via-pmu.c:2084:
+ if (pp == NULL)
CHECK: Comparison to NULL could be written "!pp"
#297: FILE: drivers/macintosh/via-pmu.c:2110:
+ if (count < 1 || pp == NULL)
CHECK: Comparison to NULL could be written "!pp"
#306: FILE: drivers/macintosh/via-pmu.c:2167:
+ if (pp == NULL)
CHECK: Comparison to NULL could be written "pp"
#315: FILE: drivers/macintosh/via-pmu.c:2183:
+ if (pp != NULL) {
Link: https://github.com/linuxppc/linux/issues/37
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some function prototypes were missing for the non-altivec code. Add the
missing prototypes in a new header file, fixing the following warnings
treated as errors with W=1:
arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype for ‘xor_altivec_2’ [-Werror=missing-prototypes]
arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype for ‘xor_altivec_3’ [-Werror=missing-prototypes]
arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype for ‘xor_altivec_4’ [-Werror=missing-prototypes]
arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype for ‘xor_altivec_5’ [-Werror=missing-prototypes]
The prototypes were already present in <asm/xor.h> but this header file is
meant to be included after <include/linux/raid/xor.h>. Trying to re-use
<asm/xor.h> directly would lead to warnings such as:
arch/powerpc/include/asm/xor.h:39:15: error: variable ‘xor_block_altivec’ has initializer but incomplete type
Trying to re-use <asm/xor.h> after <include/linux/raid/xor.h> in
xor_vmx_glue.c would in turn trigger the following warnings:
include/asm-generic/xor.h:688:34: error: ‘xor_block_32regs’ defined but not used [-Werror=unused-variable]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This allows the compiler to verify the format strings vs the types of
the arguments.
Update the other prototype declarations in asm/xmon.h.
Silence the warnings (triggered at W=1) by adding the relevant __printf
attribute. Move the #define to the bottom of the file to prevent a
conflict with the gcc attribute.
This solves the original warning:
arch/powerpc/xmon/nonstdio.c:178:2: error: function might be
possible candidate for ‘gnu_printf’ format attribute
In turn this uncovered many formatting errors in xmon.c, all fixed in
this patch.
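The sketch below shows what such an annotation looks like in userspace, assuming a GCC-compatible compiler. `PRINTF_ATTR` mirrors the shape of the kernel's `__printf()` macro, and `fmt_buf()` is a hypothetical wrapper. With the attribute in place, passing something like a pointer for `%d` triggers a `-Wformat` warning. The attribute names the `printf` archetype literally, so a `#define printf xmon_printf` placed earlier in the file would presumably be macro-expanded inside it. That is likely the conflict the #define move avoids.

```c
#include <stdarg.h>
#include <stdio.h>

/* Same shape as the kernel's __printf(): format is argument a, varargs start at b. */
#define PRINTF_ATTR(a, b) __attribute__((__format__(printf, a, b)))

/* Hypothetical wrapper: argument 3 is the format string, varargs start at 4. */
static int fmt_buf(char *buf, size_t len, const char *fmt, ...) PRINTF_ATTR(3, 4);

static int fmt_buf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```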
Signed-off-by: Mathieu Malaterre <malat@debian.org>
[mpe: Always use px not p, fixup the 44x specific code, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use symbolic names defined in asm/ppc-opcode.h
instead of hardcoded values.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch exports the tm_enable()/tm_disable()/tm_abort() APIs, which
will be used for PR KVM transactional memory logic.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds some macros for CR0/TEXASR bits so that PR KVM TM
logic (tbegin./treclaim./tabort.) can make use of them later.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch reimplements LOAD_VMX/STORE_VMX MMIO emulation with
analyse_instr() input. When emulating a store, the VMX register needs to
be flushed so that the correct register value can be retrieved before
writing to I/O memory.
This patch also adds support for lvebx/lvehx/lvewx/stvebx/stvehx/stvewx
MMIO emulation. To meet the requirement of handling different element
sizes, kvmppc_handle_load128_by2x64()/kvmppc_handle_store128_by2x64()
were replaced with kvmppc_handle_vmx_load()/kvmppc_handle_vmx_store().
The framework used is similar to VSX instruction MMIO emulation.
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
VSX MMIO emulation uses mmio_vsx_copy_type to represent VSX emulated
element size/type, such as KVMPPC_VSX_COPY_DWORD_LOAD, etc. This
patch expands mmio_vsx_copy_type to cover VMX copy type, such as
KVMPPC_VMX_COPY_BYTE(stvebx/lvebx), etc. As a result,
mmio_vsx_copy_type is also renamed to mmio_copy_type.
This is preparation for reimplementing VMX MMIO emulation.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently HV KVM saves the math registers (FP/VEC/VSX) when trapping
into the host, but PR KVM only saves them when the QEMU task is switched
out of the CPU, or when returning from QEMU code.
To emulate an FP/VEC/VSX MMIO load, PR KVM needs to make sure that the
math registers have been flushed first, so that it can then update the
saved vCPU FPR/VEC/VSX area correctly.
This patch adds a giveup_ext() field to the KVM ops. Only PR KVM has a
non-NULL giveup_ext() op. kvmppc_complete_mmio_load() can invoke that
hook (when not NULL) to flush the math registers before updating the
saved register values.
Flushing the math registers is also necessary for stores, which will
be covered in a later patch in this series.
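The following sketch shows the optional-hook pattern described above. All structures and function names here are simplified, hypothetical stand-ins rather than the real kvmppc_ops definitions. A backend that leaves `giveup_ext` NULL skips the flush. A PR-style backend flushes before the saved register image is patched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU: a saved FPR image plus a "live state" flag. */
struct vcpu_sim {
	uint64_t saved_fpr[32];  /* saved image that the MMIO load updates */
	int math_live_in_regs;   /* 1 if the live registers hold newer values */
};

/* Hypothetical ops table with an optional hook. */
struct ops_sim {
	void (*giveup_ext)(struct vcpu_sim *vcpu, unsigned long msr);
};

/* PR-style hook: a real backend would save the live FP/VMX/VSX state here. */
static void pr_giveup_ext(struct vcpu_sim *vcpu, unsigned long msr)
{
	(void)msr;  /* would select which facilities to flush */
	vcpu->math_live_in_regs = 0;
}

static void complete_mmio_load_fpr(const struct ops_sim *ops,
				   struct vcpu_sim *vcpu, unsigned int reg,
				   uint64_t val, unsigned long msr)
{
	/* Flush first, so the saved image is current before patching one register. */
	if (ops->giveup_ext)
		ops->giveup_ext(vcpu, msr);
	vcpu->saved_fpr[reg] = val;
}
```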
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This patch reimplements non-SIMD LOAD/STORE instruction MMIO emulation
with analyse_instr() input. It utilizes the BYTEREV/UPDATE/SIGNEXT
properties exported by analyse_instr() and invokes
kvmppc_handle_load(s)/kvmppc_handle_store() accordingly.
It also moves CACHEOP type handling into the skeleton.
instruction_type within kvm_ppc.h is renamed to avoid conflict with
sstep.h.
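The sketch below illustrates roughly how a load's properties might be applied once the MMIO data arrives. It handles byte reversal and sign extension. The `LD_BYTEREV`/`LD_SIGNEXT` flag values and helper names are hypothetical, not the sstep.h definitions. The UPDATE property (writing the effective address back to RA) is left out.

```c
#include <stdint.h>

/* Hypothetical flags standing in for the BYTEREV and SIGNEXT properties. */
#define LD_BYTEREV 0x1u
#define LD_SIGNEXT 0x2u

/* Reverse the order of the low 'size' bytes of v (size is 1, 2, 4 or 8). */
static uint64_t byterev_n(uint64_t v, unsigned int size)
{
	uint64_t r = 0;

	for (unsigned int i = 0; i < size; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Turn 'size' bytes of raw MMIO data into the value written to the GPR. */
static uint64_t finish_load(uint64_t raw, unsigned int size, unsigned int flags)
{
	uint64_t v = raw;

	if (flags & LD_BYTEREV)
		v = byterev_n(v, size);
	if ((flags & LD_SIGNEXT) && size < 8) {
		uint64_t sign = 1ULL << (8 * size - 1);

		v &= (sign << 1) - 1;   /* keep only the loaded bytes */
		v = (v ^ sign) - sign;  /* propagate the sign bit upwards */
	}
	return v;
}
```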
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Some VSX instructions, like lxvwsx, splat a word into every word slot of
a VSR. This patch adds a new VSX copy type KVMPPC_VSX_COPY_WORD_LOAD_DUMP
to support this.
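The splat semantics can be modelled with a 128-bit register viewed as four 32-bit words. `struct vsr128` and `splat_word()` are hypothetical names, and element ordering and endianness are ignored in this sketch.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit VSR as four 32-bit words. */
struct vsr128 {
	uint32_t w[4];
};

/* Word splat: the single loaded word is replicated into all four slots. */
static struct vsr128 splat_word(uint32_t word)
{
	struct vsr128 r;

	for (int i = 0; i < 4; i++)
		r.w[i] = word;
	return r;
}
```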
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
which is done by inserting a barrier in the kernel entry and exit paths.
This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.
Barriers must generally be inserted before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege. Both HV and PR privilege transitions must be protected.
Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilege levels patched
similarly to the RFI flush patching.
Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.
Thanks to Michal Suchánek for bug fixes and review.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds support to read 64-bit sensor values. This method is
used to read energy sensors and counters which are of type u64.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq().
This allows us to avoid sparse warnings caused by passing __be64 to
__raw_writeq(), which takes unsigned long:
arch/powerpc/platforms/powernv/pci-ioda.c:1981:38:
warning: incorrect type in argument 1 (different base types)
expected unsigned long [unsigned] v
got restricted __be64 [usertype] <noident>
It's also generally preferable to use a byte-swapping accessor rather
than doing the swap by hand in the code, which is more bug-prone.
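Here is a userspace sketch of the accessor idea, assuming memcpy-based stand-ins for the MMIO primitives. `raw_writeq()`, `cpu_to_be64_sim()` and `raw_writeq_be()` are hypothetical names, not the kernel's io.h definitions. The caller passes a plain CPU-order value, and the swap happens in one place regardless of host endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __raw_writeq(): store 64 bits in native order. */
static void raw_writeq(uint64_t v, void *addr)
{
	memcpy(addr, &v, sizeof(v));
}

/* Build a value whose in-memory bytes are big-endian, on any host. */
static uint64_t cpu_to_be64_sim(uint64_t v)
{
	uint8_t b[8];
	uint64_t r;

	for (int i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));  /* most significant byte first */
	memcpy(&r, b, sizeof(r));
	return r;
}

/* Byte-swapping accessor: callers never handle a big-endian typed value. */
static void raw_writeq_be(uint64_t v, void *addr)
{
	raw_writeq(cpu_to_be64_sim(v), addr);
}
```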
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
tlbies to an LPAR do not have to be serialised since POWER4/PPC970,
after which the MMU_FTR_LOCKLESS_TLBIE feature was introduced to
avoid tlbie locking.
Since commit c17b98cf60 ("KVM: PPC: Book3S HV: Remove code for
PPC970 processors"), KVM no longer supports processors that do not
have this feature, so the tlbie locking can be removed completely.
A sanity check for the feature is put in kvmppc_mmu_hv_init.
Testing was done on a POWER9 system in HPT mode, with a -smp 32 guest
in HPT mode. 32 instances of the powerpc fork benchmark from selftests
were run with --fork, and the results measured.
Without this patch, total throughput was about 13.5K/sec, and this is
the top of the host profile:
74.52% [k] do_tlbies
2.95% [k] kvmppc_book3s_hv_page_fault
1.80% [k] calc_checksum
1.80% [k] kvmppc_vcpu_run_hv
1.49% [k] kvmppc_run_core
After this patch, throughput was about 51K/sec, with this profile:
21.28% [k] do_tlbies
5.26% [k] kvmppc_run_core
4.88% [k] kvmppc_book3s_hv_page_fault
3.30% [k] _raw_spin_lock_irqsave
3.25% [k] gup_pgd_range
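For scale, the reported numbers work out to roughly a 3.8x throughput improvement. Over the same change, do_tlbies drops from about three quarters of the host profile to about a fifth:

```latex
\frac{51\,\text{K/sec}}{13.5\,\text{K/sec}} \approx 3.78
```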
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This patch moves the nip/ctr/lr/xer registers from scattered places in
kvm_vcpu_arch to the pt_regs structure.
The cr register is "unsigned long" in pt_regs but u32 in vcpu->arch.
It needs more consideration and may be moved in a later patch.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently the registers are scattered across the kvm_vcpu_arch
structure, and it is neater to organize them into a pt_regs structure.
This will also enable reimplementing the MMIO emulation code with
analyse_instr() later.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This merges in the ppc-kvm topic branch of the powerpc repository
to get some changes on which future patches will depend, in particular
the definitions of various new TLB flushing functions.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
arch/powerpc/Makefile activates -mmultiple on BE PPC32 configs
in order to use multiple-word instructions in function entry/exit.
The patch does the same for the asm parts, for consistency.
On processors like the 8xx on which insn fetching is pretty slow,
this speeds up registers save/restore.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: PPC32 is BE only, so drop the endian checks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit 6ad966d730.
That commit was pointless, because csum_add() sums two 32-bit
values, so the sum is at most 0x1fffffffe.
And then when adding upper part (1) and lower part (0xfffffffe),
the result is 0xffffffff which doesn't carry.
Any lower value will not carry either.
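The argument can be checked with a small C sketch that uses plain `uint32_t` instead of the kernel's `__wsum`. `csum_add_fold64()` models the reverted PPC64 sequence (64-bit add, add the upper half back, keep the low 32 bits). `csum_add_generic()` models the end-around-carry form. Both names are hypothetical. Even in the worst case, 0xffffffff + 0xffffffff, the single fold cannot carry again.

```c
#include <stdint.h>

/* Reverted PPC64 approach: 64-bit add, then fold the carry back in once. */
static uint32_t csum_add_fold64(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + b;      /* at most 0x1fffffffe */

	return (uint32_t)(s + (s >> 32));  /* low part + carry cannot carry again */
}

/* Generic approach: 32-bit add with end-around carry. */
static uint32_t csum_add_generic(uint32_t a, uint32_t b)
{
	uint32_t res = a + b;

	return res + (res < b);
}
```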
Besides being useless, this commit also defeats the whole purpose of
having an arch-specific inline csum_add(), because the resulting code
is even worse than what the generic implementation of csum_add()
produces:
0000000000000240 <.csum_add>:
240: 38 00 ff ff li r0,-1
244: 7c 84 1a 14 add r4,r4,r3
248: 78 00 00 20 clrldi r0,r0,32
24c: 78 89 00 22 rldicl r9,r4,32,32
250: 7c 80 00 38 and r0,r4,r0
254: 7c 09 02 14 add r0,r9,r0
258: 78 09 00 22 rldicl r9,r0,32,32
25c: 7c 00 4a 14 add r0,r0,r9
260: 78 03 00 20 clrldi r3,r0,32
264: 4e 80 00 20 blr
In comparison, the generic implementation of csum_add() gives:
0000000000000290 <.csum_add>:
290: 7c 63 22 14 add r3,r3,r4
294: 7f 83 20 40 cmplw cr7,r3,r4
298: 7c 10 10 26 mfocrf r0,1
29c: 54 00 ef fe rlwinm r0,r0,29,31,31
2a0: 7c 60 1a 14 add r3,r0,r3
2a4: 78 63 00 20 clrldi r3,r3,32
2a8: 4e 80 00 20 blr
And the reverted implementation for PPC64 gives:
0000000000000240 <.csum_add>:
240: 7c 84 1a 14 add r4,r4,r3
244: 78 80 00 22 rldicl r0,r4,32,32
248: 7c 80 22 14 add r4,r0,r4
24c: 78 83 00 20 clrldi r3,r4,32
250: 4e 80 00 20 blr
Fixes: 6ad966d730 ("powerpc/64: Fix checksum folding in csum_add()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PMD_PAGE_SIZE() is nowhere used and _PMD_SIZE is only
used by PMD_PAGE_SIZE().
This patch removes them.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Although Linux doesn't use PURR and SPURR ((Scaled) Processor
Utilization of Resources Register), other OSes depend on them.
On POWER8 they count at a rate depending on whether the VCPU is
idle or running, the activity of the VCPU, and the value in the
RWMR (Region-Weighting Mode Register). Hardware expects the
hypervisor to update the RWMR when a core is dispatched to reflect
the number of online VCPUs in the vcore.
This adds code to maintain a count in the vcore struct indicating
how many VCPUs are online. In kvmppc_run_core we use that count
to set the RWMR register on POWER8. If the core is split because
of a static or dynamic micro-threading mode, we use the value for
8 threads. The RWMR value is not relevant when the host is
executing because Linux does not use the PURR or SPURR register,
so we don't bother saving and restoring the host value.
For the sake of old userspace which does not set the KVM_REG_PPC_ONLINE
register, we set online to 1 if it was 0 at the time of a KVM_RUN
ioctl.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds a new KVM_REG_PPC_ONLINE register which userspace can set
to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it
considers the VCPU to be offline (0), that is, not currently running,
or online (1). This will be used in a later patch to configure the
register which controls PURR and SPURR accumulation.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently, the HV KVM guest entry/exit code adds the timebase offset
from the vcore struct to the timebase on guest entry, and subtracts
it on guest exit. This is fine, except that it is possible for
userspace to change the offset using the SET_ONE_REG interface while
the vcore is running, as there is only one timebase offset per vcore
but potentially multiple VCPUs in the vcore. If that were to happen,
KVM would subtract a different offset on guest exit from that which
it had added on guest entry, leading to the timebase being out of sync
between cores in the host, which then leads to bad things happening
such as hangs and spurious watchdog timeouts.
To fix this, we add a new field 'tb_offset_applied' to the vcore struct
which stores the offset that is currently applied to the timebase.
This value is set from the vcore tb_offset field on guest entry, and
is what is subtracted from the timebase on guest exit. Since it is
zero when the timebase offset is not applied, we can simplify the
logic in kvmhv_start_timing and kvmhv_accumulate_time.
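Here is a simplified model of the fix. `struct vcore_sim`, `guest_entry()` and `guest_exit()` are hypothetical stand-ins for the real vcore and entry/exit code. Only the `tb_offset` and `tb_offset_applied` field names come from the text. Because exit subtracts the snapshot taken at entry, changing `tb_offset` while the guest runs can no longer leave the timebase skewed. Timebase advance during guest execution is ignored in this sketch.

```c
#include <stdint.h>

/* Hypothetical, simplified vcore holding the two offsets from the text. */
struct vcore_sim {
	int64_t tb_offset;          /* may be changed by userspace at any time */
	int64_t tb_offset_applied;  /* offset actually added on entry, 0 otherwise */
};

static void guest_entry(struct vcore_sim *vc, uint64_t *tb)
{
	vc->tb_offset_applied = vc->tb_offset;   /* snapshot on entry */
	*tb += (uint64_t)vc->tb_offset_applied;
}

static void guest_exit(struct vcore_sim *vc, uint64_t *tb)
{
	*tb -= (uint64_t)vc->tb_offset_applied;  /* undo exactly what was added */
	vc->tb_offset_applied = 0;
}
```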
In addition, we had secondary threads reading the timebase while
running concurrently with code on the primary thread which would
eventually add or subtract the timebase offset from the timebase.
This occurred while saving or restoring the DEC register value on
the secondary threads. Although no specific incorrect behaviour has
been observed, this is a race which should be fixed. To fix it, we
move the DEC saving code to just before we call kvmhv_commence_exit,
and the DEC restoring code to after the point where we have waited
for the primary thread to switch the MMU context and add the timebase
offset. That way we are sure that the timebase contains the guest
timebase value in both cases.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Implement a local TLB flush for invalidating an LPID, with variants for
process or partition scope, and a global TLB flush for invalidating
a partition-scoped page of an LPID.
These will be used by KVM in subsequent patches.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>