Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Set missing wakeup bit in LPCR on POWER9
- Fix the early OPAL console wrappers
- Fixup kernel read only mapping
Fixes for code merged this cycle:
- Fix missing CRCs, add more asm-prototypes.h declarations"
* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fixup kernel read only mapping
powerpc/boot: Fix the early OPAL console wrappers
powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
powerpc: Set missing wakeup bit in LPCR on POWER9
With commit e58e87adc8 ("powerpc/mm: Update _PAGE_KERNEL_RO") we
started using the ppp value 0b110 to map kernel readonly. But that
facility was only added as part of ISA 2.04. For earlier ISA version
only supported ppp bit value for readonly mapping is 0b011. (This
implies both user and kernel get mapped using the same ppp bit value for
readonly mapping.).
Update the code such that for earlier architecture version we use ppp
value 0b011 for readonly mapping. We don't differentiate between power5+
and power5 here and apply the new ppp bits only from power6 (ISA 2.05).
This keep the changes minimal.
This fixes issue with PS3 spu usage reported at
https://lkml.kernel.org/r/rep.1421449714.geoff@infradead.org
Fixes: e58e87adc8 ("powerpc/mm: Update _PAGE_KERNEL_RO")
Cc: stable@vger.kernel.org # v4.7+
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit d0563a1297 ("powerpc: Implement {cmp}xchg for u8 and u16")
we removed the volatile from __cmpxchg().
This is leading to warnings such as:
drivers/gpu/drm/drm_lock.c: In function ‘drm_lock_take’:
arch/powerpc/include/asm/cmpxchg.h:484:37: warning: passing argument 1
of ‘__cmpxchg’ discards ‘volatile’ qualifier from pointer target
(__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_, \
There doesn't seem to be consensus across architectures whether the
argument is volatile or not, so at least for now put the volatile back.
Fixes: d0563a1297 ("powerpc: Implement {cmp}xchg for u8 and u16")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The new XIVE interrupt controller on POWER9 can direct external
interrupts to the hypervisor or the guest. The interrupts directed to
the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and
come in as a "hypervisor virtualization interrupt". This sets the
LPCR bit so that hypervisor virtualization interrupts can occur while
we are in the guest. We then also need to cope with exiting the guest
because of a hypervisor virtualization interrupt.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
POWER9 includes a new interrupt controller, called XIVE, which is
quite different from the XICS interrupt controller on POWER7 and
POWER8 machines. KVM-HV accesses the XICS directly in several places
in order to send and clear IPIs and handle interrupts from PCI
devices being passed through to the guest.
In order to make the transition to XIVE easier, OPAL firmware will
include an emulation of XICS on top of XIVE. Access to the emulated
XICS is via OPAL calls. The one complication is that the EOI
(end-of-interrupt) function can now return a value indicating that
another interrupt is pending; in this case, the XIVE will not signal
an interrupt in hardware to the CPU, and software is supposed to
acknowledge the new interrupt without waiting for another interrupt
to be delivered in hardware.
This adapts KVM-HV to use the OPAL calls on machines where there is
no XICS hardware. When there is no XICS, we look for a device-tree
node with "ibm,opal-intc" in its compatible property, which is how
OPAL indicates that it provides XICS emulation.
In order to handle the EOI return value, kvmppc_read_intr() has
become kvmppc_read_one_intr(), with a boolean variable passed by
reference which can be set by the EOI functions to indicate that
another interrupt is pending. The new kvmppc_read_intr() keeps
calling kvmppc_read_one_intr() until there are no more interrupts
to process. The return value from kvmppc_read_intr() is the
largest non-zero value of the returns from kvmppc_read_one_intr().
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
and tlbiel (local tlbie) instructions. Both instructions get a
set of new parameters (RIC, PRS and R) which appear as bits in the
instruction word. The tlbiel instruction now has a second register
operand, which contains a PID and/or LPID value if needed, and
should otherwise contain 0.
This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
as well as older processors. Since we only handle HPT guests so
far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
word as on previous processors, so we don't need to conditionally
execute different instructions depending on the processor.
The local flush on first entry to a guest in book3s_hv_rmhandlers.S
is a loop which depends on the number of TLB sets. Rather than
using feature sections to set the number of iterations based on
which CPU we're on, we now work out this number at VM creation time
and store it in the kvm_arch struct. That will make it possible to
get the number from the device tree in future, which will help with
compatibility with future processors.
Since mmu_partition_table_set_entry() does a global flush of the
whole LPID, we don't need to do the TLB flush on first entry to the
guest on each processor. Therefore we don't set all bits in the
tlb_need_flush bitmap on VM startup on POWER9.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds code to handle two new guest-accessible special-purpose
registers on POWER9: TIDR (thread ID register) and PSSCR (processor
stop status and control register). They are context-switched
between host and guest, and the guest values can be read and set
via the one_reg interface.
The PSSCR contains some fields which are guest-accessible and some
which are only accessible in hypervisor mode. We only allow the
guest-accessible fields to be read or set by userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This merges in the ppc-kvm topic branch to get changes to
arch/powerpc code that are necessary for adding POWER9 KVM support.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Partialy copied from commit c743f38013 ("ARM: initial stack protector
(-fstack-protector) support")
This is the very basic stuff without the changing canary upon
task switch yet. Just the Kconfig option and a constant canary
value initialized at boot time.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Implement xchg{u8,u16}{local,relaxed}, and
cmpxchg{u8,u16}{,local,acquire,relaxed}.
It works on all ppc.
remove volatile of first parameter in __cmpxchg_local and __cmpxchg
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Invoke the kprobe handlers directly rather than through notify_die(), to
reduce path taken for handling kprobes. Similar to commit 6f6343f53d
("kprobes/x86: Call exception handlers directly from do_int3/do_debug").
While at it, rename post_kprobe_handler() to kprobe_post_handler() for
more uniform naming.
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 03465f899b ("powerpc: Use kprobe blacklist for exception
handlers") removed __kprobes annotation from some of the prototypes,
but left the kprobes header include directive unchanged. Remove it as it
is no longer needed.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Define and set the POWER9 HFSCR doorbell bit so that guests can use
msgsndp.
ISA 3.0 calls this MSGP, so name it accordingly in the code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
ISA 3.0 defines a new PECE (Power-saving mode Exit Cause Enable) field
in the LPCR (Logical Partitioning Control Register), called
LPCR_PECE_HVEE (Hypervisor Virtualization Exit Enable).
KVM code will need to know about this bit, so add a definition for it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
ISA 3.00 adds the logical PVR value 0x0f000005, so add a definition for
this.
Define PCR_ARCH_207 to reflect ISA 2.07 compatibility mode in the processor
compatibility register (PCR).
[paulus@ozlabs.org - moved dummy PCR_ARCH_300 value into next patch]
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi()
and opal_int_set_mfrr(), for use by KVM real-mode code.
It also exports opal_int_set_mfrr() so that the modular part of KVM
can use it to send IPIs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 requires the host to set up a partition table, which is a
table in memory indexed by logical partition ID (LPID) which
contains the pointers to page tables and process tables for the
host and each guest.
This factors out the initialization of the partition table into
a single function. This code was previously duplicated between
hash_utils_64.c and pgtable-radix.c.
This provides a function for setting a partition table entry,
which is used in early MMU initialization, and will be used by
KVM whenever a guest is created. This function includes a tlbie
instruction which will flush all TLB entries for the LPID and
all caches of the partition table entry for the LPID, across the
system.
This also moves a call to memblock_set_current_limit(), which was
in radix_init_partition_table(), but has nothing to do with the
partition table. By analogy with the similar code for hash, the
call gets moved to near the end of radix__early_init_mmu(). It
now gets called when running as a guest, whereas previously it
would only be called if the kernel is running as the host.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After patch 4efca4ed0 ("kbuild: modversions for EXPORT_SYMBOL() for asm"),
asm exports can get modversions CRCs generated if they have C definitions
in asm-prototypes.h. This patch adds missing definitions for 32 and 64 bit
allmodconfig builds.
Fixes: 9445aa1a30 ("ppc: move exports to definitions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is a new bit, LPCR_PECE_HVEE (Hypervisor Virtualization Exit
Enable), which controls wakeup from STOP states on Hypervisor
Virtualization Interrupts (which happen to also be all external
interrupts in host or bare metal mode).
It needs to be set or we will miss wakeups.
Fixes: 9baaef0a22 ("powerpc/irq: Add support for HV virtualization interrupts")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rename it to HVEE to match the name in the ISA]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
eeh_pe_reset and eeh_reset_pe are two different functions in the same
file which do mostly the same thing. Not only is this confusing, but
potentially causes disrepancies in functionality, notably eeh_reset_pe
as it does not check return values for failure.
Refactor this into the following:
- eeh_pe_reset(): stays as is, performs a single operation, exported
- eeh_pe_reset_full(): new, full reset process that calls eeh_pe_reset()
- eeh_reset_pe(): removed and replaced by eeh_pe_reset_full()
- eeh_reset_pe_once(): removed
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When switching from/to a guest that has a transaction in progress,
we need to save/restore the checkpointed register state. Although
XER is part of the CPU state that gets checkpointed, the code that
does this saving and restoring doesn't save/restore XER.
This fixes it by saving and restoring the XER. To allow userspace
to read/write the checkpointed XER value, we also add a new ONE_REG
specifier.
The visible effect of this bug is that the guest may see its XER
value being corrupted when it uses transactions.
Fixes: e4e3812150 ("KVM: PPC: Book3S HV: Add transactional memory support")
Fixes: 0a8eccefcb ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This keeps a per vcpu cache for recently page faulted MMIO entries.
On a page fault, if the entry exists in the cache, we can avoid some
time-consuming paths, for example, looking up HPT, locking HPTE twice
and searching mmio gfn from memslots, then directly call
kvmppc_hv_emulate_mmio().
In current implenment, we limit the size of cache to four. We think
it's enough to cover the high-frequency MMIO HPTEs in most case.
For example, considering the case of using virtio device, for virtio
legacy devices, one HPTE could handle notifications from up to
1024 (64K page / 64 byte Port IO register) devices, so one cache entry
is enough; for virtio modern devices, we always need one HPTE to handle
notification for each device because modern device would use a 8M MMIO
register to notify host instead of Port IO register, typically the
system's configuration should not exceed four virtio devices per
vcpu, four cache entry is also enough in this case. Of course, if needed,
we could also modify the macro to a module parameter in the future.
Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
A bunch of KVM functions are only called from assembler.
Give them prototypes in asm-prototypes.h
This reduces sparse warnings.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- fix system reset interrupt winkle wakeups
- fix setting of AIL in hypervisor mode
Fixes for code merged this cycle:
- fix exception vector build with 2.23 era binutils
- fix missing update of HID register on secondary CPUs
Other:
- fix missing pr_cont()s
- invalidate ERAT on tlbiel for POWER9 DD1"
* tag 'powerpc-4.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix missing update of HID register on secondary CPUs
powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1
powerpc/64: Fix setting of AIL in hypervisor mode
powerpc/oops: Fix missing pr_cont()s in instruction dump
powerpc/oops: Fix missing pr_cont()s in show_regs()
powerpc/oops: Fix missing pr_cont()s in print_msr_bits() et. al.
powerpc/oops: Fix missing pr_cont()s in show_stack()
powerpc: Fix exception vector build with 2.23 era binutils
powerpc/64s: Fix system reset interrupt winkle wakeups
MIN_HUGEPTE_SHIFT hasn't been used since commit d1837cba5d
("powerpc/mm: Cleanup initialization of hugepages on powerpc")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On POWER9 DD1, when we do a local TLB invalidate we also need to explicitly
invalidate the ERAT.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently there's some CMO (Cooperative Memory Overcommit) code, in
plpar_wrappers.h. Some of it is #ifdef CONFIG_PSERIES and some of it
isn't. The end result being if a file includes plpar_wrappers.h it won't
build with CONFIG_PSERIES=n.
Fix it by moving the CMO code into platforms/pseries. The two hcall
wrappers can just be moved into their only caller, cmm.c, and the
accessors can go in pseries.h.
Note we need the accessors because cmm.c can be built as a module, so
there needs to be a split between the built-in code vs the module, and
that's achieved by using those accessors.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This changes the way that we support the new ISA v3.00 HPTE format.
Instead of adapting everything that uses HPTE values to handle either
the old format or the new format, depending on which CPU we are on,
we now convert explicitly between old and new formats if necessary
in the low-level routines that actually access HPTEs in memory.
This limits the amount of code that needs to know about the new
format and makes the conversions explicit. This is OK because the
old format contains all the information that is in the new format.
This also fixes operation under a hypervisor, because the H_ENTER
hypercall (and other hypercalls that deal with HPTEs) will continue
to require the HPTE value to be supplied in the old format. At
present the kernel will not boot in HPT mode on POWER9 under a
hypervisor.
This fixes and partially reverts commit 50de596de8
("powerpc/mm/hash: Add support for Power9 Hash", 2016-04-29).
Fixes: 50de596de8 ("powerpc/mm/hash: Add support for Power9 Hash")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Load monitored is no longer supported on POWER9 so let's remove the
code.
This reverts commit bd3ea317fd ("powerpc: Load Monitor Register
Support").
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
An hcall was recently added that does exactly what we need during kexec
- it clears the entire MMU hash table, ignoring any VRMA mappings.
Try it and fall back to the old method if we get a failure.
On a POWER8 box with 5TB of memory, this reduces the time it takes to
kexec a new kernel from from 4 minutes to 1 minute.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Split into separate functions and tweak function naming]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This halves the exception table size on 64-bit builds, and it allows
build-time sorting of exception tables to work on relocated kernels.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Minor asm fixups and bits to keep the selftests working]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This macro is taken from s390, and allows more flexibility in
changing exception table format.
mpe: Put it in ppc_asm.h and only define one version using
stringinfy_in_c(). Add some empty definitions and headers to keep the
selftests happy.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There's no reason to #error if we include ppc_asm.h in asm files, the
ifdef already prevents any problems.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Exception handlers are aligned to 128 bytes (L1 cache) on 64s, which is
overkill. It can reduce the icache footprint of any individual exception
path. However taken as a whole, the expansion in icache footprint seems
likely to be counter-productive and cause more total misses.
Create IFETCH_ALIGN_SHIFT/BYTES, which should give optimal ifetch
alignment with much more reasonable alignment. This saves 1792 bytes
from head_64.o text with an allmodconfig build.
Other subarchitectures should define appropriate IFETCH_ALIGN_SHIFT
values if this becomes more widely used.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are three #ifdef CONFIG_PPC_BOOK3E sections in nohash/64/pgtable.h.
And there should be no configurations possible which use nohash/64/pgtable.h
but don't also enable CONFIG_PPC_BOOK3E.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The changes to use gas sections for constructing the exception vectors
causes a build break when using binutils 2.23:
arch/powerpc/kernel/exceptions-64s.S:770: Error: operand out of range
(0xffffffffffff8100 is not between 0x0000000000000000 and 0x000000000000ffff)
And so on.
Reported by Hugh with binutils-2.23.2-8.1.4.ppc64 from openSUSE 13.1 and
also Naveen & Denis using 2.23.52.0.1-26.el7 from RHEL 7. Strangely
binutils 2.22 (what I test with) is not affected.
This is caused by the use of @l in LOAD_HANDLER(). The @l was only
recently added in commit a24553dd02 ("powerpc/pseries: Remove
unnecessary syscall trampoline").
Luckily the gas section changes split out the LOAD_SYSCALL_HANDLER()
macro, which means we actually *don't* need to use @l in LOAD_HANDLER()
any more, only in LOAD_SYSCALL_HANDLER().
So drop the @l from LOAD_HANDLER().
Fixes: 57f266497d ("powerpc: Use gas sections for arranging exception vectors")
Signed-off-by: Hugh Dickins <hughd@google.com>
[mpe: Add gory details to change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wakeups from winkle set the low bit of the HSPRG0 register, to
distinguish it from other sleep states. This is also the PACA pointer.
The system reset exception handler fails to mask this bit away before
using this value before using it as the PACA pointer.
Fix this by adding a new type of exception prolog macro where we already
have the PACA set in r13, and have the system reset vector mask it out.
The winkle wakeup handler will store the masked value back into HSPRG0.
Fixes: fb479e44a9 ("powerpc/64s: relocation, register save fixes for system reset interrupt")
Cc: stable@vger.kernel.org # v3.0+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>