The naming of eeh_{add_to|remove_from}_parent_pe() doesn't really reflect
what they actually do. If the PE referred to be edev->pe_config_addr
already exists under that PHB then the edev is added to that PE. However,
if the PE doesn't exist the a new one is created for the edev.
The bulk of the implementation of eeh_add_to_parent_pe() covers that
second case. Similarly, most of eeh_remove_from_parent_pe() is
determining when it's safe to delete a PE.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-12-oohall@gmail.com
The edev->class_code field is never referenced anywhere except for the
platform specific probe functions. The same information is available in
the pci_dev for PowerNV and in the pci_dn on pseries so we can remove
the field.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-11-oohall@gmail.com
There's a bunch of strange things about this code. First up is that none of
the fields being written to are functional for a VF. The SR-IOV
specification lists then as "Reserved, but OS should preserve" so writing
new values to them doesn't do anything and is clearly wrong from a
correctness perspective.
However, since VFs are designed to be managed by the OS there is an
argument to be made that we should be saving and restoring some parts of
config space. We already sort of do that by saving the first 64 bytes of
config space in the eeh_dev (see eeh_dev->config_space[]). This is
inadequate since it doesn't even consider saving and restoring the PCI
capability structures. However, this is a problem with EEH in general and
that needs to be fixed for non-VF devices too.
There's no real reason to keep around this around so delete it.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-6-oohall@gmail.com
This is used in precisely one place which is in pseries specific platform
code. There's no need to have the callback in eeh_ops since the platform
chooses the EEH PE addresses anyway. The PowerNV implementation has always
been a stub too so remove it.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-5-oohall@gmail.com
Drivers that do not support the PCI error handling callbacks are handled by
tearing down the device and re-probing them. If the device being removed is
a virtual function then we need to know the VF index so it can be removed
using the pci_iov_{add|remove}_virtfn() API.
Currently this is handled by looking up the pci_dn, and using the vf_index
that was stashed there when the pci_dn for the VF was created in
pcibios_sriov_enable(). We would like to eliminate the use of pci_dn
outside of pseries though so we need to provide the generic EEH code with
some other way to find the vf_index.
The easiest thing to do here is move the vf_index field out of pci_dn and
into eeh_dev. Currently pci_dn and eeh_dev are allocated and initialized
together so this is a fairly minimal change in preparation for splitting
pci_dn and eeh_dev in the future.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-3-oohall@gmail.com
This function is a one line wrapper around eeh_phb_pe_create() and despite
the name it doesn't create any eeh_dev structures. Replace it with direct
calls to eeh_phb_pe_create() since that does what it says on the tin
and removes a layer of indirection.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725081231.39076-1-oohall@gmail.com
CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE
(controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree
node is not PAPR compatible and thus not yet used by kvm or pHyp
guests. Enable watchpoint functionality on power10 guest (both kvm
and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that
this change does not enable 2nd DAWR support.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-5-ravi.bangoria@linux.ibm.com
From Nick's cover letter:
Linux powerpc new system call instruction and ABI
System Call Vectored (scv) ABI
==============================
The scv instruction is introduced with POWER9 / ISA3, it comes with an
rfscv counter-part. The benefit of these instructions is
performance (trading slower SRR0/1 with faster LR/CTR registers, and
entering the kernel with MSR[EE] and MSR[RI] left enabled, which can
reduce MSR updates. The scv instruction has 128 levels (not enough to
cover the Linux system call space).
Assignment and advertisement
----------------------------
The proposal is to assign scv levels conservatively, and advertise
them with HWCAP feature bits as we add support for more.
Linux has not enabled FSCR[SCV] yet, so executing the scv instruction
will cause the kernel to log a "SCV facility unavilable" message, and
deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a
HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it.
This change allocates the zero level ('scv 0'), advertised with
PPC_FEATURE2_SCV, which will be used to provide normal Linux system
calls (equivalent to 'sc').
Attempting to execute scv with other levels will cause a SIGILL to be
delivered the same as before, but will not log a "SCV facility
unavailable" message (because the processor facility is enabled).
Calling convention
------------------
The proposal is for scv 0 to provide the standard Linux system call
ABI with the following differences from sc convention[1]:
- LR is to be volatile across scv calls. This is necessary because the
scv instruction clobbers LR. From previous discussion, this should
be possible to deal with in GCC clobbers and CFI.
- cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow
the kernel system call exit to avoid restoring the volatile cr
registers (although we probably still would anyway to avoid
information leaks).
- Error handling: The consensus among kernel, glibc, and musl is to
move to using negative return values in r3 rather than CR0[SO]=1 to
indicate error, which matches most other architectures, and is
closer to a function call.
Notes
-----
- r0,r4-r8 are documented as volatile in the ABI, but the kernel patch
as submitted currently preserves them. This is to leave room for
deciding which way to go with these. Some small benefit was found by
preserving them[1] but I'm not convinced it's worth deviating from
the C function call ABI just for this. Release code should follow
the ABI.
Previous discussions:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.htmlhttps://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html
[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got
defined as H_DISABLE_AND_GETC instead.
This define was introduced with a typo in commit <b13a96cfb055>
("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was
later used without having the typo noticed.
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200707004812.190765-1-leobras.c@gmail.com
There are quite a few places where instructions are printed, this is
done using a '%x' format specifier. With the introduction of prefixed
instructions, this does not work well. Currently in these places,
ppc_inst_val() is used for the value for %x so only the first word of
prefixed instructions are printed.
When the instructions are word instructions, only a single word should
be printed. For prefixed instructions both the prefix and suffix should
be printed. To accommodate both of these situations, instead of a '%x'
specifier use '%s' and introduce a helper, __ppc_inst_as_str() which
returns a char *. The char * __ppc_inst_as_str() returns is buffer that
is passed to it by the caller.
It is cumbersome to require every caller of __ppc_inst_as_str() to now
declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
wrap it in a macro that uses a compound statement to allocate a buffer
on the caller's stack before calling it.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Drop 0x prefix to match most existings uses, especially xmon]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200602052728.18227-1-jniethe5@gmail.com
Add tests for the prefixed versions of the floating-point load/stores
that are currently tested. This includes the following instructions:
* Prefixed Load Floating-Point Single (plfs)
* Prefixed Load Floating-Point Double (plfd)
* Prefixed Store Floating-Point Single (pstfs)
* Prefixed Store Floating-Point Double (pstfd)
Skip the new tests if ISA v3.10 is unsupported.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Fix conflicts with ppc-opcode.h changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200525025923.19843-2-jniethe5@gmail.com
Add tests for the prefixed versions of the integer load/stores that
are currently tested. This includes the following instructions:
* Prefixed Load Doubleword (pld)
* Prefixed Load Word and Zero (plwz)
* Prefixed Store Doubleword (pstd)
Skip the new tests if ISA v3.1 is unsupported.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Fix conflicts with ppc-opcode.h changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200525025923.19843-1-jniethe5@gmail.com
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
structure. For historical reasons, many kvm-related function parameters
retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
patch does a unified cleanup of these remaining redundant parameters.
[paulus@ozlabs.org - Fixed places that were missed in book3s_interrupts.S]
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Add support for the scv instruction on POWER9 and later CPUs.
For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that LR is not preserved, nor
are volatile CR registers, and error is not indicated with CR0[SO],
but by returning a negative errno.
rfscv is implemented to return from scv type system calls. It can not
be used to return from sc system calls because those are defined to
preserve LR.
getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix ppc64e build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
Add Power10 feature function to DT CPU features, along with a Power10
specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and
cpu_features. This will enable performance monitoring unit (PMU) for
Power10 in CPU features with "performance-monitor-power10".
For Power ISA v3.1, BHRB disable is controlled via Monitor Mode
Control Register A (MMCRA) bit, namely "BHRB Recording
Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB
feature at boot for Power10.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-8-git-send-email-atrajeev@linux.vnet.ibm.com
Power ISA v3.1 has added new performance monitoring unit (PMU) special
purpose registers (SPRs). They are:
Monitor Mode Control Register 3 (MMCR3)
Sampled Instruction Event Register A (SIER2)
Sampled Instruction Event Register B (SIER3)
Add support to save/restore these new SPRs while entering/exiting
guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3.
Add new SPRs to KVM API documentation.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-6-git-send-email-atrajeev@linux.vnet.ibm.com
PowerISA v3.1 includes new performance monitoring unit(PMU)
special purpose registers (SPRs). They are
Monitor Mode Control Register 3 (MMCR3)
Sampled Instruction Event Register 2 (SIER2)
Sampled Instruction Event Register 3 (SIER3)
MMCR3 is added for further sampling related configuration
control. SIER2/SIER3 are added to provide additional
information about the sampled instruction.
Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of
these new SPRs, updates the struct thread_struct to include these new
SPRs, include MMCR3 in struct mmcr_regs. This is needed to support
programming of MMCR3 SPR during event_enable/disable. Patch also adds
the sysfs support for the MMCR3 SPR along with SPRN_ macros for these
new pmu SPRs.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Rename to PPMU_ARCH_31 as noted by jpn]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
Events of type PERF_TYPE_HW_CACHE was described for Power PMU
as: int (*cache_events)[type][op][result];
where type, op, result values unpacked from the event attribute config
value is used to generate the raw event code at runtime.
So far the event code values which used to create these cache-related
events were within 32 bit and `int` type worked. In power10,
some of the event codes are of 64-bit value and hence update the
Power PMU cache_events to `u64` type in `power_pmu` struct.
Also propagate this change to existing all PMU driver code paths
which are using ppmu->cache_events.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-4-git-send-email-atrajeev@linux.vnet.ibm.com
Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers
in a flat array in order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs
Split this to give mmcra and mmcrs its own entries in vcpu and
use a flat array for mmcr0 to mmcr2. This patch implements this
cleanup to make code easier to read.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-3-git-send-email-atrajeev@linux.vnet.ibm.com
core-book3s currently uses array to store the MMCR registers as part
of per-cpu `cpu_hw_events`. This patch does a clean up to use `struct`
to store mmcr regs instead of array. This will make code easier to read
and reduces chance of any subtle bug that may come in the future, say
when new registers are added. Patch updates all relevant code that was
using MMCR array ( cpuhw->mmcr[x]) to use newly introduced `struct`.
This includes the PMU driver code for supported platforms (power5
to power9) and ISA macros for counter support functions.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-2-git-send-email-atrajeev@linux.vnet.ibm.com
Compilation error:
arch/powerpc/perf/perf_regs.c:80:undefined reference to `.is_sier_available'
Currently is_sier_available() is part of core-book3s.c, which is added
to build based on CONFIG_PPC_PERF_CTRS.
A config with CONFIG_PERF_EVENTS and without CONFIG_PPC_PERF_CTRS will
have a build break because of missing is_sier_available().
In practice it only breaks when CONFIG_FSL_EMB_PERF_EVENT=n because
that also guards the usage of is_sier_available(). That only happens
with CONFIG_PPC_BOOK3E_64=y and CONFIG_FSL_SOC_BOOKE=n.
Patch adds is_sier_available() in asm/perf_event.h to fix the build
break for configs missing CONFIG_PPC_PERF_CTRS.
Fixes: 333804dc3b ("powerpc/perf: Update perf_regs structure to include SIER")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Add detail about CONFIG_FSL_SOC_BOOKE]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200614083604.302611-1-maddy@linux.ibm.com
ISA v3.1 does not support the SAO storage control attribute required to
implement PROT_SAO. PROT_SAO was used by specialised system software
(Lx86) that has been discontinued for about 7 years, and is not thought
to be used elsewhere, so removal should not cause problems.
We rather remove it than keep support for older processors, because
live migrating guest partitions to newer processors may not be possible
if SAO is in use (or worse allowed with silent races).
- PROT_SAO stays in the uapi header so code using it would still build.
- arch_validate_prot() is removed, the generic version rejects PROT_SAO
so applications would get a failure at mmap() time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop KVM change for the time being]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200718103958.5455-1-grandmaster@al2klimov.de
POWER8 and POWER9 have 12-bit LPIDs. Change LPID_RSVD to support up to
(4096 - 2) guests on these processors. POWER7 is kept the same with a
limitation of (1024 - 2), but it might be time to drop KVM support for
POWER7.
Tested with 2048 guests * 4 vCPUs on a witherspoon system with 512G
RAM and a bit of swap.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Adds support for emulating ISAv3.1 guests by adding the appropriate PCR
and FSCR bits.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
UAMOR values are not application-specific. The kernel initializes
its value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.
Move UAMOR initialization to key initialization code and remove
thread_struct.uamor because it is not used anymore.
Before commit: 4a4a5e5d2a ("powerpc/pkeys: key allocation/deallocation must not change pkey registers")
we used to update uamor based on key allocation and free.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
As we kexec across kernels that use AMR/IAMR for different purposes
we need to ensure that new kernels get kexec'd with a reset value
of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old
AMR value prevents access to key 0.
This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR
is not needed and the IAMR reset is partial (it doesn't do the reset
on secondary cpus) and is redundant with this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-19-aneesh.kumar@linux.ibm.com