Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini: "ARM: - support for SVE and Pointer Authentication in guests - PMU improvements POWER: - support for direct access to the POWER9 XIVE interrupt controller - memory and performance optimizations x86: - support for accessing memory not backed by struct page - fixes and refactoring Generic: - dirty page tracking improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits) kvm: fix compilation on aarch64 Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" kvm: x86: Fix L1TF mitigation for shadow MMU KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing" KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete tests: kvm: Add tests for KVM_SET_NESTED_STATE KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID tests: kvm: Add tests to .gitignore KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one KVM: Fix the bitmap range to copy during clear dirty KVM: arm64: Fix ptrauth ID register masking logic KVM: x86: use direct accessors for RIP and RSP KVM: VMX: Use accessors for GPRs outside of dedicated caching logic KVM: x86: Omit caching logic for always-available GPRs kvm, x86: Properly check whether a pfn is an MMIO or not ...
This commit is contained in:
@@ -69,23 +69,6 @@ by and on behalf of the VM's process may not be freed/unaccounted when
|
||||
the VM is shut down.
|
||||
|
||||
|
||||
It is important to note that althought VM ioctls may only be issued from
|
||||
the process that created the VM, a VM's lifecycle is associated with its
|
||||
file descriptor, not its creator (process). In other words, the VM and
|
||||
its resources, *including the associated address space*, are not freed
|
||||
until the last reference to the VM's file descriptor has been released.
|
||||
For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
|
||||
not be freed until both the parent (original) process and its child have
|
||||
put their references to the VM's file descriptor.
|
||||
|
||||
Because a VM's resources are not freed until the last reference to its
|
||||
file descriptor is released, creating additional references to a VM via
|
||||
via fork(), dup(), etc... without careful consideration is strongly
|
||||
discouraged and may have unwanted side effects, e.g. memory allocated
|
||||
by and on behalf of the VM's process may not be freed/unaccounted when
|
||||
the VM is shut down.
|
||||
|
||||
|
||||
3. Extensions
|
||||
-------------
|
||||
|
||||
@@ -347,7 +330,7 @@ They must be less than the value that KVM_CHECK_EXTENSION returns for
|
||||
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
|
||||
|
||||
The bits in the dirty bitmap are cleared before the ioctl returns, unless
|
||||
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information,
|
||||
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information,
|
||||
see the description of the capability.
|
||||
|
||||
4.9 KVM_SET_MEMORY_ALIAS
|
||||
@@ -1117,9 +1100,8 @@ struct kvm_userspace_memory_region {
|
||||
This ioctl allows the user to create, modify or delete a guest physical
|
||||
memory slot. Bits 0-15 of "slot" specify the slot id and this value
|
||||
should be less than the maximum number of user memory slots supported per
|
||||
VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
|
||||
if this capability is supported by the architecture. Slots may not
|
||||
overlap in guest physical address space.
|
||||
VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS.
|
||||
Slots may not overlap in guest physical address space.
|
||||
|
||||
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
|
||||
specifies the address space which is being modified. They must be
|
||||
@@ -1901,6 +1883,12 @@ Architectures: all
|
||||
Type: vcpu ioctl
|
||||
Parameters: struct kvm_one_reg (in)
|
||||
Returns: 0 on success, negative value on failure
|
||||
Errors:
|
||||
ENOENT: no such register
|
||||
EINVAL: invalid register ID, or no such register
|
||||
EPERM: (arm64) register access not allowed before vcpu finalization
|
||||
(These error codes are indicative only: do not rely on a specific error
|
||||
code being returned in a specific situation.)
|
||||
|
||||
struct kvm_one_reg {
|
||||
__u64 id;
|
||||
@@ -1985,6 +1973,7 @@ registers, find a list below:
|
||||
PPC | KVM_REG_PPC_TLB3PS | 32
|
||||
PPC | KVM_REG_PPC_EPTCFG | 32
|
||||
PPC | KVM_REG_PPC_ICP_STATE | 64
|
||||
PPC | KVM_REG_PPC_VP_STATE | 128
|
||||
PPC | KVM_REG_PPC_TB_OFFSET | 64
|
||||
PPC | KVM_REG_PPC_SPMC1 | 32
|
||||
PPC | KVM_REG_PPC_SPMC2 | 32
|
||||
@@ -2137,6 +2126,37 @@ contains elements ranging from 32 to 128 bits. The index is a 32bit
|
||||
value in the kvm_regs structure seen as a 32bit array.
|
||||
0x60x0 0000 0010 <index into the kvm_regs struct:16>
|
||||
|
||||
Specifically:
|
||||
Encoding Register Bits kvm_regs member
|
||||
----------------------------------------------------------------
|
||||
0x6030 0000 0010 0000 X0 64 regs.regs[0]
|
||||
0x6030 0000 0010 0002 X1 64 regs.regs[1]
|
||||
...
|
||||
0x6030 0000 0010 003c X30 64 regs.regs[30]
|
||||
0x6030 0000 0010 003e SP 64 regs.sp
|
||||
0x6030 0000 0010 0040 PC 64 regs.pc
|
||||
0x6030 0000 0010 0042 PSTATE 64 regs.pstate
|
||||
0x6030 0000 0010 0044 SP_EL1 64 sp_el1
|
||||
0x6030 0000 0010 0046 ELR_EL1 64 elr_el1
|
||||
0x6030 0000 0010 0048 SPSR_EL1 64 spsr[KVM_SPSR_EL1] (alias SPSR_SVC)
|
||||
0x6030 0000 0010 004a SPSR_ABT 64 spsr[KVM_SPSR_ABT]
|
||||
0x6030 0000 0010 004c SPSR_UND 64 spsr[KVM_SPSR_UND]
|
||||
0x6030 0000 0010 004e SPSR_IRQ 64 spsr[KVM_SPSR_IRQ]
|
||||
0x6060 0000 0010 0050 SPSR_FIQ 64 spsr[KVM_SPSR_FIQ]
|
||||
0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] (*)
|
||||
0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] (*)
|
||||
...
|
||||
0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] (*)
|
||||
0x6020 0000 0010 00d4 FPSR 32 fp_regs.fpsr
|
||||
0x6020 0000 0010 00d5 FPCR 32 fp_regs.fpcr
|
||||
|
||||
(*) These encodings are not accepted for SVE-enabled vcpus. See
|
||||
KVM_ARM_VCPU_INIT.
|
||||
|
||||
The equivalent register content can be accessed via bits [127:0] of
|
||||
the corresponding SVE Zn registers instead for vcpus that have SVE
|
||||
enabled (see below).
|
||||
|
||||
arm64 CCSIDR registers are demultiplexed by CSSELR value:
|
||||
0x6020 0000 0011 00 <csselr:8>
|
||||
|
||||
@@ -2146,6 +2166,64 @@ arm64 system registers have the following id bit patterns:
|
||||
arm64 firmware pseudo-registers have the following bit pattern:
|
||||
0x6030 0000 0014 <regno:16>
|
||||
|
||||
arm64 SVE registers have the following bit patterns:
|
||||
0x6080 0000 0015 00 <n:5> <slice:5> Zn bits[2048*slice + 2047 : 2048*slice]
|
||||
0x6050 0000 0015 04 <n:4> <slice:5> Pn bits[256*slice + 255 : 256*slice]
|
||||
0x6050 0000 0015 060 <slice:5> FFR bits[256*slice + 255 : 256*slice]
|
||||
0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register
|
||||
|
||||
Access to register IDs where 2048 * slice >= 128 * max_vq will fail with
|
||||
ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit
|
||||
quadwords: see (**) below.
|
||||
|
||||
These registers are only accessible on vcpus for which SVE is enabled.
|
||||
See KVM_ARM_VCPU_INIT for details.
|
||||
|
||||
In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not
|
||||
accessible until the vcpu's SVE configuration has been finalized
|
||||
using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT
|
||||
and KVM_ARM_VCPU_FINALIZE for more information about this procedure.
|
||||
|
||||
KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
|
||||
lengths supported by the vcpu to be discovered and configured by
|
||||
userspace. When transferred to or from user memory via KVM_GET_ONE_REG
|
||||
or KVM_SET_ONE_REG, the value of this register is of type
|
||||
__u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as
|
||||
follows:
|
||||
|
||||
__u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS];
|
||||
|
||||
if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
|
||||
((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
|
||||
((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
|
||||
/* Vector length vq * 16 bytes supported */
|
||||
else
|
||||
/* Vector length vq * 16 bytes not supported */
|
||||
|
||||
(**) The maximum value vq for which the above condition is true is
|
||||
max_vq. This is the maximum vector length available to the guest on
|
||||
this vcpu, and determines which register slices are visible through
|
||||
this ioctl interface.
|
||||
|
||||
(See Documentation/arm64/sve.txt for an explanation of the "vq"
|
||||
nomenclature.)
|
||||
|
||||
KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT.
|
||||
KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that
|
||||
the host supports.
|
||||
|
||||
Userspace may subsequently modify it if desired until the vcpu's SVE
|
||||
configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
|
||||
|
||||
Apart from simply removing all vector lengths from the host set that
|
||||
exceed some value, support for arbitrarily chosen sets of vector lengths
|
||||
is hardware-dependent and may not be available. Attempting to configure
|
||||
an invalid set of vector lengths via KVM_SET_ONE_REG will fail with
|
||||
EINVAL.
|
||||
|
||||
After the vcpu's SVE configuration is finalized, further attempts to
|
||||
write this register will fail with EPERM.
|
||||
|
||||
|
||||
MIPS registers are mapped using the lower 32 bits. The upper 16 of that is
|
||||
the register group type:
|
||||
@@ -2198,6 +2276,12 @@ Architectures: all
|
||||
Type: vcpu ioctl
|
||||
Parameters: struct kvm_one_reg (in and out)
|
||||
Returns: 0 on success, negative value on failure
|
||||
Errors include:
|
||||
ENOENT: no such register
|
||||
EINVAL: invalid register ID, or no such register
|
||||
EPERM: (arm64) register access not allowed before vcpu finalization
|
||||
(These error codes are indicative only: do not rely on a specific error
|
||||
code being returned in a specific situation.)
|
||||
|
||||
This ioctl allows to receive the value of a single register implemented
|
||||
in a vcpu. The register to read is indicated by the "id" field of the
|
||||
@@ -2690,6 +2774,49 @@ Possible features:
|
||||
- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
|
||||
Depends on KVM_CAP_ARM_PMU_V3.
|
||||
|
||||
- KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication
|
||||
for arm64 only.
|
||||
Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS.
|
||||
If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are
|
||||
both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and
|
||||
KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be
|
||||
requested.
|
||||
|
||||
- KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication
|
||||
for arm64 only.
|
||||
Depends on KVM_CAP_ARM_PTRAUTH_GENERIC.
|
||||
If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are
|
||||
both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and
|
||||
KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be
|
||||
requested.
|
||||
|
||||
- KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only).
|
||||
Depends on KVM_CAP_ARM_SVE.
|
||||
Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
|
||||
|
||||
* After KVM_ARM_VCPU_INIT:
|
||||
|
||||
- KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the
|
||||
initial value of this pseudo-register indicates the best set of
|
||||
vector lengths possible for a vcpu on this host.
|
||||
|
||||
* Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
|
||||
|
||||
- KVM_RUN and KVM_GET_REG_LIST are not available;
|
||||
|
||||
- KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
|
||||
the scalable archietctural SVE registers
|
||||
KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
|
||||
KVM_REG_ARM64_SVE_FFR;
|
||||
|
||||
- KVM_REG_ARM64_SVE_VLS may optionally be written using
|
||||
KVM_SET_ONE_REG, to modify the set of vector lengths available
|
||||
for the vcpu.
|
||||
|
||||
* After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
|
||||
|
||||
- the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
|
||||
no longer be written using KVM_SET_ONE_REG.
|
||||
|
||||
4.83 KVM_ARM_PREFERRED_TARGET
|
||||
|
||||
@@ -3809,7 +3936,7 @@ to I/O ports.
|
||||
|
||||
4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
|
||||
|
||||
Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
|
||||
Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
|
||||
Architectures: x86, arm, arm64, mips
|
||||
Type: vm ioctl
|
||||
Parameters: struct kvm_dirty_log (in)
|
||||
@@ -3842,10 +3969,10 @@ the address space for which you want to return the dirty bitmap.
|
||||
They must be less than the value that KVM_CHECK_EXTENSION returns for
|
||||
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
|
||||
|
||||
This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
|
||||
This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
|
||||
is enabled; for more information, see the description of the capability.
|
||||
However, it can always be used as long as KVM_CHECK_EXTENSION confirms
|
||||
that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
|
||||
that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present.
|
||||
|
||||
4.118 KVM_GET_SUPPORTED_HV_CPUID
|
||||
|
||||
@@ -3904,6 +4031,40 @@ number of valid entries in the 'entries' array, which is then filled.
|
||||
'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved,
|
||||
userspace should not expect to get any particular value there.
|
||||
|
||||
4.119 KVM_ARM_VCPU_FINALIZE
|
||||
|
||||
Architectures: arm, arm64
|
||||
Type: vcpu ioctl
|
||||
Parameters: int feature (in)
|
||||
Returns: 0 on success, -1 on error
|
||||
Errors:
|
||||
EPERM: feature not enabled, needs configuration, or already finalized
|
||||
EINVAL: feature unknown or not present
|
||||
|
||||
Recognised values for feature:
|
||||
arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
|
||||
|
||||
Finalizes the configuration of the specified vcpu feature.
|
||||
|
||||
The vcpu must already have been initialised, enabling the affected feature, by
|
||||
means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in
|
||||
features[].
|
||||
|
||||
For affected vcpu features, this is a mandatory step that must be performed
|
||||
before the vcpu is fully usable.
|
||||
|
||||
Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
|
||||
configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
|
||||
that should be performaned and how to do it are feature-dependent.
|
||||
|
||||
Other calls that depend on a particular feature being finalized, such as
|
||||
KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
|
||||
-EPERM unless the feature has already been finalized by means of a
|
||||
KVM_ARM_VCPU_FINALIZE call.
|
||||
|
||||
See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization
|
||||
using this ioctl.
|
||||
|
||||
5. The kvm_run structure
|
||||
------------------------
|
||||
|
||||
@@ -4505,6 +4666,15 @@ struct kvm_sync_regs {
|
||||
struct kvm_vcpu_events events;
|
||||
};
|
||||
|
||||
6.75 KVM_CAP_PPC_IRQ_XIVE
|
||||
|
||||
Architectures: ppc
|
||||
Target: vcpu
|
||||
Parameters: args[0] is the XIVE device fd
|
||||
args[1] is the XIVE CPU number (server ID) for this vcpu
|
||||
|
||||
This capability connects the vcpu to an in-kernel XIVE device.
|
||||
|
||||
7. Capabilities that can be enabled on VMs
|
||||
------------------------------------------
|
||||
|
||||
@@ -4798,7 +4968,7 @@ and injected exceptions.
|
||||
* For the new DR6 bits, note that bit 16 is set iff the #DB exception
|
||||
will clear DR6.RTM.
|
||||
|
||||
7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
|
||||
7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
|
||||
|
||||
Architectures: x86, arm, arm64, mips
|
||||
Parameters: args[0] whether feature should be enabled or not
|
||||
@@ -4821,6 +4991,11 @@ while userspace can see false reports of dirty pages. Manual reprotection
|
||||
helps reducing this time, improving guest performance and reducing the
|
||||
number of dirty log false positives.
|
||||
|
||||
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
|
||||
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
|
||||
it hard or impossible to use it correctly. The availability of
|
||||
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
|
||||
Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.
|
||||
|
||||
8. Other capabilities.
|
||||
----------------------
|
||||
|
@@ -141,7 +141,8 @@ struct kvm_s390_vm_cpu_subfunc {
|
||||
u8 pcc[16]; # valid with Message-Security-Assist-Extension 4
|
||||
u8 ppno[16]; # valid with Message-Security-Assist-Extension 5
|
||||
u8 kma[16]; # valid with Message-Security-Assist-Extension 8
|
||||
u8 reserved[1808]; # reserved for future instructions
|
||||
u8 kdsa[16]; # valid with Message-Security-Assist-Extension 9
|
||||
u8 reserved[1792]; # reserved for future instructions
|
||||
};
|
||||
|
||||
Parameters: address of a buffer to load the subfunction blocks from.
|
||||
|
197
Documentation/virtual/kvm/devices/xive.txt
Normal file
197
Documentation/virtual/kvm/devices/xive.txt
Normal file
@@ -0,0 +1,197 @@
|
||||
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
|
||||
==========================================================
|
||||
|
||||
Device types supported:
|
||||
KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1
|
||||
|
||||
This device acts as a VM interrupt controller. It provides the KVM
|
||||
interface to configure the interrupt sources of a VM in the underlying
|
||||
POWER9 XIVE interrupt controller.
|
||||
|
||||
Only one XIVE instance may be instantiated. A guest XIVE device
|
||||
requires a POWER9 host and the guest OS should have support for the
|
||||
XIVE native exploitation interrupt mode. If not, it should run using
|
||||
the legacy interrupt mode, referred as XICS (POWER7/8).
|
||||
|
||||
* Device Mappings
|
||||
|
||||
The KVM device exposes different MMIO ranges of the XIVE HW which
|
||||
are required for interrupt management. These are exposed to the
|
||||
guest in VMAs populated with a custom VM fault handler.
|
||||
|
||||
1. Thread Interrupt Management Area (TIMA)
|
||||
|
||||
Each thread has an associated Thread Interrupt Management context
|
||||
composed of a set of registers. These registers let the thread
|
||||
handle priority management and interrupt acknowledgment. The most
|
||||
important are :
|
||||
|
||||
- Interrupt Pending Buffer (IPB)
|
||||
- Current Processor Priority (CPPR)
|
||||
- Notification Source Register (NSR)
|
||||
|
||||
They are exposed to software in four different pages each proposing
|
||||
a view with a different privilege. The first page is for the
|
||||
physical thread context and the second for the hypervisor. Only the
|
||||
third (operating system) and the fourth (user level) are exposed the
|
||||
guest.
|
||||
|
||||
2. Event State Buffer (ESB)
|
||||
|
||||
Each source is associated with an Event State Buffer (ESB) with
|
||||
either a pair of even/odd pair of pages which provides commands to
|
||||
manage the source: to trigger, to EOI, to turn off the source for
|
||||
instance.
|
||||
|
||||
3. Device pass-through
|
||||
|
||||
When a device is passed-through into the guest, the source
|
||||
interrupts are from a different HW controller (PHB4) and the ESB
|
||||
pages exposed to the guest should accommadate this change.
|
||||
|
||||
The passthru_irq helpers, kvmppc_xive_set_mapped() and
|
||||
kvmppc_xive_clr_mapped() are called when the device HW irqs are
|
||||
mapped into or unmapped from the guest IRQ number space. The KVM
|
||||
device extends these helpers to clear the ESB pages of the guest IRQ
|
||||
number being mapped and then lets the VM fault handler repopulate.
|
||||
The handler will insert the ESB page corresponding to the HW
|
||||
interrupt of the device being passed-through or the initial IPI ESB
|
||||
page if the device has being removed.
|
||||
|
||||
The ESB remapping is fully transparent to the guest and the OS
|
||||
device driver. All handling is done within VFIO and the above
|
||||
helpers in KVM-PPC.
|
||||
|
||||
* Groups:
|
||||
|
||||
1. KVM_DEV_XIVE_GRP_CTRL
|
||||
Provides global controls on the device
|
||||
Attributes:
|
||||
1.1 KVM_DEV_XIVE_RESET (write only)
|
||||
Resets the interrupt controller configuration for sources and event
|
||||
queues. To be used by kexec and kdump.
|
||||
Errors: none
|
||||
|
||||
1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
|
||||
Sync all the sources and queues and mark the EQ pages dirty. This
|
||||
to make sure that a consistent memory state is captured when
|
||||
migrating the VM.
|
||||
Errors: none
|
||||
|
||||
2. KVM_DEV_XIVE_GRP_SOURCE (write only)
|
||||
Initializes a new source in the XIVE device and mask it.
|
||||
Attributes:
|
||||
Interrupt source number (64-bit)
|
||||
The kvm_device_attr.addr points to a __u64 value:
|
||||
bits: | 63 .... 2 | 1 | 0
|
||||
values: | unused | level | type
|
||||
- type: 0:MSI 1:LSI
|
||||
- level: assertion level in case of an LSI.
|
||||
Errors:
|
||||
-E2BIG: Interrupt source number is out of range
|
||||
-ENOMEM: Could not create a new source block
|
||||
-EFAULT: Invalid user pointer for attr->addr.
|
||||
-ENXIO: Could not allocate underlying HW interrupt
|
||||
|
||||
3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
|
||||
Configures source targeting
|
||||
Attributes:
|
||||
Interrupt source number (64-bit)
|
||||
The kvm_device_attr.addr points to a __u64 value:
|
||||
bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0
|
||||
values: | eisn | mask | server | priority
|
||||
- priority: 0-7 interrupt priority level
|
||||
- server: CPU number chosen to handle the interrupt
|
||||
- mask: mask flag (unused)
|
||||
- eisn: Effective Interrupt Source Number
|
||||
Errors:
|
||||
-ENOENT: Unknown source number
|
||||
-EINVAL: Not initialized source number
|
||||
-EINVAL: Invalid priority
|
||||
-EINVAL: Invalid CPU number.
|
||||
-EFAULT: Invalid user pointer for attr->addr.
|
||||
-ENXIO: CPU event queues not configured or configuration of the
|
||||
underlying HW interrupt failed
|
||||
-EBUSY: No CPU available to serve interrupt
|
||||
|
||||
4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
|
||||
Configures an event queue of a CPU
|
||||
Attributes:
|
||||
EQ descriptor identifier (64-bit)
|
||||
The EQ descriptor identifier is a tuple (server, priority) :
|
||||
bits: | 63 .... 32 | 31 .. 3 | 2 .. 0
|
||||
values: | unused | server | priority
|
||||
The kvm_device_attr.addr points to :
|
||||
struct kvm_ppc_xive_eq {
|
||||
__u32 flags;
|
||||
__u32 qshift;
|
||||
__u64 qaddr;
|
||||
__u32 qtoggle;
|
||||
__u32 qindex;
|
||||
__u8 pad[40];
|
||||
};
|
||||
- flags: queue flags
|
||||
KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
|
||||
forces notification without using the coalescing mechanism
|
||||
provided by the XIVE END ESBs.
|
||||
- qshift: queue size (power of 2)
|
||||
- qaddr: real address of queue
|
||||
- qtoggle: current queue toggle bit
|
||||
- qindex: current queue index
|
||||
- pad: reserved for future use
|
||||
Errors:
|
||||
-ENOENT: Invalid CPU number
|
||||
-EINVAL: Invalid priority
|
||||
-EINVAL: Invalid flags
|
||||
-EINVAL: Invalid queue size
|
||||
-EINVAL: Invalid queue address
|
||||
-EFAULT: Invalid user pointer for attr->addr.
|
||||
-EIO: Configuration of the underlying HW failed
|
||||
|
||||
5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
|
||||
Synchronize the source to flush event notifications
|
||||
Attributes:
|
||||
Interrupt source number (64-bit)
|
||||
Errors:
|
||||
-ENOENT: Unknown source number
|
||||
-EINVAL: Not initialized source number
|
||||
|
||||
* VCPU state
|
||||
|
||||
The XIVE IC maintains VP interrupt state in an internal structure
|
||||
called the NVT. When a VP is not dispatched on a HW processor
|
||||
thread, this structure can be updated by HW if the VP is the target
|
||||
of an event notification.
|
||||
|
||||
It is important for migration to capture the cached IPB from the NVT
|
||||
as it synthesizes the priorities of the pending interrupts. We
|
||||
capture a bit more to report debug information.
|
||||
|
||||
KVM_REG_PPC_VP_STATE (2 * 64bits)
|
||||
bits: | 63 .... 32 | 31 .... 0 |
|
||||
values: | TIMA word0 | TIMA word1 |
|
||||
bits: | 127 .......... 64 |
|
||||
values: | unused |
|
||||
|
||||
* Migration:
|
||||
|
||||
Saving the state of a VM using the XIVE native exploitation mode
|
||||
should follow a specific sequence. When the VM is stopped :
|
||||
|
||||
1. Mask all sources (PQ=01) to stop the flow of events.
|
||||
|
||||
2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
|
||||
flush any in-flight event notification and to stabilize the EQs. At
|
||||
this stage, the EQ pages are marked dirty to make sure they are
|
||||
transferred in the migration sequence.
|
||||
|
||||
3. Capture the state of the source targeting, the EQs configuration
|
||||
and the state of thread interrupt context registers.
|
||||
|
||||
Restore is similar :
|
||||
|
||||
1. Restore the EQ configuration. As targeting depends on it.
|
||||
2. Restore targeting
|
||||
3. Restore the thread interrupt contexts
|
||||
4. Restore the source states
|
||||
5. Let the vCPU run
|
Reference in New Issue
Block a user