Merge tag 'kvm-ppc-next-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
PPC KVM update for 5.2 * Support for guests to access the new POWER9 XIVE interrupt controller hardware directly, reducing interrupt latency and overhead for guests. * In-kernel implementation of the H_PAGE_INIT hypercall. * Reduce memory usage of sparsely-populated IOMMU tables. * Several bug fixes. Second PPC KVM update for 5.2 * Fix a bug, fix a spelling mistake, remove some useless code.
Cette révision appartient à :
@@ -1967,6 +1967,7 @@ registers, find a list below:
|
||||
PPC | KVM_REG_PPC_TLB3PS | 32
|
||||
PPC | KVM_REG_PPC_EPTCFG | 32
|
||||
PPC | KVM_REG_PPC_ICP_STATE | 64
|
||||
PPC | KVM_REG_PPC_VP_STATE | 128
|
||||
PPC | KVM_REG_PPC_TB_OFFSET | 64
|
||||
PPC | KVM_REG_PPC_SPMC1 | 32
|
||||
PPC | KVM_REG_PPC_SPMC2 | 32
|
||||
@@ -4487,6 +4488,15 @@ struct kvm_sync_regs {
|
||||
struct kvm_vcpu_events events;
|
||||
};
|
||||
|
||||
6.75 KVM_CAP_PPC_IRQ_XIVE
|
||||
|
||||
Architectures: ppc
|
||||
Target: vcpu
|
||||
Parameters: args[0] is the XIVE device fd
|
||||
args[1] is the XIVE CPU number (server ID) for this vcpu
|
||||
|
||||
This capability connects the vcpu to an in-kernel XIVE device.
|
||||
|
||||
7. Capabilities that can be enabled on VMs
|
||||
------------------------------------------
|
||||
|
||||
|
197
Documentation/virtual/kvm/devices/xive.txt
Fichier normal
197
Documentation/virtual/kvm/devices/xive.txt
Fichier normal
@@ -0,0 +1,197 @@
|
||||
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
|
||||
==========================================================
|
||||
|
||||
Device types supported:
|
||||
KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1
|
||||
|
||||
This device acts as a VM interrupt controller. It provides the KVM
|
||||
interface to configure the interrupt sources of a VM in the underlying
|
||||
POWER9 XIVE interrupt controller.
|
||||
|
||||
Only one XIVE instance may be instantiated. A guest XIVE device
|
||||
requires a POWER9 host and the guest OS should have support for the
|
||||
XIVE native exploitation interrupt mode. If not, it should run using
|
||||
the legacy interrupt mode, referred as XICS (POWER7/8).
|
||||
|
||||
* Device Mappings
|
||||
|
||||
The KVM device exposes different MMIO ranges of the XIVE HW which
|
||||
are required for interrupt management. These are exposed to the
|
||||
guest in VMAs populated with a custom VM fault handler.
|
||||
|
||||
1. Thread Interrupt Management Area (TIMA)
|
||||
|
||||
Each thread has an associated Thread Interrupt Management context
|
||||
composed of a set of registers. These registers let the thread
|
||||
handle priority management and interrupt acknowledgment. The most
|
||||
important are :
|
||||
|
||||
- Interrupt Pending Buffer (IPB)
|
||||
- Current Processor Priority (CPPR)
|
||||
- Notification Source Register (NSR)
|
||||
|
||||
They are exposed to software in four different pages each proposing
|
||||
a view with a different privilege. The first page is for the
|
||||
physical thread context and the second for the hypervisor. Only the
|
||||
third (operating system) and the fourth (user level) are exposed the
|
||||
guest.
|
||||
|
||||
2. Event State Buffer (ESB)
|
||||
|
||||
Each source is associated with an Event State Buffer (ESB) with
|
||||
either a pair of even/odd pair of pages which provides commands to
|
||||
manage the source: to trigger, to EOI, to turn off the source for
|
||||
instance.
|
||||
|
||||
3. Device pass-through
|
||||
|
||||
When a device is passed-through into the guest, the source
|
||||
interrupts are from a different HW controller (PHB4) and the ESB
|
||||
pages exposed to the guest should accommadate this change.
|
||||
|
||||
The passthru_irq helpers, kvmppc_xive_set_mapped() and
|
||||
kvmppc_xive_clr_mapped() are called when the device HW irqs are
|
||||
mapped into or unmapped from the guest IRQ number space. The KVM
|
||||
device extends these helpers to clear the ESB pages of the guest IRQ
|
||||
number being mapped and then lets the VM fault handler repopulate.
|
||||
The handler will insert the ESB page corresponding to the HW
|
||||
interrupt of the device being passed-through or the initial IPI ESB
|
||||
page if the device has being removed.
|
||||
|
||||
The ESB remapping is fully transparent to the guest and the OS
|
||||
device driver. All handling is done within VFIO and the above
|
||||
helpers in KVM-PPC.
|
||||
|
||||
* Groups:
|
||||
|
||||
1. KVM_DEV_XIVE_GRP_CTRL
|
||||
Provides global controls on the device
|
||||
Attributes:
|
||||
1.1 KVM_DEV_XIVE_RESET (write only)
|
||||
Resets the interrupt controller configuration for sources and event
|
||||
queues. To be used by kexec and kdump.
|
||||
Errors: none
|
||||
|
||||
1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
|
||||
Sync all the sources and queues and mark the EQ pages dirty. This
|
||||
to make sure that a consistent memory state is captured when
|
||||
migrating the VM.
|
||||
Errors: none
|
||||
|
||||
2. KVM_DEV_XIVE_GRP_SOURCE (write only)
|
||||
Initializes a new source in the XIVE device and mask it.
|
||||
Attributes:
|
||||
Interrupt source number (64-bit)
|
||||
The kvm_device_attr.addr points to a __u64 value:
|
||||
bits: | 63 .... 2 | 1 | 0
|
||||
values: | unused | level | type
|
||||
- type: 0:MSI 1:LSI
|
||||
- level: assertion level in case of an LSI.
|
||||
Errors:
|
||||
-E2BIG: Interrupt source number is out of range
|
||||
-ENOMEM: Could not create a new source block
|
||||
-EFAULT: Invalid user pointer for attr->addr.
|
||||
-ENXIO: Could not allocate underlying HW interrupt
|
||||
|
||||
3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
|
||||
Configures source targeting
|
||||
Attributes:
|
||||
Interrupt source number (64-bit)
|
||||
The kvm_device_attr.addr points to a __u64 value:
|
||||
bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0
|
||||
values: | eisn | mask | server | priority
|
||||
- priority: 0-7 interrupt priority level
|
||||
- server: CPU number chosen to handle the interrupt
|
||||
- mask: mask flag (unused)
|
||||
- eisn: Effective Interrupt Source Number
|
||||
Errors:
|
||||
-ENOENT: Unknown source number
|
||||
-EINVAL: Not initialized source number
|
||||
-EINVAL: Invalid priority
|
||||
-EINVAL: Invalid CPU number.
|
||||
-EFAULT: Invalid user pointer for attr->addr.
|
||||
-ENXIO: CPU event queues not configured or configuration of the
|
||||
underlying HW interrupt failed
|
||||
-EBUSY: No CPU available to serve interrupt
|
||||
|
||||
4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
|
||||
Configures an event queue of a CPU
|
||||
Attributes:
|
||||
EQ descriptor identifier (64-bit)
|
||||
The EQ descriptor identifier is a tuple (server, priority) :
|
||||
bits: | 63 .... 32 | 31 .. 3 | 2 .. 0
|
||||
values: | unused | server | priority
|
||||
The kvm_device_attr.addr points to :
|
||||
struct kvm_ppc_xive_eq {
|
||||
__u32 flags;
|
||||
__u32 qshift;
|
||||
__u64 qaddr;
|
||||
__u32 qtoggle;
|
||||
__u32 qindex;
|
||||
__u8 pad[40];
|
||||
};
|
||||
- flags: queue flags
|
||||
KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
|
||||
forces notification without using the coalescing mechanism
|
||||
provided by the XIVE END ESBs.
|
||||
- qshift: queue size (power of 2)
|
||||
- qaddr: real address of queue
|
||||
- qtoggle: current queue toggle bit
|
||||
- qindex: current queue index
|
||||
- pad: reserved for future use
|
||||
Errors:
|
||||
-ENOENT: Invalid CPU number
|
||||
-EINVAL: Invalid priority
|
||||
-EINVAL: Invalid flags
|
||||
-EINVAL: Invalid queue size
|
||||
-EINVAL: Invalid queue address
|
||||
-EFAULT: Invalid user pointer for attr->addr.
|
||||
-EIO: Configuration of the underlying HW failed
|
||||
|
||||
5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
|
||||
Synchronize the source to flush event notifications
|
||||
Attributes:
|
||||
Interrupt source number (64-bit)
|
||||
Errors:
|
||||
-ENOENT: Unknown source number
|
||||
-EINVAL: Not initialized source number
|
||||
|
||||
* VCPU state
|
||||
|
||||
The XIVE IC maintains VP interrupt state in an internal structure
|
||||
called the NVT. When a VP is not dispatched on a HW processor
|
||||
thread, this structure can be updated by HW if the VP is the target
|
||||
of an event notification.
|
||||
|
||||
It is important for migration to capture the cached IPB from the NVT
|
||||
as it synthesizes the priorities of the pending interrupts. We
|
||||
capture a bit more to report debug information.
|
||||
|
||||
KVM_REG_PPC_VP_STATE (2 * 64bits)
|
||||
bits: | 63 .... 32 | 31 .... 0 |
|
||||
values: | TIMA word0 | TIMA word1 |
|
||||
bits: | 127 .......... 64 |
|
||||
values: | unused |
|
||||
|
||||
* Migration:
|
||||
|
||||
Saving the state of a VM using the XIVE native exploitation mode
|
||||
should follow a specific sequence. When the VM is stopped :
|
||||
|
||||
1. Mask all sources (PQ=01) to stop the flow of events.
|
||||
|
||||
2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
|
||||
flush any in-flight event notification and to stabilize the EQs. At
|
||||
this stage, the EQ pages are marked dirty to make sure they are
|
||||
transferred in the migration sequence.
|
||||
|
||||
3. Capture the state of the source targeting, the EQs configuration
|
||||
and the state of thread interrupt context registers.
|
||||
|
||||
Restore is similar :
|
||||
|
||||
1. Restore the EQ configuration. As targeting depends on it.
|
||||
2. Restore targeting
|
||||
3. Restore the thread interrupt contexts
|
||||
4. Restore the source states
|
||||
5. Let the vCPU run
|
Référencer dans un nouveau ticket
Bloquer un utilisateur