.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host, and the guest OS should support the
XIVE native exploitation interrupt mode. If it does not, it should run
using the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

The KVM device exposes different MMIO ranges of the XIVE HW which
are required for interrupt management. These are exposed to the
guest in VMAs populated with a custom VM fault handler.

1. Thread Interrupt Management Area (TIMA)

   Each thread has an associated Thread Interrupt Management context
   composed of a set of registers. These registers let the thread
   handle priority management and interrupt acknowledgment. The most
   important are:

   - Interrupt Pending Buffer (IPB)
   - Current Processor Priority (CPPR)
   - Notification Source Register (NSR)

   They are exposed to software in four different pages, each offering
   a view with a different privilege level. The first page is for the
   physical thread context and the second for the hypervisor. Only the
   third (operating system) and the fourth (user level) are exposed to
   the guest.

2. Event State Buffer (ESB)

   Each source is associated with an Event State Buffer (ESB), backed
   by an even/odd pair of pages which provides commands to manage the
   source: to trigger it, to EOI it, or to turn it off, for instance.
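
As a rough illustration of the even/odd layout, the sketch below computes
the page indices of a source's ESB pair inside a contiguous ESB mapping.
It assumes that source N owns pages 2N and 2N+1, and that the even page is
the trigger page while the odd page serves EOI and management. Neither the
helper names nor that split are stated above, so treat both as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: source N is assumed to own pages 2N and 2N+1. */

/* Even page of the pair (assumed here to be the trigger page). */
static inline uint64_t esb_even_page(uint64_t source)
{
	assert(source <= (UINT64_MAX - 1) / 2);	/* keep 2N+1 from overflowing */
	return 2 * source;
}

/* Odd page of the pair (assumed here to be the EOI/management page). */
static inline uint64_t esb_odd_page(uint64_t source)
{
	return esb_even_page(source) + 1;
}

/* Byte offset of a page inside the mapping, for a given page shift. */
static inline uint64_t esb_page_offset(uint64_t page, unsigned int page_shift)
{
	assert(page_shift < 64);
	return page << page_shift;
}
```

With a 64K page size (shift of 16, also an assumption), the odd page of
source 1 would start at ``esb_page_offset(esb_odd_page(1), 16)``.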

3. Device pass-through

   When a device is passed through to the guest, the source
   interrupts come from a different HW controller (PHB4) and the ESB
   pages exposed to the guest should accommodate this change.

   The passthru_irq helpers, kvmppc_xive_set_mapped() and
   kvmppc_xive_clr_mapped(), are called when the device HW irqs are
   mapped into or unmapped from the guest IRQ number space. The KVM
   device extends these helpers to clear the ESB pages of the guest IRQ
   number being mapped and then lets the VM fault handler repopulate
   them. The handler will insert the ESB page corresponding to the HW
   interrupt of the device being passed through, or the initial IPI ESB
   page if the device has been removed.

   The ESB remapping is fully transparent to the guest and the OS
   device driver. All handling is done within VFIO and the above
   helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL

   Provides global controls on the device.

   Attributes:

   1.1 KVM_DEV_XIVE_RESET (write only)

   Resets the interrupt controller configuration for sources and event
   queues. To be used by kexec and kdump.

   Errors: none

   1.2 KVM_DEV_XIVE_EQ_SYNC (write only)

   Syncs all the sources and queues and marks the EQ pages dirty. This
   ensures that a consistent memory state is captured when migrating
   the VM.

   Errors: none

   1.3 KVM_DEV_XIVE_NR_SERVERS (write only)

   The kvm_device_attr.addr points to a __u32 value which is the number of
   interrupt server numbers (i.e., highest possible vcpu id plus one).

   Errors:

   ======= ==========================================
   -EINVAL Value greater than KVM_MAX_VCPU_IDS.
   -EFAULT Invalid user pointer for attr->addr.
   -EBUSY  A vCPU is already connected to the device.
   ======= ==========================================
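
To make the "highest possible vcpu id plus one" rule concrete, here is a
minimal sketch that derives the value to write from a list of planned vCPU
ids. The helper name and the idea of keeping the ids in an array are
illustrative assumptions, not part of the KVM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the value to write through
 * KVM_DEV_XIVE_NR_SERVERS from the vCPU ids a VMM plans to use.
 * Returns 0 when the list is empty.
 */
static inline uint32_t xive_nr_servers(const uint32_t *vcpu_ids, size_t count)
{
	uint32_t highest = 0;
	size_t i;

	if (count == 0)
		return 0;

	for (i = 0; i < count; i++) {
		if (vcpu_ids[i] > highest)
			highest = vcpu_ids[i];
	}

	assert(highest < UINT32_MAX);	/* highest + 1 must fit in a __u32 */
	return highest + 1;
}
```

The result would be stored in a __u32 whose address goes in
kvm_device_attr.addr. Per the error table above, the write has to happen
before any vCPU is connected to the device, otherwise -EBUSY is returned.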

2. KVM_DEV_XIVE_GRP_SOURCE (write only)

   Initializes a new source in the XIVE device and masks it.

   Attributes:

   Interrupt source number (64-bit)

   The kvm_device_attr.addr points to a __u64 value::

       bits:     | 63   ....  2 |   1   |   0
       values:   |    unused    | level | type

   - type:  0:MSI 1:LSI
   - level: assertion level in case of an LSI.

   Errors:

   ======= ==========================================
   -E2BIG  Interrupt source number is out of range
   -ENOMEM Could not create a new source block
   -EFAULT Invalid user pointer for attr->addr.
   -ENXIO  Could not allocate underlying HW interrupt
   ======= ==========================================
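
The __u64 layout above can be sketched as a small encoder. The bit
positions come from the table. The enum and function names are made up for
illustration and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Source types, matching "type: 0:MSI 1:LSI" above (names are illustrative). */
enum xive_src_type {
	XIVE_SRC_MSI = 0,
	XIVE_SRC_LSI = 1,
};

/*
 * Build the value pointed to by kvm_device_attr.addr:
 * bit 0 is the type, bit 1 the assertion level, bits 63..2 are unused.
 */
static inline uint64_t xive_source_value(unsigned int type, unsigned int level)
{
	assert(type <= 1 && level <= 1);
	return ((uint64_t)level << 1) | (uint64_t)type;
}
```

For example, ``xive_source_value(XIVE_SRC_LSI, 1)`` describes an LSI whose
level is currently asserted, while an MSI is simply 0.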

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)

   Configures source targeting.

   Attributes:

   Interrupt source number (64-bit)

   The kvm_device_attr.addr points to a __u64 value::

       bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
       values:   |     eisn      | mask |  server | priority

   - priority: 0-7 interrupt priority level
   - server: CPU number chosen to handle the interrupt
   - mask: mask flag (unused)
   - eisn: Effective Interrupt Source Number

   Errors:

   ======= =======================================================
   -ENOENT Unknown source number
   -EINVAL Source number not initialized
   -EINVAL Invalid priority
   -EINVAL Invalid CPU number.
   -EFAULT Invalid user pointer for attr->addr.
   -ENXIO  CPU event queues not configured or configuration of the
           underlying HW interrupt failed
   -EBUSY  No CPU available to serve interrupt
   ======= =======================================================
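
The targeting word packs four fields, so a pack/unpack sketch may help when
reading or building it by hand. Field positions and widths follow the
layout above. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions from the layout above; the macro names are made up. */
#define XIVE_CFG_PRIORITY_SHIFT	0
#define XIVE_CFG_PRIORITY_MASK	UINT64_C(0x7)		/* bits 2..0 */
#define XIVE_CFG_SERVER_SHIFT	3
#define XIVE_CFG_SERVER_MASK	UINT64_C(0x1fffffff)	/* bits 31..3 */
#define XIVE_CFG_MASKED_SHIFT	32			/* bit 32 */
#define XIVE_CFG_EISN_SHIFT	33
#define XIVE_CFG_EISN_MASK	UINT64_C(0x7fffffff)	/* bits 63..33 */

/* Pack the four fields into the __u64 passed via kvm_device_attr.addr. */
static inline uint64_t xive_source_config(uint32_t eisn, unsigned int masked,
					  uint32_t server, unsigned int priority)
{
	assert(priority <= XIVE_CFG_PRIORITY_MASK);
	assert(server <= XIVE_CFG_SERVER_MASK);
	assert(eisn <= XIVE_CFG_EISN_MASK);
	assert(masked <= 1);

	return ((uint64_t)eisn << XIVE_CFG_EISN_SHIFT) |
	       ((uint64_t)masked << XIVE_CFG_MASKED_SHIFT) |
	       ((uint64_t)server << XIVE_CFG_SERVER_SHIFT) |
	       ((uint64_t)priority << XIVE_CFG_PRIORITY_SHIFT);
}

/* Matching decoders, e.g. for inspecting a saved configuration. */
static inline unsigned int xive_cfg_priority(uint64_t val)
{
	return (unsigned int)(val & XIVE_CFG_PRIORITY_MASK);
}

static inline uint32_t xive_cfg_server(uint64_t val)
{
	return (uint32_t)((val >> XIVE_CFG_SERVER_SHIFT) & XIVE_CFG_SERVER_MASK);
}

static inline uint32_t xive_cfg_eisn(uint64_t val)
{
	return (uint32_t)((val >> XIVE_CFG_EISN_SHIFT) & XIVE_CFG_EISN_MASK);
}
```

The mask flag is documented as unused, so a caller would normally pass 0
for it.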

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)

   Configures an event queue of a CPU.

   Attributes:

   EQ descriptor identifier (64-bit)

   The EQ descriptor identifier is a tuple (server, priority)::

       bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
       values:   |    unused     |  server | priority

   The kvm_device_attr.addr points to::

       struct kvm_ppc_xive_eq {
           __u32 flags;
           __u32 qshift;
           __u64 qaddr;
           __u32 qtoggle;
           __u32 qindex;
           __u8  pad[40];
       };

   - flags: queue flags

     KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
       forces notification without using the coalescing mechanism
       provided by the XIVE END ESBs.

   - qshift: queue size (power of 2)
   - qaddr: real address of queue
   - qtoggle: current queue toggle bit
   - qindex: current queue index
   - pad: reserved for future use

   Errors:

   ======= =========================================
   -ENOENT Invalid CPU number
   -EINVAL Invalid priority
   -EINVAL Invalid flags
   -EINVAL Invalid queue size
   -EINVAL Invalid queue address
   -EFAULT Invalid user pointer for attr->addr.
   -EIO    Configuration of the underlying HW failed
   ======= =========================================
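
A sketch of how the EQ identifier and descriptor fit together is shown
below. The structure mirrors the one above with <stdint.h> types so that
the example builds without kernel headers. The ``_sketch`` names and both
helpers are hypothetical, and the flag value is passed in by the caller
rather than hard-coded here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirror of struct kvm_ppc_xive_eq, using <stdint.h> types. */
struct xive_eq_sketch {
	uint32_t flags;
	uint32_t qshift;
	uint64_t qaddr;
	uint32_t qtoggle;
	uint32_t qindex;
	uint8_t  pad[40];
};

/* 4 + 4 + 8 + 4 + 4 + 40 bytes, with no implicit padding. */
_Static_assert(sizeof(struct xive_eq_sketch) == 64,
	       "EQ descriptor is expected to be 64 bytes");

/* EQ descriptor identifier: server in bits 31..3, priority in bits 2..0. */
static inline uint64_t xive_eq_id(uint32_t server, unsigned int priority)
{
	assert(priority <= 7);
	assert(server <= 0x1fffffffu);
	return ((uint64_t)server << 3) | priority;
}

/* Fill a descriptor; the reserved padding is always cleared. */
static inline void xive_eq_fill(struct xive_eq_sketch *eq, uint32_t flags,
				uint64_t qaddr, uint32_t qshift,
				uint32_t qtoggle, uint32_t qindex)
{
	memset(eq, 0, sizeof(*eq));
	eq->flags = flags;	/* must include KVM_XIVE_EQ_ALWAYS_NOTIFY */
	eq->qshift = qshift;
	eq->qaddr = qaddr;
	eq->qtoggle = qtoggle;
	eq->qindex = qindex;
}
```

Since the group is read-write, the same descriptor layout can be read back
to capture qtoggle and qindex, then written again on restore with the
identifier from ``xive_eq_id()`` in kvm_device_attr.attr.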

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)

   Synchronizes the source to flush event notifications.

   Attributes:

   Interrupt source number (64-bit)

   Errors:

   ======= =============================
   -ENOENT Unknown source number
   -EINVAL Source number not initialized
   ======= =============================

* VCPU state

The XIVE IC maintains VP interrupt state in an internal structure
called the NVT. When a VP is not dispatched on a HW processor
thread, this structure can be updated by HW if the VP is the target
of an event notification.

It is important for migration to capture the cached IPB from the NVT
as it synthesizes the priorities of the pending interrupts. We
capture a bit more to report debug information.

KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |
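
For reference, the first doubleword of the register can be split into its
two TIMA words as sketched below. The helper names are hypothetical, and
the sketch assumes the caller has already read bits 63..0 of the register
into a host-endian 64-bit value.

```c
#include <stdint.h>

/* TIMA word0 occupies bits 63..32 of the first doubleword. */
static inline uint32_t xive_vp_tima_word0(uint64_t dw0)
{
	return (uint32_t)(dw0 >> 32);
}

/* TIMA word1 occupies bits 31..0 of the first doubleword. */
static inline uint32_t xive_vp_tima_word1(uint64_t dw0)
{
	return (uint32_t)(dw0 & UINT32_MAX);
}

/* Rebuild the first doubleword from its two words, e.g. on restore. */
static inline uint64_t xive_vp_pack(uint32_t word0, uint32_t word1)
{
	return ((uint64_t)word0 << 32) | word1;
}
```

The second doubleword (bits 127..64) is unused and is not handled here.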

* Migration:

Saving the state of a VM using the XIVE native exploitation mode
should follow a specific sequence. Once the VM is stopped:

1. Mask all sources (PQ=01) to stop the flow of events.

2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
   flush any in-flight event notification and to stabilize the EQs. At
   this stage, the EQ pages are marked dirty to make sure they are
   transferred in the migration sequence.

3. Capture the state of the source targeting, the EQs configuration
   and the state of thread interrupt context registers.

Restore is similar:

1. Restore the EQ configuration first, as targeting depends on it.
2. Restore targeting
3. Restore the thread interrupt contexts
4. Restore the source states
5. Let the vCPU run
  183. 5. Let the vCPU run