.. SPDX-License-Identifier: GPL-2.0

======================
Generic vcpu interface
======================

The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
kvm_device_attr as other devices, but targets VCPU-wide settings and controls.

The groups and attributes per virtual cpu, if any, are architecture specific.

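
As an illustration of how the three ioctls are typically issued on a vcpu file
descriptor, here is a minimal sketch. The helper names ``vcpu_set_attr`` and
``vcpu_has_attr`` are hypothetical and not part of any KVM header; the sketch
assumes a Linux system with the uapi header ``linux/kvm.h`` installed and a
vcpu fd obtained from KVM_CREATE_VCPU.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: set one attribute on a vcpu file descriptor.
 * "addr" points at the attribute payload (for example an int or a __u64).
 * Returns 0 on success or a negative errno value on failure.
 */
static int vcpu_set_attr(int vcpu_fd, uint32_t group, uint64_t attr,
			 const void *addr)
{
	struct kvm_device_attr da;

	memset(&da, 0, sizeof(da));	/* flags must be zero */
	da.group = group;
	da.attr = attr;
	da.addr = (uint64_t)(uintptr_t)addr;

	if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &da) < 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: probe whether a group/attribute pair is supported.
 * Returns 0 if supported or a negative errno value otherwise.
 */
static int vcpu_has_attr(int vcpu_fd, uint32_t group, uint64_t attr)
{
	struct kvm_device_attr da;

	memset(&da, 0, sizeof(da));
	da.group = group;
	da.attr = attr;

	if (ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &da) < 0)
		return -errno;
	return 0;
}
```

Returning ``-errno`` keeps the error codes listed in the tables of this document
directly comparable with the helper's return value.
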
1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL
==================================

:Architectures: ARM64

1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ
---------------------------------------

:Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
             pointer to an int

Returns:

    =======  ========================================================
    -EBUSY   The PMU overflow interrupt is already set
    -EFAULT  Error reading interrupt number
    -ENXIO   PMUv3 not supported or the overflow interrupt not set
             when attempting to get it
    -ENODEV  KVM_ARM_VCPU_PMU_V3 feature missing from VCPU
    -EINVAL  Invalid PMU overflow interrupt number supplied or
             trying to set the IRQ number without using an in-kernel
             irqchip.
    =======  ========================================================

A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
type must be the same for each vcpu. As a PPI, the interrupt number is the same
for all vcpus, while as an SPI it must be a separate number per vcpu.

1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
---------------------------------------

:Parameters: no additional parameter in kvm_device_attr.addr

Returns:

    =======  ======================================================
    -EEXIST  Interrupt number already used
    -ENODEV  PMUv3 not supported or GIC not initialized
    -ENXIO   PMUv3 not supported, missing VCPU feature or interrupt
             number not set
    -EBUSY   PMUv3 already initialized
    =======  ======================================================

Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.

1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
-----------------------------------------

:Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
             pointer to a struct kvm_pmu_event_filter

:Returns:

    =======  ======================================================
    -ENODEV  PMUv3 not supported or GIC not initialized
    -ENXIO   PMUv3 not properly configured or in-kernel irqchip not
             configured as required prior to calling this attribute
    -EBUSY   PMUv3 already initialized or a VCPU has already run
    -EINVAL  Invalid filter range
    =======  ======================================================

Request the installation of a PMU event filter described as follows::

    struct kvm_pmu_event_filter {
        __u16 base_event;
        __u16 nevents;

    #define KVM_PMU_EVENT_ALLOW 0
    #define KVM_PMU_EVENT_DENY  1

        __u8  action;
        __u8  pad[3];
    };

A filter range is defined as the range [@base_event, @base_event + @nevents),
together with an @action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
first registered range defines the global policy (global ALLOW if the first
@action is DENY, global DENY if the first @action is ALLOW). Multiple ranges
can be programmed, and must fit within the event space defined by the PMU
architecture (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards).

Note: "Cancelling" a filter by registering the opposite action for the same
range doesn't change the default action. For example, installing an ALLOW
filter for event range [0:10) as the first filter and then applying a DENY
action for the same range will leave the whole range as disabled.

Restrictions: Event 0 (SW_INCR) is never filtered, as it doesn't count a
hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
isn't strictly speaking an event. Filtering the cycle counter is possible
using event 0x11 (CPU_CYCLES).

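
The interaction between the first registered range and later ranges can be
easier to follow with a small userspace model. The sketch below is not the
kernel's implementation: the type ``filter_model`` and both functions are
hypothetical names, the model assumes the 16-bit event space, and it
deliberately ignores the SW_INCR and CHAIN special cases described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_EVENT_SPACE 65536u	/* assumption: 16-bit event space */

enum { MODEL_ALLOW = 0, MODEL_DENY = 1 };	/* mirror KVM_PMU_EVENT_* */

/* Hypothetical model of the filter state: one "allowed" bit per event. */
struct filter_model {
	bool initialized;
	uint8_t allowed[MODEL_EVENT_SPACE / 8];
};

/* Apply one range; returns 0 on success, -1 for an invalid range/action. */
static int filter_model_apply(struct filter_model *m, uint16_t base_event,
			      uint16_t nevents, uint8_t action)
{
	uint32_t end = (uint32_t)base_event + nevents;
	uint32_t e;

	if (end > MODEL_EVENT_SPACE ||
	    (action != MODEL_ALLOW && action != MODEL_DENY))
		return -1;

	/* The first range sets the global policy to the opposite action. */
	if (!m->initialized) {
		memset(m->allowed, action == MODEL_ALLOW ? 0x00 : 0xff,
		       sizeof(m->allowed));
		m->initialized = true;
	}

	for (e = base_event; e < end; e++) {
		if (action == MODEL_ALLOW)
			m->allowed[e / 8] |= (uint8_t)(1u << (e % 8));
		else
			m->allowed[e / 8] &= (uint8_t)~(1u << (e % 8));
	}
	return 0;
}

/* With no filter installed, every event is allowed. */
static bool filter_model_event_allowed(const struct filter_model *m,
				       uint16_t event)
{
	if (!m->initialized)
		return true;
	return m->allowed[event / 8] & (1u << (event % 8));
}
```

In this model, installing ALLOW for [0:10) first and then DENY for the same
range leaves events 0 to 9 denied and every other event denied as well, because
the global policy was fixed by the first call.
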
1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
------------------------------------------

:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
             identifier.

:Returns:

    =======  ====================================================
    -EBUSY   PMUv3 already initialized, a VCPU has already run or
             an event filter has already been set
    -EFAULT  Error accessing the PMU identifier
    -ENXIO   PMU not found
    -ENODEV  PMUv3 not supported or GIC not initialized
    -ENOMEM  Could not allocate memory
    =======  ====================================================

Request that the VCPU uses the specified hardware PMU when creating guest events
for the purpose of PMU emulation. The PMU identifier can be read from the "type"
file for the desired PMU instance under /sys/devices (or, equivalently,
/sys/bus/event_source). This attribute is particularly useful on heterogeneous
systems where there are at least two CPU PMUs on the system. The PMU that is set
for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU
if a PMU event filter is already present.

Note that KVM will not make any attempts to run the VCPU on the physical CPUs
associated with the PMU specified by this attribute. This is entirely left to
userspace. However, attempting to run the VCPU on a physical CPU not supported
by the PMU will fail and KVM_RUN will return with
exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
the hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED
and the cpu field to the processor id.

2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================

:Architectures: ARM64

2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER
-----------------------------------------------------------------------------

:Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
             pointer to an int

Returns:

    =======  ==================================
    -EINVAL  Invalid timer interrupt number
    -EBUSY   One or more VCPUs have already run
    =======  ==================================

A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC. These must be a PPI (16 <= intid < 32). Setting the
attribute overrides the default values (see below).

=============================  ==========================================
KVM_ARM_VCPU_TIMER_IRQ_VTIMER  The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER  The EL1 physical timer intid (default: 30)
=============================  ==========================================

Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs. Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.

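
Because a duplicated PPI only shows up later as VCPUs failing to run, a VMM
may want to check its chosen interrupt numbers before setting the attributes.
The following is a minimal sketch of such a userspace pre-check; both function
names are hypothetical and the check only encodes the two constraints stated
above (PPI range and distinct numbers per timer).

```c
#include <stdbool.h>

/* PPIs occupy interrupt IDs 16 to 31 inclusive (16 <= intid < 32). */
static bool intid_is_ppi(int intid)
{
	return intid >= 16 && intid < 32;
}

/*
 * Hypothetical pre-check: both timer interrupts must be PPIs and must
 * not share the same interrupt number.
 */
static bool timer_intids_valid(int vtimer_intid, int ptimer_intid)
{
	return intid_is_ppi(vtimer_intid) &&
	       intid_is_ppi(ptimer_intid) &&
	       vtimer_intid != ptimer_intid;
}
```

The default values (27 and 30) pass this check, while using 27 for both timers
does not.
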
3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

:Architectures: ARM64

3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA
--------------------------------------

:Parameters: 64-bit base address

Returns:

    =======  ======================================
    -ENXIO   Stolen time not implemented
    -EEXIST  Base address already set for this VCPU
    -EINVAL  Base address not 64 byte aligned
    =======  ======================================

Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.

4. GROUP: KVM_VCPU_TSC_CTRL
===========================

:Architectures: x86

4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
----------------------------------

:Parameters: 64-bit unsigned TSC offset

Returns:

    =======  ======================================
    -EFAULT  Error reading/writing the provided
             parameter address.
    -ENXIO   Attribute not supported
    =======  ======================================

Specifies the guest's TSC offset relative to the host's TSC. The guest's
TSC is then derived by the following equation:

  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET

This attribute is useful to adjust the guest's TSC on live migration,
so that the TSC counts the time during which the VM was paused. The
following describes a possible algorithm to use for this purpose.

From the source VMM process:

1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
   kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
   (host_src).

2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
   guest TSC offset (ofs_src[i]).

3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
   guest's TSC (freq).

From the destination VMM process:

4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
   kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
   fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
   structure.

   KVM will advance the VM's kvmclock to account for elapsed time since
   recording the clock values. Note that this will cause problems in
   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
   between the source and destination, and a reasonably short time passes
   between the source pausing the VMs and the destination executing
   steps 4-7.

5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
   kvmclock nanoseconds (guest_dest).

6. Adjust the guest TSC offsets for every vCPU to account for (1) time
   elapsed since recording state and (2) difference in TSCs between the
   source and destination machine:

   ofs_dst[i] = ofs_src[i] -
     (guest_src - guest_dest) * freq +
     (tsc_src - tsc_dest)

   ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
   a time of 0 in kvmclock. The above formula ensures that it is the
   same on the destination as it was on the source).

7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
   respective value derived in the previous step.
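
The arithmetic of step 6 can be sketched as a pure function, which makes the
unit handling explicit. The kvmclock values are in nanoseconds while
KVM_GET_TSC_KHZ reports kHz, so this sketch assumes the product in the formula
is meant as "nanoseconds converted to TSC ticks", that is ns * kHz / 10^6. The
function names are hypothetical, and all offset arithmetic is assumed to wrap
modulo 2^64, matching the unsigned 64-bit attribute.

```c
#include <stdint.h>

/*
 * Convert a signed nanosecond delta to TSC ticks for a TSC running at
 * tsc_khz. Splitting into whole and fractional milliseconds avoids
 * overflowing 64 bits for large deltas.
 */
static int64_t ns_to_tsc_ticks(int64_t ns, uint32_t tsc_khz)
{
	int64_t whole = (ns / 1000000) * (int64_t)tsc_khz;
	int64_t frac = (ns % 1000000) * (int64_t)tsc_khz / 1000000;

	return whole + frac;
}

/*
 * Hypothetical helper for step 6:
 *   ofs_dst = ofs_src - (guest_src - guest_dest) * freq
 *                     + (tsc_src - tsc_dest)
 * guest_* are kvmclock nanoseconds, tsc_* are host TSC readings.
 */
static uint64_t tsc_offset_for_destination(uint64_t ofs_src,
					   uint64_t guest_src,
					   uint64_t guest_dest,
					   uint64_t tsc_src,
					   uint64_t tsc_dest,
					   uint32_t tsc_khz)
{
	/* Usually negative: the destination clock reading is later. */
	int64_t delta_ns = (int64_t)(guest_src - guest_dest);
	uint64_t delta_ticks = (uint64_t)ns_to_tsc_ticks(delta_ns, tsc_khz);

	return ofs_src - delta_ticks + (tsc_src - tsc_dest);
}
```

A VMM would call the second function once per vCPU with that vCPU's
``ofs_src[i]`` and write the result back as described in step 7.
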