Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. Conflicts: arch/x86/include/asm/kvm_host.h
This commit is contained in:
187
Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
Normal file
187
Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
Normal file
@@ -0,0 +1,187 @@
|
||||
KVM/ARM VGIC Forwarded Physical Interrupts
|
||||
==========================================
|
||||
|
||||
The KVM/ARM code implements software support for the ARM Generic
|
||||
Interrupt Controller's (GIC's) hardware support for virtualization by
|
||||
allowing software to inject virtual interrupts to a VM, which the guest
|
||||
OS sees as regular interrupts. The code is famously known as the VGIC.
|
||||
|
||||
Some of these virtual interrupts, however, correspond to physical
|
||||
interrupts from real physical devices. One example could be the
|
||||
architected timer, which itself supports virtualization, and therefore
|
||||
lets a guest OS program the hardware device directly to raise an
|
||||
interrupt at some point in time. When such an interrupt is raised, the
|
||||
host OS initially handles the interrupt and must somehow signal this
|
||||
event as a virtual interrupt to the guest. Another example could be a
|
||||
passthrough device, where the physical interrupts are initially handled
|
||||
by the host, but the device driver for the device lives in the guest OS
|
||||
and KVM must therefore somehow inject a virtual interrupt on behalf of
|
||||
the physical one to the guest OS.
|
||||
|
||||
These virtual interrupts corresponding to a physical interrupt on the
|
||||
host are called forwarded physical interrupts, but are also sometimes
|
||||
referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
|
||||
|
||||
Forwarded physical interrupts are handled slightly differently compared
|
||||
to virtual interrupts generated purely by a software emulated device.
|
||||
|
||||
|
||||
The HW bit
|
||||
----------
|
||||
Virtual interrupts are signalled to the guest by programming the List
|
||||
Registers (LRs) on the GIC before running a VCPU. The LR is programmed
|
||||
with the virtual IRQ number and the state of the interrupt (Pending,
|
||||
Active, or Pending+Active). When the guest ACKs and EOIs a virtual
|
||||
interrupt, the LR state moves from Pending to Active, and finally to
|
||||
inactive.
|
||||
|
||||
The LRs include an extra bit, called the HW bit. When this bit is set,
|
||||
KVM must also program an additional field in the LR, the physical IRQ
|
||||
number, to link the virtual with the physical IRQ.
|
||||
|
||||
When the HW bit is set, KVM must EITHER set the Pending OR the Active
|
||||
bit, never both at the same time.
|
||||
|
||||
Setting the HW bit causes the hardware to deactivate the physical
|
||||
interrupt on the physical distributor when the guest deactivates the
|
||||
corresponding virtual interrupt.
|
||||
|
||||
|
||||
Forwarded Physical Interrupts Life Cycle
|
||||
----------------------------------------
|
||||
|
||||
The state of forwarded physical interrupts is managed in the following way:
|
||||
|
||||
- The physical interrupt is acked by the host, and becomes active on
|
||||
the physical distributor (*).
|
||||
- KVM sets the LR.Pending bit, because this is the only way the GICV
|
||||
interface is going to present it to the guest.
|
||||
- LR.Pending will stay set as long as the guest has not acked the interrupt.
|
||||
- LR.Pending transitions to LR.Active on the guest read of the IAR, as
|
||||
expected.
|
||||
- On guest EOI, the *physical distributor* active bit gets cleared,
|
||||
but the LR.Active is left untouched (set).
|
||||
- KVM clears the LR on VM exits when the physical distributor
|
||||
active state has been cleared.
|
||||
|
||||
(*): The host handling is slightly more complicated. For some forwarded
|
||||
interrupts (shared), KVM directly sets the active state on the physical
|
||||
distributor before entering the guest, because the interrupt is never actually
|
||||
handled on the host (see details on the timer as an example below). For other
|
||||
forwarded interrupts (non-shared) the host does not deactivate the interrupt
|
||||
when the host ISR completes, but leaves the interrupt active until the guest
|
||||
deactivates it. Leaving the interrupt active is allowed, because Linux
|
||||
configures the physical GIC with EOIMode=1, which causes EOI operations to
|
||||
perform a priority drop allowing the GIC to receive other interrupts of the
|
||||
default priority.
|
||||
|
||||
|
||||
Forwarded Edge and Level Triggered PPIs and SPIs
|
||||
------------------------------------------------
|
||||
Forwarded physical interrupts injected should always be active on the
|
||||
physical distributor when injected to a guest.
|
||||
|
||||
Level-triggered interrupts will keep the interrupt line to the GIC
|
||||
asserted, typically until the guest programs the device to deassert the
|
||||
line. This means that the interrupt will remain pending on the physical
|
||||
distributor until the guest has reprogrammed the device. Since we
|
||||
always run the VM with interrupts enabled on the CPU, a pending
|
||||
interrupt will exit the guest as soon as we switch into the guest,
|
||||
preventing the guest from ever making progress as the process repeats
|
||||
over and over. Therefore, the active state on the physical distributor
|
||||
must be set when entering the guest, preventing the GIC from forwarding
|
||||
the pending interrupt to the CPU. As soon as the guest deactivates the
|
||||
interrupt, the physical line is sampled by the hardware again and the host
|
||||
takes a new interrupt if and only if the physical line is still asserted.
|
||||
|
||||
Edge-triggered interrupts do not exhibit the same problem with
|
||||
preventing guest execution that level-triggered interrupts do. One
|
||||
option is to not use HW bit at all, and inject edge-triggered interrupts
|
||||
from a physical device as pure virtual interrupts. But that would
|
||||
potentially slow down handling of the interrupt in the guest, because a
|
||||
physical interrupt occurring in the middle of the guest ISR would
|
||||
preempt the guest for the host to handle the interrupt. Additionally,
|
||||
if you configure the system to handle interrupts on a separate physical
|
||||
core from that running your VCPU, you still have to interrupt the VCPU
|
||||
to queue the pending state onto the LR, even though the guest won't use
|
||||
this information until the guest ISR completes. Therefore, the HW
|
||||
bit should always be set for forwarded edge-triggered interrupts. With
|
||||
the HW bit set, the virtual interrupt is injected and additional
|
||||
physical interrupts occurring before the guest deactivates the interrupt
|
||||
simply mark the state on the physical distributor as Pending+Active. As
|
||||
soon as the guest deactivates the interrupt, the host takes another
|
||||
interrupt if and only if there was a physical interrupt between injecting
|
||||
the forwarded interrupt to the guest and the guest deactivating the
|
||||
interrupt.
|
||||
|
||||
Consequently, whenever we schedule a VCPU with one or more LRs with the
|
||||
HW bit set, the interrupt must also be active on the physical
|
||||
distributor.
|
||||
|
||||
|
||||
Forwarded LPIs
|
||||
--------------
|
||||
LPIs, introduced in GICv3, are always edge-triggered and do not have an
|
||||
active state. They become pending when a device signal them, and as
|
||||
soon as they are acked by the CPU, they are inactive again.
|
||||
|
||||
It therefore doesn't make sense, and is not supported, to set the HW bit
|
||||
for physical LPIs that are forwarded to a VM as virtual interrupts,
|
||||
typically virtual SPIs.
|
||||
|
||||
For LPIs, there is no other choice than to preempt the VCPU thread if
|
||||
necessary, and queue the pending state onto the LR.
|
||||
|
||||
|
||||
Putting It Together: The Architected Timer
|
||||
------------------------------------------
|
||||
The architected timer is a device that signals interrupts with level
|
||||
triggered semantics. The timer hardware is directly accessed by VCPUs
|
||||
which program the timer to fire at some point in time. Each VCPU on a
|
||||
system programs the timer to fire at different times, and therefore the
|
||||
hardware is multiplexed between multiple VCPUs. This is implemented by
|
||||
context-switching the timer state along with each VCPU thread.
|
||||
|
||||
However, this means that a scenario like the following is entirely
|
||||
possible, and in fact, typical:
|
||||
|
||||
1. KVM runs the VCPU
|
||||
2. The guest programs the time to fire in T+100
|
||||
3. The guest is idle and calls WFI (wait-for-interrupts)
|
||||
4. The hardware traps to the host
|
||||
5. KVM stores the timer state to memory and disables the hardware timer
|
||||
6. KVM schedules a soft timer to fire in T+(100 - time since step 2)
|
||||
7. KVM puts the VCPU thread to sleep (on a waitqueue)
|
||||
8. The soft timer fires, waking up the VCPU thread
|
||||
9. KVM reprograms the timer hardware with the VCPU's values
|
||||
10. KVM marks the timer interrupt as active on the physical distributor
|
||||
11. KVM injects a forwarded physical interrupt to the guest
|
||||
12. KVM runs the VCPU
|
||||
|
||||
Notice that KVM injects a forwarded physical interrupt in step 11 without
|
||||
the corresponding interrupt having actually fired on the host. That is
|
||||
exactly why we mark the timer interrupt as active in step 10, because
|
||||
the active state on the physical distributor is part of the state
|
||||
belonging to the timer hardware, which is context-switched along with
|
||||
the VCPU thread.
|
||||
|
||||
If the guest does not idle because it is busy, the flow looks like this
|
||||
instead:
|
||||
|
||||
1. KVM runs the VCPU
|
||||
2. The guest programs the time to fire in T+100
|
||||
4. At T+100 the timer fires and a physical IRQ causes the VM to exit
|
||||
(note that this initially only traps to EL2 and does not run the host ISR
|
||||
until KVM has returned to the host).
|
||||
5. With interrupts still disabled on the CPU coming back from the guest, KVM
|
||||
stores the virtual timer state to memory and disables the virtual hw timer.
|
||||
6. KVM looks at the timer state (in memory) and injects a forwarded physical
|
||||
interrupt because it concludes the timer has expired.
|
||||
7. KVM marks the timer interrupt as active on the physical distributor
|
||||
7. KVM enables the timer, enables interrupts, and runs the VCPU
|
||||
|
||||
Notice that again the forwarded physical interrupt is injected to the
|
||||
guest without having actually been handled on the host. In this case it
|
||||
is because the physical interrupt is never actually seen by the host because the
|
||||
timer is disabled upon guest return, and the virtual forwarded interrupt is
|
||||
injected on the KVM guest entry path.
|
@@ -44,28 +44,29 @@ Groups:
|
||||
Attributes:
|
||||
The attr field of kvm_device_attr encodes two values:
|
||||
bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 |
|
||||
values: | reserved | cpu id | offset |
|
||||
values: | reserved | vcpu_index | offset |
|
||||
|
||||
All distributor regs are (rw, 32-bit)
|
||||
|
||||
The offset is relative to the "Distributor base address" as defined in the
|
||||
GICv2 specs. Getting or setting such a register has the same effect as
|
||||
reading or writing the register on the actual hardware from the cpu
|
||||
specified with cpu id field. Note that most distributor fields are not
|
||||
banked, but return the same value regardless of the cpu id used to access
|
||||
the register.
|
||||
reading or writing the register on the actual hardware from the cpu whose
|
||||
index is specified with the vcpu_index field. Note that most distributor
|
||||
fields are not banked, but return the same value regardless of the
|
||||
vcpu_index used to access the register.
|
||||
Limitations:
|
||||
- Priorities are not implemented, and registers are RAZ/WI
|
||||
- Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
|
||||
Errors:
|
||||
-ENODEV: Getting or setting this register is not yet supported
|
||||
-ENXIO: Getting or setting this register is not yet supported
|
||||
-EBUSY: One or more VCPUs are running
|
||||
-EINVAL: Invalid vcpu_index supplied
|
||||
|
||||
KVM_DEV_ARM_VGIC_GRP_CPU_REGS
|
||||
Attributes:
|
||||
The attr field of kvm_device_attr encodes two values:
|
||||
bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 |
|
||||
values: | reserved | cpu id | offset |
|
||||
values: | reserved | vcpu_index | offset |
|
||||
|
||||
All CPU interface regs are (rw, 32-bit)
|
||||
|
||||
@@ -91,8 +92,9 @@ Groups:
|
||||
- Priorities are not implemented, and registers are RAZ/WI
|
||||
- Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
|
||||
Errors:
|
||||
-ENODEV: Getting or setting this register is not yet supported
|
||||
-ENXIO: Getting or setting this register is not yet supported
|
||||
-EBUSY: One or more VCPUs are running
|
||||
-EINVAL: Invalid vcpu_index supplied
|
||||
|
||||
KVM_DEV_ARM_VGIC_GRP_NR_IRQS
|
||||
Attributes:
|
||||
|
Reference in New Issue
Block a user