Add support for handling 52bit guest physical address to the
VGIC layer. So far we have limited the guest physical address
to 48bits, by explicitly masking the upper bits. This patch
removes the restriction. We do not have to check if the host
supports 52bit as the gpa is always validated during an access.
(e.g, kvm_{read/write}_guest, kvm_is_visible_gfn()).
Also, the ITS table save-restore is also not affected with
the enhancement. The DTE entries already store the bits[51:8]
of the ITT_addr (with a 256byte alignment).
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[ Macro clean ups, fix PROPBASER and PENDBASER accesses ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Right now the stage2 page table for a VM is hard coded, assuming
an IPA of 40bits. As we are about to add support for per VM IPA,
prepare the stage2 page table helpers to accept the kvm instance
to make the right decision for the VM. No functional changes.
Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves
some of the definitions in arm32 to align with the arm64.
Also drop the _AC() specifier constants wherever possible.
Cc: Christoffer Dall <cdall@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
kvm_vgic_sync_hwstate is only called with IRQ being disabled.
There is thus no need to call spin_lock_irqsave/restore in
vgic_fold_lr_state and vgic_prune_ap_list.
This patch replace them with the non irq-safe version.
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3,
so let's move it to vgic.h
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Although vgic-v3 now supports Group0 interrupts, it still doesn't
deal with Group0 SGIs. As usually with the GIC, nothing is simple:
- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
with KVM (as per 8.1.10, Non-secure EL1 access)
- ICC_SGI0R can only generate Group0 SGIs
- ICC_ASGI1R sees its scope refocussed to generate only Group0
SGIs (as per the note at the bottom of Table 8-14)
We only support Group1 SGIs so far, so no material change.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
It's possible for userspace to control n. Sanitize n when using it as an
array index, to inhibit the potential spectre-v1 write gadget.
Note that while it appears that n must be bound to the interval [0,3]
due to the way it is extracted from addr, we cannot guarantee that
compiler transformations (and/or future refactoring) will ensure this is
the case, and given this is a slow path it's better to always perform
the masking.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Simply letting IGROUPR be writable from userspace would break
migration from old kernels to newer kernels, because old kernels
incorrectly report interrupt groups as group 1. This would not be a big
problem if userspace wrote GICD_IIDR as read from the kernel, because we
could detect the incompatibility and return an error to userspace.
Unfortunately, this is not the case with current userspace
implementations and simply letting IGROUPR be writable from userspace for
an emulated GICv2 silently breaks migration and causes the destination
VM to no longer run after migration.
We now encourage userspace to write the read and expected value of
GICD_IIDR as the first part of a GIC register restore, and if we observe
a write to GICD_IIDR we know that userspace has been updated and has had
a chance to cope with older kernels (VGICv2 IIDR.Revision == 0)
incorrectly reporting interrupts as group 1, and therefore we now allow
groups to be user writable.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Implement the required MMIO accessors for GICv2 and GICv3 for the
IGROUPR distributor and redistributor registers.
This can allow guests to change behavior compared to running on previous
versions of KVM, but only to align with the architecture and hardware
implementations.
This also allows userspace to configure the interrupts groups for GICv3.
We don't allow userspace to write the groups on GICv2 just yet, because
that would result in GICv2 guests not receiving interrupts after
migrating from an older kernel that exposes GICv2 interrupts as group 1.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If userspace attempts to write a GICD_IIDR that does not match the
kernel version, return an error to userspace. The intention is to allow
implementation changes inside KVM while avoiding silently breaking
migration resulting in guests not running without any clear indication
of what went wrong.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently we do not allow any vgic mmio write operations to fail, which
makes sense from mmio traps from the guest. However, we should be able
to report failures to userspace, if userspace writes incompatible values
to read-only registers. Rework the internal interface to allow errors
to be returned on the write side for userspace writes.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now when we have a group configuration on the struct IRQ, use this state
when populating the LR and signaling interrupts as either group 0 or
group 1 to the VM. Depending on the model of the emulated GIC, and the
guest's configuration of the VMCR, interrupts may be signaled as IRQs or
FIQs to the VM.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for proper group 0 and group 1 support in the vgic, we
add a field in the struct irq to store the group of all interrupts.
We initialize the group to group 0 when emulating GICv2 and to group 1
when emulating GICv3, just like we treat them today. LPIs are always
group 1. We also continue to ignore writes from the guest, preserving
existing functionality, for now.
Finally, we also add this field to the vgic debug logic to show the
group for all interrupts.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We currently don't support grouping in the emulated VGIC, which is a
known defect on KVM (not hurting any currently used guests as far as
we're aware). This is currently handled by treating all interrupts as
group 0 interrupts for an emulated GICv2 and always signaling interrupts
as group 0 to the virtual CPU interface.
However, when reading which group interrupts belongs to in the guest
from the emulated VGIC, the VGIC currently reports group 1 instead of
group 0, which is misleading. Fix this temporarily before introducing
full group support by changing the hander to _raz instead of _rao.
Fixes: fb848db396 "KVM: arm/arm64: vgic-new: Add GICv2 MMIO handling framework"
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we are about to tweak implementation aspects of the VGIC emulation,
while still preserving some level of backwards compatibility support,
add a field to keep track of the implementation revision field which is
reported to the VM and to userspace.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Instead of hardcoding the shifts and masks in the GICD_IIDR register
emulation, let's add the definition of these fields to the GIC header
files and use them.
This will make things more obvious when we're going to bump the revision
in the IIDR when we'll make guest-visible changes to the implementation.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The vgic debugfs file only knows about SGI/PPI/SPI interrupts, and
completely ignores LPIs. Let's fix that.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The vgic_init function can race with kvm_arch_vcpu_create() which does
not hold kvm_lock() and we therefore have no synchronization primitives
to ensure we're doing the right thing.
As the user is trying to initialize or run the VM while at the same time
creating more VCPUs, we just have to refuse to initialize the VGIC in
this case rather than silently failing with a broken VCPU.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When booting a 64 KB pages kernel on a ACPI GICv3 system that
implements support for v2 emulation, the following warning is
produced
GICV size 0x2000 not a multiple of page size 0x10000
and support for v2 emulation is disabled, preventing GICv2 VMs
from being able to run on such hosts.
The reason is that vgic_v3_probe() performs a sanity check on the
size of the window (it should be a multiple of the page size),
while the ACPI MADT parsing code hardcodes the size of the window
to 8 KB. This makes sense, considering that ACPI does not bother
to describe the size in the first place, under the assumption that
platforms implementing ACPI will follow the architecture and not
put anything else in the same 64 KB window.
So let's just drop the sanity check altogether, and assume that
the window is at least 64 KB in size.
Fixes: 9097773245 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
Now all the internals are ready to handle multiple redistributor
regions, let's allow the userspace to register them.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On vcpu first run, we eventually know the actual number of vcpus.
This is a synchronization point to check all redistributors
were assigned. On kvm_vgic_map_resources() we check both dist and
redist were set, eventually check potential base address inconsistencies.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we are going to register several redist regions,
vgic_register_all_redist_iodevs() may be called several times. We need
to register a redist_iodev for a given vcpu only once. So let's
check if the base address has already been set. Initialize this latter
in kvm_vgic_vcpu_init().
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
kvm_vgic_vcpu_early_init gets called after kvm_vgic_cpu_init which
is confusing. The call path is as follows:
kvm_vm_ioctl_create_vcpu
|_ kvm_arch_cpu_create
|_ kvm_vcpu_init
|_ kvm_arch_vcpu_init
|_ kvm_vgic_vcpu_init
|_ kvm_arch_vcpu_postcreate
|_ kvm_vgic_vcpu_early_init
Static initialization currently done in kvm_vgic_vcpu_early_init()
can be moved to kvm_vgic_vcpu_init(). So let's move the code and
remove kvm_vgic_vcpu_early_init(). kvm_arch_vcpu_postcreate() does
nothing.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We introduce a new helper that creates and inserts a new redistributor
region into the rdist region list. This helper both handles the case
where the redistributor region size is known at registration time
and the legacy case where it is not (eventually depending on the number
of online vcpus). Depending on pfns, we perform all the possible checks
that we can do:
- end of memory crossing
- incorrect alignment of the base address
- collision with distributor region if already defined
- collision with already registered rdist regions
- check of the new index
Rdist regions must be inserted by increasing order of indices. Indices
must be contiguous.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
vgic_v3_check_base() currently only handles the case of a unique
legacy redistributor region whose size is not explicitly set but
inferred, instead, from the number of online vcpus.
We adapt it to handle the case of multiple redistributor regions
with explicitly defined size. We rely on two new helpers:
- vgic_v3_rdist_overlap() is used to detect overlap with the dist
region if defined
- vgic_v3_rd_region_size computes the size of the redist region,
would it be a legacy unique region or a new explicitly sized
region.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The TYPER of an redistributor reflects whether the rdist is
the last one of the redistributor region. Let's compare the TYPER
GPA against the address of the last occupied slot within the
redistributor region.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We introduce vgic_v3_rdist_free_slot to help identifying
where we can place a new 2x64KB redistributor.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
At the moment KVM supports a single rdist region. We want to
support several separate rdist regions so let's introduce a list
of them. This patch currently only cares about a single
entry in this list as the functionality to register several redist
regions is not yet there. So this only translates the existing code
into something functionally similar using that new data struct.
The redistributor region handle is stored in the vgic_cpu structure
to allow later computation of the TYPER last bit.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
in case kvm_vgic_map_resources() fails, typically if the vgic
distributor is not defined, __kvm_vgic_destroy will be called
several times. Indeed kvm_vgic_map_resources() is called on
first vcpu run. As a result dist->spis is freeed more than once
and on the second time it causes a "kernel BUG at mm/slub.c:3912!"
Set dist->spis to NULL to avoid the crash.
Fixes: ad275b8bb1 ("KVM: arm/arm64: vgic-new: vgic_init: implement
vgic_init")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Use the newly introduced wrapper for that.
Cc: Stable <stable@vger.kernel.org> # 4.12+
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Provide a wrapper which does that and use that everywhere.
Note that ending the SRCU critical section before returning from the
kvm_read_guest() wrapper is safe, because the data has been *copied*, so
we don't need to rely on valid references to the memslot anymore.
Cc: Stable <stable@vger.kernel.org> # 4.8+
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Apparently the development of update_affinity() overlapped with the
promotion of irq_lock to be _irqsave, so the patch didn't convert this
lock over. This will make lockdep complain.
Fix this by disabling IRQs around the lock.
Cc: stable@vger.kernel.org
Fixes: 08c9fd0421 ("KVM: arm/arm64: vITS: Add a helper to update the affinity of an LPI")
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As Jan reported [1], lockdep complains about the VGIC not being bullet
proof. This seems to be due to two issues:
- When commit 006df0f349 ("KVM: arm/arm64: Support calling
vgic_update_irq_pending from irq context") promoted irq_lock and
ap_list_lock to _irqsave, we forgot two instances of irq_lock.
lockdeps seems to pick those up.
- If a lock is _irqsave, any other locks we take inside them should be
_irqsafe as well. So the lpi_list_lock needs to be promoted also.
This fixes both issues by simply making the remaining instances of those
locks _irqsave.
One irq_lock is addressed in a separate patch, to simplify backporting.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/575718.html
Cc: stable@vger.kernel.org
Fixes: 006df0f349 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context")
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm fixes for 4.17, take #2
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
One comment still mentioned process_maintenance operations after
commit af0614991a ("KVM: arm/arm64: vgic: Get rid of unnecessary
process_maintenance operation")
Update the comment to point to vgic_fold_lr_state instead, which
is where maintenance interrupts are taken care of.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull KVM fixes from Radim Krčmář:
"ARM:
- PSCI selection API, a leftover from 4.16 (for stable)
- Kick vcpu on active interrupt affinity change
- Plug a VMID allocation race on oversubscribed systems
- Silence debug messages
- Update Christoffer's email address (linaro -> arm)
x86:
- Expose userspace-relevant bits of a newly added feature
- Fix TLB flushing on VMX with VPID, but without EPT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI
kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use
arm/arm64: KVM: Add PSCI version selection API
KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration
arm64: KVM: Demote SVE and LORegion warnings to debug only
MAINTAINERS: Update e-mail address for Christoffer Dall
KVM: arm/arm64: Close VMID generation race
Now that we make sure we don't inject multiple instances of the
same GICv2 SGI at the same time, we've made another bug more
obvious:
If we exit with an active SGI, we completely lose track of which
vcpu it came from. On the next entry, we restore it with 0 as a
source, and if that wasn't the right one, too bad. While this
doesn't seem to trouble GIC-400, the architectural model gets
offended and doesn't deactivate the interrupt on EOI.
Another connected issue is that we will happilly make pending
an interrupt from another vcpu, overriding the above zero with
something that is just as inconsistent. Don't do that.
The final issue is that we signal a maintenance interrupt when
no pending interrupts are present in the LR. Assuming we've fixed
the two issues above, we end-up in a situation where we keep
exiting as soon as we've reached the active state, and not be
able to inject the following pending.
The fix comes in 3 parts:
- GICv2 SGIs have their source vcpu saved if they are active on
exit, and restored on entry
- Multi-SGIs cannot go via the Pending+Active state, as this would
corrupt the source field
- Multi-SGIs are converted to using MI on EOI instead of NPIE
Fixes: 16ca6a607d ("KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
It's possible for userspace to control n. Sanitize n when using it as an
array index.
Note that while it appears that n must be bound to the interval [0,3]
due to the way it is extracted from addr, we cannot guarantee that
compiler transformations (and/or future refactoring) will ensure this is
the case, and given this is a slow path it's better to always perform
the masking.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>
When vgic_prune_ap_list() finds an interrupt that needs to be migrated
to a new VCPU, we should notify this VCPU of the pending interrupt,
since it requires immediate action.
Kick this VCPU once we have added the new IRQ to the list, but only
after dropping the locks.
Reported-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
a given vcpu. We allocate the array containing the intids before taking
the lpi_list_lock, which means we can have an array size that is not
equal to the number of LPIs.
This is particularly obvious when looking at the path coming from
vgic_enable_lpis, which is not a command, and thus can run in parallel
with commands:
vcpu 0: vcpu 1:
vgic_enable_lpis
its_sync_lpi_pending_table
vgic_copy_lpi_list
intids = kmalloc_array(irq_count)
MAPI(lpi targeting vcpu 0)
list_for_each_entry(lpi_list_head)
intids[i++] = irq->intid;
At that stage, we will happily overrun the intids array. Boo. An easy
fix is is to break once the array is full. The MAPI command will update
the config anyway, and we won't miss a thing. We also make sure that
lpi_list_count is read exactly once, so that further updates of that
value will not affect the array bound check.
Cc: stable@vger.kernel.org
Fixes: ccb1d791ab ("KVM: arm64: vgic-its: Fix pending table sync")
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
It was recently reported that VFIO mediated devices, and anything
that VFIO exposes as level interrupts, do no strictly follow the
expected logic of such interrupts as it only lowers the input
line when the guest has EOId the interrupt at the GIC level, rather
than when it Acked the interrupt at the device level.
THe GIC's Active+Pending state is fundamentally incompatible with
this behaviour, as it prevents KVM from observing the EOI, and in
turn results in VFIO never dropping the line. This results in an
interrupt storm in the guest, which it really never expected.
As we cannot really change VFIO to follow the strict rules of level
signalling, let's forbid the A+P state altogether, as it is in the
end only an optimization. It ensures that we will transition via
an invalid state, which we can use to notify VFIO of the EOI.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we're about to change the way we map devices at HYP, we need
to move away from kern_hyp_va on an IO address.
One way of achieving this is to store the VAs in kvm_vgic_global_state,
and use that directly from the HYP code. This requires a small change
to create_hyp_io_mappings so that it can also return a HYP VA.
We take this opportunity to nuke the vctrl_base field in the emulated
distributor, as it is not used anymore.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Both HYP io mappings call ioremap, followed by create_hyp_io_mappings.
Let's move the ioremap call into create_hyp_io_mappings itself, which
simplifies the code a bit and allows for further refactoring.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We can finally get completely rid of any calls to the VGICv3
save/restore functions when the AP lists are empty on VHE systems. This
requires carefully factoring out trap configuration from saving and
restoring state, and carefully choosing what to do on the VHE and
non-VHE path.
One of the challenges is that we cannot save/restore the VMCR lazily
because we can only write the VMCR when ICC_SRE_EL1.SRE is cleared when
emulating a GICv2-on-GICv3, since otherwise all Group-0 interrupts end
up being delivered as FIQ.
To solve this problem, and still provide fast performance in the fast
path of exiting a VM when no interrupts are pending (which also
optimized the latency for actually delivering virtual interrupts coming
from physical interrupts), we orchestrate a dance of only doing the
activate/deactivate traps in vgic load/put for VHE systems (which can
have ICC_SRE_EL1.SRE cleared when running in the host), and doing the
configuration on every round-trip on non-VHE systems.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>