Features and fixes for 4.1 (kvm/next)
1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts
2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
and introspection
KVM/ARM changes for v4.1:
- fixes for live migration
- irqfd support
- kvm-io-bus & vgic rework to enable ioeventfd
- page ageing for stage-2 translation
- various cleanups
This patch adds support to migrate vcpu interrupts. Two new vcpu ioctls
are added which get/set the complete status of pending interrupts in one
go. The ioctls are marked as available with the new capability
KVM_CAP_S390_IRQ_STATE.
We can not use a ONEREG, as the number of pending local interrupts is not
constant and depends on the number of CPUs.
To retrieve the interrupt state we add an ioctl KVM_S390_GET_IRQ_STATE.
Its input parameter is a pointer to a struct kvm_s390_irq_state which
has a buffer and length. For all currently pending interrupts, we copy
a struct kvm_s390_irq into the buffer and pass it to userspace.
To store interrupt state into a buffer provided by userspace, we add an
ioctl KVM_S390_SET_IRQ_STATE. It passes a struct kvm_s390_irq_state into
the kernel and injects all interrupts contained in the buffer.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We have introduced struct kvm_s390_irq a while ago which allows to
inject all kinds of interrupts as defined in the Principles of
Operation.
Add ioctl to inject interrupts with the extended struct kvm_s390_irq
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Now that the code is in place for KVM to support MIPS SIMD Architecutre
(MSA) in MIPS guests, wire up the new KVM_CAP_MIPS_MSA capability.
For backwards compatibility, the capability must be explicitly enabled
in order to detect or make use of MSA from the guest.
The capability is not supported if the hardware supports MSA vector
partitioning, since the extra support cannot be tested yet and it
extends the state that the userland program would have to save.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add KVM register numbers for the MIPS SIMD Architecture (MSA) registers,
and implement access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG
ioctls when the MSA capability is enabled (exposed in a later patch) and
present in the guest according to its Config3.MSAP bit.
The MSA vector registers use the same register numbers as the FPU
registers except with a different size (128bits). Since MSA depends on
Status.FR=1, these registers are inaccessible when Status.FR=0. These
registers are returned as a single native endian 128bit value, rather
than least significant half first with each 64-bit half native endian as
the kernel uses internally.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add KVM register numbers for the MIPS FPU registers, and implement
access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG ioctls when
the FPU capability is enabled (exposed in a later patch) and present in
the guest according to its Config1.FP bit.
The registers are accessible in the current mode of the guest, with each
sized access showing what the guest would see with an equivalent access,
and like the architecture they may become UNPREDICTABLE if the FR mode
is changed. When FR=0, odd doubles are inaccessible as they do not exist
in that mode.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add Config4 and Config5 co-processor 0 registers, and add capability to
write the Config1, Config3, Config4, and Config5 registers using the KVM
API.
Only supported bits can be written, to minimise the chances of the guest
being given a configuration from e.g. QEMU that is inconsistent with
that being emulated, and as such the handling is in trap_emul.c as it
may need to be different for VZ. Currently the only modification
permitted is to make Config4 and Config5 exist via the M bits, but other
bits will be added for FPU and MSA support in future patches.
Care should be taken by userland not to change bits without fully
handling the possible extra state that may then exist and which the
guest may begin to use and depend on.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Provide the KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS ioctl which can be used
to get/set guest storage keys. This functionality is needed for live migration
of s390 guests that use storage keys.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The Store System Information (STSI) instruction currently collects all
information it relays to the caller in the kernel. Some information,
however, is only available in user space. An example of this is the
guest name: The kernel always sets "KVMGuest", but user space knows the
actual guest name.
This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a
capability that can be enabled by user space if it wants to be able to
insert such data. User space will be provided with the target buffer
and the requested STSI function code.
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
On s390, we've got to make sure to hold the IPTE lock while accessing
logical memory. So let's add an ioctl for reading and writing logical
memory to provide this feature for userspace, too.
The maximum transfer size of this call is limited to 64kB to prevent
that the guest can trigger huge copy_from/to_user transfers. QEMU
currently only requests up to one or two pages so far, so 16*4kB seems
to be a reasonable limit here.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
To cleanly restore an SMP VM we need to ensure that the current pause
state of each vcpu is correctly recorded. Things could get confused if
the CPU starts running after migration restore completes when it was
paused before it state was captured.
We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64
interface is a lot simpler as the only valid states are
KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
This patch enables irqfd on arm/arm64.
Both irqfd and resamplefd are supported. Injection is implemented
in vgic.c without routing.
This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.
Irqfd injection is restricted to SPI. The rationale behind not
supporting PPI irqfd injection is that any device using a PPI would
be a private-to-the-CPU device (timer for instance), so its state
would have to be context-switched along with the VCPU and would
require in-kernel wiring anyhow. It is not a relevant use case for
irqfds.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Define and allocate space for both the host and guest views of
the vector registers for a given vcpu. The 32 vector registers
occupy 128 bits each (512 bytes total), but architecturally are
paired with 512 additional bytes of reserved space for future
expansion.
The kvm_sync_regs structs containing the registers are union'ed
with 1024 bytes of padding in the common kvm_run struct. The
addition of 1024 bytes of new register information clearly exceeds
the existing union, so an expansion of that padding is required.
When changing environments, we need to appropriately save and
restore the vector registers viewed by both the host and guest,
into and out of the sync_regs space.
The floating point registers overlay the upper half of vector
registers 0-15, so there's a bit of data duplication here that
needs to be carefully avoided.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
KVM: s390: fixes and features for kvm/next (3.20)
1. Generic
- sparse warning (make function static)
- optimize locking
- bugfixes for interrupt injection
- fix MVPG addressing modes
2. hrtimer/wakeup fun
A recent change can cause KVM hangs if adjtime is used in the host.
The hrtimer might wake up too early or too late. Too early is fatal
as vcpu_block will see that the wakeup condition is not met and
sleep again. This CPU might never wake up again.
This series addresses this problem. adjclock slowing down the host
clock will result in too late wakeups. This will require more work.
In addition to that we also change the hrtimer from REALTIME to
MONOTONIC to avoid similar problems with timedatectl set-time.
3. sigp rework
We will move all "slow" sigps to QEMU (protected with a capability that
can be enabled) to avoid several races between concurrent SIGP orders.
4. Optimize the shadow page table
Provide an interface to announce the maximum guest size. The kernel
will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.
5. Provide an interface to set the guest TOD
We now use two vm attributes instead of two oneregs, as oneregs are
vcpu ioctl and we don't want to call them from other threads.
6. Protected key functions
The real HMC allows to enable/disable protected key CPACF functions.
Lets provide an implementation + an interface for QEMU to activate
this the protected key instructions.
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU
We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).
This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.
For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
With all of the GICv3 code in place now we allow userland to ask the
kernel for using a virtual GICv3 in the guest.
Also we provide the necessary support for guests setting the memory
addresses for the virtual distributor and redistributors.
This requires some userland code to make use of that feature and
explicitly ask for a virtual GICv3.
Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
considered legacy and using KVM_CREATE_DEVICE is preferred.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
VGIC init flow.
Conflicts:
arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.
Also, clarify the behavior and expectations for exits to user space with
the KVM_EXIT_SYSTEM_EVENT case.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
It is not clear that this ioctl can be called multiple times for a given
vcpu. Userspace already does this, so clarify the ABI.
Also specify that userspace is expected to always make secondary and
subsequent calls to the ioctl with the same parameters for the VCPU as
the initial call (which userspace also already does).
Add code to check that userspace doesn't violate that ABI in the future,
and move the kvm_vcpu_set_target() function which is currently
duplicated between the 32-bit and 64-bit versions in guest.c to a common
static function in arm.c, shared between both architectures.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The implementation of KVM_ARM_VCPU_INIT is currently not doing what
userspace expects, namely making sure that a vcpu which may have been
turned off using PSCI is returned to its initial state, which would be
powered on if userspace does not set the KVM_ARM_VCPU_POWER_OFF flag.
Implement the expected functionality and clarify the ABI.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
No kernel ever reported KVM_CAP_DEVICE_MSIX, KVM_CAP_DEVICE_MSI,
KVM_CAP_DEVICE_ASSIGNMENT, KVM_CAP_DEVICE_DEASSIGNMENT.
This makes the documentation wrong, and no application ever
written to use these capabilities has a chance to work correctly.
The only way to detect support is to try, and test errno for ENOTTY.
That's unfortunate, but we can't fix the past.
Document the actual semantics, and drop the definitions from
the exported header to make it easier for application
developers to note and fix the bug.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was missed in respective one_reg implementation patch.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch clarifies that kvm_dirty_regs are just a hint to the kernel and
that the kernel might just ignore some flags and sync the values (like done for
acrs and gprs now).
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Patch queue for ppc - 2014-08-01
Highlights in this release include:
- BookE: Rework instruction fetch, not racy anymore now
- BookE HV: Fix ONE_REG accessors for some in-hardware registers
- Book3S: Good number of LE host fixes, enable HV on LE
- Book3S: Some misc bug fixes
- Book3S HV: Add in-guest debug support
- Book3S HV: Preload cache lines on context switch
- Remove 440 support
Alexander Graf (31):
KVM: PPC: Book3s PR: Disable AIL mode with OPAL
KVM: PPC: Book3s HV: Fix tlbie compile error
KVM: PPC: Book3S PR: Handle hyp doorbell exits
KVM: PPC: Book3S PR: Fix ABIv2 on LE
KVM: PPC: Book3S PR: Fix sparse endian checks
PPC: Add asm helpers for BE 32bit load/store
KVM: PPC: Book3S HV: Make HTAB code LE host aware
KVM: PPC: Book3S HV: Access guest VPA in BE
KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
KVM: PPC: Book3S HV: Access XICS in BE
KVM: PPC: Book3S HV: Fix ABIv2 on LE
KVM: PPC: Book3S HV: Enable for little endian hosts
KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
KVM: PPC: Deflect page write faults properly in kvmppc_st
KVM: PPC: Book3S: Stop PTE lookup on write errors
KVM: PPC: Book3S: Add hack for split real mode
KVM: PPC: Book3S: Make magic page properly 4k mappable
KVM: PPC: Remove 440 support
KVM: Rename and add argument to check_extension
KVM: Allow KVM_CHECK_EXTENSION on the vm fd
KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
KVM: PPC: Implement kvmppc_xlate for all targets
KVM: PPC: Move kvmppc_ld/st to common code
KVM: PPC: Remove kvmppc_bad_hva()
KVM: PPC: Use kvm_read_guest in kvmppc_ld
KVM: PPC: Handle magic page in kvmppc_ld/st
KVM: PPC: Separate loadstore emulation from priv emulation
KVM: PPC: Expose helper functions for data/inst faults
KVM: PPC: Remove DCR handling
KVM: PPC: HV: Remove generic instruction emulation
KVM: PPC: PR: Handle FSCR feature deselects
Alexey Kardashevskiy (1):
KVM: PPC: Book3S: Fix LPCR one_reg interface
Aneesh Kumar K.V (4):
KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
KVM: PPC: BOOK3S: PR: Emulate instruction counter
KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
Anton Blanchard (2):
KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
Bharat Bhushan (10):
kvm: ppc: bookehv: Added wrapper macros for shadow registers
kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
kvm: ppc: Add SPRN_EPR get helper function
kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
KVM: PPC: Booke-hv: Add one reg interface for SPRG9
KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
Michael Neuling (1):
KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling
Mihai Caraman (8):
KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
KVM: PPC: e500: Fix default tlb for victim hint
KVM: PPC: e500: Emulate power management control SPR
KVM: PPC: e500mc: Revert "add load inst fixup"
KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
KVM: PPC: Book3s: Remove kvmppc_read_inst() function
KVM: PPC: Allow kvmppc_get_last_inst() to fail
KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
Paul Mackerras (4):
KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication
Stewart Smith (2):
Split out struct kvmppc_vcore creation to separate function
Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
Conflicts:
Documentation/virtual/kvm/api.txt
DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.
Signed-off-by: Alexander Graf <agraf@suse.de>
The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately
on PPC some of the capabilities change depending on the way a VM was created.
So instead we need a way to expose capabilities as VM ioctl, so that we can
see which VM type we're using (HV or PR). To enable this, add the
KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Unfortunately, the LPCR got defined as a 32-bit register in the
one_reg interface. This is unfortunate because KVM allows userspace
to control the DPFD (default prefetch depth) field, which is in the
upper 32 bits. The result is that DPFD always get set to 0, which
reduces performance in the guest.
We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
since that would break existing userspace binaries. Instead we define
a new KVM_REG_PPC_LPCR_64 id which is 64-bit. Userspace can still use
the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
the bottom 32 bits that userspace can modify (ILE, TC and AIL).
If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
as well.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
capability is used to enable or disable in-kernel handling of an
hcall, that the hcall is actually implemented by the kernel.
If not an EINVAL error is returned.
This also checks the default-enabled list of hcalls and prints a
warning if any hcall there is not actually implemented.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This provides a way for userspace controls which sPAPR hcalls get
handled in the kernel. Each hcall can be individually enabled or
disabled for in-kernel handling, except for H_RTAS. The exception
for H_RTAS is because userspace can already control whether
individual RTAS functions are handled in-kernel or not via the
KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
H_RTAS is out of the normal sequence of hcall numbers.
Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
The args field of the struct kvm_enable_cap specifies the hcall number
in args[0] and the enable/disable flag in args[1]; 0 means disable
in-kernel handling (so that the hcall will always cause an exit to
userspace) and 1 means enable. Enabling or disabling in-kernel
handling of an hcall is effective across the whole VM.
The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM
capability advertises that this ability exists.
When a VM is created, an initial set of hcalls are enabled for
in-kernel handling. The set that is enabled is the set that have
an in-kernel implementation at this point. Any new hcall
implementations from this point onwards should not be added to the
default set without a good reason.
No distinction is made between real-mode and virtual-mode hcall
implementations; the one setting controls them both.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Bugfixes
--------
- add IPTE to trace event decoder
- document and advertise KVM_CAP_S390_IRQCHIP
Cleanups
--------
- Reuse kvm_vcpu_block for s390
- Get rid of tasklet for wakup processing
This series enables the "KVM_(S|G)ET_MP_STATE" ioctls on s390 to make
the cpu state settable by user space.
This is necessary to avoid races in s390 SIGP/reset handling which
happen because some SIGPs are handled in QEMU, while others are
handled in the kernel. Together with the busy conditions as return
value of SIGP races happen especially in areas like starting and
stopping of CPUs. (For example, there is a program 'cpuplugd', that
runs on several s390 distros which does automatic onlining and
offlining on cpus.)
As soon as the MPSTATE interface is used, user space takes complete
control of the cpu states. Otherwise the kernel will use the old way.
Therefore, the new kernel continues to work fine with old QEMUs.
Capabilities can be enabled on a vcpu or (since recently) on a vm. Document
this and note for the existing capabilites whether they are per-vcpu or
per-vm.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch
- adds s390 specific MP states to linux headers and documents them
- implements the KVM_{SET,GET}_MP_STATE ioctls
- enables KVM_CAP_MP_STATE
- allows user space to control the VCPU state on s390.
If user space sets the VCPU state using the ioctl KVM_SET_MP_STATE, we can disable
manual changing of the VCPU state and trust user space to do the right thing.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Highlight the aspects of the ioctls that are actually specific to x86
and ia64. As defined restrictions (irqchip) and mp states may not apply
to other architectures, these parts are flagged to belong to x86 and ia64.
In preparation for the use of KVM_(S|G)ET_MP_STATE by s390.
Fix a spelling error (KVM_SET_MP_STATE vs. KVM_SET_MPSTATE) on the way.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Document the MIPS specific parts of the KVM API, including:
- The layout of the kvm_regs structure.
- The interrupt number passed to KVM_INTERRUPT.
- The registers supported by the KVM_{GET,SET}_ONE_REG interface, and
the encoding of those register ids.
- That KVM_INTERRUPT and KVM_GET_REG_LIST are supported on MIPS.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some of the MIPS registers that can be accessed with the
KVM_{GET,SET}_ONE_REG interface have fairly long names, so widen the
Register column of the table in the KVM_SET_ONE_REG documentation to
allow them to fit.
Tabs in the table are replaced with spaces at the same time for
consistency.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull trivial tree changes from Jiri Kosina:
"Usual pile of patches from trivial tree that make the world go round"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
staging: go7007: remove reference to CONFIG_KMOD
aic7xxx: Remove obsolete preprocessor define
of: dma: doc fixes
doc: fix incorrect formula to calculate CommitLimit value
doc: Note need of bc in the kernel build from 3.10 onwards
mm: Fix printk typo in dmapool.c
modpost: Fix comment typo "Modules.symvers"
Kconfig.debug: Grammar s/addition/additional/
wimax: Spelling s/than/that/, wording s/destinatary/recipient/
aic7xxx: Spelling s/termnation/termination/
arm64: mm: Remove superfluous "the" in comment
of: Spelling s/anonymouns/anonymous/
dma: imx-sdma: Spelling s/determnine/determine/
ath10k: Improve grammar in comments
ath6kl: Spelling s/determnine/determine/
of: Improve grammar for of_alias_get_id() documentation
drm/exynos: Spelling s/contro/control/
radio-bcm2048.c: fix wrong overflow check
doc: printk-formats: do not mention casts for u64/s64
doc: spelling error changes
...
Pull KVM updates from Paolo Bonzini:
"At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM. Changes include:
- a lot of s390 changes: optimizations, support for migration, GDB
support and more
- ARM changes are pretty small: support for the PSCI 0.2 hypercall
interface on both the guest and the host (the latter acked by
Catalin)
- initial POWER8 and little-endian host support
- support for running u-boot on embedded POWER targets
- pretty large changes to MIPS too, completing the userspace
interface and improving the handling of virtualized timer hardware
- for x86, a larger set of changes is scheduled for 3.17. Still, we
have a few emulator bugfixes and support for running nested
fully-virtualized Xen guests (para-virtualized Xen guests have
always worked). And some optimizations too.
The only missing architecture here is ia64. It's not a coincidence
that support for KVM on ia64 is scheduled for removal in 3.17"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
KVM: add missing cleanup_srcu_struct
KVM: PPC: Book3S PR: Rework SLB switching code
KVM: PPC: Book3S PR: Use SLB entry 0
KVM: PPC: Book3S HV: Fix machine check delivery to guest
KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
KVM: PPC: Book3S HV: Fix dirty map for hugepages
KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
KVM: PPC: Book3S: Add ONE_REG register names that were missed
KVM: PPC: Add CAP to indicate hcall fixes
KVM: PPC: MPIC: Reset IRQ source private members
KVM: PPC: Graciously fail broken LE hypercalls
PPC: ePAPR: Fix hypercall on LE guest
KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
KVM: PPC: BOOK3S: Always use the saved DAR value
PPC: KVM: Make NX bit available with magic page
KVM: PPC: Disable NX for old magic page using guests
KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
...
Patch queue for ppc - 2014-05-30
In this round we have a few nice gems. PR KVM gains initial POWER8 support
as well as LE host awareness, ihe e500 targets can now properly run u-boot,
LE guests now work with PR KVM including KVM hypercalls and HV KVM guests
can now use huge pages.
On top of this there are some bug fixes.
Conflicts:
include/uapi/linux/kvm.h
Commit b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8
SPRs") added a definition of KVM_REG_PPC_WORT with the same register
number as the existing KVM_REG_PPC_VRSAVE (though in fact the
definitions are not identical because of the different register sizes.)
For clarity, this moves KVM_REG_PPC_WORT to the next unused number,
and also adds it to api.txt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 3b7834743f ("KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg") added definitions for several KVM_REG_PPC_* symbols
but missed adding some to api.txt. This adds them.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>