Commit Graph

653 Commits

Sean Christopherson
ca431c0cc3 KVM: VMX: Drop redundant capability checks in low level INVVPID helpers
Remove the INVVPID capabilities checks from vpid_sync_vcpu_single() and
vpid_sync_vcpu_global() now that all callers ensure the INVVPID variant
is supported.  Note, in some cases the guarantee is provided in concert
with hardware_setup(), which enables VPID if and only if at least one of
invvpid_single() or invvpid_global() is supported.

Drop the WARN_ON_ONCE() from vmx_flush_tlb() as vpid_sync_vcpu_single()
will trigger a WARN() on INVVPID failure, i.e. if SINGLE_CONTEXT isn't
supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-13-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:08 -04:00
Sean Christopherson
ab4b3597ff KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr()
Directly invoke vpid_sync_context() to do a global INVVPID when the
individual address variant is not supported instead of deferring such
behavior to the caller.  This allows for additional consolidation of
code as the logic is basically identical to the emulation of the
individual address variant in handle_invvpid().

No functional change intended.
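
A hedged sketch of the consolidated helper (assuming the existing
cpu_has_vmx_invvpid_individual_addr() capability check; details may
differ from the actual patch):

  static inline void vpid_sync_vcpu_addr(int vpid, gva_t addr)
  {
          if (vpid == 0)
                  return;

          if (cpu_has_vmx_invvpid_individual_addr())
                  __invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR, vpid, addr);
          else
                  /* Fall back to flushing the entire VPID context. */
                  vpid_sync_context(vpid);
  }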

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-12-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:08 -04:00
Sean Christopherson
8a8b097c6c KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines
Move vpid_sync_vcpu_addr() below vpid_sync_context() so that it can be
refactored in a future patch to call vpid_sync_context() directly when
the "individual address" INVVPID variant isn't supported.

No functional change intended.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:07 -04:00
Sean Christopherson
446ace4bca KVM: VMX: Use vpid_sync_context() directly when possible
Use vpid_sync_context() directly for flows that run if and only if
enable_vpid=1, or more specifically, nested VMX flows that are gated by
vmx->nested.msrs.secondary_ctls_high.SECONDARY_EXEC_ENABLE_VPID being
set, which is allowed if and only if enable_vpid=1.  Because these flows
call __vmx_flush_tlb() with @invalidate_gpa=false, the if-statement that
decides between INVEPT and INVVPID will always go down the INVVPID path,
i.e. call vpid_sync_context() because
"enable_ept && (invalidate_gpa || !enable_vpid)" always evaluates false.

This helps pave the way toward removing @invalidate_gpa and @vpid from
__vmx_flush_tlb() and its callers.

Opportunistically drop unnecessary brackets in handle_invvpid() around an
affected __vmx_flush_tlb()->vpid_sync_context() conversion.

No functional change intended.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:06 -04:00
Sean Christopherson
c746b3a4b8 KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context()
Skip the global INVVPID in the unlikely scenario that vpid==0 and the
SINGLE_CONTEXT variant of INVVPID is unsupported.  If vpid==0, there's
no need to INVVPID as it's impossible to do VM-Enter with VPID enabled
and vmcs.VPID==0, i.e. there can't be any TLB entries for the vCPU with
vpid==0.  The fact that the SINGLE_CONTEXT variant isn't supported is
irrelevant.
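
A sketch of the resulting logic, with the vpid==0 check guarding only
the global fallback (hedged; the exact form may differ):

  static inline void vpid_sync_context(int vpid)
  {
          if (cpu_has_vmx_invvpid_single())
                  vpid_sync_vcpu_single(vpid);  /* no-ops if vpid == 0 */
          else if (vpid != 0)
                  vpid_sync_vcpu_global();
  }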

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-9-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:06 -04:00
Junaid Shahid
ee1fa209f5 KVM: x86: Sync SPTEs when injecting page/EPT fault into L1
When injecting a page fault or EPT violation/misconfiguration, KVM is
not syncing any shadow PTEs associated with the faulting address,
including those in previous MMUs that are associated with L1's current
EPTP (in a nested EPT scenario), nor is it flushing any hardware TLB
entries.  All of this is now handled by calling kvm_mmu_invalidate_gva().

Page faults that are either !PRESENT or RSVD are exempt from the flushing,
as the CPU is not allowed to cache such translations.
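
A hedged sketch of the injection-time sync with the !PRESENT/RSVD
exemption (field and helper names assumed; the actual hunk may differ):

  /* In kvm_inject_emulated_page_fault(), before queueing the fault: */
  if ((fault->error_code & PFERR_PRESENT_MASK) &&
      !(fault->error_code & PFERR_RSVD_MASK))
          kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
                                 fault_mmu->root_hpa);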

Signed-off-by: Junaid Shahid <junaids@google.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:05 -04:00
Junaid Shahid
d6e3f8385d KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
Free all roots when emulating INVVPID for L1 and EPT is disabled, as
outstanding changes to the page tables managed by L1 need to be
recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
because VPID is not tracked by the MMU role, all roots in the current
MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
stale SPTEs.
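
Conceptually, the emulation path gains a call along these lines (a
sketch, not the verbatim patch):

  /* In handle_invvpid(), after performing the INVVPID-specific flush: */
  if (!enable_ept)
          kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
                             KVM_MMU_ROOTS_ALL);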

Fixes: 5c614b3583 ("KVM: nVMX: nested VPID emulation")
Signed-off-by: Junaid Shahid <junaids@google.com>
[sean: ported to upstream KVM, reworded the comment and changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-5-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:49 -04:00
Sean Christopherson
f8aa7e3958 KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
changes to the EPT tables managed by L1 need to be recognized, and
relying on KVM to always flush L2's EPTP context on nested VM-Enter is
dangerous.

Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
TLB flush if necessary, e.g. if L1 has never entered L2 then there is
nothing to be done.

Nuking all L2 roots is overkill for the single-context variant, but it's
the safe and easy bet.  A more precise zap mechanism will be added in
the future.  Add a TODO to call out that KVM only needs to invalidate
affected contexts.
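
A minimal sketch of the emulation change (hedged; in the real patch the
same call covers both the single-context and global variants):

  /* In handle_invept(): */
  /* TODO: invalidate only the EPTP contexts affected by the INVEPT. */
  kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);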

Fixes: 14c07ad89f ("x86/kvm/mmu: introduce guest_mmu")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-4-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:49 -04:00
Sean Christopherson
eed0030e4c KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT)
Signal VM-Fail for the single-context variant of INVEPT if the specified
EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
to the standard EPT checks:

  If VM entry with the "enable EPT" VM execution control set to 1 would
  fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);
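
A sketch of the added check in handle_invept() (assuming the existing
nested_vmx_check_eptp() helper; details may differ):

  case VMX_EPT_EXTENT_CONTEXT:
          if (!nested_vmx_check_eptp(vcpu, operand.eptp))
                  return nested_vmx_failValid(vcpu,
                          VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);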

Fixes: bfd0a56b90 ("nEPT: Nested INVEPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:48 -04:00
Sean Christopherson
e8eff28215 KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
flushes require all contexts to be invalidated, not just the active
contexts, e.g. all mappings in all contexts for a given HVA need to be
invalidated on a mmu_notifier invalidation.  Similarly, the instigator
of the deferred TLB flush may be expecting all contexts to be flushed,
e.g. vmx_vcpu_load_vmcs().

Without nested VMX, flushing only the current EPTP/VPID context isn't
problematic because KVM uses a constant VPID for each vCPU, and
mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
for L1.  In the rare case where a different EPTP is created or reused,
KVM (currently) unconditionally flushes the new EPTP context prior to
entering the guest.

With nested VMX, KVM conditionally uses a different VPID for L2, and
unconditionally uses a different EPTP for L2.  Because KVM doesn't
_intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
different VMs by exploiting the lack of flushing for L2.

  1) Launch nested guest from malicious L1.

  2) Nested VM-Enter to L2.

  3) Access target GPA 'g'.  CPU inserts TLB entry tagged with L2's ASID
     mapping 'g' to host PFN 'x'.

  4) Nested VM-Exit to L1.

  5) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing
     the page for PFN 'x'.

  6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
     remaps the page to PFN 'y'.  mmu_notifier sends invalidate command,
     KVM flushes TLB only for L1's ASID.

  7) Host kernel reallocates PFN 'x' to some other task/guest.

  8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.

  9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
     stale TLB entry.

However, current KVM unconditionally flushes L1's EPTP/VPID context on
nested VM-Exit.  But that behavior is mostly unintentional; KVM doesn't
go out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit, rather
a TLB flush is guaranteed to occur prior to re-entering L1 due to
__kvm_mmu_new_cr3() always being called with skip_tlb_flush=false.  On
nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT
enabled) or in nested_vmx_load_cr3() (nested EPT disabled).  On nested
VM-Exit it occurs via nested_vmx_load_cr3().

This also fixes a bug where a deferred TLB flush in the context of L2,
with EPT disabled, would flush L1's VPID instead of L2's VPID, as
vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().
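
For illustration, the flush-everything path ends up looking roughly like
this (a hedged sketch; the helper name comes from a later cleanup):

  static void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
  {
          struct vcpu_vmx *vmx = to_vmx(vcpu);

          if (enable_ept) {
                  ept_sync_global();
          } else if (enable_vpid) {
                  if (cpu_has_vmx_invvpid_global()) {
                          vpid_sync_vcpu_global();
                  } else {
                          /* Flush both L1's and L2's VPIDs. */
                          vpid_sync_vcpu_single(vmx->vpid);
                          vpid_sync_vcpu_single(vmx->nested.vpid02);
                  }
          }
  }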

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Junaid Shahid <junaids@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: John Haxby <john.haxby@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: efebf0aaec ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:48 -04:00
Oliver Upton
69c0975525 kvm: nVMX: match comment with return type for nested_vmx_exit_reflected
nested_vmx_exit_reflected() returns a bool, not int. As such, refer to
the return values as true/false in the comment instead of 1/0.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20200414221241.134103-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:35 -04:00
Oliver Upton
b045ae906b kvm: nVMX: reflect MTF VM-exits if injected by L1
According to SDM 26.6.2, it is possible to inject an MTF VM-exit via the
VM-entry interruption-information field regardless of the 'monitor trap
flag' VM-execution control. KVM appropriately copies the VM-entry
interruption-information field from vmcs12 to vmcs02. However, if L1
has not set the 'monitor trap flag' VM-execution control, KVM fails to
reflect the subsequent MTF VM-exit into L1.

Fix this by consulting the VM-entry interruption-information field of
vmcs12 to determine if L1 has injected the MTF VM-exit. If so, reflect
the exit, regardless of the 'monitor trap flag' VM-execution control.

Fixes: 5f3d45e7f2 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200414224746.240324-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:35 -04:00
Uros Bizjak
fb56baae5e KVM: VMX: Enable machine check support for 32bit targets
There is no reason to limit the use of do_machine_check
to 64bit targets. MCE handling works for both target families.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Fixes: a0861c02a9 ("KVM: Add VT-x machine check support")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200414071414.45636-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14 04:22:10 -04:00
Xiaoyao Li
e6f8b6c12f KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest
Two types of #AC can be generated in Intel CPUs:
 1. legacy alignment check #AC
 2. split lock #AC

Reflect #AC back into the guest if the guest has legacy alignment checks
enabled or if split lock detection is disabled.

If the #AC is not a legacy one and split lock detection is enabled, then
invoke handle_guest_split_lock() which will either warn and disable split
lock detection for this task or force SIGBUS on it.

[ tglx: Switch it to handle_guest_split_lock() and rename the misnamed
  helper function. ]
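
A hedged sketch of the reflect-or-handle decision (helper names taken
from the message above; the exact code may vary):

  static inline bool guest_inject_ac(struct kvm_vcpu *vcpu)
  {
          if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
                  return true;

          /*
           * With split lock detection enabled, an #AC is reflected into
           * the guest only for a legacy alignment check: CPL == 3 with
           * both CR0.AM and EFLAGS.AC set.
           */
          return vmx_get_cpl(vcpu) == 3 &&
                 kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
                 (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
  }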

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115517.176308876@linutronix.de
2020-04-11 16:42:41 +02:00
Vitaly Kuznetsov
dbef2808af KVM: VMX: fix crash cleanup when KVM wasn't used
If KVM wasn't used at all before the crash, the cleanup procedure fails with
 BUG: unable to handle page fault for address: ffffffffffffffc8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 23215067 P4D 23215067 PUD 23217067 PMD 0
 Oops: 0000 [#8] SMP PTI
 CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G      D           5.6.0-rc2+ #823
 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel]

The root cause is that the loaded_vmcss_on_cpu list is not yet initialized;
we initialize it in hardware_enable(), but this only happens when we start
a VM.

Previously, we used to have a bitmap with enabled CPUs and that was
preventing [masking] the issue.

Initialize the loaded_vmcss_on_cpu list earlier, right before we assign
the crash_vmclear_loaded_vmcss pointer. The blocked_vcpu_on_cpu list and
blocked_vcpu_on_cpu_lock are moved along with it for consistency.
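
A sketch of the reordered initialization in vmx_init() (hedged; based on
the description above):

  /* In vmx_init(), before registering the crash/kexec callback: */
  for_each_possible_cpu(cpu) {
          INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));

          INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
          spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
  }

  #ifdef CONFIG_KEXEC_CORE
  rcu_assign_pointer(crash_vmclear_loaded_vmcss,
                     crash_vmclear_local_loaded_vmcss);
  #endif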

Fixes: 31603d4fc2 ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07 08:35:36 -04:00
Oliver Upton
5c8beb4746 KVM: nVMX: don't clear mtf_pending when nested events are blocked
If nested events are blocked, don't clear the mtf_pending flag to avoid
missing later delivery of the MTF VM-exit.

Fixes: 5ef8acbdd6 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20200406201237.178725-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07 04:21:41 -04:00
Uros Bizjak
da7e423209 KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter
The exception trampoline in .fixup section is not needed, the exception
handling code can jump directly to the label in the .text section.

Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200406202108.74300-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07 04:21:20 -04:00
Linus Torvalds
8c1b724ddb Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) VMs running under the Protected Execution
     Framework ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names, which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
2020-04-02 15:13:15 -07:00
Qian Cai
514ccc1949 x86/kvm: fix a missing-prototypes "vmread_error"
Commit 842f4be958 ("KVM: VMX: Add a trampoline to fix VMREAD error
handling") removed the declaration of vmread_error(), causing a W=1 build
failure with KVM_WERROR=y. Fix it by adding the declaration back.

arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes]
 asmlinkage void vmread_error(unsigned long field, bool fault)
                 ^~~~~~~~~~~~

Signed-off-by: Qian Cai <cai@lca.pw>
Message-Id: <20200402153955.1695-1-cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-02 15:17:45 -04:00
Linus Torvalds
fdf5563a72 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "This topic tree contains more commits than usual:

   - most of it are uaccess cleanups/reorganization by Al

   - there's a bunch of prototype declaration (--Wmissing-prototypes)
     cleanups

   - misc other cleanups all around the map"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/mm/set_memory: Fix -Wmissing-prototypes warnings
  x86/efi: Add a prototype for efi_arch_mem_reserve()
  x86/mm: Mark setup_emu2phys_nid() static
  x86/jump_label: Move 'inline' keyword placement
  x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
  kill uaccess_try()
  x86: unsafe_put-style macro for sigmask
  x86: x32_setup_rt_frame(): consolidate uaccess areas
  x86: __setup_rt_frame(): consolidate uaccess areas
  x86: __setup_frame(): consolidate uaccess areas
  x86: setup_sigcontext(): list user_access_{begin,end}() into callers
  x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
  x86: ia32_setup_rt_frame(): consolidate uaccess areas
  x86: ia32_setup_frame(): consolidate uaccess areas
  x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
  x86/alternatives: Mark text_poke_loc_init() static
  x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()
  x86/mm: Drop pud_mknotpresent()
  x86: Replace setup_irq() by request_irq()
  x86/configs: Slightly reduce defconfigs
  ...
2020-03-31 11:04:05 -07:00
Sean Christopherson
842f4be958 KVM: VMX: Add a trampoline to fix VMREAD error handling
Add a hand coded assembly trampoline to preserve volatile registers
across vmread_error(), and to handle the calling convention differences
between 64-bit and 32-bit due to asmlinkage on vmread_error().  Pass
@field and @fault on the stack when invoking the trampoline to avoid
clobbering volatile registers in the context of the inline assembly.

Calling vmread_error() directly from inline assembly is partially broken
on 64-bit, and completely broken on 32-bit.  On 64-bit, it will clobber
%rdi and %rsi (used to pass @field and @fault) and any volatile regs
written by vmread_error().  On 32-bit, asmlinkage means vmread_error()
expects the parameters to be passed on the stack, not via regs.

Opportunistically zero out the result in the trampoline to save a few
bytes of code for every VMREAD.  A happy side effect of the trampoline
is that the inline code footprint is reduced by three bytes on 64-bit
due to PUSH/POP being more efficient (in terms of opcode bytes) than MOV.

Fixes: 6e2020977e ("KVM: VMX: Add error handling to VMREAD helper")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200326160712.28803-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:11 -04:00
Sean Christopherson
e286ac0e38 KVM: VMX: Annotate vmx_x86_ops as __initdata
Tag vmx_x86_ops with __initdata now that the struct is copied by value to
a common x86 instance of kvm_x86_ops as part of kvm_init().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-9-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:10 -04:00
Sean Christopherson
6e4fd06f3e KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
Remove the __exit annotation from VMX hardware_unsetup(); the hook
can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
if failure occurs at various points during initialization.

Removing the annotation also lets us annotate vmx_x86_ops and svm_x86_ops
with __initdata; otherwise, objtool complains because it doesn't
understand that the vendor specific __initdata is being copied by value
to a non-__initdata instance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-8-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:09 -04:00
Sean Christopherson
afaf0b2f9b KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
Replace the kvm_x86_ops pointer in common x86 with an instance of the
struct to save one pointer dereference when invoking functions.  Copy the
struct by value to set the ops during kvm_init().

Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the
ops have been initialized, i.e. a vendor KVM module has been loaded.
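
A hedged sketch of the copy-by-value setup (simplified; the actual
plumbing in kvm_arch_init() differs in detail):

  struct kvm_x86_ops kvm_x86_ops __read_mostly;
  EXPORT_SYMBOL_GPL(kvm_x86_ops);

  int kvm_arch_init(void *opaque)
  {
          struct kvm_x86_ops *ops = opaque;

          /* hardware_enable doubles as "a vendor module is loaded". */
          if (kvm_x86_ops.hardware_enable) {
                  printk(KERN_ERR "kvm: already loaded vendor module\n");
                  return -EEXIST;
          }

          /* ... existing checks and setup ... */

          memcpy(&kvm_x86_ops, ops, sizeof(kvm_x86_ops));
          return 0;
  }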

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:08 -04:00
Sean Christopherson
72b0eaa946 KVM: VMX: Configure runtime hooks using vmx_x86_ops
Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
of using the global kvm_x86_ops.  This sets the stage for waiting until
after ->hardware_setup() to set kvm_x86_ops with the vendor's
implementation.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-5-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:06 -04:00
Sean Christopherson
484014faf8 KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
Move VMX's hardware_setup() below its vmx_x86_ops definition so that a
future patch can refactor hardware_setup() to modify vmx_x86_ops
directly instead of indirectly modifying the ops via the global
kvm_x86_ops.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-4-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:05 -04:00
Sean Christopherson
d008dfdb0e KVM: x86: Move init-only kvm_x86_ops to separate struct
Move the kvm_x86_ops functions that are used only within the scope of
kvm_init() into a separate struct, kvm_x86_init_ops.  In addition to
identifying the init-only functions without resorting to code comments,
this also sets the stage for waiting until after ->hardware_setup() to
set kvm_x86_ops.  Setting kvm_x86_ops after ->hardware_setup() is
desirable as many of the hooks are not usable until ->hardware_setup()
completes.

No functional change intended.
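
A sketch of the resulting split (hedged; the member list is abbreviated):

  struct kvm_x86_init_ops {
          int (*cpu_has_kvm_support)(void);
          int (*disabled_by_bios)(void);
          int (*check_processor_compatibility)(void);
          int (*hardware_setup)(void);

          /* Runtime hooks, copied into kvm_x86_ops after setup. */
          struct kvm_x86_ops *runtime_ops;
  };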

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:04 -04:00
Paolo Bonzini
cf39d37539 Merge tag 'kvmarm-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for Linux 5.7

- GICv4.1 support
- 32bit host removal
2020-03-31 10:44:53 -04:00
Linus Torvalds
9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Ingo Molnar
629b3df7ec Merge branch 'x86/cpu' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/uncore.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25 15:20:44 +01:00
Thomas Gleixner
320debe5ef x86/kvm: Convert to new CPU match macros
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131509.136884777@linutronix.de
2020-03-24 21:27:29 +01:00
Thomas Gleixner
ba5bade4cc x86/devicetable: Move x86 specific macro out of generic code
There is no reason that this gunk is in a generic header file. The wildcard
defines need to stay as they are required by file2alias.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.736205164@linutronix.de
2020-03-24 21:02:47 +01:00
Sean Christopherson
4f6ea0a876 KVM: VMX: Gracefully handle faults on VMXON
Gracefully handle faults on VMXON, e.g. #GP due to VMX being disabled by
BIOS, instead of letting the fault crash the system.  Now that KVM uses
cpufeatures to query support instead of reading MSR_IA32_FEAT_CTL
directly, it's possible for a bug in a different subsystem to cause KVM
to incorrectly attempt VMXON[*].  Crashing the system is especially
annoying if the system is configured such that hardware_enable() will
be triggered during boot.

Opportunistically rename @addr to @vmxon_pointer and use a named param
to reference it in the inline assembly.

Print 0xdeadbeef in the ultra-"rare" case that reading MSR_IA32_FEAT_CTL
also faults.

[*] https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
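
A hedged sketch of the fault-tolerant VMXON (close to the description
above; the exact code may differ):

  static int kvm_cpu_vmxon(u64 vmxon_pointer)
  {
          u64 msr;

          asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
                            _ASM_EXTABLE(1b, %l[fault])
                            : : [vmxon_pointer] "m"(vmxon_pointer)
                            : : fault);
          return 0;

  fault:
          WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
                    rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
          return -EFAULT;
  }
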
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-4-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:26 -04:00
Sean Christopherson
d260f9ef50 KVM: VMX: Fold loaded_vmcs_init() into alloc_loaded_vmcs()
Subsume loaded_vmcs_init() into alloc_loaded_vmcs(), its only remaining
caller, and drop the VMCLEAR on the shadow VMCS, which is guaranteed to
be NULL.  loaded_vmcs_init() was previously used by loaded_vmcs_clear(),
but loaded_vmcs_clear() also subsumed loaded_vmcs_init() to properly
handle smp_wmb() with respect to VMCLEAR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:26 -04:00
Sean Christopherson
31603d4fc2 KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.

Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs().  This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.

Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel.  Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.

Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.

In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR.  Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().

Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.

[*] https://patchwork.kernel.org/patch/1675731/#3720461
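
A sketch of the reordered clear path (hedged; the comments paraphrase
the reasoning above):

  static void __loaded_vmcs_clear(void *arg)
  {
          struct loaded_vmcs *loaded_vmcs = arg;
          int cpu = raw_smp_processor_id();

          if (loaded_vmcs->cpu != cpu)
                  return;
          if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
                  per_cpu(current_vmcs, cpu) = NULL;

          vmcs_clear(loaded_vmcs->vmcs);
          if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
                  vmcs_clear(loaded_vmcs->shadow_vmcs);

          /* Delete from the list only _after_ the VMCLEAR completes. */
          list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);

          /*
           * Ensure all writes, including the list_del(), are visible
           * before marking the VMCS unloaded; pairs with the smp_rmb()
           * in vmx_vcpu_load_vmcs().
           */
          smp_wmb();

          loaded_vmcs->cpu = -1;
          loaded_vmcs->launched = 0;
  }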

Fixes: 8f536b7697 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:25 -04:00
Nick Desaulniers
428b8f1d9f KVM: VMX: don't allow memory operands for inline asm that modifies SP
THUNK_TARGET defines [thunk_target] as having "rm" input constraints
when CONFIG_RETPOLINE is not set, which isn't constrained enough for
this specific case.

For inline assembly that modifies the stack pointer before using this
input, the underspecification of constraints is dangerous, and results
in an indirect call to a previously pushed flags register.

In this case `entry`'s stack slot is good enough to satisfy the "m"
constraint in "rm", but the inline assembly in
handle_external_interrupt_irqoff() modifies the stack pointer via
push+pushf before using this input, which in this case results in
calling what was the previous state of the flags register, rather than
`entry`.

Be more specific in the constraints by requiring `entry` be in a
register, and not a memory operand.
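
A simplified excerpt of the patched asm (hedged); the key change is "r"
instead of "rm" for [thunk_target], since the pushes below move the
stack pointer before CALL_NOSPEC consumes the operand:

  asm volatile(
  #ifdef CONFIG_X86_64
          "mov %%" _ASM_SP ", %[sp]\n\t"
          "and $0xfffffffffffffff0, %%" _ASM_SP "\n\t"
          "push $%c[ss]\n\t"
          "push %[sp]\n\t"
  #endif
          "pushf\n\t"
          __ASM_SIZE(push) " $%c[cs]\n\t"
          CALL_NOSPEC
          :
  #ifdef CONFIG_X86_64
          [sp] "=&r" (tmp),
  #endif
          ASM_CALL_CONSTRAINT
          :
          [thunk_target] "r" (entry),  /* was "rm" via THUNK_TARGET */
          [ss] "i" (__KERNEL_DS),
          [cs] "i" (__KERNEL_CS)
  );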

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+3f29ca2efb056a761e38@syzkaller.appspotmail.com
Debugged-by: Alexander Potapenko <glider@google.com>
Debugged-by: Paolo Bonzini <pbonzini@redhat.com>
Debugged-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Message-Id: <20200323191243.30002-1-ndesaulniers@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:40:51 -04:00
Paolo Bonzini
96b100cd14 KVM: nVMX: remove side effects from nested_vmx_exit_reflected
The name of nested_vmx_exit_reflected suggests that it's purely
a test, but it actually marks VMCS12 pages as dirty.  Move this to
vmx_handle_exit, observing that the initial nested_run_pending check in
nested_vmx_exit_reflected is pointless---nested_run_pending has just
been cleared in vmx_vcpu_run and won't be set until handle_vmlaunch
or handle_vmresume.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 12:16:39 +01:00
Uros Bizjak
bb03911f79 KVM: VMX: access regs array in vmenter.S in its natural order
Registers in "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/...
Reorder access to "regs" array in vmenter.S to follow its natural order.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-17 14:14:32 +01:00
Vitaly Kuznetsov
b6a0653ae2 KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld()
nested_vmx_handle_enlightened_vmptrld() fails in two cases:
- when we fail to kvm_vcpu_map() the supplied GPA
- when revision_id is incorrect.
Genuine Hyper-V raises #UD in the former case (at least with *some*
incorrect GPAs) and does VMfailInvalid() in the latter. KVM doesn't do
anything so L1 just gets stuck retrying the same faulty VMLAUNCH.

nested_vmx_handle_enlightened_vmptrld() has two call sites:
nested_vmx_run() and nested_get_vmcs12_pages(). The former needs to mimic
genuine Hyper-V behavior (#UD or VMfailInvalid); the latter doesn't need
to do much: the failure there happens after migration when L2 was running
(and L1 did something weird like wrote to the VP assist page from a
different vCPU), so just kill L1 with KVM_EXIT_INTERNAL_ERROR.

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Squash kbuild autopatch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 18:19:30 +01:00
Vitaly Kuznetsov
e942dbf8c5 KVM: nVMX: stop abusing need_vmcs12_to_shadow_sync for eVMCS mapping
When vmx_set_nested_state() happens, we may not have all the required
data to map enlightened VMCS: e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not
yet be restored so we need a postponed action. Currently, we (ab)use
need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but
this is not ideal:
- We may not need to sync anything if L2 is running
- It is hard to propagate errors from nested_sync_vmcs12_to_shadow()
 as we call it from vmx_prepare_switch_to_guest() which happens just
 before we do VMLAUNCH; the code is not ready to handle errors there.

Move eVMCS mapping to nested_get_vmcs12_pages() and request
KVM_REQ_GET_VMCS12_PAGES, which seems to be less abusive in nature.
It would probably be possible to introduce a specialized KVM_REQ_EVMCS_MAP
but it is undesirable to propagate eVMCS specifics all the way up to x86.c.

Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from
vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already
does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the
(non-obvious) side-effect and to be future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 18:19:29 +01:00
Paolo Bonzini
0c546725ee Merge branch 'kvm-null-pointer-fix' into HEAD 2020-03-16 17:59:11 +01:00
Oliver Upton
212617dbb6 KVM: nVMX: Consolidate nested MTF checks to helper function
Commit 5ef8acbdd6 ("KVM: nVMX: Emulate MTF when performing
instruction emulation") introduced a helper to check the MTF
VM-execution control in vmcs12. Change pre-existing check in
nested_vmx_exit_reflected() to instead use the helper.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:59 +01:00
Wanpeng Li
041bc42ce2 KVM: VMX: Micro-optimize vmexit time when not exposing PMU
Most cloud providers do not expose the PMU to guests, due to the poor
performance of PMU emulation and security concerns. However, KVM calls
perf_guest_switch_get_msrs() and clear_atomic_switch_msr() unconditionally
before each vmentry, even if the PMU is not exposed to the guest.

A ~2% reduction in vmexit time can be observed with
kvm-unit-tests/vmexit.flat on my SKX server.

Before patch:
vmcall 1559

After patch:
vmcall 1529
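
A hedged sketch of the early-out (assumed placement in vmx_vcpu_run();
the actual patch may differ):

  /* Skip the perf MSR switch when no PMU is exposed to the guest. */
  if (vcpu_to_pmu(vcpu)->version)
          atomic_switch_perf_msrs(vmx);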

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:58 +01:00
Paolo Bonzini
727a7e27cf KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd
The set_cr3 callback is not setting the guest CR3, it is setting the
root of the guest page tables, either shadow or two-dimensional.
To make this clearer as well as to indicate that the MMU calls it
via kvm_mmu_load_cr3, rename it to load_mmu_pgd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:52 +01:00
Paolo Bonzini
689f3bf216 KVM: x86: unify callbacks to load paging root
Similar to what kvm-intel.ko is doing, provide a single callback that
merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3.

This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops.
I'm doing that in this same patch because splitting it adds quite a bit
of churn due to the need for forward declarations.  For the same reason
the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu
from init_kvm_softmmu and nested_svm_init_mmu_context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:51 +01:00
Paolo Bonzini
408e9a318f KVM: CPUID: add support for supervisor states
Current CPUID 0xd enumeration code does not support supervisor
states, because KVM only supports setting IA32_XSS to zero.
Change it instead to use a new variable supported_xss, to be
set from the hardware_setup callback which is in charge of CPU
capabilities.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
257038745c KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop
->set_supported_cpuid() now that both VMX and SVM implementations are
empty.  Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf
0x8000000A is (obviously) vendor specific but can be queried in
common code while respecting SVM's wishes by querying kvm_cpu_cap_has().

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
91661989d1 KVM: x86: Move VMX's host_efer to common x86 code
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to
avoid constantly re-reading the MSR.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:42 +01:00
Sean Christopherson
e884b854ee KVM: x86: Don't propagate MMU lpage support to memslot.disallow_lpage
Stop propagating MMU large page support into a memslot's disallow_lpage
now that the MMU's max_page_level handles the scenario where VMX's EPT is
enabled and EPT doesn't support 2M pages.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:41 +01:00
Sean Christopherson
703c335d06 KVM: x86/mmu: Configure max page level during hardware setup
Configure the max page level during hardware setup to avoid a retpoline
in the page fault handler.  Drop ->get_lpage_level() as the page fault
handler was the last user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:40 +01:00