Document the intended usage of each emulation type as each exists to
handle an edge case of one kind or another and can be easily
misinterpreted at first glance.
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VMX's EPT misconfig flow to handle fast-MMIO path falls back to decoding
the instruction to determine the instruction length when running as a
guest (Hyper-V doesn't fill VMCS.VM_EXIT_INSTRUCTION_LEN because it's
technically not defined for EPT misconfigs). Rather than implement the
slow skip in VMX's generic skip_emulated_instruction(),
handle_ept_misconfig() directly calls kvm_emulate_instruction() with
EMULTYPE_SKIP, which intentionally doesn't do single-step detection, and
so handle_ept_misconfig() misses a single-step #DB.
Rework the EPT misconfig fallback case to route it through
kvm_skip_emulated_instruction() so that single-step #DBs and interrupt
shadow updates are handled automatically. I.e. make VMX's slow skip
logic match SVM's and have the SVM flow not intentionally avoid the
shadow update.
Alternatively, the handle_ept_misconfig() could manually handle single-
step detection, but that results in EMULTYPE_SKIP having split logic for
the interrupt shadow vs. single-step #DBs, and split emulator logic is
largely what led to this mess in the first place.
Modifying SVM to mirror VMX flow isn't really an option as SVM's case
isn't limited to a specific exit reason, i.e. handling the slow skip in
skip_emulated_instruction() is mandatory for all intents and purposes.
Drop VMX's skip_emulated_instruction() wrapper since it can now fail,
and instead WARN if it fails unexpectedly, e.g. if exit_reason somehow
becomes corrupted.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: d391f12070 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATION_FAIL is only referenced in callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that EMULATE_FAIL is completely unused, remove the last remaning
usage where KVM does something functional in response to EMULATE_FAIL.
Leave the check in place as a WARN_ON_ONCE to provide a better paper
trail when EMULATE_{DONE,FAIL,USER_EXIT} are completely removed.
Opportunistically remove the gotos in handle_invalid_guest_state().
With the EMULATE_FAIL handling gone there is no need to have a common
handler for emulation failure and the gotos only complicate things,
e.g. the signal_pending() check always returns '1', but this is far
from obvious when glancing through the code.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Request triple fault in kvm_inject_realmode_interrupt() instead of
returning EMULATE_FAIL and deferring to the caller. All existing
callers request triple fault and it's highly unlikely Real Mode is
going to acquire new features. While this consolidates a small amount
of code, the real goal is to remove the last reference to EMULATE_FAIL.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Consolidate the reporting of emulation failure into kvm_task_switch()
so that it can return EMULATE_USER_EXIT. This helps pave the way for
removing EMULATE_FAIL altogether.
This also fixes a theoretical bug where task switch interception could
suppress an EMULATE_USER_EXIT return.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kill a few birds with one stone by forcing an exit to userspace on skip
emulation failure. This removes a reference to EMULATE_FAIL, fixes a
bug in handle_ept_misconfig() where it would exit to userspace without
setting run->exit_reason, and fixes a theoretical bug in SVM's
task_switch_interception() where it would overwrite run->exit_reason on
a return of EMULATE_USER_EXIT.
Note, this technically doesn't fully fix task_switch_interception()
as it now incorrectly handles EMULATE_FAIL, but in practice there is no
bug as EMULATE_FAIL will never be returned for EMULTYPE_SKIP.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Immediately inject a #UD and return EMULATE done if emulation fails when
handling an intercepted #UD. This helps pave the way for removing
EMULATE_FAIL altogether.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an explicit emulation type for forced #UD emulation and use it to
detect that KVM should unconditionally inject a #UD instead of falling
into its standard emulation failure handling.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Immediately inject a #GP when VMware emulation fails and return
EMULATE_DONE instead of propagating EMULATE_FAIL up the stack. This
helps pave the way for removing EMULATE_FAIL altogether.
Rename EMULTYPE_VMWARE to EMULTYPE_VMWARE_GP to document that the x86
emulator is called to handle VMware #GP interception, e.g. why a #GP
is injected on emulation failure for EMULTYPE_VMWARE_GP.
Drop EMULTYPE_NO_UD_ON_FAIL as a standalone type. The "no #UD on fail"
is used only in the VMWare case and is obsoleted by having the emulator
itself reinject #GP.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The VMware backdoor hooks #GP faults on IN{S}, OUT{S}, and RDPMC, none
of which generate a non-zero error code for their #GP. Re-injecting #GP
instead of attempting emulation on a non-zero error code will allow a
future patch to move #GP injection (for emulation failure) into
kvm_emulate_instruction() without having to plumb in the error code.
Reviewed-and-tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return the single-step emulation result directly instead of via an out
param. Presumably at some point in the past kvm_vcpu_do_singlestep()
could be called with *r==EMULATE_USER_EXIT, but that is no longer the
case, i.e. all callers are happy to overwrite their own return variable.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When handling emulation failure, return the emulation result directly
instead of capturing it in a local variable. Future patches will move
additional cases into handle_emulation_failure(), clean up the cruft
before so there isn't an ugly mix of setting a local variable and
returning directly.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the stat.mmio_exits update into x86_emulate_instruction(). This is
both a bug fix, e.g. the current update flows will incorrectly increment
mmio_exits on emulation failure, and a preparatory change to set the
stage for eliminating EMULATE_DONE and company.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to section "Checks Related to Address-Space Size" in Intel SDM
vol 3C, the following checks are performed on vmentry of nested guests:
If the logical processor is outside IA-32e mode (if IA32_EFER.LMA = 0)
at the time of VM entry, the following must hold:
- The "IA-32e mode guest" VM-entry control is 0.
- The "host address-space size" VM-exit control is 0.
If the logical processor is in IA-32e mode (if IA32_EFER.LMA = 1) at the
time of VM entry, the "host address-space size" VM-exit control must be 1.
If the "host address-space size" VM-exit control is 0, the following must
hold:
- The "IA-32e mode guest" VM-entry control is 0.
- Bit 17 of the CR4 field (corresponding to CR4.PCIDE) is 0.
- Bits 63:32 in the RIP field are 0.
If the "host address-space size" VM-exit control is 1, the following must
hold:
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.
- The RIP field contains a canonical address.
On processors that do not support Intel 64 architecture, checks are
performed to ensure that the "IA-32e mode guest" VM-entry control and the
"host address-space size" VM-exit control are both 0.
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
guarantee that two virtual processors won't end up running on sibling SMT
threads without knowing about it. This is done as an optimization as in
this case there is nothing the guest can do to protect itself against MDS
and issuing additional flush requests is just pointless. On bare metal the
topology is known, however, when Hyper-V is running nested (e.g. on top of
KVM) it needs an additional piece of information: a confirmation that the
exposed topology (wrt vCPU placement on different SMT threads) is
trustworthy.
NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
TLFS as follows: "Indicates that a virtual processor will never share a
physical core with another virtual processor, except for virtual processors
that are reported as sibling SMT threads." From KVM we can give such
guarantee in two cases:
- SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
as it can become re-enabled during the lifetime of the guest).
- vCPUs are properly pinned so the scheduler won't put them on sibling
SMT threads (when they're not reported as such).
This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
first case. The second case is outside of KVM's domain of responsibility
(as vCPU pinning is actually done by someone who manages KVM's userspace -
e.g. libvirt pinning QEMU threads).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported by syzkaller:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__apic_accept_irq+0x46/0x740 arch/x86/kvm/lapic.c:1029
Call Trace:
kvm_apic_set_irq+0xb4/0x140 arch/x86/kvm/lapic.c:558
stimer_notify_direct arch/x86/kvm/hyperv.c:648 [inline]
stimer_expiration arch/x86/kvm/hyperv.c:659 [inline]
kvm_hv_process_stimers+0x594/0x1650 arch/x86/kvm/hyperv.c:686
vcpu_enter_guest+0x2b2a/0x54b0 arch/x86/kvm/x86.c:7896
vcpu_run+0x393/0xd40 arch/x86/kvm/x86.c:8152
kvm_arch_vcpu_ioctl_run+0x636/0x900 arch/x86/kvm/x86.c:8360
kvm_vcpu_ioctl+0x6cf/0xaf0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2765
The testcase programs HV_X64_MSR_STIMERn_CONFIG/HV_X64_MSR_STIMERn_COUNT,
in addition, there is no lapic in the kernel, the counters value are small
enough in order that kvm_hv_process_stimers() inject this already-expired
timer interrupt into the guest through lapic in the kernel which triggers
the NULL deferencing. This patch fixes it by don't advertise direct mode
synthetic timers and discarding the inject when lapic is not in kernel.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=1752fe0a600000
Reported-by: syzbot+dff25ee91f0c7d5c1695@syzkaller.appspotmail.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zapping collapsible sptes, a.k.a. 4k sptes that can be promoted into a
large page, is only necessary when changing only the dirty logging flag
of a memory region. If the memslot is also being moved, then all sptes
for the memslot are zapped when it is invalidated. When a memslot is
being created, it is impossible for there to be existing dirty mappings,
e.g. KVM can have MMIO sptes, but not present, and thus dirty, sptes.
Note, the comment and logic are shamelessly borrowed from MIPS's version
of kvm_arch_commit_memory_region().
Fixes: 3ea3b7fa9a ("kvm: mmu: lazy collapse small sptes into large sptes")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was discovered that after commit 65efa61dc0 ("selftests: kvm: provide
common function to enable eVMCS") hyperv_cpuid selftest is failing on AMD.
The reason is that the commit changed _vcpu_ioctl() to vcpu_ioctl() in the
test and this one can't fail.
Instead of fixing the test is seems to make more sense to not announce
KVM_CAP_HYPERV_ENLIGHTENED_VMCS support if it is definitely missing
(on svm and in case kvm_intel.nested=0).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit 5158917c7b ("KVM: x86: nVMX: Allow nested_enable_evmcs to
be NULL") the code in x86.c is prepared to see nested_enable_evmcs being
NULL and in VMX case it actually is when nesting is disabled. Remove the
unneeded stub from SVM code.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V provides direct tlb flush function which helps
L1 Hypervisor to handle Hyper-V tlb flush request from
L2 guest. Add the function support for VMX.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V direct tlb flush function should be enabled for
guest that only uses Hyper-V hypercall. User space
hypervisor(e.g, Qemu) can disable KVM identification in
CPUID and just exposes Hyper-V identification to make
sure the precondition. Add new KVM capability KVM_CAP_
HYPERV_DIRECT_TLBFLUSH for user space to enable Hyper-V
direct tlb function and this function is default to be
disabled in KVM.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The struct hv_vp_assist_page was defined incorrectly.
The "vtl_control" should be u64[3], "nested_enlightenments
_control" should be a u64 and there are 7 reserved bytes
following "enlighten_vmentry". Fix the definition.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These MSRs should be enumerated by KVM_GET_MSR_INDEX_LIST, so that
userspace knows that these MSRs may be part of the vCPU state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Eric Hankland <ehankland@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On POWER9, under some circumstances, a broadcast TLB invalidation will
fail to invalidate the ERAT cache on some threads when there are
parallel mtpidr/mtlpidr happening on other threads of the same core.
This can cause stores to continue to go to a page after it's unmapped.
The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie
flush. This additional TLB flush will cause the ERAT cache
invalidation. Since we are using PID=0 or LPID=0, we don't get
filtered out by the TLB snoop filtering logic.
We need to still follow this up with another tlbie to take care of
store vs tlbie ordering issue explained in commit:
a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on
POWER9"). The presence of ERAT cache implies we can still get new
stores and they may miss store queue marking flush.
Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
The store ordering vs tlbie issue mentioned in commit
a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on
POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't
need to apply the fixup if we are running on them
We can only do this on PowerNV. On pseries guest with KVM we still
don't support redoing the feature fixup after migration. So we should
be enabling all the workarounds needed, because whe can possibly
migrate between DD 2.3 and DD 2.2
Fixes: a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE
may not be able to process all the page sizes for a segment base page
size, as reported by the TLB Invalidate Characteristics.
For each pair of base segment page size and actual page size, this
characteristic tells us the size of the block the hcall supports.
In the case, the hcall is not supporting a pair of base segment page
size, actual page size, it is returning H_PARAM which leads to a panic
like this:
kernel BUG at /home/srikar/work/linux.git/arch/powerpc/platforms/pseries/lpar.c:466!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 28 PID: 583 Comm: modprobe Not tainted 5.2.0-master #5
NIP: c0000000000be8dc LR: c0000000000be880 CTR: 0000000000000000
REGS: c0000007e77fb130 TRAP: 0700 Not tainted (5.2.0-master)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42224824 XER: 20000000
CFAR: c0000000000be8fc IRQMASK: 0
GPR00: 0000000022224828 c0000007e77fb3c0 c000000001434d00 0000000000000005
GPR04: 9000000004fa8c00 0000000000000000 0000000000000003 0000000000000001
GPR08: c0000007e77fb450 0000000000000000 0000000000000001 ffffffffffffffff
GPR12: c0000007e77fb450 c00000000edfcb80 0000cd7d3ea30000 c0000000016022b0
GPR16: 00000000000000b0 0000cd7d3ea30000 0000000000000001 c080001f04f00105
GPR20: 0000000000000003 0000000000000004 c000000fbeb05f58 c000000001602200
GPR24: 0000000000000000 0000000000000004 8800000000000000 c000000000c5d148
GPR28: c000000000000000 8000000000000000 a000000000000000 c0000007e77fb580
NIP [c0000000000be8dc] .call_block_remove+0x12c/0x220
LR [c0000000000be880] .call_block_remove+0xd0/0x220
Call Trace:
0xc000000fb8c00240 (unreliable)
.pSeries_lpar_flush_hash_range+0x578/0x670
.flush_hash_range+0x44/0x100
.__flush_tlb_pending+0x3c/0xc0
.zap_pte_range+0x7ec/0x830
.unmap_page_range+0x3f4/0x540
.unmap_vmas+0x94/0x120
.exit_mmap+0xac/0x1f0
.mmput+0x9c/0x1f0
.do_exit+0x388/0xd60
.do_group_exit+0x54/0x100
.__se_sys_exit_group+0x14/0x20
system_call+0x5c/0x70
Instruction dump:
39400001 38a00000 4800003c 60000000 60420000 7fa9e800 38e00000 419e0014
7d29d278 7d290074 7929d182 69270001 <0b070000> 7d495378 394a0001 7fa93040
The call to H_BLOCK_REMOVE should only be made for the supported pair
of base segment page size, actual page size and using the correct
maximum block size.
Due to the required complexity in do_block_remove() and
call_block_remove(), and the fact that currently a block size of 8 is
returned by the hypervisor, we are only supporting 8 size block to the
H_BLOCK_REMOVE hcall.
In order to identify this limitation easily in the code, a local
define HBLKR_SUPPORTED_SIZE defining the currently supported block
size, and a dedicated checking helper is_supported_hlbkr() are
introduced.
For regular pages and hugetlb, the assumption is made that the page
size is equal to the base page size. For THP the page size is assumed
to be 16M.
Fixes: ba2dd8a26b ("powerpc/pseries/mm: call H_BLOCK_REMOVE")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190920130523.20441-3-ldufour@linux.ibm.com
The PAPR document specifies the TLB Block Invalidate Characteristics
which tells for each pair of segment base page size, actual page size,
the size of the block the hcall H_BLOCK_REMOVE supports.
These characteristics are loaded at boot time in a new table
hblkr_size. The table is separate from the mmu_psize_def because this
is specific to the pseries platform.
A new init function, pseries_lpar_read_hblkrm_characteristics() is
added to read the characteristics. It is called from
pSeries_setup_arch().
Fixes: ba2dd8a26b ("powerpc/pseries/mm: call H_BLOCK_REMOVE")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190920130523.20441-2-ldufour@linux.ibm.com
On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB
of memory running the following guest configs:
guest A:
- 224GB of memory
- 56 VCPUs (sockets=1,cores=28,threads=2), where:
VCPUs 0-1 are pinned to CPUs 0-3,
VCPUs 2-3 are pinned to CPUs 4-7,
...
VCPUs 54-55 are pinned to CPUs 108-111
guest B:
- 4GB of memory
- 4 VCPUs (sockets=1,cores=4,threads=1)
with the following workloads (with KSM and THP enabled in all):
guest A:
stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M
guest B:
stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M
host:
stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M
the below soft-lockup traces were observed after an hour or so and
persisted until the host was reset (this was found to be reliably
reproducible for this configuration, for kernels 4.15, 4.18, 5.0,
and 5.3-rc5):
[ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941
[ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709]
[ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913]
[ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331]
[ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338]
[ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525]
[ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
[ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243
[ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
[ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714
[ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
[ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403
[ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds.
[ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds.
[ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds.
[ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds.
[ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds.
[ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds.
[ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds.
[ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
[ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds.
[ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
CPUs 45, 24, and 124 are stuck on spin locks, likely held by
CPUs 105 and 31.
CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on
target CPU 42. For instance:
# CPU 105 registers (via xmon)
R00 = c00000000020b20c R16 = 00007d1bcd800000
R01 = c00000363eaa7970 R17 = 0000000000000001
R02 = c0000000019b3a00 R18 = 000000000000006b
R03 = 000000000000002a R19 = 00007d537d7aecf0
R04 = 000000000000002a R20 = 60000000000000e0
R05 = 000000000000002a R21 = 0801000000000080
R06 = c0002073fb0caa08 R22 = 0000000000000d60
R07 = c0000000019ddd78 R23 = 0000000000000001
R08 = 000000000000002a R24 = c00000000147a700
R09 = 0000000000000001 R25 = c0002073fb0ca908
R10 = c000008ffeb4e660 R26 = 0000000000000000
R11 = c0002073fb0ca900 R27 = c0000000019e2464
R12 = c000000000050790 R28 = c0000000000812b0
R13 = c000207fff623e00 R29 = c0002073fb0ca808
R14 = 00007d1bbee00000 R30 = c0002073fb0ca800
R15 = 00007d1bcd600000 R31 = 0000000000000800
pc = c00000000020b260 smp_call_function_many+0x3d0/0x460
cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460
lr = c00000000020b20c smp_call_function_many+0x37c/0x460
msr = 900000010288b033 cr = 44024824
ctr = c000000000050790 xer = 0000000000000000 trap = 100
CPU 42 is running normally, doing VCPU work:
# CPU 42 stack trace (via xmon)
[link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv]
[c000008ed3343820] c000008ed3343850 (unreliable)
[c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv]
[c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv]
[c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm]
[c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm]
[c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70
[c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120
[c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80
[c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70
--- Exception: c00 (System Call) at 00007d545cfd7694
SP (7d53ff7edf50) is in userspace
It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION]
was set for CPU 42 by at least 1 of the CPUs waiting in
smp_call_function_many(), but somehow the corresponding
call_single_queue entries were never processed by CPU 42, causing the
callers to spin in csd_lock_wait() indefinitely.
Nick Piggin suggested something similar to the following sequence as
a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE
IPI messages seems to be most common, but any mix of CALL_FUNCTION and
!CALL_FUNCTION messages could trigger it):
CPU
X: smp_muxed_ipi_set_message():
X: smp_mb()
X: message[RESCHEDULE] = 1
X: doorbell_global_ipi(42):
X: kvmppc_set_host_ipi(42, 1)
X: ppc_msgsnd_sync()/smp_mb()
X: ppc_msgsnd() -> 42
42: doorbell_exception(): // from CPU X
42: ppc_msgsync()
105: smp_muxed_ipi_set_message():
105: smb_mb()
// STORE DEFERRED DUE TO RE-ORDERING
--105: message[CALL_FUNCTION] = 1
| 105: doorbell_global_ipi(42):
| 105: kvmppc_set_host_ipi(42, 1)
| 42: kvmppc_set_host_ipi(42, 0)
| 42: smp_ipi_demux_relaxed()
| 42: // returns to executing guest
| // RE-ORDERED STORE COMPLETES
->105: message[CALL_FUNCTION] = 1
105: ppc_msgsnd_sync()/smp_mb()
105: ppc_msgsnd() -> 42
42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
105: // hangs waiting on 42 to process messages/call_single_queue
This can be prevented with an smp_mb() at the beginning of
kvmppc_set_host_ipi(), such that stores to message[<type>] (or other
state indicated by the host_ipi flag) are ordered vs. the store to
to host_ipi.
However, doing so might still allow for the following scenario (not
yet observed):
CPU
X: smp_muxed_ipi_set_message():
X: smp_mb()
X: message[RESCHEDULE] = 1
X: doorbell_global_ipi(42):
X: kvmppc_set_host_ipi(42, 1)
X: ppc_msgsnd_sync()/smp_mb()
X: ppc_msgsnd() -> 42
42: doorbell_exception(): // from CPU X
42: ppc_msgsync()
// STORE DEFERRED DUE TO RE-ORDERING
-- 42: kvmppc_set_host_ipi(42, 0)
| 42: smp_ipi_demux_relaxed()
| 105: smp_muxed_ipi_set_message():
| 105: smb_mb()
| 105: message[CALL_FUNCTION] = 1
| 105: doorbell_global_ipi(42):
| 105: kvmppc_set_host_ipi(42, 1)
| // RE-ORDERED STORE COMPLETES
-> 42: kvmppc_set_host_ipi(42, 0)
42: // returns to executing guest
105: ppc_msgsnd_sync()/smp_mb()
105: ppc_msgsnd() -> 42
42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
105: // hangs waiting on 42 to process messages/call_single_queue
Fixing this scenario would require an smp_mb() *after* clearing
host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
subsequent processing of IPI messages.
To handle both cases, this patch splits kvmppc_set_host_ipi() into
separate set/clear functions, where we execute smp_mb() prior to
setting host_ipi flag, and after clearing host_ipi flag. These
functions pair with each other to synchronize the sender and receiver
sides.
With that change in place the above workload ran for 20 hours without
triggering any lock-ups.
Fixes: 755563bc79 ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
arch_update_cpu_topology is first called from:
kernel_init_freeable->sched_init_smp->sched_init_domains
even before cpus has been registered in:
kernel_init_freeable->do_one_initcall->s390_smp_init
Do not trigger kobject_uevent change events until cpu devices are
actually created. Fixes the following kasan findings:
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
Read of size 8 at addr 0000000000000020 by task swapper/0/1
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
Read of size 8 at addr 0000000000000018 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
[<0000000145956498>] dump_stack+0x1d0/0x218
[<000000014429fb4c>] print_address_description+0x64/0x380
[<000000014429f630>] __kasan_report+0x138/0x168
[<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
[<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
[<0000000143df9e22>] sched_init_domains+0x62/0xe8
[<000000014644c94a>] sched_init_smp+0x3a/0xc0
[<0000000146433a20>] kernel_init_freeable+0x558/0x958
[<000000014599002a>] kernel_init+0x22/0x160
[<00000001459a71d4>] ret_from_fork+0x28/0x30
[<00000001459a71dc>] kernel_thread_starter+0x0/0x10
Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Fix Tegra OF node reference leak (Nishka Dasgupta)
- Add #defines for PCIe Data Link Feature and Physical Layer 16.0 GT/s
features (Vidya Sagar)
- Disable MSI for Tegra Root Ports since they don't support using MSI for
all Root Port events (Vidya Sagar)
- Group DesignWare write-protected register writes together (Vidya Sagar)
- Move DesignWare capability search interfaces so they can be used by
both host and endpoint drivers (Vidya Sagar)
- Add DesignWare extended capability search interfaces (Vidya Sagar)
- Export dw_pcie_wait_for_link() so drivers can be modules (Vidya Sagar)
- Add "snps,enable-cdm-check" DT binding for Configuration Dependent
Module (CDM) register checking (Vidya Sagar)
- Add DesignWare support for "snps,enable-cdm-check" CDM checking (Vidya
Sagar)
- Add "supports-clkreq" DT binding for host drivers to decide whether to
advertise low power features (Vidya Sagar)
- Add DT binding for Tegra194 (Vidya Sagar)
- Add DT binding for Tegra194 P2U (PIPE to UPHY) block (Vidya Sagar)
- Add support for Tegra194 P2U (PIPE to UPHY) (Vidya Sagar)
- Add support for Tegra194 host controller (Vidya Sagar)
- Add Tegra support for sideband PERST# and CLKREQ# for C5 (Vidya Sagar)
- Add Tegra support for slot regulators for p2972-0000 platform (Vidya
Sagar)
* lorenzo/pci/tegra:
arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
arm64: tegra: Add configuration for PCIe C5 sideband signals
PCI: tegra: Add support to enable slot regulators
PCI: tegra: Add support to configure sideband pins
dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
dt-bindings: PCI: tegra: Add sideband pins configuration entries
PCI: tegra: Add Tegra194 PCIe support
phy: tegra: Add PCIe PIPE2UPHY support
dt-bindings: PHY: P2U: Add Tegra194 P2U block
dt-bindings: PCI: tegra: Add device tree support for Tegra194
dt-bindings: Add PCIe supports-clkreq property
PCI: dwc: Add support to enable CDM register check
dt-bindings: PCI: designware: Add binding for CDM register check
PCI: dwc: Export dw_pcie_wait_for_link() API
PCI: dwc: Add extended configuration space capability search API
PCI: dwc: Move config space capability search API
PCI: dwc: Group DBI registers writes requiring unlocking
PCI: Disable MSI for Tegra root ports
PCI: Add #defines for some of PCIe spec r4.0 features
PCI: tegra: Fix OF node reference leak
- Make kirin_dw_pcie_ops constant (Nishka Dasgupta)
- Make DesignWare "num-lanes" property optional and remove from relevant
DTs (Hou Zhiqiang)
* remotes/lorenzo/pci/dwc:
arm64: dts: fsl: Remove num-lanes property from PCIe nodes
ARM: dts: ls1021a: Remove num-lanes property from PCIe nodes
PCI: dwc: Return directly when num-lanes is not found
dt-bindings: PCI: designware: Remove the num-lanes from Required properties
PCI: kirin: Make structure kirin_dw_pcie_ops constant
Droid4 needs USB option serial driver for modem, and lm3532 for the
LCD backlight.
Note that the LCD backlight does not yet get enabled automatically,
but needs to be done manually with:
# echo 50 > /sys/class/leds/lm3532::backlight/brightness
Signed-off-by: Tony Lindgren <tony@atomide.com>
The TFP410 driver was removed but the replacement driver was
never enabled. This patch enableds the DRM_TI_TFP410
Fixes: be3143d8b2 ("drm/omap: Remove TFP410 and DVI connector drivers")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
commit 6953c57ab1 "gpio: of: Handle SPI chipselect legacy bindings"
did introduce logic to centrally handle the legacy spi-cs-high property
in combination with cs-gpios. This assumes that the polarity
of the CS has to be inverted if spi-cs-high is missing, even
and especially if non-legacy GPIO_ACTIVE_HIGH is specified.
The DTS for the GTA04 was orginally introduced under the assumption
that there is no need for spi-cs-high if the gpio is defined with
proper polarity GPIO_ACTIVE_HIGH.
This was not a problem until gpiolib changed the interpretation of
GPIO_ACTIVE_HIGH and missing spi-cs-high.
The effect is that the missing spi-cs-high is now interpreted as CS being
low (despite GPIO_ACTIVE_HIGH) which turns off the SPI interface when the
panel is to be programmed by the panel driver.
Therefore, we have to add the redundant and legacy spi-cs-high property
to properly activate CS.
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The ahclkr clkctrl clock bit 28 only exists for mcasp 1 and 2 on dra7.
This causes the following warning on beagle-x15:
ti-sysc 48468000.target-module: could not add child clock ahclkr: -19
Also the mcasp clkctrl clock bits are wrong:
For mcasp1 and 2 we have four clocks at bits 28, 24, 22 and 0:
bit 28 is ahclkr
bit 24 is ahclkx
bit 22 is auxclk
bit 0 is fck
For mcasp3 to 8 we have three clocks at bits 24, 22 and 0.
bit 24 is ahclkx
bit 22 is auxclk
bit 0 is fck
We do not have currently mapped auxclk at bit 22 for the drivers, that can
be added if needed.
Fixes: 5241ccbf28 ("ARM: dts: Add missing ranges for dra7 mcasp l3 ports")
Cc: Suman Anna <s-anna@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull modules updates from Jessica Yu:
"The main bulk of this pull request introduces a new exported symbol
namespaces feature. The number of exported symbols is increasingly
growing with each release (we're at about 31k exports as of 5.3-rc7)
and we currently have no way of visualizing how these symbols are
"clustered" or making sense of this huge export surface.
Namespacing exported symbols allows kernel developers to more
explicitly partition and categorize exported symbols, as well as more
easily limiting the availability of namespaced symbols to other parts
of the kernel. For starters, we have introduced the USB_STORAGE
namespace to demonstrate the API's usage. I have briefly summarized
the feature and its main motivations in the tag below.
Summary:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing
kernel developers to better manage the export surface, allow
subsystem maintainers to explicitly state that usage of some
exported symbols should only be limited to certain users (think:
inter-module or inter-driver symbols, debugging symbols, etc), as
well as more easily limiting the availability of namespaced symbols
to other parts of the kernel.
With the module import requirement, it is also easier to spot the
misuse of exported symbols during patch review.
Two new macros are introduced: EXPORT_SYMBOL_NS() and
EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in
Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there"
* tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Remove leftover '#undef' from export header
module: remove unneeded casts in cmp_name()
module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES
module: remove redundant 'depends on MODULES'
module: Fix link failure due to invalid relocation on namespace offset
usb-storage: export symbols in USB_STORAGE namespace
usb-storage: remove single-use define for debugging
docs: Add documentation for Symbol Namespaces
scripts: Coccinelle script for namespace dependencies.
modpost: add support for generating namespace dependencies
export: allow definition default namespaces in Makefiles or sources
module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
modpost: add support for symbol namespaces
module: add support for symbol namespaces.
export: explicitly align struct kernel_symbol
module: support reading multiple values per modinfo tag
Pull ARM updates from Russell King:
- fix various clang build and cppcheck issues
- switch ARM to use new common outgoing-CPU-notification code
- add some additional explanation about the boot code
- kbuild "make clean" fixes
- get rid of another "(____ptrval____)", this time for the VDSO code
- avoid treating cache maintenance faults as a write
- add a frame pointer unwinder implementation for clang
- add EDAC support for Aurora L2 cache
- improve robustness of adjust_lowmem_bounds() finding the bounds of
lowmem.
- add reset control for AMBA primecell devices
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (24 commits)
ARM: 8906/1: drivers/amba: add reset control to amba bus probe
ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newer
ARM: 8904/1: skip nomap memblocks while finding the lowmem/highmem boundary
ARM: 8903/1: ensure that usable memory in bank 0 starts from a PMD-aligned address
ARM: 8891/1: EDAC: armada_xp: Add support for more SoCs
ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC
ARM: 8892/1: EDAC: Add missing debugfs_create_x32 wrapper
ARM: 8890/1: l2x0: add marvell,ecc-enable property for aurora
ARM: 8889/1: dt-bindings: document marvell,ecc-enable binding
ARM: 8886/1: l2x0: support parity-enable/disable on aurora
ARM: 8885/1: aurora-l2: add defines for parity and ECC registers
ARM: 8887/1: aurora-l2: add prefix to MAX_RANGE_SIZE
ARM: 8902/1: l2c: move cache-aurora-l2.h to asm/hardware
ARM: 8900/1: UNWINDER_FRAME_POINTER implementation for Clang
ARM: 8898/1: mm: Don't treat faults reported from cache maintenance as writes
ARM: 8896/1: VDSO: Don't leak kernel addresses
ARM: 8895/1: visit mach-* and plat-* directories when cleaning
ARM: 8894/1: boot: Replace open-coded nop with macro
ARM: 8893/1: boot: Explain the 8 nops
ARM: 8876/1: fix O= building with CONFIG_FPE_FASTFPE
...
Pull MIPS updates from Paul Burton:
"Main MIPS changes:
- boot_mem_map is removed, providing a nice cleanup made possible by
the recent removal of bootmem.
- Some fixes to atomics, in general providing compiler barriers for
smp_mb__{before,after}_atomic plus fixes specific to Loongson CPUs
or MIPS32 systems using cmpxchg64().
- Conversion to the new generic VDSO infrastructure courtesy of
Vincenzo Frascino.
- Removal of undefined behavior in set_io_port_base(), fixing the
behavior of some MIPS kernel configurations when built with recent
clang versions.
- Initial MIPS32 huge page support, functional on at least Ingenic
SoCs.
- pte_special() is now supported for some configurations, allowing
among other things generic fast GUP to be used.
- Miscellaneous fixes & cleanups.
And platform specific changes:
- Major improvements to Ingenic SoC support from Paul Cercueil,
mostly enabled by the inclusion of the new TCU (timer-counter unit)
drivers he's spent a very patient year or so working on. Plus some
fixes for X1000 SoCs from Zhou Yanjie.
- Netgear R6200 v1 systems are now supported by the bcm47xx platform.
- DT updates for BMIPS, Lantiq & Microsemi Ocelot systems"
* tag 'mips_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (89 commits)
MIPS: Detect bad _PFN_SHIFT values
MIPS: Disable pte_special() for MIPS32 with RiXi
MIPS: ralink: deactivate PCI support for SOC_MT7621
mips: compat: vdso: Use legacy syscalls as fallback
MIPS: Drop Loongson _CACHE_* definitions
MIPS: tlbex: Remove cpu_has_local_ebase
MIPS: tlbex: Simplify r3k check
MIPS: Select R3k-style TLB in Kconfig
MIPS: PCI: refactor ioc3 special handling
mips: remove ioremap_cachable
mips/atomic: Fix smp_mb__{before,after}_atomic()
mips/atomic: Fix loongson_llsc_mb() wreckage
mips/atomic: Fix cmpxchg64 barriers
MIPS: Octeon: remove duplicated include from dma-octeon.c
firmware: bcm47xx_nvram: Allow COMPILE_TEST
firmware: bcm47xx_nvram: Correct size_t printf format
MIPS: Treat Loongson Extensions as ASEs
MIPS: Remove dev_err() usage after platform_get_irq()
MIPS: dts: mscc: describe the PTP ready interrupt
MIPS: dts: mscc: describe the PTP register range
...
Pull UML updates from Richard Weinberger:
- virtio support
- fixes for our new time travel mode
- various improvements to make lockdep and kasan work better
- SPDX header updates
* tag 'for-linus-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (25 commits)
um: irq: Fix LAST_IRQ usage in init_IRQ()
um: Add SPDX headers for files in arch/um/include
um: Add SPDX headers for files in arch/um/os-Linux
um: Add SPDX headers to files in arch/um/kernel/
um: Add SPDX headers for files in arch/um/drivers
um: virtio: Implement VHOST_USER_PROTOCOL_F_REPLY_ACK
um: virtio: Implement VHOST_USER_PROTOCOL_F_SLAVE_REQ
um: drivers: Add virtio vhost-user driver
um: Use real DMA barriers
um: Don't use generic barrier.h
um: time-travel: Restrict time update in IRQ handler
um: time-travel: Fix periodic timers
um: Enable CONFIG_CONSTRUCTORS
um: Place (soft)irq text with macros
um: Fix VDSO compiler warning
um: Implement TRACE_IRQFLAGS_SUPPORT
um: Remove misleading #define ARCh_IRQ_ENABLED
um: Avoid using uninitialized regs
um: Remove sig_info[SIGALRM]
um: Error handling fixes in vector drivers
...
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi,
lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The
only core change this time around is the addition of request batching
for virtio. Since batching requires an additional flag to use, it
should be invisible to the rest of the drivers"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (264 commits)
scsi: hisi_sas: Fix the conflict between device gone and host reset
scsi: hisi_sas: Add BIST support for phy loopback
scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation
scsi: hisi_sas: Remove some unused function arguments
scsi: hisi_sas: Remove redundant work declaration
scsi: hisi_sas: Remove hisi_sas_hw.slot_complete
scsi: hisi_sas: Assign NCQ tag for all NCQ commands
scsi: hisi_sas: Update all the registers after suspend and resume
scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device
scsi: hisi_sas: Remove sleep after issue phy reset if sas_smp_phy_control() fails
scsi: hisi_sas: Directly return when running I_T_nexus reset if phy disabled
scsi: hisi_sas: Use true/false as input parameter of sas_phy_reset()
scsi: hisi_sas: add debugfs auto-trigger for internal abort time out
scsi: virtio_scsi: unplug LUNs when events missed
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
scsi: fcoe: fix null-ptr-deref Read in fc_release_transport
scsi: ufs-hisi: use devm_platform_ioremap_resource() to simplify code
scsi: ufshcd: use devm_platform_ioremap_resource() to simplify code
scsi: hisi_sas: use devm_platform_ioremap_resource() to simplify code
scsi: ufs: Use kmemdup in ufshcd_read_string_desc()
...
Pull hmm updates from Jason Gunthorpe:
"This is more cleanup and consolidation of the hmm APIs and the very
strongly related mmu_notifier interfaces. Many places across the tree
using these interfaces are touched in the process. Beyond that a
cleanup to the page walker API and a few memremap related changes
round out the series:
- General improvement of hmm_range_fault() and related APIs, more
documentation, bug fixes from testing, API simplification &
consolidation, and unused API removal
- Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
and make them internal kconfig selects
- Hoist a lot of code related to mmu notifier attachment out of
drivers by using a refcount get/put attachment idiom and remove the
convoluted mmu_notifier_unregister_no_release() and related APIs.
- General API improvement for the migrate_vma API and revision of its
only user in nouveau
- Annotate mmu_notifiers with lockdep and sleeping region debugging
Two series unrelated to HMM or mmu_notifiers came along due to
dependencies:
- Allow pagemap's memremap_pages family of APIs to work without
providing a struct device
- Make walk_page_range() and related use a constant structure for
function pointers"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
libnvdimm: Enable unit test infrastructure compile checks
mm, notifier: Catch sleeping/blocking for !blockable
kernel.h: Add non_block_start/end()
drm/radeon: guard against calling an unpaired radeon_mn_unregister()
csky: add missing brackets in a macro for tlb.h
pagewalk: use lockdep_assert_held for locking validation
pagewalk: separate function pointers from iterator data
mm: split out a new pagewalk.h header from mm.h
mm/mmu_notifiers: annotate with might_sleep()
mm/mmu_notifiers: prime lockdep
mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
mm/hmm: hmm_range_fault() infinite loop
mm/hmm: hmm_range_fault() NULL pointer bug
mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
mm/mmu_notifiers: remove unregister_no_release
RDMA/odp: remove ib_ucontext from ib_umem
RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
...
Pull asm inline support from Miguel Ojeda:
"Make use of gcc 9's "asm inline()" (Rasmus Villemoes):
gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
crude heuristic that gcc uses to estimate the size of the code
represented by an asm() statement. From the gcc docs
If you use 'asm inline' instead of just 'asm', then for inlining
purposes the size of the asm is taken as the minimum size, ignoring
how many instructions GCC thinks it is.
For compatibility with older compilers, we obviously want a
#if [understands asm inline]
#define asm_inline asm inline
#else
#define asm_inline asm
#endif
But since we #define the identifier inline to attach some attributes,
we have to use an alternate spelling of that keyword. gcc provides
both __inline__ and __inline, and we currently #define both to inline,
so they all have the same semantics.
We have to free up one of __inline__ and __inline, and the latter is
by far the easiest.
The two x86 changes cause smaller code gen differences than I'd
expect, but I think we do want the asm_inline thing available sooner
or later, so this is just to get the ball rolling"
* tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux:
x86: bug.h: use asm_inline in _BUG_FLAGS definitions
x86: alternative.h: use asm_inline for all alternative variants
compiler-types.h: add asm_inline definition
compiler_types.h: don't #define __inline
lib/zstd/mem.h: replace __inline by inline
staging: rtl8723bs: replace __inline by inline
Pull ARM SoC late updates from Arnd Bergmann:
"This is some material that we picked up into our tree late or that had
complex inter-depondencies. The fact that there are these
interdependencies tends to meant that these are often actually the
most interesting new additions:
- The new Aspeed AST2600 baseboard management controller is added,
this is a Cortex-A7 based follow-up to the ARM11 based AST2500 and
had some dependencies on other device drivers.
- After many years, support for the MMP2 based OLPC XO-1.75 finally
makes it into the kernel.
- The Armada 3720 based Turris Mox open source router platform is a
late addition and it follows some preparatory work across multiple
branches.
- The OMAP2+ platform had some large-scale cleanup involving driver
changes and DT changes, here we finish it off, dropping a lot of
the now-unused platform data.
- The TI K3 platform that got added for 5.3 gains a lot more support
for individual bits on the SoC, this part just came late for the
merge window"
[ This pull request itself wasn't actually sent late at all by Arnd, but
I waited on the branches that it used to be pulled first, so it ends
up being merged much later than the other ARM SoC pull requests this
merge window - Linus ]
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (57 commits)
ARM: dts: dir685: Drop spi-cpol from the display
ARM: dts: aspeed: Add AST2600 pinmux nodes
ARM: dts: aspeed: Add AST2600 and EVB
ARM: exynos: Enable support for ARM architected timers
ARM: samsung: Fix system restart on S3C6410
ARM: dts: mmp2: add OLPC XO 1.75 machine
ARM: dts: mmp2: rename the USB PHY node
ARM: dts: mmp2: specify reg-shift for the UARTs
ARM: dts: mmp2: add camera interfaces
ARM: dts: mmp2: fix the SPI nodes
ARM: dts: mmp2: trivial whitespace fix
arm64: dts: marvell: add DTS for Turris Mox
dt-bindings: marvell: document Turris Mox compatible
arm64: dts: marvell: armada-37xx: add SPI CS1 pinctrl
arm64: dts: ti: k3-j721e-main: Fix gic-its node unit-address
arm64: dts: ti: k3-am65-main: Fix gic-its node unit-address
arm64: dts: ti: k3-j721e-main: Add hwspinlock node
arm64: dts: ti: k3-am65-main: Add hwspinlock node
arm64: dts: k3-j721e: Add gpio-keys on common processor board
dt-bindings: pinctrl: k3: Introduce pinmux definitions for J721E
...