Commit Graph

161909 Commits

Author SHA1 Message Date
Wanpeng Li
d9a710e5fc KVM: X86: Dynamically allocate user_fpu
After reverting commit 240c35a378 (kvm: x86: Use task structs fpu field
for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serveless
scenario, one host can service hundreds/thoudands firecracker/kata-container
instances, howerver, new instance will fail to launch after memory is too
fragmented to allocate kvm_vcpu struct on host, this was observed in some
cloud provider product environments.

This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my
Skylake server.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:48 +02:00
Wanpeng Li
e751732486 KVM: X86: Fix fpu state crash in kvm guest
The idea before commit 240c35a37 (which has just been reverted)
was that we have the following FPU states:

               userspace (QEMU)             guest
---------------------------------------------------------------------------
               processor                    vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
               vcpu->arch.user_fpu          processor
>>> preempt out
               vcpu->arch.user_fpu          current->thread.fpu
>>> preempt in
               vcpu->arch.user_fpu          processor
>>> back to userspace
>>> kvm_put_guest_fpu
               processor                    vcpu->arch.guest_fpu
---------------------------------------------------------------------------

With the new lazy model we want to get the state back to the processor
when schedule in from current->thread.fpu.

Reported-by: Thomas Lambertz <mail@thomaslambertz.de>
Reported-by: anthony <antdev66@gmail.com>
Tested-by: anthony <antdev66@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lambertz <mail@thomaslambertz.de>
Cc: anthony <antdev66@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add a comment in front of the warning. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:47 +02:00
Paolo Bonzini
ec269475cb Revert "kvm: x86: Use task structs fpu field for user"
This reverts commit 240c35a378
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.

Fixes: 240c35a378 ("kvm: x86: Use task structs fpu field for user")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:47 +02:00
Jan Kiszka
cf64527bb3 KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Letting this pend may cause nested_get_vmcs12_pages to run against an
invalid state, corrupting the effective vmcs of L1.

This was triggerable in QEMU after a guest corruption in L2, followed by
a L1 reset.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 7f7f1ba33c ("KVM: x86: do not load vmcs12 pages while still in SMM")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:46 +02:00
Anshuman Khandual
5a9060e943 arm64: mm: Drop pte_huge()
This helper is required from generic huge_pte_alloc() which is available
when arch subscribes ARCH_WANT_GENERAL_HUGETLB. arm64 implements it's own
huge_pte_alloc() and does not depend on the generic definition. Drop this
helper which is redundant on arm64.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 12:06:38 +01:00
Dave Martin
ed2f3e9ff6 arm64/sve: Fix a couple of magic numbers for the Z-reg count
There are some hand-written instances of "32" to express the number
of SVE Z-registers.

Since this code was written a #define was added for this, so
convert trivial instances of this magic number as appropriate.

No functional change.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 11:49:03 +01:00
Dave Martin
d16af870a7 arm64/sve: Factor out FPSIMD to SVE state conversion
Currently we convert from FPSIMD to SVE register state in memory in
two places.

To ease future maintenance, let's consolidate this in one place.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 11:48:11 +01:00
Mark Rutland
592700f094 arm64: stacktrace: Better handle corrupted stacks
The arm64 stacktrace code is careful to only dereference frame records
in valid stack ranges, ensuring that a corrupted frame record won't
result in a faulting access.

However, it's still possible for corrupt frame records to result in
infinite loops in the stacktrace code, which is also undesirable.

This patch ensures that we complete a stacktrace in finite time, by
keeping track of which stacks we have already completed unwinding, and
verifying that if the next frame record is on the same stack, it is at a
higher address.

As this has turned out to be particularly subtle, comments are added to
explain the procedure.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tengfei Fan <tengfeif@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 11:44:15 +01:00
Dave Martin
f3dcbe67ed arm64: stacktrace: Factor out backtrace initialisation
Some common code is required by each stacktrace user to initialise
struct stackframe before the first call to unwind_frame().

In preparation for adding to the common code, this patch factors it
out into a separate function start_backtrace(), and modifies the
stacktrace callers appropriately.

No functional change.

Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: drop tsk argument, update more callsites]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 11:44:08 +01:00
Dave Martin
8caa6e2be7 arm64: stacktrace: Constify stacktrace.h functions
on_accessible_stack() and on_task_stack() shouldn't (and don't)
modify their task argument, so it can be const.

This patch adds the appropriate modifiers. Whitespace violations in the
parameter lists are fixed at the same time.

No functional change.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: fixup const location, whitespace]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 11:44:00 +01:00
Vincenzo Frascino
a88754b231 arm64: vdso: Cleanup Makefiles
The recent changes to the vdso library for arm64 and the introduction of
the compat vdso library have generated some misalignment in the
Makefiles.

Cleanup the Makefiles for vdso and vdso32 libraries:
  * Removing unused rules.
  * Unifying the displayed compilation messages.
  * Simplifying the generic library inclusion path for
    arm64 vdso.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 10:37:33 +01:00
Naohiro Aota
2e2f3c9b86 arm64: vdso: fix flip/flop vdso build bug
Running "make" on an already compiled kernel tree will rebuild the kernel
even without any modifications:

$ make ARCH=arm64 CROSS_COMPILE=/usr/bin/aarch64-unknown-linux-gnu-
arch/arm64/Makefile:58: CROSS_COMPILE_COMPAT not defined or empty, the compat vDSO will not be built
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  VDSOCHK arch/arm64/kernel/vdso/vdso.so.dbg
  VDSOSYM include/generated/vdso-offsets.h
  CHK     include/generated/compile.h
  CC      arch/arm64/kernel/signal.o
  CC      arch/arm64/kernel/vdso.o
  CC      arch/arm64/kernel/signal32.o
  LD      arch/arm64/kernel/vdso/vdso.so.dbg
  OBJCOPY arch/arm64/kernel/vdso/vdso.so
  AS      arch/arm64/kernel/vdso/vdso.o
  AR      arch/arm64/kernel/vdso/built-in.a
  AR      arch/arm64/kernel/built-in.a
  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.a
  LD      vmlinux.o

This is the same bug fixed in commit 92a4728608 ("x86/boot: Fix
if_changed build flip/flop bug"). We cannot use two "if_changed" in one
target. Fix this build bug by merging two commands into one function.

Fixes: a7f71a2c89 ("arm64: compat: Add vDSO")
Fixes: 28b1a824a4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
[will: merged in compat fix from Vincenzo and made rule names consistent]
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 10:33:42 +01:00
Vincenzo Frascino
85751e9e5b arm64: vdso: Fix population of AT_SYSINFO_EHDR for compat vdso
Prior to the introduction of Unified vDSO support and compat layer for
vDSO on arm64, AT_SYSINFO_EHDR was not defined for compat tasks.
In the current implementation, AT_SYSINFO_EHDR is defined even if the
compat vdso layer is not built, which has been shown to break Android
applications using bionic:

 | 01-01 01:22:14.097   755   755 F libc    : Fatal signal 11 (SIGSEGV),
 | code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 755 (cameraserver),
 | pid 755 (cameraserver)
 | 01-01 01:22:14.112   759   759 F libc    : Fatal signal 11 (SIGSEGV),
 | code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 759
 | (android.hardwar), pid 759 (android.hardwar)
 | 01-01 01:22:14.120   756   756 F libc    : Fatal signal 11 (SIGSEGV)
 | code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 756 (drmserver),
 | pid 756 (drmserver)

Restore the old behaviour by making sure that AT_SYSINFO_EHDR for compat
tasks is defined only when CONFIG_COMPAT_VDSO is enabled.

Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-07-22 10:33:42 +01:00
Cao jin
385065734c x86/irq/64: Update stale comment
Commit e6401c1309 ("x86/irq/64: Split the IRQ stack into its own pages")
missed to update one piece of comment as it did to its peer in Xen, which
will confuse people who still need to read comment.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190719081635.26528-1-caoj.fnst@cn.fujitsu.com
2019-07-22 10:54:27 +02:00
Hans de Goede
d02f1aa391 x86/sysfb_efi: Add quirks for some devices with swapped width and height
Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but
advertise a landscape resolution and pitch, resulting in a messed up
display if the kernel tries to show anything on the efifb (because of the
wrong pitch).

Fix this by adding a new DMI match table for devices which need to have
their width and height swapped.

At first it was tried to use the existing table for overriding some of the
efifb parameters, but some of the affected devices have variants with
different LCD resolutions which will not work with hardcoded override
values.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.com
2019-07-22 10:47:11 +02:00
Eiichi Tsukata
2af7c85714 x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user()
When arch_stack_walk_user() is called from atomic contexts, access_ok() can
trigger the following warning if compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y.

Reproducer:

  // CONFIG_DEBUG_ATOMIC_SLEEP=y
  # cd /sys/kernel/debug/tracing
  # echo 1 > options/userstacktrace
  # echo 1 > events/irq/irq_handler_entry/enable

  WARNING: CPU: 0 PID: 2649 at arch/x86/kernel/stacktrace.c:103 arch_stack_walk_user+0x6e/0xf6
  CPU: 0 PID: 2649 Comm: bash Not tainted 5.3.0-rc1+ #99
  RIP: 0010:arch_stack_walk_user+0x6e/0xf6
  Call Trace:
   <IRQ>
   stack_trace_save_user+0x10a/0x16d
   trace_buffer_unlock_commit_regs+0x185/0x240
   trace_event_buffer_commit+0xec/0x330
   trace_event_raw_event_irq_handler_entry+0x159/0x1e0
   __handle_irq_event_percpu+0x22d/0x440
   handle_irq_event_percpu+0x70/0x100
   handle_irq_event+0x5a/0x8b
   handle_edge_irq+0x12f/0x3f0
   handle_irq+0x34/0x40
   do_IRQ+0xa6/0x1f0
   common_interrupt+0xf/0xf
   </IRQ>

Fix it by calling __range_not_ok() directly instead of access_ok() as
copy_from_user_nmi() does. This is fine here because the actual copy is
inside a pagefault disabled region.

Reported-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190722083216.16192-2-devel@etsukata.com
2019-07-22 10:42:36 +02:00
Joerg Roedel
8e998fc24d x86/mm: Sync also unmappings in vmalloc_sync_all()
With huge-page ioremap areas the unmappings also need to be synced between
all page-tables. Otherwise it can cause data corruption when a region is
unmapped and later re-used.

Make the vmalloc_sync_one() function ready to sync unmappings and make sure
vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD
is found.

Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org
2019-07-22 10:18:30 +02:00
Joerg Roedel
51b75b5b56 x86/mm: Check for pfn instead of page in vmalloc_sync_one()
Do not require a struct page for the mapped memory location because it
might not exist. This can happen when an ioremapped region is mapped with
2MB pages.

Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-2-joro@8bytes.org
2019-07-22 10:18:30 +02:00
Sébastien Szymanski
2ca9939633 ARM: dts: imx6ul: fix clock frequency property name of I2C buses
A few boards set clock frequency of their I2C buses with
"clock_frequency" property. The right property is "clock-frequency".

Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-07-22 15:42:22 +08:00
Michael Neuling
f16d80b75a powerpc/tm: Fix oops on sigreturn on systems without TM
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context which
results in the following crash:

  Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
  NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
  REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
  MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
  CFAR: c0000000000022e0 IRQMASK: 0
  GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
  GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
  GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
  GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
  GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
  GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
  NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
  LR [00007fffb2d67e48] 0x7fffb2d67e48
  Call Trace:
  Instruction dump:
  e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
  e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18

The problem is the signal code assumes TM is enabled when
CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
with P9 powernv or if `ppc_tm=off` is used on P8.

This means any local user can crash the system.

Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.

Found with sigfuz kernel selftest on P9.

This fixes CVE-2019-13648.

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
2019-07-22 13:05:23 +10:00
Linus Walleij
ae00fcc51e ARM: Delete netx a second time
Commit ceb02dcf67 ARM: delete netx machine deleted
the mach-netx machine. Then eight days later
it was resurrected by SPDX tag fixes. I think.

Taking the liberty to fix some additional debug uart
cruft.

Link: https://lore.kernel.org/r/20190721224157.6597-1-linus.walleij@linaro.org
Fixes: ceb02dcf67 ("ARM: delete netx machine")
Acked-By: Robert Schwebel <r.schwebel@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2019-07-21 20:03:11 -07:00
Fabio Estevam
6e998ef24d ARM: dts: imx7ulp: Fix usb-phy unit address format
The following warning is seen when building with W=1:

arch/arm/boot/dts/imx7ulp.dtsi:189.31-195.5: Warning (simple_bus_reg): /bus@40000000/usb-phy@0x40350000: simple-bus unit address format error, expected "40350000"

Fix it as suggested by removing the extra "0x" notation.

Fixes: 5b7bd45631 ("ARM: dts: imx7ulp: add imx7ulp USBOTG1 support")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-07-22 09:42:19 +08:00
Mike Rapoport
618381f09c hexagon: switch to generic version of pte allocation
The hexagon implementation pte_alloc_one(), pte_alloc_one_kernel(),
pte_free_kernel() and pte_free() is identical to the generic except of
lack of __GFP_ACCOUNT for the user PTEs allocation.

Switch hexagon to use generic version of these functions.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-21 09:53:00 -07:00
Helge Deller
69245c9756 parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
flush_tlb_all_local() flushes the ITLB and DTLB of the CPU.
In case the machine does not have separate ITLBs and DTLBs, use the
alternative functionality to replace the code which flushes the ITLB
with nops while keeping the code which flushes the DTLB.

Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-21 11:03:02 +02:00
Sven Schnelle
f5e03d3a04 parisc: add kprobe_fault_handler()
Add kprobe_fault_handler() to fix compilation for PA-RISC.
On PA-RISC we actually don't need that function as the recovery counter
is restored after interrupt. See the PA-RISC 2.0 Architecture Manual,
pg. 4-8, Figure 4-4: "Interruption Processing".

Fixes: b98cca444d ("mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-21 11:01:55 +02:00
Linus Torvalds
ac60602a6d Merge tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
 "Fix various regressions:

   - force unencrypted dma-coherent buffers if encryption bit can't fit
     into the dma coherent mask (Tom Lendacky)

   - avoid limiting request size if swiotlb is not used (me)

   - fix swiotlb handling in dma_direct_sync_sg_for_cpu/device (Fugang
     Duan)"

* tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device
  dma-direct: only limit the mapping size if swiotlb could be used
  dma-mapping: add a dma_addressing_limited helper
  dma-direct: Force unencrypted DMA under SME for certain DMA masks
2019-07-20 12:09:52 -07:00
Linus Torvalds
c6dd78fcb8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of x86 specific fixes and updates:

   - The CR2 corruption fixes which store CR2 early in the entry code
     and hand the stored address to the fault handlers.

   - Revert a forgotten leftover of the dropped FSGSBASE series.

   - Plug a memory leak in the boot code.

   - Make the Hyper-V assist functionality robust by zeroing the shadow
     page.

   - Remove a useless check for dead processes with LDT

   - Update paravirt and VMware maintainers entries.

   - A few cleanup patches addressing various compiler warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Prevent clobbering of saved CR2 value
  x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
  x86, boot: Remove multiple copy of static function sanitize_boot_params()
  x86/boot/compressed/64: Remove unused variable
  x86/boot/efi: Remove unused variables
  x86/mm, tracing: Fix CR2 corruption
  x86/entry/64: Update comments and sanity tests for create_gap
  x86/entry/64: Simplify idtentry a little
  x86/entry/32: Simplify common_exception
  x86/paravirt: Make read_cr2() CALLEE_SAVE
  MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
  x86/process: Delete useless check for dead process with LDT
  x86: math-emu: Hide clang warnings for 16-bit overflow
  x86/e820: Use proper booleans instead of 0/1
  x86/apic: Silence -Wtype-limits compiler warnings
  x86/mm: Free sme_early_buffer after init
  x86/boot: Fix memory leak in default_get_smp_config()
  Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-20 11:24:49 -07:00
Linus Torvalds
e6023adc5c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:

 - A collection of objtool fixes which address recent fallout partially
   exposed by newer toolchains, clang, BPF and general code changes.

 - Force USER_DS for user stack traces

[ Note: the "objtool fixes" are not all to objtool itself, but for
  kernel code that triggers objtool warnings.

  Things like missing function size annotations, or code that confuses
  the unwinder etc.   - Linus]

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  objtool: Support conditional retpolines
  objtool: Convert insn type to enum
  objtool: Fix seg fault on bad switch table entry
  objtool: Support repeated uses of the same C jump table
  objtool: Refactor jump table code
  objtool: Refactor sibling call detection logic
  objtool: Do frame pointer check before dead end check
  objtool: Change dead_end_function() to return boolean
  objtool: Warn on zero-length functions
  objtool: Refactor function alias logic
  objtool: Track original function across branches
  objtool: Add mcsafe_handle_tail() to the uaccess safe list
  bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
  x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
  x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
  x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
  x86/head/64: Annotate start_cpu0() as non-callable
  x86/entry: Fix thunk function ELF sizes
  x86/kvm: Don't call kvm_spurious_fault() from .fixup
  x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
  ...
2019-07-20 10:45:15 -07:00
Linus Torvalds
70e6e1b971 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CONFIG_PREEMPT_RT stub config from Thomas Gleixner:
 "The real-time preemption patch set exists for almost 15 years now and
  while the vast majority of infrastructure and enhancements have found
  their way into the mainline kernel, the final integration of RT is
  still missing.

  Over the course of the last few years, we have worked on reducing the
  intrusivenness of the RT patches by refactoring kernel infrastructure
  to be more real-time friendly. Almost all of these changes were
  benefitial to the mainline kernel on their own, so there was no
  objection to integrate them.

  Though except for the still ongoing printk refactoring, the remaining
  changes which are required to make RT a first class mainline citizen
  are not longer arguable as immediately beneficial for the mainline
  kernel. Most of them are either reordering code flows or adding RT
  specific functionality.

  But this now has hit a wall and turned into a classic hen and egg
  problem:

     Maintainers are rightfully wary vs. these changes as they make only
     sense if the final integration of RT into the mainline kernel takes
     place.

  Adding CONFIG_PREEMPT_RT aims to solve this as a clear sign that RT
  will be fully integrated into the mainline kernel. The final
  integration of the missing bits and pieces will be of course done with
  the same careful approach as we have used in the past.

  While I'm aware that you are not entirely enthusiastic about that, I
  think that RT should receive the same treatment as any other widely
  used out of tree functionality, which we have accepted into mainline
  over the years.

  RT has become the de-facto standard real-time enhancement and is
  shipped by enterprise, embedded and community distros. It's in use
  throughout a wide range of industries: telecommunications, industrial
  automation, professional audio, medical devices, data acquisition,
  automotive - just to name a few major use cases.

  RT development is backed by a Linuxfoundation project which is
  supported by major stakeholders of this technology. The funding will
  continue over the actual inclusion into mainline to make sure that the
  functionality is neither introducing regressions, regressing itself,
  nor becomes subject to bitrot. There is also a lifely user community
  around RT as well, so contrary to the grim situation 5 years ago, it's
  a healthy project.

  As RT is still a good vehicle to exercise rarely used code paths and
  to detect hard to trigger issues, you could at least view it as a QA
  tool if nothing else"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT
2019-07-20 10:33:44 -07:00
Linus Torvalds
07ab9d5bc5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
 "Mostly bugfixes, but also:

   - s390 support for KVM selftests

   - LAPIC timer offloading to housekeeping CPUs

   - Extend an s390 optimization for overcommitted hosts to all
     architectures

   - Debugging cleanups and improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: x86: Add fixed counters to PMU filter
  KVM: nVMX: do not use dangling shadow VMCS after guest reset
  KVM: VMX: dump VMCS on failed entry
  KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
  KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
  KVM: Boost vCPUs that are delivering interrupts
  KVM: selftests: Remove superfluous define from vmx.c
  KVM: SVM: Fix detection of AMD Errata 1096
  KVM: LAPIC: Inject timer interrupt via posted interrupt
  KVM: LAPIC: Make lapic timer unpinned
  KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
  KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
  kvm: x86: ioapic and apic debug macros cleanup
  kvm: x86: some tsc debug cleanup
  kvm: vmx: fix coccinelle warnings
  x86: kvm: avoid constant-conversion warning
  x86: kvm: avoid -Wsometimes-uninitized warning
  KVM: x86: expose AVX512_BF16 feature to guest
  KVM: selftests: enable pgste option for the linker on s390
  KVM: selftests: Move kvm_create_max_vcpus test to generic code
  ...
2019-07-20 10:20:27 -07:00
Linus Torvalds
168c79971b Merge tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:

 - match the directory structure of the linux-libc-dev package to that
   of Debian-based distributions

 - fix incorrect include/config/auto.conf generation when Kconfig
   creates it along with the .config file

 - remove misleading $(AS) from documents

 - clean up precious tag files by distclean instead of mrproper

 - add a new coccinelle patch for devm_platform_ioremap_resource
   migration

 - refactor module-related scripts to read modules.order instead of
   $(MODVERDIR)/*.mod files to get the list of created modules

 - remove MODVERDIR

 - update list of header compile-test

 - add -fcf-protection=none flag to avoid conflict with the retpoline
   flags when CONFIG_RETPOLINE=y

 - misc cleanups

* tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
  kbuild: add -fcf-protection=none when using retpoline flags
  kbuild: update compile-test header list for v5.3-rc1
  kbuild: split out *.mod out of {single,multi}-used-m rules
  kbuild: remove 'prepare1' target
  kbuild: remove the first line of *.mod files
  kbuild: create *.mod with full directory path and remove MODVERDIR
  kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
  kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
  kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
  kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
  scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver
  kbuild: remove duplication from modules.order in sub-directories
  kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
  kbuild: do not create empty modules.order in the prepare stage
  coccinelle: api: add devm_platform_ioremap_resource script
  kbuild: compile-test headers listed in header-test-m as well
  kbuild: remove unused hostcc-option
  kbuild: remove tag files by distclean instead of mrproper
  kbuild: add --hash-style= and --build-id unconditionally
  kbuild: get rid of misleading $(AS) from documents
  ...
2019-07-20 09:34:55 -07:00
Thomas Gleixner
6879298bd0 x86/entry/64: Prevent clobbering of saved CR2 value
The recent fix for CR2 corruption introduced a new way to reliably corrupt
the saved CR2 value.

CR2 is saved early in the entry code in RDX, which is the third argument to
the fault handling functions. But it missed that between saving and
invoking the fault handler enter_from_user_mode() can be called. RDX is a
caller saved register so the invoked function can freely clobber it with
the obvious consequences.

The TRACE_IRQS_OFF call is safe as it calls through the thunk which
preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into
C-code outside of the thunk.

Store CR2 in R12 instead which is a callee saved register and move R12 to
RDX just before calling the fault handler.

Fixes: a0d14b8909 ("x86/mm, tracing: Fix CR2 corruption")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
2019-07-20 14:28:41 +02:00
Eric Hankland
30cd860432 KVM: x86: Add fixed counters to PMU filter
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
fixed counters.

Signed-off-by: Eric Hankland <ehankland@google.com>
[No need to check padding fields for zero. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:48 +02:00
Paolo Bonzini
88dddc11a8 KVM: nVMX: do not use dangling shadow VMCS after guest reset
If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01.  However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS which has since been freed.

This causes a vmptrld of a NULL pointer on my machime, but Jan reports
the host to hang altogether.  Let's see how much this trivial patch fixes.

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:47 +02:00
Paolo Bonzini
3b20e03a10 KVM: VMX: dump VMCS on failed entry
This is useful for debugging, and is ratelimited nowadays.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:47 +02:00
Like Xu
6fc3977ccc KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
If a perf_event creation fails due to any reason of the host perf
subsystem, it has no chance to log the corresponding event for guest
which may cause abnormal sampling data in guest result. In debug mode,
this message helps to understand the state of vPMC and we may not
limit the number of occurrences but not in a spamming style.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:46 +02:00
Wanpeng Li
d984740944 KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:45 +02:00
Wanpeng Li
d73eb57b80 KVM: Boost vCPUs that are delivering interrupts
Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during
vcpu wakeup and interrupt delivery), we want to also boost not just
lock holders but also vCPUs that are delivering interrupts. Most
smp_call_function_many calls are synchronous, so the IPI target vCPUs
are also good yield candidates.  This patch introduces vcpu->ready to
boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
(VT-d PI handles voluntary preemption separately, in pi_pre_block).

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M

            vanilla     boosting    improved
1VM          21443       23520         9%
2VM           2800        8000       180%
3VM           1800        3100        72%

Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)

            vanilla     boosting   improved
              1570         4000      155%

w/ boosting + w/ pv sched yield(vanilla)

            vanilla     boosting   improved
              1844         5157      179%

w/o boosting, perf top in VM:

 72.33%  [kernel]       [k] smp_call_function_many
  4.22%  [kernel]       [k] call_function_i
  3.71%  [kernel]       [k] async_page_fault

w/ boosting, perf top in VM:

 38.43%  [kernel]       [k] smp_call_function_many
  6.31%  [kernel]       [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]       [k] call_function_interrupt

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:45 +02:00
Liran Alon
118154bdf5 KVM: SVM: Fix detection of AMD Errata 1096
When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is
possible that CPU microcode implementing DecodeAssist will fail
to read bytes of instruction which caused #NPF. This is AMD errata
1096 and it happens because CPU microcode reading instruction bytes
incorrectly attempts to read code as implicit supervisor-mode data
accesses (that is, just like it would read e.g. a TSS), which are
susceptible to SMAP faults. The microcode reads CS:RIP and if it is
a user-mode address according to the page tables, the processor
gives up and returns no instruction bytes.  In this case,
GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly
return 0 instead of the correct guest instruction bytes.

Current KVM code attemps to detect and workaround this errata, but it
has multiple issues:

1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1,
which is required for encountering a SMAP fault.

2) It assumes SMAP faults can only occur when guest CPL==3.
However, in case guest CR4.SMEP=0, the guest can execute an instruction
which reside in a user-accessible page with CPL<3 priviledge. If this
instruction raise a #NPF on it's data access, then CPU DecodeAssist
microcode will still encounter a SMAP violation.  Even though no sane
OS will do so (as it's an obvious priviledge escalation vulnerability),
we still need to handle this semanticly correct in KVM side.

Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easy
triggerable condition and guests usually enable SMAP together with SMEP.
If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt
at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest
instead of #NPF.  We keep this condition to avoid false positives in
the detection of the errata.

In addition, to avoid future confusion and improve code readbility,
include details of the errata in code and not just in commit message.

Fixes: 05d5a48635 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
Cc: Singh Brijesh <brijesh.singh@amd.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:44 +02:00
Wanpeng Li
0c5f81dad4 KVM: LAPIC: Inject timer interrupt via posted interrupt
Dedicated instances are currently disturbed by unnecessary jitter due
to the emulated lapic timers firing on the same pCPUs where the
vCPUs reside.  There is no hardware virtual timer on Intel for guest
like ARM, so both programming timer in guest and the emulated timer fires
incur vmexits.  This patch tries to avoid vmexit when the emulated timer
fires, at least in dedicated instance scenario when nohz_full is enabled.

In that case, the emulated timers can be offload to the nearest busy
housekeeping cpus since APICv has been found for several years in server
processors. The guest timer interrupt can then be injected via posted interrupts,
which are delivered by the housekeeping cpu once the emulated timer fires.

The host should tuned so that vCPUs are placed on isolated physical
processors, and with several pCPUs surplus for busy housekeeping.
If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode,
~3% redis performance benefit can be observed on Skylake server, and the
number of external interrupt vmexits drops substantially.  Without patch

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

While with patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:40 +02:00
Linus Torvalds
abdfd52a29 Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC defconfig updates from Olof Johansson:
 "We keep this in a separate branch to avoid cross-branch conflicts, but
  most of the material here is fairly boring -- some new drivers turned
  on for hardware since they were merged, and some refreshed files due
  to time having moved a lot of entries around"

* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
  ARM: configs: multi_v5: Remove duplicate ASPEED options
  arm64: defconfig: Enable CONFIG_KEYBOARD_SNVS_PWRKEY as module
  ARM: imx_v6_v7_defconfig: Enable CONFIG_ARM_IMX_CPUFREQ_DT
  defconfig: arm64: enable i.MX8 SCU octop driver
  arm64: defconfig: Add i.MX SCU SoC info driver
  arm64: defconfig: Enable CONFIG_QORIQ_THERMAL
  ARM: imx_v6_v7_defconfig: Select CONFIG_NVMEM_SNVS_LPGPR
  arm64: defconfig: ARM_IMX_CPUFREQ_DT=m
  ARM: imx_v6_v7_defconfig: Add TPM PWM support by default
  ARM: imx_v6_v7_defconfig: Enable the OV2680 camera driver
  ARM: imx_v6_v7_defconfig: Enable CONFIG_THERMAL_STATISTICS
  arm64: defconfig: NVMEM_IMX_OCOTP=y for imx8m
  ARM: multi_v7_defconfig: enable STMFX pinctrl support
  arm64 defconfig: enable LVM support
  ARM: configs: multi_v5: Add more ASPEED devices
  arm64: defconfig: Add Tegra194 PCIe driver
  ARM: configs: aspeed: Add new drivers
  ARM: exynos_defconfig: Enable Panfrost and Lima drivers
  ARM: multi_v7_defconfig: Enable Panfrost and Lima drivers
  arm64 defconfig: enable Mellanox cards
  ...
2019-07-19 17:27:27 -07:00
Linus Torvalds
af6af87d7e Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM Devicetree updates from Olof Johansson:
 "We continue to see a lot of new material. I've highlighted some of it
  below, but there's been more beyond that as well.

  One of the sweeping changes is that many boards have seen their ARM
  Mali GPU devices added to device trees, since the DRM drivers have now
  been merged.

  So, with the caveat that I have surely missed several great
  contributions, here's a collection of the material this time around:

  New SoCs:

   - Mediatek mt8183 (4x Cortex-A73 + 4x Cortex-A53)

   - TI J721E (2x Cortex-A72 + 3x Cortex-R5F + 3 DSPs + MMA)

   - Amlogic G12B (4x Cortex-A73 + 2x Cortex-A53)

  New Boards / platforms:

   - Aspeed BMC support for a number of new server platforms

   - Kontron SMARC SoM (several i.MX6 versions)

   - Novtech's Meerkat96 (i.MX7)

   - ST Micro Avenger96 board

   - Hardkernel ODROID-N2 (Amlogic G12B)

   - Purism Librem5 devkit (i.MX8MQ)

   - Google Cheza (Qualcomm SDM845)

   - Qualcomm Dragonboard 845c (Qualcomm SDM845)

   - Hugsun X99 TV Box (Rockchip RK3399)

   - Khadas Edge/Edge-V/Captain (Rockchip RK3399)

  Updated / expanded boards and platforms:

   - Renesas r7s9210 has a lot of new peripherals added

   - Fixes and polish for Rockchip-based Chromebooks

   - Amlogic G12A has a lot of peripherals added

   - Nvidia Jetson Nano sees various fixes and improvements, and is now
     at feature parity with TX1"

* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (586 commits)
  ARM: dts: gemini: Set DIR-685 SPI CS as active low
  ARM: dts: exynos: Adjust buck[78] regulators to supported values on Arndale Octa
  ARM: dts: exynos: Adjust buck[78] regulators to supported values on Odroid XU3 family
  ARM: dts: exynos: Move Mali400 GPU node to "/soc"
  ARM: dts: exynos: Fix imprecise abort on Mali GPU probe on Exynos4210
  arm64: dts: qcom: qcs404: Add missing space for cooling-cells property
  arm64: dts: rockchip: Fix USB3 Type-C on rk3399-sapphire
  arm64: dts: rockchip: Update DWC3 modules on RK3399 SoCs
  arm64: dts: rockchip: enable rk3328 watchdog clock
  ARM: dts: rockchip: add display nodes for rk322x
  ARM: dts: rockchip: fix vop iommu-cells on rk322x
  arm64: dts: rockchip: Add support for Hugsun X99 TV Box
  arm64: dts: rockchip: Define values for the IPA governor for rock960
  arm64: dts: rockchip: Fix multiple thermal zones conflict in rk3399.dtsi
  arm64: dts: rockchip: add core dtsi file for RK3399Pro SoCs
  arm64: dts: rockchip: improve rk3328-roc-cc rgmii performance.
  Revert "ARM: dts: rockchip: set PWM delay backlight settings for Minnie"
  ARM: dts: rockchip: Configure BT_DEV_WAKE in on rk3288-veyron
  arm64: dts: qcom: sdm845-cheza: add initial cheza dt
  ARM: dts: msm8974-FP2: Add vibration motor
  ...
2019-07-19 17:19:24 -07:00
Linus Torvalds
8362fd64f0 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC-related driver updates from Olof Johansson:
 "Various driver updates for platforms and a couple of the small driver
  subsystems we merge through our tree:

   - A driver for SCU (system control) on NXP i.MX8QXP

   - Qualcomm Always-on Subsystem messaging driver (AOSS QMP)

   - Qualcomm PM support for MSM8998

   - Support for a newer version of DRAM PHY driver for Broadcom (DPFE)

   - Reset controller support for Bitmain BM1880

   - TI SCI (System Control Interface) support for CPU control on AM654
     processors

   - More TI sysc refactoring and rework"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (84 commits)
  reset: remove redundant null check on pointer dev
  soc: rockchip: work around clang warning
  dt-bindings: reset: imx7: Fix the spelling of 'indices'
  soc: imx: Add i.MX8MN SoC driver support
  soc: aspeed: lpc-ctrl: Fix probe error handling
  soc: qcom: geni: Add support for ACPI
  firmware: ti_sci: Fix gcc unused-but-set-variable warning
  firmware: ti_sci: Use the correct style for SPDX License Identifier
  soc: imx8: Use existing of_root directly
  soc: imx8: Fix potential kernel dump in error path
  firmware/psci: psci_checker: Park kthreads before stopping them
  memory: move jedec_ddr.h from include/memory to drivers/memory/
  memory: move jedec_ddr_data.c from lib/ to drivers/memory/
  MAINTAINERS: Remove myself as qcom maintainer
  soc: aspeed: lpc-ctrl: make parameter optional
  soc: qcom: apr: Don't use reg for domain id
  soc: qcom: fix QCOM_AOSS_QMP dependency and build errors
  memory: tegra: Fix -Wunused-const-variable
  firmware: tegra: Early resume BPMP
  soc/tegra: Select pinctrl for Tegra194
  ...
2019-07-19 17:13:56 -07:00
Linus Torvalds
24e44913aa Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC platform updates from Olof Johansson:
 "SoC platform changes. Main theme this merge window:

   - The Netx platform (Netx 100/500) platform is removed by Linus
     Walleij-- the SoC doesn't have active maintainers with hardware,
     and in discussions with the vendor the agreement was that it's OK
     to remove.

   - Russell King has a series of patches that cleans up and refactors
     SA1101 and RiscPC support"

* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
  ARM: stm32: use "depends on" instead of "if" after prompt
  ARM: sa1100: convert to common clock framework
  ARM: exynos: Cleanup cppcheck shifting warning
  ARM: pxa/lubbock: remove lubbock_set_misc_wr() from global view
  ARM: exynos: Only build MCPM support if used
  arm: add missing include platform-data/atmel.h
  ARM: davinci: Use GPIO lookup table for DA850 LEDs
  ARM: OMAP2: drop explicit assembler architecture
  ARM: use arch_extension directive instead of arch argument
  ARM: imx: Switch imx7d to imx-cpufreq-dt for speed-grading
  ARM: bcm: Enable PINCTRL for ARCH_BRCMSTB
  ARM: bcm: Enable ARCH_HAS_RESET_CONTROLLER for ARCH_BRCMSTB
  ARM: riscpc: enable chained scatterlist support
  ARM: riscpc: reduce IRQ handling code
  ARM: riscpc: move RiscPC assembly files from arch/arm/lib to mach-rpc
  ARM: riscpc: parse video information from tagged list
  ARM: riscpc: add ecard quirk for Atomwide 3port serial card
  MAINTAINERS: mvebu: Add git entry
  soc: ti: pm33xx: Add a print while entering RTC only mode with DDR in self-refresh
  ARM: OMAP2+: Make some variables static
  ...
2019-07-19 17:05:08 -07:00
Linus Torvalds
a84d2d2906 Merge tag 'csky-for-linus-5.3-rc1' of git://github.com/c-sky/csky-linux
Pull arch/csky pupdates from Guo Ren:
 "This round of csky subsystem gives two features (ASID algorithm
  update, Perf pmu record support) and some fixups.

  ASID updates:
   - Revert mmu ASID mechanism
   - Add new asid lib code from arm
   - Use generic asid algorithm to implement switch_mm
   - Improve tlb operation with help of asid

  Perf pmu record support:
   - Init pmu as a device
   - Add count-width property for csky pmu
   - Add pmu interrupt support
   - Fix perf record in kernel/user space
   - dt-bindings: Add csky PMU bindings

  Fixes:
   - Fixup no panic in kernel for some traps
   - Fixup some error count in 810 & 860.
   - Fixup abiv1 memset error"

* tag 'csky-for-linus-5.3-rc1' of git://github.com/c-sky/csky-linux:
  csky: Fixup abiv1 memset error
  csky: Improve tlb operation with help of asid
  csky: Use generic asid algorithm to implement switch_mm
  csky: Add new asid lib code from arm
  csky: Revert mmu ASID mechanism
  dt-bindings: csky: Add csky PMU bindings
  dt-bindings: interrupt-controller: Update csky mpintc
  csky: Fixup some error count in 810 & 860.
  csky: Fix perf record in kernel/user space
  csky: Add pmu interrupt support
  csky: Add count-width property for csky pmu
  csky: Init pmu as a device
  csky: Fixup no panic in kernel for some traps
  csky: Select intc & timer drivers
2019-07-19 12:15:33 -07:00
Linus Torvalds
b5d72dda89 Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
 "Fixes and features:

   - A series to introduce a common command line parameter for disabling
     paravirtual extensions when running as a guest in virtualized
     environment

   - A fix for int3 handling in Xen pv guests

   - Removal of the Xen-specific tmem driver as support of tmem in Xen
     has been dropped (and it was experimental only)

   - A security fix for running as Xen dom0 (XSA-300)

   - A fix for IRQ handling when offlining cpus in Xen guests

   - Some small cleanups"

* tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: let alloc_xenballooned_pages() fail if not enough memory free
  xen/pv: Fix a boot up hang revealed by int3 self test
  x86/xen: Add "nopv" support for HVM guest
  x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
  xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete
  x86: Add "nopv" parameter to disable PV extensions
  x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
  xen: remove tmem driver
  Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"
  xen/events: fix binding user event channels to cpus
2019-07-19 11:41:26 -07:00
Linus Torvalds
933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Linus Torvalds
249be8511b Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "The rest of MM and a kernel-wide procfs cleanup.

  Summary of the more significant patches:

   - Patch series "mm/memory_hotplug: Factor out memory block
     devicehandling", v3. David Hildenbrand.

     Some spring-cleaning of the memory hotplug code, notably in
     drivers/base/memory.c

   - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
     Shi.

     Fix /proc/pid/smaps output for THP pages used in shmem.

   - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.

     Bugfix and speedup for kernel/resource.c

   - Patch series "mm: Further memory block device cleanups", David
     Hildenbrand.

     More spring-cleaning of the memory hotplug code.

   - Patch series "mm: Sub-section memory hotplug support". Dan
     Williams.

     Generalise the memory hotplug code so that pmem can use it more
     completely. Then remove the hacks from the libnvdimm code which
     were there to work around the memory-hotplug code's constraints.

   - "proc/sysctl: add shared variables for range check", Matteo Croce.

     We have about 250 instances of

          int zero;
          ...
                  .extra1 = &zero,

     in the tree. This is a tree-wide sweep to make all those private
     "zero"s and "one"s use global variables.

     Alas, it isn't practical to make those two global integers const"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
  proc/sysctl: add shared variables for range check
  mm: migrate: remove unused mode argument
  mm/sparsemem: cleanup 'section number' data types
  libnvdimm/pfn: stop padding pmem namespaces to section alignment
  libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
  mm/devm_memremap_pages: enable sub-section remap
  mm: document ZONE_DEVICE memory-model implications
  mm/sparsemem: support sub-section hotplug
  mm/sparsemem: prepare for sub-section ranges
  mm: kill is_dev_zone() helper
  mm/hotplug: kill is_dev_zone() usage in __remove_pages()
  mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
  mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
  mm/sparsemem: add helpers track active portions of a section at boot
  mm/sparsemem: introduce a SECTION_IS_EARLY flag
  mm/sparsemem: introduce struct mem_section_usage
  drivers/base/memory.c: get rid of find_memory_block_hinted()
  mm/memory_hotplug: move and simplify walk_memory_blocks()
  mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
  mm: make register_mem_sect_under_node() static
  ...
2019-07-19 09:45:58 -07:00
Shawn Anastasio
b4fc36e60f powerpc/dma: Fix invalid DMA mmap behavior
The refactor of powerpc DMA functions in commit 6666cc17d7
("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly
changes the way DMA mappings are handled on powerpc.
Since this change, all mapped pages are marked as cache-inhibited
through the default implementation of arch_dma_mmap_pgprot.
This differs from the previous behavior of only marking pages
in noncoherent mappings as cache-inhibited and has resulted in
sporadic system crashes in certain hardware configurations and
workloads (see Bugzilla).

This commit restores the previous correct behavior by providing
an implementation of arch_dma_mmap_pgprot that only marks
pages in noncoherent mappings as cache-inhibited. As this behavior
should be universal for all powerpc platforms a new file,
dma-generic.c, was created to store it.

Fixes: 6666cc17d7 ("powerpc/dma: remove dma_nommu_mmap_coherent")
# NOTE: fixes commit 6666cc17d7 released in v5.1.
# Consider a stable tag:
# Cc: stable@vger.kernel.org # v5.1+
# NOTE: fixes commit 6666cc17d7 released in v5.1.
# Consider a stable tag:
# Cc: stable@vger.kernel.org # v5.1+
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
2019-07-19 21:29:49 +10:00
Dexuan Cui
e320ab3cec x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt:

"The hypervisor defines several special pages that "overlay" the guest's
 Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are
 not included in the normal GPA map maintained internally by the hypervisor.
 Conceptually, they exist in a separate map that overlays the GPA map.

 If a page within the GPA space is overlaid, any SPA page mapped to the
 GPA page is effectively "obscured" and generally unreachable by the
 virtual processor through processor memory accesses.

 If an overlay page is disabled, the underlying GPA page is "uncovered",
 and an existing mapping becomes accessible to the guest."

SPA = System Physical Address = the final real physical address.

When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST
PAGE and enables the EOI optimization for this CPU by writing the MSR
HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the
special SPA page, and this CPU *always* uses hvp->apic_assist (which is
shared with the hypervisor) to decide if it needs to write the EOI MSR.

When a CPU is offlined then on the outgoing CPU:
1. hv_cpu_die() disables the EOI optimizaton for this CPU, and from
   now on hvp->apic_assist belongs to the original "normal" SPA page;
2. the remaining work of stopping this CPU is done
3. this CPU is completely stopped.

Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule
IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write
the EOI MSR for every interrupt received, otherwise the hypervisor may not
deliver further interrupts, which may be needed to completely stop the CPU.

So, after the EOI optimization is disabled in hv_cpu_die(), it's required
that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the
current allocation mode because it lacks __GFP_ZERO. As a consequence the
bit might be set and interrupt handling would not write the EOI MSR causing
interrupt delivery to become stuck.

Add the missing __GFP_ZERO to the allocation.

Note 1: after the "normal" SPA page is allocted and zeroed out, neither the
hypervisor nor the guest writes into the page, so the page remains with
zeros.

Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI
optimization. When the optimization is enabled, the guest can still write
the EOI MSR register irrespective of the "No EOI required" value, but
that's slower than the optimized assist based variant.

Fixes: ba696429d2 ("x86/hyper-v: Implement EOI assist")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/ <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
2019-07-19 09:48:15 +02:00