The time, stime, utime, utimes, and futimesat system calls are only
used on older architectures, and we do not provide y2038 safe variants
of them, as they are replaced by clock_gettime64, clock_settime64,
and utimensat_time64.
However, for consistency it seems better to have the 32-bit architectures
that still use them call the "time32" entry points (leaving the
traditional handlers for the 64-bit architectures), like we do for system
calls that now require two versions.
Note: We used to always define __ARCH_WANT_SYS_TIME and
__ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and
__ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is
reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while
we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat
mode. The resulting asm/unistd.h changes look a bit counterintuitive.
This is only a cleanup patch and it should not change any behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
machine_crash_nonpanic_core() does this:
while (1)
cpu_relax();
because the kernel has crashed, and we have no known safe way to deal
with the CPU. So, we place the CPU into an infinite loop which we
expect it to never exit - at least not until the system as a whole is
reset by some method.
In the absence of erratum 754327, this code assembles to:
b .
In other words, an infinite loop. When erratum 754327 is enabled,
this becomes:
1: dmb
b 1b
It has been observed that on some systems (eg, OMAP4) where, if a
crash is triggered, the system tries to kexec into the panic kernel,
but fails after taking the secondary CPU down - placing it into one
of these loops. This causes the system to livelock, and the most
noticable effect is the system stops after issuing:
Loading crashdump kernel...
to the system console.
The tested as working solution I came up with was to add wfe() to
these infinite loops thusly:
while (1) {
cpu_relax();
wfe();
}
which, without 754327 builds to:
1: wfe
b 1b
or with 754327 is enabled:
1: dmb
wfe
b 1b
Adding "wfe" does two things depending on the environment we're running
under:
- where we're running on bare metal, and the processor implements
"wfe", it stops us spinning endlessly in a loop where we're never
going to do any useful work.
- if we're running in a VM, it allows the CPU to be given back to the
hypervisor and rescheduled for other purposes (maybe a different VM)
rather than wasting CPU cycles inside a crashed VM.
However, in light of erratum 794072, Will Deacon wanted to see 10 nops
as well - which is reasonable to cover the case where we have erratum
754327 enabled _and_ we have a processor that doesn't implement the
wfe hint.
So, we now end up with:
1: wfe
b 1b
when erratum 754327 is disabled, or:
1: dmb
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
wfe
b 1b
when erratum 754327 is enabled. We also get the dmb + 10 nop
sequence elsewhere in the kernel, in terminating loops.
This is reasonable - it means we get the workaround for erratum
794072 when erratum 754327 is enabled, but still relinquish the dead
processor - either by placing it in a lower power mode when wfe is
implemented as such or by returning it to the hypervisior, or in the
case where wfe is a no-op, we use the workaround specified in erratum
794072 to avoid the problem.
These as two entirely orthogonal problems - the 10 nops addresses
erratum 794072, and the wfe is an optimisation that makes the system
more efficient when crashed either in terms of power consumption or
by allowing the host/other VMs to make use of the CPU.
I don't see any reason not to use kexec() inside a VM - it has the
potential to provide automated recovery from a failure of the VMs
kernel with the opportunity for saving a crashdump of the failure.
A panic() with a reboot timeout won't do that, and reading the
libvirt documentation, setting on_reboot to "preserve" won't either
(the documentation states "The preserve action for an on_reboot event
is treated as a destroy".) Surely it has to be a good thing to
avoiding having CPUs spinning inside a VM that is doing no useful
work.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Consolidating the "pen_release" stuff amongst the various SoC
implementations gives credence to having a CPU holding pen for
secondary CPUs. However, this is far from the truth.
Many SoC implementations cargo-cult copied various bits of the pen
release implementation from the initial Realview/Versatile Express
implementation without understanding what it was or why it existed.
The reason it existed is because these are _development_ platforms,
and some board firmware is unable to individually control the
startup of secondary CPUs. Moreover, they do not have a way to
power down or reset secondary CPUs for hot-unplug. Hence, the
pen_release implementation was designed for ARM Ltd's development
platforms to provide a working implementation, even though it is
very far from what is required.
It was decided a while back to reduce the duplication by consolidating
the "pen_release" variable, but this only made the situation worse -
we have ended up with several implementations that read this variable
but do not write it - again, showing the cargo-cult mentality at work,
lack of proper review of new code, and in some cases a lack of testing.
While it would be preferable to remove pen_release entirely from the
kernel, this is not possible without help from the SoC maintainers,
which seems to be lacking. However, I want to remove pen_release from
arch code to remove the credence that having it gives.
This patch removes pen_release from the arch code entirely, adding
private per-SoC definitions for it instead, and explicitly stating
that write_pen_release() is cargo-cult copied and should not be
copied any further. Rename write_pen_release() in a similar fashion
as well.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Arm TC2 fails cpu hotplug stress test.
This issue was tracked down to a missing copy of the new affinity
cpumask for the vexpress-spc interrupt into struct
irq_common_data.affinity when the interrupt is migrated in
migrate_one_irq().
Fix it by replacing the arm specific hotplug cpu migration with the
generic irq code.
This is the counterpart implementation to commit 217d453d47 ("arm64:
fix a migrating irq bug when hotplug cpu").
Tested with cpu hotplug stress test on Arm TC2 (multi_v7_defconfig plus
CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y).
The vexpress-spc interrupt (irq=22) on this board is affine to CPU0.
Its affinity cpumask now changes correctly e.g. from 0 to 1-4 when
CPU0 is hotplugged out.
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
ARMv8M introduces support for Security extension to M class, among
other things it affects exception handling, especially, encoding of
EXC_RETURN.
The new bits have been added:
Bit [6] Secure or Non-secure stack
Bit [5] Default callee register stacking
Bit [0] Exception Secure
which conflicts with hard-coded value of EXC_RETURN:
In fact, we only care of few bits:
Bit [3] Mode (0 - Handler, 1 - Thread)
Bit [2] Stack pointer selection (0 - Main, 1 - Process)
We can toggle only those bits and left other bits as they were on
exception entry.
It is basically, what patch does - saves EXC_RETURN when we do
transition form Thread to Handler mode (it is first svc), so later
saved value is used instead of EXC_RET_THREADMODE_PROCESSSTACK.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Convert the conditional infix to a postfix to make sure this inline
assembly is unified syntax. Since gcc assumes non-unified syntax
when emitting ARM instructions, make sure to define the syntax as
unified.
This allows to use LLVM's integrated assembler.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Convert the conditional infix to a postfix to make sure this inline
assembly is unified syntax. Since gcc assumes non-unified syntax
when emitting ARM instructions, make sure to define the syntax as
unified.
This allows to use LLVM's integrated assembler.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
This is used when mmapping the PCI resource* files in sys. Because ARM
currently lacks an implementation of pgprot_device(), it falls back to
pgprot_uncached() (Strongly Ordered), but we should be able to use
Device memory instead.
Doing this speeds up large writes to the resource files by about 40% on
one of my systems. It also ensures that mmaps on these resources use
the same memory type as ioremap().
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
As of commit 7484c727b6 ("ARM: realview: delete the RealView board
files"), the ARM Timer and Watchdog Unit is instantiated from DT only.
Moreover, the driver is selected from ARCH_MULTIPLATFORM platforms only,
which implies OF, TIMER_OF, and COMMON_CLK.
Hence remove all unused legacy infrastructure from the driver.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The migrate_pages system call has an assigned number on all architectures
except ARM. When it got added initially in commit d80ade7b32 ("ARM:
Fix warning: #warning syscall migrate_pages not implemented"), it was
intentionally left out based on the observation that there are no 32-bit
ARM NUMA systems.
However, there are now arm64 NUMA machines that can in theory run 32-bit
kernels (actually enabling NUMA there would require additional work)
as well as 32-bit user space on 64-bit kernels, so that argument is no
longer very strong.
Assigning the number lets us use the system call on 64-bit kernels as well
as providing a more consistent set of syscalls across architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Xen-swiotlb hooks into the arm/arm64 arch code through a copy of the DMA
DMA mapping operations stored in the struct device arch data.
Switching arm64 to use the direct calls for the merged DMA direct /
swiotlb code broke this scheme. Replace the indirect calls with
direct-calls in xen-swiotlb as well to fix this problem.
Fixes: 356da6d0cd ("dma-mapping: bypass indirect calls for dma-direct")
Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Pull ARM updates from Russell King:
"Included in this update:
- Florian Fainelli noticed that userspace segfaults caused by the
lack of kernel-userspace helpers was hard to diagnose; we now issue
a warning when userspace tries to use the helpers but the kernel
has them disabled.
- Ben Dooks wants compatibility for the old ATAG serial number with
DT systems.
- Some cleanup of assembly by Nicolas Pitre.
- User accessors optimisation from Vincent Whitchurch.
- More robust kdump on SMP systems from Yufen Wang.
- Sebastian Andrzej Siewior noticed problems with the SMP "boot_lock"
on RT kernels, and so we convert the Versatile series of platforms
to use a raw spinlock instead, consolidating the Versatile
implementation. We entirely remove the boot_lock on OMAP systems,
where it's unnecessary. Further patches for other systems will be
submitted for the following merge window.
- Start switching old StrongARM-11x0 systems to use gpiolib rather
than their private GPIO implementation - mostly PCMCIA bits.
- ARM Kconfig cleanups.
- Cleanup a mostly harmless mistake in the recent Spectre patch in
4.20 (which had the effect that data that can be placed into the
init sections was incorrectly always placed in the rodata section)"
* tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-arm: (25 commits)
ARM: omap2: remove unnecessary boot_lock
ARM: versatile: rename and comment SMP implementation
ARM: versatile: convert boot_lock to raw
ARM: vexpress/realview: consolidate immitation CPU hotplug
ARM: fix the cockup in the previous patch
ARM: sa1100/cerf: switch to using gpio_led_register_device()
ARM: sa1100/assabet: switch to using gpio leds
ARM: sa1100/assabet: add gpio keys support for right-hand two buttons
ARM: sa1111: remove legacy GPIO interfaces
pcmcia: sa1100*: remove redundant bvd1/bvd2 setting
ARM: pxa/lubbock: switch PCMCIA to MAX1600 library
ARM: pxa/mainstone: switch PCMCIA to MAX1600 library and gpiod APIs
ARM: sa1100/neponset: switch PCMCIA to MAX1600 library and gpiod APIs
ARM: sa1100/jornada720: switch PCMCIA to gpiod APIs
pcmcia: add MAX1600 library
ARM: sa1100: explicitly register sa11x0-pcmcia devices
ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+
ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS
ARM: 8811/1: always list both ldrd/strd registers explicitly
ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
...
Merge more updates from Andrew Morton:
- procfs updates
- various misc bits
- lib/ updates
- epoll updates
- autofs
- fatfs
- a few more MM bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
mm/page_io.c: fix polled swap page in
checkpatch: add Co-developed-by to signature tags
docs: fix Co-Developed-by docs
drivers/base/platform.c: kmemleak ignore a known leak
fs: don't open code lru_to_page()
fs/: remove caller signal_pending branch predictions
mm/: remove caller signal_pending branch predictions
arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
kernel/sched/: remove caller signal_pending branch predictions
kernel/locking/mutex.c: remove caller signal_pending branch predictions
mm: select HAVE_MOVE_PMD on x86 for faster mremap
mm: speed up mremap by 20x on large regions
mm: treewide: remove unused address argument from pte_alloc functions
initramfs: cleanup incomplete rootfs
scripts/gdb: fix lx-version string output
kernel/kcov.c: mark write_comp_data() as notrace
kernel/sysctl: add panic_print into sysctl
panic: add options to print system info when panic happens
bfs: extra sanity checking and static inode bitmap
exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
...
Patch series "Add support for fast mremap".
This series speeds up the mremap(2) syscall by copying page tables at
the PMD level even for non-THP systems. There is concern that the extra
'address' argument that mremap passes to pte_alloc may do something
subtle architecture related in the future that may make the scheme not
work. Also we find that there is no point in passing the 'address' to
pte_alloc since its unused. This patch therefore removes this argument
tree-wide resulting in a nice negative diff as well. Also ensuring
along the way that the enabled architectures do not do anything funky
with the 'address' argument that goes unnoticed by the optimization.
Build and boot tested on x86-64. Build tested on arm64. The config
enablement patch for arm64 will be posted in the future after more
testing.
The changes were obtained by applying the following Coccinelle script.
(thanks Julia for answering all Coccinelle questions!).
Following fix ups were done manually:
* Removal of address argument from pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.
// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.
virtual patch
@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@
fn(...
- , T2 E2
)
{ ... }
@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)
@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)
@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
fn(...
-, E2
)
@pte_alloc_macro depends on patch exists@
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@
(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)
Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull DMA mapping updates from Christoph Hellwig:
"A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for
architectures that have devices that perform DMA that is not cache
coherent. Based on the existing arm64 implementation and also used
for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel
data leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script"
* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
dma-mapping: fix inverted logic in dma_supported
dma-mapping: deprecate dma_zalloc_coherent
dma-mapping: zero memory returned from dma_alloc_*
sparc/iommu: fix ->map_sg return value
sparc/io-unit: fix ->map_sg return value
arm64: default to the direct mapping in get_arch_dma_ops
PCI: Remove unused attr variable in pci_dma_configure
ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
dma-mapping: bypass indirect calls for dma-direct
vmd: use the proper dma_* APIs instead of direct methods calls
dma-direct: merge swiotlb_dma_ops into the dma_direct code
dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
dma-direct: improve addressability error reporting
swiotlb: remove dma_mark_clean
swiotlb: remove SWIOTLB_MAP_ERROR
ACPI / scan: Refactor _CCA enforcement
dma-mapping: factor out dummy DMA ops
dma-mapping: always build the direct mapping code
dma-mapping: move dma_cache_sync out of line
dma-mapping: move various slow path functions out of line
...
Pull modules updates from Jessica Yu:
- Some modules-related kallsyms cleanups and a kallsyms fix for ARM.
- Include keys from the secondary keyring in module signature
verification.
* tag 'modules-for-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
ARM: module: Fix function kallsyms on Thumb-2
module: Overwrite st_size instead of st_info
module: make it clearer when we're handling kallsyms symbols vs exported symbols
modsign: use all trusted keys to verify module signature
Pull gcc-plugins update from Kees Cook:
"Both arm and arm64 are gaining per-task stack canaries (to match x86),
but arm is being done with a gcc plugin, hence it going through the
gcc-plugins tree.
New gcc-plugin:
- Enable per-task stack protector for ARM (Ard Biesheuvel)"
* tag 'gcc-plugins-v4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ARM: smp: add support for per-task stack canaries
Pull KVM updates from Paolo Bonzini:
"ARM:
- selftests improvements
- large PUD support for HugeTLB
- single-stepping fixes
- improved tracing
- various timer and vGIC fixes
x86:
- Processor Tracing virtualization
- STIBP support
- some correctness fixes
- refactorings and splitting of vmx.c
- use the Hyper-V range TLB flush hypercall
- reduce order of vcpu struct
- WBNOINVD support
- do not use -ftrace for __noclone functions
- nested guest support for PAUSE filtering on AMD
- more Hyper-V enlightenments (direct mode for synthetic timers)
PPC:
- nested VFIO
s390:
- bugfixes only this time"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: x86: Add CPUID support for new instruction WBNOINVD
kvm: selftests: ucall: fix exit mmio address guessing
Revert "compiler-gcc: disable -ftracer for __noclone functions"
KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
KVM: Make kvm_set_spte_hva() return int
KVM: Replace old tlb flush function with new one to flush a specified range.
KVM/MMU: Add tlb flush with range helper function
KVM/VMX: Add hv tlb range flush support
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM: x86: Disable Intel PT when VMXON in L1 guest
KVM: x86: Set intercept for Intel PT MSRs read/write
KVM: x86: Implement Intel PT MSRs read/write emulation
...
The patch is to make kvm_set_spte_hva() return int and caller can
check return value to determine flush tlb or not.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
32 and 64bit use different symbols to identify the traps.
32bit has a fine grained approach (prefetch abort, data abort and HVC),
while 64bit is pretty happy with just "trap".
This has been fine so far, except that we now need to decode some
of that in tracepoints that are common to both architectures.
Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols
and make the tracepoint use it.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM only supports PMD hugepages at stage 2. Now that the various page
handling routines are updated, extend the stage 2 fault handling to
map in PUD hugepages.
Addition of PUD hugepage support enables additional page sizes (e.g.,
1G with 4K granule) which can be useful on cores that support mapping
larger block sizes in the TLB entries.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ]
Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries is used to perform
i-cache invalidation on first execute.
Provide trivial implementations of arm32 helpers to allow sharing of
code.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest tables is used to track dirty pages when migrating
VMs.
Also, provide trivial implementations of required kvm_s2pud_* helpers
to allow sharing of code with arm32.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replaced BUG() => WARN_ON() in arm32 pud helpers ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When we emulate a guest instruction, we don't advance the hardware
singlestep state machine, and thus the guest will receive a software
step exception after a next instruction which is not emulated by the
host.
We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
whether userspace requested a single step, and fake a debug exception
from within the kernel. Other times, we advance the HW singlestep state
rely on the HW to generate the exception for us. Thus, the observed step
behaviour differs for host and guest.
Let's make this simpler and consistent by always advancing the HW
singlestep state machine when we skip an instruction. Thus we can rely
on the hardware to generate the singlestep exception for us, and never
need to explicitly check for an active-pending step, nor do we need to
fake a debug exception from the guest.
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Thumb-2 functions have the lowest bit set in the symbol value in the
symtab. When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.
$ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
95947: 8015dc89 686 FUNC GLOBAL DEFAULT 2 show_interrupts
$ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
8015dc88 T show_interrupts
$ cat /proc/kallsyms | grep show_interrupts
8015dc88 T show_interrupts
However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.
$ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
333: 00002d4d 36 FUNC GLOBAL DEFAULT 1 tun_get_socket
$ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
00002d4c T tun_get_socket
$ cat /proc/kallsyms | grep tun_get_socket
7f802d4d t tun_get_socket [tun]
Because of this, the symbol+offset of the crashing instruction shown in
oopses is incorrect when the crash is in a module. For example, given a
tun_get_socket which starts like this,
00002d4c <tun_get_socket>:
2d4c: 6943 ldr r3, [r0, #20]
2d4e: 4a07 ldr r2, [pc, #28]
2d50: 4293 cmp r3, r2
a crash when tun_get_socket is called with NULL results in:
PC is at tun_xdp+0xa3/0xa4 [tun]
pc : [<7f802d4c>]
As can be seen, the "PC is at" line reports the wrong symbol name, and
the symbol+offset will point to the wrong source line if it is passed to
gdb.
To solve this, add a way for archs to fixup the reading of these module
kallsyms values, and use that to clear the lowest bit for function
symbols on Thumb-2.
After the fix:
# cat /proc/kallsyms | grep tun_get_socket
7f802d4c t tun_get_socket [tun]
PC is at tun_get_socket+0x0/0x24 [tun]
pc : [<7f802d4c>]
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
On ARM, we currently only change the value of the stack canary when
switching tasks if the kernel was built for UP. On SMP kernels, this
is impossible since the stack canary value is obtained via a global
symbol reference, which means
a) all running tasks on all CPUs must use the same value
b) we can only modify the value when no kernel stack frames are live
on any CPU, which is effectively never.
So instead, use a GCC plugin to add a RTL pass that replaces each
reference to the address of the __stack_chk_guard symbol with an
expression that produces the address of the 'stack_canary' field
that is added to struct thread_info. This way, each task will use
its own randomized value.
Cc: Russell King <linux@armlinux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: kernel-hardening@lists.openwall.com
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
An SVE system is so far the only case where we mandate VHE. As we're
starting to grow this requirements, let's slightly rework the way we
deal with that situation, allowing for easy extension of this check.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Arm already returns (~(dma_addr_t)0x0) on mapping failures, so we can
switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping
code handle the rest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we have migrated all users of the legacy private SA1111 gpio
interfaces, we can remove these redundant GPIO interfaces.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull ARM spectre updates from Russell King:
"These are the currently known final bits that resolve the Spectre
issues. big.Little systems used to be sufficiently identical in that
there were no differences between individual CPUs in the system that
mattered to the kernel. With the advent of the Spectre problem, the
CPUs now have differences in how the workaround is applied.
As a result of previous Spectre patches, these systems ended up
reporting quite a lot of:
"CPUx: Spectre v2: incorrect context switching function, system vulnerable"
messages due to the action of the big.Little switcher causing the CPUs
to be re-initialised regularly. This series resolves that issue by
making the CPU vtable unique to each CPU.
However, since this is used very early, before per-cpu is setup,
per-cpu can't be used. We also have a problem that two of the methods
are not called from preempt-safe paths, but thankfully these remain
identical between all CPUs in the system. To make sure, we validate
that these are identical during boot"
* 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: spectre-v2: per-CPU vtables to work around big.Little systems
ARM: add PROC_VTABLE and PROC_TABLE macros
ARM: clean up per-processor check_bugs method call
ARM: split out processor lookup
ARM: make lookup_processor_type() non-__init
getuser() and putuser() (and there underscored variants) use two
strb[t]/ldrb[t] instructions when they are asked to get/put 16-bits.
This means that the read/write is not atomic even when performed to a
16-bit-aligned address.
This leads to problems with vhost: vhost uses __getuser() to read the
vring's 16-bit avail.index field, and if it happens to observe a partial
update of the index, wrong descriptors will be used which will lead to a
breakdown of the virtio communication. A similar problem exists for
__putuser() which is used to write to the vring's used.index field.
The reason these functions use strb[t]/ldrb[t] is because strht/ldrht
instructions did not exist until ARMv6T2/ARMv7. So we should be easily
able to fix this on ARMv7. Also, since all ARMv6 processors also don't
actually use the unprivileged instructions anymore for uaccess (since
CONFIG_CPU_USE_DOMAINS is not used) we can easily fix them too.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
In big.Little systems, some CPUs require the Spectre workarounds in
paths such as the context switch, but other CPUs do not. In order
to handle these differences, we need per-CPU vtables.
We are unable to use the kernel's per-CPU variables to support this
as per-CPU is not initialised at times when we need access to the
vtables, so we have to use an array indexed by logical CPU number.
We use an array-of-pointers to avoid having function pointers in
the kernel's read/write .data section.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Allow the way we access members of the processor vtable to be changed
at compile time. We will need to move to per-CPU vtables to fix the
Spectre variant 2 issues on big.Little systems.
However, we have a couple of calls that do not need the vtable
treatment, and indeed cause a kernel warning due to the (later) use
of smp_processor_id(), so also introduce the PROC_TABLE macro for
these which always use CPU 0's function pointers.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Call the per-processor type check_bugs() method in the same way as we
do other per-processor functions - move the "processor." detail into
proc-fns.h.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Split out the lookup of the processor type and associated error handling
from the rest of setup_processor() - we will need to use this in the
secondary CPU bringup path for big.Little Spectre variant 2 mitigation.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull s390 fixes from Martin Schwidefsky:
- A fix for the pgtable_bytes misaccounting on s390. The patch changes
common code part in regard to page table folding and adds extra
checks to mm_[inc|dec]_nr_[pmds|puds].
- Add FORCE for all build targets using if_changed
- Use non-loadable phdr for the .vmlinux.info section to avoid a
segment overlap that confuses kexec
- Cleanup the attribute definition for the diagnostic sampling
- Increase stack size for CONFIG_KASAN=y builds
- Export __node_distance to fix a build error
- Correct return code of a PMU event init function
- An update for the default configs
* tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/perf: Change CPUM_CF return code in event init function
s390: update defconfigs
s390/mm: Fix ERROR: "__node_distance" undefined!
s390/kasan: increase instrumented stack size to 64k
s390/cpum_sf: Rework attribute definition for diagnostic sampling
s390/mm: fix mis-accounting of pgtable_bytes
mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
mm: introduce mm_[p4d|pud|pmd]_folded
mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
s390: avoid vmlinux segments overlap
s390/vdso: add missing FORCE to build targets
s390/decompressor: add missing FORCE to build targets
Change the currently empty defines for __PAGETABLE_PMD_FOLDED,
__PAGETABLE_PUD_FOLDED and __PAGETABLE_P4D_FOLDED to return 1.
This makes it possible to use __is_defined() to test if the
preprocessor define exists.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull clk updates from Stephen Boyd:
"This time it looks like a quieter release cycle in the clk tree. I
guess that's because of summer time holidays/vacations. The biggest
change in the diffstat is in the Qualcomm clk driver, where they got
support for CPUs and handful of SoCs. After that, the at91 driver got
a major rewrite for newer DT bindings that should make things easier
going forward and the TI code moved to a clockdomain based design.
The long tail is mostly small driver updates for newer clks and some
simpler SoC clock drivers such as the Hisilicon and imx support.
In the core framework, we only have two small changes this time.
One is a new clk API to get all clks for a device with the bulk clk
APIs. This allows drivers that don't care about doing anything besides
turning on all the clks to just clk_get() them all and turn them on.
The other change is the beginning of a way to support save and restore
of clk settings in the clk framework. TI is the only user right now,
but we will want to expand upon this design in the future to support
more save and restore of clk registers. At least this gets us started
and works well enough for one SoC, but there's more work in the
future.
Core:
- clk_bulk_get_all() API and friends to get all the clks for a device
- Basic clk state save/restore hooks
New Drivers:
- Renesas RZ/A2 (R7S9210) SoC, including early clocks
- Rensas RZ/G1N (R8A7744) and RZ/G2E (R8A774C0) SoCs
- Rensas RZ/G2M (r8a774a1) SoC
- Qualcomm Krait CPU clk support
- Qualcomm QCS404 GCC support
- Qualcomm SDM660 GCC support
- Qualcomm SDM845 camera clock controller
- Ingenic jz4725b CGU
- Hisilicon 3670 SoC support
- TI SCI clks on K3 SoCs
- iMX6 MMDC clks
- Reset Controller (RMU) support for Actions Semi Owl S900 and S700 SoCs
Updates:
- Rework at91 PMC clock driver for new DT bindings
- Nvidia Tegra clk driver MBIST workaround fix
- S2RAM support for Marvell mvebu periph clks
- Use updated printk format for OF node names
- Fix TI code to only search DT subnodes
- Various static analysis finds
- Tag various drivers with SPDX license tags
- Support dynamic frequency switching (DFS) on qcom SDM845 GCC
- Only use s2mps11 dt-binding defines instead of redefining them in the driver
- Add some more missing clks to qcom MSM8996 GCC
- Quad SPI clks on qcom SDM845
- Add support for CMT timer clocks on R-Car V3H
- Add support for SHDI and various timer clocks on R-Car V3M
- Improve OSC and RCLK (watchdog) handling on R-Car Gen3 SoCs
- Amlogic clk-pll driver improvements and updates
- Amlogic axg audio controller system clocks
- Register Amlogic meson8b clock controller early
- Add support for SATA and Fine Display Processor (FDP) clocks on R-Car M3-N
- Consolidation of system suspend related code in Exynos, S5P, S3C SoC clk drivers
- Fixes for system suspend support on Exynos542x (Odroid boards) and Exynos5433 SoC
- Remove obsoleted Exynos4212 ISP clock definitions
- Migrated TI am3/4/5 and dra7 SoCs to clockdomain based design
- TI RTC+DDR sleep mode support for clock save/restore
- Allwinner A64 display engine support and fixes
- Allwinner A83t display engine support and fixes"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (186 commits)
clk: qcom: Remove unused arrays in SDM845 GCC
clk: fixed-rate: fix of_node_get-put imbalance
clk: s2mps11: Add used attribute to s2mps11_dt_match
clk: qcom: gcc-sdm660: Add MODULE_LICENSE
clk: qcom: Add safe switch hook for krait mux clocks
dt-bindings: clock: Document qcom,krait-cc
clk: qcom: Add Krait clock controller driver
dt-bindings: arm: Document qcom,kpss-gcc
clk: qcom: Add KPSS ACC/GCC driver
clk: qcom: Add support for Krait clocks
clk: qcom: Add IPQ806X's HFPLLs
clk: qcom: Add MSM8960/APQ8064's HFPLLs
dt-bindings: clock: Document qcom,hfpll
clk: qcom: Add HFPLL driver
clk: qcom: Add support for High-Frequency PLLs (HFPLLs)
ARM: Add Krait L2 register accessor functions
clk: imx6q: add mmdc0 ipg clock
clk: imx6sl: add mmdc ipg clocks
clk: imx6sll: add mmdc1 ipg clock
clk: imx6sx: add mmdc1 ipg clock
...