Currently LLVM's integrated assembler does not recognize .w form
of the pld instructions (LLVM Bug 40972 [0]):
./arch/arm/include/asm/processor.h:133:5: error: invalid instruction
"pldw.wt%a0 n"
^
<inline asm>:2:1: note: instantiated into assembly here
pldw.w [r0]
^
1 error generated.
The W macro for generating wide instructions when targeting Thumb-2
is not strictly required for the preload data instructions (pld, pldw)
since they are only available as wide instructions. The GNU assembler
works with or without the .w appended when compiling an Thumb-2 kernel.
Drop the macro to work around LLVM Bug 40972 issue.
[0] https://bugs.llvm.org/show_bug.cgi?id=40972
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
machine_crash_nonpanic_core() does this:
while (1)
cpu_relax();
because the kernel has crashed, and we have no known safe way to deal
with the CPU. So, we place the CPU into an infinite loop which we
expect it to never exit - at least not until the system as a whole is
reset by some method.
In the absence of erratum 754327, this code assembles to:
b .
In other words, an infinite loop. When erratum 754327 is enabled,
this becomes:
1: dmb
b 1b
It has been observed that on some systems (eg, OMAP4) where, if a
crash is triggered, the system tries to kexec into the panic kernel,
but fails after taking the secondary CPU down - placing it into one
of these loops. This causes the system to livelock, and the most
noticable effect is the system stops after issuing:
Loading crashdump kernel...
to the system console.
The tested as working solution I came up with was to add wfe() to
these infinite loops thusly:
while (1) {
cpu_relax();
wfe();
}
which, without 754327 builds to:
1: wfe
b 1b
or with 754327 is enabled:
1: dmb
wfe
b 1b
Adding "wfe" does two things depending on the environment we're running
under:
- where we're running on bare metal, and the processor implements
"wfe", it stops us spinning endlessly in a loop where we're never
going to do any useful work.
- if we're running in a VM, it allows the CPU to be given back to the
hypervisor and rescheduled for other purposes (maybe a different VM)
rather than wasting CPU cycles inside a crashed VM.
However, in light of erratum 794072, Will Deacon wanted to see 10 nops
as well - which is reasonable to cover the case where we have erratum
754327 enabled _and_ we have a processor that doesn't implement the
wfe hint.
So, we now end up with:
1: wfe
b 1b
when erratum 754327 is disabled, or:
1: dmb
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
wfe
b 1b
when erratum 754327 is enabled. We also get the dmb + 10 nop
sequence elsewhere in the kernel, in terminating loops.
This is reasonable - it means we get the workaround for erratum
794072 when erratum 754327 is enabled, but still relinquish the dead
processor - either by placing it in a lower power mode when wfe is
implemented as such or by returning it to the hypervisior, or in the
case where wfe is a no-op, we use the workaround specified in erratum
794072 to avoid the problem.
These as two entirely orthogonal problems - the 10 nops addresses
erratum 794072, and the wfe is an optimisation that makes the system
more efficient when crashed either in terms of power consumption or
by allowing the host/other VMs to make use of the CPU.
I don't see any reason not to use kexec() inside a VM - it has the
potential to provide automated recovery from a failure of the VMs
kernel with the opportunity for saving a crashdump of the failure.
A panic() with a reboot timeout won't do that, and reading the
libvirt documentation, setting on_reboot to "preserve" won't either
(the documentation states "The preserve action for an on_reboot event
is treated as a destroy".) Surely it has to be a good thing to
avoiding having CPUs spinning inside a VM that is doing no useful
work.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
While ARM32 carries FPU state in the thread structure that is saved and
restored during signal handling, it doesn't need to declare a usercopy
whitelist, since existing accessors are all either using a bounce buffer
(for which whitelisting isn't checking the slab), are statically sized
(which will bypass the hardened usercopy check), or both.
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Kees Cook <keescook@chromium.org>
The elf_fdpic binary format driver has to initialize extra registers
beside the stack and program counter as required by the corresponding
ABI. So reinstate them after the regs structure has been cleared.
While at it let's get rid of start_thread_nommu().
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Mickael GUENE <mickael.guene@st.com>
Tested-by: Vincent Abriou <vincent.abriou@st.com>
Tested-by: Andras Szemzo <szemzo.andras@gmail.com>
SMP ARMv7 CPUs implement the pldw instruction, which allows them to
prefetch data cachelines in an exclusive state.
This patch defines the prefetchw macro using pldw for CPUs that support
it.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Patching UP/SMP alternatives inside inline assembly blocks is useful
outside of the spinlock implementation, where it is used for sev and wfe.
This patch lifts the macro into processor.h and gives it a scarier name
to (a) avoid conflicts in the global namespace and (b) to try and deter
its usage unless you "know what you're doing". The W macro for generating
wide instructions when targetting Thumb-2 is also made available under
the name WASM, to reduce the potential for conflicts with other headers.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The pld instruction does not affect the condition flags, so don't bother
clobbering them.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
a.out support on ARM requires that argc, argv and envp are passed in
r0-r2 respectively, which requires hacking load_aout_binary to
prevent argc being clobbered by the return code. Whilst mainline kernels
do set the registers up in start_thread, the aout loader has never
carried the hack in mainline.
Initialising the registers in this way actually goes against the libc
expectations for ELF binaries, where argc, argv and envp are passed on
the stack, with r0 being used to hold a pointer to an exit function for
cleaning up after the dynamic linker if required. If the pointer is
NULL, then it is ignored. When execing an ELF binary, Linux currently
zeroes r0, then sets it to argc and then finally clobbers it with the
return value of the execve syscall, so we actually end up with:
r0 = 0
stack[0] = argc
r1 = stack[1] = argv
r2 = stack[2] = envp
libc treats r1 and r2 as undefined. The clobbering of r0 by sys_execve
works for user-spawned threads, but when executing an ELF binary from a
kernel thread (via call_usermodehelper), the execve is performed on the
ret_from_fork path, which restores r0 from the saved pt_regs, resulting
in argc being presented to the C library. This has horrible consequences
when the application exits, since we have an exit function registered
using argc, resulting in a jump to hyperspace.
This patch solves the problem by removing the partial a.out support from
arch/arm/ altogether.
Cc: <stable@vger.kernel.org>
Cc: Ashish Sangwan <ashishsangwan2@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull fpu state cleanups from Ingo Molnar:
"This tree streamlines further aspects of FPU handling by eliminating
the prepare_to_copy() complication and moving that logic to
arch_dup_task_struct().
It also fixes the FPU dumps in threaded core dumps, removes and old
(and now invalid) assumption plus micro-optimizes the exit path by
avoiding an FPU save for dead tasks."
Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu: drop the fpu state during thread exit
x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
coredump: ensure the fpu state is flushed for proper multi-threaded core dump
fork: move the real prepare_to_copy() users to arch_dup_task_struct()
Pull more ARM updates from Russell King.
This got a fair number of conflicts with the <asm/system.h> split, but
also with some other sparse-irq and header file include cleanups. They
all looked pretty trivial, though.
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (59 commits)
ARM: fix Kconfig warning for HAVE_BPF_JIT
ARM: 7361/1: provide XIP_VIRT_ADDR for no-MMU builds
ARM: 7349/1: integrator: convert to sparse irqs
ARM: 7259/3: net: JIT compiler for packet filters
ARM: 7334/1: add jump label support
ARM: 7333/2: jump label: detect %c support for ARM
ARM: 7338/1: add support for early console output via semihosting
ARM: use set_current_blocked() and block_sigmask()
ARM: exec: remove redundant set_fs(USER_DS)
ARM: 7332/1: extract out code patch function from kprobes
ARM: 7331/1: extract out insn generation code from ftrace
ARM: 7330/1: ftrace: use canonical Thumb-2 wide instruction format
ARM: 7351/1: ftrace: remove useless memory checks
ARM: 7316/1: kexec: EOI active and mask all interrupts in kexec crash path
ARM: Versatile Express: add NO_IOPORT
ARM: get rid of asm/irq.h in asm/prom.h
ARM: 7319/1: Print debug info for SIGBUS in user faults
ARM: 7318/1: gic: refactor irq_start assignment
ARM: 7317/1: irq: avoid NULL check in for_each_irq_desc loop
ARM: 7315/1: perf: add support for the Cortex-A7 PMU
...
For files that include asm/processor.h but not asm/system.h:
arch/arm/mach-msm/include/mach/uncompress.h: In function 'putc':
arch/arm/mach-msm/include/mach/uncompress.h:48:3: error: implicit declaration of function 'smp_mb' [-Werror=implicit-function-declaration]
In this case, smp_mb() is from the cpu_relax() call in the msm putc().
It likely went uncaught when the uncompress.h change went in since the
defconfig didn't enable that code path, but later changes (e76f4750f4:
ARM: debug: arrange Kconfig options more logically) resulted in the
option being on for msm_defconfig and thus exposed it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Similar to other architectures, this adds topdown mmap support in user
process address space allocation policy. This allows mmap sizes greater
than 2GB. This support is largely copied from MIPS and the generic
implementations.
The address space randomization is moved into arch_pick_mmap_layout.
Tested on V-Express with ubuntu and a mmap test from here:
https://bugs.launchpad.net/bugs/861296
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On revisions of the Cortex-A9 prior to r2p0, the Store Buffer does not
have any automatic draining mechanism and therefore a livelock may occur
if an external agent continuously polls a memory location waiting to
observe an update.
This workaround defines cpu_relax() as smp_mb(), preventing correctly
written polling loops from denying visibility of updates to memory.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
PTRACE_SINGLESTEP is a ptrace request designed to offer single-stepping
support to userspace when the underlying architecture has hardware
support for this operation.
On ARM, we set arch_has_single_step() to 1 and attempt to emulate hardware
single-stepping by disassembling the current instruction to determine the
next pc and placing a software breakpoint on that location.
Unfortunately this has the following problems:
1.) Only a subset of ARMv7 instructions are supported
2.) Thumb-2 is unsupported
3.) The code is not SMP safe
We could try to fix this code, but it turns out that because of the above
issues it is rarely used in practice. GDB, for example, uses PTRACE_POKETEXT
and PTRACE_PEEKTEXT to manage breakpoints itself and does not require any
kernel assistance.
This patch removes the single-step emulation code from ptrace meaning that
the PTRACE_SINGLESTEP request will return -EIO on ARM. Portable code must
check the return value from a ptrace call and handle the failure gracefully.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
For debuggers to take advantage of the hw-breakpoint framework in the kernel,
it is necessary to expose the API calls via a ptrace interface.
This patch exposes the hardware breakpoints framework as a collection of
virtual registers, accesible using PTRACE_SETHBPREGS and PTRACE_GETHBPREGS
requests. The breakpoints are stored in the debug_info struct of the running
thread.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: S. Karthikeyan <informkarthik@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linux expects that if a CPU modifies a memory location, then that
modification will eventually become visible to other CPUs in the system.
On an ARM11MPCore processor, loads are prioritised over stores so it is
possible for a store operation to be postponed if a polling loop immediately
follows it. If the variable being polled indirectly depends on the outstanding
store [for example, another CPU may be polling the variable that is pending
modification] then there is the potential for deadlock if interrupts are
disabled. This deadlock occurs in the KGDB testsuire when executing on an
SMP ARM11MPCore configuration.
This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems before the next load
takes place. If the Kernel is not compiled for SMP support, this will expand
to a barrier() as before.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Starting with ARMv6, the CPUs support the BE-8 variant of big-endian
(byte-invariant). This patch adds the core support:
- setting of the BE-8 mode via the CPSR.E register for both kernel and
user threads
- big-endian page table walking
- REV used to rotate instructions read from memory during fault
processing as they are still little-endian format
- Kconfig and Makefile support for BE-8. The --be8 option must be passed
to the final linking stage to convert the instructions to
little-endian
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 8ec53663d2 ("[ARM] Improve
non-executable support") added support for detecting non-executable
stack binaries. One of the things it does is to make READ_IMPLIES_EXEC
be set in ->personality if we are running on a CPU that doesn't support
the XN ("Execute Never") page table bit or if we are running a binary
that needs an executable stack.
This exposed a latent bug in ARM's asm/processor.h due to which we'll
end up placing the stack at a very low address, where it will bump into
the heap on any application that uses significant amount of stack or
heap or both, causing many interesting crashes.
Fix this by testing the ADDR_LIMIT_32BIT bit in ->personality instead
of testing for equality against PER_LINUX_32BIT.
Reviewed-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As suggested by Andrew Morton, remove memzero() - it's not supported
on other architectures so use of it is a potential build breaking bug.
Since the compiler optimizes memset(x,0,n) to __memzero() perfectly
well, we don't miss out on the underlying benefits of memzero().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With gcc 4.3 and later, a pointer that has already been dereferenced is
assumed not to be null since it should have caused a segmentation fault
otherwise, hence any subsequent test against NULL is optimized away.
Current inline asm constraint used in the implementation of prefetch()
makes gcc believe that the pointer is dereferenced even though the PLD
instruction does not load any data and does not cause a segmentation
fault on null pointers, which causes all sorts of interesting results
when reaching the end of a linked lists for example.
Let's use a better constraint to properly represent the actual usage of
the pointer value.
Problem reported by Chris Steel.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move platform independent header files to arch/arm/include/asm, leaving
those in asm/arch* and asm/plat* alone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>