Merge branches 'for-next/52-bit-kva', 'for-next/cpu-topology', 'for-next/error-injection', 'for-next/perf', 'for-next/psci-cpuidle', 'for-next/rng', 'for-next/smpboot', 'for-next/tbi' and 'for-next/tlbi' into for-next/core
* for-next/52-bit-kva: (25 commits) Support for 52-bit virtual addressing in kernel space * for-next/cpu-topology: (9 commits) Move CPU topology parsing into core code and add support for ACPI 6.3 * for-next/error-injection: (2 commits) Support for function error injection via kprobes * for-next/perf: (8 commits) Support for i.MX8 DDR PMU and proper SMMUv3 group validation * for-next/psci-cpuidle: (7 commits) Move PSCI idle code into a new CPUidle driver * for-next/rng: (4 commits) Support for 'rng-seed' property being passed in the devicetree * for-next/smpboot: (3 commits) Reduce fragility of secondary CPU bringup in debug configurations * for-next/tbi: (10 commits) Introduce new syscall ABI with relaxed requirements for pointer tags * for-next/tlbi: (6 commits) Handle spurious page faults arising from kernel space
This commit is contained in:

@@ -16,6 +16,7 @@ ARM64 Architecture
|
||||
pointer-authentication
|
||||
silicon-errata
|
||||
sve
|
||||
tagged-address-abi
|
||||
tagged-pointers
|
||||
|
||||
.. only:: subproject and html
|
||||
|
27
Documentation/arm64/kasan-offsets.sh
Normal file
27
Documentation/arm64/kasan-offsets.sh
Normal file
@@ -0,0 +1,27 @@
|
||||
#!/bin/sh
|
||||
|
||||
# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW
|
||||
# start address at the mid-point of the kernel VA space
|
||||
|
||||
print_kasan_offset () {
|
||||
printf "%02d\t" $1
|
||||
printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
|
||||
+ (1 << ($1 - 32 - $2)) \
|
||||
- (1 << (64 - 32 - $2)) ))
|
||||
}
|
||||
|
||||
echo KASAN_SHADOW_SCALE_SHIFT = 3
|
||||
printf "VABITS\tKASAN_SHADOW_OFFSET\n"
|
||||
print_kasan_offset 48 3
|
||||
print_kasan_offset 47 3
|
||||
print_kasan_offset 42 3
|
||||
print_kasan_offset 39 3
|
||||
print_kasan_offset 36 3
|
||||
echo
|
||||
echo KASAN_SHADOW_SCALE_SHIFT = 4
|
||||
printf "VABITS\tKASAN_SHADOW_OFFSET\n"
|
||||
print_kasan_offset 48 4
|
||||
print_kasan_offset 47 4
|
||||
print_kasan_offset 42 4
|
||||
print_kasan_offset 39 4
|
||||
print_kasan_offset 36 4
|
@@ -14,6 +14,10 @@ with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
|
||||
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
|
||||
virtual address, are used but the memory layout is the same.
|
||||
|
||||
ARMv8.2 adds optional support for Large Virtual Address space. This is
|
||||
only available when running with a 64KB page size and expands the
|
||||
number of descriptors in the first level of translation.
|
||||
|
||||
User addresses have bits 63:48 set to 0 while the kernel addresses have
|
||||
the same bits set to 1. TTBRx selection is given by bit 63 of the
|
||||
virtual address. The swapper_pg_dir contains only kernel (global)
|
||||
@@ -22,40 +26,43 @@ The swapper_pg_dir address is written to TTBR1 and never written to
|
||||
TTBR0.
|
||||
|
||||
|
||||
AArch64 Linux memory layout with 4KB pages + 3 levels::
|
||||
|
||||
Start End Size Use
|
||||
-----------------------------------------------------------------------
|
||||
0000000000000000 0000007fffffffff 512GB user
|
||||
ffffff8000000000 ffffffffffffffff 512GB kernel
|
||||
|
||||
|
||||
AArch64 Linux memory layout with 4KB pages + 4 levels::
|
||||
AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
|
||||
|
||||
Start End Size Use
|
||||
-----------------------------------------------------------------------
|
||||
0000000000000000 0000ffffffffffff 256TB user
|
||||
ffff000000000000 ffffffffffffffff 256TB kernel
|
||||
ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
|
||||
ffff800000000000 ffff9fffffffffff 32TB kasan shadow region
|
||||
ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
|
||||
ffffa00008000000 ffffa0000fffffff 128MB modules
|
||||
ffffa00010000000 fffffdffbffeffff ~93TB vmalloc
|
||||
fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region]
|
||||
fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings
|
||||
fffffdfffea00000 fffffdfffebfffff 2MB [guard region]
|
||||
fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space
|
||||
fffffdffffc00000 fffffdffffdfffff 2MB [guard region]
|
||||
fffffdffffe00000 ffffffffffdfffff 2TB vmemmap
|
||||
ffffffffffe00000 ffffffffffffffff 2MB [guard region]
|
||||
|
||||
|
||||
AArch64 Linux memory layout with 64KB pages + 2 levels::
|
||||
AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::
|
||||
|
||||
Start End Size Use
|
||||
-----------------------------------------------------------------------
|
||||
0000000000000000 000003ffffffffff 4TB user
|
||||
fffffc0000000000 ffffffffffffffff 4TB kernel
|
||||
|
||||
|
||||
AArch64 Linux memory layout with 64KB pages + 3 levels::
|
||||
|
||||
Start End Size Use
|
||||
-----------------------------------------------------------------------
|
||||
0000000000000000 0000ffffffffffff 256TB user
|
||||
ffff000000000000 ffffffffffffffff 256TB kernel
|
||||
|
||||
|
||||
For details of the virtual kernel memory layout please see the kernel
|
||||
booting log.
|
||||
0000000000000000 000fffffffffffff 4PB user
|
||||
fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map
|
||||
fff8000000000000 fffd9fffffffffff 1440TB [gap]
|
||||
fffda00000000000 ffff9fffffffffff 512TB kasan shadow region
|
||||
ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
|
||||
ffffa00008000000 ffffa0000fffffff 128MB modules
|
||||
ffffa00010000000 fffff81ffffeffff ~88TB vmalloc
|
||||
fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region]
|
||||
fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings
|
||||
fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region]
|
||||
fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space
|
||||
fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region]
|
||||
fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap
|
||||
ffffffffffe00000 ffffffffffffffff 2MB [guard region]
|
||||
|
||||
|
||||
Translation table lookup with 4KB pages::
|
||||
@@ -83,7 +90,8 @@ Translation table lookup with 64KB pages::
|
||||
| | | | [15:0] in-page offset
|
||||
| | | +----------> [28:16] L3 index
|
||||
| | +--------------------------> [41:29] L2 index
|
||||
| +-------------------------------> [47:42] L1 index
|
||||
| +-------------------------------> [47:42] L1 index (48-bit)
|
||||
| [51:42] L1 index (52-bit)
|
||||
+-------------------------------------------------> [63] TTBR0/1
|
||||
|
||||
|
||||
@@ -96,3 +104,62 @@ ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
|
||||
|
||||
When using KVM with the Virtualization Host Extensions, no additional
|
||||
mappings are created, since the host kernel runs directly in EL2.
|
||||
|
||||
52-bit VA support in the kernel
|
||||
-------------------------------
|
||||
If the ARMv8.2-LVA optional feature is present, and we are running
|
||||
with a 64KB page size; then it is possible to use 52-bits of address
|
||||
space for both userspace and kernel addresses. However, any kernel
|
||||
binary that supports 52-bit must also be able to fall back to 48-bit
|
||||
at early boot time if the hardware feature is not present.
|
||||
|
||||
This fallback mechanism necessitates the kernel .text to be in the
|
||||
higher addresses such that they are invariant to 48/52-bit VAs. Due
|
||||
to the kasan shadow being a fraction of the entire kernel VA space,
|
||||
the end of the kasan shadow must also be in the higher half of the
|
||||
kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit,
|
||||
the end of the kasan shadow is invariant and dependent on ~0UL,
|
||||
whilst the start address will "grow" towards the lower addresses).
|
||||
|
||||
In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
|
||||
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit),
|
||||
this obviates the need for an extra variable read. The physvirt
|
||||
offset and vmemmap offsets are computed at early boot to enable
|
||||
this logic.
|
||||
|
||||
As a single binary will need to support both 48-bit and 52-bit VA
|
||||
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
|
||||
also must be sized large enought to accommodate a fixed PAGE_OFFSET.
|
||||
|
||||
Most code in the kernel should not need to consider the VA_BITS, for
|
||||
code that does need to know the VA size the variables are
|
||||
defined as follows:
|
||||
|
||||
VA_BITS constant the *maximum* VA space size
|
||||
|
||||
VA_BITS_MIN constant the *minimum* VA space size
|
||||
|
||||
vabits_actual variable the *actual* VA space size
|
||||
|
||||
|
||||
Maximum and minimum sizes can be useful to ensure that buffers are
|
||||
sized large enough or that addresses are positioned close enough for
|
||||
the "worst" case.
|
||||
|
||||
52-bit userspace VAs
|
||||
--------------------
|
||||
To maintain compatibility with software that relies on the ARMv8.0
|
||||
VA space maximum size of 48-bits, the kernel will, by default,
|
||||
return virtual addresses to userspace from a 48-bit range.
|
||||
|
||||
Software can "opt-in" to receiving VAs from a 52-bit space by
|
||||
specifying an mmap hint parameter that is larger than 48-bit.
|
||||
For example:
|
||||
maybe_high_address = mmap(~0UL, size, prot, flags,...);
|
||||
|
||||
It is also possible to build a debug kernel that returns addresses
|
||||
from a 52-bit space by enabling the following kernel config options:
|
||||
CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y
|
||||
|
||||
Note that this option is only intended for debugging applications
|
||||
and should not be used in production.
|
||||
|
156
Documentation/arm64/tagged-address-abi.rst
Normal file
156
Documentation/arm64/tagged-address-abi.rst
Normal file
@@ -0,0 +1,156 @@
|
||||
==========================
|
||||
AArch64 TAGGED ADDRESS ABI
|
||||
==========================
|
||||
|
||||
Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
|
||||
Catalin Marinas <catalin.marinas@arm.com>
|
||||
|
||||
Date: 21 August 2019
|
||||
|
||||
This document describes the usage and semantics of the Tagged Address
|
||||
ABI on AArch64 Linux.
|
||||
|
||||
1. Introduction
|
||||
---------------
|
||||
|
||||
On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing
|
||||
userspace (EL0) to perform memory accesses through 64-bit pointers with
|
||||
a non-zero top byte. This document describes the relaxation of the
|
||||
syscall ABI that allows userspace to pass certain tagged pointers to
|
||||
kernel syscalls.
|
||||
|
||||
2. AArch64 Tagged Address ABI
|
||||
-----------------------------
|
||||
|
||||
From the kernel syscall interface perspective and for the purposes of
|
||||
this document, a "valid tagged pointer" is a pointer with a potentially
|
||||
non-zero top-byte that references an address in the user process address
|
||||
space obtained in one of the following ways:
|
||||
|
||||
- ``mmap()`` syscall where either:
|
||||
|
||||
- flags have the ``MAP_ANONYMOUS`` bit set or
|
||||
- the file descriptor refers to a regular file (including those
|
||||
returned by ``memfd_create()``) or ``/dev/zero``
|
||||
|
||||
- ``brk()`` syscall (i.e. the heap area between the initial location of
|
||||
the program break at process creation and its current location).
|
||||
|
||||
- any memory mapped by the kernel in the address space of the process
|
||||
during creation and with the same restrictions as for ``mmap()`` above
|
||||
(e.g. data, bss, stack).
|
||||
|
||||
The AArch64 Tagged Address ABI has two stages of relaxation depending
|
||||
how the user addresses are used by the kernel:
|
||||
|
||||
1. User addresses not accessed by the kernel but used for address space
|
||||
management (e.g. ``mmap()``, ``mprotect()``, ``madvise()``). The use
|
||||
of valid tagged pointers in this context is always allowed.
|
||||
|
||||
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
|
||||
relaxation is disabled by default and the application thread needs to
|
||||
explicitly enable it via ``prctl()`` as follows:
|
||||
|
||||
- ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged
|
||||
Address ABI for the calling thread.
|
||||
|
||||
The ``(unsigned int) arg2`` argument is a bit mask describing the
|
||||
control mode used:
|
||||
|
||||
- ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI.
|
||||
Default status is disabled.
|
||||
|
||||
Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0.
|
||||
|
||||
- ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged
|
||||
Address ABI for the calling thread.
|
||||
|
||||
Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0.
|
||||
|
||||
The ABI properties described above are thread-scoped, inherited on
|
||||
clone() and fork() and cleared on exec().
|
||||
|
||||
Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)``
|
||||
returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally
|
||||
disabled by ``sysctl abi.tagged_addr_disabled=1``. The default
|
||||
``sysctl abi.tagged_addr_disabled`` configuration is 0.
|
||||
|
||||
When the AArch64 Tagged Address ABI is enabled for a thread, the
|
||||
following behaviours are guaranteed:
|
||||
|
||||
- All syscalls except the cases mentioned in section 3 can accept any
|
||||
valid tagged pointer.
|
||||
|
||||
- The syscall behaviour is undefined for invalid tagged pointers: it may
|
||||
result in an error code being returned, a (fatal) signal being raised,
|
||||
or other modes of failure.
|
||||
|
||||
- The syscall behaviour for a valid tagged pointer is the same as for
|
||||
the corresponding untagged pointer.
|
||||
|
||||
|
||||
A definition of the meaning of tagged pointers on AArch64 can be found
|
||||
in Documentation/arm64/tagged-pointers.rst.
|
||||
|
||||
3. AArch64 Tagged Address ABI Exceptions
|
||||
-----------------------------------------
|
||||
|
||||
The following system call parameters must be untagged regardless of the
|
||||
ABI relaxation:
|
||||
|
||||
- ``prctl()`` other than pointers to user data either passed directly or
|
||||
indirectly as arguments to be accessed by the kernel.
|
||||
|
||||
- ``ioctl()`` other than pointers to user data either passed directly or
|
||||
indirectly as arguments to be accessed by the kernel.
|
||||
|
||||
- ``shmat()`` and ``shmdt()``.
|
||||
|
||||
Any attempt to use non-zero tagged pointers may result in an error code
|
||||
being returned, a (fatal) signal being raised, or other modes of
|
||||
failure.
|
||||
|
||||
4. Example of correct usage
|
||||
---------------------------
|
||||
.. code-block:: c
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
#include <sys/mman.h>
|
||||
#include <sys/prctl.h>
|
||||
|
||||
#define PR_SET_TAGGED_ADDR_CTRL 55
|
||||
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
|
||||
|
||||
#define TAG_SHIFT 56
|
||||
|
||||
int main(void)
|
||||
{
|
||||
int tbi_enabled = 0;
|
||||
unsigned long tag = 0;
|
||||
char *ptr;
|
||||
|
||||
/* check/enable the tagged address ABI */
|
||||
if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
|
||||
tbi_enabled = 1;
|
||||
|
||||
/* memory allocation */
|
||||
ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
|
||||
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|
||||
if (ptr == MAP_FAILED)
|
||||
return 1;
|
||||
|
||||
/* set a non-zero tag if the ABI is available */
|
||||
if (tbi_enabled)
|
||||
tag = rand() & 0xff;
|
||||
ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
|
||||
|
||||
/* memory access to a tagged address */
|
||||
strcpy(ptr, "tagged pointer\n");
|
||||
|
||||
/* syscall with a tagged pointer */
|
||||
write(1, ptr, strlen(ptr));
|
||||
|
||||
return 0;
|
||||
}
|
@@ -20,7 +20,9 @@ Passing tagged addresses to the kernel
|
||||
--------------------------------------
|
||||
|
||||
All interpretation of userspace memory addresses by the kernel assumes
|
||||
an address tag of 0x00.
|
||||
an address tag of 0x00, unless the application enables the AArch64
|
||||
Tagged Address ABI explicitly
|
||||
(Documentation/arm64/tagged-address-abi.rst).
|
||||
|
||||
This includes, but is not limited to, addresses found in:
|
||||
|
||||
@@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in:
|
||||
- the frame pointer (x29) and frame records, e.g. when interpreting
|
||||
them to generate a backtrace or call graph.
|
||||
|
||||
Using non-zero address tags in any of these locations may result in an
|
||||
error code being returned, a (fatal) signal being raised, or other modes
|
||||
of failure.
|
||||
Using non-zero address tags in any of these locations when the
|
||||
userspace application did not enable the AArch64 Tagged Address ABI may
|
||||
result in an error code being returned, a (fatal) signal being raised,
|
||||
or other modes of failure.
|
||||
|
||||
For these reasons, passing non-zero address tags to the kernel via
|
||||
system calls is forbidden, and using a non-zero address tag for sp is
|
||||
strongly discouraged.
|
||||
For these reasons, when the AArch64 Tagged Address ABI is disabled,
|
||||
passing non-zero address tags to the kernel via system calls is
|
||||
forbidden, and using a non-zero address tag for sp is strongly
|
||||
discouraged.
|
||||
|
||||
Programs maintaining a frame pointer and frame records that use non-zero
|
||||
address tags may suffer impaired or inaccurate debug and profiling
|
||||
@@ -59,6 +63,9 @@ be preserved.
|
||||
The architecture prevents the use of a tagged PC, so the upper byte will
|
||||
be set to a sign-extension of bit 55 on exception return.
|
||||
|
||||
This behaviour is maintained when the AArch64 Tagged Address ABI is
|
||||
enabled.
|
||||
|
||||
|
||||
Other considerations
|
||||
--------------------
|
||||
|
Reference in New Issue
Block a user