Merge branches 'for-next/52-bit-kva', 'for-next/cpu-topology', 'for-next/error-injection', 'for-next/perf', 'for-next/psci-cpuidle', 'for-next/rng', 'for-next/smpboot', 'for-next/tbi' and 'for-next/tlbi' into for-next/core

* for-next/52-bit-kva: (25 commits)
  Support for 52-bit virtual addressing in kernel space

* for-next/cpu-topology: (9 commits)
  Move CPU topology parsing into core code and add support for ACPI 6.3

* for-next/error-injection: (2 commits)
  Support for function error injection via kprobes

* for-next/perf: (8 commits)
  Support for i.MX8 DDR PMU and proper SMMUv3 group validation

* for-next/psci-cpuidle: (7 commits)
  Move PSCI idle code into a new CPUidle driver

* for-next/rng: (4 commits)
  Support for 'rng-seed' property being passed in the devicetree

* for-next/smpboot: (3 commits)
  Reduce fragility of secondary CPU bringup in debug configurations

* for-next/tbi: (10 commits)
  Introduce new syscall ABI with relaxed requirements for pointer tags

* for-next/tlbi: (6 commits)
  Handle spurious page faults arising from kernel space
89 changed files with 1975 additions and 972 deletions


@@ -16,6 +16,7 @@ ARM64 Architecture
pointer-authentication
silicon-errata
sve
tagged-address-abi
tagged-pointers
.. only:: subproject and html


@@ -0,0 +1,27 @@
#!/bin/sh

# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW
# start address at the mid-point of the kernel VA space

print_kasan_offset () {
	printf "%02d\t" $1
	printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
			+ (1 << ($1 - 32 - $2)) \
			- (1 << (64 - 32 - $2)) ))
}
echo KASAN_SHADOW_SCALE_SHIFT = 3
printf "VABITS\tKASAN_SHADOW_OFFSET\n"
print_kasan_offset 48 3
print_kasan_offset 47 3
print_kasan_offset 42 3
print_kasan_offset 39 3
print_kasan_offset 36 3
echo
echo KASAN_SHADOW_SCALE_SHIFT = 4
printf "VABITS\tKASAN_SHADOW_OFFSET\n"
print_kasan_offset 48 4
print_kasan_offset 47 4
print_kasan_offset 42 4
print_kasan_offset 39 4
print_kasan_offset 36 4
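
For readers who want to cross-check a single value without running the
script, the same arithmetic can be restated as a small standalone C
program (the helper name kasan_shadow_offset() is purely illustrative
and is not a kernel symbol):

.. code-block:: c

   #include <stdio.h>

   /* Mirrors the shell expression above, then shifts back up by 32 bits. */
   static unsigned long long kasan_shadow_offset(int va_bits, int scale_shift)
   {
           unsigned long long high = 0xffffffffULL & (~0ULL << (va_bits - 1 - 32));

           high += 1ULL << (va_bits - 32 - scale_shift);
           high -= 1ULL << (64 - 32 - scale_shift);

           return (high & 0xffffffffULL) << 32;
   }

   int main(void)
   {
           static const int vabits[] = { 48, 47, 42, 39, 36 };
           unsigned int i;

           printf("KASAN_SHADOW_SCALE_SHIFT = 3\n");
           for (i = 0; i < sizeof(vabits) / sizeof(vabits[0]); i++)
                   printf("%02d\t0x%016llx\n", vabits[i],
                          kasan_shadow_offset(vabits[i], 3));

           return 0;
   }
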


@@ -14,6 +14,10 @@ with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
virtual address, are used but the memory layout is the same.
ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.
User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
@@ -22,40 +26,43 @@ The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.
AArch64 Linux memory layout with 4KB pages + 3 levels::
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 0000007fffffffff 512GB user
ffffff8000000000 ffffffffffffffff 512GB kernel
AArch64 Linux memory layout with 4KB pages + 4 levels::
AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 0000ffffffffffff 256TB user
ffff000000000000 ffffffffffffffff 256TB kernel
ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
ffff800000000000 ffff9fffffffffff 32TB kasan shadow region
ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
ffffa00008000000 ffffa0000fffffff 128MB modules
ffffa00010000000 fffffdffbffeffff ~93TB vmalloc
fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region]
fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings
fffffdfffea00000 fffffdfffebfffff 2MB [guard region]
fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space
fffffdffffc00000 fffffdffffdfffff 2MB [guard region]
fffffdffffe00000 ffffffffffdfffff 2TB vmemmap
ffffffffffe00000 ffffffffffffffff 2MB [guard region]
AArch64 Linux memory layout with 64KB pages + 2 levels::
AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 000003ffffffffff 4TB user
fffffc0000000000 ffffffffffffffff 4TB kernel
AArch64 Linux memory layout with 64KB pages + 3 levels::
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 0000ffffffffffff 256TB user
ffff000000000000 ffffffffffffffff 256TB kernel
For details of the virtual kernel memory layout please see the kernel
booting log.
0000000000000000 000fffffffffffff 4PB user
fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map
fff8000000000000 fffd9fffffffffff 1440TB [gap]
fffda00000000000 ffff9fffffffffff 512TB kasan shadow region
ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
ffffa00008000000 ffffa0000fffffff 128MB modules
ffffa00010000000 fffff81ffffeffff ~88TB vmalloc
fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region]
fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings
fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region]
fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space
fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region]
fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap
ffffffffffe00000 ffffffffffffffff 2MB [guard region]
Translation table lookup with 4KB pages::
@@ -83,7 +90,8 @@ Translation table lookup with 64KB pages::
| | | | [15:0] in-page offset
| | | +----------> [28:16] L3 index
| | +--------------------------> [41:29] L2 index
| +-------------------------------> [47:42] L1 index
| +-------------------------------> [47:42] L1 index (48-bit)
| [51:42] L1 index (52-bit)
+-------------------------------------------------> [63] TTBR0/1
@@ -96,3 +104,62 @@ ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.
52-bit VA support in the kernel
-------------------------------
If the ARMv8.2-LVA optional feature is present, and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.
This fallback mechanism requires the kernel .text to be placed in the
higher addresses so that it is invariant to 48/52-bit VAs. Because the
kasan shadow is a fraction of the entire kernel VA space, the end of
the kasan shadow must also be in the higher half of the kernel VA space
for both 48-bit and 52-bit. (When switching from 48-bit to 52-bit, the
end of the kasan shadow is invariant and depends on ~0UL, whilst the
start address will "grow" towards the lower addresses).
In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit);
this obviates the need for an extra variable read. The physvirt
offset and vmemmap offsets are computed at early boot to enable
this logic.
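
As a rough illustration of why a constant PAGE_OFFSET helps (a
simplified sketch only, not the kernel's actual macros; the variable
name physvirt_offset echoes the offset mentioned above, and the sign
convention here is purely illustrative):

.. code-block:: c

   /*
    * Simplified sketch, not the kernel's macros: because PAGE_OFFSET is a
    * compile-time constant (0xfff0000000000000), translating linear-map
    * addresses only needs one boot-time value, shown here as
    * physvirt_offset; nothing in the fast path reads vabits_actual.
    */
   static unsigned long physvirt_offset;   /* computed once at early boot */

   static inline unsigned long example_virt_to_phys(unsigned long vaddr)
   {
           return vaddr + physvirt_offset;
   }

   static inline unsigned long example_phys_to_virt(unsigned long paddr)
   {
           return paddr - physvirt_offset;
   }
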
As a single binary will need to support both 48-bit and 52-bit VA
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
also large enough to accommodate a fixed PAGE_OFFSET.
Most code in the kernel should not need to consider VA_BITS. For code
that does need to know the VA size, the variables are defined as
follows:
VA_BITS constant the *maximum* VA space size
VA_BITS_MIN constant the *minimum* VA space size
vabits_actual variable the *actual* VA space size
Maximum and minimum sizes can be useful to ensure that buffers are
sized large enough or that addresses are positioned close enough for
the "worst" case.
52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.
Software can "opt-in" to receiving VAs from a 52-bit space by
specifying an mmap hint parameter that is larger than 48-bit.
For example:
maybe_high_address = mmap(~0UL, size, prot, flags,...);
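
The hint can be exercised with a short, purely illustrative program;
the comparison against ``1UL << 48`` simply checks whether the kernel
handed back an address outside the default 48-bit range:

.. code-block:: c

   #include <stdio.h>
   #include <unistd.h>
   #include <sys/mman.h>

   int main(void)
   {
           /* A hint above the 48-bit limit opts this allocation in to the
            * full 52-bit range, when the kernel and hardware support it. */
           void *ptr = mmap((void *)~0UL, sysconf(_SC_PAGE_SIZE),
                            PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

           if (ptr == MAP_FAILED)
                   return 1;

           if ((unsigned long)ptr >= (1UL << 48))
                   printf("52-bit VA returned: %p\n", ptr);
           else
                   printf("48-bit VA returned: %p\n", ptr);

           return 0;
   }
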
It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:
CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y
Note that this option is only intended for debugging applications
and should not be used in production.


@@ -0,0 +1,156 @@
==========================
AArch64 TAGGED ADDRESS ABI
==========================
Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
Catalin Marinas <catalin.marinas@arm.com>
Date: 21 August 2019
This document describes the usage and semantics of the Tagged Address
ABI on AArch64 Linux.
1. Introduction
---------------
On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing
userspace (EL0) to perform memory accesses through 64-bit pointers with
a non-zero top byte. This document describes the relaxation of the
syscall ABI that allows userspace to pass certain tagged pointers to
kernel syscalls.
2. AArch64 Tagged Address ABI
-----------------------------
From the kernel syscall interface perspective and for the purposes of
this document, a "valid tagged pointer" is a pointer with a potentially
non-zero top-byte that references an address in the user process address
space obtained in one of the following ways:
- ``mmap()`` syscall where either:
- flags have the ``MAP_ANONYMOUS`` bit set or
- the file descriptor refers to a regular file (including those
returned by ``memfd_create()``) or ``/dev/zero``
- ``brk()`` syscall (i.e. the heap area between the initial location of
the program break at process creation and its current location).
- any memory mapped by the kernel in the address space of the process
during creation and with the same restrictions as for ``mmap()`` above
(e.g. data, bss, stack).
The AArch64 Tagged Address ABI has two stages of relaxation depending
on how the user addresses are used by the kernel:
1. User addresses not accessed by the kernel but used for address space
management (e.g. ``mmap()``, ``mprotect()``, ``madvise()``). The use
of valid tagged pointers in this context is always allowed.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
relaxation is disabled by default and the application thread needs to
explicitly enable it via ``prctl()`` as follows:
- ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged
Address ABI for the calling thread.
The ``(unsigned int) arg2`` argument is a bit mask describing the
control mode used:
- ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI.
Default status is disabled.
Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0.
- ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged
Address ABI for the calling thread.
Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0.
The ABI properties described above are thread-scoped, inherited on
clone() and fork() and cleared on exec().
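
The current per-thread state can be queried with a few lines of C (an
illustrative sketch; the constants are defined locally, with the values
from the kernel's uapi ``prctl.h``, in case the C library headers
predate them):

.. code-block:: c

   #include <stdio.h>
   #include <sys/prctl.h>

   #ifndef PR_GET_TAGGED_ADDR_CTRL
   #define PR_SET_TAGGED_ADDR_CTRL 55
   #define PR_GET_TAGGED_ADDR_CTRL 56
   #define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
   #endif

   int main(void)
   {
           /* arg2..arg5 must be 0 for PR_GET_TAGGED_ADDR_CTRL. */
           int ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);

           if (ctrl < 0) {
                   perror("PR_GET_TAGGED_ADDR_CTRL");
                   return 1;
           }

           printf("tagged address ABI %s for this thread\n",
                  (ctrl & PR_TAGGED_ADDR_ENABLE) ? "enabled" : "disabled");
           return 0;
   }
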
Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)``
returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally
disabled by ``sysctl abi.tagged_addr_disabled=1``. The default
``sysctl abi.tagged_addr_disabled`` configuration is 0.
When the AArch64 Tagged Address ABI is enabled for a thread, the
following behaviours are guaranteed:
- All syscalls except the cases mentioned in section 3 can accept any
valid tagged pointer.
- The syscall behaviour is undefined for invalid tagged pointers: it may
result in an error code being returned, a (fatal) signal being raised,
or other modes of failure.
- The syscall behaviour for a valid tagged pointer is the same as for
the corresponding untagged pointer.
A definition of the meaning of tagged pointers on AArch64 can be found
in Documentation/arm64/tagged-pointers.rst.
3. AArch64 Tagged Address ABI Exceptions
-----------------------------------------
The following system call parameters must be untagged regardless of the
ABI relaxation:
- ``prctl()`` other than pointers to user data either passed directly or
indirectly as arguments to be accessed by the kernel.
- ``ioctl()`` other than pointers to user data either passed directly or
indirectly as arguments to be accessed by the kernel.
- ``shmat()`` and ``shmdt()``.
Any attempt to use non-zero tagged pointers may result in an error code
being returned, a (fatal) signal being raised, or other modes of
failure.
4. Example of correct usage
---------------------------
.. code-block:: c

   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/mman.h>
   #include <sys/prctl.h>

   #define PR_SET_TAGGED_ADDR_CTRL 55
   #define PR_TAGGED_ADDR_ENABLE   (1UL << 0)

   #define TAG_SHIFT               56

   int main(void)
   {
           int tbi_enabled = 0;
           unsigned long tag = 0;
           char *ptr;

           /* check/enable the tagged address ABI */
           if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
                   tbi_enabled = 1;

           /* memory allocation */
           ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
           if (ptr == MAP_FAILED)
                   return 1;

           /* set a non-zero tag if the ABI is available */
           if (tbi_enabled)
                   tag = rand() & 0xff;
           ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));

           /* memory access to a tagged address */
           strcpy(ptr, "tagged pointer\n");

           /* syscall with a tagged pointer */
           write(1, ptr, strlen(ptr));

           return 0;
   }


@@ -20,7 +20,9 @@ Passing tagged addresses to the kernel
--------------------------------------
All interpretation of userspace memory addresses by the kernel assumes
an address tag of 0x00.
an address tag of 0x00, unless the application enables the AArch64
Tagged Address ABI explicitly
(Documentation/arm64/tagged-address-abi.rst).
This includes, but is not limited to, addresses found in:
@@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in:
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
Using non-zero address tags in any of these locations may result in an
error code being returned, a (fatal) signal being raised, or other modes
of failure.
Using non-zero address tags in any of these locations when the
userspace application did not enable the AArch64 Tagged Address ABI may
result in an error code being returned, a (fatal) signal being raised,
or other modes of failure.
For these reasons, passing non-zero address tags to the kernel via
system calls is forbidden, and using a non-zero address tag for sp is
strongly discouraged.
For these reasons, when the AArch64 Tagged Address ABI is disabled,
passing non-zero address tags to the kernel via system calls is
forbidden, and using a non-zero address tag for sp is strongly
discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
@@ -59,6 +63,9 @@ be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
This behaviour is maintained when the AArch64 Tagged Address ABI is
enabled.
Other considerations
--------------------