1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798 |
- .. SPDX-License-Identifier: GPL-2.0
- ======================
- Memory Protection Keys
- ======================
- Memory Protection Keys provide a mechanism for enforcing page-based
- protections, but without requiring modification of the page tables when an
- application changes protection domains.
- Pkeys Userspace (PKU) is a feature which can be found on:
- * Intel server CPUs, Skylake and later
- * Intel client CPUs, Tiger Lake (11th Gen Core) and later
- * Future AMD CPUs
- Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
- a "protection key", giving 16 possible keys.
- Protections for each key are defined with a per-CPU user-accessible register
- (PKRU). Each of these is a 32-bit register storing two bits (Access Disable
- and Write Disable) for each of 16 keys.
- Being a CPU register, PKRU is inherently thread-local, potentially giving each
- thread a different set of protections from every other thread.
- There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
- register. The feature is only available in 64-bit mode, even though there is
- theoretically space in the PAE PTEs. These permissions are enforced on data
- access only and have no effect on instruction fetches.
- Syscalls
- ========
- There are 3 system calls which directly interact with pkeys::
- int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
- int pkey_free(int pkey);
- int pkey_mprotect(unsigned long start, size_t len,
- unsigned long prot, int pkey);
- Before a pkey can be used, it must first be allocated with
- pkey_alloc(). An application calls the WRPKRU instruction
- directly in order to change access permissions to memory covered
- with a key. In this example WRPKRU is wrapped by a C function
- called pkey_set().
- ::
- int real_prot = PROT_READ|PROT_WRITE;
- pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
- ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
- ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
- ... application runs here
- Now, if the application needs to update the data at 'ptr', it can
- gain access, do the update, then remove its write access::
- pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
- *ptr = foo; // assign something
- pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
- Now when it frees the memory, it will also free the pkey since it
- is no longer in use::
- munmap(ptr, PAGE_SIZE);
- pkey_free(pkey);
- .. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
- An example implementation can be found in
- tools/testing/selftests/x86/protection_keys.c.
- Behavior
- ========
- The kernel attempts to make protection keys consistent with the
- behavior of a plain mprotect(). For instance if you do this::
- mprotect(ptr, size, PROT_NONE);
- something(ptr);
- you can expect the same effects with protection keys when doing this::
- pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
- pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
- something(ptr);
- That should be true whether something() is a direct access to 'ptr'
- like::
- *ptr = foo;
- or when the kernel does the access on the application's behalf like
- with a read()::
- read(fd, ptr, 1);
- The kernel will send a SIGSEGV in both cases, but si_code will be set
- to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
- the plain mprotect() permissions are violated.
|