Merge branch 'mauro' into docs-next
A big set of fixes and RST conversions from Mauro. He swears that this is the last RST conversion set, which is certainly cause for celebration.
@@ -101,37 +101,48 @@ be specified in bytes with optional scale suffix [kKmMgG]. The default huge
 page size may be selected with the "default_hugepagesz=<size>" boot parameter.
 
 Hugetlb boot command line parameter semantics
-hugepagesz - Specify a huge page size. Used in conjunction with hugepages
+
+hugepagesz
+        Specify a huge page size. Used in conjunction with hugepages
         parameter to preallocate a number of huge pages of the specified
         size. Hence, hugepagesz and hugepages are typically specified in
-        pairs such as:
+        pairs such as::
+
                 hugepagesz=2M hugepages=512
+
         hugepagesz can only be specified once on the command line for a
         specific huge page size. Valid huge page sizes are architecture
         dependent.
-hugepages - Specify the number of huge pages to preallocate. This typically
+hugepages
+        Specify the number of huge pages to preallocate. This typically
         follows a valid hugepagesz or default_hugepagesz parameter. However,
         if hugepages is the first or only hugetlb command line parameter it
         implicitly specifies the number of huge pages of default size to
         allocate. If the number of huge pages of default size is implicitly
         specified, it can not be overwritten by a hugepagesz,hugepages
         parameter pair for the default size.
-        For example, on an architecture with 2M default huge page size:
+
+        For example, on an architecture with 2M default huge page size::
+
                 hugepages=256 hugepagesz=2M hugepages=512
+
         will result in 256 2M huge pages being allocated and a warning message
         indicating that the hugepages=512 parameter is ignored. If a hugepages
         parameter is preceded by an invalid hugepagesz parameter, it will
         be ignored.
-default_hugepagesz - Specify the default huge page size. This parameter can
+default_hugepagesz
+        Specify the default huge page size. This parameter can
         only be specified once on the command line. default_hugepagesz can
         optionally be followed by the hugepages parameter to preallocate a
         specific number of huge pages of default size. The number of default
         sized huge pages to preallocate can also be implicitly specified as
         mentioned in the hugepages section above. Therefore, on an
-        architecture with 2M default huge page size:
+        architecture with 2M default huge page size::
+
                 hugepages=256
                 default_hugepagesz=2M hugepages=256
                 hugepages=256 default_hugepagesz=2M
+
         will all result in 256 2M huge pages being allocated. Valid default
         huge page size is architecture dependent.
 
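As an illustration only, the parameter rules in the hunk above can be modelled in a few lines of Python. This is a sketch of the documented semantics, not the kernel's parser: the function name ``parse_hugetlb_params``, the ``arch_default`` and ``valid_sizes`` arguments and the warning strings are all invented for the example, and cases the text does not describe (such as a repeated default_hugepagesz) are not handled.

```python
INVALID = object()  # marks a rejected size so the next hugepages= is dropped


def parse_hugetlb_params(cmdline, arch_default="2M", valid_sizes=("2M", "1G")):
    """Model the documented semantics; this is not the kernel's parser."""
    counts = {}              # huge page size -> pages requested explicitly
    warnings = []
    default_size = arch_default
    implicit_default = None  # count taken from a leading hugepages=
    seen_sizes = set()       # sizes already named by hugepagesz=
    target = None            # size the next hugepages= applies to
    first = True             # True until the first hugetlb parameter is seen

    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key not in ("hugepagesz", "default_hugepagesz", "hugepages"):
            continue  # unrelated boot parameter
        if key == "hugepagesz":
            if value in valid_sizes and value not in seen_sizes:
                seen_sizes.add(value)
                target = value
            else:
                warnings.append(f"hugepagesz={value} rejected")
                target = INVALID
        elif key == "default_hugepagesz":
            if value in valid_sizes:
                default_size = value
                target = value
            else:
                warnings.append(f"default_hugepagesz={value} rejected")
                target = INVALID
        else:  # hugepages
            if first:
                # First hugetlb parameter: implicit count for the default size.
                implicit_default = int(value)
            elif target is None or target is INVALID or target in counts:
                warnings.append(f"hugepages={value} ignored")
            else:
                counts[target] = int(value)
            target = None
        first = False

    if implicit_default is not None:
        # An implicit default-size count cannot be overwritten by a later pair.
        if default_size in counts:
            warnings.append(
                f"hugepages={counts[default_size]} ignored for default size")
        counts[default_size] = implicit_default
    return counts, warnings
```

Calling ``parse_hugetlb_params("hugepages=256 hugepagesz=2M hugepages=512")`` reproduces the documented example: 256 pages of 2M and one warning about the ignored 512.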
@@ -31,6 +31,7 @@ the Linux memory management.
    idle_page_tracking
    ksm
    memory-hotplug
+   nommu-mmap
    numa_memory_policy
    numaperf
    pagemap
283
Documentation/admin-guide/mm/nommu-mmap.rst
Normal file
@@ -0,0 +1,283 @@
=============================
No-MMU memory mapping support
=============================

The kernel has limited support for memory mapping under no-MMU conditions, such
as are used in uClinux environments. From the userspace point of view, memory
mapping is used in conjunction with the mmap() system call, the shmat()
call and the execve() system call. From the kernel's point of view, execve()
mapping is actually performed by the binfmt drivers, which call back into the
mmap() routines to do the actual work.

Memory mapping behaviour also involves the way fork(), vfork(), clone() and
ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
the CLONE_VM flag.

The behaviour is similar between the MMU and no-MMU cases, but not identical;
and it's also much more restricted in the latter case:

(#) Anonymous mapping, MAP_PRIVATE

    In the MMU case: VM regions backed by arbitrary pages; copy-on-write
    across fork.

    In the no-MMU case: VM regions backed by arbitrary contiguous runs of
    pages.

(#) Anonymous mapping, MAP_SHARED

    These behave very much like private mappings, except that they're
    shared across fork() or clone() without CLONE_VM in the MMU case. Since
    the no-MMU case doesn't support these, behaviour is identical to
    MAP_PRIVATE there.

(#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE

    In the MMU case: VM regions backed by pages read from file; changes to
    the underlying file are reflected in the mapping; copied across fork.

    In the no-MMU case:

    - If one exists, the kernel will re-use an existing mapping to the
      same segment of the same file if that has compatible permissions,
      even if this was created by another process.

    - If possible, the file mapping will be directly on the backing device
      if the backing device has the NOMMU_MAP_DIRECT capability and
      appropriate mapping protection capabilities. Ramfs, romfs, cramfs
      and mtd might all permit this.

    - If the backing device can't or won't permit direct sharing,
      but does have the NOMMU_MAP_COPY capability, then a copy of the
      appropriate bit of the file will be read into a contiguous bit of
      memory and any extraneous space beyond the EOF will be cleared.

    - Writes to the file do not affect the mapping; writes to the mapping
      are visible in other processes (no MMU protection), but should not
      happen.

(#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE

    In the MMU case: like the non-PROT_WRITE case, except that the pages in
    question get copied before the write actually happens. From that point
    on writes to the file underneath that page no longer get reflected into
    the mapping's backing pages. The page is then backed by swap instead.

    In the no-MMU case: works much like the non-PROT_WRITE case, except
    that a copy is always taken and never shared.

(#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

    In the MMU case: VM regions backed by pages read from file; changes to
    pages written back to file; writes to file reflected into pages backing
    mapping; shared across fork.

    In the no-MMU case: not supported.

(#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

    In the MMU case: As for ordinary regular files.

    In the no-MMU case: The filesystem providing the memory-backed file
    (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
    sequence by providing a contiguous sequence of pages to map. In that
    case, a shared-writable memory mapping will be possible. It will work
    as for the MMU case. If the filesystem does not provide any such
    support, then the mapping request will be denied.

(#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

    In the MMU case: As for ordinary regular files.

    In the no-MMU case: As for memory backed regular files, but the
    blockdev must be able to provide a contiguous run of pages without
    truncate being called. The ramdisk driver could do this if it allocated
    all its memory as a contiguous array upfront.

(#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

    In the MMU case: As for ordinary regular files.

    In the no-MMU case: The character device driver may choose to honour
    the mmap() by providing direct access to the underlying device if it
    provides memory or quasi-memory that can be accessed directly. Examples
    of such are frame buffers and flash devices. If the driver does not
    provide any such support, then the mapping request will be denied.

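The no-MMU outcomes listed above for private file mappings can be summarised as a small decision function. The following is a Python model for illustration, not kernel code: the function name, its boolean parameters and the returned strings are hypothetical, and treating a device without the NOMMU_MAP_COPY capability as an error is an assumption drawn from the note further down that privately mapped files normally need a read method.

```python
def nommu_private_file_mapping(writable, existing_compatible_mapping,
                               can_map_direct, can_map_copy):
    """Pick the outcome of a no-MMU MAP_PRIVATE file mapping (model only).

    can_map_direct / can_map_copy stand for the backing device having the
    NOMMU_MAP_DIRECT / NOMMU_MAP_COPY capabilities.
    """
    if writable:
        # PROT_WRITE: a copy is always taken and never shared.
        return "private copy" if can_map_copy else "error"
    if existing_compatible_mapping:
        # Re-use a mapping of the same file segment, even from another process.
        return "share existing mapping"
    if can_map_direct:
        return "map backing device directly"
    if can_map_copy:
        # File contents are read into contiguous memory; the tail is cleared.
        return "private copy"
    return "error"
```

For example, a read-only mapping of a romfs file with no prior mapping and direct-map capability would come out as ``"map backing device directly"`` in this model.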

Further notes on no-MMU MMAP
============================

(#) A request for a private mapping of a file may return a buffer that is not
    page-aligned. This is because XIP may take place, and the data may not be
    page-aligned in the backing store.

(#) A request for an anonymous mapping will always be page aligned. If
    possible, the size of the request should be a power of two; otherwise some
    of the space may be wasted, as the kernel must allocate a power-of-2
    granule but will only discard the excess if appropriately configured, as
    this has an effect on fragmentation.

(#) The memory allocated by a request for an anonymous mapping will normally
    be cleared by the kernel before being returned in accordance with the
    Linux man pages (ver 2.22 or later).

    In the MMU case this can be achieved with reasonable performance as
    regions are backed by virtual pages, with the contents only being mapped
    to cleared physical pages when a write happens on that specific page
    (prior to which, the pages are effectively mapped to the global zero page
    from which reads can take place). This spreads out the time it takes to
    initialize the contents of a page - depending on the write-usage of the
    mapping.

    In the no-MMU case, however, anonymous mappings are backed by physical
    pages, and the entire map is cleared at allocation time. This can cause
    significant delays during a userspace malloc() as the C library does an
    anonymous mapping and the kernel then does a memset for the entire map.

    However, for memory that isn't required to be precleared - such as that
    returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to
    indicate to the kernel that it shouldn't bother clearing the memory before
    returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
    to permit this, otherwise the flag will be ignored.

    uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
    to allocate the brk and stack region.

(#) A list of all the private copy and anonymous mappings on the system is
    visible through /proc/maps in no-MMU mode.

(#) A list of all the mappings in use by a process is visible through
    /proc/<pid>/maps in no-MMU mode.

(#) Supplying MAP_FIXED or requesting a particular mapping address will
    result in an error.

(#) Files mapped privately usually have to have a read method provided by the
    driver or filesystem so that the contents can be read into the memory
    allocated if mmap() chooses not to map the backing device directly. An
    error will result if they don't. This is most likely to be encountered
    with character device files, pipes, fifos and sockets.

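The default clearing of anonymous mappings described in the notes above can be observed from userspace. The snippet below is a minimal Python sketch using the standard ``mmap`` module; the helper name ``anonymous_mapping_is_cleared`` is made up for the example, and it does not attempt to pass MAP_UNINITIALIZED, so it only shows the normal, cleared case.

```python
import mmap


def anonymous_mapping_is_cleared(length=4 * mmap.PAGESIZE):
    """Create an anonymous mapping and report whether every byte is zero."""
    # A file descriptor of -1 asks for an anonymous mapping.
    with mmap.mmap(-1, length) as region:
        return region[:] == bytes(length)
```

On a no-MMU kernel the cost of that clearing is paid up front for the whole mapping, which is the delay the MAP_UNINITIALIZED flag is meant to avoid.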


Interprocess shared memory
==========================

Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
mode. The former through the usual mechanism, the latter through files created
on ramfs or tmpfs mounts.


Futexes
=======

Futexes are supported in NOMMU mode if the arch supports them. An error will
be given if an address passed to the futex system call lies outside the
mappings made by a process or if the mapping in which the address lies does not
support futexes (such as an I/O chardev mapping).


No-MMU mremap
=============

The mremap() function is partially supported. It may change the size of a
mapping, and may move it [#]_ if MREMAP_MAYMOVE is specified and if the new size
of the mapping exceeds the size of the slab object currently occupied by the
memory to which the mapping refers, or if a smaller slab object could be used.

MREMAP_FIXED is not supported, though it is ignored if there's no change of
address and the object does not need to be moved.

Shared mappings may not be moved. Shareable mappings may not be moved either,
even if they are not currently shared.

The mremap() function must be given an exact match for base address and size of
a previously mapped object. It may not be used to create holes in existing
mappings, move parts of existing mappings or resize parts of mappings. It must
act on a complete mapping.

.. [#] Not currently supported.


Providing shareable character device support
============================================

To provide shareable character device support, a driver must provide a
file->f_op->get_unmapped_area() operation. The mmap() routines will call this
to get a proposed address for the mapping. This may return an error if it
doesn't wish to honour the mapping because it's too long, at a weird offset,
under an unsupported combination of flags, or for any other reason.

The driver should also provide backing device information with capabilities set
to indicate the permitted types of mapping on such devices. The default is
assumed to be readable and writable, not executable, and only shareable
directly (can't be copied).

The file->f_op->mmap() operation will be called to actually inaugurate the
mapping. It can be rejected at that point. Returning the ENOSYS error will
cause the mapping to be copied instead if NOMMU_MAP_COPY is specified.

The vm_ops->close() routine will be invoked when the last mapping on a chardev
is removed. An existing mapping will be shared, partially or not, if possible
without notifying the driver.

It is permitted also for the file->f_op->get_unmapped_area() operation to
return -ENOSYS. This will be taken to mean that this operation just doesn't
want to handle it, despite the fact it's got an operation. For instance, it
might try directing the call to a secondary driver which turns out not to
implement it. Such is the case for the framebuffer driver which attempts to
direct the call to the device-specific driver. Under such circumstances, the
mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a
copy mapped otherwise.

.. important::

    Some types of device may present a different appearance to anyone
    looking at them in certain modes. Flash chips can be like this; for
    instance if they're in programming or erase mode, you might see the
    status reflected in the mapping, instead of the data.

    In such a case, care must be taken lest userspace see a shared or a
    private mapping showing such information when the driver is busy
    controlling the device. Remember especially: private executable
    mappings may still be mapped directly off the device under some
    circumstances!


Providing shareable memory-backed file support
==============================================

Provision of shared mappings on memory backed files is similar to the provision
of support for shared mapped character devices. The main difference is that the
filesystem providing the service will probably allocate a contiguous collection
of pages and permit mappings to be made on that.

It is recommended that a truncate operation applied to such a file that
increases the file size, if that file is empty, be taken as a request to gather
enough pages to honour a mapping. This is required to support POSIX shared
memory.

Memory backed devices are indicated by the mapping's backing device info having
the memory_backed flag set.

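From userspace, the open, truncate, mmap sequence that a memory-backed filesystem is expected to honour looks roughly like the following Python sketch. The helper name ``shared_mapping_roundtrip`` and its ``directory`` parameter are hypothetical; on a no-MMU system ``directory`` would need to point at a ramfs or tmpfs mount, and whether the mapping succeeds there depends on the filesystem support described above.

```python
import mmap
import os
import tempfile


def shared_mapping_roundtrip(directory=None, size=mmap.PAGESIZE):
    """Open an empty file, grow it, map it shared, and read the data back."""
    fd, path = tempfile.mkstemp(dir=directory)   # open: new, empty file
    try:
        os.ftruncate(fd, size)                   # truncate: grow the empty file
        # mmap: shared, readable and writable mapping of the whole file
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED) as region:
            region[:5] = b"hello"
            region.flush()
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 5)                    # data written via the mapping
    finally:
        os.close(fd)
        os.unlink(path)
```

The truncate step comes before the mapping because it is the point at which the filesystem learns how many contiguous pages it must gather.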


Providing shareable block device support
========================================

Provision of shared mappings on block device files is exactly the same as for
character devices. If there isn't a real device underneath, then the driver
should allocate sufficient contiguous memory to honour any supported mapping.


Adjusting page trimming behaviour
=================================

NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
when performing an allocation. This can have adverse effects on memory
fragmentation, and as such, is left configurable. The default behaviour is to
aggressively trim allocations and discard any excess pages back into the page
allocator. In order to retain finer-grained control over fragmentation, this
behaviour can either be disabled completely, or bumped up to a higher page
watermark where trimming begins.

Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``.
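
The effect of the rounding and trimming described above can be worked through with a short Python model. It is a sketch under stated assumptions rather than the kernel's code: the function name ``nommu_pages_kept`` and the 4096-byte default page size are illustrative, and the reading of ``nr_trim_pages`` (0 disables trimming, any value of 1 or more is the minimum number of excess pages at which trimming starts) is taken from the sysctl's description.

```python
def nommu_pages_kept(length, page_size=4096, nr_trim_pages=1):
    """Return how many pages stay allocated for a mapping of `length` bytes."""
    if length <= 0:
        raise ValueError("length must be positive")
    needed = -(-length // page_size)   # pages actually requested, rounded up
    granule = 1
    while granule < needed:            # round up to a power-of-2 page count
        granule *= 2
    excess = granule - needed
    if nr_trim_pages and excess >= nr_trim_pages:
        return needed                  # excess pages go back to the allocator
    return granule                     # trimming disabled or below watermark
```

A five-page request is served from an eight-page granule: with the default setting the three excess pages are trimmed, while with trimming disabled or a watermark of 4 all eight pages stay allocated.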
@@ -583,7 +583,7 @@ trimming of allocations is initiated.
 
 The default value is 1.
 
-See Documentation/nommu-mmap.txt for more information.
+See Documentation/admin-guide/mm/nommu-mmap.rst for more information.
 
 
 numa_zonelist_order