.. SPDX-License-Identifier: GPL-2.0

.. _physical_memory_model:

=====================
Physical Memory Model
=====================

Physical memory in a system may be addressed in different ways. The
simplest case is when the physical memory starts at address 0 and
spans a contiguous range up to the maximal address. It could be,
however, that this range contains small holes that are not accessible
by the CPU. Then there could be several contiguous ranges at
completely distinct addresses. And, don't forget about NUMA, where
different memory banks are attached to different CPUs.

Linux abstracts this diversity using one of two memory models:
FLATMEM and SPARSEMEM. Each architecture defines what memory models
it supports, what the default memory model is and whether it is
possible to manually override that default.

All the memory models track the status of physical page frames using
struct page arranged in one or more arrays.

Regardless of the selected memory model, there exists a one-to-one
mapping between the physical page frame number (PFN) and the
corresponding `struct page`.

Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
helpers that allow the conversion from PFN to `struct page` and vice
versa.
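
As a minimal illustration (not actual kernel code, and
`check_pfn_roundtrip` is a name made up for this example), the two
helpers are exact inverses of each other::

  #include <linux/mm.h>

  static void check_pfn_roundtrip(unsigned long pfn)
  {
          /* Convert a valid PFN to its page descriptor... */
          struct page *page = pfn_to_page(pfn);

          /* ...and back; the round trip must return the same PFN. */
          WARN_ON(page_to_pfn(page) != pfn);
  }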

FLATMEM
=======

The simplest memory model is FLATMEM. This model is suitable for
non-NUMA systems with contiguous, or mostly contiguous, physical
memory.

In the FLATMEM memory model, there is a global `mem_map` array that
maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array. The `struct page` objects
corresponding to the holes are never fully initialized.

To allocate the `mem_map` array, architecture-specific setup code
should call the :c:func:`free_area_init` function. Yet, the mappings
array is not usable until the call to :c:func:`memblock_free_all`
that hands all the memory to the page allocator.
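
A hedged sketch of that setup step is shown below; the single
`ZONE_NORMAL` limit taken from `max_low_pfn` is an assumption made
for illustration, real architectures fill `max_zone_pfns[]` according
to their zone layout::

  #include <linux/mm.h>
  #include <linux/memblock.h>

  static void __init example_paging_init(void)
  {
          unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };

          /* Assumption: a single ZONE_NORMAL covering all memory. */
          max_zone_pfns[ZONE_NORMAL] = max_low_pfn;

          /* Allocates mem_map and initializes the node and zone data. */
          free_area_init(max_zone_pfns);
  }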

An architecture may free parts of the `mem_map` array that do not
cover the actual physical pages. In such a case, the architecture
specific :c:func:`pfn_valid` implementation should take the holes in
the `mem_map` into account.

With FLATMEM, the conversion between a PFN and the `struct page` is
straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the `mem_map`
array.

The `ARCH_PFN_OFFSET` defines the first page frame number for systems
with physical memory starting at an address different from 0.
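
With those definitions the conversion helpers boil down to simple
pointer arithmetic, roughly (a simplified rendering of the generic
implementation in `include/asm-generic/memory_model.h`)::

  #define __pfn_to_page(pfn)   (mem_map + ((pfn) - ARCH_PFN_OFFSET))
  #define __page_to_pfn(page)  \
          ((unsigned long)((page) - mem_map) + ARCH_PFN_OFFSET)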

SPARSEMEM
=========

SPARSEMEM is the most versatile memory model available in Linux and
it is the only memory model that supports several advanced features
such as hot-plug and hot-remove of the physical memory, alternative
memory maps for non-volatile memory devices and deferred
initialization of the memory map for larger systems.

The SPARSEMEM model presents the physical memory as a collection of
sections. A section is represented with struct mem_section that
contains `section_mem_map` that is, logically, a pointer to an array
of struct pages. However, it is stored with some other magic that
aids section management. The section size and the maximal number of
sections are specified using the `SECTION_SIZE_BITS` and
`MAX_PHYSMEM_BITS` constants defined by each architecture that
supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is the actual width of a
physical address that an architecture supports, `SECTION_SIZE_BITS`
is an arbitrary value.

The maximal number of sections is denoted `NR_MEM_SECTIONS` and
defined as

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
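
For instance, assuming the values commonly used on 64-bit x86 without
5-level page tables (`SECTION_SIZE_BITS` = 27, i.e. 128 MiB sections,
and `MAX_PHYSMEM_BITS` = 46), this yields

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(46 - 27)} = 2 ^ {19} = 524288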

The `mem_section` objects are arranged in a two-dimensional array
called `mem_sections`. The size and placement of this array depend on
`CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
sections (a simplified indexing sketch follows the list):

* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
  single `mem_section` object.

* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
  array is dynamically allocated. Each row contains PAGE_SIZE worth of
  `mem_section` objects and the number of rows is calculated to fit
  all the memory sections.
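
The sketch below is a simplified version of the kernel's
`__nr_to_section()` helper and shows how a section number is split
between the two dimensions; the real code hides the division behind
`SECTION_NR_TO_ROOT()` and a mask::

  #ifdef CONFIG_SPARSEMEM_EXTREME
  #define SECTIONS_PER_ROOT    (PAGE_SIZE / sizeof(struct mem_section))
  #else
  #define SECTIONS_PER_ROOT    1
  #endif

  /* The row selects a root, the remainder an entry within that root. */
  static inline struct mem_section *nr_to_section(unsigned long nr)
  {
          return &mem_sections[nr / SECTIONS_PER_ROOT]
                              [nr % SECTIONS_PER_ROOT];
  }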

The architecture setup code should call sparse_init() to initialize
the memory sections and the memory maps.

With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page` - a "classic sparse" and "sparse
vmemmap". The selection is made at build time and it is determined by
the value of `CONFIG_SPARSEMEM_VMEMMAP`.

The classic sparse encodes the section number of a page in page->flags
and uses high bits of a PFN to access the section that maps that page
frame. Inside a section, the PFN is the index to the array of pages.
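
Glossing over the extra encoding stored in `section_mem_map`, that
lookup can be sketched as follows (a simplification of the in-tree
helpers)::

  static inline struct page *classic_sparse_pfn_to_page(unsigned long pfn)
  {
          /* The high PFN bits select the section... */
          struct mem_section *ms = __nr_to_section(pfn >> PFN_SECTION_SHIFT);

          /* ...and the PFN indexes the section's (pre-biased) memory map. */
          return __section_mem_map_addr(ms) + pfn;
  }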

The sparse vmemmap uses a virtually mapped memory map to optimize
pfn_to_page and page_to_pfn operations. There is a global `struct
page *vmemmap` pointer that points to a virtually contiguous array of
`struct page` objects. A PFN is an index to that array and the offset
of the `struct page` from `vmemmap` is the PFN of that page.
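
In this case the conversion helpers reduce to simple arithmetic on
the `vmemmap` pointer, roughly::

  /* The memory map is virtually contiguous. */
  #define __pfn_to_page(pfn)   (vmemmap + (pfn))
  #define __page_to_pfn(page)  ((unsigned long)((page) - vmemmap))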

To use vmemmap, an architecture has to reserve a range of virtual
addresses that will map the physical pages containing the memory map
and make sure that `vmemmap` points to that range. In addition, the
architecture should implement the :c:func:`vmemmap_populate` method
that will allocate the physical memory and create page tables for the
virtual memory map. If an architecture does not have any special
requirements for the vmemmap mappings, it can use the default
:c:func:`vmemmap_populate_basepages` provided by the generic memory
management.
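
For such an architecture the implementation can be a trivial wrapper;
the sketch below assumes the current prototype, which also carries
the `vmem_altmap` described next::

  #include <linux/mm.h>

  int __meminit vmemmap_populate(unsigned long start, unsigned long end,
                                 int node, struct vmem_altmap *altmap)
  {
          /* No special page table requirements: map with base pages. */
          return vmemmap_populate_basepages(start, end, node, altmap);
  }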

The virtually mapped memory map allows storing `struct page` objects
for persistent memory devices in pre-allocated storage on those
devices. This storage is represented with struct vmem_altmap that is
eventually passed to vmemmap_populate() through a long chain of
function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with the :c:func:`vmemmap_alloc_block_buf` helper
to allocate the memory map on the persistent memory device.

ZONE_DEVICE
===========

The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device driver identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the
fact that the page objects for these address ranges are never marked
online, and that a reference must be taken against the device, not
just the page, to keep the memory pinned for active use.
`ZONE_DEVICE`, via :c:func:`devm_memremap_pages`, performs just enough
memory hotplug to turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`,
and :c:func:`get_user_pages` services for the given range of pfns.
Since the page reference count never drops below 1 the page is never
tracked as free memory and the page's `struct list_head lru` space is
repurposed for back referencing to the host device / driver that
mapped the memory.

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users have a
need for smaller granularity of populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online it is subsequently never
subject to its memory ranges being exposed through the sysfs memory
hotplug API on memory block boundaries. The implementation relies on
this lack of user-API constraint to allow sub-section sized memory
ranges to be specified to :c:func:`arch_add_memory`, the top-half of
memory hotplug. Sub-section support allows for 2MB as the cross-arch
common alignment granularity for :c:func:`devm_memremap_pages`.
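
A hedged sketch of how a driver might request such a mapping is shown
below; the `MEMORY_DEVICE_GENERIC` type is an assumption made for
illustration, real users pick the type matching their use case (see
the list below)::

  #include <linux/memremap.h>

  static int example_map_device_pages(struct device *dev,
                                      struct dev_pagemap *pgmap,
                                      struct range *range)
  {
          void *addr;

          pgmap->type = MEMORY_DEVICE_GENERIC;  /* assumed type */
          pgmap->range = *range;                /* device physical range */
          pgmap->nr_range = 1;

          /* Creates struct pages so pfn_to_page() et al. work for the range. */
          addr = devm_memremap_pages(dev, pgmap);
          return PTR_ERR_OR_ZERO(addr);
  }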

The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O target
  via DAX mappings.

* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
  event callbacks to allow a device-driver to coordinate memory management
  events related to device-memory, typically GPU memory. See
  Documentation/mm/hmm.rst.

* p2pdma: Create `struct page` objects to allow peer devices in a
  PCI/-E topology to coordinate direct-DMA operations between themselves,
  i.e. bypass host memory.