
In order to support accesses to larger chunks of memory, pass in a 'size' parameter (counted in bytes), and return the amount available at that address.

Add a new helper function, bdev_direct_access(), to handle common functionality including partition handling, checking the length requested is positive, checking for the sector being page-aligned, and checking the length of the request does not pass the end of the partition.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

Execute-in-place for file mappings
----------------------------------

Motivation
----------
File mappings are performed by mapping page cache pages to userspace. In
addition, read- and write-type file operations also transfer data from/to the
page cache.

For memory-backed storage devices that use the block device interface, the page
cache pages are in fact copies of the original storage. Various approaches
exist to work around the need for an extra copy. The ramdisk driver, for
example, reads the data into the page cache, keeps a reference, and later
discards the original data.

Execute-in-place solves this issue the other way around: instead of keeping
data in the page cache, the need to have a page cache copy is eliminated
completely. With execute-in-place, read- and write-type operations are
performed directly from/to the memory-backed storage device. For file mappings,
the storage device itself is mapped directly into userspace.

This implementation was initially written for shared memory segments between
different virtual machines on s390 hardware to allow multiple machines to
share the same binaries and libraries.

Implementation
--------------
Execute-in-place is implemented in three steps: block device operation,
address space operation, and file operations.

A block device operation named direct_access is used to translate the
block device sector number to a page frame number (pfn) that identifies
the physical page for the memory. It also returns a kernel virtual
address that can be used to access the memory.

The direct_access method takes a 'size' parameter that indicates the
number of bytes being requested. The function should return the number
of bytes that can be contiguously accessed at that offset. It may also
return a negative errno if an error occurs.
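
The 'size' semantics and the sanity checks around direct_access can be
illustrated with a small userspace sketch. This is not kernel code:
struct sketch_part, sketch_direct_access() and the particular errno values
are hypothetical and chosen for illustration only. The sketch assumes
512-byte sectors, 4096-byte pages, and a partition whose memory is
contiguous, so the bytes available are simply the distance to the end of
the partition.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_SECTOR_SIZE 512L
#define SKETCH_PAGE_SIZE   4096L

/* Hypothetical stand-in for a partition on a memory-backed device. */
struct sketch_part {
	char *base;        /* start of the whole device's memory */
	long start_sector; /* first sector of the partition on the device */
	long nr_sectors;   /* length of the partition in sectors */
};

/*
 * Returns the number of bytes contiguously accessible at 'sector'
 * (relative to the partition), or a negative errno. On success, *kaddr
 * and *pfn are filled in. The "pfn" here is only a page index into the
 * device memory, not a real page frame number.
 */
long sketch_direct_access(const struct sketch_part *p, long sector,
			  void **kaddr, unsigned long *pfn, long size)
{
	long part_bytes = p->nr_sectors * SKETCH_SECTOR_SIZE;
	long part_off = sector * SKETCH_SECTOR_SIZE;
	long dev_off;

	/* The requested length must be positive. */
	if (size <= 0)
		return -EINVAL;

	/* The request must not pass the end of the partition. */
	if (sector < 0 || part_off >= part_bytes ||
	    size > part_bytes - part_off)
		return -ERANGE;

	/* Partition handling: translate to an offset on the whole device. */
	dev_off = (p->start_sector + sector) * SKETCH_SECTOR_SIZE;

	/* The resulting sector must be page-aligned. */
	if (dev_off % SKETCH_PAGE_SIZE)
		return -EINVAL;

	*kaddr = p->base + dev_off;
	*pfn = (unsigned long)(dev_off / SKETCH_PAGE_SIZE);

	/* Everything up to the end of the partition is contiguous here. */
	return part_bytes - part_off;
}
```

With a 16-sector partition starting at sector 8, a request for 4096 bytes
at sector 0 reports 8192 bytes available, while a request at sector 1 is
rejected because it is not page-aligned.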

The block device operation is optional; these block devices support it as of
today:
- dcssblk: s390 dcss block device driver

An address space operation named get_xip_mem is used to retrieve references
to a page frame number and a kernel address. To obtain these values a reference
to an address_space is provided. This function assigns values to the kmem and
pfn parameters. The third argument indicates whether the function should allocate
blocks if needed.
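
The role of the third argument can be shown with a userspace sketch of a
get_xip_mem-style lookup. Everything here is hypothetical: struct
sketch_mapping stands in for an address_space, sketch_get_xip_mem() is not
the kernel function, and returning -ENODATA for an unallocated page when
allocation is not requested is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_BLOCKS 4 /* pages of backing memory */
#define SKETCH_NR_PAGES  8 /* pages in the (sparse) file */

/* Hypothetical stand-in for an address_space on a memory-backed device. */
struct sketch_mapping {
	char storage[SKETCH_NR_BLOCKS][SKETCH_PAGE_SIZE];
	int block_of_page[SKETCH_NR_PAGES]; /* -1 marks a hole */
	int next_free;                      /* next unallocated block */
};

void sketch_mapping_init(struct sketch_mapping *m)
{
	for (int i = 0; i < SKETCH_NR_PAGES; i++)
		m->block_of_page[i] = -1;
	m->next_free = 0;
}

/*
 * Look up the memory behind page 'pgoff'. If the page has no block yet,
 * allocate one only when 'create' is non-zero. On success, assign the
 * kernel-style address to *kmem and a fake pfn (the block index) to *pfn.
 */
int sketch_get_xip_mem(struct sketch_mapping *m, unsigned long pgoff,
		       int create, void **kmem, unsigned long *pfn)
{
	if (pgoff >= SKETCH_NR_PAGES)
		return -EINVAL;

	if (m->block_of_page[pgoff] < 0) {
		if (!create)
			return -ENODATA; /* hole, and no allocation requested */
		if (m->next_free >= SKETCH_NR_BLOCKS)
			return -ENOSPC;  /* backing memory exhausted */
		m->block_of_page[pgoff] = m->next_free++;
	}

	*kmem = m->storage[m->block_of_page[pgoff]];
	*pfn = (unsigned long)m->block_of_page[pgoff];
	return 0;
}
```

A read path would typically pass create as 0 and treat a hole as zeroes,
while a write path would pass 1 so that the page gets backing memory.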

This address space operation is mutually exclusive with readpage and writepage,
which do page cache read/write operations.
The following filesystems support it as of today:
- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt

A set of file operations that utilize get_xip_mem can be found in
mm/filemap_xip.c. The following file operation implementations are provided:
- aio_read/aio_write
- readv/writev
- sendfile

The generic file operations do_sync_read/do_sync_write can be used to implement
classic synchronous IO calls.
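
The idea behind these file operations, copying straight between the
caller's buffer and the storage without a page cache copy in between, can
be sketched in userspace as follows. The names struct sketch_xip_file and
sketch_xip_read() are hypothetical, and a plain memcpy() stands in for
whatever per-page lookup and copy-to-userspace step a kernel
implementation performs.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical file whose data lives in directly addressable memory. */
struct sketch_xip_file {
	const char *mem; /* memory-backed storage holding the file data */
	size_t size;     /* file size in bytes */
};

/*
 * Copy up to 'len' bytes starting at '*pos' directly from the storage
 * into 'buf', one page-sized piece at a time. Advances '*pos' and
 * returns the number of bytes copied (0 at end of file).
 */
long sketch_xip_read(const struct sketch_xip_file *f, char *buf,
		     size_t len, size_t *pos)
{
	size_t copied = 0;

	while (copied < len && *pos < f->size) {
		size_t offset = *pos % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - offset;

		/* Do not copy more than requested or past end of file. */
		if (chunk > len - copied)
			chunk = len - copied;
		if (chunk > f->size - *pos)
			chunk = f->size - *pos;

		/* No intermediate copy: the source is the storage itself. */
		memcpy(buf + copied, f->mem + *pos, chunk);
		copied += chunk;
		*pos += chunk;
	}
	return (long)copied;
}
```

The loop works page by page because, in general, consecutive file pages
need not be adjacent in the device memory; the sketch keeps them adjacent
only for brevity.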

Shortcomings
------------
This implementation is limited to storage devices that are CPU-addressable at
all times (no highmem or such). It works well on ROM/RAM, but enhancements are
needed to make it work with flash in read+write mode.
Putting the Linux kernel and/or its modules on a xip filesystem does not mean
they are not copied.