- ======================
- ioctl based interfaces
- ======================
- ioctl() is the most common way for applications to interface
- with device drivers. It is flexible and easily extended by adding new
- commands and can be passed through character devices, block devices as
- well as sockets and other special file descriptors.
- However, it is also very easy to get ioctl command definitions wrong,
- and hard to fix them later without breaking existing applications,
- so this documentation tries to help developers get it right.
- Command number definitions
- ==========================
- The command number, or request number, is the second argument passed to
- the ioctl system call. While this can be any 32-bit number that uniquely
- identifies an action for a particular driver, there are a number of
- conventions around defining them.
- ``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
- ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
- ``_IOW``, and ``_IOWR``. These should be used for all new commands,
- with the correct parameters:
- _IO/_IOR/_IOW/_IOWR
- The macro name specifies how the argument will be used. It may be a
- pointer to data to be passed into the kernel (_IOW), out of the kernel
- (_IOR), or both (_IOWR). _IO can indicate either commands with no
- argument or those passing an integer value instead of a pointer.
- It is recommended to only use _IO for commands without arguments,
- and use pointers for passing data.
- type
- An 8-bit number, often a character literal, specific to a subsystem
- or driver, and listed in Documentation/userspace-api/ioctl/ioctl-number.rst.
- nr
- An 8-bit number identifying the specific command, unique for a given
- value of 'type'.
- data_type
- The name of the data type pointed to by the argument. The command number
- encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
- depending on the architecture, so the portable maximum size of the
- argument is 8191 bytes.
- Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that
- will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
- _IO does not have a data_type parameter.
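As an illustration of the parameters described above, the following minimal sketch defines a small command set and checks how the fields are encoded. The ``foo`` names, the ``'f'`` type value and the command numbers are hypothetical and chosen only for this example; a real driver would pick a type value that does not clash with ioctl-number.rst. The sketch builds in user space against the uapi headers.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stddef.h>      /* size_t */
#include <linux/ioctl.h> /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <linux/types.h> /* __u32, __u64 */

/* Hypothetical argument structure, naturally aligned, no implicit padding. */
struct foo_config {
	__u32 mode;
	__u32 flags;
	__u64 buffer_size;
};

/* Hypothetical "type" value shared by all commands of this driver. */
#define FOO_IOC_MAGIC 'f'

#define FOO_IOC_RESET      _IO(FOO_IOC_MAGIC, 0)                       /* no argument */
#define FOO_IOC_GET_CONFIG _IOR(FOO_IOC_MAGIC, 1, struct foo_config)   /* kernel -> user */
#define FOO_IOC_SET_CONFIG _IOW(FOO_IOC_MAGIC, 2, struct foo_config)   /* user -> kernel */
#define FOO_IOC_EXCHANGE   _IOWR(FOO_IOC_MAGIC, 3, struct foo_config)  /* both directions */

/*
 * The mistake described in the note above: passing sizeof(...) instead of
 * the type name encodes sizeof(size_t) rather than the structure size.
 */
#define FOO_IOC_SET_CONFIG_WRONG \
	_IOW(FOO_IOC_MAGIC, 2, sizeof(struct foo_config))

/* The size field of the correct definition matches the structure size. */
static_assert(_IOC_SIZE(FOO_IOC_SET_CONFIG) == sizeof(struct foo_config),
	      "command number must encode the argument size");
```

The ``_IOC_DIR()``, ``_IOC_TYPE()``, ``_IOC_NR()`` and ``_IOC_SIZE()`` helpers from the same header decode a command number again, which can be handy when checking a definition by hand.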
- Interface versions
- ==================
- Some subsystems use version numbers in data structures to overload
- commands with different interpretations of the argument.
- This is generally a bad idea, since changes to existing commands tend
- to break existing applications.
- A better approach is to add a new ioctl command with a new number. The
- old command still needs to be implemented in the kernel for compatibility,
- but this can be a wrapper around the new implementation.
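The wrapper approach described above might look roughly like the following sketch. It is a user-space model rather than driver code: the ``foo`` structures, the command numbers and the validation rule are hypothetical, and a real handler would first copy the argument from user space.

```c
#include <errno.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Original argument; its layout is frozen because applications use it. */
struct foo_params {
	__u32 rate;
	__u32 channels;
};

/* Extended argument introduced together with a new command number. */
struct foo_params_v2 {
	__u32 rate;
	__u32 channels;
	__u64 flags;
};

/* Hypothetical commands: the old one stays, the new one gets its own nr. */
#define FOO_IOC_SET_PARAMS    _IOW('f', 4, struct foo_params)
#define FOO_IOC_SET_PARAMS_V2 _IOW('f', 5, struct foo_params_v2)

/* Hypothetical device state used only by this sketch. */
struct foo_state {
	struct foo_params_v2 params;
};

/* The single real implementation, working on the new structure. */
static int foo_set_params_v2(struct foo_state *st,
			     const struct foo_params_v2 *p)
{
	if (p->channels == 0)
		return -EINVAL;
	st->params = *p;
	return 0;
}

/* The old command converts its argument and calls the new implementation. */
static int foo_set_params(struct foo_state *st, const struct foo_params *old)
{
	struct foo_params_v2 p = {
		.rate = old->rate,
		.channels = old->channels,
		.flags = 0, /* old callers cannot express flags */
	};

	return foo_set_params_v2(st, &p);
}
```

Keeping one implementation behind both commands means a fix made for the new command also reaches old applications.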
- Return code
- ===========
- ioctl commands can return negative error codes as documented in errno(3);
- these get turned into errno values in user space. On success, the return
- code should be zero. It is also possible but not recommended to return
- a positive 'long' value.
- When the ioctl callback is called with an unknown command number, the
- handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
- -ENOTTY being returned from the system call. Some subsystems return
- -ENOSYS or -EINVAL here for historic reasons, but this is wrong.
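The return code conventions above can be sketched with a simplified dispatcher. This is a user-space model under stated assumptions: ``struct foo_dev`` and both commands are hypothetical, and a real handler would receive a ``struct file *`` instead of the device pointer used here.

```c
#include <errno.h>
#include <linux/ioctl.h>

/* Hypothetical commands without arguments. */
#define FOO_IOC_RESET _IO('f', 0)
#define FOO_IOC_START _IO('f', 6)

/* Hypothetical device state for this sketch. */
struct foo_dev {
	int running;
};

/* Returns zero on success and a negative error code on failure. */
static long foo_ioctl(struct foo_dev *dev, unsigned int cmd, unsigned long arg)
{
	(void)arg; /* neither command takes an argument */

	switch (cmd) {
	case FOO_IOC_RESET:
		dev->running = 0;
		return 0;
	case FOO_IOC_START:
		if (dev->running)
			return -EBUSY;
		dev->running = 1;
		return 0;
	default:
		/* Unknown command: -ENOTTY, not -EINVAL or -ENOSYS. */
		return -ENOTTY;
	}
}
```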
- Prior to Linux 5.5, compat_ioctl handlers were required to return
- -ENOIOCTLCMD in order to use the fallback conversion into native
- commands. As all subsystems are now responsible for handling compat
- mode themselves, this is no longer needed, but it may be important to
- consider when backporting bug fixes to older kernels.
- Timestamps
- ==========
- Traditionally, timestamps and timeout values are passed as ``struct
- timespec`` or ``struct timeval``, but these are problematic because of
- incompatible definitions of these structures in user space after the
- move to 64-bit time_t.
- The ``struct __kernel_timespec`` type can be used instead, either embedded
- in other data structures when separate second/nanosecond values are
- desired, or passed to user space directly. This is still not ideal though,
- as the structure matches neither the kernel's timespec64 nor the user
- space timespec exactly. The get_timespec64() and put_timespec64() helper
- functions can be used to ensure that the layout remains compatible with
- user space and the padding is treated correctly.
- As it is cheap to convert seconds to nanoseconds, but the opposite
- requires an expensive 64-bit division, a simple __u64 nanosecond value
- can be simpler and more efficient.
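The cost difference between the two directions can be seen in a small sketch. The helper names and the local ``NSEC_PER_SEC`` definition are assumptions made for this example, and the code is plain user-space C rather than kernel code.

```c
#include <stdint.h>
#include <time.h> /* struct timespec */

/* Local definition for this sketch. */
#define NSEC_PER_SEC 1000000000ULL

/* Seconds to nanoseconds: one multiplication and one addition. */
static uint64_t timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* Nanoseconds to seconds: needs a 64-bit division and remainder. */
static struct timespec ns_to_timespec(uint64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}
```

An interface that carries a single 64-bit nanosecond value lets the kernel skip the division entirely and leaves the conversion to applications that actually need split values.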
- Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
- as returned by ktime_get_ns() or ktime_get_ts64(). Unlike
- CLOCK_REALTIME, this makes the timestamps immune to jumping backwards
- or forwards due to leap second adjustments and clock_settime() calls.
- ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
- need to be persistent across a reboot or between multiple machines.
- 32-bit compat mode
- ==================
- In order to support 32-bit user space running on a 64-bit machine, each
- subsystem or driver that implements an ioctl callback handler must also
- implement the corresponding compat_ioctl handler.
- As long as all the rules for data structures are followed, this is as
- easy as setting the .compat_ioctl pointer to a helper function such as
- compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
- compat_ptr()
- ------------
- On the s390 architecture, 31-bit user space has ambiguous representations
- for data pointers, with the upper bit being ignored. When running such
- a process in compat mode, the compat_ptr() helper must be used to
- clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
- pointer. On other architectures, this macro only performs a cast to a
- ``void __user *`` pointer.
- In a compat_ioctl() callback, the last argument is an unsigned long,
- which can be interpreted as either a pointer or a scalar depending on
- the command. If it is a scalar, then compat_ptr() must not be used, to
- ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
- for arguments with the upper bit set.
- The compat_ptr_ioctl() helper can be used in place of a custom
- compat_ioctl file operation for drivers that only take arguments that
- are pointers to compatible data structures.
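The pointer versus scalar distinction can be modelled in user space as follows. This is only a sketch: the ``model_`` functions and the ``arg_kind`` enum are hypothetical names for this example, and the masking stands in for what the text above describes for 31-bit s390 user space.

```c
#include <stdint.h>

/* Stand-in for the kernel type of the same name, defined here for the model. */
typedef uint32_t compat_uptr_t;

/* Model of the s390 behaviour: the upper bit of a 31-bit pointer is cleared. */
static uint64_t model_compat_ptr_s390(compat_uptr_t uptr)
{
	return uptr & 0x7fffffffu;
}

/* Whether a given command interprets its argument as pointer or scalar. */
enum arg_kind {
	ARG_POINTER,
	ARG_SCALAR,
};

/* Convert a 32-bit compat argument into the value the handler should see. */
static uint64_t model_compat_arg(enum arg_kind kind, uint32_t arg)
{
	if (kind == ARG_POINTER)
		return model_compat_ptr_s390(arg);

	/* Scalars pass through unchanged, including a set upper bit. */
	return arg;
}
```

With an argument of ``0x80001000``, the pointer path yields ``0x1000`` while the scalar path keeps the value as it is, which is why applying the pointer conversion to a scalar would change the behaviour of the command.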
- Structure layout
- ----------------
- Compatible data structures have the same layout on all architectures,
- avoiding all problematic members:
- * ``long`` and ``unsigned long`` are the size of a register, so
- they can be either 32-bit or 64-bit wide and cannot be used in portable
- data structures. Fixed-length replacements are ``__s32``, ``__u32``,
- ``__s64`` and ``__u64``.
- * Pointers have the same problem, in addition to requiring the
- use of compat_ptr(). The best workaround is to use ``__u64``
- in place of pointers, which requires a cast to ``uintptr_t`` in user
- space, and the use of u64_to_user_ptr() in the kernel to convert
- it back into a user pointer.
- * On the x86-32 (i386) architecture, the alignment of 64-bit variables
- is only 32-bit, but they are naturally aligned on most other
- architectures including x86-64. This means a structure like::
- struct foo {
-     __u32 a;
-     __u64 b;
-     __u32 c;
- };
- has four bytes of padding between a and b on x86-64, plus another four
- bytes of padding at the end, but no padding on i386, and it needs a
- compat_ioctl conversion handler to translate between the two formats.
- To avoid this problem, all structures should have their members
- naturally aligned, or explicit reserved fields added in place of the
- implicit padding. The ``pahole`` tool can be used for checking the
- alignment.
- * On ARM OABI user space, structures are padded to multiples of 32-bit,
- making some structs incompatible with modern EABI kernels if they
- do not end on a 32-bit boundary.
- * On the m68k architecture, struct members are not guaranteed to have an
- alignment greater than 16-bit, which is a problem when relying on
- implicit padding.
- * Bitfields and enums generally work as one would expect them to,
- but some properties of them are implementation-defined, so it is better
- to avoid them completely in ioctl interfaces.
- * ``char`` members can be either signed or unsigned, depending on
- the architecture, so the __u8 and __s8 types should be used for 8-bit
- integer values, though char arrays are clearer for fixed-length strings.
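The layout rules in the list above can be combined into a sketch like the following. The structure and function names are hypothetical. The first structure is a variant of ``struct foo`` with the implicit padding replaced by explicit reserved fields, and the second carries a user pointer as a ``__u64``.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stddef.h>      /* offsetof */
#include <stdint.h>      /* uintptr_t */
#include <string.h>      /* memset */
#include <linux/types.h> /* __u32, __u64 */

/* Every member is naturally aligned, so no architecture adds padding. */
struct foo_fixed {
	__u32 a;
	__u32 reserved0; /* explicit padding, to be set to zero */
	__u64 b;
	__u32 c;
	__u32 reserved1; /* explicit tail padding, to be set to zero */
};

/* The layout is now the same with 32-bit and 64-bit alignment of __u64. */
static_assert(offsetof(struct foo_fixed, b) == 8, "b must start at offset 8");
static_assert(sizeof(struct foo_fixed) == 24, "no implicit padding expected");

/* A user pointer carried as __u64 instead of a pointer member. */
struct foo_buffer {
	__u64 data_ptr; /* user-space address, cast through uintptr_t */
	__u32 len;
	__u32 reserved;
};

/* User-space side: fill in the request before passing it to ioctl(). */
static void foo_buffer_init(struct foo_buffer *req, void *data, __u32 len)
{
	memset(req, 0, sizeof(*req));
	req->data_ptr = (__u64)(uintptr_t)data;
	req->len = len;
}
```

On the kernel side, the ``data_ptr`` value would be converted back with u64_to_user_ptr() as described above. Running ``pahole`` on the compiled object is one way to confirm that no holes remain.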
- Information leaks
- =================
- Uninitialized data must not be copied back to user space, as this can
- cause an information leak that helps an attacker, for example by
- defeating kernel address space layout randomization (KASLR).
- For this reason (and for compat support) it is best to avoid any
- implicit padding in data structures. Where there is implicit padding
- in an existing structure, kernel drivers must be careful to fully
- initialize an instance of the structure before copying it to user
- space. This is usually done by calling memset() before assigning to
- individual members.
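The memset() pattern mentioned above is sketched below as a user-space model. ``struct foo_info`` and ``foo_fill_info()`` are hypothetical; the structure deliberately contains implicit padding, as an existing interface might.

```c
#include <string.h>      /* memset */
#include <linux/types.h> /* __u8, __u32 */

/* Existing structure with implicit padding after 'version' on most ABIs. */
struct foo_info {
	__u8 version;
	__u32 count;
};

/* Clear the whole object first so the padding bytes hold no stale data. */
static void foo_fill_info(struct foo_info *info, __u8 version, __u32 count)
{
	memset(info, 0, sizeof(*info));
	info->version = version;
	info->count = count;
}
```

Assigning the members alone would leave the padding bytes with whatever the memory held before, and copying the full structure to user space would then expose them.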
- Subsystem abstractions
- ======================
- While some device drivers implement their own ioctl function, most
- subsystems implement the same command for multiple drivers. Ideally the
- subsystem has an .ioctl() handler that copies the arguments from and
- to user space, passing them into subsystem specific callback functions
- through normal kernel pointers.
- This helps in various ways:
- * Applications written for one driver are more likely to work for
- another one in the same subsystem if there are no subtle differences
- in the user space ABI.
- * The complexity of user space access and data structure layout is handled
- in one place, reducing the potential for implementation bugs.
- * It is more likely to be reviewed by experienced developers
- that can spot problems in the interface when the ioctl is shared
- between multiple drivers than when it is only used in a single driver.
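The split between a shared handler and driver callbacks could be modelled as in the sketch below. Everything here is hypothetical and runs in user space: ``memcpy()`` stands in for copy_from_user() and copy_to_user(), and the ``foo`` subsystem and ``bar`` driver exist only for this example.

```c
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

struct foo_config {
	__u32 mode;
	__u32 flags;
};

#define FOO_IOC_GET_CONFIG _IOR('f', 1, struct foo_config)
#define FOO_IOC_SET_CONFIG _IOW('f', 2, struct foo_config)

/* Callbacks see ordinary pointers, never user-space addresses. */
struct foo_driver_ops {
	int (*get_config)(void *priv, struct foo_config *cfg);
	int (*set_config)(void *priv, const struct foo_config *cfg);
};

/* Shared handler: the only place that touches the caller's memory. */
static long foo_subsys_ioctl(const struct foo_driver_ops *ops, void *priv,
			     unsigned int cmd, void *uarg)
{
	struct foo_config cfg;
	int ret;

	switch (cmd) {
	case FOO_IOC_SET_CONFIG:
		memcpy(&cfg, uarg, sizeof(cfg)); /* models copy_from_user() */
		return ops->set_config(priv, &cfg);
	case FOO_IOC_GET_CONFIG:
		memset(&cfg, 0, sizeof(cfg));
		ret = ops->get_config(priv, &cfg);
		if (ret)
			return ret;
		memcpy(uarg, &cfg, sizeof(cfg)); /* models copy_to_user() */
		return 0;
	default:
		return -ENOTTY;
	}
}

/* A trivial driver that only stores the configuration. */
struct bar_dev {
	struct foo_config cfg;
};

static int bar_get_config(void *priv, struct foo_config *cfg)
{
	*cfg = ((struct bar_dev *)priv)->cfg;
	return 0;
}

static int bar_set_config(void *priv, const struct foo_config *cfg)
{
	((struct bar_dev *)priv)->cfg = *cfg;
	return 0;
}

static const struct foo_driver_ops bar_ops = {
	.get_config = bar_get_config,
	.set_config = bar_set_config,
};
```

A second driver would only supply another ``foo_driver_ops`` table, so the command numbers and the argument layout stay identical across drivers.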
- Alternatives to ioctl
- =====================
- There are many cases in which ioctl is not the best solution for a
- problem. Alternatives include:
- * System calls are a better choice for a system-wide feature that
- is not tied to a physical device or constrained by the file system
- permissions of a character device node.
- * netlink is the preferred way of configuring any network related
- objects through sockets.
- * debugfs is used for ad-hoc interfaces for debugging functionality
- that does not need to be exposed as a stable interface to applications.
- * sysfs is a good way to expose the state of an in-kernel object
- that is not tied to a file descriptor.
- * configfs can be used for more complex configuration than sysfs.
- * A custom file system can provide extra flexibility with a simple
- user interface but adds a lot of complexity to the implementation.