.. SPDX-License-Identifier: GPL-2.0

========================
Linux and the Devicetree
========================

The Linux usage model for device tree data

:Author: Grant Likely <[email protected]>

This article describes how Linux uses the device tree. An overview of
the device tree data format can be found on the device tree usage page
at devicetree.org\ [1]_.

.. [1] https://www.devicetree.org/specifications/

The "Open Firmware Device Tree", or simply Devicetree (DT), is a data
structure and language for describing hardware. More specifically, it
is a description of hardware that is readable by an operating system
so that the operating system doesn't need to hard code details of the
machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and
nodes may have an arbitrary number of named properties encapsulating
arbitrary data. A mechanism also exists to create arbitrary
links from one node to another outside of the natural tree structure.

Conceptually, a common set of usage conventions, called 'bindings',
is defined for how data should appear in the tree to describe typical
hardware characteristics including data busses, interrupt lines, GPIO
connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to
maximize use of existing support code, but since property and node
names are simply text strings, it is easy to extend existing bindings
or create new ones by defining new nodes and properties. Be wary,
however, of creating a new binding without first doing some homework
about what already exists. There are currently two different,
incompatible, bindings for i2c busses that came about because the new
binding was created without first investigating how i2c devices were
already being enumerated in existing systems.

1. History
----------

The DT was originally created by Open Firmware as part of the
communication method for passing data from Open Firmware to a client
program (such as an operating system). An operating system used the
Device Tree to discover the topology of the hardware at runtime, and
thereby support a majority of available hardware without hard coded
information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms,
the Linux support for those architectures has for a long time used the
Device Tree.

In 2005, when PowerPC Linux began a major cleanup and started merging
32-bit and 64-bit support, the decision was made to require DT support
on all powerpc platforms, regardless of whether or not they used Open
Firmware. To do this, a DT representation called the Flattened Device
Tree (FDT) was created which could be passed to the kernel as a binary
blob without requiring a real Open Firmware implementation. U-Boot,
kexec, and other bootloaders were modified to support both passing a
Device Tree Binary (dtb) and modifying a dtb at boot time. DT was
also added to the PowerPC boot wrapper (``arch/powerpc/boot/*``) so that
a dtb could be wrapped up with the kernel image to support booting
existing non-DT-aware firmware.

Some time later, FDT infrastructure was generalized to be usable by
all architectures. At the time of this writing, 6 mainlined
architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
out of mainline (nios) have some level of DT support.

2. Data Model
-------------

If you haven't already read the Device Tree Usage\ [1]_ page,
then go read it now. It's okay, I'll wait....

2.1 High Level View
-------------------

The most important thing to understand is that the DT is simply a data
structure that describes the hardware. There is nothing magical about
it, and it doesn't magically make all hardware configuration problems
go away. What it does do is provide a language for decoupling the
hardware configuration from the board and device driver support in the
Linux kernel (or any other operating system for that matter). Using
it allows board and device support to become data driven; to make
setup decisions based on data passed into the kernel instead of on
per-machine hard coded selections.

Ideally, data driven platform setup should result in less code
duplication and make it easier to support a wide range of hardware
with a single kernel image.

Linux uses DT data for three major purposes:

1) platform identification,
2) runtime configuration, and
3) device population.

2.2 Platform Identification
---------------------------

First and foremost, the kernel will use data in the DT to identify the
specific machine. In a perfect world, the specific platform shouldn't
matter to the kernel because all platform details would be described
perfectly by the device tree in a consistent and reliable manner.
Hardware is not perfect though, and so the kernel must identify the
machine during early boot so that it has the opportunity to run
machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the
kernel will instead select setup code based on the machine's core
CPU or SoC. On ARM for example, setup_arch() in
arch/arm/kernel/setup.c will call setup_machine_fdt() in
arch/arm/kernel/devtree.c which searches through the machine_desc
table and selects the machine_desc which best matches the device tree
data. It determines the best match by looking at the 'compatible'
property in the root device tree node, and comparing it with the
dt_compat list in struct machine_desc (which is defined in
arch/arm/include/asm/mach/arch.h if you're curious).

The 'compatible' property contains a sorted list of strings starting
with the exact name of the machine, followed by an optional list of
boards it is compatible with sorted from most compatible to least. For
example, the root compatible properties for the TI BeagleBoard and its
successor, the BeagleBoard xM board might look like, respectively::

    compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
    compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Here "ti,omap3-beagleboard-xm" specifies the exact model, and the board
also claims that it is compatible with the OMAP 3450 SoC, and with the
omap3 family of SoCs in general. You'll notice that the list is sorted
from most specific (exact board) to least specific (SoC family).

Astute readers might point out that the Beagle xM could also claim
compatibility with the original Beagle board. However, one should be
cautioned about doing so at the board level since there is typically a
high level of change from one board to another, even within the same
product line, and it is hard to nail down exactly what is meant when one
board claims to be compatible with another. For the top level, it is
better to err on the side of caution and not claim one board is
compatible with another. The notable exception would be when one
board is a carrier for another, such as a CPU module attached to a
carrier board.

One more note on compatible values. Any string used in a compatible
property must be documented as to what it indicates. Add
documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if
any of the dt_compat list entries appear in the compatible property.
If one does, then that machine_desc is a candidate for driving the
machine. After searching the entire table of machine_descs,
setup_machine_fdt() returns the 'most compatible' machine_desc based
on which entry in the compatible property each machine_desc matches
against. If no matching machine_desc is found, then it returns NULL.

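
As a rough illustration of that selection rule, here is a minimal
userspace sketch in C. It is not the kernel's implementation:
``struct machine_desc`` is cut down to two fields, and ``match_score()``
and ``select_machine()`` are names made up for this example. The
assumption is that an earlier position in the root 'compatible' list
means a more specific match, so the machine_desc matching the earliest
entry wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the kernel's struct machine_desc. */
struct machine_desc {
    const char *name;
    const char *const *dt_compat;   /* NULL-terminated list of strings */
};

/*
 * Score one dt_compat list against the root 'compatible' list.
 * Returns the 1-based position of the earliest root entry that the
 * machine_desc claims, or 0 if nothing matches. Lower is better.
 */
unsigned int match_score(const char *const *root_compat,
                         const char *const *dt_compat)
{
    unsigned int best = 0;

    assert(root_compat && dt_compat);
    for (size_t i = 0; dt_compat[i]; i++) {
        for (size_t j = 0; root_compat[j]; j++) {
            if (strcmp(dt_compat[i], root_compat[j]) != 0)
                continue;
            if (best == 0 || j + 1 < best)
                best = (unsigned int)(j + 1);
            break;      /* later root entries can only score worse */
        }
    }
    return best;
}

/* Return the most compatible machine_desc, or NULL if none matches. */
const struct machine_desc *select_machine(const struct machine_desc *table,
                                          size_t count,
                                          const char *const *root_compat)
{
    const struct machine_desc *best = NULL;
    unsigned int best_score = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned int score = match_score(root_compat, table[i].dt_compat);

        /* Keep the candidate that matches the earliest root entry. */
        if (score && (best_score == 0 || score < best_score)) {
            best_score = score;
            best = &table[i];
        }
    }
    return best;
}
```

With the BeagleBoard properties shown earlier, a table holding a generic
machine_desc that lists "ti,omap3" and a second one that lists only
"ti,omap3-beagleboard" would select the second for the original
BeagleBoard and the generic one for the xM.
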
The reasoning behind this scheme is the observation that in the majority
of cases, a single machine_desc can support a large number of boards
if they all use the same SoC, or same family of SoCs. However,
invariably there will be some exceptions where a specific board will
require special setup code that is not useful in the generic case.
Special cases could be handled by explicitly checking for the
troublesome board(s) in generic setup code, but doing so very quickly
becomes ugly and/or unmaintainable if it is more than just a couple of
cases.

Instead, the compatible list allows a generic machine_desc to provide
support for a wide common set of boards by specifying "less
compatible" values in the dt_compat list. In the example above,
generic board support can claim compatibility with "ti,omap3" or
"ti,omap3450". If a bug was discovered on the original beagleboard
that required special workaround code during early boot, then a new
machine_desc could be added which implements the workarounds and only
matches on "ti,omap3-beagleboard".

PowerPC uses a slightly different scheme where it calls the .probe()
hook from each machine_desc, and the first one returning TRUE is used.
However, this approach does not take into account the priority of the
compatible list, and probably should be avoided for new architecture
support.

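
To see why the first-match scheme ignores priority, consider this small
userspace sketch in C. Everything in it is hypothetical: ``struct
probe_desc``, the two probe functions and ``first_probe_match()`` are
invented for illustration, and the OMAP3 strings are borrowed from the
example above only to keep the comparison simple; this is not PowerPC
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical machine description that carries a probe hook. */
struct probe_desc {
    const char *name;
    bool (*probe)(const char *const *root_compat);
};

/* True if 'wanted' appears anywhere in the NULL-terminated list. */
bool compat_contains(const char *const *root_compat, const char *wanted)
{
    assert(root_compat && wanted);
    for (size_t i = 0; root_compat[i]; i++)
        if (strcmp(root_compat[i], wanted) == 0)
            return true;
    return false;
}

/* Generic support: accepts any board in the SoC family. */
bool probe_generic_omap3(const char *const *root_compat)
{
    return compat_contains(root_compat, "ti,omap3");
}

/* Board-specific support: accepts only the original BeagleBoard. */
bool probe_beagleboard(const char *const *root_compat)
{
    return compat_contains(root_compat, "ti,omap3-beagleboard");
}

/* The first probe that returns true wins; table order decides. */
const struct probe_desc *first_probe_match(const struct probe_desc *table,
                                           size_t count,
                                           const char *const *root_compat)
{
    for (size_t i = 0; i < count; i++)
        if (table[i].probe(root_compat))
            return &table[i];
    return NULL;
}
```

Given a table that lists the generic probe before the board-specific
one, the original BeagleBoard's compatible list selects the generic
entry, because it is the first to return true. Swapping the two entries
changes the result, which is the ordering dependence that a
priority-based match avoids.
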
2.3 Runtime configuration
-------------------------

In most cases, a DT will be the sole method of communicating data from
firmware to the kernel, so it also gets used to pass in runtime and
configuration data like the kernel parameters string and the location
of an initrd image.

Most of this data is contained in the /chosen node, and when booting
Linux it will look something like this::

    chosen {
        bootargs = "console=ttyS0,115200 loglevel=8";
        initrd-start = <0xc8000000>;
        initrd-end = <0xc8200000>;
    };

The bootargs property contains the kernel arguments, and the initrd-*
properties define the address and size of an initrd blob. Note that
initrd-end is the first address after the initrd image, so this doesn't
match the usual semantic of struct resource. The chosen node may also
optionally contain an arbitrary number of additional properties for
platform-specific configuration data.

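
To make the end-exclusive convention concrete, the following small C
sketch derives the initrd size and its last byte from the two
properties. The function names are made up for this example; they are
not kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * initrd-end is exclusive (the first address after the image), so
 * the size is a plain difference with no "+ 1".
 */
uint64_t initrd_size(uint64_t initrd_start, uint64_t initrd_end)
{
    assert(initrd_end >= initrd_start);
    return initrd_end - initrd_start;
}

/*
 * Last byte occupied by the image. This is the inclusive value that
 * struct resource stores in its 'end' field.
 */
uint64_t initrd_last_byte(uint64_t initrd_start, uint64_t initrd_end)
{
    assert(initrd_end > initrd_start);
    return initrd_end - 1;
}
```

For the /chosen node above, the size works out to
0xc8200000 - 0xc8000000 = 0x200000 bytes (2 MiB), and the last byte of
the image is at 0xc81fffff.
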
During early boot, the architecture setup code calls of_scan_flat_dt()
several times with different helper callbacks to parse device tree
data before paging is set up. The of_scan_flat_dt() code scans through
the device tree and uses the helpers to extract information required
during early boot. Typically the early_init_dt_scan_chosen() helper
is used to parse the chosen node including kernel parameters,
early_init_dt_scan_root() to initialize the DT address space model,
and early_init_dt_scan_memory() to determine the size and
location of usable RAM.

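
The scan-with-callback pattern can be sketched in userspace C as
follows. This is a toy model under stated assumptions, not the kernel
code: ``struct flat_node`` replaces the real flattened blob,
``scan_flat_nodes()`` stands in for of_scan_flat_dt(), and
``scan_chosen()`` is only written in the spirit of
early_init_dt_scan_chosen(). The sketch assumes that the scan stops as
soon as a callback returns non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one node of the flattened tree. */
struct flat_node {
    const char *uname;      /* unit name, e.g. "chosen" */
    int depth;              /* 0 for the root node */
    const char *bootargs;   /* NULL if the node has no bootargs */
};

typedef int (*scan_cb)(const struct flat_node *node, void *data);

/*
 * Call 'cb' for each node in order. Stop as soon as a callback
 * returns non-zero and hand that value back to the caller.
 */
int scan_flat_nodes(const struct flat_node *nodes, size_t count,
                    scan_cb cb, void *data)
{
    int rc = 0;

    assert(cb);
    for (size_t i = 0; i < count && !rc; i++)
        rc = cb(&nodes[i], data);
    return rc;
}

/* Helper callback: pick the kernel command line out of /chosen. */
int scan_chosen(const struct flat_node *node, void *data)
{
    const char **cmdline = data;

    if (node->depth != 1 || strcmp(node->uname, "chosen") != 0)
        return 0;           /* not /chosen, keep scanning */
    if (node->bootargs)
        *cmdline = node->bootargs;
    return 1;               /* found /chosen, stop the scan */
}
```

Each early-boot concern gets its own small callback, and the same
scanning loop is reused for all of them, which is why the setup code
can call the scanner several times with different helpers.
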
On ARM, the function setup_machine_fdt() is responsible for early
scanning of the device tree after selecting the correct machine_desc
that supports the board.

2.4 Device population
---------------------

After the board has been identified, and after the early configuration data
has been parsed, then kernel initialization can proceed in the normal
way. At some point in this process, unflatten_device_tree() is called
to convert the data into a more efficient runtime representation.
This is also when machine-specific setup hooks will get called, like
the machine_desc .init_early(), .init_irq() and .init_machine() hooks
on ARM. The remainder of this section uses examples from the ARM
implementation, but all architectures will do pretty much the same
thing when using a DT.

As can be guessed by the names, .init_early() is used for any
machine-specific setup that needs to be executed early in the boot
process, and .init_irq() is used to set up interrupt handling. Using a DT
doesn't materially change the behaviour of either of these functions.
If a DT is provided, then both .init_early() and .init_irq() are able
to call any of the DT query functions (of_* in include/linux/of*.h) to
get additional data about the platform.

The most interesting hook in the DT context is .init_machine() which
is primarily responsible for populating the Linux device model with
data about the platform. Historically this has been implemented on
embedded platforms by defining a set of static clock structures,
platform_devices, and other data in the board support .c file, and
registering it en-masse in .init_machine(). When DT is used, then
instead of hard coding static devices for each platform, the list of
devices can be obtained by parsing the DT, and allocating device
structures dynamically.

The simplest case is when .init_machine() is only responsible for
registering a block of platform_devices. A platform_device is a concept
used by Linux for memory or I/O mapped devices which cannot be detected
by hardware, and for 'composite' or 'virtual' devices (more on those
later). While there is no 'platform device' terminology for the DT,
platform devices roughly correspond to device nodes at the root of the
tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example. Here is part of the
device tree for the NVIDIA Tegra board::

    /{
        compatible = "nvidia,harmony", "nvidia,tegra20";
        #address-cells = <1>;
        #size-cells = <1>;
        interrupt-parent = <&intc>;

        chosen { };
        aliases { };

        memory {
            device_type = "memory";
            reg = <0x00000000 0x40000000>;
        };

        soc {
            compatible = "nvidia,tegra20-soc", "simple-bus";
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;

            intc: interrupt-controller@50041000 {
                compatible = "nvidia,tegra20-gic";
                interrupt-controller;
                #interrupt-cells = <1>;
                reg = <0x50041000 0x1000>, <0x50040100 0x0100>;
            };

            serial@70006300 {
                compatible = "nvidia,tegra20-uart";
                reg = <0x70006300 0x100>;
                interrupts = <122>;
            };

            i2s1: i2s@70002800 {
                compatible = "nvidia,tegra20-i2s";
                reg = <0x70002800 0x100>;
                interrupts = <77>;
                codec = <&wm8903>;
            };

            i2c@7000c000 {
                compatible = "nvidia,tegra20-i2c";
                #address-cells = <1>;
                #size-cells = <0>;
                reg = <0x7000c000 0x100>;
                interrupts = <70>;

                wm8903: codec@1a {
                    compatible = "wlf,wm8903";
                    reg = <0x1a>;
                    interrupts = <347>;
                };
            };
        };

        sound {
            compatible = "nvidia,harmony-sound";
            i2s-controller = <&i2s1>;
            i2s-codec = <&wm8903>;
        };
    };

At .init_machine() time, Tegra board support code will need to look at
this DT and decide which nodes to create platform_devices for.
However, looking at the tree, it is not immediately obvious what kind
of device each node represents, or even if a node represents a device
at all. The /chosen, /aliases, and /memory nodes are informational
nodes that don't describe devices (although arguably memory could be
considered a device). The children of the /soc node are memory mapped
devices, but the codec@1a is an i2c device, and the sound node
represents not a device, but rather how other devices are connected
together to create the audio subsystem. I know what each device is
because I'm familiar with the board design, but how does the kernel
know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks
for nodes that have a 'compatible' property. First, it is generally
assumed that any node with a 'compatible' property represents a device
of some kind, and second, it can be assumed that any node at the root
of the tree is either directly attached to the processor bus, or is a
miscellaneous system device that cannot be described any other way.
For each of these nodes, Linux allocates and registers a
platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption?
Well, for the way that Linux models devices, just about all bus_types
assume that their devices are children of a bus controller. For
example, each i2c_client is a child of an i2c_master. Each spi_device
is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The
same hierarchy is also found in the DT, where I2C device nodes only
ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB,
etc. The only devices which do not require a specific type of parent
device are platform_devices (and amba_devices, but more on that
later), which will happily live at the base of the Linux /sys/devices
tree. Therefore, if a DT node is at the root of the tree, then it
is probably best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL)
to kick off discovery of devices at the root of the tree. The
parameters are all NULL because when starting from the root of the
tree, there is no need to provide a starting node (the first NULL), a
parent struct device (the last NULL), and we're not using a match
table (yet). For a board that only needs to register devices,
.init_machine() can be completely empty except for the
of_platform_populate() call.

In the Tegra example, this accounts for the /soc and /sound nodes, but
what about the children of the SoC node? Shouldn't they be registered
as platform devices too? For Linux DT support, the generic behaviour
is for child devices to be registered by the parent's device driver at
driver .probe() time. So, an i2c bus device driver will register an
i2c_client for each child node, an SPI bus driver will register
its spi_device children, and similarly for other bus_types.
According to that model, a driver could be written that binds to the
SoC node and simply registers platform_devices for each of its
children. The board support code would allocate and register an SoC
device, a (theoretical) SoC device driver could bind to the SoC device,
and register platform_devices for /soc/interrupt-controller, /soc/serial,
/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right?

Actually, it turns out that registering children of some
platform_devices as more platform_devices is a common pattern, and the
device tree support code reflects that and makes the above example
simpler. The second argument to of_platform_populate() is an
of_device_id table, and any node that matches an entry in that table
will also get its child nodes registered. In the Tegra case, the code
can look something like this::

    static void __init harmony_init_machine(void)
    {
        /* ... */
        of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
    }

"simple-bus" is defined in the Devicetree Specification as a
'compatible' value meaning a simple memory mapped bus, so the
of_platform_populate() code could be written to just assume simple-bus
compatible nodes will always be traversed. However, we pass it in as
an argument so that board support code can always override the default
behaviour.

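
The traversal rule can be modelled in a few lines of userspace C. This
is only a sketch of the idea, not of_platform_populate() itself:
``struct node``, ``struct registry``, ``matches_bus_table()`` and
``populate()`` are all invented for this example, and "creating a
device" is reduced to recording the node name. The assumptions are
that a child without a 'compatible' property is skipped, and that
children of a node are visited only when that node matches the bus
table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPAT   4
#define MAX_CHILDREN 8
#define MAX_DEVICES  16

/* Hypothetical, much simplified device node. */
struct node {
    const char *name;
    const char *compatible[MAX_COMPAT];         /* NULL-terminated */
    const struct node *children[MAX_CHILDREN];  /* NULL-terminated */
};

/* Stand-in for the device model: just a list of created names. */
struct registry {
    const char *names[MAX_DEVICES];
    size_t count;
};

/* True if any 'compatible' string of np appears in bus_matches. */
bool matches_bus_table(const struct node *np, const char *const *bus_matches)
{
    if (!bus_matches)
        return false;       /* no table: never descend */
    for (size_t i = 0; i < MAX_COMPAT && np->compatible[i]; i++)
        for (size_t j = 0; bus_matches[j]; j++)
            if (strcmp(np->compatible[i], bus_matches[j]) == 0)
                return true;
    return false;
}

/*
 * Create a device for every child of 'parent' that has a
 * 'compatible' property, and descend into children that match
 * the bus table.
 */
void populate(const struct node *parent, const char *const *bus_matches,
              struct registry *reg)
{
    assert(parent && reg);
    for (size_t i = 0; i < MAX_CHILDREN && parent->children[i]; i++) {
        const struct node *child = parent->children[i];

        if (!child->compatible[0])
            continue;       /* e.g. chosen, aliases, memory */
        if (reg->count < MAX_DEVICES)
            reg->names[reg->count++] = child->name;
        if (matches_bus_table(child, bus_matches))
            populate(child, bus_matches, reg);
    }
}
```

Fed a model of the Tegra tree above with no match table, the sketch
creates devices for soc and sound only. With a table containing
"simple-bus" it also creates the four children of /soc, but still
leaves codec@1a to the i2c bus driver.
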
[Need to add discussion of adding i2c/spi/etc child devices]

Appendix A: AMBA devices
------------------------

ARM Primecells are a certain kind of device attached to the ARM AMBA
bus which include some support for hardware detection and power
management. In Linux, struct amba_device and the amba_bus_type are
used to represent Primecell devices. However, the fiddly bit is that
not all devices on an AMBA bus are Primecells, and for Linux it is
typical for both amba_device and platform_device instances to be
siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device. This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
be too invasive. If a node is compatible with "arm,primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.

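
That decision amounts to a single check on the node's 'compatible'
list, roughly as in the C sketch below. The enum and
``classify_node()`` are invented for this example, and the Primecell
marker string is passed in as a parameter rather than hard coded, so
the sketch makes no claim about how the kernel spells the check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the two kinds of device Linux can create. */
enum device_kind {
    DEVICE_PLATFORM,
    DEVICE_AMBA,
};

/*
 * Decide how to register a node: as an amba_device if its
 * NULL-terminated 'compatible' list contains the Primecell marker
 * string, otherwise as a platform_device.
 */
enum device_kind classify_node(const char *const *compatible,
                               const char *primecell_compat)
{
    assert(compatible && primecell_compat);
    for (size_t i = 0; compatible[i]; i++)
        if (strcmp(compatible[i], primecell_compat) == 0)
            return DEVICE_AMBA;
    return DEVICE_PLATFORM;
}
```

Because the marker is just one more entry in the 'compatible' list,
Primecell and non-Primecell nodes can sit side by side under the same
bus node and still end up on the right bus_type.
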