.. SPDX-License-Identifier: GPL-2.0

=======================
STM32 DMA-MDMA chaining
=======================

Introduction
------------

This document describes the STM32 DMA-MDMA chaining feature. Before going
further, let's introduce the peripherals involved.

To offload data transfers from the CPU, STM32 microprocessors (MPUs) embed
direct memory access (DMA) controllers.

STM32MP1 SoCs embed both STM32 DMA and STM32 MDMA controllers. STM32 DMA
request routing capabilities are enhanced by a DMA request multiplexer
(STM32 DMAMUX).

**STM32 DMAMUX**

STM32 DMAMUX routes any DMA request from a given peripheral to any channel of
any STM32 DMA controller (STM32MP1 has two STM32 DMA controllers).

**STM32 DMA**

STM32 DMA is mainly used to implement central data buffer storage (usually in
the system SRAM) for different peripherals. It can access external RAMs, but
it cannot generate the burst transfers needed to make the best use of the AXI
bus.

**STM32 MDMA**

STM32 MDMA (Master DMA) is mainly used to manage direct data transfers between
RAM data buffers without CPU intervention. It can also be used in a
hierarchical structure in which STM32 DMA acts as a first-level data buffer
interface for AHB peripherals, while STM32 MDMA acts as a second-level DMA
with better performance. As an AXI/AHB master, STM32 MDMA can take control of
the AXI/AHB bus.

Principles
----------

The STM32 DMA-MDMA chaining feature relies on the strengths of the STM32 DMA
and STM32 MDMA controllers.

STM32 DMA has a circular Double Buffer Mode (DBM). At each end of transaction
(when the DMA data counter, DMA_SxNDTR, reaches 0), the memory pointers
(configured with DMA_SxM0AR and DMA_SxM1AR) are swapped and the DMA data
counter is automatically reloaded. This allows the software or the STM32 MDMA
to process one memory area while the second memory area is being filled/used
by the STM32 DMA transfer.

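
The Double Buffer Mode behaviour described above can be pictured with a small
software model. This is only a sketch in plain user-space C, not driver code:
the structure and function names (``dbm_model``, ``dbm_transfer_item()``, ...)
are hypothetical, and the pointer swap is modelled by toggling a "current
target" index.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one STM32 DMA channel in circular Double Buffer Mode. */
struct dbm_model {
    uint32_t m0ar;    /* models DMA_SxM0AR */
    uint32_t m1ar;    /* models DMA_SxM1AR */
    uint32_t reload;  /* value reloaded into the data counter */
    uint32_t ndtr;    /* models DMA_SxNDTR */
    unsigned int ct;  /* memory area currently used by the DMA: 0 or 1 */
};

static inline void dbm_init(struct dbm_model *d, uint32_t m0ar, uint32_t m1ar,
                            uint32_t count)
{
    assert(count > 0);
    d->m0ar = m0ar;
    d->m1ar = m1ar;
    d->reload = count;
    d->ndtr = count;
    d->ct = 0;
}

/* Memory area the STM32 DMA is currently filling/using. */
static inline uint32_t dbm_dma_area(const struct dbm_model *d)
{
    return d->ct ? d->m1ar : d->m0ar;
}

/* Memory area left to the software or to the STM32 MDMA. */
static inline uint32_t dbm_free_area(const struct dbm_model *d)
{
    return d->ct ? d->m0ar : d->m1ar;
}

/* Transfer one data item; returns 1 when an end of transaction occurred. */
static inline int dbm_transfer_item(struct dbm_model *d)
{
    d->ndtr--;
    if (d->ndtr)
        return 0;

    d->ct ^= 1u;          /* the two memory areas swap roles */
    d->ndtr = d->reload;  /* the data counter is automatically reloaded */
    return 1;
}
```

In this model, the area returned by ``dbm_free_area()`` right after an end of
transaction is the one that has just been filled, which is the area the STM32
MDMA would process next.
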
With STM32 MDMA linked-list mode, a single request starts the transfer of a
data array (a collection of nodes), which continues until the linked-list
pointer for the channel is null. The channel transfer complete of the last
node marks the end of the transfer, unless the first and last nodes are linked
to each other; in that case, the linked list loops to create a circular MDMA
transfer.

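
As an illustration of the linked-list principle (and not of the actual MDMA
hardware descriptor format), the walk can be sketched in user-space C. The
``ll_node`` structure and ``ll_run()`` helper are hypothetical; ``max_nodes``
exists only so that the model terminates on a circular list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node: a length and a link to the next node. */
struct ll_node {
    size_t len;
    const struct ll_node *next;
};

/*
 * Walk the list starting at 'first' and return the number of bytes moved.
 * The walk stops on a null link (end of transfer) or after 'max_nodes'
 * nodes, which bounds the model when the list is circular.
 */
static inline size_t ll_run(const struct ll_node *first, size_t max_nodes)
{
    const struct ll_node *n;
    size_t total = 0;

    assert(first != NULL);
    for (n = first; n != NULL && max_nodes > 0; n = n->next, max_nodes--)
        total += n->len;

    return total;
}
```

With three 128-byte nodes ending on a null link, the walk stops after 384
bytes. If the last node links back to the first one, it only stops because of
the ``max_nodes`` bound.
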
STM32 MDMA has direct connections with STM32 DMA. This enables autonomous
communication and synchronization between peripherals, thus saving CPU
resources and reducing bus congestion. The Transfer Complete signal of a STM32
DMA channel can trigger a STM32 MDMA transfer. STM32 MDMA can clear the
request generated by the STM32 DMA by writing to its Interrupt Clear register
(whose address is stored in MDMA_CxMAR, and bit mask in MDMA_CxMDR).

.. table:: STM32 MDMA interconnect table with STM32 DMA

    +--------------+----------------+-----------+------------+
    | STM32 DMAMUX | STM32 DMA      | STM32 DMA | STM32 MDMA |
    | channels     | channels       | Transfer  | request    |
    |              |                | complete  |            |
    |              |                | signal    |            |
    +==============+================+===========+============+
    | Channel *0*  | DMA1 channel 0 | dma1_tcf0 | *0x00*     |
    +--------------+----------------+-----------+------------+
    | Channel *1*  | DMA1 channel 1 | dma1_tcf1 | *0x01*     |
    +--------------+----------------+-----------+------------+
    | Channel *2*  | DMA1 channel 2 | dma1_tcf2 | *0x02*     |
    +--------------+----------------+-----------+------------+
    | Channel *3*  | DMA1 channel 3 | dma1_tcf3 | *0x03*     |
    +--------------+----------------+-----------+------------+
    | Channel *4*  | DMA1 channel 4 | dma1_tcf4 | *0x04*     |
    +--------------+----------------+-----------+------------+
    | Channel *5*  | DMA1 channel 5 | dma1_tcf5 | *0x05*     |
    +--------------+----------------+-----------+------------+
    | Channel *6*  | DMA1 channel 6 | dma1_tcf6 | *0x06*     |
    +--------------+----------------+-----------+------------+
    | Channel *7*  | DMA1 channel 7 | dma1_tcf7 | *0x07*     |
    +--------------+----------------+-----------+------------+
    | Channel *8*  | DMA2 channel 0 | dma2_tcf0 | *0x08*     |
    +--------------+----------------+-----------+------------+
    | Channel *9*  | DMA2 channel 1 | dma2_tcf1 | *0x09*     |
    +--------------+----------------+-----------+------------+
    | Channel *10* | DMA2 channel 2 | dma2_tcf2 | *0x0A*     |
    +--------------+----------------+-----------+------------+
    | Channel *11* | DMA2 channel 3 | dma2_tcf3 | *0x0B*     |
    +--------------+----------------+-----------+------------+
    | Channel *12* | DMA2 channel 4 | dma2_tcf4 | *0x0C*     |
    +--------------+----------------+-----------+------------+
    | Channel *13* | DMA2 channel 5 | dma2_tcf5 | *0x0D*     |
    +--------------+----------------+-----------+------------+
    | Channel *14* | DMA2 channel 6 | dma2_tcf6 | *0x0E*     |
    +--------------+----------------+-----------+------------+
    | Channel *15* | DMA2 channel 7 | dma2_tcf7 | *0x0F*     |
    +--------------+----------------+-----------+------------+

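
The interconnect table follows a regular pattern: each STM32 DMA controller
contributes eight channels, and the STM32 MDMA request number equals the
STM32 DMAMUX channel number. A minimal sketch of that mapping in plain C,
using a hypothetical helper name that is not part of any driver:

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: returns the STM32 MDMA
 * request wired to the Transfer Complete signal of a STM32 DMA channel,
 * following the table (DMA1 -> 0x00..0x07, DMA2 -> 0x08..0x0F).
 * 'controller' is 1 or 2, 'channel' is 0..7.
 */
static inline unsigned int stm32_dma_to_mdma_request(unsigned int controller,
                                                     unsigned int channel)
{
    assert(controller >= 1 && controller <= 2);
    assert(channel <= 7);

    return (controller - 1) * 8 + channel;
}
```

For instance, DMA2 channel 3 (signal dma2_tcf3) maps to STM32 MDMA request
0x0B.
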
The STM32 DMA-MDMA chaining feature then uses an SRAM buffer. STM32MP1 SoCs
embed three fast-access static internal RAMs of various sizes, used for data
storage. Due to the STM32 DMA legacy (within microcontrollers), STM32 DMA
performance is poor with DDR, while it is optimal with SRAM. Hence the SRAM
buffer used between STM32 DMA and STM32 MDMA. This buffer is split into two
equal periods: STM32 DMA uses one period while STM32 MDMA simultaneously uses
the other.

::

                        dma[1:2]-tcf[0:7]
                       .----------------.
         ____________ '    _________     V____________
        | STM32 DMA  |    /  __|>_  \    | STM32 MDMA |
        |------------|   |  /     \  |   |------------|
        | DMA_SxM0AR |<=>| | SRAM  | |<=>| []-[]...[] |
        | DMA_SxM1AR |   |  \_____/  |   |            |
        |____________|    \___<|____/    |____________|

STM32 DMA-MDMA chaining uses (struct dma_slave_config).peripheral_config to
exchange the parameters needed to configure MDMA. These parameters are
gathered into a u32 array with three values:

* the STM32 MDMA request (which is actually the DMAMUX channel ID),
* the address of the STM32 DMA register to clear the Transfer Complete
  interrupt flag,
* the mask of the Transfer Complete interrupt flag of the STM32 DMA channel.

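
To make the three values more concrete, here is a user-space sketch that
computes them for a given STM32 DMA channel. Everything in it is an
assumption made for illustration: the structure and function names are
hypothetical, and the register offsets (0x08 and 0x0C for the low and high
interrupt flag clear registers) and flag positions reflect the usual STM32
DMA register layout, so check them against the reference manual before
relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed STM32 DMA interrupt flag clear register layout. */
#define DMA_LIFCR  0x08u       /* clears the flags of channels 0..3 */
#define DMA_HIFCR  0x0Cu       /* clears the flags of channels 4..7 */
#define DMA_TCIF   (1u << 5)   /* Transfer Complete bit within a flag group */

/* Hypothetical view of the three u32 values, in the order listed above. */
struct chaining_params {
    uint32_t mdma_request;  /* STM32 MDMA request = DMAMUX channel ID */
    uint32_t ifcr_addr;     /* address of the flag clear register */
    uint32_t tcf_mask;      /* Transfer Complete flag mask of the channel */
};

static inline struct chaining_params
chaining_params_for(uint32_t dma_base, unsigned int dmamux_chan,
                    unsigned int dma_chan)
{
    /* The four flag groups of a register start at bits 0, 6, 16 and 22. */
    static const unsigned int shift[4] = { 0, 6, 16, 22 };
    struct chaining_params p;

    assert(dma_chan < 8);
    p.mdma_request = dmamux_chan;
    p.ifcr_addr = dma_base + (dma_chan < 4 ? DMA_LIFCR : DMA_HIFCR);
    p.tcf_mask = DMA_TCIF << shift[dma_chan & 3];

    return p;
}
```

In a real driver these values are not computed by the client: they are
provided by the STM32 DMA driver through
(struct dma_slave_config).peripheral_config.
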
Device Tree updates for STM32 DMA-MDMA chaining support
-------------------------------------------------------

**1. Allocate a SRAM buffer**

The SRAM device tree node is defined in the SoC device tree. You can refer to
it in your board device tree to define your SRAM pool.

::

    &sram {
            my_foo_device_dma_pool: dma-sram@0 {
                    reg = <0x0 0x1000>;
            };
    };

Be careful of the start index, in case there are other SRAM consumers.

Define your pool size strategically: to optimise chaining, the idea is that
STM32 DMA and STM32 MDMA can work simultaneously, each on one buffer of the
SRAM.

If the SRAM period is greater than the expected DMA transfer, then STM32 DMA
and STM32 MDMA will work sequentially instead of simultaneously. It is not a
functional issue but it is not optimal.

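
The sizing rule above comes down to simple arithmetic. The sketch below is
only one reading of it, with hypothetical helper names: the period is half of
the pool, and the two controllers can only overlap when the expected transfer
spans more than one period.

```c
#include <assert.h>
#include <stddef.h>

/* Length of one SRAM period: the pool is split in two equal halves. */
static inline size_t sram_period_len(size_t pool_size)
{
    assert(pool_size >= 2 && pool_size % 2 == 0);
    return pool_size / 2;
}

/* Number of SRAM periods a transfer of 'xfer_len' bytes goes through. */
static inline size_t sram_periods_needed(size_t xfer_len, size_t pool_size)
{
    size_t period = sram_period_len(pool_size);

    return (xfer_len + period - 1) / period;
}

/*
 * Non-zero when the transfer needs more than one period, i.e. when STM32 DMA
 * can work on one period while STM32 MDMA works on the other.
 */
static inline int chaining_can_overlap(size_t xfer_len, size_t pool_size)
{
    return sram_periods_needed(xfer_len, pool_size) > 1;
}
```

With the 0x1000-byte pool of the example above, the period is 0x800 bytes: a
0x2000-byte transfer goes through four periods and can overlap, whereas a
0x400-byte transfer fits in a single period and is handled sequentially.
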
Don't forget to refer to your SRAM pool in your device node. You need to
define a new property.

::

    &my_foo_device {
            ...
            my_dma_pool = &my_foo_device_dma_pool;
    };

Then get this SRAM pool in your foo driver and allocate your SRAM buffer.

**2. Allocate a STM32 DMA channel and a STM32 MDMA channel**

You need to define an extra channel in your device tree node, in addition to
the one you should already have for "classic" DMA operation.

This new channel must be taken from the STM32 MDMA channels, so the phandle of
the DMA controller to use is that of the MDMA controller.

::

    &my_foo_device {
            [...]
            my_dma_pool = &my_foo_device_dma_pool;
            dmas = <&dmamux1 ...>,                // STM32 DMA channel
                   <&mdma1 0 0x3 0x1200000a 0 0>; // + STM32 MDMA channel
    };

Concerning STM32 MDMA bindings:

1. The request line number: whatever the value here, it will be overwritten
   by the MDMA driver with the STM32 DMAMUX channel ID passed through
   (struct dma_slave_config).peripheral_config

2. The priority level: choose Very High (0x3) so that your channel will
   take priority over the others during request arbitration

3. A 32-bit mask specifying the DMA channel configuration: source and
   destination address increment, block transfer with 128 bytes per single
   transfer

4. The 32-bit value specifying the register to be used to acknowledge the
   request: it will be overwritten by the MDMA driver, with the DMA channel
   interrupt flag clear register address passed through
   (struct dma_slave_config).peripheral_config

5. The 32-bit mask specifying the value to be written to acknowledge the
   request: it will be overwritten by the MDMA driver, with the DMA channel
   Transfer Complete flag passed through
   (struct dma_slave_config).peripheral_config

Driver updates for STM32 DMA-MDMA chaining support in foo driver
----------------------------------------------------------------

**0. (optional) Refactor the original sg_table if dmaengine_prep_slave_sg()**

In case of dmaengine_prep_slave_sg(), the original sg_table can't be used as
is. Two new sg_tables must be created from the original one: one for the
STM32 DMA transfer (where the memory address now targets the SRAM buffer
instead of the DDR buffer) and one for the STM32 MDMA transfer (where the
memory address targets the DDR buffer).

The new sg_list items must fit the SRAM period length. Here is an example for
DMA_DEV_TO_MEM:

::

    /*
     * Assuming sgl and nents, respectively the initial scatterlist and its
     * length.
     * Assuming sram_dma_buf and sram_period, respectively the memory
     * allocated from the pool for DMA usage, and the length of the period,
     * which is half of the sram_dma_buf size.
     */
    struct sg_table new_dma_sgt, new_mdma_sgt;
    struct scatterlist *s, *_sgl;
    dma_addr_t ddr_dma_buf;
    u32 new_nents = 0, len;
    int i, ret;

    /* Count the number of entries needed */
    for_each_sg(sgl, s, nents, i)
            if (sg_dma_len(s) > sram_period)
                    new_nents += DIV_ROUND_UP(sg_dma_len(s), sram_period);
            else
                    new_nents++;

    /* Create sg table for STM32 DMA channel */
    ret = sg_alloc_table(&new_dma_sgt, new_nents, GFP_ATOMIC);
    if (ret) {
            dev_err(dev, "DMA sg table alloc failed\n");
            return ret;
    }

    /* Walk the original list so that each new item gets the right length */
    _sgl = sgl;
    len = sg_dma_len(_sgl);
    for_each_sg(new_dma_sgt.sgl, s, new_dma_sgt.nents, i) {
            u32 bytes = min_t(u32, len, sram_period);

            sg_dma_len(s) = bytes;
            /* Targets the beginning = first half of the sram_dma_buf */
            sg_dma_address(s) = sram_dma_buf;
            /*
             * Targets the second half of the sram_dma_buf
             * for odd indexes of the item of the sg_list
             */
            if (i & 1)
                    sg_dma_address(s) += sram_period;

            /* Move to the next original item once this one is consumed */
            len -= bytes;
            if (!len && sg_next(_sgl)) {
                    _sgl = sg_next(_sgl);
                    len = sg_dma_len(_sgl);
            }
    }

    /* Create sg table for STM32 MDMA channel */
    ret = sg_alloc_table(&new_mdma_sgt, new_nents, GFP_ATOMIC);
    if (ret) {
            dev_err(dev, "MDMA sg_table alloc failed\n");
            sg_free_table(&new_dma_sgt);
            return ret;
    }

    _sgl = sgl;
    len = sg_dma_len(sgl);
    ddr_dma_buf = sg_dma_address(sgl);
    for_each_sg(new_mdma_sgt.sgl, s, new_mdma_sgt.nents, i) {
            size_t bytes = min_t(size_t, len, sram_period);

            sg_dma_len(s) = bytes;
            sg_dma_address(s) = ddr_dma_buf;
            len -= bytes;

            if (!len && sg_next(_sgl)) {
                    /* Current original item consumed, move to the next one */
                    _sgl = sg_next(_sgl);
                    len = sg_dma_len(_sgl);
                    ddr_dma_buf = sg_dma_address(_sgl);
            } else {
                    ddr_dma_buf += bytes;
            }
    }

Don't forget to release these new sg_tables after getting the descriptors
with dmaengine_prep_slave_sg().

**1. Set controller specific parameters**

First, use dmaengine_slave_config() with a struct dma_slave_config to
configure the STM32 DMA channel. You just have to take care of the DMA
addresses: the memory address (depending on the transfer direction) must point
to your SRAM buffer. Also set (struct dma_slave_config).peripheral_size != 0.

The STM32 DMA driver checks (struct dma_slave_config).peripheral_size to
determine whether chaining is being used. If it is, the STM32 DMA driver fills
(struct dma_slave_config).peripheral_config with an array of three u32: the
first one contains the STM32 DMAMUX channel ID, the second one the channel
interrupt flag clear register address, and the third one the channel Transfer
Complete flag mask.

Then, use dmaengine_slave_config() with another struct dma_slave_config to
configure the STM32 MDMA channel. Take care of the DMA addresses: the device
address (depending on the transfer direction) must point to your SRAM buffer,
and the memory address must point to the buffer originally used for "classic"
DMA operation. Copy the (struct dma_slave_config).peripheral_size and
.peripheral_config values that have been updated by the STM32 DMA driver into
the struct dma_slave_config used to configure the STM32 MDMA channel.

::

    struct dma_slave_config dma_conf;
    struct dma_slave_config mdma_conf;

    memset(&dma_conf, 0, sizeof(dma_conf));
    [...]
    dma_conf.direction = DMA_DEV_TO_MEM;
    dma_conf.dst_addr = sram_dma_buf;        // SRAM buffer
    dma_conf.peripheral_size = 1;            // peripheral_size != 0 => chaining
    dmaengine_slave_config(dma_chan, &dma_conf);

    memset(&mdma_conf, 0, sizeof(mdma_conf));
    mdma_conf.direction = DMA_DEV_TO_MEM;
    mdma_conf.src_addr = sram_dma_buf;       // SRAM buffer
    mdma_conf.dst_addr = rx_dma_buf;         // original memory buffer
    mdma_conf.peripheral_size = dma_conf.peripheral_size;     // <- dma_conf
    mdma_conf.peripheral_config = dma_conf.peripheral_config; // <- dma_conf
    dmaengine_slave_config(mdma_chan, &mdma_conf);

**2. Get a descriptor for STM32 DMA channel transaction**

Get your descriptor the same way you do for your "classic" DMA operation:
just replace the original sg_list (in case of dmaengine_prep_slave_sg()) with
the new sg_list using the SRAM buffer, or replace the original buffer address,
length and period (in case of dmaengine_prep_dma_cyclic()) with those of the
new SRAM buffer.

**3. Get a descriptor for STM32 MDMA channel transaction**

If you previously got the descriptor (for STM32 DMA) with

* dmaengine_prep_slave_sg(), then use dmaengine_prep_slave_sg() for
  STM32 MDMA;
* dmaengine_prep_dma_cyclic(), then use dmaengine_prep_dma_cyclic() for
  STM32 MDMA.

Use the new sg_list using the SRAM buffer (in case of
dmaengine_prep_slave_sg()) or, depending on the transfer direction, either the
original DDR buffer (in case of DMA_DEV_TO_MEM) or the SRAM buffer (in case of
DMA_MEM_TO_DEV), the source address having been previously set with
dmaengine_slave_config().

**4. Submit both transactions**

Before submitting your transactions, you may need to define on which
descriptor you want a callback to be called at the end of the transfer
(dmaengine_prep_slave_sg()) or the period (dmaengine_prep_dma_cyclic()).
Depending on the direction, set the callback on the descriptor that finishes
the overall transfer:

* DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor
* DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor

Then, submit the descriptors in any order, with dmaengine_submit().

**5. Issue pending requests (and wait for callback notification)**

As the STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue
the STM32 MDMA channel before the STM32 DMA channel.

If any, your callback will be called to notify you of the end of the overall
transfer or of the period completion.

Don't forget to terminate both channels. The STM32 DMA channel is configured
in cyclic Double-Buffer mode, so it won't be disabled by HW: you need to
terminate it. The STM32 MDMA channel will be stopped by HW in case of sg
transfer, but not in case of cyclic transfer. You can terminate it whatever
the kind of transfer.

**STM32 DMA-MDMA chaining DMA_MEM_TO_DEV special case**

STM32 DMA-MDMA chaining in DMA_MEM_TO_DEV is a special case. Indeed, the
STM32 MDMA feeds the SRAM buffer with the DDR data, and the STM32 DMA reads
data from the SRAM buffer. So some data (the first period) must already have
been copied into the SRAM buffer when the STM32 DMA starts to read.

A trick could be pausing the STM32 DMA channel (that will raise a Transfer
Complete signal, triggering the STM32 MDMA channel), but the first data read
by the STM32 DMA could be "wrong". The proper way is to prepare the first SRAM
period with dmaengine_prep_dma_memcpy(). Then this first period should be
"removed" from the sg or the cyclic transfer.

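
"Removing" the first period is a matter of address and length arithmetic. The
following user-space sketch shows one possible way to split a contiguous DDR
buffer; the ``span`` structure and ``mem_to_dev_split()`` helper are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a contiguous piece of the DDR buffer. */
struct span {
    uint64_t addr;
    size_t len;
};

/*
 * Split a DMA_MEM_TO_DEV buffer in two parts:
 *  - 'first': at most one SRAM period, to be copied into the SRAM buffer
 *    beforehand (e.g. with a memcpy transfer),
 *  - 'rest': what remains for the chained sg or cyclic transfer.
 */
static inline void mem_to_dev_split(uint64_t ddr_addr, size_t len,
                                    size_t sram_period,
                                    struct span *first, struct span *rest)
{
    size_t head;

    assert(sram_period > 0);
    head = len < sram_period ? len : sram_period;

    first->addr = ddr_addr;
    first->len = head;
    rest->addr = ddr_addr + head;
    rest->len = len - head;
}
```

If the whole buffer fits in one period, ``rest`` ends up empty and nothing is
left for the chained transfer.
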
Due to this complexity, prefer STM32 DMA-MDMA chaining for DMA_DEV_TO_MEM
and keep the "classic" DMA usage for DMA_MEM_TO_DEV, unless you are prepared
to handle this special case.

Resources
---------

Application notes, datasheets and reference manuals are available on the ST
website (STM32MP1_).

Three application notes deserve particular attention (AN5224_, AN4031_ &
AN5001_), dealing with STM32 DMAMUX, STM32 DMA and STM32 MDMA.

.. _STM32MP1: https://www.st.com/en/microcontrollers-microprocessors/stm32mp1-series.html
.. _AN5224: https://www.st.com/resource/en/application_note/an5224-stm32-dmamux-the-dma-request-router-stmicroelectronics.pdf
.. _AN4031: https://www.st.com/resource/en/application_note/dm00046011-using-the-stm32f2-stm32f4-and-stm32f7-series-dma-controller-stmicroelectronics.pdf
.. _AN5001: https://www.st.com/resource/en/application_note/an5001-stm32cube-expansion-package-for-stm32h7-series-mdma-stmicroelectronics.pdf

:Authors:

- Amelie Delaunay <[email protected]>