123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415 |
- .. SPDX-License-Identifier: GPL-2.0
- =======================
- STM32 DMA-MDMA chaining
- =======================
- Introduction
- ------------
- This document describes the STM32 DMA-MDMA chaining feature. But before going
- further, let's introduce the peripherals involved.
- To offload data transfers from the CPU, STM32 microprocessors (MPUs) embed
- direct memory access controllers (DMA).
- STM32MP1 SoCs embed both STM32 DMA and STM32 MDMA controllers. STM32 DMA
- request routing capabilities are enhanced by a DMA request multiplexer
- (STM32 DMAMUX).
- **STM32 DMAMUX**
- STM32 DMAMUX routes any DMA request from a given peripheral to any STM32 DMA
- controller (STM32MP1 counts two STM32 DMA controllers) channels.
- **STM32 DMA**
- STM32 DMA is mainly used to implement central data buffer storage (usually in
- the system SRAM) for different peripheral. It can access external RAMs but
- without the ability to generate convenient burst transfer ensuring the best
- load of the AXI.
- **STM32 MDMA**
- STM32 MDMA (Master DMA) is mainly used to manage direct data transfers between
- RAM data buffers without CPU intervention. It can also be used in a
- hierarchical structure that uses STM32 DMA as first level data buffer
- interfaces for AHB peripherals, while the STM32 MDMA acts as a second level
- DMA with better performance. As a AXI/AHB master, STM32 MDMA can take control
- of the AXI/AHB bus.
- Principles
- ----------
- STM32 DMA-MDMA chaining feature relies on the strengths of STM32 DMA and
- STM32 MDMA controllers.
- STM32 DMA has a circular Double Buffer Mode (DBM). At each end of transaction
- (when DMA data counter - DMA_SxNDTR - reaches 0), the memory pointers
- (configured with DMA_SxSM0AR and DMA_SxM1AR) are swapped and the DMA data
- counter is automatically reloaded. This allows the SW or the STM32 MDMA to
- process one memory area while the second memory area is being filled/used by
- the STM32 DMA transfer.
- With STM32 MDMA linked-list mode, a single request initiates the data array
- (collection of nodes) to be transferred until the linked-list pointer for the
- channel is null. The channel transfer complete of the last node is the end of
- transfer, unless first and last nodes are linked to each other, in such a
- case, the linked-list loops on to create a circular MDMA transfer.
- STM32 MDMA has direct connections with STM32 DMA. This enables autonomous
- communication and synchronization between peripherals, thus saving CPU
- resources and bus congestion. Transfer Complete signal of STM32 DMA channel
- can triggers STM32 MDMA transfer. STM32 MDMA can clear the request generated
- by the STM32 DMA by writing to its Interrupt Clear register (whose address is
- stored in MDMA_CxMAR, and bit mask in MDMA_CxMDR).
- .. table:: STM32 MDMA interconnect table with STM32 DMA
- +--------------+----------------+-----------+------------+
- | STM32 DMAMUX | STM32 DMA | STM32 DMA | STM32 MDMA |
- | channels | channels | Transfer | request |
- | | | complete | |
- | | | signal | |
- +==============+================+===========+============+
- | Channel *0* | DMA1 channel 0 | dma1_tcf0 | *0x00* |
- +--------------+----------------+-----------+------------+
- | Channel *1* | DMA1 channel 1 | dma1_tcf1 | *0x01* |
- +--------------+----------------+-----------+------------+
- | Channel *2* | DMA1 channel 2 | dma1_tcf2 | *0x02* |
- +--------------+----------------+-----------+------------+
- | Channel *3* | DMA1 channel 3 | dma1_tcf3 | *0x03* |
- +--------------+----------------+-----------+------------+
- | Channel *4* | DMA1 channel 4 | dma1_tcf4 | *0x04* |
- +--------------+----------------+-----------+------------+
- | Channel *5* | DMA1 channel 5 | dma1_tcf5 | *0x05* |
- +--------------+----------------+-----------+------------+
- | Channel *6* | DMA1 channel 6 | dma1_tcf6 | *0x06* |
- +--------------+----------------+-----------+------------+
- | Channel *7* | DMA1 channel 7 | dma1_tcf7 | *0x07* |
- +--------------+----------------+-----------+------------+
- | Channel *8* | DMA2 channel 0 | dma2_tcf0 | *0x08* |
- +--------------+----------------+-----------+------------+
- | Channel *9* | DMA2 channel 1 | dma2_tcf1 | *0x09* |
- +--------------+----------------+-----------+------------+
- | Channel *10* | DMA2 channel 2 | dma2_tcf2 | *0x0A* |
- +--------------+----------------+-----------+------------+
- | Channel *11* | DMA2 channel 3 | dma2_tcf3 | *0x0B* |
- +--------------+----------------+-----------+------------+
- | Channel *12* | DMA2 channel 4 | dma2_tcf4 | *0x0C* |
- +--------------+----------------+-----------+------------+
- | Channel *13* | DMA2 channel 5 | dma2_tcf5 | *0x0D* |
- +--------------+----------------+-----------+------------+
- | Channel *14* | DMA2 channel 6 | dma2_tcf6 | *0x0E* |
- +--------------+----------------+-----------+------------+
- | Channel *15* | DMA2 channel 7 | dma2_tcf7 | *0x0F* |
- +--------------+----------------+-----------+------------+
- STM32 DMA-MDMA chaining feature then uses a SRAM buffer. STM32MP1 SoCs embed
- three fast access static internal RAMs of various size, used for data storage.
- Due to STM32 DMA legacy (within microcontrollers), STM32 DMA performances are
- bad with DDR, while they are optimal with SRAM. Hence the SRAM buffer used
- between STM32 DMA and STM32 MDMA. This buffer is split in two equal periods
- and STM32 DMA uses one period while STM32 MDMA uses the other period
- simultaneously.
- ::
- dma[1:2]-tcf[0:7]
- .----------------.
- ____________ ' _________ V____________
- | STM32 DMA | / __|>_ \ | STM32 MDMA |
- |------------| | / \ | |------------|
- | DMA_SxM0AR |<=>| | SRAM | |<=>| []-[]...[] |
- | DMA_SxM1AR | | \_____/ | | |
- |____________| \___<|____/ |____________|
- STM32 DMA-MDMA chaining uses (struct dma_slave_config).peripheral_config to
- exchange the parameters needed to configure MDMA. These parameters are
- gathered into a u32 array with three values:
- * the STM32 MDMA request (which is actually the DMAMUX channel ID),
- * the address of the STM32 DMA register to clear the Transfer Complete
- interrupt flag,
- * the mask of the Transfer Complete interrupt flag of the STM32 DMA channel.
- Device Tree updates for STM32 DMA-MDMA chaining support
- -------------------------------------------------------
- **1. Allocate a SRAM buffer**
- SRAM device tree node is defined in SoC device tree. You can refer to it in
- your board device tree to define your SRAM pool.
- ::
- &sram {
- my_foo_device_dma_pool: dma-sram@0 {
- reg = <0x0 0x1000>;
- };
- };
- Be careful of the start index, in case there are other SRAM consumers.
- Define your pool size strategically: to optimise chaining, the idea is that
- STM32 DMA and STM32 MDMA can work simultaneously, on each buffer of the
- SRAM.
- If the SRAM period is greater than the expected DMA transfer, then STM32 DMA
- and STM32 MDMA will work sequentially instead of simultaneously. It is not a
- functional issue but it is not optimal.
- Don't forget to refer to your SRAM pool in your device node. You need to
- define a new property.
- ::
- &my_foo_device {
- ...
- my_dma_pool = &my_foo_device_dma_pool;
- };
- Then get this SRAM pool in your foo driver and allocate your SRAM buffer.
- **2. Allocate a STM32 DMA channel and a STM32 MDMA channel**
- You need to define an extra channel in your device tree node, in addition to
- the one you should already have for "classic" DMA operation.
- This new channel must be taken from STM32 MDMA channels, so, the phandle of
- the DMA controller to use is the MDMA controller's one.
- ::
- &my_foo_device {
- [...]
- my_dma_pool = &my_foo_device_dma_pool;
- dmas = <&dmamux1 ...>, // STM32 DMA channel
- <&mdma1 0 0x3 0x1200000a 0 0>; // + STM32 MDMA channel
- };
- Concerning STM32 MDMA bindings:
- 1. The request line number : whatever the value here, it will be overwritten
- by MDMA driver with the STM32 DMAMUX channel ID passed through
- (struct dma_slave_config).peripheral_config
- 2. The priority level : choose Very High (0x3) so that your channel will
- take priority other the other during request arbitration
- 3. A 32bit mask specifying the DMA channel configuration : source and
- destination address increment, block transfer with 128 bytes per single
- transfer
- 4. The 32bit value specifying the register to be used to acknowledge the
- request: it will be overwritten by MDMA driver, with the DMA channel
- interrupt flag clear register address passed through
- (struct dma_slave_config).peripheral_config
- 5. The 32bit mask specifying the value to be written to acknowledge the
- request: it will be overwritten by MDMA driver, with the DMA channel
- Transfer Complete flag passed through
- (struct dma_slave_config).peripheral_config
- Driver updates for STM32 DMA-MDMA chaining support in foo driver
- ----------------------------------------------------------------
- **0. (optional) Refactor the original sg_table if dmaengine_prep_slave_sg()**
- In case of dmaengine_prep_slave_sg(), the original sg_table can't be used as
- is. Two new sg_tables must be created from the original one. One for
- STM32 DMA transfer (where memory address targets now the SRAM buffer instead
- of DDR buffer) and one for STM32 MDMA transfer (where memory address targets
- the DDR buffer).
- The new sg_list items must fit SRAM period length. Here is an example for
- DMA_DEV_TO_MEM:
- ::
- /*
- * Assuming sgl and nents, respectively the initial scatterlist and its
- * length.
- * Assuming sram_dma_buf and sram_period, respectively the memory
- * allocated from the pool for DMA usage, and the length of the period,
- * which is half of the sram_buf size.
- */
- struct sg_table new_dma_sgt, new_mdma_sgt;
- struct scatterlist *s, *_sgl;
- dma_addr_t ddr_dma_buf;
- u32 new_nents = 0, len;
- int i;
- /* Count the number of entries needed */
- for_each_sg(sgl, s, nents, i)
- if (sg_dma_len(s) > sram_period)
- new_nents += DIV_ROUND_UP(sg_dma_len(s), sram_period);
- else
- new_nents++;
- /* Create sg table for STM32 DMA channel */
- ret = sg_alloc_table(&new_dma_sgt, new_nents, GFP_ATOMIC);
- if (ret)
- dev_err(dev, "DMA sg table alloc failed\n");
- for_each_sg(new_dma_sgt.sgl, s, new_dma_sgt.nents, i) {
- _sgl = sgl;
- sg_dma_len(s) = min(sg_dma_len(_sgl), sram_period);
- /* Targets the beginning = first half of the sram_buf */
- s->dma_address = sram_buf;
- /*
- * Targets the second half of the sram_buf
- * for odd indexes of the item of the sg_list
- */
- if (i & 1)
- s->dma_address += sram_period;
- }
- /* Create sg table for STM32 MDMA channel */
- ret = sg_alloc_table(&new_mdma_sgt, new_nents, GFP_ATOMIC);
- if (ret)
- dev_err(dev, "MDMA sg_table alloc failed\n");
- _sgl = sgl;
- len = sg_dma_len(sgl);
- ddr_dma_buf = sg_dma_address(sgl);
- for_each_sg(mdma_sgt.sgl, s, mdma_sgt.nents, i) {
- size_t bytes = min_t(size_t, len, sram_period);
- sg_dma_len(s) = bytes;
- sg_dma_address(s) = ddr_dma_buf;
- len -= bytes;
- if (!len && sg_next(_sgl)) {
- _sgl = sg_next(_sgl);
- len = sg_dma_len(_sgl);
- ddr_dma_buf = sg_dma_address(_sgl);
- } else {
- ddr_dma_buf += bytes;
- }
- }
- Don't forget to release these new sg_tables after getting the descriptors
- with dmaengine_prep_slave_sg().
- **1. Set controller specific parameters**
- First, use dmaengine_slave_config() with a struct dma_slave_config to
- configure STM32 DMA channel. You just have to take care of DMA addresses,
- the memory address (depending on the transfer direction) must point on your
- SRAM buffer, and set (struct dma_slave_config).peripheral_size != 0.
- STM32 DMA driver will check (struct dma_slave_config).peripheral_size to
- determine if chaining is being used or not. If it is used, then STM32 DMA
- driver fills (struct dma_slave_config).peripheral_config with an array of
- three u32 : the first one containing STM32 DMAMUX channel ID, the second one
- the channel interrupt flag clear register address, and the third one the
- channel Transfer Complete flag mask.
- Then, use dmaengine_slave_config with another struct dma_slave_config to
- configure STM32 MDMA channel. Take care of DMA addresses, the device address
- (depending on the transfer direction) must point on your SRAM buffer, and
- the memory address must point to the buffer originally used for "classic"
- DMA operation. Use the previous (struct dma_slave_config).peripheral_size
- and .peripheral_config that have been updated by STM32 DMA driver, to set
- (struct dma_slave_config).peripheral_size and .peripheral_config of the
- struct dma_slave_config to configure STM32 MDMA channel.
- ::
- struct dma_slave_config dma_conf;
- struct dma_slave_config mdma_conf;
- memset(&dma_conf, 0, sizeof(dma_conf));
- [...]
- config.direction = DMA_DEV_TO_MEM;
- config.dst_addr = sram_dma_buf; // SRAM buffer
- config.peripheral_size = 1; // peripheral_size != 0 => chaining
- dmaengine_slave_config(dma_chan, &dma_config);
- memset(&mdma_conf, 0, sizeof(mdma_conf));
- config.direction = DMA_DEV_TO_MEM;
- mdma_conf.src_addr = sram_dma_buf; // SRAM buffer
- mdma_conf.dst_addr = rx_dma_buf; // original memory buffer
- mdma_conf.peripheral_size = dma_conf.peripheral_size; // <- dma_conf
- mdma_conf.peripheral_config = dma_config.peripheral_config; // <- dma_conf
- dmaengine_slave_config(mdma_chan, &mdma_conf);
- **2. Get a descriptor for STM32 DMA channel transaction**
- In the same way you get your descriptor for your "classic" DMA operation,
- you just have to replace the original sg_list (in case of
- dmaengine_prep_slave_sg()) with the new sg_list using SRAM buffer, or to
- replace the original buffer address, length and period (in case of
- dmaengine_prep_dma_cyclic()) with the new SRAM buffer.
- **3. Get a descriptor for STM32 MDMA channel transaction**
- If you previously get descriptor (for STM32 DMA) with
- * dmaengine_prep_slave_sg(), then use dmaengine_prep_slave_sg() for
- STM32 MDMA;
- * dmaengine_prep_dma_cyclic(), then use dmaengine_prep_dma_cyclic() for
- STM32 MDMA.
- Use the new sg_list using SRAM buffer (in case of dmaengine_prep_slave_sg())
- or, depending on the transfer direction, either the original DDR buffer (in
- case of DMA_DEV_TO_MEM) or the SRAM buffer (in case of DMA_MEM_TO_DEV), the
- source address being previously set with dmaengine_slave_config().
- **4. Submit both transactions**
- Before submitting your transactions, you may need to define on which
- descriptor you want a callback to be called at the end of the transfer
- (dmaengine_prep_slave_sg()) or the period (dmaengine_prep_dma_cyclic()).
- Depending on the direction, set the callback on the descriptor that finishes
- the overal transfer:
- * DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor
- * DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor
- Then, submit the descriptors whatever the order, with dmaengine_tx_submit().
- **5. Issue pending requests (and wait for callback notification)**
- As STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue
- STM32 MDMA channel before STM32 DMA channel.
- If any, your callback will be called to warn you about the end of the overal
- transfer or the period completion.
- Don't forget to terminate both channels. STM32 DMA channel is configured in
- cyclic Double-Buffer mode so it won't be disabled by HW, you need to terminate
- it. STM32 MDMA channel will be stopped by HW in case of sg transfer, but not
- in case of cyclic transfer. You can terminate it whatever the kind of transfer.
- **STM32 DMA-MDMA chaining DMA_MEM_TO_DEV special case**
- STM32 DMA-MDMA chaining in DMA_MEM_TO_DEV is a special case. Indeed, the
- STM32 MDMA feeds the SRAM buffer with the DDR data, and the STM32 DMA reads
- data from SRAM buffer. So some data (the first period) have to be copied in
- SRAM buffer when the STM32 DMA starts to read.
- A trick could be pausing the STM32 DMA channel (that will raise a Transfer
- Complete signal, triggering the STM32 MDMA channel), but the first data read
- by the STM32 DMA could be "wrong". The proper way is to prepare the first SRAM
- period with dmaengine_prep_dma_memcpy(). Then this first period should be
- "removed" from the sg or the cyclic transfer.
- Due to this complexity, rather use the STM32 DMA-MDMA chaining for
- DMA_DEV_TO_MEM and keep the "classic" DMA usage for DMA_MEM_TO_DEV, unless
- you're not afraid.
- Resources
- ---------
- Application note, datasheet and reference manual are available on ST website
- (STM32MP1_).
- Dedicated focus on three application notes (AN5224_, AN4031_ & AN5001_)
- dealing with STM32 DMAMUX, STM32 DMA and STM32 MDMA.
- .. _STM32MP1: https://www.st.com/en/microcontrollers-microprocessors/stm32mp1-series.html
- .. _AN5224: https://www.st.com/resource/en/application_note/an5224-stm32-dmamux-the-dma-request-router-stmicroelectronics.pdf
- .. _AN4031: https://www.st.com/resource/en/application_note/dm00046011-using-the-stm32f2-stm32f4-and-stm32f7-series-dma-controller-stmicroelectronics.pdf
- .. _AN5001: https://www.st.com/resource/en/application_note/an5001-stm32cube-expansion-package-for-stm32h7-series-mdma-stmicroelectronics.pdf
- :Authors:
- - Amelie Delaunay <[email protected]>
|