net/mlx5e: TX latency optimization to save DMA reads
A regular TX WQE execution involves two or more DMA reads - one to fetch the WQE, and another one per WQE gather entry. These DMA reads obviously increase the TX latency. There are two mlx5 mechanisms to bypass these DMA reads: 1) Inline WQE 2) Blue Flame (BF) An inline WQE contains a whole packet, thus saves the DMA read/s of the regular WQE gather entry/s. Inline WQE support was already added in the previous commit. A BF WQE is written directly to the device I/O mapped memory, thus enables saving the DMA read that fetches the WQE. The BF WQE I/O write must be in cache line granularity, thus uses the CPU write combining mechanism. A BF WQE I/O write acts also as a TX doorbell for notifying the device of new TX WQEs. A BF WQE is written to the same I/O mapped address as the regular TX doorbell, thus this address is being mapped twice - once by ioremap() and once by io_mapping_map_wc(). While both mechanisms reduce the TX latency, they both consume more CPU cycles than a regular WQE: - A BF WQE must still be written to host memory, in addition to being written directly to the device I/O mapped memory. - An inline WQE involves copying the SKB data into it. To handle this tradeoff, we introduce here a heuristic algorithm that strives to avoid using these two mechanisms in case the TX queue is being back-pressured by the device, and limit their usage rate otherwise. An inline WQE will always be "Blue Flamed" (written directly to the device I/O mapped memory) while a BF WQE may not be inlined (may contain gather entries). Preliminary testing using netperf UDP_RR shows that the latency goes down from 17.5us to 16.9us, while the message rate (tested with pktgen) stays the same. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:

committed by
David S. Miller

parent
58d522912a
commit
88a85f99e5
@@ -380,7 +380,7 @@ struct mlx5_uar {
|
||||
u32 index;
|
||||
struct list_head bf_list;
|
||||
unsigned free_bf_bmap;
|
||||
void __iomem *wc_map;
|
||||
void __iomem *bf_map;
|
||||
void __iomem *map;
|
||||
};
|
||||
|
||||
@@ -435,6 +435,8 @@ struct mlx5_priv {
|
||||
struct mlx5_uuar_info uuari;
|
||||
MLX5_DECLARE_DOORBELL_LOCK(cq_uar_lock);
|
||||
|
||||
struct io_mapping *bf_mapping;
|
||||
|
||||
/* pages stuff */
|
||||
struct workqueue_struct *pg_wq;
|
||||
struct rb_root page_root;
|
||||
|
Reference in New Issue
Block a user