net/mlx5e: TX latency optimization to save DMA reads

A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads obviously increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains a whole packet, thus saves the DMA read/s
of the regular WQE gather entry/s. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory, thus
enables saving the DMA read that fetches the WQE.

The BF WQE I/O write must be in cache line granularity, thus uses
the CPU write combining mechanism.
A BF WQE I/O write acts also as a TX doorbell for notifying the
device of new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, thus this address is being mapped twice - once by ioremap()
and once by io_mapping_map_wc().

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
  written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce here a heuristic algorithm that
strives to avoid using these two mechanisms in case the TX queue is
being back-pressured by the device, and limit their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory) while a BF WQE may not be inlined (may contain
gather entries).

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:
Achiad Shochat
2015-07-23 23:35:59 +03:00
committed by David S. Miller
parent 58d522912a
commit 88a85f99e5
6 changed files with 79 additions and 19 deletions

View File

@@ -380,7 +380,7 @@ struct mlx5_uar {
u32 index;
struct list_head bf_list;
unsigned free_bf_bmap;
void __iomem *wc_map;
void __iomem *bf_map;
void __iomem *map;
};
@@ -435,6 +435,8 @@ struct mlx5_priv {
struct mlx5_uuar_info uuari;
MLX5_DECLARE_DOORBELL_LOCK(cq_uar_lock);
struct io_mapping *bf_mapping;
/* pages stuff */
struct workqueue_struct *pg_wq;
struct rb_root page_root;