net: documentation: build a directory structure for drivers

Documentation/networking/ is full of cryptically named files with
driver documentation. This makes finding interesting information at
a glance really hard. Move all those files into a directory called
device_drivers (since not all drivers are for device) and fix up
references.

RFC v0.1 -> RFC v1:
 - also add .txt suffix to the files which are missing it (Quentin)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
committed by
David S. Miller

parent
a74f0fa082
commit
b255e500c8
305
Documentation/networking/device_drivers/amazon/ena.txt
Normal file
@@ -0,0 +1,305 @@
Linux kernel driver for Elastic Network Adapter (ENA) family:
=============================================================

Overview:
=========
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.

The ENA device exposes a lightweight management interface with a
minimal set of memory mapped registers and an extendable command set
through an Admin Queue.

The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.

Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.

ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation,
and CPU cacheline optimized data placement.

The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.

The ENA driver and its corresponding devices implement health
monitoring mechanisms such as a watchdog, enabling the device and
driver to recover in a manner transparent to the application, as well
as debug logs.

Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds of latency.

Supported PCI vendor ID/device IDs:
===================================
1d0f:0ec2 - ENA PF
1d0f:1ec2 - ENA PF with LLQ support
1d0f:ec20 - ENA VF
1d0f:ec21 - ENA VF with LLQ support
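
As an illustration only, the ID table above can be expressed as a small
user-space lookup. This is a Python sketch, not driver code; the names
ENA_VENDOR_ID, ENA_DEVICES and describe() are hypothetical and exist
only for this example. The ID values are the ones listed above.

```python
# Vendor and device IDs copied from the table above.
ENA_VENDOR_ID = 0x1D0F

# device ID -> (PCI function type, LLQ-capable)
ENA_DEVICES = {
    0x0EC2: ("PF", False),
    0x1EC2: ("PF", True),
    0xEC20: ("VF", False),
    0xEC21: ("VF", True),
}


def describe(pci_id):
    """Parse a 'vvvv:dddd' hex string; return (function, llq) or None."""
    vendor_str, device_str = pci_id.split(":")
    vendor, device = int(vendor_str, 16), int(device_str, 16)
    if vendor != ENA_VENDOR_ID:
        return None
    return ENA_DEVICES.get(device)
```

The input strings have the same vendor:device form that lspci -nn
prints in square brackets, so describe("1d0f:ec21") reports a VF with
LLQ support.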

ENA Source Code Directory Structure:
====================================
ena_com.[ch]      - Management communication layer. This layer is
                    responsible for handling all the management
                    (admin) communication between the device and the
                    driver.
ena_eth_com.[ch]  - Tx/Rx data path.
ena_admin_defs.h  - Definition of ENA management interface.
ena_eth_io_defs.h - Definition of ENA data path interface.
ena_common_defs.h - Common definitions for ena_com layer.
ena_regs_defs.h   - Definition of ENA PCI memory-mapped (MMIO) registers.
ena_netdev.[ch]   - Main Linux kernel driver.
ena_sysfs.[ch]    - Sysfs files.
ena_ethtool.c     - ethtool callbacks.
ena_pci_id_tbl.h  - Supported device IDs.

Management Interface:
=====================
ENA management interface is exposed by means of:
- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
- Asynchronous Event Notification Queue (AENQ)

ENA device MMIO Registers are accessed only during driver
initialization and are not involved in further normal device
operation.

AQ is used for submitting management commands, and the
results/responses are reported asynchronously through ACQ.

ENA introduces a very small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.

The following admin queue commands are supported:
- Create I/O submission queue
- Create I/O completion queue
- Destroy I/O submission queue
- Destroy I/O completion queue
- Get feature
- Set feature
- Configure AENQ
- Get statistics

Refer to ena_admin_defs.h for the list of supported Get/Set Feature
properties.

The Asynchronous Event Notification Queue (AENQ) is a uni-directional
queue used by the ENA device to send to the driver events that cannot
be reported using ACQ. AENQ events are subdivided into groups. Each
group may have multiple syndromes, as shown below.

The events are:
        Group                   Syndrome
        Link state change       - X -
        Fatal error             - X -
        Notification            Suspend traffic
        Notification            Resume traffic
        Keep-Alive              - X -

ACQ and AENQ share the same MSI-X vector.

Keep-Alive is a special mechanism that allows monitoring of the
device's health. The driver maintains a watchdog (WD) handler which,
if fired, logs the current state and statistics, then resets and
restarts the ENA device and driver. A Keep-Alive event is delivered by
the device every second. The driver re-arms the WD upon reception of a
Keep-Alive event. A missed Keep-Alive event causes the WD handler to
fire.
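
The re-arm and fire behaviour described above can be modelled in a few
lines. This is a toy user-space sketch in Python, not the driver's
implementation; the class name, the method names and the timeout value
are assumptions made for the example, and "reset" is reduced to a
counter.

```python
class KeepAliveWatchdog:
    """Toy model of the Keep-Alive watchdog; all times are in seconds."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout          # hypothetical grace period
        self.last_keep_alive = now
        self.resets = 0

    def on_keep_alive(self, now):
        # An AENQ Keep-Alive event arrived: re-arm the watchdog.
        self.last_keep_alive = now

    def check(self, now):
        """Periodic check; returns True if the watchdog fired."""
        if now - self.last_keep_alive > self.timeout:
            self.resets += 1            # stands in for "log state, reset"
            self.last_keep_alive = now  # the restarted device starts afresh
            return True
        return False
```

With one Keep-Alive event per second, any timeout longer than a second
tolerates scheduling jitter; how many events may be missed before the
handler fires is a policy choice that the text above does not specify.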

Data Path Interface:
====================
I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ respectively). Each SQ has a completion queue (CQ) associated
with it.

The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.

The ENA driver supports two Queue Operation modes for Tx SQs:
- Regular mode
  * In this mode the Tx SQs reside in the host's memory. The ENA
    device fetches the ENA Tx descriptors and packet data from host
    memory.
- Low Latency Queue (LLQ) mode or "push-mode".
  * In this mode the driver pushes the transmit descriptors and the
    first 128 bytes of the packet directly to the ENA device memory
    space. The rest of the packet payload is fetched by the
    device. For this operation mode, the driver uses a dedicated PCI
    device memory BAR, which is mapped with write-combine capability.

The Rx SQs support only the regular mode.

Note: Not all ENA devices support LLQ, and this feature is negotiated
      with the device upon initialization. If the ENA device does not
      support LLQ mode, the driver falls back to the regular mode.
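
The difference between the two Tx modes, including the fallback, can be
sketched as follows. This is a Python illustration of the split
described above and nothing more; plan_tx() and LLQ_PUSH_BYTES are
hypothetical names, and descriptors are left out entirely.

```python
LLQ_PUSH_BYTES = 128  # "first 128 bytes" from the text above


def plan_tx(packet, device_supports_llq):
    """Return (mode, bytes pushed by driver, bytes fetched by device)."""
    if not device_supports_llq:
        # Regular mode: the device fetches everything from host memory.
        return ("regular", b"", packet)
    # LLQ mode: the head of the packet is pushed to device memory and
    # the device fetches only the remainder.
    return ("llq", packet[:LLQ_PUSH_BYTES], packet[LLQ_PUSH_BYTES:])
```

A packet shorter than 128 bytes is pushed whole in this model, leaving
nothing for the device to fetch.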

The driver supports multi-queue for both Tx and Rx. This has various
benefits:
- Reduced CPU/thread/process contention on a given Ethernet interface.
- Cache miss rate on completion is reduced, particularly for data
  cache lines that hold the sk_buff structures.
- Increased process-level parallelism when handling received packets.
- Increased data cache hit rate, by steering kernel processing of
  packets to the CPU where the application thread consuming the
  packet is running.
- Interrupt re-direction in hardware.

Interrupt Modes:
================
The driver assigns a single MSI-X vector per queue pair (for both Tx
and Rx directions). The driver assigns an additional dedicated MSI-X vector
for management (for ACQ and AENQ).

Management interrupt registration is performed when the Linux kernel
probes the adapter, and it is de-registered when the adapter is
removed. I/O queue interrupt registration is performed when the Linux
interface of the adapter is opened, and it is de-registered when the
interface is closed.

The management interrupt is named:
   ena-mgmnt@pci:<PCI domain:bus:slot.function>
and for each queue pair, an interrupt is named:
   <interface name>-Tx-Rx-<queue index>
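
For illustration, the naming scheme and the vector count can be
reproduced with a short Python sketch. The function names are
hypothetical, and the zero-padded hexadecimal PCI address layout
(dddd:bb:ss.f) is an assumption based on the usual Linux notation, not
something the text above states.

```python
def mgmt_irq_name(domain, bus, slot, function):
    # Assumed layout: PCI address printed as dddd:bb:ss.f in hex.
    return "ena-mgmnt@pci:%04x:%02x:%02x.%x" % (domain, bus, slot, function)


def io_irq_names(ifname, num_queue_pairs):
    # One interrupt per Tx/Rx queue pair, indexed from zero.
    return ["%s-Tx-Rx-%d" % (ifname, q) for q in range(num_queue_pairs)]


def total_msix_vectors(num_queue_pairs):
    # One vector per queue pair plus one for management (ACQ and AENQ).
    return num_queue_pairs + 1
```

Registered interrupt names of this form are what one would look for in
/proc/interrupts when checking how queue interrupts are spread across
CPUs.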

The ENA device operates in auto-mask and auto-clear interrupt
modes. That is, once an MSI-X interrupt is delivered to the host, its
Cause bit is automatically cleared and the interrupt is masked. The
interrupt is unmasked by the driver after NAPI processing is complete.

Interrupt Moderation:
=====================
The ENA driver and device can operate in conventional or adaptive
interrupt moderation mode.

In conventional mode the driver instructs the device to postpone
interrupt posting according to a static interrupt delay value. The
interrupt delay value can be configured through ethtool(8). The
following ethtool parameters are supported by the driver: tx-usecs,
rx-usecs.

In adaptive interrupt moderation mode the interrupt delay value is
updated by the driver dynamically and adjusted every NAPI cycle
according to the traffic nature.

By default, the ENA driver applies adaptive coalescing on Rx traffic
and conventional coalescing on Tx traffic.

Adaptive coalescing can be switched on/off through the ethtool(8)
adaptive-rx on|off parameter.

The driver chooses the interrupt delay value according to the number of
bytes and packets received between interrupt unmasking and interrupt
posting. The driver uses an interrupt delay table that subdivides the
range of received bytes/packets into 5 levels and assigns an interrupt
delay value to each level.
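
A five-level delay table of the kind described above might be modelled
like this. The sketch is in Python and is not the driver's algorithm:
the thresholds, the delay values and the names DELAY_TABLE and
pick_delay() are all placeholders invented for the example, and the
real selection logic may weigh bytes and packets differently.

```python
# Hypothetical 5-level table: (max_packets, max_bytes, delay_usecs).
# The numbers are placeholders, not the driver's defaults.
DELAY_TABLE = [
    (4, 1024, 0),
    (16, 16 * 1024, 16),
    (64, 64 * 1024, 32),
    (256, 256 * 1024, 64),
    (float("inf"), float("inf"), 128),
]


def pick_delay(packets, nbytes, table=DELAY_TABLE):
    """Return the delay of the first level that fits both counters."""
    for max_packets, max_bytes, delay in table:
        if packets <= max_packets and nbytes <= max_bytes:
            return delay
    return table[-1][2]
```

In this model a few small packets keep the delay at zero for low
latency, while heavy traffic moves to higher levels and trades latency
for fewer interrupts.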

The user can enable/disable adaptive moderation, modify the interrupt
delay table and restore its default values through sysfs.

The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.

SKB:
The driver allocates the SKB for each received frame during Rx
handling, in NAPI context. The allocation method depends on the size of
the packet. If the frame length is larger than rx_copybreak,
napi_get_frags() is used; otherwise netdev_alloc_skb_ip_align() is
used, the buffer content is copied (by CPU) to the SKB, and the buffer
is recycled.
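
The size-based choice above reduces to a single comparison. The Python
sketch below only names the kernel helper that would be used in each
case; choose_rx_alloc() is a hypothetical function, and the threshold
used in any call is an arbitrary example value because the value of
ENA_DEFAULT_RX_COPYBREAK is not given here.

```python
def choose_rx_alloc(frame_len, rx_copybreak):
    """Name the SKB allocation path for a frame of the given length."""
    if frame_len > rx_copybreak:
        # Large frame: the Rx buffer is attached to the SKB as a fragment.
        return "napi_get_frags"
    # Small frame: copy into a freshly allocated SKB, recycle the buffer.
    return "netdev_alloc_skb_ip_align"
```

Copying small frames costs a few CPU cycles but keeps the original Rx
buffer mapped and reusable, which is the usual motivation for a
copybreak threshold.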

Statistics:
===========
The user can obtain ENA device and driver statistics using ethtool.
The driver can collect regular or extended statistics (including
per-queue stats) from the device.

In addition, the driver logs the stats to syslog upon device reset.

MTU:
====
The driver supports an arbitrarily large MTU with a maximum that is
negotiated with the device. The driver configures MTU using the
SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
via ip(8) or similar tools.

Stateless Offloads:
===================
The ENA driver supports:
- TSO over IPv4/IPv6
- TSO with ECN
- IPv4 header checksum offload
- TCP/UDP over IPv4/IPv6 checksum offloads

RSS:
====
- The ENA device supports RSS that allows flexible Rx traffic
  steering.
- Toeplitz and CRC32 hash functions are supported.
- Different combinations of L2/L3/L4 fields can be configured as
  inputs for hash functions.
- The driver configures RSS settings using the AQ SetFeature command
  (ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
  ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties).
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
  function delivered in the Rx CQ descriptor is set in the received
  SKB.
- The user can provide a hash key, hash function, and configure the
  indirection table through ethtool(8).
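
To make the roles of the hash key and the indirection table concrete,
here is a generic Toeplitz hash and table lookup in Python. This is the
textbook form of the algorithm, not ENA-specific code: the function
names are hypothetical, and the way the device lays out the input
fields and reduces the hash to a table index is an assumption (a modulo
is used here).

```python
def toeplitz_hash(key, data):
    """32-bit Toeplitz hash of `data` (bytes) under `key` (bytes)."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least len(data) + 4 bytes")
    key_bits = int.from_bytes(key, "big")
    key_len_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            # XOR in the 32-bit window of the key that starts at bit i.
            shift = key_len_bits - 32 - i
            result ^= (key_bits >> shift) & 0xFFFFFFFF
    return result


def pick_rx_queue(hash_value, indirection_table):
    # Assumed reduction: index the table with the hash modulo its size.
    return indirection_table[hash_value % len(indirection_table)]
```

Because every set input bit contributes one key window by XOR, packets
of the same flow always hash to the same value and therefore land on
the same Rx queue, while changing the key or the table redistributes
flows.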

DATA PATH:
==========
Tx:
---
ena_start_xmit() is called by the stack. This function does the following:
- Maps data buffers (skb->data and frags).
- Populates ena_buf for the push buffer (if the driver and device are
  in push mode).
- Prepares ENA bufs for the remaining frags.
- Allocates a new request ID from the empty req_id ring. The request
  ID is the index of the packet in the Tx info. This is used for
  out-of-order Tx completions.
- Adds the packet to the proper place in the Tx ring.
- Calls ena_com_prepare_tx(), an ENA communication layer function that
  converts the ena_bufs to ENA descriptors (and adds meta ENA
  descriptors as needed).
  * This function also copies the ENA descriptors and the push buffer
    to the Device memory space (if in push mode).
- Writes doorbell to the ENA device.
- When the ENA device finishes sending the packet, a completion
  interrupt is raised.
- The interrupt handler schedules NAPI.
- The ena_clean_tx_irq() function is called. This function handles the
  completion descriptors generated by the ENA, with a single
  completion descriptor per completed packet.
  * req_id is retrieved from the completion descriptor. The tx_info of
    the packet is retrieved via the req_id. The data buffers are
    unmapped and req_id is returned to the empty req_id ring.
  * The function stops when the completion descriptors are completed or
    the budget is reached.
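
The req_id bookkeeping in the steps above can be sketched as a small
model. This is a Python illustration under simplifying assumptions, not
the driver's data structures: the TxRing class and its method names are
hypothetical, packets are plain Python objects, and DMA mapping,
descriptors and doorbells are omitted.

```python
from collections import deque


class TxRing:
    """Toy model of req_id allocation with out-of-order completions."""

    def __init__(self, size):
        self.free_req_ids = deque(range(size))  # the "empty req_id ring"
        self.tx_info = [None] * size            # indexed by req_id

    def start_xmit(self, packet):
        if not self.free_req_ids:
            return None                         # ring full; caller retries
        req_id = self.free_req_ids.popleft()
        self.tx_info[req_id] = packet
        return req_id

    def clean_tx_irq(self, completed_req_ids, budget):
        """Handle up to `budget` completions; return released packets."""
        released = []
        for req_id in completed_req_ids[:budget]:
            released.append(self.tx_info[req_id])
            self.tx_info[req_id] = None
            self.free_req_ids.append(req_id)    # back to the empty ring
        return released
```

Keying tx_info by req_id rather than by ring position is what lets
completions arrive in any order: each completion descriptor carries the
req_id needed to find and release its packet.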

Rx:
---
- A packet is received from the ENA device and an interrupt is raised.
- The interrupt handler schedules NAPI.
- The ena_clean_rx_irq() function is called. This function calls
  ena_com_rx_pkt(), an ENA communication layer function, which returns
  the number of descriptors used for a new unhandled packet, and zero
  if no new packet is found.
- Then it calls the ena_rx_skb() function.
- ena_rx_skb() checks packet length:
  * If the packet is small (len <= rx_copybreak), the driver allocates
    a SKB for the new packet, and copies the packet payload into the
    SKB data buffer.
    - In this way the original data buffer is not passed to the stack
      and is reused for future Rx packets.
  * Otherwise the function unmaps the Rx buffer, then allocates the
    new SKB structure and hooks the Rx buffer to the SKB frags.
- The new SKB is updated with the necessary information (protocol,
  checksum hw verify result, etc.), and then passed to the network
  stack, using the NAPI interface function napi_gro_receive().