Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
This commit is contained in:
Linus Torvalds
2020-10-15 18:42:13 -07:00
2301 changed files with 130033 additions and 50954 deletions

View File

@@ -258,14 +258,21 @@ socket into zero-copy mode or fail.
XDP_SHARED_UMEM bind flag
-------------------------
This flag enables you to bind multiple sockets to the same UMEM, but
only if they share the same queue id. In this mode, each socket has
their own RX and TX rings, but the UMEM (tied to the fist socket
created) only has a single FILL ring and a single COMPLETION
ring. To use this mode, create the first socket and bind it in the normal
way. Create a second socket and create an RX and a TX ring, or at
least one of them, but no FILL or COMPLETION rings as the ones from
the first socket will be used. In the bind call, set he
This flag enables you to bind multiple sockets to the same UMEM. It
works on the same queue id, between queue ids and between
netdevs/devices. In this mode, each socket has their own RX and TX
rings as usual, but you are going to have one or more FILL and
COMPLETION ring pairs. You have to create one of these pairs per
unique netdev and queue id tuple that you bind to.
Starting with the case were we would like to share a UMEM between
sockets bound to the same netdev and queue id. The UMEM (tied to the
fist socket created) will only have a single FILL ring and a single
COMPLETION ring as there is only on unique netdev,queue_id tuple that
we have bound to. To use this mode, create the first socket and bind
it in the normal way. Create a second socket and create an RX and a TX
ring, or at least one of them, but no FILL or COMPLETION rings as the
ones from the first socket will be used. In the bind call, set he
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
sockets this way.
@@ -305,11 +312,41 @@ concurrently. There are no synchronization primitives in the
libbpf code that protects multiple users at this point in time.
Libbpf uses this mode if you create more than one socket tied to the
same umem. However, note that you need to supply the
same UMEM. However, note that you need to supply the
XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
xsk_socket__create calls and load your own XDP program as there is no
built in one in libbpf that will route the traffic for you.
The second case is when you share a UMEM between sockets that are
bound to different queue ids and/or netdevs. In this case you have to
create one FILL ring and one COMPLETION ring for each unique
netdev,queue_id pair. Let us say you want to create two sockets bound
to two different queue ids on the same netdev. Create the first socket
and bind it in the normal way. Create a second socket and create an RX
and a TX ring, or at least one of them, and then one FILL and
COMPLETION ring for this socket. Then in the bind call, set he
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field as you registered the UMEM on that
socket. These two sockets will now share one and the same UMEM.
There is no need to supply an XDP program like the one in the previous
case where sockets were bound to the same queue id and
device. Instead, use the NIC's packet steering capabilities to steer
the packets to the right queue. In the previous example, there is only
one queue shared among sockets, so the NIC cannot do this steering. It
can only steer between queues.
In libbpf, you need to use the xsk_socket__create_shared() API as it
takes a reference to a FILL ring and a COMPLETION ring that will be
created for you and bound to the shared UMEM. You can use this
function for all the sockets you create, or you can use it for the
second and following ones and use xsk_socket__create() for the first
one. Both methods yield the same result.
Note that a UMEM can be shared between sockets on the same queue id
and device, as well as between queues on the same device and between
devices at the same time.
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
@@ -364,7 +401,7 @@ resources by only setting up one of them. Both the FILL ring and the
COMPLETION ring are mandatory as you need to have a UMEM tied to your
socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
first one does not have a UMEM and should in that case not have any
FILL or COMPLETION rings created as the ones from the shared umem will
FILL or COMPLETION rings created as the ones from the shared UMEM will
be used. Note, that the rings are single-producer single-consumer, so
do not try to access them from multiple processes at the same
time. See the XDP_SHARED_UMEM section.
@@ -567,6 +604,17 @@ A: The short answer is no, that is not supported at the moment. The
switch, or other distribution mechanism, in your NIC to direct
traffic to the correct queue id and socket.
Q: My packets are sometimes corrupted. What is wrong?
A: Care has to be taken not to feed the same buffer in the UMEM into
more than one ring at the same time. If you for example feed the
same buffer into the FILL ring and the TX ring at the same time, the
NIC might receive data into the buffer at the same time it is
sending it. This will cause some packets to become corrupted. Same
thing goes for feeding the same buffer into the FILL rings
belonging to different queue ids or netdevs bound with the
XDP_SHARED_UMEM flag.
Credits
=======

View File

@@ -10,4 +10,3 @@ Contents:
linux_caif
caif
spi_porting

View File

@@ -1,229 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
================
CAIF SPI porting
================
CAIF SPI basics
===============
Running CAIF over SPI needs some extra setup, owing to the nature of SPI.
Two extra GPIOs have been added in order to negotiate the transfers
between the master and the slave. The minimum requirement for running
CAIF over SPI is a SPI slave chip and two GPIOs (more details below).
Please note that running as a slave implies that you need to keep up
with the master clock. An overrun or underrun event is fatal.
CAIF SPI framework
==================
To make porting as easy as possible, the CAIF SPI has been divided in
two parts. The first part (called the interface part) deals with all
generic functionality such as length framing, SPI frame negotiation
and SPI frame delivery and transmission. The other part is the CAIF
SPI slave device part, which is the module that you have to write if
you want to run SPI CAIF on a new hardware. This part takes care of
the physical hardware, both with regard to SPI and to GPIOs.
- Implementing a CAIF SPI device:
- Functionality provided by the CAIF SPI slave device:
In order to implement a SPI device you will, as a minimum,
need to implement the following
functions:
::
int (*init_xfer) (struct cfspi_xfer * xfer, struct cfspi_dev *dev):
This function is called by the CAIF SPI interface to give
you a chance to set up your hardware to be ready to receive
a stream of data from the master. The xfer structure contains
both physical and logical addresses, as well as the total length
of the transfer in both directions.The dev parameter can be used
to map to different CAIF SPI slave devices.
::
void (*sig_xfer) (bool xfer, struct cfspi_dev *dev):
This function is called by the CAIF SPI interface when the output
(SPI_INT) GPIO needs to change state. The boolean value of the xfer
variable indicates whether the GPIO should be asserted (HIGH) or
deasserted (LOW). The dev parameter can be used to map to different CAIF
SPI slave devices.
- Functionality provided by the CAIF SPI interface:
::
void (*ss_cb) (bool assert, struct cfspi_ifc *ifc);
This function is called by the CAIF SPI slave device in order to
signal a change of state of the input GPIO (SS) to the interface.
Only active edges are mandatory to be reported.
This function can be called from IRQ context (recommended in order
not to introduce latency). The ifc parameter should be the pointer
returned from the platform probe function in the SPI device structure.
::
void (*xfer_done_cb) (struct cfspi_ifc *ifc);
This function is called by the CAIF SPI slave device in order to
report that a transfer is completed. This function should only be
called once both the transmission and the reception are completed.
This function can be called from IRQ context (recommended in order
not to introduce latency). The ifc parameter should be the pointer
returned from the platform probe function in the SPI device structure.
- Connecting the bits and pieces:
- Filling in the SPI slave device structure:
Connect the necessary callback functions.
Indicate clock speed (used to calculate toggle delays).
Chose a suitable name (helps debugging if you use several CAIF
SPI slave devices).
Assign your private data (can be used to map to your
structure).
- Filling in the SPI slave platform device structure:
Add name of driver to connect to ("cfspi_sspi").
Assign the SPI slave device structure as platform data.
Padding
=======
In order to optimize throughput, a number of SPI padding options are provided.
Padding can be enabled independently for uplink and downlink transfers.
Padding can be enabled for the head, the tail and for the total frame size.
The padding needs to be correctly configured on both sides of the link.
The padding can be changed via module parameters in cfspi_sspi.c or via
the sysfs directory of the cfspi_sspi driver (before device registration).
- CAIF SPI device template::
/*
* Copyright (C) ST-Ericsson AB 2010
* Author: Daniel Martensson / Daniel.Martensson@stericsson.com
* License terms: GNU General Public License (GPL), version 2.
*
*/
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/wait.h>
#include <linux/interrupt.h>
#include <linux/dma-mapping.h>
#include <net/caif/caif_spi.h>
MODULE_LICENSE("GPL");
struct sspi_struct {
struct cfspi_dev sdev;
struct cfspi_xfer *xfer;
};
static struct sspi_struct slave;
static struct platform_device slave_device;
static irqreturn_t sspi_irq(int irq, void *arg)
{
/* You only need to trigger on an edge to the active state of the
* SS signal. Once a edge is detected, the ss_cb() function should be
* called with the parameter assert set to true. It is OK
* (and even advised) to call the ss_cb() function in IRQ context in
* order not to add any delay. */
return IRQ_HANDLED;
}
static void sspi_complete(void *context)
{
/* Normally the DMA or the SPI framework will call you back
* in something similar to this. The only thing you need to
* do is to call the xfer_done_cb() function, providing the pointer
* to the CAIF SPI interface. It is OK to call this function
* from IRQ context. */
}
static int sspi_init_xfer(struct cfspi_xfer *xfer, struct cfspi_dev *dev)
{
/* Store transfer info. For a normal implementation you should
* set up your DMA here and make sure that you are ready to
* receive the data from the master SPI. */
struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
sspi->xfer = xfer;
return 0;
}
void sspi_sig_xfer(bool xfer, struct cfspi_dev *dev)
{
/* If xfer is true then you should assert the SPI_INT to indicate to
* the master that you are ready to receive the data from the master
* SPI. If xfer is false then you should de-assert SPI_INT to indicate
* that the transfer is done.
*/
struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
}
static void sspi_release(struct device *dev)
{
/*
* Here you should release your SPI device resources.
*/
}
static int __init sspi_init(void)
{
/* Here you should initialize your SPI device by providing the
* necessary functions, clock speed, name and private data. Once
* done, you can register your device with the
* platform_device_register() function. This function will return
* with the CAIF SPI interface initialized. This is probably also
* the place where you should set up your GPIOs, interrupts and SPI
* resources. */
int res = 0;
/* Initialize slave device. */
slave.sdev.init_xfer = sspi_init_xfer;
slave.sdev.sig_xfer = sspi_sig_xfer;
slave.sdev.clk_mhz = 13;
slave.sdev.priv = &slave;
slave.sdev.name = "spi_sspi";
slave_device.dev.release = sspi_release;
/* Initialize platform device. */
slave_device.name = "cfspi_sspi";
slave_device.dev.platform_data = &slave.sdev;
/* Register platform device. */
res = platform_device_register(&slave_device);
if (res) {
printk(KERN_WARNING "sspi_init: failed to register dev.\n");
return -ENODEV;
}
return res;
}
static void __exit sspi_exit(void)
{
platform_device_del(&slave_device);
}
module_init(sspi_init);
module_exit(sspi_exit);

View File

@@ -39,16 +39,6 @@ debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds.
Supported PCI vendor ID/device IDs
==================================
========= =======================
1d0f:0ec2 ENA PF
1d0f:1ec2 ENA PF with LLQ support
1d0f:ec20 ENA VF
1d0f:ec21 ENA VF with LLQ support
========= =======================
ENA Source Code Directory Structure
===================================
@@ -212,20 +202,11 @@ In adaptive interrupt moderation mode the interrupt delay value is
updated by the driver dynamically and adjusted every NAPI cycle
according to the traffic nature.
By default ENA driver applies adaptive coalescing on Rx traffic and
conventional coalescing on Tx traffic.
Adaptive coalescing can be switched on/off through ethtool(8)
adaptive_rx on|off parameter.
The driver chooses interrupt delay value according to the number of
bytes and packets received between interrupt unmasking and interrupt
posting. The driver uses interrupt delay table that subdivides the
range of received bytes/packets into 5 levels and assigns interrupt
delay value to each level.
The user can enable/disable adaptive moderation, modify the interrupt
delay table and restore its default values through sysfs.
More information about Adaptive Interrupt Moderation (DIM) can be found in
Documentation/networking/net_dim.rst
RX copybreak
============
@@ -274,7 +255,7 @@ RSS
inputs for hash functions.
- The driver configures RSS settings using the AQ SetFeature command
(ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties).
ENA_ADMIN_RSS_INDIRECTION_TABLE_CONFIG properties).
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
function delivered in the Rx CQ descriptor is set in the received
SKB.

View File

@@ -16,6 +16,34 @@ Note that the file name is a path relative to the firmware loading path
(usually ``/lib/firmware/``). Drivers may send status updates to inform
user space about the progress of the update operation.
Overwrite Mask
==============
The ``devlink-flash`` command allows optionally specifying a mask indicating
how the device should handle subsections of flash components when updating.
This mask indicates the set of sections which are allowed to be overwritten.
.. list-table:: List of overwrite mask bits
:widths: 5 95
* - Name
- Description
* - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
- Indicates that the device should overwrite settings in the components
being updated with the settings found in the provided image.
* - ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
- Indicates that the device should overwrite identifiers in the
components being updated with the identifiers found in the provided
image. This includes MAC addresses, serial IDs, and similar device
identifiers.
Multiple overwrite bits may be combined and requested together. If no bits
are provided, it is expected that the device only update firmware binaries
in the components being updated. Settings and identifiers are expected to be
preserved across the update. A device may not support every combination and
the driver for such a device must reject any combination which cannot be
faithfully implemented.
Firmware Loading
================

View File

@@ -108,3 +108,9 @@ own name.
* - ``region_snapshot_enable``
- Boolean
- Enable capture of ``devlink-region`` snapshots.
* - ``enable_remote_dev_reset``
- Boolean
- Enable device reset by remote host. When cleared, the device driver
will NACK any attempt of other host to reset the device. This parameter
is useful for setups where a device is shared by different hosts, such
as multi-host setup.

View File

@@ -0,0 +1,81 @@
.. SPDX-License-Identifier: GPL-2.0
==============
Devlink Reload
==============
``devlink-reload`` provides mechanism to reinit driver entities, applying
``devlink-params`` and ``devlink-resources`` new values. It also provides
mechanism to activate firmware.
Reload Actions
==============
User may select a reload action.
By default ``driver_reinit`` action is selected.
.. list-table:: Possible reload actions
:widths: 5 90
* - Name
- Description
* - ``driver-reinit``
- Devlink driver entities re-initialization, including applying
new values to devlink entities which are used during driver
load such as ``devlink-params`` in configuration mode
``driverinit`` or ``devlink-resources``
* - ``fw_activate``
- Firmware activate. Activates new firmware if such image is stored and
pending activation. If no limitation specified this action may involve
firmware reset. If no new image pending this action will reload current
firmware image.
Note that even though user asks for a specific action, the driver
implementation might require to perform another action alongside with
it. For example, some driver do not support driver reinitialization
being performed without fw activation. Therefore, the devlink reload
command returns the list of actions which were actrually performed.
Reload Limits
=============
By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions.
However, some drivers support action limits, which limit the action
implementation to specific constraints.
.. list-table:: Possible reload limits
:widths: 5 90
* - Name
- Description
* - ``no_reset``
- No reset allowed, no down time allowed, no link flap and no
configuration is lost.
Change Namespace
================
The netns option allows user to be able to move devlink instances into
namespaces during devlink reload operation.
By default all devlink instances are created in init_net and stay there.
example usage
-------------
.. code:: shell
$ devlink dev reload help
$ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { driver_reinit | fw_activate } ] [ limit no_reset ]
# Run reload command for devlink driver entities re-initialization:
$ devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
# Run reload command to activate firmware:
# Note that mlx5 driver reloads the driver while activating firmware
$ devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate

View File

@@ -409,6 +409,73 @@ be added to the following table:
- ``drop``
- Traps packets dropped due to the RED (Random Early Detection) algorithm
(i.e., early drops)
* - ``vxlan_parsing``
- ``drop``
- Traps packets dropped due to an error in the VXLAN header parsing which
might be because of packet truncation or the I flag is not set.
* - ``llc_snap_parsing``
- ``drop``
- Traps packets dropped due to an error in the LLC+SNAP header parsing
* - ``vlan_parsing``
- ``drop``
- Traps packets dropped due to an error in the VLAN header parsing. Could
include unexpected packet truncation.
* - ``pppoe_ppp_parsing``
- ``drop``
- Traps packets dropped due to an error in the PPPoE+PPP header parsing.
This could include finding a session ID of 0xFFFF (which is reserved and
not for use), a PPPoE length which is larger than the frame received or
any common error on this type of header
* - ``mpls_parsing``
- ``drop``
- Traps packets dropped due to an error in the MPLS header parsing which
could include unexpected header truncation
* - ``arp_parsing``
- ``drop``
- Traps packets dropped due to an error in the ARP header parsing
* - ``ip_1_parsing``
- ``drop``
- Traps packets dropped due to an error in the first IP header parsing.
This packet trap could include packets which do not pass an IP checksum
check, a header length check (a minimum of 20 bytes), which might suffer
from packet truncation thus the total length field exceeds the received
packet length etc
* - ``ip_n_parsing``
- ``drop``
- Traps packets dropped due to an error in the parsing of the last IP
header (the inner one in case of an IP over IP tunnel). The same common
error checking is performed here as for the ip_1_parsing trap
* - ``gre_parsing``
- ``drop``
- Traps packets dropped due to an error in the GRE header parsing
* - ``udp_parsing``
- ``drop``
- Traps packets dropped due to an error in the UDP header parsing.
This packet trap could include checksum errorrs, an improper UDP
length detected (smaller than 8 bytes) or detection of header
truncation.
* - ``tcp_parsing``
- ``drop``
- Traps packets dropped due to an error in the TCP header parsing.
This could include TCP checksum errors, improper combination of SYN, FIN
and/or RESET etc.
* - ``ipsec_parsing``
- ``drop``
- Traps packets dropped due to an error in the IPSEC header parsing
* - ``sctp_parsing``
- ``drop``
- Traps packets dropped due to an error in the SCTP header parsing.
This would mean that port number 0 was used or that the header is
truncated.
* - ``dccp_parsing``
- ``drop``
- Traps packets dropped due to an error in the DCCP header parsing
* - ``gtp_parsing``
- ``drop``
- Traps packets dropped due to an error in the GTP header parsing
* - ``esp_parsing``
- ``drop``
- Traps packets dropped due to an error in the ESP header parsing
Driver-specific Packet Traps
============================
@@ -509,6 +576,9 @@ narrow. The description of these groups must be added to the following table:
* - ``acl_trap``
- Contains packet traps for packets that were trapped (logged) by the
device during ACL processing
* - ``parser_error_drops``
- Contains packet traps for packets that were marked by the device during
parsing as erroneous
Packet Trap Policers
====================

View File

@@ -69,6 +69,11 @@ The ``ice`` driver reports the following versions
- The version of the DDP package that is active in the device. Note
that both the name (as reported by ``fw.app.name``) and version are
required to uniquely identify the package.
* - ``fw.app.bundle_id``
- 0xc0000001
- Unique identifier for the DDP package loaded in the device. Also
referred to as the DDP Track ID. Can be used to uniquely identify
the specific DDP package.
* - ``fw.netlist``
- running
- 1.1.2000-6.7.0
@@ -81,6 +86,37 @@ The ``ice`` driver reports the following versions
- 0xee16ced7
- The first 4 bytes of the hash of the netlist module contents.
Flash Update
============
The ``ice`` driver implements support for flash update using the
``devlink-flash`` interface. It supports updating the device flash using a
combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and
``fw.netlist`` components.
.. list-table:: List of supported overwrite modes
:widths: 5 95
* - Bits
- Behavior
* - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
- Do not preserve settings stored in the flash components being
updated. This includes overwriting the port configuration that
determines the number of physical functions the device will
initialize with.
* - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
- Do not preserve either settings or identifiers. Overwrite everything
in the flash with the contents from the provided image, without
performing any preservation. This includes overwriting device
identifying fields such as the MAC address, VPD area, and device
serial number. It is expected that this combination be used with an
image customized for the specific device.
The ice hardware does not support overwriting only identifiers while
preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its
own will be rejected. If no overwrite mask is provided, the firmware will be
instructed to preserve all settings and identifying fields when updating.
Regions
=======

View File

@@ -20,6 +20,7 @@ general.
devlink-params
devlink-region
devlink-resource
devlink-reload
devlink-trap
Driver-specific documentation

View File

@@ -68,6 +68,7 @@ the flags may not apply to requests. Recognized flags are:
================================= ===================================
``ETHTOOL_FLAG_COMPACT_BITSETS`` use compact format bitsets in reply
``ETHTOOL_FLAG_OMIT_REPLY`` omit optional reply (_SET and _ACT)
``ETHTOOL_FLAG_STATS`` include optional device statistics
================================= ===================================
New request flags should follow the general idea that if the flag is not set,
@@ -991,8 +992,18 @@ Kernel response contents:
``ETHTOOL_A_PAUSE_AUTONEG`` bool pause autonegotiation
``ETHTOOL_A_PAUSE_RX`` bool receive pause frames
``ETHTOOL_A_PAUSE_TX`` bool transmit pause frames
``ETHTOOL_A_PAUSE_STATS`` nested pause statistics
===================================== ====== ==========================
``ETHTOOL_A_PAUSE_STATS`` are reported if ``ETHTOOL_FLAG_STATS`` was set
in ``ETHTOOL_A_HEADER_FLAGS``.
It will be empty if driver did not report any statistics. Drivers fill in
the statistics in the following structure:
.. kernel-doc:: include/linux/ethtool.h
:identifiers: ethtool_pause_stats
Each member has a corresponding attribute defined.
PAUSE_SET
============

View File

@@ -93,6 +93,7 @@ Contents:
sctp
secid
seg6-sysctl
statistics
strparser
switchdev
sysfs-tagging

View File

@@ -134,6 +134,15 @@ PHY Support
.. kernel-doc:: drivers/net/phy/phy.c
:internal:
.. kernel-doc:: drivers/net/phy/phy-core.c
:export:
.. kernel-doc:: drivers/net/phy/phy-c45.c
:export:
.. kernel-doc:: include/linux/phy.h
:internal:
.. kernel-doc:: drivers/net/phy/phy_device.c
:export:

View File

@@ -4,124 +4,364 @@
L2TP
====
This document describes how to use the kernel's L2TP drivers to
provide L2TP functionality. L2TP is a protocol that tunnels one or
more sessions over an IP tunnel. It is commonly used for VPNs
(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
network infrastructure. With L2TPv3, it is also useful as a Layer-2
tunneling infrastructure.
Layer 2 Tunneling Protocol (L2TP) allows L2 frames to be tunneled over
an IP network.
Features
This document covers the kernel's L2TP subsystem. It documents kernel
APIs for application developers who want to use the L2TP subsystem and
it provides some technical details about the internal implementation
which may be useful to kernel developers and maintainers.
Overview
========
L2TPv2 (PPP over L2TP (UDP tunnels)).
L2TPv3 ethernet pseudowires.
L2TPv3 PPP pseudowires.
L2TPv3 IP encapsulation.
Netlink sockets for L2TPv3 configuration management.
The kernel's L2TP subsystem implements the datapath for L2TPv2 and
L2TPv3. L2TPv2 is carried over UDP. L2TPv3 is carried over UDP or
directly over IP (protocol 115).
History
=======
The L2TP RFCs define two basic kinds of L2TP packets: control packets
(the "control plane"), and data packets (the "data plane"). The kernel
deals only with data packets. The more complex control packets are
handled by user space.
The original pppol2tp driver was introduced in 2.6.23 and provided
L2TPv2 functionality (rfc2661). L2TPv2 is used to tunnel one or more PPP
sessions over a UDP tunnel.
An L2TP tunnel carries one or more L2TP sessions. Each tunnel is
associated with a socket. Each session is associated with a virtual
netdevice, e.g. ``pppN``, ``l2tpethN``, through which data frames pass
to/from L2TP. Fields in the L2TP header identify the tunnel or session
and whether it is a control or data packet. When tunnels and sessions
are set up using the Linux kernel API, we're just setting up the L2TP
data path. All aspects of the control protocol are to be handled by
user space.
L2TPv3 (rfc3931) changes the protocol to allow different frame types
to be passed over an L2TP tunnel by moving the PPP-specific parts of
the protocol out of the core L2TP packet headers. Each frame type is
known as a pseudowire type. Ethernet, PPP, HDLC, Frame Relay and ATM
pseudowires for L2TP are defined in separate RFC standards. Another
change for L2TPv3 is that it can be carried directly over IP with no
UDP header (UDP is optional). It is also possible to create static
unmanaged L2TPv3 tunnels manually without a control protocol
(userspace daemon) to manage them.
This split in responsibilities leads to a natural sequence of
operations when establishing tunnels and sessions. The procedure looks
like this:
To support L2TPv3, the original pppol2tp driver was split up to
separate the L2TP and PPP functionality. Existing L2TPv2 userspace
apps should be unaffected as the original pppol2tp sockets API is
retained. L2TPv3, however, uses netlink to manage L2TPv3 tunnels and
sessions.
1) Create a tunnel socket. Exchange L2TP control protocol messages
with the peer over that socket in order to establish a tunnel.
Design
======
2) Create a tunnel context in the kernel, using information
obtained from the peer using the control protocol messages.
The L2TP protocol separates control and data frames. The L2TP kernel
drivers handle only L2TP data frames; control frames are always
handled by userspace. L2TP control frames carry messages between L2TP
clients/servers and are used to setup / teardown tunnels and
sessions. An L2TP client or server is implemented in userspace.
3) Exchange L2TP control protocol messages with the peer over the
tunnel socket in order to establish a session.
Each L2TP tunnel is implemented using a UDP or L2TPIP socket; L2TPIP
provides L2TPv3 IP encapsulation (no UDP) and is implemented using a
new l2tpip socket family. The tunnel socket is typically created by
userspace, though for unmanaged L2TPv3 tunnels, the socket can also be
created by the kernel. Each L2TP session (pseudowire) gets a network
interface instance. In the case of PPP, these interfaces are created
indirectly by pppd using a pppol2tp socket. In the case of ethernet,
the netdevice is created upon a netlink request to create an L2TPv3
ethernet pseudowire.
4) Create a session context in the kernel using information
obtained from the peer using the control protocol messages.
For PPP, the PPPoL2TP driver, net/l2tp/l2tp_ppp.c, provides a
mechanism by which PPP frames carried through an L2TP session are
passed through the kernel's PPP subsystem. The standard PPP daemon,
pppd, handles all PPP interaction with the peer. PPP network
interfaces are created for each local PPP endpoint. The kernel's PPP
subsystem arranges for PPP control frames to be delivered to pppd,
while data frames are forwarded as usual.
L2TP APIs
=========
For ethernet, the L2TPETH driver, net/l2tp/l2tp_eth.c, implements a
netdevice driver, managing virtual ethernet devices, one per
pseudowire. These interfaces can be managed using standard Linux tools
such as "ip" and "ifconfig". If only IP frames are passed over the
tunnel, the interface can be given an IP addresses of itself and its
peer. If non-IP frames are to be passed over the tunnel, the interface
can be added to a bridge using brctl. All L2TP datapath protocol
functions are handled by the L2TP core driver.
This section documents each userspace API of the L2TP subsystem.
Each tunnel and session within a tunnel is assigned a unique tunnel_id
and session_id. These ids are carried in the L2TP header of every
control and data packet. (Actually, in L2TPv3, the tunnel_id isn't
present in data frames - it is inferred from the IP connection on
which the packet was received.) The L2TP driver uses the ids to lookup
internal tunnel and/or session contexts to determine how to handle the
packet. Zero tunnel / session ids are treated specially - zero ids are
never assigned to tunnels or sessions in the network. In the driver,
the tunnel context keeps a reference to the tunnel UDP or L2TPIP
socket. The session context holds data that lets the driver interface
to the kernel's network frame type subsystems, i.e. PPP, ethernet.
Tunnel Sockets
--------------
Userspace Programming
=====================
L2TPv2 always uses UDP. L2TPv3 may use UDP or IP encapsulation.
For L2TPv2, there are a number of requirements on the userspace L2TP
daemon in order to use the pppol2tp driver.
To create a tunnel socket for use by L2TP, the standard POSIX
socket API is used.
1. Use a UDP socket per tunnel.
For example, for a tunnel using IPv4 addresses and UDP encapsulation::
2. Create a single PPPoL2TP socket per tunnel bound to a special null
session id. This is used only for communicating with the driver but
must remain open while the tunnel is active. Opening this tunnel
management socket causes the driver to mark the tunnel socket as an
L2TP UDP encapsulation socket and flags it for use by the
referenced tunnel id. This hooks up the UDP receive path via
udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
in this special PPPoX socket.
int sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
3. Create a PPPoL2TP socket per L2TP session. This is typically done
by starting pppd with the pppol2tp plugin and appropriate
arguments. A PPPoL2TP tunnel management socket (Step 2) must be
created before the first PPPoL2TP session socket is created.
Or for a tunnel using IPv6 addresses and IP encapsulation::
int sockfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
UDP socket programming doesn't need to be covered here.
IPPROTO_L2TP is an IP protocol type implemented by the kernel's L2TP
subsystem. The L2TPIP socket address is defined in struct
sockaddr_l2tpip and struct sockaddr_l2tpip6 at
`include/uapi/linux/l2tp.h`_. The address includes the L2TP tunnel
(connection) id. To use L2TP IP encapsulation, an L2TPv3 application
should bind the L2TPIP socket using the locally assigned
tunnel id. When the peer's tunnel id and IP address is known, a
connect must be done.
If the L2TP application needs to handle L2TPv3 tunnel setup requests
from peers using L2TPIP, it must open a dedicated L2TPIP
socket to listen for those requests and bind the socket using tunnel
id 0 since tunnel setup requests are addressed to tunnel id 0.
An L2TP tunnel and all of its sessions are automatically closed when
its tunnel socket is closed.
Netlink API
-----------
L2TP applications use netlink to manage L2TP tunnel and session
instances in the kernel. The L2TP netlink API is defined in
`include/uapi/linux/l2tp.h`_.
L2TP uses `Generic Netlink`_ (GENL). Several commands are defined:
Create, Delete, Modify and Get for tunnel and session
instances, e.g. ``L2TP_CMD_TUNNEL_CREATE``. The API header lists the
netlink attribute types that can be used with each command.
Tunnel and session instances are identified by a locally unique
32-bit id. L2TP tunnel ids are given by ``L2TP_ATTR_CONN_ID`` and
``L2TP_ATTR_PEER_CONN_ID`` attributes and L2TP session ids are given
by ``L2TP_ATTR_SESSION_ID`` and ``L2TP_ATTR_PEER_SESSION_ID``
attributes. If netlink is used to manage L2TPv2 tunnel and session
instances, the L2TPv2 16-bit tunnel/session id is cast to a 32-bit
value in these attributes.
In the ``L2TP_CMD_TUNNEL_CREATE`` command, ``L2TP_ATTR_FD`` tells the
kernel the tunnel socket fd being used. If not specified, the kernel
creates a kernel socket for the tunnel, using IP parameters set in
``L2TP_ATTR_IP[6]_SADDR``, ``L2TP_ATTR_IP[6]_DADDR``,
``L2TP_ATTR_UDP_SPORT``, ``L2TP_ATTR_UDP_DPORT`` attributes. Kernel
sockets are used to implement unmanaged L2TPv3 tunnels (iproute2's "ip
l2tp" commands). If ``L2TP_ATTR_FD`` is given, it must be a socket fd
that is already bound and connected. There is more information about
unmanaged tunnels later in this document.
``L2TP_CMD_TUNNEL_CREATE`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID Y Sets the tunnel (connection) id.
PEER_CONN_ID Y Sets the peer tunnel (connection) id.
PROTO_VERSION Y Protocol version. 2 or 3.
ENCAP_TYPE Y Encapsulation type: UDP or IP.
FD N Tunnel socket file descriptor.
UDP_CSUM N Enable IPv4 UDP checksums. Used only if FD is
not set.
UDP_ZERO_CSUM6_TX N Zero IPv6 UDP checksum on transmit. Used only
if FD is not set.
UDP_ZERO_CSUM6_RX N Zero IPv6 UDP checksum on receive. Used only if
FD is not set.
IP_SADDR N IPv4 source address. Used only if FD is not
set.
IP_DADDR N IPv4 destination address. Used only if FD is
not set.
UDP_SPORT N UDP source port. Used only if FD is not set.
UDP_DPORT N UDP destination port. Used only if FD is not
set.
IP6_SADDR N IPv6 source address. Used only if FD is not
set.
IP6_DADDR N IPv6 destination address. Used only if FD is
not set.
DEBUG N Debug flags.
================== ======== ===
``L2TP_CMD_TUNNEL_DESTROY`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID Y Identifies the tunnel id to be destroyed.
================== ======== ===
``L2TP_CMD_TUNNEL_MODIFY`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID Y Identifies the tunnel id to be modified.
DEBUG N Debug flags.
================== ======== ===
``L2TP_CMD_TUNNEL_GET`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID N Identifies the tunnel id to be queried.
Ignored in DUMP requests.
================== ======== ===
``L2TP_CMD_SESSION_CREATE`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID Y The parent tunnel id.
SESSION_ID Y Sets the session id.
PEER_SESSION_ID Y Sets the parent session id.
PW_TYPE Y Sets the pseudowire type.
DEBUG N Debug flags.
RECV_SEQ N Enable rx data sequence numbers.
SEND_SEQ N Enable tx data sequence numbers.
LNS_MODE N Enable LNS mode (auto-enable data sequence
numbers).
RECV_TIMEOUT N Timeout to wait when reordering received
packets.
L2SPEC_TYPE N Sets layer2-specific-sublayer type (L2TPv3
only).
COOKIE N Sets optional cookie (L2TPv3 only).
PEER_COOKIE N Sets optional peer cookie (L2TPv3 only).
IFNAME N Sets interface name (L2TPv3 only).
================== ======== ===
For Ethernet session types, this will create an l2tpeth virtual
interface which can then be configured as required. For PPP session
types, a PPPoL2TP socket must also be opened and connected, mapping it
onto the new session. This is covered in "PPPoL2TP Sockets" later.
``L2TP_CMD_SESSION_DESTROY`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID Y Identifies the parent tunnel id of the session
to be destroyed.
SESSION_ID Y Identifies the session id to be destroyed.
IFNAME N Identifies the session by interface name. If
set, this overrides any CONN_ID and SESSION_ID
attributes. Currently supported for L2TPv3
Ethernet sessions only.
================== ======== ===
``L2TP_CMD_SESSION_MODIFY`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID Y Identifies the parent tunnel id of the session
to be modified.
SESSION_ID Y Identifies the session id to be modified.
IFNAME N Identifies the session by interface name. If
set, this overrides any CONN_ID and SESSION_ID
attributes. Currently supported for L2TPv3
Ethernet sessions only.
DEBUG N Debug flags.
RECV_SEQ N Enable rx data sequence numbers.
SEND_SEQ N Enable tx data sequence numbers.
LNS_MODE N Enable LNS mode (auto-enable data sequence
numbers).
RECV_TIMEOUT N Timeout to wait when reordering received
packets.
================== ======== ===
``L2TP_CMD_SESSION_GET`` attributes:-
================== ======== ===
Attribute Required Use
================== ======== ===
CONN_ID N Identifies the tunnel id to be queried.
Ignored for DUMP requests.
SESSION_ID N Identifies the session id to be queried.
Ignored for DUMP requests.
IFNAME N Identifies the session by interface name.
If set, this overrides any CONN_ID and
SESSION_ID attributes. Ignored for DUMP
requests. Currently supported for L2TPv3
Ethernet sessions only.
================== ======== ===
Application developers should refer to `include/uapi/linux/l2tp.h`_ for
netlink command and attribute definitions.
Sample userspace code using libmnl_:
- Open L2TP netlink socket::
struct nl_sock *nl_sock;
int l2tp_nl_family_id;
nl_sock = nl_socket_alloc();
genl_connect(nl_sock);
genl_id = genl_ctrl_resolve(nl_sock, L2TP_GENL_NAME);
- Create a tunnel::
struct nlmsghdr *nlh;
struct genlmsghdr *gnlh;
nlh = mnl_nlmsg_put_header(buf);
nlh->nlmsg_type = genl_id; /* assigned to genl socket */
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_seq = seq;
gnlh = mnl_nlmsg_put_extra_header(nlh, sizeof(*gnlh));
gnlh->cmd = L2TP_CMD_TUNNEL_CREATE;
gnlh->version = L2TP_GENL_VERSION;
gnlh->reserved = 0;
mnl_attr_put_u32(nlh, L2TP_ATTR_FD, tunl_sock_fd);
mnl_attr_put_u32(nlh, L2TP_ATTR_CONN_ID, tid);
mnl_attr_put_u32(nlh, L2TP_ATTR_PEER_CONN_ID, peer_tid);
mnl_attr_put_u8(nlh, L2TP_ATTR_PROTO_VERSION, protocol_version);
mnl_attr_put_u16(nlh, L2TP_ATTR_ENCAP_TYPE, encap);
- Create a session::
struct nlmsghdr *nlh;
struct genlmsghdr *gnlh;
nlh = mnl_nlmsg_put_header(buf);
nlh->nlmsg_type = genl_id; /* assigned to genl socket */
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_seq = seq;
gnlh = mnl_nlmsg_put_extra_header(nlh, sizeof(*gnlh));
gnlh->cmd = L2TP_CMD_SESSION_CREATE;
gnlh->version = L2TP_GENL_VERSION;
gnlh->reserved = 0;
mnl_attr_put_u32(nlh, L2TP_ATTR_CONN_ID, tid);
mnl_attr_put_u32(nlh, L2TP_ATTR_PEER_CONN_ID, peer_tid);
mnl_attr_put_u32(nlh, L2TP_ATTR_SESSION_ID, sid);
mnl_attr_put_u32(nlh, L2TP_ATTR_PEER_SESSION_ID, peer_sid);
mnl_attr_put_u16(nlh, L2TP_ATTR_PW_TYPE, pwtype);
/* there are other session options which can be set using netlink
* attributes during session creation -- see l2tp.h
*/
- Delete a session::
struct nlmsghdr *nlh;
struct genlmsghdr *gnlh;
nlh = mnl_nlmsg_put_header(buf);
nlh->nlmsg_type = genl_id; /* assigned to genl socket */
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_seq = seq;
gnlh = mnl_nlmsg_put_extra_header(nlh, sizeof(*gnlh));
gnlh->cmd = L2TP_CMD_SESSION_DELETE;
gnlh->version = L2TP_GENL_VERSION;
gnlh->reserved = 0;
mnl_attr_put_u32(nlh, L2TP_ATTR_CONN_ID, tid);
mnl_attr_put_u32(nlh, L2TP_ATTR_SESSION_ID, sid);
- Delete a tunnel and all of its sessions (if any)::
struct nlmsghdr *nlh;
struct genlmsghdr *gnlh;
nlh = mnl_nlmsg_put_header(buf);
nlh->nlmsg_type = genl_id; /* assigned to genl socket */
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_seq = seq;
gnlh = mnl_nlmsg_put_extra_header(nlh, sizeof(*gnlh));
gnlh->cmd = L2TP_CMD_TUNNEL_DELETE;
gnlh->version = L2TP_GENL_VERSION;
gnlh->reserved = 0;
mnl_attr_put_u32(nlh, L2TP_ATTR_CONN_ID, tid);
PPPoL2TP Session Socket API
---------------------------
For PPP session types, a PPPoL2TP socket must be opened and connected
to the L2TP session.
When creating PPPoL2TP sockets, the application provides information
to the driver about the socket in a socket connect() call. Source and
destination tunnel and session ids are provided, as well as the file
descriptor of a UDP socket. See struct pppol2tp_addr in
include/linux/if_pppol2tp.h. Note that zero tunnel / session ids are
treated specially. When creating the per-tunnel PPPoL2TP management
socket in Step 2 above, zero source and destination session ids are
specified, which tells the driver to prepare the supplied UDP file
descriptor for use as an L2TP tunnel socket.
to the kernel about the tunnel and session in a socket connect()
call. Source and destination tunnel and session ids are provided, as
well as the file descriptor of a UDP or L2TPIP socket. See struct
pppol2tp_addr in `include/linux/if_pppol2tp.h`_. For historical reasons,
there are unfortunately slightly different address structures for
L2TPv2/L2TPv3 IPv4/IPv6 tunnels and userspace must use the appropriate
structure that matches the tunnel socket type.
Userspace may control behavior of the tunnel or session using
setsockopt and ioctl on the PPPoX socket. The following socket
@@ -130,229 +370,308 @@ options are supported:-
========= ===========================================================
DEBUG bitmask of debug message categories. See below.
SENDSEQ - 0 => don't send packets with sequence numbers
- 1 => send packets with sequence numbers
- 1 => send packets with sequence numbers
RECVSEQ - 0 => receive packet sequence numbers are optional
- 1 => drop receive packets without sequence numbers
- 1 => drop receive packets without sequence numbers
LNSMODE - 0 => act as LAC.
- 1 => act as LNS.
- 1 => act as LNS.
REORDERTO reorder timeout (in millisecs). If 0, don't try to reorder.
========= ===========================================================
Only the DEBUG option is supported by the special tunnel management
PPPoX socket.
In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided
to retrieve tunnel and session statistics from the kernel using the
PPPoX socket of the appropriate tunnel or session.
For L2TPv3, userspace must use the netlink API defined in
include/linux/l2tp.h to manage tunnel and session contexts. The
general procedure to create a new L2TP tunnel with one session is:-
Sample userspace code:
1. Open a GENL socket using L2TP_GENL_NAME for configuring the kernel
using netlink.
- Create session PPPoX data socket::
2. Create a UDP or L2TPIP socket for the tunnel.
struct sockaddr_pppol2tp sax;
int fd;
3. Create a new L2TP tunnel using a L2TP_CMD_TUNNEL_CREATE
request. Set attributes according to desired tunnel parameters,
referencing the UDP or L2TPIP socket created in the previous step.
/* Note, the tunnel socket must be bound already, else it
* will not be ready
*/
sax.sa_family = AF_PPPOX;
sax.sa_protocol = PX_PROTO_OL2TP;
sax.pppol2tp.fd = tunnel_fd;
sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
sax.pppol2tp.addr.sin_port = addr->sin_port;
sax.pppol2tp.addr.sin_family = AF_INET;
sax.pppol2tp.s_tunnel = tunnel_id;
sax.pppol2tp.s_session = session_id;
sax.pppol2tp.d_tunnel = peer_tunnel_id;
sax.pppol2tp.d_session = peer_session_id;
4. Create a new L2TP session in the tunnel using a
L2TP_CMD_SESSION_CREATE request.
/* session_fd is the fd of the session's PPPoL2TP socket.
* tunnel_fd is the fd of the tunnel UDP / L2TPIP socket.
*/
fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
if (fd < 0 ) {
return -errno;
}
return 0;
The tunnel and all of its sessions are closed when the tunnel socket
is closed. The netlink API may also be used to delete sessions and
tunnels. Configuration and status info may be set or read using netlink.
Old L2TPv2-only API
-------------------
The L2TP driver also supports static (unmanaged) L2TPv3 tunnels. These
are where there is no L2TP control message exchange with the peer to
setup the tunnel; the tunnel is configured manually at each end of the
tunnel. There is no need for an L2TP userspace application in this
case -- the tunnel socket is created by the kernel and configured
using parameters sent in the L2TP_CMD_TUNNEL_CREATE netlink
request. The "ip" utility of iproute2 has commands for managing static
L2TPv3 tunnels; do "ip l2tp help" for more information.
When L2TP was first added to the Linux kernel in 2.6.23, it
implemented only L2TPv2 and did not include a netlink API. Instead,
tunnel and session instances in the kernel were managed directly using
only PPPoL2TP sockets. The PPPoL2TP socket is used as described in
section "PPPoL2TP Session Socket API" but tunnel and session instances
are automatically created on a connect() of the socket instead of
being created by a separate netlink request:
- Tunnels are managed using a tunnel management socket which is a
dedicated PPPoL2TP socket, connected to (invalid) session
id 0. The L2TP tunnel instance is created when the PPPoL2TP
tunnel management socket is connected and is destroyed when the
socket is closed.
- Session instances are created in the kernel when a PPPoL2TP
socket is connected to a non-zero session id. Session parameters
are set using setsockopt. The L2TP session instance is destroyed
when the socket is closed.
This API is still supported but its use is discouraged. Instead, new
L2TPv2 applications should use netlink to first create the tunnel and
session, then create a PPPoL2TP socket for the session.
Unmanaged L2TPv3 tunnels
------------------------
The kernel L2TP subsystem also supports static (unmanaged) L2TPv3
tunnels. Unmanaged tunnels have no userspace tunnel socket, and
exchange no control messages with the peer to set up the tunnel; the
tunnel is configured manually at each end of the tunnel. All
configuration is done using netlink. There is no need for an L2TP
userspace application in this case -- the tunnel socket is created by
the kernel and configured using parameters sent in the
``L2TP_CMD_TUNNEL_CREATE`` netlink request. The ``ip`` utility of
``iproute2`` has commands for managing static L2TPv3 tunnels; do ``ip
l2tp help`` for more information.
Debugging
=========
---------
The driver supports a flexible debug scheme where kernel trace
messages may be optionally enabled per tunnel and per session. Care is
needed when debugging a live system since the messages are not
rate-limited and a busy system could be swamped. Userspace uses
setsockopt on the PPPoX socket to set a debug mask.
The L2TP subsystem offers a range of debugging interfaces through the
debugfs filesystem.
The following debug mask bits are available:
To access these interfaces, the debugfs filesystem must first be mounted::
================ ==============================
L2TP_MSG_DEBUG verbose debug (if compiled in)
L2TP_MSG_CONTROL userspace - kernel interface
L2TP_MSG_SEQ sequence numbers handling
L2TP_MSG_DATA data packets
================ ==============================
# mount -t debugfs debugfs /debug
If enabled, files under a l2tp debugfs directory can be used to dump
kernel state about L2TP tunnels and sessions. To access it, the
debugfs filesystem must first be mounted::
Files under the l2tp directory can then be accessed, providing a summary
of the current population of tunnel and session contexts existing in the
kernel::
# mount -t debugfs debugfs /debug
Files under the l2tp directory can then be accessed::
# cat /debug/l2tp/tunnels
# cat /debug/l2tp/tunnels
The debugfs files should not be used by applications to obtain L2TP
state information because the file format is subject to change. It is
implemented to provide extra debug information to help diagnose
problems.) Users should use the netlink API.
problems. Applications should instead use the netlink API.
/proc/net/pppol2tp is also provided for backwards compatibility with
the original pppol2tp driver. It lists information about L2TPv2
In addition the L2TP subsystem implements tracepoints using the standard
kernel event tracing API. The available L2TP events can be reviewed as
follows::
# find /debug/tracing/events/l2tp
Finally, /proc/net/pppol2tp is also provided for backwards compatibility
with the original pppol2tp code. It lists information about L2TPv2
tunnels and sessions only. Its use is discouraged.
Unmanaged L2TPv3 Tunnels
========================
Some commercial L2TP products support unmanaged L2TPv3 ethernet
tunnels, where there is no L2TP control protocol; tunnels are
configured at each side manually. New commands are available in
iproute2's ip utility to support this.
To create an L2TPv3 ethernet pseudowire between local host 192.168.1.1
and peer 192.168.1.2, using IP addresses 10.5.1.1 and 10.5.1.2 for the
tunnel endpoints::
# ip l2tp add tunnel tunnel_id 1 peer_tunnel_id 1 udp_sport 5000 \
udp_dport 5000 encap udp local 192.168.1.1 remote 192.168.1.2
# ip l2tp add session tunnel_id 1 session_id 1 peer_session_id 1
# ip -s -d show dev l2tpeth0
# ip addr add 10.5.1.2/32 peer 10.5.1.1/32 dev l2tpeth0
# ip li set dev l2tpeth0 up
Choose IP addresses to be the address of a local IP interface and that
of the remote system. The IP addresses of the l2tpeth0 interface can be
anything suitable.
Repeat the above at the peer, with ports, tunnel/session ids and IP
addresses reversed. The tunnel and session IDs can be any non-zero
32-bit number, but the values must be reversed at the peer.
======================== ===================
Host 1 Host2
======================== ===================
udp_sport=5000 udp_sport=5001
udp_dport=5001 udp_dport=5000
tunnel_id=42 tunnel_id=45
peer_tunnel_id=45 peer_tunnel_id=42
session_id=128 session_id=5196755
peer_session_id=5196755 peer_session_id=128
======================== ===================
When done at both ends of the tunnel, it should be possible to send
data over the network. e.g.::
# ping 10.5.1.1
Sample Userspace Code
=====================
1. Create tunnel management PPPoX socket::
kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
if (kernel_fd >= 0) {
struct sockaddr_pppol2tp sax;
struct sockaddr_in const *peer_addr;
peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
memset(&sax, 0, sizeof(sax));
sax.sa_family = AF_PPPOX;
sax.sa_protocol = PX_PROTO_OL2TP;
sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
sax.pppol2tp.addr.sin_family = AF_INET;
sax.pppol2tp.s_tunnel = tunnel_id;
sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
sax.pppol2tp.d_tunnel = 0;
sax.pppol2tp.d_session = 0; /* special case: mgmt socket */
if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
perror("connect failed");
result = -errno;
goto err;
}
}
2. Create session PPPoX data socket::
struct sockaddr_pppol2tp sax;
int fd;
/* Note, the target socket must be bound already, else it will not be ready */
sax.sa_family = AF_PPPOX;
sax.sa_protocol = PX_PROTO_OL2TP;
sax.pppol2tp.fd = tunnel_fd;
sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
sax.pppol2tp.addr.sin_port = addr->sin_port;
sax.pppol2tp.addr.sin_family = AF_INET;
sax.pppol2tp.s_tunnel = tunnel_id;
sax.pppol2tp.s_session = session_id;
sax.pppol2tp.d_tunnel = peer_tunnel_id;
sax.pppol2tp.d_session = peer_session_id;
/* session_fd is the fd of the session's PPPoL2TP socket.
* tunnel_fd is the fd of the tunnel UDP socket.
*/
fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
if (fd < 0 ) {
return -errno;
}
return 0;
Internal Implementation
=======================
The driver keeps a struct l2tp_tunnel context per L2TP tunnel and a
struct l2tp_session context for each session. The l2tp_tunnel is
always associated with a UDP or L2TP/IP socket and keeps a list of
sessions in the tunnel. The l2tp_session context keeps kernel state
about the session. It has private data which is used for data specific
to the session type. With L2TPv2, the session always carried PPP
traffic. With L2TPv3, the session can also carry ethernet frames
(ethernet pseudowire) or other data types such as ATM, HDLC or Frame
Relay.
This section is for kernel developers and maintainers.
When a tunnel is first opened, the reference count on the socket is
increased using sock_hold(). This ensures that the kernel socket
cannot be removed while L2TP's data structures reference it.
Sockets
-------
Some L2TP sessions also have a socket (PPP pseudowires) while others
do not (ethernet pseudowires). We can't use the socket reference count
as the reference count for session contexts. The L2TP implementation
therefore has its own internal reference counts on the session
contexts.
UDP sockets are implemented by the networking core. When an L2TP
tunnel is created using a UDP socket, the socket is set up as an
encapsulated UDP socket by setting encap_rcv and encap_destroy
callbacks on the UDP socket. l2tp_udp_encap_recv is called when
packets are received on the socket. l2tp_udp_encap_destroy is called
when userspace closes the socket.
To Do
=====
L2TPIP sockets are implemented in `net/l2tp/l2tp_ip.c`_ and
`net/l2tp/l2tp_ip6.c`_.
Add L2TP tunnel switching support. This would route tunneled traffic
from one L2TP tunnel into another. Specified in
http://tools.ietf.org/html/draft-ietf-l2tpext-tunnel-switching-08
Tunnels
-------
Add L2TPv3 VLAN pseudowire support.
The kernel keeps a struct l2tp_tunnel context per L2TP tunnel. The
l2tp_tunnel is always associated with a UDP or L2TP/IP socket and
keeps a list of sessions in the tunnel. When a tunnel is first
registered with L2TP core, the reference count on the socket is
increased. This ensures that the socket cannot be removed while L2TP's
data structures reference it.
Add L2TPv3 IP pseudowire support.
Tunnels are identified by a unique tunnel id. The id is 16-bit for
L2TPv2 and 32-bit for L2TPv3. Internally, the id is stored as a 32-bit
value.
Add L2TPv3 ATM pseudowire support.
Tunnels are kept in a per-net list, indexed by tunnel id. The tunnel
id namespace is shared by L2TPv2 and L2TPv3. The tunnel context can be
derived from the socket's sk_user_data.
Handling tunnel socket close is perhaps the most tricky part of the
L2TP implementation. If userspace closes a tunnel socket, the L2TP
tunnel and all of its sessions must be closed and destroyed. Since the
tunnel context holds a ref on the tunnel socket, the socket's
sk_destruct won't be called until the tunnel sock_put's its
socket. For UDP sockets, when userspace closes the tunnel socket, the
socket's encap_destroy handler is invoked, which L2TP uses to initiate
its tunnel close actions. For L2TPIP sockets, the socket's close
handler initiates the same tunnel close actions. All sessions are
first closed. Each session drops its tunnel ref. When the tunnel ref
reaches zero, the tunnel puts its socket ref. When the socket is
eventually destroyed, it's sk_destruct finally frees the L2TP tunnel
context.
Sessions
--------
The kernel keeps a struct l2tp_session context for each session. Each
session has private data which is used for data specific to the
session type. With L2TPv2, the session always carries PPP
traffic. With L2TPv3, the session can carry Ethernet frames (Ethernet
pseudowire) or other data types such as PPP, ATM, HDLC or Frame
Relay. Linux currently implements only Ethernet and PPP session types.
Some L2TP session types also have a socket (PPP pseudowires) while
others do not (Ethernet pseudowires). We can't therefore use the
socket reference count as the reference count for session
contexts. The L2TP implementation therefore has its own internal
reference counts on the session contexts.
Like tunnels, L2TP sessions are identified by a unique
session id. Just as with tunnel ids, the session id is 16-bit for
L2TPv2 and 32-bit for L2TPv3. Internally, the id is stored as a 32-bit
value.
Sessions hold a ref on their parent tunnel to ensure that the tunnel
stays extant while one or more sessions references it.
Sessions are kept in a per-tunnel list, indexed by session id. L2TPv3
sessions are also kept in a per-net list indexed by session id,
because L2TPv3 session ids are unique across all tunnels and L2TPv3
data packets do not contain a tunnel id in the header. This list is
therefore needed to find the session context associated with a
received data packet when the tunnel context cannot be derived from
the tunnel socket.
Although the L2TPv3 RFC specifies that L2TPv3 session ids are not
scoped by the tunnel, the kernel does not police this for L2TPv3 UDP
tunnels and does not add sessions of L2TPv3 UDP tunnels into the
per-net session list. In the UDP receive code, we must trust that the
tunnel can be identified using the tunnel socket's sk_user_data and
lookup the session in the tunnel's session list instead of the per-net
session list.
PPP
---
`net/l2tp/l2tp_ppp.c`_ implements the PPPoL2TP socket family. Each PPP
session has a PPPoL2TP socket.
The PPPoL2TP socket's sk_user_data references the l2tp_session.
Userspace sends and receives PPP packets over L2TP using a PPPoL2TP
socket. Only PPP control frames pass over this socket: PPP data
packets are handled entirely by the kernel, passing between the L2TP
session and its associated ``pppN`` netdev through the PPP channel
interface of the kernel PPP subsystem.
The L2TP PPP implementation handles the closing of a PPPoL2TP socket
by closing its corresponding L2TP session. This is complicated because
it must consider racing with netlink session create/destroy requests
and pppol2tp_connect trying to reconnect with a session that is in the
process of being closed. Unlike tunnels, PPP sessions do not hold a
ref on their associated socket, so code must be careful to sock_hold
the socket where necessary. For all the details, see commit
3d609342cc04129ff7568e19316ce3d7451a27e8.
Ethernet
--------
`net/l2tp/l2tp_eth.c`_ implements L2TPv3 Ethernet pseudowires. It
manages a netdev for each session.
L2TP Ethernet sessions are created and destroyed by netlink request,
or are destroyed when the tunnel is destroyed. Unlike PPP sessions,
Ethernet sessions do not have an associated socket.
Miscellaneous
=============
The L2TP drivers were developed as part of the OpenL2TP project by
Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
designed from the ground up to have the L2TP datapath in the
kernel. The project also implemented the pppol2tp plugin for pppd
which allows pppd to use the kernel driver. Details can be found at
http://www.openl2tp.org.
RFCs
----
The kernel code implements the datapath features specified in the
following RFCs:
======= =============== ===================================
RFC2661 L2TPv2 https://tools.ietf.org/html/rfc2661
RFC3931 L2TPv3 https://tools.ietf.org/html/rfc3931
RFC4719 L2TPv3 Ethernet https://tools.ietf.org/html/rfc4719
======= =============== ===================================
Implementations
---------------
A number of open source applications use the L2TP kernel subsystem:
============ ==============================================
iproute2 https://github.com/shemminger/iproute2
go-l2tp https://github.com/katalix/go-l2tp
tunneldigger https://github.com/wlanslovenija/tunneldigger
xl2tpd https://github.com/xelerance/xl2tpd
============ ==============================================
Limitations
-----------
The current implementation has a number of limitations:
1) Multiple UDP sockets with the same 5-tuple address cannot be
used. The kernel's tunnel context is identified using private
data associated with the socket so it is important that each
socket is uniquely identified by its address.
2) Interfacing with openvswitch is not yet implemented. It may be
useful to map OVS Ethernet and VLAN ports into L2TPv3 tunnels.
3) VLAN pseudowires are implemented using an ``l2tpethN`` interface
configured with a VLAN sub-interface. Since L2TPv3 VLAN
pseudowires carry one and only one VLAN, it may be better to use
a single netdevice rather than an ``l2tpethN`` and ``l2tpethN``:M
pair per VLAN session. The netlink attribute
``L2TP_ATTR_VLAN_ID`` was added for this, but it was never
implemented.
Testing
-------
Unmanaged L2TPv3 Ethernet features are tested by the kernel's built-in
selftests. See `tools/testing/selftests/net/l2tp.sh`_.
Another test suite, l2tp-ktest_, covers all
of the L2TP APIs and tunnel/session types. This may be integrated into
the kernel's built-in L2TP selftests in the future.
.. Links
.. _Generic Netlink: generic_netlink.html
.. _libmnl: https://www.netfilter.org/projects/libmnl
.. _include/uapi/linux/l2tp.h: ../../../include/uapi/linux/l2tp.h
.. _include/linux/if_pppol2tp.h: ../../../include/linux/if_pppol2tp.h
.. _net/l2tp/l2tp_ip.c: ../../../net/l2tp/l2tp_ip.c
.. _net/l2tp/l2tp_ip6.c: ../../../net/l2tp/l2tp_ip6.c
.. _net/l2tp/l2tp_ppp.c: ../../../net/l2tp/l2tp_ppp.c
.. _net/l2tp/l2tp_eth.c: ../../../net/l2tp/l2tp_eth.c
.. _tools/testing/selftests/net/l2tp.sh: ../../../tools/testing/selftests/net/l2tp.sh
.. _l2tp-ktest: https://github.com/katalix/l2tp-ktest

View File

@@ -465,9 +465,9 @@ XPS Configuration
-----------------
XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
default for SMP). The functionality remains disabled until explicitly
configured. To enable XPS, the bitmap of CPUs/receive-queues that may
use a transmit queue is configured using the sysfs file entry:
default for SMP). If compiled in, it is driver dependent whether, and
how, XPS is configured at device init. The mapping of CPUs/receive-queues
to transmit queue can be inspected and configured using sysfs:
For selection based on CPUs map::

View File

@@ -0,0 +1,179 @@
.. SPDX-License-Identifier: GPL-2.0
====================
Interface statistics
====================
Overview
========
This document is a guide to Linux network interface statistics.
There are three main sources of interface statistics in Linux:
- standard interface statistics based on
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`;
- protocol-specific statistics; and
- driver-defined statistics available via ethtool.
Standard interface statistics
-----------------------------
There are multiple interfaces to reach the standard statistics.
Most commonly used is the `ip` command from `iproute2`::
$ ip -s -s link show dev ens4u1u1
6: ens4u1u1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 48:2a:e3:4c:b1:d1 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
74327665117 69016965 0 0 0 0
RX errors: length crc frame fifo missed
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
21405556176 44608960 0 0 0 0
TX errors: aborted fifo window heartbeat transns
0 0 0 0 128
altname enp58s0u1u1
Note that `-s` has been specified twice to see all members of
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
If `-s` is specified once the detailed errors won't be shown.
`ip` supports JSON formatting via the `-j` option.
Protocol-specific statistics
----------------------------
Some of the interfaces used for configuring devices are also able
to report related statistics. For example ethtool interface used
to configure pause frames can report corresponding hardware counters::
$ ethtool --include-statistics -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on
Statistics:
tx_pause_frames: 1
rx_pause_frames: 1
Driver-defined statistics
-------------------------
Driver-defined ethtool statistics can be dumped using `ethtool -S $ifc`, e.g.::
$ ethtool -S ens4u1u1
NIC statistics:
tx_single_collisions: 0
tx_multi_collisions: 0
uAPIs
=====
procfs
------
The historical `/proc/net/dev` text interface gives access to the list
of interfaces as well as their statistics.
Note that even though this interface is using
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`
internally it combines some of the fields.
sysfs
-----
Each device directory in sysfs contains a `statistics` directory (e.g.
`/sys/class/net/lo/statistics/`) with files corresponding to
members of :c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
This simple interface is convenient especially in constrained/embedded
environments without access to tools. However, it's inefficient when
reading multiple stats as it internally performs a full dump of
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`
and reports only the stat corresponding to the accessed file.
Sysfs files are documented in
`Documentation/ABI/testing/sysfs-class-net-statistics`.
netlink
-------
`rtnetlink` (`NETLINK_ROUTE`) is the preferred method of accessing
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>` stats.
Statistics are reported both in the responses to link information
requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).
ethtool
-------
Ethtool IOCTL interface allows drivers to report implementation
specific statistics. Historically it has also been used to report
statistics for which other APIs did not exist, like per-device-queue
statistics, or standard-based statistics (e.g. RFC 2863).
Statistics and their string identifiers are retrieved separately.
Identifiers via `ETHTOOL_GSTRINGS` with `string_set` set to `ETH_SS_STATS`,
and values via `ETHTOOL_GSTATS`. User space should use `ETHTOOL_GDRVINFO`
to retrieve the number of statistics (`.n_stats`).
ethtool-netlink
---------------
Ethtool netlink is a replacement for the older IOCTL interface.
Protocol-related statistics can be requested in get commands by setting
the `ETHTOOL_FLAG_STATS` flag in `ETHTOOL_A_HEADER_FLAGS`. Currently
statistics are supported in the following commands:
- `ETHTOOL_MSG_PAUSE_GET`
debugfs
-------
Some drivers expose extra statistics via `debugfs`.
struct rtnl_link_stats64
========================
.. kernel-doc:: include/uapi/linux/if_link.h
:identifiers: rtnl_link_stats64
Notes for driver authors
========================
Drivers should report all statistics which have a matching member in
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>` exclusively
via `.ndo_get_stats64`. Reporting such standard stats via ethtool
or debugfs will not be accepted.
Drivers must ensure best possible compliance with
:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
Please note for example that detailed error statistics must be
added into the general `rx_error` / `tx_error` counters.
The `.ndo_get_stats64` callback can not sleep because of accesses
via `/proc/net/dev`. If driver may sleep when retrieving the statistics
from the device it should do so periodically asynchronously and only return
a recent copy from `.ndo_get_stats64`. Ethtool interrupt coalescing interface
allows setting the frequency of refreshing statistics, if needed.
Retrieving ethtool statistics is a multi-syscall process, drivers are advised
to keep the number of statistics constant to avoid race conditions with
user space trying to read them.
Statistics must persist across routine operations like bringing the interface
down and up.
Kernel-internal data structures
-------------------------------
The following structures are internal to the kernel, their members are
translated to netlink attributes when dumped. Drivers must not overwrite
the statistics they don't report with 0.
.. kernel-doc:: include/linux/ethtool.h
:identifiers: ethtool_pause_stats

View File

@@ -58,3 +58,31 @@ forwarding table using the new bridge command.
3. Show forwarding table::
# bridge fdb show dev vxlan0
The following NIC features may indicate support for UDP tunnel-related
offloads (most commonly VXLAN features, but support for a particular
encapsulation protocol is NIC specific):
- `tx-udp_tnl-segmentation`
- `tx-udp_tnl-csum-segmentation`
ability to perform TCP segmentation offload of UDP encapsulated frames
- `rx-udp_tunnel-port-offload`
receive side parsing of UDP encapsulated frames which allows NICs to
perform protocol-aware offloads, like checksum validation offload of
inner frames (only needed by NICs without protocol-agnostic offloads)
For devices supporting `rx-udp_tunnel-port-offload` the list of currently
offloaded ports can be interrogated with `ethtool`::
$ ethtool --show-tunnels eth0
Tunnel information for eth0:
UDP port table 0:
Size: 4
Types: vxlan
No entries
UDP port table 1:
Size: 4
Types: geneve, vxlan-gpe
Entries (1):
port 1230, vxlan-gpe