docs: networking: reorganize driver documentation again
Organize driver documentation by device type. Most documents have
fairly verbose yet uninformative names, so let users first select
a well-defined device type, and then search for a particular driver.

While at it, rename the section from Vendor drivers to Hardware drivers.
This seems more accurate; besides, people sometimes refer to out-of-tree
drivers as vendor drivers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

committed by
David S. Miller

parent
ab696fa70f
commit
132db93572
249
Documentation/networking/device_drivers/ethernet/3com/3c509.rst
Normal file
@@ -0,0 +1,249 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=============================================================================
|
||||
Linux and the 3Com EtherLink III Series Ethercards (driver v1.18c and higher)
|
||||
=============================================================================
|
||||
|
||||
This file contains the instructions and caveats for v1.18c and higher versions
|
||||
of the 3c509 driver. You should not use the driver without reading this file.
|
||||
|
||||
release 1.0
|
||||
|
||||
28 February 2002
|
||||
|
||||
Current maintainer (corrections to):
|
||||
David Ruggiero <jdr@farfalle.com>
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The following are notes and information on using the 3Com EtherLink III series
|
||||
ethercards in Linux. These cards are commonly known by the most widely-used
|
||||
card's 3Com model number, 3c509. They are all 10 Mb/s ISA-bus cards and shouldn't
|
||||
be (but sometimes are) confused with the similarly-numbered PCI-bus "3c905"
|
||||
(aka "Vortex" or "Boomerang") series. Kernel support for the 3c509 family is
|
||||
provided by the module 3c509.c, which has code to support all of the following
|
||||
models:
|
||||
|
||||
- 3c509 (original ISA card)
|
||||
- 3c509B (later revision of the ISA card; supports full-duplex)
|
||||
- 3c589 (PCMCIA)
|
||||
- 3c589B (later revision of the 3c589; supports full-duplex)
|
||||
- 3c579 (EISA)
|
||||
|
||||
Large portions of this documentation were heavily borrowed from the guide
written by the original author of the 3c509 driver, Donald Becker. The master
copy of that document, which contains notes on older versions of the driver,
currently resides on the Scyld web server: http://www.scyld.com/.
|
||||
|
||||
|
||||
Special Driver Features
|
||||
=======================
|
||||
|
||||
Overriding card settings
|
||||
|
||||
The driver allows boot- or load-time overriding of the card's detected IOADDR,
|
||||
IRQ, and transceiver settings, although this capability shouldn't generally be
|
||||
needed except to enable full-duplex mode (see below). An example of the syntax
|
||||
for LILO parameters for doing this::
|
||||
|
||||
  ether=10,0x310,3,0x3c509,eth0
|
||||
|
||||
This configures the first found 3c509 card for IRQ 10, base I/O 0x310, and
|
||||
transceiver type 3 (10base2). The flag "0x3c509" must be set to avoid conflicts
|
||||
with other card types when overriding the I/O address. When the driver is
|
||||
loaded as a module, only the IRQ may be overridden. For example,
|
||||
setting two cards to IRQ10 and IRQ11 is done by using the irq module
|
||||
option::
|
||||
|
||||
  options 3c509 irq=10,11
|
||||
|
||||
|
||||
Full-duplex mode
|
||||
================
|
||||
|
||||
The v1.18c driver added support for the 3c509B's full-duplex capabilities.
|
||||
In order to enable and successfully use full-duplex mode, three conditions
|
||||
must be met:
|
||||
|
||||
(a) You must have an EtherLink III card model whose hardware supports
full-duplex operations. Currently, the only members of the 3c509 family that are
|
||||
positively known to support full-duplex are the 3c509B (ISA bus) and 3c589B
|
||||
(PCMCIA) cards. Cards without the "B" model designation do *not* support
|
||||
full-duplex mode; these include the original 3c509 (no "B"), the original
|
||||
3c589, the 3c529 (MCA bus), and the 3c579 (EISA bus).
|
||||
|
||||
(b) You must be using your card's 10baseT transceiver (i.e., the RJ-45
|
||||
connector), not its AUI (thick-net) or 10base2 (thin-net/coax) interfaces.
|
||||
AUI and 10base2 network cabling is physically incapable of full-duplex
|
||||
operation.
|
||||
|
||||
(c) Most importantly, your 3c509B must be connected to a link partner that is
|
||||
itself full-duplex capable. This is almost certainly one of two things: a full-
|
||||
duplex-capable Ethernet switch (*not* a hub), or a full-duplex-capable NIC on
|
||||
another system that's connected directly to the 3c509B via a crossover cable.
|
||||
|
||||
Full-duplex mode can be enabled using 'ethtool'.
|
||||
|
||||
.. warning::
|
||||
|
||||
Extremely important caution concerning full-duplex mode
|
||||
|
||||
Understand that the 3c509B hardware's full-duplex support is much more
limited than that provided by more modern network interface cards. Although
|
||||
at the physical layer of the network it fully supports full-duplex operation,
|
||||
the card was designed before the current Ethernet auto-negotiation (N-way)
|
||||
spec was written. This means that the 3c509B family ***cannot and will not
|
||||
auto-negotiate a full-duplex connection with its link partner under any
|
||||
circumstances, no matter how it is initialized***. If the full-duplex mode
|
||||
of the 3c509B is enabled, its link partner will very likely need to be
|
||||
independently _forced_ into full-duplex mode as well; otherwise various nasty
|
||||
failures will occur - at the very least, you'll see massive numbers of packet
|
||||
collisions. This is one of the very rare circumstances where disabling
auto-negotiation and forcing the duplex mode of a network interface card or switch
|
||||
would ever be necessary or desirable.
|
||||
|
||||
|
||||
Available Transceiver Types
|
||||
===========================
|
||||
|
||||
For versions of the driver v1.18c and above, the available transceiver types are:
|
||||
|
||||
== =========================================================================
0  transceiver type from EEPROM config (normally 10baseT); force half-duplex
1  AUI (thick-net / DB15 connector)
2  (undefined)
3  10base2 (thin-net == coax / BNC connector)
4  10baseT (RJ-45 connector); force half-duplex mode
8  transceiver type and duplex mode taken from card's EEPROM config settings
12 10baseT (RJ-45 connector); force full-duplex mode
== =========================================================================
|
||||
|
||||
Prior to driver version 1.18c, only transceiver codes 0-4 were supported. Note
|
||||
that the new transceiver codes 8 and 12 are the *only* ones that will enable
|
||||
full-duplex mode, no matter what the card's detected EEPROM settings might be.
|
||||
This ensured that merely upgrading the driver from an earlier version would
never automatically enable full-duplex mode in an existing installation;
it must always be explicitly enabled via one of these codes in order to be
activated.
|
||||
|
||||
The transceiver type can be changed using 'ethtool'.
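
The transceiver codes listed above lend themselves to a simple lookup. The
following is a minimal Python sketch for illustration only, not driver code.
The names ``TRANSCEIVER_TYPES``, ``describe_transceiver`` and
``can_enable_full_duplex`` are assumptions made for this example; the
descriptions are copied from the table.

```python
# Transceiver codes accepted by driver v1.18c and above (copied from the table).
TRANSCEIVER_TYPES = {
    0: "transceiver type from EEPROM config (normally 10baseT); force half-duplex",
    1: "AUI (thick-net / DB15 connector)",
    3: "10base2 (thin-net == coax / BNC connector)",
    4: "10baseT (RJ-45 connector); force half-duplex mode",
    8: "transceiver type and duplex mode taken from card's EEPROM config settings",
    12: "10baseT (RJ-45 connector); force full-duplex mode",
}

# Per the text, only codes 8 and 12 can result in full-duplex operation.
FULL_DUPLEX_CODES = frozenset({8, 12})


def describe_transceiver(code):
    """Return the documented meaning of a transceiver code."""
    if code not in TRANSCEIVER_TYPES:
        raise ValueError(f"undefined transceiver code: {code}")
    return TRANSCEIVER_TYPES[code]


def can_enable_full_duplex(code):
    """True if the code is one of the two that may enable full-duplex mode."""
    return code in FULL_DUPLEX_CODES


if __name__ == "__main__":
    for code in sorted(TRANSCEIVER_TYPES):
        print(code, can_enable_full_duplex(code), describe_transceiver(code))
```

Code 2 is deliberately left out of the dictionary because the table marks it
as undefined, so the sketch rejects it rather than guessing a meaning.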
|
||||
|
||||
|
||||
Interpretation of error messages and common problems
|
||||
----------------------------------------------------
|
||||
|
||||
Error Messages
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
eth0: Infinite loop in interrupt, status 2011.
|
||||
This is a "mostly harmless" message indicating that the driver had too much
|
||||
work during that interrupt cycle. With a status of 0x2011 you are receiving
|
||||
packets faster than they can be removed from the card. This should be rare
|
||||
or impossible in normal operation. Possible causes of this error report are:
|
||||
|
||||
- a "green" mode enabled that slows the processor down when there is no
|
||||
keyboard activity.
|
||||
|
||||
- some other device or device driver hogging the bus or disabling interrupts.
|
||||
Check /proc/interrupts for excessive interrupt counts. The timer tick
|
||||
interrupt should always be incrementing faster than the others.
|
||||
|
||||
No received packets
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If a 3c509, 3c562 or 3c589 can successfully transmit packets, but never
|
||||
receives packets (as reported by /proc/net/dev or 'ifconfig') you likely
|
||||
have an interrupt line problem. Check /proc/interrupts to verify that the
|
||||
card is actually generating interrupts. If the interrupt count is not
|
||||
increasing you likely have a physical conflict with two devices trying to
|
||||
use the same ISA IRQ line. The common conflict is with a sound card on IRQ10
|
||||
or IRQ5, and the easiest solution is to move the 3c509 to a different
|
||||
interrupt line. If the device is receiving packets but 'ping' doesn't work,
|
||||
you have a routing problem.
|
||||
|
||||
Tx Carrier Errors Reported in /proc/net/dev
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
||||
If an EtherLink III appears to transmit packets, but the "Tx carrier errors"
|
||||
field in /proc/net/dev increments as quickly as the Tx packet count, you
|
||||
likely have an unterminated network or the incorrect media transceiver selected.
|
||||
|
||||
3c509B card is not detected on machines with an ISA PnP BIOS.
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
While the updated driver works with most PnP BIOS programs, it does not work
|
||||
with all. This can be fixed by disabling PnP support using the 3Com-supplied
|
||||
setup program.
|
||||
|
||||
3c509 card is not detected on overclocked machines
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Increase the delay time in id_read_eeprom() from the current value, 500,
|
||||
to an absurdly high value, such as 5000.
|
||||
|
||||
|
||||
Decoding Status and Error Messages
|
||||
----------------------------------
|
||||
|
||||
|
||||
The bits in the main status register are:
|
||||
|
||||
=====  ======================================
value  description
=====  ======================================
0x01   Interrupt latch
0x02   Tx overrun, or Rx underrun
0x04   Tx complete
0x08   Tx FIFO room available
0x10   A complete Rx packet has arrived
0x20   A Rx packet has started to arrive
0x40   The driver has requested an interrupt
0x80   Statistics counter nearly full
=====  ======================================
|
||||
|
||||
The bits in the transmit (Tx) status word are:
|
||||
|
||||
=====  ============================================
value  description
=====  ============================================
0x02   Out-of-window collision.
0x04   Status stack overflow (normally impossible).
0x08   16 collisions.
0x10   Tx underrun (not enough PCI bus bandwidth).
0x20   Tx jabber.
0x40   Tx interrupt requested.
0x80   Status is valid (this should always be set).
=====  ============================================
|
||||
|
||||
|
||||
When a transmit error occurs the driver produces a status message such as::
|
||||
|
||||
eth0: Transmit error, Tx status register 82
|
||||
|
||||
The two values typically seen here are:
|
||||
|
||||
0x82
|
||||
^^^^
|
||||
|
||||
Out of window collision. This typically occurs when some other Ethernet
|
||||
host is incorrectly set to full duplex on a half duplex network.
|
||||
|
||||
0x88
|
||||
^^^^
|
||||
|
||||
16 collisions. This typically occurs when the network is exceptionally busy
|
||||
or when another host doesn't correctly back off after a collision. If this
|
||||
error is mixed with 0x82 errors it is the result of a host incorrectly set
|
||||
to full duplex (see above).
|
||||
|
||||
Both of these errors are the result of network problems that should be
|
||||
corrected. They do not represent driver malfunction.
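
The bit tables in this section can be applied mechanically to the values the
driver prints. The following Python sketch is only an illustration: the helper
name ``decode_bits`` is an assumption, and masking the main status value to its
low byte is also an assumption, made because the table documents only bits
0x01 through 0x80.

```python
# Bit meanings copied from the main status register table.
MAIN_STATUS_BITS = {
    0x01: "Interrupt latch",
    0x02: "Tx overrun, or Rx underrun",
    0x04: "Tx complete",
    0x08: "Tx FIFO room available",
    0x10: "A complete Rx packet has arrived",
    0x20: "A Rx packet has started to arrive",
    0x40: "The driver has requested an interrupt",
    0x80: "Statistics counter nearly full",
}

# Bit meanings copied from the transmit (Tx) status word table.
TX_STATUS_BITS = {
    0x02: "Out-of-window collision",
    0x04: "Status stack overflow",
    0x08: "16 collisions",
    0x10: "Tx underrun",
    0x20: "Tx jabber",
    0x40: "Tx interrupt requested",
    0x80: "Status is valid",
}


def decode_bits(value, table):
    """Return the names of all documented bits set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


if __name__ == "__main__":
    # The driver prints status values in hexadecimal without a 0x prefix.
    print(decode_bits(int("82", 16), TX_STATUS_BITS))
    print(decode_bits(int("88", 16), TX_STATUS_BITS))
    # Assumption: only the low byte of the main status value is decoded here.
    print(decode_bits(int("2011", 16) & 0xFF, MAIN_STATUS_BITS))
```

Decoded this way, 0x82 is "Out-of-window collision" plus "Status is valid",
and 0x88 is "16 collisions" plus "Status is valid", matching the two cases
described above.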
|
||||
|
||||
|
||||
Revision history (this file)
|
||||
============================
|
||||
|
||||
28Feb02 v1.0 DR New; major portions based on Becker original 3c509 docs
|
||||
|
459
Documentation/networking/device_drivers/ethernet/3com/vortex.rst
Normal file
@@ -0,0 +1,459 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================
|
||||
3Com Vortex device driver
|
||||
=========================
|
||||
|
||||
Andrew Morton
|
||||
|
||||
30 April 2000
|
||||
|
||||
|
||||
This document describes the usage and errata of the 3Com "Vortex" device
|
||||
driver for Linux, 3c59x.c.
|
||||
|
||||
The driver was written by Donald Becker <becker@scyld.com>
|
||||
|
||||
Don is no longer the prime maintainer of this version of the driver.
|
||||
Please report problems to one or more of:
|
||||
|
||||
- Andrew Morton
|
||||
- Netdev mailing list <netdev@vger.kernel.org>
|
||||
- Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
|
||||
Please note the 'Reporting and Diagnosing Problems' section at the end
|
||||
of this file.
|
||||
|
||||
|
||||
Since kernel 2.3.99-pre6, this driver incorporates the support for the
|
||||
3c575-series Cardbus cards which used to be handled by 3c575_cb.c.
|
||||
|
||||
This driver supports the following hardware:
|
||||
|
||||
- 3c590 Vortex 10Mbps
|
||||
- 3c592 EISA 10Mbps Demon/Vortex
|
||||
- 3c597 EISA Fast Demon/Vortex
|
||||
- 3c595 Vortex 100baseTx
|
||||
- 3c595 Vortex 100baseT4
|
||||
- 3c595 Vortex 100base-MII
|
||||
- 3c900 Boomerang 10baseT
|
||||
- 3c900 Boomerang 10Mbps Combo
|
||||
- 3c900 Cyclone 10Mbps TPO
|
||||
- 3c900 Cyclone 10Mbps Combo
|
||||
- 3c900 Cyclone 10Mbps TPC
|
||||
- 3c900B-FL Cyclone 10base-FL
|
||||
- 3c905 Boomerang 100baseTx
|
||||
- 3c905 Boomerang 100baseT4
|
||||
- 3c905B Cyclone 100baseTx
|
||||
- 3c905B Cyclone 10/100/BNC
|
||||
- 3c905B-FX Cyclone 100baseFx
|
||||
- 3c905C Tornado
|
||||
- 3c920B-EMB-WNM (ATI Radeon 9100 IGP)
|
||||
- 3c980 Cyclone
|
||||
- 3c980C Python-T
|
||||
- 3cSOHO100-TX Hurricane
|
||||
- 3c555 Laptop Hurricane
|
||||
- 3c556 Laptop Tornado
|
||||
- 3c556B Laptop Hurricane
|
||||
- 3c575 [Megahertz] 10/100 LAN CardBus
|
||||
- 3c575 Boomerang CardBus
|
||||
- 3CCFE575BT Cyclone CardBus
|
||||
- 3CCFE575CT Tornado CardBus
|
||||
- 3CCFE656 Cyclone CardBus
|
||||
- 3CCFEM656B Cyclone+Winmodem CardBus
|
||||
- 3CXFEM656C Tornado+Winmodem CardBus
|
||||
- 3c450 HomePNA Tornado
|
||||
- 3c920 Tornado
|
||||
- 3c982 Hydra Dual Port A
|
||||
- 3c982 Hydra Dual Port B
|
||||
- 3c905B-T4
|
||||
- 3c920B-EMB-WNM Tornado
|
||||
|
||||
Module parameters
|
||||
=================
|
||||
|
||||
There are several parameters which may be provided to the driver when
|
||||
its module is loaded. These are usually placed in ``/etc/modprobe.d/*.conf``
|
||||
configuration files. Example::
|
||||
|
||||
  options 3c59x debug=3 rx_copybreak=300
|
||||
|
||||
If you are using the PCMCIA tools (cardmgr) then the options may be
|
||||
placed in /etc/pcmcia/config.opts::
|
||||
|
||||
  module "3c59x" opts "debug=3 rx_copybreak=300"
|
||||
|
||||
|
||||
The supported parameters are:
|
||||
|
||||
debug=N
|
||||
|
||||
Where N is a number from 0 to 7. Anything above 3 produces a lot
|
||||
of output in your system logs. debug=1 is default.
|
||||
|
||||
options=N1,N2,N3,...
|
||||
|
||||
Each number in the list provides an option to the corresponding
|
||||
network card. So if you have two 3c905's and you wish to provide
|
||||
them with option 0x204 you would use::
|
||||
|
||||
  options=0x204,0x204
|
||||
|
||||
The individual options are composed of a number of bitfields which
|
||||
have the following meanings:
|
||||
|
||||
Possible media type settings
|
||||
|
||||
==  =================================
0   10baseT
1   10Mbs AUI
2   undefined
3   10base2 (BNC)
4   100base-TX
5   100base-FX
6   MII (Media Independent Interface)
7   Use default setting from EEPROM
8   Autonegotiate
9   External MII
10  Use default setting from EEPROM
==  =================================
|
||||
|
||||
When generating a value for the 'options' setting, the above media
|
||||
selection values may be OR'ed (or added to) the following:
|
||||
|
||||
======  =============================================
0x8000  Set driver debugging level to 7
0x4000  Set driver debugging level to 2
0x0400  Enable Wake-on-LAN
0x0200  Force full duplex mode.
0x0010  Bus-master enable bit (Old Vortex cards only)
======  =============================================
|
||||
|
||||
For example::
|
||||
|
||||
  insmod 3c59x options=0x204
|
||||
|
||||
will force full-duplex 100base-TX, rather than allowing the usual
|
||||
autonegotiation.
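
How an ``options`` value such as 0x204 is put together can be sketched in a few
lines of Python. This is an illustration under stated assumptions, not driver
code: the helper names ``build_options`` and ``split_options`` and the flag
keys are made up for this example, and treating the low four bits as the media
type field is an assumption inferred from the value ranges in the tables above.

```python
# Media type values and flag bits copied from the tables above.
MEDIA_TYPES = {
    0: "10baseT",
    1: "10Mbs AUI",
    3: "10base2 (BNC)",
    4: "100base-TX",
    5: "100base-FX",
    6: "MII (Media Independent Interface)",
    7: "Use default setting from EEPROM",
    8: "Autonegotiate",
    9: "External MII",
    10: "Use default setting from EEPROM",
}

FLAGS = {
    "debug7": 0x8000,
    "debug2": 0x4000,
    "wol": 0x0400,
    "full_duplex": 0x0200,
    "bus_master": 0x0010,
}

MEDIA_MASK = 0x000F  # assumption: media type occupies the low four bits


def build_options(media, *flags):
    """OR a media type value together with any number of named flag bits."""
    if media not in MEDIA_TYPES:
        raise ValueError(f"undefined media type: {media}")
    value = media
    for flag in flags:
        value |= FLAGS[flag]
    return value


def split_options(value):
    """Split an options value into (media type, sorted list of flag names)."""
    media = value & MEDIA_MASK
    names = sorted(name for name, bit in FLAGS.items() if value & bit)
    return media, names


if __name__ == "__main__":
    print(hex(build_options(4, "full_duplex")))
    print(split_options(0x204))
```

With these definitions, media type 4 (100base-TX) combined with the
full-duplex flag yields 0x204, the value used in the ``insmod`` example.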
|
||||
|
||||
global_options=N
|
||||
|
||||
Sets the ``options`` parameter for all 3c59x NICs in the machine.
|
||||
Entries in the ``options`` array above will override any setting of
|
||||
this.
|
||||
|
||||
full_duplex=N1,N2,N3...
|
||||
|
||||
Similar to bit 9 of 'options'. Forces the corresponding card into
|
||||
full-duplex mode. Please use this in preference to the ``options``
|
||||
parameter.
|
||||
|
||||
In fact, please don't use this at all! You're better off getting
|
||||
autonegotiation working properly.
|
||||
|
||||
global_full_duplex=N1
|
||||
|
||||
Sets full duplex mode for all 3c59x NICs in the machine. Entries
|
||||
in the ``full_duplex`` array above will override any setting of this.
|
||||
|
||||
flow_ctrl=N1,N2,N3...
|
||||
|
||||
Use 802.3x MAC-layer flow control. The 3com cards only support the
|
||||
PAUSE command, which means that they will stop sending packets for a
|
||||
short period if they receive a PAUSE frame from the link partner.
|
||||
|
||||
The driver only allows flow control on a link which is operating in
|
||||
full duplex mode.
|
||||
|
||||
This feature does not appear to work on the 3c905 - only 3c905B and
|
||||
3c905C have been tested.
|
||||
|
||||
The 3com cards appear to only respond to PAUSE frames which are
|
||||
sent to the reserved destination address of 01:80:c2:00:00:01. They
|
||||
do not honour PAUSE frames which are sent to the station MAC address.
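
For testing flow control, it can help to know what such a PAUSE frame looks
like on the wire. The sketch below builds one in Python. Only the destination
address comes from this document; the MAC Control EtherType (0x8808), the
PAUSE opcode (0x0001) and the 16-bit pause time field follow IEEE 802.3
Annex 31B, and the function name ``build_pause_frame`` is an assumption made
for this example.

```python
import struct

PAUSE_DEST = bytes.fromhex("0180c2000001")  # reserved address named above
MAC_CONTROL_ETHERTYPE = 0x8808              # IEEE 802.3 MAC Control
PAUSE_OPCODE = 0x0001                       # PAUSE operation
MIN_FRAME_LEN = 60                          # minimum frame size without FCS


def build_pause_frame(src_mac, pause_quanta):
    """Build a PAUSE frame (without FCS) addressed to the reserved address.

    pause_quanta is the requested pause time in units of 512 bit times.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    header = PAUSE_DEST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    # Pad with zeros up to the minimum Ethernet frame length.
    return (header + payload).ljust(MIN_FRAME_LEN, b"\x00")


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("0050da6a88f0"), 0xFFFF)
    print(len(frame), frame.hex())
```

The sketch only constructs the bytes. Sending them requires a raw socket or a
packet tool, which is outside the scope of this note.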
|
||||
|
||||
rx_copybreak=M
|
||||
|
||||
The driver preallocates 32 full-sized (1536 byte) network buffers
|
||||
for receiving. When a packet arrives, the driver has to decide
|
||||
whether to leave the packet in its full-sized buffer, or to allocate
|
||||
a smaller buffer and copy the packet across into it.
|
||||
|
||||
This is a speed/space tradeoff.
|
||||
|
||||
The value of rx_copybreak is used to decide when to make the copy.
|
||||
If the packet size is less than rx_copybreak, the packet is copied.
|
||||
The default value for rx_copybreak is 200 bytes.
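
The copy decision can be sketched as follows. This is a simplified Python
model of the rule stated above, not the driver's code; the function name
``handle_rx_packet`` and the returned tuple are assumptions made for
illustration.

```python
RX_BUFFER_SIZE = 1536       # size of each preallocated receive buffer, in bytes
DEFAULT_RX_COPYBREAK = 200  # default threshold, in bytes


def handle_rx_packet(packet_len, rx_copybreak=DEFAULT_RX_COPYBREAK):
    """Return (action, bytes_held) for a received packet of packet_len bytes."""
    if not 0 < packet_len <= RX_BUFFER_SIZE:
        raise ValueError("packet length out of range")
    if packet_len < rx_copybreak:
        # Small packet: allocate a smaller buffer and copy the packet into it.
        return ("copy", packet_len)
    # Otherwise leave the packet in its full-sized buffer.
    return ("keep", RX_BUFFER_SIZE)


if __name__ == "__main__":
    for length in (64, 199, 200, 1500):
        print(length, handle_rx_packet(length))
```

Raising ``rx_copybreak`` trades CPU time spent copying for less memory held by
small packets, which is the speed/space tradeoff mentioned above.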
|
||||
|
||||
max_interrupt_work=N
|
||||
|
||||
The driver's interrupt service routine can handle many receive and
|
||||
transmit packets in a single invocation. It does this in a loop.
|
||||
The value of max_interrupt_work governs how many times the interrupt
|
||||
service routine will loop. The default value is 32 loops. If this
|
||||
is exceeded the interrupt service routine gives up and generates a
|
||||
warning message "eth0: Too much work in interrupt".
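
The bounded loop can be modelled roughly as below. This Python sketch is a
simplification for illustration: the function name ``service_interrupt``, the
event queue, and the exact point at which the limit counts as exceeded are
assumptions, not a transcription of the driver.

```python
from collections import deque


def service_interrupt(events, max_interrupt_work=32):
    """Handle queued events, looping at most max_interrupt_work times.

    Returns (number of events handled, whether the work limit was exceeded).
    """
    pending = deque(events)
    handled = 0
    for _ in range(max_interrupt_work):
        if not pending:
            break
        pending.popleft()  # stand-in for processing one Rx or Tx event
        handled += 1
    too_much_work = bool(pending)
    if too_much_work:
        # Work remains after the allowed number of loops: give up and warn.
        print("eth0: Too much work in interrupt")
    return handled, too_much_work


if __name__ == "__main__":
    print(service_interrupt(range(5)))
    print(service_interrupt(range(40)))
```

In this model, raising ``max_interrupt_work`` lets one invocation drain a
longer backlog at the cost of spending longer in the handler.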
|
||||
|
||||
hw_checksums=N1,N2,N3,...
|
||||
|
||||
Recent 3com NICs are able to generate IPv4, TCP and UDP checksums
|
||||
in hardware. Linux has used the Rx checksumming for a long time.
|
||||
The "zero copy" patch which is planned for the 2.4 kernel series
|
||||
allows you to make use of the NIC's DMA scatter/gather and transmit
|
||||
checksumming as well.
|
||||
|
||||
The driver is set up so that, when the zerocopy patch is applied,
|
||||
all Tornado and Cyclone devices will use S/G and Tx checksums.
|
||||
|
||||
This module parameter has been provided so you can override this
|
||||
decision. If you think that Tx checksums are causing a problem, you
|
||||
may disable the feature with ``hw_checksums=0``.
|
||||
|
||||
If you think your NIC should be performing Tx checksumming and the
|
||||
driver isn't enabling it, you can force the use of hardware Tx
|
||||
checksumming with ``hw_checksums=1``.
|
||||
|
||||
The driver drops a message in the logfiles to indicate whether or
|
||||
not it is using hardware scatter/gather and hardware Tx checksums.
|
||||
|
||||
Scatter/gather and hardware checksums provide considerable
|
||||
performance improvement for the sendfile() system call, but a small
|
||||
decrease in throughput for send(). There is no effect upon receive
|
||||
efficiency.
|
||||
|
||||
compaq_ioaddr=N,
|
||||
compaq_irq=N,
|
||||
compaq_device_id=N
|
||||
|
||||
"Variables to work-around the Compaq PCI BIOS32 problem"....
|
||||
|
||||
watchdog=N
|
||||
|
||||
Sets the time duration (in milliseconds) after which the kernel
|
||||
decides that the transmitter has become stuck and needs to be reset.
|
||||
This is mainly for debugging purposes, although it may be advantageous
|
||||
to increase this value on LANs which have very high collision rates.
|
||||
The default value is 5000 (5.0 seconds).
|
||||
|
||||
enable_wol=N1,N2,N3,...
|
||||
|
||||
Enable Wake-on-LAN support for the relevant interface. Donald
|
||||
Becker's ``ether-wake`` application may be used to wake suspended
|
||||
machines.
|
||||
|
||||
Also enables the NIC's power management support.
|
||||
|
||||
global_enable_wol=N
|
||||
|
||||
Sets enable_wol mode for all 3c59x NICs in the machine. Entries in
|
||||
the ``enable_wol`` array above will override any setting of this.
|
||||
|
||||
Media selection
|
||||
---------------
|
||||
|
||||
A number of the older NICs such as the 3c590 and 3c900 series have
|
||||
10base2 and AUI interfaces.
|
||||
|
||||
Prior to January, 2001 this driver would autoselect the 10base2 or AUI
|
||||
port if it didn't detect activity on the 10baseT port. It would then
|
||||
get stuck on the 10base2 port and a driver reload was necessary to
|
||||
switch back to 10baseT. This behaviour could not be prevented with a
|
||||
module option override.
|
||||
|
||||
Later (current) versions of the driver *do* support locking of the
media type. So if you load the driver module with::

  modprobe 3c59x options=0
|
||||
|
||||
it will permanently select the 10baseT port. Automatic selection of
|
||||
other media types does not occur.
|
||||
|
||||
|
||||
Transmit error, Tx status register 82
|
||||
-------------------------------------
|
||||
|
||||
This is a common error which is almost always caused by another host on
|
||||
the same network being in full-duplex mode, while this host is in
|
||||
half-duplex mode. You need to find that other host and make it run in
|
||||
half-duplex mode or fix this host to run in full-duplex mode.
|
||||
|
||||
As a last resort, you can force the 3c59x driver into full-duplex mode
with::

  options 3c59x full_duplex=1
|
||||
|
||||
but this has to be viewed as a workaround for broken network gear and
|
||||
should only really be used for equipment which cannot autonegotiate.
|
||||
|
||||
|
||||
Additional resources
|
||||
--------------------
|
||||
|
||||
Details of the device driver implementation are at the top of the source file.
|
||||
|
||||
Additional documentation is available at Don Becker's Linux Drivers site:
|
||||
|
||||
http://www.scyld.com/vortex.html
|
||||
|
||||
Donald Becker's driver development site:
|
||||
|
||||
http://www.scyld.com/network.html
|
||||
|
||||
Donald's vortex-diag program is useful for inspecting the NIC's state:
|
||||
|
||||
http://www.scyld.com/ethercard_diag.html
|
||||
|
||||
Donald's mii-diag program may be used for inspecting and manipulating
|
||||
the NIC's Media Independent Interface subsystem:
|
||||
|
||||
http://www.scyld.com/ethercard_diag.html#mii-diag
|
||||
|
||||
Donald's wake-on-LAN page:
|
||||
|
||||
http://www.scyld.com/wakeonlan.html
|
||||
|
||||
3Com's DOS-based application for setting up the NICs EEPROMs:
|
||||
|
||||
ftp://ftp.3com.com/pub/nic/3c90x/3c90xx2.exe
|
||||
|
||||
|
||||
Autonegotiation notes
|
||||
---------------------
|
||||
|
||||
The driver uses a one-minute heartbeat for adapting to changes in
|
||||
the external LAN environment if link is up and 5 seconds if link is down.
|
||||
This means that when, for example, a machine is unplugged from a hubbed
|
||||
10baseT LAN and plugged into a switched 100baseT LAN, the throughput
|
||||
will be quite dreadful for up to sixty seconds. Be patient.
|
||||
|
||||
Cisco interoperability note from Walter Wong <wcw+@CMU.EDU>:
|
||||
|
||||
On a side note, adding HAS_NWAY seems to share a problem with the
|
||||
Cisco 6509 switch. Specifically, you need to change the spanning
|
||||
tree parameter for the port the machine is plugged into to 'portfast'
|
||||
mode. Otherwise, the negotiation fails. This has been an issue
|
||||
we've noticed for a while but haven't had the time to track down.
|
||||
|
||||
Cisco switches (Jeff Busch <jbusch@deja.com>)
|
||||
|
||||
My "standard config" for ports to which PC's/servers connect directly::
|
||||
|
||||
  interface FastEthernet0/N
  description machinename
  load-interval 30
  spanning-tree portfast
|
||||
|
||||
If autonegotiation is a problem, you may need to specify "speed
|
||||
100" and "duplex full" as well (or "speed 10" and "duplex half").
|
||||
|
||||
WARNING: DO NOT hook up hubs/switches/bridges to these
|
||||
specially-configured ports! The switch will become very confused.
|
||||
|
||||
|
||||
Reporting and diagnosing problems
|
||||
---------------------------------
|
||||
|
||||
Maintainers find that accurate and complete problem reports are
|
||||
invaluable in resolving driver problems. We are frequently not able to
|
||||
reproduce problems and must rely on your patience and efforts to get to
|
||||
the bottom of the problem.
|
||||
|
||||
If you believe you have a driver problem here are some of the
|
||||
steps you should take:
|
||||
|
||||
- Is it really a driver problem?
|
||||
|
||||
Eliminate some variables: try different cards, different
|
||||
computers, different cables, different ports on the switch/hub,
|
||||
different versions of the kernel or of the driver, etc.
|
||||
|
||||
- OK, it's a driver problem.
|
||||
|
||||
You need to generate a report. Typically this is an email to the
|
||||
maintainer and/or netdev@vger.kernel.org. The maintainer's
|
||||
email address will be in the driver source or in the MAINTAINERS file.
|
||||
|
||||
- The contents of your report will vary a lot depending upon the
|
||||
problem. If it's a kernel crash then you should refer to the
|
||||
admin-guide/reporting-bugs.rst file.
|
||||
|
||||
But for most problems it is useful to provide the following:
|
||||
|
||||
- Kernel version, driver version
|
||||
|
||||
- A copy of the banner message which the driver generates when
|
||||
it is initialised. For example::

  eth0: 3Com PCI 3c905C Tornado at 0xa400, 00:50:da:6a:88:f0, IRQ 19
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 782d.
  Enabling bus-master transmits and whole-frame receives.
|
||||
|
||||
NOTE: You must provide the ``debug=2`` modprobe option to generate
|
||||
a full detection message. Please do this::
|
||||
|
||||
  modprobe 3c59x debug=2
|
||||
|
||||
- If it is a PCI device, the relevant output from 'lspci -vx', e.g.::

  00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
  Subsystem: 3Com Corporation: Unknown device 9200
  Flags: bus master, medium devsel, latency 32, IRQ 19
  I/O ports at a400 [size=128]
  Memory at db000000 (32-bit, non-prefetchable) [size=128]
  Expansion ROM at <unassigned> [disabled] [size=128K]
  Capabilities: [dc] Power Management version 2
  00: b7 10 00 92 07 00 10 02 74 00 00 02 08 20 00 00
  10: 01 a4 00 00 00 00 00 db 00 00 00 00 00 00 00 00
  20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10
  30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a
|
||||
|
||||
- A description of the environment: 10baseT? 100baseT?
|
||||
full/half duplex? switched or hubbed?
|
||||
|
||||
- Any additional module parameters which you may be providing to the driver.
|
||||
|
||||
- Any kernel logs which are produced. The more the merrier.
|
||||
If this is a large file and you are sending your report to a
|
||||
mailing list, mention that you have the logfile, but don't send
|
||||
it. If you're reporting direct to the maintainer then just send
|
||||
it.
|
||||
|
||||
To ensure that all kernel logs are available, add the
|
||||
following line to /etc/syslog.conf::
|
||||
|
||||
  kern.* /var/log/messages
|
||||
|
||||
Then restart syslogd with::
|
||||
|
||||
  /etc/rc.d/init.d/syslog restart
|
||||
|
||||
(The above may vary, depending upon which Linux distribution you use).
|
||||
|
||||
- If your problem is reproducible then that's great. Try the
|
||||
following:
|
||||
|
||||
1) Increase the debug level. Usually this is done via:
|
||||
|
||||
a) modprobe driver debug=7
|
||||
b) In /etc/modprobe.d/driver.conf:
|
||||
options driver debug=7
|
||||
|
||||
2) Recreate the problem with the higher debug level,
|
||||
send all logs to the maintainer.
|
||||
|
||||
3) Download your card's diagnostic tool from Donald
|
||||
Becker's website <http://www.scyld.com/ethercard_diag.html>.
|
||||
Download mii-diag.c as well. Build these.
|
||||
|
||||
a) Run 'vortex-diag -aaee' and 'mii-diag -v' when the card is
|
||||
working correctly. Save the output.
|
||||
|
||||
b) Run the above commands when the card is malfunctioning. Send
|
||||
both sets of output.
|
||||
|
||||
Finally, please be patient and be prepared to do some work. You may
|
||||
end up working on this problem for a week or more as the maintainer
|
||||
asks more questions, asks for more tests, asks for patches to be
|
||||
applied, etc. At the end of it all, the problem may even remain
|
||||
unresolved.
|
344
Documentation/networking/device_drivers/ethernet/amazon/ena.rst
Normal file
@@ -0,0 +1,344 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================================================
|
||||
Linux kernel driver for Elastic Network Adapter (ENA) family
|
||||
============================================================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
ENA is a networking interface designed to make good use of modern CPU
|
||||
features and system architectures.
|
||||
|
||||
The ENA device exposes a lightweight management interface with a
|
||||
minimal set of memory mapped registers and extendable command set
|
||||
through an Admin Queue.
|
||||
|
||||
The driver supports a range of ENA devices, is link-speed independent
|
||||
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
|
||||
a negotiated and extendable feature set.
|
||||
|
||||
Some ENA devices support SR-IOV. This driver is used for both the
|
||||
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.
|
||||
|
||||
ENA devices enable high speed and low overhead network traffic
|
||||
processing by providing multiple Tx/Rx queue pairs (the maximum number
|
||||
is advertised by the device via the Admin Queue), a dedicated MSI-X
|
||||
interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation,
|
||||
and CPU cacheline optimized data placement.
|
||||
|
||||
The ENA driver supports industry standard TCP/IP offload features such
|
||||
as checksum offload and TCP transmit segmentation offload (TSO).
|
||||
Receive-side scaling (RSS) is supported for multi-core scaling.
|
||||
|
||||
The ENA driver and its corresponding devices implement health
|
||||
monitoring mechanisms such as watchdog, enabling the device and driver
|
||||
to recover in a manner transparent to the application, as well as
|
||||
debug logs.
|
||||
|
||||
Some of the ENA devices support a working mode called Low-latency
|
||||
Queue (LLQ), which saves several more microseconds.
|
||||
|
||||
Supported PCI vendor ID/device IDs
|
||||
==================================
|
||||
|
||||
========= =======================
|
||||
1d0f:0ec2 ENA PF
|
||||
1d0f:1ec2 ENA PF with LLQ support
|
||||
1d0f:ec20 ENA VF
|
||||
1d0f:ec21 ENA VF with LLQ support
|
||||
========= =======================
|
||||
|
||||
ENA Source Code Directory Structure
|
||||
===================================
|
||||
|
||||
================= ======================================================
|
||||
ena_com.[ch] Management communication layer. This layer is
|
||||
responsible for handling all the management
|
||||
(admin) communication between the device and the
|
||||
driver.
|
||||
ena_eth_com.[ch] Tx/Rx data path.
|
||||
ena_admin_defs.h Definition of ENA management interface.
|
||||
ena_eth_io_defs.h Definition of ENA data path interface.
|
||||
ena_common_defs.h Common definitions for ena_com layer.
|
||||
ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers.
|
||||
ena_netdev.[ch] Main Linux kernel driver.
|
||||
ena_sysfs.[ch] Sysfs files.
|
||||
ena_ethtool.c ethtool callbacks.
|
||||
ena_pci_id_tbl.h Supported device IDs.
|
||||
================= ======================================================
|
||||
|
||||
Management Interface:
|
||||
=====================
|
||||
|
||||
ENA management interface is exposed by means of:
|
||||
|
||||
- PCIe Configuration Space
|
||||
- Device Registers
|
||||
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
|
||||
- Asynchronous Event Notification Queue (AENQ)
|
||||
|
||||
ENA device MMIO Registers are accessed only during driver
|
||||
initialization and are not involved in further normal device
|
||||
operation.
|
||||
|
||||
AQ is used for submitting management commands, and the
|
||||
results/responses are reported asynchronously through ACQ.
|
||||
|
||||
ENA introduces a small set of management commands with room for
|
||||
vendor-specific extensions. Most of the management operations are
|
||||
framed in a generic Get/Set feature command.
|
||||
|
||||
The following admin queue commands are supported:
|
||||
|
||||
- Create I/O submission queue
|
||||
- Create I/O completion queue
|
||||
- Destroy I/O submission queue
|
||||
- Destroy I/O completion queue
|
||||
- Get feature
|
||||
- Set feature
|
||||
- Configure AENQ
|
||||
- Get statistics
|
||||
|
||||
Refer to ena_admin_defs.h for the list of supported Get/Set Feature
|
||||
properties.
|
||||
|
||||
The Asynchronous Event Notification Queue (AENQ) is a uni-directional
|
||||
queue used by the ENA device to send to the driver events that cannot
|
||||
be reported using ACQ. AENQ events are subdivided into groups. Each
|
||||
group may have multiple syndromes, as shown below.
|
||||
|
||||
The events are:
|
||||
|
||||
==================== ===============
|
||||
Group Syndrome
|
||||
==================== ===============
|
||||
Link state change **X**
|
||||
Fatal error **X**
|
||||
Notification Suspend traffic
|
||||
Notification Resume traffic
|
||||
Keep-Alive **X**
|
||||
==================== ===============
|
||||
|
||||
ACQ and AENQ share the same MSI-X vector.
|
||||
|
||||
Keep-Alive is a special mechanism that allows monitoring of the
|
||||
device's health. The driver maintains a watchdog (WD) handler which,
|
||||
if fired, logs the current state and statistics then resets and
|
||||
restarts the ENA device and driver. A Keep-Alive event is delivered by
|
||||
the device every second. The driver re-arms the WD upon reception of a
|
||||
Keep-Alive event. A missed Keep-Alive event causes the WD handler to
|
||||
fire.
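The keep-alive logic described above can be modelled in a few lines. This is only a sketch, not the driver's code: the class name, the injected clock values, and the 3-second timeout are assumptions made for illustration, since the text only states that the device sends one Keep-Alive event per second.

```python
class KeepAliveWatchdog:
    """Toy model of a keep-alive watchdog (hypothetical, not driver code)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s    # assumed timeout, not taken from the driver
        self.last_keep_alive = 0.0    # time of the most recent Keep-Alive event
        self.resets = 0               # number of times the watchdog has fired

    def on_keep_alive(self, now: float) -> None:
        # Receiving a Keep-Alive event re-arms the watchdog.
        self.last_keep_alive = now

    def check(self, now: float) -> bool:
        # Fire (and count a reset) if no Keep-Alive arrived within the timeout.
        if now - self.last_keep_alive > self.timeout_s:
            self.resets += 1
            self.last_keep_alive = now  # re-arm after the simulated reset
            return True
        return False


wd = KeepAliveWatchdog(timeout_s=3.0)
wd.on_keep_alive(now=1.0)
print(wd.check(now=2.0))   # still healthy
print(wd.check(now=5.0))   # keep-alives were missed, watchdog fires
```

Passing the time in explicitly keeps the model deterministic; a real driver would use kernel timers instead.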
|
||||
|
||||
Data Path Interface
|
||||
===================
|
||||
I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
|
||||
SQ correspondingly). Each SQ has a completion queue (CQ) associated
|
||||
with it.
|
||||
|
||||
The SQs and CQs are implemented as descriptor rings in contiguous
|
||||
physical memory.
|
||||
|
||||
The ENA driver supports two Queue Operation modes for Tx SQs:
|
||||
|
||||
- Regular mode
|
||||
|
||||
* In this mode the Tx SQs reside in the host's memory. The ENA
|
||||
device fetches the ENA Tx descriptors and packet data from host
|
||||
memory.
|
||||
|
||||
- Low Latency Queue (LLQ) mode or "push-mode".
|
||||
|
||||
* In this mode the driver pushes the transmit descriptors and the
|
||||
first 128 bytes of the packet directly to the ENA device memory
|
||||
space. The rest of the packet payload is fetched by the
|
||||
device. For this operation mode, the driver uses a dedicated PCI
|
||||
device memory BAR, which is mapped with write-combine capability.
|
||||
|
||||
The Rx SQs support only the regular mode.
|
||||
|
||||
Note: Not all ENA devices support LLQ, and this feature is negotiated
|
||||
with the device upon initialization. If the ENA device does not
|
||||
support LLQ mode, the driver falls back to the regular mode.
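As a rough illustration of the two Tx modes, the sketch below splits a packet into the part pushed to device memory and the part the device fetches from host memory. The function name and return shape are hypothetical; only the 128-byte figure and the fallback to regular mode come from the text above.

```python
LLQ_PUSH_BYTES = 128  # size of the pushed packet prefix, as stated in the text


def split_for_tx(packet: bytes, llq_supported: bool):
    """Return (pushed_by_driver, fetched_by_device) for one packet.

    Hypothetical helper: in regular mode nothing is pushed and the device
    fetches the whole packet from host memory.
    """
    if not llq_supported:
        return b"", packet
    return packet[:LLQ_PUSH_BYTES], packet[LLQ_PUSH_BYTES:]


pushed, fetched = split_for_tx(bytes(300), llq_supported=True)
print(len(pushed), len(fetched))
```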
|
||||
|
||||
The driver supports multi-queue for both Tx and Rx. This has various
|
||||
benefits:
|
||||
|
||||
- Reduced CPU/thread/process contention on a given Ethernet interface.
|
||||
- Cache miss rate on completion is reduced, particularly for data
|
||||
cache lines that hold the sk_buff structures.
|
||||
- Increased process-level parallelism when handling received packets.
|
||||
- Increased data cache hit rate, by steering kernel processing of
|
||||
packets to the CPU, where the application thread consuming the
|
||||
packet is running.
|
||||
- In hardware interrupt re-direction.
|
||||
|
||||
Interrupt Modes
|
||||
===============
|
||||
The driver assigns a single MSI-X vector per queue pair (for both Tx
|
||||
and Rx directions). The driver assigns an additional dedicated MSI-X vector
|
||||
for management (for ACQ and AENQ).
|
||||
|
||||
Management interrupt registration is performed when the Linux kernel
|
||||
probes the adapter, and it is de-registered when the adapter is
|
||||
removed. I/O queue interrupt registration is performed when the Linux
|
||||
interface of the adapter is opened, and it is de-registered when the
|
||||
interface is closed.
|
||||
|
||||
The management interrupt is named::
|
||||
|
||||
ena-mgmnt@pci:<PCI domain:bus:slot.function>
|
||||
|
||||
and for each queue pair, an interrupt is named::
|
||||
|
||||
<interface name>-Tx-Rx-<queue index>
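To make the naming scheme concrete, the following sketch builds the interrupt names from the two patterns above. The helper names are made up for this example, and numbering the queues from 0 is an assumption.

```python
def mgmt_irq_name(pci_address: str) -> str:
    # pci_address is "<domain>:<bus>:<slot>.<function>", e.g. "0000:00:05.0"
    return f"ena-mgmnt@pci:{pci_address}"


def io_irq_names(interface: str, num_queue_pairs: int) -> list:
    # One interrupt per Tx/Rx queue pair; indexes assumed to start at 0.
    return [f"{interface}-Tx-Rx-{i}" for i in range(num_queue_pairs)]


print(mgmt_irq_name("0000:00:05.0"))
print(io_irq_names("eth0", 2))
```

These are the strings one would expect to look for in /proc/interrupts.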
|
||||
|
||||
The ENA device operates in auto-mask and auto-clear interrupt
|
||||
modes. That is, once MSI-X is delivered to the host, its Cause bit is
|
||||
automatically cleared and the interrupt is masked. The interrupt is
|
||||
unmasked by the driver after NAPI processing is complete.
|
||||
|
||||
Interrupt Moderation
|
||||
====================
|
||||
The ENA driver and device can operate in conventional or adaptive interrupt
|
||||
moderation mode.
|
||||
|
||||
In conventional mode the driver instructs the device to postpone interrupt
posting according to a static interrupt delay value. The interrupt delay
value can be configured through ethtool(8). The following ethtool
parameters are supported by the driver: tx-usecs, rx-usecs.
|
||||
|
||||
In adaptive interrupt moderation mode the interrupt delay value is
|
||||
updated by the driver dynamically and adjusted every NAPI cycle
|
||||
according to the traffic nature.
|
||||
|
||||
By default the ENA driver applies adaptive coalescing on Rx traffic and
|
||||
conventional coalescing on Tx traffic.
|
||||
|
||||
Adaptive coalescing can be switched on/off through the ethtool(8)
adaptive-rx on|off parameter.
|
||||
|
||||
The driver chooses the interrupt delay value according to the number of
bytes and packets received between interrupt unmasking and interrupt
posting. The driver uses an interrupt delay table that subdivides the
range of received bytes/packets into 5 levels and assigns an interrupt
delay value to each level.
|
||||
|
||||
The user can enable/disable adaptive moderation, modify the interrupt
|
||||
delay table and restore its default values through sysfs.
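A five-level delay table of this kind can be sketched as a simple threshold lookup. All numbers below are invented for illustration and are not the driver's defaults; the sketch also keys on bytes only, whereas the driver is described as considering both bytes and packets.

```python
from bisect import bisect_right

# Hypothetical table: four byte-count boundaries give five levels.
BYTES_BOUNDARIES = [1_000, 10_000, 100_000, 1_000_000]
DELAYS_USECS = [0, 16, 32, 64, 128]  # one (made-up) delay per level


def pick_interrupt_delay(bytes_received: int) -> int:
    """Map the bytes seen since the last unmask to an interrupt delay."""
    level = bisect_right(BYTES_BOUNDARIES, bytes_received)
    return DELAYS_USECS[level]


for traffic in (500, 50_000, 5_000_000):
    print(traffic, pick_interrupt_delay(traffic))
```

Light traffic lands in the lowest level (shortest delay, lowest latency), while heavy traffic is batched behind a longer delay.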
|
||||
|
||||
RX copybreak
|
||||
============
|
||||
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
|
||||
and can be configured by the ETHTOOL_STUNABLE command of the
|
||||
SIOCETHTOOL ioctl.
|
||||
|
||||
SKB
|
||||
===
|
||||
The driver allocates an SKB for each frame received during Rx handling in
|
||||
NAPI context. The allocation method depends on the size of the packet.
|
||||
If the frame length is larger than rx_copybreak, napi_get_frags()
|
||||
is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer
|
||||
content is copied (by CPU) to the SKB, and the buffer is recycled.
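The allocation rule above reduces to a single comparison. The sketch below only returns the name of the kernel function that would be used; the helper name and the copybreak value of 256 in the example are hypothetical.

```python
def rx_alloc_method(frame_len: int, rx_copybreak: int) -> str:
    """Name the SKB allocation path for a received frame (illustrative)."""
    if frame_len > rx_copybreak:
        # Large frame: attach the Rx buffer to the SKB as a fragment.
        return "napi_get_frags"
    # Small frame: allocate a linear SKB, copy the data, recycle the buffer.
    return "netdev_alloc_skb_ip_align"


print(rx_alloc_method(64, rx_copybreak=256))
print(rx_alloc_method(1500, rx_copybreak=256))
```

Copying small frames costs a few CPU cycles but lets the driver reuse the already mapped Rx buffer.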
|
||||
|
||||
Statistics
|
||||
==========
|
||||
The user can obtain ENA device and driver statistics using ethtool.
|
||||
The driver can collect regular or extended statistics (including
|
||||
per-queue stats) from the device.
|
||||
|
||||
In addition the driver logs the stats to syslog upon device reset.
|
||||
|
||||
MTU
|
||||
===
|
||||
The driver supports an arbitrarily large MTU with a maximum that is
|
||||
negotiated with the device. The driver configures MTU using the
|
||||
SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
|
||||
via ip(8) and similar legacy tools.
|
||||
|
||||
Stateless Offloads
|
||||
==================
|
||||
The ENA driver supports:
|
||||
|
||||
- TSO over IPv4/IPv6
|
||||
- TSO with ECN
|
||||
- IPv4 header checksum offload
|
||||
- TCP/UDP over IPv4/IPv6 checksum offloads
|
||||
|
||||
RSS
|
||||
===
|
||||
- The ENA device supports RSS that allows flexible Rx traffic
|
||||
steering.
|
||||
- Toeplitz and CRC32 hash functions are supported.
|
||||
- Different combinations of L2/L3/L4 fields can be configured as
|
||||
inputs for hash functions.
|
||||
- The driver configures RSS settings using the AQ SetFeature command
|
||||
(ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
|
||||
ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties).
|
||||
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
|
||||
function delivered in the Rx CQ descriptor is set in the received
|
||||
SKB.
|
||||
- The user can provide a hash key, hash function, and configure the
|
||||
indirection table through ethtool(8).
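For readers unfamiliar with RSS, the sketch below shows a generic Toeplitz hash followed by an indirection-table lookup. It illustrates the general technique, not the ENA device's implementation: the key is an arbitrary 40-byte value, and indexing the table with the hash modulo the table size is an assumption.

```python
def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Generic 32-bit Toeplitz hash; the key needs at least len(data) + 4 bytes."""
    if len(key) < len(data) + 4:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit window of the key that starts at bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def pick_rx_queue(hash_value: int, indirection_table: list) -> int:
    # Assumed lookup rule: hash modulo table size selects the entry.
    return indirection_table[hash_value % len(indirection_table)]


key = bytes(range(1, 41))                 # arbitrary example key
flow = bytes([10, 0, 0, 1, 10, 0, 0, 2])  # src 10.0.0.1, dst 10.0.0.2
flow += (2000).to_bytes(2, "big") + (2001).to_bytes(2, "big")
print(pick_rx_queue(toeplitz_hash(key, flow), [0, 1, 2, 3] * 32))
```

All packets of one flow produce the same hash and therefore land on the same queue, which preserves per-flow ordering.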
|
||||
|
||||
DATA PATH
|
||||
=========
|
||||
Tx
|
||||
--
|
||||
|
||||
ena_start_xmit() is called by the stack. This function does the following:
|
||||
|
||||
- Maps data buffers (skb->data and frags).
|
||||
- Populates ena_buf for the push buffer (if the driver and device are
|
||||
in push mode.)
|
||||
- Prepares ENA bufs for the remaining frags.
|
||||
- Allocates a new request ID from the empty req_id ring. The request
|
||||
ID is the index of the packet in the Tx info. This is used for
|
||||
out-of-order TX completions.
|
||||
- Adds the packet to the proper place in the Tx ring.
|
||||
- Calls ena_com_prepare_tx(), an ENA communication layer function that converts
|
||||
the ena_bufs to ENA descriptors (and adds meta ENA descriptors as
|
||||
needed.)
|
||||
|
||||
* This function also copies the ENA descriptors and the push buffer
|
||||
to the Device memory space (if in push mode.)
|
||||
|
||||
- Writes doorbell to the ENA device.
|
||||
- When the ENA device finishes sending the packet, a completion
|
||||
interrupt is raised.
|
||||
- The interrupt handler schedules NAPI.
|
||||
- The ena_clean_tx_irq() function is called. This function handles the
|
||||
completion descriptors generated by the ENA, with a single
|
||||
completion descriptor per completed packet.
|
||||
|
||||
* req_id is retrieved from the completion descriptor. The tx_info of
|
||||
the packet is retrieved via the req_id. The data buffers are
|
||||
unmapped and req_id is returned to the empty req_id ring.
|
||||
* The function stops when the completion descriptors are completed or
|
||||
the budget is reached.
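The request ID bookkeeping in the steps above can be modelled as a free list plus a lookup array. This is a simplified sketch with hypothetical names, not the driver's data structures; it only shows why completions may arrive out of order.

```python
from collections import deque


class TxRing:
    """Toy model of Tx request ID handling (illustrative only)."""

    def __init__(self, size: int):
        self.free_req_ids = deque(range(size))  # the "empty req_id ring"
        self.tx_info = [None] * size            # per-request packet info

    def submit(self, packet):
        if not self.free_req_ids:
            raise RuntimeError("Tx ring is full")
        req_id = self.free_req_ids.popleft()
        self.tx_info[req_id] = packet
        return req_id

    def complete(self, req_id: int):
        packet = self.tx_info[req_id]
        if packet is None:
            raise ValueError(f"req_id {req_id} is not in flight")
        self.tx_info[req_id] = None
        self.free_req_ids.append(req_id)        # return the ID for reuse
        return packet


ring = TxRing(size=4)
first = ring.submit("packet A")
second = ring.submit("packet B")
print(ring.complete(second))  # completions need not follow submit order
print(ring.complete(first))
```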
|
||||
|
||||
Rx
|
||||
--
|
||||
|
||||
- When a packet is received from the ENA device.
|
||||
- The interrupt handler schedules NAPI.
|
||||
- The ena_clean_rx_irq() function is called. This function calls
|
||||
ena_rx_pkt(), an ENA communication layer function, which returns the
|
||||
number of descriptors used for a new unhandled packet, and zero if
|
||||
no new packet is found.
|
||||
- Then it calls the ena_eth_rx_skb() function.
|
||||
- ena_eth_rx_skb() checks packet length:
|
||||
|
||||
* If the packet is small (len < rx_copybreak), the driver allocates
|
||||
a SKB for the new packet, and copies the packet payload into the
|
||||
SKB data buffer.
|
||||
|
||||
- In this way the original data buffer is not passed to the stack
|
||||
and is reused for future Rx packets.
|
||||
|
||||
* Otherwise the function unmaps the Rx buffer, then allocates the
|
||||
new SKB structure and hooks the Rx buffer to the SKB frags.
|
||||
|
||||
- The new SKB is updated with the necessary information (protocol,
|
||||
checksum hw verify result, etc.), and then passed to the network
|
||||
stack, using the NAPI interface function napi_gro_receive().
|
@@ -0,0 +1,556 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
===============================
|
||||
Marvell(Aquantia) AQtion Driver
|
||||
===============================
|
||||
|
||||
For the aQuantia Multi-Gigabit PCI Express Family of Ethernet Adapters
|
||||
|
||||
.. Contents
|
||||
|
||||
- Identifying Your Adapter
|
||||
- Configuration
|
||||
- Supported ethtool options
|
||||
- Command Line Parameters
|
||||
- Config file parameters
|
||||
- Support
|
||||
- License
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
|
||||
The driver in this release is compatible with AQC-100, AQC-107, AQC-108
|
||||
based ethernet adapters.
|
||||
|
||||
|
||||
SFP+ Devices (for AQC-100 based adapters)
|
||||
-----------------------------------------
|
||||
|
||||
This release was tested with passive Direct Attach Cables (DAC) and SFP+/LC
|
||||
Optical Transceiver.
|
||||
|
||||
Configuration
|
||||
=============
|
||||
|
||||
Viewing Link Messages
|
||||
---------------------
|
||||
Link messages will not be displayed to the console if the distribution is
|
||||
restricting system messages. In order to see network driver link messages on
|
||||
your console, set dmesg to eight by entering the following::
|
||||
|
||||
dmesg -n 8
|
||||
|
||||
.. note::
|
||||
|
||||
This setting is not saved across reboots.
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
|
||||
enabled by changing the MTU to a value larger than the default of 1500.
|
||||
The maximum value for the MTU is 16000. Use the `ip` command to
|
||||
increase the MTU size. For example::
|
||||
|
||||
ip link set mtu 16000 dev enp1s0
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest
|
||||
ethtool version is required for this functionality.
|
||||
|
||||
NAPI
|
||||
----
|
||||
NAPI (Rx polling mode) is supported in the atlantic driver.
|
||||
|
||||
Supported ethtool options
|
||||
=========================
|
||||
|
||||
Viewing adapter settings
|
||||
------------------------
|
||||
|
||||
::
|
||||
|
||||
ethtool <ethX>
|
||||
|
||||
Output example::
|
||||
|
||||
Settings for enp1s0:
|
||||
Supported ports: [ TP ]
|
||||
Supported link modes: 100baseT/Full
|
||||
1000baseT/Full
|
||||
10000baseT/Full
|
||||
2500baseT/Full
|
||||
5000baseT/Full
|
||||
Supported pause frame use: Symmetric
|
||||
Supports auto-negotiation: Yes
|
||||
Supported FEC modes: Not reported
|
||||
Advertised link modes: 100baseT/Full
|
||||
1000baseT/Full
|
||||
10000baseT/Full
|
||||
2500baseT/Full
|
||||
5000baseT/Full
|
||||
Advertised pause frame use: Symmetric
|
||||
Advertised auto-negotiation: Yes
|
||||
Advertised FEC modes: Not reported
|
||||
Speed: 10000Mb/s
|
||||
Duplex: Full
|
||||
Port: Twisted Pair
|
||||
PHYAD: 0
|
||||
Transceiver: internal
|
||||
Auto-negotiation: on
|
||||
MDI-X: Unknown
|
||||
Supports Wake-on: g
|
||||
Wake-on: d
|
||||
Link detected: yes
|
||||
|
||||
|
||||
.. note::
|
||||
|
||||
AQrate speeds (2.5/5 Gb/s) will be displayed only with linux kernels > 4.10.
|
||||
But you can still use these speeds::
|
||||
|
||||
ethtool -s eth0 autoneg off speed 2500
|
||||
|
||||
Viewing adapter information
|
||||
---------------------------
|
||||
|
||||
::
|
||||
|
||||
ethtool -i <ethX>
|
||||
|
||||
Output example::
|
||||
|
||||
driver: atlantic
|
||||
version: 5.2.0-050200rc5-generic-kern
|
||||
firmware-version: 3.1.78
|
||||
expansion-rom-version:
|
||||
bus-info: 0000:01:00.0
|
||||
supports-statistics: yes
|
||||
supports-test: no
|
||||
supports-eeprom-access: no
|
||||
supports-register-dump: yes
|
||||
supports-priv-flags: no
|
||||
|
||||
|
||||
Viewing Ethernet adapter statistics
|
||||
-----------------------------------
|
||||
|
||||
::
|
||||
|
||||
ethtool -S <ethX>
|
||||
|
||||
Output example::
|
||||
|
||||
NIC statistics:
|
||||
InPackets: 13238607
|
||||
InUCast: 13293852
|
||||
InMCast: 52
|
||||
InBCast: 3
|
||||
InErrors: 0
|
||||
OutPackets: 23703019
|
||||
OutUCast: 23704941
|
||||
OutMCast: 67
|
||||
OutBCast: 11
|
||||
InUCastOctects: 213182760
|
||||
OutUCastOctects: 22698443
|
||||
InMCastOctects: 6600
|
||||
OutMCastOctects: 8776
|
||||
InBCastOctects: 192
|
||||
OutBCastOctects: 704
|
||||
InOctects: 2131839552
|
||||
OutOctects: 226938073
|
||||
InPacketsDma: 95532300
|
||||
OutPacketsDma: 59503397
|
||||
InOctetsDma: 1137102462
|
||||
OutOctetsDma: 2394339518
|
||||
InDroppedDma: 0
|
||||
Queue[0] InPackets: 23567131
|
||||
Queue[0] OutPackets: 20070028
|
||||
Queue[0] InJumboPackets: 0
|
||||
Queue[0] InLroPackets: 0
|
||||
Queue[0] InErrors: 0
|
||||
Queue[1] InPackets: 45428967
|
||||
Queue[1] OutPackets: 11306178
|
||||
Queue[1] InJumboPackets: 0
|
||||
Queue[1] InLroPackets: 0
|
||||
Queue[1] InErrors: 0
|
||||
Queue[2] InPackets: 3187011
|
||||
Queue[2] OutPackets: 13080381
|
||||
Queue[2] InJumboPackets: 0
|
||||
Queue[2] InLroPackets: 0
|
||||
Queue[2] InErrors: 0
|
||||
Queue[3] InPackets: 23349136
|
||||
Queue[3] OutPackets: 15046810
|
||||
Queue[3] InJumboPackets: 0
|
||||
Queue[3] InLroPackets: 0
|
||||
Queue[3] InErrors: 0
|
||||
|
||||
Interrupt coalescing support
|
||||
----------------------------
|
||||
|
||||
ITR mode and TX/RX coalescing timings can be viewed with::
|
||||
|
||||
ethtool -c <ethX>
|
||||
|
||||
and changed with::
|
||||
|
||||
ethtool -C <ethX> tx-usecs <usecs> rx-usecs <usecs>
|
||||
|
||||
To disable coalescing::
|
||||
|
||||
ethtool -C <ethX> tx-usecs 0 rx-usecs 0 tx-frames 1 rx-frames 1
|
||||
|
||||
Wake on LAN support
|
||||
-------------------
|
||||
|
||||
WOL support by magic packet::
|
||||
|
||||
ethtool -s <ethX> wol g
|
||||
|
||||
To disable WOL::
|
||||
|
||||
ethtool -s <ethX> wol d
|
||||
|
||||
Set and check the driver message level
|
||||
--------------------------------------
|
||||
|
||||
Set message level
|
||||
|
||||
::
|
||||
|
||||
ethtool -s <ethX> msglvl <level>
|
||||
|
||||
Level values:
|
||||
|
||||
====== =============================
|
||||
0x0001 general driver status.
|
||||
0x0002 hardware probing.
|
||||
0x0004 link state.
|
||||
0x0008 periodic status check.
|
||||
0x0010 interface being brought down.
|
||||
0x0020 interface being brought up.
|
||||
0x0040 receive error.
|
||||
0x0080 transmit error.
|
||||
0x0200 interrupt handling.
|
||||
0x0400 transmit completion.
|
||||
0x0800 receive completion.
|
||||
0x1000 packet contents.
|
||||
0x2000 hardware status.
|
||||
0x4000 Wake-on-LAN status.
|
||||
====== =============================
|
||||
|
||||
By default, the level of debugging messages is set to 0x0001 (general driver status).
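Because the level values are bit flags, several categories can be combined by OR-ing them together. The sketch below decodes such a combined value using the table above; the function name is made up for this example.

```python
# Bit values and descriptions copied from the level table above.
MSG_LEVELS = {
    0x0001: "general driver status",
    0x0002: "hardware probing",
    0x0004: "link state",
    0x0008: "periodic status check",
    0x0010: "interface being brought down",
    0x0020: "interface being brought up",
    0x0040: "receive error",
    0x0080: "transmit error",
    0x0200: "interrupt handling",
    0x0400: "transmit completion",
    0x0800: "receive completion",
    0x1000: "packet contents",
    0x2000: "hardware status",
    0x4000: "Wake-on-LAN status",
}


def decode_msglvl(level: int) -> list:
    """List the message categories enabled by a msglvl bitmask."""
    return [name for bit, name in sorted(MSG_LEVELS.items()) if level & bit]


# 0x0001 | 0x0004 enables general status and link state messages.
print(decode_msglvl(0x0005))
```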
|
||||
|
||||
Check message level
|
||||
|
||||
::
|
||||
|
||||
ethtool <ethX> | grep "Current message level"
|
||||
|
||||
If you want to disable the output of messages::
|
||||
|
||||
ethtool -s <ethX> msglvl 0
|
||||
|
||||
RX flow rules (ntuple filters)
|
||||
------------------------------
|
||||
|
||||
Three separate rule types are supported and are applied in this order:
|
||||
|
||||
1. 16 VLAN ID rules
|
||||
2. 16 L2 EtherType rules
|
||||
3. 8 L3/L4 5-Tuple rules
|
||||
|
||||
|
||||
The driver utilizes the ethtool interface for configuring ntuple filters,
|
||||
via ``ethtool -N <device> <filter>``.
|
||||
|
||||
To enable or disable the RX flow rules::
|
||||
|
||||
ethtool -K ethX ntuple <on|off>
|
||||
|
||||
When disabling ntuple filters, all the user programmed filters are
|
||||
flushed from the driver cache and hardware. All needed filters must
|
||||
be re-added when ntuple is re-enabled.
|
||||
|
||||
Because of the fixed order of the rules, the location of filters is also fixed:
|
||||
|
||||
- Locations 0 - 15 for VLAN ID filters
|
||||
- Locations 16 - 31 for L2 EtherType filters
|
||||
- Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)
|
||||
|
||||
The L3/L4 5-tuple (protocol, source and destination IP address, source and
|
||||
destination TCP/UDP/SCTP port) is compared against 8 filters. For IPv4, up to
|
||||
8 source and destination addresses can be matched. For IPv6, up to 2 pairs of
|
||||
addresses can be supported. Source and destination ports are only compared for
|
||||
TCP/UDP/SCTP packets.
|
||||
|
||||
To add a filter that directs packets to queue 5, use the
|
||||
``<-N|-U|--config-nfc|--config-ntuple>`` switch::
|
||||
|
||||
ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 2000 dst-port 2001 action 5 <loc 32>
|
||||
|
||||
- action is the queue number.
|
||||
- loc is the rule number.
|
||||
|
||||
For ``flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6`` you must set the loc
|
||||
number within 32 - 39.
|
||||
For ``flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6`` you can set 8 rules
|
||||
for IPv4 traffic or 2 rules for IPv6 traffic. The loc numbers for IPv6
traffic are 32 and 36.
|
||||
At the moment you cannot use IPv4 and IPv6 filters at the same time.
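The fixed location ranges described in this section can be checked before calling ethtool. The helpers below are a hypothetical sketch that encodes only the ranges stated above.

```python
IPV6_L3L4_LOCATIONS = (32, 36)  # the only locations usable for IPv6 rules


def filter_type_for_location(loc: int) -> str:
    """Return which rule type a location belongs to."""
    if 0 <= loc <= 15:
        return "VLAN ID"
    if 16 <= loc <= 31:
        return "L2 EtherType"
    if 32 <= loc <= 39:
        return "L3/L4 5-tuple"
    raise ValueError(f"location {loc} is outside 0 - 39")


def is_valid_l3l4_location(loc: int, ipv6: bool) -> bool:
    """Check a loc value for an ip4/udp4/tcp4/sctp4 or ip6/udp6/tcp6/sctp6 rule."""
    if ipv6:
        return loc in IPV6_L3L4_LOCATIONS
    return 32 <= loc <= 39


print(filter_type_for_location(16))
print(is_valid_l3l4_location(33, ipv6=True))
```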
|
||||
|
||||
Example filter for IPv6 filter traffic::
|
||||
|
||||
sudo ethtool -N <ethX> flow-type tcp6 src-ip 2001:db8:0:f101::1 dst-ip 2001:db8:0:f101::2 action 1 loc 32
|
||||
sudo ethtool -N <ethX> flow-type ip6 src-ip 2001:db8:0:f101::2 dst-ip 2001:db8:0:f101::5 action -1 loc 36
|
||||
|
||||
Example filter for IPv4 filter traffic::
|
||||
|
||||
sudo ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.4 dst-ip 10.0.0.7 src-port 2000 dst-port 2001 loc 32
|
||||
sudo ethtool -N <ethX> flow-type tcp4 src-ip 10.0.0.3 dst-ip 10.0.0.9 src-port 2000 dst-port 2001 loc 33
|
||||
sudo ethtool -N <ethX> flow-type ip4 src-ip 10.0.0.6 dst-ip 10.0.0.4 loc 34
|
||||
|
||||
If you set action -1, then all traffic corresponding to the filter will be discarded.
|
||||
|
||||
The maximum value of action is 31.
|
||||
|
||||
|
||||
The VLAN filter (VLAN id) is compared against 16 filters.
|
||||
VLAN id must be accompanied by mask 0xF000. That is to distinguish VLAN filter
|
||||
from L2 Ethertype filter with UserPriority since both User Priority and VLAN ID
|
||||
are passed in the same 'vlan' parameter.
|
||||
|
||||
To add a filter that directs packets from VLAN 2001 to queue 1::
|
||||
|
||||
ethtool -N <ethX> flow-type ip4 vlan 2001 m 0xF000 action 1 loc 0
|
||||
|
||||
|
||||
L2 EtherType filters allow filtering packets by the EtherType field or by both the
EtherType and User Priority (PCP) fields of 802.1Q.
|
||||
UserPriority (vlan) parameter must be accompanied by mask 0x1FFF. That is to
|
||||
distinguish VLAN filter from L2 Ethertype filter with UserPriority since both
|
||||
User Priority and VLAN ID are passed in the same 'vlan' parameter.
|
||||
|
||||
To add a filter that directs IPv4 packets of priority 3 to queue 3::
|
||||
|
||||
ethtool -N <ethX> flow-type ether proto 0x800 vlan 0x6000 m 0x1FFF action 3 loc 16
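The two masks make more sense next to the layout of the 16-bit 802.1Q tag control field that the 'vlan' parameter carries. Assuming the standard layout (3 bits of priority, 1 drop-eligible bit, 12 bits of VLAN ID), 0xF000 covers the priority and drop-eligible bits while 0x1FFF covers the drop-eligible and VLAN ID bits. The helper names below are hypothetical.

```python
def pack_tci(pcp: int, vid: int, dei: int = 0) -> int:
    """Build a 16-bit 802.1Q tag control value from its fields."""
    if not 0 <= pcp <= 7:
        raise ValueError("priority (PCP) must be 0-7")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must be 0-4095")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    return (pcp << 13) | (dei << 12) | vid


def unpack_tci(tci: int):
    """Split a tag control value into (pcp, dei, vid)."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF


tci = pack_tci(pcp=3, vid=2001)
print(hex(tci & 0xF000))  # bits covered by the VLAN filter mask
print(tci & 0x1FFF)       # bits covered by the User Priority filter mask
```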
|
||||
|
||||
To see the list of filters currently present::
|
||||
|
||||
ethtool <-u|-n|--show-nfc|--show-ntuple> <ethX>
|
||||
|
||||
Rules may be deleted from the table itself. This is done using::
|
||||
|
||||
sudo ethtool <-N|-U|--config-nfc|--config-ntuple> <ethX> delete <loc>
|
||||
|
||||
- loc is the rule number to be deleted.
|
||||
|
||||
Rx filters are an interface to load the filter table that funnels all flows
into queue 0 unless an alternative queue is specified using "action". In that
case, any flow that matches the filter criteria will be directed to the
appropriate queue. RX filters are supported on all kernels 2.6.30 and later.
|
||||
|
||||
RSS for UDP
|
||||
-----------
|
||||
|
||||
Currently, the NIC does not support RSS for fragmented IP packets, which leads to
|
||||
incorrect working of RSS for fragmented UDP traffic. To disable RSS for UDP the
|
||||
RX Flow L3/L4 rule may be used.
|
||||
|
||||
Example::
|
||||
|
||||
ethtool -N eth0 flow-type udp4 action 0 loc 32
|
||||
|
||||
UDP GSO hardware offload
|
||||
------------------------
|
||||
|
||||
UDP GSO boosts UDP tx rates by offloading UDP header allocation
into hardware. A special userspace socket option is required for this. It
can be validated with the selftest in /kernel/tools/testing/selftests/net/::
|
||||
|
||||
udpgso_bench_tx -u -4 -D 10.0.1.1 -s 6300 -S 100
|
||||
|
||||
This sends out 100-byte UDP packets formed from a single
6300-byte user buffer.
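The effect of segmentation on that buffer is easy to compute. The sketch below splits a user buffer into payloads of the requested size in plain Python; it mimics only the arithmetic, not the kernel or hardware path, and the function name is made up.

```python
def gso_segments(buffer: bytes, segment_size: int) -> list:
    """Split one user buffer into UDP payloads of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [buffer[i:i + segment_size]
            for i in range(0, len(buffer), segment_size)]


segments = gso_segments(bytes(6300), 100)
print(len(segments), "datagrams of", len(segments[0]), "bytes each")
```

With the 6300-byte buffer and 100-byte segments from the selftest command, a single send call results in 63 datagrams on the wire.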
|
||||
|
||||
UDP GSO is configured by::
|
||||
|
||||
ethtool -K eth0 tx-udp-segmentation on
|
||||
|
||||
Private flags (testing)
|
||||
-----------------------
|
||||
|
||||
Atlantic driver supports private flags for hardware custom features::
|
||||
|
||||
$ ethtool --show-priv-flags ethX
|
||||
|
||||
Private flags for ethX:
|
||||
DMASystemLoopback : off
|
||||
PKTSystemLoopback : off
|
||||
DMANetworkLoopback : off
|
||||
PHYInternalLoopback: off
|
||||
PHYExternalLoopback: off
|
||||
|
||||
Example::
|
||||
|
||||
$ ethtool --set-priv-flags ethX DMASystemLoopback on
|
||||
|
||||
DMASystemLoopback: DMA Host loopback.
|
||||
PKTSystemLoopback: Packet buffer host loopback.
|
||||
DMANetworkLoopback: Network side loopback on DMA block.
|
||||
PHYInternalLoopback: Internal loopback on Phy.
|
||||
PHYExternalLoopback: External loopback on Phy (with loopback ethernet cable).
|
||||
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
The following command line parameters are available on atlantic driver:
|
||||
|
||||
aq_itr - Interrupt throttling mode
----------------------------------
|
||||
Accepted values: 0, 1, 0xFFFF
|
||||
|
||||
Default value: 0xFFFF
|
||||
|
||||
====== ==============================================================
|
||||
0 Disable interrupt throttling.
|
||||
1 Enable interrupt throttling and use specified tx and rx rates.
|
||||
0xFFFF Auto throttling mode. Driver will choose the best RX and TX
|
||||
interrupt throttling settings based on link speed.
|
||||
====== ==============================================================
|
||||
|
||||
aq_itr_tx - TX interrupt throttle rate
|
||||
--------------------------------------
|
||||
|
||||
Accepted values: 0 - 0x1FF
|
||||
|
||||
Default value: 0
|
||||
|
||||
TX side throttling in microseconds. The adapter sets the maximum interrupt delay
to this value. The minimum interrupt delay is half of this value.
|
||||
|
||||
aq_itr_rx - RX interrupt throttle rate
|
||||
--------------------------------------
|
||||
|
||||
Accepted values: 0 - 0x1FF
|
||||
|
||||
Default value: 0
|
||||
|
||||
RX side throttling in microseconds. The adapter sets the maximum interrupt delay
to this value. The minimum interrupt delay is half of this value.
|
||||
|
||||
.. note::
|
||||
|
||||
ITR settings can be changed at runtime with ethtool -C (see above).
|
||||
|
||||
Config file parameters
|
||||
======================
|
||||
|
||||
For some fine tuning and performance optimizations,
|
||||
some parameters can be changed in the {source_dir}/aq_cfg.h file.
|
||||
|
||||
AQ_CFG_RX_PAGEORDER
|
||||
-------------------
|
||||
|
||||
Default value: 0
|
||||
|
||||
RX page order override. This is the power-of-2 exponent of the number of RX
pages allocated for each descriptor. Received descriptor size is still limited by
AQ_CFG_RX_FRAME_MAX.
|
||||
|
||||
Increasing pageorder makes page reuse better (relevant on IOMMU-enabled systems).
|
||||
|
||||
AQ_CFG_RX_REFILL_THRES
|
||||
----------------------
|
||||
|
||||
Default value: 32
|
||||
|
||||
RX refill threshold. RX path will not refill freed descriptors until the
|
||||
specified number of free descriptors is observed. Larger values may help
|
||||
better page reuse but may lead to packet drops as well.
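The threshold behaviour can be pictured as a single check made before refilling. This sketch assumes the refill starts once the count of free descriptors reaches the threshold; the exact comparison used by the driver is not stated here, and the function name is hypothetical.

```python
AQ_CFG_RX_REFILL_THRES = 32  # default value from the text


def should_refill(free_descriptors: int,
                  threshold: int = AQ_CFG_RX_REFILL_THRES) -> bool:
    """Refill only once enough descriptors have been freed (assumed rule)."""
    return free_descriptors >= threshold


print(should_refill(10))
print(should_refill(40))
```

Waiting for a larger batch amortises the cost of refilling, but leaves fewer posted buffers available while the batch accumulates.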
|
||||
|
||||
AQ_CFG_VECS_DEF
|
||||
---------------
|
||||
|
||||
Number of queues
|
||||
|
||||
Valid Range: 0 - 8 (up to AQ_CFG_VECS_MAX)
|
||||
|
||||
Default value: 8
|
||||
|
||||
Notice this value will be capped by the number of cores available on the system.
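In other words, the effective queue count is the smaller of the configured value and the number of CPU cores. A minimal sketch, with a made-up helper name:

```python
import os

AQ_CFG_VECS_DEF = 8  # default value from the text


def effective_queue_count(configured: int = AQ_CFG_VECS_DEF,
                          cpu_count=None) -> int:
    """Cap the configured number of queues by the available CPU cores."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return min(configured, cpu_count)


print(effective_queue_count(cpu_count=4))
print(effective_queue_count(cpu_count=16))
```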
|
||||
|
||||
AQ_CFG_IS_RSS_DEF
|
||||
-----------------
|
||||
|
||||
Enable/disable Receive Side Scaling
|
||||
|
||||
This feature allows the adapter to distribute receive processing
|
||||
across multiple CPU cores and to avoid overloading a single CPU core.
|
||||
|
||||
Valid values
|
||||
|
||||
== ========
|
||||
0 disabled
|
||||
1 enabled
|
||||
== ========
|
||||
|
||||
Default value: 1
|
||||
|
||||
AQ_CFG_NUM_RSS_QUEUES_DEF
|
||||
-------------------------
|
||||
|
||||
Number of queues for Receive Side Scaling
|
||||
|
||||
Valid Range: 0 - 8 (up to AQ_CFG_VECS_DEF)
|
||||
|
||||
Default value: AQ_CFG_VECS_DEF
|
||||
|
||||
AQ_CFG_IS_LRO_DEF
|
||||
-----------------
|
||||
|
||||
Enable/disable Large Receive Offload
|
||||
|
||||
This offload enables the adapter to coalesce multiple TCP segments and indicate
|
||||
them as a single coalesced unit to the OS networking subsystem.
|
||||
|
||||
The system consumes less energy but it also introduces more latency in packets
|
||||
processing.
|
||||
|
||||
Valid values
|
||||
|
||||
== ========
|
||||
0 disabled
|
||||
1 enabled
|
||||
== ========
|
||||
|
||||
Default value: 1
|
||||
|
||||
AQ_CFG_TX_CLEAN_BUDGET
|
||||
----------------------
|
||||
|
||||
Maximum descriptors to cleanup on TX at once.
|
||||
|
||||
Default value: 256
|
||||
|
||||
After the aq_cfg.h file is changed, the driver must be rebuilt for the changes to take effect.
|
||||
|
||||
Support
|
||||
=======
|
||||
|
||||
If an issue is identified with the released source code on the supported
|
||||
kernel with a supported adapter, email the specific information related
|
||||
to the issue to aqn_support@marvell.com
|
||||
|
||||
License
|
||||
=======
|
||||
|
||||
aQuantia Corporation Network Driver
|
||||
|
||||
Copyright |copy| 2014 - 2019 aQuantia Corporation.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms and conditions of the GNU General Public License,
|
||||
version 2, as published by the Free Software Foundation.
|
@@ -0,0 +1,393 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=============================================
|
||||
Chelsio N210 10Gb Ethernet Network Controller
|
||||
=============================================
|
||||
|
||||
Driver Release Notes for Linux
|
||||
|
||||
Version 2.1.1
|
||||
|
||||
June 20, 2005
|
||||
|
||||
.. Contents
|
||||
|
||||
INTRODUCTION
|
||||
FEATURES
|
||||
PERFORMANCE
|
||||
DRIVER MESSAGES
|
||||
KNOWN ISSUES
|
||||
SUPPORT
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
This document describes the Linux driver for Chelsio 10Gb Ethernet Network
|
||||
Controller. This driver supports the Chelsio N210 NIC and is backward
|
||||
compatible with the Chelsio N110 model 10Gb NICs.
|
||||
|
||||
|
||||
Features
|
||||
========
|
||||
|
||||
Adaptive Interrupts (adaptive-rx)
|
||||
---------------------------------
|
||||
|
||||
This feature provides an adaptive algorithm that adjusts the interrupt
|
||||
coalescing parameters, allowing the driver to dynamically adapt the latency
|
||||
settings to achieve the highest performance during various types of network
|
||||
load.
|
||||
|
||||
The interface used to control this feature is ethtool. Please see the
|
||||
ethtool manpage for additional usage information.
|
||||
|
||||
By default, adaptive-rx is disabled.
|
||||
To enable adaptive-rx::
|
||||
|
||||
ethtool -C <interface> adaptive-rx on
|
||||
|
||||
To disable adaptive-rx, use ethtool::
|
||||
|
||||
ethtool -C <interface> adaptive-rx off
|
||||
|
||||
After disabling adaptive-rx, the timer latency value will be set to 50us.
|
||||
You may set the timer latency after disabling adaptive-rx::
|
||||
|
||||
ethtool -C <interface> rx-usecs <microseconds>
|
||||
|
||||
An example to set the timer latency value to 100us on eth0::
|
||||
|
||||
ethtool -C eth0 rx-usecs 100
|
||||
|
||||
You may also provide a timer latency value while disabling adaptive-rx::
|
||||
|
||||
ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>
|
||||
|
||||
If adaptive-rx is disabled and a timer latency value is specified, the timer
|
||||
will be set to the specified value until changed by the user or until
|
||||
adaptive-rx is enabled.
|
||||
|
||||
To view the status of the adaptive-rx and timer latency values::
|
||||
|
||||
ethtool -c <interface>
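The ethtool invocations above follow a small pattern that can be sketched in code. The following is only an illustration: the helper name ``coalesce_command`` is hypothetical, it builds the argument list without running anything, and the rule that ``rx-usecs`` is only accepted together with ``adaptive-rx off`` is an assumption of this sketch drawn from the text above, not something ethtool itself enforces.

```python
from typing import List, Optional


def coalesce_command(interface: str, adaptive_rx: bool,
                     rx_usecs: Optional[int] = None) -> List[str]:
    """Build the 'ethtool -C' argument list described above (nothing is executed)."""
    cmd = ["ethtool", "-C", interface, "adaptive-rx", "on" if adaptive_rx else "off"]
    if rx_usecs is not None:
        if adaptive_rx:
            # Assumption of this sketch: a fixed timer is only set with adaptive-rx off.
            raise ValueError("rx_usecs requires adaptive_rx=False")
        if rx_usecs < 0:
            raise ValueError("rx_usecs must be non-negative")
        cmd += ["rx-usecs", str(rx_usecs)]
    return cmd


# Mirrors the combined form shown above: adaptive-rx off with a 100us timer on eth0.
print(" ".join(coalesce_command("eth0", adaptive_rx=False, rx_usecs=100)))
```

The returned list can be passed to a process launcher of your choice; running it requires root privileges and a real interface name.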
|
||||
|
||||
|
||||
TCP Segmentation Offloading (TSO) Support
|
||||
-----------------------------------------
|
||||
|
||||
This feature, also known as "large send", enables a system's protocol stack
|
||||
to offload portions of outbound TCP processing to a network interface card
|
||||
thereby reducing system CPU utilization and enhancing performance.
|
||||
|
||||
The interface used to control this feature is ethtool version 1.8 or higher.
|
||||
Please see the ethtool manpage for additional usage information.
|
||||
|
||||
By default, TSO is enabled.
|
||||
To disable TSO::
|
||||
|
||||
ethtool -K <interface> tso off
|
||||
|
||||
To enable TSO::
|
||||
|
||||
ethtool -K <interface> tso on
|
||||
|
||||
To view the status of TSO::
|
||||
|
||||
ethtool -k <interface>
|
||||
|
||||
|
||||
Performance
|
||||
===========
|
||||
|
||||
The following information is provided as an example of how to change system
|
||||
parameters for "performance tuning" and what values to use. You may or may not
|
||||
want to change these system parameters, depending on your server/workstation
|
||||
application. Doing so is not warranted in any way by Chelsio Communications,
|
||||
and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
|
||||
of data or damage to equipment.
|
||||
|
||||
Your distribution may have a different way of doing things, or you may prefer
|
||||
a different method. These commands are shown only to provide an example of
|
||||
what to do and are by no means definitive.
|
||||
|
||||
Making any of the following system changes will only last until you reboot
|
||||
your system. You may want to write a script that runs at boot-up which
|
||||
includes the optimal settings for your system.
|
||||
|
||||
Setting PCI Latency Timer::
|
||||
|
||||
setpci -d 1425:* 0x0c.l=0x0000F800
|
||||
|
||||
Disabling TCP timestamp::
|
||||
|
||||
sysctl -w net.ipv4.tcp_timestamps=0
|
||||
|
||||
Disabling SACK::
|
||||
|
||||
sysctl -w net.ipv4.tcp_sack=0
|
||||
|
||||
Setting large number of incoming connection requests::
|
||||
|
||||
sysctl -w net.ipv4.tcp_max_syn_backlog=3000
|
||||
|
||||
Setting maximum receive socket buffer size::
|
||||
|
||||
sysctl -w net.core.rmem_max=1024000
|
||||
|
||||
Setting maximum send socket buffer size::
|
||||
|
||||
sysctl -w net.core.wmem_max=1024000
|
||||
|
||||
Set smp_affinity (on a multiprocessor system) to a single CPU::
|
||||
|
||||
echo 1 > /proc/irq/<interrupt_number>/smp_affinity
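The value written to smp_affinity is a hexadecimal CPU bitmask, so the ``1`` above selects CPU 0. As a minimal sketch, the mask for other CPUs can be computed as follows. The helper name ``smp_affinity_mask`` is hypothetical, and the sketch ignores the comma-grouped form that systems with many CPUs may use.

```python
def smp_affinity_mask(cpus):
    """Return the hexadecimal bitmask string for the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit N of the mask selects CPU N
    return format(mask, "x")


print(smp_affinity_mask([0]))  # CPU 0 only, the value used in the command above
print(smp_affinity_mask([2]))  # CPU 2 only
```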
|
||||
|
||||
Setting default receive socket buffer size::
|
||||
|
||||
sysctl -w net.core.rmem_default=524287
|
||||
|
||||
Setting default send socket buffer size::
|
||||
|
||||
sysctl -w net.core.wmem_default=524287
|
||||
|
||||
Setting maximum option memory buffers::
|
||||
|
||||
sysctl -w net.core.optmem_max=524287
|
||||
|
||||
Setting maximum backlog (# of unprocessed packets before kernel drops)::
|
||||
|
||||
sysctl -w net.core.netdev_max_backlog=300000
|
||||
|
||||
Setting TCP read buffers (min/default/max)::
|
||||
|
||||
sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
|
||||
|
||||
Setting TCP write buffers (min/default/max)::
|
||||
|
||||
sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
|
||||
|
||||
Setting TCP buffer space (min/pressure/max)::
|
||||
|
||||
sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
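Because these settings are lost at reboot, it can help to keep them in one place and generate the commands for a boot-time script. The following is a sketch only: the helper name ``sysctl_commands`` is hypothetical, the dictionary holds a subset of the example values above, and whether these values suit your system is for you to decide.

```python
import shlex

# A subset of the example values above; adjust them for your own system.
SETTINGS = {
    "net.ipv4.tcp_timestamps": "0",
    "net.ipv4.tcp_sack": "0",
    "net.core.rmem_max": "1024000",
    "net.core.wmem_max": "1024000",
    "net.ipv4.tcp_rmem": "10000000 10000000 10000000",
}


def sysctl_commands(settings):
    """Return one 'sysctl -w key=value' shell command per setting."""
    # shlex.quote adds quotes only where needed, e.g. for values containing spaces.
    return ["sysctl -w " + shlex.quote(f"{key}={value}")
            for key, value in settings.items()]


print("#!/bin/sh")
print("\n".join(sysctl_commands(SETTINGS)))
```

The printed text can be saved as a script and run as root at boot-up in whatever way your distribution provides.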
|
||||
|
||||
TCP window size for single connections:
|
||||
|
||||
The receive buffer (RX_WINDOW) size must be at least as large as the
|
||||
Bandwidth-Delay Product of the communication link between the sender and
|
||||
receiver. Due to the variations of RTT, you may want to increase the buffer
|
||||
size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
|
||||
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
|
||||
|
||||
At 10Gb speeds, use the following formula::
|
||||
|
||||
RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
|
||||
Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000
|
||||
|
||||
RX_WINDOW sizes of 256KB - 512KB should be sufficient.
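The Bandwidth-Delay Product calculation above can be written out as a short sketch. The function name ``rx_window_bytes`` and the ``headroom`` parameter are hypothetical; ``headroom=2.0`` corresponds to the suggestion of allowing up to twice the Bandwidth-Delay Product for RTT variation.

```python
import math


def rx_window_bytes(link_bits_per_sec, rtt_microseconds, headroom=1.0):
    """Return the receive window, in bytes, for a single connection."""
    # Bandwidth-delay product: bytes per second multiplied by the round-trip time.
    bdp = link_bits_per_sec * rtt_microseconds // (8 * 1_000_000)
    return math.ceil(bdp * headroom)


print(rx_window_bytes(10_000_000_000, 100))                # 10Gb link, 100us RTT
print(rx_window_bytes(10_000_000_000, 100, headroom=2.0))  # same link with 2x headroom
```

For a 10Gb link and a 100us RTT this gives 125,000 bytes, matching the worked example, and 250,000 bytes with the 2x headroom.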
|
||||
|
||||
Setting the min, default, and max receive buffer (RX_WINDOW) size::
|
||||
|
||||
sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
|
||||
|
||||
TCP window size for multiple connections:
|
||||
The receive buffer (RX_WINDOW) size may be calculated in the same way as for single
|
||||
connections, but should be divided by the number of connections. The
|
||||
smaller window prevents congestion and facilitates better pacing,
|
||||
especially if/when MAC level flow control does not work well or when it is
|
||||
not supported on the machine. Experimentation may be necessary to attain
|
||||
the correct value. This method is provided as a starting point for the
|
||||
correct receive buffer size.
|
||||
|
||||
Setting the min, default, and max receive buffer (RX_WINDOW) size is
|
||||
performed in the same manner as for a single connection.
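As a rough illustration of the division described above, the per-connection window can be derived from a single-connection window. The function name ``per_connection_window`` is hypothetical, and the result is only a starting point for experimentation.

```python
def per_connection_window(single_window_bytes, connections):
    """Divide a single-connection receive window among several connections."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    return single_window_bytes // connections


# A 512KB single-connection window shared by 8 connections.
print(per_connection_window(512 * 1024, 8))
```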
|
||||
|
||||
|
||||
Driver Messages
|
||||
===============
|
||||
|
||||
The following messages are the most common messages logged by syslog. These
|
||||
may be found in /var/log/messages.
|
||||
|
||||
Driver up::
|
||||
|
||||
Chelsio Network Driver - version 2.1.1
|
||||
|
||||
NIC detected::
|
||||
|
||||
eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit
|
||||
|
||||
Link up::
|
||||
|
||||
eth#: link is up at 10 Gbps, full duplex
|
||||
|
||||
Link down::
|
||||
|
||||
eth#: link is down
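When watching for these messages in a log, the link state lines can be matched with a regular expression. This is a sketch under assumptions: the pattern is derived only from the two link messages shown above, and any syslog prefix in front of the interface name (such as ``kernel:``) is assumed, not taken from the driver.

```python
import re

# Matches "eth#: link is up at 10 Gbps, full duplex" and "eth#: link is down".
LINK_RE = re.compile(
    r"(?P<iface>eth\d+): link is (?P<state>up|down)"
    r"(?: at (?P<speed>\d+) Gbps, (?P<duplex>\w+) duplex)?"
)


def parse_link_event(line):
    """Return a dict describing a link message, or None if the line is unrelated."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


print(parse_link_event("kernel: eth0: link is up at 10 Gbps, full duplex"))
print(parse_link_event("kernel: eth0: link is down"))
```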
|
||||
|
||||
|
||||
Known Issues
|
||||
============
|
||||
|
||||
These issues have been identified during testing. The following information
|
||||
is provided as a workaround to the problem. In some cases, this problem is
|
||||
inherent to Linux or to a particular Linux Distribution and/or hardware
|
||||
platform.
|
||||
|
||||
1. Large number of TCP retransmits on a multiprocessor (SMP) system.
|
||||
|
||||
On a system with multiple CPUs, the interrupt (IRQ) for the network
|
||||
controller may be bound to more than one CPU. This will cause TCP
|
||||
retransmits if the packet data were to be split across different CPUs
|
||||
and re-assembled in a different order than expected.
|
||||
|
||||
To eliminate the TCP retransmits, set smp_affinity on the particular
|
||||
interrupt to a single CPU. You can locate the interrupt (IRQ) used on
|
||||
the N110/N210 by using ifconfig::
|
||||
|
||||
ifconfig <dev_name> | grep Interrupt
|
||||
|
||||
Set the smp_affinity to a single CPU::
|
||||
|
||||
echo 1 > /proc/irq/<interrupt_number>/smp_affinity
|
||||
|
||||
It is highly suggested that you do not run the irqbalance daemon on your
|
||||
system, as this will change any smp_affinity setting you have applied.
|
||||
The irqbalance daemon runs on a 10 second interval and binds interrupts
|
||||
to the least loaded CPU determined by the daemon. To disable this daemon::
|
||||
|
||||
chkconfig --level 2345 irqbalance off
|
||||
|
||||
By default, some Linux distributions enable the kernel feature,
|
||||
irqbalance, which performs the same function as the daemon. To disable
|
||||
this feature, add the following line to your bootloader::
|
||||
|
||||
noirqbalance
|
||||
|
||||
Example using the Grub bootloader::
|
||||
|
||||
title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
|
||||
root (hd0,0)
|
||||
kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
|
||||
initrd /initrd-2.4.21-27.ELsmp.img
|
||||
|
||||
2. After running insmod, the driver is loaded and the incorrect network
|
||||
interface is brought up without running ifup.
|
||||
|
||||
When using 2.4.x kernels, including RHEL kernels, the Linux kernel
|
||||
invokes a script named "hotplug". This script is primarily used to
|
||||
automatically bring up USB devices when they are plugged in; however,
|
||||
the script also attempts to automatically bring up a network interface
|
||||
after loading the kernel module. The hotplug script does this by scanning
|
||||
the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
|
||||
for HWADDR=<mac_address>.
|
||||
|
||||
If the hotplug script does not find the HWADDR within any of the
|
||||
ifcfg-eth# files, it will bring up the device with the next available
|
||||
interface name. If this interface is already configured for a different
|
||||
network card, your new interface will have an incorrect IP address and
|
||||
network settings.
|
||||
|
||||
To solve this issue, you can add the HWADDR=<mac_address> key to the
|
||||
interface config file of your network controller.
|
||||
|
||||
To disable this "hotplug" feature, you may add the driver (module name)
|
||||
to the "blacklist" file located in /etc/hotplug. It has been noted that
|
||||
this does not work for network devices because the net.agent script
|
||||
does not use the blacklist file. Simply remove, or rename, the net.agent
|
||||
script located in /etc/hotplug to disable this feature.
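The HWADDR lookup described in issue 2 can be pictured with a short sketch. This is not the hotplug script itself: the function name ``find_interface_for_mac`` is hypothetical, the config file contents are passed in as strings so that nothing on disk is read, and the MAC address is made up for the example.

```python
def find_interface_for_mac(configs, mac):
    """Return the name of the config whose HWADDR matches mac, or None.

    configs maps a file name such as 'ifcfg-eth0' to that file's text.
    """
    wanted = mac.strip().lower()
    for name, text in sorted(configs.items()):
        for line in text.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep and key == "HWADDR" and value.strip().strip('"').lower() == wanted:
                return name
    return None


configs = {
    "ifcfg-eth0": "DEVICE=eth0\nHWADDR=00:07:43:00:00:01\nONBOOT=yes\n",
    "ifcfg-eth1": "DEVICE=eth1\nONBOOT=yes\n",  # no HWADDR key
}
print(find_interface_for_mac(configs, "00:07:43:00:00:01"))
print(find_interface_for_mac(configs, "00:07:43:00:00:02"))
```

The second lookup finds no match, which corresponds to the case where the device would be brought up under the next available interface name.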
|
||||
|
||||
3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
|
||||
on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.
|
||||
|
||||
If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
|
||||
chipset, you may experience the "133-Mhz Mode Split Completion Data
|
||||
Corruption" bug identified by AMD while using a 133Mhz PCI-X card on the
|
||||
PCI-X bus.
|
||||
|
||||
AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
|
||||
can provide stale data via split completion cycles to a PCI-X card that
|
||||
is operating at 133 Mhz", causing data corruption.
|
||||
|
||||
AMD provides three workarounds for this problem; however, Chelsio
|
||||
recommends the first option for best performance with this bug:
|
||||
|
||||
For 133Mhz secondary bus operation, limit the transaction length and
|
||||
the number of outstanding transactions, via BIOS configuration
|
||||
programming of the PCI-X card, to the following:
|
||||
|
||||
Data Length (bytes): 1k
|
||||
|
||||
Total allowed outstanding transactions: 2
|
||||
|
||||
Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
|
||||
section 56, "133-MHz Mode Split Completion Data Corruption" for more
|
||||
details on this bug and the workarounds suggested by AMD.
|
||||
|
||||
It may be possible to work outside AMD's recommended PCI-X settings; try
|
||||
increasing the Data Length to 2k bytes for increased performance. If you
|
||||
have issues with these settings, please revert to the "safe" settings
|
||||
and duplicate the problem before submitting a bug or asking for support.
|
||||
|
||||
.. note::
|
||||
|
||||
The default setting on most systems is 8 outstanding transactions
|
||||
and 2k bytes data length.
|
||||
|
||||
4. On multiprocessor systems, it has been noted that an application which
|
||||
is handling 10Gb networking can switch between CPUs causing degraded
|
||||
and/or unstable performance.
|
||||
|
||||
If running on an SMP system and taking performance measurements, it
|
||||
is suggested you either run the latest netperf-2.4.0+ or use a binding
|
||||
tool such as Tim Hockin's procstate utilities (runon)
|
||||
<http://www.hockin.org/~thockin/procstate/>.
|
||||
|
||||
Binding netserver and netperf (or other applications) to particular
|
||||
CPUs will have a significant difference in performance measurements.
|
||||
You may need to experiment to find which CPU to bind the application to in
|
||||
order to achieve the best performance for your system.
|
||||
|
||||
If you are developing an application designed for 10Gb networking,
|
||||
please keep in mind you may want to look at kernel functions
|
||||
sched_setaffinity & sched_getaffinity to bind your application.
|
||||
|
||||
If you are just running user-space applications such as ftp, telnet,
|
||||
etc., you may want to try the runon tool provided by Tim Hockin's
|
||||
procstate utility. You could also try binding the interface to a
|
||||
particular CPU: runon 0 ifup eth0
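For applications written in Python, the sched_setaffinity and sched_getaffinity calls mentioned in issue 4 are exposed through the ``os`` module on Linux. The following is a minimal sketch; the helper name ``pin_to_cpu`` is hypothetical, and which CPU gives the best performance still has to be found by experiment.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU and return the new affinity set."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently run on
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})     # pid 0 means the calling process
    return os.sched_getaffinity(0)


# Pin to the lowest-numbered CPU that is currently allowed.
first = min(os.sched_getaffinity(0))
print(pin_to_cpu(first))
```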
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
|
||||
If you have problems with the software or hardware, please contact our
|
||||
customer support team via email at support@chelsio.com or check our website
|
||||
at http://www.chelsio.com
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
Chelsio Communications
|
||||
370 San Aleso Ave.
|
||||
Suite 100
|
||||
Sunnyvale, CA 94085
|
||||
http://www.chelsio.com
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License, version 2, as
|
||||
published by the Free Software Foundation.
|
||||
|
||||
You should have received a copy of the GNU General Public License along
|
||||
with this program; if not, write to the Free Software Foundation, Inc.,
|
||||
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
|
||||
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
||||
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.
|
@@ -0,0 +1,647 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================================================
|
||||
Cirrus Logic LAN CS8900/CS8920 Ethernet Adapters
|
||||
================================================
|
||||
|
||||
.. note::
|
||||
|
||||
This document was contributed by Cirrus Logic for kernel 2.2.5. This version
|
||||
has been updated for 2.3.48 by Andrew Morton.
|
||||
|
||||
Still, this is too outdated! A major cleanup is needed here.
|
||||
|
||||
Cirrus make a copy of this driver available at their website, as
|
||||
described below. In general, you should use the driver version which
|
||||
comes with your Linux distribution.
|
||||
|
||||
|
||||
Linux Network Interface Driver ver. 2.00 <kernel 2.3.48>
|
||||
|
||||
|
||||
.. TABLE OF CONTENTS
|
||||
|
||||
1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS
|
||||
1.1 Product Overview
|
||||
1.2 Driver Description
|
||||
1.2.1 Driver Name
|
||||
1.2.2 File in the Driver Package
|
||||
1.3 System Requirements
|
||||
1.4 Licensing Information
|
||||
|
||||
2.0 ADAPTER INSTALLATION and CONFIGURATION
|
||||
2.1 CS8900-based Adapter Configuration
|
||||
2.2 CS8920-based Adapter Configuration
|
||||
|
||||
3.0 LOADING THE DRIVER AS A MODULE
|
||||
|
||||
4.0 COMPILING THE DRIVER
|
||||
4.1 Compiling the Driver as a Loadable Module
|
||||
4.2 Compiling the driver to support memory mode
|
||||
4.3 Compiling the driver to support Rx DMA
|
||||
|
||||
5.0 TESTING AND TROUBLESHOOTING
|
||||
5.1 Known Defects and Limitations
|
||||
5.2 Testing the Adapter
|
||||
5.2.1 Diagnostic Self-Test
|
||||
5.2.2 Diagnostic Network Test
|
||||
5.3 Using the Adapter's LEDs
|
||||
5.4 Resolving I/O Conflicts
|
||||
|
||||
6.0 TECHNICAL SUPPORT
|
||||
6.1 Contacting Cirrus Logic's Technical Support
|
||||
6.2 Information Required Before Contacting Technical Support
|
||||
6.3 Obtaining the Latest Driver Version
|
||||
6.4 Current maintainer
|
||||
6.5 Kernel boot parameters
|
||||
|
||||
|
||||
1. Cirrus Logic LAN CS8900/CS8920 Ethernet Adapters
|
||||
===================================================
|
||||
|
||||
|
||||
1.1. Product Overview
|
||||
---------------------
|
||||
|
||||
The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow
|
||||
IEEE 802.3 standards and support half or full-duplex operation in ISA bus
|
||||
computers on 10 Mbps Ethernet networks. The adapters are designed for operation
|
||||
in 16-bit ISA or EISA bus expansion slots and are available in
|
||||
10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5
|
||||
or fiber networks).
|
||||
|
||||
CS8920-based adapters are similar to the CS8900-based adapter with additional
|
||||
features for Plug and Play (PnP) support and Wakeup Frame recognition. As
|
||||
such, the configuration procedures differ somewhat between the two types of
|
||||
adapters. Refer to the "Adapter Configuration" section for details on
|
||||
configuring both types of adapters.
|
||||
|
||||
|
||||
1.2. Driver Description
|
||||
-----------------------
|
||||
|
||||
The CS8900/CS8920 Ethernet Adapter driver for Linux supports the Linux
|
||||
v2.3.48 or greater kernel. It can be compiled directly into the kernel
|
||||
or loaded at run-time as a device driver module.
|
||||
|
||||
1.2.1 Driver Name: cs89x0
|
||||
|
||||
1.2.2 Files in the Driver Archive:
|
||||
|
||||
The files in the driver at Cirrus' website include:
|
||||
|
||||
===================  ====================================================
readme.txt           this file
build                batch file to compile cs89x0.c.
cs89x0.c             driver C code
cs89x0.h             driver header file
cs89x0.o             pre-compiled module (for v2.2.5 kernel)
config/Config.in     sample file to include cs89x0 driver in the kernel.
config/Makefile      sample file to include cs89x0 driver in the kernel.
config/Space.c       sample file to include cs89x0 driver in the kernel.
===================  ====================================================
|
||||
|
||||
|
||||
|
||||
1.3. System Requirements
|
||||
------------------------
|
||||
|
||||
The following hardware is required:
|
||||
|
||||
* Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter
|
||||
|
||||
* IBM or IBM-compatible PC with:
|
||||
* An 80386 or higher processor
|
||||
* 16 bytes of contiguous IO space available between 210h - 370h
|
||||
* One available IRQ (5, 10, 11, or 12 for the CS8900; 3-7, 9-15 for the CS8920).
|
||||
|
||||
* Appropriate cable (and connector for AUI, 10BASE-2) for your network
|
||||
topology.
|
||||
|
||||
The following software is required:
|
||||
|
||||
* LINUX kernel version 2.3.48 or higher
|
||||
|
||||
* CS8900/20 Setup Utility (DOS-based)
|
||||
|
||||
* LINUX kernel sources for your kernel (if compiling into kernel)
|
||||
|
||||
* GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel
|
||||
or a module)
|
||||
|
||||
|
||||
|
||||
1.4. Licensing Information
|
||||
--------------------------
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it under
|
||||
the terms of the GNU General Public License as published by the Free Software
|
||||
Foundation, version 1.
|
||||
|
||||
This program is distributed in the hope that it will be useful, but WITHOUT
|
||||
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
more details.
|
||||
|
||||
For a full copy of the GNU General Public License, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
|
||||
|
||||
2. Adapter Installation and Configuration
|
||||
=========================================
|
||||
|
||||
Both the CS8900 and CS8920-based adapters can be configured using parameters
|
||||
stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup
|
||||
Utility if you want to change the adapter's configuration in EEPROM.
|
||||
|
||||
When loading the driver as a module, you can specify many of the adapter's
|
||||
configuration parameters on the command-line to override the EEPROM's settings
|
||||
or for interface configuration when an EEPROM is not used. (CS8920-based
|
||||
adapters must use an EEPROM.) See Section 3.0 LOADING THE DRIVER AS A MODULE.
|
||||
|
||||
Since the CS8900/20 Setup Utility is a DOS-based application, you must install
|
||||
and configure the adapter in a DOS-based system using the CS8900/20 Setup
|
||||
Utility before installation in the target LINUX system. (Not required if
|
||||
installing a CS8900-based adapter and the default configuration is acceptable.)
|
||||
|
||||
|
||||
2.1. CS8900-based Adapter Configuration
|
||||
---------------------------------------
|
||||
|
||||
CS8900-based adapters shipped from Cirrus Logic have been configured
|
||||
with the following "default" settings::
|
||||
|
||||
Operation Mode: Memory Mode
|
||||
IRQ: 10
|
||||
Base I/O Address: 300
|
||||
Memory Base Address: D0000
|
||||
Optimization: DOS Client
|
||||
Transmission Mode: Half-duplex
|
||||
BootProm: None
|
||||
Media Type: Autodetect (3-media cards) or
|
||||
10BASE-T (10BASE-T only adapter)
|
||||
|
||||
You should only change the default configuration settings if a conflict with
|
||||
another adapter exists. To change the adapter's configuration, run the
|
||||
CS8900/20 Setup Utility.
|
||||
|
||||
|
||||
2.2. CS8920-based Adapter Configuration
|
||||
---------------------------------------
|
||||
|
||||
CS8920-based adapters are shipped from Cirrus Logic configured as Plug
|
||||
and Play (PnP) enabled. However, since the cs89x0 driver does NOT
|
||||
support PnP, you must install the CS8920 adapter in a DOS-based PC and
|
||||
run the CS8900/20 Setup Utility to disable PnP and configure the
|
||||
adapter before installation in the target Linux system. Failure to do
|
||||
this will leave the adapter inactive and the driver will be unable to
|
||||
communicate with the adapter.
|
||||
|
||||
::
|
||||
|
||||
****************************************************************
|
||||
* CS8920-BASED ADAPTERS: *
|
||||
* *
|
||||
* CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. *
|
||||
* THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST *
|
||||
* RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND *
|
||||
* TO ACTIVATE THE ADAPTER. *
|
||||
****************************************************************
|
||||
|
||||
|
||||
|
||||
|
||||
3. Loading the Driver as a Module
|
||||
=================================
|
||||
|
||||
If the driver is compiled as a loadable module, you can load the driver module
|
||||
with the 'modprobe' command. Many of the adapter's configuration parameters can
|
||||
be specified as command-line arguments to the load command. This facility
|
||||
provides a means to override the EEPROM's settings or for interface
|
||||
configuration when an EEPROM is not used.
|
||||
|
||||
Example::
|
||||
|
||||
insmod cs89x0.o io=0x200 irq=0xA media=aui
|
||||
|
||||
This example loads the module and configures the adapter to use an IO port base
|
||||
address of 200h, interrupt 10, and use the AUI media connection. The following
|
||||
configuration options are available on the command line::
|
||||
|
||||
io=### - specify IO address (200h-360h)
|
||||
irq=## - specify interrupt level
|
||||
use_dma=1 - Enable DMA
|
||||
dma=# - specify dma channel (Driver is compiled to support
|
||||
Rx DMA only)
|
||||
dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16.
|
||||
media=rj45 - specify media type
|
||||
or media=bnc
|
||||
or media=aui
|
||||
or media=auto
|
||||
duplex=full - specify forced half/full/autonegotiate duplex
|
||||
or duplex=half
|
||||
or duplex=auto
|
||||
debug=# - debug level (only available if the driver was compiled
|
||||
for debugging)
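The option list above can be turned into a small checker that assembles the argument string for the load command. This is a sketch, not part of the driver: the function name ``cs89x0_module_args`` is hypothetical, the checks only reflect the ranges and keywords stated in the list above, and rendering ``irq`` in decimal is a choice of this sketch.

```python
MEDIA_TYPES = ("rj45", "bnc", "aui", "auto")
DUPLEX_MODES = ("full", "half", "auto")


def cs89x0_module_args(io, irq=None, media=None, duplex=None,
                       use_dma=False, dma=None, dmasize=None):
    """Validate the options listed above and return them as one argument string."""
    if not 0x200 <= io <= 0x360:
        raise ValueError("io must lie in the documented range 200h-360h")
    args = [f"io={io:#x}"]
    if irq is not None:
        args.append(f"irq={irq}")
    if use_dma:
        args.append("use_dma=1")
        if dma is not None:
            args.append(f"dma={dma}")
        if dmasize is not None:
            if dmasize not in (16, 64):
                raise ValueError("dmasize must be 16 or 64")
            args.append(f"dmasize={dmasize}")
    if media is not None:
        if media not in MEDIA_TYPES:
            raise ValueError(f"media must be one of {MEDIA_TYPES}")
        args.append(f"media={media}")
    if duplex is not None:
        if duplex not in DUPLEX_MODES:
            raise ValueError(f"duplex must be one of {DUPLEX_MODES}")
        args.append(f"duplex={duplex}")
    return " ".join(args)


# Same settings as the example above: IO base 200h, interrupt 10, AUI media.
print("insmod cs89x0.o " + cs89x0_module_args(0x200, irq=10, media="aui"))
```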
|
||||
|
||||
**Notes:**
|
||||
|
||||
a) If an EEPROM is present, any specified command-line parameter
|
||||
will override the corresponding configuration value stored in
|
||||
EEPROM.
|
||||
|
||||
b) The "io" parameter must be specified on the command-line.
|
||||
|
||||
c) The driver's hardware probe routine is designed to avoid
|
||||
writing to I/O space until it knows that there is a cs89x0
|
||||
card at the written addresses. This could cause problems
|
||||
with device probing. To avoid this behaviour, add one
|
||||
to the ``io=`` module parameter. This doesn't actually change
|
||||
the I/O address, but it is a flag to tell the driver
|
||||
to partially initialise the hardware before trying to
|
||||
identify the card. This could be dangerous if you are
|
||||
not sure that there is a cs89x0 card at the provided address.
|
||||
|
||||
For example, to scan for an adapter located at IO base 0x300,
|
||||
specify an IO address of 0x301.
|
||||
|
||||
d) The "duplex=auto" parameter is only supported for the CS8920.
|
||||
|
||||
e) The minimum command-line configuration required if an EEPROM is
|
||||
not present is:
|
||||
|
||||
io
|
||||
irq
|
||||
media type (no autodetect)
|
||||
|
||||
f) The following additional parameters are CS89XX defaults (values
|
||||
used with no EEPROM or command-line argument).
|
||||
|
||||
* DMA Burst = enabled
|
||||
* IOCHRDY Enabled = enabled
|
||||
* UseSA = enabled
|
||||
* CS8900 defaults to half-duplex if not specified on command-line
|
||||
* CS8920 defaults to autoneg if not specified on command-line
|
||||
* Use reset defaults for other config parameters
|
||||
* dma_mode = 0
|
||||
|
||||
g) You can use ifconfig to set the adapter's Ethernet address.
|
||||
|
||||
h) Many Linux distributions use the 'modprobe' command to load
|
||||
modules. This program uses the '/etc/conf.modules' file to
|
||||
determine configuration information which is passed to a driver
|
||||
module when it is loaded. All the configuration options which are
|
||||
described above may be placed within /etc/conf.modules.
|
||||
|
||||
For example::
|
||||
|
||||
> cat /etc/conf.modules
|
||||
...
|
||||
alias eth0 cs89x0
|
||||
options cs89x0 io=0x0200 dma=5 use_dma=1
|
||||
...
|
||||
|
||||
In this example we are telling the module system that the
|
||||
ethernet driver for this machine should use the cs89x0 driver. We
|
||||
are asking 'modprobe' to pass the 'io', 'dma' and 'use_dma'
|
||||
arguments to the driver when it is loaded.
|
||||
|
||||
i) Cirrus recommend that the cs89x0 use the ISA DMA channels 5, 6 or
|
||||
7. You will probably find that other DMA channels will not work.
|
||||
|
||||
j) The cs89x0 supports DMA for receiving only. DMA mode is
|
||||
significantly more efficient. Flooding a 400 MHz Celeron machine
|
||||
with large ping packets consumes 82% of its CPU capacity in non-DMA
|
||||
mode. With DMA this is reduced to 45%.
|
||||
|
||||
k) If your Linux kernel was compiled with inbuilt plug-and-play
|
||||
support you will be able to find information about the cs89x0 card
|
||||
with the command::
|
||||
|
||||
cat /proc/isapnp
|
||||
|
||||
l) If during DMA operation you find erratic behavior or network data
|
||||
corruption you should use your PC's BIOS to slow the EISA bus clock.
|
||||
|
||||
m) If the cs89x0 driver is compiled directly into the kernel
|
||||
(non-modular) then its I/O address is automatically determined by
|
||||
ISA bus probing. The IRQ number, media options, etc. are determined
|
||||
from the card's EEPROM.
|
||||
|
||||
n) If the cs89x0 driver is compiled directly into the kernel, DMA
|
||||
mode may be selected by providing the kernel with a boot option
|
||||
'cs89x0_dma=N' where 'N' is the desired DMA channel number (5, 6 or 7).
|
||||
|
||||
Kernel boot options may be provided on the LILO command line::
|
||||
|
||||
LILO boot: linux cs89x0_dma=5
|
||||
|
||||
or they may be placed in /etc/lilo.conf::
|
||||
|
||||
image=/boot/bzImage-2.3.48
|
||||
append="cs89x0_dma=5"
|
||||
label=linux
|
||||
root=/dev/hda5
|
||||
read-only
|
||||
|
||||
The DMA Rx buffer size is hardwired to 16 kbytes in this mode.
|
||||
(64k mode is not available).
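Note c) above describes adding one to the ``io=`` value as a flag that leaves the real I/O address unchanged. One way to picture this, assuming the flag is carried in the lowest bit of the value as the note implies, is the sketch below. The function name ``split_io_parameter`` is hypothetical and this is not the driver's own code.

```python
def split_io_parameter(io):
    """Return (base_address, early_init_flag) for an io= module parameter value."""
    # Assumption: an odd value means "base address plus the flag from note c)".
    return io & ~1, bool(io & 1)


base, flag = split_io_parameter(0x301)
print(hex(base), flag)  # the adapter is still expected at IO base 0x300
```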
|
||||
|
||||
|
||||
4. Compiling the Driver
|
||||
=======================
|
||||
|
||||
The cs89x0 driver can be compiled directly into the kernel or compiled into
|
||||
a loadable device driver module.
|
||||
|
||||
Use the standard kernel configuration and build procedure to configure the driver and compile the kernel.
|
||||
|
||||
|
||||
4.1. Compiling the Driver to Support Rx DMA
|
||||
-------------------------------------------
|
||||
|
||||
The compile-time option for DMA was removed in the 2.3 kernel
|
||||
series. DMA support is now unconditionally part of the driver. It is
|
||||
enabled by the 'use_dma=1' module option.
|
||||
|
||||
|
||||
5. Testing and Troubleshooting
|
||||
==============================
|
||||
|
||||
5.1. Known Defects and Limitations
|
||||
----------------------------------
|
||||
|
||||
Refer to the RELEASE.TXT file distributed as part of this archive for a list of
|
||||
known defects, driver limitations, and workarounds.
|
||||
|
||||
|
||||
5.2. Testing the Adapter
|
||||
------------------------
|
||||
|
||||
Once the adapter has been installed and configured, the diagnostic option of
|
||||
the CS8900/20 Setup Utility can be used to test the functionality of the
|
||||
adapter and its network connection. Use the diagnostics 'Self Test' option to
|
||||
test the functionality of the adapter with the hardware configuration you have
|
||||
assigned. You can use the diagnostics 'Network Test' to test the ability of the
|
||||
adapter to communicate across the Ethernet with another PC equipped with a
|
||||
CS8900/20-based adapter card (it must also be running the CS8900/20 Setup
|
||||
Utility).
|
||||
|
||||
.. note::
|
||||
|
||||
The Setup Utility's diagnostics are designed to run in a
|
||||
DOS-only operating system environment. DO NOT run the diagnostics
|
||||
from a DOS or command prompt session under Windows 95, Windows NT,
|
||||
OS/2, or other operating system.
|
||||
|
||||
To run the diagnostics tests on the CS8900/20 adapter:
|
||||
|
||||
1. Boot DOS on the PC and start the CS8900/20 Setup Utility.
|
||||
|
||||
2. The adapter's current configuration is displayed. Hit the ENTER key to
|
||||
get to the main menu.
|
||||
|
||||
3. Select 'Diagnostics' (ALT-G) from the main menu.
|
||||
* Select 'Self-Test' to test the adapter's basic functionality.
|
||||
* Select 'Network Test' to test the network connection and cabling.
|
||||
|
||||
|
||||
5.2.1. Diagnostic Self-test
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The diagnostic self-test checks the adapter's basic functionality as well as
|
||||
its ability to communicate across the ISA bus based on the system resources
|
||||
assigned during hardware configuration. The following tests are performed:
|
||||
|
||||
* IO Register Read/Write Test
|
||||
|
||||
The IO Register Read/Write test ensures that the CS8900/20 can be
|
||||
accessed in IO mode, and that the IO base address is correct.
|
||||
|
||||
* Shared Memory Test
|
||||
|
||||
The Shared Memory test ensures the CS8900/20 can be accessed in memory
|
||||
mode and that the range of memory addresses assigned does not conflict
|
||||
with other devices in the system.
|
||||
|
||||
* Interrupt Test
|
||||
|
||||
The Interrupt test ensures there are no conflicts with the assigned IRQ
|
||||
signal.
|
||||
|
||||
* EEPROM Test
|
||||
|
||||
The EEPROM test ensures the EEPROM can be read.
|
||||
|
||||
* Chip RAM Test
|
||||
|
||||
The Chip RAM test ensures the 4K of memory internal to the CS8900/20 is
|
||||
working properly.
|
||||
|
||||
* Internal Loop-back Test
|
||||
|
||||
The Internal Loop-back test ensures the adapter's transmitter and
|
||||
receiver are operating properly. If this test fails, make sure the
|
||||
adapter's cable is connected to the network (check for LED activity for
|
||||
example).
|
||||
|
||||
* Boot PROM Test
|
||||
|
||||
The Boot PROM test ensures the Boot PROM is present and can be read.
|
||||
Failure indicates the Boot PROM was not successfully read due to a
|
||||
hardware problem or due to a conflict in the Boot PROM address
|
||||
assignment. (Test only applies if the adapter is configured to use the
|
||||
Boot PROM option.)
|
||||
|
||||
Failure of a test item indicates a possible system resource conflict with
|
||||
another device on the ISA bus. In this case, you should use the Manual Setup
|
||||
option to reconfigure the adapter by selecting a different value for the system
|
||||
resource that failed.
|
||||
|
||||
|
||||
5.2.2. Diagnostic Network Test
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The Diagnostic Network Test verifies a working network connection by
|
||||
transferring data between two CS8900/20 adapters installed in different PCs
|
||||
on the same network. (Note: the diagnostic network test should not be run
|
||||
between two nodes across a router.)
|
||||
|
||||
This test requires that each of the two PCs have a CS8900/20-based adapter
|
||||
installed and have the CS8900/20 Setup Utility running. The first PC is
|
||||
configured as a Responder and the other PC is configured as an Initiator.
|
||||
Once the Initiator is started, it sends data frames to the Responder which
|
||||
returns the frames to the Initiator.
|
||||
|
||||
The total number of frames received and transmitted are displayed on the
|
||||
Initiator's display, along with a count of the number of frames received and
|
||||
transmitted OK or in error. The test can be terminated at any time by the user at
|
||||
either PC.
|
||||
|
||||
To set up the Diagnostic Network Test:
|
||||
|
||||
1. Select a PC with a CS8900/20-based adapter and a known working network
|
||||
connection to act as the Responder. Run the CS8900/20 Setup Utility
|
||||
and select 'Diagnostics -> Network Test -> Responder' from the main
|
||||
menu. Hit ENTER to start the Responder.
|
||||
|
||||
2. Return to the PC with the CS8900/20-based adapter you want to test and
|
||||
start the CS8900/20 Setup Utility.
|
||||
|
||||
3. From the main menu, select 'Diagnostics -> Network Test -> Initiator'.
|
||||
Hit ENTER to start the test.
|
||||
|
||||
You may stop the test on the Initiator at any time while allowing the Responder
|
||||
to continue running. In this manner, you can move to additional PCs and test
|
||||
them by starting the Initiator on another PC without having to stop/start the
|
||||
Responder.
|
||||
|
||||
|
||||
|
||||
5.3. Using the Adapter's LEDs
|
||||
-----------------------------
|
||||
|
||||
The 2 and 3-media adapters have two LEDs visible on the back end of the board
|
||||
located near the 10Base-T connector.
|
||||
|
||||
Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T
|
||||
connection. (Only applies to 10Base-T. The green LED has no significance for
|
||||
a 10Base-2 or AUI connection.)
|
||||
|
||||
TX/RX LED: The yellow LED lights briefly each time the adapter transmits or
|
||||
receives data. (The yellow LED will appear to "flicker" on a typical network.)
|
||||
|
||||
|
||||
5.4. Resolving I/O Conflicts
|
||||
----------------------------
|
||||
|
||||
An IO conflict occurs when two or more adapters use the same ISA resource (IO
|
||||
address, memory address or IRQ). You can usually detect an IO conflict in one
|
||||
of four ways after installing and/or configuring the CS8900/20-based adapter:
|
||||
|
||||
1. The system does not boot properly (or at all).
|
||||
|
||||
2. The driver cannot communicate with the adapter, reporting an "Adapter
|
||||
not found" error message.
|
||||
|
||||
3. You cannot connect to the network or the driver will not load.
|
||||
|
||||
4. If you have configured the adapter to run in memory mode but the driver
|
||||
reports it is using IO mode when loading, this is an indication of a
|
||||
memory address conflict.
|
||||
|
||||
If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a
|
||||
diagnostic self-test. Normally, the ISA resource in conflict will fail the
|
||||
self-test. If so, reconfigure the adapter selecting another choice for the
|
||||
resource in conflict. Run the diagnostics again to check for further IO
|
||||
conflicts.
|
||||
|
||||
In some cases, such as when the PC will not boot, it may be necessary to remove
|
||||
the adapter and reconfigure it by installing it in another PC to run the
|
||||
CS8900/20 Setup Utility. Once reinstalled in the target system, run the
|
||||
diagnostics self-test to ensure the new configuration is free of conflicts
|
||||
before loading the driver again.
|
||||
|
||||
When manually configuring the adapter, keep in mind the typical ISA system
|
||||
resource usage as indicated in the tables below.
|
||||
|
||||
::

    I/O Address     Device                      IRQ   Device
    -----------     --------                    ---   --------
    200-20F         Game I/O adapter            3     COM2, Bus Mouse
    230-23F         Bus Mouse                   4     COM1
    270-27F         LPT3: third parallel port   5     LPT2
    2F0-2FF         COM2: second serial port    6     Floppy Disk controller
    320-32F         Fixed disk controller       7     LPT1
                                                8     Real-time Clock
                                                9     EGA/VGA display adapter
                                                12    Mouse (PS/2)
    Memory Address  Device                      13    Math Coprocessor
    --------------  ---------------------       14    Hard Disk controller
    A000-BFFF       EGA Graphics Adapter
    A000-C7FF       VGA Graphics Adapter
    B000-BFFF       Mono Graphics Adapter
    B800-BFFF       Color Graphics Adapter
    E000-FFFF       AT BIOS
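
When choosing an I/O base address by hand, the I/O address table above can be
used for a quick overlap check. The following is a minimal sketch in C, not
code from the driver or the Setup Utility: the names ``io_range``,
``typical_isa_io`` and ``find_io_conflict()`` are invented for illustration,
and the window length passed in is an assumption that should match whatever
your adapter configuration actually uses.

```c
#include <stddef.h>
#include <string.h>

/* One row of the "I/O Address / Device" table (hypothetical helper type). */
struct io_range {
	unsigned int start;
	unsigned int end;	/* inclusive */
	const char *owner;
};

/* Typical ISA I/O usage, copied from the table in this section. */
static const struct io_range typical_isa_io[] = {
	{ 0x200, 0x20F, "Game I/O adapter" },
	{ 0x230, 0x23F, "Bus Mouse" },
	{ 0x270, 0x27F, "LPT3: third parallel port" },
	{ 0x2F0, 0x2FF, "COM2: second serial port" },
	{ 0x320, 0x32F, "Fixed disk controller" },
};

/*
 * Return the owner of the first listed range that overlaps the window
 * [base, base + len - 1], or NULL if none of the listed devices collide.
 * len must be at least 1.
 */
static const char *find_io_conflict(unsigned int base, unsigned int len)
{
	unsigned int last = base + len - 1;
	size_t i;

	for (i = 0; i < sizeof(typical_isa_io) / sizeof(typical_isa_io[0]); i++) {
		/* Two inclusive ranges overlap when each starts before the other ends. */
		if (base <= typical_isa_io[i].end && last >= typical_isa_io[i].start)
			return typical_isa_io[i].owner;
	}
	return NULL;
}
```

For example, ``find_io_conflict(0x208, 16)`` reports the game I/O adapter,
while a 16-byte window starting at 0x210 does not collide with any listed
device. The table only lists typical assignments, so a clean result here does
not replace the diagnostic self-test.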
|
||||
|
||||
|
||||
|
||||
|
||||
6. Technical Support
|
||||
====================
|
||||
|
||||
6.1. Contacting Cirrus Logic's Technical Support
|
||||
------------------------------------------------
|
||||
|
||||
Cirrus Logic's CS89XX Technical Application Support can be reached at::
|
||||
|
||||
    Telephone  :(800) 888-5016 (from inside U.S. and Canada)
               :(512) 442-7555 (from outside the U.S. and Canada)
    Fax        :(512) 912-3871
    Email      :ethernet@crystal.cirrus.com
    WWW        :http://www.cirrus.com
|
||||
|
||||
|
||||
6.2. Information Required before Contacting Technical Support
|
||||
-------------------------------------------------------------
|
||||
|
||||
Before contacting Cirrus Logic for technical support, be prepared to provide as
|
||||
much of the following information as possible.
|
||||
|
||||
1.) Adapter type (CRD8900, CDB8900, CDB8920, etc.)
|
||||
|
||||
2.) Adapter configuration
|
||||
|
||||
* IO Base, Memory Base, IO or memory mode enabled, IRQ, DMA channel
|
||||
* Plug and Play enabled/disabled (CS8920-based adapters only)
|
||||
* Configured for media auto-detect or specific media type (which type).
|
||||
|
||||
3.) PC System's Configuration
|
||||
|
||||
* Plug and Play system (yes/no)
|
||||
* BIOS (make and version)
|
||||
* System make and model
|
||||
* CPU (type and speed)
|
||||
* System RAM
|
||||
* SCSI Adapter
|
||||
|
||||
4.) Software
|
||||
|
||||
* CS89XX driver and version
|
||||
* Your network operating system and version
|
||||
* Your system's OS version
|
||||
* Version of all protocol support files
|
||||
|
||||
5.) Any Error Message displayed.
|
||||
|
||||
|
||||
|
||||
6.3 Obtaining the Latest Driver Version
|
||||
---------------------------------------
|
||||
|
||||
You can obtain the latest CS89XX drivers and support software from Cirrus Logic's
|
||||
Web site. You can also contact Cirrus Logic's Technical Support (email:
|
||||
ethernet@crystal.cirrus.com) and request that you be registered for automatic
|
||||
software-update notification.
|
||||
|
||||
Cirrus Logic maintains a web page at http://www.cirrus.com with the
|
||||
latest drivers and technical publications.
|
||||
|
||||
|
||||
6.4. Current maintainer
|
||||
-----------------------
|
||||
|
||||
In February 2000 the maintenance of this driver was assumed by Andrew
|
||||
Morton.
|
||||
|
||||
6.5 Kernel module parameters
|
||||
----------------------------
|
||||
|
||||
For use in embedded environments with no cs89x0 EEPROM, the kernel boot
|
||||
parameter ``cs89x0_media=`` has been implemented. Usage is::
|
||||
|
||||
    cs89x0_media=rj45    or
    cs89x0_media=aui     or
    cs89x0_media=bnc
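
The three accepted values map naturally onto a small lookup. The sketch below
only illustrates that mapping; it is not the driver's parsing code, and the
``cs89x0_media_kind`` enum and ``parse_cs89x0_media()`` function are
hypothetical names introduced for this example.

```c
#include <string.h>

/* Hypothetical media identifiers; the real driver uses its own constants. */
enum cs89x0_media_kind {
	MEDIA_UNKNOWN = 0,
	MEDIA_RJ45,
	MEDIA_AUI,
	MEDIA_BNC
};

/* Map the value given to cs89x0_media= onto a media identifier. */
static enum cs89x0_media_kind parse_cs89x0_media(const char *value)
{
	if (value == NULL)
		return MEDIA_UNKNOWN;
	if (strcmp(value, "rj45") == 0)
		return MEDIA_RJ45;
	if (strcmp(value, "aui") == 0)
		return MEDIA_AUI;
	if (strcmp(value, "bnc") == 0)
		return MEDIA_BNC;
	/* Anything else is not one of the three documented values. */
	return MEDIA_UNKNOWN;
}
```

The comparison is case-sensitive in this sketch because the documented values
are all lower case; whether the driver itself accepts other spellings is not
stated here.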
|
@@ -0,0 +1,171 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================
|
||||
DM9000 Network driver
|
||||
=====================
|
||||
|
||||
Copyright 2008 Simtec Electronics,
|
||||
|
||||
Ben Dooks <ben@simtec.co.uk> <ben-linux@fluff.org>
|
||||
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
This file describes how to use the DM9000 platform-device based network driver
|
||||
that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h.
|
||||
|
||||
The driver supports three DM9000 variants, the DM9000E which is the first chip
|
||||
supported as well as the newer DM9000A and DM9000B devices. It is currently
|
||||
maintained and tested by Ben Dooks, who should be CC'd on any patches for this
|
||||
driver.
|
||||
|
||||
|
||||
Defining the platform device
|
||||
----------------------------
|
||||
|
||||
The minimum set of resources attached to the platform device is as follows:
|
||||
|
||||
1) The physical address of the address register
|
||||
2) The physical address of the data register
|
||||
3) The IRQ line the device's interrupt pin is connected to.
|
||||
|
||||
These resources should be specified in that order, as the ordering of the
|
||||
two address regions is important (the driver expects these to be address
|
||||
and then data).
|
||||
|
||||
An example from arch/arm/mach-s3c2410/mach-bast.c is::
|
||||
|
||||
    static struct resource bast_dm9k_resource[] = {
	    [0] = {	/* address register */
		    .start = S3C2410_CS5 + BAST_PA_DM9000,
		    .end   = S3C2410_CS5 + BAST_PA_DM9000 + 3,
		    .flags = IORESOURCE_MEM,
	    },
	    [1] = {	/* data register */
		    .start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40,
		    .end   = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f,
		    .flags = IORESOURCE_MEM,
	    },
	    [2] = {	/* IRQ line, including the trigger type */
		    .start = IRQ_DM9000,
		    .end   = IRQ_DM9000,
		    .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,
	    }
    };

    static struct platform_device bast_device_dm9k = {
	    .name		= "dm9000",
	    .id			= 0,
	    .num_resources	= ARRAY_SIZE(bast_dm9k_resource),
	    .resource		= bast_dm9k_resource,
    };
|
||||
|
||||
Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags,
|
||||
as this will generate a warning if it is not present. The trigger from
|
||||
the flags field will be passed to request_irq() when registering the IRQ
|
||||
handler to ensure that the IRQ is set up correctly.
|
||||
|
||||
This shows a typical platform device, without the optional configuration
|
||||
platform data supplied. The next example uses the same resources, but adds
|
||||
the optional platform data to pass extra configuration data::
|
||||
|
||||
    static struct dm9000_plat_data bast_dm9k_platdata = {
	    .flags		= DM9000_PLATF_16BITONLY,
    };

    static struct platform_device bast_device_dm9k = {
	    .name		= "dm9000",
	    .id			= 0,
	    .num_resources	= ARRAY_SIZE(bast_dm9k_resource),
	    .resource		= bast_dm9k_resource,
	    .dev		= {
		    .platform_data = &bast_dm9k_platdata,
	    }
    };
|
||||
|
||||
The platform data is defined in include/linux/dm9000.h and described below.
|
||||
|
||||
|
||||
Platform data
|
||||
-------------
|
||||
|
||||
Extra platform data for the DM9000 can describe the IO bus width to the
|
||||
device, whether or not an external PHY is attached to the device and
|
||||
the availability of an external configuration EEPROM.
|
||||
|
||||
The flags for the platform data .flags field are as follows:
|
||||
|
||||
DM9000_PLATF_8BITONLY
|
||||
|
||||
The IO should be done with 8bit operations.
|
||||
|
||||
DM9000_PLATF_16BITONLY
|
||||
|
||||
The IO should be done with 16bit operations.
|
||||
|
||||
DM9000_PLATF_32BITONLY
|
||||
|
||||
The IO should be done with 32bit operations.
|
||||
|
||||
DM9000_PLATF_EXT_PHY
|
||||
|
||||
The chip is connected to an external PHY.
|
||||
|
||||
DM9000_PLATF_NO_EEPROM
|
||||
|
||||
This can be used to signify that the board does not have an
|
||||
EEPROM, or that the EEPROM should be hidden from the user.
|
||||
|
||||
DM9000_PLATF_SIMPLE_PHY
|
||||
|
||||
Switch to using the simpler PHY polling method which does not
|
||||
try to read the MII PHY state regularly. This is only available
|
||||
when using the internal PHY. See the section on link state polling
|
||||
for more information.
|
||||
|
||||
The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry
|
||||
"Force simple NSR based PHY polling" allows this flag to be
|
||||
forced on at build time.
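
Because the flags are independent bits, they are combined with a bitwise OR
in the ``.flags`` field. The sketch below shows one way to decode the bus
width back out of such a value. It is meant to compile outside the kernel
tree, so the ``#define`` values are stand-ins assumed for this example (real
code should take them from include/linux/dm9000.h), and
``dm9000_requested_width()`` is a hypothetical helper, not a driver function.

```c
/*
 * Stand-in flag values so the sketch compiles on its own.
 * In kernel code, include <linux/dm9000.h> instead of defining these.
 */
#define DM9000_PLATF_8BITONLY	0x0001
#define DM9000_PLATF_16BITONLY	0x0002
#define DM9000_PLATF_32BITONLY	0x0004
#define DM9000_PLATF_EXT_PHY	0x0008
#define DM9000_PLATF_NO_EEPROM	0x0010
#define DM9000_PLATF_SIMPLE_PHY	0x0020

/*
 * Return the bus width in bits requested by the flags, 0 if no width
 * flag is set, or -1 if more than one width flag is set.
 */
static int dm9000_requested_width(unsigned int flags)
{
	unsigned int width = flags & (DM9000_PLATF_8BITONLY |
				      DM9000_PLATF_16BITONLY |
				      DM9000_PLATF_32BITONLY);

	switch (width) {
	case 0:
		return 0;
	case DM9000_PLATF_8BITONLY:
		return 8;
	case DM9000_PLATF_16BITONLY:
		return 16;
	case DM9000_PLATF_32BITONLY:
		return 32;
	default:
		return -1;	/* contradictory request */
	}
}
```

A board with a 16-bit bus and no EEPROM would then use
``DM9000_PLATF_16BITONLY | DM9000_PLATF_NO_EEPROM``, for which the helper
returns 16.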
|
||||
|
||||
|
||||
PHY Link state polling
|
||||
----------------------
|
||||
|
||||
The driver keeps track of the link state and informs the network core
|
||||
about link (carrier) availability. This is managed by several methods
|
||||
depending on the version of the chip and on which PHY is being used.
|
||||
|
||||
For the internal PHY, the original (and currently default) method is
|
||||
to read the MII state, either when the status changes if we have the
|
||||
necessary interrupt support in the chip or every two seconds via a
|
||||
periodic timer.
|
||||
|
||||
To reduce the overhead for the internal PHY, there is now the option
|
||||
of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY
|
||||
platform data option to read the summary information without the
|
||||
expensive MII accesses. This method is faster, but does not print
|
||||
as much information.
|
||||
|
||||
When using an external PHY, the driver currently has to poll the MII
|
||||
link status as there is no method for getting an interrupt on link change.
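
The choice between these methods can be summarised as a small decision
function. This is a reading of the text in this document rather than the
driver's actual control flow: the enum names and ``pick_link_method()`` are
hypothetical, the flag values are stand-ins assumed so the sketch compiles
outside the kernel, and the assumption that only the DM9000A and DM9000B can
interrupt on a PHY state change comes from the DM9000A / DM9000B section.

```c
/* Stand-in values; kernel code should use <linux/dm9000.h>. */
#define DM9000_PLATF_EXT_PHY	0x0008
#define DM9000_PLATF_SIMPLE_PHY	0x0020

/* Hypothetical chip identifiers for this sketch. */
enum dm9000_chip { CHIP_DM9000E, CHIP_DM9000A, CHIP_DM9000B };

enum link_method {
	LINK_MII_ON_IRQ,	/* internal PHY, read MII when the chip interrupts */
	LINK_MII_TIMER,		/* internal PHY, read MII every two seconds */
	LINK_SIMPLE_SUMMARY,	/* internal PHY, summary status without MII access */
	LINK_MII_EXT_POLL	/* external PHY, MII link status must be polled */
};

static enum link_method pick_link_method(enum dm9000_chip chip,
					 unsigned int flags)
{
	/* An external PHY gives no link-change interrupt, so it is polled. */
	if (flags & DM9000_PLATF_EXT_PHY)
		return LINK_MII_EXT_POLL;

	/* The simple method applies to the internal PHY only. */
	if (flags & DM9000_PLATF_SIMPLE_PHY)
		return LINK_SIMPLE_SUMMARY;

	/* Default method: interrupt-driven where the chip supports it. */
	if (chip != CHIP_DM9000E)
		return LINK_MII_ON_IRQ;

	return LINK_MII_TIMER;
}
```

Checking the external-PHY flag first reflects the statement that the simple
polling method is only available with the internal PHY.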
|
||||
|
||||
|
||||
DM9000A / DM9000B
|
||||
-----------------
|
||||
|
||||
These chips are functionally similar to the DM9000E and are supported easily
|
||||
by the same driver. The features are:
|
||||
|
||||
1) Interrupt on internal PHY state change. This means that the periodic
|
||||
polling of the PHY status may be disabled on these devices when using
|
||||
the internal PHY.
|
||||
|
||||
2) TCP/UDP checksum offloading, which the driver does not currently support.
|
||||
|
||||
|
||||
ethtool
|
||||
-------
|
||||
|
||||
The driver supports the ethtool interface for access to the driver
|
||||
state information, the PHY state and the EEPROM.
|
189
Documentation/networking/device_drivers/ethernet/dec/de4x5.rst
Normal file
@@ -0,0 +1,189 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================================
|
||||
DEC EtherWORKS Ethernet De4x5 cards
|
||||
===================================
|
||||
|
||||
Originally, this driver was written for the Digital Equipment
|
||||
Corporation series of EtherWORKS Ethernet cards:
|
||||
|
||||
- DE425 TP/COAX EISA
|
||||
- DE434 TP PCI
|
||||
- DE435 TP/COAX/AUI PCI
|
||||
- DE450 TP/COAX/AUI PCI
|
||||
- DE500 10/100 PCI Fasternet
|
||||
|
||||
but it will now attempt to support all cards which conform to the
|
||||
Digital Semiconductor SROM Specification. The driver currently
|
||||
recognises the following chips:
|
||||
|
||||
- DC21040 (no SROM)
|
||||
- DC21041[A]
|
||||
- DC21140[A]
|
||||
- DC21142
|
||||
- DC21143
|
||||
|
||||
So far the driver is known to work with the following cards:
|
||||
|
||||
- KINGSTON
|
||||
- Linksys
|
||||
- ZNYX342
|
||||
- SMC8432
|
||||
- SMC9332 (w/new SROM)
|
||||
- ZNYX31[45]
|
||||
- ZNYX346 10/100 4 port (can act as a 10/100 bridge!)
|
||||
|
||||
The driver has been tested on a relatively busy network using the DE425,
|
||||
DE434, DE435 and DE500 cards and benchmarked with 'ttcp': it transferred
|
||||
16M of data to a DECstation 5000/200 as follows::
|
||||
|
||||
                 TCP              UDP
              TX     RX        TX     RX
    DE425   1030k   997k     1170k  1128k
    DE434   1063k   995k     1170k  1125k
    DE435   1063k   995k     1170k  1125k
    DE500   1063k   998k     1170k  1125k   in 10Mb/s mode
|
||||
|
||||
All values are typical (in kBytes/sec) from a sample of 4 for each
|
||||
measurement. Their error is +/-20k on a quiet (private) network; they also
|
||||
depend on what load the CPU has.
|
||||
|
||||
----------------------------------------------------------------------------
|
||||
|
||||
The ability to load this driver as a loadable module has been included
|
||||
and used extensively during the driver development (to save those long
|
||||
reboot sequences). Loadable module support under PCI and EISA has been
|
||||
achieved by letting the driver autoprobe as if it were compiled into the
|
||||
kernel. Do make sure you're not sharing interrupts with anything that
|
||||
cannot accommodate interrupt sharing!
|
||||
|
||||
To utilise this ability, you have to do 8 things:
|
||||
|
||||
0) have a copy of the loadable modules code installed on your system.
|
||||
1) copy de4x5.c from the /linux/drivers/net directory to your favourite
|
||||
temporary directory.
|
||||
2) for fixed autoprobes (not recommended), edit the source code near
|
||||
line 5594 to reflect the I/O address you're using, or assign these when
|
||||
loading by::
|
||||
|
||||
    insmod de4x5 io=0xghh           where g = bus number
                                          hh = device number
|
||||
|
||||
.. note::
|
||||
|
||||
autoprobing for modules is now supported by default. You may just
|
||||
use::
|
||||
|
||||
insmod de4x5
|
||||
|
||||
to load all available boards. For a specific board, still use
|
||||
the 'io=?' above.
|
||||
3) compile de4x5.c, but include -DMODULE in the command line to ensure
|
||||
that the correct bits are compiled (see end of source code).
|
||||
4) if you want to add a new card, go to step 5. Otherwise, recompile a
|
||||
kernel with the de4x5 configuration turned off and reboot.
|
||||
5) insmod de4x5 [io=0xghh]
|
||||
6) run the net startup bits for your new eth?? interface(s) manually
|
||||
(usually /etc/rc.inet[12] at boot time).
|
||||
7) enjoy!
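
The ``io=0xghh`` value used in steps 2 and 5 packs the bus number into the
high hex digit and the device number into the low two hex digits. A minimal
sketch of that packing follows; ``de4x5_io_value()`` is a hypothetical helper
written for this example and is not part of the driver.

```c
/*
 * Build the io= value 0xghh from a bus number (one hex digit, g) and a
 * device number (two hex digits, hh). Returns -1 if either value does
 * not fit in its field.
 */
static int de4x5_io_value(unsigned int bus, unsigned int device)
{
	if (bus > 0xf || device > 0xff)
		return -1;
	return (int)((bus << 8) | device);
}
```

With bus 1 and device 0x0c the helper yields 0x10c, which corresponds to
``insmod de4x5 io=0x10c``.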
|
||||
|
||||
To unload a module, turn off the associated interface(s)
|
||||
'ifconfig eth?? down' then 'rmmod de4x5'.
|
||||
|
||||
Automedia detection is included so that in principle you can disconnect
|
||||
from, e.g. TP, reconnect to BNC and things will still work (after a
|
||||
pause while the driver figures out where its media went). My tests
|
||||
using ping showed that it appears to work....
|
||||
|
||||
By default, the driver will now autodetect any DECchip based card.
|
||||
Should you have a need to restrict the driver to DIGITAL only cards, you
|
||||
can compile with a DEC_ONLY define, or if loading as a module, use the
|
||||
'dec_only=1' parameter.
|
||||
|
||||
I've changed the timing routines to use the kernel timer and scheduling
|
||||
functions so that the hangs and other assorted problems that occurred
|
||||
while autosensing the media should be gone. A bonus for the DC21040
|
||||
auto media sense algorithm is that it can now use one that is more in
|
||||
line with the rest (the DC21040 chip doesn't have a hardware timer).
|
||||
The downside is the 1 'jiffies' (10ms) resolution.
|
||||
|
||||
IEEE 802.3u MII interface code has been added in anticipation that some
|
||||
products may use it in the future.
|
||||
|
||||
The SMC9332 card has a non-compliant SROM which needs fixing - I have
|
||||
patched this driver to detect it because the SROM format used complies
|
||||
to a previous DEC-STD format.
|
||||
|
||||
I have removed the buffer copies needed for receive on Intels. I cannot
|
||||
remove them for Alphas since the Tulip hardware only does longword
|
||||
aligned DMA transfers and the Alphas get alignment traps with non
|
||||
longword aligned data copies (which makes them really slow). No comment.
|
||||
|
||||
I have added SROM decoding routines to make this driver work with any
|
||||
card that supports the Digital Semiconductor SROM spec. This will help
|
||||
all cards running the dc2114x series chips in particular. Cards using
|
||||
the dc2104x chips should run correctly with the basic driver. I'm in
|
||||
debt to <mjacob@feral.com> for the testing and feedback that helped get
|
||||
this feature working. So far we have tested KINGSTON, SMC8432, SMC9332
|
||||
(with the latest SROM complying with the SROM spec V3: their first was
|
||||
broken), ZNYX342 and LinkSys. ZNYX314 (dual 21041 MAC) and ZNYX 315
|
||||
(quad 21041 MAC) cards also appear to work despite their incorrectly
|
||||
wired IRQs.
|
||||
|
||||
I have added a temporary fix for interrupt problems when some SCSI cards
|
||||
share the same interrupt as the DECchip based cards. The problem occurs
|
||||
because the SCSI card wants to grab the interrupt as a fast interrupt
|
||||
(runs the service routine with interrupts turned off) vs. this card
|
||||
which really needs to run the service routine with interrupts turned on.
|
||||
This driver will now add the interrupt service routine as a fast
|
||||
interrupt if it is bounced from the slow interrupt. THIS IS NOT A
|
||||
RECOMMENDED WAY TO RUN THE DRIVER and has been done for a limited time
|
||||
until people sort out their compatibility issues and the kernel
|
||||
interrupt service code is fixed. YOU SHOULD SEPARATE OUT THE FAST
|
||||
INTERRUPT CARDS FROM THE SLOW INTERRUPT CARDS to ensure that they do not
|
||||
run on the same interrupt. PCMCIA/CardBus is another can of worms...
|
||||
|
||||
Finally, I think I have really fixed the module loading problem with
|
||||
more than one DECchip based card. As a side effect, I don't mess with
|
||||
the device structure any more which means that if more than 1 card in
|
||||
2.0.x is installed (4 in 2.1.x), the user will have to edit
|
||||
linux/drivers/net/Space.c to make room for them. Hence, module loading
|
||||
is the preferred way to use this driver, since it doesn't have this
|
||||
limitation.
|
||||
|
||||
Where SROM media detection is used and full duplex is specified in the
|
||||
SROM, the feature is ignored unless lp->params.fdx is set at compile
|
||||
time OR during a module load (insmod de4x5 args='eth??:fdx' [see
|
||||
below]). This is because there is no way to automatically detect full
|
||||
duplex links except through autonegotiation. When I include the
|
||||
autonegotiation feature in the SROM autoconf code, this detection will
|
||||
occur automatically for that case.
|
||||
|
||||
Command line arguments are now allowed, similar to passing arguments
|
||||
through LILO. This will allow a per adapter board set up of full duplex
|
||||
and media. The only lexical constraints are: the board name (dev->name)
|
||||
appears in the list before its parameters. The list of parameters ends
|
||||
either at the end of the parameter list or with another board name. The
|
||||
following parameters are allowed:
|
||||
|
||||
=========  ===============================================
fdx        for full duplex
autosense  to set the media/speed; with the following
           sub-parameters:
           TP, TP_NW, BNC, AUI, BNC_AUI, 100Mb, 10Mb, AUTO
=========  ===============================================
|
||||
|
||||
Case sensitivity is important for the sub-parameters. They *must* be
|
||||
upper case. Examples::
|
||||
|
||||
    insmod de4x5 args='eth1:fdx autosense=BNC eth0:autosense=100Mb'
|
||||
|
||||
For a compiled in driver, in linux/drivers/net/CONFIG, place e.g.::
|
||||
|
||||
DE4X5_OPTS = -DDE4X5_PARM='"eth0:fdx autosense=AUI eth2:autosense=TP"'
|
||||
|
||||
Yes, I know full duplex isn't permissible on BNC or AUI; they're just
|
||||
examples. By default, full duplex is turned off and AUTO is the default
|
||||
autosense setting. In reality, I expect only the full duplex option to
|
||||
be used. Note the use of single quotes in the two examples above and the
|
||||
lack of commas to separate items.
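
The lexical rules above (board name first, parameters up to the next board
name or the end of the string) can be illustrated with a small parser. This
is a simplified sketch and not the driver's own code: ``struct de4x5_opts``
and ``de4x5_parse_args()`` are hypothetical names, and the sketch assumes
that every board name starts with ``eth`` and that no board name is a prefix
of another one in the same string.

```c
#include <string.h>

/* Options collected for one board (hypothetical type for this sketch). */
struct de4x5_opts {
	int fdx;		/* 1 if full duplex was requested */
	char autosense[16];	/* sub-parameter, "AUTO" by default */
};

/*
 * Extract the options for board `name` from `args`.
 * Returns 1 if the board is named in the string, 0 otherwise; in both
 * cases `out` holds the documented defaults for anything not given.
 */
static int de4x5_parse_args(const char *args, const char *name,
			    struct de4x5_opts *out)
{
	const char *p, *end, *q;
	size_t n;

	out->fdx = 0;				/* default: full duplex off */
	strcpy(out->autosense, "AUTO");		/* default autosense setting */

	p = strstr(args, name);
	if (p == NULL)
		return 0;
	p += strlen(name);

	/* This board's parameters end at the next board name, if any. */
	end = strstr(p, "eth");
	if (end == NULL)
		end = p + strlen(p);

	q = strstr(p, "fdx");
	if (q != NULL && q < end)
		out->fdx = 1;

	q = strstr(p, "autosense=");
	if (q != NULL && q < end) {
		q += strlen("autosense=");
		n = strcspn(q, " ");		/* value runs up to the next space */
		if (n >= sizeof(out->autosense))
			n = sizeof(out->autosense) - 1;
		memcpy(out->autosense, q, n);
		out->autosense[n] = '\0';
	}
	return 1;
}
```

Applied to ``eth1:fdx autosense=BNC eth0:autosense=100Mb``, the sketch reports
full duplex and ``BNC`` for eth1, and ``100Mb`` without full duplex for eth0.
The sub-parameter is copied as written, so the upper-case requirement
mentioned above still has to be met by the caller.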
|
@@ -0,0 +1,71 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============================================================
|
||||
Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux
|
||||
==============================================================
|
||||
|
||||
Note: This driver doesn't have a maintainer.
|
||||
|
||||
|
||||
This program is free software; you can redistribute it and/or
|
||||
modify it under the terms of the GNU General Public License
|
||||
as published by the Free Software Foundation; either version 2
|
||||
of the License, or (at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
|
||||
This driver provides kernel support for Davicom DM9102(A)/DM9132/DM9801 ethernet cards (CNET
10/100 ethernet cards use a Davicom chipset too, so this driver supports CNET cards as well). If you
didn't compile this driver as a module, it will automatically load itself on boot and print a
|
||||
line similar to::
|
||||
|
||||
dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17)
|
||||
|
||||
If you compiled this driver as a module, you have to load it on boot. You can load it with the command::
|
||||
|
||||
insmod dmfe
|
||||
|
||||
This way it will autodetect the device mode. This is the suggested way to load the module. Alternatively,
you can pass a mode= setting to the module while loading, like::
|
||||
|
||||
    insmod dmfe mode=0   # Force 10M Half Duplex
    insmod dmfe mode=1   # Force 100M Half Duplex
    insmod dmfe mode=4   # Force 10M Full Duplex
    insmod dmfe mode=5   # Force 100M Full Duplex
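
The four documented ``mode=`` values can be read as a lookup from number to
speed and duplex. The sketch below encodes only the values listed above;
``struct dmfe_mode`` and ``dmfe_decode_mode()`` are hypothetical names for
this example, and the sketch makes no claim about how the driver treats
other values.

```c
/* Speed and duplex selected by a mode= value (hypothetical type). */
struct dmfe_mode {
	int speed_mbps;		/* 10 or 100 */
	int full_duplex;	/* 0 = half duplex, 1 = full duplex */
};

/*
 * Decode one of the documented mode= values.
 * Returns 0 on success, -1 if the value is not one of those listed.
 */
static int dmfe_decode_mode(int mode, struct dmfe_mode *out)
{
	switch (mode) {
	case 0:
		out->speed_mbps = 10;
		out->full_duplex = 0;
		return 0;
	case 1:
		out->speed_mbps = 100;
		out->full_duplex = 0;
		return 0;
	case 4:
		out->speed_mbps = 10;
		out->full_duplex = 1;
		return 0;
	case 5:
		out->speed_mbps = 100;
		out->full_duplex = 1;
		return 0;
	default:
		return -1;	/* not documented in this file */
	}
}
```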
|
||||
|
||||
Next you should configure your network interface with a command similar to::
|
||||
|
||||
    ifconfig eth0 172.22.3.18
                  ^^^^^^^^^^^
                  Your IP Address
|
||||
|
||||
Then you may have to modify the default routing table with the command::
|
||||
|
||||
route add default eth0
|
||||
|
||||
|
||||
Now your ethernet card should be up and running.
|
||||
|
||||
|
||||
TODO:
|
||||
|
||||
- Implement pci_driver::suspend() and pci_driver::resume() power management methods.
|
||||
- Check on 64 bit boxes.
|
||||
- Check and fix on big endian boxes.
|
||||
- Test and make sure PCI latency is now correct for all cases.
|
||||
|
||||
|
||||
Authors:
|
||||
|
||||
Sten Wang <sten_wang@davicom.com.tw> : Original Author
|
||||
|
||||
Contributors:
|
||||
|
||||
- Marcelo Tosatti <marcelo@conectiva.com.br>
|
||||
- Alan Cox <alan@lxorguk.ukuu.org.uk>
|
||||
- Jeff Garzik <jgarzik@pobox.com>
|
||||
- Vojtech Pavlik <vojtech@suse.cz>
|
314
Documentation/networking/device_drivers/ethernet/dlink/dl2k.rst
Normal file
@@ -0,0 +1,314 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================================================
|
||||
D-Link DL2000-based Gigabit Ethernet Adapter Installation
|
||||
=========================================================
|
||||
|
||||
May 23, 2002
|
||||
|
||||
.. Contents
|
||||
|
||||
- Compatibility List
|
||||
- Quick Install
|
||||
- Compiling the Driver
|
||||
- Installing the Driver
|
||||
- Option parameter
|
||||
- Configuration Script Sample
|
||||
- Troubleshooting
|
||||
|
||||
|
||||
Compatibility List
|
||||
==================
|
||||
|
||||
Adapter Support:
|
||||
|
||||
- D-Link DGE-550T Gigabit Ethernet Adapter.
|
||||
- D-Link DGE-550SX Gigabit Ethernet Adapter.
|
||||
- D-Link DL2000-based Gigabit Ethernet Adapter.
|
||||
|
||||
|
||||
The driver supports Linux kernel 2.4.7 and later. We have tested it
|
||||
on the environments below.
|
||||
|
||||
. Red Hat v6.2 (update kernel to 2.4.7)
|
||||
. Red Hat v7.0 (update kernel to 2.4.7)
|
||||
. Red Hat v7.1 (kernel 2.4.7)
|
||||
. Red Hat v7.2 (kernel 2.4.7-10)
|
||||
|
||||
|
||||
Quick Install
|
||||
=============
|
||||
Install the Linux driver with the following commands::
|
||||
|
||||
    1. make all
    2. insmod dl2k.ko
    3. ifconfig eth0 up 10.xxx.xxx.xxx netmask 255.0.0.0
                        ^^^^^^^^^^^^^^         ^^^^^^^^^
                              IP                NETMASK
|
||||
|
||||
Now eth0 should be active. You can test it with "ping" or get more information
with "ifconfig". If the test succeeds, continue with the next step.
|
||||
|
||||
4. ``cp dl2k.ko /lib/modules/`uname -r`/kernel/drivers/net``
|
||||
5. Add the following line to /etc/modprobe.d/dl2k.conf::
|
||||
|
||||
alias eth0 dl2k
|
||||
|
||||
6. Run ``depmod`` to update the module indexes.
|
||||
7. Run ``netconfig`` or ``netconf`` to create configuration script ifcfg-eth0
|
||||
located at /etc/sysconfig/network-scripts or create it manually.
|
||||
|
||||
[see - Configuration Script Sample]
|
||||
8. Driver will automatically load and configure at next boot time.
|
||||
|
||||
Compiling the Driver
|
||||
====================
|
||||
In Linux, NIC drivers are most commonly configured as loadable modules.
|
||||
The approach of building a monolithic kernel has become obsolete. The driver
|
||||
can be compiled as part of a monolithic kernel, but this is strongly discouraged.
|
||||
The remainder of this section assumes the driver is built as a loadable module.
|
||||
In the Linux environment, it is a good idea to rebuild the driver from the
|
||||
source instead of relying on a precompiled version. This approach provides
|
||||
better reliability since a precompiled driver might depend on libraries or
|
||||
kernel features that are not present in a given Linux installation.
|
||||
|
||||
The 3 files necessary to build the Linux device driver are dl2k.c, dl2k.h and
|
||||
Makefile. To compile, the Linux installation must include the gcc compiler,
|
||||
the kernel source, and the kernel headers. The Linux driver supports Linux
|
||||
kernels 2.4.7 and later. Copy the files to a directory and enter the following command
|
||||
to compile and link the driver:
|
||||
|
||||
CD-ROM drive
|
||||
------------
|
||||
|
||||
::
|
||||
|
||||
[root@XXX /] mkdir cdrom
|
||||
[root@XXX /] mount -r -t iso9660 -o conv=auto /dev/cdrom /cdrom
|
||||
[root@XXX /] cd root
|
||||
[root@XXX /root] mkdir dl2k
|
||||
[root@XXX /root] cd dl2k
|
||||
[root@XXX dl2k] cp /cdrom/linux/dl2k.tgz /root/dl2k
|
||||
[root@XXX dl2k] tar xfvz dl2k.tgz
|
||||
[root@XXX dl2k] make all
|
||||
|
||||
Floppy disc drive
|
||||
-----------------
|
||||
|
||||
::
|
||||
|
||||
[root@XXX /] cd root
|
||||
[root@XXX /root] mkdir dl2k
|
||||
[root@XXX /root] cd dl2k
|
||||
[root@XXX dl2k] mcopy a:/linux/dl2k.tgz /root/dl2k
|
||||
[root@XXX dl2k] tar xfvz dl2k.tgz
|
||||
[root@XXX dl2k] make all
|
||||
|
||||
Installing the Driver
|
||||
=====================
|
||||
|
||||
Manual Installation
|
||||
-------------------
|
||||
|
||||
Once the driver has been compiled, it must be loaded, enabled, and bound
|
||||
to a protocol stack in order to establish network connectivity. To load a
|
||||
module enter the command::
|
||||
|
||||
insmod dl2k.o
|
||||
|
||||
or::
|
||||
|
||||
insmod dl2k.o <optional parameter> ; add parameter
|
||||
|
||||
---------------------------------------------------------
|
||||
|
||||
example::
|
||||
|
||||
insmod dl2k.o media=100mbps_hd
|
||||
|
||||
or::
|
||||
|
||||
insmod dl2k.o media=3
|
||||
|
||||
or::
|
||||
|
||||
insmod dl2k.o media=3,2 ; for 2 cards
|
||||
|
||||
---------------------------------------------------------
|
||||
|
||||
Please refer to the list of command line parameters supported by
the Linux device driver below.
|
||||
|
||||
The insmod command only loads the driver and gives it a name of the form
|
||||
eth0, eth1, etc. To bring the NIC into an operational state,
|
||||
it is necessary to issue the following command::
|
||||
|
||||
ifconfig eth0 up
|
||||
|
||||
Finally, to bind the driver to the active protocol (e.g., TCP/IP with
|
||||
Linux), enter the following command::
|
||||
|
||||
ifup eth0
|
||||
|
||||
Note that this is meaningful only if the system can find a configuration
|
||||
script that contains the necessary network information. A sample will be
|
||||
given in the next paragraph.
|
||||
|
||||
The commands to unload a driver are as follows::
|
||||
|
||||
ifdown eth0
|
||||
ifconfig eth0 down
|
||||
rmmod dl2k.o
|
||||
|
||||
The following are the commands to list the currently loaded modules and
|
||||
to see the current network configuration::
|
||||
|
||||
lsmod
|
||||
ifconfig
|
||||
|
||||
|
||||
Automated Installation
|
||||
----------------------
|
||||
This section describes how to install the driver such that it is
|
||||
automatically loaded and configured at boot time. The following description
|
||||
is based on a Red Hat 6.0/7.0 distribution, but it can easily be ported to
|
||||
other distributions as well.
|
||||
|
||||
Red Hat v6.x/v7.x
|
||||
-----------------
|
||||
1. Copy dl2k.o to the network modules directory, typically
|
||||
/lib/modules/2.x.x-xx/net or /lib/modules/2.x.x/kernel/drivers/net.
|
||||
2. Locate the boot module configuration file, most commonly in the
|
||||
/etc/modprobe.d/ directory. Add the following lines::
|
||||
|
||||
alias ethx dl2k
|
||||
options dl2k <optional parameters>
|
||||
|
||||
where ethx will be eth0 if the NIC is the only ethernet adapter, eth1 if
one other ethernet adapter is installed, etc. Refer to the table in the
Parameter Description section for the list of optional parameters.
|
||||
3. Locate the network configuration scripts, normally the
|
||||
/etc/sysconfig/network-scripts directory, and create a configuration
|
||||
script named ifcfg-ethx that contains network information.
|
||||
4. Note that for most Linux distributions, Red Hat included, a configuration
|
||||
utility with a graphical user interface is provided to perform steps 2
|
||||
and 3 above.
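
As a rough illustration of step 2, the following Python sketch assembles the
two configuration lines from a set of driver parameters. The helper name
``modprobe_lines`` is hypothetical and not part of the driver package; only the
``alias`` and ``options`` line formats and the parameter names are taken from
this document.

```python
def modprobe_lines(ifname, params):
    """Return the alias/options lines for the dl2k module.

    ifname: interface name such as "eth0" (the "ethx" of step 2).
    params: dict of driver parameters, e.g. {"media": "100mbps_fd"}.
    """
    lines = [f"alias {ifname} dl2k"]
    if params:
        # Parameters are written as space-separated name=value pairs.
        opts = " ".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"options dl2k {opts}")
    return lines


if __name__ == "__main__":
    for line in modprobe_lines("eth0", {"media": "100mbps_fd", "jumbo": 1}):
        print(line)
```

The sketch only formats text; it does not check that a parameter name or value
is one the driver accepts.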
|
||||
|
||||
|
||||
Parameter Description
|
||||
=====================
|
||||
You can install this driver without any additional parameters. However, to use
its extended functions you need to set extra parameters. Below is a list of
the command line parameters supported by the Linux device driver.
|
||||
|
||||
|
||||
=============================== ==============================================
mtu=packet_size                 Specifies the maximum packet size. Default
                                is 1500.

media=media_type                Specifies the media type the NIC operates at.

                                =========== =========================
                                autosense   Autosensing active media.
                                10mbps_hd   10Mbps half duplex.
                                10mbps_fd   10Mbps full duplex.
                                100mbps_hd  100Mbps half duplex.
                                100mbps_fd  100Mbps full duplex.
                                1000mbps_fd 1000Mbps full duplex.
                                1000mbps_hd 1000Mbps half duplex.
                                0           Autosensing active media.
                                1           10Mbps half duplex.
                                2           10Mbps full duplex.
                                3           100Mbps half duplex.
                                4           100Mbps full duplex.
                                5           1000Mbps half duplex.
                                6           1000Mbps full duplex.
                                =========== =========================

                                By default, the NIC operates at autosense.
                                1000mbps_fd and 1000mbps_hd types are only
                                available for fiber adapters.

vlan=n                          Specifies the VLAN ID. If vlan=0, the
                                Virtual Local Area Network (VLAN) function is
                                disabled.

jumbo=[0|1]                     Specifies the jumbo frame support. If jumbo=1,
                                the NIC accepts jumbo frames. By default, this
                                function is disabled.
                                Jumbo frames usually improve performance
                                on gigabit links.
                                This feature requires a jumbo frame
                                compatible remote.

rx_coalesce=m                   Number of rx frames handled per interrupt.
rx_timeout=n                    Rx DMA wait time for an interrupt.
                                If rx_coalesce > 0, the hardware asserts only
                                one interrupt for every m frames. The hardware
                                won't assert an rx interrupt until m frames
                                have been received or the timeout of
                                n * 640 nanoseconds has been reached.
                                Setting proper rx_coalesce and rx_timeout
                                values can reduce congestion collapse and
                                overload, which have been a bottleneck for
                                high speed networks.

                                For example, rx_coalesce=10 rx_timeout=800
                                means the hardware asserts only 1 interrupt
                                for 10 frames received or a timeout of 512 us.

tx_coalesce=n                   Number of tx frames handled per interrupt.
                                Setting n > 1 can reduce interrupt
                                congestion, which usually lowers the
                                performance of high speed network cards.
                                Default is 16.

tx_flow=[1|0]                   Specifies the Tx flow control. If tx_flow=0,
                                Tx flow control is disabled; otherwise the
                                driver autodetects it.
rx_flow=[1|0]                   Specifies the Rx flow control. If rx_flow=0,
                                Rx flow control is disabled; otherwise the
                                driver autodetects it.
=============================== ==============================================
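
The rx_timeout arithmetic in the table can be checked with a short sketch. It
assumes only the 640 nanosecond step stated in the rx_timeout description; the
function names are hypothetical and chosen for illustration.

```python
RX_TIMEOUT_UNIT_NS = 640  # one rx_timeout step, per the rx_timeout description


def rx_timeout_us(rx_timeout):
    """Convert an rx_timeout setting to microseconds."""
    return rx_timeout * RX_TIMEOUT_UNIT_NS / 1000


def rx_timeout_for_us(microseconds):
    """Return the rx_timeout setting closest to the requested wait time."""
    return round(microseconds * 1000 / RX_TIMEOUT_UNIT_NS)


if __name__ == "__main__":
    print(rx_timeout_us(800))      # the rx_timeout=800 example
    print(rx_timeout_for_us(512))  # the reverse direction
```

With rx_timeout=800 this gives the 512 us quoted in the example.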
|
||||
|
||||
|
||||
Configuration Script Sample
|
||||
===========================
|
||||
Here is a sample of a simple configuration script::
|
||||
|
||||
DEVICE=eth0
|
||||
USERCTL=no
|
||||
ONBOOT=yes
|
||||
BOOTPROTO=none
|
||||
BROADCAST=207.200.5.255
|
||||
NETWORK=207.200.5.0
|
||||
NETMASK=255.255.255.0
|
||||
IPADDR=207.200.5.2
|
||||
|
||||
|
||||
Troubleshooting
|
||||
===============
|
||||
Q1. Source files contain ^M at the end of every line.

Make sure all files are in Unix file format (LF line endings, no CR). Try
the following shell commands to convert the files::
|
||||
|
||||
cat dl2k.c | col -b > dl2k.tmp
|
||||
mv dl2k.tmp dl2k.c
|
||||
|
||||
OR::
|
||||
|
||||
cat dl2k.c | tr -d "\r" > dl2k.tmp
|
||||
mv dl2k.tmp dl2k.c
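
Where Python is available, the same conversion can be sketched as follows.
This is an illustrative alternative and not part of the driver package; the
function names are hypothetical. It mirrors the ``tr`` variant by deleting
every carriage return byte.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Remove every CR byte (same effect as the tr command)."""
    return data.replace(b"\r", b"")


def convert_file(path):
    """Rewrite the file at path in place without CR characters."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(strip_carriage_returns(data))


if __name__ == "__main__":
    print(strip_carriage_returns(b"int x;\r\nint y;\r\n"))
```

Calling ``convert_file("dl2k.c")`` would then replace the two-step
copy-and-move sequence.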
|
||||
|
||||
Q2. Could not find header files (``*.h``)?

To compile the driver, you need kernel header files. After
installing the kernel source, the header files are usually located in
/usr/src/linux/include, which is the default include directory configured
in Makefile. Some distributions ship a copy of the header files in
/usr/include/linux and /usr/include/asm; in that case you can change the
INCLUDEDIR in Makefile to /usr/include without installing the kernel source.
|
||||
|
||||
Note that RH 7.0 didn't provide correct header files in /usr/include;
building against those files produces a driver for the wrong kernel version.
|
||||
|
@@ -0,0 +1,269 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============================
|
||||
The QorIQ DPAA Ethernet Driver
|
||||
==============================
|
||||
|
||||
Authors:
|
||||
- Madalin Bucur <madalin.bucur@nxp.com>
|
||||
- Camelia Groza <camelia.groza@nxp.com>
|
||||
|
||||
.. Contents
|
||||
|
||||
- DPAA Ethernet Overview
|
||||
- DPAA Ethernet Supported SoCs
|
||||
- Configuring DPAA Ethernet in your kernel
|
||||
- DPAA Ethernet Frame Processing
|
||||
- DPAA Ethernet Features
|
||||
- DPAA IRQ Affinity and Receive Side Scaling
|
||||
- Debugging
|
||||
|
||||
DPAA Ethernet Overview
|
||||
======================
|
||||
|
||||
DPAA stands for Data Path Acceleration Architecture and it is a
|
||||
set of networking acceleration IPs that are available on several
|
||||
generations of SoCs, both on PowerPC and ARM64.
|
||||
|
||||
The Freescale DPAA architecture consists of a series of hardware blocks
|
||||
that support Ethernet connectivity. The Ethernet driver depends upon the
|
||||
following drivers in the Linux kernel:
|
||||
|
||||
- Peripheral Access Memory Unit (PAMU) (* needed only for PPC platforms)
|
||||
drivers/iommu/fsl_*
|
||||
- Frame Manager (FMan)
|
||||
drivers/net/ethernet/freescale/fman
|
||||
- Queue Manager (QMan), Buffer Manager (BMan)
|
||||
drivers/soc/fsl/qbman
|
||||
|
||||
A simplified view of the dpaa_eth interfaces mapped to FMan MACs::
|
||||
|
||||
dpaa_eth /eth0\ ... /ethN\
|
||||
driver | | | |
|
||||
------------- ---- ----------- ---- -------------
|
||||
-Ports / Tx Rx \ ... / Tx Rx \
|
||||
FMan | | | |
|
||||
-MACs | MAC0 | | MACN |
|
||||
/ dtsec0 \ ... / dtsecN \ (or tgec)
|
||||
/ \ / \(or memac)
|
||||
--------- -------------- --- -------------- ---------
|
||||
FMan, FMan Port, FMan SP, FMan MURAM drivers
|
||||
---------------------------------------------------------
|
||||
FMan HW blocks: MURAM, MACs, Ports, SP
|
||||
---------------------------------------------------------
|
||||
|
||||
The dpaa_eth relation to the QMan, BMan and FMan::
|
||||
|
||||
________________________________
|
||||
dpaa_eth / eth0 \
|
||||
driver / \
|
||||
--------- -^- -^- -^- --- ---------
|
||||
QMan driver / \ / \ / \ \ / | BMan |
|
||||
|Rx | |Rx | |Tx | |Tx | | driver |
|
||||
--------- |Dfl| |Err| |Cnf| |FQs| | |
|
||||
QMan HW |FQ | |FQ | |FQs| | | | |
|
||||
/ \ / \ / \ \ / | |
|
||||
--------- --- --- --- -v- ---------
|
||||
| FMan QMI | |
|
||||
| FMan HW FMan BMI | BMan HW |
|
||||
----------------------- --------
|
||||
|
||||
where the acronyms used above (and in the code) are:
|
||||
|
||||
=============== ===========================================================
DPAA            Data Path Acceleration Architecture
FMan            DPAA Frame Manager
QMan            DPAA Queue Manager
BMan            DPAA Buffer Manager
QMI             QMan interface in FMan
BMI             BMan interface in FMan
FMan SP         FMan Storage Profiles
MURAM           Multi-user RAM in FMan
FQ              QMan Frame Queue
Rx Dfl FQ       default reception FQ
Rx Err FQ       Rx error frames FQ
Tx Cnf FQ       Tx confirmation FQs
Tx FQs          transmission frame queues
dtsec           datapath three speed Ethernet controller (10/100/1000 Mbps)
tgec            ten gigabit Ethernet controller (10 Gbps)
memac           multirate Ethernet MAC (10/100/1000/10000)
=============== ===========================================================
|
||||
|
||||
DPAA Ethernet Supported SoCs
|
||||
============================
|
||||
|
||||
The DPAA drivers enable the Ethernet controllers present on the following SoCs:
|
||||
|
||||
PPC
|
||||
- P1023
|
||||
- P2041
|
||||
- P3041
|
||||
- P4080
|
||||
- P5020
|
||||
- P5040
|
||||
- T1023
|
||||
- T1024
|
||||
- T1040
|
||||
- T1042
|
||||
- T2080
|
||||
- T4240
|
||||
- B4860
|
||||
|
||||
ARM
|
||||
- LS1043A
|
||||
- LS1046A
|
||||
|
||||
Configuring DPAA Ethernet in your kernel
|
||||
========================================
|
||||
|
||||
To enable the DPAA Ethernet driver, the following Kconfig options are required::
|
||||
|
||||
# common for arch/arm64 and arch/powerpc platforms
|
||||
CONFIG_FSL_DPAA=y
|
||||
CONFIG_FSL_FMAN=y
|
||||
CONFIG_FSL_DPAA_ETH=y
|
||||
CONFIG_FSL_XGMAC_MDIO=y
|
||||
|
||||
# for arch/powerpc only
|
||||
CONFIG_FSL_PAMU=y
|
||||
|
||||
# common options needed for the PHYs used on the RDBs
|
||||
CONFIG_VITESSE_PHY=y
|
||||
CONFIG_REALTEK_PHY=y
|
||||
CONFIG_AQUANTIA_PHY=y
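
As a quick sanity check, the listing can be compared against an existing
kernel configuration. The sketch below is an illustration only: the function
name is hypothetical, it covers just the driver options (not the PHY options),
and it treats an option as enabled only when it is set to ``y``, as in the
listing.

```python
COMMON = [
    "CONFIG_FSL_DPAA",
    "CONFIG_FSL_FMAN",
    "CONFIG_FSL_DPAA_ETH",
    "CONFIG_FSL_XGMAC_MDIO",
]
POWERPC_ONLY = ["CONFIG_FSL_PAMU"]


def missing_options(config_text, powerpc=False):
    """Return the required options that are not set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments.
        if line.endswith("=y") and not line.startswith("#"):
            enabled.add(line[:-2])
    required = COMMON + (POWERPC_ONLY if powerpc else [])
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    sample = "CONFIG_FSL_DPAA=y\nCONFIG_FSL_FMAN=y\n"
    print(missing_options(sample, powerpc=True))
```

The function takes the text of a ``.config`` file, so it can be fed the
contents of any configuration file read from disk.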
|
||||
|
||||
DPAA Ethernet Frame Processing
|
||||
==============================
|
||||
|
||||
On Rx, buffers for the incoming frames are retrieved from the buffers found
|
||||
in the dedicated interface buffer pool. The driver initializes and seeds these
|
||||
with one page buffers.
|
||||
|
||||
On Tx, all transmitted frames are returned to the driver through Tx
|
||||
confirmation frame queues. The driver is then responsible for freeing the
|
||||
buffers. In order to do this properly, a backpointer is added to the buffer
|
||||
before transmission that points to the skb. When the buffer returns to the
|
||||
driver on a confirmation FQ, the skb can be correctly consumed.
|
||||
|
||||
DPAA Ethernet Features
|
||||
======================
|
||||
|
||||
Currently the DPAA Ethernet driver enables the basic features required for
|
||||
a Linux Ethernet driver. The support for advanced features will be added
|
||||
gradually.
|
||||
|
||||
The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx
|
||||
checksum offload feature is enabled by default and cannot be controlled through
|
||||
ethtool. Also, rx-flow-hash and rx-hashing were added. The addition of RSS
|
||||
provides a big performance boost for the forwarding scenarios, allowing
|
||||
different traffic flows received by one interface to be processed by different
|
||||
CPUs in parallel.
|
||||
|
||||
The driver has support for multiple prioritized Tx traffic classes. Priorities
|
||||
range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with
|
||||
strict priority levels. Each traffic class contains NR_CPU TX queues. By
|
||||
default, only one traffic class is enabled and the lowest priority Tx queues
|
||||
are used. Higher priority traffic classes can be enabled with the mqprio
|
||||
qdisc. For example, all four traffic classes are enabled on an interface with
|
||||
the following command. Furthermore, skb priority levels are mapped to traffic
|
||||
classes as follows:
|
||||
|
||||
* priorities 0 to 3 - traffic class 0 (low priority)
|
||||
* priorities 4 to 7 - traffic class 1 (medium-low priority)
|
||||
* priorities 8 to 11 - traffic class 2 (medium-high priority)
|
||||
* priorities 12 to 15 - traffic class 3 (high priority)
|
||||
|
||||
::
|
||||
|
||||
tc qdisc add dev <int> root handle 1: \
|
||||
mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
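
The ``map`` argument of the command above lists one traffic class per skb
priority, in priority order. The following sketch is only an illustration of
that lookup, with a hypothetical function name; it reproduces the priority
ranges listed earlier for priorities 0 to 15 and rejects anything else.

```python
# The sixteen values passed after "map" in the tc command, in order.
PRIO_TC_MAP = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]


def traffic_class(skb_priority):
    """Return the traffic class for an skb priority between 0 and 15."""
    if not 0 <= skb_priority < len(PRIO_TC_MAP):
        raise ValueError("priority outside the 16-entry map")
    return PRIO_TC_MAP[skb_priority]


if __name__ == "__main__":
    for prio in (0, 5, 10, 15):
        print(prio, traffic_class(prio))
```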
|
||||
|
||||
DPAA IRQ Affinity and Receive Side Scaling
|
||||
==========================================
|
||||
|
||||
Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation
|
||||
queues is seen by the CPU as ingress traffic on a certain portal.
|
||||
The DPAA QMan portal interrupts are affined each to a certain CPU.
|
||||
The same portal interrupt services all the QMan portal consumers.
|
||||
|
||||
By default the DPAA Ethernet driver enables RSS, making use of the
|
||||
DPAA FMan Parser and Keygen blocks to distribute traffic on 128
|
||||
hardware frame queues using a hash on IP v4/v6 source and destination
|
||||
and L4 source and destination ports, if present in the received frame.
|
||||
When RSS is disabled, all traffic received by a certain interface is
|
||||
received on the default Rx frame queue. The default DPAA Rx frame
|
||||
queues are configured to put the received traffic into a pool channel
|
||||
that allows any available CPU portal to dequeue the ingress traffic.
|
||||
The default frame queues have the HOLDACTIVE option set, ensuring that
|
||||
traffic bursts from a certain queue are serviced by the same CPU.
|
||||
This ensures a very low rate of frame reordering. A drawback of this
|
||||
is that only one CPU at a time can service the traffic received by a
|
||||
certain interface when RSS is not enabled.
|
||||
|
||||
To implement RSS, the DPAA Ethernet driver allocates an extra set of
|
||||
128 Rx frame queues that are configured to dedicated channels, in a
|
||||
round-robin manner. The mapping of the frame queues to CPUs is currently
hardcoded; there is no indirection table to move traffic for a certain
|
||||
FQ (hash result) to another CPU. The ingress traffic arriving on one
|
||||
of these frame queues will arrive at the same portal and will always
|
||||
be processed by the same CPU. This ensures intra-flow order preservation
|
||||
and workload distribution for multiple traffic flows.
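
The effect of the fixed mapping can be pictured with a small model. This is a
sketch under stated assumptions, not the driver's code: selecting the frame
queue as the hash modulo 128 and assigning queues to CPUs by queue index are
both illustrative guesses at what "round-robin" means here, and all names are
hypothetical.

```python
NUM_RSS_FQS = 128  # number of RSS Rx frame queues, from the text above


def fq_to_cpu_map(num_cpus):
    """Assign the RSS frame queues to CPUs in round-robin order."""
    return [fq % num_cpus for fq in range(NUM_RSS_FQS)]


def cpu_for_hash(hash_value, num_cpus):
    """Return the CPU that would process a flow with this hash (assumed scheme)."""
    fq = hash_value % NUM_RSS_FQS  # assumption: hash selects the FQ by modulo
    return fq_to_cpu_map(num_cpus)[fq]


if __name__ == "__main__":
    for flow_hash in (7, 130, 4099):
        print(flow_hash, cpu_for_hash(flow_hash, num_cpus=4))
```

Because the mapping never changes, a given hash always lands on the same CPU,
which is what preserves ordering within a flow.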
|
||||
|
||||
RSS can be turned off for a certain interface using ethtool, i.e.::
|
||||
|
||||
# ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
|
||||
|
||||
To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6::
|
||||
|
||||
# ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
|
||||
|
||||
There is no independent control for individual protocols; any command
|
||||
run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is
|
||||
going to control the rx-flow-hashing for all protocols on that interface.
|
||||
|
||||
Besides using the FMan Keygen computed hash for spreading traffic on the
|
||||
128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
|
||||
the NETIF_F_RXHASH feature is on (active by default). This can be turned
|
||||
on or off through ethtool, i.e.::
|
||||
|
||||
# ethtool -K fm1-mac9 rx-hashing off
|
||||
# ethtool -k fm1-mac9 | grep hash
|
||||
receive-hashing: off
|
||||
# ethtool -K fm1-mac9 rx-hashing on
|
||||
Actual changes:
|
||||
receive-hashing: on
|
||||
# ethtool -k fm1-mac9 | grep hash
|
||||
receive-hashing: on
|
||||
|
||||
Please note that Rx hashing depends upon the rx-flow-hashing being on
|
||||
for that interface - turning off rx-flow-hashing will also disable the
|
||||
rx-hashing (without ethtool reporting it as off as that depends on the
|
||||
NETIF_F_RXHASH feature flag).
|
||||
|
||||
Debugging
|
||||
=========
|
||||
|
||||
The following statistics are exported for each interface through ethtool:
|
||||
|
||||
- interrupt count per CPU
|
||||
- Rx packets count per CPU
|
||||
- Tx packets count per CPU
|
||||
- Tx confirmed packets count per CPU
|
||||
- Tx S/G frames count per CPU
|
||||
- Tx error count per CPU
|
||||
- Rx error count per CPU
|
||||
- Rx error count per type
|
||||
- congestion related statistics:
|
||||
|
||||
- congestion status
|
||||
- time spent in congestion
|
||||
- number of times the device entered congestion
|
||||
- dropped packets count per cause
|
||||
|
||||
The driver also exports the following information in sysfs:
|
||||
|
||||
- the FQ IDs for each FQ type
|
||||
/sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/fqids
|
||||
|
||||
- the ID of the buffer pool in use
|
||||
/sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/bpids
|
@@ -0,0 +1,160 @@
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
DPAA2 DPIO (Data Path I/O) Overview
|
||||
===================================
|
||||
|
||||
:Copyright: |copy| 2016-2018 NXP
|
||||
|
||||
This document provides an overview of the Freescale DPAA2 DPIO
drivers.
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
A DPAA2 DPIO (Data Path I/O) is a hardware object that provides
|
||||
interfaces to enqueue and dequeue frames to/from network interfaces
|
||||
and other accelerators. A DPIO also provides hardware buffer
|
||||
pool management for network interfaces.
|
||||
|
||||
This document provides an overview of the Linux DPIO driver, its
|
||||
subcomponents, and its APIs.
|
||||
|
||||
See
|
||||
Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
|
||||
for a general overview of DPAA2 and the general DPAA2 driver architecture
|
||||
in Linux.
|
||||
|
||||
Driver Overview
|
||||
---------------
|
||||
|
||||
The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and
|
||||
provides services that:
|
||||
|
||||
A. allow other drivers, such as the Ethernet driver, to enqueue and dequeue
|
||||
frames for their respective objects
|
||||
B. allow drivers to register callbacks for data availability notifications
|
||||
when data becomes available on a queue or channel
|
||||
C. allow drivers to manage hardware buffer pools
|
||||
|
||||
The Linux DPIO driver consists of 3 primary components--
|
||||
DPIO object driver-- fsl-mc driver that manages the DPIO object
|
||||
|
||||
DPIO service-- provides APIs to other Linux drivers for services
|
||||
|
||||
QBman portal interface-- sends portal commands, gets responses::
|
||||
|
||||
fsl-mc other
|
||||
bus drivers
|
||||
| |
|
||||
+---+----+ +------+-----+
|
||||
|DPIO obj| |DPIO service|
|
||||
| driver |---| (DPIO) |
|
||||
+--------+ +------+-----+
|
||||
|
|
||||
+------+-----+
|
||||
| QBman |
|
||||
| portal i/f |
|
||||
+------------+
|
||||
|
|
||||
hardware
|
||||
|
||||
|
||||
The diagram below shows how the DPIO driver components fit with the other
|
||||
DPAA2 Linux driver components::
|
||||
|
||||
+------------+
|
||||
| OS Network |
|
||||
| Stack |
|
||||
+------------+ +------------+
|
||||
| Allocator |. . . . . . . | Ethernet |
|
||||
|(DPMCP,DPBP)| | (DPNI) |
|
||||
+-.----------+ +---+---+----+
|
||||
. . ^ |
|
||||
. . <data avail, | |<enqueue,
|
||||
. . tx confirm> | | dequeue>
|
||||
+-------------+ . | |
|
||||
| DPRC driver | . +--------+ +------------+
|
||||
| (DPRC) | . . |DPIO obj| |DPIO service|
|
||||
+----------+--+ | driver |-| (DPIO) |
|
||||
| +--------+ +------+-----+
|
||||
|<dev add/remove> +------|-----+
|
||||
| | QBman |
|
||||
+----+--------------+ | portal i/f |
|
||||
| MC-bus driver | +------------+
|
||||
| | |
|
||||
| /soc/fsl-mc | |
|
||||
+-------------------+ |
|
||||
|
|
||||
=========================================|=========|========================
|
||||
+-+--DPIO---|-----------+
|
||||
| | |
|
||||
| QBman Portal |
|
||||
+-----------------------+
|
||||
|
||||
============================================================================
|
||||
|
||||
|
||||
DPIO Object Driver (dpio-driver.c)
|
||||
----------------------------------
|
||||
|
||||
The dpio-driver component registers with the fsl-mc bus to handle objects of
|
||||
type "dpio". The implementation of probe() handles basic initialization
|
||||
of the DPIO including mapping of the DPIO regions (the QBman SW portal)
|
||||
and initializing interrupts and registering irq handlers. The dpio-driver
|
||||
registers the probed DPIO with dpio-service.
|
||||
|
||||
DPIO service (dpio-service.c, dpaa2-io.h)
|
||||
------------------------------------------
|
||||
|
||||
The dpio service component provides queuing, notification, and buffer
|
||||
management services to DPAA2 drivers, such as the Ethernet driver. A system
|
||||
will typically allocate 1 DPIO object per CPU to allow queuing operations
|
||||
to happen simultaneously across all CPUs.
|
||||
|
||||
Notification handling
|
||||
dpaa2_io_service_register()
|
||||
|
||||
dpaa2_io_service_deregister()
|
||||
|
||||
dpaa2_io_service_rearm()
|
||||
|
||||
Queuing
|
||||
dpaa2_io_service_pull_fq()
|
||||
|
||||
dpaa2_io_service_pull_channel()
|
||||
|
||||
dpaa2_io_service_enqueue_fq()
|
||||
|
||||
dpaa2_io_service_enqueue_qd()
|
||||
|
||||
dpaa2_io_store_create()
|
||||
|
||||
dpaa2_io_store_destroy()
|
||||
|
||||
dpaa2_io_store_next()
|
||||
|
||||
Buffer pool management
|
||||
dpaa2_io_service_release()
|
||||
|
||||
dpaa2_io_service_acquire()
|
||||
|
||||
QBman portal interface (qbman-portal.c)
|
||||
---------------------------------------
|
||||
|
||||
The qbman-portal component provides APIs to do the low level hardware
|
||||
bit twiddling for operations such as:
|
||||
|
||||
- initializing QBman software portals
|
||||
- building and sending portal commands
|
||||
- portal interrupt configuration and processing
|
||||
|
||||
The qbman-portal APIs are not public to other drivers, and are
|
||||
only used by dpio-service.
|
||||
|
||||
Other (dpaa2-fd.h, dpaa2-global.h)
|
||||
----------------------------------
|
||||
|
||||
Frame descriptor and scatter-gather definitions and the APIs used to
|
||||
manipulate them are defined in dpaa2-fd.h.
|
||||
|
||||
Dequeue result struct and parsing APIs are defined in dpaa2-global.h.
|
@@ -0,0 +1,186 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
===============================
|
||||
DPAA2 Ethernet driver
|
||||
===============================
|
||||
|
||||
:Copyright: |copy| 2017-2018 NXP
|
||||
|
||||
This file provides documentation for the Freescale DPAA2 Ethernet driver.
|
||||
|
||||
Supported Platforms
|
||||
===================
|
||||
This driver provides networking support for Freescale DPAA2 SoCs, e.g.
|
||||
LS2080A, LS2088A, LS1088A.
|
||||
|
||||
|
||||
Architecture Overview
|
||||
=====================
|
||||
Unlike regular NICs, in the DPAA2 architecture there is no single hardware block
|
||||
representing network interfaces; instead, several separate hardware resources
|
||||
work together to provide the networking functionality:
|
||||
|
||||
- network interfaces
|
||||
- queues, channels
|
||||
- buffer pools
|
||||
- MAC/PHY
|
||||
|
||||
All hardware resources are allocated and configured through the Management
|
||||
Complex (MC) portals. MC abstracts most of these resources as DPAA2 objects
|
||||
and exposes ABIs through which they can be configured and controlled. A few
|
||||
hardware resources, like queues, do not have a corresponding MC object and
|
||||
are treated as internal resources of other objects.
|
||||
|
||||
For a more detailed description of the DPAA2 architecture and its object
|
||||
abstractions see
|
||||
*Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst*.
|
||||
|
||||
Each Linux net device is built on top of a Datapath Network Interface (DPNI)
|
||||
object and uses Buffer Pools (DPBPs), I/O Portals (DPIOs) and Concentrators
|
||||
(DPCONs).
|
||||
|
||||
Configuration interface::
|
||||
|
||||
-----------------------
|
||||
| DPAA2 Ethernet Driver |
|
||||
-----------------------
|
||||
. . .
|
||||
. . .
|
||||
. . . . . . . . . . . .
|
||||
. . .
|
||||
. . .
|
||||
---------- ---------- -----------
|
||||
| DPBP API | | DPNI API | | DPCON API |
|
||||
---------- ---------- -----------
|
||||
. . . software
|
||||
======= . ========== . ============ . ===================
|
||||
. . . hardware
|
||||
------------------------------------------
|
||||
| MC hardware portals |
|
||||
------------------------------------------
|
||||
. . .
|
||||
. . .
|
||||
------ ------ -------
|
||||
| DPBP | | DPNI | | DPCON |
|
||||
------ ------ -------
|
||||
|
||||
The DPNIs are network interfaces without a direct one-to-one mapping to PHYs.
|
||||
DPBPs represent hardware buffer pools. Packet I/O is performed in the context
|
||||
of DPCON objects, using DPIO portals for managing and communicating with the
|
||||
hardware resources.
|
||||
|
||||
Datapath (I/O) interface::
|
||||
|
||||
-----------------------------------------------
|
||||
| DPAA2 Ethernet Driver |
|
||||
-----------------------------------------------
|
||||
| ^ ^ | |
|
||||
| | | | |
|
||||
enqueue| dequeue| data | dequeue| seed |
|
||||
(Tx) | (Rx, TxC)| avail.| request| buffers|
|
||||
| | notify| | |
|
||||
| | | | |
|
||||
V | | V V
|
||||
-----------------------------------------------
|
||||
| DPIO Driver |
|
||||
-----------------------------------------------
|
||||
| | | | | software
|
||||
| | | | | ================
|
||||
| | | | | hardware
|
||||
-----------------------------------------------
|
||||
| I/O hardware portals |
|
||||
-----------------------------------------------
|
||||
| ^ ^ | |
|
||||
| | | | |
|
||||
| | | V |
|
||||
V | ================ V
|
||||
---------------------- | -------------
|
||||
queues ---------------------- | | Buffer pool |
|
||||
---------------------- | -------------
|
||||
=======================
|
||||
Channel
|
||||
|
||||
Datapath I/O (DPIO) portals provide enqueue and dequeue services, data
|
||||
availability notifications and buffer pool management. DPIOs are shared between
|
||||
all DPAA2 objects (and implicitly all DPAA2 kernel drivers) that work with data
|
||||
frames, but must be affine to the CPUs for the purpose of traffic distribution.
|
||||
|
||||
Frames are transmitted and received through hardware frame queues, which can be
|
||||
grouped in channels for the purpose of hardware scheduling. The Ethernet driver
|
||||
enqueues TX frames on egress queues and after transmission is complete a TX
|
||||
confirmation frame is sent back to the CPU.
|
||||
|
||||
When frames are available on ingress queues, a data availability notification
|
||||
is sent to the CPU; notifications are raised per channel, so even if multiple
|
||||
queues in the same channel have available frames, only one notification is sent.
|
||||
After a channel fires a notification, it must be explicitly rearmed.
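
The notify-once-then-rearm behaviour can be pictured with a toy model. It is
an illustration only: the class and method names are hypothetical, and
hardware details, such as what happens when frames are already pending at
rearm time, are not modeled.

```python
class Channel:
    """Toy model of a channel that notifies once until it is rearmed."""

    def __init__(self):
        self.armed = True
        self.pending = 0        # frames waiting on any queue of the channel
        self.notifications = 0  # notifications sent to the CPU

    def frame_arrived(self):
        self.pending += 1
        if self.armed:
            self.notifications += 1
            self.armed = False  # stays silent until rearm() is called

    def dequeue_all(self):
        count, self.pending = self.pending, 0
        return count

    def rearm(self):
        self.armed = True


if __name__ == "__main__":
    ch = Channel()
    for _ in range(3):
        ch.frame_arrived()
    print(ch.notifications, ch.dequeue_all())
```

Three frames arriving back to back produce a single notification; a further
notification is only possible after ``rearm()``.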
|
||||
|
||||
Each network interface can have multiple Rx, Tx and confirmation queues affined
|
||||
to CPUs, and one channel (DPCON) for each CPU that services at least one queue.
|
||||
DPCONs are used to distribute ingress traffic to different CPUs via the cores'
|
||||
affine DPIOs.
|
||||
|
||||
The role of hardware buffer pools is storage of ingress frame data. Each network
|
||||
interface has a privately owned buffer pool which it seeds with kernel allocated
|
||||
buffers.
|
||||
|
||||
|
||||
DPNIs are decoupled from PHYs; a DPNI can be connected to a PHY through a DPMAC
|
||||
object or to another DPNI through an internal link, but the connection is
|
||||
managed by MC and completely transparent to the Ethernet driver.
|
||||
|
||||
::
|
||||
|
||||
--------- --------- ---------
|
||||
| eth if1 | | eth if2 | | eth ifn |
|
||||
--------- --------- ---------
|
||||
. . .
|
||||
. . .
|
||||
. . .
|
||||
---------------------------
|
||||
| DPAA2 Ethernet Driver |
|
||||
---------------------------
|
||||
. . .
|
||||
. . .
|
||||
. . .
|
||||
------ ------ ------ -------
|
||||
| DPNI | | DPNI | | DPNI | | DPMAC |----+
|
||||
------ ------ ------ ------- |
|
||||
| | | | |
|
||||
| | | | -----
|
||||
=========== ================== | PHY |
|
||||
-----
|
||||
|
||||
Creating a Network Interface
|
||||
============================
|
||||
A net device is created for each DPNI object probed on the MC bus. Each DPNI has
|
||||
a number of properties which determine the network interface configuration
|
||||
options and associated hardware resources.
|
||||
|
||||
DPNI objects (and the other DPAA2 objects needed for a network interface) can be
|
||||
added to a container on the MC bus in one of two ways: statically, through a
|
||||
Datapath Layout Binary file (DPL) that is parsed by MC at boot time; or created
|
||||
dynamically at runtime, via the DPAA2 objects APIs.
|
||||
|
||||
|
||||
Features & Offloads
|
||||
===================
|
||||
Hardware checksum offloading is supported for TCP and UDP over IPv4/6 frames.
|
||||
The checksum offloads can be independently configured on RX and TX through
|
||||
ethtool.
|
||||
|
||||
Hardware offload of unicast and multicast MAC filtering is supported on the
|
||||
ingress path and permanently enabled.
|
||||
|
||||
Scatter-gather frames are supported on both RX and TX paths. On TX, SG support
|
||||
is configurable via ethtool; on RX it is always enabled.
|
||||
|
||||
The DPAA2 hardware can process jumbo Ethernet frames of up to 10K bytes.
|
||||
|
||||
The Ethernet driver defines a static flow hashing scheme that distributes
|
||||
traffic based on a 5-tuple key: src IP, dst IP, IP proto, L4 src port,
|
||||
L4 dst port. No user configuration is supported for now.
|
||||
|
||||
Hardware specific statistics for the network interface as well as some
|
||||
non-standard driver stats can be consulted through the ethtool -S option.
|
@@ -0,0 +1,11 @@
|
||||
===================
|
||||
DPAA2 Documentation
|
||||
===================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
overview
|
||||
dpio-driver
|
||||
ethernet-driver
|
||||
mac-phy-support
|
@@ -0,0 +1,191 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=======================
|
||||
DPAA2 MAC / PHY support
|
||||
=======================
|
||||
|
||||
:Copyright: |copy| 2019 NXP
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
The DPAA2 MAC / PHY support consists of a set of APIs that help DPAA2 network
|
||||
drivers (dpaa2-eth, dpaa2-ethsw) interact with the PHY library.
|
||||
|
||||
DPAA2 Software Architecture
|
||||
---------------------------
|
||||
|
||||
Among other DPAA2 objects, the fsl-mc bus exports DPNI objects (abstracting a
|
||||
network interface) and DPMAC objects (abstracting a MAC). The dpaa2-eth driver
|
||||
probes on the DPNI object and connects to and configures a DPMAC object with
|
||||
the help of phylink.
|
||||
|
||||
Data connections may be established between a DPNI and a DPMAC, or between two
|
||||
DPNIs. Depending on the connection type, the netif_carrier_[on/off] is handled
|
||||
directly by the dpaa2-eth driver or by phylink.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
Sources of abstracted link state information presented by the MC firmware
|
||||
|
||||
+--------------------------------------+
|
||||
+------------+ +---------+ | xgmac_mdio |
|
||||
| net_device | | phylink |--| +-----+ +-----+ +-----+ +-----+ |
|
||||
+------------+ +---------+ | | PHY | | PHY | | PHY | | PHY | |
|
||||
| | | +-----+ +-----+ +-----+ +-----+ |
|
||||
+------------------------------------+ | External MDIO bus |
|
||||
| dpaa2-eth | +--------------------------------------+
|
||||
+------------------------------------+
|
||||
| | Linux
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
| | MC firmware
|
||||
| /| V
|
||||
+----------+ / | +----------+
|
||||
| | / | | |
|
||||
| | | | | |
|
||||
| DPNI |<------| |<------| DPMAC |
|
||||
| | | | | |
|
||||
| | \ |<---+ | |
|
||||
+----------+ \ | | +----------+
|
||||
\| |
|
||||
|
|
||||
+--------------------------------------+
|
||||
| MC firmware polling MAC PCS for link |
|
||||
| +-----+ +-----+ +-----+ +-----+ |
|
||||
| | PCS | | PCS | | PCS | | PCS | |
|
||||
| +-----+ +-----+ +-----+ +-----+ |
|
||||
| Internal MDIO bus |
|
||||
+--------------------------------------+
|
||||
|
||||
|
||||
Depending on an MC firmware configuration setting, each MAC may be in one of two modes:
|
||||
|
||||
- DPMAC_LINK_TYPE_FIXED: the link state management is handled exclusively by
|
||||
the MC firmware by polling the MAC PCS. Without the need to register a
|
||||
phylink instance, the dpaa2-eth driver will not bind to the connected dpmac
|
||||
object at all.
|
||||
|
||||
- DPMAC_LINK_TYPE_PHY: The MC firmware is left waiting for link state update
|
||||
events, but those are in fact passed strictly between the dpaa2-mac (based on
|
||||
phylink) and its attached net_device driver (dpaa2-eth, dpaa2-ethsw),
|
||||
effectively bypassing the firmware.
|
||||
|
||||
Implementation
|
||||
--------------
|
||||
|
||||
At probe time or when a DPNI's endpoint is dynamically changed, the dpaa2-eth
|
||||
is responsible for finding out whether the peer object is a DPMAC and, if so,
for integrating it with PHYLINK using the dpaa2_mac_connect() API, which
|
||||
will do the following:
|
||||
|
||||
- look up the device tree for a PHYLINK-compatible OF binding (phy-handle)
|
||||
- create a PHYLINK instance associated with the received net_device
|
||||
- connect to the PHY using phylink_of_phy_connect()
|
||||
|
||||
The following phylink_mac_ops callbacks are implemented:
|
||||
|
||||
- .validate() will populate the supported linkmodes with the MAC capabilities
|
||||
only when the phy_interface_t is RGMII_* (at the moment, this is the only
|
||||
link type supported by the driver).
|
||||
|
||||
- .mac_config() will configure the MAC in the new configuration using the
|
||||
dpmac_set_link_state() MC firmware API.
|
||||
|
||||
- .mac_link_up() / .mac_link_down() will update the MAC link using the same
|
||||
API described above.
|
||||
|
||||
At driver unbind() or when the DPNI object is disconnected from the DPMAC, the
|
||||
dpaa2-eth driver calls dpaa2_mac_disconnect() which will, in turn, disconnect
|
||||
from the PHY and destroy the PHYLINK instance.
|
||||
|
||||
In case of a DPNI-DPMAC connection, an 'ip link set dev eth0 up' would start
|
||||
the following sequence of operations:
|
||||
|
||||
(1) phylink_start() called from .dev_open().
|
||||
(2) The .mac_config() and .mac_link_up() callbacks are called by PHYLINK.
|
||||
(3) In order to configure the HW MAC, the MC Firmware API
|
||||
dpmac_set_link_state() is called.
|
||||
(4) The firmware will eventually set up the HW MAC in the new configuration.
|
||||
(5) A netif_carrier_on() call is made directly from PHYLINK on the associated
|
||||
net_device.
|
||||
(6) The dpaa2-eth driver handles the LINK_STATE_CHANGE irq in order to
|
||||
enable/disable Rx taildrop based on the pause frame settings.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
+---------+ +---------+
|
||||
| PHYLINK |-------------->| eth0 |
|
||||
+---------+ (5) +---------+
|
||||
(1) ^ |
|
||||
| |
|
||||
| v (2)
|
||||
+-----------------------------------+
|
||||
| dpaa2-eth |
|
||||
+-----------------------------------+
|
||||
| ^ (6)
|
||||
| |
|
||||
v (3) |
|
||||
+---------+---------------+---------+
|
||||
| DPMAC | | DPNI |
|
||||
+---------+ +---------+
|
||||
| MC Firmware |
|
||||
+-----------------------------------+
|
||||
|
|
||||
|
|
||||
v (4)
|
||||
+-----------------------------------+
|
||||
| HW MAC |
|
||||
+-----------------------------------+
|
||||
|
||||
In case of a DPNI-DPNI connection, a usual sequence of operations looks like
|
||||
the following:
|
||||
|
||||
(1) ip link set dev eth0 up
|
||||
(2) The dpni_enable() MC API called on the associated fsl_mc_device.
|
||||
(3) ip link set dev eth1 up
|
||||
(4) The dpni_enable() MC API called on the associated fsl_mc_device.
|
||||
(5) The LINK_STATE_CHANGED irq is received by both instances of the dpaa2-eth
|
||||
driver because now the operational link state is up.
|
||||
(6) The netif_carrier_on() is called on the exported net_device from
|
||||
link_state_update().
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
+---------+ +---------+
|
||||
| eth0 | | eth1 |
|
||||
+---------+ +---------+
|
||||
| ^ ^ |
|
||||
| | | |
|
||||
(1) v | (6) (6) | v (3)
|
||||
+---------+ +---------+
|
||||
|dpaa2-eth| |dpaa2-eth|
|
||||
+---------+ +---------+
|
||||
| ^ ^ |
|
||||
| | | |
|
||||
(2) v | (5) (5) | v (4)
|
||||
+---------+---------------+---------+
|
||||
| DPNI | | DPNI |
|
||||
+---------+ +---------+
|
||||
| MC Firmware |
|
||||
+-----------------------------------+
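
The DPNI-DPNI sequence above can be modelled in a few lines of user-space C.
This is only a sketch: ``struct sim_dpni`` and the ``sim_*`` functions are
hypothetical names, and the model assumes that the firmware reports the link as
operationally up once both connected endpoints have been enabled, as steps
(4) and (5) suggest.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model of one DPNI endpoint and its net_device carrier. */
    struct sim_dpni {
        bool enabled;           /* set by the dpni_enable() step */
        bool carrier;           /* what netif_carrier_on()/off() would set */
        struct sim_dpni *peer;  /* the DPNI at the other end */
    };

    /* Connect two endpoints back to back. */
    void sim_dpni_connect(struct sim_dpni *a, struct sim_dpni *b)
    {
        a->peer = b;
        b->peer = a;
    }

    /* Stands in for the link state update run from the irq handler. */
    void sim_link_state_update(struct sim_dpni *ni)
    {
        ni->carrier = ni->enabled && ni->peer != NULL && ni->peer->enabled;
    }

    /* Steps (2) and (4): enable one end, then notify both ends. */
    void sim_dpni_enable(struct sim_dpni *ni)
    {
        ni->enabled = true;
        sim_link_state_update(ni);
        if (ni->peer != NULL)
            sim_link_state_update(ni->peer);
    }

In this model, enabling only eth0 leaves both carriers off; the carrier turns
on for both interfaces when eth1 is enabled as well.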
|
||||
|
||||
|
||||
Exported API
|
||||
------------
|
||||
|
||||
Any DPAA2 driver that drives endpoints of DPMAC objects should service its
|
||||
_EVENT_ENDPOINT_CHANGED irq and connect/disconnect from the associated DPMAC
|
||||
when necessary using the below listed API::
|
||||
|
||||
- int dpaa2_mac_connect(struct dpaa2_mac *mac);
|
||||
- void dpaa2_mac_disconnect(struct dpaa2_mac *mac);
|
||||
|
||||
A phylink integration is necessary only when the partner DPMAC is not of TYPE_FIXED.
|
||||
One can check for this condition using the below API::
|
||||
|
||||
- bool dpaa2_mac_is_type_fixed(struct fsl_mc_device *dpmac_dev, struct fsl_mc_io *mc_io);
|
||||
|
||||
Before connecting to a MAC, the caller must allocate and populate the
|
||||
dpaa2_mac structure with the associated net_device, a pointer to the MC portal
|
||||
to be used and the actual fsl_mc_device structure of the DPMAC.
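
The calling pattern described in this section can be sketched in user space as
follows. The three function signatures are the ones listed above, but their
bodies here are mocks, the structure definitions and field names are
assumptions, and ``example_attach_mac()`` is a hypothetical helper; the real
functions talk to the MC firmware and to phylink.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Mock types standing in for the kernel structures. */
    struct net_device { int unused; };
    struct fsl_mc_io { int unused; };
    struct fsl_mc_device { bool link_type_fixed; /* mock firmware setting */ };

    struct dpaa2_mac {                  /* field names are assumptions */
        struct net_device *net_dev;
        struct fsl_mc_io *mc_io;
        struct fsl_mc_device *mc_dev;
        bool connected;                 /* mock state only */
    };

    /* Mock: the real check queries the DPMAC attributes from the firmware. */
    bool dpaa2_mac_is_type_fixed(struct fsl_mc_device *dpmac_dev,
                                 struct fsl_mc_io *mc_io)
    {
        (void)mc_io;
        return dpmac_dev->link_type_fixed;
    }

    /* Mock: the real function creates the phylink instance and connects. */
    int dpaa2_mac_connect(struct dpaa2_mac *mac)
    {
        if (!mac->net_dev || !mac->mc_io || !mac->mc_dev)
            return -1; /* the caller must populate all three fields */
        mac->connected = true;
        return 0;
    }

    /* Mock: the real function disconnects the PHY and destroys phylink. */
    void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
    {
        mac->connected = false;
    }

    /*
     * Hypothetical endpoint-change handler.
     * Returns 1 if phylink was attached, 0 if it is not needed, -1 on error.
     */
    int example_attach_mac(struct dpaa2_mac *mac, struct net_device *ndev,
                           struct fsl_mc_io *mc_io,
                           struct fsl_mc_device *dpmac_dev)
    {
        if (dpaa2_mac_is_type_fixed(dpmac_dev, mc_io))
            return 0; /* firmware manages the link; nothing to connect */

        mac->net_dev = ndev;
        mac->mc_io = mc_io;
        mac->mc_dev = dpmac_dev;

        return dpaa2_mac_connect(mac) ? -1 : 1;
    }

Checking the link type first keeps the TYPE_FIXED case free of any phylink
state, which matches the rule that the integration is only needed otherwise.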
|
@@ -0,0 +1,405 @@
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=========================================================
|
||||
DPAA2 (Data Path Acceleration Architecture Gen2) Overview
|
||||
=========================================================
|
||||
|
||||
:Copyright: |copy| 2015 Freescale Semiconductor Inc.
|
||||
:Copyright: |copy| 2018 NXP
|
||||
|
||||
This document provides an overview of the Freescale DPAA2 architecture
|
||||
and how it is integrated into the Linux kernel.
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
DPAA2 is a hardware architecture designed for high-speed network
|
||||
packet processing. DPAA2 consists of sophisticated mechanisms for
|
||||
processing Ethernet packets, queue management, buffer management,
|
||||
autonomous L2 switching, virtual Ethernet bridging, and accelerator
|
||||
(e.g. crypto) sharing.
|
||||
|
||||
A DPAA2 hardware component called the Management Complex (or MC) manages the
|
||||
DPAA2 hardware resources. The MC provides an object-based abstraction for
|
||||
software drivers to use the DPAA2 hardware.
|
||||
The MC uses DPAA2 hardware resources such as queues, buffer pools, and
|
||||
network ports to create functional objects/devices such as network
|
||||
interfaces, an L2 switch, or accelerator instances.
|
||||
The MC provides memory-mapped I/O command interfaces (MC portals)
|
||||
which DPAA2 software drivers use to operate on DPAA2 objects.
|
||||
|
||||
The diagram below shows an overview of the DPAA2 resource management
|
||||
architecture::
|
||||
|
||||
+--------------------------------------+
|
||||
| OS |
|
||||
| DPAA2 drivers |
|
||||
| | |
|
||||
+-----------------------------|--------+
|
||||
|
|
||||
| (create,discover,connect
|
||||
| config,use,destroy)
|
||||
|
|
||||
DPAA2 |
|
||||
+------------------------| mc portal |-+
|
||||
| | |
|
||||
| +- - - - - - - - - - - - -V- - -+ |
|
||||
| | | |
|
||||
| | Management Complex (MC) | |
|
||||
| | | |
|
||||
| +- - - - - - - - - - - - - - - -+ |
|
||||
| |
|
||||
| Hardware Hardware |
|
||||
| Resources Objects |
|
||||
| --------- ------- |
|
||||
| -queues -DPRC |
|
||||
| -buffer pools -DPMCP |
|
||||
| -Eth MACs/ports -DPIO |
|
||||
| -network interface -DPNI |
|
||||
| profiles -DPMAC |
|
||||
| -queue portals -DPBP |
|
||||
| -MC portals ... |
|
||||
| ... |
|
||||
| |
|
||||
+--------------------------------------+
|
||||
|
||||
|
||||
The MC mediates operations such as create, discover,
|
||||
connect, configuration, and destroy. Fast-path operations
|
||||
on data, such as packet transmit/receive, are not mediated by
|
||||
the MC and are done directly using memory mapped regions in
|
||||
DPIO objects.
|
||||
|
||||
Overview of DPAA2 Objects
|
||||
=========================
|
||||
|
||||
This section provides a brief overview of some key DPAA2 objects.
|
||||
A simple scenario is described illustrating the objects involved
|
||||
in creating a network interface.
|
||||
|
||||
DPRC (Datapath Resource Container)
|
||||
----------------------------------
|
||||
|
||||
A DPRC is a container object that holds all the other
|
||||
types of DPAA2 objects. In the example diagram below there
|
||||
are 8 objects of 5 types (DPMCP, DPIO, DPBP, DPNI, and DPMAC)
|
||||
in the container.
|
||||
|
||||
::
|
||||
|
||||
+---------------------------------------------------------+
|
||||
| DPRC |
|
||||
| |
|
||||
| +-------+ +-------+ +-------+ +-------+ +-------+ |
|
||||
| | DPMCP | | DPIO | | DPBP | | DPNI | | DPMAC | |
|
||||
| +-------+ +-------+ +-------+ +---+---+ +---+---+ |
|
||||
| | DPMCP | | DPIO | |
|
||||
| +-------+ +-------+ |
|
||||
| | DPMCP | |
|
||||
| +-------+ |
|
||||
| |
|
||||
+---------------------------------------------------------+
|
||||
|
||||
From the point of view of an OS, a DPRC behaves similarly to a plug and
|
||||
play bus, like PCI. DPRC commands can be used to enumerate the contents
|
||||
of the DPRC and discover the hardware objects present (including mappable
|
||||
regions and interrupts).
|
||||
|
||||
::
|
||||
|
||||
DPRC.1 (bus)
|
||||
|
|
||||
+--+--------+-------+-------+-------+
|
||||
| | | | |
|
||||
DPMCP.1 DPIO.1 DPBP.1 DPNI.1 DPMAC.1
|
||||
DPMCP.2 DPIO.2
|
||||
DPMCP.3
|
||||
|
||||
Hardware objects can be created and destroyed dynamically, providing
|
||||
the ability to hot plug/unplug objects in and out of the DPRC.
|
||||
|
||||
A DPRC has a mappable MMIO region (an MC portal) that can be used
|
||||
to send MC commands. It has an interrupt for status events (like
|
||||
hotplug).
|
||||
All objects in a container share the same hardware "isolation context".
|
||||
This means that with respect to an IOMMU the isolation granularity
|
||||
is at the DPRC (container) level, not at the individual object
|
||||
level.
|
||||
|
||||
DPRCs can be defined statically and populated with objects
|
||||
via a config file passed to the MC when firmware starts it.
|
||||
|
||||
DPAA2 Objects for an Ethernet Network Interface
|
||||
-----------------------------------------------
|
||||
|
||||
A typical Ethernet NIC is monolithic-- the NIC device contains TX/RX
|
||||
queuing mechanisms, configuration mechanisms, buffer management,
|
||||
physical ports, and interrupts. DPAA2 uses a more granular approach
|
||||
utilizing multiple hardware objects. Each object provides specialized
|
||||
functions. Groups of these objects are used by software to provide
|
||||
Ethernet network interface functionality. This approach provides
|
||||
efficient use of finite hardware resources, flexibility, and
|
||||
performance advantages.
|
||||
|
||||
The diagram below shows the objects needed for a simple
|
||||
network interface configuration on a system with 2 CPUs.
|
||||
|
||||
::
|
||||
|
||||
+---+---+ +---+---+
|
||||
CPU0 CPU1
|
||||
+---+---+ +---+---+
|
||||
| |
|
||||
+---+---+ +---+---+
|
||||
DPIO DPIO
|
||||
+---+---+ +---+---+
|
||||
\ /
|
||||
\ /
|
||||
\ /
|
||||
+---+---+
|
||||
DPNI --- DPBP,DPMCP
|
||||
+---+---+
|
||||
|
|
||||
|
|
||||
+---+---+
|
||||
DPMAC
|
||||
+---+---+
|
||||
|
|
||||
port/PHY
|
||||
|
||||
Below the objects are described. For each object a brief description
|
||||
is provided along with a summary of the kinds of operations the object
|
||||
supports and a summary of key resources of the object (MMIO regions
|
||||
and IRQs).
|
||||
|
||||
DPMAC (Datapath Ethernet MAC)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Represents an Ethernet MAC, a hardware device that connects to an Ethernet
|
||||
PHY and allows physical transmission and reception of Ethernet frames.
|
||||
|
||||
- MMIO regions: none
|
||||
- IRQs: DPNI link change
|
||||
- commands: set link up/down, link config, get stats,
|
||||
IRQ config, enable, reset
|
||||
|
||||
DPNI (Datapath Network Interface)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Contains TX/RX queues, network interface configuration, and RX buffer pool
|
||||
configuration mechanisms. The TX/RX queues are in memory and are identified
|
||||
by queue number.
|
||||
|
||||
- MMIO regions: none
|
||||
- IRQs: link state
|
||||
- commands: port config, offload config, queue config,
|
||||
parse/classify config, IRQ config, enable, reset
|
||||
|
||||
DPIO (Datapath I/O)
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
Provides interfaces to enqueue and dequeue
|
||||
packets and do hardware buffer pool management operations. The DPAA2
|
||||
architecture separates the mechanism to access queues (the DPIO object)
|
||||
from the queues themselves. The DPIO provides an MMIO interface to
|
||||
enqueue/dequeue packets. To enqueue something a descriptor is written
|
||||
to the DPIO MMIO region, which includes the target queue number.
|
||||
There will typically be one DPIO assigned to each CPU. This allows all
|
||||
CPUs to simultaneously perform enqueue/dequeue operations. DPIOs are
|
||||
expected to be shared by different DPAA2 drivers.
|
||||
|
||||
- MMIO regions: queue operations, buffer management
|
||||
- IRQs: data availability, congestion notification, buffer
|
||||
pool depletion
|
||||
- commands: IRQ config, enable, reset
|
||||
|
||||
DPBP (Datapath Buffer Pool)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Represents a hardware buffer pool.
|
||||
|
||||
- MMIO regions: none
|
||||
- IRQs: none
|
||||
- commands: enable, reset
|
||||
|
||||
DPMCP (Datapath MC Portal)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Provides an MC command portal.
|
||||
Used by drivers to send commands to the MC to manage
|
||||
objects.
|
||||
|
||||
- MMIO regions: MC command portal
|
||||
- IRQs: command completion
|
||||
- commands: IRQ config, enable, reset
|
||||
|
||||
Object Connections
|
||||
==================
|
||||
Some objects have explicit relationships that must
|
||||
be configured:
|
||||
|
||||
- DPNI <--> DPMAC
|
||||
- DPNI <--> DPNI
|
||||
- DPNI <--> L2-switch-port
|
||||
|
||||
A DPNI must be connected to something such as a DPMAC,
|
||||
another DPNI, or L2 switch port. The DPNI connection
|
||||
is made via a DPRC command.
|
||||
|
||||
::
|
||||
|
||||
+-------+ +-------+
|
||||
| DPNI | | DPMAC |
|
||||
+---+---+ +---+---+
|
||||
| |
|
||||
+==========+
|
||||
|
||||
- DPNI <--> DPBP
|
||||
|
||||
A network interface requires a 'buffer pool' (DPBP
|
||||
object) which provides a list of pointers to memory
|
||||
where received Ethernet data is to be copied. The
|
||||
Ethernet driver configures the DPBPs associated with
|
||||
the network interface.
|
||||
|
||||
Interrupts
|
||||
==========
|
||||
All interrupts generated by DPAA2 objects are message
|
||||
interrupts. At the hardware level message interrupts
|
||||
generated by devices will normally have 3 components--
|
||||
1) a non-spoofable 'device-id' expressed on the hardware
|
||||
bus, 2) an address, 3) a data value.
|
||||
|
||||
In the case of DPAA2 devices/objects, all objects in the
|
||||
same container/DPRC share the same 'device-id'.
|
||||
For ARM-based SoC this is the same as the stream ID.
|
||||
|
||||
|
||||
DPAA2 Linux Drivers Overview
|
||||
============================
|
||||
|
||||
This section provides an overview of the Linux kernel drivers for
|
||||
DPAA2-- 1) the bus driver and associated "DPAA2 infrastructure"
|
||||
drivers and 2) functional object drivers (such as Ethernet).
|
||||
|
||||
As described previously, a DPRC is a container that holds the other
|
||||
types of DPAA2 objects. It is functionally similar to a plug-and-play
|
||||
bus controller.
|
||||
Each object in the DPRC is a Linux "device" and is bound to a driver.
|
||||
The diagram below shows the Linux drivers involved in a networking
|
||||
scenario and the objects bound to each driver. A brief description
|
||||
of each driver follows.
|
||||
|
||||
::
|
||||
|
||||
+------------+
|
||||
| OS Network |
|
||||
| Stack |
|
||||
+------------+ +------------+
|
||||
| Allocator |. . . . . . . | Ethernet |
|
||||
|(DPMCP,DPBP)| | (DPNI) |
|
||||
+-.----------+ +---+---+----+
|
||||
. . ^ |
|
||||
. . <data avail, | | <enqueue,
|
||||
. . tx confirm> | | dequeue>
|
||||
+-------------+ . | |
|
||||
| DPRC driver | . +---+---V----+ +---------+
|
||||
| (DPRC) | . . . . . .| DPIO driver| | MAC |
|
||||
+----------+--+ | (DPIO) | | (DPMAC) |
|
||||
| +------+-----+ +-----+---+
|
||||
|<dev add/remove> | |
|
||||
| | |
|
||||
+--------+----------+ | +--+---+
|
||||
| MC-bus driver | | | PHY |
|
||||
| | | |driver|
|
||||
| /bus/fsl-mc | | +--+---+
|
||||
+-------------------+ | |
|
||||
| |
|
||||
========================= HARDWARE =========|=================|======
|
||||
DPIO |
|
||||
| |
|
||||
DPNI---DPBP |
|
||||
| |
|
||||
DPMAC |
|
||||
| |
|
||||
PHY ---------------+
|
||||
============================================|========================
|
||||
|
||||
A brief description of each driver is provided below.
|
||||
|
||||
MC-bus driver
|
||||
-------------
|
||||
The MC-bus driver is a platform driver and is probed from a
|
||||
node in the device tree (compatible "fsl,qoriq-mc") passed in by boot
|
||||
firmware. It is responsible for bootstrapping the DPAA2 kernel
|
||||
infrastructure.
|
||||
Key functions include:
|
||||
|
||||
- registering a new bus type named "fsl-mc" with the kernel,
|
||||
and implementing bus call-backs (e.g. match/uevent/dev_groups)
|
||||
- implementing APIs for DPAA2 driver registration and for device
|
||||
add/remove
|
||||
- creating an MSI IRQ domain
|
||||
- doing a 'device add' to expose the 'root' DPRC, in turn triggering
|
||||
a bind of the root DPRC to the DPRC driver
|
||||
|
||||
The binding for the MC-bus device-tree node can be consulted at
|
||||
*Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt*.
|
||||
The sysfs bind/unbind interfaces for the MC-bus can be consulted at
|
||||
*Documentation/ABI/testing/sysfs-bus-fsl-mc*.
|
||||
|
||||
DPRC driver
|
||||
-----------
|
||||
The DPRC driver is bound to DPRC objects and does runtime management
|
||||
of a bus instance. It performs the initial bus scan of the DPRC
|
||||
and handles interrupts for container events such as hot plug by
|
||||
re-scanning the DPRC.
|
||||
|
||||
Allocator
|
||||
---------
|
||||
Certain objects such as DPMCP and DPBP are generic and fungible,
|
||||
and are intended to be used by other drivers. For example,
|
||||
the DPAA2 Ethernet driver needs:
|
||||
|
||||
- DPMCPs to send MC commands, to configure network interfaces
|
||||
- DPBPs for network buffer pools
|
||||
|
||||
The allocator driver registers for these allocatable object types
|
||||
and those objects are bound to the allocator when the bus is probed.
|
||||
The allocator maintains a pool of objects that are available for
|
||||
allocation by other DPAA2 drivers.
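
The pool idea can be illustrated with a small user-space sketch. All names here
(``struct obj_pool``, ``pool_add()``, ``pool_alloc()``, ``pool_free()``) and the
fixed pool size are hypothetical; the sketch only shows the behaviour described
above, where interchangeable objects are collected at probe time and handed out
by type, and it does not reflect the kernel's actual allocator interface.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    enum obj_type { OBJ_DPMCP, OBJ_DPBP };

    struct pool_obj {
        enum obj_type type;
        int id;
        bool in_use;
    };

    #define POOL_MAX 8 /* arbitrary size for the example */

    struct obj_pool {
        struct pool_obj objs[POOL_MAX];
        size_t count;
    };

    /* Called when the bus probe binds an allocatable object. */
    bool pool_add(struct obj_pool *pool, enum obj_type type, int id)
    {
        if (pool->count == POOL_MAX)
            return false;
        pool->objs[pool->count++] = (struct pool_obj){ type, id, false };
        return true;
    }

    /* Hand out any free object of the requested type, or NULL if none. */
    struct pool_obj *pool_alloc(struct obj_pool *pool, enum obj_type type)
    {
        size_t i;

        for (i = 0; i < pool->count; i++) {
            struct pool_obj *o = &pool->objs[i];

            if (o->type == type && !o->in_use) {
                o->in_use = true;
                return o;
            }
        }
        return NULL;
    }

    /* Return an object to the pool so another driver can use it. */
    void pool_free(struct pool_obj *obj)
    {
        obj->in_use = false;
    }

Since objects of one type are fungible, the caller asks for a type rather than
for a specific instance.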
|
||||
|
||||
DPIO driver
|
||||
-----------
|
||||
The DPIO driver is bound to DPIO objects and provides services that allow
|
||||
other drivers such as the Ethernet driver to enqueue and dequeue data for
|
||||
their respective objects.
|
||||
Key services include:
|
||||
|
||||
- data availability notifications
|
||||
- hardware queuing operations (enqueue and dequeue of data)
|
||||
- hardware buffer pool management
|
||||
|
||||
To transmit a packet the Ethernet driver puts data on a queue and
|
||||
invokes a DPIO API. For receive, the Ethernet driver registers
|
||||
a data availability notification callback. To dequeue a packet
|
||||
a DPIO API is used.
|
||||
There is typically one DPIO object per physical CPU for optimum
|
||||
performance, allowing different CPUs to simultaneously enqueue
|
||||
and dequeue data.
|
||||
|
||||
The DPIO driver operates on behalf of all DPAA2 drivers
|
||||
active in the kernel-- Ethernet, crypto, compression,
|
||||
etc.
|
||||
|
||||
Ethernet driver
|
||||
---------------
|
||||
The Ethernet driver is bound to a DPNI and implements the kernel
|
||||
interfaces needed to connect the DPAA2 network interface to
|
||||
the network stack.
|
||||
Each DPNI corresponds to a Linux network interface.
|
||||
|
||||
MAC driver
|
||||
----------
|
||||
An Ethernet PHY is an off-chip, board specific component and is managed
|
||||
by the appropriate PHY driver via an mdio bus. The MAC driver
|
||||
acts as a proxy between the PHY driver and the
|
||||
MC. It does so via MC commands to a DPMAC object.
|
||||
If the PHY driver signals a link change, the MAC driver notifies
|
||||
the MC via a DPMAC command. If a network interface is brought
|
||||
up or down, the MC notifies the DPMAC driver via an interrupt and
|
||||
the driver can take appropriate action.
|
@@ -0,0 +1,51 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================
|
||||
The Gianfar Ethernet Driver
|
||||
===========================
|
||||
|
||||
:Author: Andy Fleming <afleming@freescale.com>
|
||||
:Updated: 2005-07-28
|
||||
|
||||
|
||||
Checksum Offloading
|
||||
===================
|
||||
|
||||
The eTSEC controller (first included in parts from late 2005 like
|
||||
the 8548) has the ability to perform TCP, UDP, and IP checksums
|
||||
in hardware. The Linux kernel only offloads the TCP and UDP
|
||||
checksums (and always performs the pseudo header checksums), so
|
||||
the driver only supports checksumming for TCP/IP and UDP/IP
|
||||
packets. Use ethtool to enable or disable this feature for RX
|
||||
and TX.
|
||||
|
||||
VLAN
|
||||
====
|
||||
|
||||
In order to use VLAN, please consult Linux documentation on
|
||||
configuring VLANs. The gianfar driver supports hardware insertion and
|
||||
extraction of VLAN headers, but not filtering. Filtering will be
|
||||
done by the kernel.
|
||||
|
||||
Multicasting
|
||||
============
|
||||
|
||||
The gianfar driver supports using the group hash table on the
|
||||
TSEC (and the extended hash table on the eTSEC) for multicast
|
||||
filtering. On the eTSEC, the exact-match MAC registers are used
|
||||
before the hash tables. See Linux documentation on how to join
|
||||
multicast groups.
|
||||
|
||||
Padding
|
||||
=======
|
||||
|
||||
The gianfar driver supports padding received frames with 2 bytes
|
||||
to align the IP header to a 16-byte boundary, when supported by
|
||||
hardware.
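
The arithmetic behind the 2-byte pad can be checked with the short sketch
below. It assumes an untagged frame with the standard 14-byte Ethernet header
and a receive buffer that itself starts on a 16-byte boundary; the constant and
function names are made up for this example.

.. code-block:: c

    #include <stddef.h>

    enum {
        ETH_HEADER_LEN = 14, /* dst MAC (6) + src MAC (6) + EtherType (2) */
        RX_PAD_LEN = 2,      /* padding inserted before the frame */
    };

    /* Offset of the IP header from the start of the receive buffer. */
    size_t ip_header_offset(size_t pad_len)
    {
        return pad_len + ETH_HEADER_LEN;
    }

With the pad the IP header starts at offset 16; without it the header would
start at offset 14, which is not even 4-byte aligned.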
|
||||
|
||||
Ethtool
|
||||
=======
|
||||
|
||||
The gianfar driver supports the use of ethtool for many
|
||||
configuration options. You must run ethtool only on currently
|
||||
open interfaces. See ethtool documentation for details.
|
123
Documentation/networking/device_drivers/ethernet/google/gve.rst
Normal file
@@ -0,0 +1,123 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
==============================================================
|
||||
Linux kernel driver for Compute Engine Virtual Ethernet (gve):
|
||||
==============================================================
|
||||
|
||||
Supported Hardware
|
||||
===================
|
||||
The GVE driver binds to a single PCI device id used by the virtual
|
||||
Ethernet device found in some Compute Engine VMs.
|
||||
|
||||
+--------------+----------+---------+
|
||||
|Field | Value | Comments|
|
||||
+==============+==========+=========+
|
||||
|Vendor ID | `0x1AE0` | Google |
|
||||
+--------------+----------+---------+
|
||||
|Device ID | `0x0042` | |
|
||||
+--------------+----------+---------+
|
||||
|Sub-vendor ID | `0x1AE0` | Google |
|
||||
+--------------+----------+---------+
|
||||
|Sub-device ID | `0x0058` | |
|
||||
+--------------+----------+---------+
|
||||
|Revision ID | `0x0` | |
|
||||
+--------------+----------+---------+
|
||||
|Device Class | `0x200` | Ethernet|
|
||||
+--------------+----------+---------+
|
||||
|
||||
PCI Bars
|
||||
========
|
||||
The gVNIC PCI device exposes three 32-bit memory BARs:
|
||||
- Bar0 - Device configuration and status registers.
|
||||
- Bar1 - MSI-X vector table
|
||||
- Bar2 - IRQ, RX and TX doorbells
|
||||
|
||||
Device Interactions
|
||||
===================
|
||||
The driver interacts with the device in the following ways:
|
||||
- Registers
|
||||
- A block of MMIO registers
|
||||
- See gve_register.h for more detail
|
||||
- Admin Queue
|
||||
- See description below
|
||||
- Reset
|
||||
- At any time the device can be reset
|
||||
- Interrupts
|
||||
- See supported interrupts below
|
||||
- Transmit and Receive Queues
|
||||
- See description below
|
||||
|
||||
Registers
|
||||
---------
|
||||
All registers are MMIO and big endian.
|
||||
|
||||
The registers are used for initializing and configuring the device as well as
|
||||
querying device status in response to management interrupts.
|
||||
|
||||
Admin Queue (AQ)
|
||||
----------------
|
||||
The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ
|
||||
commands, used by the driver to issue commands to the device and set up
|
||||
resources. The driver and the device maintain a count of how many commands
|
||||
have been submitted and executed. To issue AQ commands, the driver must do
|
||||
the following (with proper locking):
|
||||
|
||||
1) Copy new commands into next available slots in the AQ array
|
||||
2) Increment its counter by the number of new commands
|
||||
3) Write the counter into the GVE_ADMIN_QUEUE_DOORBELL register
|
||||
4) Poll the ADMIN_QUEUE_EVENT_COUNTER register until it equals
|
||||
the value written to the doorbell, or until a timeout.
|
||||
|
||||
The device will update the status field in each AQ command reported as
|
||||
executed through the ADMIN_QUEUE_EVENT_COUNTER register.
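
The four steps above can be sketched as a user-space simulation. The doorbell
and event counter registers are modelled as plain struct fields, and the
command layout, slot count, status value and all ``aq_sim_*`` names are
assumptions for illustration. Locking is left out because the sketch is
single-threaded.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define AQ_SLOTS 64u /* hypothetical number of commands in the AQ page */

    struct aq_cmd {      /* hypothetical command layout */
        uint32_t opcode;
        uint32_t status; /* written by the device once executed */
    };

    struct aq_sim {
        struct aq_cmd ring[AQ_SLOTS];
        uint32_t driver_cnt; /* commands submitted by the driver */
        uint32_t doorbell;   /* stands in for GVE_ADMIN_QUEUE_DOORBELL */
        uint32_t event_cnt;  /* stands in for ADMIN_QUEUE_EVENT_COUNTER */
    };

    /* Simulated device: execute at most one pending command per call. */
    void aq_sim_device_step(struct aq_sim *aq)
    {
        if (aq->event_cnt != aq->doorbell) {
            aq->ring[aq->event_cnt % AQ_SLOTS].status = 1; /* executed */
            aq->event_cnt++;
        }
    }

    /* Issue n commands; returns false on timeout or if n is too large. */
    bool aq_sim_issue(struct aq_sim *aq, const struct aq_cmd *cmds,
                      uint32_t n, unsigned int max_polls)
    {
        uint32_t i;

        if (n > AQ_SLOTS)
            return false;

        for (i = 0; i < n; i++) /* 1) copy into the next free slots */
            aq->ring[(aq->driver_cnt + i) % AQ_SLOTS] = cmds[i];

        aq->driver_cnt += n;           /* 2) bump the driver counter */
        aq->doorbell = aq->driver_cnt; /* 3) ring the doorbell */

        while (max_polls--) {          /* 4) poll until done or timeout */
            if (aq->event_cnt == aq->doorbell)
                return true;
            aq_sim_device_step(aq);
        }
        return aq->event_cnt == aq->doorbell;
    }

Comparing the event counter with the doorbell value, rather than counting
individual completions, lets the driver submit a batch and wait for it once.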
|
||||
|
||||
Device Resets
|
||||
-------------
|
||||
A device reset is triggered by writing 0x0 to the AQ PFN register.
|
||||
This causes the device to release all resources allocated by the
|
||||
driver, including the AQ itself.
|
||||
|
||||
Interrupts
|
||||
----------
|
||||
The following interrupts are supported by the driver:
|
||||
|
||||
Management Interrupt
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
The management interrupt is used by the device to tell the driver to
|
||||
look at the GVE_DEVICE_STATUS register.
|
||||
|
||||
The handler for the management irq simply queues the service task in
|
||||
the workqueue to check the register and acks the irq.
|
||||
|
||||
Notification Block Interrupts
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The notification block interrupts are used to tell the driver to poll
|
||||
the queues associated with that interrupt.
|
||||
|
||||
The handler for these irqs schedules the napi for that block to run
|
||||
and poll the queues.
|
||||
|
||||
Traffic Queues
|
||||
--------------
|
||||
gVNIC's queues are composed of a descriptor ring and a buffer and are
|
||||
assigned to a notification block.
|
||||
|
||||
The descriptor rings are power-of-two-sized ring buffers consisting of
|
||||
fixed-size descriptors. They advance their head pointer using a __be32
|
||||
doorbell located in Bar2. The tail pointers are advanced by consuming
|
||||
descriptors in-order and updating a __be32 counter. Both the doorbell
|
||||
and the counter overflow to zero.
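
A minimal sketch of the index arithmetic this implies is shown below. The
helper names are hypothetical and the ring size is assumed to be a power of
two, as stated above. Free-running 32-bit counters are masked to find a slot,
and unsigned subtraction keeps the in-flight count correct when a counter
overflows to zero.

.. code-block:: c

    #include <stdint.h>

    /* Slot addressed by a free-running counter; ring_size is a power of two. */
    uint32_t ring_slot(uint32_t counter, uint32_t ring_size)
    {
        return counter & (ring_size - 1);
    }

    /* Descriptors posted but not yet consumed, valid across wraparound. */
    uint32_t ring_in_flight(uint32_t head, uint32_t tail)
    {
        return head - tail;
    }

    /* Store a CPU-order value in big-endian byte order, as in a __be32. */
    void put_be32(uint8_t out[4], uint32_t v)
    {
        out[0] = (uint8_t)(v >> 24);
        out[1] = (uint8_t)(v >> 16);
        out[2] = (uint8_t)(v >> 8);
        out[3] = (uint8_t)v;
    }

For example, with a head of 2 and a tail of 0xFFFFFFFE the subtraction still
yields 4 outstanding descriptors, even though the head has already wrapped.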
|
||||
|
||||
Each queue's buffers must be registered in advance with the device as a
|
||||
queue page list, and packet data can only be put in those pages.
|
||||
|
||||
Transmit
|
||||
~~~~~~~~
|
||||
gve maps the buffers for transmit rings into a FIFO and copies the packets
|
||||
into the FIFO before sending them to the NIC.
|
||||
|
||||
Receive
|
||||
~~~~~~~
|
||||
The buffers for receive rings are put into a data ring that is the same
|
||||
length as the descriptor ring and the head and tail pointers advance over
|
||||
the rings together.
|
58
Documentation/networking/device_drivers/ethernet/index.rst
Normal file
@@ -0,0 +1,58 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
Ethernet Device Drivers
|
||||
=======================
|
||||
|
||||
Device drivers for Ethernet and Ethernet-based virtual function devices.
|
||||
|
||||
Contents:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
3com/3c509
|
||||
3com/vortex
|
||||
amazon/ena
|
||||
aquantia/atlantic
|
||||
chelsio/cxgb
|
||||
cirrus/cs89x0
|
||||
dlink/dl2k
|
||||
davicom/dm9000
|
||||
dec/de4x5
|
||||
dec/dmfe
|
||||
freescale/dpaa
|
||||
freescale/dpaa2/index
|
||||
freescale/gianfar
|
||||
google/gve
|
||||
intel/e100
|
||||
intel/e1000
|
||||
intel/e1000e
|
||||
intel/fm10k
|
||||
intel/igb
|
||||
intel/igbvf
|
||||
intel/ixgb
|
||||
intel/ixgbe
|
||||
intel/ixgbevf
|
||||
intel/i40e
|
||||
intel/iavf
|
||||
intel/ice
|
||||
marvell/octeontx2
|
||||
mellanox/mlx5
|
||||
microsoft/netvsc
|
||||
neterion/s2io
|
||||
neterion/vxge
|
||||
netronome/nfp
|
||||
pensando/ionic
|
||||
smsc/smc9
|
||||
stmicro/stmmac
|
||||
ti/cpsw
|
||||
ti/cpsw_switchdev
|
||||
ti/tlan
|
||||
toshiba/spider_net
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
188
Documentation/networking/device_drivers/ethernet/intel/e100.rst
Normal file
@@ -0,0 +1,188 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=============================================================
|
||||
Linux Base Driver for the Intel(R) PRO/100 Family of Adapters
|
||||
=============================================================
|
||||
|
||||
June 1, 2018
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- In This Release
|
||||
- Identifying Your Adapter
|
||||
- Building and Installation
|
||||
- Driver Configuration Parameters
|
||||
- Additional Configurations
|
||||
- Known Issues
|
||||
- Support
|
||||
|
||||
|
||||
In This Release
|
||||
===============
|
||||
|
||||
This file describes the Linux Base Driver for the Intel(R) PRO/100 Family of
|
||||
Adapters. This driver includes support for Itanium(R)2-based systems.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your Intel PRO/100 adapter.
|
||||
|
||||
The following features are now available in supported kernels:
|
||||
- Native VLANs
|
||||
- Channel Bonding (teaming)
|
||||
- SNMP
|
||||
|
||||
Channel Bonding documentation can be found in the Linux kernel source:
|
||||
/Documentation/networking/bonding.rst
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
http://www.intel.com/support
|
||||
|
||||
Driver Configuration Parameters
|
||||
===============================
|
||||
|
||||
The default value for each parameter is generally the recommended setting,
|
||||
unless otherwise noted.
|
||||
|
||||
Rx Descriptors:
|
||||
Number of receive descriptors. A receive descriptor is a data
|
||||
structure that describes a receive buffer and its attributes to the network
|
||||
controller. The data in the descriptor is used by the controller to write
|
||||
data from the controller to host memory. In the 3.x.x driver the valid range
|
||||
for this parameter is 64-256. The default value is 256. This parameter can be
|
||||
changed using the command::
|
||||
|
||||
ethtool -G eth? rx n
|
||||
|
||||
Where n is the number of desired Rx descriptors.
|
||||
|
||||
Tx Descriptors:
|
||||
Number of transmit descriptors. A transmit descriptor is a data
|
||||
structure that describes a transmit buffer and its attributes to the network
|
||||
controller. The data in the descriptor is used by the controller to read
|
||||
data from the host memory to the controller. In the 3.x.x driver the valid
|
||||
range for this parameter is 64-256. The default value is 128. This parameter
|
||||
can be changed using the command::
|
||||
|
||||
ethtool -G eth? tx n
|
||||
|
||||
Where n is the number of desired Tx descriptors.
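The ring-size limits above can be checked before ethtool is called. The following is a minimal Python sketch, not part of the driver: the helper name ``ethtool_ring_command`` and the interface name ``eth0`` are assumptions made for illustration, and only the 64-256 range quoted above is enforced.

```python
import shlex
from typing import Optional

# Valid descriptor counts quoted in the text for the 3.x.x e100 driver.
RING_MIN, RING_MAX = 64, 256


def ethtool_ring_command(ifname: str,
                         rx: Optional[int] = None,
                         tx: Optional[int] = None) -> str:
    """Build an `ethtool -G` command line after range-checking the counts.

    Hypothetical helper: it only formats the command, it does not run it.
    """
    args = ["ethtool", "-G", ifname]
    for label, count in (("rx", rx), ("tx", tx)):
        if count is None:
            continue
        if not RING_MIN <= count <= RING_MAX:
            raise ValueError(
                f"{label} descriptors must be in {RING_MIN}-{RING_MAX}, got {count}")
        args += [label, str(count)]
    if len(args) == 3:
        raise ValueError("specify rx and/or tx")
    return shlex.join(args)


if __name__ == "__main__":
    print(ethtool_ring_command("eth0", rx=256, tx=128))
```

The resulting string can be pasted into a shell or passed to a process launcher; the driver itself remains the final authority on which values it accepts.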
|
||||
|
||||
Speed/Duplex:
|
||||
The driver auto-negotiates the link speed and duplex settings by
|
||||
default. The ethtool utility can be used as follows to force speed/duplex::
|
||||
|
||||
ethtool -s eth? autoneg off speed {10|100} duplex {full|half}
|
||||
|
||||
NOTE: setting the speed/duplex to incorrect values will cause the link to
|
||||
fail.
|
||||
|
||||
Event Log Message Level:
|
||||
The driver uses the message level flag to log events
|
||||
to syslog. The message level can be set at driver load time. It can also be
|
||||
set using the command::
|
||||
|
||||
ethtool -s eth? msglvl n
|
||||
|
||||
|
||||
Additional Configurations
|
||||
=========================
|
||||
|
||||
Configuring the Driver on Different Distributions
|
||||
-------------------------------------------------
|
||||
|
||||
Configuring a network driver to load properly when the system is started
|
||||
is distribution dependent. Typically, the configuration process involves
|
||||
adding an alias line to `/etc/modprobe.d/*.conf` as well as editing other
|
||||
system startup scripts and/or configuration files. Many popular Linux
|
||||
distributions ship with tools to make these changes for you. To learn
|
||||
the proper way to configure a network device for your system, refer to
|
||||
your distribution documentation. If during this process you are asked
|
||||
for the driver or module name, the name for the Linux Base Driver for
|
||||
the Intel PRO/100 Family of Adapters is e100.
|
||||
|
||||
As an example, if you install the e100 driver for two PRO/100 adapters
|
||||
(eth0 and eth1), add the following to a configuration file in
|
||||
/etc/modprobe.d/::
|
||||
|
||||
alias eth0 e100
|
||||
alias eth1 e100
|
||||
|
||||
Viewing Link Messages
|
||||
---------------------
|
||||
|
||||
In order to see link messages and other Intel driver information on your
|
||||
console, you must set the dmesg level up to six. This can be done by
|
||||
entering the following on the command line before loading the e100
|
||||
driver::
|
||||
|
||||
dmesg -n 6
|
||||
|
||||
If you wish to see all messages issued by the driver, including debug
|
||||
messages, set the dmesg level to eight.
|
||||
|
||||
NOTE: This setting is not saved across reboots.
|
||||
|
||||
ethtool
|
||||
-------
|
||||
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The ethtool
|
||||
version 1.6 or later is required for this functionality.
|
||||
|
||||
The latest release of ethtool can be found at
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
Enabling Wake on LAN (WoL)
|
||||
--------------------------
|
||||
WoL is provided through the ethtool utility. For instructions on
|
||||
enabling WoL with ethtool, refer to the ethtool man page. WoL will be
|
||||
enabled on the system during the next shut down or reboot. For this
|
||||
driver version, in order to enable WoL, the e100 driver must be loaded
|
||||
when shutting down or rebooting the system.
|
||||
|
||||
NAPI
|
||||
----
|
||||
|
||||
NAPI (Rx polling mode) is supported in the e100 driver.
|
||||
|
||||
See https://wiki.linuxfoundation.org/networking/napi for more
|
||||
information on NAPI.
|
||||
|
||||
Multiple Interfaces on Same Ethernet Broadcast Network
|
||||
------------------------------------------------------
|
||||
|
||||
Due to the default ARP behavior on Linux, it is not possible to have one
|
||||
system on two IP networks in the same Ethernet broadcast domain
|
||||
(non-partitioned switch) behave as expected. All Ethernet interfaces
|
||||
will respond to IP traffic for any IP address assigned to the system.
|
||||
This results in unbalanced receive traffic.
|
||||
|
||||
If you have multiple interfaces in a server, either turn on ARP
|
||||
filtering by
|
||||
|
||||
(1) entering::
|
||||
|
||||
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
|
||||
|
||||
(this only works if your kernel's version is higher than 2.4.5), or
|
||||
|
||||
(2) installing the interfaces in separate broadcast domains (either
|
||||
in different switches or in a switch partitioned to VLANs).
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
http://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by SourceForge at:
|
||||
http://sourceforge.net/projects/e1000
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
463
Documentation/networking/device_drivers/ethernet/intel/e1000.rst
Normal file
@@ -0,0 +1,463 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
==========================================================
|
||||
Linux Base Driver for Intel(R) Ethernet Network Connection
|
||||
==========================================================
|
||||
|
||||
Intel Gigabit Linux driver.
|
||||
Copyright(c) 1999 - 2013 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Identifying Your Adapter
|
||||
- Command Line Parameters
|
||||
- Speed and Duplex Configuration
|
||||
- Additional Configurations
|
||||
- Support
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
|
||||
For more information on how to identify your adapter, go to the Adapter &
|
||||
Driver ID Guide at:
|
||||
|
||||
http://support.intel.com/support/go/network/adapter/idguide.htm
|
||||
|
||||
For the latest Intel network drivers for Linux, refer to the following
|
||||
website. In the search field, enter your adapter name or type, or use the
|
||||
networking link on the left to search for your adapter:
|
||||
|
||||
http://support.intel.com/support/go/network/adapter/home.htm
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
|
||||
The default value for each parameter is generally the recommended setting,
|
||||
unless otherwise noted.
|
||||
|
||||
NOTES:
|
||||
For more information about the AutoNeg, Duplex, and Speed
|
||||
parameters, see the "Speed and Duplex Configuration" section in
|
||||
this document.
|
||||
|
||||
For more information about the InterruptThrottleRate,
|
||||
RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
|
||||
parameters, see the application note at:
|
||||
http://www.intel.com/design/network/applnots/ap450.htm
|
||||
|
||||
AutoNeg
|
||||
-------
|
||||
|
||||
(Supported only on adapters with copper connections)
|
||||
|
||||
:Valid Range: 0x01-0x0F, 0x20-0x2F
|
||||
:Default Value: 0x2F
|
||||
|
||||
This parameter is a bit-mask that specifies the speed and duplex settings
|
||||
advertised by the adapter. When this parameter is used, the Speed and
|
||||
Duplex parameters must not be specified.
|
||||
|
||||
NOTE:
|
||||
Refer to the Speed and Duplex section of this readme for more
|
||||
information on the AutoNeg parameter.
|
||||
|
||||
Duplex
|
||||
------
|
||||
|
||||
(Supported only on adapters with copper connections)
|
||||
|
||||
:Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full)
|
||||
:Default Value: 0
|
||||
|
||||
This defines the direction in which data is allowed to flow. Can be
|
||||
either one or two-directional. If both Duplex and the link partner are
|
||||
set to auto-negotiate, the board auto-detects the correct duplex. If the
|
||||
link partner is forced (either full or half), Duplex defaults to half-
|
||||
duplex.
|
||||
|
||||
FlowControl
|
||||
-----------
|
||||
|
||||
:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
|
||||
:Default Value: Reads flow control settings from the EEPROM
|
||||
|
||||
This parameter controls the automatic generation (Tx) and response (Rx)
|
||||
to Ethernet PAUSE frames.
|
||||
|
||||
InterruptThrottleRate
|
||||
---------------------
|
||||
|
||||
(not supported on Intel(R) 82542, 82543 or 82544-based adapters)
|
||||
|
||||
:Valid Range:
|
||||
0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
|
||||
4=simplified balancing)
|
||||
:Default Value: 3
|
||||
|
||||
The driver can limit the number of interrupts per second that the adapter
|
||||
will generate for incoming packets. It does this by writing a value to the
|
||||
adapter that is based on the maximum number of interrupts that the adapter
|
||||
will generate per second.
|
||||
|
||||
Setting InterruptThrottleRate to a value greater than or equal to 100
|
||||
will program the adapter to send out a maximum of that many interrupts
|
||||
per second, even if more packets have come in. This reduces interrupt
|
||||
load on the system and can lower CPU utilization under heavy load,
|
||||
but will increase latency as packets are not processed as quickly.
|
||||
|
||||
The default behaviour of the driver previously assumed a static
|
||||
InterruptThrottleRate value of 8000, providing a good fallback value for
|
||||
all traffic types, but lacking in small packet performance and latency.
|
||||
The hardware can handle many more small packets per second however, and
|
||||
for this reason an adaptive interrupt moderation algorithm was implemented.
|
||||
|
||||
Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which
|
||||
it dynamically adjusts the InterruptThrottleRate value based on the traffic
|
||||
that it receives. After determining the type of incoming traffic in the last
|
||||
timeframe, it will adjust the InterruptThrottleRate to an appropriate value
|
||||
for that traffic.
|
||||
|
||||
The algorithm classifies the incoming traffic every interval into
|
||||
classes. Once the class is determined, the InterruptThrottleRate value is
|
||||
adjusted to best suit that traffic type. There are three classes defined:
|
||||
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
|
||||
for small amounts of traffic and/or a significant percentage of small
|
||||
packets; and "Lowest latency", for almost completely small packets or
|
||||
minimal traffic.
|
||||
|
||||
In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
|
||||
for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
|
||||
latency" or "Lowest latency" class, the InterruptThrottleRate is increased
|
||||
stepwise to 20000. This default mode is suitable for most applications.
|
||||
|
||||
For situations where low latency is vital, such as cluster or
|
||||
grid computing, the algorithm can reduce latency even more when
|
||||
InterruptThrottleRate is set to mode 1. In this mode, which operates
|
||||
the same as mode 3, the InterruptThrottleRate will be increased stepwise to
|
||||
70000 for traffic in class "Lowest latency".
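The end points of the two adaptive modes can be summarised as a small lookup. This Python sketch only restates the figures given in the text; it does not reproduce the driver's stepwise adjustment, and the names ``ITR_CEILING`` and ``itr_ceiling`` are assumptions made for illustration. The mode 1 entries for "Bulk traffic" and "Low latency" follow from the statement that mode 1 otherwise operates the same as mode 3.

```python
# Interrupt rates (interrupts/s) that each adaptive mode settles on or steps
# up to, per traffic class, as quoted in the text. Hypothetical table name.
ITR_CEILING = {
    3: {"bulk": 4000, "low_latency": 20000, "lowest_latency": 20000},
    1: {"bulk": 4000, "low_latency": 20000, "lowest_latency": 70000},
}


def itr_ceiling(mode: int, traffic_class: str) -> int:
    """Return the rate the given adaptive mode converges towards."""
    if mode not in ITR_CEILING:
        raise ValueError("only modes 1 and 3 are adaptive")
    try:
        return ITR_CEILING[mode][traffic_class]
    except KeyError:
        raise ValueError(f"unknown traffic class: {traffic_class}") from None


if __name__ == "__main__":
    for mode in (3, 1):
        print(mode, itr_ceiling(mode, "lowest_latency"))
```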
|
||||
|
||||
In simplified mode the interrupt rate is based on the ratio of TX and
|
||||
RX traffic. If the bytes per second rate is approximately equal, the
|
||||
interrupt rate will drop as low as 2000 interrupts per second. If the
|
||||
traffic is mostly transmit or mostly receive, the interrupt rate could
|
||||
be as high as 8000.
|
||||
|
||||
Setting InterruptThrottleRate to 0 turns off any interrupt moderation
|
||||
and may improve small packet latency, but is generally not suitable
|
||||
for bulk throughput traffic.
|
||||
|
||||
NOTE:
|
||||
InterruptThrottleRate takes precedence over the TxAbsIntDelay and
|
||||
RxAbsIntDelay parameters. In other words, minimizing the receive
|
||||
and/or transmit absolute delays does not force the controller to
|
||||
generate more interrupts than what the Interrupt Throttle Rate
|
||||
allows.
|
||||
|
||||
CAUTION:
|
||||
If you are using the Intel(R) PRO/1000 CT Network Connection
|
||||
(controller 82547), setting InterruptThrottleRate to a value
|
||||
greater than 75,000 may hang (stop transmitting) adapters
|
||||
under certain network conditions. If this occurs a NETDEV
|
||||
WATCHDOG message is logged in the system event log. In
|
||||
addition, the controller is automatically reset, restoring
|
||||
the network connection. To eliminate the potential for the
|
||||
hang, ensure that InterruptThrottleRate is set no greater
|
||||
than 75,000 and is not set to 0.
|
||||
|
||||
NOTE:
|
||||
When e1000 is loaded with default settings and multiple adapters
|
||||
are in use simultaneously, the CPU utilization may increase non-
|
||||
linearly. In order to limit the CPU utilization without impacting
|
||||
the overall throughput, we recommend that you load the driver as
|
||||
follows::
|
||||
|
||||
modprobe e1000 InterruptThrottleRate=3000,3000,3000
|
||||
|
||||
This sets the InterruptThrottleRate to 3000 interrupts/sec for
|
||||
the first, second, and third instances of the driver. The range
|
||||
of 2000 to 3000 interrupts per second works on a majority of
|
||||
systems and is a good starting point, but the optimal value will
|
||||
be platform-specific. If CPU utilization is not a concern, use
|
||||
RX_POLLING (NAPI) and default driver settings.
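To make the valid settings concrete, the sketch below classifies an InterruptThrottleRate value according to the ranges listed above and, for fixed rates, computes the average interrupt spacing that the cap implies. It is an illustrative Python sketch; the function names are assumptions and are not part of the driver.

```python
# Special values and their meaning, as listed under "Valid Range".
ITR_MODES = {
    0: "off",
    1: "dynamic",
    3: "dynamic conservative",
    4: "simplified balancing",
}


def describe_itr(value: int) -> str:
    """Map an InterruptThrottleRate setting to a description."""
    if value in ITR_MODES:
        return ITR_MODES[value]
    if 100 <= value <= 100000:
        return f"fixed, at most {value} interrupts/s"
    raise ValueError(f"{value} is outside 0,1,3,4,100-100000")


def average_spacing_us(rate: int) -> float:
    """Average time between interrupts, in microseconds, at a fixed rate cap."""
    if not 100 <= rate <= 100000:
        raise ValueError("spacing is only defined for fixed rates 100-100000")
    return 1_000_000 / rate


if __name__ == "__main__":
    print(describe_itr(3000), average_spacing_us(3000))
```

For the 3000 interrupts/sec recommended in the note above, the cap corresponds to roughly one interrupt every 333 microseconds per adapter.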
|
||||
|
||||
RxDescriptors
|
||||
-------------
|
||||
|
||||
:Valid Range:
|
||||
- 48-256 for 82542 and 82543-based adapters
|
||||
- 48-4096 for all other supported adapters
|
||||
:Default Value: 256
|
||||
|
||||
This value specifies the number of receive buffer descriptors allocated
|
||||
by the driver. Increasing this value allows the driver to buffer more
|
||||
incoming packets, at the expense of increased system memory utilization.
|
||||
|
||||
Each descriptor is 16 bytes. A receive buffer is also allocated for each
|
||||
descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
|
||||
on the MTU setting. The maximum MTU size is 16110.
|
||||
|
||||
NOTE:
|
||||
MTU designates the frame size. It only needs to be set for Jumbo
|
||||
Frames. Depending on the available system resources, the request
|
||||
for a higher number of receive descriptors may be denied. In this
|
||||
case, use a lower number.
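The memory cost of a larger receive ring follows directly from the figures above: 16 bytes per descriptor plus one receive buffer per descriptor. The Python sketch below is a rough estimate under that assumption only; it ignores allocator overhead and alignment, and the name ``rx_ring_bytes`` is hypothetical.

```python
DESCRIPTOR_BYTES = 16                          # size of one descriptor, per the text
RX_BUFFER_SIZES = (2048, 4096, 8192, 16384)    # possible buffer sizes, per the text


def rx_ring_bytes(descriptors: int, buffer_bytes: int) -> int:
    """Estimate memory used by one receive ring and its buffers."""
    if buffer_bytes not in RX_BUFFER_SIZES:
        raise ValueError(f"buffer size must be one of {RX_BUFFER_SIZES}")
    if descriptors <= 0:
        raise ValueError("descriptor count must be positive")
    return descriptors * (DESCRIPTOR_BYTES + buffer_bytes)


if __name__ == "__main__":
    # Default ring with the smallest buffer versus the largest configuration.
    print(rx_ring_bytes(256, 2048))
    print(rx_ring_bytes(4096, 16384))
```

With the default of 256 descriptors and 2048-byte buffers the estimate is about 0.5 MiB per port; 4096 descriptors with 16384-byte buffers comes to about 64 MiB, which illustrates why a large request may be denied.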
|
||||
|
||||
RxIntDelay
|
||||
----------
|
||||
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 0
|
||||
|
||||
This value delays the generation of receive interrupts in units of 1.024
|
||||
microseconds. Receive interrupt reduction can improve CPU efficiency if
|
||||
properly tuned for specific network traffic. Increasing this value adds
|
||||
extra latency to frame reception and can end up decreasing the throughput
|
||||
of TCP traffic. If the system is reporting dropped receives, this value
|
||||
may be set too high, causing the driver to run out of available receive
|
||||
descriptors.
|
||||
|
||||
CAUTION:
|
||||
When setting RxIntDelay to a value other than 0, adapters may
|
||||
hang (stop transmitting) under certain network conditions. If
|
||||
this occurs a NETDEV WATCHDOG message is logged in the system
|
||||
event log. In addition, the controller is automatically reset,
|
||||
restoring the network connection. To eliminate the potential
|
||||
for the hang ensure that RxIntDelay is set to 0.
|
||||
|
||||
RxAbsIntDelay
|
||||
-------------
|
||||
|
||||
(This parameter is supported only on 82540, 82545 and later adapters.)
|
||||
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 128
|
||||
|
||||
This value, in units of 1.024 microseconds, limits the delay in which a
|
||||
receive interrupt is generated. Useful only if RxIntDelay is non-zero,
|
||||
this value ensures that an interrupt is generated after the initial
|
||||
packet is received within the set amount of time. Proper tuning,
|
||||
along with RxIntDelay, may improve traffic throughput in specific network
|
||||
conditions.
|
||||
|
||||
Speed
|
||||
-----
|
||||
|
||||
(This parameter is supported only on adapters with copper connections.)
|
||||
|
||||
:Valid Settings: 0, 10, 100, 1000
|
||||
:Default Value: 0 (auto-negotiate at all supported speeds)
|
||||
|
||||
Speed forces the line speed to the specified value in megabits per second
|
||||
(Mbps). If this parameter is not specified or is set to 0 and the link
|
||||
partner is set to auto-negotiate, the board will auto-detect the correct
|
||||
speed. Duplex should also be set when Speed is set to either 10 or 100.
|
||||
|
||||
TxDescriptors
|
||||
-------------
|
||||
|
||||
:Valid Range:
|
||||
- 48-256 for 82542 and 82543-based adapters
|
||||
- 48-4096 for all other supported adapters
|
||||
:Default Value: 256
|
||||
|
||||
This value is the number of transmit descriptors allocated by the driver.
|
||||
Increasing this value allows the driver to queue more transmits. Each
|
||||
descriptor is 16 bytes.
|
||||
|
||||
NOTE:
|
||||
Depending on the available system resources, the request for a
|
||||
higher number of transmit descriptors may be denied. In this case,
|
||||
use a lower number.
|
||||
|
||||
TxIntDelay
|
||||
----------
|
||||
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 8
|
||||
|
||||
This value delays the generation of transmit interrupts in units of
|
||||
1.024 microseconds. Transmit interrupt reduction can improve CPU
|
||||
efficiency if properly tuned for specific network traffic. If the
|
||||
system is reporting dropped transmits, this value may be set too high
|
||||
causing the driver to run out of available transmit descriptors.
|
||||
|
||||
TxAbsIntDelay
|
||||
-------------
|
||||
|
||||
(This parameter is supported only on 82540, 82545 and later adapters.)
|
||||
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 32
|
||||
|
||||
This value, in units of 1.024 microseconds, limits the delay in which a
|
||||
transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
|
||||
this value ensures that an interrupt is generated after the initial
|
||||
packet is sent on the wire within the set amount of time. Proper tuning,
|
||||
along with TxIntDelay, may improve traffic throughput in specific
|
||||
network conditions.
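All four delay parameters (RxIntDelay, RxAbsIntDelay, TxIntDelay and TxAbsIntDelay) are expressed in units of 1.024 microseconds. A short Python sketch of the conversion follows; the function name ``delay_us`` is an assumption made for illustration.

```python
DELAY_UNIT_US = 1.024  # one parameter unit in microseconds, per the text


def delay_us(value: int) -> float:
    """Convert a delay parameter value (0-65535) to microseconds."""
    if not 0 <= value <= 65535:
        raise ValueError("delay parameters are limited to 0-65535")
    return value * DELAY_UNIT_US


if __name__ == "__main__":
    # Default values quoted in this document.
    for name, default in (("RxAbsIntDelay", 128),
                          ("TxIntDelay", 8),
                          ("TxAbsIntDelay", 32)):
        print(f"{name}={default} -> {delay_us(default):.3f} us")
```

The maximum value of 65535 therefore corresponds to a delay of roughly 67 milliseconds.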
|
||||
|
||||
XsumRX
|
||||
------
|
||||
|
||||
(This parameter is NOT supported on the 82542-based adapter.)
|
||||
|
||||
:Valid Range: 0-1
|
||||
:Default Value: 1
|
||||
|
||||
A value of '1' indicates that the driver should enable IP checksum
|
||||
offload for received packets (both UDP and TCP) to the adapter hardware.
|
||||
|
||||
Copybreak
|
||||
---------
|
||||
|
||||
:Valid Range: 0-xxxxxxx (0=off)
|
||||
:Default Value: 256
|
||||
:Usage: modprobe e1000 copybreak=128
|
||||
|
||||
The driver copies each packet at or below this size to a fresh RX
|
||||
buffer before handing it up the stack.
|
||||
|
||||
This parameter differs from the other parameters in that it is a
|
||||
single (not 1,1,1 etc.) parameter applied to all driver instances and
|
||||
it is also available during runtime at
|
||||
/sys/module/e1000/parameters/copybreak
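The copybreak rule can be stated as a single comparison. The Python sketch below models only the decision described above (copy when the packet size is at or below the threshold, never when the threshold is 0); it is not the driver code, and ``should_copy`` is a hypothetical name.

```python
def should_copy(packet_len: int, copybreak: int = 256) -> bool:
    """Return True if a packet of packet_len bytes would be copied.

    A copybreak of 0 turns the feature off; otherwise packets whose size
    is below or equal to the threshold are copied to a fresh buffer.
    """
    if packet_len < 0 or copybreak < 0:
        raise ValueError("sizes must be non-negative")
    if copybreak == 0:
        return False
    return packet_len <= copybreak


if __name__ == "__main__":
    for size in (64, 256, 257, 1500):
        print(size, should_copy(size))
```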
|
||||
|
||||
SmartPowerDownEnable
|
||||
--------------------
|
||||
|
||||
:Valid Range: 0-1
|
||||
:Default Value: 0 (disabled)
|
||||
|
||||
Allows PHY to turn off in lower power states. The user can turn off
|
||||
this parameter in supported chipsets.
|
||||
|
||||
Speed and Duplex Configuration
|
||||
==============================
|
||||
|
||||
Three keywords are used to control the speed and duplex configuration.
|
||||
These keywords are Speed, Duplex, and AutoNeg.
|
||||
|
||||
If the board uses a fiber interface, these keywords are ignored, and the
|
||||
fiber interface board only links at 1000 Mbps full-duplex.
|
||||
|
||||
For copper-based boards, the keywords interact as follows:
|
||||
|
||||
- The default operation is auto-negotiate. The board advertises all
|
||||
supported speed and duplex combinations, and it links at the highest
|
||||
common speed and duplex mode IF the link partner is set to auto-negotiate.
|
||||
|
||||
- If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps
|
||||
is advertised (The 1000BaseT spec requires auto-negotiation.)
|
||||
|
||||
- If Speed = 10 or 100, then both Speed and Duplex should be set. Auto-
|
||||
negotiation is disabled, and the AutoNeg parameter is ignored. Partner
|
||||
SHOULD also be forced.
|
||||
|
||||
The AutoNeg parameter is used when more control is required over the
|
||||
auto-negotiation process. It should be used when you wish to control which
|
||||
speed and duplex combinations are advertised during the auto-negotiation
|
||||
process.
|
||||
|
||||
The parameter may be specified as either a decimal or hexadecimal value as
|
||||
determined by the bitmap below.
|
||||
|
||||
============== ====== ====== ======= ======= ====== ====== ======= ======
Bit position   7      6      5       4       3      2      1       0
Decimal Value  128    64     32      16      8      4      2       1
Hex value      80     40     20      10      8      4      2       1
Speed (Mbps)   N/A    N/A    1000    N/A     100    100    10      10
Duplex                       Full            Full   Half   Full    Half
============== ====== ====== ======= ======= ====== ====== ======= ======
|
||||
|
||||
Some examples of using AutoNeg::
|
||||
|
||||
modprobe e1000 AutoNeg=0x01 (Restricts autonegotiation to 10 Half)
|
||||
modprobe e1000 AutoNeg=1 (Same as above)
|
||||
modprobe e1000 AutoNeg=0x02 (Restricts autonegotiation to 10 Full)
|
||||
modprobe e1000 AutoNeg=0x03 (Restricts autonegotiation to 10 Half or 10 Full)
|
||||
modprobe e1000 AutoNeg=0x04 (Restricts autonegotiation to 100 Half)
|
||||
modprobe e1000 AutoNeg=0x05 (Restricts autonegotiation to 10 Half or 100
|
||||
Half)
|
||||
modprobe e1000 AutoNeg=0x020 (Restricts autonegotiation to 1000 Full)
|
||||
modprobe e1000 AutoNeg=32 (Same as above)
|
||||
|
||||
Note that when this parameter is used, Speed and Duplex must not be specified.
|
||||
|
||||
If the link partner is forced to a specific speed and duplex, then this
|
||||
parameter should not be used. Instead, use the Speed and Duplex parameters
|
||||
previously mentioned to force the adapter to the same speed and duplex.
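An AutoNeg value is the bitwise OR of the bits for each speed and duplex combination to be advertised. The following Python sketch builds and decodes such a mask from the bitmap above; the names ``ADVERTISE_BITS``, ``autoneg_mask`` and ``decode_autoneg`` are assumptions made for illustration.

```python
# Bit assigned to each (speed in Mbps, duplex) pair in the AutoNeg bitmap.
ADVERTISE_BITS = {
    (10, "half"): 0x01,
    (10, "full"): 0x02,
    (100, "half"): 0x04,
    (100, "full"): 0x08,
    (1000, "full"): 0x20,
}


def autoneg_mask(modes) -> int:
    """OR together the bits for the requested (speed, duplex) pairs."""
    mask = 0
    for speed, duplex in modes:
        try:
            mask |= ADVERTISE_BITS[(speed, duplex)]
        except KeyError:
            raise ValueError(f"unsupported combination: {speed} {duplex}") from None
    return mask


def decode_autoneg(mask: int):
    """List the (speed, duplex) pairs advertised by a mask."""
    return [mode for mode, bit in ADVERTISE_BITS.items() if mask & bit]


if __name__ == "__main__":
    mask = autoneg_mask([(10, "half"), (100, "half")])
    print(f"modprobe e1000 AutoNeg={mask:#04x}")
```

Advertising every supported combination yields 0x2F, which matches the default value documented for the AutoNeg parameter.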
|
||||
|
||||
Additional Configurations
|
||||
=========================
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
|
||||
Jumbo Frames support is enabled by changing the MTU to a value larger than
|
||||
the default of 1500. Use the ifconfig command to increase the MTU size.
|
||||
For example::
|
||||
|
||||
ifconfig eth<x> mtu 9000 up
|
||||
|
||||
This setting is not saved across reboots. It can be made permanent if
|
||||
you add::
|
||||
|
||||
MTU=9000
|
||||
|
||||
to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example
|
||||
applies to the Red Hat distributions; other distributions may store this
|
||||
setting in a different location.
|
||||
|
||||
Notes:
|
||||
Degradation in throughput performance may be observed in some Jumbo frames
|
||||
environments. If this is observed, increasing the application's socket buffer
|
||||
size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
|
||||
See the specific application manual and /usr/src/linux*/Documentation/
|
||||
networking/ip-sysctl.txt for more details.
|
||||
|
||||
- The maximum MTU setting for Jumbo Frames is 16110. This value coincides
|
||||
with the maximum Jumbo Frames size of 16128.
|
||||
|
||||
- Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
|
||||
poor performance or loss of link.
|
||||
|
||||
- Adapters based on the Intel(R) 82542 and 82573V/E controller do not
|
||||
support Jumbo Frames. These correspond to the following product names::
|
||||
|
||||
Intel(R) PRO/1000 Gigabit Server Adapter
|
||||
Intel(R) PRO/1000 PM Network Connection
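The relationship between the maximum MTU (16110) and the maximum Jumbo Frames size (16128) noted above is an 18-byte difference. The Python sketch below assumes that difference is the 14-byte Ethernet header plus the 4-byte frame check sequence, with no VLAN tag; the text itself does not spell this out.

```python
ETH_HEADER_BYTES = 14  # destination MAC + source MAC + EtherType (assumed)
ETH_FCS_BYTES = 4      # frame check sequence (assumed)


def frame_size(mtu: int) -> int:
    """Return the on-wire frame size for a given MTU, without a VLAN tag."""
    if mtu <= 0:
        raise ValueError("MTU must be positive")
    return mtu + ETH_HEADER_BYTES + ETH_FCS_BYTES


if __name__ == "__main__":
    for mtu in (1500, 9000, 16110):
        print(mtu, frame_size(mtu))
```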
|
||||
|
||||
ethtool
|
||||
-------
|
||||
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The ethtool
|
||||
version 1.6 or later is required for this functionality.
|
||||
|
||||
The latest release of ethtool can be found at
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
Enabling Wake on LAN (WoL)
|
||||
--------------------------
|
||||
|
||||
WoL is configured through the ethtool utility.
|
||||
|
||||
WoL will be enabled on the system during the next shut down or reboot.
|
||||
For this driver version, in order to enable WoL, the e1000 driver must be
|
||||
loaded when shutting down or rebooting the system.
|
||||
|
||||
Support
|
||||
=======
|
||||
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
http://support.intel.com
|
||||
|
||||
or the Intel Wired Networking project hosted by SourceForge at:
|
||||
|
||||
http://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on the supported
|
||||
kernel with a supported adapter, email the specific information related
|
||||
to the issue to e1000-devel@lists.sf.net.
|
@@ -0,0 +1,383 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=====================================================
|
||||
Linux Driver for Intel(R) Ethernet Network Connection
|
||||
=====================================================
|
||||
|
||||
Intel Gigabit Linux driver.
|
||||
Copyright(c) 2008-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Identifying Your Adapter
|
||||
- Command Line Parameters
|
||||
- Additional Configurations
|
||||
- Support
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
https://www.intel.com/support
|
||||
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
If the driver is built as a module, the following optional parameters are used
|
||||
by entering them on the command line with the modprobe command using this
|
||||
syntax::
|
||||
|
||||
modprobe e1000e [<option>=<VAL1>,<VAL2>,...]
|
||||
|
||||
There needs to be a <VAL#> for each network port in the system supported by
|
||||
this driver. The values will be applied to each instance, in function order.
|
||||
For example::
|
||||
|
||||
modprobe e1000e InterruptThrottleRate=16000,16000
|
||||
|
||||
In this case, there are two network ports supported by e1000e in the system.
|
||||
The default value for each parameter is generally the recommended setting,
|
||||
unless otherwise noted.
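Because each option takes one value per port, in function order, the command line is easy to generate for systems with several ports. This is a minimal Python sketch that only formats the string; ``modprobe_command`` is a hypothetical helper and does not load the module.

```python
def modprobe_command(module: str, **options) -> str:
    """Format a modprobe command with one comma-separated value per port.

    Each keyword argument is an option name mapped to a sequence of
    per-port values, listed in function order.
    """
    parts = ["modprobe", module]
    for name, values in options.items():
        values = list(values)
        if not values:
            raise ValueError(f"{name} needs at least one value")
        parts.append(f"{name}=" + ",".join(str(v) for v in values))
    return " ".join(parts)


if __name__ == "__main__":
    # Two ports handled by e1000e, both capped at 16000 interrupts/s.
    print(modprobe_command("e1000e", InterruptThrottleRate=[16000, 16000]))
```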
|
||||
|
||||
NOTE: A descriptor describes a data buffer and attributes related to the data
|
||||
buffer. This information is accessed by the hardware.
|
||||
|
||||
InterruptThrottleRate
|
||||
---------------------
|
||||
:Valid Range: 0,1,3,4,100-100000
|
||||
:Default Value: 3
|
||||
|
||||
Interrupt Throttle Rate controls the number of interrupts each interrupt
|
||||
vector can generate per second. Increasing ITR lowers latency at the cost of
|
||||
increased CPU utilization, though it may help throughput in some circumstances.
|
||||
|
||||
Setting InterruptThrottleRate to a value greater than or equal to 100
|
||||
will program the adapter to send out a maximum of that many interrupts
|
||||
per second, even if more packets have come in. This reduces interrupt
|
||||
load on the system and can lower CPU utilization under heavy load,
|
||||
but will increase latency as packets are not processed as quickly.
|
||||
|
||||
The default behaviour of the driver previously assumed a static
|
||||
InterruptThrottleRate value of 8000, providing a good fallback value for
|
||||
all traffic types, but lacking in small packet performance and latency.
|
||||
The hardware can handle many more small packets per second however, and
|
||||
for this reason an adaptive interrupt moderation algorithm was implemented.
|
||||
|
||||
The driver has two adaptive modes (setting 1 or 3) in which
|
||||
it dynamically adjusts the InterruptThrottleRate value based on the traffic
|
||||
that it receives. After determining the type of incoming traffic in the last
|
||||
timeframe, it will adjust the InterruptThrottleRate to an appropriate value
|
||||
for that traffic.
|
||||
|
||||
The algorithm classifies the incoming traffic every interval into
|
||||
classes. Once the class is determined, the InterruptThrottleRate value is
|
||||
adjusted to best suit that traffic type. There are three classes defined:
|
||||
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
|
||||
for small amounts of traffic and/or a significant percentage of small
|
||||
packets; and "Lowest latency", for almost completely small packets or
|
||||
minimal traffic.
|
||||
|
||||
- 0: Off
|
||||
Turns off any interrupt moderation and may improve small packet latency.
|
||||
However, this is generally not suitable for bulk throughput traffic due
|
||||
to the increased CPU utilization of the higher interrupt rate.
|
||||
- 1: Dynamic mode
|
||||
This mode attempts to moderate interrupts per vector while maintaining
|
||||
very low latency. This can sometimes cause extra CPU utilization. If
|
||||
planning on deploying e1000e in a latency sensitive environment, this
|
||||
parameter should be considered.
|
||||
- 3: Dynamic Conservative mode (default)
|
||||
In dynamic conservative mode, the InterruptThrottleRate value is set to
|
||||
4000 for traffic that falls in class "Bulk traffic". If traffic falls in
|
||||
the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is
|
||||
increased stepwise to 20000. This default mode is suitable for most
|
||||
applications.
|
||||
- 4: Simplified Balancing mode
|
||||
In simplified mode the interrupt rate is based on the ratio of TX and
|
||||
RX traffic. If the bytes per second rate is approximately equal, the
|
||||
interrupt rate will drop as low as 2000 interrupts per second. If the
|
||||
traffic is mostly transmit or mostly receive, the interrupt rate could
|
||||
be as high as 8000.
|
||||
- 100-100000:
|
||||
Setting InterruptThrottleRate to a value greater than or equal to 100
|
||||
will program the adapter to send at most that many interrupts per second,
|
||||
even if more packets have come in. This reduces interrupt load on the
|
||||
system and can lower CPU utilization under heavy load, but will increase
|
||||
latency as packets are not processed as quickly.
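For the fixed-rate setting, the minimum spacing between interrupts is the inverse of the rate. The following is a minimal sketch of that arithmetic; the function name ``itr_min_gap_us`` is made up for illustration and is not part of the driver.

```shell
#!/bin/sh
# Hypothetical helper (not part of e1000e): minimum gap between interrupts,
# in whole microseconds, for a fixed InterruptThrottleRate value.
itr_min_gap_us() {
    rate=$1
    # The documented fixed-rate range is 100-100000 interrupts per second.
    if [ "$rate" -lt 100 ] || [ "$rate" -gt 100000 ]; then
        echo "rate must be in 100-100000" >&2
        return 1
    fi
    echo $(( 1000000 / rate ))
}

itr_min_gap_us 8000     # 125 (microseconds between interrupts, at minimum)
```

A rate chosen this way would then be passed at load time, for example ``modprobe e1000e InterruptThrottleRate=8000``.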
|
||||
|
||||
NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
|
||||
RxAbsIntDelay parameters. In other words, minimizing the receive and/or
|
||||
transmit absolute delays does not force the controller to generate more
|
||||
interrupts than what the Interrupt Throttle Rate allows.
|
||||
|
||||
RxIntDelay
|
||||
----------
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 0
|
||||
|
||||
This value delays the generation of receive interrupts in units of 1.024
|
||||
microseconds. Receive interrupt reduction can improve CPU efficiency if
|
||||
properly tuned for specific network traffic. Increasing this value adds extra
|
||||
latency to frame reception and can end up decreasing the throughput of TCP
|
||||
traffic. If the system is reporting dropped receives, this value may be set
|
||||
too high, causing the driver to run out of available receive descriptors.
|
||||
|
||||
CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang
|
||||
(stop transmitting) under certain network conditions. If this occurs a NETDEV
|
||||
WATCHDOG message is logged in the system event log. In addition, the
|
||||
controller is automatically reset, restoring the network connection. To
|
||||
eliminate the potential for the hang ensure that RxIntDelay is set to 0.
|
||||
|
||||
RxAbsIntDelay
|
||||
-------------
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 8
|
||||
|
||||
This value, in units of 1.024 microseconds, limits the delay before a receive
interrupt is generated. It is useful only if RxIntDelay is non-zero, in which
case it ensures that an interrupt is generated within the set amount of time
after the initial packet is received. Proper tuning, along with RxIntDelay,
may improve traffic throughput in specific network conditions.
|
||||
|
||||
TxIntDelay
|
||||
----------
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 8
|
||||
|
||||
This value delays the generation of transmit interrupts in units of 1.024
|
||||
microseconds. Transmit interrupt reduction can improve CPU efficiency if
|
||||
properly tuned for specific network traffic. If the system is reporting
|
||||
dropped transmits, this value may be set too high causing the driver to run
|
||||
out of available transmit descriptors.
|
||||
|
||||
TxAbsIntDelay
|
||||
-------------
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 32
|
||||
|
||||
This value, in units of 1.024 microseconds, limits the delay before a
transmit interrupt is generated. It is useful only if TxIntDelay is non-zero,
in which case it ensures that an interrupt is generated within the set amount
of time after the initial packet is sent on the wire. Proper tuning, along
with TxIntDelay, may improve traffic throughput in specific network conditions.
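The four delay parameters above share the same unit of 1.024 microseconds. As a rough sketch, a value can be converted to nanoseconds with integer arithmetic only; the helper name ``delay_units_to_ns`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a delay-parameter value (units of 1.024 us)
# to nanoseconds using integer arithmetic only. 1.024 us = 1024 ns.
delay_units_to_ns() {
    units=$1
    # The documented valid range for these parameters is 0-65535.
    if [ "$units" -lt 0 ] || [ "$units" -gt 65535 ]; then
        echo "value must be in 0-65535" >&2
        return 1
    fi
    echo $(( units * 1024 ))
}

delay_units_to_ns 8     # default RxAbsIntDelay and TxIntDelay: 8192 ns
delay_units_to_ns 32    # default TxAbsIntDelay: 32768 ns
```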
|
||||
|
||||
copybreak
|
||||
---------
|
||||
:Valid Range: 0-xxxxxxx (0=off)
|
||||
:Default Value: 256
|
||||
|
||||
The driver copies all packets at or below this size to a fresh receive
buffer before handing them up the stack.
|
||||
This parameter differs from other parameters because it is a single (not 1,1,1
|
||||
etc.) parameter applied to all driver instances and it is also available
|
||||
during runtime at /sys/module/e1000e/parameters/copybreak.
|
||||
|
||||
To use copybreak, type::
|
||||
|
||||
modprobe e1000e copybreak=128
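Since copybreak is also exposed at runtime, it can be inspected or changed without reloading the driver. The sketch below assumes the sysfs file is writable by root; the ``COPYBREAK_PATH`` variable and the function names are illustrative only, and the path can be pointed at a scratch file for a dry run.

```shell
#!/bin/sh
# Sketch: read or update copybreak at runtime through sysfs.
# COPYBREAK_PATH is overridable so the functions can be tried on a scratch file.
COPYBREAK_PATH=${COPYBREAK_PATH:-/sys/module/e1000e/parameters/copybreak}

get_copybreak() {
    cat "$COPYBREAK_PATH"
}

set_copybreak() {
    # Writing normally requires root; the value applies to all driver instances.
    echo "$1" > "$COPYBREAK_PATH"
}
```

For example, ``set_copybreak 128`` followed by ``get_copybreak`` should report the new value.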
|
||||
|
||||
SmartPowerDownEnable
|
||||
--------------------
|
||||
:Valid Range: 0,1
|
||||
:Default Value: 0 (disabled)
|
||||
|
||||
Allows the PHY to turn off in lower power states. The user can turn off this
|
||||
parameter in supported chipsets.
|
||||
|
||||
KumeranLockLoss
|
||||
---------------
|
||||
:Valid Range: 0,1
|
||||
:Default Value: 1 (enabled)
|
||||
|
||||
This workaround skips resetting the PHY at shutdown for the initial silicon
|
||||
releases of ICH8 systems.
|
||||
|
||||
IntMode
|
||||
-------
|
||||
:Valid Range: 0-2
|
||||
:Default Value: 0
|
||||
|
||||
+-------+----------------+
|
||||
| Value | Interrupt Mode |
|
||||
+=======+================+
|
||||
| 0 | Legacy |
|
||||
+-------+----------------+
|
||||
| 1 | MSI |
|
||||
+-------+----------------+
|
||||
| 2 | MSI-X |
|
||||
+-------+----------------+
|
||||
|
||||
IntMode allows load-time control over the type of interrupt registered by
|
||||
the driver. MSI-X is required for multiple queue support, and some kernels and
|
||||
combinations of kernel .config options will force a lower level of interrupt
|
||||
support.
|
||||
|
||||
This command will show different values for each type of interrupt::
|
||||
|
||||
cat /proc/interrupts
|
||||
|
||||
CrcStripping
|
||||
------------
|
||||
:Valid Range: 0,1
|
||||
:Default Value: 1 (enabled)
|
||||
|
||||
Strip the CRC from received packets before sending them up the network stack. If
|
||||
you have a machine with a BMC enabled but cannot receive IPMI traffic after
|
||||
loading or enabling the driver, try disabling this feature.
|
||||
|
||||
WriteProtectNVM
|
||||
---------------
|
||||
:Valid Range: 0,1
|
||||
:Default Value: 1 (enabled)
|
||||
|
||||
If set to 1, configure the hardware to ignore all write/erase cycles to the
|
||||
GbE region in the ICHx NVM (in order to prevent accidental corruption of the
|
||||
NVM). This feature can be disabled by setting the parameter to 0 during initial
|
||||
driver load.
|
||||
|
||||
NOTE: The machine must be power cycled (full off/on) when enabling NVM writes
|
||||
via setting the parameter to zero. Once the NVM has been locked (via the
|
||||
parameter at 1 when the driver loads) it cannot be unlocked except via power
|
||||
cycle.
|
||||
|
||||
Debug
|
||||
-----
|
||||
:Valid Range: 0-16 (0=none,...,16=all)
|
||||
:Default Value: 0
|
||||
|
||||
This parameter adjusts the level of debug messages displayed in the system logs.
|
||||
|
||||
|
||||
Additional Features and Configurations
|
||||
======================================
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
|
||||
to a value larger than the default value of 1500.
|
||||
|
||||
Use the ifconfig command to increase the MTU size. For example, enter the
|
||||
following where <x> is the interface number::
|
||||
|
||||
ifconfig eth<x> mtu 9000 up
|
||||
|
||||
Alternatively, you can use the ip command as follows::
|
||||
|
||||
ip link set mtu 9000 dev eth<x>
|
||||
ip link set up dev eth<x>
|
||||
|
||||
This setting is not saved across reboots. The setting change can be made
|
||||
permanent by adding 'MTU=9000' to the file:
|
||||
|
||||
- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
|
||||
- For SLES: /etc/sysconfig/network/<config_file>
|
||||
|
||||
NOTE: The maximum MTU setting for Jumbo Frames is 8996. This value coincides
|
||||
with the maximum Jumbo Frames size of 9018 bytes.
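The relationship between the two figures in the note can be sketched as follows. The document only states the two totals; the breakdown of the 22-byte difference in the comment is an assumption, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: relate an MTU to the on-wire frame size.
# Assumption: the 22-byte difference is 14 (Ethernet header) + 4 (VLAN tag)
# + 4 (frame check sequence); the document only states the two totals.
FRAME_OVERHEAD=22

mtu_to_frame_size() {
    echo $(( $1 + FRAME_OVERHEAD ))
}

mtu_to_frame_size 8996   # 9018
mtu_to_frame_size 1500   # 1522
```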
|
||||
|
||||
NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
|
||||
poor performance or loss of link.
|
||||
|
||||
NOTE: The following adapters limit Jumbo Frames sized packets to a maximum of
|
||||
4088 bytes:
|
||||
|
||||
- Intel(R) 82578DM Gigabit Network Connection
|
||||
- Intel(R) 82577LM Gigabit Network Connection
|
||||
|
||||
The following adapters do not support Jumbo Frames:
|
||||
|
||||
- Intel(R) PRO/1000 Gigabit Server Adapter
|
||||
- Intel(R) PRO/1000 PM Network Connection
|
||||
- Intel(R) 82562G 10/100 Network Connection
|
||||
- Intel(R) 82562G-2 10/100 Network Connection
|
||||
- Intel(R) 82562GT 10/100 Network Connection
|
||||
- Intel(R) 82562GT-2 10/100 Network Connection
|
||||
- Intel(R) 82562V 10/100 Network Connection
|
||||
- Intel(R) 82562V-2 10/100 Network Connection
|
||||
- Intel(R) 82566DC Gigabit Network Connection
|
||||
- Intel(R) 82566DC-2 Gigabit Network Connection
|
||||
- Intel(R) 82566DM Gigabit Network Connection
|
||||
- Intel(R) 82566MC Gigabit Network Connection
|
||||
- Intel(R) 82566MM Gigabit Network Connection
|
||||
- Intel(R) 82567V-3 Gigabit Network Connection
|
||||
- Intel(R) 82577LC Gigabit Network Connection
|
||||
- Intel(R) 82578DC Gigabit Network Connection
|
||||
|
||||
NOTE: Jumbo Frames cannot be configured on an 82579-based Network device if
|
||||
MACSec is enabled on the system.
|
||||
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest ethtool
|
||||
version is required for this functionality. Download it at:
|
||||
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
NOTE: When validating enable/disable tests on some parts (for example, 82578),
|
||||
it is necessary to add a few seconds between tests when working with ethtool.
|
||||
|
||||
|
||||
Speed and Duplex Configuration
|
||||
------------------------------
|
||||
In addressing speed and duplex configuration issues, you need to distinguish
|
||||
between copper-based adapters and fiber-based adapters.
|
||||
|
||||
In the default mode, an Intel(R) Ethernet Network Adapter using copper
|
||||
connections will attempt to auto-negotiate with its link partner to determine
|
||||
the best setting. If the adapter cannot establish link with the link partner
|
||||
using auto-negotiation, you may need to manually configure the adapter and link
|
||||
partner to identical settings to establish link and pass packets. This should
|
||||
only be needed when attempting to link with an older switch that does not
|
||||
support auto-negotiation or one that has been forced to a specific speed or
|
||||
duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
|
||||
and higher cannot be forced. Use the autonegotiation advertising setting to
|
||||
manually set devices for 1 Gbps and higher.
|
||||
|
||||
Speed, duplex, and autonegotiation advertising are configured through the
|
||||
ethtool utility.
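The two cases described above (forcing 10 or 100 Mbps, and restricting what is advertised for 1 Gbps) map to ethtool invocations along the following lines. This is a sketch that only prints the commands; the function names are hypothetical, and the advertise mask 0x020 (1000baseT Full) is taken from the ethtool manual page rather than from this document.

```shell
#!/bin/sh
# Sketch: print (not run) the ethtool commands for the two cases described
# above. The interface name is a placeholder supplied by the caller.

# Force 10 or 100 Mbps with autonegotiation off.
force_link_cmd() {
    dev=$1 speed=$2 duplex=$3
    case $speed in
        10|100) ;;
        *) echo "only 10 or 100 Mbps can be forced" >&2; return 1 ;;
    esac
    echo "ethtool -s $dev speed $speed duplex $duplex autoneg off"
}

# Restrict advertisement to 1000baseT/Full (mask 0x020 in ethtool(8)).
advertise_gigabit_cmd() {
    echo "ethtool -s $1 autoneg on advertise 0x020"
}
```

The link partner must be configured to match before either command is applied.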
|
||||
|
||||
Caution: Only experienced network administrators should force speed and duplex
|
||||
or change autonegotiation advertising manually. The settings at the switch must
|
||||
always match the adapter settings. Adapter performance may suffer or your
|
||||
adapter may not operate if you configure the adapter differently from your
|
||||
switch.
|
||||
|
||||
An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
|
||||
will not attempt to auto-negotiate with its link partner since those adapters
|
||||
operate only in full duplex and only at their native speed.
|
||||
|
||||
|
||||
Enabling Wake on LAN (WoL)
|
||||
--------------------------
|
||||
WoL is configured through the ethtool utility.
|
||||
|
||||
WoL will be enabled on the system during the next shut down or reboot. For
|
||||
this driver version, in order to enable WoL, the e1000e driver must be loaded
|
||||
prior to shutting down or suspending the system.
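As an illustration, magic-packet wake-up is typically requested with ``ethtool -s eth<x> wol g``, and the active setting can be read back from the ``ethtool eth<x>`` output. The parsing helper below is a sketch with a made-up name; it assumes the usual "Wake-on:" line in that output.

```shell
#!/bin/sh
# Sketch: extract the active Wake-on setting from `ethtool <dev>` output
# read on stdin. The "Supports Wake-on:" line is skipped because its first
# field is "Supports", not "Wake-on:".
current_wol() {
    awk '$1 == "Wake-on:" { print $2 }'
}
```

Typical use would be ``ethtool eth0 | current_wol``, which prints ``g`` when magic-packet wake-up is active and ``d`` when WoL is disabled.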
|
||||
|
||||
NOTE: Wake on LAN is only supported on port A for the following devices:

- Intel(R) PRO/1000 PT Dual Port Network Connection
|
||||
- Intel(R) PRO/1000 PT Dual Port Server Connection
|
||||
- Intel(R) PRO/1000 PT Dual Port Server Adapter
|
||||
- Intel(R) PRO/1000 PF Dual Port Server Adapter
|
||||
- Intel(R) PRO/1000 PT Quad Port Server Adapter
|
||||
- Intel(R) Gigabit PT Quad Port Server ExpressModule
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
142
Documentation/networking/device_drivers/ethernet/intel/fm10k.rst
Normal file
@@ -0,0 +1,142 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=============================================================
|
||||
Linux Base Driver for Intel(R) Ethernet Multi-host Controller
|
||||
=============================================================
|
||||
|
||||
August 20, 2018
|
||||
Copyright(c) 2015-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
- Identifying Your Adapter
|
||||
- Additional Configurations
|
||||
- Performance Tuning
|
||||
- Known Issues
|
||||
- Support
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
The driver in this release is compatible with devices based on the Intel(R)
|
||||
Ethernet Multi-host Controller.
|
||||
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
http://www.intel.com/support
|
||||
|
||||
|
||||
Flow Control
|
||||
------------
|
||||
The Intel(R) Ethernet Switch Host Interface Driver does not support Flow
|
||||
Control. It will not send pause frames. This may result in dropped frames.
|
||||
|
||||
|
||||
Virtual Functions (VFs)
|
||||
-----------------------
|
||||
Use sysfs to enable VFs.
|
||||
Valid Range: 0-64
|
||||
|
||||
For example::
|
||||
|
||||
echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs  # enable VFs
echo 0 > /sys/class/net/$dev/device/sriov_numvfs  # disable VFs
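A small wrapper can check the requested count against the documented 0-64 range before touching sysfs. This is a sketch only; ``set_num_vfs`` and the ``SYSFS_NET`` override are hypothetical names introduced here so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: validate the requested VF count against the documented 0-64 range
# before writing it to sysfs. SYSFS_NET is overridable for dry runs.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

set_num_vfs() {
    dev=$1 count=$2
    if [ "$count" -lt 0 ] || [ "$count" -gt 64 ]; then
        echo "VF count must be in 0-64" >&2
        return 1
    fi
    echo "$count" > "$SYSFS_NET/$dev/device/sriov_numvfs"
}
```

For example, ``set_num_vfs eth0 4`` enables four VFs and ``set_num_vfs eth0 0`` disables them again.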
|
||||
|
||||
NOTE: Neither the device nor the driver controls how VFs are mapped into config
|
||||
space. Bus layout will vary by operating system. On operating systems that
|
||||
support it, you can check sysfs to find the mapping.
|
||||
|
||||
NOTE: When SR-IOV mode is enabled, hardware VLAN filtering and VLAN tag
|
||||
stripping/insertion will remain enabled. Please remove the old VLAN filter
|
||||
before the new VLAN filter is added. For example::
|
||||
|
||||
ip link set eth0 vf 0 vlan 100  # set vlan 100 for VF 0
ip link set eth0 vf 0 vlan 0    # delete vlan 100
ip link set eth0 vf 0 vlan 200  # set a new vlan 200 for VF 0
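The remove-then-add order can be captured in a helper so the old filter is never left in place. The sketch below only prints the commands; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the two-step sequence for moving a VF to a new VLAN,
# clearing the old filter (vlan 0) before adding the new one.
replace_vf_vlan_cmds() {
    dev=$1 vf=$2 new_vlan=$3
    echo "ip link set $dev vf $vf vlan 0"
    echo "ip link set $dev vf $vf vlan $new_vlan"
}

replace_vf_vlan_cmds eth0 0 200
```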
|
||||
|
||||
|
||||
Additional Features and Configurations
|
||||
======================================
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
|
||||
to a value larger than the default value of 1500.
|
||||
|
||||
Use the ifconfig command to increase the MTU size. For example, enter the
|
||||
following where <x> is the interface number::
|
||||
|
||||
ifconfig eth<x> mtu 9000 up
|
||||
|
||||
Alternatively, you can use the ip command as follows::
|
||||
|
||||
ip link set mtu 9000 dev eth<x>
|
||||
ip link set up dev eth<x>
|
||||
|
||||
This setting is not saved across reboots. The setting change can be made
|
||||
permanent by adding 'MTU=9000' to the file:
|
||||
|
||||
- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
|
||||
- For SLES: /etc/sysconfig/network/<config_file>
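Adding the MTU line by hand is easy to get wrong if the file already contains one. The following sketch replaces an existing ``MTU=`` line or appends a new one; ``persist_mtu`` is a hypothetical helper, and the file path is whatever configuration file your distribution uses.

```shell
#!/bin/sh
# Sketch: set MTU=<value> in an ifcfg-style file, replacing an existing
# MTU= line or appending one. The file path is supplied by the caller.
persist_mtu() {
    file=$1 mtu=$2
    if grep -q '^MTU=' "$file"; then
        # Rewrite through a temporary file to stay portable across sed variants.
        sed "s/^MTU=.*/MTU=$mtu/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        echo "MTU=$mtu" >> "$file"
    fi
}
```

For example, ``persist_mtu /etc/sysconfig/network-scripts/ifcfg-eth0 9000`` on a RHEL-style system.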
|
||||
|
||||
NOTE: The maximum MTU setting for Jumbo Frames is 15342. This value coincides
|
||||
with the maximum Jumbo Frames size of 15364 bytes.
|
||||
|
||||
NOTE: This driver will attempt to use multiple page sized buffers to receive
|
||||
each jumbo packet. This should help to avoid buffer starvation issues when
|
||||
allocating receive packets.
|
||||
|
||||
|
||||
Generic Receive Offload, aka GRO
|
||||
--------------------------------
|
||||
The driver supports the in-kernel software implementation of GRO. GRO has
|
||||
shown that by coalescing Rx traffic into larger chunks of data, CPU
|
||||
utilization can be significantly reduced when under large Rx load. GRO is an
|
||||
evolution of the previously-used LRO interface. GRO is able to coalesce
|
||||
other protocols besides TCP. It's also safe to use with configurations that
|
||||
are problematic for LRO, namely bridging and iSCSI.
|
||||
|
||||
|
||||
|
||||
Supported ethtool Commands and Options for Filtering
|
||||
----------------------------------------------------
|
||||
-n --show-nfc
|
||||
Retrieves the receive network flow classification configurations.
|
||||
|
||||
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
|
||||
Retrieves the hash options for the specified network traffic type.
|
||||
|
||||
-N --config-nfc
|
||||
Configures the receive network flow classification.
|
||||
|
||||
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r
|
||||
Configures the hash options for the specified network traffic type.
|
||||
|
||||
- udp4: UDP over IPv4
|
||||
- udp6: UDP over IPv6
|
||||
- f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet.
|
||||
- n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet.
|
||||
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS under Linux KVM
|
||||
-------------------------------------------------------------------------------------
|
||||
KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This
|
||||
includes traditional PCIe devices, as well as SR-IOV-capable devices based on
|
||||
the Intel Ethernet Controller XL710.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
771
Documentation/networking/device_drivers/ethernet/intel/i40e.rst
Normal file
@@ -0,0 +1,771 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=================================================================
|
||||
Linux Base Driver for the Intel(R) Ethernet Controller 700 Series
|
||||
=================================================================
|
||||
|
||||
Intel 40 Gigabit Linux driver.
|
||||
Copyright(c) 1999-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Overview
|
||||
- Identifying Your Adapter
|
||||
- Intel(R) Ethernet Flow Director
|
||||
- Additional Configurations
|
||||
- Known Issues
|
||||
- Support
|
||||
|
||||
|
||||
Driver information can be obtained using ethtool, lspci, and ifconfig.
|
||||
Instructions on updating ethtool can be found in the section Additional
|
||||
Configurations later in this document.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your Intel adapter. All hardware requirements listed apply to use
|
||||
with Linux.
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
The driver is compatible with devices based on the following:
|
||||
|
||||
* Intel(R) Ethernet Controller X710
|
||||
* Intel(R) Ethernet Controller XL710
|
||||
* Intel(R) Ethernet Network Connection X722
|
||||
* Intel(R) Ethernet Controller XXV710
|
||||
|
||||
For the best performance, make sure the latest NVM/FW is installed on your
|
||||
device.
|
||||
|
||||
For information on how to identify your adapter, and for the latest NVM/FW
|
||||
images and Intel network drivers, refer to the Intel Support website:
|
||||
https://www.intel.com/support
|
||||
|
||||
SFP+ and QSFP+ Devices
|
||||
----------------------
|
||||
For information about supported media, refer to this document:
|
||||
https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf
|
||||
|
||||
NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only
|
||||
support Intel Ethernet Optics modules. On these adapters, other modules are not
|
||||
supported and will not function. In all cases Intel recommends using Intel
|
||||
Ethernet Optics; other modules may function but are not validated by Intel.
|
||||
Contact Intel for supported media types.
|
||||
|
||||
NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support
|
||||
is dependent on your system board. Please see your vendor for details.
|
||||
|
||||
NOTE: In systems that do not have adequate airflow to cool the adapter and
|
||||
optical modules, you must use high temperature optical modules.
|
||||
|
||||
Virtual Functions (VFs)
|
||||
-----------------------
|
||||
Use sysfs to enable VFs. For example::
|
||||
|
||||
# echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs  # enable VFs
# echo 0 > /sys/class/net/$dev/device/sriov_numvfs  # disable VFs
|
||||
|
||||
For example, the following instructions will configure PF eth0 and the first VF
|
||||
on VLAN 10::
|
||||
|
||||
$ ip link set dev eth0 vf 0 vlan 10
|
||||
|
||||
VLAN Tag Packet Steering
|
||||
------------------------
|
||||
Allows you to send all packets with a specific VLAN tag to a particular SR-IOV
|
||||
virtual function (VF). Further, this feature allows you to designate a
|
||||
particular VF as trusted, and allows that trusted VF to request selective
|
||||
promiscuous mode on the Physical Function (PF).
|
||||
|
||||
To set a VF as trusted or untrusted, enter the following command in the
|
||||
Hypervisor::
|
||||
|
||||
# ip link set dev eth0 vf 1 trust [on|off]
|
||||
|
||||
Once the VF is designated as trusted, use the following commands in the VM to
|
||||
set the VF to promiscuous mode.
|
||||
|
||||
::
|
||||
|
||||
For promiscuous all:
|
||||
#ip link set eth2 promisc on
|
||||
Where eth2 is a VF interface in the VM
|
||||
|
||||
For promiscuous Multicast:
|
||||
#ip link set eth2 allmulticast on
|
||||
Where eth2 is a VF interface in the VM
|
||||
|
||||
NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to
"off", meaning that promiscuous mode for the VF will be limited. To set the
|
||||
promiscuous mode for the VF to true promiscuous and allow the VF to see all
|
||||
ingress traffic, use the following command::
|
||||
|
||||
# ethtool --set-priv-flags p261p1 vf-true-promisc-support on
|
||||
|
||||
The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather,
|
||||
it designates which type of promiscuous mode (limited or true) you will get
|
||||
when you enable promiscuous mode using the ip link commands above. Note that
|
||||
this is a global setting that affects the entire device. However, the
|
||||
vf-true-promisc-support priv-flag is only exposed to the first PF of the
|
||||
device. The PF remains in limited promiscuous mode (unless it is in MFP mode)
|
||||
regardless of the vf-true-promisc-support setting.
|
||||
|
||||
Now add a VLAN interface on the VF interface::
|
||||
|
||||
#ip link add link eth2 name eth2.100 type vlan id 100
|
||||
|
||||
Note that the order in which you set the VF to promiscuous mode and add the
|
||||
VLAN interface does not matter (you can do either first). The end result in
|
||||
this example is that the VF will get all traffic that is tagged with VLAN 100.
|
||||
|
||||
Intel(R) Ethernet Flow Director
|
||||
-------------------------------
|
||||
The Intel Ethernet Flow Director performs the following tasks:
|
||||
|
||||
- Directs receive packets according to their flows to different queues.
|
||||
- Enables tight control on routing a flow in the platform.
|
||||
- Matches flows and CPU cores for flow affinity.
|
||||
- Supports multiple parameters for flexible flow classification and load
|
||||
balancing (in SFP mode only).
|
||||
|
||||
NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and
|
||||
UDPv4. For a given flow type, it supports valid combinations of IP addresses
|
||||
(source or destination) and UDP/TCP ports (source and destination). For
|
||||
example, you can supply only a source IP address, a source IP address and a
|
||||
destination port, or any combination of one or more of these four parameters.
|
||||
|
||||
NOTE: The Linux i40e driver allows you to filter traffic based on a
|
||||
user-defined flexible two-byte pattern and offset by using the ethtool user-def
|
||||
and mask fields. Only L3 and L4 flow types are supported for user-defined
|
||||
flexible filters. For a given flow type, you must clear all Intel Ethernet Flow
|
||||
Director filters before changing the input set (for that flow type).
|
||||
|
||||
To enable or disable the Intel Ethernet Flow Director::
|
||||
|
||||
# ethtool -K ethX ntuple <on|off>
|
||||
|
||||
When disabling ntuple filters, all the user programmed filters are flushed from
|
||||
the driver cache and hardware. All needed filters must be re-added when ntuple
|
||||
is re-enabled.
|
||||
|
||||
To add a filter that directs packets to queue 2, use the -U or -N switch::
|
||||
|
||||
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
|
||||
192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
|
||||
|
||||
To set a filter using only the source and destination IP address::
|
||||
|
||||
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
|
||||
192.168.10.2 action 2 [loc 1]
|
||||
|
||||
To see the list of filters currently present::
|
||||
|
||||
# ethtool <-u|-n> ethX
|
||||
|
||||
Application Targeted Routing (ATR) Perfect Filters
|
||||
--------------------------------------------------
|
||||
ATR is enabled by default when the kernel is in multiple transmit queue mode.
|
||||
An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow
|
||||
starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow
|
||||
Director rule is added from ethtool (Sideband filter), ATR is turned off by the
|
||||
driver. To re-enable ATR, the sideband can be disabled with the ethtool -K
|
||||
option. For example::
|
||||
|
||||
ethtool -K [adapter] ntuple [off|on]
|
||||
|
||||
If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a
|
||||
TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is
|
||||
automatically re-enabled.
|
||||
|
||||
Packets that match the ATR rules are counted in fdir_atr_match stats in
|
||||
ethtool, which also can be used to verify whether ATR rules still exist.
|
||||
|
||||
Sideband Perfect Filters
|
||||
------------------------
|
||||
Sideband Perfect Filters are used to direct traffic that matches specified
|
||||
characteristics. They are enabled through ethtool's ntuple interface. To add a
|
||||
new filter use the following command::
|
||||
|
||||
ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
|
||||
dst-port <port> action <queue>
|
||||
|
||||
Where:
|
||||
<device> - the ethernet device to program
|
||||
<type> - can be ip4, tcp4, udp4, or sctp4
|
||||
<ip> - the ip address to match on
|
||||
<port> - the port number to match on
|
||||
<queue> - the queue to direct traffic towards (-1 discards matching traffic)
|
||||
|
||||
Use the following command to display all of the active filters::
|
||||
|
||||
ethtool -u <device>
|
||||
|
||||
Use the following command to delete a filter::
|
||||
|
||||
ethtool -U <device> delete <N>
|
||||
|
||||
Where <N> is the filter id displayed when printing all the active filters, and
|
||||
may also have been specified using "loc <N>" when adding the filter.
|
||||
|
||||
The following example matches TCP traffic sent from 192.168.0.1, port 5300,
|
||||
directed to 192.168.0.5, port 80, and sends it to queue 7::
|
||||
|
||||
ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
|
||||
src-port 5300 dst-port 80 action 7
|
||||
|
||||
For each flow-type, the programmed filters must all have the same matching
|
||||
input set. For example, issuing the following two commands is acceptable::
|
||||
|
||||
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
|
||||
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
|
||||
|
||||
Issuing the next two commands, however, is not acceptable, since the first
|
||||
specifies src-ip and the second specifies dst-ip::
|
||||
|
||||
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
|
||||
ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
|
||||
|
||||
The second command will fail with an error. You may program multiple filters
|
||||
with the same fields, using different values, but, on one device, you may not
|
||||
program two tcp4 filters with different matching fields.
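The same-input-set rule can be checked locally before a filter is programmed. The sketch below compares which address and port fields two filter specifications use; it is a pre-check written for illustration, not a driver or ethtool feature, and it ignores the user-def offset.

```shell
#!/bin/sh
# Sketch: derive the "input set" (which match fields are present) from the
# arguments of an ethtool -U command, so two filters can be compared
# before they are programmed. This is a local pre-check, not a driver API.
input_set() {
    for word in "$@"; do
        case $word in
            src-ip|dst-ip|src-port|dst-port) echo "$word" ;;
        esac
    done | sort | tr '\n' ' '
}

same_input_set() {
    # $1 and $2 are whole filter specifications, split on whitespace here.
    [ "$(input_set $1)" = "$(input_set $2)" ]
}
```

Called with the two acceptable specifications above, ``same_input_set`` succeeds; with the src-ip/dst-ip pair it fails, mirroring the error the second command would produce.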
|
||||
|
||||
Matching on a sub-portion of a field is not supported by the i40e driver, thus
|
||||
partial mask fields are not supported.
|
||||
|
||||
The driver also supports matching user-defined data within the packet payload.
|
||||
This flexible data is specified using the "user-def" field of the ethtool
|
||||
command in the following way:
|
||||
|
||||
+----------------------------+--------------------------+
|
||||
| 31 28 24 20 16 | 15 12 8 4 0 |
|
||||
+----------------------------+--------------------------+
|
||||
| offset into packet payload | 2 bytes of flexible data |
|
||||
+----------------------------+--------------------------+
|
||||
|
||||
For example,
|
||||
|
||||
::
|
||||
|
||||
... user-def 0x4FFFF ...
|
||||
|
||||
tells the filter to look 4 bytes into the payload and match that value against
|
||||
0xFFFF. The offset is based on the beginning of the payload, and not the
|
||||
beginning of the packet. Thus
|
||||
|
||||
::
|
||||
|
||||
flow-type tcp4 ... user-def 0x8BEAF ...
|
||||
|
||||
would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
|
||||
TCP/IPv4 payload.
|
||||
|
||||
Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
|
||||
Thus to match the first byte of the payload, you must actually add 4 bytes to
|
||||
the offset. Also note that ip4 filters match both ICMP frames as well as raw
|
||||
(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame.
|
||||
|
||||
The maximum offset is 64. The hardware will only read up to 64 bytes of data
|
||||
from the payload. The offset must be even because the flexible data is 2 bytes
|
||||
long and must be aligned to byte 0 of the packet payload.
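The user-def layout and its constraints can be expressed as a short helper. This is a sketch under the rules stated above (even offset, at most 64, two bytes of data); ``make_user_def`` is a made-up name.

```shell
#!/bin/sh
# Sketch: build the user-def value from a payload offset and two bytes of
# data, following the layout above (offset in bits 31-16, data in bits 15-0).
make_user_def() {
    offset=$(( $1 )) data=$(( $2 ))
    if [ "$offset" -lt 0 ] || [ "$offset" -gt 64 ] || [ $(( offset % 2 )) -ne 0 ]; then
        echo "offset must be even and in 0-64" >&2
        return 1
    fi
    if [ "$data" -lt 0 ] || [ "$data" -gt 65535 ]; then
        echo "data must fit in 2 bytes" >&2
        return 1
    fi
    printf '0x%X\n' $(( (offset << 16) | data ))
}

make_user_def 4 0xFFFF   # 0x4FFFF
make_user_def 8 0xBEAF   # 0x8BEAF
```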
|
||||
|
||||
The user-defined flexible offset is also considered part of the input set and
|
||||
cannot be programmed separately for multiple filters of the same type. However,
|
||||
the flexible data is not part of the input set and multiple filters may use the
|
||||
same offset but match against different data.
|
||||
|
||||
To create filters that direct traffic to a specific Virtual Function, use the
"action" parameter. Specify the action as a 64-bit value, where the lower 32
bits represent the queue number and the next 8 bits represent the VF.
|
||||
Note that 0 is the PF, so the VF identifier is offset by 1. For example::
|
||||
|
||||
... action 0x800000002 ...
|
||||
|
||||
specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
|
||||
that VF.
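The encoding of the action value can be sketched as follows, assuming a shell with 64-bit arithmetic (as bash and dash provide on 64-bit systems). The function name ``make_vf_action`` is hypothetical.

```shell
#!/bin/sh
# Sketch: encode the 64-bit "action" value for steering to a VF queue.
# Lower 32 bits: queue number. Next 8 bits: VF number plus 1 (0 means the PF).
make_vf_action() {
    vf=$1 queue=$2
    printf '0x%X\n' $(( ((vf + 1) << 32) | queue ))
}

make_vf_action 7 2   # 0x800000002
```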
|
||||
|
||||
Note that these filters will not break internal routing rules, and will not
|
||||
route traffic that otherwise would not have been sent to the specified Virtual
|
||||
Function.
|
||||
|
||||
Setting the link-down-on-close Private Flag
|
||||
-------------------------------------------
|
||||
When the link-down-on-close private flag is set to "on", the port's link will
|
||||
go down when the interface is brought down using the ifconfig ethX down command.
|
||||
|
||||
Use ethtool to view and set link-down-on-close, as follows::
|
||||
|
||||
ethtool --show-priv-flags ethX
|
||||
ethtool --set-priv-flags ethX link-down-on-close [on|off]
|
||||
|
||||
Viewing Link Messages
|
||||
---------------------
|
||||
Link messages will not be displayed to the console if the distribution is
|
||||
restricting system messages. In order to see network driver link messages on
|
||||
your console, set the console log level to eight by entering the following::
|
||||
|
||||
dmesg -n 8
|
||||
|
||||
NOTE: This setting is not saved across reboots.
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
|
||||
to a value larger than the default value of 1500.
|
||||
|
||||
Use the ifconfig command to increase the MTU size. For example, enter the
|
||||
following where <x> is the interface number::
|
||||
|
||||
ifconfig eth<x> mtu 9000 up
|
||||
|
||||
Alternatively, you can use the ip command as follows::
|
||||
|
||||
ip link set mtu 9000 dev eth<x>
|
||||
ip link set up dev eth<x>
|
||||
|
||||
This setting is not saved across reboots. The setting change can be made
|
||||
permanent by adding 'MTU=9000' to the file::
|
||||
|
||||
/etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
|
||||
/etc/sysconfig/network/<config_file> // for SLES
|
||||
|
||||
NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides
|
||||
with the maximum Jumbo Frames size of 9728 bytes.
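The 26-byte gap between the two figures can be checked with a short calculation. The breakdown used here (a 14-byte Ethernet header, a 4-byte FCS, and two 4-byte VLAN tags) is an assumption made for illustration; the text above only states the two totals.

```shell
#!/bin/sh
# Hypothetical helper: estimate the on-wire frame size for a given MTU.
# Assumed overhead: 14-byte Ethernet header + 4-byte FCS + two 4-byte VLAN tags.
frame_size_for_mtu() {
    mtu=$1
    echo $(( mtu + 14 + 4 + 2 * 4 ))
}

frame_size_for_mtu 9702
```

Under that assumption an MTU of 9702 yields a frame of 9728 bytes.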
|
||||
|
||||
NOTE: This driver will attempt to use multiple page sized buffers to receive
|
||||
each jumbo packet. This should help to avoid buffer starvation issues when
|
||||
allocating receive packets.
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest ethtool
|
||||
version is required for this functionality. Download it at:
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
Supported ethtool Commands and Options for Filtering
|
||||
----------------------------------------------------
|
||||
-n --show-nfc
|
||||
Retrieves the receive network flow classification configurations.
|
||||
|
||||
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
|
||||
Retrieves the hash options for the specified network traffic type.
|
||||
|
||||
-N --config-nfc
|
||||
Configures the receive network flow classification.
|
||||
|
||||
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r...
|
||||
Configures the hash options for the specified network traffic type.
|
||||
|
||||
udp4 UDP over IPv4
|
||||
udp6 UDP over IPv6
|
||||
|
||||
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
|
||||
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
|
||||
|
||||
Speed and Duplex Configuration
|
||||
------------------------------
|
||||
In addressing speed and duplex configuration issues, you need to distinguish
|
||||
between copper-based adapters and fiber-based adapters.
|
||||
|
||||
In the default mode, an Intel(R) Ethernet Network Adapter using copper
|
||||
connections will attempt to auto-negotiate with its link partner to determine
|
||||
the best setting. If the adapter cannot establish link with the link partner
|
||||
using auto-negotiation, you may need to manually configure the adapter and link
|
||||
partner to identical settings to establish link and pass packets. This should
|
||||
only be needed when attempting to link with an older switch that does not
|
||||
support auto-negotiation or one that has been forced to a specific speed or
|
||||
duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
|
||||
and higher cannot be forced. Use the autonegotiation advertising setting to
|
||||
manually set devices for 1 Gbps and higher.
|
||||
|
||||
NOTE: You cannot set the speed for devices based on the Intel(R) Ethernet
|
||||
Network Adapter XXV710.
|
||||
|
||||
Speed, duplex, and autonegotiation advertising are configured through the
|
||||
ethtool utility.
|
||||
|
||||
Caution: Only experienced network administrators should force speed and duplex
|
||||
or change autonegotiation advertising manually. The settings at the switch must
|
||||
always match the adapter settings. Adapter performance may suffer or your
|
||||
adapter may not operate if you configure the adapter differently from your
|
||||
switch.
|
||||
|
||||
An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
|
||||
will not attempt to auto-negotiate with its link partner since those adapters
|
||||
operate only in full duplex and only at their native speed.
|
||||
|
||||
NAPI
|
||||
----
|
||||
NAPI (Rx polling mode) is supported in the i40e driver.
|
||||
For more information on NAPI, see
|
||||
https://wiki.linuxfoundation.org/networking/napi
|
||||
|
||||
Flow Control
|
||||
------------
|
||||
Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
|
||||
receiving and transmitting pause frames for i40e. When transmit is enabled,
|
||||
pause frames are generated when the receive packet buffer crosses a predefined
|
||||
threshold. When receive is enabled, the transmit unit will halt for the time
|
||||
delay specified when a pause frame is received.
|
||||
|
||||
NOTE: You must have a flow control capable link partner.
|
||||
|
||||
Flow Control is on by default.
|
||||
|
||||
Use ethtool to change the flow control settings.
|
||||
|
||||
To enable or disable Rx or Tx Flow Control::
|
||||
|
||||
ethtool -A eth? rx <on|off> tx <on|off>
|
||||
|
||||
Note: This command only enables or disables Flow Control if auto-negotiation is
|
||||
disabled. If auto-negotiation is enabled, this command changes the parameters
|
||||
used for auto-negotiation with the link partner.
|
||||
|
||||
To enable or disable auto-negotiation::
|
||||
|
||||
ethtool -s eth? autoneg <on|off>
|
||||
|
||||
Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
|
||||
on your device, you may not be able to change the auto-negotiation setting.
|
||||
|
||||
RSS Hash Flow
|
||||
-------------
|
||||
Allows you to set the hash bytes per flow type and any combination of one or
|
||||
more options for Receive Side Scaling (RSS) hash byte configuration.
|
||||
|
||||
::
|
||||
|
||||
# ethtool -N <dev> rx-flow-hash <type> <option>
|
||||
|
||||
Where <type> is:
|
||||
tcp4 signifying TCP over IPv4
|
||||
udp4 signifying UDP over IPv4
|
||||
tcp6 signifying TCP over IPv6
|
||||
udp6 signifying UDP over IPv6
|
||||
And <option> is one or more of:
|
||||
s Hash on the IP source address of the Rx packet.
|
||||
d Hash on the IP destination address of the Rx packet.
|
||||
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
|
||||
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
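The following sketch assembles an rx-flow-hash command line from the types and options listed above and rejects anything else. ``rss_hash_cmd`` and the device name eth0 are hypothetical; the helper only prints the command so that it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: validate a flow type and hash options, then print
# (not run) the matching ethtool command.
rss_hash_cmd() {
    dev=$1
    type=$2
    opts=$3

    # Only the flow types listed in this section are accepted.
    case $type in
        tcp4|udp4|tcp6|udp6) ;;
        *) echo "unsupported flow type: $type" >&2; return 1 ;;
    esac

    # Options may be any non-empty combination of s, d, f and n.
    case $opts in
        ''|*[!sdfn]*) echo "unsupported hash option in: $opts" >&2; return 1 ;;
    esac

    echo "ethtool -N $dev rx-flow-hash $type $opts"
}

rss_hash_cmd eth0 udp4 sdfn
```

The example hashes UDP over IPv4 on source and destination address plus both pairs of Layer 4 header bytes.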
|
||||
|
||||
MAC and VLAN anti-spoofing feature
|
||||
----------------------------------
|
||||
When a malicious driver attempts to send a spoofed packet, it is dropped by the
|
||||
hardware and not transmitted.
|
||||
NOTE: This feature can be disabled for a specific Virtual Function (VF)::
|
||||
|
||||
ip link set <pf dev> vf <vf id> spoofchk {off|on}
|
||||
|
||||
IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC)
|
||||
------------------------------------------------------------
|
||||
Precision Time Protocol (PTP) is used to synchronize clocks in a computer
|
||||
network. PTP support varies among Intel devices that support this driver. Use
|
||||
"ethtool -T <netdev name>" to get a definitive list of PTP capabilities
|
||||
supported by the device.
|
||||
|
||||
IEEE 802.1ad (QinQ) Support
|
||||
---------------------------
|
||||
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
|
||||
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
|
||||
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
|
||||
allow L2 tunneling and the ability to segregate traffic within a particular
|
||||
VLAN ID, among other uses.
|
||||
|
||||
The following are examples of how to configure 802.1ad (QinQ)::
|
||||
|
||||
ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
|
||||
ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
|
||||
|
||||
Where "24" and "371" are example VLAN IDs.
|
||||
|
||||
NOTES:
|
||||
Receive checksum offloads, cloud filters, and VLAN acceleration are not
|
||||
supported for 802.1ad (QinQ) packets.
|
||||
|
||||
VXLAN and GENEVE Overlay HW Offloading
|
||||
--------------------------------------
|
||||
Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3
|
||||
network, which may be useful in a virtualized or cloud environment. Some
|
||||
Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from
|
||||
the operating system. This reduces CPU utilization.
|
||||
|
||||
VXLAN offloading is controlled by the Tx and Rx checksum offload options
|
||||
provided by ethtool. That is, if Tx checksum offload is enabled, and the
|
||||
adapter has the capability, VXLAN offloading is also enabled.
|
||||
|
||||
Support for VXLAN and GENEVE HW offloading is dependent on kernel support of
|
||||
the HW offloading features.
|
||||
|
||||
Multiple Functions per Port
|
||||
---------------------------
|
||||
Some adapters based on the Intel Ethernet Controller X710/XL710 support
|
||||
multiple functions on a single physical port. Configure these functions through
|
||||
the System Setup/BIOS.
|
||||
|
||||
Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as
|
||||
a percentage of the full physical port link speed, that the partition will
|
||||
receive. The bandwidth the partition is awarded will never fall below the level
|
||||
you specify.
|
||||
|
||||
The range for the minimum bandwidth values is:
|
||||
1 to ((100 minus # of partitions on the physical port) plus 1)
|
||||
For example, if a physical port has 4 partitions, the range would be:
|
||||
1 to ((100 - 4) + 1 = 97)
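The same calculation can be scripted for any partition count. ``min_bandwidth_upper_bound`` is a hypothetical helper name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: upper bound of the Minimum TX Bandwidth range,
# in percent, for a port split into the given number of partitions.
min_bandwidth_upper_bound() {
    partitions=$1
    echo $(( (100 - partitions) + 1 ))
}

min_bandwidth_upper_bound 4
```

With 4 partitions the helper prints 97, so valid minimum values run from 1 to 97.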
|
||||
|
||||
The Maximum Bandwidth percentage represents the maximum transmit bandwidth
|
||||
allocated to the partition as a percentage of the full physical port link
|
||||
speed. The accepted range of values is 1-100. The value is used as a limiter,
|
||||
should you choose that any one particular function not be able to consume 100%
|
||||
of a port's bandwidth (should it be available). The sum of all the values for
|
||||
Maximum Bandwidth is not restricted, because no more than 100% of a port's
|
||||
bandwidth can ever be used.
|
||||
|
||||
NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions
|
||||
per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says
"add vsi failed for VF N, aq_err 16". To work around the issue, enable fewer than
|
||||
64 virtual functions (VFs).
|
||||
|
||||
Data Center Bridging (DCB)
|
||||
--------------------------
|
||||
DCB is a configurable Quality of Service implementation in hardware. It uses
|
||||
the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
|
||||
different priorities that traffic can be filtered into. It also enables
|
||||
priority flow control (802.1Qbb) which can limit or eliminate the number of
|
||||
dropped packets during network stress. Bandwidth can be allocated to each of
|
||||
these priorities, which is enforced at the hardware level (802.1Qaz).
|
||||
|
||||
Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
|
||||
802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
|
||||
and can accept settings from a DCBX capable peer. Software configuration of
|
||||
DCBX parameters via dcbtool/lldptool is not supported.
|
||||
|
||||
NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp.
|
||||
|
||||
The i40e driver implements the DCB netlink interface layer to allow user-space
|
||||
to communicate with the driver and query DCB configuration for the port.
|
||||
|
||||
NOTE:
|
||||
The kernel assumes that TC0 is available, and will disable Priority Flow
|
||||
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
|
||||
enabled when setting up DCB on your switch.
|
||||
|
||||
Interrupt Rate Limiting
|
||||
-----------------------
|
||||
:Valid Range: 0-235 (0=no limit)
|
||||
|
||||
The Intel(R) Ethernet Controller XL710 family supports an interrupt rate
|
||||
limiting mechanism. The user can control, via ethtool, the number of
|
||||
microseconds between interrupts.
|
||||
|
||||
Syntax::
|
||||
|
||||
# ethtool -C ethX rx-usecs-high N
|
||||
|
||||
The range of 0-235 microseconds provides an effective range of 4,310 to 250,000
|
||||
interrupts per second. The value of rx-usecs-high can be set independently of
|
||||
rx-usecs and tx-usecs in the same ethtool command, and is also independent of
|
||||
the adaptive interrupt moderation algorithm. The underlying hardware supports
|
||||
granularity in 4-microsecond intervals, so adjacent values may result in the
|
||||
same interrupt rate.
|
||||
|
||||
One possible use case is the following::
|
||||
|
||||
# ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \
|
||||
5 tx-usecs 5
|
||||
|
||||
The above command would disable adaptive interrupt moderation, and allow a
|
||||
maximum of 5 microseconds before indicating a receive or transmit was complete.
|
||||
However, instead of resulting in as many as 200,000 interrupts per second, it
|
||||
limits total interrupts per second to 50,000 via the rx-usecs-high parameter.
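The relationship between rx-usecs-high and the interrupt ceiling can be approximated as follows. This is a sketch under two assumptions that the text does not state outright: the value is rounded down to a multiple of 4 microseconds, and anything below 4 behaves as no limit. ``irq_rate_limit`` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: estimate the interrupt ceiling for an rx-usecs-high value.
# Assumptions: the value is rounded down to a multiple of 4 microseconds,
# and values below 4 mean that no limit is applied.
irq_rate_limit() {
    usecs=$1
    if [ "$usecs" -lt 0 ] || [ "$usecs" -gt 235 ]; then
        echo "rx-usecs-high must be in the range 0-235" >&2
        return 1
    fi

    granules=$((usecs / 4))     # whole 4-microsecond steps
    if [ "$granules" -eq 0 ]; then
        echo "no limit"
        return 0
    fi

    # Interrupts per second = one second divided by the effective interval.
    echo $((1000000 / (granules * 4)))
}

irq_rate_limit 20
irq_rate_limit 235
```

Under these assumptions a value of 20 gives 50000 interrupts per second and 235 gives 4310, in line with the figures quoted in this section.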
|
||||
|
||||
Performance Optimization
|
||||
========================
|
||||
Driver defaults are meant to fit a wide variety of workloads, but if further
|
||||
optimization is required we recommend experimenting with the following settings.
|
||||
|
||||
NOTE: For better performance when processing small (64B) frame sizes, try
|
||||
enabling Hyper-Threading in the BIOS in order to increase the number of logical
|
||||
cores in the system and subsequently increase the number of queues available to
|
||||
the adapter.
|
||||
|
||||
Virtualized Environments
|
||||
------------------------
|
||||
1. Disable XPS on both ends by using the included virt_perf_default script
|
||||
or by running the following command as root::
|
||||
|
||||
for file in /sys/class/net/<ethX>/queues/tx-*/xps_cpus;
do echo 0 > "$file"; done
|
||||
|
||||
2. Using the appropriate mechanism (vcpupin) in the VM, pin the CPUs to
individual logical CPUs, making sure to use a set of CPUs included in the
device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist.
|
||||
|
||||
3. Configure as many Rx/Tx queues in the VM as available. Do not rely on
|
||||
the default setting of 1.
|
||||
|
||||
|
||||
Non-virtualized Environments
|
||||
----------------------------
|
||||
Pin the adapter's IRQs to specific cores by disabling the irqbalance service
|
||||
and using the included set_irq_affinity script. Please see the script's help
|
||||
text for further options.
|
||||
|
||||
- The following settings will distribute the IRQs across all the cores evenly::
|
||||
|
||||
# scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ]
|
||||
|
||||
- The following settings will distribute the IRQs across all the cores that are
|
||||
local to the adapter (same NUMA node)::
|
||||
|
||||
# scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ]
|
||||
|
||||
For very CPU intensive workloads, we recommend pinning the IRQs to all cores.
|
||||
|
||||
For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per
|
||||
queue using ethtool.
|
||||
|
||||
- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000
|
||||
interrupts per second per queue.
|
||||
|
||||
::
|
||||
|
||||
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \
|
||||
tx-usecs 125
|
||||
|
||||
For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts
|
||||
per queue using ethtool.
|
||||
|
||||
- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000
|
||||
interrupts per second per queue.
|
||||
|
||||
::
|
||||
|
||||
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \
|
||||
tx-usecs 250
|
||||
|
||||
For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using
|
||||
ethtool.
|
||||
|
||||
::
|
||||
|
||||
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \
|
||||
tx-usecs 0
|
||||
|
||||
Application Device Queues (ADq)
|
||||
-------------------------------
|
||||
Application Device Queues (ADq) allows you to dedicate one or more queues to a
|
||||
specific application. This can reduce latency for the specified application,
|
||||
and allow Tx traffic to be rate limited per application. Follow the steps below
|
||||
to set ADq.
|
||||
|
||||
1. Create traffic classes (TCs). A maximum of 8 TCs can be created per interface.
|
||||
The shaper bw_rlimit parameter is optional.
|
||||
|
||||
Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
|
||||
to 1Gbit for tc0 and 3Gbit for tc1.
|
||||
|
||||
::
|
||||
|
||||
# tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 \
queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit \
max_rate 1Gbit 3Gbit
|
||||
|
||||
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
|
||||
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
|
||||
|
||||
queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
|
||||
16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
|
||||
number of queues for all tcs is 64 or number of cores, whichever is lower.)
|
||||
|
||||
hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new hardware
|
||||
offload mode in mqprio that makes full use of the mqprio options, the
|
||||
TCs, the queue configurations, and the QoS parameters.
|
||||
|
||||
shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
|
||||
Totals must be equal to or less than the port speed.
|
||||
|
||||
For example: min_rate 1Gbit 3Gbit. Verify the bandwidth limit using network
monitoring tools such as ifstat or sar -n DEV [interval] [number of samples].
|
||||
|
||||
2. Enable HW TC offload on interface::
|
||||
|
||||
# ethtool -K <interface> hw-tc-offload on
|
||||
|
||||
3. Apply TCs to ingress (RX) flow of interface::
|
||||
|
||||
# tc qdisc add dev <interface> ingress
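The ``queues`` argument used in step 1 follows a simple pattern: each TC starts where the previous one ended. The sketch below derives it from a list of per-TC queue counts. ``build_queues`` is a hypothetical helper, and it checks only the fixed limits named above (8 TCs, 64 queues), not the number of cores.

```shell
#!/bin/sh
# Hypothetical helper: turn per-TC queue counts into the
# <num queues>@<offset> list expected by the mqprio "queues" option.
build_queues() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 8 ]; then
        echo "expected between 1 and 8 traffic classes" >&2
        return 1
    fi

    offset=0
    out=""
    for count in "$@"; do
        # Append "<count>@<offset>", separated by a space after the first entry.
        out="${out}${out:+ }${count}@${offset}"
        offset=$((offset + count))
    done

    if [ "$offset" -gt 64 ]; then
        echo "more than 64 queues requested" >&2
        return 1
    fi
    echo "$out"
}

build_queues 16 16
```

Two TCs with 16 queues each produce ``16@0 16@16``, the value used in the mqprio example in step 1.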
|
||||
|
||||
NOTES:
|
||||
- Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
|
||||
- ADq is not compatible with cloud filters.
|
||||
- Setting up channels via ethtool (ethtool -L) is not supported when the
|
||||
TCs are configured using mqprio.
|
||||
- You must have the latest version of iproute2.
|
||||
- NVM version 6.01 or later is required.
|
||||
- ADq cannot be enabled when any of the following features are enabled: Data
|
||||
Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband
|
||||
Filters.
|
||||
- If another driver (for example, DPDK) has set cloud filters, you cannot
|
||||
enable ADq.
|
||||
- Tunnel filters are not supported in ADq. If encapsulated packets do
|
||||
arrive in non-tunnel mode, filtering will be done on the inner headers.
|
||||
For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified
|
||||
as a VXLAN encapsulated packet and outer headers are ignored. Therefore,
|
||||
inner headers are matched.
|
||||
- If a TC filter on a PF matches traffic over a VF (on the PF), that
|
||||
traffic will be routed to the appropriate queue of the PF, and will
|
||||
not be passed on to the VF. Such traffic will end up getting dropped higher
|
||||
up in the TCP/IP stack as it does not match PF address data.
|
||||
- If traffic matches multiple TC filters that point to different TCs,
|
||||
that traffic will be duplicated and sent to all matching TC queues.
|
||||
The hardware switch mirrors the packet to a VSI list when multiple
|
||||
filters are matched.
|
||||
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do
|
||||
not support the following features:
|
||||
|
||||
* Data Center Bridging (DCB)
|
||||
* QoS
|
||||
* VMQ
|
||||
* SR-IOV
|
||||
* Task Encapsulation offload (VXLAN, NVGRE)
|
||||
* Energy Efficient Ethernet (EEE)
|
||||
* Auto-media detect
|
||||
|
||||
Unexpected Issues when the device driver and DPDK share a device
|
||||
----------------------------------------------------------------
|
||||
Unexpected issues may result when an i40e device is in multi driver mode and
|
||||
the kernel driver and DPDK driver are sharing the device. This is because
|
||||
access to the global NIC resources is not synchronized between multiple
|
||||
drivers. Any change to the global NIC configuration (writing to a global
|
||||
register, setting global configuration by AQ, or changing switch modes) will
|
||||
affect all ports and drivers on the device. Loading DPDK with the
|
||||
"multi-driver" module parameter may mitigate some of the issues.
|
||||
|
||||
TC0 must be enabled when setting up DCB on a switch
|
||||
---------------------------------------------------
|
||||
The kernel assumes that TC0 is available, and will disable Priority Flow
|
||||
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
|
||||
enabled when setting up DCB on your switch.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
331
Documentation/networking/device_drivers/ethernet/intel/iavf.rst
Normal file
@@ -0,0 +1,331 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=================================================================
|
||||
Linux Base Driver for Intel(R) Ethernet Adaptive Virtual Function
|
||||
=================================================================
|
||||
|
||||
Intel Ethernet Adaptive Virtual Function Linux driver.
|
||||
Copyright(c) 2013-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Overview
|
||||
- Identifying Your Adapter
|
||||
- Additional Features and Configurations
|
||||
- Known Issues/Troubleshooting
|
||||
- Support
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
This file describes the iavf Linux Base Driver. This driver was formerly
|
||||
called i40evf.
|
||||
|
||||
The iavf driver supports the virtual function devices listed below and
|
||||
can only be activated on kernels running the i40e or newer Physical Function
|
||||
(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires
|
||||
CONFIG_PCI_MSI to be enabled.
|
||||
|
||||
The guest OS loading the iavf driver must support MSI-X interrupts.
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
|
||||
The driver in this kernel is compatible with devices based on the following:
|
||||
* Intel(R) XL710 X710 Virtual Function
|
||||
* Intel(R) X722 Virtual Function
|
||||
* Intel(R) XXV710 Virtual Function
|
||||
* Intel(R) Ethernet Adaptive Virtual Function
|
||||
|
||||
For the best performance, make sure the latest NVM/FW is installed on your
|
||||
device.
|
||||
|
||||
For information on how to identify your adapter, and for the latest NVM/FW
|
||||
images and Intel network drivers, refer to the Intel Support website:
|
||||
http://www.intel.com/support
|
||||
|
||||
|
||||
Additional Features and Configurations
|
||||
======================================
|
||||
|
||||
Viewing Link Messages
|
||||
---------------------
|
||||
Link messages will not be displayed to the console if the distribution is
|
||||
restricting system messages. In order to see network driver link messages on
|
||||
your console, set the console log level to eight by entering the following::
|
||||
|
||||
# dmesg -n 8
|
||||
|
||||
NOTE:
|
||||
This setting is not saved across reboots.
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest ethtool
|
||||
version is required for this functionality. Download it at:
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
Setting VLAN Tag Stripping
|
||||
--------------------------
|
||||
If you have applications that require Virtual Functions (VFs) to receive
|
||||
packets with VLAN tags, you can disable VLAN tag stripping for the VF. The
|
||||
Physical Function (PF) processes requests issued from the VF to enable or
|
||||
disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF,
|
||||
then requests from that VF to set VLAN tag stripping will be ignored.
|
||||
|
||||
To enable/disable VLAN tag stripping for a VF, issue the following command
|
||||
from inside the VM in which you are running the VF::
|
||||
|
||||
# ethtool -K <if_name> rxvlan on/off
|
||||
|
||||
or alternatively::
|
||||
|
||||
# ethtool --offload <if_name> rxvlan on/off
|
||||
|
||||
Adaptive Virtual Function
|
||||
-------------------------
|
||||
Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to
|
||||
adapt to changing feature sets of the physical function driver (PF) with which
|
||||
it is associated. This allows system administrators to update a PF without
|
||||
having to update all the VFs associated with it. All AVFs have a single common
|
||||
device ID and branding string.
|
||||
|
||||
AVFs have a minimum set of features known as "base mode," but may provide
|
||||
additional features depending on what features are available in the PF with
|
||||
which the AVF is associated. The following are base mode features:
|
||||
|
||||
- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
|
||||
for Tx/Rx
|
||||
- i40e descriptors and ring format
|
||||
- Descriptor write-back completion
|
||||
- 1 control queue, with i40e descriptors, CSRs and ring format
|
||||
- 5 MSI-X interrupt vectors and corresponding i40e CSRs
|
||||
- 1 Interrupt Throttle Rate (ITR) index
|
||||
- 1 Virtual Station Interface (VSI) per VF
|
||||
- 1 Traffic Class (TC), TC0
|
||||
- Receive Side Scaling (RSS) with 64 entry indirection table and key,
|
||||
configured through the PF
|
||||
- 1 unicast MAC address reserved per VF
|
||||
- 16 MAC address filters for each VF
|
||||
- Stateless offloads - non-tunneled checksums
|
||||
- AVF device ID
|
||||
- HW mailbox is used for VF to PF communications (including on Windows)
|
||||
|
||||
IEEE 802.1ad (QinQ) Support
|
||||
---------------------------
|
||||
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
|
||||
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
|
||||
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
|
||||
allow L2 tunneling and the ability to segregate traffic within a particular
|
||||
VLAN ID, among other uses.
|
||||
|
||||
The following are examples of how to configure 802.1ad (QinQ)::
|
||||
|
||||
# ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
|
||||
# ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
|
||||
|
||||
Where "24" and "371" are example VLAN IDs.
|
||||
|
||||
NOTES:
|
||||
Receive checksum offloads, cloud filters, and VLAN acceleration are not
|
||||
supported for 802.1ad (QinQ) packets.
|
||||
|
||||
Application Device Queues (ADq)
|
||||
-------------------------------
|
||||
Application Device Queues (ADq) allows you to dedicate one or more queues to a
|
||||
specific application. This can reduce latency for the specified application,
|
||||
and allow Tx traffic to be rate limited per application. Follow the steps below
|
||||
to set ADq.
|
||||
|
||||
Requirements:
|
||||
|
||||
- The sch_mqprio, act_mirred and cls_flower modules must be loaded
|
||||
- The latest version of iproute2
|
||||
- If another driver (for example, DPDK) has set cloud filters, you cannot
|
||||
enable ADq
|
||||
- Depending on the underlying PF device, ADq cannot be enabled when the
|
||||
following features are enabled:
|
||||
|
||||
+ Data Center Bridging (DCB)
|
||||
+ Multiple Functions per Port (MFP)
|
||||
+ Sideband Filters
|
||||
|
||||
1. Create traffic classes (TCs). A maximum of 8 TCs can be created per interface.
|
||||
The shaper bw_rlimit parameter is optional.
|
||||
|
||||
Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
|
||||
to 1Gbit for tc0 and 3Gbit for tc1.
|
||||
|
||||
::
|
||||
|
||||
# tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 \
queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit \
max_rate 1Gbit 3Gbit
|
||||
|
||||
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
|
||||
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
|
||||
|
||||
queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
|
||||
16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
|
||||
number of queues for all tcs is 64 or number of cores, whichever is lower.)
|
||||
|
||||
hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new hardware
|
||||
offload mode in mqprio that makes full use of the mqprio options, the
|
||||
TCs, the queue configurations, and the QoS parameters.
|
||||
|
||||
shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
|
||||
Totals must be equal to or less than the port speed.
|
||||
|
||||
For example: min_rate 1Gbit 3Gbit. Verify the bandwidth limit using network
monitoring tools such as ifstat or sar -n DEV [interval] [number of samples].
|
||||
|
||||
NOTE:
|
||||
Setting up channels via ethtool (ethtool -L) is not supported when the
|
||||
TCs are configured using mqprio.
|
||||
|
||||
2. Enable HW TC offload on interface::
|
||||
|
||||
# ethtool -K <interface> hw-tc-offload on
|
||||
|
||||
3. Apply TCs to ingress (RX) flow of interface::
|
||||
|
||||
# tc qdisc add dev <interface> ingress
|
||||
|
||||
NOTES:
|
||||
- Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
|
||||
- ADq is not compatible with cloud filters
|
||||
- Setting up channels via ethtool (ethtool -L) is not supported when the TCs
|
||||
are configured using mqprio
|
||||
- You must have the latest version of iproute2
|
||||
- NVM version 6.01 or later is required
|
||||
- ADq cannot be enabled when any of the following features are enabled: Data
|
||||
Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
|
||||
- If another driver (for example, DPDK) has set cloud filters, you cannot
|
||||
enable ADq
|
||||
- Tunnel filters are not supported in ADq. If encapsulated packets do arrive
|
||||
in non-tunnel mode, filtering will be done on the inner headers. For example,
|
||||
for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
|
||||
encapsulated packet and outer headers are ignored. Therefore, inner headers are
|
||||
matched.
|
||||
- If a TC filter on a PF matches traffic over a VF (on the PF), that traffic
|
||||
will be routed to the appropriate queue of the PF, and will not be passed on
|
||||
to the VF. Such traffic will end up getting dropped higher up in the TCP/IP
|
||||
stack as it does not match PF address data.
|
||||
- If traffic matches multiple TC filters that point to different TCs, that
|
||||
traffic will be duplicated and sent to all matching TC queues. The hardware
|
||||
switch mirrors the packet to a VSI list when multiple filters are matched.
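
The bandwidth constraint from step 1 (the per-TC totals must not exceed the
port speed) can be checked before any tc command is run. The following is a
minimal sketch only: the helper name ``rates_fit_port`` and the convention of
passing whole-Gbit values are assumptions made for illustration and are not
part of the driver or of iproute2.

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): succeeds when the sum of the
# per-TC rates, given in whole Gbit, does not exceed the port speed.
# Usage: rates_fit_port <port_speed_gbit> <tc0_rate_gbit> [<tc1_rate_gbit> ...]
rates_fit_port() {
    port_gbit=$1
    shift
    total=0
    for rate in "$@"; do
        total=$((total + rate))
    done
    [ "$total" -le "$port_gbit" ]
}

# Example: two TCs at 1Gbit and 3Gbit on a 10Gbit port.
if rates_fit_port 10 1 3; then
    echo "rates fit within port speed"
fi
```

The check uses integer arithmetic only, so fractional rates would have to be
converted to a smaller unit (for example Mbit) before being passed in.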
|
||||
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
|
||||
---------------------------------------------------------------------------------
|
||||
If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
|
||||
series based device, the VF slaves may fail when they become the active slave.
|
||||
If the MAC address of the VF is set by the PF (Physical Function) of the
|
||||
device, when you add a slave, or change the active-backup slave, Linux bonding
|
||||
tries to sync the backup slave's MAC address to the same MAC address as the
|
||||
active slave. Linux bonding will fail at this point. This issue will not occur
|
||||
if the VF's MAC address is not set by the PF.
|
||||
|
||||
Traffic Is Not Being Passed Between VM and Client
|
||||
-------------------------------------------------
|
||||
You may not be able to pass traffic between a client system and a
|
||||
Virtual Machine (VM) running on a separate host if the Virtual Function
|
||||
(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled
|
||||
on the VF. Note that this situation can occur in any combination of client,
|
||||
host, and guest operating system. For information on how to set the VF to
|
||||
trusted mode, refer to the section "VLAN Tag Packet Steering" in this
|
||||
readme document. For information on setting spoof checking, refer to the
|
||||
section "MAC and VLAN anti-spoofing feature" in this readme document.
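
As a rough illustration of the two settings mentioned above, the sketch below
prints the iproute2 commands that mark a VF as trusted and turn off spoof
checking. The function name ``vf_trust_cmds``, the interface name ``p5p1`` and
the dry-run approach are assumptions for illustration; whether one or both
settings need to change depends on the setup.

```shell
#!/bin/sh
# Print (do not run) the iproute2 commands that mark a VF as trusted and
# turn off spoof checking. Review the output before running it as root.
# Usage: vf_trust_cmds <PF interface> <VF index>
vf_trust_cmds() {
    printf 'ip link set dev %s vf %s trust on\n' "$1" "$2"
    printf 'ip link set dev %s vf %s spoofchk off\n' "$1" "$2"
}

# Example: VF 0 on a PF named p5p1 (placeholder interface name).
vf_trust_cmds p5p1 0
```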
|
||||
|
||||
Do not unload port driver if VF with active VM is bound to it
|
||||
-------------------------------------------------------------
|
||||
Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
|
||||
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
|
||||
Once the VM shuts down, or otherwise releases the VF, the command will complete.
|
||||
|
||||
Using four traffic classes fails
|
||||
--------------------------------
|
||||
Do not try to reserve more than three traffic classes in the iavf driver. Doing
|
||||
so will fail to set any traffic classes and will cause the driver to write
|
||||
errors to stdout. Use a maximum of three queues to avoid this issue.
|
||||
|
||||
Multiple log error messages on iavf driver removal
|
||||
--------------------------------------------------
|
||||
If you have several VFs and you remove the iavf driver, several instances of
|
||||
the following log errors are written to the log::
|
||||
|
||||
Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
|
||||
Unable to send the message to VF 2 aq_err 12
|
||||
ARQ Overflow Error detected
|
||||
|
||||
Virtual machine does not get link
|
||||
---------------------------------
|
||||
If the virtual machine has more than one virtual port assigned to it, and those
|
||||
virtual ports are bound to different physical ports, you may not get link on
|
||||
all of the virtual ports. The following command may work around the issue::
|
||||
|
||||
# ethtool -r <PF>
|
||||
|
||||
Where <PF> is the PF interface in the host, for example: p5p1. You may need to
|
||||
run the command more than once to get link on all virtual ports.
|
||||
|
||||
MAC address of Virtual Function changes unexpectedly
|
||||
----------------------------------------------------
|
||||
If a Virtual Function's MAC address is not assigned in the host, then the VF
|
||||
(virtual function) driver will use a random MAC address. This random MAC
|
||||
address may change each time the VF driver is reloaded. You can assign a static
|
||||
MAC address in the host machine. This static MAC address will survive
|
||||
a VF driver reload.
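
A static MAC address is typically assigned from the host with the ``ip``
command. The sketch below only prints the command so that it can be reviewed
first; the function name, the interface name ``p5p1`` and the example address
are placeholders, not values taken from this document.

```shell
#!/bin/sh
# Print (do not run) the host-side command that assigns a static MAC
# address to a VF.
# Usage: vf_static_mac_cmd <PF interface> <VF index> <MAC address>
vf_static_mac_cmd() {
    printf 'ip link set dev %s vf %s mac %s\n' "$1" "$2" "$3"
}

# Example with placeholder values.
vf_static_mac_cmd p5p1 0 00:11:22:33:44:55
```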
|
||||
|
||||
Driver Buffer Overflow Fix
|
||||
--------------------------
|
||||
The fix to resolve CVE-2016-8105, referenced in Intel SA-00069
|
||||
https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html
|
||||
is included in this and future versions of the driver.
|
||||
|
||||
Multiple Interfaces on Same Ethernet Broadcast Network
|
||||
------------------------------------------------------
|
||||
Due to the default ARP behavior on Linux, it is not possible to have one system
|
||||
on two IP networks in the same Ethernet broadcast domain (non-partitioned
|
||||
switch) behave as expected. All Ethernet interfaces will respond to IP traffic
|
||||
for any IP address assigned to the system. This results in unbalanced receive
|
||||
traffic.
|
||||
|
||||
If you have multiple interfaces in a server, either turn on ARP filtering by
|
||||
entering::
|
||||
|
||||
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
|
||||
|
||||
NOTE:
|
||||
This setting is not saved across reboots. The configuration change can be
|
||||
made permanent by adding the following line to the file /etc/sysctl.conf::
|
||||
|
||||
net.ipv4.conf.all.arp_filter = 1
|
||||
|
||||
Another alternative is to install the interfaces in separate broadcast domains
|
||||
(either in different switches or in a switch partitioned to VLANs).
|
||||
|
||||
Rx Page Allocation Errors
|
||||
-------------------------
|
||||
'Page allocation failure. order:0' errors may occur under stress.
|
||||
This is caused by the way the Linux kernel reports this stressed condition.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://support.intel.com
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on the supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net
|
@@ -0,0 +1,46 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
==================================================================
|
||||
Linux Base Driver for the Intel(R) Ethernet Connection E800 Series
|
||||
==================================================================
|
||||
|
||||
Intel ice Linux driver.
|
||||
Copyright(c) 2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Enabling the driver
|
||||
- Support
|
||||
|
||||
The driver in this release supports Intel's E800 Series of products. For
|
||||
more information, visit Intel's support page at https://support.intel.com.
|
||||
|
||||
Enabling the driver
|
||||
===================
|
||||
The driver is enabled via the standard kernel configuration system,
|
||||
using the make command::
|
||||
|
||||
make oldconfig/menuconfig/etc.
|
||||
|
||||
The driver is located in the menu structure at:
|
||||
|
||||
-> Device Drivers
|
||||
-> Network device support (NETDEVICES [=y])
|
||||
-> Ethernet driver support
|
||||
-> Intel devices
|
||||
-> Intel(R) Ethernet Connection E800 Series Support
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
213
Documentation/networking/device_drivers/ethernet/intel/igb.rst
Normal file
@@ -0,0 +1,213 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
==========================================================
|
||||
Linux Base Driver for Intel(R) Ethernet Network Connection
|
||||
==========================================================
|
||||
|
||||
Intel Gigabit Linux driver.
|
||||
Copyright(c) 1999-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Identifying Your Adapter
|
||||
- Command Line Parameters
|
||||
- Additional Configurations
|
||||
- Support
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
http://www.intel.com/support
|
||||
|
||||
|
||||
Command Line Parameters
|
||||
========================
|
||||
If the driver is built as a module, the following optional parameters are used
|
||||
by entering them on the command line with the modprobe command using this
|
||||
syntax::
|
||||
|
||||
modprobe igb [<option>=<VAL1>,<VAL2>,...]
|
||||
|
||||
There needs to be a <VAL#> for each network port in the system supported by
|
||||
this driver. The values will be applied to each instance, in function order.
|
||||
For example::
|
||||
|
||||
modprobe igb max_vfs=2,4
|
||||
|
||||
In this case, there are two network ports supported by igb in the system.
|
||||
|
||||
NOTE: A descriptor describes a data buffer and attributes related to the data
|
||||
buffer. This information is accessed by the hardware.
|
||||
|
||||
max_vfs
|
||||
-------
|
||||
:Valid Range: 0-7
|
||||
|
||||
This parameter adds support for SR-IOV. It causes the driver to spawn up to
|
||||
max_vfs worth of virtual functions. If the value is greater than 0 it will
|
||||
also force the VMDq parameter to be 1 or more.
|
||||
|
||||
The parameters for the driver are referenced by position. Thus, if you have a
|
||||
dual port adapter, or more than one adapter in your system, and want N virtual
|
||||
functions per port, you must specify a number for each port with each parameter
|
||||
separated by a comma. For example::
|
||||
|
||||
modprobe igb max_vfs=4
|
||||
|
||||
This will spawn 4 VFs on the first port.
|
||||
|
||||
::
|
||||
|
||||
modprobe igb max_vfs=2,4
|
||||
|
||||
This will spawn 2 VFs on the first port and 4 VFs on the second port.
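
Because the values are positional, it can help to build the comma-separated
list from one count per port. The following is a small sketch under that
assumption; ``max_vfs_arg`` is a hypothetical helper, and the 0-7 range check
simply mirrors the valid range stated above.

```shell
#!/bin/sh
# Join per-port VF counts into the positional form shown above.
# Fails if any count is outside the documented 0-7 range.
# Usage: max_vfs_arg <port0_vfs> [<port1_vfs> ...]
max_vfs_arg() {
    list=""
    for count in "$@"; do
        [ "$count" -ge 0 ] && [ "$count" -le 7 ] || return 1
        list="${list:+$list,}$count"
    done
    printf 'max_vfs=%s\n' "$list"
}

max_vfs_arg 2 4    # prints: max_vfs=2,4
```

The result can then be passed on the command line, for example
``modprobe igb "$(max_vfs_arg 2 4)"``.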
|
||||
|
||||
NOTE: Caution must be used in loading the driver with these parameters.
|
||||
Depending on your system configuration, number of slots, etc., it is impossible
|
||||
to predict in all cases where the positions would be on the command line.
|
||||
|
||||
NOTE: Neither the device nor the driver control how VFs are mapped into config
|
||||
space. Bus layout will vary by operating system. On operating systems that
|
||||
support it, you can check sysfs to find the mapping.
|
||||
|
||||
NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
|
||||
and VLAN tag stripping/insertion will remain enabled. Please remove the old
|
||||
VLAN filter before the new VLAN filter is added. For example::
|
||||
|
||||
ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0
|
||||
ip link set eth0 vf 0 vlan 0 // Delete vlan 100
|
||||
ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0
|
||||
|
||||
Debug
|
||||
-----
|
||||
:Valid Range: 0-16 (0=none,...,16=all)
|
||||
:Default Value: 0
|
||||
|
||||
This parameter adjusts the level of debug messages displayed in the system logs.
|
||||
|
||||
|
||||
Additional Features and Configurations
|
||||
======================================
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
|
||||
to a value larger than the default value of 1500.
|
||||
|
||||
Use the ifconfig command to increase the MTU size. For example, enter the
|
||||
following where <x> is the interface number::
|
||||
|
||||
ifconfig eth<x> mtu 9000 up
|
||||
|
||||
Alternatively, you can use the ip command as follows::
|
||||
|
||||
ip link set mtu 9000 dev eth<x>
|
||||
ip link set up dev eth<x>
|
||||
|
||||
This setting is not saved across reboots. The setting change can be made
|
||||
permanent by adding 'MTU=9000' to the file:
|
||||
|
||||
- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
|
||||
- For SLES: /etc/sysconfig/network/<config_file>
|
||||
|
||||
NOTE: The maximum MTU setting for Jumbo Frames is 9216. This value coincides
|
||||
with the maximum Jumbo Frames size of 9234 bytes.
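
One way to read the two figures in this note: 9234 - 9216 = 18 bytes, which
matches a 14-byte Ethernet header plus a 4-byte frame check sequence. This
interpretation is an assumption; the note itself only states the two numbers.
A short sketch of the arithmetic:

```shell
#!/bin/sh
# Assumed overhead: 14-byte Ethernet header + 4-byte FCS (no VLAN tag).
# Usage: frame_size_for_mtu <mtu>
frame_size_for_mtu() {
    echo $(( $1 + 14 + 4 ))
}

frame_size_for_mtu 9216    # prints 9234
frame_size_for_mtu 1500    # prints 1518
```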
|
||||
|
||||
NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
|
||||
poor performance or loss of link.
|
||||
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest ethtool
|
||||
version is required for this functionality. Download it at:
|
||||
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
|
||||
Enabling Wake on LAN (WoL)
|
||||
--------------------------
|
||||
WoL is configured through the ethtool utility.
|
||||
|
||||
WoL will be enabled on the system during the next shut down or reboot. For
|
||||
this driver version, in order to enable WoL, the igb driver must be loaded
|
||||
prior to shutting down or suspending the system.
|
||||
|
||||
NOTE: Wake on LAN is only supported on port A of multi-port devices. Also
|
||||
Wake On LAN is not supported for the following device:
|
||||
- Intel(R) Gigabit VT Quad Port Server Adapter
|
||||
|
||||
|
||||
Multiqueue
|
||||
----------
|
||||
In this mode, a separate MSI-X vector is allocated for each queue and one for
|
||||
"other" interrupts such as link status change and errors. All interrupts are
|
||||
throttled via interrupt moderation. Interrupt moderation must be used to avoid
|
||||
interrupt storms while the driver is processing one interrupt. The moderation
|
||||
value should be at least as large as the expected time for the driver to
|
||||
process an interrupt. Multiqueue is off by default.
|
||||
|
||||
REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not found,
|
||||
the system will fall back to MSI or to Legacy interrupts. This driver supports
|
||||
receive multiqueue on all kernels that support MSI-X.
|
||||
|
||||
NOTE: On some kernels a reboot is required to switch between single queue mode
|
||||
and multiqueue mode or vice-versa.
|
||||
|
||||
|
||||
MAC and VLAN anti-spoofing feature
|
||||
----------------------------------
|
||||
When a malicious driver attempts to send a spoofed packet, it is dropped by the
|
||||
hardware and not transmitted.
|
||||
|
||||
An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
|
||||
spoofed packet is detected, the PF driver will send the following message to
|
||||
the system log (displayed by the "dmesg" command):
|
||||
Spoof event(s) detected on VF(n), where n = the VF that attempted to do the
|
||||
spoofing
|
||||
|
||||
|
||||
Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
|
||||
------------------------------------------------------------
|
||||
You can set a MAC address of a Virtual Function (VF), a default VLAN and the
|
||||
rate limit using the IProute2 tool. Download the latest version of the
|
||||
IProute2 tool from Sourceforge if your version does not have all the features
|
||||
you require.
|
||||
|
||||
Credit Based Shaper (Qav Mode)
|
||||
------------------------------
|
||||
When enabling the CBS qdisc in the hardware offload mode, traffic shaping using
|
||||
the CBS (described in the IEEE 802.1Q-2018 Section 8.6.8.2 and discussed in the
|
||||
Annex L) algorithm will run in the i210 controller, so it's more accurate and
|
||||
uses less CPU.
|
||||
|
||||
When using offloaded CBS, and the traffic rate obeys the configured rate
|
||||
(doesn't go above it), CBS should have little to no effect in the latency.
|
||||
|
||||
The offloaded version of the algorithm has some limits, caused by how the idle
|
||||
slope is expressed in the adapter's registers. It can only represent idle slopes
|
||||
in 16.38431 kbps units, which means that if an idle slope of 2576 kbps is
requested, the controller will be configured to use an idle slope of ~2589 kbps,
|
||||
because the driver rounds the value up. For more details, see the comments on
|
||||
:c:func:`igb_config_tx_modes()`.
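
The rounding described above can be reproduced with a few lines of arithmetic.
This is an illustrative sketch only: it assumes the requested slope is rounded
up to the next multiple of 16.38431 kbps, and the helper name and the "steps"
quantity are not taken from the driver source.

```shell
#!/bin/sh
# Print the number of 16.38431 kbps steps and the resulting idle slope
# (rounded to whole kbps) for a requested idle slope in kbps.
# Usage: cbs_effective_idleslope <requested_kbps>
cbs_effective_idleslope() {
    awk -v req="$1" 'BEGIN {
        unit = 16.38431              # kbps per step, from the text above
        steps = int(req / unit)
        if (steps * unit < req)      # round up to the next whole step
            steps++
        printf "%d %.0f\n", steps, steps * unit
    }'
}

cbs_effective_idleslope 2576    # prints: 158 2589
```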
|
||||
|
||||
NOTE: This feature is exclusive to i210 models.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
@@ -0,0 +1,65 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
===========================================================
|
||||
Linux Base Virtual Function Driver for Intel(R) 1G Ethernet
|
||||
===========================================================
|
||||
|
||||
Intel Gigabit Virtual Function Linux driver.
|
||||
Copyright(c) 1999-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
- Identifying Your Adapter
|
||||
- Additional Configurations
|
||||
- Support
|
||||
|
||||
This driver supports Intel 82576-based virtual function devices that can only
be activated on kernels that support SR-IOV.
|
||||
|
||||
SR-IOV requires the correct platform and OS support.
|
||||
|
||||
The guest OS loading this driver must support MSI-X interrupts.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your Intel adapter. All hardware requirements listed apply to use
|
||||
with Linux.
|
||||
|
||||
Driver information can be obtained using ethtool, lspci, and ifconfig.
|
||||
Instructions on updating ethtool can be found in the section Additional
|
||||
Configurations later in this document.
|
||||
|
||||
NOTE: There is a limit of a total of 32 shared VLANs to 1 or more VFs.
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
http://www.intel.com/support
|
||||
|
||||
|
||||
Additional Features and Configurations
|
||||
======================================
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest ethtool
|
||||
version is required for this functionality. Download it at:
|
||||
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
468
Documentation/networking/device_drivers/ethernet/intel/ixgb.rst
Normal file
@@ -0,0 +1,468 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=====================================================================
|
||||
Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
|
||||
=====================================================================
|
||||
|
||||
October 1, 2018
|
||||
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- In This Release
|
||||
- Identifying Your Adapter
|
||||
- Command Line Parameters
|
||||
- Improving Performance
|
||||
- Additional Configurations
|
||||
- Known Issues/Troubleshooting
|
||||
- Support
|
||||
|
||||
|
||||
|
||||
In This Release
|
||||
===============
|
||||
|
||||
This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
|
||||
Network Connection. This driver includes support for Itanium(R)2-based
|
||||
systems.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your 10 Gigabit adapter. All hardware requirements listed apply
|
||||
to use with Linux.
|
||||
|
||||
The following features are available in this kernel:
|
||||
- Native VLANs
|
||||
- Channel Bonding (teaming)
|
||||
- SNMP
|
||||
|
||||
Channel Bonding documentation can be found in the Linux kernel source:
|
||||
/Documentation/networking/bonding.rst
|
||||
|
||||
The driver information previously displayed in the /proc filesystem is not
|
||||
supported in this release. Alternatively, you can use ethtool (version 1.6
|
||||
or later), lspci, and iproute2 to obtain the same information.
|
||||
|
||||
Instructions on updating ethtool can be found in the section "Additional
|
||||
Configurations" later in this document.
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
|
||||
The following Intel network adapters are compatible with the drivers in this
|
||||
release:
|
||||
|
||||
+------------+------------------------------+----------------------------------+
| Controller | Adapter Name                 | Physical Layer                   |
+============+==============================+==================================+
| 82597EX    | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber)            |
|            | Server Adapters              | - 10G Base-SR (fiber)            |
|            |                              | - 10G Base-CX4 (copper)          |
+------------+------------------------------+----------------------------------+
|
||||
|
||||
For more information on how to identify your adapter, go to the Adapter &
|
||||
Driver ID Guide at:
|
||||
|
||||
https://support.intel.com
|
||||
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
|
||||
If the driver is built as a module, the following optional parameters are
|
||||
used by entering them on the command line with the modprobe command using
|
||||
this syntax::
|
||||
|
||||
modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
|
||||
|
||||
For example, with two 10GbE PCI adapters, entering::
|
||||
|
||||
modprobe ixgb TxDescriptors=80,128
|
||||
|
||||
loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
|
||||
resources for the second adapter.
|
||||
|
||||
The default value for each parameter is generally the recommended setting,
|
||||
unless otherwise noted.
|
||||
|
||||
Copybreak
|
||||
---------
|
||||
:Valid Range: 0-XXXX
|
||||
:Default Value: 256
|
||||
|
||||
This is the maximum size of packet that is copied to a new buffer on
|
||||
receive.
|
||||
|
||||
Debug
|
||||
-----
|
||||
:Valid Range: 0-16 (0=none,...,16=all)
|
||||
:Default Value: 0
|
||||
|
||||
This parameter adjusts the level of debug messages displayed in the
|
||||
system logs.
|
||||
|
||||
FlowControl
|
||||
-----------
|
||||
:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
|
||||
:Default Value: 1 if no EEPROM, otherwise read from EEPROM
|
||||
|
||||
This parameter controls the automatic generation(Tx) and response(Rx) to
|
||||
Ethernet PAUSE frames. There are hardware bugs associated with enabling
|
||||
Tx flow control so beware.
|
||||
|
||||
RxDescriptors
|
||||
-------------
|
||||
:Valid Range: 64-4096
|
||||
:Default Value: 1024
|
||||
|
||||
This value is the number of receive descriptors allocated by the driver.
|
||||
Increasing this value allows the driver to buffer more incoming packets.
|
||||
Each descriptor is 16 bytes. A receive buffer is also allocated for
|
||||
each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
|
||||
depending on the MTU setting. When the MTU size is 1500 or less, the
|
||||
receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
|
||||
receive buffer size will be either 4056, 8192, or 16384 bytes. The
|
||||
maximum MTU size is 16114.
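
To get a feel for the memory cost of a larger ring, the descriptor and buffer
sizes given above can be multiplied out. The sketch below is an estimate under
the assumption that each descriptor costs 16 bytes plus one receive buffer; the
helper name is hypothetical, and any alignment or bookkeeping overhead in the
driver is ignored.

```shell
#!/bin/sh
# Rough estimate of receive ring memory: each descriptor is 16 bytes and
# has one receive buffer of the given size.
# Usage: rx_ring_bytes <RxDescriptors> <buffer_bytes>
rx_ring_bytes() {
    descriptors=$1
    buffer_bytes=$2
    echo $(( descriptors * (16 + buffer_bytes) ))
}

rx_ring_bytes 1024 2048     # default ring, MTU <= 1500: prints 2113536
rx_ring_bytes 4096 16384    # largest ring and buffer: prints 67174400
```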
|
||||
|
||||
TxDescriptors
|
||||
-------------
|
||||
:Valid Range: 64-4096
|
||||
:Default Value: 256
|
||||
|
||||
This value is the number of transmit descriptors allocated by the driver.
|
||||
Increasing this value allows the driver to queue more transmits. Each
|
||||
descriptor is 16 bytes.
|
||||
|
||||
RxIntDelay
|
||||
----------
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 72
|
||||
|
||||
This value delays the generation of receive interrupts in units of
|
||||
0.8192 microseconds. Receive interrupt reduction can improve CPU
|
||||
efficiency if properly tuned for specific network traffic. Increasing
|
||||
this value adds extra latency to frame reception and can end up
|
||||
decreasing the throughput of TCP traffic. If the system is reporting
|
||||
dropped receives, this value may be set too high, causing the driver to
|
||||
run out of available receive descriptors.
|
||||
|
||||
TxIntDelay
|
||||
----------
|
||||
:Valid Range: 0-65535 (0=off)
|
||||
:Default Value: 32
|
||||
|
||||
This value delays the generation of transmit interrupts in units of
|
||||
0.8192 microseconds. Transmit interrupt reduction can improve CPU
|
||||
efficiency if properly tuned for specific network traffic. Increasing
|
||||
this value adds extra latency to frame transmission and can end up
|
||||
decreasing the throughput of TCP traffic. If this value is set too high,
|
||||
it will cause the driver to run out of available transmit descriptors.
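
Since RxIntDelay and TxIntDelay are both expressed in units of 0.8192
microseconds, converting a setting to microseconds is a single multiplication.
A minimal sketch (the helper name is an assumption made for illustration):

```shell
#!/bin/sh
# Convert an RxIntDelay/TxIntDelay value to microseconds.
# Usage: int_delay_us <delay_value>
int_delay_us() {
    awk -v ticks="$1" 'BEGIN { printf "%.4f\n", ticks * 0.8192 }'
}

int_delay_us 72    # RxIntDelay default: prints 58.9824
int_delay_us 32    # TxIntDelay default: prints 26.2144
```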
|
||||
|
||||
XsumRX
|
||||
------
|
||||
:Valid Range: 0-1
|
||||
:Default Value: 1
|
||||
|
||||
A value of '1' indicates that the driver should enable IP checksum
|
||||
offload for received packets (both UDP and TCP) to the adapter hardware.
|
||||
|
||||
RxFCHighThresh
|
||||
--------------
|
||||
:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity)
|
||||
:Default Value: 196,608 (0x30000)
|
||||
|
||||
Receive Flow control high threshold (when we send a pause frame)
|
||||
|
||||
RxFCLowThresh
|
||||
-------------
|
||||
:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity)
|
||||
:Default Value: 163,840 (0x28000)
|
||||
|
||||
Receive Flow control low threshold (when we send a resume frame)
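
The two threshold ranges and the 8 byte granularity can be checked before the
values are passed to modprobe. The following is a sketch, not driver code: the
helper name is hypothetical, and the requirement that the low threshold be
below the high threshold is an assumption that this document does not state.

```shell
#!/bin/sh
# Validate RxFCHighThresh/RxFCLowThresh against the documented ranges.
# Usage: fc_thresholds_ok <RxFCHighThresh> <RxFCLowThresh>
fc_thresholds_ok() {
    high=$1
    low=$2
    # Documented valid ranges.
    [ "$high" -ge 1536 ] && [ "$high" -le 262136 ] || return 1
    [ "$low" -ge 64 ] && [ "$low" -le 262136 ] || return 1
    # Documented 8 byte granularity.
    [ $((high % 8)) -eq 0 ] && [ $((low % 8)) -eq 0 ] || return 1
    # Assumption: the low (resume) threshold sits below the high (pause) one.
    [ "$low" -lt "$high" ]
}

# The documented defaults pass the check.
if fc_thresholds_ok 196608 163840; then
    echo "thresholds look valid"
fi
```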
|
||||
|
||||
FCReqTimeout
|
||||
------------
|
||||
:Valid Range: 1-65535
|
||||
:Default Value: 65535
|
||||
|
||||
Flow control request timeout (how long to pause the link partner's tx)
|
||||
|
||||
IntDelayEnable
|
||||
--------------
|
||||
:Valid Range: 0,1
|
||||
:Default Value: 1
|
||||
|
||||
Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it.
|
||||
|
||||
|
||||
Improving Performance
|
||||
=====================
|
||||
|
||||
With the 10 Gigabit server adapters, the default Linux configuration will
|
||||
very likely limit the total available throughput artificially. There is a set
|
||||
of configuration changes that, when applied together, will increase the ability
|
||||
of Linux to transmit and receive data. The following enhancements were
|
||||
originally acquired from settings published at http://www.spec.org/web99/ for
|
||||
various submitted results using Linux.
|
||||
|
||||
NOTE:
|
||||
These changes are only suggestions, and serve as a starting point for
|
||||
tuning your network performance.
|
||||
|
||||
The changes are made in three major ways, listed in order of greatest effect:
|
||||
|
||||
- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
|
||||
parameter.
|
||||
- Use sysctl to modify /proc parameters (essentially kernel tuning)
|
||||
- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
|
||||
transmit burst lengths on the bus.
|
||||
|
||||
NOTE:
|
||||
setpci modifies the adapter's configuration registers to allow it to read
|
||||
up to 4k bytes at a time (for transmits). However, for some systems the
|
||||
behavior after modifying this register may be undefined (possibly errors of
|
||||
some kind). A power-cycle, hard reset or explicitly setting the e6 register
|
||||
back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
|
||||
stable configuration.
|
||||
|
||||
- COPY these lines and paste them into ixgb_perf.sh:
|
||||
|
||||
::
|
||||
|
||||
#!/bin/bash
|
||||
echo "configuring network performance , edit this file to change the interface
|
||||
or device ID of 10GbE card"
|
||||
# set mmrbc to 4k reads, modify only Intel 10GbE device IDs
|
||||
# replace 1a48 with appropriate 10GbE device's ID installed on the system,
|
||||
# if needed.
|
||||
setpci -d 8086:1a48 e6.b=2e
|
||||
# set the MTU (max transmission unit) - it requires your switch and clients
|
||||
# to change as well.
|
||||
# set the txqueuelen
|
||||
# your ixgb adapter should be loaded as eth1 for this to work, change if needed
|
||||
ip li set dev eth1 mtu 9000 txqueuelen 1000 up
|
||||
# call the sysctl utility to modify /proc/sys entries
|
||||
sysctl -p ./sysctl_ixgb.conf
|
||||
|
||||
- COPY these lines and paste them into sysctl_ixgb.conf:
|
||||
|
||||
::
|
||||
|
||||
# some of the defaults may be different for your kernel
|
||||
# call this file with sysctl -p <this file>
|
||||
# these are just suggested values that worked well to increase throughput in
|
||||
# several network benchmark tests, your mileage may vary
|
||||
|
||||
### IPV4 specific settings
|
||||
# turn TCP timestamp support off, default 1, reduces CPU use
|
||||
net.ipv4.tcp_timestamps = 0
|
||||
# turn SACK support off, default on
|
||||
# on systems with a VERY fast bus -> memory interface this is the big gainer
|
||||
net.ipv4.tcp_sack = 0
|
||||
# set min/default/max TCP read buffer, default 4096 87380 174760
|
||||
net.ipv4.tcp_rmem = 10000000 10000000 10000000
|
||||
# set min/default/max TCP write buffer, default 4096 16384 131072
|
||||
net.ipv4.tcp_wmem = 10000000 10000000 10000000
|
||||
# set min/pressure/max TCP buffer space, default 31744 32256 32768
|
||||
net.ipv4.tcp_mem = 10000000 10000000 10000000
|
||||
|
||||
### CORE settings (mostly for socket and UDP effect)
|
||||
# set maximum receive socket buffer size, default 131071
|
||||
net.core.rmem_max = 524287
|
||||
# set maximum send socket buffer size, default 131071
|
||||
net.core.wmem_max = 524287
|
||||
# set default receive socket buffer size, default 65535
|
||||
net.core.rmem_default = 524287
|
||||
# set default send socket buffer size, default 65535
|
||||
net.core.wmem_default = 524287
|
||||
# set maximum amount of option memory buffers, default 10240
|
||||
net.core.optmem_max = 524287
|
||||
# set number of unprocessed input packets before kernel starts dropping them; default 300
|
||||
net.core.netdev_max_backlog = 300000
|
||||
|
||||
Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
|
||||
your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
|
||||
ID installed on the system.
|
||||
|
||||
NOTE:
|
||||
Unless these scripts are added to the boot process, these changes will
|
||||
last only until the next system reboot.
|
||||
|
||||
|
||||
Resolving Slow UDP Traffic
|
||||
--------------------------
|
||||
If your server does not seem to be able to receive UDP traffic as fast as it
|
||||
can receive TCP traffic, it could be because Linux, by default, does not set
|
||||
the network stack buffers as large as they need to be to support high UDP
|
||||
transfer rates. One way to alleviate this problem is to allow more memory to
|
||||
be used by the IP stack to store incoming data.
|
||||
|
||||
For instance, use the commands::
|
||||
|
||||
sysctl -w net.core.rmem_max=262143
|
||||
|
||||
and::
|
||||
|
||||
sysctl -w net.core.rmem_default=262143
|
||||
|
||||
to increase the read buffer memory max and default to 262143 (256k - 1) from
|
||||
defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
|
||||
will increase the amount of memory used by the network stack for receives, and
|
||||
can be increased significantly more if necessary for your application.
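
To judge whether the receive buffers are actually the bottleneck, it can help
to watch the UDP receive-buffer error counter before and after changing the
values. The helper below is only a sketch: udp_rcvbuf_errors is a hypothetical
name, and it assumes the counter appears as the RcvbufErrors column of the
"Udp:" lines in /proc/net/snmp::

    # Print the UDP RcvbufErrors counter.
    # The optional file argument (default /proc/net/snmp) is for testing only.
    udp_rcvbuf_errors() {
        awk '
            # First "Udp:" line is the header: remember the column position.
            $1 == "Udp:" && !seen {
                for (i = 2; i <= NF; i++) if ($i == "RcvbufErrors") col = i
                seen = 1
                next
            }
            # Second "Udp:" line holds the values.
            $1 == "Udp:" && col { print $col; exit }
        ' "${1:-/proc/net/snmp}"
    }

A counter that keeps growing during a UDP test suggests that datagrams are
being dropped because the socket receive buffer is full.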
|
||||
|
||||
|
||||
Additional Configurations
|
||||
=========================
|
||||
|
||||
Configuring the Driver on Different Distributions
|
||||
-------------------------------------------------
|
||||
Configuring a network driver to load properly when the system is started is
|
||||
distribution dependent. Typically, the configuration process involves adding
|
||||
an alias line to /etc/modprobe.conf as well as editing other system startup
|
||||
scripts and/or configuration files. Many popular Linux distributions ship
|
||||
with tools to make these changes for you. To learn the proper way to
|
||||
configure a network device for your system, refer to your distribution
|
||||
documentation. If during this process you are asked for the driver or module
|
||||
name, the name for the Linux Base Driver for the Intel 10GbE Family of
|
||||
Adapters is ixgb.
|
||||
|
||||
Viewing Link Messages
|
||||
---------------------
|
||||
Link messages will not be displayed to the console if the distribution is
|
||||
restricting system messages. In order to see network driver link messages on
|
||||
your console, set the console log level to eight by entering the following::
|
||||
|
||||
dmesg -n 8
|
||||
|
||||
NOTE: This setting is not saved across reboots.
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
|
||||
enabled by changing the MTU to a value larger than the default of 1500.
|
||||
The maximum value for the MTU is 16114. Use the ip command to
|
||||
increase the MTU size. For example::
|
||||
|
||||
ip li set dev ethx mtu 9000
|
||||
|
||||
The maximum MTU setting for Jumbo Frames is 16114. This value coincides
|
||||
with the maximum Jumbo Frames size of 16128.
|
||||
|
||||
Ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The ethtool
|
||||
version 1.6 or later is required for this functionality.
|
||||
|
||||
The latest release of ethtool can be found at:
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
NOTE:
|
||||
The ethtool version 1.6 only supports a limited set of ethtool options.
|
||||
Support for a more complete ethtool feature set can be enabled by
|
||||
upgrading to the latest version.
|
||||
|
||||
NAPI
|
||||
----
|
||||
NAPI (Rx polling mode) is supported in the ixgb driver.
|
||||
|
||||
See https://wiki.linuxfoundation.org/networking/napi for more information on
|
||||
NAPI.
|
||||
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
NOTE:
|
||||
After installing the driver, if your Intel Network Connection is not
|
||||
working, verify in the "In This Release" section of the readme that you have
|
||||
installed the correct driver.
|
||||
|
||||
Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis
|
||||
----------------------------------------------------------------------------
|
||||
Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
|
||||
Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
|
||||
chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
|
||||
The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
|
||||
Server adapter or the SmartBits. If this situation occurs, using a different
|
||||
cable assembly may resolve the issue.
|
||||
|
||||
Cable Interoperability Issues with HP Procurve 3400cl Switch Port
|
||||
-----------------------------------------------------------------
|
||||
Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
|
||||
adapter is connected to an HP Procurve 3400cl switch port using short cables
|
||||
(1 m or shorter). If this situation occurs, using a longer cable may resolve
|
||||
the issue.
|
||||
|
||||
Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
|
||||
are 10 m or longer or when using a Leoni 15 m/24AWG cable assembly. The CRC
|
||||
errors may be received either by the CX4 Server adapter or at the switch. If
|
||||
this situation occurs, using a different cable assembly may resolve the issue.
|
||||
|
||||
Jumbo Frames System Requirement
|
||||
-------------------------------
|
||||
Memory allocation failures have been observed on Linux systems with 64 MB
|
||||
of RAM or less that are running Jumbo Frames. If you are using Jumbo
|
||||
Frames, your system may require more than the advertised minimum
|
||||
requirement of 64 MB of system memory.
|
||||
|
||||
Performance Degradation with Jumbo Frames
|
||||
-----------------------------------------
|
||||
Degradation in throughput performance may be observed in some Jumbo frames
|
||||
environments. If this is observed, increasing the application's socket buffer
|
||||
size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
|
||||
See the specific application manual and /usr/src/linux*/Documentation/
|
||||
networking/ip-sysctl.txt for more details.
|
||||
|
||||
Allocating Rx Buffers when Using Jumbo Frames
|
||||
---------------------------------------------
|
||||
Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
|
||||
the available memory is heavily fragmented. This issue may be seen with PCI-X
|
||||
adapters or with packet split disabled. This can be reduced or eliminated
|
||||
by changing the amount of available memory for receive buffer allocation, by
|
||||
increasing /proc/sys/vm/min_free_kbytes.
|
||||
|
||||
Multiple Interfaces on Same Ethernet Broadcast Network
|
||||
------------------------------------------------------
|
||||
Due to the default ARP behavior on Linux, it is not possible to have
|
||||
one system on two IP networks in the same Ethernet broadcast domain
|
||||
(non-partitioned switch) behave as expected. All Ethernet interfaces
|
||||
will respond to IP traffic for any IP address assigned to the system.
|
||||
This results in unbalanced receive traffic.
|
||||
|
||||
If you have multiple interfaces in a server, do either of the following:
|
||||
|
||||
- Turn on ARP filtering by entering::
|
||||
|
||||
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
|
||||
|
||||
- Install the interfaces in separate broadcast domains - either in
|
||||
different switches or in a switch partitioned to VLANs.
|
||||
|
||||
UDP Stress Test Dropped Packet Issue
|
||||
--------------------------------------
|
||||
Under a small-packet UDP stress test with the 10GbE driver, the Linux system
|
||||
may drop UDP packets due to the fullness of socket buffers. You may want
|
||||
to change the driver's Flow Control variables to the minimum value for
|
||||
controlling packet reception.
|
||||
|
||||
Tx Hangs Possible Under Stress
|
||||
------------------------------
|
||||
Under stress conditions, if Tx hangs occur, turning off TSO with
"ethtool -K eth0 tso off" may resolve the problem.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net
|
541
Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
Normal file
@@ -0,0 +1,541 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
===========================================================================
|
||||
Linux Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
|
||||
===========================================================================
|
||||
|
||||
Intel 10 Gigabit Linux driver.
|
||||
Copyright(c) 1999-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Identifying Your Adapter
|
||||
- Command Line Parameters
|
||||
- Additional Features and Configurations
|
||||
- Known Issues
|
||||
- Support
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
The driver is compatible with devices based on the following:
|
||||
|
||||
* Intel(R) Ethernet Controller 82598
|
||||
* Intel(R) Ethernet Controller 82599
|
||||
* Intel(R) Ethernet Controller X520
|
||||
* Intel(R) Ethernet Controller X540
|
||||
* Intel(R) Ethernet Controller X550
|
||||
* Intel(R) Ethernet Controller X552
|
||||
* Intel(R) Ethernet Controller X553
|
||||
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
https://www.intel.com/support
|
||||
|
||||
SFP+ Devices with Pluggable Optics
|
||||
----------------------------------
|
||||
|
||||
82599-BASED ADAPTERS
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
NOTES:
|
||||
- If your 82599-based Intel(R) Network Adapter came with Intel optics or is an
|
||||
Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel optics
|
||||
and/or the direct attach cables listed below.
|
||||
- When 82599-based SFP+ devices are connected back to back, they should be set
|
||||
to the same Speed setting via ethtool. Results may vary if you mix speed
|
||||
settings.
|
||||
|
||||
+---------------+---------------------------------------+------------------+
| Supplier      | Type                                  | Part Numbers     |
+===============+=======================================+==================+
| SR Modules                                                               |
+---------------+---------------------------------------+------------------+
| Intel         | DUAL RATE 1G/10G SFP+ SR (bailed)     | FTLX8571D3BCV-IT |
+---------------+---------------------------------------+------------------+
| Intel         | DUAL RATE 1G/10G SFP+ SR (bailed)     | AFBR-703SDZ-IN2  |
+---------------+---------------------------------------+------------------+
| Intel         | DUAL RATE 1G/10G SFP+ SR (bailed)     | AFBR-703SDDZ-IN1 |
+---------------+---------------------------------------+------------------+
| LR Modules                                                               |
+---------------+---------------------------------------+------------------+
| Intel         | DUAL RATE 1G/10G SFP+ LR (bailed)     | FTLX1471D3BCV-IT |
+---------------+---------------------------------------+------------------+
| Intel         | DUAL RATE 1G/10G SFP+ LR (bailed)     | AFCT-701SDZ-IN2  |
+---------------+---------------------------------------+------------------+
| Intel         | DUAL RATE 1G/10G SFP+ LR (bailed)     | AFCT-701SDDZ-IN1 |
+---------------+---------------------------------------+------------------+
|
||||
|
||||
The following is a list of 3rd party SFP+ modules that have received some
|
||||
testing. Not all modules are applicable to all devices.
|
||||
|
||||
+---------------+---------------------------------------+------------------+
| Supplier      | Type                                  | Part Numbers     |
+===============+=======================================+==================+
| Finisar       | SFP+ SR bailed, 10g single rate       | FTLX8571D3BCL    |
+---------------+---------------------------------------+------------------+
| Avago         | SFP+ SR bailed, 10g single rate       | AFBR-700SDZ      |
+---------------+---------------------------------------+------------------+
| Finisar       | SFP+ LR bailed, 10g single rate       | FTLX1471D3BCL    |
+---------------+---------------------------------------+------------------+
| Finisar       | DUAL RATE 1G/10G SFP+ SR (No Bail)    | FTLX8571D3QCV-IT |
+---------------+---------------------------------------+------------------+
| Avago         | DUAL RATE 1G/10G SFP+ SR (No Bail)    | AFBR-703SDZ-IN1  |
+---------------+---------------------------------------+------------------+
| Finisar       | DUAL RATE 1G/10G SFP+ LR (No Bail)    | FTLX1471D3QCV-IT |
+---------------+---------------------------------------+------------------+
| Avago         | DUAL RATE 1G/10G SFP+ LR (No Bail)    | AFCT-701SDZ-IN1  |
+---------------+---------------------------------------+------------------+
| Finisar       | 1000BASE-T SFP                        | FCLF8522P2BTL    |
+---------------+---------------------------------------+------------------+
| Avago         | 1000BASE-T                            | ABCU-5710RZ      |
+---------------+---------------------------------------+------------------+
| HP            | 1000BASE-SX SFP                       | 453153-001       |
+---------------+---------------------------------------+------------------+
|
||||
|
||||
82599-based adapters support all passive and active limiting direct attach
|
||||
cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications.
|
||||
|
||||
Laser turns off for SFP+ when ifconfig ethX down
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
"ifconfig ethX down" turns off the laser for 82599-based SFP+ fiber adapters.
|
||||
"ifconfig ethX up" turns on the laser.
|
||||
Alternatively, you can use "ip link set [down/up] dev ethX" to turn the
|
||||
laser off and on.
|
||||
|
||||
|
||||
82599-based QSFP+ Adapters
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
NOTES:
|
||||
- If your 82599-based Intel(R) Network Adapter came with Intel optics, it only
|
||||
supports Intel optics.
|
||||
- 82599-based QSFP+ adapters only support 4x10 Gbps connections. 1x40 Gbps
|
||||
connections are not supported. QSFP+ link partners must be configured for
|
||||
4x10 Gbps.
|
||||
- 82599-based QSFP+ adapters do not support automatic link speed detection.
|
||||
The link speed must be configured to either 10 Gbps or 1 Gbps to match the link
|
||||
partner's speed capabilities. Incorrect speed configurations will result in
|
||||
failure to link.
|
||||
- Intel(R) Ethernet Converged Network Adapter X520-Q1 only supports the optics
|
||||
and direct attach cables listed below.
|
||||
|
||||
+---------------+---------------------------------------+------------------+
| Supplier      | Type                                  | Part Numbers     |
+===============+=======================================+==================+
| Intel         | DUAL RATE 1G/10G QSFP+ SRL (bailed)   | E10GQSFPSR       |
+---------------+---------------------------------------+------------------+
|
||||
|
||||
82599-based QSFP+ adapters support all passive and active limiting QSFP+
|
||||
direct attach cables that comply with SFF-8436 v4.1 specifications.
|
||||
|
||||
82598-BASED ADAPTERS
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
NOTES:
|
||||
- Intel(R) Ethernet Network Adapters that support removable optical modules
|
||||
only support their original module type (for example, the Intel(R) 10 Gigabit
|
||||
SR Dual Port Express Module only supports SR optical modules). If you plug in
|
||||
a different type of module, the driver will not load.
|
||||
- Hot Swapping/hot plugging optical modules is not supported.
|
||||
- Only single speed, 10 gigabit modules are supported.
|
||||
- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module
|
||||
types are not supported. Please see your system documentation for details.
|
||||
|
||||
The following is a list of SFP+ modules and direct attach cables that have
|
||||
received some testing. Not all modules are applicable to all devices.
|
||||
|
||||
+---------------+---------------------------------------+------------------+
| Supplier      | Type                                  | Part Numbers     |
+===============+=======================================+==================+
| Finisar       | SFP+ SR bailed, 10g single rate       | FTLX8571D3BCL    |
+---------------+---------------------------------------+------------------+
| Avago         | SFP+ SR bailed, 10g single rate       | AFBR-700SDZ      |
+---------------+---------------------------------------+------------------+
| Finisar       | SFP+ LR bailed, 10g single rate       | FTLX1471D3BCL    |
+---------------+---------------------------------------+------------------+
|
||||
|
||||
82598-based adapters support all passive direct attach cables that comply with
|
||||
SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach cables
|
||||
are not supported.
|
||||
|
||||
Third party optic modules and cables referred to above are listed only for the
|
||||
purpose of highlighting third party specifications and potential
|
||||
compatibility, and are not recommendations or endorsements or sponsorship of
|
||||
any third party's product by Intel. Intel is not endorsing or promoting
|
||||
products made by any third party and the third party reference is provided
|
||||
only to share information regarding certain optic modules and cables with the
|
||||
above specifications. There may be other manufacturers or suppliers, producing
|
||||
or supplying optic modules and cables with similar or matching descriptions.
|
||||
Customers must use their own discretion and diligence to purchase optic
|
||||
modules and cables from any third party of their choice. Customers are solely
|
||||
responsible for assessing the suitability of the product and/or devices and
|
||||
for the selection of the vendor for purchasing any product. THE OPTIC MODULES
|
||||
AND CABLES REFERRED TO ABOVE ARE NOT WARRANTED OR SUPPORTED BY INTEL. INTEL
|
||||
ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED
|
||||
WARRANTY, RELATING TO SALE AND/OR USE OF SUCH THIRD PARTY PRODUCTS OR
|
||||
SELECTION OF VENDOR BY CUSTOMERS.
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
|
||||
max_vfs
|
||||
-------
|
||||
:Valid Range: 1-63
|
||||
|
||||
This parameter adds support for SR-IOV. It causes the driver to spawn up to
|
||||
max_vfs worth of virtual functions.
|
||||
If the value is greater than 0 it will also force the VMDq parameter to be 1 or
|
||||
more.
|
||||
|
||||
NOTE: This parameter is only used on kernel 3.7.x and below. On kernel 3.8.x
|
||||
and above, use sysfs to enable VFs. Also, for Red Hat distributions, this
|
||||
parameter is only used on version 6.6 and older. For version 6.7 and newer, use
|
||||
sysfs. For example::
|
||||
|
||||
  echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs   # enable VFs
  echo 0 > /sys/class/net/$dev/device/sriov_numvfs                 # disable VFs
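
As a rough sketch of the sysfs approach, the helper below checks the limit
advertised in sriov_totalvfs before writing sriov_numvfs. The function name
enable_vfs and the SYSFS_NET override are hypothetical and exist only for
illustration and testing; they are not part of the driver::

    # Usage: enable_vfs <netdev> <number of VFs>
    enable_vfs() {
        dev=$1 want=$2
        base="${SYSFS_NET:-/sys/class/net}/$dev/device"
        total=$(cat "$base/sriov_totalvfs") || return 1
        if [ "$want" -gt "$total" ]; then
            echo "requested $want VFs, device supports at most $total" >&2
            return 1
        fi
        # The VF count has to go through 0 before a new non-zero value is set.
        echo 0 > "$base/sriov_numvfs"
        echo "$want" > "$base/sriov_numvfs"
    }

For example, "enable_vfs eth0 4" run as root would request four VFs on eth0.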
|
||||
|
||||
The parameters for the driver are referenced by position. Thus, if you have a
|
||||
dual port adapter, or more than one adapter in your system, and want N virtual
|
||||
functions per port, you must specify a number for each port with each parameter
|
||||
separated by a comma. For example::
|
||||
|
||||
modprobe ixgbe max_vfs=4
|
||||
|
||||
This will spawn 4 VFs on the first port.
|
||||
|
||||
::
|
||||
|
||||
modprobe ixgbe max_vfs=2,4
|
||||
|
||||
This will spawn 2 VFs on the first port and 4 VFs on the second port.
|
||||
|
||||
NOTE: Caution must be used in loading the driver with these parameters.
|
||||
Depending on your system configuration, number of slots, etc., it is impossible
|
||||
to predict in all cases where the positions would be on the command line.
|
||||
|
||||
NOTE: Neither the device nor the driver controls how VFs are mapped into config
|
||||
space. Bus layout will vary by operating system. On operating systems that
|
||||
support it, you can check sysfs to find the mapping.
|
||||
|
||||
NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
|
||||
and VLAN tag stripping/insertion will remain enabled. Please remove the old
|
||||
VLAN filter before the new VLAN filter is added. For example,
|
||||
|
||||
::
|
||||
|
||||
  ip link set eth0 vf 0 vlan 100   # set VLAN 100 for VF 0
  ip link set eth0 vf 0 vlan 0     # delete VLAN 100
  ip link set eth0 vf 0 vlan 200   # set a new VLAN 200 for VF 0
|
||||
|
||||
With kernel 3.6, the driver supports the simultaneous usage of max_vfs and DCB
|
||||
features, subject to the constraints described below. Prior to kernel 3.6, the
|
||||
driver did not support the simultaneous operation of max_vfs greater than 0 and
|
||||
the DCB features (multiple traffic classes utilizing Priority Flow Control and
|
||||
Extended Transmission Selection).
|
||||
|
||||
When DCB is enabled, network traffic is transmitted and received through
|
||||
multiple traffic classes (packet buffers in the NIC). The traffic is associated
|
||||
with a specific class based on priority, which has a value of 0 through 7 used
|
||||
in the VLAN tag. When SR-IOV is not enabled, each traffic class is associated
|
||||
with a set of receive/transmit descriptor queue pairs. The number of queue
|
||||
pairs for a given traffic class depends on the hardware configuration. When
|
||||
SR-IOV is enabled, the descriptor queue pairs are grouped into pools. The
|
||||
Physical Function (PF) and each Virtual Function (VF) are each allocated a pool of
|
||||
receive/transmit descriptor queue pairs. When multiple traffic classes are
|
||||
configured (for example, DCB is enabled), each pool contains a queue pair from
|
||||
each traffic class. When a single traffic class is configured in the hardware,
|
||||
the pools contain multiple queue pairs from the single traffic class.
|
||||
|
||||
The number of VFs that can be allocated depends on the number of traffic
|
||||
classes that can be enabled. The configurable number of traffic classes for
|
||||
each enabled VF is as follows:

- 0 - 15 VFs = Up to 8 traffic classes, depending on device support
- 16 - 31 VFs = Up to 4 traffic classes
- 32 - 63 VFs = 1 traffic class
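
The mapping above can be expressed as a small lookup, shown here only as an
illustration. The function name max_traffic_classes is hypothetical, and the
result for 0 - 15 VFs is an upper bound that still depends on device support::

    # Print the maximum number of traffic classes for a given VF count.
    max_traffic_classes() {
        vfs=$1
        if [ "$vfs" -lt 0 ] || [ "$vfs" -gt 63 ]; then
            echo "unsupported VF count: $vfs" >&2
            return 1
        elif [ "$vfs" -le 15 ]; then
            echo 8
        elif [ "$vfs" -le 31 ]; then
            echo 4
        else
            echo 1
        fi
    }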
|
||||
|
||||
When VFs are configured, the PF is allocated one pool as well. The PF supports
|
||||
the DCB features with the constraint that each traffic class will only use a
|
||||
single queue pair. When zero VFs are configured, the PF can support multiple
|
||||
queue pairs per traffic class.
|
||||
|
||||
allow_unsupported_sfp
|
||||
---------------------
|
||||
:Valid Range: 0,1
|
||||
:Default Value: 0 (disabled)
|
||||
|
||||
This parameter allows unsupported and untested SFP+ modules on 82599-based
|
||||
adapters, as long as the type of module is known to the driver.
|
||||
|
||||
debug
|
||||
-----
|
||||
:Valid Range: 0-16 (0=none,...,16=all)
|
||||
:Default Value: 0
|
||||
|
||||
This parameter adjusts the level of debug messages displayed in the system
|
||||
logs.
|
||||
|
||||
|
||||
Additional Features and Configurations
|
||||
======================================
|
||||
|
||||
Flow Control
|
||||
------------
|
||||
Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
|
||||
receiving and transmitting pause frames for ixgbe. When transmit is enabled,
|
||||
pause frames are generated when the receive packet buffer crosses a predefined
|
||||
threshold. When receive is enabled, the transmit unit will halt for the time
|
||||
delay specified when a pause frame is received.
|
||||
|
||||
NOTE: You must have a flow control capable link partner.
|
||||
|
||||
Flow Control is enabled by default.
|
||||
|
||||
Use ethtool to change the flow control settings. To enable or disable Rx or
|
||||
Tx Flow Control::
|
||||
|
||||
ethtool -A eth? rx <on|off> tx <on|off>
|
||||
|
||||
Note: This command only enables or disables Flow Control if auto-negotiation is
|
||||
disabled. If auto-negotiation is enabled, this command changes the parameters
|
||||
used for auto-negotiation with the link partner.
|
||||
|
||||
To enable or disable auto-negotiation::
|
||||
|
||||
ethtool -s eth? autoneg <on|off>
|
||||
|
||||
Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
|
||||
on your device, you may not be able to change the auto-negotiation setting.
|
||||
|
||||
NOTE: For 82598 backplane cards entering 1 gigabit mode, flow control default
|
||||
behavior is changed to off. Flow control in 1 gigabit mode on these devices can
|
||||
lead to transmit hangs.
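
To confirm which flow control settings are in effect, the pause parameters can
be read back with "ethtool -a". The wrapper below is a sketch only:
pause_summary is a hypothetical name, and the parsing assumes the usual
"Autonegotiate:", "RX:" and "TX:" labels in the ethtool output::

    # Print a one-line summary of the pause settings of an interface.
    pause_summary() {
        ethtool -a "$1" | awk '
            $1 == "Autonegotiate:" { an = $2 }
            $1 == "RX:"            { rx = $2 }
            $1 == "TX:"            { tx = $2 }
            END { printf "autoneg=%s rx=%s tx=%s\n", an, rx, tx }'
    }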
|
||||
|
||||
Intel(R) Ethernet Flow Director
|
||||
-------------------------------
|
||||
The Intel Ethernet Flow Director performs the following tasks:
|
||||
|
||||
- Directs receive packets according to their flows to different queues.
|
||||
- Enables tight control on routing a flow in the platform.
|
||||
- Matches flows and CPU cores for flow affinity.
|
||||
- Supports multiple parameters for flexible flow classification and load
|
||||
balancing (in SFP mode only).
|
||||
|
||||
NOTE: Intel Ethernet Flow Director masking works in the opposite manner from
|
||||
subnet masking. In the following command::
|
||||
|
||||
#ethtool -N eth11 flow-type ip4 src-ip 172.4.1.2 m 255.0.0.0 dst-ip \
|
||||
172.21.1.1 m 255.128.0.0 action 31
|
||||
|
||||
The src-ip value that is written to the filter will be 0.4.1.2, not 172.0.0.0
|
||||
as might be expected. Similarly, the dst-ip value written to the filter will be
|
||||
0.21.1.1, not 172.0.0.0.
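
The arithmetic behind this example can be reproduced with a few lines of
shell. This is only a sketch inferred from the example above (each address
octet is ANDed with the inverted mask octet); fdir_masked_value is a
hypothetical helper name, not an ethtool or driver interface::

    # Print address AND (NOT mask), octet by octet.
    # Usage: fdir_masked_value <address> <mask>
    fdir_masked_value() (
        IFS=.
        set -- $1 $2    # $1..$4 = address octets, $5..$8 = mask octets
        echo "$(( $1 & (255 ^ $5) )).$(( $2 & (255 ^ $6) )).$(( $3 & (255 ^ $7) )).$(( $4 & (255 ^ $8) ))"
    )

Running "fdir_masked_value 172.4.1.2 255.0.0.0" prints 0.4.1.2, and
"fdir_masked_value 172.21.1.1 255.128.0.0" prints 0.21.1.1, matching the
values described above.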
|
||||
|
||||
To enable or disable the Intel Ethernet Flow Director::
|
||||
|
||||
# ethtool -K ethX ntuple <on|off>
|
||||
|
||||
When disabling ntuple filters, all the user programmed filters are flushed from
|
||||
the driver cache and hardware. All needed filters must be re-added when ntuple
|
||||
is re-enabled.
|
||||
|
||||
To add a filter that directs packets to queue 2, use the -U or -N switch::
|
||||
|
||||
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
|
||||
192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
|
||||
|
||||
To see the list of filters currently present::
|
||||
|
||||
# ethtool <-u|-n> ethX
|
||||
|
||||
Sideband Perfect Filters
|
||||
------------------------
|
||||
Sideband Perfect Filters are used to direct traffic that matches specified
|
||||
characteristics. They are enabled through ethtool's ntuple interface. To add a
|
||||
new filter use the following command::
|
||||
|
||||
ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
|
||||
dst-port <port> action <queue>
|
||||
|
||||
Where:
|
||||
<device> - the ethernet device to program
|
||||
<type> - can be ip4, tcp4, udp4, or sctp4
|
||||
<ip> - the IP address to match on
|
||||
<port> - the port number to match on
|
||||
<queue> - the queue to direct traffic towards (-1 discards the matched traffic)
|
||||
|
||||
Use the following command to delete a filter::
|
||||
|
||||
ethtool -U <device> delete <N>
|
||||
|
||||
Where <N> is the filter id displayed when printing all the active filters, and
|
||||
may also have been specified using "loc <N>" when adding the filter.
|
||||
|
||||
The following example matches TCP traffic sent from 192.168.0.1, port 5300,
|
||||
directed to 192.168.0.5, port 80, and sends it to queue 7::
|
||||
|
||||
ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
|
||||
src-port 5300 dst-port 80 action 7
|
||||
|
||||
For each flow-type, the programmed filters must all have the same matching
|
||||
input set. For example, issuing the following two commands is acceptable::
|
||||
|
||||
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
|
||||
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
|
||||
|
||||
Issuing the next two commands, however, is not acceptable, since the first
|
||||
specifies src-ip and the second specifies dst-ip::
|
||||
|
||||
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
|
||||
ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
|
||||
|
||||
The second command will fail with an error. You may program multiple filters
|
||||
with the same fields, using different values, but, on one device, you may not
|
||||
program two TCP4 filters with different matching fields.
|
||||
|
||||
Matching on a sub-portion of a field is not supported by the ixgbe driver, thus
|
||||
partial mask fields are not supported.
|
||||
|
||||
To create filters that direct traffic to a specific Virtual Function, use the
|
||||
"user-def" parameter. Specify the user-def as a 64 bit value, where the lower 32
|
||||
bits represent the queue number, while the next 8 bits represent which VF.
|
||||
Note that 0 is the PF, so the VF identifier is offset by 1. For example::
|
||||
|
||||
... user-def 0x800000002 ...
|
||||
|
||||
specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
|
||||
that VF.
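
The encoding described above can be computed rather than assembled by hand.
The following is a minimal sketch, assuming a shell with 64-bit arithmetic;
fdir_user_def is a hypothetical helper name::

    # Build a user-def value: queue in the lower 32 bits, (VF + 1) above it.
    # Usage: fdir_user_def <vf number> <queue number>
    fdir_user_def() {
        vf=$1 queue=$2
        printf '0x%x\n' $(( ((vf + 1) << 32) | queue ))
    }

For example, "fdir_user_def 7 2" prints 0x800000002, the value used in the
example above.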
|
||||
|
||||
Note that these filters will not break internal routing rules, and will not
|
||||
route traffic that otherwise would not have been sent to the specified Virtual
|
||||
Function.
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
|
||||
to a value larger than the default value of 1500.
|
||||
|
||||
Use the ifconfig command to increase the MTU size. For example, enter the
|
||||
following where <x> is the interface number::
|
||||
|
||||
ifconfig eth<x> mtu 9000 up
|
||||
|
||||
Alternatively, you can use the ip command as follows::
|
||||
|
||||
ip link set mtu 9000 dev eth<x>
|
||||
ip link set up dev eth<x>
|
||||
|
||||
This setting is not saved across reboots. The setting change can be made
|
||||
permanent by adding 'MTU=9000' to the file::
|
||||
|
||||
/etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
|
||||
/etc/sysconfig/network/<config_file> // for SLES
|
||||
|
||||
NOTE: The maximum MTU setting for Jumbo Frames is 9710. This value coincides
|
||||
with the maximum Jumbo Frames size of 9728 bytes.
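
The 18-byte difference between the two numbers is consistent with a 14-byte
Ethernet header plus a 4-byte frame check sequence; that breakdown is an
assumption made for this sketch, and mtu_to_frame_size is a hypothetical
helper name::

    ETH_HLEN=14      # destination MAC + source MAC + EtherType
    ETH_FCS_LEN=4    # frame check sequence

    # Print the on-wire frame size for a given MTU.
    mtu_to_frame_size() {
        echo $(( $1 + ETH_HLEN + ETH_FCS_LEN ))
    }

With these values, "mtu_to_frame_size 9710" prints 9728.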
|
||||
|
||||
NOTE: This driver will attempt to use multiple page sized buffers to receive
|
||||
each jumbo packet. This should help to avoid buffer starvation issues when
|
||||
allocating receive packets.
|
||||
|
||||
NOTE: For 82599-based network connections, if you are enabling jumbo frames in
|
||||
a virtual function (VF), jumbo frames must first be enabled in the physical
|
||||
function (PF). The VF MTU setting cannot be larger than the PF MTU.
|
||||
|
||||
Generic Receive Offload, aka GRO
|
||||
--------------------------------
|
||||
The driver supports the in-kernel software implementation of GRO. GRO has
|
||||
shown that by coalescing Rx traffic into larger chunks of data, CPU
|
||||
utilization can be significantly reduced when under large Rx load. GRO is an
|
||||
evolution of the previously-used LRO interface. GRO is able to coalesce
|
||||
other protocols besides TCP. It's also safe to use with configurations that
|
||||
are problematic for LRO, namely bridging and iSCSI.
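
Whether GRO is currently enabled on an interface can be read from the
"ethtool -k" feature list. The wrapper below is a sketch; gro_state is a
hypothetical name, and it assumes the feature is reported on a line beginning
with "generic-receive-offload:"::

    # Print "on" or "off" for the GRO feature of an interface.
    gro_state() {
        ethtool -k "$1" | awk -F': ' '$1 == "generic-receive-offload" { print $2 }'
    }

GRO can then be switched with "ethtool -K ethX gro on" or
"ethtool -K ethX gro off".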
|
||||
|
||||
Data Center Bridging (DCB)
|
||||
--------------------------
|
||||
NOTE:
|
||||
The kernel assumes that TC0 is available, and will disable Priority Flow
|
||||
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
|
||||
enabled when setting up DCB on your switch.
|
||||
|
||||
DCB is a configurable Quality of Service implementation in hardware. It uses
|
||||
the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
|
||||
different priorities that traffic can be filtered into. It also enables
|
||||
priority flow control (802.1Qbb) which can limit or eliminate the number of
|
||||
dropped packets during network stress. Bandwidth can be allocated to each of
|
||||
these priorities, which is enforced at the hardware level (802.1Qaz).
|
||||
|
||||
Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
|
||||
802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
|
||||
and can accept settings from a DCBX capable peer. Software configuration of
|
||||
DCBX parameters via dcbtool/lldptool is not supported.
|
||||
|
||||
The ixgbe driver implements the DCB netlink interface layer to allow user-space
|
||||
to communicate with the driver and query DCB configuration for the port.
|
||||
|
||||
ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. The latest ethtool
|
||||
version is required for this functionality. Download it at:
|
||||
https://www.kernel.org/pub/software/network/ethtool/
|
||||
|
||||
FCoE
|
||||
----
|
||||
The ixgbe driver supports Fibre Channel over Ethernet (FCoE) and Data Center
|
||||
Bridging (DCB). This code has no default effect on the regular driver
|
||||
operation. Configuring DCB and FCoE is outside the scope of this README. Refer
|
||||
to http://www.open-fcoe.org/ for FCoE project information and contact
|
||||
ixgbe-eedc@lists.sourceforge.net for DCB information.
|
||||
|
||||
MAC and VLAN anti-spoofing feature
|
||||
----------------------------------
|
||||
When a malicious driver attempts to send a spoofed packet, it is dropped by the
|
||||
hardware and not transmitted.
|
||||
|
||||
An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
|
||||
spoofed packet is detected, the PF driver will send the following message to
|
||||
the system log (displayed by the "dmesg" command)::
|
||||
|
||||
ixgbe ethX: ixgbe_spoof_check: n spoofed packets detected
|
||||
|
||||
where "X" is the PF interface number and "n" is the number of spoofed packets.
|
||||
NOTE: This feature can be disabled for a specific Virtual Function (VF)::
|
||||
|
||||
ip link set <pf dev> vf <vf id> spoofchk {off|on}
|
||||
|
||||
IPsec Offload
|
||||
-------------
|
||||
The ixgbe driver supports IPsec Hardware Offload. When creating Security
|
||||
Associations with "ip xfrm ..." the 'offload' tag option can be used to
|
||||
register the IPsec SA with the driver in order to get higher throughput in
|
||||
the secure communications.
|
||||
|
||||
The offload is also supported for ixgbe's VFs, but the VF must be set as
|
||||
'trusted' and the support must be enabled with::
|
||||
|
||||
ethtool --set-priv-flags eth<x> vf-ipsec on
|
||||
ip link set eth<x> vf <y> trust on
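
As a hedged illustration of the 'offload' option, a Security Association could
be registered along the following lines. The addresses, SPI, key material and
the interface name eth4 are placeholders chosen for the example, and the exact
set of supported algorithms depends on the hardware::

    # Placeholder addresses, SPI and key; replace with real values.
    ip xfrm state add proto esp dst 14.0.0.70 src 14.0.0.52 \
        spi 0x07 mode transport reqid 0x07 replay-window 32 \
        aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
        sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
        offload dev eth4 dir in

The "offload dev <ifname> dir <in|out>" suffix is what asks the stack to hand
the SA to the device named there instead of handling it purely in software.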
|
||||
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS
|
||||
---------------------------------------------------------------------
|
||||
Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM.
|
||||
This includes traditional PCIe devices, as well as SR-IOV-capable devices based
|
||||
on the Intel Ethernet Controller XL710.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
@@ -0,0 +1,67 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
============================================================
|
||||
Linux Base Virtual Function Driver for Intel(R) 10G Ethernet
|
||||
============================================================
|
||||
|
||||
Intel 10 Gigabit Virtual Function Linux driver.
|
||||
Copyright(c) 1999-2018 Intel Corporation.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Identifying Your Adapter
|
||||
- Known Issues
|
||||
- Support
|
||||
|
||||
This driver supports 82599, X540, X550, and X552-based virtual function devices
|
||||
that can only be activated on kernels that support SR-IOV.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your Intel adapter. All hardware requirements listed apply to use
|
||||
with Linux.
|
||||
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
The driver is compatible with devices based on the following:
|
||||
|
||||
* Intel(R) Ethernet Controller 82598
|
||||
* Intel(R) Ethernet Controller 82599
|
||||
* Intel(R) Ethernet Controller X520
|
||||
* Intel(R) Ethernet Controller X540
|
||||
* Intel(R) Ethernet Controller X550
|
||||
* Intel(R) Ethernet Controller X552
|
||||
* Intel(R) Ethernet Controller X553
|
||||
|
||||
For information on how to identify your adapter, and for the latest Intel
|
||||
network drivers, refer to the Intel Support website:
|
||||
https://www.intel.com/support
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
SR-IOV requires the correct platform and OS support.
|
||||
|
||||
The guest OS loading this driver must support MSI-X interrupts.
|
||||
|
||||
This driver is only supported as a loadable module at this time. Intel is not
|
||||
supplying patches against the kernel source to allow for static linking of the
|
||||
drivers.
|
||||
|
||||
VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
https://www.intel.com/support/
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
https://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the issue
|
||||
to e1000-devel@lists.sf.net.
|
@@ -0,0 +1,159 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
====================================
|
||||
Marvell OcteonTx2 RVU Kernel Drivers
|
||||
====================================
|
||||
|
||||
Copyright (c) 2020 Marvell International Ltd.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- `Overview`_
|
||||
- `Drivers`_
|
||||
- `Basic packet flow`_
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
Resource virtualization unit (RVU) on Marvell's OcteonTX2 SOC maps HW
|
||||
resources from the network, crypto and other functional blocks into
|
||||
PCI-compatible physical and virtual functions. Each functional block
|
||||
again has multiple local functions (LFs) for provisioning to PCI devices.
|
||||
RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual
|
||||
functions (VFs). PF0 is called the administrative / admin function (AF)
|
||||
and has privileges to provision RVU functional block's LFs to each of the
|
||||
PF/VF.
|
||||
|
||||
RVU managed networking functional blocks
|
||||
- Network pool or buffer allocator (NPA)
|
||||
- Network interface controller (NIX)
|
||||
- Network parser CAM (NPC)
|
||||
- Schedule/Synchronize/Order unit (SSO)
|
||||
- Loopback interface (LBK)
|
||||
|
||||
RVU managed non-networking functional blocks
|
||||
- Crypto accelerator (CPT)
|
||||
- Scheduled timers unit (TIM)
|
||||
- Schedule/Synchronize/Order unit (SSO)
|
||||
Used for both networking and non-networking use cases
|
||||
|
||||
Resource provisioning examples
|
||||
- A PF/VF with NIX-LF & NPA-LF resources works as a pure network device
|
||||
- A PF/VF with CPT-LF resource works as a pure crypto offload device.
|
||||
|
||||
RVU functional blocks are highly configurable as per software requirements.
|
||||
|
||||
Firmware sets up the following before the kernel boots
|
||||
- Enables required number of RVU PFs based on number of physical links.
|
||||
- The number of VFs per PF is either static or configurable at compile time.
|
||||
Based on config, firmware assigns VFs to each of the PFs.
|
||||
- Also assigns MSI-X vectors to each of the PFs and VFs.
|
||||
- These are not changed after kernel boot.
|
||||
|
||||
Drivers
|
||||
=======
|
||||
|
||||
Linux kernel will have multiple drivers registering to different PF and VFs
|
||||
of RVU. With respect to networking, there are three flavours of drivers.
|
||||
|
||||
Admin Function driver
|
||||
---------------------
|
||||
|
||||
As mentioned above RVU PF0 is called the admin function (AF), this driver
|
||||
supports resource provisioning and configuration of functional blocks.
|
||||
It doesn't handle any I/O. It sets up a few basic things, but most of the
|
||||
functionality is achieved via configuration requests from PFs and VFs.
|
||||
|
||||
PFs/VFs communicate with the AF via a shared memory region (mailbox). Upon
|
||||
receiving requests AF does resource provisioning and other HW configuration.
|
||||
AF is always attached to host kernel, but PFs and their VFs may be used by host
|
||||
kernel itself, or attached to VMs or to userspace applications like
|
||||
DPDK etc. So AF has to handle provisioning/configuration requests sent
|
||||
by any device from any domain.
|
||||
|
||||
AF driver also interacts with underlying firmware to
|
||||
- Manage physical Ethernet links, i.e. CGX LMACs.
|
||||
- Retrieve information like speed, duplex, autoneg etc
|
||||
- Retrieve PHY EEPROM and stats.
|
||||
- Configure FEC, PAM modes
|
||||
- etc
|
||||
|
||||
From pure networking side AF driver supports following functionality.
|
||||
- Map a physical link to a RVU PF to which a netdev is registered.
|
||||
- Attach NIX and NPA block LFs to RVU PF/VF which provide buffer pools, RQs, SQs
|
||||
for regular networking functionality.
|
||||
- Flow control (pause frames) enable/disable/config.
|
||||
- HW PTP timestamping related config.
|
||||
- NPC parser profile config, basically how to parse pkt and what info to extract.
|
||||
- NPC extract profile config, what to extract from the pkt to match data in MCAM entries.
|
||||
- Manage NPC MCAM entries, upon request can frame and install requested packet forwarding rules.
|
||||
- Defines receive side scaling (RSS) algorithms.
|
||||
- Defines segmentation offload algorithms (eg TSO)
|
||||
- VLAN stripping, capture and insertion config.
|
||||
- SSO and TIM blocks config which provide packet scheduling support.
|
||||
- Debugfs support, to check current resource provisioning, current status of
|
||||
NPA pools, NIX RQ, SQ and CQs, various stats etc which helps in debugging issues.
|
||||
- And many more.
|
||||
|
||||
Physical Function driver
|
||||
------------------------
|
||||
|
||||
This RVU PF handles IO, is mapped to a physical ethernet link and this
|
||||
driver registers a netdev. This supports SR-IOV. As said above this driver
|
||||
communicates with AF with a mailbox. To retrieve information from physical
|
||||
links this driver talks to AF and AF gets that info from firmware and responds
|
||||
back, i.e. this driver cannot talk to firmware directly.
|
||||
|
||||
Supports ethtool for configuring links, RSS, queue count, queue size,
|
||||
flow control, ntuple filters, dump PHY EEPROM, config FEC etc.
|
||||
|
||||
Virtual Function driver
|
||||
-----------------------
|
||||
|
||||
There are two types of VFs: VFs that share the physical link with their parent
|
||||
SR-IOV PF and the VFs which work in pairs using internal HW loopback channels (LBK).
|
||||
|
||||
Type1:
|
||||
- These VFs and their parent PF share a physical link and are used for outside communication.
|
||||
- VFs cannot communicate with AF directly, they send mbox message to PF and PF
|
||||
forwards that to AF. AF after processing, responds back to PF and PF forwards
|
||||
the reply to VF.
|
||||
- From functionality point of view there is no difference between PF and VF as same type
|
||||
HW resources are attached to both. However, the user can configure a few things only
|
||||
from PF as PF is treated as owner/admin of the link.
|
||||
|
||||
Type2:
|
||||
- RVU PF0 ie admin function creates these VFs and maps them to loopback block's channels.
|
||||
- A set of two VFs (VF0 & VF1, VF2 & VF3 .. so on) works as a pair ie pkts sent out of
|
||||
VF0 will be received by VF1 and vice versa.
|
||||
- These VFs can be used by applications or virtual machines to communicate between them
|
||||
without sending traffic outside. There is no switch present in HW, hence the support
|
||||
for loopback VFs.
|
||||
- These communicate directly with AF (PF0) via mbox.
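
The pairing rule for Type2 VFs (VF0 with VF1, VF2 with VF3, and so on) amounts
to flipping the lowest bit of the VF index. A minimal sketch, assuming
zero-based VF numbering as in the list above; the variable names are
illustrative and do not come from the driver::

    # Index of the VF we are interested in (example value).
    vf=2
    # The loopback peer is the other member of the same even/odd pair.
    peer=$(( vf ^ 1 ))
    echo "VF${vf} is paired with VF${peer}"
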
|
||||
|
||||
Except for the IO channels or links used for packet reception and transmission there is
|
||||
no other difference between these VF types. AF driver takes care of IO channel mapping,
|
||||
hence same VF driver works for both types of devices.
|
||||
|
||||
Basic packet flow
|
||||
=================
|
||||
|
||||
Ingress
|
||||
-------
|
||||
|
||||
1. CGX LMAC receives packet.
|
||||
2. Forwards the packet to the NIX block.
|
||||
3. Then submitted to NPC block for parsing and then MCAM lookup to get the destination RVU device.
|
||||
4. NIX LF attached to the destination RVU device allocates a buffer from RQ mapped buffer pool of NPA block LF.
|
||||
5. RQ may be selected by RSS or by configuring MCAM rule with a RQ number.
|
||||
6. Packet is DMA'ed and driver is notified.
|
||||
|
||||
Egress
|
||||
------
|
||||
|
||||
1. Driver prepares a send descriptor and submits to SQ for transmission.
|
||||
2. The SQ is already configured (by AF) to transmit on a specific link/channel.
|
||||
3. The SQ descriptor ring is maintained in buffers allocated from SQ mapped pool of NPA block LF.
|
||||
4. NIX block transmits the pkt on the designated channel.
|
||||
5. NPC MCAM entries can be installed to divert pkt onto a different channel.
|
@@ -0,0 +1,321 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
||||
|
||||
=================================================
|
||||
Mellanox ConnectX(R) mlx5 core VPI Network Driver
|
||||
=================================================
|
||||
|
||||
Copyright (c) 2019, Mellanox Technologies LTD.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- `Enabling the driver and kconfig options`_
|
||||
- `Devlink info`_
|
||||
- `Devlink parameters`_
|
||||
- `Devlink health reporters`_
|
||||
- `mlx5 tracepoints`_
|
||||
|
||||
Enabling the driver and kconfig options
|
||||
================================================
|
||||
|
||||
| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
|
||||
| at build time via kernel Kconfig flags.
|
||||
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
|
||||
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
|
||||
| For the list of advanced features please see below.
|
||||
|
||||
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
|
||||
|
||||
| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
|
||||
| This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).
|
||||
|
||||
|
||||
**CONFIG_MLX5_CORE_EN=(y/n)**
|
||||
|
||||
| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
|
||||
| mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be
|
||||
| built-in into mlx5_core.ko.
|
||||
|
||||
|
||||
**CONFIG_MLX5_EN_ARFS=(y/n)**
|
||||
|
||||
| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
|
||||
| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4
|
||||
|
||||
|
||||
**CONFIG_MLX5_EN_RXNFC=(y/n)**
|
||||
|
||||
| Enables ethtool receive network flow classification, which allows user defined
|
||||
| flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API.
|
||||
|
||||
|
||||
**CONFIG_MLX5_CORE_EN_DCB=(y/n)**
|
||||
|
||||
| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.
|
||||
|
||||
|
||||
**CONFIG_MLX5_MPFS=(y/n)**
|
||||
|
||||
| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
|
||||
| MPFS is required when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing
|
||||
| user configured unicast MAC addresses to the requesting PF.
|
||||
|
||||
|
||||
**CONFIG_MLX5_ESWITCH=(y/n)**
|
||||
|
||||
| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
|
||||
| and switching for the enabled VFs and PF in two available modes:
|
||||
| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
|
||||
| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.
|
||||
|
||||
|
||||
**CONFIG_MLX5_CORE_IPOIB=(y/n)**
|
||||
|
||||
| IPoIB offloads & acceleration support.
|
||||
| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
|
||||
| IPoIB ulp netdevice.
|
||||
|
||||
|
||||
**CONFIG_MLX5_FPGA=(y/n)**
|
||||
|
||||
| Build support for the Innova family of network cards by Mellanox Technologies.
|
||||
| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
|
||||
| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
|
||||
| building sandbox-specific client drivers.
|
||||
|
||||
|
||||
**CONFIG_MLX5_EN_IPSEC=(y/n)**
|
||||
|
||||
| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
|
||||
|
||||
**CONFIG_MLX5_EN_TLS=(y/n)**
|
||||
|
||||
| TLS cryptography-offload acceleration.
|
||||
|
||||
|
||||
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
|
||||
|
||||
| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
|
||||
|
||||
|
||||
**External options** ( Choose if the corresponding mlx5 feature is required )
|
||||
|
||||
- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled.
|
||||
- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
|
||||
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
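
Putting the options above together, a kernel ``.config`` fragment for an
Ethernet-focused build might look like the sketch below. The particular
selection (modular core, netdevice support, aRFS, RXNFC, E-Switch and RDMA) is
only an assumed example; pick the options that match the features you need::

    CONFIG_MLX5_CORE=m
    CONFIG_MLX5_CORE_EN=y
    CONFIG_MLX5_EN_ARFS=y
    CONFIG_MLX5_EN_RXNFC=y
    CONFIG_MLX5_ESWITCH=y
    CONFIG_MLX5_INFINIBAND=m
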
|
||||
|
||||
Devlink info
|
||||
============
|
||||
|
||||
The devlink info reports the running and stored firmware versions on device.
|
||||
It also prints the device PSID which represents the HCA board type ID.
|
||||
|
||||
User command example::
|
||||
|
||||
$ devlink dev info pci/0000:00:06.0
|
||||
pci/0000:00:06.0:
|
||||
driver mlx5_core
|
||||
versions:
|
||||
fixed:
|
||||
fw.psid MT_0000000009
|
||||
running:
|
||||
fw.version 16.26.0100
|
||||
stored:
|
||||
fw.version 16.26.0100
|
||||
|
||||
Devlink parameters
|
||||
==================
|
||||
|
||||
flow_steering_mode: Device flow steering mode
|
||||
---------------------------------------------
|
||||
The flow steering mode parameter controls the flow steering mode of the driver.
|
||||
Two modes are supported:
|
||||
1. 'dmfs' - Device managed flow steering.
|
||||
2. 'smfs' - Software/Driver managed flow steering.
|
||||
|
||||
In DMFS mode, the HW steering entities are created and managed through the
|
||||
Firmware.
|
||||
In SMFS mode, the HW steering entities are created and managed by
|
||||
the driver directly into Hardware without firmware intervention.
|
||||
|
||||
SMFS mode is faster and provides better rule insertion rate compared to the default DMFS mode.
|
||||
|
||||
User command examples:
|
||||
|
||||
- Set SMFS flow steering mode::
|
||||
|
||||
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
|
||||
|
||||
- Read device flow steering mode::
|
||||
|
||||
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
|
||||
pci/0000:06:00.0:
|
||||
name flow_steering_mode type driver-specific
|
||||
values:
|
||||
cmode runtime value smfs
|
||||
|
||||
enable_roce: RoCE enablement state
|
||||
----------------------------------
|
||||
RoCE enablement state controls driver support for RoCE traffic.
|
||||
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well known UDP RoCE port is handled as raw ethernet traffic.
|
||||
|
||||
To change RoCE enablement state a user must change the driverinit cmode value and run devlink reload.
|
||||
|
||||
User command examples:
|
||||
|
||||
- Disable RoCE::
|
||||
|
||||
$ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
|
||||
$ devlink dev reload pci/0000:06:00.0
|
||||
|
||||
- Read RoCE enablement state::
|
||||
|
||||
$ devlink dev param show pci/0000:06:00.0 name enable_roce
|
||||
pci/0000:06:00.0:
|
||||
name enable_roce type generic
|
||||
values:
|
||||
cmode driverinit value true
|
||||
|
||||
Devlink health reporters
|
||||
========================
|
||||
|
||||
tx reporter
|
||||
-----------
|
||||
The tx reporter is responsible for reporting and recovering of the following two error scenarios:
|
||||
|
||||
- TX timeout
|
||||
Report on kernel tx timeout detection.
|
||||
Recover by searching lost interrupts.
|
||||
- TX error completion
|
||||
Report on error tx completion.
|
||||
Recover by flushing the TX queue and reset it.
|
||||
|
||||
The TX reporter also supports an on-demand diagnose callback, on which it provides
|
||||
real time information of its send queues status.
|
||||
|
||||
User commands examples:
|
||||
|
||||
- Diagnose send queues status::
|
||||
|
||||
$ devlink health diagnose pci/0000:82:00.0 reporter tx
|
||||
|
||||
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
|
||||
|
||||
- Show number of tx errors indicated, number of recover flows ended successfully,
|
||||
is autorecover enabled and graceful period from last recover::
|
||||
|
||||
$ devlink health show pci/0000:82:00.0 reporter tx
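
The auto-recover setting and grace period reported by the show command can
normally be adjusted with ``devlink health set``. A hedged sketch, reusing the
PCI address from the examples above; the 500 ms grace period is an arbitrary
illustrative value, not a recommendation::

    $ devlink health set pci/0000:82:00.0 reporter tx grace_period 500 auto_recover true
    $ devlink health show pci/0000:82:00.0 reporter tx

Running the show command again afterwards is a simple way to confirm that the
new values were accepted.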
|
||||
|
||||
rx reporter
|
||||
-----------
|
||||
The rx reporter is responsible for reporting and recovering of the following two error scenarios:
|
||||
|
||||
- RX queues initialization (population) timeout
|
||||
RX queues descriptors population on ring initialization is done in
|
||||
napi context via triggering an irq, in case of a failure to get
|
||||
the minimum amount of descriptors, a timeout would occur and it
|
||||
could be recoverable by polling the EQ (Event Queue).
|
||||
- RX completions with errors (reported by HW on interrupt context)
|
||||
Report on rx completion error.
|
||||
Recover (if needed) by flushing the related queue and reset it.
|
||||
|
||||
RX reporter also supports on demand diagnose callback, on which it
|
||||
provides real time information of its receive queues status.
|
||||
|
||||
- Diagnose rx queues status, and corresponding completion queue::
|
||||
|
||||
$ devlink health diagnose pci/0000:82:00.0 reporter rx
|
||||
|
||||
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
|
||||
|
||||
- Show number of rx errors indicated, number of recover flows ended successfully,
|
||||
is autorecover enabled and graceful period from last recover::
|
||||
|
||||
$ devlink health show pci/0000:82:00.0 reporter rx
|
||||
|
||||
fw reporter
|
||||
-----------
|
||||
The fw reporter implements diagnose and dump callbacks.
|
||||
It follows symptoms of fw error such as fw syndrome by triggering
|
||||
fw core dump and storing it into the dump buffer.
|
||||
The fw reporter diagnose command can be triggered any time by the user to check
|
||||
current fw status.
|
||||
|
||||
User commands examples:
|
||||
|
||||
- Check fw health status::
|
||||
|
||||
$ devlink health diagnose pci/0000:82:00.0 reporter fw
|
||||
|
||||
- Read FW core dump if already stored or trigger new one::
|
||||
|
||||
$ devlink health dump show pci/0000:82:00.0 reporter fw
|
||||
|
||||
NOTE: This command can run only on the PF which has fw tracer ownership,
|
||||
running it on other PF or any VF will return "Operation not permitted".
|
||||
|
||||
fw fatal reporter
|
||||
-----------------
|
||||
The fw fatal reporter implements dump and recover callbacks.
|
||||
It follows fatal errors indications by CR-space dump and recover flow.
|
||||
The CR-space dump uses vsc interface which is valid even if the FW command
|
||||
interface is not functional, which is the case in most FW fatal errors.
|
||||
The recover function runs recover flow which reloads the driver and triggers fw
|
||||
reset if needed.
|
||||
|
||||
User commands examples:
|
||||
|
||||
- Run fw recover flow manually::
|
||||
|
||||
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
|
||||
|
||||
- Read FW CR-space dump if already stored or trigger new one::
|
||||
|
||||
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
|
||||
|
||||
NOTE: This command can run only on PF.
|
||||
|
||||
mlx5 tracepoints
|
||||
================
|
||||
|
||||
mlx5 driver provides internal trace points for tracking and debugging using
|
||||
kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).
|
||||
|
||||
For the list of supported mlx5 events check /sys/kernel/debug/tracing/events/mlx5/
|
||||
|
||||
tc and eswitch offloads tracepoints:
|
||||
|
||||
- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
|
||||
|
||||
$ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace
|
||||
...
|
||||
tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
|
||||
|
||||
- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
|
||||
|
||||
$ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace
|
||||
...
|
||||
tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
|
||||
|
||||
- mlx5e_stats_flower: trace flower stats request::
|
||||
|
||||
$ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace
|
||||
...
|
||||
tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
|
||||
|
||||
- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
|
||||
|
||||
$ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace
|
||||
...
|
||||
kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
|
||||
|
||||
- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
|
||||
|
||||
$ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace
|
||||
...
|
||||
kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
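
Trace lines such as the mlx5e_configure_flower sample above are plain text, so
they can be post-processed with standard tools. The sketch below pulls the
flower cookie out of one such line so that add, stats and delete events for
the same filter can be correlated. The sample line is copied from the output
shown earlier; the variable names are illustrative::

    # Sample line copied from the mlx5e_configure_flower output above.
    line='tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT'
    # Keep only the hex digits that follow "cookie=".
    cookie=$(printf '%s\n' "$line" | sed -n 's/.*cookie=\([0-9a-f]*\).*/\1/p')
    echo "$cookie"

On a live system the same sed expression could be applied to the contents of
/sys/kernel/debug/tracing/trace instead of a fixed string.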
|
@@ -0,0 +1,116 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================
|
||||
Hyper-V network driver
|
||||
======================
|
||||
|
||||
Compatibility
|
||||
=============
|
||||
|
||||
This driver is compatible with Windows Server 2012 R2, 2016 and
|
||||
Windows 10.
|
||||
|
||||
Features
|
||||
========
|
||||
|
||||
Checksum offload
|
||||
----------------
|
||||
The netvsc driver supports checksum offload as long as the
|
||||
Hyper-V host version does. Windows Server 2016 and Azure
|
||||
support checksum offload for TCP and UDP for both IPv4 and
|
||||
IPv6. Windows Server 2012 only supports checksum offload for TCP.
|
||||
|
||||
Receive Side Scaling
|
||||
--------------------
|
||||
Hyper-V supports receive side scaling. For TCP & UDP, packets can
|
||||
be distributed among available queues based on IP address and port
|
||||
number.
|
||||
|
||||
For TCP & UDP, we can switch hash level between L3 and L4 by ethtool
|
||||
command. TCP/UDP over IPv4 and v6 can be set differently. The default
|
||||
hash level is L4. We currently only allow switching TX hash level
|
||||
from within the guests.
|
||||
|
||||
On Azure, fragmented UDP packets have high loss rate with L4
|
||||
hashing. Using L3 hashing is recommended in this case.
|
||||
|
||||
For example, for UDP over IPv4 on eth0:
|
||||
|
||||
To include UDP port numbers in hashing::
|
||||
|
||||
ethtool -N eth0 rx-flow-hash udp4 sdfn
|
||||
|
||||
To exclude UDP port numbers in hashing::
|
||||
|
||||
ethtool -N eth0 rx-flow-hash udp4 sd
|
||||
|
||||
To show UDP hash level::
|
||||
|
||||
ethtool -n eth0 rx-flow-hash udp4
|
||||
|
||||
Generic Receive Offload, aka GRO
|
||||
--------------------------------
|
||||
The driver supports GRO and it is enabled by default. GRO coalesces
|
||||
like packets and significantly reduces CPU usage under heavy Rx
|
||||
load.
|
||||
|
||||
Large Receive Offload (LRO), or Receive Side Coalescing (RSC)
|
||||
-------------------------------------------------------------
|
||||
The driver supports LRO/RSC in the vSwitch feature. It reduces the per packet
|
||||
processing overhead by coalescing multiple TCP segments when possible. The
|
||||
feature is enabled by default on VMs running on Windows Server 2019 and
|
||||
later. It may be changed by ethtool command::
|
||||
|
||||
ethtool -K eth0 lro on
|
||||
ethtool -K eth0 lro off
|
||||
|
||||
SR-IOV support
|
||||
--------------
|
||||
Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
|
||||
is enabled in both the vSwitch and the guest configuration, then the
|
||||
Virtual Function (VF) device is passed to the guest as a PCI
|
||||
device. In this case, both a synthetic (netvsc) and VF device are
|
||||
visible in the guest OS and both NICs have the same MAC address.
|
||||
|
||||
The VF is enslaved by netvsc device. The netvsc driver will transparently
|
||||
switch the data path to the VF when it is available and up.
|
||||
Network state (addresses, firewall, etc) should be applied only to the
|
||||
netvsc device; the slave device should not be accessed directly in
|
||||
most cases. The exceptions are if some special queue discipline or
|
||||
flow direction is desired; these should be applied directly to the
|
||||
VF slave device.
|
||||
|
||||
Receive Buffer
|
||||
--------------
|
||||
Packets are received into a receive area which is created when device
|
||||
is probed. The receive area is broken into MTU sized chunks and each may
|
||||
contain one or more packets. The number of receive sections may be changed
|
||||
via ethtool Rx ring parameters.
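
A short sketch of how the receive section count might be inspected and
changed. The interface name eth0 and the value 4096 are placeholders; the
driver may reject values outside the range it supports, so check the reported
maximum first::

    # Show current and maximum ring (section) settings.
    ethtool -g eth0
    # Request a different number of receive sections (example value).
    ethtool -G eth0 rx 4096
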
|
||||
|
||||
There is a similar send buffer which is used to aggregate packets for sending.
|
||||
The send area is broken into chunks of 6144 bytes; each section may
|
||||
contain one or more packets. The send buffer is an optimization; the driver
|
||||
will use a slower method to handle very large packets or if the send buffer
|
||||
area is exhausted.
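
To get a feel for the numbers, the sketch below computes how many 6144-byte
sections fit in a send buffer. The 1 MiB buffer size is an assumption made
purely for the example; the actual size depends on the driver and host::

    # Section size stated in the text above.
    section_size=6144
    # Assumed send buffer size (1 MiB), for illustration only.
    send_buf_size=$(( 1024 * 1024 ))
    # Whole sections that fit; any remainder is unused.
    sections=$(( send_buf_size / section_size ))
    echo "$sections"

Under that assumption the buffer holds 170 sections, so bursts that need more
sections than that, or packets larger than one section, would take the slower
path described above.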
|
||||
|
||||
XDP support
|
||||
-----------
|
||||
XDP (eXpress Data Path) is a feature that runs eBPF bytecode at the early
|
||||
stage when packets arrive at a NIC card. The goal is to increase performance
|
||||
for packet processing, reducing the overhead of SKB allocation and other
|
||||
upper network layers.
|
||||
|
||||
hv_netvsc supports XDP in native mode, and transparently sets the XDP
|
||||
program on the associated VF NIC as well.
|
||||
|
||||
Setting / unsetting XDP program on synthetic NIC (netvsc) propagates to
|
||||
VF NIC automatically. Setting / unsetting XDP program on VF NIC directly
|
||||
is not recommended, also not propagated to synthetic NIC, and may be
|
||||
overwritten by setting of synthetic NIC.
|
||||
|
||||
XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO
|
||||
before running XDP::
|
||||
|
||||
ethtool -K eth0 lro off
|
||||
|
||||
XDP_REDIRECT action is not yet supported.
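
The order of operations described above (disable LRO on the synthetic NIC, then
attach the program there rather than on the VF) might look like the sketch
below. The interface name ``eth0`` is taken from the example above; the object
file ``xdp_prog.o``, the section name ``xdp`` and the ``netvsc_xdp_cmds`` helper
are hypothetical. The helper only prints the commands so that they can be
reviewed before being run as root.

```shell
# Print the commands that attach an XDP program to a netvsc interface.
# Assumptions (not from the driver documentation): the object file name,
# the ELF section name and this helper itself are illustrative.
netvsc_xdp_cmds() {
    iface="$1"       # synthetic (netvsc) NIC, not the VF
    obj="$2"         # compiled eBPF object file
    sec="${3:-xdp}"  # ELF section holding the program

    # LRO (RSC) must be off before an XDP program can run.
    echo "ethtool -K $iface lro off"
    # "xdpdrv" asks iproute2 for native (driver) mode.
    echo "ip link set dev $iface xdpdrv obj $obj sec $sec"
}

netvsc_xdp_cmds eth0 xdp_prog.o
```

Printing instead of executing keeps the sketch safe to try on a machine without
a netvsc device; piping the output to a root shell would apply it.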
|
@@ -0,0 +1,196 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================================================
|
||||
Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver
|
||||
=========================================================
|
||||
|
||||
Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver.
|
||||
|
||||
.. Contents
|
||||
- 1. Introduction
|
||||
- 2. Identifying the adapter/interface
|
||||
- 3. Features supported
|
||||
- 4. Command line parameters
|
||||
- 5. Performance suggestions
|
||||
- 6. Support
|
||||
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
This Linux driver supports Neterion's Xframe I PCI-X 1.0 and
|
||||
Xframe II PCI-X 2.0 adapters. It supports several features
|
||||
such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on.
|
||||
See below for a complete list of features.
|
||||
|
||||
All features are supported for both IPv4 and IPv6.
|
||||
|
||||
2. Identifying the adapter/interface
|
||||
====================================
|
||||
|
||||
a. Insert the adapter(s) in your system.
|
||||
b. Build and load driver::
|
||||
|
||||
# insmod s2io.ko
|
||||
|
||||
c. View log messages::
|
||||
|
||||
# dmesg | tail -40
|
||||
|
||||
You will see messages similar to::
|
||||
|
||||
eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA
|
||||
eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA
|
||||
eth4: Device is on 64 bit 133MHz PCIX(M1) bus
|
||||
|
||||
The above messages identify the adapter type (Xframe I/II), adapter revision,
|
||||
driver version, interface name (eth3, eth4), and interrupt type (INTA, MSI, MSI-X).
|
||||
In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed
|
||||
as well.
|
||||
|
||||
To associate an interface with a physical adapter use "ethtool -p <ethX>".
|
||||
The corresponding adapter's LED will blink multiple times.
|
||||
|
||||
3. Features supported
|
||||
=====================
|
||||
a. Jumbo frames. Xframe I/II supports MTU up to 9600 bytes,
|
||||
modifiable using ip command.
|
||||
|
||||
b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit
|
||||
and receive, TSO.
|
||||
|
||||
c. Multi-buffer receive mode. Scattering of packet across multiple
|
||||
buffers. Currently driver supports 2-buffer mode which yields
|
||||
significant performance improvement on certain platforms(SGI Altix,
|
||||
IBM xSeries).
|
||||
|
||||
d. MSI/MSI-X. Can be enabled on platforms which support this feature
|
||||
(IA64, Xeon) resulting in noticeable performance improvement(up to 7%
|
||||
on certain platforms).
|
||||
|
||||
e. Statistics. Comprehensive MAC-level and software statistics displayed
|
||||
using "ethtool -S" option.
|
||||
|
||||
f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
|
||||
with multiple steering options.
|
||||
|
||||
4. Command line parameters
|
||||
==========================
|
||||
|
||||
a. tx_fifo_num
|
||||
Number of transmit queues
|
||||
|
||||
Valid range: 1-8
|
||||
|
||||
Default: 1
|
||||
|
||||
b. rx_ring_num
|
||||
Number of receive rings
|
||||
|
||||
Valid range: 1-8
|
||||
|
||||
Default: 1
|
||||
|
||||
c. tx_fifo_len
|
||||
Size of each transmit queue
|
||||
|
||||
Valid range: Total length of all queues should not exceed 8192
|
||||
|
||||
Default: 4096
|
||||
|
||||
d. rx_ring_sz
|
||||
Size of each receive ring(in 4K blocks)
|
||||
|
||||
Valid range: Limited by memory on system
|
||||
|
||||
Default: 30
|
||||
|
||||
e. intr_type
|
||||
Specifies interrupt type. Possible values 0(INTA), 2(MSI-X)
|
||||
|
||||
Valid values: 0, 2
|
||||
|
||||
Default: 2
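
As a sketch of how these limits interact, the hypothetical helper below checks
the ranges listed above and prints a matching ``modprobe`` line. It assumes
every transmit queue keeps the default ``tx_fifo_len`` of 4096, so the total
queue length is ``tx_fifo_num * 4096``; the helper name and that simplification
are not part of the driver.

```shell
# Validate s2io module parameters against the documented ranges and print
# the modprobe command. Hypothetical helper; assumes each transmit queue
# keeps the documented default length of 4096.
s2io_modprobe_cmd() {
    tx_fifo_num="$1"; rx_ring_num="$2"; intr_type="$3"
    tx_fifo_len=4096

    [ "$tx_fifo_num" -ge 1 ] && [ "$tx_fifo_num" -le 8 ] || return 1
    [ "$rx_ring_num" -ge 1 ] && [ "$rx_ring_num" -le 8 ] || return 1
    # Total length of all transmit queues must not exceed 8192.
    [ $((tx_fifo_num * tx_fifo_len)) -le 8192 ] || return 1
    # Only INTA (0) and MSI-X (2) are valid interrupt types.
    case "$intr_type" in 0|2) ;; *) return 1 ;; esac

    echo "modprobe s2io tx_fifo_num=$tx_fifo_num rx_ring_num=$rx_ring_num intr_type=$intr_type"
}

s2io_modprobe_cmd 2 2 2
```

Under that assumption, asking for more than two transmit queues fails the
check, which suggests that ``tx_fifo_len`` has to be reduced whenever
``tx_fifo_num`` is raised beyond 2.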
|
||||
|
||||
5. Performance suggestions
|
||||
==========================
|
||||
|
||||
General:
|
||||
|
||||
a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration)
|
||||
b. Set TCP window size to optimal value.
|
||||
|
||||
For instance, for MTU=1500 a value of 210K has been observed to result in
|
||||
good performance::
|
||||
|
||||
# sysctl -w net.ipv4.tcp_rmem="210000 210000 210000"
|
||||
# sysctl -w net.ipv4.tcp_wmem="210000 210000 210000"
|
||||
|
||||
For MTU=9000, TCP window size of 10 MB is recommended::
|
||||
|
||||
# sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
|
||||
# sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
|
||||
|
||||
Transmit performance:
|
||||
|
||||
a. By default, the driver respects BIOS settings for PCI bus parameters.
|
||||
However, you may want to experiment with PCI bus parameters
|
||||
max-split-transactions(MOST) and MMRBC (use setpci command).
|
||||
|
||||
A MOST value of 2 has been found optimal for Opterons and 3 for Itanium.
|
||||
|
||||
It could be different for your hardware.
|
||||
|
||||
Set MMRBC to 4K (see the note on AMD 8131 chipsets below).
|
||||
|
||||
For example, you can set:
|
||||
|
||||
For Opteron::
|
||||
|
||||
#setpci -d 17d5:* 62=1d
|
||||
|
||||
For Itanium::
|
||||
|
||||
#setpci -d 17d5:* 62=3d
|
||||
|
||||
For detailed description of the PCI registers, please see Xframe User Guide.
|
||||
|
||||
b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this
|
||||
parameter.
|
||||
|
||||
c. Turn on TSO(using "ethtool -K")::
|
||||
|
||||
# ethtool -K <ethX> tso on
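
To see what the ``62=1d`` and ``62=3d`` values in item a encode, the sketch
below splits a register value into its fields. It assumes that offset 0x62 is
the adapter's PCI-X command register and uses the field layout from the PCI-X
specification (bits 3:2 for MMRBC, bits 6:4 for the maximum outstanding split
transactions field); neither assumption comes from this document, so check the
Xframe User Guide before relying on it. The ``decode_pcix_cmd`` helper is
hypothetical.

```shell
# Decode a PCI-X command register value given in hex without the 0x prefix.
# Assumed layout (PCI-X specification, not this document):
#   bits 3:2  MMRBC field (0=512, 1=1K, 2=2K, 3=4K bytes)
#   bits 6:4  MOST field (an encoded value, not a literal count)
decode_pcix_cmd() {
    val=$((0x$1))
    mmrbc_field=$(( (val >> 2) & 0x3 ))
    most_field=$(( (val >> 4) & 0x7 ))
    mmrbc_bytes=$(( 512 << mmrbc_field ))
    echo "mmrbc_bytes=$mmrbc_bytes most_field=$most_field"
}

decode_pcix_cmd 1d   # value used in the Opteron example
decode_pcix_cmd 3d   # value used in the Itanium example
```

Both values select a 4K MMRBC under this layout and differ only in the MOST
field. That field is an encoded value, so compare it against the table in the
PCI-X specification rather than reading it as a transaction count.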
|
||||
|
||||
Receive performance:
|
||||
|
||||
a. By default, the driver respects BIOS settings for PCI bus parameters.
|
||||
However, you may want to set PCI latency timer to 248::
|
||||
|
||||
#setpci -d 17d5:* LATENCY_TIMER=f8
|
||||
|
||||
For detailed description of the PCI registers, please see Xframe User Guide.
|
||||
|
||||
b. Use 2-buffer mode. This results in a large performance boost on
|
||||
certain platforms (e.g. SGI Altix, IBM xSeries).
|
||||
|
||||
c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to
|
||||
set/verify this option.
|
||||
|
||||
d. Enable NAPI feature(in kernel configuration Device Drivers ---> Network
|
||||
device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to
|
||||
bring down CPU utilization.
|
||||
|
||||
.. note::
|
||||
|
||||
For AMD Opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are
|
||||
recommended as safe parameters.
|
||||
|
||||
For more information, please review the AMD8131 errata at
|
||||
http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
|
||||
26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
|
||||
|
||||
6. Support
|
||||
==========
|
||||
|
||||
For further support, please contact your 10GbE Xframe NIC vendor (IBM,
|
||||
HP, SGI, etc.).
|
@@ -0,0 +1,115 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============================================================================
|
||||
Neterion's (Formerly S2io) X3100 Series 10GbE PCIe Server Adapter Linux driver
|
||||
==============================================================================
|
||||
|
||||
.. Contents
|
||||
|
||||
1) Introduction
|
||||
2) Features supported
|
||||
3) Configurable driver parameters
|
||||
4) Troubleshooting
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
This Linux driver supports all Neterion's X3100 series 10 GbE PCIe I/O
|
||||
Virtualized Server adapters.
|
||||
|
||||
The X3100 series supports four modes of operation, configurable via
|
||||
firmware:
|
||||
|
||||
- Single function mode
|
||||
- Multi function mode
|
||||
- SRIOV mode
|
||||
- MRIOV mode
|
||||
|
||||
The functions share a 10GbE link and the PCIe bus, but hardly anything else
|
||||
inside the ASIC. Features like independent hw reset, statistics, bandwidth/
|
||||
priority allocation and guarantees, GRO, TSO, interrupt moderation etc are
|
||||
supported independently on each function.
|
||||
|
||||
(See below for a complete list of features supported for both IPv4 and IPv6)
|
||||
|
||||
2. Features supported
|
||||
=====================
|
||||
|
||||
i) Single function mode (up to 17 queues)
|
||||
|
||||
ii) Multi function mode (up to 17 functions)
|
||||
|
||||
iii) PCI-SIG's I/O Virtualization
|
||||
|
||||
- Single Root mode: v1.0 (up to 17 functions)
|
||||
- Multi-Root mode: v1.0 (up to 17 functions)
|
||||
|
||||
iv) Jumbo frames
|
||||
|
||||
X3100 Series supports MTU up to 9600 bytes, modifiable using
|
||||
ip command.
|
||||
|
||||
v) Offloads supported: (Enabled by default)
|
||||
|
||||
- Checksum offload (TCP/UDP/IP) on transmit and receive paths
|
||||
- TCP Segmentation Offload (TSO) on transmit path
|
||||
- Generic Receive Offload (GRO) on receive path
|
||||
|
||||
vi) MSI-X: (Enabled by default)
|
||||
|
||||
Resulting in noticeable performance improvement (up to 7% on certain
|
||||
platforms).
|
||||
|
||||
vii) NAPI: (Enabled by default)
|
||||
|
||||
For better Rx interrupt moderation.
|
||||
|
||||
viii) RTH (Receive Traffic Hash): (Enabled by default)
|
||||
|
||||
Receive side steering for better scaling.
|
||||
|
||||
ix) Statistics
|
||||
|
||||
Comprehensive MAC-level and software statistics displayed using
|
||||
"ethtool -S" option.
|
||||
|
||||
x) Multiple hardware queues: (Enabled by default)
|
||||
|
||||
Up to 17 hardware based transmit and receive data channels, with
|
||||
multiple steering options (transmit multiqueue enabled by default).
|
||||
|
||||
3. Configurable driver parameters
|
||||
=================================
|
||||
|
||||
i) max_config_dev
|
||||
Specifies maximum device functions to be enabled.
|
||||
|
||||
Valid range: 1-8
|
||||
|
||||
ii) max_config_port
|
||||
Specifies number of ports to be enabled.
|
||||
|
||||
Valid range: 1,2
|
||||
|
||||
Default: 1
|
||||
|
||||
iii) max_config_vpath
|
||||
Specifies maximum VPATH(s) configured for each device function.
|
||||
|
||||
Valid range: 1-17
|
||||
|
||||
iv) vlan_tag_strip
|
||||
Enables/disables vlan tag stripping from all received tagged frames that
|
||||
are not replicated at the internal L2 switch.
|
||||
|
||||
Valid range: 0,1 (disabled, enabled respectively)
|
||||
|
||||
Default: 1
|
||||
|
||||
v) addr_learn_en
|
||||
Enable learning the mac address of the guest OS interface in
|
||||
virtualization environment.
|
||||
|
||||
Valid range: 0,1 (disabled, enabled respectively)
|
||||
|
||||
Default: 0
|
@@ -0,0 +1,249 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
=============================================
|
||||
Netronome Flow Processor (NFP) Kernel Drivers
|
||||
=============================================
|
||||
|
||||
Copyright (c) 2019, Netronome Systems, Inc.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- `Overview`_
|
||||
- `Acquiring Firmware`_
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
This driver supports Netronome's line of Flow Processor devices,
|
||||
including the NFP4000, NFP5000, and NFP6000 models, which are also
|
||||
incorporated in the company's family of Agilio SmartNICs. The SR-IOV
|
||||
physical and virtual functions for these devices are supported by
|
||||
the driver.
|
||||
|
||||
Acquiring Firmware
|
||||
==================
|
||||
|
||||
The NFP4000 and NFP6000 devices require application specific firmware
|
||||
to function. Application firmware can be located either on the host file system
|
||||
or in the device flash (if supported by management firmware).
|
||||
|
||||
Firmware files on the host filesystem contain card type (`AMDA-*` string), media
|
||||
config etc. They should be placed in the `/lib/firmware/netronome` directory to
|
||||
load firmware from the host file system.
|
||||
|
||||
Firmware for basic NIC operation is available in the upstream
|
||||
`linux-firmware.git` repository.
|
||||
|
||||
Firmware in NVRAM
|
||||
-----------------
|
||||
|
||||
Recent versions of management firmware support loading application
|
||||
firmware from flash when the host driver gets probed. The firmware loading
|
||||
policy configuration may be used to configure this feature appropriately.
|
||||
|
||||
Devlink or ethtool can be used to update the application firmware on the device
|
||||
flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
|
||||
command. Users need to take care to write the correct firmware image for the
|
||||
card and media configuration to flash.
|
||||
|
||||
Available storage space in flash depends on the card being used.
|
||||
|
||||
Dealing with multiple projects
|
||||
------------------------------
|
||||
|
||||
NFP hardware is fully programmable, therefore there can be different
|
||||
firmware images targeting different applications.
|
||||
|
||||
When using application firmware from host, we recommend placing
|
||||
actual firmware files in application-named subdirectories in
|
||||
`/lib/firmware/netronome` and linking the desired files, e.g.::
|
||||
|
||||
$ tree /lib/firmware/netronome/
|
||||
/lib/firmware/netronome/
|
||||
├── bpf
|
||||
│ ├── nic_AMDA0081-0001_1x40.nffw
|
||||
│ └── nic_AMDA0081-0001_4x10.nffw
|
||||
├── flower
|
||||
│ ├── nic_AMDA0081-0001_1x40.nffw
|
||||
│ └── nic_AMDA0081-0001_4x10.nffw
|
||||
├── nic
|
||||
│ ├── nic_AMDA0081-0001_1x40.nffw
|
||||
│ └── nic_AMDA0081-0001_4x10.nffw
|
||||
├── nic_AMDA0081-0001_1x40.nffw -> bpf/nic_AMDA0081-0001_1x40.nffw
|
||||
└── nic_AMDA0081-0001_4x10.nffw -> bpf/nic_AMDA0081-0001_4x10.nffw
|
||||
|
||||
3 directories, 8 files
|
||||
|
||||
You may need to use hard instead of symbolic links on distributions
|
||||
which use the old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
|
||||
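The layout shown in the tree output could be produced along these lines. This
is a sketch under assumptions: it works in a scratch directory (``FW_DIR``,
created with ``mktemp``) rather than in `/lib/firmware/netronome`, it creates
empty placeholder files instead of real firmware images, and the ``select_app``
helper is hypothetical.

```shell
# Build the per-application layout in a scratch directory and point the
# top-level names at one application. Placeholder files stand in for real
# firmware images; select_app is an illustrative helper.
FW_DIR="${FW_DIR:-$(mktemp -d)}"

for app in bpf flower nic; do
    mkdir -p "$FW_DIR/$app"
    for media in 1x40 4x10; do
        : > "$FW_DIR/$app/nic_AMDA0081-0001_$media.nffw"
    done
done

# Link every image of the chosen application into the top-level directory.
select_app() {
    app="$1"
    for fw in "$FW_DIR/$app"/nic_AMDA*.nffw; do
        # Relative target, as in the tree output; -f replaces an old link.
        ln -sf "$app/$(basename "$fw")" "$FW_DIR/$(basename "$fw")"
    done
}

select_app bpf
ls -l "$FW_DIR"
```

Calling ``select_app flower`` afterwards switches both links in one step. Where
hard links are needed, ``ln -f "$fw" "$FW_DIR/$(basename "$fw")"`` inside the
loop would be the equivalent.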
|
||||
After changing firmware files you may need to regenerate the initramfs
|
||||
image. Initramfs contains drivers and firmware files your system may
|
||||
need to boot. Refer to the documentation of your distribution to find
|
||||
out how to update initramfs. A good indication of a stale initramfs
|
||||
is the system loading the wrong driver or firmware on boot, while everything
|
||||
works correctly once the driver is later reloaded manually.
|
||||
|
||||
Selecting firmware per device
|
||||
-----------------------------
|
||||
|
||||
Most commonly all cards on the system use the same type of firmware.
|
||||
If you want to load specific firmware image for a specific card, you
|
||||
can use either the PCI bus address or serial number. The driver will print
|
||||
which files it's looking for when it recognizes an NFP device::
|
||||
|
||||
nfp: Looking for firmware file in order of priority:
|
||||
nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
|
||||
nfp: netronome/pci-0000:02:00.0.nffw: not found
|
||||
nfp: netronome/nic_AMDA0081-0001_1x40.nffw: found, loading...
|
||||
|
||||
In this case, if a file (or link) called *serial-00-12-34-aa-bb-cc-10-ff.nffw*
|
||||
or *pci-0000:02:00.0.nffw* is present in `/lib/firmware/netronome` this
|
||||
firmware file will take precedence over `nic_AMDA*` files.
|
||||
|
||||
Note that `serial-*` and `pci-*` files are **not** automatically included
|
||||
in initramfs; refer to the documentation of the appropriate tools
|
||||
to find out how to include them.
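
The priority order visible in the log excerpt above can be modelled as a short
lookup. This is only a sketch of that order, not the driver's implementation:
the ``nfp_fw_candidates`` and ``nfp_fw_pick`` helpers are hypothetical, and the
serial number, PCI address and card type are the ones from the log.

```shell
# Print the firmware names for one device in the priority order shown in
# the log excerpt: serial number, then PCI address, then card type.
# Both helpers are illustrative, not driver interfaces.
nfp_fw_candidates() {
    serial="$1"   # e.g. 00-12-34-aa-bb-cc-10-ff
    pci="$2"      # e.g. 0000:02:00.0
    card="$3"     # e.g. nic_AMDA0081-0001_1x40
    echo "netronome/serial-$serial.nffw"
    echo "netronome/pci-$pci.nffw"
    echo "netronome/$card.nffw"
}

# Report the first candidate that exists under a firmware root directory.
nfp_fw_pick() {
    root="$1"; shift
    nfp_fw_candidates "$@" | while read -r name; do
        if [ -e "$root/$name" ]; then
            echo "$name"
            break
        fi
    done
}

nfp_fw_candidates 00-12-34-aa-bb-cc-10-ff 0000:02:00.0 nic_AMDA0081-0001_1x40
```

Running ``nfp_fw_pick /lib/firmware`` with the same three arguments shows which
file would win on a given system, which can help when checking that a
``serial-*`` or ``pci-*`` link is in place.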
|
||||
|
||||
Firmware loading policy
|
||||
-----------------------
|
||||
|
||||
Firmware loading policy is controlled via three HWinfo parameters
|
||||
stored as key value pairs in the device flash:
|
||||
|
||||
app_fw_from_flash
|
||||
Defines which firmware should take precedence, 'Disk' (0), 'Flash' (1) or
|
||||
the 'Preferred' (2) firmware. When 'Preferred' is selected, the management
|
||||
firmware makes the decision over which firmware will be loaded by comparing
|
||||
versions of the flash firmware and the host supplied firmware.
|
||||
This variable is configurable using the 'fw_load_policy'
|
||||
devlink parameter.
|
||||
|
||||
abi_drv_reset
|
||||
Defines if the driver should reset the firmware when
|
||||
the driver is probed, either 'Disk' (0) if firmware was found on disk,
|
||||
'Always' (1) reset or 'Never' (2) reset. Note that the device is always
|
||||
reset on driver unload if firmware was loaded when the driver was probed.
|
||||
This variable is configurable using the 'reset_dev_on_drv_probe'
|
||||
devlink parameter.
|
||||
|
||||
abi_drv_load_ifc
|
||||
Defines a list of PF devices allowed to load FW on the device.
|
||||
This variable is not currently user configurable.
|
||||
|
||||
Statistics
|
||||
==========
|
||||
|
||||
The following device statistics are available through the ``ethtool -S`` interface:
|
||||
|
||||
.. flat-table:: NFP device statistics
|
||||
:header-rows: 1
|
||||
:widths: 3 1 11
|
||||
|
||||
* - Name
|
||||
- ID
|
||||
- Meaning
|
||||
|
||||
* - dev_rx_discards
|
||||
- 1
|
||||
- A packet can be discarded on the RX path for one of the following reasons:
|
||||
|
||||
* The NIC is not in promisc mode, and the destination MAC address
|
||||
doesn't match the interface's MAC address.
|
||||
* The received packet is larger than the max buffer size on the host.
|
||||
I.e. it exceeds the Layer 3 MRU.
|
||||
* There is no freelist descriptor available on the host for the packet.
|
||||
It is likely that the NIC couldn't cache one in time.
|
||||
* A BPF program discarded the packet.
|
||||
* The datapath drop action was executed.
|
||||
* The MAC discarded the packet due to lack of ingress buffer space
|
||||
on the NIC.
|
||||
|
||||
* - dev_rx_errors
|
||||
- 2
|
||||
- A packet can be counted (and dropped) as RX error for the following
|
||||
reasons:
|
||||
|
||||
* A problem with the VEB lookup (only when SR-IOV is used).
|
||||
* A physical layer problem that causes Ethernet errors, like FCS or
|
||||
alignment errors. The cause is usually faulty cables or SFPs.
|
||||
|
||||
* - dev_rx_bytes
|
||||
- 3
|
||||
- Total number of bytes received.
|
||||
|
||||
* - dev_rx_uc_bytes
|
||||
- 4
|
||||
- Unicast bytes received.
|
||||
|
||||
* - dev_rx_mc_bytes
|
||||
- 5
|
||||
- Multicast bytes received.
|
||||
|
||||
* - dev_rx_bc_bytes
|
||||
- 6
|
||||
- Broadcast bytes received.
|
||||
|
||||
* - dev_rx_pkts
|
||||
- 7
|
||||
- Total number of packets received.
|
||||
|
||||
* - dev_rx_mc_pkts
|
||||
- 8
|
||||
- Multicast packets received.
|
||||
|
||||
* - dev_rx_bc_pkts
|
||||
- 9
|
||||
- Broadcast packets received.
|
||||
|
||||
* - dev_tx_discards
|
||||
- 10
|
||||
- A packet can be discarded in the TX direction if the MAC is
|
||||
being flow controlled and the NIC runs out of TX queue space.
|
||||
|
||||
* - dev_tx_errors
|
||||
- 11
|
||||
- A packet can be counted as TX error (and dropped) for one of the
|
||||
following reasons:
|
||||
|
||||
* The packet is an LSO segment, but the Layer 3 or Layer 4 offset
|
||||
could not be determined. Therefore LSO could not continue.
|
||||
* An invalid packet descriptor was received over PCIe.
|
||||
* The packet Layer 3 length exceeds the device MTU.
|
||||
* An error on the MAC/physical layer. Usually due to faulty cables or
|
||||
SFPs.
|
||||
* A CTM buffer could not be allocated.
|
||||
* The packet offset was incorrect and could not be fixed by the NIC.
|
||||
|
||||
* - dev_tx_bytes
|
||||
- 12
|
||||
- Total number of bytes transmitted.
|
||||
|
||||
* - dev_tx_uc_bytes
|
||||
- 13
|
||||
- Unicast bytes transmitted.
|
||||
|
||||
* - dev_tx_mc_bytes
|
||||
- 14
|
||||
- Multicast bytes transmitted.
|
||||
|
||||
* - dev_tx_bc_bytes
|
||||
- 15
|
||||
- Broadcast bytes transmitted.
|
||||
|
||||
* - dev_tx_pkts
|
||||
- 16
|
||||
- Total number of packets transmitted.
|
||||
|
||||
* - dev_tx_mc_pkts
|
||||
- 17
|
||||
- Multicast packets transmitted.
|
||||
|
||||
* - dev_tx_bc_pkts
|
||||
- 18
|
||||
- Broadcast packets transmitted.
|
||||
|
||||
Note that statistics unknown to the driver will be displayed as
|
||||
``dev_unknown_stat$ID``, where ``$ID`` refers to the second column
|
||||
above.
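
To spot counters that the running driver does not know by name, the output of
``ethtool -S`` can be filtered for the ``dev_unknown_stat$ID`` pattern. The
sketch below assumes the usual ``name: value`` output format; the
``nfp_unknown_stats`` helper and the sample input are hypothetical.

```shell
# Read `ethtool -S` style output on stdin and print "ID value" for every
# counter reported as dev_unknown_stat$ID.
# nfp_unknown_stats is an illustrative helper.
nfp_unknown_stats() {
    awk -F: '/dev_unknown_stat[0-9]+/ {
        id = $1
        sub(/^.*dev_unknown_stat/, "", id)   # keep only the numeric ID
        val = $2
        gsub(/[ \t]/, "", val)               # strip padding around the value
        print id, val
    }'
}

# Sample input; ID 19 and the counter values are made up for illustration.
printf '     dev_rx_pkts: 7\n     dev_unknown_stat19: 3\n' | nfp_unknown_stats
```

On a real system the input would come from ``ethtool -S <ifname>``, and the
printed IDs can then be compared against the ID column of the table.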
|
@@ -0,0 +1,274 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
========================================================
|
||||
Linux Driver for the Pensando(R) Ethernet adapter family
|
||||
========================================================
|
||||
|
||||
Pensando Linux Ethernet driver.
|
||||
Copyright(c) 2019 Pensando Systems, Inc
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Identifying the Adapter
|
||||
- Enabling the driver
|
||||
- Configuring the driver
|
||||
- Statistics
|
||||
- Support
|
||||
|
||||
Identifying the Adapter
|
||||
=======================
|
||||
|
||||
To find if one or more Pensando PCI Ethernet devices are installed on the
|
||||
host, check for the PCI devices::
|
||||
|
||||
$ lspci -d 1dd8:
|
||||
b5:00.0 Ethernet controller: Device 1dd8:1002
|
||||
b6:00.0 Ethernet controller: Device 1dd8:1002
|
||||
|
||||
If such devices are listed as above, then the ionic.ko driver should find
|
||||
and configure them for use. There should be log entries in the kernel
|
||||
messages such as these::
|
||||
|
||||
$ dmesg | grep ionic
|
||||
ionic 0000:b5:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
|
||||
ionic 0000:b5:00.0 enp181s0: renamed from eth0
|
||||
ionic 0000:b5:00.0 enp181s0: Link up - 100 Gbps
|
||||
ionic 0000:b6:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
|
||||
ionic 0000:b6:00.0 enp182s0: renamed from eth0
|
||||
ionic 0000:b6:00.0 enp182s0: Link up - 100 Gbps
|
||||
|
||||
Driver and firmware version information can be gathered with either the
|
||||
ethtool or devlink tools::
|
||||
|
||||
$ ethtool -i enp181s0
|
||||
driver: ionic
|
||||
version: 5.7.0
|
||||
firmware-version: 1.8.0-28
|
||||
...
|
||||
|
||||
$ devlink dev info pci/0000:b5:00.0
|
||||
pci/0000:b5:00.0:
|
||||
driver ionic
|
||||
serial_number FLM18420073
|
||||
versions:
|
||||
fixed:
|
||||
asic.id 0x0
|
||||
asic.rev 0x0
|
||||
running:
|
||||
fw 1.8.0-28
|
||||
|
||||
See Documentation/networking/devlink/ionic.rst for more information
|
||||
on the devlink dev info data.
|
||||
|
||||
Enabling the driver
|
||||
===================
|
||||
|
||||
The driver is enabled via the standard kernel configuration system,
|
||||
using the make command::
|
||||
|
||||
make oldconfig/menuconfig/etc.
|
||||
|
||||
The driver is located in the menu structure at:
|
||||
|
||||
-> Device Drivers
|
||||
-> Network device support (NETDEVICES [=y])
|
||||
-> Ethernet driver support
|
||||
-> Pensando devices
|
||||
-> Pensando Ethernet IONIC Support
|
||||
|
||||
Configuring the Driver
|
||||
======================
|
||||
|
||||
MTU
|
||||
---
|
||||
|
||||
Jumbo frame support is available with a maximum size of 9194 bytes.
|
||||
|
||||
Interrupt coalescing
|
||||
--------------------
|
||||
|
||||
Interrupt coalescing can be configured by changing the rx-usecs value with
|
||||
the "ethtool -C" command. The rx-usecs range is 0-190. The tx-usecs value
|
||||
reflects the rx-usecs value as they are tied together on the same interrupt.
|
||||
|
||||
SR-IOV
|
||||
------
|
||||
|
||||
Minimal SR-IOV support is currently offered and can be enabled by setting
|
||||
the sysfs 'sriov_numvfs' value, if supported by your particular firmware
|
||||
configuration.
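
The three settings above could be applied with commands like those printed by
the sketch below. The interface name is the one from the earlier examples; the
``ionic_config_cmds`` helper and the values passed to it are hypothetical. The
helper checks the documented limits and only prints the commands, so nothing is
changed until they are run as root.

```shell
# Print the commands for the three settings described above after checking
# the documented limits. ionic_config_cmds is an illustrative helper; the
# values passed below are arbitrary examples.
ionic_config_cmds() {
    iface="$1"; mtu="$2"; rx_usecs="$3"; num_vfs="$4"

    # Jumbo frames are limited to 9194 bytes.
    [ "$mtu" -le 9194 ] || { echo "mtu $mtu exceeds 9194" >&2; return 1; }
    # rx-usecs must be within 0-190; tx-usecs follows it automatically.
    [ "$rx_usecs" -ge 0 ] && [ "$rx_usecs" -le 190 ] || {
        echo "rx-usecs $rx_usecs outside 0-190" >&2; return 1; }

    echo "ip link set dev $iface mtu $mtu"
    echo "ethtool -C $iface rx-usecs $rx_usecs"
    # Takes effect only if the firmware configuration supports SR-IOV.
    echo "echo $num_vfs > /sys/class/net/$iface/device/sriov_numvfs"
}

ionic_config_cmds enp181s0 9000 64 2
```

Only ``rx-usecs`` is set because the tx value mirrors it, as described above.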
|
||||
|
||||
Statistics
|
||||
==========
|
||||
|
||||
Basic hardware stats
|
||||
--------------------
|
||||
|
||||
The commands ``netstat -i``, ``ip -s link show``, and ``ifconfig`` show
|
||||
a limited set of statistics taken directly from firmware. For example::
|
||||
|
||||
$ ip -s link show enp181s0
|
||||
7: enp181s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
|
||||
link/ether 00:ae:cd:00:07:68 brd ff:ff:ff:ff:ff:ff
|
||||
RX: bytes packets errors dropped overrun mcast
|
||||
414 5 0 0 0 0
|
||||
TX: bytes packets errors dropped carrier collsns
|
||||
1384 18 0 0 0 0
|
||||
|
||||
ethtool -S
|
||||
----------
|
||||
|
||||
The statistics shown by the ``ethtool -S`` command include a combination of
|
||||
driver counters and firmware counters, including port and queue specific values.
|
||||
The driver values are counters computed by the driver, and the firmware values
|
||||
are gathered by the firmware from the port hardware and passed through the
|
||||
driver with no further interpretation.
|
||||
|
||||
Driver port specific::
|
||||
|
||||
tx_packets: 12
|
||||
tx_bytes: 964
|
||||
rx_packets: 5
|
||||
rx_bytes: 414
|
||||
tx_tso: 0
|
||||
tx_tso_bytes: 0
|
||||
tx_csum_none: 12
|
||||
tx_csum: 0
|
||||
rx_csum_none: 0
|
||||
rx_csum_complete: 3
|
||||
rx_csum_error: 0
|
||||
|
||||
Driver queue specific::
|
||||
|
||||
tx_0_pkts: 3
|
||||
tx_0_bytes: 294
|
||||
tx_0_clean: 3
|
||||
tx_0_dma_map_err: 0
|
||||
tx_0_linearize: 0
|
||||
tx_0_frags: 0
|
||||
tx_0_tso: 0
|
||||
tx_0_tso_bytes: 0
|
||||
tx_0_csum_none: 3
|
||||
tx_0_csum: 0
|
||||
tx_0_vlan_inserted: 0
|
||||
rx_0_pkts: 2
|
||||
rx_0_bytes: 120
|
||||
rx_0_dma_map_err: 0
|
||||
rx_0_alloc_err: 0
|
||||
rx_0_csum_none: 0
|
||||
rx_0_csum_complete: 0
|
||||
rx_0_csum_error: 0
|
||||
rx_0_dropped: 0
|
||||
rx_0_vlan_stripped: 0
|
||||
|
||||
Firmware port specific::
|
||||
|
||||
hw_tx_dropped: 0
|
||||
hw_rx_dropped: 0
|
||||
hw_rx_over_errors: 0
|
||||
hw_rx_missed_errors: 0
|
||||
hw_tx_aborted_errors: 0
|
||||
frames_rx_ok: 15
|
||||
frames_rx_all: 15
|
||||
frames_rx_bad_fcs: 0
|
||||
frames_rx_bad_all: 0
|
||||
octets_rx_ok: 1290
|
||||
octets_rx_all: 1290
|
||||
frames_rx_unicast: 10
|
||||
frames_rx_multicast: 5
|
||||
frames_rx_broadcast: 0
|
||||
frames_rx_pause: 0
|
||||
frames_rx_bad_length: 0
|
||||
frames_rx_undersized: 0
|
||||
frames_rx_oversized: 0
|
||||
frames_rx_fragments: 0
|
||||
frames_rx_jabber: 0
|
||||
frames_rx_pripause: 0
|
||||
frames_rx_stomped_crc: 0
|
||||
frames_rx_too_long: 0
|
||||
frames_rx_vlan_good: 3
|
||||
frames_rx_dropped: 0
|
||||
frames_rx_less_than_64b: 0
|
||||
frames_rx_64b: 4
|
||||
frames_rx_65b_127b: 11
|
||||
frames_rx_128b_255b: 0
|
||||
frames_rx_256b_511b: 0
|
||||
frames_rx_512b_1023b: 0
|
||||
frames_rx_1024b_1518b: 0
|
||||
frames_rx_1519b_2047b: 0
|
||||
frames_rx_2048b_4095b: 0
|
||||
frames_rx_4096b_8191b: 0
|
||||
frames_rx_8192b_9215b: 0
|
||||
frames_rx_other: 0
|
||||
frames_tx_ok: 31
|
||||
frames_tx_all: 31
|
||||
frames_tx_bad: 0
|
||||
octets_tx_ok: 2614
|
||||
octets_tx_total: 2614
|
||||
frames_tx_unicast: 8
|
||||
frames_tx_multicast: 21
|
||||
frames_tx_broadcast: 2
|
||||
frames_tx_pause: 0
|
||||
frames_tx_pripause: 0
|
||||
frames_tx_vlan: 0
|
||||
frames_tx_less_than_64b: 0
|
||||
frames_tx_64b: 4
|
||||
frames_tx_65b_127b: 27
|
||||
frames_tx_128b_255b: 0
|
||||
frames_tx_256b_511b: 0
|
||||
frames_tx_512b_1023b: 0
|
||||
frames_tx_1024b_1518b: 0
|
||||
frames_tx_1519b_2047b: 0
|
||||
frames_tx_2048b_4095b: 0
|
||||
frames_tx_4096b_8191b: 0
|
||||
frames_tx_8192b_9215b: 0
|
||||
frames_tx_other: 0
|
||||
frames_tx_pri_0: 0
|
||||
frames_tx_pri_1: 0
|
||||
frames_tx_pri_2: 0
|
||||
frames_tx_pri_3: 0
|
||||
frames_tx_pri_4: 0
|
||||
frames_tx_pri_5: 0
|
||||
frames_tx_pri_6: 0
|
||||
frames_tx_pri_7: 0
|
||||
frames_rx_pri_0: 0
|
||||
frames_rx_pri_1: 0
|
||||
frames_rx_pri_2: 0
|
||||
frames_rx_pri_3: 0
|
||||
frames_rx_pri_4: 0
|
||||
frames_rx_pri_5: 0
|
||||
frames_rx_pri_6: 0
|
||||
frames_rx_pri_7: 0
|
||||
tx_pripause_0_1us_count: 0
|
||||
tx_pripause_1_1us_count: 0
|
||||
tx_pripause_2_1us_count: 0
|
||||
tx_pripause_3_1us_count: 0
|
||||
tx_pripause_4_1us_count: 0
|
||||
tx_pripause_5_1us_count: 0
|
||||
tx_pripause_6_1us_count: 0
|
||||
tx_pripause_7_1us_count: 0
|
||||
rx_pripause_0_1us_count: 0
|
||||
rx_pripause_1_1us_count: 0
|
||||
rx_pripause_2_1us_count: 0
|
||||
rx_pripause_3_1us_count: 0
|
||||
rx_pripause_4_1us_count: 0
|
||||
rx_pripause_5_1us_count: 0
|
||||
rx_pripause_6_1us_count: 0
|
||||
rx_pripause_7_1us_count: 0
|
||||
rx_pause_1us_count: 0
|
||||
frames_tx_truncated: 0
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
|
||||
For general Linux networking support, please use the netdev mailing
|
||||
list, which is monitored by Pensando personnel::
|
||||
|
||||
netdev@vger.kernel.org
|
||||
|
||||
For more specific support needs, please use the Pensando driver support
|
||||
email::
|
||||
|
||||
drivers@pensando.io
|
@@ -0,0 +1,48 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================
|
||||
SMC 9xxxx Driver
|
||||
================
|
||||
|
||||
Revision 0.12
|
||||
|
||||
3/5/96
|
||||
|
||||
Copyright 1996 Erik Stahlman
|
||||
|
||||
Released under terms of the GNU General Public License.
|
||||
|
||||
This file contains the instructions and caveats for my SMC9xxx driver. You
|
||||
should not be using the driver without reading this file.
|
||||
|
||||
Things to note about installation:
|
||||
|
||||
1. The driver should work on all kernels from 1.2.13 until 1.3.71.
|
||||
(A kernel patch is supplied for 1.3.71.)
|
||||
|
||||
2. If you include this into the kernel, you might need to change some
|
||||
options, such as for forcing IRQ.
|
||||
|
||||
|
||||
3. To compile as a module, run 'make'.
|
||||
Make will give you the appropriate options for various kernel support.
|
||||
|
||||
4. Loading the driver as a module::
|
||||
|
||||
use: insmod smc9194.o
|
||||
optional parameters:
|
||||
io=xxxx : your base address
|
||||
irq=xx : your irq
|
||||
ifport=x : 0 for whatever is default
|
||||
1 for twisted pair
|
||||
2 for AUI ( or BNC on some cards )
|
||||
|
||||
How to obtain the latest version?
|
||||
|
||||
FTP:
|
||||
ftp://fenris.campus.vt.edu/smc9/smc9-12.tar.gz
|
||||
ftp://sfbox.vt.edu/filebox/F/fenris/smc9/smc9-12.tar.gz
|
||||
|
||||
|
||||
Contacting me:
|
||||
erik@mail.vt.edu
|
@@ -0,0 +1,700 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
==============================================================
|
||||
Linux Driver for the Synopsys(R) Ethernet Controllers "stmmac"
|
||||
==============================================================
|
||||
|
||||
Authors: Giuseppe Cavallaro <peppe.cavallaro@st.com>,
|
||||
Alexandre Torgue <alexandre.torgue@st.com>, Jose Abreu <joabreu@synopsys.com>
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- In This Release
|
||||
- Feature List
|
||||
- Kernel Configuration
|
||||
- Command Line Parameters
|
||||
- Driver Information and Notes
|
||||
- Debug Information
|
||||
- Support
|
||||
|
||||
In This Release
|
||||
===============
|
||||
|
||||
This file describes the stmmac Linux Driver for all the Synopsys(R) Ethernet
|
||||
Controllers.
|
||||
|
||||
Currently, this network device driver is for all STi embedded MAC/GMAC
|
||||
(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XILINX XC2V3000
|
||||
FF1152AMT0221 D1215994A VIRTEX FPGA board. The Synopsys Ethernet QoS 5.0 IPK
|
||||
is also supported.
|
||||
|
||||
DesignWare(R) Cores Ethernet MAC 10/100/1000 Universal version 3.70a
|
||||
(and older) and DesignWare(R) Cores Ethernet Quality-of-Service version 4.0
|
||||
(and later) have been used for developing this driver as well as
|
||||
DesignWare(R) Cores XGMAC - 10G Ethernet MAC and DesignWare(R) Cores
|
||||
Enterprise MAC - 100G Ethernet MAC.
|
||||
|
||||
This driver supports both the platform bus and PCI.
|
||||
|
||||
This driver includes support for the following Synopsys(R) DesignWare(R)
|
||||
Cores Ethernet Controllers and corresponding minimum and maximum versions:
|
||||
|
||||
+-------------------------------+--------------+--------------+--------------+
| Controller Name               | Min. Version | Max. Version | Abbrev. Name |
+===============================+==============+==============+==============+
| Ethernet MAC Universal        | N/A          | 3.73a        | GMAC         |
+-------------------------------+--------------+--------------+--------------+
| Ethernet Quality-of-Service   | 4.00a        | N/A          | GMAC4+       |
+-------------------------------+--------------+--------------+--------------+
| XGMAC - 10G Ethernet MAC      | 2.10a        | N/A          | XGMAC2+      |
+-------------------------------+--------------+--------------+--------------+
| XLGMAC - 100G Ethernet MAC    | 2.00a        | N/A          | XLGMAC2+     |
+-------------------------------+--------------+--------------+--------------+
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your Ethernet adapter. All hardware requirements listed apply
|
||||
to use with Linux.
|
||||
|
||||
Feature List
|
||||
============
|
||||
|
||||
The following features are available in this driver:
|
||||
- GMII/MII/RGMII/SGMII/RMII/XGMII/XLGMII Interface
|
||||
- Half-Duplex / Full-Duplex Operation
|
||||
- Energy Efficient Ethernet (EEE)
|
||||
- IEEE 802.3x PAUSE Packets (Flow Control)
|
||||
- RMON/MIB Counters
|
||||
- IEEE 1588 Timestamping (PTP)
|
||||
- Pulse-Per-Second Output (PPS)
|
||||
- MDIO Clause 22 / Clause 45 Interface
|
||||
- MAC Loopback
|
||||
- ARP Offloading
|
||||
- Automatic CRC / PAD Insertion and Checking
|
||||
- Checksum Offload for Received and Transmitted Packets
|
||||
- Standard or Jumbo Ethernet Packets
|
||||
- Source Address Insertion / Replacement
|
||||
- VLAN TAG Insertion / Replacement / Deletion / Filtering (HASH and PERFECT)
|
||||
- Programmable TX and RX Watchdog and Coalesce Settings
|
||||
- Destination Address Filtering (PERFECT)
|
||||
- HASH Filtering (Multicast)
|
||||
- Layer 3 / Layer 4 Filtering
|
||||
- Remote Wake-Up Detection
|
||||
- Receive Side Scaling (RSS)
|
||||
- Frame Preemption for TX and RX
|
||||
- Programmable Burst Length, Threshold, Queue Size
|
||||
- Multiple Queues (up to 8)
|
||||
- Multiple Scheduling Algorithms (TX: WRR, DWRR, WFQ, SP, CBS, EST, TBS;
|
||||
RX: WRR, SP)
|
||||
- Flexible RX Parser
|
||||
- TCP / UDP Segmentation Offload (TSO, USO)
|
||||
- Split Header (SPH)
|
||||
- Safety Features (ECC Protection, Data Parity Protection)
|
||||
- Selftests using Ethtool
|
||||
|
||||
Kernel Configuration
|
||||
====================
|
||||
|
||||
The kernel configuration option is ``CONFIG_STMMAC_ETH``:
|
||||
- ``CONFIG_STMMAC_PLATFORM``: enables the platform driver.
- ``CONFIG_STMMAC_PCI``: enables the PCI driver.
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
|
||||
If the driver is built as a module the following optional parameters are used
|
||||
by entering them on the command line with the modprobe command using this
|
||||
syntax (e.g. for PCI module)::
|
||||
|
||||
modprobe stmmac_pci [<option>=<VAL1>,<VAL2>,...]
|
||||
|
||||
Driver parameters can also be passed on the kernel command line using::
|
||||
|
||||
stmmaceth=watchdog:100,chain_mode:1
|
||||
|
||||
The default value for each parameter is generally the recommended setting,
|
||||
unless otherwise noted.
|
||||
|
||||
watchdog
|
||||
--------
|
||||
:Valid Range: 5000-None
|
||||
:Default Value: 5000
|
||||
|
||||
This parameter overrides the transmit timeout in milliseconds.
|
||||
|
||||
debug
|
||||
-----
|
||||
:Valid Range: 0-16 (0=none,...,16=all)
|
||||
:Default Value: 0
|
||||
|
||||
This parameter adjusts the level of debug messages displayed in the system
|
||||
logs.
|
||||
|
||||
phyaddr
|
||||
-------
|
||||
:Valid Range: 0-31
|
||||
:Default Value: -1
|
||||
|
||||
This parameter overrides the physical address of the PHY device.
|
||||
|
||||
flow_ctrl
|
||||
---------
|
||||
:Valid Range: 0-3 (0=off,1=rx,2=tx,3=rx/tx)
|
||||
:Default Value: 3
|
||||
|
||||
This parameter changes the default Flow Control ability.
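
The ``flow_ctrl`` values listed above behave like a two-bit mask (bit 0 for RX,
bit 1 for TX). The following is a minimal sketch of decoding such a value; the
``EX_`` macros and ``ex_`` function names are hypothetical and are not symbols
exported by the driver.

```c
#include <stdbool.h>

/* Values follow the documented range: 0=off, 1=rx, 2=tx, 3=rx/tx.
 * These names are illustrative only, not driver symbols. */
#define EX_FLOW_OFF 0
#define EX_FLOW_RX  1
#define EX_FLOW_TX  2

/* True when val lies inside the documented 0-3 range. */
static bool ex_flow_ctrl_valid(int val)
{
	return val >= 0 && val <= 3;
}

/* True when the value requests RX flow control. */
static bool ex_flow_ctrl_rx(int val)
{
	return ex_flow_ctrl_valid(val) && (val & EX_FLOW_RX) != 0;
}

/* True when the value requests TX flow control. */
static bool ex_flow_ctrl_tx(int val)
{
	return ex_flow_ctrl_valid(val) && (val & EX_FLOW_TX) != 0;
}
```

With the default value of 3 both helpers return true; with 0 both return false.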
|
||||
|
||||
pause
|
||||
-----
|
||||
:Valid Range: 0-65535
|
||||
:Default Value: 65535
|
||||
|
||||
This parameter changes the default Flow Control Pause time.
|
||||
|
||||
tc
|
||||
--
|
||||
:Valid Range: 64-256
|
||||
:Default Value: 64
|
||||
|
||||
This parameter changes the default HW FIFO Threshold control value.
|
||||
|
||||
buf_sz
|
||||
------
|
||||
:Valid Range: 1536-16384
|
||||
:Default Value: 1536
|
||||
|
||||
This parameter changes the default RX DMA packet buffer size.
|
||||
|
||||
eee_timer
|
||||
---------
|
||||
:Valid Range: 0-None
|
||||
:Default Value: 1000
|
||||
|
||||
This parameter changes the default LPI TX Expiration time in milliseconds.
|
||||
|
||||
chain_mode
|
||||
----------
|
||||
:Valid Range: 0-1 (0=off,1=on)
|
||||
:Default Value: 0
|
||||
|
||||
This parameter changes the default mode of operation from Ring Mode to
|
||||
Chain Mode.
|
||||
|
||||
Driver Information and Notes
|
||||
============================
|
||||
|
||||
Transmit Process
|
||||
----------------
|
||||
|
||||
The xmit method is invoked when the kernel needs to transmit a packet; it sets
|
||||
the descriptors in the ring and informs the DMA engine that there is a packet
|
||||
ready to be transmitted.
|
||||
|
||||
By default, the driver sets the ``NETIF_F_SG`` bit in the features field of
|
||||
the ``net_device`` structure, enabling the scatter-gather feature. This is
|
||||
true on chips and configurations where the checksum can be done in hardware.
|
||||
|
||||
Once the controller has finished transmitting the packet, a timer will be
|
||||
scheduled to release the transmit resources.
|
||||
|
||||
Receive Process
|
||||
---------------
|
||||
|
||||
When one or more packets are received, an interrupt happens. The interrupts
|
||||
are not queued, so the driver has to scan all the descriptors in the ring
|
||||
during the receive process.
|
||||
|
||||
This is based on NAPI, so the interrupt handler signals only if there is work
|
||||
to be done, and it exits. Then the poll method will be scheduled at some
|
||||
future point.
|
||||
|
||||
The incoming packets are stored, by the DMA, in a list of pre-allocated socket
|
||||
buffers in order to avoid the memcpy (zero-copy).
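
The receive flow described above (scan the ring, stop at the first descriptor
still owned by the DMA, respect a poll budget) can be illustrated with a
user-space simulation. This is only a sketch under simplifying assumptions:
the ``ex_`` types and the single ``own`` flag are hypothetical stand-ins for
the hardware descriptors and do not reproduce the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_RING_SIZE 8

/* Simplified RX descriptor: 'own' is true while the DMA still owns it. */
struct ex_rx_desc {
	bool own;
	unsigned int len;
};

struct ex_rx_ring {
	struct ex_rx_desc desc[EX_RING_SIZE];
	size_t cur;	/* next descriptor the driver will examine */
};

/* Handle at most 'budget' completed descriptors, in the style of a NAPI
 * poll method. Returns the number of descriptors processed. */
static int ex_rx_poll(struct ex_rx_ring *ring, int budget)
{
	int done = 0;

	while (done < budget) {
		struct ex_rx_desc *d = &ring->desc[ring->cur];

		if (d->own)
			break;	/* nothing more received yet */

		/* A real driver would pass the buffer to the stack here. */
		d->len = 0;
		d->own = true;	/* hand the descriptor back to the DMA */
		ring->cur = (ring->cur + 1) % EX_RING_SIZE;
		done++;
	}

	return done;
}
```

If fewer descriptors than the budget are processed, a NAPI driver would
normally complete the poll and re-enable the interrupt; that step is left out
of the sketch.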
|
||||
|
||||
Interrupt Mitigation
|
||||
--------------------
|
||||
|
||||
The driver is able to mitigate the number of its DMA interrupts using NAPI for
|
||||
the reception on chips older than the 3.50. New chips have an HW RX Watchdog
|
||||
used for this mitigation.
|
||||
|
||||
Mitigation parameters can be tuned by ethtool.
|
||||
|
||||
WoL
|
||||
---
|
||||
|
||||
The Wake-on-LAN feature through Magic and Unicast frames is supported for the
GMAC, GMAC4/5 and XGMAC cores.
|
||||
|
||||
DMA Descriptors
|
||||
---------------
|
||||
|
||||
The driver handles both normal and alternate descriptors. The latter have only
been tested on DesignWare(R) Cores Ethernet MAC Universal version 3.41a and
later.
|
||||
|
||||
stmmac supports DMA descriptors operating in both dual-buffer (RING) and
linked-list (CHAINED) mode. In RING mode each descriptor holds two data buffer
pointers, whereas in CHAINED mode each descriptor holds only one.
RING mode is the default.
|
||||
|
||||
In CHAINED mode each descriptor has a pointer to the next descriptor in the
|
||||
list, hence creating the explicit chaining in the descriptor itself, whereas
|
||||
such explicit chaining is not possible in RING mode.
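
The difference between the two modes can be sketched with two simplified
descriptor layouts. The structures below are illustrative assumptions (field
names, 32-bit addresses and the wrap-around of the last entry are chosen for
the example) and are not the hardware descriptor definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* RING mode: two data buffer pointers, no explicit link. */
struct ex_ring_desc {
	uint32_t status;
	uint32_t ctrl;
	uint32_t buf1;	/* first data buffer pointer */
	uint32_t buf2;	/* second data buffer pointer */
};

/* CHAINED mode: one data buffer pointer plus a link to the next entry. */
struct ex_chain_desc {
	uint32_t status;
	uint32_t ctrl;
	uint32_t buf1;	/* the only data buffer pointer */
	uint32_t next;	/* bus address of the next descriptor */
};

/* Link 'n' descriptors into a closed chain. 'base' is the (assumed) bus
 * address of desc[0]; the last entry points back to the first. */
static void ex_chain_init(struct ex_chain_desc *desc, size_t n, uint32_t base)
{
	for (size_t i = 0; i < n; i++) {
		size_t nxt = (i + 1) % n;

		desc[i].next = base + (uint32_t)(nxt * sizeof(*desc));
	}
}
```

In RING mode the hardware finds the next descriptor by its position in the
array, so no equivalent of ``ex_chain_init()`` is needed.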
|
||||
|
||||
Extended Descriptors
|
||||
--------------------
|
||||
|
||||
The extended descriptors give us information about the Ethernet payload when
|
||||
it is carrying PTP packets or TCP/UDP/ICMP over IP. These are not available on
|
||||
GMAC Synopsys(R) chips older than the 3.50. At probe time the driver will
|
||||
decide if these can be actually used. This support also is mandatory for PTPv2
|
||||
because the extra descriptors are used for saving the hardware timestamps and
|
||||
Extended Status.
|
||||
|
||||
Ethtool Support
|
||||
---------------
|
||||
|
||||
Ethtool is supported. For example, driver statistics (including RMON) and
internal errors can be retrieved using::
|
||||
|
||||
ethtool -S ethX
|
||||
|
||||
Ethtool selftests are also supported. These allow some early sanity
checks of the HW using MAC and PHY loopback mechanisms::
|
||||
|
||||
ethtool -t ethX
|
||||
|
||||
Jumbo and Segmentation Offloading
|
||||
---------------------------------
|
||||
|
||||
Jumbo frames are supported and tested for the GMAC. GSO has also been
added, but it is performed in software. LRO is not supported.
|
||||
|
||||
TSO Support
|
||||
-----------
|
||||
|
||||
The TSO (TCP Segmentation Offload) feature is supported by the GMAC > 4.x and
XGMAC chip families. When a packet is sent through the TCP protocol, the TCP
stack ensures that the SKB provided to the low-level driver (stmmac in our
case) matches the maximum frame length (IP header + TCP header + payload <=
1500 bytes for an MTU of 1500). This means that if an application using TCP
wants to send a packet whose length (after adding headers) exceeds 1514 bytes,
the packet is split into several TCP packets: the data payload is split and
the headers (TCP/IP ...) are added. This is done in software.
|
||||
|
||||
When TSO is enabled, the TCP stack doesn't care about the maximum frame length
and provides the SKB to stmmac as it is. The GMAC IP then has to perform
the segmentation by itself to match the maximum frame length.
|
||||
|
||||
This feature can be enabled in device tree through ``snps,tso`` entry.
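
To give a feel for the amount of work that segmentation involves, the sketch
below counts how many segments a given TCP payload needs. It is plain
arithmetic, not driver code. The function name is hypothetical, and the MSS of
1460 bytes used in the note assumes an MTU of 1500 with 20-byte IPv4 and
20-byte TCP headers without options.

```c
#include <stdint.h>

/* Number of TCP segments needed to carry payload_len bytes when each
 * segment holds at most mss bytes of payload. Returns 0 when mss is 0. */
static uint32_t ex_tso_segments(uint32_t payload_len, uint32_t mss)
{
	if (mss == 0)
		return 0;

	/* Integer ceiling division without overflow. */
	return payload_len / mss + (payload_len % mss != 0 ? 1 : 0);
}
```

For example, a 64000-byte payload with an MSS of 1460 needs 44 segments.
Without TSO the stack builds them in software; with TSO the hardware does.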
|
||||
|
||||
Energy Efficient Ethernet
|
||||
-------------------------
|
||||
|
||||
Energy Efficient Ethernet (EEE) enables the IEEE 802.3 MAC sublayer, along
with a family of physical layers, to operate in the Low Power Idle (LPI) mode.
The EEE mode supports IEEE 802.3 MAC operation at 100Mbps and 1000Mbps (1Gbps).
|
||||
|
||||
The LPI mode allows power saving by switching off parts of the communication
device functionality when there is no data to be transmitted or received.
The systems on both sides of the link can disable some functionality and
save power during periods of low link utilization. The MAC controls whether
the system should enter or exit the LPI mode and communicates this to the PHY.
|
||||
|
||||
As soon as the interface is opened, the driver verifies if the EEE can be
|
||||
supported. This is done by looking at both the DMA HW capability register and
|
||||
the PHY device's MMD registers.
|
||||
|
||||
To enter TX LPI mode the driver needs a software timer that enables
and disables the LPI mode when there is nothing to be transmitted.
|
||||
|
||||
Precision Time Protocol (PTP)
|
||||
-----------------------------
|
||||
|
||||
The driver supports the IEEE 1588-2002, Precision Time Protocol (PTP), which
|
||||
enables precise synchronization of clocks in measurement and control systems
|
||||
implemented with technologies such as network communication.
|
||||
|
||||
In addition to the basic timestamp features mentioned in IEEE 1588-2002
|
||||
Timestamps, new GMAC cores support the advanced timestamp features.
|
||||
IEEE 1588-2008 can be enabled when configuring the Kernel.
|
||||
|
||||
SGMII/RGMII Support
|
||||
-------------------
|
||||
|
||||
New GMAC devices provide their own way to manage RGMII/SGMII. This information
is available at run-time by looking at the HW capability register. This means
that stmmac can manage auto-negotiation and link status without using
PHYLIB. In fact, the HW provides a subset of extended registers to
restart the ANE and to verify Full/Half duplex mode and Speed. Thanks to these
registers, it is possible to look at the Auto-negotiated Link Partner Ability.
|
||||
|
||||
Physical
|
||||
--------
|
||||
|
||||
The driver is compatible with the Physical Abstraction Layer to be connected with
|
||||
PHY and GPHY devices.
|
||||
|
||||
Platform Information
|
||||
--------------------
|
||||
|
||||
Several pieces of information can be passed through the platform and device-tree.
|
||||
|
||||
::
|
||||
|
||||
struct plat_stmmacenet_data {
|
||||
|
||||
1) Bus identifier::
|
||||
|
||||
int bus_id;
|
||||
|
||||
2) PHY Physical Address. If set to -1 the driver will pick the first PHY it
|
||||
finds::
|
||||
|
||||
int phy_addr;
|
||||
|
||||
3) PHY Device Interface::
|
||||
|
||||
int interface;
|
||||
|
||||
4) Specific platform fields for the MDIO bus::
|
||||
|
||||
struct stmmac_mdio_bus_data *mdio_bus_data;
|
||||
|
||||
5) Internal DMA parameters::
|
||||
|
||||
struct stmmac_dma_cfg *dma_cfg;
|
||||
|
||||
6) Fixed CSR Clock Range selection::
|
||||
|
||||
int clk_csr;
|
||||
|
||||
7) HW uses the GMAC core::
|
||||
|
||||
int has_gmac;
|
||||
|
||||
8) If set the MAC will use Enhanced Descriptors::
|
||||
|
||||
int enh_desc;
|
||||
|
||||
9) Core is able to perform TX Checksum and/or RX Checksum in HW::
|
||||
|
||||
int tx_coe;
|
||||
int rx_coe;
|
||||
|
||||
11) Some HWs are not able to perform the csum in HW for over-sized frames due
to limited buffer sizes. When this flag is set, the csum is done in SW on
JUMBO frames::
|
||||
|
||||
int bugged_jumbo;
|
||||
|
||||
12) Core has the embedded power module::
|
||||
|
||||
int pmt;
|
||||
|
||||
13) Force DMA to use the Store and Forward mode or Threshold mode::
|
||||
|
||||
int force_sf_dma_mode;
|
||||
int force_thresh_dma_mode;
|
||||
|
||||
15) Force to disable the RX Watchdog feature and switch to NAPI mode::
|
||||
|
||||
int riwt_off;
|
||||
|
||||
16) Limit the maximum operating speed and MTU::
|
||||
|
||||
int max_speed;
|
||||
int maxmtu;
|
||||
|
||||
18) Number of Multicast/Unicast filters::
|
||||
|
||||
int multicast_filter_bins;
|
||||
int unicast_filter_entries;
|
||||
|
||||
20) Limit the maximum TX and RX FIFO size::
|
||||
|
||||
int tx_fifo_size;
|
||||
int rx_fifo_size;
|
||||
|
||||
21) Use the specified number of TX and RX Queues::
|
||||
|
||||
u32 rx_queues_to_use;
|
||||
u32 tx_queues_to_use;
|
||||
|
||||
22) Use the specified TX and RX scheduling algorithm::
|
||||
|
||||
u8 rx_sched_algorithm;
|
||||
u8 tx_sched_algorithm;
|
||||
|
||||
23) Internal TX and RX Queue parameters::
|
||||
|
||||
struct stmmac_rxq_cfg rx_queues_cfg[MTL_MAX_RX_QUEUES];
|
||||
struct stmmac_txq_cfg tx_queues_cfg[MTL_MAX_TX_QUEUES];
|
||||
|
||||
24) This callback is used for modifying some syscfg registers (on ST SoCs)
|
||||
according to the link speed negotiated by the physical layer::
|
||||
|
||||
void (*fix_mac_speed)(void *priv, unsigned int speed);
|
||||
|
||||
25) Callbacks used for calling a custom initialization. This is sometimes
necessary on some platforms (e.g. ST boxes) where the HW needs some PIO
lines or system cfg registers to be set. init/exit callbacks should not use
or modify platform data::
|
||||
|
||||
int (*init)(struct platform_device *pdev, void *priv);
|
||||
void (*exit)(struct platform_device *pdev, void *priv);
|
||||
|
||||
26) Perform HW setup of the bus. For example, on some ST platforms this field
|
||||
is used to configure the AMBA bridge to generate more efficient STBus traffic::
|
||||
|
||||
struct mac_device_info *(*setup)(void *priv);
|
||||
void *bsp_priv;
|
||||
|
||||
27) Internal clocks and rates::
|
||||
|
||||
struct clk *stmmac_clk;
|
||||
struct clk *pclk;
|
||||
struct clk *clk_ptp_ref;
|
||||
unsigned int clk_ptp_rate;
|
||||
unsigned int clk_ref_rate;
|
||||
s32 ptp_max_adj;
|
||||
|
||||
28) Main reset::
|
||||
|
||||
struct reset_control *stmmac_rst;
|
||||
|
||||
29) AXI Internal Parameters::
|
||||
|
||||
struct stmmac_axi *axi;
|
||||
|
||||
30) HW uses GMAC>4 cores::
|
||||
|
||||
int has_gmac4;
|
||||
|
||||
31) HW is sun8i based::
|
||||
|
||||
bool has_sun8i;
|
||||
|
||||
32) Enables TSO feature::
|
||||
|
||||
bool tso_en;
|
||||
|
||||
33) Enables Receive Side Scaling (RSS) feature::
|
||||
|
||||
int rss_en;
|
||||
|
||||
34) MAC Port selection::
|
||||
|
||||
int mac_port_sel_speed;
|
||||
|
||||
35) Enables TX LPI Clock Gating::
|
||||
|
||||
bool en_tx_lpi_clockgating;
|
||||
|
||||
36) HW uses XGMAC>2.10 cores::
|
||||
|
||||
int has_xgmac;
|
||||
|
||||
::
|
||||
|
||||
}
|
||||
|
||||
For MDIO bus data, we have:
|
||||
|
||||
::
|
||||
|
||||
struct stmmac_mdio_bus_data {
|
||||
|
||||
1) PHY mask passed when MDIO bus is registered::
|
||||
|
||||
unsigned int phy_mask;
|
||||
|
||||
2) List of IRQs, one per PHY::
|
||||
|
||||
int *irqs;
|
||||
|
||||
3) If IRQs is NULL, use this for probed PHY::
|
||||
|
||||
int probed_phy_irq;
|
||||
|
||||
4) Set to true if PHY needs reset::
|
||||
|
||||
bool needs_reset;
|
||||
|
||||
::
|
||||
|
||||
}
|
||||
|
||||
For DMA engine configuration, we have:
|
||||
|
||||
::
|
||||
|
||||
struct stmmac_dma_cfg {
|
||||
|
||||
1) Programmable Burst Length (TX and RX)::
|
||||
|
||||
int pbl;
|
||||
|
||||
2) If set, DMA TX / RX will use this value rather than pbl::
|
||||
|
||||
int txpbl;
|
||||
int rxpbl;
|
||||
|
||||
3) Enable 8xPBL::
|
||||
|
||||
bool pblx8;
|
||||
|
||||
4) Enable Fixed or Mixed burst::
|
||||
|
||||
int fixed_burst;
|
||||
int mixed_burst;
|
||||
|
||||
5) Enable Address Aligned Beats::
|
||||
|
||||
bool aal;
|
||||
|
||||
6) Enable Enhanced Addressing (> 32 bits)::
|
||||
|
||||
bool eame;
|
||||
|
||||
::
|
||||
|
||||
}
|
||||
|
||||
For DMA AXI parameters, we have:
|
||||
|
||||
::
|
||||
|
||||
struct stmmac_axi {
|
||||
|
||||
1) Enable AXI LPI::
|
||||
|
||||
bool axi_lpi_en;
|
||||
bool axi_xit_frm;
|
||||
|
||||
2) Set AXI Write / Read maximum outstanding requests::
|
||||
|
||||
u32 axi_wr_osr_lmt;
|
||||
u32 axi_rd_osr_lmt;
|
||||
|
||||
3) Set AXI 4KB bursts::
|
||||
|
||||
bool axi_kbbe;
|
||||
|
||||
4) Set AXI maximum burst length map::
|
||||
|
||||
u32 axi_blen[AXI_BLEN];
|
||||
|
||||
5) Set AXI Fixed burst / mixed burst::
|
||||
|
||||
bool axi_fb;
|
||||
bool axi_mb;
|
||||
|
||||
6) Set AXI rebuild incrx mode::
|
||||
|
||||
bool axi_rb;
|
||||
|
||||
::
|
||||
|
||||
}
|
||||
|
||||
For the RX Queues configuration, we have:
|
||||
|
||||
::
|
||||
|
||||
struct stmmac_rxq_cfg {
|
||||
|
||||
1) Mode to use (DCB or AVB)::
|
||||
|
||||
u8 mode_to_use;
|
||||
|
||||
2) DMA channel to use::
|
||||
|
||||
u32 chan;
|
||||
|
||||
3) Packet routing, if applicable::
|
||||
|
||||
u8 pkt_route;
|
||||
|
||||
4) Use priority routing, and priority to route::
|
||||
|
||||
bool use_prio;
|
||||
u32 prio;
|
||||
|
||||
::
|
||||
|
||||
}
|
||||
|
||||
For the TX Queues configuration, we have:
|
||||
|
||||
::
|
||||
|
||||
struct stmmac_txq_cfg {
|
||||
|
||||
1) Queue weight in scheduler::
|
||||
|
||||
u32 weight;
|
||||
|
||||
2) Mode to use (DCB or AVB)::
|
||||
|
||||
u8 mode_to_use;
|
||||
|
||||
3) Credit Base Shaper Parameters::
|
||||
|
||||
u32 send_slope;
|
||||
u32 idle_slope;
|
||||
u32 high_credit;
|
||||
u32 low_credit;
|
||||
|
||||
4) Use priority scheduling, and priority::
|
||||
|
||||
bool use_prio;
|
||||
u32 prio;
|
||||
|
||||
::
|
||||
|
||||
}
|
||||
|
||||
Device Tree Information
|
||||
-----------------------
|
||||
|
||||
Please refer to the following document:
|
||||
Documentation/devicetree/bindings/net/snps,dwmac.yaml
|
||||
|
||||
HW Capabilities
|
||||
---------------
|
||||
|
||||
Note that on new chips where the HW capability register is available, many
configurations are discovered at run-time, for example to determine whether
EEE, HW csum, PTP, enhanced descriptors etc. are actually available. As a
strategy adopted in this driver, the information from the HW capability
register can replace what has been passed from the platform.
|
||||
|
||||
Debug Information
|
||||
=================
|
||||
|
||||
The driver exports a lot of information, i.e. internal statistics, debug
information, MAC and DMA registers etc.
|
||||
|
||||
These can be read in several ways depending on the type of the information
|
||||
actually needed.
|
||||
|
||||
For example, a user can use the ethtool support to get statistics, e.g.
using ``ethtool -S ethX`` (which shows the Management counters (MMC) if
supported), or see the MAC/DMA registers, e.g. using ``ethtool -d ethX``.
|
||||
|
||||
When the Kernel is compiled with ``CONFIG_DEBUG_FS``, the driver exports the
|
||||
following debugfs entries:
|
||||
|
||||
- ``descriptors_status``: To show the DMA TX/RX descriptor rings
|
||||
- ``dma_cap``: To show the HW Capabilities
|
||||
|
||||
Developers can also use the ``debug`` module parameter to get further debug
|
||||
information (please see: NETIF Msg Level).
|
||||
|
||||
Support
|
||||
=======
|
||||
|
||||
If an issue is identified with the released source code on a supported kernel
|
||||
with a supported adapter, email the specific information related to the
|
||||
issue to netdev@vger.kernel.org
|
587
Documentation/networking/device_drivers/ethernet/ti/cpsw.rst
Normal file
@@ -0,0 +1,587 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================
|
||||
Texas Instruments CPSW ethernet driver
|
||||
======================================
|
||||
|
||||
Multiqueue & CBS & MQPRIO
|
||||
=========================
|
||||
|
||||
|
||||
The cpsw has 3 CBS shapers for each external port. This document
|
||||
describes MQPRIO and CBS Qdisc offload configuration for cpsw driver
|
||||
based on examples. It potentially can be used in audio video bridging
|
||||
(AVB) and time sensitive networking (TSN).
|
||||
|
||||
The following examples were tested on AM572x EVM and BBB boards.
|
||||
|
||||
Test setup
|
||||
==========
|
||||
|
||||
Two examples are considered, with an AM572x EVM running the cpsw driver
in dual_emac mode.
|
||||
|
||||
Several prerequisites:
|
||||
|
||||
- TX queues must be rated starting from txq0 that has highest priority
|
||||
- Traffic classes are used starting from 0, that has highest priority
|
||||
- CBS shapers should be used with rated queues
|
||||
- The bandwidth for CBS shapers has to be set a little bit more than
|
||||
potential incoming rate, thus, rate of all incoming tx queues has
|
||||
to be a little less
|
||||
- Real rates can differ, due to discreteness
|
||||
- Mapping skb-priority to txq is not enough; an skb-priority to l2 prio
  map also has to be created with the ip or vconfig tool
|
||||
- Any l2/socket prio (0 - 7) for classes can be used, but for
|
||||
simplicity default values are used: 3 and 2
|
||||
- Only 2 classes were tested: A and B. More were checked and can work;
  the maximum allowed is 4, but a rate can be set for only 3.
|
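
The CBS arguments used in the examples of this document can be derived from
the stream rate. The sketch below assumes the relations given in the
``tc-cbs`` manual page (``sendslope = idleslope - port rate`` and
``locredit = max frame size * sendslope / port rate``), a 1 Gb/s port and a
1500-byte maximum frame; the function names are hypothetical. As noted in the
examples, only the idle slope matters for the cpsw offload.

```c
#include <stdint.h>

/* Idle slope in kbit/s for a stream of rate_mbps plus a small reserve,
 * following the "a little bit more" prerequisite. */
static int64_t ex_cbs_idleslope(int64_t rate_mbps, int64_t reserve_mbps)
{
	return (rate_mbps + reserve_mbps) * 1000;
}

/* Send slope in kbit/s: idle slope minus the port transmit rate. */
static int64_t ex_cbs_sendslope(int64_t idleslope_kbps, int64_t port_kbps)
{
	return idleslope_kbps - port_kbps;
}

/* Low credit in bytes; C integer division truncates toward zero. */
static int64_t ex_cbs_locredit(int64_t max_frame_bytes, int64_t sendslope_kbps,
			       int64_t port_kbps)
{
	return max_frame_bytes * sendslope_kbps / port_kbps;
}
```

For a 40 Mb/s class with a 1 Mb/s reserve this gives an idle slope of 41000,
a send slope of -959000 and a low credit of -1438.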
||||
|
||||
Test setup for examples
|
||||
=======================
|
||||
|
||||
::
|
||||
|
||||
+-------------------------------+
|
||||
|--+ |
|
||||
| | Workstation0 |
|
||||
|E | MAC 18:03:73:66:87:42 |
|
||||
+-----------------------------+ +--|t | |
|
||||
| | 1 | E | | |h |./tsn_listener -d \ |
|
||||
| Target board: | 0 | t |--+ |0 | 18:03:73:66:87:42 -i eth0 \|
|
||||
| AM572x EVM | 0 | h | | | -s 1500 |
|
||||
| | 0 | 0 | |--+ |
|
||||
| Only 2 classes: |Mb +---| +-------------------------------+
|
||||
| class A, class B | |
|
||||
| | +---| +-------------------------------+
|
||||
| | 1 | E | |--+ |
|
||||
| | 0 | t | | | Workstation1 |
|
||||
| | 0 | h |--+ |E | MAC 20:cf:30:85:7d:fd |
|
||||
| |Mb | 1 | +--|t | |
|
||||
+-----------------------------+ |h |./tsn_listener -d \ |
|
||||
|0 | 20:cf:30:85:7d:fd -i eth0 \|
|
||||
| | -s 1500 |
|
||||
|--+ |
|
||||
+-------------------------------+
|
||||
|
||||
|
||||
Example 1: One port tx AVB configuration scheme for target board
|
||||
----------------------------------------------------------------
|
||||
|
||||
(prints and scheme for AM572x EVM, applicable to single-port boards)
|
||||
|
||||
- tc - traffic class
|
||||
- txq - transmit queue
|
||||
- p - priority
|
||||
- f - fifo (cpsw fifo)
|
||||
- S - shaper configured
|
||||
|
||||
::
|
||||
|
||||
+------------------------------------------------------------------+ u
|
||||
| +---------------+ +---------------+ +------+ +------+ | s
|
||||
| | | | | | | | | | e
|
||||
| | App 1 | | App 2 | | Apps | | Apps | | r
|
||||
| | Class A | | Class B | | Rest | | Rest | |
|
||||
| | Eth0 | | Eth0 | | Eth0 | | Eth1 | | s
|
||||
| | VLAN100 | | VLAN100 | | | | | | | | p
|
||||
| | 40 Mb/s | | 20 Mb/s | | | | | | | | a
|
||||
| | SO_PRIORITY=3 | | SO_PRIORITY=2 | | | | | | | | c
|
||||
| | | | | | | | | | | | | | e
|
||||
| +---|-----------+ +---|-----------+ +---|--+ +---|--+ |
|
||||
+-----|------------------|------------------|--------|-------------+
|
||||
+-+ +------------+ | |
|
||||
| | +-----------------+ +--+
|
||||
| | | |
|
||||
+---|-------|-------------|-----------------------|----------------+
|
||||
| +----+ +----+ +----+ +----+ +----+ |
|
||||
| | p3 | | p2 | | p1 | | p0 | | p0 | | k
|
||||
| \ / \ / \ / \ / \ / | e
|
||||
| \ / \ / \ / \ / \ / | r
|
||||
| \/ \/ \/ \/ \/ | n
|
||||
| | | | | | e
|
||||
| | | +-----+ | | l
|
||||
| | | | | |
|
||||
| +----+ +----+ +----+ +----+ | s
|
||||
| |tc0 | |tc1 | |tc2 | |tc0 | | p
|
||||
| \ / \ / \ / \ / | a
|
||||
| \ / \ / \ / \ / | c
|
||||
| \/ \/ \/ \/ | e
|
||||
| | | +-----+ | |
|
||||
| | | | | | |
|
||||
| | | | | | |
|
||||
| | | | | | |
|
||||
| +----+ +----+ +----+ +----+ +----+ |
|
||||
| |txq0| |txq1| |txq2| |txq3| |txq4| |
|
||||
| \ / \ / \ / \ / \ / |
|
||||
| \ / \ / \ / \ / \ / |
|
||||
| \/ \/ \/ \/ \/ |
|
||||
| +-|------|------|------|--+ +--|--------------+ |
|
||||
| | | | | | | Eth0.100 | | Eth1 | |
|
||||
+---|------|------|------|------------------------|----------------+
|
||||
| | | | |
|
||||
p p p p |
|
||||
3 2 0-1, 4-7 <- L2 priority |
|
||||
| | | | |
|
||||
| | | | |
|
||||
+---|------|------|------|------------------------|----------------+
|
||||
| | | | | |----------+ |
|
||||
| +----+ +----+ +----+ +----+ +----+ |
|
||||
| |dma7| |dma6| |dma5| |dma4| |dma3| |
|
||||
| \ / \ / \ / \ / \ / | c
|
||||
| \S / \S / \ / \ / \ / | p
|
||||
| \/ \/ \/ \/ \/ | s
|
||||
| | | | +----- | | w
|
||||
| | | | | | |
|
||||
| | | | | | | d
|
||||
| +----+ +----+ +----+p p+----+ | r
|
||||
| | | | | | |o o| | | i
|
||||
| | f3 | | f2 | | f0 |r r| f0 | | v
|
||||
| |tc0 | |tc1 | |tc2 |t t|tc0 | | e
|
||||
| \CBS / \CBS / \CBS /1 2\CBS / | r
|
||||
| \S / \S / \ / \ / |
|
||||
| \/ \/ \/ \/ |
|
||||
+------------------------------------------------------------------+
|
||||
|
||||
|
||||
1) ::
|
||||
|
||||
|
||||
// Add 4 tx queues, for interface Eth0, and 1 tx queue for Eth1
|
||||
$ ethtool -L eth0 rx 1 tx 5
|
||||
rx unmodified, ignoring
|
||||
|
||||
2) ::
|
||||
|
||||
// Check if num of queues is set correctly:
|
||||
$ ethtool -l eth0
|
||||
Channel parameters for eth0:
|
||||
Pre-set maximums:
|
||||
RX: 8
|
||||
TX: 8
|
||||
Other: 0
|
||||
Combined: 0
|
||||
Current hardware settings:
|
||||
RX: 1
|
||||
TX: 5
|
||||
Other: 0
|
||||
Combined: 0
|
||||
|
||||
3) ::
|
||||
|
||||
// TX queues must be rated starting from 0, so set bws for tx0 and tx1
|
||||
// Set rates 40 and 20 Mb/s respectively.
// Pay attention, real speed can differ a bit due to discreteness.
|
||||
// Leave last 2 tx queues not rated.
|
||||
$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
|
||||
$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
|
||||
|
||||
4) ::
|
||||
|
||||
// Check maximum rate of tx (cpdma) queues:
|
||||
$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
|
||||
40
|
||||
20
|
||||
0
|
||||
0
|
||||
0
|
||||
|
||||
5) ::
|
||||
|
||||
// Map skb->priority to traffic class:
|
||||
// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
|
||||
// Map traffic class to transmit queue:
|
||||
// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq2, txq3)
|
||||
$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
|
||||
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
|
||||
|
||||
5a) ::
|
||||
|
||||
// As the two interfaces share the same set of tx queues, assign all traffic
// going to interface Eth1 to a separate queue so that it is not mixed
// with traffic from interface Eth0: use a separate txq to send
// packets to Eth1, so all prio -> tc0 and tc0 -> txq4
// Here hw 0, so the default configuration for eth1 is still used in hw
|
||||
$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 1 \
|
||||
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@4 hw 0
|
||||
|
||||
6) ::
|
||||
|
||||
// Check classes settings
|
||||
$ tc -g class show dev eth0
|
||||
+---(100:ffe2) mqprio
|
||||
| +---(100:3) mqprio
|
||||
| +---(100:4) mqprio
|
||||
|
|
||||
+---(100:ffe1) mqprio
|
||||
| +---(100:2) mqprio
|
||||
|
|
||||
+---(100:ffe0) mqprio
|
||||
+---(100:1) mqprio
|
||||
|
||||
$ tc -g class show dev eth1
|
||||
+---(100:ffe0) mqprio
|
||||
+---(100:5) mqprio
|
||||
|
||||
7) ::
|
||||
|
||||
// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc
|
||||
// Set it +1 Mb for reserve (important!)
|
||||
// here only idle slope is important, other args are ignored
// Pay attention, real speed can differ a bit due to discreteness
|
||||
$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1438 \
|
||||
hicredit 62 sendslope -959000 idleslope 41000 offload 1
|
||||
net eth0: set FIFO3 bw = 50
|
||||
|
||||
8) ::
|
||||
|
||||
// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc:
|
||||
// Set it +1 Mb for reserve (important!)
|
||||
$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1468 \
|
||||
hicredit 65 sendslope -979000 idleslope 21000 offload 1
|
||||
net eth0: set FIFO2 bw = 30
|
||||
|
||||
9) ::
|
||||
|
||||
// Create vlan 100 to map sk->priority to vlan qos
|
||||
$ ip link add link eth0 name eth0.100 type vlan id 100
|
||||
8021q: 802.1Q VLAN Support v1.8
|
||||
8021q: adding VLAN 0 to HW filter on device eth0
|
||||
8021q: adding VLAN 0 to HW filter on device eth1
|
||||
net eth0: Adding vlanid 100 to vlan filter
|
||||
|
||||
10) ::
|
||||
|
||||
// Map skb->priority to L2 prio, 1 to 1
|
||||
$ ip link set eth0.100 type vlan \
|
||||
egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
|
||||
|
||||
11) ::
|
||||
|
||||
// Check egress map for vlan 100
|
||||
$ cat /proc/net/vlan/eth0.100
|
||||
[...]
|
||||
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
|
||||
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
|
||||
|
||||
12) ::
|
||||
|
||||
// Run your appropriate tools with socket option "SO_PRIORITY"
|
||||
// to 3 for class A and/or to 2 for class B
|
||||
// (taken from https://www.spinics.net/lists/netdev/msg460869.html)
|
||||
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
|
||||
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
|
||||
|
||||
13) ::
|
||||
|
||||
// run your listener on workstation (should be in same vlan)
|
||||
// (taken from https://www.spinics.net/lists/netdev/msg460869.html)
|
||||
./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39000 kbps
|
||||
|
||||
14) ::
|
||||
|
||||
// Restore default configuration if needed
|
||||
$ ip link del eth0.100
|
||||
$ tc qdisc del dev eth1 root
|
||||
$ tc qdisc del dev eth0 root
|
||||
net eth0: Prev FIFO2 is shaped
|
||||
net eth0: set FIFO3 bw = 0
|
||||
net eth0: set FIFO2 bw = 0
|
||||
$ ethtool -L eth0 rx 1 tx 1
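
The mqprio configuration from step 5 can be read as two lookup tables:
``skb->priority`` to traffic class, then traffic class to a range of tx
queues. The sketch below models those tables for eth0 with the values from
that step. The structure and function names are hypothetical and only mirror
the ``map`` and ``queues`` arguments of the command.

```c
#define EX_NUM_PRIO 16
#define EX_MAX_TC   3

/* One "count@offset" entry of the mqprio queues argument. */
struct ex_tc_queues {
	unsigned int count;
	unsigned int offset;
};

struct ex_mqprio {
	unsigned int num_tc;
	unsigned char prio_tc_map[EX_NUM_PRIO];
	struct ex_tc_queues queues[EX_MAX_TC];
};

/* Values from step 5: map 2 2 1 0 2 ... 2, queues 1@0 1@1 2@2. */
static const struct ex_mqprio ex_eth0_cfg = {
	.num_tc = 3,
	.prio_tc_map = { 2, 2, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 },
	.queues = { { 1, 0 }, { 1, 1 }, { 2, 2 } },
};

/* Traffic class for a socket priority, or -1 if out of range. */
static int ex_prio_to_tc(const struct ex_mqprio *cfg, unsigned int prio)
{
	if (prio >= EX_NUM_PRIO)
		return -1;
	return cfg->prio_tc_map[prio];
}

/* First tx queue of a traffic class, or -1 if the class is invalid. */
static int ex_tc_first_txq(const struct ex_mqprio *cfg, int tc)
{
	if (tc < 0 || (unsigned int)tc >= cfg->num_tc)
		return -1;
	return (int)cfg->queues[tc].offset;
}
```

With this table, priority 3 lands in tc0 (txq0), priority 2 in tc1 (txq1) and
every other priority in tc2, which spans txq2 and txq3.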
|
||||
|
||||
Example 2: Two port tx AVB configuration scheme for target board
|
||||
----------------------------------------------------------------
|
||||
|
||||
(prints and scheme for AM572x evm, for dual emac boards only)
|
||||
|
||||
::
|
||||
|
||||
+------------------------------------------------------------------+ u
|
||||
| +----------+ +----------+ +------+ +----------+ +----------+ | s
|
||||
| | | | | | | | | | | | e
|
||||
| | App 1 | | App 2 | | Apps | | App 3 | | App 4 | | r
|
||||
| | Class A | | Class B | | Rest | | Class B | | Class A | |
|
||||
| | Eth0 | | Eth0 | | | | | Eth1 | | Eth1 | | s
|
||||
| | VLAN100 | | VLAN100 | | | | | VLAN100 | | VLAN100 | | p
|
||||
| | 40 Mb/s | | 20 Mb/s | | | | | 10 Mb/s | | 30 Mb/s | | a
|
||||
| | SO_PRI=3 | | SO_PRI=2 | | | | | SO_PRI=3 | | SO_PRI=2 | | c
|
||||
| | | | | | | | | | | | | | | | | e
|
||||
| +---|------+ +---|------+ +---|--+ +---|------+ +---|------+ |
|
||||
+-----|-------------|-------------|---------|-------------|--------+
|
||||
+-+ +-------+ | +----------+ +----+
|
||||
| | +-------+------+ | |
|
||||
| | | | | |
|
||||
+---|-------|-------------|--------------|-------------|-------|---+
|
||||
| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
|
||||
| | p3 | | p2 | | p1 | | p0 | | p0 | | p1 | | p2 | | p3 | | k
|
||||
| \ / \ / \ / \ / \ / \ / \ / \ / | e
|
||||
| \ / \ / \ / \ / \ / \ / \ / \ / | r
|
||||
| \/ \/ \/ \/ \/ \/ \/ \/ | n
|
||||
| | | | | | | | e
|
||||
| | | +----+ +----+ | | | l
|
||||
| | | | | | | |
|
||||
| +----+ +----+ +----+ +----+ +----+ +----+ | s
|
||||
| |tc0 | |tc1 | |tc2 | |tc2 | |tc1 | |tc0 | | p
|
||||
| \ / \ / \ / \ / \ / \ / | a
|
||||
| \ / \ / \ / \ / \ / \ / | c
|
||||
| \/ \/ \/ \/ \/ \/ | e
|
||||
| | | +-----+ +-----+ | | |
|
||||
| | | | | | | | | |
|
||||
| | | | | | | | | |
|
||||
| | | | | E E | | | | |
|
||||
| +----+ +----+ +----+ +----+ t t +----+ +----+ +----+ +----+ |
|
||||
| |txq0| |txq1| |txq4| |txq5| h h |txq6| |txq7| |txq3| |txq2| |
|
||||
| \ / \ / \ / \ / 0 1 \ / \ / \ / \ / |
|
||||
| \ / \ / \ / \ / . . \ / \ / \ / \ / |
|
||||
| \/ \/ \/ \/ 1 1 \/ \/ \/ \/ |
|
||||
| +-|------|------|------|--+ 0 0 +-|------|------|------|--+ |
|
||||
| | | | | | | 0 0 | | | | | | |
|
||||
+---|------|------|------|---------------|------|------|------|----+
|
||||
| | | | | | | |
|
||||
p p p p p p p p
|
||||
3 2 0-1, 4-7 <-L2 pri-> 0-1, 4-7 2 3
|
||||
| | | | | | | |
|
||||
| | | | | | | |
|
||||
+---|------|------|------|---------------|------|------|------|----+
|
||||
| | | | | | | | | |
|
||||
| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
|
||||
| |dma7| |dma6| |dma3| |dma2| |dma1| |dma0| |dma4| |dma5| |
|
||||
| \ / \ / \ / \ / \ / \ / \ / \ / | c
|
||||
| \S / \S / \ / \ / \ / \ / \S / \S / | p
|
||||
| \/ \/ \/ \/ \/ \/ \/ \/ | s
|
||||
| | | | +----- | | | | | w
|
||||
| | | | | +----+ | | | |
|
||||
| | | | | | | | | | d
|
||||
| +----+ +----+ +----+p p+----+ +----+ +----+ | r
|
||||
| | | | | | |o o| | | | | | | i
|
||||
| | f3 | | f2 | | f0 |r CPSW r| f3 | | f2 | | f0 | | v
|
||||
| |tc0 | |tc1 | |tc2 |t t|tc0 | |tc1 | |tc2 | | e
|
||||
| \CBS / \CBS / \CBS /1 2\CBS / \CBS / \CBS / | r
|
||||
| \S / \S / \ / \S / \S / \ / |
|
||||
| \/ \/ \/ \/ \/ \/ |
|
||||
+------------------------------------------------------------------+
|
||||
========================================Eth==========================>
|
||||
|
||||
1) ::
|
||||
|
||||
// Add 8 tx queues, for interface Eth0, but they are common, so are accessed
|
||||
// by two interfaces Eth0 and Eth1.
|
||||
$ ethtool -L eth1 rx 1 tx 8
|
||||
rx unmodified, ignoring
|
||||
|
||||
2) ::
|
||||
|
||||
// Check if num of queues is set correctly:
|
||||
$ ethtool -l eth0
|
||||
Channel parameters for eth0:
|
||||
Pre-set maximums:
|
||||
RX: 8
|
||||
TX: 8
|
||||
Other: 0
|
||||
Combined: 0
|
||||
Current hardware settings:
|
||||
RX: 1
|
||||
TX: 8
|
||||
Other: 0
|
||||
Combined: 0
|
||||
|
||||
3) ::
|
||||
|
||||
// TX queues must be rated starting from 0, so set bws for tx0 and tx1 for Eth0
|
||||
// and for tx2 and tx3 for Eth1. That is, rates 40 and 20 Mb/s respectively
|
||||
// for Eth0 and 30 and 10 Mb/s for Eth1.
|
||||
// Real speed can differ a bit due to discreteness
|
||||
// Leave last 4 tx queues as not rated
|
||||
$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
|
||||
$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
|
||||
$ echo 30 > /sys/class/net/eth1/queues/tx-2/tx_maxrate
|
||||
$ echo 10 > /sys/class/net/eth1/queues/tx-3/tx_maxrate
|
||||
|
||||
4) ::
|
||||
|
||||
// Check maximum rate of tx (cpdma) queues:
|
||||
$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
|
||||
40
|
||||
20
|
||||
30
|
||||
10
|
||||
0
|
||||
0
|
||||
0
|
||||
0
|
||||
|
||||
5) ::
|
||||
|
||||
// Map skb->priority to traffic class for Eth0:
|
||||
// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
|
||||
// Map traffic class to transmit queue:
|
||||
// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq4, txq5)
|
||||
$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
|
||||
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@4 hw 1
|
||||
|
||||
6) ::
|
||||
|
||||
// Check classes settings
|
||||
$ tc -g class show dev eth0
|
||||
+---(100:ffe2) mqprio
|
||||
| +---(100:5) mqprio
|
||||
| +---(100:6) mqprio
|
||||
|
|
||||
+---(100:ffe1) mqprio
|
||||
| +---(100:2) mqprio
|
||||
|
|
||||
+---(100:ffe0) mqprio
|
||||
+---(100:1) mqprio
|
||||
|
||||
7) ::
|
||||
|
||||
// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc for Eth0
|
||||
// here only idle slope is important, others ignored
|
||||
// Real speed can differ a bit due to discreteness
|
||||
$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1470 \
|
||||
hicredit 62 sendslope -959000 idleslope 41000 offload 1
|
||||
net eth0: set FIFO3 bw = 50
|
||||
|
||||
8) ::
|
||||
|
||||
// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc for Eth0
|
||||
$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
|
||||
hicredit 65 sendslope -979000 idleslope 21000 offload 1
|
||||
net eth0: set FIFO2 bw = 30
|
||||
|
||||
9) ::
|
||||
|
||||
// Create vlan 100 to map skb->priority to vlan qos for Eth0
|
||||
$ ip link add link eth0 name eth0.100 type vlan id 100
|
||||
net eth0: Adding vlanid 100 to vlan filter
|
||||
|
||||
10) ::
|
||||
|
||||
// Map skb->priority to L2 prio for Eth0.100, one to one
|
||||
$ ip link set eth0.100 type vlan \
|
||||
egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
|
||||
|
||||
11) ::
|
||||
|
||||
// Check egress map for vlan 100
|
||||
$ cat /proc/net/vlan/eth0.100
|
||||
[...]
|
||||
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
|
||||
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
|
||||
|
||||
12) ::
|
||||
|
||||
// Map skb->priority to traffic class for Eth1:
|
||||
// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
|
||||
// Map traffic class to transmit queue:
|
||||
// tc0 -> txq2, tc1 -> txq3, tc2 -> (txq6, txq7)
|
||||
$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 3 \
|
||||
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@2 1@3 2@6 hw 1
|
||||
|
||||
13) ::
|
||||
|
||||
// Check classes settings
|
||||
$ tc -g class show dev eth1
|
||||
+---(100:ffe2) mqprio
|
||||
| +---(100:7) mqprio
|
||||
| +---(100:8) mqprio
|
||||
|
|
||||
+---(100:ffe1) mqprio
|
||||
| +---(100:4) mqprio
|
||||
|
|
||||
+---(100:ffe0) mqprio
|
||||
+---(100:3) mqprio
|
||||
|
||||
14) ::
|
||||
|
||||
// Set rate for class A - 31 Mbit (tc0, txq2) using CBS Qdisc for Eth1
|
||||
// here only idle slope is important, others ignored, but calculated
|
||||
// for interface speed - 100Mb for eth1 port.
|
||||
// Set it +1 Mb for reserve (important!)
|
||||
$ tc qdisc add dev eth1 parent 100:3 cbs locredit -1035 \
|
||||
hicredit 465 sendslope -69000 idleslope 31000 offload 1
|
||||
net eth1: set FIFO3 bw = 31
|
||||
|
||||
15) ::
|
||||
|
||||
// Set rate for class B - 11 Mbit (tc1, txq3) using CBS Qdisc for Eth1
|
||||
// Set it +1 Mb for reserve (important!)
|
||||
$ tc qdisc add dev eth1 parent 100:4 cbs locredit -1335 \
|
||||
hicredit 405 sendslope -89000 idleslope 11000 offload 1
|
||||
net eth1: set FIFO2 bw = 11
|
||||
|
||||
16) ::
|
||||
|
||||
// Create vlan 100 to map skb->priority to vlan qos for Eth1
|
||||
$ ip link add link eth1 name eth1.100 type vlan id 100
|
||||
net eth1: Adding vlanid 100 to vlan filter
|
||||
|
||||
17) ::
|
||||
|
||||
// Map skb->priority to L2 prio for Eth1.100, one to one
|
||||
$ ip link set eth1.100 type vlan \
|
||||
egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
|
||||
|
||||
18) ::
|
||||
|
||||
// Check egress map for vlan 100
|
||||
$ cat /proc/net/vlan/eth1.100
|
||||
[...]
|
||||
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
|
||||
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
|
||||
|
||||
19) ::
|
||||
|
||||
// Run appropriate tools with socket option "SO_PRIORITY" set to 3
|
||||
// for class A and to 2 for class B. For both interfaces
|
||||
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
|
||||
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
|
||||
./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p2 -s 1500&
|
||||
./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p3 -s 1500&
|
||||
|
||||
20) ::
|
||||
|
||||
// run your listener on workstation (should be in same vlan)
|
||||
// (taken from https://www.spinics.net/lists/netdev/msg460869.html)
|
||||
./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39012 kbps
|
||||
Receiving data rate: 39000 kbps
|
||||
|
||||
21) ::
|
||||
|
||||
// Restore default configuration if needed
|
||||
$ ip link del eth1.100
|
||||
$ ip link del eth0.100
|
||||
$ tc qdisc del dev eth1 root
|
||||
net eth1: Prev FIFO2 is shaped
|
||||
net eth1: set FIFO3 bw = 0
|
||||
net eth1: set FIFO2 bw = 0
|
||||
$ tc qdisc del dev eth0 root
|
||||
net eth0: Prev FIFO2 is shaped
|
||||
net eth0: set FIFO3 bw = 0
|
||||
net eth0: set FIFO2 bw = 0
|
||||
$ ethtool -L eth0 rx 1 tx 1
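The mqprio arguments in steps 5 and 12 are compact, so it can help to expand them into an explicit priority-to-queue table. The sketch below does only that; ``parse_mqprio`` is a hypothetical helper written for this illustration and is not part of tc.

```python
def parse_mqprio(prio_map, queues):
    """Return {priority: [tx queue indexes]} for an mqprio configuration.

    prio_map -- the "map" argument: position is the priority, value the class
    queues   -- the "queues" argument: one "count@offset" entry per class
    """
    tc_queues = []
    for spec in queues.split():
        count, offset = (int(x) for x in spec.split("@"))
        tc_queues.append(list(range(offset, offset + count)))
    return {prio: tc_queues[int(tc)]
            for prio, tc in enumerate(prio_map.split())}


MAP = "2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2"
eth0 = parse_mqprio(MAP, "1@0 1@1 2@4")  # arguments from step 5
eth1 = parse_mqprio(MAP, "1@2 1@3 2@6")  # arguments from step 12

for prio in (3, 2, 0):
    print(prio, eth0[prio], eth1[prio])
```

Priority 3 ends up on txq0 (Eth0) and txq2 (Eth1), priority 2 on txq1 and txq3, and every other priority shares the two unshaped queues of its port, which is the layout drawn in the scheme.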
|
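The steps above note that only idleslope matters for the offload, but the other CBS arguments still have to be supplied. A small sketch of how sendslope and locredit relate to idleslope is given below, using the relations described in the tc-cbs manual page. The 1500 byte maximum frame size is an assumption, as is the 1000000 kbit/s rate for eth0 (inferred from its sendslope values); the 100 Mb rate for eth1 is taken from step 14. The sketch reproduces the eth1 values and the eth0 sendslope values, and makes no attempt to reproduce the eth0 credit values.

```python
def cbs_params(idleslope_kbit, port_rate_kbit, max_frame_bytes=1500):
    """Return (sendslope, locredit) for a given idleslope.

    sendslope = idleslope - port rate                    (kbit/s)
    locredit  = max frame size * sendslope / port rate   (bytes, rounded down)
    """
    sendslope = idleslope_kbit - port_rate_kbit
    locredit = max_frame_bytes * sendslope // port_rate_kbit
    return sendslope, locredit


print(cbs_params(31000, 100000))  # class A on eth1, step 14
print(cbs_params(11000, 100000))  # class B on eth1, step 15
```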
@@ -0,0 +1,242 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================================
|
||||
Texas Instruments CPSW switchdev based ethernet driver
|
||||
======================================================
|
||||
|
||||
:Version: 2.0
|
||||
|
||||
Port renaming
|
||||
=============
|
||||
|
||||
On older udev versions renaming of ethX to swXpY will not be automatically
|
||||
supported.
|
||||
|
||||
In order to rename via udev::
|
||||
|
||||
ip -d link show dev sw0p1 | grep switchid
|
||||
|
||||
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}==<switchid>, \
|
||||
ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
|
||||
|
||||
|
||||
Dual mac mode
|
||||
=============
|
||||
|
||||
- The new (cpsw_new.c) driver is operating in dual-emac mode by default, thus
|
||||
working as 2 individual network interfaces. Main differences from legacy CPSW
|
||||
driver are:
|
||||
|
||||
- optimized promiscuous mode: The P0_UNI_FLOOD (both ports) is enabled in
|
||||
addition to ALLMULTI (current port) instead of ALE_BYPASS.
|
||||
So, Ports in promiscuous mode will keep possibility of mcast and vlan
|
||||
filtering, which provides significant benefits when ports are joined
|
||||
to the same bridge, but without enabling "switch" mode, or to different
|
||||
bridges.
|
||||
- learning disabled on ports as it does not make much sense for
|
||||
segregated ports - no forwarding in HW.
|
||||
- enabled basic support for devlink.
|
||||
|
||||
::
|
||||
|
||||
devlink dev show
|
||||
platform/48484000.switch
|
||||
|
||||
devlink dev param show
|
||||
platform/48484000.switch:
|
||||
name switch_mode type driver-specific
|
||||
values:
|
||||
cmode runtime value false
|
||||
name ale_bypass type driver-specific
|
||||
values:
|
||||
cmode runtime value false
|
||||
|
||||
Devlink configuration parameters
|
||||
================================
|
||||
|
||||
See Documentation/networking/devlink/ti-cpsw-switch.rst
|
||||
|
||||
Bridging in dual mac mode
|
||||
=========================
|
||||
|
||||
The dual_mac mode requires two vids to be reserved for internal purposes,
|
||||
which, by default, equal CPSW Port numbers. As a result, the bridge has to be
|
||||
configured in vlan unaware mode or default_pvid has to be adjusted::
|
||||
|
||||
ip link add name br0 type bridge
|
||||
ip link set dev br0 type bridge vlan_filtering 0
|
||||
echo 0 > /sys/class/net/br0/bridge/default_pvid
|
||||
ip link set dev sw0p1 master br0
|
||||
ip link set dev sw0p2 master br0
|
||||
|
||||
or::
|
||||
|
||||
ip link add name br0 type bridge
|
||||
ip link set dev br0 type bridge vlan_filtering 0
|
||||
echo 100 > /sys/class/net/br0/bridge/default_pvid
|
||||
ip link set dev br0 type bridge vlan_filtering 1
|
||||
ip link set dev sw0p1 master br0
|
||||
ip link set dev sw0p2 master br0
|
||||
|
||||
Enabling "switch"
|
||||
=================
|
||||
|
||||
The Switch mode can be enabled by configuring devlink driver parameter
|
||||
"switch_mode" to 1/true::
|
||||
|
||||
devlink dev param set platform/48484000.switch \
|
||||
name switch_mode value 1 cmode runtime
|
||||
|
||||
This can be done regardless of the state of Port's netdev devices - UP/DOWN, but
|
||||
Port's netdev devices have to be UP before joining the bridge to avoid
|
||||
overwriting of bridge configuration as CPSW switch driver completely reloads its
|
||||
configuration when first Port changes its state to UP.
|
||||
|
||||
When both interfaces have joined the bridge, the CPSW switch driver will enable
|
||||
marking packets with offload_fwd_mark flag unless "ale_bypass=0"
|
||||
|
||||
All configuration is implemented via switchdev API.
|
||||
|
||||
Bridge setup
|
||||
============
|
||||
|
||||
::
|
||||
|
||||
devlink dev param set platform/48484000.switch \
|
||||
name switch_mode value 1 cmode runtime
|
||||
|
||||
ip link add name br0 type bridge
|
||||
ip link set dev br0 type bridge ageing_time 1000
|
||||
ip link set dev sw0p1 up
|
||||
ip link set dev sw0p2 up
|
||||
ip link set dev sw0p1 master br0
|
||||
ip link set dev sw0p2 master br0
|
||||
|
||||
[*] bridge vlan add dev br0 vid 1 pvid untagged self
|
||||
|
||||
[*] if vlan_filtering=1. where default_pvid=1
|
||||
|
||||
Note. Steps [*] are mandatory.
|
||||
|
||||
|
||||
On/off STP
|
||||
==========
|
||||
|
||||
::
|
||||
|
||||
ip link set dev BRDEV type bridge stp_state 1/0
|
||||
|
||||
VLAN configuration
|
||||
==================
|
||||
|
||||
::
|
||||
|
||||
bridge vlan add dev br0 vid 1 pvid untagged self <---- add cpu port to VLAN 1
|
||||
|
||||
Note. This step is mandatory for bridge/default_pvid.
|
||||
|
||||
Add extra VLANs
|
||||
===============
|
||||
|
||||
1. untagged::
|
||||
|
||||
bridge vlan add dev sw0p1 vid 100 pvid untagged master
|
||||
bridge vlan add dev sw0p2 vid 100 pvid untagged master
|
||||
bridge vlan add dev br0 vid 100 pvid untagged self <---- Add cpu port to VLAN100
|
||||
|
||||
2. tagged::
|
||||
|
||||
bridge vlan add dev sw0p1 vid 100 master
|
||||
bridge vlan add dev sw0p2 vid 100 master
|
||||
bridge vlan add dev br0 vid 100 pvid tagged self <---- Add cpu port to VLAN100
|
||||
|
||||
FDBs
|
||||
----
|
||||
|
||||
FDBs are automatically added on the appropriate switch port upon detection
|
||||
|
||||
Manually adding FDBs::
|
||||
|
||||
bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
|
||||
bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master <---- Add on all VLANs
|
||||
|
||||
MDBs
|
||||
----
|
||||
|
||||
MDBs are automatically added on the appropriate switch port upon detection
|
||||
|
||||
Manually adding MDBs::
|
||||
|
||||
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100
|
||||
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent <---- Add on all VLANs
|
||||
|
||||
Multicast flooding
|
||||
==================
|
||||
CPU port mcast_flooding is always on
|
||||
|
||||
Turning flooding on/off on switch ports:
|
||||
bridge link set dev sw0p1 mcast_flood on/off
|
||||
|
||||
Access and Trunk port
|
||||
=====================
|
||||
|
||||
::
|
||||
|
||||
bridge vlan add dev sw0p1 vid 100 pvid untagged master
|
||||
bridge vlan add dev sw0p2 vid 100 master
|
||||
|
||||
|
||||
bridge vlan add dev br0 vid 100 self
|
||||
ip link add link br0 name br0.100 type vlan id 100
|
||||
|
||||
Note. Setting PVID on the bridge device itself works only for
|
||||
default VLAN (default_pvid).
|
||||
|
||||
NFS
|
||||
===
|
||||
|
||||
The only way for NFS to work is by chrooting to a minimal environment when
|
||||
switch configuration that will affect connectivity is needed.
|
||||
Assuming you are booting NFS with eth1 interface (the script is hacky and
|
||||
it's just there to prove NFS is doable).
|
||||
|
||||
setup.sh::
|
||||
|
||||
#!/bin/sh
|
||||
mkdir proc
|
||||
mount -t proc none /proc
|
||||
ifconfig br0 > /dev/null
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "Setting up bridge"
|
||||
ip link add name br0 type bridge
|
||||
ip link set dev br0 type bridge ageing_time 1000
|
||||
ip link set dev br0 type bridge vlan_filtering 1
|
||||
|
||||
ip link set eth1 down
|
||||
ip link set eth1 name sw0p1
|
||||
ip link set dev sw0p1 up
|
||||
ip link set dev sw0p2 up
|
||||
ip link set dev sw0p2 master br0
|
||||
ip link set dev sw0p1 master br0
|
||||
bridge vlan add dev br0 vid 1 pvid untagged self
|
||||
ifconfig sw0p1 0.0.0.0
|
||||
udhcpc -i br0
|
||||
fi
|
||||
umount /proc
|
||||
|
||||
run_nfs.sh::
|
||||
|
||||
#!/bin/sh
|
||||
mkdir /tmp/root/bin -p
|
||||
mkdir /tmp/root/lib -p
|
||||
|
||||
cp -r /lib/ /tmp/root/
|
||||
cp -r /bin/ /tmp/root/
|
||||
cp /sbin/ip /tmp/root/bin
|
||||
cp /sbin/bridge /tmp/root/bin
|
||||
cp /sbin/ifconfig /tmp/root/bin
|
||||
cp /sbin/udhcpc /tmp/root/bin
|
||||
cp /path/to/setup.sh /tmp/root/bin
|
||||
chroot /tmp/root/ busybox sh /bin/setup.sh
|
||||
|
||||
run ./run_nfs.sh
|
140
Documentation/networking/device_drivers/ethernet/ti/tlan.rst
Normal file
@@ -0,0 +1,140 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================
|
||||
TLAN driver for Linux
|
||||
=====================
|
||||
|
||||
:Version: 1.14a
|
||||
|
||||
(C) 1997-1998 Caldera, Inc.
|
||||
|
||||
(C) 1998 James Banks
|
||||
|
||||
(C) 1999-2001 Torben Mathiasen <tmm@image.dk, torben.mathiasen@compaq.com>
|
||||
|
||||
For driver information/updates visit http://www.compaq.com
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
I. Supported Devices
|
||||
====================
|
||||
|
||||
Only PCI devices will work with this driver.
|
||||
|
||||
Supported:
|
||||
|
||||
========= ========= ===========================================
|
||||
Vendor ID Device ID Name
|
||||
========= ========= ===========================================
|
||||
0e11 ae32 Compaq Netelligent 10/100 TX PCI UTP
|
||||
0e11 ae34 Compaq Netelligent 10 T PCI UTP
|
||||
0e11 ae35 Compaq Integrated NetFlex 3/P
|
||||
0e11 ae40 Compaq Netelligent Dual 10/100 TX PCI UTP
|
||||
0e11 ae43 Compaq Netelligent Integrated 10/100 TX UTP
|
||||
0e11 b011 Compaq Netelligent 10/100 TX Embedded UTP
|
||||
0e11 b012 Compaq Netelligent 10 T/2 PCI UTP/Coax
|
||||
0e11 b030 Compaq Netelligent 10/100 TX UTP
|
||||
0e11 f130 Compaq NetFlex 3/P
|
||||
0e11 f150 Compaq NetFlex 3/P
|
||||
108d 0012 Olicom OC-2325
|
||||
108d 0013 Olicom OC-2183
|
||||
108d 0014 Olicom OC-2326
|
||||
========= ========= ===========================================
|
||||
|
||||
|
||||
Caveats:
|
||||
|
||||
I am not sure if 100BaseTX daughterboards (for those cards which
|
||||
support such things) will work. I haven't had any solid evidence
|
||||
either way.
|
||||
|
||||
However, if a card supports 100BaseTx without requiring an add
|
||||
on daughterboard, it should work with 100BaseTx.
|
||||
|
||||
The "Netelligent 10 T/2 PCI UTP/Coax" (b012) device is untested,
|
||||
but I do not expect any problems.
|
||||
|
||||
|
||||
II. Driver Options
|
||||
==================
|
||||
|
||||
1. You can append debug=x to the end of the insmod line to get
|
||||
debug messages, where x is a bit field where the bits mean
|
||||
the following:
|
||||
|
||||
==== =====================================
|
||||
0x01 Turn on general debugging messages.
|
||||
0x02 Turn on receive debugging messages.
|
||||
0x04 Turn on transmit debugging messages.
|
||||
0x08 Turn on list debugging messages.
|
||||
==== =====================================
|
||||
|
||||
2. You can append aui=1 to the end of the insmod line to cause
|
||||
the adapter to use the AUI interface instead of the 10 Base T
|
||||
interface. This is also what to do if you want to use the BNC
|
||||
connector on a TLAN based device. (Setting this option on a
|
||||
device that does not have an AUI/BNC connector will probably
|
||||
cause it to not function correctly.)
|
||||
|
||||
3. You can set duplex=1 to force half duplex, and duplex=2 to
|
||||
force full duplex.
|
||||
|
||||
4. You can set speed=10 to force 10Mbs operation, and speed=100
|
||||
to force 100Mbs operation. (I'm not sure what will happen
|
||||
if a card which only supports 10Mbs is forced into 100Mbs
|
||||
mode.)
|
||||
|
||||
5. You have to use speed=X duplex=Y together now. If you just
|
||||
do "insmod tlan.o speed=100" the driver will do Auto-Neg.
|
||||
To force a 10Mbps Half-Duplex link do "insmod tlan.o speed=10
|
||||
duplex=1".
|
||||
|
||||
6. If the driver is built into the kernel, you can use the 3rd
|
||||
and 4th parameters to set aui and debug respectively. For
|
||||
example::
|
||||
|
||||
ether=0,0,0x1,0x7,eth0
|
||||
|
||||
This sets aui to 0x1 and debug to 0x7, assuming eth0 is a
|
||||
supported TLAN device.
|
||||
|
||||
The bits in the third byte are assigned as follows:
|
||||
|
||||
==== ===============
|
||||
0x01 aui
|
||||
0x02 use half duplex
|
||||
0x04 use full duplex
|
||||
0x08 use 10BaseT
|
||||
0x10 use 100BaseTx
|
||||
==== ===============
|
||||
|
||||
You also need to set both speed and duplex settings when forcing
|
||||
speeds with kernel-parameters.
|
||||
ether=0,0,0x12,0,eth0 will force link to 100Mbps Half-Duplex.
|
||||
|
||||
7. If you have more than one tlan adapter in your system, you can
|
||||
use the above options on a per adapter basis. To force a 100Mbit/HD
|
||||
link with your eth1 adapter use::
|
||||
|
||||
insmod tlan speed=0,100 duplex=0,1
|
||||
|
||||
Now eth0 will use auto-neg and eth1 will be forced to 100Mbit/HD.
|
||||
Note that the tlan driver supports a maximum of 8 adapters.
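The bit fields used by the debug= option and by the third ether= field can be checked with a few lines of code. The sketch below copies the bit assignments from the two tables in this section; the ``decode`` helper and the shortened bit names are illustrative only and are not part of the driver.

```python
# Bit assignments copied from the tables in section II.
ETHER_BITS = {
    0x01: "aui",
    0x02: "half duplex",
    0x04: "full duplex",
    0x08: "10BaseT",
    0x10: "100BaseTx",
}
DEBUG_BITS = {
    0x01: "general",
    0x02: "receive",
    0x04: "transmit",
    0x08: "list",
}


def decode(value, table):
    """Return the names of all bits from table that are set in value."""
    return [name for bit, name in sorted(table.items()) if value & bit]


print(decode(0x12, ETHER_BITS))  # third field of ether=0,0,0x12,0,eth0
print(decode(0x7, DEBUG_BITS))   # fourth field of ether=0,0,0x1,0x7,eth0
```

The first call lists half duplex and 100BaseTx, matching the 100Mbps Half-Duplex example above; the second lists the general, receive and transmit debug messages.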
|
||||
|
||||
|
||||
III. Things to try if you have problems
|
||||
=======================================
|
||||
|
||||
1. Make sure your card's PCI id is among those listed in
|
||||
section I, above.
|
||||
2. Make sure routing is correct.
|
||||
3. Try forcing different speed/duplex settings
|
||||
|
||||
|
||||
There is also a tlan mailing list which you can join by sending "subscribe tlan"
|
||||
in the body of an email to majordomo@vuser.vu.union.edu.
|
||||
|
||||
There is also a tlan website at http://www.compaq.com
|
||||
|
@@ -0,0 +1,202 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================
|
||||
The Spidernet Device Driver
|
||||
===========================
|
||||
|
||||
Written by Linas Vepstas <linas@austin.ibm.com>
|
||||
|
||||
Version of 7 June 2007
|
||||
|
||||
Abstract
|
||||
========
|
||||
This document sketches the structure of portions of the spidernet
|
||||
device driver in the Linux kernel tree. The spidernet is a gigabit
|
||||
ethernet device built into the Toshiba southbridge commonly used
|
||||
in the SONY Playstation 3 and the IBM QS20 Cell blade.
|
||||
|
||||
The Structure of the RX Ring.
|
||||
=============================
|
||||
The receive (RX) ring is a circular linked list of RX descriptors,
|
||||
together with three pointers into the ring that are used to manage its
|
||||
contents.
|
||||
|
||||
The elements of the ring are called "descriptors" or "descrs"; they
|
||||
describe the received data. This includes a pointer to a buffer
|
||||
containing the received data, the buffer size, and various status bits.
|
||||
|
||||
There are three primary states that a descriptor can be in: "empty",
|
||||
"full" and "not-in-use". An "empty" or "ready" descriptor is ready
|
||||
to receive data from the hardware. A "full" descriptor has data in it,
|
||||
and is waiting to be emptied and processed by the OS. A "not-in-use"
|
||||
descriptor is neither empty nor full; it is simply not ready. It may
|
||||
not even have a data buffer in it, or is otherwise unusable.
|
||||
|
||||
During normal operation, on device startup, the OS (specifically, the
|
||||
spidernet device driver) allocates a set of RX descriptors and RX
|
||||
buffers. These are all marked "empty", ready to receive data. This
|
||||
ring is handed off to the hardware, which sequentially fills in the
|
||||
buffers, and marks them "full". The OS follows up, taking the full
|
||||
buffers, processing them, and re-marking them empty.
|
||||
|
||||
This filling and emptying is managed by three pointers, the "head"
|
||||
and "tail" pointers, managed by the OS, and a hardware current
|
||||
descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
|
||||
currently being filled. When this descr is filled, the hardware
|
||||
marks it full, and advances the GDACTDPA by one. Thus, when there is
|
||||
flowing RX traffic, every descr behind it should be marked "full",
|
||||
and everything in front of it should be "empty". If the hardware
|
||||
discovers that the current descr is not empty, it will signal an
|
||||
interrupt, and halt processing.
|
||||
|
||||
The tail pointer tails or trails the hardware pointer. When the
|
||||
hardware is ahead, the tail pointer will be pointing at a "full"
|
||||
descr. The OS will process this descr, and then mark it "not-in-use",
|
||||
and advance the tail pointer. Thus, when there is flowing RX traffic,
|
||||
all of the descrs in front of the tail pointer should be "full", and
|
||||
all of those behind it should be "not-in-use". When RX traffic is not
|
||||
flowing, then the tail pointer can catch up to the hardware pointer.
|
||||
The OS will then note that the current tail is "empty", and halt
|
||||
processing.
|
||||
|
||||
The head pointer (somewhat mis-named) follows after the tail pointer.
|
||||
When traffic is flowing, then the head pointer will be pointing at
|
||||
a "not-in-use" descr. The OS will perform various housekeeping duties
|
||||
on this descr. This includes allocating a new data buffer and
|
||||
dma-mapping it so as to make it visible to the hardware. The OS will
|
||||
then mark the descr as "empty", ready to receive data. Thus, when there
|
||||
is flowing RX traffic, everything in front of the head pointer should
|
||||
be "not-in-use", and everything behind it should be "empty". If no
|
||||
RX traffic is flowing, then the head pointer can catch up to the tail
|
||||
pointer, at which point the OS will notice that the head descr is
|
||||
"empty", and it will halt processing.
|
||||
|
||||
Thus, in an idle system, the GDACTDPA, tail and head pointers will
|
||||
all be pointing at the same descr, which should be "empty". All of the
|
||||
other descrs in the ring should be "empty" as well.
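The interplay of the three pointers can be followed with a toy model. The sketch below is not driver code: the class and method names are made up for illustration, and it models only the state changes described above (hardware fills at its pointer, the OS processes at the tail and refills at the head).

```python
EMPTY, FULL, NOT_IN_USE = "empty", "full", "not-in-use"


class RxRing:
    """Toy model of the RX ring; hw, tail and head are descr indexes."""

    def __init__(self, size):
        self.state = [EMPTY] * size
        self.hw = self.tail = self.head = 0

    def hw_receive(self):
        """Hardware fills the current descr; False means it has to halt."""
        if self.state[self.hw] != EMPTY:
            return False
        self.state[self.hw] = FULL
        self.hw = (self.hw + 1) % len(self.state)
        return True

    def os_process(self):
        """OS consumes the descr at the tail; False if it is not full."""
        if self.state[self.tail] != FULL:
            return False
        self.state[self.tail] = NOT_IN_USE
        self.tail = (self.tail + 1) % len(self.state)
        return True

    def os_refill(self):
        """OS re-arms the descr at the head; False if nothing to refill."""
        if self.state[self.head] != NOT_IN_USE:
            return False
        self.state[self.head] = EMPTY
        self.head = (self.head + 1) % len(self.state)
        return True


ring = RxRing(8)
for _ in range(3):
    ring.hw_receive()   # descrs 0..2 become full
ring.os_process()       # descr 0 becomes not-in-use
print(ring.hw, ring.tail, ring.head, ring.state[:4])
```

Calling ``os_process()`` and ``os_refill()`` until both return False brings all three indexes to the same descr with every descr empty again, which is the idle state described above.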
|
||||
|
||||
The show_rx_chain() routine will print out the locations of the
|
||||
GDACTDPA, tail and head pointers. It will also summarize the contents
|
||||
of the ring, starting at the tail pointer, and listing the status
|
||||
of the descrs that follow.
|
||||
|
||||
A typical example of the output, for a nearly idle system, might be::
|
||||
|
||||
net eth1: Total number of descrs=256
|
||||
net eth1: Chain tail located at descr=20
|
||||
net eth1: Chain head is at 20
|
||||
net eth1: HW curr desc (GDACTDPA) is at 21
|
||||
net eth1: Have 1 descrs with stat=x40800101
|
||||
net eth1: HW next desc (GDACNEXTDA) is at 22
|
||||
net eth1: Last 255 descrs with stat=xa0800000
|
||||
|
||||
In the above, the hardware has filled in one descr, number 20. Both
|
||||
head and tail are pointing at 20, because it has not yet been emptied.
|
||||
Meanwhile, hw is pointing at 21, which is free.
|
||||
|
||||
The "Have nnn descrs" refers to the descr starting at the tail: in this
|
||||
case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
|
||||
to all of the rest of the descrs, from the last status change. The "nnn"
|
||||
is a count of how many descrs have exactly the same status.
|
||||
|
||||
The status x4... corresponds to "full" and status xa... corresponds
|
||||
to "empty". The actual value printed is RXCOMST_A.
|
||||
|
||||
In the device driver source code, a different set of names is
|
||||
used for these same concepts, so that::
|
||||
|
||||
"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
|
||||
"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
|
||||
"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
|
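To read a show_rx_chain() dump, it is enough to look at the leading hex digit of each status word. The sketch below assumes the state is held in the top four bits of a 32-bit status, which is consistent with the printed values (x4... and xa...) but is not spelled out in this document; the function names are illustrative.

```python
from itertools import groupby

# State codes from the list above, taken as the top nibble of the status.
STATES = {0xA: "empty", 0x4: "full", 0xF: "not in use"}


def descr_state(stat):
    """Map a status word such as 0x40800101 to its state name."""
    return STATES.get((stat >> 28) & 0xF, "other")


def summarize(stats):
    """Collapse consecutive identical status words into (count, stat) runs."""
    return [(len(list(run)), stat) for stat, run in groupby(stats)]


# The nearly idle ring from the example, listed starting at the tail.
chain = [0x40800101] + [0xA0800000] * 255
for count, stat in summarize(chain):
    print(count, hex(stat), descr_state(stat))
```

The run-length grouping mirrors the "Have nnn descrs" and "Last nnn descrs" lines of the dump: one full descr at the tail, followed by 255 empty ones.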
||||
|
||||
|
||||
The RX RAM full bug/feature
|
||||
===========================
|
||||
|
||||
As long as the OS can empty out the RX buffers at a rate faster than
|
||||
the hardware can fill them, there is no problem. If, for some reason,
|
||||
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
|
||||
pointer will catch up to the head, notice the not-empty condition,
|
||||
and stop. However, RX packets may still continue arriving on the wire.
|
||||
The spidernet chip can save some limited number of these in local RAM.
|
||||
When this local ram fills up, the spider chip will issue an interrupt
|
||||
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
|
||||
will be set in GHIINT1STS). When the RX ram full condition occurs,
|
||||
a certain bug/feature is triggered that has to be specially handled.
|
||||
This section describes the special handling for this condition.
|
||||
|
||||
When the OS finally has a chance to run, it will empty out the RX ring.
|
||||
In particular, it will clear the descriptor on which the hardware had
|
||||
stopped. However, once the hardware has decided that a certain
|
||||
descriptor is invalid, it will not restart at that descriptor; instead
|
||||
it will restart at the next descr. This potentially will lead to a
|
||||
deadlock condition, as the tail pointer will be pointing at this descr,
|
||||
which, from the OS point of view, is empty; the OS will be waiting for
|
||||
this descr to be filled. However, the hardware has skipped this descr,
|
||||
and is filling the next descrs. Since the OS doesn't see this, there
|
||||
is a potential deadlock, with the OS waiting for one descr to fill,
|
||||
while the hardware is waiting for a different set of descrs to become
|
||||
empty.
|
||||
|
||||
A call to show_rx_chain() at this point indicates the nature of the
|
||||
problem. A typical print when the network is hung shows the following::
|
||||
|
||||
net eth1: Spider RX RAM full, incoming packets might be discarded!
|
||||
net eth1: Total number of descrs=256
|
||||
net eth1: Chain tail located at descr=255
|
||||
net eth1: Chain head is at 255
|
||||
net eth1: HW curr desc (GDACTDPA) is at 0
|
||||
net eth1: Have 1 descrs with stat=xa0800000
|
||||
net eth1: HW next desc (GDACNEXTDA) is at 1
|
||||
net eth1: Have 127 descrs with stat=x40800101
|
||||
net eth1: Have 1 descrs with stat=x40800001
|
||||
net eth1: Have 126 descrs with stat=x40800101
|
||||
net eth1: Last 1 descrs with stat=xa0800000
|
||||
|
||||
Both the tail and head pointers are pointing at descr 255, which is
|
||||
marked xa... which is "empty". Thus, from the OS point of view, there
|
||||
is nothing to be done. In particular, there is the implicit assumption
|
||||
that everything in front of the "empty" descr must surely also be empty,
|
||||
as explained in the last section. The OS is waiting for descr 255 to
|
||||
become non-empty, which, in this case, will never happen.
|
||||
|
||||
The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
|
||||
Since it's already full, the hardware can do nothing more, and thus has
|
||||
halted processing. Notice that descrs 0 through 253 are all marked
|
||||
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
|
||||
descr 254, since tail was at 255.) Thus, the system is deadlocked,
|
||||
and there can be no forward progress; the OS thinks there's nothing
|
||||
to do, and the hardware has nowhere to put incoming data.
|
||||
|
||||
This bug/feature is worked around with the spider_net_resync_head_ptr()
|
||||
routine. When the driver receives RX interrupts, but an examination
|
||||
of the RX chain seems to show it is empty, then it is probable that
|
||||
the hardware has skipped a descr or two (sometimes dozens under heavy
|
||||
network conditions). The spider_net_resync_head_ptr() subroutine will
|
||||
search the ring for the next full descr, and the driver will resume
|
||||
operations there. Since this will leave "holes" in the ring, there
|
||||
is also a spider_net_resync_tail_ptr() that will skip over such holes.
|
||||
|
||||
As of this writing, the spider_net_resync() strategy seems to work very
|
||||
well, even under heavy network loads.
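
The resync idea can be illustrated with a small user-space model. This is
only a sketch, not the driver's code: the two-state descr model, the
``sketch_*`` names and the ring layout are all assumptions made for
illustration. It rebuilds the hung ring from the print above (descrs 0
through 253 full, 254 and 255 empty, head and tail both at 255) and walks
forward from a pointer until it reaches a descr that is not empty.

```c
#include <assert.h>

#define SKETCH_RING_SIZE 256u

/* Hypothetical two-state model; the real status word has more states. */
enum sketch_state {
	SKETCH_EMPTY,	/* owned by the hardware, waiting to be filled */
	SKETCH_FULL,	/* filled by the hardware, waiting for the OS */
};

struct sketch_rx_ring {
	enum sketch_state state[SKETCH_RING_SIZE];
	unsigned int head;
	unsigned int tail;
};

/* Return the first non-empty descr at or after start (wrapping around),
 * or start itself if every descr in the ring is empty. */
unsigned int sketch_skip_empty(const struct sketch_rx_ring *ring,
			       unsigned int start)
{
	unsigned int i, pos = start;

	assert(start < SKETCH_RING_SIZE);
	for (i = 0; i < SKETCH_RING_SIZE; i++) {
		if (ring->state[pos] != SKETCH_EMPTY)
			return pos;
		pos = (pos + 1) % SKETCH_RING_SIZE;
	}
	return start;
}

/* Move head to the next full descr; return how many descrs were skipped. */
unsigned int sketch_resync_head(struct sketch_rx_ring *ring)
{
	unsigned int pos = sketch_skip_empty(ring, ring->head);
	unsigned int skipped =
		(pos + SKETCH_RING_SIZE - ring->head) % SKETCH_RING_SIZE;

	ring->head = pos;
	return skipped;
}

/* Same walk for the tail, so that it steps over the holes left behind. */
unsigned int sketch_resync_tail(struct sketch_rx_ring *ring)
{
	unsigned int pos = sketch_skip_empty(ring, ring->tail);
	unsigned int skipped =
		(pos + SKETCH_RING_SIZE - ring->tail) % SKETCH_RING_SIZE;

	ring->tail = pos;
	return skipped;
}

/* Rebuild the hung ring from the print: 0..253 full, 254 and 255 empty. */
void sketch_make_hung_ring(struct sketch_rx_ring *ring)
{
	unsigned int i;

	for (i = 0; i < SKETCH_RING_SIZE; i++)
		ring->state[i] = (i < 254) ? SKETCH_FULL : SKETCH_EMPTY;
	ring->head = 255;
	ring->tail = 255;
}
```

In this model, calling ``sketch_resync_head()`` on the hung ring skips the
single empty descr 255 and lands on descr 0, which is where the hardware
stopped. When the whole ring is empty the pointers are left where they
were, since there is nothing to resynchronize to.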


The TX ring
===========
The TX ring uses a low-watermark interrupt scheme to make sure that
the TX queue is appropriately serviced for large packet sizes.

For packet sizes greater than about 1 KByte, the kernel can fill
the TX ring quicker than the device can drain it. Once the ring
is full, the netdev is stopped. When there is room in the ring,
the netdev needs to be reawakened, so that more TX packets are placed
in the ring. The hardware can empty the ring about four times per jiffy,
so it's not appropriate to wait for the poll routine to refill, since
the poll routine runs only once per jiffy. The low-watermark mechanism
marks a descr about one quarter of the way from the bottom of the queue,
so that an interrupt is generated when the descr is processed. This
interrupt wakes up the netdev, which can then refill the queue.
For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec. For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.
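
The placement of the low-watermark descr can be sketched with plain index
arithmetic. This is a hedged illustration rather than the driver's
routine: the ``sketch_*`` names and the 256-entry ring are hypothetical,
and the sketch assumes that "bottom" means the end of the queue that the
hardware reaches last, so that the interrupt fires while roughly a
quarter of the queued descrs is still pending.

```c
#include <assert.h>

#define SKETCH_TX_RING_SIZE 256u

/* Number of queued descrs between tail (next to be sent by the hardware)
 * and head (next free slot). tail == head is treated as an empty queue. */
unsigned int sketch_tx_queue_len(unsigned int tail, unsigned int head)
{
	assert(tail < SKETCH_TX_RING_SIZE && head < SKETCH_TX_RING_SIZE);
	return (head + SKETCH_TX_RING_SIZE - tail) % SKETCH_TX_RING_SIZE;
}

/* Index of the descr to flag for an interrupt: three quarters of the way
 * into the queue from the tail, i.e. one quarter from the far end.
 * Returns -1 when nothing is queued. */
int sketch_tx_watermark(unsigned int tail, unsigned int head)
{
	unsigned int len = sketch_tx_queue_len(tail, head);

	if (len == 0)
		return -1;
	return (int)((tail + (len * 3) / 4) % SKETCH_TX_RING_SIZE);
}
```

For a queue running from descr 0 up to descr 200, this flags descr 150.
The modulo arithmetic also covers a queue that wraps around the end of
the ring.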