docs: networking: reorganize driver documentation again

Organize driver documentation by device type. Most documents
have fairly verbose yet uninformative names, so let users
first select a well defined device type, and then search for
a particular driver.

While at it, rename the section from Vendor drivers to
Hardware drivers. This seems more accurate; besides, people
sometimes refer to out-of-tree drivers as vendor drivers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>


@@ -0,0 +1,249 @@
.. SPDX-License-Identifier: GPL-2.0
=============================================================================
Linux and the 3Com EtherLink III Series Ethercards (driver v1.18c and higher)
=============================================================================
This file contains the instructions and caveats for v1.18c and higher versions
of the 3c509 driver. You should not use the driver without reading this file.
release 1.0
28 February 2002
Current maintainer (corrections to):
David Ruggiero <jdr@farfalle.com>
Introduction
============
The following are notes and information on using the 3Com EtherLink III series
ethercards in Linux. These cards are commonly known by the most widely-used
card's 3Com model number, 3c509. They are all 10Mb/s ISA-bus cards and shouldn't
be (but sometimes are) confused with the similarly-numbered PCI-bus "3c905"
(aka "Vortex" or "Boomerang") series. Kernel support for the 3c509 family is
provided by the module 3c509.c, which has code to support all of the following
models:
- 3c509 (original ISA card)
- 3c509B (later revision of the ISA card; supports full-duplex)
- 3c589 (PCMCIA)
- 3c589B (later revision of the 3c589; supports full-duplex)
- 3c579 (EISA)
Large portions of this documentation were heavily borrowed from the guide
written by the original author of the 3c509 driver, Donald Becker. The master
copy of that document, which contains notes on older versions of the driver,
currently resides on the Scyld web server: http://www.scyld.com/.
Special Driver Features
=======================
Overriding card settings
------------------------
The driver allows boot- or load-time overriding of the card's detected IOADDR,
IRQ, and transceiver settings, although this capability shouldn't generally be
needed except to enable full-duplex mode (see below). An example of the syntax
for LILO parameters for doing this::
ether=10,0x310,3,0x3c509,eth0
This configures the first found 3c509 card for IRQ 10, base I/O 0x310, and
transceiver type 3 (10base2). The flag "0x3c509" must be set to avoid conflicts
with other card types when overriding the I/O address. When the driver is
loaded as a module, only the IRQ may be overridden. For example,
setting two cards to IRQ10 and IRQ11 is done by using the irq module
option::
options 3c509 irq=10,11
Full-duplex mode
================
The v1.18c driver added support for the 3c509B's full-duplex capabilities.
In order to enable and successfully use full-duplex mode, three conditions
must be met:
(a) You must have an EtherLink III card model whose hardware supports full-
duplex operations. Currently, the only members of the 3c509 family that are
positively known to support full-duplex are the 3c509B (ISA bus) and 3c589B
(PCMCIA) cards. Cards without the "B" model designation do *not* support
full-duplex mode; these include the original 3c509 (no "B"), the original
3c589, the 3c529 (MCA bus), and the 3c579 (EISA bus).
(b) You must be using your card's 10baseT transceiver (i.e., the RJ-45
connector), not its AUI (thick-net) or 10base2 (thin-net/coax) interfaces.
AUI and 10base2 network cabling is physically incapable of full-duplex
operation.
(c) Most importantly, your 3c509B must be connected to a link partner that is
itself full-duplex capable. This is almost certainly one of two things: a full-
duplex-capable Ethernet switch (*not* a hub), or a full-duplex-capable NIC on
another system that's connected directly to the 3c509B via a crossover cable.
Full-duplex mode can be enabled using 'ethtool'.
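As a minimal sketch, assuming the interface is eth0 and that your ethtool and
driver versions accept these options, forcing full duplex might look like::

    ethtool -s eth0 duplex full

Remember that the link partner usually has to be forced to full duplex as
well, as explained in the warning below.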
.. warning::
Extremely important caution concerning full-duplex mode
Understand that the 3c509B's hardware's full-duplex support is much more
limited than that provided by more modern network interface cards. Although
at the physical layer of the network it fully supports full-duplex operation,
the card was designed before the current Ethernet auto-negotiation (N-way)
spec was written. This means that the 3c509B family ***cannot and will not
auto-negotiate a full-duplex connection with its link partner under any
circumstances, no matter how it is initialized***. If the full-duplex mode
of the 3c509B is enabled, its link partner will very likely need to be
independently _forced_ into full-duplex mode as well; otherwise various nasty
failures will occur - at the very least, you'll see massive numbers of packet
collisions. This is one of the very rare circumstances where disabling auto-
negotiation and forcing the duplex mode of a network interface card or switch
would ever be necessary or desirable.
Available Transceiver Types
===========================
For versions of the driver v1.18c and above, the available transceiver types are:
== =========================================================================
0 transceiver type from EEPROM config (normally 10baseT); force half-duplex
1 AUI (thick-net / DB15 connector)
2 (undefined)
3 10base2 (thin-net == coax / BNC connector)
4 10baseT (RJ-45 connector); force half-duplex mode
8 transceiver type and duplex mode taken from card's EEPROM config settings
12 10baseT (RJ-45 connector); force full-duplex mode
== =========================================================================
Prior to driver version 1.18c, only transceiver codes 0-4 were supported. Note
that the new transceiver codes 8 and 12 are the *only* ones that will enable
full-duplex mode, no matter what the card's detected EEPROM settings might be.
This ensured that merely upgrading the driver from an earlier version would
never automatically enable full-duplex mode in an existing installation;
it must always be explicitly enabled via one of these codes in order to be
activated.
The transceiver type can be changed using 'ethtool'.
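For example, to select the 10baseT (RJ-45) port on eth0 (an illustrative
interface name; see ethtool(8) for the accepted port names)::

    ethtool -s eth0 port tp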
Interpretation of error messages and common problems
----------------------------------------------------
Error Messages
^^^^^^^^^^^^^^
eth0: Infinite loop in interrupt, status 2011.
These are "mostly harmless" message indicating that the driver had too much
work during that interrupt cycle. With a status of 0x2011 you are receiving
packets faster than they can be removed from the card. This should be rare
or impossible in normal operation. Possible causes of this error report are:
- a "green" mode enabled that slows the processor down when there is no
keyboard activity.
- some other device or device driver hogging the bus or disabling interrupts.
Check /proc/interrupts for excessive interrupt counts. The timer tick
interrupt should always be incrementing faster than the others.
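A quick way to watch the interrupt counts while reproducing the problem,
assuming a standard system, is::

    watch -n1 cat /proc/interrupts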
No received packets
^^^^^^^^^^^^^^^^^^^
If a 3c509, 3c562 or 3c589 can successfully transmit packets, but never
receives packets (as reported by /proc/net/dev or 'ifconfig') you likely
have an interrupt line problem. Check /proc/interrupts to verify that the
card is actually generating interrupts. If the interrupt count is not
increasing you likely have a physical conflict with two devices trying to
use the same ISA IRQ line. The common conflict is with a sound card on IRQ10
or IRQ5, and the easiest solution is to move the 3c509 to a different
interrupt line. If the device is receiving packets but 'ping' doesn't work,
you have a routing problem.
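To check the routing side, asking the kernel which route it would pick for
the unreachable destination is a good start (192.0.2.1 is a placeholder
address)::

    ip route get 192.0.2.1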
Tx Carrier Errors Reported in /proc/net/dev
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If an EtherLink III appears to transmit packets, but the "Tx carrier errors"
field in /proc/net/dev increments as quickly as the Tx packet count, you
likely have an unterminated network or the incorrect media transceiver selected.
3c509B card is not detected on machines with an ISA PnP BIOS.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
While the updated driver works with most PnP BIOS programs, it does not work
with all. This can be fixed by disabling PnP support using the 3Com-supplied
setup program.
3c509 card is not detected on overclocked machines
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Increase the delay time in id_read_eeprom() from the current value, 500,
to an absurdly high value, such as 5000.
Decoding Status and Error Messages
----------------------------------
The bits in the main status register are:
===== ======================================
value description
===== ======================================
0x01 Interrupt latch
0x02 Tx overrun, or Rx underrun
0x04 Tx complete
0x08 Tx FIFO room available
0x10 A complete Rx packet has arrived
0x20 A Rx packet has started to arrive
0x40 The driver has requested an interrupt
0x80 Statistics counter nearly full
===== ======================================
The bits in the transmit (Tx) status word are:
===== ============================================
value description
===== ============================================
0x02 Out-of-window collision.
0x04 Status stack overflow (normally impossible).
0x08 16 collisions.
0x10 Tx underrun (not enough PCI bus bandwidth).
0x20 Tx jabber.
0x40 Tx interrupt requested.
0x80 Status is valid (this should always be set).
===== ============================================
When a transmit error occurs the driver produces a status message such as::
eth0: Transmit error, Tx status register 82
The two values typically seen here are:
0x82
^^^^
Out of window collision. This typically occurs when some other Ethernet
host is incorrectly set to full duplex on a half duplex network.
0x88
^^^^
16 collisions. This typically occurs when the network is exceptionally busy
or when another host doesn't correctly back off after a collision. If this
error is mixed with 0x82 errors it is the result of a host incorrectly set
to full duplex (see above).
Both of these errors are the result of network problems that should be
corrected. They do not represent driver malfunction.
Revision history (this file)
============================
28Feb02 v1.0 DR New; major portions based on Becker original 3c509 docs


@@ -0,0 +1,459 @@
.. SPDX-License-Identifier: GPL-2.0
=========================
3Com Vortex device driver
=========================
Andrew Morton
30 April 2000
This document describes the usage and errata of the 3Com "Vortex" device
driver for Linux, 3c59x.c.
The driver was written by Donald Becker <becker@scyld.com>
Don is no longer the prime maintainer of this version of the driver.
Please report problems to one or more of:
- Andrew Morton
- Netdev mailing list <netdev@vger.kernel.org>
- Linux kernel mailing list <linux-kernel@vger.kernel.org>
Please note the 'Reporting and Diagnosing Problems' section at the end
of this file.
Since kernel 2.3.99-pre6, this driver incorporates the support for the
3c575-series Cardbus cards which used to be handled by 3c575_cb.c.
This driver supports the following hardware:
- 3c590 Vortex 10Mbps
- 3c592 EISA 10Mbps Demon/Vortex
- 3c597 EISA Fast Demon/Vortex
- 3c595 Vortex 100baseTx
- 3c595 Vortex 100baseT4
- 3c595 Vortex 100base-MII
- 3c900 Boomerang 10baseT
- 3c900 Boomerang 10Mbps Combo
- 3c900 Cyclone 10Mbps TPO
- 3c900 Cyclone 10Mbps Combo
- 3c900 Cyclone 10Mbps TPC
- 3c900B-FL Cyclone 10base-FL
- 3c905 Boomerang 100baseTx
- 3c905 Boomerang 100baseT4
- 3c905B Cyclone 100baseTx
- 3c905B Cyclone 10/100/BNC
- 3c905B-FX Cyclone 100baseFx
- 3c905C Tornado
- 3c920B-EMB-WNM (ATI Radeon 9100 IGP)
- 3c980 Cyclone
- 3c980C Python-T
- 3cSOHO100-TX Hurricane
- 3c555 Laptop Hurricane
- 3c556 Laptop Tornado
- 3c556B Laptop Hurricane
- 3c575 [Megahertz] 10/100 LAN CardBus
- 3c575 Boomerang CardBus
- 3CCFE575BT Cyclone CardBus
- 3CCFE575CT Tornado CardBus
- 3CCFE656 Cyclone CardBus
- 3CCFEM656B Cyclone+Winmodem CardBus
- 3CXFEM656C Tornado+Winmodem CardBus
- 3c450 HomePNA Tornado
- 3c920 Tornado
- 3c982 Hydra Dual Port A
- 3c982 Hydra Dual Port B
- 3c905B-T4
- 3c920B-EMB-WNM Tornado
Module parameters
=================
There are several parameters which may be provided to the driver when
its module is loaded. These are usually placed in ``/etc/modprobe.d/*.conf``
configuration files. Example::
options 3c59x debug=3 rx_copybreak=300
If you are using the PCMCIA tools (cardmgr) then the options may be
placed in /etc/pcmcia/config.opts::
module "3c59x" opts "debug=3 rx_copybreak=300"
The supported parameters are:
debug=N
Where N is a number from 0 to 7. Anything above 3 produces a lot
of output in your system logs. debug=1 is the default.
options=N1,N2,N3,...
Each number in the list provides an option to the corresponding
network card. So if you have two 3c905's and you wish to provide
them with option 0x204 you would use::
options=0x204,0x204
The individual options are composed of a number of bitfields which
have the following meanings:
Possible media type settings
== =================================
0 10baseT
1 10Mb/s AUI
2 undefined
3 10base2 (BNC)
4 100base-TX
5 100base-FX
6 MII (Media Independent Interface)
7 Use default setting from EEPROM
8 Autonegotiate
9 External MII
10 Use default setting from EEPROM
== =================================
When generating a value for the 'options' setting, the above media
selection values may be OR'ed with (or added to) the following:
====== =============================================
0x8000 Set driver debugging level to 7
0x4000 Set driver debugging level to 2
0x0400 Enable Wake-on-LAN
0x0200 Force full duplex mode.
0x0010 Bus-master enable bit (Old Vortex cards only)
====== =============================================
For example::
insmod 3c59x options=0x204
will force full-duplex 100base-TX, rather than allowing the usual
autonegotiation.
global_options=N
Sets the ``options`` parameter for all 3c59x NICs in the machine.
Entries in the ``options`` array above will override any setting of
this.
full_duplex=N1,N2,N3...
Similar to bit 9 of 'options'. Forces the corresponding card into
full-duplex mode. Please use this in preference to the ``options``
parameter.
In fact, please don't use this at all! You're better off getting
autonegotiation working properly.
global_full_duplex=N1
Sets full duplex mode for all 3c59x NICs in the machine. Entries
in the ``full_duplex`` array above will override any setting of this.
flow_ctrl=N1,N2,N3...
Use 802.3x MAC-layer flow control. The 3com cards only support the
PAUSE command, which means that they will stop sending packets for a
short period if they receive a PAUSE frame from the link partner.
The driver only allows flow control on a link which is operating in
full duplex mode.
This feature does not appear to work on the 3c905 - only 3c905B and
3c905C have been tested.
The 3com cards appear to only respond to PAUSE frames which are
sent to the reserved destination address of 01:80:c2:00:00:01. They
do not honour PAUSE frames which are sent to the station MAC address.
rx_copybreak=M
The driver preallocates 32 full-sized (1536 byte) network buffers
for receiving. When a packet arrives, the driver has to decide
whether to leave the packet in its full-sized buffer, or to allocate
a smaller buffer and copy the packet across into it.
This is a speed/space tradeoff.
The value of rx_copybreak is used to decide when to make the copy.
If the packet size is less than rx_copybreak, the packet is copied.
The default value for rx_copybreak is 200 bytes.
max_interrupt_work=N
The driver's interrupt service routine can handle many receive and
transmit packets in a single invocation. It does this in a loop.
The value of max_interrupt_work governs how many times the interrupt
service routine will loop. The default value is 32 loops. If this
is exceeded the interrupt service routine gives up and generates a
warning message "eth0: Too much work in interrupt".
hw_checksums=N1,N2,N3,...
Recent 3com NICs are able to generate IPv4, TCP and UDP checksums
in hardware. Linux has used the Rx checksumming for a long time.
The "zero copy" patch which is planned for the 2.4 kernel series
allows you to make use of the NIC's DMA scatter/gather and transmit
checksumming as well.
The driver is set up so that, when the zerocopy patch is applied,
all Tornado and Cyclone devices will use S/G and Tx checksums.
This module parameter has been provided so you can override this
decision. If you think that Tx checksums are causing a problem, you
may disable the feature with ``hw_checksums=0``.
If you think your NIC should be performing Tx checksumming and the
driver isn't enabling it, you can force the use of hardware Tx
checksumming with ``hw_checksums=1``.
The driver drops a message in the logfiles to indicate whether or
not it is using hardware scatter/gather and hardware Tx checksums.
Scatter/gather and hardware checksums provide considerable
performance improvement for the sendfile() system call, but a small
decrease in throughput for send(). There is no effect upon receive
efficiency.
compaq_ioaddr=N,
compaq_irq=N,
compaq_device_id=N
"Variables to work-around the Compaq PCI BIOS32 problem"....
watchdog=N
Sets the time duration (in milliseconds) after which the kernel
decides that the transmitter has become stuck and needs to be reset.
This is mainly for debugging purposes, although it may be advantageous
to increase this value on LANs which have very high collision rates.
The default value is 5000 (5.0 seconds).
enable_wol=N1,N2,N3,...
Enable Wake-on-LAN support for the relevant interface. Donald
Becker's ``ether-wake`` application may be used to wake suspended
machines.
Also enables the NIC's power management support.
global_enable_wol=N
Sets enable_wol mode for all 3c59x NICs in the machine. Entries in
the ``enable_wol`` array above will override any setting of this.
Media selection
---------------
A number of the older NICs such as the 3c590 and 3c900 series have
10base2 and AUI interfaces.
Prior to January 2001 this driver would autoselect the 10base2 or AUI
port if it didn't detect activity on the 10baseT port. It would then
get stuck on the 10base2 port and a driver reload was necessary to
switch back to 10baseT. This behaviour could not be prevented with a
module option override.
Later (current) versions of the driver _do_ support locking of the
media type. So if you load the driver module with::

    modprobe 3c59x options=0
it will permanently select the 10baseT port. Automatic selection of
other media types does not occur.
Transmit error, Tx status register 82
-------------------------------------
This is a common error which is almost always caused by another host on
the same network being in full-duplex mode, while this host is in
half-duplex mode. You need to find that other host and make it run in
half-duplex mode or fix this host to run in full-duplex mode.
As a last resort, you can force the 3c59x driver into full-duplex mode
with::

    options 3c59x full_duplex=1
but this has to be viewed as a workaround for broken network gear and
should only really be used for equipment which cannot autonegotiate.
Additional resources
--------------------
Details of the device driver implementation are at the top of the source file.
Additional documentation is available at Don Becker's Linux Drivers site:
http://www.scyld.com/vortex.html
Donald Becker's driver development site:
http://www.scyld.com/network.html
Donald's vortex-diag program is useful for inspecting the NIC's state:
http://www.scyld.com/ethercard_diag.html
Donald's mii-diag program may be used for inspecting and manipulating
the NIC's Media Independent Interface subsystem:
http://www.scyld.com/ethercard_diag.html#mii-diag
Donald's wake-on-LAN page:
http://www.scyld.com/wakeonlan.html
3Com's DOS-based application for setting up the NIC's EEPROMs:
ftp://ftp.3com.com/pub/nic/3c90x/3c90xx2.exe
Autonegotiation notes
---------------------
The driver uses a one-minute heartbeat for adapting to changes in
the external LAN environment if link is up and 5 seconds if link is down.
This means that when, for example, a machine is unplugged from a hubbed
10baseT LAN and plugged into a switched 100baseT LAN, the throughput
will be quite dreadful for up to sixty seconds. Be patient.
Cisco interoperability note from Walter Wong <wcw+@CMU.EDU>:
On a side note, adding HAS_NWAY seems to share a problem with the
Cisco 6509 switch. Specifically, you need to change the spanning
tree parameter for the port the machine is plugged into to 'portfast'
mode. Otherwise, the negotiation fails. This has been an issue
we've noticed for a while but haven't had the time to track down.
Cisco switches (Jeff Busch <jbusch@deja.com>)
My "standard config" for ports to which PC's/servers connect directly::
interface FastEthernet0/N
description machinename
load-interval 30
spanning-tree portfast
If autonegotiation is a problem, you may need to specify "speed
100" and "duplex full" as well (or "speed 10" and "duplex half").
WARNING: DO NOT hook up hubs/switches/bridges to these
specially-configured ports! The switch will become very confused.
Reporting and diagnosing problems
---------------------------------
Maintainers find that accurate and complete problem reports are
invaluable in resolving driver problems. We are frequently not able to
reproduce problems and must rely on your patience and efforts to get to
the bottom of the problem.
If you believe you have a driver problem here are some of the
steps you should take:
- Is it really a driver problem?
Eliminate some variables: try different cards, different
computers, different cables, different ports on the switch/hub,
different versions of the kernel or of the driver, etc.
- OK, it's a driver problem.
You need to generate a report. Typically this is an email to the
maintainer and/or netdev@vger.kernel.org. The maintainer's
email address will be in the driver source or in the MAINTAINERS file.
- The contents of your report will vary a lot depending upon the
problem. If it's a kernel crash then you should refer to the
admin-guide/reporting-bugs.rst file.
But for most problems it is useful to provide the following:
- Kernel version, driver version
- A copy of the banner message which the driver generates when
it is initialised. For example::
eth0: 3Com PCI 3c905C Tornado at 0xa400, 00:50:da:6a:88:f0, IRQ 19
8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
MII transceiver found at address 24, status 782d.
Enabling bus-master transmits and whole-frame receives.
NOTE: You must provide the ``debug=2`` modprobe option to generate
a full detection message. Please do this::
modprobe 3c59x debug=2
- If it is a PCI device, the relevant output from 'lspci -vx', e.g.::
00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
Subsystem: 3Com Corporation: Unknown device 9200
Flags: bus master, medium devsel, latency 32, IRQ 19
I/O ports at a400 [size=128]
Memory at db000000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
00: b7 10 00 92 07 00 10 02 74 00 00 02 08 20 00 00
10: 01 a4 00 00 00 00 00 db 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a
- A description of the environment: 10baseT? 100baseT?
full/half duplex? switched or hubbed?
- Any additional module parameters which you may be providing to the driver.
- Any kernel logs which are produced. The more the merrier.
If this is a large file and you are sending your report to a
mailing list, mention that you have the logfile, but don't send
it. If you're reporting direct to the maintainer then just send
it.
To ensure that all kernel logs are available, add the
following line to /etc/syslog.conf::
kern.* /var/log/messages
Then restart syslogd with::
/etc/rc.d/init.d/syslog restart
(The above may vary, depending upon which Linux distribution you use).
- If your problem is reproducible then that's great. Try the
following:
1) Increase the debug level. Usually this is done via:
a) modprobe driver debug=7
b) In /etc/modprobe.d/driver.conf:
options driver debug=7
2) Recreate the problem with the higher debug level,
send all logs to the maintainer.
3) Download your card's diagnostic tool from Donald
Becker's website <http://www.scyld.com/ethercard_diag.html>.
Download mii-diag.c as well. Build these.
a) Run 'vortex-diag -aaee' and 'mii-diag -v' when the card is
working correctly. Save the output.
b) Run the above commands when the card is malfunctioning. Send
both sets of output.
Finally, please be patient and be prepared to do some work. You may
end up working on this problem for a week or more as the maintainer
asks more questions, asks for more tests, asks for patches to be
applied, etc. At the end of it all, the problem may even remain
unresolved.


@@ -0,0 +1,344 @@
.. SPDX-License-Identifier: GPL-2.0
============================================================
Linux kernel driver for Elastic Network Adapter (ENA) family
============================================================
Overview
========
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.
The ENA device exposes a lightweight management interface with a
minimal set of memory mapped registers and extendable command set
through an Admin Queue.
The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.
Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.
ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation,
and CPU cacheline optimized data placement.
The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.
The ENA driver and its corresponding devices implement health
monitoring mechanisms, such as a watchdog, that enable the device and
driver to recover in a manner transparent to the application, as well
as debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds.
Supported PCI vendor ID/device IDs
==================================
========= =======================
1d0f:0ec2 ENA PF
1d0f:1ec2 ENA PF with LLQ support
1d0f:ec20 ENA VF
1d0f:ec21 ENA VF with LLQ support
========= =======================
ENA Source Code Directory Structure
===================================
================= ======================================================
ena_com.[ch] Management communication layer. This layer is
responsible for handling all the management
(admin) communication between the device and the
driver.
ena_eth_com.[ch] Tx/Rx data path.
ena_admin_defs.h Definition of ENA management interface.
ena_eth_io_defs.h Definition of ENA data path interface.
ena_common_defs.h Common definitions for ena_com layer.
ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers.
ena_netdev.[ch] Main Linux kernel driver.
ena_sysfs.[ch] Sysfs files.
ena_ethtool.c ethtool callbacks.
ena_pci_id_tbl.h Supported device IDs.
================= ======================================================
Management Interface:
=====================
ENA management interface is exposed by means of:
- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
- Asynchronous Event Notification Queue (AENQ)
ENA device MMIO Registers are accessed only during driver
initialization and are not involved in further normal device
operation.
AQ is used for submitting management commands, and the
results/responses are reported asynchronously through ACQ.
ENA introduces a small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.
The following admin queue commands are supported:
- Create I/O submission queue
- Create I/O completion queue
- Destroy I/O submission queue
- Destroy I/O completion queue
- Get feature
- Set feature
- Configure AENQ
- Get statistics
Refer to ena_admin_defs.h for the list of supported Get/Set Feature
properties.
The Asynchronous Event Notification Queue (AENQ) is a uni-directional
queue used by the ENA device to send to the driver events that cannot
be reported using ACQ. AENQ events are subdivided into groups. Each
group may have multiple syndromes, as shown below.
The events are:
==================== ===============
Group Syndrome
==================== ===============
Link state change **X**
Fatal error **X**
Notification Suspend traffic
Notification Resume traffic
Keep-Alive **X**
==================== ===============
ACQ and AENQ share the same MSI-X vector.
Keep-Alive is a special mechanism that allows monitoring of the
device's health. The driver maintains a watchdog (WD) handler which,
if fired, logs the current state and statistics then resets and
restarts the ENA device and driver. A Keep-Alive event is delivered by
the device every second. The driver re-arms the WD upon reception of a
Keep-Alive event. A missed Keep-Alive event causes the WD handler to
fire.
Data Path Interface
===================
I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ correspondingly). Each SQ has a completion queue (CQ) associated
with it.
The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.
The ENA driver supports two Queue Operation modes for Tx SQs:
- Regular mode
* In this mode the Tx SQs reside in the host's memory. The ENA
device fetches the ENA Tx descriptors and packet data from host
memory.
- Low Latency Queue (LLQ) mode or "push-mode".
* In this mode the driver pushes the transmit descriptors and the
first 128 bytes of the packet directly to the ENA device memory
space. The rest of the packet payload is fetched by the
device. For this operation mode, the driver uses a dedicated PCI
device memory BAR, which is mapped with write-combine capability.
The Rx SQs support only the regular mode.
Note: Not all ENA devices support LLQ, and this feature is negotiated
with the device upon initialization. If the ENA device does not
support LLQ mode, the driver falls back to the regular mode.
The driver supports multi-queue for both Tx and Rx. This has various
benefits:
- Reduced CPU/thread/process contention on a given Ethernet interface.
- Cache miss rate on completion is reduced, particularly for data
cache lines that hold the sk_buff structures.
- Increased process-level parallelism when handling received packets.
- Increased data cache hit rate, by steering kernel processing of
packets to the CPU, where the application thread consuming the
packet is running.
- Interrupt re-direction performed in hardware.
Interrupt Modes
===============
The driver assigns a single MSI-X vector per queue pair (for both Tx
and Rx directions). The driver assigns an additional dedicated MSI-X vector
for management (for ACQ and AENQ).
Management interrupt registration is performed when the Linux kernel
probes the adapter, and it is de-registered when the adapter is
removed. I/O queue interrupt registration is performed when the Linux
interface of the adapter is opened, and it is de-registered when the
interface is closed.
The management interrupt is named::
ena-mgmnt@pci:<PCI domain:bus:slot.function>
and for each queue pair, an interrupt is named::
<interface name>-Tx-Rx-<queue index>
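The resulting assignments can be inspected at runtime, for example with::

    grep ena /proc/interrupts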
The ENA device operates in auto-mask and auto-clear interrupt
modes. That is, once an MSI-X interrupt is delivered to the host, its Cause bit is
automatically cleared and the interrupt is masked. The interrupt is
unmasked by the driver after NAPI processing is complete.
Interrupt Moderation
====================
The ENA driver and device can operate in conventional or adaptive interrupt
moderation mode.
In conventional mode the driver instructs the device to postpone interrupt
posting according to a static interrupt delay value. The interrupt delay
value can be configured through ethtool(8). The following ethtool
parameters are supported by the driver: tx-usecs, rx-usecs.
In adaptive interrupt moderation mode the interrupt delay value is
updated by the driver dynamically and adjusted every NAPI cycle
according to the traffic nature.
By default the ENA driver applies adaptive coalescing on Rx traffic and
conventional coalescing on Tx traffic.
Adaptive coalescing can be switched on/off through ethtool(8)
adaptive_rx on|off parameter.
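As an illustrative sketch (eth0 and the delay value are placeholders),
enabling adaptive Rx coalescing while keeping a static Tx delay might look
like::

    ethtool -C eth0 adaptive-rx on tx-usecs 64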
The driver chooses the interrupt delay value according to the number of
bytes and packets received between interrupt unmasking and interrupt
posting. The driver uses an interrupt delay table that subdivides the
range of received bytes/packets into 5 levels and assigns an interrupt
delay value to each level.
The user can enable/disable adaptive moderation, modify the interrupt
delay table and restore its default values through sysfs.
RX copybreak
============
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.
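Recent ethtool versions expose this tunable on the command line; a sketch,
with eth0 and the value as placeholders::

    ethtool --set-tunable eth0 rx-copybreak 256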
SKB
===
The driver allocates an SKB for each frame received from the Rx path, in
NAPI context. The allocation method depends on the size of the packet.
If the frame length is larger than rx_copybreak, napi_get_frags()
is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer
content is copied (by CPU) to the SKB, and the buffer is recycled.
Statistics
==========
The user can obtain ENA device and driver statistics using ethtool.
The driver can collect regular or extended statistics (including
per-queue stats) from the device.
In addition the driver logs the stats to syslog upon device reset.
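For example, to dump the statistics for eth0::

    ethtool -S eth0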
MTU
===
The driver supports an arbitrarily large MTU with a maximum that is
negotiated with the device. The driver configures MTU using the
SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
via ip(8) and similar legacy tools.
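For example, setting a 9000-byte MTU on eth0 with ip(8) (both values are
illustrative; the device advertises the actual maximum)::

    ip link set dev eth0 mtu 9000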
Stateless Offloads
==================
The ENA driver supports:
- TSO over IPv4/IPv6
- TSO with ECN
- IPv4 header checksum offload
- TCP/UDP over IPv4/IPv6 checksum offloads
RSS
===
- The ENA device supports RSS that allows flexible Rx traffic
steering.
- Toeplitz and CRC32 hash functions are supported.
- Different combinations of L2/L3/L4 fields can be configured as
inputs for hash functions.
- The driver configures RSS settings using the AQ SetFeature command
(ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties).
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
function delivered in the Rx CQ descriptor is set in the received
SKB.
- The user can provide a hash key, hash function, and configure the
indirection table through ethtool(8).
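For example, the current RSS configuration can be viewed, and the
indirection table spread evenly across 8 queues, with (eth0 and the queue
count are illustrative)::

    ethtool -x eth0
    ethtool -X eth0 equal 8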
DATA PATH
=========
Tx
--
ena_start_xmit() is called by the stack. This function does the following:
- Maps data buffers (skb->data and frags).
- Populates ena_buf for the push buffer (if the driver and device are
in push mode.)
- Prepares ENA bufs for the remaining frags.
- Allocates a new request ID from the empty req_id ring. The request
ID is the index of the packet in the Tx info. This is used for
out-of-order TX completions.
- Adds the packet to the proper place in the Tx ring.
- Calls ena_com_prepare_tx(), an ENA communication layer function that converts
the ena_bufs to ENA descriptors (and adds meta ENA descriptors as
needed.)
* This function also copies the ENA descriptors and the push buffer
to the Device memory space (if in push mode.)
- Writes doorbell to the ENA device.
- When the ENA device finishes sending the packet, a completion
interrupt is raised.
- The interrupt handler schedules NAPI.
- The ena_clean_tx_irq() function is called. This function handles the
completion descriptors generated by the ENA, with a single
completion descriptor per completed packet.
* req_id is retrieved from the completion descriptor. The tx_info of
the packet is retrieved via the req_id. The data buffers are
unmapped and req_id is returned to the empty req_id ring.
* The function stops when all the completion descriptors have been handled
or the budget is reached.
Rx
--
- When a packet is received from the ENA device, an interrupt is raised.
- The interrupt handler schedules NAPI.
- The ena_clean_rx_irq() function is called. This function calls
ena_rx_pkt(), an ENA communication layer function, which returns the
number of descriptors used for a new unhandled packet, and zero if
no new packet is found.
- Then it calls the ena_eth_rx_skb() function.
- ena_eth_rx_skb() checks packet length:
* If the packet is small (len < rx_copybreak), the driver allocates
a SKB for the new packet, and copies the packet payload into the
SKB data buffer.
- In this way the original data buffer is not passed to the stack
and is reused for future Rx packets.
* Otherwise the function unmaps the Rx buffer, then allocates the
new SKB structure and hooks the Rx buffer to the SKB frags.
- The new SKB is updated with the necessary information (protocol,
checksum hw verify result, etc.), and then passed to the network
stack, using the NAPI interface function napi_gro_receive().


@@ -0,0 +1,556 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
================================
Marvell (Aquantia) AQtion Driver
================================
For the aQuantia Multi-Gigabit PCI Express Family of Ethernet Adapters
.. Contents
- Identifying Your Adapter
- Configuration
- Supported ethtool options
- Command Line Parameters
- Config file parameters
- Support
- License
Identifying Your Adapter
========================
The driver in this release is compatible with AQC-100, AQC-107, and AQC-108
based Ethernet adapters.
SFP+ Devices (for AQC-100 based adapters)
-----------------------------------------
This release was tested with passive Direct Attach Cables (DAC) and SFP+/LC
optical transceivers.
Configuration
=============
Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set the dmesg log level to eight by entering the following::
dmesg -n 8
.. note::
This setting is not saved across reboots.
Jumbo Frames
------------
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
enabled by changing the MTU to a value larger than the default of 1500.
The maximum value for the MTU is 16000. Use the `ip` command to
increase the MTU size. For example::
ip link set mtu 16000 dev enp1s0
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest
ethtool version is required for this functionality.
NAPI
----
NAPI (Rx polling mode) is supported in the atlantic driver.
Supported ethtool options
=========================
Viewing adapter settings
------------------------
::
ethtool <ethX>
Output example::
Settings for enp1s0:
Supported ports: [ TP ]
Supported link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
2500baseT/Full
5000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
2500baseT/Full
5000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: g
Wake-on: d
Link detected: yes
.. note::
AQrate speeds (2.5/5 Gb/s) will be displayed only with linux kernels > 4.10.
But you can still use these speeds::
ethtool -s eth0 autoneg off speed 2500
Viewing adapter information
---------------------------
::
ethtool -i <ethX>
Output example::
driver: atlantic
version: 5.2.0-050200rc5-generic-kern
firmware-version: 3.1.78
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
Viewing Ethernet adapter statistics
-----------------------------------
::
ethtool -S <ethX>
Output example::
NIC statistics:
InPackets: 13238607
InUCast: 13293852
InMCast: 52
InBCast: 3
InErrors: 0
OutPackets: 23703019
OutUCast: 23704941
OutMCast: 67
OutBCast: 11
InUCastOctects: 213182760
OutUCastOctects: 22698443
InMCastOctects: 6600
OutMCastOctects: 8776
InBCastOctects: 192
OutBCastOctects: 704
InOctects: 2131839552
OutOctects: 226938073
InPacketsDma: 95532300
OutPacketsDma: 59503397
InOctetsDma: 1137102462
OutOctetsDma: 2394339518
InDroppedDma: 0
Queue[0] InPackets: 23567131
Queue[0] OutPackets: 20070028
Queue[0] InJumboPackets: 0
Queue[0] InLroPackets: 0
Queue[0] InErrors: 0
Queue[1] InPackets: 45428967
Queue[1] OutPackets: 11306178
Queue[1] InJumboPackets: 0
Queue[1] InLroPackets: 0
Queue[1] InErrors: 0
Queue[2] InPackets: 3187011
Queue[2] OutPackets: 13080381
Queue[2] InJumboPackets: 0
Queue[2] InLroPackets: 0
Queue[2] InErrors: 0
Queue[3] InPackets: 23349136
Queue[3] OutPackets: 15046810
Queue[3] InJumboPackets: 0
Queue[3] InLroPackets: 0
Queue[3] InErrors: 0
Interrupt coalescing support
----------------------------
The ITR mode and TX/RX coalescing timings can be viewed with::
ethtool -c <ethX>
and changed with::
ethtool -C <ethX> tx-usecs <usecs> rx-usecs <usecs>
To disable coalescing::
ethtool -C <ethX> tx-usecs 0 rx-usecs 0 tx-max-frames 1 rx-max-frames 1
Wake on LAN support
-------------------
WOL support by magic packet::
ethtool -s <ethX> wol g
To disable WOL::
ethtool -s <ethX> wol d
Set and check the driver message level
--------------------------------------
Set message level
::
ethtool -s <ethX> msglvl <level>
Level values:
====== =============================
0x0001 general driver status.
0x0002 hardware probing.
0x0004 link state.
0x0008 periodic status check.
0x0010 interface being brought down.
0x0020 interface being brought up.
0x0040 receive error.
0x0080 transmit error.
0x0200 interrupt handling.
0x0400 transmit completion.
0x0800 receive completion.
0x1000 packet contents.
0x2000 hardware status.
0x4000 Wake-on-LAN status.
====== =============================
By default, the level of debugging messages is set to 0x0001 (general driver status).
Check message level
::
ethtool <ethX> | grep "Current message level"
If you want to disable the output of messages::
ethtool -s <ethX> msglvl 0
RX flow rules (ntuple filters)
------------------------------
Three types of rules are supported; they are applied in the following order:
1. 16 VLAN ID rules
2. 16 L2 EtherType rules
3. 8 L3/L4 5-Tuple rules
The driver utilizes the ethtool interface for configuring ntuple filters,
via ``ethtool -N <device> <filter>``.
To enable or disable the RX flow rules::
ethtool -K ethX ntuple <on|off>
When disabling ntuple filters, all the user-programmed filters are
flushed from the driver cache and hardware. All needed filters must
be re-added when ntuple is re-enabled.
Because of the fixed order of the rules, the location of filters is also fixed:
- Locations 0 - 15 for VLAN ID filters
- Locations 16 - 31 for L2 EtherType filters
- Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)
The L3/L4 5-tuple (protocol, source and destination IP address, source and
destination TCP/UDP/SCTP port) is compared against 8 filters. For IPv4, up to
8 source and destination addresses can be matched. For IPv6, up to 2 pairs of
addresses can be supported. Source and destination ports are only compared for
TCP/UDP/SCTP packets.
To add a filter that directs packets to queue 5, use
``<-N|-U|--config-nfc|--config-ntuple>`` switch::
ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 2000 dst-port 2001 action 5 <loc 32>
- action is the queue number.
- loc is the rule number.
For ``flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6`` you must set the loc
number within 32 - 39.
For ``flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6`` you can set 8 rules
for IPv4 traffic or 2 rules for IPv6 traffic. The loc numbers for IPv6
are 32 and 36.
At the moment you cannot use IPv4 and IPv6 filters at the same time.
Example filter for IPv6 filter traffic::
sudo ethtool -N <ethX> flow-type tcp6 src-ip 2001:db8:0:f101::1 dst-ip 2001:db8:0:f101::2 action 1 loc 32
sudo ethtool -N <ethX> flow-type ip6 src-ip 2001:db8:0:f101::2 dst-ip 2001:db8:0:f101::5 action -1 loc 36
Example filter for IPv4 filter traffic::
sudo ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.4 dst-ip 10.0.0.7 src-port 2000 dst-port 2001 loc 32
sudo ethtool -N <ethX> flow-type tcp4 src-ip 10.0.0.3 dst-ip 10.0.0.9 src-port 2000 dst-port 2001 loc 33
sudo ethtool -N <ethX> flow-type ip4 src-ip 10.0.0.6 dst-ip 10.0.0.4 loc 34
If you set action -1, then all traffic corresponding to the filter will be discarded.
The maximum value of action is 31.
The VLAN filter (VLAN id) is compared against 16 filters.
The VLAN id must be accompanied by the mask 0xF000, to distinguish a VLAN
filter from an L2 EtherType filter with UserPriority, since both User Priority
and VLAN ID are passed in the same 'vlan' parameter.
To add a filter that directs packets from VLAN 2001 to queue 1::
ethtool -N <ethX> flow-type ip4 vlan 2001 m 0xF000 action 1 loc 0
L2 EtherType filters allow filtering packets by the EtherType field or by both
the EtherType and User Priority (PCP) fields of 802.1Q.
UserPriority (vlan) parameter must be accompanied by mask 0x1FFF. That is to
distinguish VLAN filter from L2 Ethertype filter with UserPriority since both
User Priority and VLAN ID are passed in the same 'vlan' parameter.
To add a filter that directs IPv4 packets of priority 3 to queue 3::
ethtool -N <ethX> flow-type ether proto 0x800 vlan 0x600 m 0x1FFF action 3 loc 16
To see the list of filters currently present::
ethtool <-u|-n|--show-nfc|--show-ntuple> <ethX>
Rules may be deleted from the table itself. This is done using::
sudo ethtool <-N|-U|--config-nfc|--config-ntuple> <ethX> delete <loc>
- loc is the rule number to be deleted.
RX filters are an interface to load the filter table that funnels all flows
into queue 0 unless an alternative queue is specified using "action". In that
case, any flow that matches the filter criteria will be directed to the
appropriate queue. RX filters are supported on all kernels 2.6.30 and later.
RSS for UDP
-----------
Currently, the NIC does not support RSS for fragmented IP packets, so RSS does
not work correctly for fragmented UDP traffic. To disable RSS for UDP, an
RX Flow L3/L4 rule may be used.
Example::
ethtool -N eth0 flow-type udp4 action 0 loc 32
UDP GSO hardware offload
------------------------
UDP GSO allows boosting UDP Tx rates by offloading UDP header allocation
to hardware. A special userspace socket option is required for this; it
can be validated with the selftest in /kernel/tools/testing/selftests/net/::

    udpgso_bench_tx -u -4 -D 10.0.1.1 -s 6300 -S 100

This sends out 100-byte UDP packets formed from a single 6300-byte user
buffer.
UDP GSO is configured by::
ethtool -K eth0 tx-udp-segmentation on
Private flags (testing)
-----------------------
Atlantic driver supports private flags for hardware custom features::
$ ethtool --show-priv-flags ethX
Private flags for ethX:
DMASystemLoopback : off
PKTSystemLoopback : off
DMANetworkLoopback : off
PHYInternalLoopback: off
PHYExternalLoopback: off
Example::
$ ethtool --set-priv-flags ethX DMASystemLoopback on
DMASystemLoopback: DMA Host loopback.
PKTSystemLoopback: Packet buffer host loopback.
DMANetworkLoopback: Network side loopback on DMA block.
PHYInternalLoopback: Internal loopback on Phy.
PHYExternalLoopback: External loopback on Phy (with loopback ethernet cable).
Command Line Parameters
=======================
The following command line parameters are available on the atlantic driver:

aq_itr - Interrupt throttling mode
----------------------------------
Accepted values: 0, 1, 0xFFFF
Default value: 0xFFFF
====== ==============================================================
0 Disable interrupt throttling.
1 Enable interrupt throttling and use specified tx and rx rates.
0xFFFF Auto throttling mode. Driver will choose the best RX and TX
interrupt throttling settings based on link speed.
====== ==============================================================
aq_itr_tx - TX interrupt throttle rate
--------------------------------------
Accepted values: 0 - 0x1FF
Default value: 0
TX side throttling in microseconds. The adapter will set the maximum interrupt
delay to this value; the minimum interrupt delay will be half of this value.
aq_itr_rx - RX interrupt throttle rate
--------------------------------------
Accepted values: 0 - 0x1FF
Default value: 0
RX side throttling in microseconds. The adapter will set the maximum interrupt
delay to this value; the minimum interrupt delay will be half of this value.
.. note::
ITR settings can be changed at runtime via ethtool -C (see the Interrupt coalescing support section above).
Config file parameters
======================
For some fine tuning and performance optimizations,
some parameters can be changed in the {source_dir}/aq_cfg.h file.
AQ_CFG_RX_PAGEORDER
-------------------
Default value: 0
RX page order override. That is a power-of-2 number of RX pages allocated for
each descriptor. Received descriptor size is still limited by
AQ_CFG_RX_FRAME_MAX.
Increasing the page order improves page reuse (relevant on IOMMU-enabled systems).
AQ_CFG_RX_REFILL_THRES
----------------------
Default value: 32
RX refill threshold. The RX path will not refill freed descriptors until the
specified number of free descriptors is observed. Larger values may improve
page reuse but may lead to packet drops as well.
AQ_CFG_VECS_DEF
---------------
Number of queues
Valid Range: 0 - 8 (up to AQ_CFG_VECS_MAX)
Default value: 8
Note that this value will be capped at the number of cores available on the system.
AQ_CFG_IS_RSS_DEF
-----------------
Enable/disable Receive Side Scaling.
This feature allows the adapter to distribute receive processing
across multiple CPU cores and prevents overloading a single CPU core.
Valid values
== ========
0 disabled
1 enabled
== ========
Default value: 1
AQ_CFG_NUM_RSS_QUEUES_DEF
-------------------------
Number of queues for Receive Side Scaling
Valid Range: 0 - 8 (up to AQ_CFG_VECS_DEF)
Default value: AQ_CFG_VECS_DEF
AQ_CFG_IS_LRO_DEF
-----------------
Enable/disable Large Receive Offload
This offload enables the adapter to coalesce multiple TCP segments and indicate
them as a single coalesced unit to the OS networking subsystem.
The system consumes less energy but it also introduces more latency in packet
processing.
Valid values
== ========
0 disabled
1 enabled
== ========
Default value: 1
AQ_CFG_TX_CLEAN_BUDGET
----------------------
Maximum number of descriptors to clean up on TX at once.
Default value: 256
After changing the aq_cfg.h file, the driver must be rebuilt for the changes
to take effect.
Support
=======
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
to the issue to aqn_support@marvell.com.
License
=======
aQuantia Corporation Network Driver
Copyright |copy| 2014 - 2019 aQuantia Corporation.
This program is free software; you can redistribute it and/or modify it
under the terms and conditions of the GNU General Public License,
version 2, as published by the Free Software Foundation.


@@ -0,0 +1,393 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
=============================================
Chelsio N210 10Gb Ethernet Network Controller
=============================================
Driver Release Notes for Linux
Version 2.1.1
June 20, 2005
.. Contents
INTRODUCTION
FEATURES
PERFORMANCE
DRIVER MESSAGES
KNOWN ISSUES
SUPPORT
Introduction
============
This document describes the Linux driver for the Chelsio 10Gb Ethernet Network
Controller. This driver supports the Chelsio N210 NIC and is backward
compatible with the Chelsio N110 model 10Gb NICs.
Features
========
Adaptive Interrupts (adaptive-rx)
---------------------------------
This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the latency
settings to achieve the highest performance during various types of network
load.
The interface used to control this feature is ethtool. Please see the
ethtool manpage for additional usage information.
By default, adaptive-rx is disabled.
To enable adaptive-rx::
ethtool -C <interface> adaptive-rx on
To disable adaptive-rx, use ethtool::
ethtool -C <interface> adaptive-rx off
After disabling adaptive-rx, the timer latency value will be set to 50us.
You may set the timer latency after disabling adaptive-rx::
ethtool -C <interface> rx-usecs <microseconds>
An example to set the timer latency value to 100us on eth0::
ethtool -C eth0 rx-usecs 100
You may also provide a timer latency value while disabling adaptive-rx::
ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>
If adaptive-rx is disabled and a timer latency value is specified, the timer
will be set to the specified value until changed by the user or until
adaptive-rx is enabled.
To view the status of the adaptive-rx and timer latency values::
ethtool -c <interface>
TCP Segmentation Offloading (TSO) Support
-----------------------------------------
This feature, also known as "large send", enables a system's protocol stack
to offload portions of outbound TCP processing to a network interface card,
thereby reducing system CPU utilization and enhancing performance.
The interface used to control this feature is ethtool version 1.8 or higher.
Please see the ethtool manpage for additional usage information.
By default, TSO is enabled.
To disable TSO::
ethtool -K <interface> tso off
To enable TSO::
ethtool -K <interface> tso on
To view the status of TSO::
ethtool -k <interface>
Performance
===========
The following information is provided as an example of how to change system
parameters for "performance tuning" and what values to use. You may or may not
want to change these system parameters, depending on your server/workstation
application. Doing so is not warranted in any way by Chelsio Communications,
and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
of data or damage to equipment.
Your distribution may have a different way of doing things, or you may prefer
a different method. These commands are shown only to provide an example of
what to do and are by no means definitive.
Making any of the following system changes will only last until you reboot
your system. You may want to write a script that runs at boot-up which
includes the optimal settings for your system.
Setting PCI Latency Timer::
setpci -d 1425:* 0x0c.l=0x0000F800
Disabling TCP timestamp::
sysctl -w net.ipv4.tcp_timestamps=0
Disabling SACK::
sysctl -w net.ipv4.tcp_sack=0
Setting large number of incoming connection requests::
sysctl -w net.ipv4.tcp_max_syn_backlog=3000
Setting maximum receive socket buffer size::
sysctl -w net.core.rmem_max=1024000
Setting maximum send socket buffer size::
sysctl -w net.core.wmem_max=1024000
Set smp_affinity (on a multiprocessor system) to a single CPU::
echo 1 > /proc/irq/<interrupt_number>/smp_affinity
Setting default receive socket buffer size::
sysctl -w net.core.rmem_default=524287
Setting default send socket buffer size::
sysctl -w net.core.wmem_default=524287
Setting maximum option memory buffers::
sysctl -w net.core.optmem_max=524287
Setting maximum backlog (# of unprocessed packets before kernel drops)::
sysctl -w net.core.netdev_max_backlog=300000
Setting TCP read buffers (min/default/max)::
sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
Setting TCP write buffers (min/pressure/max)::
sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
Setting TCP buffer space (min/pressure/max)::
sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
TCP window size for single connections:
The receive buffer (RX_WINDOW) size must be at least as large as the
Bandwidth-Delay Product of the communication link between the sender and
receiver. Due to the variations of RTT, you may want to increase the buffer
size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
At 10Gb speeds, use the following formula::
RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
Example for an RTT of 100us (0.1 ms): RX_WINDOW = 1,250,000 * 0.1 = 125,000 bytes
RX_WINDOW sizes of 256KB - 512KB should be sufficient.
Setting the min, max, and default receive buffer (RX_WINDOW) size::
sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
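As an illustrative calculation only (the 500us RTT below is an assumed
figure, not a measurement), a link with a 500us RTT would need
RX_WINDOW >= 1,250,000 * 0.5 = 625,000 bytes, which could be rounded up
to 640KB and applied as::

    sysctl -w net.ipv4.tcp_rmem="10000 640000 640000"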
TCP window size for multiple connections:
The receive buffer (RX_WINDOW) size may be calculated the same as single
connections, but should be divided by the number of connections. The
smaller window prevents congestion and facilitates better pacing,
especially if/when MAC level flow control does not work well or when it is
not supported on the machine. Experimentation may be necessary to attain
the correct value. This method is provided as a starting point for the
correct receive buffer size.
Setting the min, max, and default receive buffer (RX_WINDOW) size is
performed in the same manner as for a single connection.
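For instance, assuming a single-connection RX_WINDOW of 512KB and 8
concurrent connections (both values are illustrative), each connection
would get 512KB / 8 = 64KB::

    sysctl -w net.ipv4.tcp_rmem="65536 65536 65536"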
Driver Messages
===============
The following messages are the most common messages logged by syslog. These
may be found in /var/log/messages.
Driver up::
Chelsio Network Driver - version 2.1.1
NIC detected::
eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit
Link up::
eth#: link is up at 10 Gbps, full duplex
Link down::
eth#: link is down
Known Issues
============
These issues have been identified during testing. The following information
is provided as a workaround to the problem. In some cases, this problem is
inherent to Linux or to a particular Linux Distribution and/or hardware
platform.
1. Large number of TCP retransmits on a multiprocessor (SMP) system.
On a system with multiple CPUs, the interrupt (IRQ) for the network
controller may be bound to more than one CPU. This will cause TCP
retransmits if the packet data were to be split across different CPUs
and re-assembled in a different order than expected.
To eliminate the TCP retransmits, set smp_affinity on the particular
interrupt to a single CPU. You can locate the interrupt (IRQ) used on
the N110/N210 by using ifconfig::
ifconfig <dev_name> | grep Interrupt
Set the smp_affinity to a single CPU::
echo 1 > /proc/irq/<interrupt_number>/smp_affinity
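The value written is a hexadecimal CPU bitmask, so CPUs other than CPU0
are selected by writing the corresponding bit. For example, to bind the
interrupt to CPU 2 instead::

    echo 4 > /proc/irq/<interrupt_number>/smp_affinity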
It is highly suggested that you do not run the irqbalance daemon on your
system, as this will change any smp_affinity setting you have applied.
The irqbalance daemon runs on a 10 second interval and binds interrupts
to the least loaded CPU determined by the daemon. To disable this daemon::
chkconfig --level 2345 irqbalance off
By default, some Linux distributions enable the kernel irqbalance
feature, which performs the same function as the daemon. To disable
this feature, add the following line to your bootloader::
noirqbalance
Example using the Grub bootloader::
title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
initrd /initrd-2.4.21-27.ELsmp.img
2. After running insmod, the driver is loaded and the incorrect network
interface is brought up without running ifup.
When using 2.4.x kernels, including RHEL kernels, the Linux kernel
invokes a script named "hotplug". This script is primarily used to
automatically bring up USB devices when they are plugged in, however,
the script also attempts to automatically bring up a network interface
after loading the kernel module. The hotplug script does this by scanning
the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
for HWADDR=<mac_address>.
If the hotplug script does not find the HWADDR within any of the
ifcfg-eth# files, it will bring up the device with the next available
interface name. If this interface is already configured for a different
network card, your new interface will have an incorrect IP address and
network settings.
To solve this issue, you can add the HWADDR=<mac_address> key to the
interface config file of your network controller.
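A minimal ifcfg-eth0 using the HWADDR key might look like the following
sketch (the MAC address is a placeholder; substitute the address of your
controller)::

    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=dhcp
    HWADDR=00:07:43:12:34:56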
To disable this "hotplug" feature, you may add the driver (module name)
to the "blacklist" file located in /etc/hotplug. It has been noted that
this does not work for network devices because the net.agent script
does not use the blacklist file. Simply remove, or rename, the net.agent
script located in /etc/hotplug to disable this feature.
3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.
If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
chipset, you may experience the "133-Mhz Mode Split Completion Data
Corruption" bug identified by AMD while using a 133Mhz PCI-X card on the
PCI-X bus.
AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
can provide stale data via split completion cycles to a PCI-X card that
is operating at 133 Mhz", causing data corruption.
AMD provides three workarounds for this problem; however, Chelsio
recommends the first option for best performance with this bug:
For 133Mhz secondary bus operation, limit the transaction length and
the number of outstanding transactions, via BIOS configuration
programming of the PCI-X card, to the following:
Data Length (bytes): 1k
Total allowed outstanding transactions: 2
Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
section 56, "133-MHz Mode Split Completion Data Corruption" for more
details with this bug and workarounds suggested by AMD.
It may be possible to work outside AMD's recommended PCI-X settings; try
increasing the Data Length to 2k bytes for increased performance. If you
have issues with these settings, please revert to the "safe" settings
and duplicate the problem before submitting a bug or asking for support.
.. note::
The default setting on most systems is 8 outstanding transactions
and 2k bytes data length.
4. On multiprocessor systems, it has been noted that an application which
is handling 10Gb networking can switch between CPUs causing degraded
and/or unstable performance.
If running on an SMP system and taking performance measurements, it
is suggested you either run the latest netperf-2.4.0+ or use a binding
tool such as Tim Hockin's procstate utilities (runon)
<http://www.hockin.org/~thockin/procstate/>.
Binding netserver and netperf (or other applications) to particular
CPUs can make a significant difference in performance measurements.
You may need to experiment to find which CPU to bind the application to in
order to achieve the best performance for your system.
If you are developing an application designed for 10Gb networking,
please keep in mind you may want to look at kernel functions
sched_setaffinity & sched_getaffinity to bind your application.
If you are just running user-space applications such as ftp, telnet,
etc., you may want to try the runon tool provided by Tim Hockin's
procstate utility. You could also try binding the interface to a
particular CPU: runon 0 ifup eth0
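If runon is not available, the taskset utility shipped with util-linux on
most modern distributions (an assumption; check your distribution) serves
the same purpose. For example, to start netserver bound to CPU 0::

    taskset -c 0 netserver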
Support
=======
If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our website
at http://www.chelsio.com
-------------------------------------------------------------------------------
::
Chelsio Communications
370 San Aleso Ave.
Suite 100
Sunnyvale, CA 94085
http://www.chelsio.com
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.
@@ -0,0 +1,647 @@
.. SPDX-License-Identifier: GPL-2.0
================================================
Cirrus Logic LAN CS8900/CS8920 Ethernet Adapters
================================================
.. note::
This document was contributed by Cirrus Logic for kernel 2.2.5. This version
has been updated for 2.3.48 by Andrew Morton.
Still, this is too outdated! A major cleanup is needed here.
Cirrus makes a copy of this driver available at their website, as
described below. In general, you should use the driver version which
comes with your Linux distribution.
Linux Network Interface Driver ver. 2.00 <kernel 2.3.48>
.. TABLE OF CONTENTS
1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS
1.1 Product Overview
1.2 Driver Description
1.2.1 Driver Name
1.2.2 File in the Driver Package
1.3 System Requirements
1.4 Licensing Information
2.0 ADAPTER INSTALLATION and CONFIGURATION
2.1 CS8900-based Adapter Configuration
2.2 CS8920-based Adapter Configuration
3.0 LOADING THE DRIVER AS A MODULE
4.0 COMPILING THE DRIVER
4.1 Compiling the Driver as a Loadable Module
4.2 Compiling the driver to support memory mode
4.3 Compiling the driver to support Rx DMA
5.0 TESTING AND TROUBLESHOOTING
5.1 Known Defects and Limitations
5.2 Testing the Adapter
5.2.1 Diagnostic Self-Test
5.2.2 Diagnostic Network Test
5.3 Using the Adapter's LEDs
5.4 Resolving I/O Conflicts
6.0 TECHNICAL SUPPORT
6.1 Contacting Cirrus Logic's Technical Support
6.2 Information Required Before Contacting Technical Support
6.3 Obtaining the Latest Driver Version
6.4 Current maintainer
6.5 Kernel boot parameters
1. Cirrus Logic LAN CS8900/CS8920 Ethernet Adapters
===================================================
1.1. Product Overview
=====================
The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow
IEEE 802.3 standards and support half or full-duplex operation in ISA bus
computers on 10 Mbps Ethernet networks. The adapters are designed for operation
in 16-bit ISA or EISA bus expansion slots and are available in
10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5
or fiber networks).
CS8920-based adapters are similar to the CS8900-based adapter with additional
features for Plug and Play (PnP) support and Wakeup Frame recognition. As
such, the configuration procedures differ somewhat between the two types of
adapters. Refer to the "Adapter Configuration" section for details on
configuring both types of adapters.
1.2. Driver Description
=======================
The CS8900/CS8920 Ethernet Adapter driver for Linux supports the Linux
v2.3.48 or greater kernel. It can be compiled directly into the kernel
or loaded at run-time as a device driver module.
1.2.1 Driver Name: cs89x0
1.2.2 Files in the Driver Archive:
The files in the driver package at Cirrus' website include:
=================== ====================================================
readme.txt this file
build batch file to compile cs89x0.c.
cs89x0.c driver C code
cs89x0.h driver header file
cs89x0.o pre-compiled module (for v2.2.5 kernel)
config/Config.in sample file to include cs89x0 driver in the kernel.
config/Makefile sample file to include cs89x0 driver in the kernel.
config/Space.c sample file to include cs89x0 driver in the kernel.
=================== ====================================================
1.3. System Requirements
------------------------
The following hardware is required:
* Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter
* IBM or IBM-compatible PC with:
* An 80386 or higher processor
* 16 bytes of contiguous IO space available between 210h - 370h
* One available IRQ (5,10,11,or 12 for the CS8900, 3-7,9-15 for CS8920).
* Appropriate cable (and connector for AUI, 10BASE-2) for your network
topology.
The following software is required:
* LINUX kernel version 2.3.48 or higher
* CS8900/20 Setup Utility (DOS-based)
* LINUX kernel sources for your kernel (if compiling into kernel)
* GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel
or a module)
1.4. Licensing Information
--------------------------
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 1.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
For a full copy of the GNU General Public License, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
2. Adapter Installation and Configuration
=========================================
Both the CS8900 and CS8920-based adapters can be configured using parameters
stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup
Utility if you want to change the adapter's configuration in EEPROM.
When loading the driver as a module, you can specify many of the adapter's
configuration parameters on the command-line to override the EEPROM's settings
or for interface configuration when an EEPROM is not used. (CS8920-based
adapters must use an EEPROM.) See Section 3.0 LOADING THE DRIVER AS A MODULE.
Since the CS8900/20 Setup Utility is a DOS-based application, you must install
and configure the adapter in a DOS-based system using the CS8900/20 Setup
Utility before installation in the target LINUX system. (Not required if
installing a CS8900-based adapter and the default configuration is acceptable.)
2.1. CS8900-based Adapter Configuration
---------------------------------------
CS8900-based adapters shipped from Cirrus Logic have been configured
with the following "default" settings::
Operation Mode: Memory Mode
IRQ: 10
Base I/O Address: 300
Memory Base Address: D0000
Optimization: DOS Client
Transmission Mode: Half-duplex
BootProm: None
Media Type: Autodetect (3-media cards) or
10BASE-T (10BASE-T only adapter)
You should only change the default configuration settings if a conflict with
another adapter exists. To change the adapter's configuration, run the
CS8900/20 Setup Utility.
2.2. CS8920-based Adapter Configuration
---------------------------------------
CS8920-based adapters are shipped from Cirrus Logic configured as Plug
and Play (PnP) enabled. However, since the cs89x0 driver does NOT
support PnP, you must install the CS8920 adapter in a DOS-based PC and
run the CS8900/20 Setup Utility to disable PnP and configure the
adapter before installation in the target Linux system. Failure to do
this will leave the adapter inactive and the driver will be unable to
communicate with the adapter.
::
****************************************************************
* CS8920-BASED ADAPTERS: *
* *
* CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. *
* THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST *
* RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND *
* TO ACTIVATE THE ADAPTER. *
****************************************************************
3. Loading the Driver as a Module
=================================
If the driver is compiled as a loadable module, you can load the driver module
with the 'modprobe' command. Many of the adapter's configuration parameters can
be specified as command-line arguments to the load command. This facility
provides a means to override the EEPROM's settings or for interface
configuration when an EEPROM is not used.
Example::
insmod cs89x0.o io=0x200 irq=0xA media=aui
This example loads the module and configures the adapter to use an IO port base
address of 200h, interrupt 10, and use the AUI media connection. The following
configuration options are available on the command line::
io=### - specify IO address (200h-360h)
irq=## - specify interrupt level
use_dma=1 - Enable DMA
dma=# - specify dma channel (Driver is compiled to support
Rx DMA only)
dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16.
media=rj45 - specify media type
or media=bnc
or media=aui
or media=auto
duplex=full - specify forced half/full/autonegotiate duplex
or duplex=half
or duplex=auto
debug=# - debug level (only available if the driver was compiled
for debugging)
**Notes:**
a) If an EEPROM is present, any specified command-line parameter
will override the corresponding configuration value stored in
EEPROM.
b) The "io" parameter must be specified on the command-line.
c) The driver's hardware probe routine is designed to avoid
writing to I/O space until it knows that there is a cs89x0
card at the written addresses. This could cause problems
with device probing. To avoid this behaviour, add one
to the ``io=`` module parameter. This doesn't actually change
the I/O address, but it is a flag to tell the driver
to partially initialise the hardware before trying to
identify the card. This could be dangerous if you are
not sure that there is a cs89x0 card at the provided address.
For example, to scan for an adapter located at IO base 0x300,
specify an IO address of 0x301.
d) The "duplex=auto" parameter is only supported for the CS8920.
e) The minimum command-line configuration required if an EEPROM is
not present is:
io
irq
media type (no autodetect)
f) The following additional parameters are CS89XX defaults (values
used with no EEPROM or command-line argument).
* DMA Burst = enabled
* IOCHRDY Enabled = enabled
* UseSA = enabled
* CS8900 defaults to half-duplex if not specified on command-line
* CS8920 defaults to autoneg if not specified on command-line
* Use reset defaults for other config parameters
* dma_mode = 0
g) You can use ifconfig to set the adapter's Ethernet address.
h) Many Linux distributions use the 'modprobe' command to load
modules. This program uses the '/etc/conf.modules' file to
determine configuration information which is passed to a driver
module when it is loaded. All the configuration options which are
described above may be placed within /etc/conf.modules.
For example::
> cat /etc/conf.modules
...
alias eth0 cs89x0
options cs89x0 io=0x0200 dma=5 use_dma=1
...
In this example we are telling the module system that the
ethernet driver for this machine should use the cs89x0 driver. We
are asking 'modprobe' to pass the 'io', 'dma' and 'use_dma'
arguments to the driver when it is loaded.
i) Cirrus recommends that the cs89x0 use ISA DMA channels 5, 6 or
7. You will probably find that other DMA channels will not work.
j) The cs89x0 supports DMA for receiving only. DMA mode is
significantly more efficient. Flooding a 400 MHz Celeron machine
with large ping packets consumes 82% of its CPU capacity in non-DMA
mode. With DMA this is reduced to 45%.
k) If your Linux kernel was compiled with inbuilt plug-and-play
support you will be able to find information about the cs89x0 card
with the command::
cat /proc/isapnp
l) If during DMA operation you find erratic behavior or network data
corruption you should use your PC's BIOS to slow the EISA bus clock.
m) If the cs89x0 driver is compiled directly into the kernel
(non-modular) then its I/O address is automatically determined by
ISA bus probing. The IRQ number, media options, etc are determined
from the card's EEPROM.
n) If the cs89x0 driver is compiled directly into the kernel, DMA
mode may be selected by providing the kernel with a boot option
'cs89x0_dma=N' where 'N' is the desired DMA channel number (5, 6 or 7).
Kernel boot options may be provided on the LILO command line::
LILO boot: linux cs89x0_dma=5
or they may be placed in /etc/lilo.conf::
image=/boot/bzImage-2.3.48
append="cs89x0_dma=5"
label=linux
root=/dev/hda5
read-only
The DMA Rx buffer size is hardwired to 16 kbytes in this mode.
(64k mode is not available).
4. Compiling the Driver
=======================
The cs89x0 driver can be compiled directly into the kernel or compiled into
a loadable device driver module.
Just use the standard way to configure the driver and compile the Kernel.
4.1. Compiling the Driver to Support Rx DMA
-------------------------------------------
The compile-time optionality for DMA was removed in the 2.3 kernel
series. DMA support is now unconditionally part of the driver. It is
enabled by the 'use_dma=1' module option.
5. Testing and Troubleshooting
==============================
5.1. Known Defects and Limitations
----------------------------------
Refer to the RELEASE.TXT file distributed as part of this archive for a list of
known defects, driver limitations, and work arounds.
5.2. Testing the Adapter
------------------------
Once the adapter has been installed and configured, the diagnostic option of
the CS8900/20 Setup Utility can be used to test the functionality of the
adapter and its network connection. Use the diagnostics 'Self Test' option to
test the functionality of the adapter with the hardware configuration you have
assigned. You can use the diagnostics 'Network Test' to test the ability of the
adapter to communicate across the Ethernet with another PC equipped with a
CS8900/20-based adapter card (it must also be running the CS8900/20 Setup
Utility).
.. note::
The Setup Utility's diagnostics are designed to run in a
DOS-only operating system environment. DO NOT run the diagnostics
from a DOS or command prompt session under Windows 95, Windows NT,
OS/2, or other operating system.
To run the diagnostics tests on the CS8900/20 adapter:
1. Boot DOS on the PC and start the CS8900/20 Setup Utility.
2. The adapter's current configuration is displayed. Hit the ENTER key to
get to the main menu.
3. Select 'Diagnostics' (ALT-G) from the main menu.
* Select 'Self-Test' to test the adapter's basic functionality.
* Select 'Network Test' to test the network connection and cabling.
5.2.1. Diagnostic Self-test
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The diagnostic self-test checks the adapter's basic functionality as well as
its ability to communicate across the ISA bus based on the system resources
assigned during hardware configuration. The following tests are performed:
* IO Register Read/Write Test
The IO Register Read/Write test insures that the CS8900/20 can be
accessed in IO mode, and that the IO base address is correct.
* Shared Memory Test
The Shared Memory test insures the CS8900/20 can be accessed in memory
mode and that the range of memory addresses assigned does not conflict
with other devices in the system.
* Interrupt Test
The Interrupt test insures there are no conflicts with the assigned IRQ
signal.
* EEPROM Test
The EEPROM test insures the EEPROM can be read.
* Chip RAM Test
The Chip RAM test insures the 4K of memory internal to the CS8900/20 is
working properly.
* Internal Loop-back Test
The Internal Loop Back test insures the adapter's transmitter and
receiver are operating properly. If this test fails, make sure the
adapter's cable is connected to the network (check for LED activity for
example).
* Boot PROM Test
The Boot PROM test insures the Boot PROM is present, and can be read.
Failure indicates the Boot PROM was not successfully read due to a
hardware problem or due to a conflict with the Boot PROM address
assignment. (Test only applies if the adapter is configured to use the
Boot PROM option.)
Failure of a test item indicates a possible system resource conflict with
another device on the ISA bus. In this case, you should use the Manual Setup
option to reconfigure the adapter by selecting a different value for the system
resource that failed.
5.2.2. Diagnostic Network Test
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Diagnostic Network Test verifies a working network connection by
transferring data between two CS8900/20 adapters installed in different PCs
on the same network. (Note: the diagnostic network test should not be run
between two nodes across a router.)
This test requires that each of the two PCs have a CS8900/20-based adapter
installed and have the CS8900/20 Setup Utility running. The first PC is
configured as a Responder and the other PC is configured as an Initiator.
Once the Initiator is started, it sends data frames to the Responder which
returns the frames to the Initiator.
The total number of frames received and transmitted are displayed on the
Initiator's display, along with a count of the number of frames received and
transmitted OK or in error. The test can be terminated anytime by the user at
either PC.
To setup the Diagnostic Network Test:
1. Select a PC with a CS8900/20-based adapter and a known working network
connection to act as the Responder. Run the CS8900/20 Setup Utility
and select 'Diagnostics -> Network Test -> Responder' from the main
menu. Hit ENTER to start the Responder.
2. Return to the PC with the CS8900/20-based adapter you want to test and
start the CS8900/20 Setup Utility.
3. From the main menu, Select 'Diagnostic -> Network Test -> Initiator'.
Hit ENTER to start the test.
You may stop the test on the Initiator at any time while allowing the Responder
to continue running. In this manner, you can move to additional PCs and test
them by starting the Initiator on another PC without having to stop/start the
Responder.
5.3. Using the Adapter's LEDs
-----------------------------
The 2 and 3-media adapters have two LEDs visible on the back end of the board
located near the 10Base-T connector.
Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T
connection. (Only applies to 10Base-T. The green LED has no significance for
a 10Base-2 or AUI connection.)
TX/RX LED: The yellow LED lights briefly each time the adapter transmits or
receives data. (The yellow LED will appear to "flicker" on a typical network.)
5.4. Resolving I/O Conflicts
----------------------------
An IO conflict occurs when two or more adapters use the same ISA resource (IO
address, memory address or IRQ). You can usually detect an IO conflict in one
of four ways after installing and/or configuring the CS8900/20-based adapter:
1. The system does not boot properly (or at all).
2. The driver cannot communicate with the adapter, reporting an "Adapter
not found" error message.
3. You cannot connect to the network or the driver will not load.
4. If you have configured the adapter to run in memory mode but the driver
reports it is using IO mode when loading, this is an indication of a
memory address conflict.
If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a
diagnostic self-test. Normally, the ISA resource in conflict will fail the
self-test. If so, reconfigure the adapter selecting another choice for the
resource in conflict. Run the diagnostics again to check for further IO
conflicts.
In some cases, such as when the PC will not boot, it may be necessary to remove
the adapter and reconfigure it by installing it in another PC to run the
CS8900/20 Setup Utility. Once reinstalled in the target system, run the
diagnostics self-test to ensure the new configuration is free of conflicts
before loading the driver again.
When manually configuring the adapter, keep in mind the typical ISA system
resource usage as indicated in the tables below.
::
I/O Address Device IRQ Device
----------- -------- --- --------
200-20F Game I/O adapter 3 COM2, Bus Mouse
230-23F Bus Mouse 4 COM1
270-27F LPT3: third parallel port 5 LPT2
2F0-2FF COM2: second serial port 6 Floppy Disk controller
320-32F Fixed disk controller 7 LPT1
8 Real-time Clock
9 EGA/VGA display adapter
12 Mouse (PS/2)
Memory Address Device 13 Math Coprocessor
-------------- --------------------- 14 Hard Disk controller
A000-BFFF EGA Graphics Adapter
A000-C7FF VGA Graphics Adapter
B000-BFFF Mono Graphics Adapter
B800-BFFF Color Graphics Adapter
E000-FFFF AT BIOS
6. Technical Support
====================
6.1. Contacting Cirrus Logic's Technical Support
------------------------------------------------
Cirrus Logic's CS89XX Technical Application Support can be reached at::
Telephone :(800) 888-5016 (from inside U.S. and Canada)
:(512) 442-7555 (from outside the U.S. and Canada)
Fax :(512) 912-3871
Email :ethernet@crystal.cirrus.com
WWW :http://www.cirrus.com
6.2. Information Required before Contacting Technical Support
-------------------------------------------------------------
Before contacting Cirrus Logic for technical support, be prepared to provide as
much of the following information as possible.
1.) Adapter type (CRD8900, CDB8900, CDB8920, etc.)
2.) Adapter configuration
* IO Base, Memory Base, IO or memory mode enabled, IRQ, DMA channel
* Plug and Play enabled/disabled (CS8920-based adapters only)
* Configured for media auto-detect or specific media type (which type).
3.) PC System's Configuration
* Plug and Play system (yes/no)
* BIOS (make and version)
* System make and model
* CPU (type and speed)
* System RAM
* SCSI Adapter
4.) Software
* CS89XX driver and version
* Your network operating system and version
* Your system's OS version
* Version of all protocol support files
5.) Any Error Message displayed.
6.3 Obtaining the Latest Driver Version
---------------------------------------
You can obtain the latest CS89XX drivers and support software from Cirrus Logic's
Web site. You can also contact Cirrus Logic's Technical Support (email:
ethernet@crystal.cirrus.com) and request that you be registered for automatic
software-update notification.
Cirrus Logic maintains a web page at http://www.cirrus.com with the
latest drivers and technical publications.
6.4. Current maintainer
-----------------------
In February 2000 the maintenance of this driver was assumed by Andrew
Morton.
6.5 Kernel boot parameters
----------------------------
For use in embedded environments with no cs89x0 EEPROM, the kernel boot
parameter ``cs89x0_media=`` has been implemented. Usage is::
cs89x0_media=rj45 or
cs89x0_media=aui or
cs89x0_media=bnc
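As with the cs89x0_dma option described in section 3, the media setting
can presumably also be placed in /etc/lilo.conf; a sketch following the
earlier example::

    image=/boot/bzImage-2.3.48
        append="cs89x0_media=rj45"
        label=linux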
@@ -0,0 +1,171 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
DM9000 Network driver
=====================
Copyright 2008 Simtec Electronics,
Ben Dooks <ben@simtec.co.uk> <ben-linux@fluff.org>
Introduction
------------
This file describes how to use the DM9000 platform-device based network driver
that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h.
The driver supports three DM9000 variants: the DM9000E, which was the first
chip supported, as well as the newer DM9000A and DM9000B devices. It is
currently maintained and tested by Ben Dooks, who should be CC'd on any
patches for this driver.
Defining the platform device
----------------------------
The minimum set of resources attached to the platform device are as follows:
1) The physical address of the address register
2) The physical address of the data register
3) The IRQ line the device's interrupt pin is connected to.
These resources should be specified in that order, as the ordering of the
two address regions is important (the driver expects these to be address
and then data).
An example from arch/arm/mach-s3c2410/mach-bast.c is::
static struct resource bast_dm9k_resource[] = {
[0] = {
.start = S3C2410_CS5 + BAST_PA_DM9000,
.end = S3C2410_CS5 + BAST_PA_DM9000 + 3,
.flags = IORESOURCE_MEM,
},
[1] = {
.start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40,
.end = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f,
.flags = IORESOURCE_MEM,
},
[2] = {
.start = IRQ_DM9000,
.end = IRQ_DM9000,
.flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,
}
};
static struct platform_device bast_device_dm9k = {
.name = "dm9000",
.id = 0,
.num_resources = ARRAY_SIZE(bast_dm9k_resource),
.resource = bast_dm9k_resource,
};
Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags,
as this will generate a warning if it is not present. The trigger from
the flags field will be passed to request_irq() when registering the IRQ
handler to ensure that the IRQ is setup correctly.
This shows a typical platform device, without the optional configuration
platform data supplied. The next example uses the same resources, but adds
the optional platform data to pass extra configuration data::
static struct dm9000_plat_data bast_dm9k_platdata = {
.flags = DM9000_PLATF_16BITONLY,
};
static struct platform_device bast_device_dm9k = {
.name = "dm9000",
.id = 0,
.num_resources = ARRAY_SIZE(bast_dm9k_resource),
.resource = bast_dm9k_resource,
.dev = {
.platform_data = &bast_dm9k_platdata,
}
};
The platform data is defined in include/linux/dm9000.h and described below.
Platform data
-------------
Extra platform data for the DM9000 can describe the IO bus width to the
device, whether or not an external PHY is attached to the device and
the availability of an external configuration EEPROM.
The flags for the platform data .flags field are as follows:
DM9000_PLATF_8BITONLY
The IO should be done with 8bit operations.
DM9000_PLATF_16BITONLY
The IO should be done with 16bit operations.
DM9000_PLATF_32BITONLY
The IO should be done with 32bit operations.
DM9000_PLATF_EXT_PHY
The chip is connected to an external PHY.
DM9000_PLATF_NO_EEPROM
This can be used to signify that the board does not have an
EEPROM, or that the EEPROM should be hidden from the user.
DM9000_PLATF_SIMPLE_PHY
Switch to using the simpler PHY polling method which does not
try to read the MII PHY state regularly. This is only available
when using the internal PHY. See the section on link state polling
for more information.
The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry
"Force simple NSR based PHY polling" allows this flag to be
forced on at build time.
PHY Link state polling
----------------------
The driver keeps track of the link state and informs the network core
about link (carrier) availability. This is managed by several methods
depending on the version of the chip and on which PHY is being used.
For the internal PHY, the original (and currently default) method is
to read the MII state, either when the status changes if we have the
necessary interrupt support in the chip or every two seconds via a
periodic timer.
To reduce the overhead for the internal PHY, there is now the option
of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY
platform data option to read the summary information without the
expensive MII accesses. This method is faster, but does not print
as much information.
When using an external PHY, the driver currently has to poll the MII
link status as there is no method for getting an interrupt on link change.
DM9000A / DM9000B
-----------------
These chips are functionally similar to the DM9000E and are supported easily
by the same driver. The features are:
1) Interrupt on internal PHY state change. This means that the periodic
polling of the PHY status may be disabled on these devices when using
the internal PHY.
2) TCP/UDP checksum offloading, which the driver does not currently support.
ethtool
-------
The driver supports the ethtool interface for access to the driver
state information, the PHY state and the EEPROM.
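For example, the generic ethtool invocations below could be used to query
the driver and link state and to dump the EEPROM contents (eth0 is an
assumed interface name)::

    ethtool eth0       # link and PHY state
    ethtool -i eth0    # driver information
    ethtool -e eth0    # EEPROM dump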
@@ -0,0 +1,189 @@
.. SPDX-License-Identifier: GPL-2.0
===================================
DEC EtherWORKS Ethernet De4x5 cards
===================================
Originally, this driver was written for the Digital Equipment
Corporation series of EtherWORKS Ethernet cards:
- DE425 TP/COAX EISA
- DE434 TP PCI
- DE435 TP/COAX/AUI PCI
- DE450 TP/COAX/AUI PCI
- DE500 10/100 PCI Fasternet
but it will now attempt to support all cards which conform to the
Digital Semiconductor SROM Specification. The driver currently
recognises the following chips:
- DC21040 (no SROM)
- DC21041[A]
- DC21140[A]
- DC21142
- DC21143
So far the driver is known to work with the following cards:
- KINGSTON
- Linksys
- ZNYX342
- SMC8432
- SMC9332 (w/new SROM)
- ZNYX31[45]
- ZNYX346 10/100 4 port (can act as a 10/100 bridge!)
The driver has been tested on a relatively busy network using the DE425,
DE434, DE435 and DE500 cards and benchmarked with 'ttcp': it transferred
16M of data to a DECstation 5000/200 as follows::
TCP UDP
TX RX TX RX
DE425 1030k 997k 1170k 1128k
DE434 1063k 995k 1170k 1125k
DE435 1063k 995k 1170k 1125k
DE500 1063k 998k 1170k 1125k in 10Mb/s mode
All values are typical (in kBytes/sec) from a sample of 4 for each
measurement. Their error is +/-20k on a quiet (private) network and also
depends on the load the CPU has.
----------------------------------------------------------------------------
The ability to load this driver as a loadable module has been included
and used extensively during the driver development (to save those long
reboot sequences). Loadable module support under PCI and EISA has been
achieved by letting the driver autoprobe as if it were compiled into the
kernel. Do make sure you're not sharing interrupts with anything that
cannot accommodate interrupt sharing!
To utilise this ability, you have to do 8 things:
0) have a copy of the loadable modules code installed on your system.
1) copy de4x5.c from the /linux/drivers/net directory to your favourite
temporary directory.
2) for fixed autoprobes (not recommended), edit the source code near
line 5594 to reflect the I/O address you're using, or assign these when
loading by::
insmod de4x5 io=0xghh where g = bus number
hh = device number
.. note::
autoprobing for modules is now supported by default. You may just
use::
insmod de4x5
to load all available boards. For a specific board, still use
the 'io=?' above.
3) compile de4x5.c, but include -DMODULE in the command line to ensure
that the correct bits are compiled (see end of source code).
4) if you want to add a new card, go to 5. Otherwise, recompile a
kernel with the de4x5 configuration turned off and reboot.
5) insmod de4x5 [io=0xghh]
6) run the net startup bits for your new eth?? interface(s) manually
(usually /etc/rc.inet[12] at boot time).
7) enjoy!
To unload a module, turn off the associated interface(s)
'ifconfig eth?? down' then 'rmmod de4x5'.
Automedia detection is included so that in principle you can disconnect
from, e.g. TP, reconnect to BNC and things will still work (after a
pause while the driver figures out where its media went). My tests
using ping showed that it appears to work....
By default, the driver will now autodetect any DECchip based card.
Should you have a need to restrict the driver to DIGITAL only cards, you
can compile with a DEC_ONLY define, or if loading as a module, use the
'dec_only=1' parameter.
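For example, a module load restricted to DIGITAL-only cards would be::

    insmod de4x5 dec_only=1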
I've changed the timing routines to use the kernel timer and scheduling
functions so that the hangs and other assorted problems that occurred
while autosensing the media should be gone. A bonus for the DC21040
auto media sense algorithm is that it can now use one that is more in
line with the rest (the DC21040 chip doesn't have a hardware timer).
The downside is the 1 'jiffies' (10ms) resolution.
IEEE 802.3u MII interface code has been added in anticipation that some
products may use it in the future.
The SMC9332 card has a non-compliant SROM which needs fixing - I have
patched this driver to detect it because the SROM format used complies
to a previous DEC-STD format.
I have removed the buffer copies needed for receive on Intels. I cannot
remove them for Alphas since the Tulip hardware only does longword
aligned DMA transfers and the Alphas get alignment traps with non
longword aligned data copies (which makes them really slow). No comment.
I have added SROM decoding routines to make this driver work with any
card that supports the Digital Semiconductor SROM spec. This will help
all cards running the dc2114x series chips in particular. Cards using
the dc2104x chips should run correctly with the basic driver. I'm in
debt to <mjacob@feral.com> for the testing and feedback that helped get
this feature working. So far we have tested KINGSTON, SMC8432, SMC9332
(with the latest SROM complying with the SROM spec V3: their first was
broken), ZNYX342 and LinkSys. ZNYX314 (dual 21041 MAC) and ZNYX 315
(quad 21041 MAC) cards also appear to work despite their incorrectly
wired IRQs.
I have added a temporary fix for interrupt problems when some SCSI cards
share the same interrupt as the DECchip based cards. The problem occurs
because the SCSI card wants to grab the interrupt as a fast interrupt
(runs the service routine with interrupts turned off) vs. this card
which really needs to run the service routine with interrupts turned on.
This driver will now add the interrupt service routine as a fast
interrupt if it is bounced from the slow interrupt. THIS IS NOT A
RECOMMENDED WAY TO RUN THE DRIVER and has been done for a limited time
until people sort out their compatibility issues and the kernel
interrupt service code is fixed. YOU SHOULD SEPARATE OUT THE FAST
INTERRUPT CARDS FROM THE SLOW INTERRUPT CARDS to ensure that they do not
run on the same interrupt. PCMCIA/CardBus is another can of worms...
Finally, I think I have really fixed the module loading problem with
more than one DECchip based card. As a side effect, I don't mess with
the device structure any more which means that if more than 1 card in
2.0.x is installed (4 in 2.1.x), the user will have to edit
linux/drivers/net/Space.c to make room for them. Hence, module loading
is the preferred way to use this driver, since it doesn't have this
limitation.
Where SROM media detection is used and full duplex is specified in the
SROM, the feature is ignored unless lp->params.fdx is set at compile
time OR during a module load (insmod de4x5 args='eth??:fdx' [see
below]). This is because there is no way to automatically detect full
duplex links except through autonegotiation. When I include the
autonegotiation feature in the SROM autoconf code, this detection will
occur automatically for that case.
Command line arguments are now allowed, similar to passing arguments
through LILO. This will allow a per adapter board set up of full duplex
and media. The only lexical constraints are: the board name (dev->name)
appears in the list before its parameters. The list of parameters ends
either at the end of the parameter list or with another board name. The
following parameters are allowed:
========= ===============================================
fdx for full duplex
autosense to set the media/speed; with the following
sub-parameters:
TP, TP_NW, BNC, AUI, BNC_AUI, 100Mb, 10Mb, AUTO
========= ===============================================
Case sensitivity is important for the sub-parameters. They *must* be
upper case. Examples::
insmod de4x5 args='eth1:fdx autosense=BNC eth0:autosense=100Mb'.
For a compiled in driver, in linux/drivers/net/CONFIG, place e.g.::
DE4X5_OPTS = -DDE4X5_PARM='"eth0:fdx autosense=AUI eth2:autosense=TP"'
Yes, I know full duplex isn't permissible on BNC or AUI; they're just
examples. By default, full duplex is turned off and AUTO is the default
autosense setting. In reality, I expect only the full duplex option to
be used. Note the use of single quotes in the two examples above and the
lack of commas to separate items.
@@ -0,0 +1,71 @@
.. SPDX-License-Identifier: GPL-2.0
==============================================================
Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux
==============================================================
Note: This driver doesn't have a maintainer.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
This driver provides kernel support for Davicom DM9102(A)/DM9132/DM9801 ethernet
cards (CNET 10/100 ethernet cards use the Davicom chipset too, so this driver
supports CNET cards as well). If you didn't compile this driver as a module, it
will automatically load itself on boot and print a line similar to::
dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17)
If you compiled this driver as a module, you have to load it after boot. You can load it with the command::
insmod dmfe
This way it will autodetect the device mode. This is the suggested way to load
the module. Alternatively, you can pass a mode= setting to the module while
loading, like::
insmod dmfe mode=0 # Force 10M Half Duplex
insmod dmfe mode=1 # Force 100M Half Duplex
insmod dmfe mode=4 # Force 10M Full Duplex
insmod dmfe mode=5 # Force 100M Full Duplex
Next you should configure your network interface with a command similar to::
ifconfig eth0 172.22.3.18
^^^^^^^^^^^
Your IP Address
Then you may have to modify the default routing table with the command::
route add default eth0
Now your ethernet card should be up and running.
TODO:
- Implement pci_driver::suspend() and pci_driver::resume() power management methods.
- Check on 64 bit boxes.
- Check and fix on big endian boxes.
- Test and make sure PCI latency is now correct for all cases.
Authors:
Sten Wang <sten_wang@davicom.com.tw>: Original Author
Contributors:
- Marcelo Tosatti <marcelo@conectiva.com.br>
- Alan Cox <alan@lxorguk.ukuu.org.uk>
- Jeff Garzik <jgarzik@pobox.com>
- Vojtech Pavlik <vojtech@suse.cz>
@@ -0,0 +1,314 @@
.. SPDX-License-Identifier: GPL-2.0
=========================================================
D-Link DL2000-based Gigabit Ethernet Adapter Installation
=========================================================
May 23, 2002
.. Contents
- Compatibility List
- Quick Install
- Compiling the Driver
- Installing the Driver
- Option parameter
- Configuration Script Sample
- Troubleshooting
Compatibility List
==================
Adapter Support:
- D-Link DGE-550T Gigabit Ethernet Adapter.
- D-Link DGE-550SX Gigabit Ethernet Adapter.
- D-Link DL2000-based Gigabit Ethernet Adapter.
The driver supports Linux kernel 2.4.7 and later. We have tested it
in the environments below.
. Red Hat v6.2 (update kernel to 2.4.7)
. Red Hat v7.0 (update kernel to 2.4.7)
. Red Hat v7.1 (kernel 2.4.7)
. Red Hat v7.2 (kernel 2.4.7-10)
Quick Install
=============
Install the Linux driver with the following commands::
1. make all
2. insmod dl2k.ko
3. ifconfig eth0 up 10.xxx.xxx.xxx netmask 255.0.0.0
^^^^^^^^^^^^^^^\ ^^^^^^^^\
IP NETMASK
Now eth0 should be active. You can test it with "ping" or get more information
with "ifconfig". If it tests OK, continue to the next step.
4. ``cp dl2k.ko /lib/modules/`uname -r`/kernel/drivers/net``
5. Add the following line to /etc/modprobe.d/dl2k.conf::
alias eth0 dl2k
6. Run ``depmod`` to update module indexes.
7. Run ``netconfig`` or ``netconf`` to create configuration script ifcfg-eth0
located at /etc/sysconfig/network-scripts or create it manually.
[see - Configuration Script Sample]
8. The driver will automatically load and be configured at the next boot.
Compiling the Driver
====================
In Linux, NIC drivers are most commonly configured as loadable modules.
The approach of building a monolithic kernel has become obsolete. The driver
can be compiled as part of a monolithic kernel, but this is strongly discouraged.
The remainder of this section assumes the driver is built as a loadable module.
In the Linux environment, it is a good idea to rebuild the driver from the
source instead of relying on a precompiled version. This approach provides
better reliability since a precompiled driver might depend on libraries or
kernel features that are not present in a given Linux installation.
The three files necessary to build the Linux device driver are dl2k.c, dl2k.h and
Makefile. To compile, the Linux installation must include the gcc compiler,
the kernel source, and the kernel headers. The Linux driver supports Linux
kernel 2.4.7 and later. Copy the files to a directory and enter the following command
to compile and link the driver:
CD-ROM drive
------------
::
[root@XXX /] mkdir cdrom
[root@XXX /] mount -r -t iso9660 -o conv=auto /dev/cdrom /cdrom
[root@XXX /] cd root
[root@XXX /root] mkdir dl2k
[root@XXX /root] cd dl2k
[root@XXX dl2k] cp /cdrom/linux/dl2k.tgz /root/dl2k
[root@XXX dl2k] tar xfvz dl2k.tgz
[root@XXX dl2k] make all
Floppy disc drive
-----------------
::
[root@XXX /] cd root
[root@XXX /root] mkdir dl2k
[root@XXX /root] cd dl2k
[root@XXX dl2k] mcopy a:/linux/dl2k.tgz /root/dl2k
[root@XXX dl2k] tar xfvz dl2k.tgz
[root@XXX dl2k] make all
Installing the Driver
=====================
Manual Installation
-------------------
Once the driver has been compiled, it must be loaded, enabled, and bound
to a protocol stack in order to establish network connectivity. To load a
module enter the command::
insmod dl2k.o
or::
insmod dl2k.o <optional parameter> ; add parameter
---------------------------------------------------------
example::
insmod dl2k.o media=100mbps_hd
or::
insmod dl2k.o media=3
or::
insmod dl2k.o media=3,2 ; for 2 cards
---------------------------------------------------------
Please reference the list of the command line parameters supported by
the Linux device driver below.
The insmod command only loads the driver and gives it a name of the form
eth0, eth1, etc. To bring the NIC into an operational state,
it is necessary to issue the following command::
ifconfig eth0 up
Finally, to bind the driver to the active protocol (e.g., TCP/IP with
Linux), enter the following command::
ifup eth0
Note that this is meaningful only if the system can find a configuration
script that contains the necessary network information. A sample will be
given in the next paragraph.
The commands to unload a driver are as follows::
ifdown eth0
ifconfig eth0 down
rmmod dl2k.o
The following are the commands to list the currently loaded modules and
to see the current network configuration::
lsmod
ifconfig
Automated Installation
----------------------
This section describes how to install the driver such that it is
automatically loaded and configured at boot time. The following description
is based on a Red Hat 6.0/7.0 distribution, but it can easily be ported to
other distributions as well.
Red Hat v6.x/v7.x
-----------------
1. Copy dl2k.o to the network modules directory, typically
/lib/modules/2.x.x-xx/net or /lib/modules/2.x.x/kernel/drivers/net.
2. Locate the boot module configuration file, most commonly in the
/etc/modprobe.d/ directory. Add the following lines::
alias ethx dl2k
options dl2k <optional parameters>
where ethx will be eth0 if the NIC is the only ethernet adapter, eth1 if
one other ethernet adapter is installed, etc. Refer to the table in the
previous section for the list of optional parameters.
3. Locate the network configuration scripts, normally the
/etc/sysconfig/network-scripts directory, and create a configuration
script named ifcfg-ethx that contains network information.
4. Note that for most Linux distributions, Red Hat included, a configuration
utility with a graphical user interface is provided to perform steps 2
and 3 above.
Parameter Description
=====================
You can install this driver without any additional parameters. However, if you
are going to use extended functions, it may be necessary to set extra
parameters. Below is a list of the command line parameters supported by the
Linux device driver.
=============================== ==============================================
mtu=packet_size Specifies the maximum packet size. Default
is 1500.
media=media_type Specifies the media type the NIC operates at.
autosense Autosensing active media.
=========== =========================
10mbps_hd 10Mbps half duplex.
10mbps_fd 10Mbps full duplex.
100mbps_hd 100Mbps half duplex.
100mbps_fd 100Mbps full duplex.
1000mbps_fd 1000Mbps full duplex.
1000mbps_hd 1000Mbps half duplex.
0 Autosensing active media.
1 10Mbps half duplex.
2 10Mbps full duplex.
3 100Mbps half duplex.
4 100Mbps full duplex.
5 1000Mbps half duplex.
6 1000Mbps full duplex.
=========== =========================
By default, the NIC operates at autosense.
1000mbps_fd and 1000mbps_hd types are only
available for fiber adapter.
vlan=n Specifies the VLAN ID. If vlan=0, the
Virtual Local Area Network (VLAN) function is
disabled.
jumbo=[0|1] Specifies the jumbo frame support. If jumbo=1,
the NIC accepts jumbo frames. By default, this
function is disabled.
Jumbo frames usually improve performance on
gigabit links.
This feature needs a jumbo-frame-compatible
remote end.
rx_coalesce=m Number of rx frames handled per interrupt.
rx_timeout=n Rx DMA wait time before an interrupt.
If rx_coalesce > 0 is set, the hardware asserts
an rx interrupt only once m frames have been
received or a timeout of n * 640 nanoseconds
is reached, whichever comes first.
Setting proper rx_coalesce and rx_timeout values
can reduce congestion and interrupt overload,
which have been a bottleneck for high speed
networks.
For example, with rx_coalesce=10 rx_timeout=800
the hardware asserts only one interrupt per
10 received frames or per 512 us timeout.
tx_coalesce=n Number of tx frames handled per interrupt.
Setting n > 1 can reduce the interrupt
congestion that usually lowers the performance
of high speed network cards. Default is 16.
tx_flow=[1|0] Specifies Tx flow control. If tx_flow=0,
Tx flow control is disabled; otherwise the
driver autodetects it.
rx_flow=[1|0] Specifies Rx flow control. If rx_flow=0,
Rx flow control is disabled; otherwise the
driver autodetects it.
=============================== ==============================================
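As a concrete illustration of the coalescing parameters above (the values
are taken from the table's own example, not a tuning recommendation)::

    insmod dl2k.o rx_coalesce=10 rx_timeout=800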
Configuration Script Sample
===========================
Here is a sample of a simple configuration script::
DEVICE=eth0
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
BROADCAST=207.200.5.255
NETWORK=207.200.5.0
NETMASK=255.255.255.0
IPADDR=207.200.5.2
Troubleshooting
===============
Q1. Source files contain ^M at the end of every line.
Make sure all files are in Unix file format (no CR characters). Try the following
shell command to convert files::
cat dl2k.c | col -b > dl2k.tmp
mv dl2k.tmp dl2k.c
OR::
cat dl2k.c | tr -d "\r" > dl2k.tmp
mv dl2k.tmp dl2k.c
Q2: Could not find header files (``*.h``)?
To compile the driver, you need kernel header files. After
installing the kernel source, the header files are usually located in
/usr/src/linux/include, which is the default include directory configured
in the Makefile. For some distributions, there is a copy of the header files
in /usr/src/include/linux and /usr/src/include/asm, so you can change the
INCLUDEDIR in the Makefile to /usr/include without installing the kernel
source. Note that RH 7.0 didn't provide correct header files in /usr/include;
including those files will build a wrong-version driver.
@@ -0,0 +1,269 @@
.. SPDX-License-Identifier: GPL-2.0
==============================
The QorIQ DPAA Ethernet Driver
==============================
Authors:
- Madalin Bucur <madalin.bucur@nxp.com>
- Camelia Groza <camelia.groza@nxp.com>
.. Contents
- DPAA Ethernet Overview
- DPAA Ethernet Supported SoCs
- Configuring DPAA Ethernet in your kernel
- DPAA Ethernet Frame Processing
- DPAA Ethernet Features
- DPAA IRQ Affinity and Receive Side Scaling
- Debugging
DPAA Ethernet Overview
======================
DPAA stands for Data Path Acceleration Architecture and it is a
set of networking acceleration IPs that are available on several
generations of SoCs, both on PowerPC and ARM64.
The Freescale DPAA architecture consists of a series of hardware blocks
that support Ethernet connectivity. The Ethernet driver depends upon the
following drivers in the Linux kernel:
- Peripheral Access Memory Unit (PAMU) (* needed only for PPC platforms)
drivers/iommu/fsl_*
- Frame Manager (FMan)
drivers/net/ethernet/freescale/fman
- Queue Manager (QMan), Buffer Manager (BMan)
drivers/soc/fsl/qbman
A simplified view of the dpaa_eth interfaces mapped to FMan MACs::
dpaa_eth /eth0\ ... /ethN\
driver | | | |
------------- ---- ----------- ---- -------------
-Ports / Tx Rx \ ... / Tx Rx \
FMan | | | |
-MACs | MAC0 | | MACN |
/ dtsec0 \ ... / dtsecN \ (or tgec)
/ \ / \(or memac)
--------- -------------- --- -------------- ---------
FMan, FMan Port, FMan SP, FMan MURAM drivers
---------------------------------------------------------
FMan HW blocks: MURAM, MACs, Ports, SP
---------------------------------------------------------
The dpaa_eth relation to the QMan, BMan and FMan::
________________________________
dpaa_eth / eth0 \
driver / \
--------- -^- -^- -^- --- ---------
QMan driver / \ / \ / \ \ / | BMan |
|Rx | |Rx | |Tx | |Tx | | driver |
--------- |Dfl| |Err| |Cnf| |FQs| | |
QMan HW |FQ | |FQ | |FQs| | | | |
/ \ / \ / \ \ / | |
--------- --- --- --- -v- ---------
| FMan QMI | |
| FMan HW FMan BMI | BMan HW |
----------------------- --------
where the acronyms used above (and in the code) are:
=============== ===========================================================
DPAA Data Path Acceleration Architecture
FMan DPAA Frame Manager
QMan DPAA Queue Manager
BMan DPAA Buffers Manager
QMI QMan interface in FMan
BMI BMan interface in FMan
FMan SP FMan Storage Profiles
MURAM Multi-user RAM in FMan
FQ QMan Frame Queue
Rx Dfl FQ default reception FQ
Rx Err FQ Rx error frames FQ
Tx Cnf FQ Tx confirmation FQs
Tx FQs transmission frame queues
dtsec datapath three speed Ethernet controller (10/100/1000 Mbps)
tgec ten gigabit Ethernet controller (10 Gbps)
memac multirate Ethernet MAC (10/100/1000/10000)
=============== ===========================================================
DPAA Ethernet Supported SoCs
============================
The DPAA drivers enable the Ethernet controllers present on the following SoCs:
PPC
- P1023
- P2041
- P3041
- P4080
- P5020
- P5040
- T1023
- T1024
- T1040
- T1042
- T2080
- T4240
- B4860
ARM
- LS1043A
- LS1046A
Configuring DPAA Ethernet in your kernel
========================================
To enable the DPAA Ethernet driver, the following Kconfig options are required::
# common for arch/arm64 and arch/powerpc platforms
CONFIG_FSL_DPAA=y
CONFIG_FSL_FMAN=y
CONFIG_FSL_DPAA_ETH=y
CONFIG_FSL_XGMAC_MDIO=y
# for arch/powerpc only
CONFIG_FSL_PAMU=y
# common options needed for the PHYs used on the RDBs
CONFIG_VITESSE_PHY=y
CONFIG_REALTEK_PHY=y
CONFIG_AQUANTIA_PHY=y
DPAA Ethernet Frame Processing
==============================
On Rx, buffers for the incoming frames are retrieved from the
dedicated interface buffer pool. The driver initializes and seeds this pool
with one-page buffers.
On Tx, all transmitted frames are returned to the driver through Tx
confirmation frame queues. The driver is then responsible for freeing the
buffers. In order to do this properly, a backpointer is added to the buffer
before transmission that points to the skb. When the buffer returns to the
driver on a confirmation FQ, the skb can be correctly consumed.
DPAA Ethernet Features
======================
Currently the DPAA Ethernet driver enables the basic features required for
a Linux Ethernet driver. The support for advanced features will be added
gradually.
The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx
checksum offload feature is enabled by default and cannot be controlled through
ethtool. Also, rx-flow-hash and rx-hashing were added. The addition of RSS
provides a significant performance boost for forwarding scenarios, allowing
different traffic flows received by one interface to be processed by different
CPUs in parallel.
The driver has support for multiple prioritized Tx traffic classes. Priorities
range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with
strict priority levels. Each traffic class contains NR_CPU TX queues. By
default, only one traffic class is enabled and the lowest priority Tx queues
are used. Higher priority traffic classes can be enabled with the mqprio
qdisc; an example command that enables all four traffic classes on an
interface is shown after the list below. skb priority levels are mapped to
traffic classes as follows:
* priorities 0 to 3 - traffic class 0 (low priority)
* priorities 4 to 7 - traffic class 1 (medium-low priority)
* priorities 8 to 11 - traffic class 2 (medium-high priority)
* priorities 12 to 15 - traffic class 3 (high priority)
::
tc qdisc add dev <int> root handle 1: \
mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
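The resulting traffic class setup can be verified afterwards with a
standard tc query (the interface name is an example)::

    # show the qdisc and traffic class mapping on the interface
    tc qdisc show dev eth0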
DPAA IRQ Affinity and Receive Side Scaling
==========================================
Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation
queues is seen by the CPU as ingress traffic on a certain portal.
The DPAA QMan portal interrupts are each affined to a certain CPU.
The same portal interrupt services all the QMan portal consumers.
By default the DPAA Ethernet driver enables RSS, making use of the
DPAA FMan Parser and Keygen blocks to distribute traffic on 128
hardware frame queues using a hash on IP v4/v6 source and destination
and L4 source and destination ports, if present in the received frame.
When RSS is disabled, all traffic received by a certain interface is
received on the default Rx frame queue. The default DPAA Rx frame
queues are configured to put the received traffic into a pool channel
that allows any available CPU portal to dequeue the ingress traffic.
The default frame queues have the HOLDACTIVE option set, ensuring that
traffic bursts from a certain queue are serviced by the same CPU.
This ensures a very low rate of frame reordering. A drawback of this
is that only one CPU at a time can service the traffic received by a
certain interface when RSS is not enabled.
To implement RSS, the DPAA Ethernet driver allocates an extra set of
128 Rx frame queues that are configured to dedicated channels, in a
round-robin manner. The mapping of the frame queues to CPUs is now
hardcoded; there is no indirection table to move traffic for a certain
FQ (hash result) to another CPU. The ingress traffic arriving on one
of these frame queues will arrive at the same portal and will always
be processed by the same CPU. This ensures intra-flow order preservation
and workload distribution for multiple traffic flows.
RSS can be turned off for a certain interface using ethtool, i.e.::
# ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6::
# ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
There is no independent control for individual protocols; any command
run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is
going to control the rx-flow-hashing for all protocols on that interface.
Besides using the FMan Keygen computed hash for spreading traffic on the
128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
the NETIF_F_RXHASH feature is on (active by default). This can be turned
on or off through ethtool, i.e.::
# ethtool -K fm1-mac9 rx-hashing off
# ethtool -k fm1-mac9 | grep hash
receive-hashing: off
# ethtool -K fm1-mac9 rx-hashing on
Actual changes:
receive-hashing: on
# ethtool -k fm1-mac9 | grep hash
receive-hashing: on
Please note that Rx hashing depends upon the rx-flow-hashing being on
for that interface - turning off rx-flow-hashing will also disable the
rx-hashing (without ethtool reporting it as off as that depends on the
NETIF_F_RXHASH feature flag).
Debugging
=========
The following statistics are exported for each interface through ethtool:
- interrupt count per CPU
- Rx packets count per CPU
- Tx packets count per CPU
- Tx confirmed packets count per CPU
- Tx S/G frames count per CPU
- Tx error count per CPU
- Rx error count per CPU
- Rx error count per type
- congestion related statistics:
- congestion status
- time spent in congestion
- number of times the device entered congestion
- dropped packets count per cause
The driver also exports the following information in sysfs:
- the FQ IDs for each FQ type
/sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/fqids
- the ID of the buffer pool in use
/sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/bpids
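For example, assuming an interface named fm1-mac9, the statistics and the
FQ and buffer pool IDs can be read as follows (paths abbreviated with
shell globs)::

    # fm1-mac9 is an example interface name
    ethtool -S fm1-mac9
    cat /sys/devices/platform/soc/*.fman/*.ethernet/dpaa-ethernet.*/net/fm1-mac9/fqids
    cat /sys/devices/platform/soc/*.fman/*.ethernet/dpaa-ethernet.*/net/fm1-mac9/bpids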

@@ -0,0 +1,160 @@
.. include:: <isonum.txt>
DPAA2 DPIO (Data Path I/O) Overview
===================================
:Copyright: |copy| 2016-2018 NXP
This document provides an overview of the Freescale DPAA2 DPIO
drivers
Introduction
============
A DPAA2 DPIO (Data Path I/O) is a hardware object that provides
interfaces to enqueue and dequeue frames to/from network interfaces
and other accelerators. A DPIO also provides hardware buffer
pool management for network interfaces.
This document provides an overview of the Linux DPIO driver, its
subcomponents, and its APIs.
See
Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
for a general overview of DPAA2 and the general DPAA2 driver architecture
in Linux.
Driver Overview
---------------
The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and
provides services that:
A. allow other drivers, such as the Ethernet driver, to enqueue and dequeue
frames for their respective objects
B. allow drivers to register callbacks for data availability notifications
when data becomes available on a queue or channel
C. allow drivers to manage hardware buffer pools
The Linux DPIO driver consists of 3 primary components--
DPIO object driver-- fsl-mc driver that manages the DPIO object
DPIO service-- provides APIs to other Linux drivers for services
QBman portal interface-- sends portal commands, gets responses::
fsl-mc other
bus drivers
| |
+---+----+ +------+-----+
|DPIO obj| |DPIO service|
| driver |---| (DPIO) |
+--------+ +------+-----+
|
+------+-----+
| QBman |
| portal i/f |
+------------+
|
hardware
The diagram below shows how the DPIO driver components fit with the other
DPAA2 Linux driver components::
+------------+
| OS Network |
| Stack |
+------------+ +------------+
| Allocator |. . . . . . . | Ethernet |
|(DPMCP,DPBP)| | (DPNI) |
+-.----------+ +---+---+----+
. . ^ |
. . <data avail, | |<enqueue,
. . tx confirm> | | dequeue>
+-------------+ . | |
| DPRC driver | . +--------+ +------------+
| (DPRC) | . . |DPIO obj| |DPIO service|
+----------+--+ | driver |-| (DPIO) |
| +--------+ +------+-----+
|<dev add/remove> +------|-----+
| | QBman |
+----+--------------+ | portal i/f |
| MC-bus driver | +------------+
| | |
| /soc/fsl-mc | |
+-------------------+ |
|
=========================================|=========|========================
+-+--DPIO---|-----------+
| | |
| QBman Portal |
+-----------------------+
============================================================================
DPIO Object Driver (dpio-driver.c)
----------------------------------
The dpio-driver component registers with the fsl-mc bus to handle objects of
type "dpio". The implementation of probe() handles basic initialization
of the DPIO including mapping of the DPIO regions (the QBman SW portal)
and initializing interrupts and registering irq handlers. The dpio-driver
registers the probed DPIO with dpio-service.
DPIO service (dpio-service.c, dpaa2-io.h)
------------------------------------------
The dpio service component provides queuing, notification, and buffer
management services to DPAA2 drivers, such as the Ethernet driver. A system
will typically allocate 1 DPIO object per CPU to allow queuing operations
to happen simultaneously across all CPUs.
Notification handling
dpaa2_io_service_register()
dpaa2_io_service_deregister()
dpaa2_io_service_rearm()
Queuing
dpaa2_io_service_pull_fq()
dpaa2_io_service_pull_channel()
dpaa2_io_service_enqueue_fq()
dpaa2_io_service_enqueue_qd()
dpaa2_io_store_create()
dpaa2_io_store_destroy()
dpaa2_io_store_next()
Buffer pool management
dpaa2_io_service_release()
dpaa2_io_service_acquire()
QBman portal interface (qbman-portal.c)
---------------------------------------
The qbman-portal component provides APIs to do the low level hardware
bit twiddling for operations such as:
- initializing Qman software portals
- building and sending portal commands
- portal interrupt configuration and processing
The qbman-portal APIs are not public to other drivers, and are
only used by dpio-service.
Other (dpaa2-fd.h, dpaa2-global.h)
----------------------------------
Frame descriptor and scatter-gather definitions and the APIs used to
manipulate them are defined in dpaa2-fd.h.
Dequeue result struct and parsing APIs are defined in dpaa2-global.h.

@@ -0,0 +1,186 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
===============================
DPAA2 Ethernet driver
===============================
:Copyright: |copy| 2017-2018 NXP
This file provides documentation for the Freescale DPAA2 Ethernet driver.
Supported Platforms
===================
This driver provides networking support for Freescale DPAA2 SoCs, e.g.
LS2080A, LS2088A, LS1088A.
Architecture Overview
=====================
Unlike regular NICs, in the DPAA2 architecture there is no single hardware block
representing network interfaces; instead, several separate hardware resources
work together to provide the networking functionality:
- network interfaces
- queues, channels
- buffer pools
- MAC/PHY
All hardware resources are allocated and configured through the Management
Complex (MC) portals. MC abstracts most of these resources as DPAA2 objects
and exposes ABIs through which they can be configured and controlled. A few
hardware resources, like queues, do not have a corresponding MC object and
are treated as internal resources of other objects.
For a more detailed description of the DPAA2 architecture and its object
abstractions see
*Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst*.
Each Linux net device is built on top of a Datapath Network Interface (DPNI)
object and uses Buffer Pools (DPBPs), I/O Portals (DPIOs) and Concentrators
(DPCONs).
Configuration interface::
-----------------------
| DPAA2 Ethernet Driver |
-----------------------
. . .
. . .
. . . . . . . . . . . .
. . .
. . .
---------- ---------- -----------
| DPBP API | | DPNI API | | DPCON API |
---------- ---------- -----------
. . . software
======= . ========== . ============ . ===================
. . . hardware
------------------------------------------
| MC hardware portals |
------------------------------------------
. . .
. . .
------ ------ -------
| DPBP | | DPNI | | DPCON |
------ ------ -------
The DPNIs are network interfaces without a direct one-to-one mapping to PHYs.
DPBPs represent hardware buffer pools. Packet I/O is performed in the context
of DPCON objects, using DPIO portals for managing and communicating with the
hardware resources.
Datapath (I/O) interface::
-----------------------------------------------
| DPAA2 Ethernet Driver |
-----------------------------------------------
| ^ ^ | |
| | | | |
enqueue| dequeue| data | dequeue| seed |
(Tx) | (Rx, TxC)| avail.| request| buffers|
| | notify| | |
| | | | |
V | | V V
-----------------------------------------------
| DPIO Driver |
-----------------------------------------------
| | | | | software
| | | | | ================
| | | | | hardware
-----------------------------------------------
| I/O hardware portals |
-----------------------------------------------
| ^ ^ | |
| | | | |
| | | V |
V | ================ V
---------------------- | -------------
queues ---------------------- | | Buffer pool |
---------------------- | -------------
=======================
Channel
Datapath I/O (DPIO) portals provide enqueue and dequeue services, data
availability notifications and buffer pool management. DPIOs are shared between
all DPAA2 objects (and implicitly all DPAA2 kernel drivers) that work with data
frames, but must be affine to the CPUs for the purpose of traffic distribution.
Frames are transmitted and received through hardware frame queues, which can be
grouped in channels for the purpose of hardware scheduling. The Ethernet driver
enqueues TX frames on egress queues and after transmission is complete a TX
confirmation frame is sent back to the CPU.
When frames are available on ingress queues, a data availability notification
is sent to the CPU; notifications are raised per channel, so even if multiple
queues in the same channel have available frames, only one notification is sent.
After a channel fires a notification, it must be explicitly rearmed.
Each network interface can have multiple Rx, Tx and confirmation queues affined
to CPUs, and one channel (DPCON) for each CPU that services at least one queue.
DPCONs are used to distribute ingress traffic to different CPUs via the cores'
affine DPIOs.
The role of hardware buffer pools is storage of ingress frame data. Each network
interface has a privately owned buffer pool which it seeds with kernel allocated
buffers.
DPNIs are decoupled from PHYs; a DPNI can be connected to a PHY through a DPMAC
object or to another DPNI through an internal link, but the connection is
managed by MC and completely transparent to the Ethernet driver.
::
--------- --------- ---------
| eth if1 | | eth if2 | | eth ifn |
--------- --------- ---------
. . .
. . .
. . .
---------------------------
| DPAA2 Ethernet Driver |
---------------------------
. . .
. . .
. . .
------ ------ ------ -------
| DPNI | | DPNI | | DPNI | | DPMAC |----+
------ ------ ------ ------- |
| | | | |
| | | | -----
=========== ================== | PHY |
-----
Creating a Network Interface
============================
A net device is created for each DPNI object probed on the MC bus. Each DPNI has
a number of properties which determine the network interface configuration
options and associated hardware resources.
DPNI objects (and the other DPAA2 objects needed for a network interface) can be
added to a container on the MC bus in one of two ways: statically, through a
Datapath Layout Binary file (DPL) that is parsed by MC at boot time; or created
dynamically at runtime, via the DPAA2 objects APIs.
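For example, dynamic creation can be done with NXP's userspace restool
utility (not part of the kernel tree; availability and exact options are
assumptions that may vary between restool versions)::

    # create a DPNI with default attributes in the current container
    restool dpni create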
Features & Offloads
===================
Hardware checksum offloading is supported for TCP and UDP over IPv4/6 frames.
The checksum offloads can be independently configured on RX and TX through
ethtool.
Hardware offload of unicast and multicast MAC filtering is supported on the
ingress path and permanently enabled.
Scatter-gather frames are supported on both RX and TX paths. On TX, SG support
is configurable via ethtool; on RX it is always enabled.
The DPAA2 hardware can process jumbo Ethernet frames of up to 10K bytes.
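For example, jumbo frames can be enabled through a standard MTU change
(interface name illustrative)::

    # raise the MTU; the hardware supports frames up to 10K bytes
    ip link set dev eth0 mtu 9000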
The Ethernet driver defines a static flow hashing scheme that distributes
traffic based on a 5-tuple key: src IP, dst IP, IP proto, L4 src port,
L4 dst port. No user configuration is supported for now.
Hardware specific statistics for the network interface as well as some
non-standard driver stats can be consulted through the ethtool -S option.

@@ -0,0 +1,11 @@
===================
DPAA2 Documentation
===================
.. toctree::
:maxdepth: 1
overview
dpio-driver
ethernet-driver
mac-phy-support

@@ -0,0 +1,191 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
=======================
DPAA2 MAC / PHY support
=======================
:Copyright: |copy| 2019 NXP
Overview
--------
The DPAA2 MAC / PHY support consists of a set of APIs that help DPAA2 network
drivers (dpaa2-eth, dpaa2-ethsw) interact with the PHY library.
DPAA2 Software Architecture
---------------------------
Among other DPAA2 objects, the fsl-mc bus exports DPNI objects (abstracting a
network interface) and DPMAC objects (abstracting a MAC). The dpaa2-eth driver
probes on the DPNI object and connects to and configures a DPMAC object with
the help of phylink.
Data connections may be established between a DPNI and a DPMAC, or between two
DPNIs. Depending on the connection type, the netif_carrier_[on/off] is handled
directly by the dpaa2-eth driver or by phylink.
.. code-block:: none
Sources of abstracted link state information presented by the MC firmware
+--------------------------------------+
+------------+ +---------+ | xgmac_mdio |
| net_device | | phylink |--| +-----+ +-----+ +-----+ +-----+ |
+------------+ +---------+ | | PHY | | PHY | | PHY | | PHY | |
| | | +-----+ +-----+ +-----+ +-----+ |
+------------------------------------+ | External MDIO bus |
| dpaa2-eth | +--------------------------------------+
+------------------------------------+
| | Linux
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | MC firmware
| /| V
+----------+ / | +----------+
| | / | | |
| | | | | |
| DPNI |<------| |<------| DPMAC |
| | | | | |
| | \ |<---+ | |
+----------+ \ | | +----------+
\| |
|
+--------------------------------------+
| MC firmware polling MAC PCS for link |
| +-----+ +-----+ +-----+ +-----+ |
| | PCS | | PCS | | PCS | | PCS | |
| +-----+ +-----+ +-----+ +-----+ |
| Internal MDIO bus |
+--------------------------------------+
Depending on an MC firmware configuration setting, each MAC may be in one of two modes:
- DPMAC_LINK_TYPE_FIXED: the link state management is handled exclusively by
the MC firmware by polling the MAC PCS. Since no phylink instance needs
to be registered, the dpaa2-eth driver does not bind to the connected
dpmac object at all.
- DPMAC_LINK_TYPE_PHY: The MC firmware is left waiting for link state update
events, but those are in fact passed strictly between the dpaa2-mac (based on
phylink) and its attached net_device driver (dpaa2-eth, dpaa2-ethsw),
effectively bypassing the firmware.
Implementation
--------------
At probe time or when a DPNI's endpoint is dynamically changed, the dpaa2-eth
driver is responsible for finding out if the peer object is a DPMAC and if this is the
case, to integrate it with PHYLINK using the dpaa2_mac_connect() API, which
will do the following:
- look up the device tree for a PHYLINK-compatible binding (phy-handle)
- create a PHYLINK instance associated with the received net_device
- connect to the PHY using phylink_of_phy_connect()
The following phylink_mac_ops callbacks are implemented:
- .validate() will populate the supported linkmodes with the MAC capabilities
only when the phy_interface_t is RGMII_* (at the moment, this is the only
link type supported by the driver).
- .mac_config() will configure the MAC in the new configuration using the
dpmac_set_link_state() MC firmware API.
- .mac_link_up() / .mac_link_down() will update the MAC link using the same
API described above.
At driver unbind() or when the DPNI object is disconnected from the DPMAC, the
dpaa2-eth driver calls dpaa2_mac_disconnect() which will, in turn, disconnect
from the PHY and destroy the PHYLINK instance.
In case of a DPNI-DPMAC connection, an 'ip link set dev eth0 up' would start
the following sequence of operations:
(1) phylink_start() called from .dev_open().
(2) The .mac_config() and .mac_link_up() callbacks are called by PHYLINK.
(3) In order to configure the HW MAC, the MC Firmware API
dpmac_set_link_state() is called.
(4) The firmware will eventually setup the HW MAC in the new configuration.
(5) A netif_carrier_on() call is made directly from PHYLINK on the associated
net_device.
(6) The dpaa2-eth driver handles the LINK_STATE_CHANGE irq in order to
enable/disable Rx taildrop based on the pause frame settings.
.. code-block:: none
+---------+ +---------+
| PHYLINK |-------------->| eth0 |
+---------+ (5) +---------+
(1) ^ |
| |
| v (2)
+-----------------------------------+
| dpaa2-eth |
+-----------------------------------+
| ^ (6)
| |
v (3) |
+---------+---------------+---------+
| DPMAC | | DPNI |
+---------+ +---------+
| MC Firmware |
+-----------------------------------+
|
|
v (4)
+-----------------------------------+
| HW MAC |
+-----------------------------------+
In case of a DPNI-DPNI connection, a usual sequence of operations looks like
the following:
(1) ip link set dev eth0 up
(2) The dpni_enable() MC API is called on the associated fsl_mc_device.
(3) ip link set dev eth1 up
(4) The dpni_enable() MC API is called on the associated fsl_mc_device.
(5) The LINK_STATE_CHANGED irq is received by both instances of the dpaa2-eth
driver because now the operational link state is up.
(6) The netif_carrier_on() is called on the exported net_device from
link_state_update().
.. code-block:: none
+---------+ +---------+
| eth0 | | eth1 |
+---------+ +---------+
| ^ ^ |
| | | |
(1) v | (6) (6) | v (3)
+---------+ +---------+
|dpaa2-eth| |dpaa2-eth|
+---------+ +---------+
| ^ ^ |
| | | |
(2) v | (5) (5) | v (4)
+---------+---------------+---------+
| DPNI | | DPNI |
+---------+ +---------+
| MC Firmware |
+-----------------------------------+
Exported API
------------
Any DPAA2 driver that drives endpoints of DPMAC objects should service its
_EVENT_ENDPOINT_CHANGED irq and connect/disconnect from the associated DPMAC
when necessary using the API listed below::
- int dpaa2_mac_connect(struct dpaa2_mac *mac);
- void dpaa2_mac_disconnect(struct dpaa2_mac *mac);
A phylink integration is necessary only when the partner DPMAC is not of TYPE_FIXED.
One can check for this condition using the below API::
- bool dpaa2_mac_is_type_fixed(struct fsl_mc_device *dpmac_dev, struct fsl_mc_io *mc_io);
Before connecting to a MAC, the caller must allocate and populate the
dpaa2_mac structure with the associated net_device, a pointer to the MC portal
to be used and the actual fsl_mc_device structure of the DPMAC.

@@ -0,0 +1,405 @@
.. include:: <isonum.txt>
=========================================================
DPAA2 (Data Path Acceleration Architecture Gen2) Overview
=========================================================
:Copyright: |copy| 2015 Freescale Semiconductor Inc.
:Copyright: |copy| 2018 NXP
This document provides an overview of the Freescale DPAA2 architecture
and how it is integrated into the Linux kernel.
Introduction
============
DPAA2 is a hardware architecture designed for high-speed network
packet processing. DPAA2 consists of sophisticated mechanisms for
processing Ethernet packets, queue management, buffer management,
autonomous L2 switching, virtual Ethernet bridging, and accelerator
(e.g. crypto) sharing.
A DPAA2 hardware component called the Management Complex (or MC) manages the
DPAA2 hardware resources. The MC provides an object-based abstraction for
software drivers to use the DPAA2 hardware.
The MC uses DPAA2 hardware resources such as queues, buffer pools, and
network ports to create functional objects/devices such as network
interfaces, an L2 switch, or accelerator instances.
The MC provides memory-mapped I/O command interfaces (MC portals)
which DPAA2 software drivers use to operate on DPAA2 objects.
The diagram below shows an overview of the DPAA2 resource management
architecture::
+--------------------------------------+
| OS |
| DPAA2 drivers |
| | |
+-----------------------------|--------+
|
| (create,discover,connect
| config,use,destroy)
|
DPAA2 |
+------------------------| mc portal |-+
| | |
| +- - - - - - - - - - - - -V- - -+ |
| | | |
| | Management Complex (MC) | |
| | | |
| +- - - - - - - - - - - - - - - -+ |
| |
| Hardware Hardware |
| Resources Objects |
| --------- ------- |
| -queues -DPRC |
| -buffer pools -DPMCP |
| -Eth MACs/ports -DPIO |
| -network interface -DPNI |
| profiles -DPMAC |
| -queue portals -DPBP |
| -MC portals ... |
| ... |
| |
+--------------------------------------+
The MC mediates operations such as create, discover,
connect, configuration, and destroy. Fast-path operations
on data, such as packet transmit/receive, are not mediated by
the MC and are done directly using memory mapped regions in
DPIO objects.
Overview of DPAA2 Objects
=========================
This section provides a brief overview of some key DPAA2 objects.
A simple scenario is described illustrating the objects involved
in creating a network interface.
DPRC (Datapath Resource Container)
----------------------------------
A DPRC is a container object that holds all the other
types of DPAA2 objects. In the example diagram below there
are 8 objects of 5 types (DPMCP, DPIO, DPBP, DPNI, and DPMAC)
in the container.
::
+---------------------------------------------------------+
| DPRC |
| |
| +-------+ +-------+ +-------+ +-------+ +-------+ |
| | DPMCP | | DPIO | | DPBP | | DPNI | | DPMAC | |
| +-------+ +-------+ +-------+ +---+---+ +---+---+ |
| | DPMCP | | DPIO | |
| +-------+ +-------+ |
| | DPMCP | |
| +-------+ |
| |
+---------------------------------------------------------+
From the point of view of an OS, a DPRC behaves similarly to a plug and
play bus, like PCI. DPRC commands can be used to enumerate the contents
of the DPRC and discover the hardware objects present (including mappable
regions and interrupts).
::
DPRC.1 (bus)
|
+--+--------+-------+-------+-------+
| | | | |
DPMCP.1 DPIO.1 DPBP.1 DPNI.1 DPMAC.1
DPMCP.2 DPIO.2
DPMCP.3
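On a running system, the objects in a container can be inspected through
the fsl-mc bus in sysfs, e.g.::

    # list the DPAA2 objects known to the fsl-mc bus
    ls /sys/bus/fsl-mc/devices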
Hardware objects can be created and destroyed dynamically, providing
the ability to hot plug/unplug objects in and out of the DPRC.
A DPRC has a mappable MMIO region (an MC portal) that can be used
to send MC commands. It has an interrupt for status events (like
hotplug).
All objects in a container share the same hardware "isolation context".
This means that with respect to an IOMMU the isolation granularity
is at the DPRC (container) level, not at the individual object
level.
DPRCs can be defined statically and populated with objects
via a config file passed to the MC when firmware starts it.
DPAA2 Objects for an Ethernet Network Interface
-----------------------------------------------
A typical Ethernet NIC is monolithic-- the NIC device contains TX/RX
queuing mechanisms, configuration mechanisms, buffer management,
physical ports, and interrupts. DPAA2 uses a more granular approach
utilizing multiple hardware objects. Each object provides specialized
functions. Groups of these objects are used by software to provide
Ethernet network interface functionality. This approach provides
efficient use of finite hardware resources, flexibility, and
performance advantages.
The diagram below shows the objects needed for a simple
network interface configuration on a system with 2 CPUs.
::
+---+---+ +---+---+
CPU0 CPU1
+---+---+ +---+---+
| |
+---+---+ +---+---+
DPIO DPIO
+---+---+ +---+---+
\ /
\ /
\ /
+---+---+
DPNI --- DPBP,DPMCP
+---+---+
|
|
+---+---+
DPMAC
+---+---+
|
port/PHY
The objects are described below. For each object a brief description
is provided along with a summary of the kinds of operations the object
supports and a summary of key resources of the object (MMIO regions
and IRQs).
DPMAC (Datapath Ethernet MAC)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Represents an Ethernet MAC, a hardware device that connects to an Ethernet
PHY and allows physical transmission and reception of Ethernet frames.
- MMIO regions: none
- IRQs: DPNI link change
- commands: set link up/down, link config, get stats,
IRQ config, enable, reset
DPNI (Datapath Network Interface)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contains TX/RX queues, network interface configuration, and RX buffer pool
configuration mechanisms. The TX/RX queues are in memory and are identified
by queue number.
- MMIO regions: none
- IRQs: link state
- commands: port config, offload config, queue config,
parse/classify config, IRQ config, enable, reset
DPIO (Datapath I/O)
~~~~~~~~~~~~~~~~~~~
Provides interfaces to enqueue and dequeue
packets and do hardware buffer pool management operations. The DPAA2
architecture separates the mechanism to access queues (the DPIO object)
from the queues themselves. The DPIO provides an MMIO interface to
enqueue/dequeue packets. To enqueue something a descriptor is written
to the DPIO MMIO region, which includes the target queue number.
There will typically be one DPIO assigned to each CPU. This allows all
CPUs to simultaneously perform enqueue/dequeue operations. DPIOs are
expected to be shared by different DPAA2 drivers.
- MMIO regions: queue operations, buffer management
- IRQs: data availability, congestion notification, buffer
pool depletion
- commands: IRQ config, enable, reset
DPBP (Datapath Buffer Pool)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Represents a hardware buffer pool.
- MMIO regions: none
- IRQs: none
- commands: enable, reset
DPMCP (Datapath MC Portal)
~~~~~~~~~~~~~~~~~~~~~~~~~~
Provides an MC command portal.
Used by drivers to send commands to the MC to manage
objects.
- MMIO regions: MC command portal
- IRQs: command completion
- commands: IRQ config, enable, reset
Object Connections
==================
Some objects have explicit relationships that must
be configured:
- DPNI <--> DPMAC
- DPNI <--> DPNI
- DPNI <--> L2-switch-port
A DPNI must be connected to something such as a DPMAC,
another DPNI, or L2 switch port. The DPNI connection
is made via a DPRC command.
::
+-------+ +-------+
| DPNI | | DPMAC |
+---+---+ +---+---+
| |
+==========+
- DPNI <--> DPBP
A network interface requires a 'buffer pool' (DPBP
object) which provides a list of pointers to memory
where received Ethernet data is to be copied. The
Ethernet driver configures the DPBPs associated with
the network interface.
Interrupts
==========
All interrupts generated by DPAA2 objects are message
interrupts. At the hardware level message interrupts
generated by devices will normally have 3 components--
1) a non-spoofable 'device-id' expressed on the hardware
bus, 2) an address, 3) a data value.
In the case of DPAA2 devices/objects, all objects in the
same container/DPRC share the same 'device-id'.
For ARM-based SoCs this is the same as the stream ID.
DPAA2 Linux Drivers Overview
============================
This section provides an overview of the Linux kernel drivers for
DPAA2-- 1) the bus driver and associated "DPAA2 infrastructure"
drivers and 2) functional object drivers (such as Ethernet).
As described previously, a DPRC is a container that holds the other
types of DPAA2 objects. It is functionally similar to a plug-and-play
bus controller.
Each object in the DPRC is a Linux "device" and is bound to a driver.
The diagram below shows the Linux drivers involved in a networking
scenario and the objects bound to each driver. A brief description
of each driver follows.
::
+------------+
| OS Network |
| Stack |
+------------+ +------------+
| Allocator |. . . . . . . | Ethernet |
|(DPMCP,DPBP)| | (DPNI) |
+-.----------+ +---+---+----+
. . ^ |
. . <data avail, | | <enqueue,
. . tx confirm> | | dequeue>
+-------------+ . | |
| DPRC driver | . +---+---V----+ +---------+
| (DPRC) | . . . . . .| DPIO driver| | MAC |
+----------+--+ | (DPIO) | | (DPMAC) |
| +------+-----+ +-----+---+
|<dev add/remove> | |
| | |
+--------+----------+ | +--+---+
| MC-bus driver | | | PHY |
| | | |driver|
| /bus/fsl-mc | | +--+---+
+-------------------+ | |
| |
========================= HARDWARE =========|=================|======
DPIO |
| |
DPNI---DPBP |
| |
DPMAC |
| |
PHY ---------------+
============================================|========================
A brief description of each driver is provided below.
MC-bus driver
-------------
The MC-bus driver is a platform driver and is probed from a
node in the device tree (compatible "fsl,qoriq-mc") passed in by boot
firmware. It is responsible for bootstrapping the DPAA2 kernel
infrastructure.
Key functions include:
- registering a new bus type named "fsl-mc" with the kernel,
and implementing bus call-backs (e.g. match/uevent/dev_groups)
- implementing APIs for DPAA2 driver registration and for device
add/remove
- creating an MSI IRQ domain
- doing a 'device add' to expose the 'root' DPRC, in turn triggering
a bind of the root DPRC to the DPRC driver
The binding for the MC-bus device-tree node can be consulted at
*Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt*.
The sysfs bind/unbind interfaces for the MC-bus can be consulted at
*Documentation/ABI/testing/sysfs-bus-fsl-mc*.
DPRC driver
-----------
The DPRC driver is bound to DPRC objects and does runtime management
of a bus instance. It performs the initial bus scan of the DPRC
and handles interrupts for container events such as hot plug by
re-scanning the DPRC.
Allocator
---------
Certain objects such as DPMCP and DPBP are generic and fungible,
and are intended to be used by other drivers. For example,
the DPAA2 Ethernet driver needs:
- DPMCPs to send MC commands, to configure network interfaces
- DPBPs for network buffer pools
The allocator driver registers for these allocatable object types
and those objects are bound to the allocator when the bus is probed.
The allocator maintains a pool of objects that are available for
allocation by other DPAA2 drivers.
DPIO driver
-----------
The DPIO driver is bound to DPIO objects and provides services that allow
other drivers such as the Ethernet driver to enqueue and dequeue data for
their respective objects.
Key services include:
- data availability notifications
- hardware queuing operations (enqueue and dequeue of data)
- hardware buffer pool management
To transmit a packet the Ethernet driver puts data on a queue and
invokes a DPIO API. For receive, the Ethernet driver registers
a data availability notification callback. To dequeue a packet
a DPIO API is used.
There is typically one DPIO object per physical CPU for optimum
performance, allowing different CPUs to simultaneously enqueue
and dequeue data.
The DPIO driver operates on behalf of all DPAA2 drivers
active in the kernel-- Ethernet, crypto, compression,
etc.
Ethernet driver
---------------
The Ethernet driver is bound to a DPNI and implements the kernel
interfaces needed to connect the DPAA2 network interface to
the network stack.
Each DPNI corresponds to a Linux network interface.
MAC driver
----------
An Ethernet PHY is an off-chip, board specific component and is managed
by the appropriate PHY driver via an mdio bus. The MAC driver
acts as a proxy between the PHY driver and the MC, communicating
with the MC via commands sent to a DPMAC object.
If the PHY driver signals a link change, the MAC driver notifies
the MC via a DPMAC command. If a network interface is brought
up or down, the MC notifies the DPMAC driver via an interrupt and
the driver can take appropriate action.

@@ -0,0 +1,51 @@
.. SPDX-License-Identifier: GPL-2.0
===========================
The Gianfar Ethernet Driver
===========================
:Author: Andy Fleming <afleming@freescale.com>
:Updated: 2005-07-28
Checksum Offloading
===================
The eTSEC controller (first included in parts from late 2005 like
the 8548) has the ability to perform TCP, UDP, and IP checksums
in hardware. The Linux kernel only offloads the TCP and UDP
checksums (and always performs the pseudo header checksums), so
the driver only supports checksumming for TCP/IP and UDP/IP
packets. Use ethtool to enable or disable this feature for RX
and TX.
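For example, both offloads can be toggled with ethtool (the interface
name is an example)::

    # enable RX and TX checksum offload
    ethtool -K eth0 rx on tx on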
VLAN
====
In order to use VLAN, please consult Linux documentation on
configuring VLANs. The gianfar driver supports hardware insertion and
extraction of VLAN headers, but not filtering. Filtering will be
done by the kernel.
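For example, a VLAN sub-interface can be created on top of a gianfar
interface with iproute2 (names illustrative)::

    # create VLAN 100 on top of eth0
    ip link add link eth0 name eth0.100 type vlan id 100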
Multicasting
============
The gianfar driver supports using the group hash table on the
TSEC (and the extended hash table on the eTSEC) for multicast
filtering. On the eTSEC, the exact-match MAC registers are used
before the hash tables. See Linux documentation on how to join
multicast groups.
Padding
=======
The gianfar driver supports padding received frames with 2 bytes
to align the IP header to a 16-byte boundary, when supported by
hardware.
Ethtool
=======
The gianfar driver supports the use of ethtool for many
configuration options. You must run ethtool only on currently
open interfaces. See ethtool documentation for details.

@@ -0,0 +1,123 @@
.. SPDX-License-Identifier: GPL-2.0+
==============================================================
Linux kernel driver for Compute Engine Virtual Ethernet (gve):
==============================================================
Supported Hardware
===================
The GVE driver binds to a single PCI device id used by the virtual
Ethernet device found in some Compute Engine VMs.
+--------------+----------+---------+
|Field | Value | Comments|
+==============+==========+=========+
|Vendor ID | `0x1AE0` | Google |
+--------------+----------+---------+
|Device ID | `0x0042` | |
+--------------+----------+---------+
|Sub-vendor ID | `0x1AE0` | Google |
+--------------+----------+---------+
|Sub-device ID | `0x0058` | |
+--------------+----------+---------+
|Revision ID | `0x0` | |
+--------------+----------+---------+
|Device Class | `0x200` | Ethernet|
+--------------+----------+---------+
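For example, the presence of the device can be confirmed with lspci,
using the vendor and device IDs from the table above::

    lspci -d 1ae0:0042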
PCI Bars
========
The gVNIC PCI device exposes three 32-bit memory BARs:
- Bar0 - Device configuration and status registers.
- Bar1 - MSI-X vector table
- Bar2 - IRQ, RX and TX doorbells
Device Interactions
===================
The driver interacts with the device in the following ways:
- Registers
- A block of MMIO registers
- See gve_register.h for more detail
- Admin Queue
- See description below
- Reset
- At any time the device can be reset
- Interrupts
- See supported interrupts below
- Transmit and Receive Queues
- See description below
Registers
---------
All registers are MMIO and big endian.
The registers are used for initializing and configuring the device as well as
querying device status in response to management interrupts.
Admin Queue (AQ)
----------------
The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ
commands, used by the driver to issue commands to the device and set up
resources. The driver and the device maintain a count of how many commands
have been submitted and executed. To issue AQ commands, the driver must do
the following (with proper locking):
1) Copy new commands into next available slots in the AQ array
2) Increment its counter by the number of new commands
3) Write the counter into the GVE_ADMIN_QUEUE_DOORBELL register
4) Poll the ADMIN_QUEUE_EVENT_COUNTER register until it equals
the value written to the doorbell, or until a timeout.
The device will update the status field in each AQ command reported as
executed through the ADMIN_QUEUE_EVENT_COUNTER register.
Device Resets
-------------
A device reset is triggered by writing 0x0 to the AQ PFN register.
This causes the device to release all resources allocated by the
driver, including the AQ itself.
Interrupts
----------
The following interrupts are supported by the driver:
Management Interrupt
~~~~~~~~~~~~~~~~~~~~
The management interrupt is used by the device to tell the driver to
look at the GVE_DEVICE_STATUS register.
The handler for the management irq simply queues the service task in
the workqueue to check the register and acks the irq.
Notification Block Interrupts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The notification block interrupts are used to tell the driver to poll
the queues associated with that interrupt.
The handler for these irqs schedules the napi for that block to run
and poll the queues.
Traffic Queues
--------------
gVNIC's queues are composed of a descriptor ring and a buffer and are
assigned to a notification block.
The descriptor rings are power-of-two-sized ring buffers consisting of
fixed-size descriptors. They advance their head pointer using a __be32
doorbell located in Bar2. The tail pointers are advanced by consuming
descriptors in-order and updating a __be32 counter. Both the doorbell
and the counter overflow to zero.
Each queue's buffers must be registered in advance with the device as a
queue page list, and packet data can only be put in those pages.
Transmit
~~~~~~~~
gve maps the buffers for transmit rings into a FIFO and copies the packets
into the FIFO before sending them to the NIC.
Receive
~~~~~~~
The buffers for receive rings are put into a data ring that is the same
length as the descriptor ring and the head and tail pointers advance over
the rings together.

@@ -0,0 +1,58 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
Ethernet Device Drivers
=======================
Device drivers for Ethernet and Ethernet-based virtual function devices.
Contents:
.. toctree::
:maxdepth: 2
3com/3c509
3com/vortex
amazon/ena
aquantia/atlantic
chelsio/cxgb
cirrus/cs89x0
davicom/dm9000
dec/de4x5
dec/dmfe
dlink/dl2k
freescale/dpaa
freescale/dpaa2/index
freescale/gianfar
google/gve
intel/e100
intel/e1000
intel/e1000e
intel/fm10k
intel/igb
intel/igbvf
intel/ixgb
intel/ixgbe
intel/ixgbevf
intel/i40e
intel/iavf
intel/ice
marvell/octeontx2
mellanox/mlx5
microsoft/netvsc
neterion/s2io
neterion/vxge
netronome/nfp
pensando/ionic
smsc/smc9
stmicro/stmmac
ti/cpsw
ti/cpsw_switchdev
ti/tlan
toshiba/spider_net
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

@@ -0,0 +1,188 @@
.. SPDX-License-Identifier: GPL-2.0+
=============================================================
Linux Base Driver for the Intel(R) PRO/100 Family of Adapters
=============================================================
June 1, 2018
Contents
========
- In This Release
- Identifying Your Adapter
- Building and Installation
- Driver Configuration Parameters
- Additional Configurations
- Known Issues
- Support
In This Release
===============
This file describes the Linux Base Driver for the Intel(R) PRO/100 Family of
Adapters. This driver includes support for Itanium(R)2-based systems.
For questions related to hardware requirements, refer to the documentation
supplied with your Intel PRO/100 adapter.
The following features are now available in supported kernels:
- Native VLANs
- Channel Bonding (teaming)
- SNMP
Channel Bonding documentation can be found in the Linux kernel source:
/Documentation/networking/bonding.rst
Identifying Your Adapter
========================
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
http://www.intel.com/support
Driver Configuration Parameters
===============================
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
Rx Descriptors:
Number of receive descriptors. A receive descriptor is a data
structure that describes a receive buffer and its attributes to the network
controller. The data in the descriptor is used by the controller to write
data from the controller to host memory. In the 3.x.x driver the valid range
for this parameter is 64-256. The default value is 256. This parameter can be
changed using the command::
ethtool -G eth? rx n
Where n is the number of desired Rx descriptors.
Tx Descriptors:
Number of transmit descriptors. A transmit descriptor is a data
structure that describes a transmit buffer and its attributes to the network
controller. The data in the descriptor is used by the controller to read
data from the host memory to the controller. In the 3.x.x driver the valid
range for this parameter is 64-256. The default value is 128. This parameter
can be changed using the command::
ethtool -G eth? tx n
Where n is the number of desired Tx descriptors.
Speed/Duplex:
The driver auto-negotiates the link speed and duplex settings by
default. The ethtool utility can be used as follows to force speed/duplex::
ethtool -s eth? autoneg off speed {10|100} duplex {full|half}
NOTE: setting the speed/duplex to incorrect values will cause the link to
fail.
Event Log Message Level:
The driver uses the message level flag to log events
to syslog. The message level can be set at driver load time. It can also be
set using the command::
ethtool -s eth? msglvl n
Additional Configurations
=========================
Configuring the Driver on Different Distributions
-------------------------------------------------
Configuring a network driver to load properly when the system is started
is distribution dependent. Typically, the configuration process involves
adding an alias line to `/etc/modprobe.d/*.conf` as well as editing other
system startup scripts and/or configuration files. Many popular Linux
distributions ship with tools to make these changes for you. To learn
the proper way to configure a network device for your system, refer to
your distribution documentation. If during this process you are asked
for the driver or module name, the name for the Linux Base Driver for
the Intel PRO/100 Family of Adapters is e100.
As an example, if you install the e100 driver for two PRO/100 adapters
(eth0 and eth1), add the following to a configuration file in
/etc/modprobe.d/::
alias eth0 e100
alias eth1 e100
Viewing Link Messages
---------------------
In order to see link messages and other Intel driver information on your
console, you must set the dmesg level up to six. This can be done by
entering the following on the command line before loading the e100
driver::
dmesg -n 6
If you wish to see all messages issued by the driver, including debug
messages, set the dmesg level to eight.
NOTE: This setting is not saved across reboots.
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. ethtool
version 1.6 or later is required for this functionality.
The latest release of ethtool can be found at
https://www.kernel.org/pub/software/network/ethtool/
Enabling Wake on LAN (WoL)
--------------------------
WoL is provided through the ethtool utility. For instructions on
enabling WoL with ethtool, refer to the ethtool man page. WoL will be
enabled on the system during the next shut down or reboot. For this
driver version, in order to enable WoL, the e100 driver must be loaded
when shutting down or rebooting the system.
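A typical invocation to enable wake on magic packet looks like the
following (interface name illustrative; see the ethtool man page for the
available WoL options)::

    # g = wake on magic packet
    ethtool -s eth0 wol g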
NAPI
----
NAPI (Rx polling mode) is supported in the e100 driver.
See https://wiki.linuxfoundation.org/networking/napi for more
information on NAPI.
Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
Due to the default ARP behavior on Linux, it is not possible to have one
system on two IP networks in the same Ethernet broadcast domain
(non-partitioned switch) behave as expected. All Ethernet interfaces
will respond to IP traffic for any IP address assigned to the system.
This results in unbalanced receive traffic.
If you have multiple interfaces in a server, either turn on ARP
filtering by
(1) entering::
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
(this only works if your kernel's version is higher than 2.4.5), or
(2) installing the interfaces in separate broadcast domains (either
in different switches or in a switch partitioned to VLANs).
Support
=======
For general information, go to the Intel support website at:
http://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
http://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.

@@ -0,0 +1,463 @@
.. SPDX-License-Identifier: GPL-2.0+
==========================================================
Linux Base Driver for Intel(R) Ethernet Network Connection
==========================================================
Intel Gigabit Linux driver.
Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Command Line Parameters
- Speed and Duplex Configuration
- Additional Configurations
- Support
Identifying Your Adapter
========================
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
http://support.intel.com/support/go/network/adapter/idguide.htm
For the latest Intel network drivers for Linux, refer to the following
website. In the search field, enter your adapter name or type, or use the
networking link on the left to search for your adapter:
http://support.intel.com/support/go/network/adapter/home.htm
Command Line Parameters
=======================
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
NOTES:
For more information about the AutoNeg, Duplex, and Speed
parameters, see the "Speed and Duplex Configuration" section in
this document.
For more information about the InterruptThrottleRate,
RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
parameters, see the application note at:
http://www.intel.com/design/network/applnots/ap450.htm
AutoNeg
-------
(Supported only on adapters with copper connections)
:Valid Range: 0x01-0x0F, 0x20-0x2F
:Default Value: 0x2F
This parameter is a bit-mask that specifies the speed and duplex settings
advertised by the adapter. When this parameter is used, the Speed and
Duplex parameters must not be specified.
NOTE:
Refer to the Speed and Duplex section of this readme for more
information on the AutoNeg parameter.
Duplex
------
(Supported only on adapters with copper connections)
:Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full)
:Default Value: 0
This defines the direction in which data is allowed to flow. It can be
either one- or two-directional. If both Duplex and the link partner are
set to auto-negotiate, the board auto-detects the correct duplex. If the
link partner is forced (either full or half), Duplex defaults to half-
duplex.
FlowControl
-----------
:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
:Default Value: Reads flow control settings from the EEPROM
This parameter controls the automatic generation (Tx) and response (Rx)
to Ethernet PAUSE frames.
InterruptThrottleRate
---------------------
(not supported on Intel(R) 82542, 82543 or 82544-based adapters)
:Valid Range:
0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
4=simplified balancing)
:Default Value: 3
The driver can limit the number of interrupts per second that the adapter
will generate for incoming packets. It does this by writing a value to the
adapter that is based on the maximum number of interrupts that the
adapter will generate per second.
Setting InterruptThrottleRate to a value greater or equal to 100
will program the adapter to send out a maximum of that many interrupts
per second, even if more packets have come in. This reduces interrupt
load on the system and can lower CPU utilization under heavy load,
but will increase latency as packets are not processed as quickly.
The default behaviour of the driver previously assumed a static
InterruptThrottleRate value of 8000, providing a good fallback value for
all traffic types, but lacking in small packet performance and latency.
The hardware can handle many more small packets per second however, and
for this reason an adaptive interrupt moderation algorithm was implemented.
Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which
it dynamically adjusts the InterruptThrottleRate value based on the traffic
that it receives. After determining the type of incoming traffic in the last
timeframe, it will adjust the InterruptThrottleRate to an appropriate value
for that traffic.
The algorithm classifies the incoming traffic every interval into
classes. Once the class is determined, the InterruptThrottleRate value is
adjusted to suit that traffic type the best. There are three classes defined:
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
for small amounts of traffic and/or a significant percentage of small
packets; and "Lowest latency", for almost completely small packets or
minimal traffic.
In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
latency" or "Lowest latency" class, the InterruptThrottleRate is increased
stepwise to 20000. This default mode is suitable for most applications.
For situations where low latency is vital such as cluster or
grid computing, the algorithm can reduce latency even more when
InterruptThrottleRate is set to mode 1. In this mode, which operates
the same as mode 3, the InterruptThrottleRate will be increased stepwise to
70000 for traffic in class "Lowest latency".
In simplified mode the interrupt rate is based on the ratio of TX and
RX traffic. If the bytes per second rate is approximately equal, the
interrupt rate will drop as low as 2000 interrupts per second. If the
traffic is mostly transmit or mostly receive, the interrupt rate could
be as high as 8000.
Setting InterruptThrottleRate to 0 turns off any interrupt moderation
and may improve small packet latency, but is generally not suitable
for bulk throughput traffic.
NOTE:
InterruptThrottleRate takes precedence over the TxAbsIntDelay and
RxAbsIntDelay parameters. In other words, minimizing the receive
and/or transmit absolute delays does not force the controller to
generate more interrupts than what the Interrupt Throttle Rate
allows.
CAUTION:
If you are using the Intel(R) PRO/1000 CT Network Connection
(controller 82547), setting InterruptThrottleRate to a value
greater than 75,000 may cause adapters to hang (stop transmitting)
under certain network conditions. If this occurs, a NETDEV
WATCHDOG message is logged in the system event log. In
addition, the controller is automatically reset, restoring
the network connection. To eliminate the potential for the
hang, ensure that InterruptThrottleRate is set no greater
than 75,000 and is not set to 0.
NOTE:
When e1000 is loaded with default settings and multiple adapters
are in use simultaneously, the CPU utilization may increase non-
linearly. In order to limit the CPU utilization without impacting
the overall throughput, we recommend that you load the driver as
follows::
modprobe e1000 InterruptThrottleRate=3000,3000,3000
This sets the InterruptThrottleRate to 3000 interrupts/sec for
the first, second, and third instances of the driver. The range
of 2000 to 3000 interrupts per second works on a majority of
systems and is a good starting point, but the optimal value will
be platform-specific. If CPU utilization is not a concern, use
RX_POLLING (NAPI) and default driver settings.
RxDescriptors
-------------
:Valid Range:
- 48-256 for 82542 and 82543-based adapters
- 48-4096 for all other supported adapters
:Default Value: 256
This value specifies the number of receive buffer descriptors allocated
by the driver. Increasing this value allows the driver to buffer more
incoming packets, at the expense of increased system memory utilization.
Each descriptor is 16 bytes. A receive buffer is also allocated for each
descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
on the MTU setting. The maximum MTU size is 16110.
NOTE:
MTU designates the frame size. It only needs to be set for Jumbo
Frames. Depending on the available system resources, the request
for a higher number of receive descriptors may be denied. In this
case, use a lower number.
RxIntDelay
----------
:Valid Range: 0-65535 (0=off)
:Default Value: 0
This value delays the generation of receive interrupts in units of 1.024
microseconds. Receive interrupt reduction can improve CPU efficiency if
properly tuned for specific network traffic. Increasing this value adds
extra latency to frame reception and can end up decreasing the throughput
of TCP traffic. If the system is reporting dropped receives, this value
may be set too high, causing the driver to run out of available receive
descriptors.
CAUTION:
When setting RxIntDelay to a value other than 0, adapters may
hang (stop transmitting) under certain network conditions. If
this occurs a NETDEV WATCHDOG message is logged in the system
event log. In addition, the controller is automatically reset,
restoring the network connection. To eliminate the potential
for the hang ensure that RxIntDelay is set to 0.
RxAbsIntDelay
-------------
(This parameter is supported only on 82540, 82545 and later adapters.)
:Valid Range: 0-65535 (0=off)
:Default Value: 128
This value, in units of 1.024 microseconds, limits the delay in which a
receive interrupt is generated. Useful only if RxIntDelay is non-zero,
this value ensures that an interrupt is generated after the initial
packet is received within the set amount of time. Proper tuning,
along with RxIntDelay, may improve traffic throughput in specific network
conditions.
Speed
-----
(This parameter is supported only on adapters with copper connections.)
:Valid Settings: 0, 10, 100, 1000
:Default Value: 0 (auto-negotiate at all supported speeds)
Speed forces the line speed to the specified value in megabits per second
(Mbps). If this parameter is not specified or is set to 0 and the link
partner is set to auto-negotiate, the board will auto-detect the correct
speed. Duplex should also be set when Speed is set to either 10 or 100.
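As an illustration, forcing the first port to 100 Mbps full duplex might
look as follows (a sketch; it assumes the driver's Duplex parameter,
where a value of 2 selects full duplex)::

  modprobe e1000 Speed=100 Duplex=2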
TxDescriptors
-------------
:Valid Range:
- 48-256 for 82542 and 82543-based adapters
- 48-4096 for all other supported adapters
:Default Value: 256
This value is the number of transmit descriptors allocated by the driver.
Increasing this value allows the driver to queue more transmits. Each
descriptor is 16 bytes.
NOTE:
Depending on the available system resources, the request for a
higher number of transmit descriptors may be denied. In this case,
use a lower number.
TxIntDelay
----------
:Valid Range: 0-65535 (0=off)
:Default Value: 8
This value delays the generation of transmit interrupts in units of
1.024 microseconds. Transmit interrupt reduction can improve CPU
efficiency if properly tuned for specific network traffic. If the
system is reporting dropped transmits, this value may be set too high
causing the driver to run out of available transmit descriptors.
TxAbsIntDelay
-------------
(This parameter is supported only on 82540, 82545 and later adapters.)
:Valid Range: 0-65535 (0=off)
:Default Value: 32
This value, in units of 1.024 microseconds, limits the delay in which a
transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
this value ensures that an interrupt is generated after the initial
packet is sent on the wire within the set amount of time. Proper tuning,
along with TxIntDelay, may improve traffic throughput in specific
network conditions.
XsumRX
------
(This parameter is NOT supported on the 82542-based adapter.)
:Valid Range: 0-1
:Default Value: 1
A value of '1' indicates that the driver should enable IP checksum
offload for received packets (both UDP and TCP) to the adapter hardware.
Copybreak
---------
:Valid Range: 0-xxxxxxx (0=off)
:Default Value: 256
:Usage: modprobe e1000 copybreak=128
The driver copies all packets at or below this size to a fresh RX
buffer before handing them up the stack.
This parameter differs from other parameters in that it is a
single (not 1,1,1 etc.) parameter applied to all driver instances and
it is also available during runtime at
/sys/module/e1000/parameters/copybreak
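For example, the value can be inspected and changed at runtime as
follows (a sketch using the sysfs path above)::

  cat /sys/module/e1000/parameters/copybreak
  echo 128 > /sys/module/e1000/parameters/copybreak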
SmartPowerDownEnable
--------------------
:Valid Range: 0-1
:Default Value: 0 (disabled)
Allows the PHY to turn off in lower power states. The user can turn off
this parameter in supported chipsets.
Speed and Duplex Configuration
==============================
Three keywords are used to control the speed and duplex configuration.
These keywords are Speed, Duplex, and AutoNeg.
If the board uses a fiber interface, these keywords are ignored, and the
fiber interface board only links at 1000 Mbps full-duplex.
For copper-based boards, the keywords interact as follows:
- The default operation is auto-negotiate. The board advertises all
supported speed and duplex combinations, and it links at the highest
common speed and duplex mode IF the link partner is set to auto-negotiate.
- If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps
  is advertised. (The 1000BaseT spec requires auto-negotiation.)
- If Speed = 10 or 100, then both Speed and Duplex should be set. Auto-
negotiation is disabled, and the AutoNeg parameter is ignored. Partner
SHOULD also be forced.
The AutoNeg parameter is used when more control is required over the
auto-negotiation process. It should be used when you wish to control which
speed and duplex combinations are advertised during the auto-negotiation
process.
The parameter may be specified as either a decimal or hexadecimal value as
determined by the bitmap below.
============== ====== ====== ======= ======= ====== ====== ======= ======
Bit position   7      6      5       4       3      2      1       0
Decimal Value  128    64     32      16      8      4      2       1
Hex value      80     40     20      10      8      4      2       1
Speed (Mbps)   N/A    N/A    1000    N/A     100    100    10      10
Duplex                       Full            Full   Half   Full    Half
============== ====== ====== ======= ======= ====== ====== ======= ======
Some examples of using AutoNeg::
modprobe e1000 AutoNeg=0x01 (Restricts autonegotiation to 10 Half)
modprobe e1000 AutoNeg=1 (Same as above)
modprobe e1000 AutoNeg=0x02 (Restricts autonegotiation to 10 Full)
modprobe e1000 AutoNeg=0x03 (Restricts autonegotiation to 10 Half or 10 Full)
modprobe e1000 AutoNeg=0x04 (Restricts autonegotiation to 100 Half)
modprobe e1000 AutoNeg=0x05 (Restricts autonegotiation to 10 Half or 100 Half)
modprobe e1000 AutoNeg=0x020 (Restricts autonegotiation to 1000 Full)
modprobe e1000 AutoNeg=32 (Same as above)
Note that when this parameter is used, Speed and Duplex must not be specified.
If the link partner is forced to a specific speed and duplex, then this
parameter should not be used. Instead, use the Speed and Duplex parameters
previously mentioned to force the adapter to the same speed and duplex.
Additional Configurations
=========================
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the MTU to a value larger than
the default of 1500. Use the ifconfig command to increase the MTU size.
For example::
ifconfig eth<x> mtu 9000 up
This setting is not saved across reboots. It can be made permanent if
you add::
MTU=9000
to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example
applies to the Red Hat distributions; other distributions may store this
setting in a different location.
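Alternatively, on distributions using iproute2, the MTU can be raised at
runtime with the ip command, as shown for other drivers later in this
document::

  ip link set mtu 9000 dev eth<x>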
Notes:
Degradation in throughput performance may be observed in some Jumbo frames
environments. If this is observed, increasing the application's socket buffer
size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
See the specific application manual and /usr/src/linux*/Documentation/
networking/ip-sysctl.txt for more details.
- The maximum MTU setting for Jumbo Frames is 16110. This value coincides
with the maximum Jumbo Frames size of 16128.
- Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
poor performance or loss of link.
- Adapters based on the Intel(R) 82542 and 82573V/E controller do not
support Jumbo Frames. These correspond to the following product names::
Intel(R) PRO/1000 Gigabit Server Adapter
Intel(R) PRO/1000 PM Network Connection
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. ethtool
version 1.6 or later is required for this functionality.
The latest release of ethtool can be found at
https://www.kernel.org/pub/software/network/ethtool/
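Typical invocations might look as follows (a sketch; ethX is a
placeholder for the interface name)::

  ethtool ethX       # query link settings
  ethtool -S ethX    # dump adapter statistics
  ethtool -i ethX    # show driver name and version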
Enabling Wake on LAN (WoL)
--------------------------
WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot.
For this driver version, in order to enable WoL, the e1000 driver must be
loaded when shutting down or rebooting the system.
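For example, enabling wake on magic packet might look as follows (a
sketch using standard ethtool options; ethX is a placeholder)::

  ethtool ethX | grep Wake-on   # check supported/current WoL modes
  ethtool -s ethX wol g         # enable wake on magic packet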
Support
=======
For general information, go to the Intel support website at:
http://support.intel.com
or the Intel Wired Networking project hosted by Sourceforge at:
http://sourceforge.net/projects/e1000
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
to the issue to e1000-devel@lists.sf.net
@@ -0,0 +1,383 @@
.. SPDX-License-Identifier: GPL-2.0+
=====================================================
Linux Driver for Intel(R) Ethernet Network Connection
=====================================================
Intel Gigabit Linux driver.
Copyright(c) 2008-2018 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Command Line Parameters
- Additional Configurations
- Support
Identifying Your Adapter
========================
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
https://www.intel.com/support
Command Line Parameters
=======================
If the driver is built as a module, the following optional parameters are used
by entering them on the command line with the modprobe command using this
syntax::
modprobe e1000e [<option>=<VAL1>,<VAL2>,...]
There needs to be a <VAL#> for each network port in the system supported by
this driver. The values will be applied to each instance, in function order.
For example::
modprobe e1000e InterruptThrottleRate=16000,16000
In this case, there are two network ports supported by e1000e in the system.
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
NOTE: A descriptor describes a data buffer and attributes related to the data
buffer. This information is accessed by the hardware.
InterruptThrottleRate
---------------------
:Valid Range: 0,1,3,4,100-100000
:Default Value: 3
Interrupt Throttle Rate controls the number of interrupts each interrupt
vector can generate per second. Increasing ITR lowers latency at the cost of
increased CPU utilization, though it may help throughput in some circumstances.
Setting InterruptThrottleRate to a value greater than or equal to 100
will program the adapter to send out a maximum of that many interrupts
per second, even if more packets have come in. This reduces interrupt
load on the system and can lower CPU utilization under heavy load,
but will increase latency as packets are not processed as quickly.
The default behaviour of the driver previously assumed a static
InterruptThrottleRate value of 8000, providing a good fallback value for
all traffic types, but lacking in small packet performance and latency.
The hardware can handle many more small packets per second however, and
for this reason an adaptive interrupt moderation algorithm was implemented.
The driver has two adaptive modes (setting 1 or 3) in which
it dynamically adjusts the InterruptThrottleRate value based on the traffic
that it receives. After determining the type of incoming traffic in the last
timeframe, it will adjust the InterruptThrottleRate to an appropriate value
for that traffic.
The algorithm classifies the incoming traffic every interval into
classes. Once the class is determined, the InterruptThrottleRate value is
adjusted to suit that traffic type the best. There are three classes defined:
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
for small amounts of traffic and/or a significant percentage of small
packets; and "Lowest latency", for almost completely small packets or
minimal traffic.
- 0: Off
Turns off any interrupt moderation and may improve small packet latency.
However, this is generally not suitable for bulk throughput traffic due
to the increased CPU utilization of the higher interrupt rate.
- 1: Dynamic mode
This mode attempts to moderate interrupts per vector while maintaining
very low latency. This can sometimes cause extra CPU utilization. If
planning on deploying e1000e in a latency sensitive environment, this
parameter should be considered.
- 3: Dynamic Conservative mode (default)
In dynamic conservative mode, the InterruptThrottleRate value is set to
4000 for traffic that falls in class "Bulk traffic". If traffic falls in
the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is
increased stepwise to 20000. This default mode is suitable for most
applications.
- 4: Simplified Balancing mode
In simplified mode the interrupt rate is based on the ratio of TX and
RX traffic. If the bytes per second rate is approximately equal, the
interrupt rate will drop as low as 2000 interrupts per second. If the
traffic is mostly transmit or mostly receive, the interrupt rate could
be as high as 8000.
- 100-100000:
Setting InterruptThrottleRate to a value greater than or equal to 100
will program the adapter to send at most that many interrupts per second,
even if more packets have come in. This reduces interrupt load on the
system and can lower CPU utilization under heavy load, but will increase
latency as packets are not processed as quickly.
NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
RxAbsIntDelay parameters. In other words, minimizing the receive and/or
transmit absolute delays does not force the controller to generate more
interrupts than what the Interrupt Throttle Rate allows.
RxIntDelay
----------
:Valid Range: 0-65535 (0=off)
:Default Value: 0
This value delays the generation of receive interrupts in units of 1.024
microseconds. Receive interrupt reduction can improve CPU efficiency if
properly tuned for specific network traffic. Increasing this value adds extra
latency to frame reception and can end up decreasing the throughput of TCP
traffic. If the system is reporting dropped receives, this value may be set
too high, causing the driver to run out of available receive descriptors.
CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang
(stop transmitting) under certain network conditions. If this occurs a NETDEV
WATCHDOG message is logged in the system event log. In addition, the
controller is automatically reset, restoring the network connection. To
eliminate the potential for the hang ensure that RxIntDelay is set to 0.
RxAbsIntDelay
-------------
:Valid Range: 0-65535 (0=off)
:Default Value: 8
This value, in units of 1.024 microseconds, limits the delay in which a
receive interrupt is generated. This value ensures that an interrupt is
generated after the initial packet is received within the set amount of time,
which is useful only if RxIntDelay is non-zero. Proper tuning, along with
RxIntDelay, may improve traffic throughput in specific network conditions.
TxIntDelay
----------
:Valid Range: 0-65535 (0=off)
:Default Value: 8
This value delays the generation of transmit interrupts in units of 1.024
microseconds. Transmit interrupt reduction can improve CPU efficiency if
properly tuned for specific network traffic. If the system is reporting
dropped transmits, this value may be set too high causing the driver to run
out of available transmit descriptors.
TxAbsIntDelay
-------------
:Valid Range: 0-65535 (0=off)
:Default Value: 32
This value, in units of 1.024 microseconds, limits the delay in which a
transmit interrupt is generated. It is useful only if TxIntDelay is non-zero.
It ensures that an interrupt is generated after the initial Packet is sent on
the wire within the set amount of time. Proper tuning, along with TxIntDelay,
may improve traffic throughput in specific network conditions.
copybreak
---------
:Valid Range: 0-xxxxxxx (0=off)
:Default Value: 256
The driver copies all packets below or equaling this size to a fresh receive
buffer before handing it up the stack.
This parameter differs from other parameters because it is a single (not 1,1,1
etc.) parameter applied to all driver instances and it is also available
during runtime at /sys/module/e1000e/parameters/copybreak.
To use copybreak, type::
modprobe e1000e copybreak=128
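As with the e1000 driver, the value can also be changed at runtime (a
sketch using the sysfs path above)::

  echo 128 > /sys/module/e1000e/parameters/copybreak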
SmartPowerDownEnable
--------------------
:Valid Range: 0,1
:Default Value: 0 (disabled)
Allows the PHY to turn off in lower power states. The user can turn off this
parameter in supported chipsets.
KumeranLockLoss
---------------
:Valid Range: 0,1
:Default Value: 1 (enabled)
This workaround skips resetting the PHY at shutdown for the initial silicon
releases of ICH8 systems.
IntMode
-------
:Valid Range: 0-2
:Default Value: 0
+-------+----------------+
| Value | Interrupt Mode |
+=======+================+
| 0 | Legacy |
+-------+----------------+
| 1 | MSI |
+-------+----------------+
| 2 | MSI-X |
+-------+----------------+
IntMode allows load time control over the type of interrupt registered for by
the driver. MSI-X is required for multiple queue support, and some kernels and
combinations of kernel .config options will force a lower level of interrupt
support.
This command will show different values for each type of interrupt::
cat /proc/interrupts
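For example, requesting MSI-X on both ports of a two-port system might
look as follows (a sketch; the driver may fall back to a lower interrupt
level as described above)::

  modprobe e1000e IntMode=2,2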
CrcStripping
------------
:Valid Range: 0,1
:Default Value: 1 (enabled)
Strip the CRC from received packets before sending them up the network stack. If
you have a machine with a BMC enabled but cannot receive IPMI traffic after
loading or enabling the driver, try disabling this feature.
WriteProtectNVM
---------------
:Valid Range: 0,1
:Default Value: 1 (enabled)
If set to 1, configure the hardware to ignore all write/erase cycles to the
GbE region in the ICHx NVM (in order to prevent accidental corruption of the
NVM). This feature can be disabled by setting the parameter to 0 during initial
driver load.
NOTE: The machine must be power cycled (full off/on) when enabling NVM writes
via setting the parameter to zero. Once the NVM has been locked (via the
parameter at 1 when the driver loads) it cannot be unlocked except via power
cycle.
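For example, unlocking the NVM for writes on the first port might look
like this (a sketch; per the note above, a full power cycle is required
when changing this setting)::

  modprobe e1000e WriteProtectNVM=0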
Debug
-----
:Valid Range: 0-16 (0=none,...,16=all)
:Default Value: 0
This parameter adjusts the level of debug messages displayed in the system logs.
Additional Features and Configurations
======================================
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
to a value larger than the default value of 1500.
Use the ifconfig command to increase the MTU size. For example, enter the
following where <x> is the interface number::
ifconfig eth<x> mtu 9000 up
Alternatively, you can use the ip command as follows::
ip link set mtu 9000 dev eth<x>
ip link set up dev eth<x>
This setting is not saved across reboots. The setting change can be made
permanent by adding 'MTU=9000' to the file:
- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
- For SLES: /etc/sysconfig/network/<config_file>
NOTE: The maximum MTU setting for Jumbo Frames is 8996. This value coincides
with the maximum Jumbo Frames size of 9018 bytes.
NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
poor performance or loss of link.
NOTE: The following adapters limit Jumbo Frames sized packets to a maximum of
4088 bytes:
- Intel(R) 82578DM Gigabit Network Connection
- Intel(R) 82577LM Gigabit Network Connection
The following adapters do not support Jumbo Frames:
- Intel(R) PRO/1000 Gigabit Server Adapter
- Intel(R) PRO/1000 PM Network Connection
- Intel(R) 82562G 10/100 Network Connection
- Intel(R) 82562G-2 10/100 Network Connection
- Intel(R) 82562GT 10/100 Network Connection
- Intel(R) 82562GT-2 10/100 Network Connection
- Intel(R) 82562V 10/100 Network Connection
- Intel(R) 82562V-2 10/100 Network Connection
- Intel(R) 82566DC Gigabit Network Connection
- Intel(R) 82566DC-2 Gigabit Network Connection
- Intel(R) 82566DM Gigabit Network Connection
- Intel(R) 82566MC Gigabit Network Connection
- Intel(R) 82566MM Gigabit Network Connection
- Intel(R) 82567V-3 Gigabit Network Connection
- Intel(R) 82577LC Gigabit Network Connection
- Intel(R) 82578DC Gigabit Network Connection
NOTE: Jumbo Frames cannot be configured on an 82579-based Network device if
MACSec is enabled on the system.
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
NOTE: When validating enable/disable tests on some parts (for example, 82578),
it is necessary to add a few seconds between tests when working with ethtool.
Speed and Duplex Configuration
------------------------------
In addressing speed and duplex configuration issues, you need to distinguish
between copper-based adapters and fiber-based adapters.
In the default mode, an Intel(R) Ethernet Network Adapter using copper
connections will attempt to auto-negotiate with its link partner to determine
the best setting. If the adapter cannot establish link with the link partner
using auto-negotiation, you may need to manually configure the adapter and link
partner to identical settings to establish link and pass packets. This should
only be needed when attempting to link with an older switch that does not
support auto-negotiation or one that has been forced to a specific speed or
duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
and higher cannot be forced. Use the autonegotiation advertising setting to
manually set devices for 1 Gbps and higher.
Speed, duplex, and autonegotiation advertising are configured through the
ethtool utility.
Caution: Only experienced network administrators should force speed and duplex
or change autonegotiation advertising manually. The settings at the switch must
always match the adapter settings. Adapter performance may suffer or your
adapter may not operate if you configure the adapter differently from your
switch.
An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
will not attempt to auto-negotiate with its link partner since those adapters
operate only in full duplex and only at their native speed.
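As an illustration, forcing 100 Mbps full duplex, or restricting
auto-negotiation advertising to 1000 Mbps full duplex, might look as
follows (a sketch using standard ethtool options; ethX is a placeholder)::

  ethtool -s ethX speed 100 duplex full autoneg off
  ethtool -s ethX advertise 0x020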
Enabling Wake on LAN (WoL)
--------------------------
WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot. For
this driver version, in order to enable WoL, the e1000e driver must be loaded
prior to shutting down or suspending the system.
NOTE: Wake on LAN is only supported on port A for the following devices:
- Intel(R) PRO/1000 PT Dual Port Network Connection
- Intel(R) PRO/1000 PT Dual Port Server Connection
- Intel(R) PRO/1000 PT Dual Port Server Adapter
- Intel(R) PRO/1000 PF Dual Port Server Adapter
- Intel(R) PRO/1000 PT Quad Port Server Adapter
- Intel(R) Gigabit PT Quad Port Server ExpressModule
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.
@@ -0,0 +1,142 @@
.. SPDX-License-Identifier: GPL-2.0+
=============================================================
Linux Base Driver for Intel(R) Ethernet Multi-host Controller
=============================================================
August 20, 2018
Copyright(c) 2015-2018 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Additional Configurations
- Performance Tuning
- Known Issues
- Support
Identifying Your Adapter
========================
The driver in this release is compatible with devices based on the Intel(R)
Ethernet Multi-host Controller.
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
http://www.intel.com/support
Flow Control
------------
The Intel(R) Ethernet Switch Host Interface Driver does not support Flow
Control. It will not send pause frames. This may result in dropped frames.
Virtual Functions (VFs)
-----------------------
Use sysfs to enable VFs.
Valid Range: 0-64
For example::
  echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs  # enable VFs
  echo 0 > /sys/class/net/$dev/device/sriov_numvfs                # disable VFs
NOTE: Neither the device nor the driver control how VFs are mapped into config
space. Bus layout will vary by operating system. On operating systems that
support it, you can check sysfs to find the mapping.
NOTE: When SR-IOV mode is enabled, hardware VLAN filtering and VLAN tag
stripping/insertion will remain enabled. Please remove the old VLAN filter
before the new VLAN filter is added. For example::
  ip link set eth0 vf 0 vlan 100   # set VLAN 100 for VF 0
  ip link set eth0 vf 0 vlan 0     # delete VLAN 100
  ip link set eth0 vf 0 vlan 200   # set a new VLAN 200 for VF 0
Additional Features and Configurations
======================================
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
to a value larger than the default value of 1500.
Use the ifconfig command to increase the MTU size. For example, enter the
following where <x> is the interface number::
ifconfig eth<x> mtu 9000 up
Alternatively, you can use the ip command as follows::
ip link set mtu 9000 dev eth<x>
ip link set up dev eth<x>
This setting is not saved across reboots. The setting change can be made
permanent by adding 'MTU=9000' to the file:
- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
- For SLES: /etc/sysconfig/network/<config_file>
NOTE: The maximum MTU setting for Jumbo Frames is 15342. This value coincides
with the maximum Jumbo Frames size of 15364 bytes.
NOTE: This driver will attempt to use multiple page sized buffers to receive
each jumbo packet. This should help to avoid buffer starvation issues when
allocating receive packets.
Generic Receive Offload, aka GRO
--------------------------------
The driver supports the in-kernel software implementation of GRO. GRO has
shown that by coalescing Rx traffic into larger chunks of data, CPU
utilization can be significantly reduced when under large Rx load. GRO is an
evolution of the previously-used LRO interface. GRO is able to coalesce
other protocols besides TCP. It's also safe to use with configurations that
are problematic for LRO, namely bridging and iSCSI.
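GRO can be inspected and toggled with standard ethtool options, for
example (a sketch; eth<x> is a placeholder)::

  ethtool -k eth<x> | grep generic-receive-offload
  ethtool -K eth<x> gro on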
Supported ethtool Commands and Options for Filtering
----------------------------------------------------
-n --show-nfc
Retrieves the receive network flow classification configurations.
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
Retrieves the hash options for the specified network traffic type.
-N --config-nfc
Configures the receive network flow classification.
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r
Configures the hash options for the specified network traffic type.
- udp4: UDP over IPv4
- udp6: UDP over IPv6
- f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet.
- n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet.
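For example, hashing UDP over IPv4 flows on the source and destination
IP addresses and ports might look as follows (a sketch; eth<x> is a
placeholder)::

  ethtool -N eth<x> rx-flow-hash udp4 sdfn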
Known Issues/Troubleshooting
============================
Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS under Linux KVM
-------------------------------------------------------------------------------------
KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This
includes traditional PCIe devices, as well as SR-IOV-capable devices based on
the Intel Ethernet Controller XL710.
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.
@@ -0,0 +1,771 @@
.. SPDX-License-Identifier: GPL-2.0+
=================================================================
Linux Base Driver for the Intel(R) Ethernet Controller 700 Series
=================================================================
Intel 40 Gigabit Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
Contents
========
- Overview
- Identifying Your Adapter
- Intel(R) Ethernet Flow Director
- Additional Configurations
- Known Issues
- Support
Driver information can be obtained using ethtool, lspci, and ifconfig.
Instructions on updating ethtool can be found in the section Additional
Configurations later in this document.
For questions related to hardware requirements, refer to the documentation
supplied with your Intel adapter. All hardware requirements listed apply to use
with Linux.
Identifying Your Adapter
========================
The driver is compatible with devices based on the following:
* Intel(R) Ethernet Controller X710
* Intel(R) Ethernet Controller XL710
* Intel(R) Ethernet Network Connection X722
* Intel(R) Ethernet Controller XXV710
For the best performance, make sure the latest NVM/FW is installed on your
device.
For information on how to identify your adapter, and for the latest NVM/FW
images and Intel network drivers, refer to the Intel Support website:
https://www.intel.com/support
SFP+ and QSFP+ Devices
----------------------
For information about supported media, refer to this document:
https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf
NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only
support Intel Ethernet Optics modules. On these adapters, other modules are not
supported and will not function. In all cases Intel recommends using Intel
Ethernet Optics; other modules may function but are not validated by Intel.
Contact Intel for supported media types.
NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support
is dependent on your system board. Please see your vendor for details.
NOTE: In systems that do not have adequate airflow to cool the adapter and
optical modules, you must use high temperature optical modules.
Virtual Functions (VFs)
-----------------------
Use sysfs to enable VFs. For example::
  # echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs  # enable VFs
  # echo 0 > /sys/class/net/$dev/device/sriov_numvfs                # disable VFs
For example, the following instructions will configure PF eth0 and the first VF
on VLAN 10::
$ ip link set dev eth0 vf 0 vlan 10
VLAN Tag Packet Steering
------------------------
Allows you to send all packets with a specific VLAN tag to a particular SR-IOV
virtual function (VF). Further, this feature allows you to designate a
particular VF as trusted, and allows that trusted VF to request selective
promiscuous mode on the Physical Function (PF).
To set a VF as trusted or untrusted, enter the following command in the
Hypervisor::
# ip link set dev eth0 vf 1 trust [on|off]
Once the VF is designated as trusted, use the following commands in the VM to
set the VF to promiscuous mode.
::
  For promiscuous all:
  # ip link set eth2 promisc on
  Where eth2 is a VF interface in the VM

  For promiscuous Multicast:
  # ip link set eth2 allmulticast on
  Where eth2 is a VF interface in the VM
NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to
"off", meaning that promiscuous mode for the VF will be limited. To set the
promiscuous mode for the VF to true promiscuous and allow the VF to see all
ingress traffic, use the following command::

  # ethtool --set-priv-flags p261p1 vf-true-promisc-support on

The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather,
it designates which type of promiscuous mode (limited or true) you will get
when you enable promiscuous mode using the ip link commands above. Note that
this is a global setting that affects the entire device. However, the
vf-true-promisc-support priv-flag is only exposed to the first PF of the
device. The PF remains in limited promiscuous mode (unless it is in MFP mode)
regardless of the vf-true-promisc-support setting.
Now add a VLAN interface on the VF interface::
  # ip link add link eth2 name eth2.100 type vlan id 100
Note that the order in which you set the VF to promiscuous mode and add the
VLAN interface does not matter (you can do either first). The end result in
this example is that the VF will get all traffic that is tagged with VLAN 100.
Intel(R) Ethernet Flow Director
-------------------------------
The Intel Ethernet Flow Director performs the following tasks:
- Directs receive packets according to their flows to different queues.
- Enables tight control on routing a flow in the platform.
- Matches flows and CPU cores for flow affinity.
- Supports multiple parameters for flexible flow classification and load
balancing (in SFD mode only).
NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and
UDPv4. For a given flow type, it supports valid combinations of IP addresses
(source or destination) and UDP/TCP ports (source and destination). For
example, you can supply only a source IP address, a source IP address and a
destination port, or any combination of one or more of these four parameters.
NOTE: The Linux i40e driver allows you to filter traffic based on a
user-defined flexible two-byte pattern and offset by using the ethtool user-def
and mask fields. Only L3 and L4 flow types are supported for user-defined
flexible filters. For a given flow type, you must clear all Intel Ethernet Flow
Director filters before changing the input set (for that flow type).
To enable or disable the Intel Ethernet Flow Director::
# ethtool -K ethX ntuple <on|off>
When disabling ntuple filters, all the user programmed filters are flushed from
the driver cache and hardware. All needed filters must be re-added when ntuple
is re-enabled.
To add a filter that directs packets to queue 2, use the -U or -N switch::
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
To set a filter using only the source and destination IP address::
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 action 2 [loc 1]
To see the list of filters currently present::
# ethtool <-u|-n> ethX
Application Targeted Routing (ATR) Perfect Filters
--------------------------------------------------
ATR is enabled by default when the kernel is in multiple transmit queue mode.
An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow
starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow
Director rule is added from ethtool (Sideband filter), ATR is turned off by the
driver. To re-enable ATR, the sideband can be disabled with the ethtool -K
option. For example::
ethtool -K [adapter] ntuple [off|on]
If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a
TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is
automatically re-enabled.
Packets that match the ATR rules are counted in fdir_atr_match stats in
ethtool, which also can be used to verify whether ATR rules still exist.
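For example, the counter can be checked through the ethtool statistics
interface (a sketch; ethX is a placeholder)::

  ethtool -S ethX | grep fdir_atr_match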
Sideband Perfect Filters
------------------------
Sideband Perfect Filters are used to direct traffic that matches specified
characteristics. They are enabled through ethtool's ntuple interface. To add a
new filter use the following command::
ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
dst-port <port> action <queue>
Where:
<device> - the ethernet device to program
<type> - can be ip4, tcp4, udp4, or sctp4
<ip> - the ip address to match on
<port> - the port number to match on
<queue> - the queue to direct traffic towards (-1 discards matching traffic)
Use the following command to display all of the active filters::
ethtool -u <device>
Use the following command to delete a filter::
ethtool -U <device> delete <N>
Where <N> is the filter id displayed when printing all the active filters, and
may also have been specified using "loc <N>" when adding the filter.
The following example matches TCP traffic sent from 192.168.0.1, port 5300,
directed to 192.168.0.5, port 80, and sends it to queue 7::
ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
src-port 5300 dst-port 80 action 7
For each flow-type, the programmed filters must all have the same matching
input set. For example, issuing the following two commands is acceptable::
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
Issuing the next two commands, however, is not acceptable, since the first
specifies src-ip and the second specifies dst-ip::
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
The second command will fail with an error. You may program multiple filters
with the same fields, using different values, but, on one device, you may not
program two tcp4 filters with different matching fields.
Matching on a sub-portion of a field is not supported by the i40e driver, thus
partial mask fields are not supported.
The driver also supports matching user-defined data within the packet payload.
This flexible data is specified using the "user-def" field of the ethtool
command in the following way:
+----------------------------+--------------------------+
| 31 28 24 20 16 | 15 12 8 4 0 |
+----------------------------+--------------------------+
| offset into packet payload | 2 bytes of flexible data |
+----------------------------+--------------------------+
For example,
::
... user-def 0x4FFFF ...
tells the filter to look 4 bytes into the payload and match that value against
0xFFFF. The offset is based on the beginning of the payload, and not the
beginning of the packet. Thus
::
flow-type tcp4 ... user-def 0x8BEAF ...
would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
TCP/IPv4 payload.
Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
Thus to match the first byte of the payload, you must actually add 4 bytes to
the offset. Also note that ip4 filters match both ICMP frames as well as raw
(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame.
The maximum offset is 64. The hardware will only read up to 64 bytes of data
from the payload. The offset must be even because the flexible data is 2 bytes
long and must be aligned to byte 0 of the packet payload.
The user-defined flexible offset is also considered part of the input set and
cannot be programmed separately for multiple filters of the same type. However,
the flexible data is not part of the input set and multiple filters may use the
same offset but match against different data.
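Putting this together, a complete sideband filter with flexible payload
matching might look as follows (a sketch combining the syntax shown
above; addresses and ports are examples only)::

  ethtool -U ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip 192.168.10.2 \
  src-port 2000 dst-port 2001 user-def 0x8BEAF action 2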
To create filters that direct traffic to a specific Virtual Function, use the
"action" parameter. Specify the action as a 64 bit value, where the lower 32
bits represent the queue number, while the next 8 bits represent which VF.
Note that 0 is the PF, so the VF identifier is offset by 1. For example::
... action 0x800000002 ...
specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
that VF.
Note that these filters will not break internal routing rules, and will not
route traffic that otherwise would not have been sent to the specified Virtual
Function.
Setting the link-down-on-close Private Flag
-------------------------------------------
When the link-down-on-close private flag is set to "on", the port's link will
go down when the interface is brought down using the ifconfig ethX down command.
Use ethtool to view and set link-down-on-close, as follows::
ethtool --show-priv-flags ethX
ethtool --set-priv-flags ethX link-down-on-close [on|off]
Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set dmesg to eight by entering the following::
dmesg -n 8
NOTE: This setting is not saved across reboots.
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
to a value larger than the default value of 1500.
Use the ifconfig command to increase the MTU size. For example, enter the
following where <x> is the interface number::
ifconfig eth<x> mtu 9000 up
Alternatively, you can use the ip command as follows::
ip link set mtu 9000 dev eth<x>
ip link set up dev eth<x>
This setting is not saved across reboots. The setting change can be made
permanent by adding 'MTU=9000' to the file::
/etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
/etc/sysconfig/network/<config_file> // for SLES
NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides
with the maximum Jumbo Frames size of 9728 bytes.
NOTE: This driver will attempt to use multiple page sized buffers to receive
each jumbo packet. This should help to avoid buffer starvation issues when
allocating receive packets.
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
Supported ethtool Commands and Options for Filtering
----------------------------------------------------
-n --show-nfc
Retrieves the receive network flow classification configurations.
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
Retrieves the hash options for the specified network traffic type.
-N --config-nfc
Configures the receive network flow classification.
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r...
Configures the hash options for the specified network traffic type.
udp4 UDP over IPv4
udp6 UDP over IPv6
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
Speed and Duplex Configuration
------------------------------
In addressing speed and duplex configuration issues, you need to distinguish
between copper-based adapters and fiber-based adapters.
In the default mode, an Intel(R) Ethernet Network Adapter using copper
connections will attempt to auto-negotiate with its link partner to determine
the best setting. If the adapter cannot establish link with the link partner
using auto-negotiation, you may need to manually configure the adapter and link
partner to identical settings to establish link and pass packets. This should
only be needed when attempting to link with an older switch that does not
support auto-negotiation or one that has been forced to a specific speed or
duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
and higher cannot be forced. Use the autonegotiation advertising setting to
manually set devices for 1 Gbps and higher.
NOTE: You cannot set the speed for Intel(R) Ethernet Network Adapter XXV710
based devices.
Speed, duplex, and autonegotiation advertising are configured through the
ethtool utility.
Caution: Only experienced network administrators should force speed and duplex
or change autonegotiation advertising manually. The settings at the switch must
always match the adapter settings. Adapter performance may suffer or your
adapter may not operate if you configure the adapter differently from your
switch.
An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
will not attempt to auto-negotiate with its link partner since those adapters
operate only in full duplex and only at their native speed.
NAPI
----
NAPI (Rx polling mode) is supported in the i40e driver.
For more information on NAPI, see
https://wiki.linuxfoundation.org/networking/napi
Flow Control
------------
Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
receiving and transmitting pause frames for i40e. When transmit is enabled,
pause frames are generated when the receive packet buffer crosses a predefined
threshold. When receive is enabled, the transmit unit will halt for the time
delay specified when a pause frame is received.
NOTE: You must have a flow control capable link partner.
Flow Control is on by default.
Use ethtool to change the flow control settings.
To enable or disable Rx or Tx Flow Control::
ethtool -A eth? rx <on|off> tx <on|off>
Note: This command only enables or disables Flow Control if auto-negotiation is
disabled. If auto-negotiation is enabled, this command changes the parameters
used for auto-negotiation with the link partner.
To enable or disable auto-negotiation::
ethtool -s eth? autoneg <on|off>
Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
on your device, you may not be able to change the auto-negotiation setting.
RSS Hash Flow
-------------
Allows you to set the hash bytes per flow type and any combination of one or
more options for Receive Side Scaling (RSS) hash byte configuration.
::
# ethtool -N <dev> rx-flow-hash <type> <option>
Where <type> is:
tcp4 signifying TCP over IPv4
udp4 signifying UDP over IPv4
tcp6 signifying TCP over IPv6
udp6 signifying UDP over IPv6
And <option> is one or more of:
s Hash on the IP source address of the Rx packet.
d Hash on the IP destination address of the Rx packet.
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
MAC and VLAN anti-spoofing feature
----------------------------------
When a malicious driver attempts to send a spoofed packet, it is dropped by the
hardware and not transmitted.
NOTE: This feature can be disabled for a specific Virtual Function (VF)::
ip link set <pf dev> vf <vf id> spoofchk {off|on}
IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC)
------------------------------------------------------------
Precision Time Protocol (PTP) is used to synchronize clocks in a computer
network. PTP support varies among Intel devices that support this driver. Use
"ethtool -T <netdev name>" to get a definitive list of PTP capabilities
supported by the device.
IEEE 802.1ad (QinQ) Support
---------------------------
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
allow L2 tunneling and the ability to segregate traffic within a particular
VLAN ID, among other uses.
The following are examples of how to configure 802.1ad (QinQ)::
ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
Where "24" and "371" are example VLAN IDs.
NOTES:
Receive checksum offloads, cloud filters, and VLAN acceleration are not
supported for 802.1ad (QinQ) packets.
VXLAN and GENEVE Overlay HW Offloading
--------------------------------------
Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3
network, which may be useful in a virtualized or cloud environment. Some
Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from
the operating system. This reduces CPU utilization.
VXLAN offloading is controlled by the Tx and Rx checksum offload options
provided by ethtool. That is, if Tx checksum offload is enabled, and the
adapter has the capability, VXLAN offloading is also enabled.
Support for VXLAN and GENEVE HW offloading is dependent on kernel support of
the HW offloading features.
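The relevant offloads can be inspected and toggled with standard ethtool
options, for example (a sketch; ethX is a placeholder)::

  ethtool -k ethX | grep checksum
  ethtool -K ethX tx on rx on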
Multiple Functions per Port
---------------------------
Some adapters based on the Intel Ethernet Controller X710/XL710 support
multiple functions on a single physical port. Configure these functions through
the System Setup/BIOS.
Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as
a percentage of the full physical port link speed, that the partition will
receive. The bandwidth the partition is awarded will never fall below the level
you specify.
The range for the minimum bandwidth values is:
1 to ((100 minus # of partitions on the physical port) plus 1)
For example, if a physical port has 4 partitions, the range would be:
1 to ((100 - 4) + 1 = 97)
The Maximum Bandwidth percentage represents the maximum transmit bandwidth
allocated to the partition as a percentage of the full physical port link
speed. The accepted range of values is 1-100. The value is used as a limiter,
should you choose that any one particular function not be able to consume 100%
of a port's bandwidth (should it be available). The sum of all the values for
Maximum Bandwidth is not restricted, because no more than 100% of a port's
bandwidth can ever be used.
NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions
per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says
"add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than
64 virtual functions (VFs).
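For example, staying below that limit using the sysfs interface shown
earlier might look like this (a sketch)::

  echo 63 > /sys/class/net/$dev/device/sriov_numvfs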
Data Center Bridging (DCB)
--------------------------
DCB is a configuration Quality of Service implementation in hardware. It uses
the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
different priorities that traffic can be filtered into. It also enables
priority flow control (802.1Qbb) which can limit or eliminate the number of
dropped packets during network stress. Bandwidth can be allocated to each of
these priorities, which is enforced at the hardware level (802.1Qaz).
Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
and can accept settings from a DCBX capable peer. Software configuration of
DCBX parameters via dcbtool/lldptool are not supported.
NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp.
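For example, using the private flags interface shown earlier (a sketch;
flag availability depends on the driver and firmware version)::

  ethtool --set-priv-flags ethX disable-fw-lldp on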
The i40e driver implements the DCB netlink interface layer to allow user-space
to communicate with the driver and query DCB configuration for the port.
NOTE:
The kernel assumes that TC0 is available, and will disable Priority Flow
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
enabled when setting up DCB on your switch.
Interrupt Rate Limiting
-----------------------
:Valid Range: 0-235 (0=no limit)
The Intel(R) Ethernet Controller XL710 family supports an interrupt rate
limiting mechanism. The user can control, via ethtool, the number of
microseconds between interrupts.
Syntax::
# ethtool -C ethX rx-usecs-high N
The range of 0-235 microseconds provides an effective range of 4,310 to 250,000
interrupts per second. The value of rx-usecs-high can be set independently of
rx-usecs and tx-usecs in the same ethtool command, and is also independent of
the adaptive interrupt moderation algorithm. The underlying hardware supports
granularity in 4-microsecond intervals, so adjacent values may result in the
same interrupt rate.
One possible use case is the following::
# ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \
5 tx-usecs 5
The above command would disable adaptive interrupt moderation, and allow a
maximum of 5 microseconds before indicating a receive or transmit was complete.
However, instead of resulting in as many as 200,000 interrupts per second, it
limits total interrupts per second to 50,000 via the rx-usecs-high parameter.
Performance Optimization
========================
Driver defaults are meant to fit a wide variety of workloads, but if further
optimization is required we recommend experimenting with the following settings.
NOTE: For better performance when processing small (64B) frame sizes, try
enabling Hyper threading in the BIOS in order to increase the number of logical
cores in the system and subsequently increase the number of queues available to
the adapter.
Virtualized Environments
------------------------
1. Disable XPS on both ends by using the included virt_perf_default script
or by running the following command as root::
for file in `ls /sys/class/net/<ethX>/queues/tx-*/xps_cpus`;
do echo 0 > $file; done
2. Using the appropriate mechanism (vcpupin) in the vm, pin the cpu's to
individual lcpu's, making sure to use a set of cpu's included in the
device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist.
3. Configure as many Rx/Tx queues in the VM as available. Do not rely on
the default setting of 1.
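For example, the queue count can be raised with ethtool's channel
interface (a sketch; <ethX> is a placeholder and N should match the
number of queues available to the VM)::

  ethtool -L <ethX> combined N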
Non-virtualized Environments
----------------------------
Pin the adapter's IRQs to specific cores by disabling the irqbalance service
and using the included set_irq_affinity script. Please see the script's help
text for further options.
- The following settings will distribute the IRQs across all the cores evenly::
# scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ]
- The following settings will distribute the IRQs across all the cores that are
local to the adapter (same NUMA node)::
# scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ]
For very CPU intensive workloads, we recommend pinning the IRQs to all cores.
For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per
queue using ethtool.
- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000
interrupts per second per queue.
::
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \
tx-usecs 125
For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts
per queue using ethtool.
- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000
interrupts per second per queue.
::
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \
tx-usecs 250
For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using
ethtool.
::
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \
tx-usecs 0
Application Device Queues (ADq)
-------------------------------
Application Device Queues (ADq) allows you to dedicate one or more queues to a
specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application. Follow the steps below
to set ADq.
1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
The shaper bw_rlimit parameter is optional.
Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
to 1Gbit for tc0 and 3Gbit for tc1.
::
# tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
max_rate 1Gbit 3Gbit
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
number of queues for all tcs is 64 or number of cores, whichever is lower.)
hw 1 mode channel: channel with hw set to 1 is a new hardware
offload mode in mqprio that makes full use of the mqprio options, the
TCs, the queue configurations, and the QoS parameters.
shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
Totals must be equal or less than port speed.
For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
monitoring tools such as ifstat or sar -n DEV [interval] [number of samples]
2. Enable HW TC offload on interface::
# ethtool -K <interface> hw-tc-offload on
3. Apply TCs to ingress (RX) flow of interface::
# tc qdisc add dev <interface> ingress
NOTES:
- Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
- ADq is not compatible with cloud filters.
- Setting up channels via ethtool (ethtool -L) is not supported when the
TCs are configured using mqprio.
- You must have the latest version of iproute2.
- NVM version 6.01 or later is required.
- ADq cannot be enabled when any of the following features are enabled: Data
Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband
Filters.
- If another driver (for example, DPDK) has set cloud filters, you cannot
enable ADq.
- Tunnel filters are not supported in ADq. If encapsulated packets do
arrive in non-tunnel mode, filtering will be done on the inner headers.
For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified
as a VXLAN encapsulated packet, outer headers are ignored. Therefore,
inner headers are matched.
- If a TC filter on a PF matches traffic over a VF (on the PF), that
traffic will be routed to the appropriate queue of the PF, and will
not be passed on the VF. Such traffic will end up getting dropped higher
up in the TCP/IP stack as it does not match PF address data.
- If traffic matches multiple TC filters that point to different TCs,
that traffic will be duplicated and sent to all matching TC queues.
The hardware switch mirrors the packet to a VSI list when multiple
filters are matched.
Known Issues/Troubleshooting
============================
NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do
not support the following features:
* Data Center Bridging (DCB)
* QOS
* VMQ
* SR-IOV
* Tunnel Encapsulation offload (VXLAN, NVGRE)
* Energy Efficient Ethernet (EEE)
* Auto-media detect
Unexpected Issues when the device driver and DPDK share a device
----------------------------------------------------------------
Unexpected issues may result when an i40e device is in multi driver mode and
the kernel driver and DPDK driver are sharing the device. This is because
access to the global NIC resources is not synchronized between multiple
drivers. Any change to the global NIC configuration (writing to a global
register, setting global configuration by AQ, or changing switch modes) will
affect all ports and drivers on the device. Loading DPDK with the
"multi-driver" module parameter may mitigate some of the issues.
TC0 must be enabled when setting up DCB on a switch
---------------------------------------------------
The kernel assumes that TC0 is available, and will disable Priority Flow
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
enabled when setting up DCB on your switch.
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.


@@ -0,0 +1,331 @@
.. SPDX-License-Identifier: GPL-2.0+
=================================================================
Linux Base Driver for Intel(R) Ethernet Adaptive Virtual Function
=================================================================
Intel Ethernet Adaptive Virtual Function Linux driver.
Copyright(c) 2013-2018 Intel Corporation.
Contents
========
- Overview
- Identifying Your Adapter
- Additional Configurations
- Known Issues/Troubleshooting
- Support
Overview
========
This file describes the iavf Linux Base Driver. This driver was formerly
called i40evf.
The iavf driver supports the below mentioned virtual function devices and
can only be activated on kernels running the i40e or newer Physical Function
(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires
CONFIG_PCI_MSI to be enabled.
The guest OS loading the iavf driver must support MSI-X interrupts.
Identifying Your Adapter
========================
The driver in this kernel is compatible with devices based on the following:
* Intel(R) XL710/X710 Virtual Function
* Intel(R) X722 Virtual Function
* Intel(R) XXV710 Virtual Function
* Intel(R) Ethernet Adaptive Virtual Function
For the best performance, make sure the latest NVM/FW is installed on your
device.
For information on how to identify your adapter, and for the latest NVM/FW
images and Intel network drivers, refer to the Intel Support website:
http://www.intel.com/support
Additional Features and Configurations
======================================
Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set dmesg to eight by entering the following::
# dmesg -n 8
NOTE:
This setting is not saved across reboots.
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
Setting VLAN Tag Stripping
--------------------------
If you have applications that require Virtual Functions (VFs) to receive
packets with VLAN tags, you can disable VLAN tag stripping for the VF. The
Physical Function (PF) processes requests issued from the VF to enable or
disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF,
then requests from that VF to set VLAN tag stripping will be ignored.
To enable/disable VLAN tag stripping for a VF, issue the following command
from inside the VM in which you are running the VF::
# ethtool -K <if_name> rxvlan on/off
or alternatively::
# ethtool --offload <if_name> rxvlan on/off
Adaptive Virtual Function
-------------------------
Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to
adapt to changing feature sets of the physical function driver (PF) with which
it is associated. This allows system administrators to update a PF without
having to update all the VFs associated with it. All AVFs have a single common
device ID and branding string.
AVFs have a minimum set of features known as "base mode," but may provide
additional features depending on what features are available in the PF with
which the AVF is associated. The following are base mode features:
- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
for Tx/Rx
- i40e descriptors and ring format
- Descriptor write-back completion
- 1 control queue, with i40e descriptors, CSRs and ring format
- 5 MSI-X interrupt vectors and corresponding i40e CSRs
- 1 Interrupt Throttle Rate (ITR) index
- 1 Virtual Station Interface (VSI) per VF
- 1 Traffic Class (TC), TC0
- Receive Side Scaling (RSS) with 64 entry indirection table and key,
configured through the PF
- 1 unicast MAC address reserved per VF
- 16 MAC address filters for each VF
- Stateless offloads - non-tunneled checksums
- AVF device ID
- HW mailbox is used for VF to PF communications (including on Windows)
IEEE 802.1ad (QinQ) Support
---------------------------
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
allow L2 tunneling and the ability to segregate traffic within a particular
VLAN ID, among other uses.
The following are examples of how to configure 802.1ad (QinQ)::
# ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
# ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
Where "24" and "371" are example VLAN IDs.
NOTES:
Receive checksum offloads, cloud filters, and VLAN acceleration are not
supported for 802.1ad (QinQ) packets.
Application Device Queues (ADq)
-------------------------------
Application Device Queues (ADq) allows you to dedicate one or more queues to a
specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application. Follow the steps below
to set ADq.
Requirements:
- The sch_mqprio, act_mirred and cls_flower modules must be loaded (see the
  loading example after this list)
- The latest version of iproute2
- If another driver (for example, DPDK) has set cloud filters, you cannot
enable ADQ
- Depending on the underlying PF device, ADQ cannot be enabled when the
following features are enabled:
+ Data Center Bridging (DCB)
+ Multiple Functions per Port (MFP)
+ Sideband Filters
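If these scheduler and classifier modules are not built into your kernel,
they can typically be loaded in one step with modprobe; a hedged example::

    # modprobe -a sch_mqprio act_mirred cls_flower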
1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
The shaper bw_rlimit parameter is optional.
Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
to 1Gbit for tc0 and 3Gbit for tc1.
::
tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
max_rate 1Gbit 3Gbit
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
number of queues for all tcs is 64 or number of cores, whichever is lower.)
hw 1 mode channel: channel with hw set to 1 is a new hardware
offload mode in mqprio that makes full use of the mqprio options, the
TCs, the queue configurations, and the QoS parameters.
shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
Totals must be equal or less than port speed.
For example: min_rate 1Gbit 3Gbit. Verify the bandwidth limit using network
monitoring tools such as ifstat or sar -n DEV [interval] [number of samples].
NOTE:
Setting up channels via ethtool (ethtool -L) is not supported when the
TCs are configured using mqprio.
2. Enable HW TC offload on interface::
# ethtool -K <interface> hw-tc-offload on
3. Apply TCs to ingress (RX) flow of interface::
# tc qdisc add dev <interface> ingress
NOTES:
- Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
- ADq is not compatible with cloud filters
- Setting up channels via ethtool (ethtool -L) is not supported when the TCs
are configured using mqprio
- You must have the latest version of iproute2
- NVM version 6.01 or later is required
- ADq cannot be enabled when any of the following features are enabled: Data
Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
- If another driver (for example, DPDK) has set cloud filters, you cannot
enable ADq
- Tunnel filters are not supported in ADq. If encapsulated packets do arrive
in non-tunnel mode, filtering will be done on the inner headers. For example,
for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
encapsulated packet, outer headers are ignored. Therefore, inner headers are
matched.
- If a TC filter on a PF matches traffic over a VF (on the PF), that traffic
will be routed to the appropriate queue of the PF, and will not be passed to
the VF. Such traffic will end up getting dropped higher up in the TCP/IP
stack as it does not match PF address data.
- If traffic matches multiple TC filters that point to different TCs, that
traffic will be duplicated and sent to all matching TC queues. The hardware
switch mirrors the packet to a VSI list when multiple filters are matched.
Known Issues/Troubleshooting
============================
Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
---------------------------------------------------------------------------------
If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
series based device, the VF slaves may fail when they become the active slave.
If the MAC address of the VF is set by the PF (Physical Function) of the
device, when you add a slave, or change the active-backup slave, Linux bonding
tries to sync the backup slave's MAC address to the same MAC address as the
active slave. Linux bonding will fail at this point. This issue will not occur
if the VF's MAC address is not set by the PF.
Traffic Is Not Being Passed Between VM and Client
-------------------------------------------------
You may not be able to pass traffic between a client system and a
Virtual Machine (VM) running on a separate host if the Virtual Function
(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled
on the VF. Note that this situation can occur in any combination of client,
host, and guest operating system. For information on how to set the VF to
trusted mode, refer to the section "VLAN Tag Packet Steering" in this
readme document. For information on setting spoof checking, refer to the
section "MAC and VLAN anti-spoofing feature" in this readme document.
Do not unload port driver if VF with active VM is bound to it
-------------------------------------------------------------
Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
Once the VM shuts down, or otherwise releases the VF, the command will complete.
Using four traffic classes fails
--------------------------------
Do not try to reserve more than three traffic classes in the iavf driver. Doing
so will fail to set any traffic classes and will cause the driver to write
errors to stdout. Use a maximum of three traffic classes to avoid this issue.
Multiple log error messages on iavf driver removal
--------------------------------------------------
If you have several VFs and you remove the iavf driver, several instances of
the following log errors are written to the log::
Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
Unable to send the message to VF 2 aq_err 12
ARQ Overflow Error detected
Virtual machine does not get link
---------------------------------
If the virtual machine has more than one virtual port assigned to it, and those
virtual ports are bound to different physical ports, you may not get link on
all of the virtual ports. The following command may work around the issue::
# ethtool -r <PF>
Where <PF> is the PF interface in the host, for example: p5p1. You may need to
run the command more than once to get link on all virtual ports.
MAC address of Virtual Function changes unexpectedly
----------------------------------------------------
If a Virtual Function's MAC address is not assigned in the host, then the VF
(virtual function) driver will use a random MAC address. This random MAC
address may change each time the VF driver is reloaded. You can assign a static
MAC address in the host machine. This static MAC address will survive
a VF driver reload.
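For example, a static MAC address can usually be assigned from the host with
iproute2 (the address below is an arbitrary locally administered example)::

    # ip link set dev <PF> vf <N> mac 02:01:02:03:04:05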
Driver Buffer Overflow Fix
--------------------------
The fix to resolve CVE-2016-8105, referenced in Intel SA-00069
https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html
is included in this and future versions of the driver.
Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
Due to the default ARP behavior on Linux, it is not possible to have one system
on two IP networks in the same Ethernet broadcast domain (non-partitioned
switch) behave as expected. All Ethernet interfaces will respond to IP traffic
for any IP address assigned to the system. This results in unbalanced receive
traffic.
If you have multiple interfaces in a server, either turn on ARP filtering by
entering::
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
NOTE:
This setting is not saved across reboots. The configuration change can be
made permanent by adding the following line to the file /etc/sysctl.conf::
net.ipv4.conf.all.arp_filter = 1
Another alternative is to install the interfaces in separate broadcast domains
(either in different switches or in a switch partitioned to VLANs).
Rx Page Allocation Errors
-------------------------
'Page allocation failure. order:0' errors may occur under stress.
This is caused by the way the Linux kernel reports this stressed condition.
Support
=======
For general information, go to the Intel support website at:
https://support.intel.com
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on the supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net


@@ -0,0 +1,46 @@
.. SPDX-License-Identifier: GPL-2.0+
==================================================================
Linux Base Driver for the Intel(R) Ethernet Connection E800 Series
==================================================================
Intel ice Linux driver.
Copyright(c) 2018 Intel Corporation.
Contents
========
- Enabling the driver
- Support
The driver in this release supports Intel's E800 Series of products. For
more information, visit Intel's support page at https://support.intel.com.
Enabling the driver
===================
The driver is enabled via the standard kernel configuration system,
using the make command::
make oldconfig/menuconfig/etc.
The driver is located in the menu structure at:
-> Device Drivers
-> Network device support (NETDEVICES [=y])
-> Ethernet driver support
-> Intel devices
-> Intel(R) Ethernet Connection E800 Series Support
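The corresponding kernel configuration symbol is CONFIG_ICE. As a minimal
sketch, assuming CONFIG_ICE=m is already set in .config of a configured
kernel tree, the module can be rebuilt in place with::

    make M=drivers/net/ethernet/intel/ice modules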
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.


@@ -0,0 +1,213 @@
.. SPDX-License-Identifier: GPL-2.0+
==========================================================
Linux Base Driver for Intel(R) Ethernet Network Connection
==========================================================
Intel Gigabit Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Command Line Parameters
- Additional Configurations
- Support
Identifying Your Adapter
========================
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
http://www.intel.com/support
Command Line Parameters
========================
If the driver is built as a module, the following optional parameters can be
set by supplying them on the command line with the modprobe command, using
this syntax::
modprobe igb [<option>=<VAL1>,<VAL2>,...]
There needs to be a <VAL#> for each network port in the system supported by
this driver. The values will be applied to each instance, in function order.
For example::
modprobe igb max_vfs=2,4
In this case, there are two network ports supported by igb in the system.
NOTE: A descriptor describes a data buffer and attributes related to the data
buffer. This information is accessed by the hardware.
max_vfs
-------
:Valid Range: 0-7
This parameter adds support for SR-IOV. It causes the driver to spawn up to
max_vfs worth of virtual functions. If the value is greater than 0 it will
also force the VMDq parameter to be 1 or more.
The parameters for the driver are referenced by position. Thus, if you have a
dual port adapter, or more than one adapter in your system, and want N virtual
functions per port, you must specify a number for each port with each parameter
separated by a comma. For example::
modprobe igb max_vfs=4
This will spawn 4 VFs on the first port.
::
modprobe igb max_vfs=2,4
This will spawn 2 VFs on the first port and 4 VFs on the second port.
NOTE: Caution must be used in loading the driver with these parameters.
Depending on your system configuration, number of slots, etc., it is impossible
to predict in all cases where the positions would be on the command line.
NOTE: Neither the device nor the driver control how VFs are mapped into config
space. Bus layout will vary by operating system. On operating systems that
support it, you can check sysfs to find the mapping.
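For example, on Linux the PCI addresses of the VFs spawned from a PF can
typically be read from the virtfn symlinks, where <PF> is the PF interface
name::

    # ls -l /sys/class/net/<PF>/device/virtfn*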
NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
and VLAN tag stripping/insertion will remain enabled. Please remove the old
VLAN filter before the new VLAN filter is added. For example::
ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0
ip link set eth0 vf 0 vlan 0 // Delete vlan 100
ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0
Debug
-----
:Valid Range: 0-16 (0=none,...,16=all)
:Default Value: 0
This parameter adjusts the level of debug messages displayed in the system logs.
Additional Features and Configurations
======================================
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
to a value larger than the default value of 1500.
Use the ifconfig command to increase the MTU size. For example, enter the
following where <x> is the interface number::
ifconfig eth<x> mtu 9000 up
Alternatively, you can use the ip command as follows::
ip link set mtu 9000 dev eth<x>
ip link set up dev eth<x>
This setting is not saved across reboots. The setting change can be made
permanent by adding 'MTU=9000' to the file:
- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
- For SLES: /etc/sysconfig/network/<config_file>
NOTE: The maximum MTU setting for Jumbo Frames is 9216. This value coincides
with the maximum Jumbo Frames size of 9234 bytes.
NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
poor performance or loss of link.
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
Enabling Wake on LAN (WoL)
--------------------------
WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot. For
this driver version, in order to enable WoL, the igb driver must be loaded
prior to shutting down or suspending the system.
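As an illustrative sketch, magic-packet wake can typically be enabled and
then verified as follows, where <interface> is a placeholder::

    # ethtool -s <interface> wol g
    # ethtool <interface> | grep Wake-on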
NOTE: Wake on LAN is only supported on port A of multi-port devices. Also,
Wake on LAN is not supported for the following device:
- Intel(R) Gigabit VT Quad Port Server Adapter
Multiqueue
----------
In this mode, a separate MSI-X vector is allocated for each queue and one for
"other" interrupts such as link status change and errors. All interrupts are
throttled via interrupt moderation. Interrupt moderation must be used to avoid
interrupt storms while the driver is processing one interrupt. The moderation
value should be at least as large as the expected time for the driver to
process an interrupt. Multiqueue is off by default.
REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not found,
the system will fall back to MSI or to Legacy interrupts. This driver supports
receive multiqueue on all kernels that support MSI-X.
NOTE: On some kernels a reboot is required to switch between single queue mode
and multiqueue mode or vice-versa.
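To check whether per-queue MSI-X vectors were actually allocated, the
interrupt lines can usually be inspected in /proc/interrupts (interface name
is a placeholder)::

    # grep <interface> /proc/interrupts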
MAC and VLAN anti-spoofing feature
----------------------------------
When a malicious driver attempts to send a spoofed packet, it is dropped by the
hardware and not transmitted.
An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
spoofed packet is detected, the PF driver will send the following message to
the system log (displayed by the "dmesg" command):
Spoof event(s) detected on VF(n), where n = the VF that attempted to do the
spoofing
Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
------------------------------------------------------------
You can set a MAC address of a Virtual Function (VF), a default VLAN and the
rate limit using the IProute2 tool. Download the latest version of the
IProute2 tool from Sourceforge if your version does not have all the features
you require.
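Hedged examples of the three settings (the PF name, VF number, and values
are illustrative; the rate is given in Mbps)::

    # ip link set dev <PF> vf 0 mac 02:01:02:03:04:05
    # ip link set dev <PF> vf 0 vlan 100
    # ip link set dev <PF> vf 0 rate 1000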
Credit Based Shaper (Qav Mode)
------------------------------
When enabling the CBS qdisc in the hardware offload mode, traffic shaping using
the CBS (described in the IEEE 802.1Q-2018 Section 8.6.8.2 and discussed in the
Annex L) algorithm will run in the i210 controller, so it's more accurate and
uses less CPU.
When using offloaded CBS, and the traffic rate obeys the configured rate
(doesn't go above it), CBS should have little to no effect on the latency.
The offloaded version of the algorithm has some limits, caused by how the idle
slope is expressed in the adapter's registers. It can only represent idle slopes
in 16.38431 kbps units, which means that if an idle slope of 2576 kbps is
requested, the controller will be configured to use an idle slope of ~2589 kbps,
because the driver rounds the value up. For more details, see the comments on
:c:func:`igb_config_tx_modes()`.
NOTE: This feature is exclusive to i210 models.
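A hedged configuration sketch, assuming an mqprio root qdisc with handle
6666: is already installed on eth0 and that the slope and credit values
(illustrative here) suit your streams; "offload 1" requests the hardware
offload mode described above::

    # tc qdisc replace dev eth0 parent 6666:1 cbs \
          idleslope 20000 sendslope -980000 hicredit 153 locredit -1389 offload 1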
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.


@@ -0,0 +1,65 @@
.. SPDX-License-Identifier: GPL-2.0+
===========================================================
Linux Base Virtual Function Driver for Intel(R) 1G Ethernet
===========================================================
Intel Gigabit Virtual Function Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Additional Configurations
- Support
This driver supports Intel 82576-based virtual function devices that can only
be activated on kernels that support SR-IOV.
SR-IOV requires the correct platform and OS support.
The guest OS loading this driver must support MSI-X interrupts.
For questions related to hardware requirements, refer to the documentation
supplied with your Intel adapter. All hardware requirements listed apply to use
with Linux.
Driver information can be obtained using ethtool, lspci, and ifconfig.
Instructions on updating ethtool can be found in the section Additional
Configurations later in this document.
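For example, the driver name, version, and firmware information can
typically be queried with::

    # ethtool -i <interface>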
NOTE: A total of at most 32 VLANs can be shared across 1 or more VFs.
Identifying Your Adapter
========================
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
http://www.intel.com/support
Additional Features and Configurations
======================================
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.


@@ -0,0 +1,468 @@
.. SPDX-License-Identifier: GPL-2.0+
=====================================================================
Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
=====================================================================
October 1, 2018
Contents
========
- In This Release
- Identifying Your Adapter
- Command Line Parameters
- Improving Performance
- Additional Configurations
- Known Issues/Troubleshooting
- Support
In This Release
===============
This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
Network Connection. This driver includes support for Itanium(R)2-based
systems.
For questions related to hardware requirements, refer to the documentation
supplied with your 10 Gigabit adapter. All hardware requirements listed apply
to use with Linux.
The following features are available in this kernel:
- Native VLANs
- Channel Bonding (teaming)
- SNMP
Channel Bonding documentation can be found in the Linux kernel source:
/Documentation/networking/bonding.rst
The driver information previously displayed in the /proc filesystem is not
supported in this release. Alternatively, you can use ethtool (version 1.6
or later), lspci, and iproute2 to obtain the same information.
Instructions on updating ethtool can be found in the section "Additional
Configurations" later in this document.
Identifying Your Adapter
========================
The following Intel network adapters are compatible with the drivers in this
release:
+------------+------------------------------+----------------------------------+
| Controller | Adapter Name | Physical Layer |
+============+==============================+==================================+
| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) |
| | Server Adapters | - 10G Base-SR (fiber) |
| | | - 10G Base-CX4 (copper) |
+------------+------------------------------+----------------------------------+
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
https://support.intel.com
Command Line Parameters
=======================
If the driver is built as a module, the following optional parameters can be
set by supplying them on the command line with the modprobe command, using
this syntax::
modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
For example, with two 10GbE PCI adapters, entering::
modprobe ixgb TxDescriptors=80,128
loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
resources for the second adapter.
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
Copybreak
---------
:Valid Range: 0-XXXX
:Default Value: 256
This is the maximum size of packet that is copied to a new buffer on
receive.
Debug
-----
:Valid Range: 0-16 (0=none,...,16=all)
:Default Value: 0
This parameter adjusts the level of debug messages displayed in the
system logs.
FlowControl
-----------
:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
:Default Value: 1 if no EEPROM, otherwise read from EEPROM
This parameter controls the automatic generation (Tx) of, and response (Rx)
to, Ethernet PAUSE frames. There are hardware bugs associated with enabling
Tx flow control, so enable it with caution.
RxDescriptors
-------------
:Valid Range: 64-4096
:Default Value: 1024
This value is the number of receive descriptors allocated by the driver.
Increasing this value allows the driver to buffer more incoming packets.
Each descriptor is 16 bytes. A receive buffer is also allocated for
each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
depending on the MTU setting. When the MTU size is 1500 or less, the
receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
receive buffer size will be either 4056, 8192, or 16384 bytes. The
maximum MTU size is 16114.
TxDescriptors
-------------
:Valid Range: 64-4096
:Default Value: 256
This value is the number of transmit descriptors allocated by the driver.
Increasing this value allows the driver to queue more transmits. Each
descriptor is 16 bytes.
RxIntDelay
----------
:Valid Range: 0-65535 (0=off)
:Default Value: 72
This value delays the generation of receive interrupts in units of
0.8192 microseconds. Receive interrupt reduction can improve CPU
efficiency if properly tuned for specific network traffic. Increasing
this value adds extra latency to frame reception and can end up
decreasing the throughput of TCP traffic. If the system is reporting
dropped receives, this value may be set too high, causing the driver to
run out of available receive descriptors.
TxIntDelay
----------
:Valid Range: 0-65535 (0=off)
:Default Value: 32
This value delays the generation of transmit interrupts in units of
0.8192 microseconds. Transmit interrupt reduction can improve CPU
efficiency if properly tuned for specific network traffic. Increasing
this value adds extra latency to frame transmission and can end up
decreasing the throughput of TCP traffic. If this value is set too high,
it will cause the driver to run out of available transmit descriptors.
XsumRX
------
:Valid Range: 0-1
:Default Value: 1
A value of '1' indicates that the driver should enable IP checksum
offload for received packets (both UDP and TCP) to the adapter hardware.
RxFCHighThresh
--------------
:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity)
:Default Value: 196,608 (0x30000)
Receive Flow control high threshold (when we send a pause frame)
RxFCLowThresh
-------------
:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity)
:Default Value: 163,840 (0x28000)
Receive Flow control low threshold (when we send a resume frame)
FCReqTimeout
------------
:Valid Range: 1-65535
:Default Value: 65535
Flow control request timeout (how long to pause the link partner's tx)
IntDelayEnable
--------------
:Value Range: 0,1
:Default Value: 1
Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it.
Improving Performance
=====================
With the 10 Gigabit server adapters, the default Linux configuration will
very likely limit the total available throughput artificially. There is a set
of configuration changes that, when applied together, will increase the ability
of Linux to transmit and receive data. The following enhancements were
originally acquired from settings published at http://www.spec.org/web99/ for
various submitted results using Linux.
NOTE:
These changes are only suggestions, and serve as a starting point for
tuning your network performance.
The changes are made in three major ways, listed in order of greatest effect:
- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
parameter.
- Use sysctl to modify /proc parameters (essentially kernel tuning)
- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
transmit burst lengths on the bus.
NOTE:
setpci modifies the adapter's configuration registers to allow it to read
up to 4k bytes at a time (for transmits). However, for some systems the
behavior after modifying this register may be undefined (possibly errors of
some kind). A power-cycle, hard reset or explicitly setting the e6 register
back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
stable configuration.
- COPY these lines and paste them into ixgb_perf.sh:
::
#!/bin/bash
echo "configuring network performance , edit this file to change the interface
or device ID of 10GbE card"
# set mmrbc to 4k reads, modify only Intel 10GbE device IDs
# replace 1a48 with appropriate 10GbE device's ID installed on the system,
# if needed.
setpci -d 8086:1a48 e6.b=2e
# set the MTU (max transmission unit) - it requires your switch and clients
# to change as well.
# set the txqueuelen
# your ixgb adapter should be loaded as eth1 for this to work, change if needed
ip li set dev eth1 mtu 9000 txqueuelen 1000 up
# call the sysctl utility to modify /proc/sys entries
sysctl -p ./sysctl_ixgb.conf
- COPY these lines and paste them into sysctl_ixgb.conf:
::
# some of the defaults may be different for your kernel
# call this file with sysctl -p <this file>
# these are just suggested values that worked well to increase throughput in
# several network benchmark tests, your mileage may vary
### IPV4 specific settings
# turn TCP timestamp support off, default 1, reduces CPU use
net.ipv4.tcp_timestamps = 0
# turn SACK support off, default on
# on systems with a VERY fast bus -> memory interface this is the big gainer
net.ipv4.tcp_sack = 0
# set min/default/max TCP read buffer, default 4096 87380 174760
net.ipv4.tcp_rmem = 10000000 10000000 10000000
# set min/pressure/max TCP write buffer, default 4096 16384 131072
net.ipv4.tcp_wmem = 10000000 10000000 10000000
# set min/pressure/max TCP buffer space, default 31744 32256 32768
net.ipv4.tcp_mem = 10000000 10000000 10000000
### CORE settings (mostly for socket and UDP effect)
# set maximum receive socket buffer size, default 131071
net.core.rmem_max = 524287
# set maximum send socket buffer size, default 131071
net.core.wmem_max = 524287
# set default receive socket buffer size, default 65535
net.core.rmem_default = 524287
# set default send socket buffer size, default 65535
net.core.wmem_default = 524287
# set maximum amount of option memory buffers, default 10240
net.core.optmem_max = 524287
# set number of unprocessed input packets before kernel starts dropping them; default 300
net.core.netdev_max_backlog = 300000
Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
ID installed on the system.
NOTE:
Unless these scripts are added to the boot process, these changes will
last only until the next system reboot.
Resolving Slow UDP Traffic
--------------------------
If your server does not seem to be able to receive UDP traffic as fast as it
can receive TCP traffic, it could be because Linux, by default, does not set
the network stack buffers as large as they need to be to support high UDP
transfer rates. One way to alleviate this problem is to allow more memory to
be used by the IP stack to store incoming data.
For instance, use the commands::
sysctl -w net.core.rmem_max=262143
and::
sysctl -w net.core.rmem_default=262143
to increase the read buffer memory max and default to 262143 (256k - 1) from
defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
will increase the amount of memory used by the network stack for receives, and
can be increased significantly more if necessary for your application.
Additional Configurations
=========================
Configuring the Driver on Different Distributions
-------------------------------------------------
Configuring a network driver to load properly when the system is started is
distribution dependent. Typically, the configuration process involves adding
an alias line to /etc/modprobe.conf as well as editing other system startup
scripts and/or configuration files. Many popular Linux distributions ship
with tools to make these changes for you. To learn the proper way to
configure a network device for your system, refer to your distribution
documentation. If during this process you are asked for the driver or module
name, the name for the Linux Base Driver for the Intel 10GbE Family of
Adapters is ixgb.
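As an illustrative example only, an alias line in /etc/modprobe.conf for
this driver might look like the following (the interface name is a
placeholder)::

    alias eth1 ixgb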
Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set dmesg to eight by entering the following::
dmesg -n 8
NOTE: This setting is not saved across reboots.
Jumbo Frames
------------
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
enabled by changing the MTU to a value larger than the default of 1500.
The maximum value for the MTU is 16114. Use the ip command to
increase the MTU size. For example::
ip li set dev ethx mtu 9000
The maximum MTU setting for Jumbo Frames is 16114. This value coincides
with the maximum Jumbo Frames size of 16128.
Ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The ethtool
version 1.6 or later is required for this functionality.
The latest release of ethtool can be found from
https://www.kernel.org/pub/software/network/ethtool/
NOTE:
The ethtool version 1.6 only supports a limited set of ethtool options.
Support for a more complete ethtool feature set can be enabled by
upgrading to the latest version.
NAPI
----
NAPI (Rx polling mode) is supported in the ixgb driver.
See https://wiki.linuxfoundation.org/networking/napi for more information on
NAPI.
Known Issues/Troubleshooting
============================
NOTE:
After installing the driver, if your Intel Network Connection is not
working, verify in the "In This Release" section of the readme that you have
installed the correct driver.
Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis
----------------------------------------------------------------------------
Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
Server adapter or the SmartBits. If this situation occurs using a different
cable assembly may resolve the issue.
Cable Interoperability Issues with HP Procurve 3400cl Switch Port
-----------------------------------------------------------------
Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
adapter is connected to an HP Procurve 3400cl switch port using short cables
(1 m or shorter). If this situation occurs, using a longer cable may resolve
the issue.
Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
are 10 m or longer, or when using a Leoni 15 m/24AWG cable assembly. The CRC
errors may be received either by the CX4 Server adapter or at the switch. If
this situation occurs, using a different cable assembly may resolve the issue.
Jumbo Frames System Requirement
-------------------------------
Memory allocation failures have been observed on Linux systems with 64 MB
of RAM or less that are running Jumbo Frames. If you are using Jumbo
Frames, your system may require more than the advertised minimum
requirement of 64 MB of system memory.
Performance Degradation with Jumbo Frames
-----------------------------------------
Degradation in throughput performance may be observed in some Jumbo frames
environments. If this is observed, increasing the application's socket buffer
size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
See the specific application manual and /usr/src/linux*/Documentation/
networking/ip-sysctl.txt for more details.
Allocating Rx Buffers when Using Jumbo Frames
---------------------------------------------
Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
the available memory is heavily fragmented. This issue may be seen with PCI-X
adapters or with packet split disabled. This can be reduced or eliminated
by changing the amount of available memory for receive buffer allocation, by
increasing /proc/sys/vm/min_free_kbytes.
Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
Due to the default ARP behavior on Linux, it is not possible to have
one system on two IP networks in the same Ethernet broadcast domain
(non-partitioned switch) behave as expected. All Ethernet interfaces
will respond to IP traffic for any IP address assigned to the system.
This results in unbalanced receive traffic.
If you have multiple interfaces in a server, do either of the following:
- Turn on ARP filtering by entering::
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
- Install the interfaces in separate broadcast domains - either in
different switches or in a switch partitioned to VLANs.
UDP Stress Test Dropped Packet Issue
--------------------------------------
Under a small-packet UDP stress test with the 10GbE driver, the Linux system
may drop UDP packets due to the fullness of socket buffers. You may want
to change the driver's Flow Control variables to the minimum value for
controlling packet reception.
Tx Hangs Possible Under Stress
------------------------------
Under stress conditions, if Tx hangs occur, turning off TSO with
"ethtool -K eth0 tso off" may resolve the problem.
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net


@@ -0,0 +1,541 @@
.. SPDX-License-Identifier: GPL-2.0+
===========================================================================
Linux Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
===========================================================================
Intel 10 Gigabit Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Command Line Parameters
- Additional Configurations
- Known Issues
- Support
Identifying Your Adapter
========================
The driver is compatible with devices based on the following:
* Intel(R) Ethernet Controller 82598
* Intel(R) Ethernet Controller 82599
* Intel(R) Ethernet Controller X520
* Intel(R) Ethernet Controller X540
* Intel(R) Ethernet Controller X550
* Intel(R) Ethernet Controller X552
* Intel(R) Ethernet Controller X553
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
https://www.intel.com/support
SFP+ Devices with Pluggable Optics
----------------------------------
82599-BASED ADAPTERS
~~~~~~~~~~~~~~~~~~~~
NOTES:
- If your 82599-based Intel(R) Network Adapter came with Intel optics or is an
Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel optics
and/or the direct attach cables listed below.
- When 82599-based SFP+ devices are connected back to back, they should be set
to the same Speed setting via ethtool. Results may vary if you mix speed
settings.
+---------------+---------------------------------------+------------------+
| Supplier | Type | Part Numbers |
+===============+=======================================+==================+
| SR Modules |
+---------------+---------------------------------------+------------------+
| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | FTLX8571D3BCV-IT |
+---------------+---------------------------------------+------------------+
| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDZ-IN2 |
+---------------+---------------------------------------+------------------+
| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDDZ-IN1 |
+---------------+---------------------------------------+------------------+
| LR Modules |
+---------------+---------------------------------------+------------------+
| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | FTLX1471D3BCV-IT |
+---------------+---------------------------------------+------------------+
| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDZ-IN2 |
+---------------+---------------------------------------+------------------+
| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDDZ-IN1 |
+---------------+---------------------------------------+------------------+
The following is a list of 3rd party SFP+ modules that have received some
testing. Not all modules are applicable to all devices.
+---------------+---------------------------------------+------------------+
| Supplier | Type | Part Numbers |
+===============+=======================================+==================+
| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL |
+---------------+---------------------------------------+------------------+
| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ |
+---------------+---------------------------------------+------------------+
| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL |
+---------------+---------------------------------------+------------------+
| Finisar | DUAL RATE 1G/10G SFP+ SR (No Bail) | FTLX8571D3QCV-IT |
+---------------+---------------------------------------+------------------+
| Avago | DUAL RATE 1G/10G SFP+ SR (No Bail) | AFBR-703SDZ-IN1 |
+---------------+---------------------------------------+------------------+
| Finisar | DUAL RATE 1G/10G SFP+ LR (No Bail) | FTLX1471D3QCV-IT |
+---------------+---------------------------------------+------------------+
| Avago | DUAL RATE 1G/10G SFP+ LR (No Bail) | AFCT-701SDZ-IN1 |
+---------------+---------------------------------------+------------------+
| Finisar | 1000BASE-T SFP | FCLF8522P2BTL |
+---------------+---------------------------------------+------------------+
| Avago | 1000BASE-T | ABCU-5710RZ |
+---------------+---------------------------------------+------------------+
| HP | 1000BASE-SX SFP | 453153-001 |
+---------------+---------------------------------------+------------------+
82599-based adapters support all passive and active limiting direct attach
cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications.
Laser turns off for SFP+ when ifconfig ethX down
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"ifconfig ethX down" turns off the laser for 82599-based SFP+ fiber adapters.
"ifconfig ethX up" turns on the laser.
Alternatively, you can use "ip link set [down/up] dev ethX" to turn the
laser off and on.
82599-based QSFP+ Adapters
~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTES:
- If your 82599-based Intel(R) Network Adapter came with Intel optics, it only
supports Intel optics.
- 82599-based QSFP+ adapters only support 4x10 Gbps connections. 1x40 Gbps
connections are not supported. QSFP+ link partners must be configured for
4x10 Gbps.
- 82599-based QSFP+ adapters do not support automatic link speed detection.
The link speed must be configured to either 10 Gbps or 1 Gbps to match the link
partner's speed capabilities. Incorrect speed configurations will result in
failure to link.
- Intel(R) Ethernet Converged Network Adapter X520-Q1 only supports the optics
and direct attach cables listed below.
+---------------+---------------------------------------+------------------+
| Supplier | Type | Part Numbers |
+===============+=======================================+==================+
| Intel | DUAL RATE 1G/10G QSFP+ SRL (bailed) | E10GQSFPSR |
+---------------+---------------------------------------+------------------+
82599-based QSFP+ adapters support all passive and active limiting QSFP+
direct attach cables that comply with SFF-8436 v4.1 specifications.
82598-BASED ADAPTERS
~~~~~~~~~~~~~~~~~~~~
NOTES:
- Intel(R) Ethernet Network Adapters that support removable optical modules
only support their original module type (for example, the Intel(R) 10 Gigabit
SR Dual Port Express Module only supports SR optical modules). If you plug in
a different type of module, the driver will not load.
- Hot Swapping/hot plugging optical modules is not supported.
- Only single speed, 10 gigabit modules are supported.
- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module
types are not supported. Please see your system documentation for details.
The following is a list of SFP+ modules and direct attach cables that have
received some testing. Not all modules are applicable to all devices.
+---------------+---------------------------------------+------------------+
| Supplier | Type | Part Numbers |
+===============+=======================================+==================+
| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL |
+---------------+---------------------------------------+------------------+
| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ |
+---------------+---------------------------------------+------------------+
| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL |
+---------------+---------------------------------------+------------------+
82598-based adapters support all passive direct attach cables that comply with
SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach cables
are not supported.
Third party optic modules and cables referred to above are listed only for the
purpose of highlighting third party specifications and potential
compatibility, and are not recommendations or endorsements or sponsorship of
any third party's product by Intel. Intel is not endorsing or promoting
products made by any third party and the third party reference is provided
only to share information regarding certain optic modules and cables with the
above specifications. There may be other manufacturers or suppliers, producing
or supplying optic modules and cables with similar or matching descriptions.
Customers must use their own discretion and diligence to purchase optic
modules and cables from any third party of their choice. Customers are solely
responsible for assessing the suitability of the product and/or devices and
for the selection of the vendor for purchasing any product. THE OPTIC MODULES
AND CABLES REFERRED TO ABOVE ARE NOT WARRANTED OR SUPPORTED BY INTEL. INTEL
ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED
WARRANTY, RELATING TO SALE AND/OR USE OF SUCH THIRD PARTY PRODUCTS OR
SELECTION OF VENDOR BY CUSTOMERS.
Command Line Parameters
=======================
max_vfs
-------
:Valid Range: 1-63
This parameter adds support for SR-IOV. It causes the driver to spawn up to
max_vfs worth of virtual functions.
If the value is greater than 0 it will also force the VMDq parameter to be 1 or
more.
NOTE: This parameter is only used on kernel 3.7.x and below. On kernel 3.8.x
and above, use sysfs to enable VFs. Also, for Red Hat distributions, this
parameter is only used on version 6.6 and older. For version 6.7 and newer, use
sysfs. For example::
#echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs // enable VFs
#echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs
The parameters for the driver are referenced by position. Thus, if you have a
dual port adapter, or more than one adapter in your system, and want N virtual
functions per port, you must specify a number for each port with each parameter
separated by a comma. For example::
modprobe ixgbe max_vfs=4
This will spawn 4 VFs on the first port.
::
modprobe ixgbe max_vfs=2,4
This will spawn 2 VFs on the first port and 4 VFs on the second port.
NOTE: Caution must be used in loading the driver with these parameters.
Depending on your system configuration, number of slots, etc., it is impossible
to predict in all cases where the positions would be on the command line.
NOTE: Neither the device nor the driver control how VFs are mapped into config
space. Bus layout will vary by operating system. On operating systems that
support it, you can check sysfs to find the mapping.
NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
and VLAN tag stripping/insertion will remain enabled. Please remove the old
VLAN filter before the new VLAN filter is added. For example,
::
ip link set eth0 vf 0 vlan 100 // set VLAN 100 for VF 0
ip link set eth0 vf 0 vlan 0 // Delete VLAN 100
ip link set eth0 vf 0 vlan 200 // set a new VLAN 200 for VF 0
With kernel 3.6, the driver supports the simultaneous usage of max_vfs and DCB
features, subject to the constraints described below. Prior to kernel 3.6, the
driver did not support the simultaneous operation of max_vfs greater than 0 and
the DCB features (multiple traffic classes utilizing Priority Flow Control and
Extended Transmission Selection).
When DCB is enabled, network traffic is transmitted and received through
multiple traffic classes (packet buffers in the NIC). The traffic is associated
with a specific class based on priority, which has a value of 0 through 7 used
in the VLAN tag. When SR-IOV is not enabled, each traffic class is associated
with a set of receive/transmit descriptor queue pairs. The number of queue
pairs for a given traffic class depends on the hardware configuration. When
SR-IOV is enabled, the descriptor queue pairs are grouped into pools. The
Physical Function (PF) and each Virtual Function (VF) is allocated a pool of
receive/transmit descriptor queue pairs. When multiple traffic classes are
configured (for example, DCB is enabled), each pool contains a queue pair from
each traffic class. When a single traffic class is configured in the hardware,
the pools contain multiple queue pairs from the single traffic class.
The number of VFs that can be allocated depends on the number of traffic
classes that can be enabled. The configurable number of traffic classes for
each enabled VF is as follows:
0 - 15 VFs = Up to 8 traffic classes, depending on device support
16 - 31 VFs = Up to 4 traffic classes
32 - 63 VFs = 1 traffic class
When VFs are configured, the PF is allocated one pool as well. The PF supports
the DCB features with the constraint that each traffic class will only use a
single queue pair. When zero VFs are configured, the PF can support multiple
queue pairs per traffic class.
allow_unsupported_sfp
---------------------
:Valid Range: 0,1
:Default Value: 0 (disabled)
This parameter allows unsupported and untested SFP+ modules on 82599-based
adapters, as long as the type of module is known to the driver.
debug
-----
:Valid Range: 0-16 (0=none,...,16=all)
:Default Value: 0
This parameter adjusts the level of debug messages displayed in the system
logs.
Additional Features and Configurations
======================================
Flow Control
------------
Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
receiving and transmitting pause frames for ixgbe. When transmit is enabled,
pause frames are generated when the receive packet buffer crosses a predefined
threshold. When receive is enabled, the transmit unit will halt for the time
delay specified when a pause frame is received.
NOTE: You must have a flow control capable link partner.
Flow Control is enabled by default.
Use ethtool to change the flow control settings. To enable or disable Rx or
Tx Flow Control::
ethtool -A eth? rx <on|off> tx <on|off>
Note: This command only enables or disables Flow Control if auto-negotiation is
disabled. If auto-negotiation is enabled, this command changes the parameters
used for auto-negotiation with the link partner.
To enable or disable auto-negotiation::
ethtool -s eth? autoneg <on|off>
Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
on your device, you may not be able to change the auto-negotiation setting.
NOTE: For 82598 backplane cards entering 1 gigabit mode, flow control default
behavior is changed to off. Flow control in 1 gigabit mode on these devices can
lead to transmit hangs.
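The flow control settings currently in effect can be displayed with the
standard ethtool query::
ethtool -a eth?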
Intel(R) Ethernet Flow Director
-------------------------------
The Intel Ethernet Flow Director performs the following tasks:
- Directs receive packets according to their flows to different queues.
- Enables tight control on routing a flow in the platform.
- Matches flows and CPU cores for flow affinity.
- Supports multiple parameters for flexible flow classification and load
balancing (in SFP mode only).
NOTE: Intel Ethernet Flow Director masking works in the opposite manner from
subnet masking. In the following command::
# ethtool -N eth11 flow-type ip4 src-ip 172.4.1.2 m 255.0.0.0 dst-ip \
172.21.1.1 m 255.128.0.0 action 31
The src-ip value that is written to the filter will be 0.4.1.2, not 172.0.0.0
as might be expected. Similarly, the dst-ip value written to the filter will be
0.21.1.1, not 172.0.0.0.
To enable or disable the Intel Ethernet Flow Director::
# ethtool -K ethX ntuple <on|off>
When disabling ntuple filters, all the user programmed filters are flushed from
the driver cache and hardware. All needed filters must be re-added when ntuple
is re-enabled.
To add a filter that directs packets to queue 2, use the -U or -N switch::
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
To see the list of filters currently present::
# ethtool <-u|-n> ethX
Sideband Perfect Filters
------------------------
Sideband Perfect Filters are used to direct traffic that matches specified
characteristics. They are enabled through ethtool's ntuple interface. To add a
new filter use the following command::
ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
dst-port <port> action <queue>
Where:
<device> - the ethernet device to program
<type> - can be ip4, tcp4, udp4, or sctp4
<ip> - the IP address to match on
<port> - the port number to match on
<queue> - the queue to direct traffic towards (-1 discards the matched traffic)
Use the following command to delete a filter::
ethtool -U <device> delete <N>
Where <N> is the filter id displayed when printing all the active filters, and
may also have been specified using "loc <N>" when adding the filter.
The following example matches TCP traffic sent from 192.168.0.1, port 5300,
directed to 192.168.0.5, port 80, and sends it to queue 7::
ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
src-port 5300 dst-port 80 action 7
For each flow-type, the programmed filters must all have the same matching
input set. For example, issuing the following two commands is acceptable::
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
Issuing the next two commands, however, is not acceptable, since the first
specifies src-ip and the second specifies dst-ip::
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
The second command will fail with an error. You may program multiple filters
with the same fields, using different values, but, on one device, you may not
program two TCP4 filters with different matching fields.
Matching on a sub-portion of a field is not supported by the ixgbe driver, thus
partial mask fields are not supported.
To create filters that direct traffic to a specific Virtual Function, use the
"user-def" parameter. Specify the user-def as a 64 bit value, where the lower 32
bits represents the queue number, while the next 8 bits represent which VF.
Note that 0 is the PF, so the VF identifier is offset by 1. For example::
... user-def 0x800000002 ...
specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
that VF.
Note that these filters will not break internal routing rules, and will not
route traffic that otherwise would not have been sent to the specified Virtual
Function.
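Putting the pieces together, a hedged example (addresses and interface name are
illustrative) that directs matching TCP traffic to queue 2 of VF 7, following
the user-def encoding described above::
# ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.10.1 dst-ip \
192.168.10.2 user-def 0x800000002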
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
to a value larger than the default value of 1500.
Use the ifconfig command to increase the MTU size. For example, enter the
following where <x> is the interface number::
ifconfig eth<x> mtu 9000 up
Alternatively, you can use the ip command as follows::
ip link set mtu 9000 dev eth<x>
ip link set up dev eth<x>
This setting is not saved across reboots. The setting change can be made
permanent by adding 'MTU=9000' to the file::
/etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
/etc/sysconfig/network/<config_file> // for SLES
NOTE: The maximum MTU setting for Jumbo Frames is 9710. This value coincides
with the maximum Jumbo Frames size of 9728 bytes.
NOTE: This driver will attempt to use multiple page sized buffers to receive
each jumbo packet. This should help to avoid buffer starvation issues when
allocating receive packets.
NOTE: For 82599-based network connections, if you are enabling jumbo frames in
a virtual function (VF), jumbo frames must first be enabled in the physical
function (PF). The VF MTU setting cannot be larger than the PF MTU.
Generic Receive Offload, aka GRO
--------------------------------
The driver supports the in-kernel software implementation of GRO. GRO has
shown that by coalescing Rx traffic into larger chunks of data, CPU
utilization can be significantly reduced when under large Rx load. GRO is an
evolution of the previously-used LRO interface. GRO is able to coalesce
other protocols besides TCP. It's also safe to use with configurations that
are problematic for LRO, namely bridging and iSCSI.
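GRO can be inspected and toggled through the standard ethtool offload
interface::
# ethtool -k ethX | grep generic-receive-offload
# ethtool -K ethX gro <on|off>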
Data Center Bridging (DCB)
--------------------------
NOTE:
The kernel assumes that TC0 is available, and will disable Priority Flow
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
enabled when setting up DCB on your switch.
DCB is a configuration Quality of Service implementation in hardware. It uses
the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
different priorities that traffic can be filtered into. It also enables
priority flow control (802.1Qbb) which can limit or eliminate the number of
dropped packets during network stress. Bandwidth can be allocated to each of
these priorities, which is enforced at the hardware level (802.1Qaz).
Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
and can accept settings from a DCBX capable peer. Software configuration of
DCBX parameters via dcbtool/lldptool are not supported.
The ixgbe driver implements the DCB netlink interface layer to allow user-space
to communicate with the driver and query DCB configuration for the port.
ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
FCoE
----
The ixgbe driver supports Fibre Channel over Ethernet (FCoE) and Data Center
Bridging (DCB). This code has no default effect on the regular driver
operation. Configuring DCB and FCoE is outside the scope of this README. Refer
to http://www.open-fcoe.org/ for FCoE project information and contact
ixgbe-eedc@lists.sourceforge.net for DCB information.
MAC and VLAN anti-spoofing feature
----------------------------------
When a malicious driver attempts to send a spoofed packet, it is dropped by the
hardware and not transmitted.
An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
spoofed packet is detected, the PF driver will send the following message to
the system log (displayed by the "dmesg" command)::
ixgbe ethX: ixgbe_spoof_check: n spoofed packets detected
where "x" is the PF interface number; and "n" is number of spoofed packets.
NOTE: This feature can be disabled for a specific Virtual Function (VF)::
ip link set <pf dev> vf <vf id> spoofchk {off|on}
IPsec Offload
-------------
The ixgbe driver supports IPsec Hardware Offload. When creating Security
Associations with "ip xfrm ..." the 'offload' tag option can be used to
register the IPsec SA with the driver in order to get higher throughput in
the secure communications.
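A minimal sketch of creating such an SA with the offload tag (addresses, SPI,
and key are illustrative; the algorithms accepted depend on the hardware)::
# ip xfrm state add src 192.168.10.1 dst 192.168.10.2 \
proto esp spi 0x1000 reqid 0x1000 mode transport \
aead 'rfc4106(gcm(aes))' 0x3132333435363738393031323334353637383930 128 \
offload dev eth0 dir out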
The offload is also supported for ixgbe's VFs, but the VF must be set as
'trusted' and the support must be enabled with::
ethtool --set-priv-flags eth<x> vf-ipsec on
ip link set eth<x> vf <y> trust on
Known Issues/Troubleshooting
============================
Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS
---------------------------------------------------------------------
Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM.
This includes traditional PCIe devices, as well as SR-IOV-capable devices based
on the Intel Ethernet Controller XL710.
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.
View File
@@ -0,0 +1,67 @@
.. SPDX-License-Identifier: GPL-2.0+
============================================================
Linux Base Virtual Function Driver for Intel(R) 10G Ethernet
============================================================
Intel 10 Gigabit Virtual Function Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Known Issues
- Support
This driver supports 82599, X540, X550, and X552-based virtual function devices
that can only be activated on kernels that support SR-IOV.
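As an illustration (a hedged sketch; the PF interface name is hypothetical), the
VFs that this driver binds to are typically created on the PF through the
standard sysfs SR-IOV interface::
# echo 2 > /sys/class/net/eth0/device/sriov_numvfs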
For questions related to hardware requirements, refer to the documentation
supplied with your Intel adapter. All hardware requirements listed apply to use
with Linux.
Identifying Your Adapter
========================
The driver is compatible with devices based on the following:
* Intel(R) Ethernet Controller 82598
* Intel(R) Ethernet Controller 82599
* Intel(R) Ethernet Controller X520
* Intel(R) Ethernet Controller X540
* Intel(R) Ethernet Controller X550
* Intel(R) Ethernet Controller X552
* Intel(R) Ethernet Controller X553
For information on how to identify your adapter, and for the latest Intel
network drivers, refer to the Intel Support website:
https://www.intel.com/support
Known Issues/Troubleshooting
============================
SR-IOV requires the correct platform and OS support.
The guest OS loading this driver must support MSI-X interrupts.
This driver is only supported as a loadable module at this time. Intel is not
supplying patches against the kernel source to allow for static linking of the
drivers.
VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs.
Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
https://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net.
View File
@@ -0,0 +1,159 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
====================================
Marvell OcteonTx2 RVU Kernel Drivers
====================================
Copyright (c) 2020 Marvell International Ltd.
Contents
========
- `Overview`_
- `Drivers`_
- `Basic packet flow`_
Overview
========
The resource virtualization unit (RVU) on Marvell's OcteonTX2 SoC maps HW
resources from the network, crypto and other functional blocks into
PCI-compatible physical and virtual functions. Each functional block in
turn has multiple local functions (LFs) for provisioning to PCI devices.
RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual
functions (VFs). PF0 is called the administrative / admin function (AF)
and has privileges to provision the RVU functional blocks' LFs to each of
the PFs/VFs.
RVU managed networking functional blocks
- Network pool or buffer allocator (NPA)
- Network interface controller (NIX)
- Network parser CAM (NPC)
- Schedule/Synchronize/Order unit (SSO)
- Loopback interface (LBK)
RVU managed non-networking functional blocks
- Crypto accelerator (CPT)
- Scheduled timers unit (TIM)
- Schedule/Synchronize/Order unit (SSO)
Used for both networking and non-networking use cases
Resource provisioning examples
- A PF/VF with NIX-LF & NPA-LF resources works as a pure network device
- A PF/VF with CPT-LF resource works as a pure crypto offload device.
RVU functional blocks are highly configurable as per software requirements.
Firmware sets up the following before the kernel boots:
- Enables the required number of RVU PFs based on the number of physical links.
- The number of VFs per PF is either static or configurable at compile time.
Based on this config, firmware assigns VFs to each of the PFs.
- Assigns MSI-X vectors to each PF and VF.
- These settings are not changed after kernel boot.
Drivers
=======
The Linux kernel has multiple drivers registering to different PFs and VFs
of the RVU. With respect to networking, there are three flavours of drivers.
Admin Function driver
---------------------
As mentioned above, RVU PF0 is called the admin function (AF). This driver
supports resource provisioning and configuration of functional blocks.
It doesn't handle any I/O. It sets up a few basics, but most of the
functionality is achieved via configuration requests from PFs and VFs.
PFs/VFs communicate with the AF via a shared memory region (mailbox). Upon
receiving requests, the AF does resource provisioning and other HW configuration.
The AF is always attached to the host kernel, but PFs and their VFs may be used
by the host kernel itself, attached to VMs, or handed to userspace applications
like DPDK. So the AF has to handle provisioning/configuration requests sent
by any device from any domain.
The AF driver also interacts with the underlying firmware to
- Manage physical ethernet links, i.e. CGX LMACs.
- Retrieve information like speed, duplex, autoneg etc.
- Retrieve PHY EEPROM and stats.
- Configure FEC, PAM modes.
- etc.
On the pure networking side, the AF driver supports the following functionality.
- Map a physical link to a RVU PF to which a netdev is registered.
- Attach NIX and NPA block LFs to RVU PF/VF which provide buffer pools, RQs, SQs
for regular networking functionality.
- Flow control (pause frames) enable/disable/config.
- HW PTP timestamping related config.
- NPC parser profile config, basically how to parse the packet and what info to extract.
- NPC extract profile config, what to extract from the packet to match data in MCAM entries.
- Manage NPC MCAM entries; upon request, can frame and install requested packet forwarding rules.
- Defines receive side scaling (RSS) algorithms.
- Defines segmentation offload algorithms (e.g. TSO).
- VLAN stripping, capture and insertion config.
- SSO and TIM blocks config which provide packet scheduling support.
- Debugfs support, to check current resource provisioning, the current status of
NPA pools, NIX RQs, SQs and CQs, various stats etc., which helps in debugging issues.
- And many more.
Physical Function driver
------------------------
This RVU PF handles I/O, is mapped to a physical ethernet link, and this
driver registers a netdev. It supports SR-IOV. As said above, this driver
communicates with the AF via a mailbox. To retrieve information from physical
links, this driver talks to the AF, and the AF gets that info from firmware
and responds back, i.e. this driver cannot talk to firmware directly.
Supports ethtool for configuring links, RSS, queue count, queue size,
flow control, ntuple filters, dumping the PHY EEPROM, configuring FEC, etc.
Virtual Function driver
-----------------------
There are two types of VFs: VFs that share the physical link with their parent
SR-IOV PF, and VFs that work in pairs using internal HW loopback channels (LBK).
Type1:
- These VFs and their parent PF share a physical link and are used for external communication.
- VFs cannot communicate with the AF directly; they send a mbox message to the PF,
and the PF forwards it to the AF. After processing, the AF responds to the PF,
and the PF forwards the reply to the VF.
- From a functionality point of view there is no difference between PF and VF, as
the same type of HW resources are attached to both. However, some settings can
only be configured from the PF, as the PF is treated as the owner/admin of the link.
Type2:
- RVU PF0, i.e. the admin function, creates these VFs and maps them to the loopback block's channels.
- A set of two VFs (VF0 & VF1, VF2 & VF3, and so on) works as a pair, i.e. packets sent out of
VF0 will be received by VF1 and vice versa.
- These VFs can be used by applications or virtual machines to communicate with each
other without sending traffic outside. There is no switch present in HW, hence the
support for loopback VFs.
- These communicate directly with AF (PF0) via mbox.
Except for the I/O channels or links used for packet reception and transmission, there
is no other difference between these VF types. The AF driver takes care of I/O channel
mapping, hence the same VF driver works for both types of devices.
Basic packet flow
=================
Ingress
-------
1. The CGX LMAC receives a packet.
2. It forwards the packet to the NIX block.
3. The packet is then submitted to the NPC block for parsing, followed by an MCAM lookup to get the destination RVU device.
4. The NIX LF attached to the destination RVU device allocates a buffer from the RQ-mapped buffer pool of the NPA block LF.
5. The RQ may be selected by RSS or by configuring an MCAM rule with an RQ number.
6. The packet is DMA'ed and the driver is notified.
Egress
------
1. The driver prepares a send descriptor and submits it to the SQ for transmission.
2. The SQ is already configured (by the AF) to transmit on a specific link/channel.
3. The SQ descriptor ring is maintained in buffers allocated from the SQ-mapped pool of the NPA block LF.
4. The NIX block transmits the packet on the designated channel.
5. NPC MCAM entries can be installed to divert the packet onto a different channel.
View File
@@ -0,0 +1,321 @@
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
=================================================
Mellanox ConnectX(R) mlx5 core VPI Network Driver
=================================================
Copyright (c) 2019, Mellanox Technologies LTD.
Contents
========
- `Enabling the driver and kconfig options`_
- `Devlink info`_
- `Devlink parameters`_
- `Devlink health reporters`_
- `mlx5 tracepoints`_
Enabling the driver and kconfig options
================================================
| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
| at build time via kernel Kconfig flags.
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
| For the list of advanced features please see below.
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
| This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).
**CONFIG_MLX5_CORE_EN=(y/n)**
| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
| mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be
| built-in into mlx5_core.ko.
**CONFIG_MLX5_EN_ARFS=(y/n)**
| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4
**CONFIG_MLX5_EN_RXNFC=(y/n)**
| Enables ethtool receive network flow classification, which allows user defined
| flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API.
**CONFIG_MLX5_CORE_EN_DCB=(y/n)**:
| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.
**CONFIG_MLX5_MPFS=(y/n)**
| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
| MPFS is required when a `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled, to allow passing
| user configured unicast MAC addresses to the requesting PF.
**CONFIG_MLX5_ESWITCH=(y/n)**
| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
| and switching for the enabled VFs and PF in two available modes:
| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.
**CONFIG_MLX5_CORE_IPOIB=(y/n)**
| IPoIB offloads & acceleration support.
| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
| IPoIB ulp netdevice.
**CONFIG_MLX5_FPGA=(y/n)**
| Build support for the Innova family of network cards by Mellanox Technologies.
| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
| building sandbox-specific client drivers.
**CONFIG_MLX5_EN_IPSEC=(y/n)**
| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
**CONFIG_MLX5_EN_TLS=(y/n)**
| TLS cryptography-offload acceleration.
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
**External options** ( Choose if the corresponding mlx5 feature is required )
- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled
- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
Devlink info
============
The devlink info reports the running and stored firmware versions on device.
It also prints the device PSID which represents the HCA board type ID.
User command example::
$ devlink dev info pci/0000:00:06.0
pci/0000:00:06.0:
driver mlx5_core
versions:
fixed:
fw.psid MT_0000000009
running:
fw.version 16.26.0100
stored:
fw.version 16.26.0100
Devlink parameters
==================
flow_steering_mode: Device flow steering mode
---------------------------------------------
The flow steering mode parameter controls the flow steering mode of the driver.
Two modes are supported:
1. 'dmfs' - Device managed flow steering.
2. 'smfs' - Software/Driver managed flow steering.
In DMFS mode, the HW steering entities are created and managed through the
firmware.
In SMFS mode, the HW steering entities are created and managed by the driver
directly on the hardware, without firmware intervention.
SMFS mode is faster and provides a better rule insertion rate than the default
DMFS mode.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
enable_roce: RoCE enablement state
----------------------------------
RoCE enablement state controls driver support for RoCE traffic.
When RoCE is disabled, there is no GID table; only raw ethernet QPs are supported, and traffic on the well-known UDP RoCE port is handled as raw ethernet traffic.
To change RoCE enablement state a user must change the driverinit cmode value and run devlink reload.
User command examples:
- Disable RoCE::
$ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
- Read RoCE enablement state::
$ devlink dev param show pci/0000:06:00.0 name enable_roce
pci/0000:06:00.0:
name enable_roce type generic
values:
cmode driverinit value true
Devlink health reporters
========================
tx reporter
-----------
The tx reporter is responsible for reporting and recovering from the following two error scenarios:
- TX timeout
Report on kernel tx timeout detection.
Recover by searching for lost interrupts.
- TX error completion
Report on error tx completion.
Recover by flushing the TX queue and resetting it.
The TX reporter also supports an on-demand diagnose callback, which provides
real-time information on the status of its send queues.
User command examples:
- Diagnose send queues status::
$ devlink health diagnose pci/0000:82:00.0 reporter tx
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
- Show the number of tx errors indicated, the number of recover flows that ended
successfully, whether autorecover is enabled, and the grace period since the
last recovery::
$ devlink health show pci/0000:82:00.0 reporter tx
rx reporter
-----------
The rx reporter is responsible for reporting and recovering from the following two error scenarios:
- RX queues initialization (population) timeout
RX queue descriptor population on ring initialization is done in
napi context by triggering an IRQ. In case of a failure to get
the minimum number of descriptors, a timeout occurs; it may
be recoverable by polling the EQ (Event Queue).
- RX completions with errors (reported by HW on interrupt context)
Report on rx completion error.
Recover (if needed) by flushing the related queue and resetting it.
The RX reporter also supports an on-demand diagnose callback, which
provides real-time information on the status of its receive queues.
- Diagnose rx queues status, and corresponding completion queue::
$ devlink health diagnose pci/0000:82:00.0 reporter rx
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
- Show the number of rx errors indicated, the number of recover flows that ended
successfully, whether autorecover is enabled, and the grace period since the
last recovery::
$ devlink health show pci/0000:82:00.0 reporter rx
fw reporter
-----------
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.
User command examples:
- Check fw health status::
$ devlink health diagnose pci/0000:82:00.0 reporter fw
- Read the FW core dump if already stored, or trigger a new one::
$ devlink health dump show pci/0000:82:00.0 reporter fw
NOTE: This command can run only on the PF which has fw tracer ownership;
running it on another PF or any VF will return "Operation not permitted".
fw fatal reporter
-----------------
The fw fatal reporter implements dump and recover callbacks.
It follows fatal error indications by CR-space dump and recovery flow.
The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
User command examples:
- Run fw recover flow manually::
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
- Read the FW CR-space dump if already stored, or trigger a new one::
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
NOTE: This command can run only on PF.
mlx5 tracepoints
================
The mlx5 driver provides internal tracepoints for tracking and debugging using
the kernel tracepoints interface (refer to Documentation/trace/ftrace.rst).
For the list of supported mlx5 events, check /sys/kernel/debug/tracing/events/mlx5/
tc and eswitch offloads tracepoints:
- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
$ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace
...
tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
$ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace
...
tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
- mlx5e_stats_flower: trace flower stats request::
$ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace
...
tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
$ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace
...
kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
$ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace
...
kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
View File
@@ -0,0 +1,116 @@
.. SPDX-License-Identifier: GPL-2.0
======================
Hyper-V network driver
======================
Compatibility
=============
This driver is compatible with Windows Server 2012 R2, 2016 and
Windows 10.
Features
========
Checksum offload
----------------
The netvsc driver supports checksum offload as long as the
Hyper-V host version does. Windows Server 2016 and Azure
support checksum offload for TCP and UDP for both IPv4 and
IPv6. Windows Server 2012 only supports checksum offload for TCP.
Receive Side Scaling
--------------------
Hyper-V supports receive side scaling. For TCP & UDP, packets can
be distributed among available queues based on IP address and port
number.
For TCP & UDP, the hash level can be switched between L3 and L4 with the
ethtool command. TCP/UDP over IPv4 and v6 can be set differently. The default
hash level is L4. We currently only allow switching TX hash level
from within the guests.
On Azure, fragmented UDP packets have high loss rate with L4
hashing. Using L3 hashing is recommended in this case.
For example, for UDP over IPv4 on eth0:
To include UDP port numbers in hashing::
ethtool -N eth0 rx-flow-hash udp4 sdfn
To exclude UDP port numbers in hashing::
ethtool -N eth0 rx-flow-hash udp4 sd
To show UDP hash level::
ethtool -n eth0 rx-flow-hash udp4
Generic Receive Offload, aka GRO
--------------------------------
The driver supports GRO and it is enabled by default. GRO coalesces
like packets and significantly reduces CPU usage under heavy Rx
load.
Large Receive Offload (LRO), or Receive Side Coalescing (RSC)
-------------------------------------------------------------
The driver supports LRO/RSC in the vSwitch feature. It reduces the per packet
processing overhead by coalescing multiple TCP segments when possible. The
feature is enabled by default on VMs running on Windows Server 2019 and
later. It may be changed by ethtool command::
ethtool -K eth0 lro on
ethtool -K eth0 lro off
SR-IOV support
--------------
Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
is enabled in both the vSwitch and the guest configuration, then the
Virtual Function (VF) device is passed to the guest as a PCI
device. In this case, both a synthetic (netvsc) and VF device are
visible in the guest OS and both NICs have the same MAC address.
The VF is enslaved by the netvsc device. The netvsc driver will transparently
switch the data path to the VF when it is available and up.
Network state (addresses, firewall, etc) should be applied only to the
netvsc device; the slave device should not be accessed directly in
most cases. The exception is when a special queue discipline or
flow direction is desired; these should be applied directly to the
VF slave device.
Receive Buffer
--------------
Packets are received into a receive area which is created when the device
is probed. The receive area is broken into MTU-sized chunks, and each may
contain one or more packets. The number of receive sections may be changed
via ethtool Rx ring parameters.
There is a similar send buffer which is used to aggregate packets for sending.
The send area is broken into chunks of 6144 bytes, each of which may
contain one or more packets. The send buffer is an optimization; the driver
will use a slower method to handle very large packets or cases where the send
buffer area is exhausted.
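The number of receive sections can be inspected and adjusted with the standard
ethtool ring parameter commands; the value below is illustrative::
ethtool -g eth0
ethtool -G eth0 rx 1024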
XDP support
-----------
XDP (eXpress Data Path) is a feature that runs eBPF bytecode at an early
stage, when packets arrive at the NIC. The goal is to increase performance
for packet processing, reducing the overhead of SKB allocation and other
upper network layers.
hv_netvsc supports XDP in native mode, and transparently sets the XDP
program on the associated VF NIC as well.
Setting or unsetting an XDP program on the synthetic NIC (netvsc) propagates
to the VF NIC automatically. Setting or unsetting an XDP program on the VF NIC
directly is not recommended; it is not propagated to the synthetic NIC and may
be overwritten by a setting on the synthetic NIC.
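A minimal sketch of attaching and detaching a native XDP program on the
synthetic NIC with iproute2 (the object file name is illustrative)::
ip link set dev eth0 xdp obj xdp_prog.o sec xdp
ip link set dev eth0 xdp off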
XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO
before running XDP::
ethtool -K eth0 lro off
XDP_REDIRECT action is not yet supported.
View File
@@ -0,0 +1,196 @@
.. SPDX-License-Identifier: GPL-2.0
=========================================================
Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver
=========================================================
Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver.
.. Contents
- 1. Introduction
- 2. Identifying the adapter/interface
- 3. Features supported
- 4. Command line parameters
- 5. Performance suggestions
- 6. Support
1. Introduction
===============
This Linux driver supports Neterion's Xframe I PCI-X 1.0 and
Xframe II PCI-X 2.0 adapters. It supports several features
such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on.
See below for complete list of features.
All features are supported for both IPv4 and IPv6.
2. Identifying the adapter/interface
====================================
a. Insert the adapter(s) in your system.
b. Build and load driver::
# insmod s2io.ko
c. View log messages::
# dmesg | tail -40
You will see messages similar to::
eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA
eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA
eth4: Device is on 64 bit 133MHz PCIX(M1) bus
The above messages identify the adapter type (Xframe I/II), adapter revision,
driver version, interface name (eth3, eth4), and interrupt type (INTA, MSI, MSI-X).
In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed
as well.
To associate an interface with a physical adapter use "ethtool -p <ethX>".
The corresponding adapter's LED will blink multiple times.
3. Features supported
=====================
a. Jumbo frames. Xframe I/II supports MTU up to 9600 bytes,
modifiable using the ip command (see the example after this list).
b. Offloads. Supports checksum offload (TCP/UDP/IP) on transmit
and receive, TSO.
c. Multi-buffer receive mode. Scattering of packets across multiple
buffers. Currently the driver supports 2-buffer mode, which yields
significant performance improvement on certain platforms (SGI Altix,
IBM xSeries).
d. MSI/MSI-X. Can be enabled on platforms which support this feature
(IA64, Xeon) resulting in noticeable performance improvement (up to 7%
on certain platforms).
e. Statistics. Comprehensive MAC-level and software statistics displayed
using "ethtool -S" option.
f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
with multiple steering options.
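As referenced in item (a), the MTU can be raised with the ip command; for
example (interface name illustrative)::
# ip link set dev eth3 mtu 9600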
4. Command line parameters
==========================
a. tx_fifo_num
Number of transmit queues
Valid range: 1-8
Default: 1
b. rx_ring_num
Number of receive rings
Valid range: 1-8
Default: 1
c. tx_fifo_len
Size of each transmit queue
Valid range: Total length of all queues should not exceed 8192
Default: 4096
d. rx_ring_sz
Size of each receive ring (in 4K blocks)
Valid range: Limited by memory on system
Default: 30
e. intr_type
Specifies interrupt type. Possible values 0(INTA), 2(MSI-X)
Valid values: 0, 2
Default: 2
5. Performance suggestions
==========================
General:
a. Set MTU to maximum (9000 for switch setup, 9600 in back-to-back configuration).
b. Set the TCP window size to an optimal value.
For instance, for MTU=1500 a value of 210K has been observed to result in
good performance::
# sysctl -w net.ipv4.tcp_rmem="210000 210000 210000"
# sysctl -w net.ipv4.tcp_wmem="210000 210000 210000"
For MTU=9000, TCP window size of 10 MB is recommended::
# sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
# sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
Transmit performance:
a. By default, the driver respects BIOS settings for PCI bus parameters.
However, you may want to experiment with PCI bus parameters
max-split-transactions (MOST) and MMRBC (use the setpci command).
A MOST value of 2 has been found optimal for Opterons and 3 for Itanium.
It could be different for your hardware.
Set MMRBC to 4K.
For example, for Opteron::
# setpci -d 17d5:* 62=1d
For Itanium::
# setpci -d 17d5:* 62=3d
For detailed description of the PCI registers, please see Xframe User Guide.
b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this
parameter.
c. Turn on TSO(using "ethtool -K")::
# ethtool -K <ethX> tso on
Receive performance:
a. By default, the driver respects BIOS settings for PCI bus parameters.
However, you may want to set PCI latency timer to 248::
# setpci -d 17d5:* LATENCY_TIMER=f8
For detailed description of the PCI registers, please see Xframe User Guide.
b. Use 2-buffer mode. This results in a large performance boost on
certain platforms (e.g. SGI Altix, IBM xSeries).
c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to
set/verify this option.
d. Enable the NAPI feature (in kernel configuration Device Drivers ---> Network
device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to
bring down CPU utilization.
.. note::
For AMD Opteron platforms with the 8131 chipset, MMRBC=1 and MOST=1 are
recommended as safe parameters.
For more information, please review the AMD8131 errata at
http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
6. Support
==========
For further support please contact either your 10GbE Xframe NIC vendor (IBM,
HP, SGI etc.)
View File
@@ -0,0 +1,115 @@
.. SPDX-License-Identifier: GPL-2.0
==============================================================================
Neterion's (Formerly S2io) X3100 Series 10GbE PCIe Server Adapter Linux driver
==============================================================================
.. Contents
1) Introduction
2) Features supported
3) Configurable driver parameters
4) Troubleshooting
1. Introduction
===============
This Linux driver supports all Neterion's X3100 series 10 GbE PCIe I/O
Virtualized Server adapters.
The X3100 series supports four modes of operation, configurable via
firmware:
- Single function mode
- Multi function mode
- SRIOV mode
- MRIOV mode
The functions share a 10GbE link and the PCIe bus, but hardly anything else
inside the ASIC. Features like independent HW reset, statistics, bandwidth/
priority allocation and guarantees, GRO, TSO, interrupt moderation, etc. are
supported independently on each function.
(See below for a complete list of features supported for both IPv4 and IPv6)
2. Features supported
=====================
i) Single function mode (up to 17 queues)
ii) Multi function mode (up to 17 functions)
iii) PCI-SIG's I/O Virtualization
- Single Root mode: v1.0 (up to 17 functions)
- Multi-Root mode: v1.0 (up to 17 functions)
iv) Jumbo frames
X3100 Series supports MTU up to 9600 bytes, modifiable using
ip command.
v) Offloads supported: (Enabled by default)
- Checksum offload (TCP/UDP/IP) on transmit and receive paths
- TCP Segmentation Offload (TSO) on transmit path
- Generic Receive Offload (GRO) on receive path
vi) MSI-X: (Enabled by default)
Resulting in noticeable performance improvement (up to 7% on certain
platforms).
vii) NAPI: (Enabled by default)
For better Rx interrupt moderation.
viii) RTH (Receive Traffic Hash): (Enabled by default)
Receive side steering for better scaling.
ix) Statistics
Comprehensive MAC-level and software statistics displayed using
"ethtool -S" option.
x) Multiple hardware queues: (Enabled by default)
Up to 17 hardware based transmit and receive data channels, with
multiple steering options (transmit multiqueue enabled by default).
3) Configurable driver parameters:
----------------------------------
i) max_config_dev
Specifies maximum device functions to be enabled.
Valid range: 1-8
ii) max_config_port
Specifies number of ports to be enabled.
Valid range: 1,2
Default: 1
iii) max_config_vpath
Specifies maximum VPATH(s) configured for each device function.
Valid range: 1-17
iv) vlan_tag_strip
Enables/disables vlan tag stripping from all received tagged frames that
are not replicated at the internal L2 switch.
Valid range: 0,1 (disabled, enabled respectively)
Default: 1
v) addr_learn_en
Enable learning the mac address of the guest OS interface in
virtualization environment.
Valid range: 0,1 (disabled, enabled respectively)
Default: 0
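These parameters are supplied at module load time; a hedged example with
illustrative values::
# modprobe vxge max_config_vpath=17 vlan_tag_strip=1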
View File
@@ -0,0 +1,249 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
=============================================
Netronome Flow Processor (NFP) Kernel Drivers
=============================================
Copyright (c) 2019, Netronome Systems, Inc.
Contents
========
- `Overview`_
- `Acquiring Firmware`_
Overview
========
This driver supports Netronome's line of Flow Processor devices,
including the NFP4000, NFP5000, and NFP6000 models, which are also
incorporated in the company's family of Agilio SmartNICs. The SR-IOV
physical and virtual functions for these devices are supported by
the driver.
Acquiring Firmware
==================
The NFP4000 and NFP6000 devices require application specific firmware
to function. Application firmware can be located either on the host file system
or in the device flash (if supported by management firmware).
Firmware files on the host filesystem contain card type (`AMDA-*` string), media
config etc. They should be placed in `/lib/firmware/netronome` directory to
load firmware from the host file system.
Firmware for basic NIC operation is available in the upstream
`linux-firmware.git` repository.
Firmware in NVRAM
-----------------
Recent versions of management firmware support loading application
firmware from flash when the host driver gets probed. The firmware loading
policy configuration may be used to configure this feature appropriately.
Devlink or ethtool can be used to update the application firmware on the device
flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
command. Users need to take care to write the correct firmware image for the
card and media configuration to flash.
Available storage space in flash depends on the card being used.
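As a hedged sketch (the device address and file name are illustrative), the
application firmware can be written to flash with devlink::
# devlink dev flash pci/0000:02:00.0 file netronome/nic_AMDA0081-0001_1x40.nffw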
Dealing with multiple projects
------------------------------
NFP hardware is fully programmable therefore there can be different
firmware images targeting different applications.
When using application firmware from host, we recommend placing
actual firmware files in application-named subdirectories in
`/lib/firmware/netronome` and linking the desired files, e.g.::
$ tree /lib/firmware/netronome/
/lib/firmware/netronome/
├── bpf
│   ├── nic_AMDA0081-0001_1x40.nffw
│   └── nic_AMDA0081-0001_4x10.nffw
├── flower
│   ├── nic_AMDA0081-0001_1x40.nffw
│   └── nic_AMDA0081-0001_4x10.nffw
├── nic
│   ├── nic_AMDA0081-0001_1x40.nffw
│   └── nic_AMDA0081-0001_4x10.nffw
├── nic_AMDA0081-0001_1x40.nffw -> bpf/nic_AMDA0081-0001_1x40.nffw
└── nic_AMDA0081-0001_4x10.nffw -> bpf/nic_AMDA0081-0001_4x10.nffw
3 directories, 8 files
You may need to use hard instead of symbolic links on distributions
which use old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
After changing firmware files you may need to regenerate the initramfs
image. Initramfs contains drivers and firmware files your system may
need to boot. Refer to the documentation of your distribution to find
out how to update initramfs. A good indication of a stale initramfs
is the system loading the wrong driver or firmware on boot, while everything
works correctly when the driver is later reloaded manually.
Selecting firmware per device
-----------------------------
Most commonly all cards on the system use the same type of firmware.
If you want to load a specific firmware image for a specific card, you
can use either the PCI bus address or the serial number. The driver will print
which files it's looking for when it recognizes an NFP device::
nfp: Looking for firmware file in order of priority:
nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
nfp: netronome/pci-0000:02:00.0.nffw: not found
nfp: netronome/nic_AMDA0081-0001_1x40.nffw: found, loading...
In this case, if a file (or link) called *serial-00-12-34-aa-bb-cc-10-ff.nffw*
or *pci-0000:02:00.0.nffw* is present in `/lib/firmware/netronome` this
firmware file will take precedence over `nic_AMDA*` files.
Note that `serial-*` and `pci-*` files are **not** automatically included
in initramfs; you will have to refer to the documentation of the appropriate
tools to find out how to include them.
Firmware loading policy
-----------------------
Firmware loading policy is controlled via three HWinfo parameters
stored as key value pairs in the device flash:
app_fw_from_flash
Defines which firmware should take precedence, 'Disk' (0), 'Flash' (1) or
the 'Preferred' (2) firmware. When 'Preferred' is selected, the management
firmware makes the decision over which firmware will be loaded by comparing
versions of the flash firmware and the host supplied firmware.
This variable is configurable using the 'fw_load_policy'
devlink parameter (see the example after this list).
abi_drv_reset
Defines if the driver should reset the firmware when
the driver is probed, either 'Disk' (0) if firmware was found on disk,
'Always' (1) reset or 'Never' (2) reset. Note that the device is always
reset on driver unload if firmware was loaded when the driver was probed.
This variable is configurable using the 'reset_dev_on_drv_probe'
devlink parameter.
abi_drv_load_ifc
Defines a list of PF devices allowed to load FW on the device.
This variable is not currently user configurable.
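As referenced above, a hedged example of selecting the 'Flash' (1) policy
through the 'fw_load_policy' devlink parameter (the device address is
illustrative)::
# devlink dev param set pci/0000:02:00.0 name fw_load_policy \
value 1 cmode permanent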
Statistics
==========
The following device statistics are available through the ``ethtool -S`` interface:
.. flat-table:: NFP device statistics
:header-rows: 1
:widths: 3 1 11
* - Name
- ID
- Meaning
* - dev_rx_discards
- 1
- Packet can be discarded on the RX path for one of the following reasons:
* The NIC is not in promisc mode, and the destination MAC address
doesn't match the interfaces' MAC address.
* The received packet is larger than the max buffer size on the host.
I.e. it exceeds the Layer 3 MRU.
* There is no freelist descriptor available on the host for the packet.
It is likely that the NIC couldn't cache one in time.
* A BPF program discarded the packet.
* The datapath drop action was executed.
* The MAC discarded the packet due to lack of ingress buffer space
on the NIC.
* - dev_rx_errors
- 2
- A packet can be counted (and dropped) as RX error for the following
reasons:
* A problem with the VEB lookup (only when SR-IOV is used).
* A physical layer problem that causes Ethernet errors, like FCS or
alignment errors. The cause is usually faulty cables or SFPs.
* - dev_rx_bytes
- 3
- Total number of bytes received.
* - dev_rx_uc_bytes
- 4
- Unicast bytes received.
* - dev_rx_mc_bytes
- 5
- Multicast bytes received.
* - dev_rx_bc_bytes
- 6
- Broadcast bytes received.
* - dev_rx_pkts
- 7
- Total number of packets received.
* - dev_rx_mc_pkts
- 8
- Multicast packets received.
* - dev_rx_bc_pkts
- 9
- Broadcast packets received.
* - dev_tx_discards
- 10
- A packet can be discarded in the TX direction if the MAC is
being flow controlled and the NIC runs out of TX queue space.
* - dev_tx_errors
- 11
- A packet can be counted as TX error (and dropped) for one of the
following reasons:
* The packet is an LSO segment, but the Layer 3 or Layer 4 offset
could not be determined. Therefore LSO could not continue.
* An invalid packet descriptor was received over PCIe.
* The packet Layer 3 length exceeds the device MTU.
* An error on the MAC/physical layer. Usually due to faulty cables or
SFPs.
* A CTM buffer could not be allocated.
* The packet offset was incorrect and could not be fixed by the NIC.
* - dev_tx_bytes
- 12
- Total number of bytes transmitted.
* - dev_tx_uc_bytes
- 13
- Unicast bytes transmitted.
* - dev_tx_mc_bytes
- 14
- Multicast bytes transmitted.
* - dev_tx_bc_bytes
- 15
- Broadcast bytes transmitted.
* - dev_tx_pkts
- 16
- Total number of packets transmitted.
* - dev_tx_mc_pkts
- 17
- Multicast packets transmitted.
* - dev_tx_bc_pkts
- 18
- Broadcast packets transmitted.
Note that statistics unknown to the driver will be displayed as
``dev_unknown_stat$ID``, where ``$ID`` refers to the second column
above.
View File
@@ -0,0 +1,274 @@
.. SPDX-License-Identifier: GPL-2.0+
========================================================
Linux Driver for the Pensando(R) Ethernet adapter family
========================================================
Pensando Linux Ethernet driver.
Copyright(c) 2019 Pensando Systems, Inc
Contents
========
- Identifying the Adapter
- Enabling the driver
- Configuring the driver
- Statistics
- Support
Identifying the Adapter
=======================
To find if one or more Pensando PCI Ethernet devices are installed on the
host, check for the PCI devices::
$ lspci -d 1dd8:
b5:00.0 Ethernet controller: Device 1dd8:1002
b6:00.0 Ethernet controller: Device 1dd8:1002
If such devices are listed as above, then the ionic.ko driver should find
and configure them for use. There should be log entries in the kernel
messages such as these::
$ dmesg | grep ionic
ionic 0000:b5:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
ionic 0000:b5:00.0 enp181s0: renamed from eth0
ionic 0000:b5:00.0 enp181s0: Link up - 100 Gbps
ionic 0000:b6:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
ionic 0000:b6:00.0 enp182s0: renamed from eth0
ionic 0000:b6:00.0 enp182s0: Link up - 100 Gbps
Driver and firmware version information can be gathered with either of
ethtool or devlink tools::
$ ethtool -i enp181s0
driver: ionic
version: 5.7.0
firmware-version: 1.8.0-28
...
$ devlink dev info pci/0000:b5:00.0
pci/0000:b5:00.0:
driver ionic
serial_number FLM18420073
versions:
fixed:
asic.id 0x0
asic.rev 0x0
running:
fw 1.8.0-28
See Documentation/networking/devlink/ionic.rst for more information
on the devlink dev info data.
Enabling the driver
===================
The driver is enabled via the standard kernel configuration system,
using the make command::
make oldconfig/menuconfig/etc.
The driver is located in the menu structure at:
-> Device Drivers
-> Network device support (NETDEVICES [=y])
-> Ethernet driver support
-> Pensando devices
-> Pensando Ethernet IONIC Support
Configuring the Driver
======================
MTU
---
Jumbo frame support is available with a maximum size of 9194 bytes.
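For example, using the ip command (the interface name is taken from the
examples above)::
$ ip link set dev enp181s0 mtu 9194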
Interrupt coalescing
--------------------
Interrupt coalescing can be configured by changing the rx-usecs value with
the "ethtool -C" command. The rx-usecs range is 0-190. The tx-usecs value
reflects the rx-usecs value as they are tied together on the same interrupt.
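For example, to set the coalescing timer to 64 microseconds (an illustrative
value within the range above)::
$ ethtool -C enp181s0 rx-usecs 64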
SR-IOV
------
Minimal SR-IOV support is currently offered and can be enabled by setting
the sysfs 'sriov_numvfs' value, if supported by your particular firmware
configuration.
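A hedged sketch using the standard sysfs interface (the PCI address is taken
from the examples above; success depends on the firmware configuration)::
$ echo 4 > /sys/bus/pci/devices/0000:b5:00.0/sriov_numvfs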
Statistics
==========
Basic hardware stats
--------------------
The commands ``netstat -i``, ``ip -s link show``, and ``ifconfig`` show
a limited set of statistics taken directly from firmware. For example::
$ ip -s link show enp181s0
7: enp181s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:ae:cd:00:07:68 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
414 5 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1384 18 0 0 0 0
ethtool -S
----------
The statistics shown by the ``ethtool -S`` command include a combination of
driver counters and firmware counters, including port- and queue-specific values.
The driver values are counters computed by the driver, and the firmware values
are gathered by the firmware from the port hardware and passed through the
driver with no further interpretation.
Driver port specific::
tx_packets: 12
tx_bytes: 964
rx_packets: 5
rx_bytes: 414
tx_tso: 0
tx_tso_bytes: 0
tx_csum_none: 12
tx_csum: 0
rx_csum_none: 0
rx_csum_complete: 3
rx_csum_error: 0
Driver queue specific::
tx_0_pkts: 3
tx_0_bytes: 294
tx_0_clean: 3
tx_0_dma_map_err: 0
tx_0_linearize: 0
tx_0_frags: 0
tx_0_tso: 0
tx_0_tso_bytes: 0
tx_0_csum_none: 3
tx_0_csum: 0
tx_0_vlan_inserted: 0
rx_0_pkts: 2
rx_0_bytes: 120
rx_0_dma_map_err: 0
rx_0_alloc_err: 0
rx_0_csum_none: 0
rx_0_csum_complete: 0
rx_0_csum_error: 0
rx_0_dropped: 0
rx_0_vlan_stripped: 0
Firmware port specific::
hw_tx_dropped: 0
hw_rx_dropped: 0
hw_rx_over_errors: 0
hw_rx_missed_errors: 0
hw_tx_aborted_errors: 0
frames_rx_ok: 15
frames_rx_all: 15
frames_rx_bad_fcs: 0
frames_rx_bad_all: 0
octets_rx_ok: 1290
octets_rx_all: 1290
frames_rx_unicast: 10
frames_rx_multicast: 5
frames_rx_broadcast: 0
frames_rx_pause: 0
frames_rx_bad_length: 0
frames_rx_undersized: 0
frames_rx_oversized: 0
frames_rx_fragments: 0
frames_rx_jabber: 0
frames_rx_pripause: 0
frames_rx_stomped_crc: 0
frames_rx_too_long: 0
frames_rx_vlan_good: 3
frames_rx_dropped: 0
frames_rx_less_than_64b: 0
frames_rx_64b: 4
frames_rx_65b_127b: 11
frames_rx_128b_255b: 0
frames_rx_256b_511b: 0
frames_rx_512b_1023b: 0
frames_rx_1024b_1518b: 0
frames_rx_1519b_2047b: 0
frames_rx_2048b_4095b: 0
frames_rx_4096b_8191b: 0
frames_rx_8192b_9215b: 0
frames_rx_other: 0
frames_tx_ok: 31
frames_tx_all: 31
frames_tx_bad: 0
octets_tx_ok: 2614
octets_tx_total: 2614
frames_tx_unicast: 8
frames_tx_multicast: 21
frames_tx_broadcast: 2
frames_tx_pause: 0
frames_tx_pripause: 0
frames_tx_vlan: 0
frames_tx_less_than_64b: 0
frames_tx_64b: 4
frames_tx_65b_127b: 27
frames_tx_128b_255b: 0
frames_tx_256b_511b: 0
frames_tx_512b_1023b: 0
frames_tx_1024b_1518b: 0
frames_tx_1519b_2047b: 0
frames_tx_2048b_4095b: 0
frames_tx_4096b_8191b: 0
frames_tx_8192b_9215b: 0
frames_tx_other: 0
frames_tx_pri_0: 0
frames_tx_pri_1: 0
frames_tx_pri_2: 0
frames_tx_pri_3: 0
frames_tx_pri_4: 0
frames_tx_pri_5: 0
frames_tx_pri_6: 0
frames_tx_pri_7: 0
frames_rx_pri_0: 0
frames_rx_pri_1: 0
frames_rx_pri_2: 0
frames_rx_pri_3: 0
frames_rx_pri_4: 0
frames_rx_pri_5: 0
frames_rx_pri_6: 0
frames_rx_pri_7: 0
tx_pripause_0_1us_count: 0
tx_pripause_1_1us_count: 0
tx_pripause_2_1us_count: 0
tx_pripause_3_1us_count: 0
tx_pripause_4_1us_count: 0
tx_pripause_5_1us_count: 0
tx_pripause_6_1us_count: 0
tx_pripause_7_1us_count: 0
rx_pripause_0_1us_count: 0
rx_pripause_1_1us_count: 0
rx_pripause_2_1us_count: 0
rx_pripause_3_1us_count: 0
rx_pripause_4_1us_count: 0
rx_pripause_5_1us_count: 0
rx_pripause_6_1us_count: 0
rx_pripause_7_1us_count: 0
rx_pause_1us_count: 0
frames_tx_truncated: 0
Support
=======
For general Linux networking support, please use the netdev mailing
list, which is monitored by Pensando personnel::
netdev@vger.kernel.org
For more specific support needs, please use the Pensando driver support
email::
drivers@pensando.io
View File
@@ -0,0 +1,48 @@
.. SPDX-License-Identifier: GPL-2.0
================
SMC 9xxxx Driver
================
Revision 0.12
3/5/96
Copyright 1996 Erik Stahlman
Released under terms of the GNU General Public License.
This file contains the instructions and caveats for my SMC9xxx driver. You
should not be using the driver without reading this file.
Things to note about installation:
1. The driver should work on all kernels from 1.2.13 until 1.3.71.
(A kernel patch is supplied for 1.3.71 )
2. If you include this into the kernel, you might need to change some
options, such as for forcing IRQ.
3. To compile as a module, run 'make'.
Make will give you the appropriate options for various kernel support.
4. Loading the driver as a module::
use: insmod smc9194.o
optional parameters:
io=xxxx : your base address
irq=xx : your irq
ifport=x : 0 for whatever is default
1 for twisted pair
2 for AUI ( or BNC on some cards )
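For example (the values below are hypothetical), a card at I/O base 0x300
on IRQ 10 using the twisted pair port could be loaded with::

    insmod smc9194.o io=0x300 irq=10 ifport=1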
How to obtain the latest version?
FTP:
ftp://fenris.campus.vt.edu/smc9/smc9-12.tar.gz
ftp://sfbox.vt.edu/filebox/F/fenris/smc9/smc9-12.tar.gz
Contacting me:
erik@mail.vt.edu

View File

@@ -0,0 +1,700 @@
.. SPDX-License-Identifier: GPL-2.0+
==============================================================
Linux Driver for the Synopsys(R) Ethernet Controllers "stmmac"
==============================================================
Authors: Giuseppe Cavallaro <peppe.cavallaro@st.com>,
Alexandre Torgue <alexandre.torgue@st.com>, Jose Abreu <joabreu@synopsys.com>
Contents
========
- In This Release
- Feature List
- Kernel Configuration
- Command Line Parameters
- Driver Information and Notes
- Debug Information
- Support
In This Release
===============
This file describes the stmmac Linux Driver for all the Synopsys(R) Ethernet
Controllers.
Currently, this network device driver is for all STi embedded MAC/GMAC
(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XILINX XC2V3000
FF1152AMT0221 D1215994A VIRTEX FPGA board. The Synopsys Ethernet QoS 5.0 IPK
is also supported.
DesignWare(R) Cores Ethernet MAC 10/100/1000 Universal version 3.70a
(and older) and DesignWare(R) Cores Ethernet Quality-of-Service version 4.0
(and upper) have been used for developing this driver as well as
DesignWare(R) Cores XGMAC - 10G Ethernet MAC and DesignWare(R) Cores
Enterprise MAC - 100G Ethernet MAC.
This driver supports both the platform bus and PCI.
This driver includes support for the following Synopsys(R) DesignWare(R)
Cores Ethernet Controllers and corresponding minimum and maximum versions:
+-------------------------------+--------------+--------------+--------------+
| Controller Name | Min. Version | Max. Version | Abbrev. Name |
+===============================+==============+==============+==============+
| Ethernet MAC Universal | N/A | 3.73a | GMAC |
+-------------------------------+--------------+--------------+--------------+
| Ethernet Quality-of-Service | 4.00a | N/A | GMAC4+ |
+-------------------------------+--------------+--------------+--------------+
| XGMAC - 10G Ethernet MAC | 2.10a | N/A | XGMAC2+ |
+-------------------------------+--------------+--------------+--------------+
| XLGMAC - 100G Ethernet MAC | 2.00a | N/A | XLGMAC2+ |
+-------------------------------+--------------+--------------+--------------+
For questions related to hardware requirements, refer to the documentation
supplied with your Ethernet adapter. All hardware requirements listed apply
to use with Linux.
Feature List
============
The following features are available in this driver:
- GMII/MII/RGMII/SGMII/RMII/XGMII/XLGMII Interface
- Half-Duplex / Full-Duplex Operation
- Energy Efficient Ethernet (EEE)
- IEEE 802.3x PAUSE Packets (Flow Control)
- RMON/MIB Counters
- IEEE 1588 Timestamping (PTP)
- Pulse-Per-Second Output (PPS)
- MDIO Clause 22 / Clause 45 Interface
- MAC Loopback
- ARP Offloading
- Automatic CRC / PAD Insertion and Checking
- Checksum Offload for Received and Transmitted Packets
- Standard or Jumbo Ethernet Packets
- Source Address Insertion / Replacement
- VLAN TAG Insertion / Replacement / Deletion / Filtering (HASH and PERFECT)
- Programmable TX and RX Watchdog and Coalesce Settings
- Destination Address Filtering (PERFECT)
- HASH Filtering (Multicast)
- Layer 3 / Layer 4 Filtering
- Remote Wake-Up Detection
- Receive Side Scaling (RSS)
- Frame Preemption for TX and RX
- Programmable Burst Length, Threshold, Queue Size
- Multiple Queues (up to 8)
- Multiple Scheduling Algorithms (TX: WRR, DWRR, WFQ, SP, CBS, EST, TBS;
RX: WRR, SP)
- Flexible RX Parser
- TCP / UDP Segmentation Offload (TSO, USO)
- Split Header (SPH)
- Safety Features (ECC Protection, Data Parity Protection)
- Selftests using Ethtool
Kernel Configuration
====================
The kernel configuration option is ``CONFIG_STMMAC_ETH``:
- ``CONFIG_STMMAC_PLATFORM``: is to enable the platform driver.
- ``CONFIG_STMMAC_PCI``: is to enable the pci driver.
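A minimal sketch of a configuration fragment enabling both bus variants
might therefore look like::

    CONFIG_STMMAC_ETH=y
    CONFIG_STMMAC_PLATFORM=y
    CONFIG_STMMAC_PCI=y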
Command Line Parameters
=======================
If the driver is built as a module, the following optional parameters can be
passed on the command line with the modprobe command, using this syntax
(e.g. for the PCI module)::
modprobe stmmac_pci [<option>=<VAL1>,<VAL2>,...]
Driver parameters can also be passed on the kernel command line by using::
stmmaceth=watchdog:100,chain_mode=1
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
watchdog
--------
:Valid Range: 5000-None
:Default Value: 5000
This parameter overrides the transmit timeout in milliseconds.
debug
-----
:Valid Range: 0-16 (0=none,...,16=all)
:Default Value: 0
This parameter adjusts the level of debug messages displayed in the system
logs.
phyaddr
-------
:Valid Range: 0-31
:Default Value: -1
This parameter overrides the physical address of the PHY device.
flow_ctrl
---------
:Valid Range: 0-3 (0=off,1=rx,2=tx,3=rx/tx)
:Default Value: 3
This parameter changes the default Flow Control ability.
pause
-----
:Valid Range: 0-65535
:Default Value: 65535
This parameter changes the default Flow Control Pause time.
tc
--
:Valid Range: 64-256
:Default Value: 64
This parameter changes the default HW FIFO Threshold control value.
buf_sz
------
:Valid Range: 1536-16384
:Default Value: 1536
This parameter changes the default RX DMA packet buffer size.
eee_timer
---------
:Valid Range: 0-None
:Default Value: 1000
This parameter changes the default LPI TX Expiration time in milliseconds.
chain_mode
----------
:Valid Range: 0-1 (0=off,1=on)
:Default Value: 0
This parameter changes the default mode of operation from Ring Mode to
Chain Mode.
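As an illustration (the values are arbitrary, picked from the ranges above),
the PCI module could be loaded with a longer transmit timeout and full debug
output::

    modprobe stmmac_pci watchdog=8000 debug=16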
Driver Information and Notes
============================
Transmit Process
----------------
The xmit method is invoked when the kernel needs to transmit a packet; it sets
the descriptors in the ring and informs the DMA engine that there is a packet
ready to be transmitted.
By default, the driver sets the ``NETIF_F_SG`` bit in the features field of
the ``net_device`` structure, enabling the scatter-gather feature. This is
true on chips and configurations where the checksum can be done in hardware.
Once the controller has finished transmitting the packet, a timer is
scheduled to release the transmit resources.
Receive Process
---------------
When one or more packets are received, an interrupt is raised. The interrupts
are not queued, so the driver has to scan all the descriptors in the ring
during the receive process.
Reception is based on NAPI: the interrupt handler only signals that there is
work to be done and then exits; the poll method is scheduled to run at some
future point.
The incoming packets are stored, by the DMA, in a list of pre-allocated socket
buffers in order to avoid the memcpy (zero-copy).
Interrupt Mitigation
--------------------
The driver is able to mitigate the number of its DMA interrupts by using NAPI
for reception on chips older than 3.50. Newer chips have a HW RX Watchdog
used for this mitigation.
Mitigation parameters can be tuned by ethtool.
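For example, assuming an interface named ethX, the coalescing parameters can
be inspected and tuned with the generic ethtool coalescing commands (the
exact set of supported parameters depends on the chip)::

    ethtool -c ethX
    ethtool -C ethX rx-usecs 100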
WoL
---
The Wake-on-LAN feature is supported through Magic and Unicast frames on the
GMAC, GMAC4/5 and XGMAC cores.
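For instance, Wake-on-LAN on Magic packets could be armed with the generic
ethtool command (ethX is a placeholder), and the result verified in the
"Wake-on" line of ``ethtool ethX``::

    ethtool -s ethX wol g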
DMA Descriptors
---------------
The driver handles both normal and alternate descriptors. The latter have
only been tested on DesignWare(R) Cores Ethernet MAC Universal version 3.41a
and later.
stmmac supports DMA descriptors that operate in both dual-buffer (RING) and
linked-list (CHAINED) mode. In RING mode each descriptor points to two data
buffers, whereas in CHAINED mode it points to a single data buffer. RING mode
is the default.
In CHAINED mode each descriptor holds a pointer to the next descriptor in the
list, creating the explicit chaining in the descriptor itself, whereas such
explicit chaining is not possible in RING mode.
Extended Descriptors
--------------------
The extended descriptors give us information about the Ethernet payload when
it is carrying PTP packets or TCP/UDP/ICMP over IP. These are not available on
GMAC Synopsys(R) chips older than the 3.50. At probe time the driver decides
whether they can actually be used. This support is also mandatory for PTPv2
because the extra descriptors are used for saving the hardware timestamps and
Extended Status.
Ethtool Support
---------------
Ethtool is supported. For example, driver statistics (including RMON
counters) and internal errors can be read using::
ethtool -S ethX
Ethtool selftests are also supported. These allow some early sanity checks
of the HW using the MAC and PHY loopback mechanisms::
ethtool -t ethX
Jumbo and Segmentation Offloading
---------------------------------
Jumbo frames are supported and tested for the GMAC. GSO has also been added,
but it is performed in software. LRO is not supported.
TSO Support
-----------
The TSO (TCP Segmentation Offload) feature is supported by the GMAC > 4.x and
XGMAC chip families. Normally, when a packet is sent over TCP, the TCP stack
ensures that the SKB provided to the low-level driver (stmmac in our case)
fits within the maximum frame length (IP header + TCP header + payload <=
1500 bytes for an MTU of 1500). This means that if an application using TCP
wants to send a packet whose length (after adding headers) exceeds 1514
bytes, the packet is split into several TCP segments: the data payload is
split and headers (TCP/IP ...) are added. This is done in software.
When TSO is enabled, the TCP stack doesn't care about the maximum frame
length and provides the SKB to stmmac as it is. The GMAC IP then has to
perform the segmentation by itself to match the maximum frame length.
This feature can be enabled in device tree through ``snps,tso`` entry.
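Once enabled, the offload state can be checked and toggled at run-time with
the generic ethtool feature interface (ethX is a placeholder)::

    ethtool -k ethX | grep tcp-segmentation-offload
    ethtool -K ethX tso on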
Energy Efficient Ethernet
-------------------------
Energy Efficient Ethernet (EEE) enables the IEEE 802.3 MAC sublayer, along
with a family of physical layers, to operate in the Low Power Idle (LPI)
mode. The EEE mode supports the IEEE 802.3 MAC operation at 100Mbps and
1000Mbps.
The LPI mode allows power saving by switching off parts of the communication
device functionality when there is no data to be transmitted or received.
The systems on both sides of the link can disable some functionality and
save power during periods of low link utilization. The MAC controls whether
the system should enter or exit the LPI mode and communicates this to the
PHY.
As soon as the interface is opened, the driver verifies whether EEE can be
supported. This is done by looking at both the DMA HW capability register and
the PHY device's MMD registers.
To enter TX LPI mode, the driver uses a software timer that enables and
disables the LPI mode when there is nothing to be transmitted.
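Assuming both link partners support it, the EEE state can be queried and
changed with the generic ethtool EEE commands (ethX is a placeholder)::

    ethtool --show-eee ethX
    ethtool --set-eee ethX eee on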
Precision Time Protocol (PTP)
-----------------------------
The driver supports the IEEE 1588-2002 Precision Time Protocol (PTP), which
enables precise synchronization of clocks in measurement and control systems
implemented with technologies such as network communication.
In addition to the basic timestamp features mentioned in IEEE 1588-2002,
new GMAC cores support advanced timestamp features.
IEEE 1588-2008 support can be enabled when configuring the kernel.
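The timestamping capabilities actually discovered by the driver can be
listed with the generic ethtool time-stamping query (ethX is a placeholder)::

    ethtool -T ethX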
SGMII/RGMII Support
-------------------
New GMAC devices provide their own way to manage RGMII/SGMII. This
information is available at run-time by looking at the HW capability
register. This means that stmmac can manage auto-negotiation and link status
without using PHYLIB. In fact, the HW provides a subset of extended registers
to restart the ANE and verify the Full/Half duplex mode and speed. Thanks to
these registers, it is possible to look at the auto-negotiated Link Partner
Ability.
Physical
--------
The driver is compatible with the PHY Abstraction Layer and can be connected
to PHY and GPHY devices.
Platform Information
--------------------
Several pieces of information can be passed through the platform data and the
device tree.
::
struct plat_stmmacenet_data {
1) Bus identifier::
int bus_id;
2) PHY Physical Address. If set to -1 the driver will pick the first PHY it
finds::
int phy_addr;
3) PHY Device Interface::
int interface;
4) Specific platform fields for the MDIO bus::
struct stmmac_mdio_bus_data *mdio_bus_data;
5) Internal DMA parameters::
struct stmmac_dma_cfg *dma_cfg;
6) Fixed CSR Clock Range selection::
int clk_csr;
7) HW uses the GMAC core::
int has_gmac;
8) If set the MAC will use Enhanced Descriptors::
int enh_desc;
9) Core is able to perform TX Checksum and/or RX Checksum in HW::
int tx_coe;
int rx_coe;
11) Some HWs are not able to perform the csum in HW for over-sized frames due
to limited buffer sizes. If this flag is set, the csum will be done in SW on
JUMBO frames::
int bugged_jumbo;
12) Core has the embedded power module::
int pmt;
13) Force DMA to use the Store and Forward mode or Threshold mode::
int force_sf_dma_mode;
int force_thresh_dma_mode;
15) Force to disable the RX Watchdog feature and switch to NAPI mode::
int riwt_off;
16) Limit the maximum operating speed and MTU::
int max_speed;
int maxmtu;
18) Number of Multicast/Unicast filters::
int multicast_filter_bins;
int unicast_filter_entries;
20) Limit the maximum TX and RX FIFO size::
int tx_fifo_size;
int rx_fifo_size;
21) Use the specified number of TX and RX Queues::
u32 rx_queues_to_use;
u32 tx_queues_to_use;
22) Use the specified TX and RX scheduling algorithm::
u8 rx_sched_algorithm;
u8 tx_sched_algorithm;
23) Internal TX and RX Queue parameters::
struct stmmac_rxq_cfg rx_queues_cfg[MTL_MAX_RX_QUEUES];
struct stmmac_txq_cfg tx_queues_cfg[MTL_MAX_TX_QUEUES];
24) This callback is used for modifying some syscfg registers (on ST SoCs)
according to the link speed negotiated by the physical layer::
void (*fix_mac_speed)(void *priv, unsigned int speed);
25) Callbacks used for calling a custom initialization; this is sometimes
necessary on some platforms (e.g. ST boxes) where the HW needs to set up some
PIO lines or system cfg registers. The init/exit callbacks should not use or
modify platform data::
int (*init)(struct platform_device *pdev, void *priv);
void (*exit)(struct platform_device *pdev, void *priv);
26) Perform HW setup of the bus. For example, on some ST platforms this field
is used to configure the AMBA bridge to generate more efficient STBus traffic::
struct mac_device_info *(*setup)(void *priv);
void *bsp_priv;
27) Internal clocks and rates::
struct clk *stmmac_clk;
struct clk *pclk;
struct clk *clk_ptp_ref;
unsigned int clk_ptp_rate;
unsigned int clk_ref_rate;
s32 ptp_max_adj;
28) Main reset::
struct reset_control *stmmac_rst;
29) AXI Internal Parameters::
struct stmmac_axi *axi;
30) HW uses GMAC>4 cores::
int has_gmac4;
31) HW is sun8i based::
bool has_sun8i;
32) Enables TSO feature::
bool tso_en;
33) Enables Receive Side Scaling (RSS) feature::
int rss_en;
34) MAC Port selection::
int mac_port_sel_speed;
35) Enables TX LPI Clock Gating::
bool en_tx_lpi_clockgating;
36) HW uses XGMAC>2.10 cores::
int has_xgmac;
::
}
For MDIO bus data, we have:
::
struct stmmac_mdio_bus_data {
1) PHY mask passed when MDIO bus is registered::
unsigned int phy_mask;
2) List of IRQs, one per PHY::
int *irqs;
3) If IRQs is NULL, use this for probed PHY::
int probed_phy_irq;
4) Set to true if PHY needs reset::
bool needs_reset;
::
}
For DMA engine configuration, we have:
::
struct stmmac_dma_cfg {
1) Programmable Burst Length (TX and RX)::
int pbl;
2) If set, DMA TX / RX will use this value rather than pbl::
int txpbl;
int rxpbl;
3) Enable 8xPBL::
bool pblx8;
4) Enable Fixed or Mixed burst::
int fixed_burst;
int mixed_burst;
5) Enable Address Aligned Beats::
bool aal;
6) Enable Enhanced Addressing (> 32 bits)::
bool eame;
::
}
For DMA AXI parameters, we have:
::
struct stmmac_axi {
1) Enable AXI LPI::
bool axi_lpi_en;
bool axi_xit_frm;
2) Set AXI Write / Read maximum outstanding requests::
u32 axi_wr_osr_lmt;
u32 axi_rd_osr_lmt;
3) Set AXI 4KB bursts::
bool axi_kbbe;
4) Set AXI maximum burst length map::
u32 axi_blen[AXI_BLEN];
5) Set AXI Fixed burst / mixed burst::
bool axi_fb;
bool axi_mb;
6) Set AXI rebuild incrx mode::
bool axi_rb;
::
}
For the RX Queues configuration, we have:
::
struct stmmac_rxq_cfg {
1) Mode to use (DCB or AVB)::
u8 mode_to_use;
2) DMA channel to use::
u32 chan;
3) Packet routing, if applicable::
u8 pkt_route;
4) Use priority routing, and priority to route::
bool use_prio;
u32 prio;
::
}
For the TX Queues configuration, we have:
::
struct stmmac_txq_cfg {
1) Queue weight in scheduler::
u32 weight;
2) Mode to use (DCB or AVB)::
u8 mode_to_use;
3) Credit Base Shaper Parameters::
u32 send_slope;
u32 idle_slope;
u32 high_credit;
u32 low_credit;
4) Use priority scheduling, and priority::
bool use_prio;
u32 prio;
::
}
Device Tree Information
-----------------------
Please refer to the following document:
Documentation/devicetree/bindings/net/snps,dwmac.yaml
HW Capabilities
---------------
Note that, starting from new chips where the HW capability register is
available, many configurations are discovered at run-time, for example
whether EEE, HW csum, PTP, enhanced descriptors, etc. are actually available.
As a strategy adopted in this driver, the information from the HW capability
register can replace what has been passed from the platform.
Debug Information
=================
The driver exports a lot of information, i.e. internal statistics, debug
information, MAC and DMA registers, etc.
This can be read in several ways depending on the type of information
actually needed.
For example, a user can use ethtool to get statistics, e.g. using
``ethtool -S ethX`` (which shows the Management Counters (MMC) if supported),
or to dump the MAC/DMA registers, e.g. using ``ethtool -d ethX``.
When the kernel is compiled with ``CONFIG_DEBUG_FS``, the driver exports the
following debugfs entries:
- ``descriptors_status``: To show the DMA TX/RX descriptor rings
- ``dma_cap``: To show the HW Capabilities
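For instance, assuming debugfs is mounted at /sys/kernel/debug and the
interface is named eth0 (both are assumptions), the capabilities entry could
be read with::

    cat /sys/kernel/debug/stmmaceth/eth0/dma_cap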
Developers can also use the ``debug`` module parameter to get further debug
information (please see: NETIF Msg Level).
Support
=======
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the
issue to netdev@vger.kernel.org

View File

@@ -0,0 +1,587 @@
.. SPDX-License-Identifier: GPL-2.0
======================================
Texas Instruments CPSW ethernet driver
======================================
Multiqueue & CBS & MQPRIO
=========================
The cpsw has 3 CBS shapers for each external port. This document
describes MQPRIO and CBS Qdisc offload configuration for the cpsw driver
based on examples. It can potentially be used in audio video bridging
(AVB) and time sensitive networking (TSN).
The following examples were tested on AM572x EVM and BBB boards.
Test setup
==========
Two examples are considered below, with an AM572x EVM running the cpsw
driver in dual_emac mode.
Several prerequisites:

- TX queues must be rated starting from txq0, which has the highest priority
- Traffic classes are used starting from 0, which has the highest priority
- CBS shapers should be used with rated queues
- The bandwidth for CBS shapers has to be set slightly higher than the
  potential incoming rate; thus, the rate of all incoming tx queues has
  to be slightly lower
- Real rates can differ, due to rate discreteness
- Mapping skb-priority to txq is not enough; an skb-priority to L2 prio
  map also has to be created with the ip or vconfig tool
- Any l2/socket prio (0 - 7) can be used for the classes, but for
  simplicity the default values 3 and 2 are used
- Only 2 classes were tested (A and B), but it was checked that more can
  work; the maximum allowed is 4, but a rate can be set for only 3 of them
Test setup for examples
=======================
::
+-------------------------------+
|--+ |
| | Workstation0 |
|E | MAC 18:03:73:66:87:42 |
+-----------------------------+ +--|t | |
| | 1 | E | | |h |./tsn_listener -d \ |
| Target board: | 0 | t |--+ |0 | 18:03:73:66:87:42 -i eth0 \|
| AM572x EVM | 0 | h | | | -s 1500 |
| | 0 | 0 | |--+ |
| Only 2 classes: |Mb +---| +-------------------------------+
| class A, class B | |
| | +---| +-------------------------------+
| | 1 | E | |--+ |
| | 0 | t | | | Workstation1 |
| | 0 | h |--+ |E | MAC 20:cf:30:85:7d:fd |
| |Mb | 1 | +--|t | |
+-----------------------------+ |h |./tsn_listener -d \ |
|0 | 20:cf:30:85:7d:fd -i eth0 \|
| | -s 1500 |
|--+ |
+-------------------------------+
Example 1: One port tx AVB configuration scheme for target board
----------------------------------------------------------------
(prints and scheme for AM572x evm, applicable for single port boards)
- tc - traffic class
- txq - transmit queue
- p - priority
- f - fifo (cpsw fifo)
- S - shaper configured
::
+------------------------------------------------------------------+ u
| +---------------+ +---------------+ +------+ +------+ | s
| | | | | | | | | | e
| | App 1 | | App 2 | | Apps | | Apps | | r
| | Class A | | Class B | | Rest | | Rest | |
| | Eth0 | | Eth0 | | Eth0 | | Eth1 | | s
| | VLAN100 | | VLAN100 | | | | | | | | p
| | 40 Mb/s | | 20 Mb/s | | | | | | | | a
| | SO_PRIORITY=3 | | SO_PRIORITY=2 | | | | | | | | c
| | | | | | | | | | | | | | e
| +---|-----------+ +---|-----------+ +---|--+ +---|--+ |
+-----|------------------|------------------|--------|-------------+
+-+ +------------+ | |
| | +-----------------+ +--+
| | | |
+---|-------|-------------|-----------------------|----------------+
| +----+ +----+ +----+ +----+ +----+ |
| | p3 | | p2 | | p1 | | p0 | | p0 | | k
| \ / \ / \ / \ / \ / | e
| \ / \ / \ / \ / \ / | r
| \/ \/ \/ \/ \/ | n
| | | | | | e
| | | +-----+ | | l
| | | | | |
| +----+ +----+ +----+ +----+ | s
| |tc0 | |tc1 | |tc2 | |tc0 | | p
| \ / \ / \ / \ / | a
| \ / \ / \ / \ / | c
| \/ \/ \/ \/ | e
| | | +-----+ | |
| | | | | | |
| | | | | | |
| | | | | | |
| +----+ +----+ +----+ +----+ +----+ |
| |txq0| |txq1| |txq2| |txq3| |txq4| |
| \ / \ / \ / \ / \ / |
| \ / \ / \ / \ / \ / |
| \/ \/ \/ \/ \/ |
| +-|------|------|------|--+ +--|--------------+ |
| | | | | | | Eth0.100 | | Eth1 | |
+---|------|------|------|------------------------|----------------+
| | | | |
p p p p |
3 2 0-1, 4-7 <- L2 priority |
| | | | |
| | | | |
+---|------|------|------|------------------------|----------------+
| | | | | |----------+ |
| +----+ +----+ +----+ +----+ +----+ |
| |dma7| |dma6| |dma5| |dma4| |dma3| |
| \ / \ / \ / \ / \ / | c
| \S / \S / \ / \ / \ / | p
| \/ \/ \/ \/ \/ | s
| | | | +----- | | w
| | | | | | |
| | | | | | | d
| +----+ +----+ +----+p p+----+ | r
| | | | | | |o o| | | i
| | f3 | | f2 | | f0 |r r| f0 | | v
| |tc0 | |tc1 | |tc2 |t t|tc0 | | e
| \CBS / \CBS / \CBS /1 2\CBS / | r
| \S / \S / \ / \ / |
| \/ \/ \/ \/ |
+------------------------------------------------------------------+
1) ::
// Add 4 tx queues, for interface Eth0, and 1 tx queue for Eth1
$ ethtool -L eth0 rx 1 tx 5
rx unmodified, ignoring
2) ::
// Check if num of queues is set correctly:
$ ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 8
TX: 8
Other: 0
Combined: 0
Current hardware settings:
RX: 1
TX: 5
Other: 0
Combined: 0
3) ::
// TX queues must be rated starting from 0, so set bws for tx0 and tx1
// Set rates 40 and 20 Mb/s appropriately.
// Note that the real speed can differ a bit due to rate discreteness.
// Leave last 2 tx queues not rated.
$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
4) ::
// Check maximum rate of tx (cpdma) queues:
$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
40
20
0
0
0
5) ::
// Map skb->priority to traffic class:
// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
// Map traffic class to transmit queue:
// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq2, txq3)
$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
5a) ::
// As the two interfaces share the same set of tx queues, assign all traffic
// coming to interface Eth1 to a separate queue, in order not to mix it
// with traffic from interface Eth0; so use a separate txq to send
// packets to Eth1, i.e. all prio -> tc0 and tc0 -> txq4
// Here hw 0, so the default configuration for eth1 is still used in hw
$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@4 hw 0
6) ::
// Check classes settings
$ tc -g class show dev eth0
+---(100:ffe2) mqprio
| +---(100:3) mqprio
| +---(100:4) mqprio
|
+---(100:ffe1) mqprio
| +---(100:2) mqprio
|
+---(100:ffe0) mqprio
+---(100:1) mqprio
$ tc -g class show dev eth1
+---(100:ffe0) mqprio
+---(100:5) mqprio
7) ::
// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc
// Set it +1 Mb for reserve (important!)
// here only the idle slope is important, other args are ignored
// Note that the real speed can differ a bit due to rate discreteness
$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1438 \
hicredit 62 sendslope -959000 idleslope 41000 offload 1
net eth0: set FIFO3 bw = 50
8) ::
// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc:
// Set it +1 Mb for reserve (important!)
$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1468 \
hicredit 65 sendslope -979000 idleslope 21000 offload 1
net eth0: set FIFO2 bw = 30
9) ::
// Create vlan 100 to map sk->priority to vlan qos
$ ip link add link eth0 name eth0.100 type vlan id 100
8021q: 802.1Q VLAN Support v1.8
8021q: adding VLAN 0 to HW filter on device eth0
8021q: adding VLAN 0 to HW filter on device eth1
net eth0: Adding vlanid 100 to vlan filter
10) ::
// Map skb->priority to L2 prio, 1 to 1
$ ip link set eth0.100 type vlan \
egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
11) ::
// Check egress map for vlan 100
$ cat /proc/net/vlan/eth0.100
[...]
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
12) ::
// Run your appropriate tools with socket option "SO_PRIORITY"
// to 3 for class A and/or to 2 for class B
// (tools taken from https://www.spinics.net/lists/netdev/msg460869.html)
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
13) ::
// run your listener on workstation (should be in same vlan)
// (tools taken from https://www.spinics.net/lists/netdev/msg460869.html)
./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39000 kbps
14) ::
// Restore default configuration if needed
$ ip link del eth0.100
$ tc qdisc del dev eth1 root
$ tc qdisc del dev eth0 root
net eth0: Prev FIFO2 is shaped
net eth0: set FIFO3 bw = 0
net eth0: set FIFO2 bw = 0
$ ethtool -L eth0 rx 1 tx 1
Example 2: Two port tx AVB configuration scheme for target board
----------------------------------------------------------------
(prints and scheme for AM572x evm, for dual emac boards only)
::
+------------------------------------------------------------------+ u
| +----------+ +----------+ +------+ +----------+ +----------+ | s
| | | | | | | | | | | | e
| | App 1 | | App 2 | | Apps | | App 3 | | App 4 | | r
| | Class A | | Class B | | Rest | | Class B | | Class A | |
| | Eth0 | | Eth0 | | | | | Eth1 | | Eth1 | | s
| | VLAN100 | | VLAN100 | | | | | VLAN100 | | VLAN100 | | p
| | 40 Mb/s | | 20 Mb/s | | | | | 10 Mb/s | | 30 Mb/s | | a
| | SO_PRI=3 | | SO_PRI=2 | | | | | SO_PRI=3 | | SO_PRI=2 | | c
| | | | | | | | | | | | | | | | | e
| +---|------+ +---|------+ +---|--+ +---|------+ +---|------+ |
+-----|-------------|-------------|---------|-------------|--------+
+-+ +-------+ | +----------+ +----+
| | +-------+------+ | |
| | | | | |
+---|-------|-------------|--------------|-------------|-------|---+
| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
| | p3 | | p2 | | p1 | | p0 | | p0 | | p1 | | p2 | | p3 | | k
| \ / \ / \ / \ / \ / \ / \ / \ / | e
| \ / \ / \ / \ / \ / \ / \ / \ / | r
| \/ \/ \/ \/ \/ \/ \/ \/ | n
| | | | | | | | e
| | | +----+ +----+ | | | l
| | | | | | | |
| +----+ +----+ +----+ +----+ +----+ +----+ | s
| |tc0 | |tc1 | |tc2 | |tc2 | |tc1 | |tc0 | | p
| \ / \ / \ / \ / \ / \ / | a
| \ / \ / \ / \ / \ / \ / | c
| \/ \/ \/ \/ \/ \/ | e
| | | +-----+ +-----+ | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | E E | | | | |
| +----+ +----+ +----+ +----+ t t +----+ +----+ +----+ +----+ |
| |txq0| |txq1| |txq4| |txq5| h h |txq6| |txq7| |txq3| |txq2| |
| \ / \ / \ / \ / 0 1 \ / \ / \ / \ / |
| \ / \ / \ / \ / . . \ / \ / \ / \ / |
| \/ \/ \/ \/ 1 1 \/ \/ \/ \/ |
| +-|------|------|------|--+ 0 0 +-|------|------|------|--+ |
| | | | | | | 0 0 | | | | | | |
+---|------|------|------|---------------|------|------|------|----+
| | | | | | | |
p p p p p p p p
3 2 0-1, 4-7 <-L2 pri-> 0-1, 4-7 2 3
| | | | | | | |
| | | | | | | |
+---|------|------|------|---------------|------|------|------|----+
| | | | | | | | | |
| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
| |dma7| |dma6| |dma3| |dma2| |dma1| |dma0| |dma4| |dma5| |
| \ / \ / \ / \ / \ / \ / \ / \ / | c
| \S / \S / \ / \ / \ / \ / \S / \S / | p
| \/ \/ \/ \/ \/ \/ \/ \/ | s
| | | | +----- | | | | | w
| | | | | +----+ | | | |
| | | | | | | | | | d
| +----+ +----+ +----+p p+----+ +----+ +----+ | r
| | | | | | |o o| | | | | | | i
| | f3 | | f2 | | f0 |r CPSW r| f3 | | f2 | | f0 | | v
| |tc0 | |tc1 | |tc2 |t t|tc0 | |tc1 | |tc2 | | e
| \CBS / \CBS / \CBS /1 2\CBS / \CBS / \CBS / | r
| \S / \S / \ / \S / \S / \ / |
| \/ \/ \/ \/ \/ \/ |
+------------------------------------------------------------------+
========================================Eth==========================>
1) ::
// Add 8 tx queues for interface Eth0; they are common, so they are accessed
// by both interfaces Eth0 and Eth1.
$ ethtool -L eth1 rx 1 tx 8
rx unmodified, ignoring
2) ::
// Check if num of queues is set correctly:
$ ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 8
TX: 8
Other: 0
Combined: 0
Current hardware settings:
RX: 1
TX: 8
Other: 0
Combined: 0
3) ::
// TX queues must be rated starting from 0, so set bws for tx0 and tx1 for Eth0
// and for tx2 and tx3 for Eth1. That is, rates 40 and 20 Mb/s appropriately
// for Eth0 and 30 and 10 Mb/s for Eth1.
// The real speed can differ a bit due to rate discreteness
// Leave last 4 tx queues as not rated
$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
$ echo 30 > /sys/class/net/eth1/queues/tx-2/tx_maxrate
$ echo 10 > /sys/class/net/eth1/queues/tx-3/tx_maxrate
4) ::
// Check maximum rate of tx (cpdma) queues:
$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
40
20
30
10
0
0
0
0
5) ::
// Map skb->priority to traffic class for Eth0:
// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
// Map traffic class to transmit queue:
// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq4, txq5)
$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@4 hw 1
6) ::
// Check classes settings
$ tc -g class show dev eth0
+---(100:ffe2) mqprio
| +---(100:5) mqprio
| +---(100:6) mqprio
|
+---(100:ffe1) mqprio
| +---(100:2) mqprio
|
+---(100:ffe0) mqprio
+---(100:1) mqprio
7) ::
// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc for Eth0
// here only the idle slope is important, others are ignored
// The real speed can differ a bit due to rate discreteness
$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1470 \
hicredit 62 sendslope -959000 idleslope 41000 offload 1
net eth0: set FIFO3 bw = 50
8) ::
// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc for Eth0
$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
hicredit 65 sendslope -979000 idleslope 21000 offload 1
net eth0: set FIFO2 bw = 30
9) ::
// Create vlan 100 to map sk->priority to vlan qos for Eth0
$ ip link add link eth0 name eth0.100 type vlan id 100
net eth0: Adding vlanid 100 to vlan filter
10) ::
// Map skb->priority to L2 prio for Eth0.100, one to one
$ ip link set eth0.100 type vlan \
egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
11) ::
// Check egress map for vlan 100
$ cat /proc/net/vlan/eth0.100
[...]
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
12) ::
// Map skb->priority to traffic class for Eth1:
// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
// Map traffic class to transmit queue:
// tc0 -> txq2, tc1 -> txq3, tc2 -> (txq6, txq7)
$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@2 1@3 2@6 hw 1
13) ::
// Check classes settings
$ tc -g class show dev eth1
+---(100:ffe2) mqprio
| +---(100:7) mqprio
| +---(100:8) mqprio
|
+---(100:ffe1) mqprio
| +---(100:4) mqprio
|
+---(100:ffe0) mqprio
+---(100:3) mqprio
14) ::
// Set rate for class A - 31 Mbit (tc0, txq2) using CBS Qdisc for Eth1
// here only idle slope is important, others ignored, but calculated
// for interface speed - 100Mb for eth1 port.
// Set it +1 Mb for reserve (important!)
$ tc qdisc add dev eth1 parent 100:3 cbs locredit -1035 \
hicredit 465 sendslope -69000 idleslope 31000 offload 1
net eth1: set FIFO3 bw = 31
15) ::
// Set rate for class B - 11 Mbit (tc1, txq3) using CBS Qdisc for Eth1
// Set it +1 Mb for reserve (important!)
$ tc qdisc add dev eth1 parent 100:4 cbs locredit -1335 \
hicredit 405 sendslope -89000 idleslope 11000 offload 1
net eth1: set FIFO2 bw = 11
16) ::
// Create vlan 100 to map sk->priority to vlan qos for Eth1
$ ip link add link eth1 name eth1.100 type vlan id 100
net eth1: Adding vlanid 100 to vlan filter
17) ::
// Map skb->priority to L2 prio for Eth1.100, one to one
$ ip link set eth1.100 type vlan \
egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
18) ::
// Check egress map for vlan 100
$ cat /proc/net/vlan/eth1.100
[...]
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
19) ::
// Run appropriate tools with socket option "SO_PRIORITY" to 3
// for class A and to 2 for class B. For both interfaces
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p2 -s 1500&
./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p3 -s 1500&
20) ::
// run your listener on workstation (should be in same vlan)
// (tools taken from https://www.spinics.net/lists/netdev/msg460869.html)
./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39012 kbps
Receiving data rate: 39000 kbps
21) ::
// Restore default configuration if needed
$ ip link del eth1.100
$ ip link del eth0.100
$ tc qdisc del dev eth1 root
net eth1: Prev FIFO2 is shaped
net eth1: set FIFO3 bw = 0
net eth1: set FIFO2 bw = 0
$ tc qdisc del dev eth0 root
net eth0: Prev FIFO2 is shaped
net eth0: set FIFO3 bw = 0
net eth0: set FIFO2 bw = 0
$ ethtool -L eth0 rx 1 tx 1

View File

@@ -0,0 +1,242 @@
.. SPDX-License-Identifier: GPL-2.0
======================================================
Texas Instruments CPSW switchdev based ethernet driver
======================================================
:Version: 2.0
Port renaming
=============
On older udev versions, renaming of ethX to swXpY will not be automatically
supported.
In order to rename via udev::
ip -d link show dev sw0p1 | grep switchid
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}==<switchid>, \
ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
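For example, if the command above reported a phys_switch_id of
``4848400000000000`` (a made-up value), a rule file such as
``/etc/udev/rules.d/75-net-cpsw.rules`` (the path is chosen only for
illustration) could contain::

    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="4848400000000000", \
            ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"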
Dual mac mode
=============
- The new (cpsw_new.c) driver operates in dual-emac mode by default, thus
  working as 2 individual network interfaces. The main differences from the
  legacy CPSW driver are:

- optimized promiscuous mode: P0_UNI_FLOOD (both ports) is enabled in
  addition to ALLMULTI (current port) instead of ALE_BYPASS.
  So ports in promiscuous mode keep the possibility of mcast and vlan
  filtering, which provides significant benefits when ports are joined
  to the same bridge (but without enabling "switch" mode) or to different
  bridges.
- learning is disabled on ports, as it makes little sense for
  segregated ports - there is no forwarding in HW.
- basic devlink support is enabled.
::
devlink dev show
platform/48484000.switch
devlink dev param show
platform/48484000.switch:
name switch_mode type driver-specific
values:
cmode runtime value false
name ale_bypass type driver-specific
values:
cmode runtime value false
Devlink configuration parameters
================================
See Documentation/networking/devlink/ti-cpsw-switch.rst
Bridging in dual mac mode
=========================
The dual_mac mode requires two vids to be reserved for internal purposes,
which, by default, equal the CPSW port numbers. As a result, the bridge has
to be configured in vlan-unaware mode, or the default_pvid has to be
adjusted::
ip link add name br0 type bridge
ip link set dev br0 type bridge vlan_filtering 0
echo 0 > /sys/class/net/br0/bridge/default_pvid
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
or::
ip link add name br0 type bridge
ip link set dev br0 type bridge vlan_filtering 0
echo 100 > /sys/class/net/br0/bridge/default_pvid
ip link set dev br0 type bridge vlan_filtering 1
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
Enabling "switch"
=================
The Switch mode can be enabled by configuring devlink driver parameter
"switch_mode" to 1/true::
devlink dev param set platform/48484000.switch \
name switch_mode value 1 cmode runtime
This can be done regardless of the state of the port netdev devices
(UP/DOWN), but the port netdev devices have to be UP before joining the
bridge, to avoid overwriting the bridge configuration, as the CPSW switch
driver completely reloads its configuration when the first port changes its
state to UP.
When both interfaces have joined the bridge, the CPSW switch driver will
enable marking packets with the offload_fwd_mark flag unless "ale_bypass=0".
All configuration is implemented via the switchdev API.
Bridge setup
============
::
devlink dev param set platform/48484000.switch \
name switch_mode value 1 cmode runtime
ip link add name br0 type bridge
ip link set dev br0 type bridge ageing_time 1000
ip link set dev sw0p1 up
ip link set dev sw0p2 up
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
[*] bridge vlan add dev br0 vid 1 pvid untagged self
[*] if vlan_filtering=1. where default_pvid=1
Note. Steps [*] are mandatory.
On/off STP
==========
::
ip link set dev BRDEV type bridge stp_state 1/0
VLAN configuration
==================
::
bridge vlan add dev br0 vid 1 pvid untagged self <---- add cpu port to VLAN 1
Note. This step is mandatory for bridge/default_pvid.
Add extra VLANs
===============
1. untagged::
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 pvid untagged master
bridge vlan add dev br0 vid 100 pvid untagged self <---- Add cpu port to VLAN100
2. tagged::
bridge vlan add dev sw0p1 vid 100 master
bridge vlan add dev sw0p2 vid 100 master
bridge vlan add dev br0 vid 100 pvid tagged self <---- Add cpu port to VLAN100
FDBs
----
FDBs are automatically added on the appropriate switch port upon detection
Manually adding FDBs::
bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master <---- Add on all VLANs
MDBs
----
MDBs are automatically added on the appropriate switch port upon detection
Manually adding MDBs::
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent <---- Add on all VLANs
Multicast flooding
==================
CPU port mcast_flooding is always on
Turning flooding on/off on switch ports::
bridge link set dev sw0p1 mcast_flood on/off
Access and Trunk port
=====================
::
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 master
bridge vlan add dev br0 vid 100 self
ip link add link br0 name br0.100 type vlan id 100
Note. Setting the PVID on the bridge device itself works only for the
default VLAN (default_pvid).
NFS
===
The only way for NFS to work is by chrooting to a minimal environment when
a switch configuration that will affect connectivity is needed.
Assuming you are booting NFS with the eth1 interface (the script is hacky
and is just there to prove NFS is doable).
setup.sh::
#!/bin/sh
mkdir proc
mount -t proc none /proc
ifconfig br0 > /dev/null
if [ $? -ne 0 ]; then
echo "Setting up bridge"
ip link add name br0 type bridge
ip link set dev br0 type bridge ageing_time 1000
ip link set dev br0 type bridge vlan_filtering 1
ip link set eth1 down
ip link set eth1 name sw0p1
ip link set dev sw0p1 up
ip link set dev sw0p2 up
ip link set dev sw0p2 master br0
ip link set dev sw0p1 master br0
bridge vlan add dev br0 vid 1 pvid untagged self
ifconfig sw0p1 0.0.0.0
udhcpc -i br0
fi
umount /proc
run_nfs.sh::
#!/bin/sh
mkdir /tmp/root/bin -p
mkdir /tmp/root/lib -p
cp -r /lib/ /tmp/root/
cp -r /bin/ /tmp/root/
cp /sbin/ip /tmp/root/bin
cp /sbin/bridge /tmp/root/bin
cp /sbin/ifconfig /tmp/root/bin
cp /sbin/udhcpc /tmp/root/bin
cp /path/to/setup.sh /tmp/root/bin
chroot /tmp/root/ busybox sh /bin/setup.sh
run ./run_nfs.sh

View File

@@ -0,0 +1,140 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
TLAN driver for Linux
=====================
:Version: 1.14a
(C) 1997-1998 Caldera, Inc.
(C) 1998 James Banks
(C) 1999-2001 Torben Mathiasen <tmm@image.dk, torben.mathiasen@compaq.com>
For driver information/updates visit http://www.compaq.com
I. Supported Devices
====================
Only PCI devices will work with this driver.
Supported:
========= ========= ===========================================
Vendor ID Device ID Name
========= ========= ===========================================
0e11 ae32 Compaq Netelligent 10/100 TX PCI UTP
0e11 ae34 Compaq Netelligent 10 T PCI UTP
0e11 ae35 Compaq Integrated NetFlex 3/P
0e11 ae40 Compaq Netelligent Dual 10/100 TX PCI UTP
0e11 ae43 Compaq Netelligent Integrated 10/100 TX UTP
0e11 b011 Compaq Netelligent 10/100 TX Embedded UTP
0e11 b012 Compaq Netelligent 10 T/2 PCI UTP/Coax
0e11 b030 Compaq Netelligent 10/100 TX UTP
0e11 f130 Compaq NetFlex 3/P
0e11 f150 Compaq NetFlex 3/P
108d 0012 Olicom OC-2325
108d 0013 Olicom OC-2183
108d 0014 Olicom OC-2326
========= ========= ===========================================
Caveats:
I am not sure if 100BaseTX daughterboards (for those cards which
support such things) will work. I haven't had any solid evidence
either way.
However, if a card supports 100BaseTx without requiring an add-on
daughterboard, it should work with 100BaseTx.
The "Netelligent 10 T/2 PCI UTP/Coax" (b012) device is untested,
but I do not expect any problems.
II. Driver Options
==================
1. You can append debug=x to the end of the insmod line to get
debug messages, where x is a bit field where the bits mean
the following:
==== =====================================
0x01 Turn on general debugging messages.
0x02 Turn on receive debugging messages.
0x04 Turn on transmit debugging messages.
0x08 Turn on list debugging messages.
==== =====================================
2. You can append aui=1 to the end of the insmod line to cause
the adapter to use the AUI interface instead of the 10 Base T
interface. This is also what to do if you want to use the BNC
connector on a TLAN based device. (Setting this option on a
device that does not have an AUI/BNC connector will probably
cause it to not function correctly.)
3. You can set duplex=1 to force half duplex, and duplex=2 to
force full duplex.
4. You can set speed=10 to force 10Mbps operation, and speed=100
to force 100Mbps operation. (I'm not sure what will happen
if a card which only supports 10Mbps is forced into 100Mbps
mode.)
5. You have to use speed=X duplex=Y together now. If you just
do "insmod tlan.o speed=100" the driver will do Auto-Neg.
To force a 10Mbps Half-Duplex link do "insmod tlan.o speed=10
duplex=1".
6. If the driver is built into the kernel, you can use the 3rd
and 4th parameters to set aui and debug respectively. For
example::
ether=0,0,0x1,0x7,eth0
This sets aui to 0x1 and debug to 0x7, assuming eth0 is a
supported TLAN device.
The bits in the third byte are assigned as follows:
==== ===============
0x01 aui
0x02 use half duplex
0x04 use full duplex
0x08 use 10BaseT
0x10 use 100BaseTx
==== ===============
You also need to set both speed and duplex settings when forcing
speeds with kernel-parameters.
ether=0,0,0x12,0,eth0 will force link to 100Mbps Half-Duplex.
7. If you have more than one tlan adapter in your system, you can
use the above options on a per adapter basis. To force a 100Mbit/HD
link with your eth1 adapter use::
insmod tlan speed=0,100 duplex=0,1
Now eth0 will use auto-neg and eth1 will be forced to 100Mbit/HD.
Note that the tlan driver supports a maximum of 8 adapters.
III. Things to try if you have problems
=======================================
1. Make sure your card's PCI id is among those listed in
section I, above.
2. Make sure routing is correct.
3. Try forcing different speed/duplex settings
There is also a tlan mailing list which you can join by sending "subscribe tlan"
in the body of an email to majordomo@vuser.vu.union.edu.
There is also a tlan website at http://www.compaq.com

View File

@@ -0,0 +1,202 @@
.. SPDX-License-Identifier: GPL-2.0
===========================
The Spidernet Device Driver
===========================
Written by Linas Vepstas <linas@austin.ibm.com>
Version of 7 June 2007
Abstract
========
This document sketches the structure of portions of the spidernet
device driver in the Linux kernel tree. The spidernet is a gigabit
ethernet device built into the Toshiba southbridge commonly used
in the SONY Playstation 3 and the IBM QS20 Cell blade.
The Structure of the RX Ring.
=============================
The receive (RX) ring is a circular linked list of RX descriptors,
together with three pointers into the ring that are used to manage its
contents.
The elements of the ring are called "descriptors" or "descrs"; they
describe the received data. This includes a pointer to a buffer
containing the received data, the buffer size, and various status bits.
There are three primary states that a descriptor can be in: "empty",
"full" and "not-in-use". An "empty" or "ready" descriptor is ready
to receive data from the hardware. A "full" descriptor has data in it,
and is waiting to be emptied and processed by the OS. A "not-in-use"
descriptor is neither empty nor full; it is simply not ready. It may
not even have a data buffer in it, or is otherwise unusable.
During normal operation, on device startup, the OS (specifically, the
spidernet device driver) allocates a set of RX descriptors and RX
buffers. These are all marked "empty", ready to receive data. This
ring is handed off to the hardware, which sequentially fills in the
buffers, and marks them "full". The OS follows up, taking the full
buffers, processing them, and re-marking them empty.
This filling and emptying is managed by three pointers, the "head"
and "tail" pointers, managed by the OS, and a hardware current
descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
currently being filled. When this descr is filled, the hardware
marks it full, and advances the GDACTDPA by one. Thus, when there is
flowing RX traffic, every descr behind it should be marked "full",
and everything in front of it should be "empty". If the hardware
discovers that the current descr is not empty, it will signal an
interrupt, and halt processing.
The tail pointer tails or trails the hardware pointer. When the
hardware is ahead, the tail pointer will be pointing at a "full"
descr. The OS will process this descr, and then mark it "not-in-use",
and advance the tail pointer. Thus, when there is flowing RX traffic,
all of the descrs in front of the tail pointer should be "full", and
all of those behind it should be "not-in-use". When RX traffic is not
flowing, then the tail pointer can catch up to the hardware pointer.
The OS will then note that the current tail is "empty", and halt
processing.
The head pointer (somewhat mis-named) follows after the tail pointer.
When traffic is flowing, then the head pointer will be pointing at
a "not-in-use" descr. The OS will perform various housekeeping duties
on this descr. This includes allocating a new data buffer and
dma-mapping it so as to make it visible to the hardware. The OS will
then mark the descr as "empty", ready to receive data. Thus, when there
is flowing RX traffic, everything in front of the head pointer should
be "not-in-use", and everything behind it should be "empty". If no
RX traffic is flowing, then the head pointer can catch up to the tail
pointer, at which point the OS will notice that the head descr is
"empty", and it will halt processing.
Thus, in an idle system, the GDACTDPA, tail and head pointers will
all be pointing at the same descr, which should be "empty". All of the
other descrs in the ring should be "empty" as well.
The show_rx_chain() routine will print out the locations of the
GDACTDPA, tail and head pointers. It will also summarize the contents
of the ring, starting at the tail pointer, and listing the status
of the descrs that follow.
A typical example of the output, for a nearly idle system, might be::
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=20
net eth1: Chain head is at 20
net eth1: HW curr desc (GDACTDPA) is at 21
net eth1: Have 1 descrs with stat=x40800101
net eth1: HW next desc (GDACNEXTDA) is at 22
net eth1: Last 255 descrs with stat=xa0800000
In the above, the hardware has filled in one descr, number 20. Both
head and tail are pointing at 20, because it has not yet been emptied.
Meanwhile, hw is pointing at 21, which is free.
The "Have nnn decrs" refers to the descr starting at the tail: in this
case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
to all of the rest of the descrs, from the last status change. The "nnn"
is a count of how many descrs have exactly the same status.
The status x4... corresponds to "full" and status xa... corresponds
to "empty". The actual value printed is RXCOMST_A.
In the device driver source code, a different set of names are
used for these same concepts, so that::
"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
The RX RAM full bug/feature
===========================
As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS). When the RX ram full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.
When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.
A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following::
net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000
Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.
The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.
This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there. Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.
As of this writing, the spider_net_resync() strategy seems to work very
well, even under heavy network loads.
The TX ring
===========
The TX ring uses a low-watermark interrupt scheme to make sure that
the TX queue is appropriately serviced for large packet sizes.
For packet sizes greater than about 1KBytes, the kernel can fill
the TX ring quicker than the device can drain it. Once the ring
is full, the netdev is stopped. When there is room in the ring,
the netdev needs to be reawakened, so that more TX packets are placed
in the ring. The hardware can empty the ring about four times per jiffy,
so it's not appropriate to wait for the poll routine to refill, since
the poll routine runs only once per jiffy. The low-watermark mechanism
marks a descr about 1/4th of the way from the bottom of the queue, so
that an interrupt is generated when the descr is processed. This
interrupt wakes up the netdev, which can then refill the queue.
For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec. For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.