[PATCH] A new 10GB Ethernet Driver by Chelsio Communications
A Linux driver for the Chelsio 10Gb Ethernet Network Controller by Chelsio (http://www.chelsio.com). This driver supports the Chelsio N210 NIC and is backward compatible with the Chelsio N110 model 10Gb NICs. It supports AMD64, EM64T and x86 systems. Signed-off-by: Tina Yang <tinay@chelsio.com> Signed-off-by: Scott Bardone <sbardone@chelsio.com> Signed-off-by: Christoph Lameter <christoph@lameter.com> Adrian said: - my3126.c is unused (because t1_my3126_ops isn't used anywhere) - what are the EXTRA_CFLAGS in drivers/net/chelsio/Makefile for? - $(cxgb-y) in drivers/net/chelsio/Makefile seems to be unneeded - completely unused global functions: - espi.c: t1_espi_get_intr_counts - sge.c: t1_sge_get_intr_counts - the following functions can be made static: - sge.c: t1_espi_workaround - sge.c: t1_sge_tx - subr.c: __t1_tpi_read - subr.c: __t1_tpi_write - subr.c: t1_wait_op_done shemminger said: The performance recommendations in cxgb.txt are common to all fast devices, and should be in one file rather than just for this device. I would rather see ip-sysctl.txt updated or a new file on tuning recommendations started. Some of them have consequences that aren't documented well. For example, turning off TCP timestamps risks data corruption from sequence wrap. A new driver shouldn't need so may #ifdef's unless you want to putit on older vendor versions of 2.4 Some accessor and wrapper functions like: t1_pci_read_config_4 adapter_name t1_malloc are just annoying noise. Why have useless dead code like: /* Interrupt handler */ +static int pm3393_interrupt_handler(struct cmac *cmac) +{ + u32 master_intr_status; +/* + 1. Read master interrupt register. + 2. Read BLOCK's interrupt status registers. + 3. Handle BLOCK interrupts. +*/ Jeff said: step 1: kill all the OS wrappers. And do you really need hooks for multiple MACs, when only one MAC is really supported? Typically these hooks are at a higher level anyway -- struct net_device. From: Christoph Lameter <christoph@lameter Driver modified as suggested by Pekka Enberg, Stephen Hemminger and Andrian Bunk. Reduces the size of the driver to ~260k. - clean up tabs - removed my3126.c - removed 85% of suni1x10gexp_regs.h - removed 80% of regs.h - removed various calls, renamed variables/functions. - removed system specific and other wrappers (usleep, msleep) - removed dead code - dropped redundant casts in osdep.h - dropped redundant check of kfree - dropped weird code (MODVERSIONS stuff) - reduced number of #ifdefs - use kcalloc now instead of kmalloc - Add information about known issues with the driver - Add information about authors Signed-off-by: Scott Bardone <sbardone@chelsio.com> Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> diff -puN /dev/null Documentation/networking/cxgb.txt
This commit is contained in:

committed by
Jeff Garzik

parent
88d7bd8cb9
commit
8199d3a79c
322
Documentation/networking/cxgb.txt
Normal file
322
Documentation/networking/cxgb.txt
Normal file
@@ -0,0 +1,322 @@
|
||||
Chelsio N210 10Gb Ethernet Network Controller
|
||||
|
||||
Driver Release Notes for Linux
|
||||
|
||||
Version 2.1.0
|
||||
|
||||
March 8, 2005
|
||||
|
||||
CONTENTS
|
||||
========
|
||||
INTRODUCTION
|
||||
FEATURES
|
||||
PERFORMANCE
|
||||
DRIVER MESSAGES
|
||||
KNOWN ISSUES
|
||||
SUPPORT
|
||||
|
||||
|
||||
INTRODUCTION
|
||||
============
|
||||
|
||||
This document describes the Linux driver for Chelsio 10Gb Ethernet Network
|
||||
Controller. This driver supports the Chelsio N210 NIC and is backward
|
||||
compatible with the Chelsio N110 model 10Gb NICs. This driver supports AMD64
|
||||
and EM64T, and x86 systems.
|
||||
|
||||
|
||||
FEATURES
|
||||
========
|
||||
|
||||
Adaptive Interrupts (adaptive-rx)
|
||||
---------------------------------
|
||||
|
||||
This feature provides an adaptive algorithm that adjusts the interrupt
|
||||
coalescing parameters, allowing the driver to dynamically adapt the latency
|
||||
settings to achieve the highest performance during various types of network
|
||||
load.
|
||||
|
||||
The interface used to control this feature is ethtool. Please see the
|
||||
ethtool manpage for additional usage information.
|
||||
|
||||
By default, adaptive-rx is disabled.
|
||||
To enable adaptive-rx:
|
||||
|
||||
ethtool -C <interface> adaptive-rx on
|
||||
|
||||
To disable adaptive-rx, use ethtool:
|
||||
|
||||
ethtool -C <interface> adaptive-rx off
|
||||
|
||||
After disabling adaptive-rx, the timer latency value will be set to 50us.
|
||||
You may set the timer latency after disabling adaptive-rx:
|
||||
|
||||
ethtool -C <interface> rx-usecs <microseconds>
|
||||
|
||||
An example to set the timer latency value to 100us on eth0:
|
||||
|
||||
ethtool -C eth0 rx-usecs 100
|
||||
|
||||
You may also provide a timer latency value while disabling adpative-rx:
|
||||
|
||||
ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>
|
||||
|
||||
If adaptive-rx is disabled and a timer latency value is specified, the timer
|
||||
will be set to the specified value until changed by the user or until
|
||||
adaptive-rx is enabled.
|
||||
|
||||
To view the status of the adaptive-rx and timer latency values:
|
||||
|
||||
ethtool -c <interface>
|
||||
|
||||
|
||||
TCP Segmentation Offloading (TSO) Support
|
||||
-----------------------------------------
|
||||
|
||||
This feature, also known as "large send", enables a system's protocol stack
|
||||
to offload portions of outbound TCP processing to a network interface card
|
||||
thereby reducing system CPU utilization and enhancing performance.
|
||||
|
||||
The interface used to control this feature is ethtool version 1.8 or higher.
|
||||
Please see the ethtool manpage for additional usage information.
|
||||
|
||||
By default, TSO is enabled.
|
||||
To disable TSO:
|
||||
|
||||
ethtool -K <interface> tso off
|
||||
|
||||
To enable TSO:
|
||||
|
||||
ethtool -K <interface> tso on
|
||||
|
||||
To view the status of TSO:
|
||||
|
||||
ethtool -k <interface>
|
||||
|
||||
|
||||
PERFORMANCE
|
||||
===========
|
||||
|
||||
The following information is provided as an example of how to change system
|
||||
parameters for "performance tuning" an what value to use. You may or may not
|
||||
want to change these system parameters, depending on your server/workstation
|
||||
application. Doing so is not warranted in any way by Chelsio Communications,
|
||||
and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
|
||||
of data or damage to equipment.
|
||||
|
||||
Your distribution may have a different way of doing things, or you may prefer
|
||||
a different method. These commands are shown only to provide an example of
|
||||
what to do and are by no means definitive.
|
||||
|
||||
Making any of the following system changes will only last until you reboot
|
||||
your system. You may want to write a script that runs at boot-up which
|
||||
includes the optimal settings for your system.
|
||||
|
||||
Setting PCI Latency Timer:
|
||||
setpci -d 1425:* 0x0c.l=0x0000F800
|
||||
|
||||
Disabling TCP timestamp:
|
||||
sysctl -w net.ipv4.tcp_timestamps=0
|
||||
|
||||
Disabling SACK:
|
||||
sysctl -w net.ipv4.tcp_sack=0
|
||||
|
||||
Setting TCP read buffers (min/default/max):
|
||||
sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
|
||||
|
||||
Setting TCP write buffers (min/pressure/max):
|
||||
sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
|
||||
|
||||
Setting TCP buffer space (min/pressure/max):
|
||||
sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
|
||||
|
||||
Setting large number of incoming connection requests (2.6.x only):
|
||||
sysctl -w net.ipv4.tcp_max_syn_backlog=3000
|
||||
|
||||
Setting maximum receive socket buffer size:
|
||||
sysctl -w net.core.rmem_max=524287
|
||||
|
||||
Setting maximum send socket buffer size:
|
||||
sysctl -w net.core.wmem_max=524287
|
||||
|
||||
Setting default receive socket buffer size:
|
||||
sysctl -w net.core.rmem_default=524287
|
||||
|
||||
Setting default send socket buffer size:
|
||||
sysctl -w net.core.wmem_default=524287
|
||||
|
||||
Setting maximum option memory buffers:
|
||||
sysctl -w net.core.optmem_max=524287
|
||||
|
||||
Setting maximum backlog (# of unprocessed packets before kernel drops):
|
||||
sysctl -w net.core.netdev_max_backlog=300000
|
||||
|
||||
Set smp_affinity (on a multiprocessor system) to a single CPU:
|
||||
echo 00000001 > /proc/irq/<interrupt_number>/smp_affinity
|
||||
|
||||
TCP window size for single connections:
|
||||
The receive buffer (RX_WINDOW) size must be at least as large as the
|
||||
Bandwidth-Delay Product of the communication link between the sender and
|
||||
receiver. Due to the variations of RTT, you may want to increase the buffer
|
||||
size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
|
||||
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
|
||||
At 10Gb speeds, use the following formula:
|
||||
RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
|
||||
Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000
|
||||
RX_WINDOW sizes of 256KB - 512KB should be sufficient.
|
||||
Setting the min, max, and default receive buffer (RX_WINDOW) size:
|
||||
sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
|
||||
|
||||
TCP window size for multiple connections:
|
||||
The receive buffer (RX_WINDOW) size may be calculated the same as single
|
||||
connections, but should be divided by the number of connections. The
|
||||
smaller window prevents congestion and facilitates better pacing,
|
||||
especially if/when MAC level flow control does not work well or when it is
|
||||
not supported on the machine. Experimentation may be necessary to attain
|
||||
the correct value. This method is provided as a starting point fot the
|
||||
correct receive buffer size.
|
||||
Setting the min, max, and default receive buffer (RX_WINDOW) size is
|
||||
performed in the same manner as single connection.
|
||||
|
||||
|
||||
DRIVER MESSAGES
|
||||
===============
|
||||
|
||||
The following messages are the most common messages logged by syslog. These
|
||||
may be found in /var/log/messages.
|
||||
|
||||
Driver up:
|
||||
Chelsio Network Driver - version 2.1.0
|
||||
|
||||
NIC detected:
|
||||
eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit
|
||||
|
||||
Link up:
|
||||
eth#: link is up at 10 Gbps, full duplex
|
||||
|
||||
Link down:
|
||||
eth#: link is down
|
||||
|
||||
|
||||
KNOWN ISSUES
|
||||
============
|
||||
|
||||
These issues have been identified during testing. The following information
|
||||
is provided as a workaround to the problem. In some cases, this problem is
|
||||
inherent to Linux or to a particular Linux Distribution and/or hardware
|
||||
platform.
|
||||
|
||||
1. Large number of TCP retransmits on a multiprocessor (SMP) system.
|
||||
|
||||
On a system with multiple CPUs, the interrupt (IRQ) for the network
|
||||
controller may be bound to more than one CPU. This will cause TCP
|
||||
retransmits if the packet data were to be split across different CPUs
|
||||
and re-assembled in a different order than expected.
|
||||
|
||||
To eliminate the TCP retransmits, set smp_affinity on the particular
|
||||
interrupt to a single CPU. You can locate the interrupt (IRQ) used on
|
||||
the N110/N210 by using ifconfig:
|
||||
ifconfig <dev_name> | grep Interrupt
|
||||
Set the smp_affinity to a single CPU:
|
||||
echo 1 > /proc/irq/<interrupt_number>/smp_affinity
|
||||
|
||||
It is highly suggested that you do not run the irqbalance daemon on your
|
||||
system, as this will change any smp_affinity setting you have applied.
|
||||
The irqbalance daemon runs on a 10 second interval and binds interrupts
|
||||
to the least loaded CPU determined by the daemon. To disable this daemon:
|
||||
chkconfig --level 2345 irqbalance off
|
||||
|
||||
By default, some Linux distributions enable the kernel feature,
|
||||
irqbalance, which performs the same function as the daemon. To disable
|
||||
this feature, add the following line to your bootloader:
|
||||
noirqbalance
|
||||
|
||||
Example using the Grub bootloader:
|
||||
title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
|
||||
root (hd0,0)
|
||||
kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
|
||||
initrd /initrd-2.4.21-27.ELsmp.img
|
||||
|
||||
2. After running insmod, the driver is loaded and the incorrect network
|
||||
interface is brought up without running ifup.
|
||||
|
||||
When using 2.4.x kernels, including RHEL kernels, the Linux kernel
|
||||
invokes a script named "hotplug". This script is primarily used to
|
||||
automatically bring up USB devices when they are plugged in, however,
|
||||
the script also attempts to automatically bring up a network interface
|
||||
after loading the kernel module. The hotplug script does this by scanning
|
||||
the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
|
||||
for HWADDR=<mac_address>.
|
||||
|
||||
If the hotplug script does not find the HWADDRR within any of the
|
||||
ifcfg-eth# files, it will bring up the device with the next available
|
||||
interface name. If this interface is already configured for a different
|
||||
network card, your new interface will have incorrect IP address and
|
||||
network settings.
|
||||
|
||||
To solve this issue, you can add the HWADDR=<mac_address> key to the
|
||||
interface config file of your network controller.
|
||||
|
||||
To disable this "hotplug" feature, you may add the driver (module name)
|
||||
to the "blacklist" file located in /etc/hotplug. It has been noted that
|
||||
this does not work for network devices because the net.agent script
|
||||
does not use the blacklist file. Simply remove, or rename, the net.agent
|
||||
script located in /etc/hotplug to disable this feature.
|
||||
|
||||
3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
|
||||
on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.
|
||||
|
||||
If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
|
||||
chipset, you may experience the "133-Mhz Mode Split Completion Data
|
||||
Corruption" bug identified by AMD while using a 133Mhz PCI-X card on the
|
||||
bus PCI-X bus.
|
||||
|
||||
AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
|
||||
can provide stale data via split completion cycles to a PCI-X card that
|
||||
is operating at 133 Mhz", causing data corruption.
|
||||
|
||||
AMD's provides three workarounds for this problem, however, Chelsio
|
||||
recommends the first option for best performance with this bug:
|
||||
|
||||
For 133Mhz secondary bus operation, limit the transaction length and
|
||||
the number of outstanding transactions, via BIOS configuration
|
||||
programming of the PCI-X card, to the following:
|
||||
|
||||
Data Length (bytes): 2k
|
||||
Total allowed outstanding transactions: 1
|
||||
|
||||
Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
|
||||
section 56, "133-MHz Mode Split Completion Data Corruption" for more
|
||||
details with this bug and workarounds suggested by AMD.
|
||||
|
||||
|
||||
SUPPORT
|
||||
=======
|
||||
|
||||
If you have problems with the software or hardware, please contact our
|
||||
customer support team via email at support@chelsio.com or check our website
|
||||
at http://www.chelsio.com
|
||||
|
||||
===============================================================================
|
||||
|
||||
Chelsio Communications
|
||||
370 San Aleso Ave.
|
||||
Suite 100
|
||||
Sunnyvale, CA 94085
|
||||
http://www.chelsio.com
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License, version 2, as
|
||||
published by the Free Software Foundation.
|
||||
|
||||
You should have received a copy of the GNU General Public License along
|
||||
with this program; if not, write to the Free Software Foundation, Inc.,
|
||||
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
|
||||
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
||||
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.
|
||||
|
||||
===============================================================================
|
Reference in New Issue
Block a user