Commit Graph

1379 Commits

Author SHA1 Message Date
yupeng
8e2ea53a83 add snmp counters document
Add explainations for some general IP counters, SACK and DSACK related
counters

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:50:14 -08:00
David Ahern
58956317c8 neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:03:10 -08:00
Jakub Kicinski
b255e500c8 net: documentation: build a directory structure for drivers
Documentation/networking/ is full of cryptically named files with
driver documentation.  This makes finding interesting information
at a glance really hard.  Move all those files into a directory
called device_drivers (since not all drivers are for device) and
fix up references.

RFC v0.1 -> RFC v1:
 - also add .txt suffix to the files which are missing it (Quentin)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05 11:30:06 -08:00
Shalom Toledo
846e980a87 devlink: Add 'fw_load_policy' generic parameter
Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.

'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.

This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03 13:55:43 -08:00
yupeng
712ee16c23 add documents for snmp counters
Add explaination of below counters:
TcpExtTCPRcvCoalesce
TcpExtTCPAutoCorking
TcpExtTCPOrigDataSent
TCPSynRetrans
TCPFastOpenActiveFail
TcpExtListenOverflows
TcpExtListenDrops
TcpExtTCPHystartTrainDetect
TcpExtTCPHystartTrainCwnd
TcpExtTCPHystartDelayDetect
TcpExtTCPHystartDelayCwnd

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27 15:39:37 -08:00
Jesse Brandeburg
09e58b2d53 docs-networking: fix typo in define
The #define for NETIF_F_GSO_UDP_L4 was incorrect in the
documentation, fix it by making it match the actual code.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21 10:30:30 -08:00
Shannon Nelson
b3c4d7c93e ixgbe: add ipsec hw offload note to ixgbe Documentation
Add a short note about using IPsec Hardware Offload with
the ixgbe driver.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21 09:39:38 -08:00
Will Deacon
806654a966 Documentation: Use "while" instead of "whilst"
Whilst making an unrelated change to some Documentation, Linus sayeth:

  | Afaik, even in Britain, "whilst" is unusual and considered more
  | formal, and "while" is the common word.
  |
  | [...]
  |
  | Can we just admit that we work with computers, and we don't need to
  | use þe eald Englisc spelling of words that most of the world never
  | uses?

dictionary.com refers to the word as "Chiefly British", which is
probably an undesirable attribute for technical documentation.

Replace all occurrences under Documentation/ with "while".

Cc: David Howells <dhowells@redhat.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-11-20 09:30:43 -07:00
David S. Miller
f2be6d710d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
yupeng
80cc49507b net: Add part of TCP counts explanations in snmp_counters.rst
Add explanations of some generic TCP counters, fast open
related counters and TCP abort related counters and several
examples.

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18 12:56:50 -08:00
David Howells
7150ceaacb rxrpc: Fix life check
The life-checking function, which is used by kAFS to make sure that a call
is still live in the event of a pending signal, only samples the received
packet serial number counter; it doesn't actually provoke a change in the
counter, rather relying on the server to happen to give us a packet in the
time window.

Fix this by adding a function to force a ping to be transmitted.

kAFS then keeps track of whether there's been a stall, and if so, uses the
new function to ping the server, resetting the timeout to allow the reply
to come back.

If there's a stall, a ping and the call is *still* stalled in the same
place after another period, then the call will be aborted.

Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Fixes: f4d15fb6f9 ("rxrpc: Provide functions for allowing cleaner handling of signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:35:40 -08:00
Eric Dumazet
c73e5807e4 tcp: tsq: no longer use limit_output_bytes for paced flows
FQ pacing guarantees that paced packets queued by one flow do not
add head-of-line blocking for other flows.

After TCP GSO conversion, increasing limit_output_bytes to 1 MB is safe,
since this maps to 16 skbs at most in qdisc or device queues.
(or slightly more if some drivers lower {gso_max_segs|size})

We still can queue at most 1 ms worth of traffic (this can be scaled
by wifi drivers if they need to)

Tested:

# ethtool -c eth0 | egrep "tx-usecs:|tx-frames:" # 40 Gbit mlx4 NIC
tx-usecs: 16
tx-frames: 16
# tc qdisc replace dev eth0 root fq
# for f in {1..10};do netperf -P0 -H lpaa24,6 -o THROUGHPUT;done

Before patch:
27711
26118
27107
27377
27712
27388
27340
27117
27278
27509

After patch:
37434
36949
36658
36998
37711
37291
37605
36659
36544
37349

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 13:57:03 -08:00
yupeng
b08794a922 documentation of some IP/ICMP snmp counters
The snmp_counter.rst explains the meanings of snmp counters. It also
provides a set of experiments (only 1 for this initial patch),
combines the experiments' resutls and the snmp counters'
meanings. This is an initial path, only explains a part of IP/ICMP
counters and provide a simple ping test.

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 09:59:02 -08:00
Mike Manning
6897445fb1 net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
for datagram sockets. Have this default to enabled for reasons of
backwards compatibility. This is so as to specify the output device
with cmsg and IP_PKTINFO, but using a socket not bound to the
corresponding VRF. This allows e.g. older ping implementations to be
run with specifying the device but without executing it in the VRF.
If the option is disabled, packets received in a VRF context are only
handled by a raw socket bound to the VRF, and correspondingly packets
in the default VRF are only handled by a socket not bound to any VRF.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Robert Shearman
3c82a21f43 net: allow binding socket in a VRF when there's an unbound socket
Change the inet socket lookup to avoid packets arriving on a device
enslaved to an l3mdev from matching unbound sockets by removing the
wildcard for non sk_bound_dev_if and instead relying on check against
the secondary device index, which will be 0 when the input device is
not enslaved to an l3mdev and so match against an unbound socket and
not match when the input device is enslaved.

Change the socket binding to take the l3mdev into account to allow an
unbound socket to not conflict sockets bound to an l3mdev given the
datapath isolation now guaranteed.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Linus Torvalds
9a12efc5e0 Merge tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - clean-up leftovers in Kconfig files

 - remove stale oldnoconfig and silentoldconfig targets

 - remove unneeded cc-fullversion and cc-name variables

 - improve merge_config script to allow overriding option prefix

* tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: remove cc-name variable
  kbuild: replace cc-name test with CONFIG_CC_IS_CLANG
  merge_config.sh: Allow to define config prefix
  kbuild: remove unused cc-fullversion variable
  kconfig: remove silentoldconfig target
  kconfig: remove oldnoconfig target
  powerpc: PCI_MSI needs PCI
  powerpc: remove CONFIG_MCA leftovers
  powerpc: remove CONFIG_PCI_QSPAN
  scsi: aha152x: rename the PCMCIA define
2018-11-03 10:47:33 -07:00
Masahiro Yamada
0085b4191f kconfig: remove silentoldconfig target
As commit 911a91c39c ("kconfig: rename silentoldconfig to
syncconfig") announced, it is time for the removal.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-02 00:15:25 +09:00
Lorenzo Colitti
e2d00e62f2 Documentation: ip-sysctl.txt: Document tcp_fwmark_accept
This patch documents the tcp_fwmark_accept sysctl that was
added in 3.15.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:41:52 -07:00
Linus Torvalds
01aa9d518e Merge tag 'docs-4.20' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "This is a fairly typical cycle for documentation. There's some welcome
  readability improvements for the formatted output, some LICENSES
  updates including the addition of the ISC license, the removal of the
  unloved and unmaintained 00-INDEX files, the deprecated APIs document
  from Kees, more MM docs from Mike Rapoport, and the usual pile of typo
  fixes and corrections"

* tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits)
  docs: Fix typos in histogram.rst
  docs: Introduce deprecated APIs list
  kernel-doc: fix declaration type determination
  doc: fix a typo in adding-syscalls.rst
  docs/admin-guide: memory-hotplug: remove table of contents
  doc: printk-formats: Remove bogus kobject references for device nodes
  Documentation: preempt-locking: Use better example
  dm flakey: Document "error_writes" feature
  docs/completion.txt: Fix a couple of punctuation nits
  LICENSES: Add ISC license text
  LICENSES: Add note to CDDL-1.0 license that it should not be used
  docs/core-api: memory-hotplug: add some details about locking internals
  docs/core-api: rename memory-hotplug-notifier to memory-hotplug
  docs: improve readability for people with poorer eyesight
  yama: clarify ptrace_scope=2 in Yama documentation
  docs/vm: split memory hotplug notifier description to Documentation/core-api
  docs: move memory hotplug description into admin-guide/mm
  doc: Fix acronym "FEKEK" in ecryptfs
  docs: fix some broken documentation references
  iommu: Fix passthrough option documentation
  ...
2018-10-24 18:01:11 +01:00
Jeff Kirsher
828092ef77 Documentation: intel: Convert to RST format
Now that the documents have been updated to conform to the reStructured Text
guidelines, we can now change the file extensions and update the other
related references.

This converts all of the Intel wired LAN driver documentation to *.rst.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:41:29 -07:00
Jeff Kirsher
f12a84a9f6 Documentation: fm10k: Add kernel documentation
Added the fm10k kernel documentation, which apparently was missing.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:39:39 -07:00
Jeff Kirsher
1fae869bcf Documentation: ice: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:38:17 -07:00
Jeff Kirsher
7bacc01d3e Documentation: iavf: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:36:14 -07:00
Jeff Kirsher
1e06edcc2f Documentation: i40e: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
2018-10-18 12:34:28 -07:00
Jeff Kirsher
63e2ea2f89 Documentation: ixgbevf: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:28:35 -07:00
Jeff Kirsher
4d256e4d8a Documentation: ixgbe: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:27:56 -07:00
Jeff Kirsher
413548de58 Documentation: igbvf: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:24:18 -07:00
Jeff Kirsher
cf673eee90 Documentation: igb: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:22:39 -07:00
Jeff Kirsher
b87e7f2468 Documentation: e1000e: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:21:05 -07:00
Jeff Kirsher
8d59045f11 Documentation: ixgb: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:17:32 -07:00
Jeff Kirsher
27642facf1 Documentation: e100, e1000: Add missing SPDX header
Add the SPDX-Lincense-Identifier to the Intel wired Ethernet *.rst
kernel documentation.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:15:27 -07:00
Corentin Labbe
c0e6f052f4 Documentation: networking: ixgb: Remove reference to IXGB_NAPI
NAPI is enabled by default and IXGB_NAPI was removed since
commit 6d37ab282e ("ixgb: make NAPI the only option and the default")
Update the doc accordingly.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-18 12:03:03 -07:00
Maciej W. Rozycki
61414f5ec9 FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment
Corporation's first-generation FDDI network interface adapter, made for
TURBOchannel and based on a discrete version of what eventually became
Motorola's widely used CAMEL chipset.

The CAMEL chipset is present for example in the DEC FDDIcontroller
TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support
with the `defxx' driver, however the host bus interface logic and the
firmware API are different in the DEFZA and hence a separate driver is
required.

There isn't much to say about the driver except that it works, but there
is one peculiarity to mention.  The adapter implements two Tx/Rx queue
pairs.

Of these one pair is the usual network Tx/Rx queue pair, in this case
used by the adapter to exchange frames with the ring, via the RMC (Ring
Memory Controller) chip.  The Tx queue is handled directly by the RMC
chip and resides in onboard packet memory.  The Rx queue is maintained
via DMA in host memory by adapter's firmware copying received data
stored by the RMC in onboard packet memory.

The other pair is used to communicate SMT frames with adapter's
firmware.  Any SMT frame received from the RMC via the Rx queue must be
queued back by the driver to the SMT Rx queue for the firmware to
process.  Similarly the firmware uses the SMT Tx queue to supply the
driver with SMT frames that must be queued back to the Tx queue for the
RMC to send to the ring.

This solution was chosen because the designers ran out of PCB space and
could not squeeze in more logic onto the board that would be required to
handle this SMT frame traffic without the need to involve the driver, as
with the later DEFTA/DEFEA/DEFPA adapters.

Finally the driver does some Frame Control byte decoding, so to avoid
magic numbers some macros are added to <linux/if_fddi.h>.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:46:06 -07:00
David Ahern
7c6bb7d2fa net/ipv6: Add knob to skip DELROUTE message on device down
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
notifications when a device is taken down (admin down) or deleted. IPv4
does not generate a message for routes evicted by the down or delete;
IPv6 does. A NOS at scale really needs to avoid these messages and have
IPv4 and IPv6 behave similarly, relying on userspace to handle link
notifications and evict the routes.

At this point existing user behavior needs to be preserved. Since
notifications are a global action (not per app) the only way to preserve
existing behavior and allow the messages to be skipped is to add a new
sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
disable the notificatioons.

IPv6 route code already supports the option to skip the message (it is
used for multipath routes for example). Besides the new sysctl we need
to pass the skip_notify setting through the generic fib6_clean and
fib6_walk functions to fib6_clean_node and to set skip_notify on calls
to __ip_del_rt for the addrconf_ifdown path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12 09:47:02 -07:00
David S. Miller
071a234ad7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08 23:42:44 -07:00
Arthur Fabre
31ce8c4a1a bpf, doc: Document Jump X addressing mode
bpf_asm and the other classic BPF tools support jump conditions
comparing register A to register X, in addition to comparing
register A with constant K.

Only the latter was documented in filter.txt, add two new addressing
modes that describe the former.

Signed-off-by: Arthur Fabre <arthur@arthurfabre.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:20:56 +02:00
Konrad Djimeli
7ccc4f1887 bpf: typo fix in Documentation/networking/af_xdp.rst
Fix a simple typo: Completetion -> Completion

Signed-off-by: Konrad Djimeli <kdjimeli@igalia.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-05 00:11:24 +02:00
Vasundhara Volam
53e233ea2f devlink: Add Documentation/networking/devlink-params-bnxt.txt
This patch adds a new file to add information about configuration
parameters that are supported by bnxt_en driver via devlink.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:49:43 -07:00
Vasundhara Volam
9bff98bb35 devlink: Add Documentation/networking/devlink-params.txt
This patch adds a new file to add information about some of the
generic configuration parameters set via devlink.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:49:43 -07:00
David Howells
e908bcf4f1 rxrpc: Allow the reply time to be obtained on a client call
Allow the epoch value to be queried on a server connection.  This is in the
rxrpc header of every packet for use in routing and is derived from the
client's state.  It's also not supposed to change unless the client gets
restarted.

AFS can make use of this information to deduce whether a fileserver has
been restarted because the fileserver makes client calls to the filesystem
driver's cache manager to send notifications (ie. callback breaks) about
conflicting changes from other clients.  These convey the fileserver's own
epoch value back to the filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:54:29 +01:00
David Howells
2070a3e449 rxrpc: Allow the reply time to be obtained on a client call
Allow the timestamp on the sk_buff holding the first DATA packet of a reply
to be queried.  This can then be used as a base for the expiry time
calculation on the callback promise duration indicated by an operation
result.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:42:29 +01:00
David S. Miller
6f41617bf2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03 21:00:17 -07:00
Joe Stringer
a610b665ec Documentation: Describe bpf reference tracking
Document the new pointer types in the verifier and how the pointer ID
tracking works to ensure that references which are taken are later
released.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-03 02:53:48 +02:00
David S. Miller
2240c12d7d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-01

1) Make xfrmi_get_link_net() static to silence a sparse warning.
   From Wei Yongjun.

2) Remove a unused esph pointer definition in esp_input().
   From Haishuang Yan.

3) Allow the NIC driver to quietly refuse xfrm offload
   in case it does not support it, the SA is created
   without offload in this case.
   From Shannon Nelson.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:31:17 -07:00
Maciej Żenczykowski
d4ce58082f net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int
(fix documentation and sysctl access to treat it as such)

Tested:
  # zcat /proc/config.gz | egrep ^CONFIG_HZ
  CONFIG_HZ_1000=y
  CONFIG_HZ=1000
  # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294968
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
  # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294967
  # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
  # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
  -1
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:33:21 -07:00
Haiyang Zhang
f1951c2256 hv_netvsc: Update document for LRO/RSC support
Update document for LRO/RSC support, and the command line info to
change the setting.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-22 17:23:16 -07:00
Jesse Brandeburg
8062b2263a intel-ethernet: rename i40evf to iavf
Rename the Intel Ethernet Adaptive Virtual Function driver
(i40evf) to a new name (iavf) that is more consistent with
the ongoing maintenance of the driver as the universal VF driver
for multiple product lines.

This first patch fixes up the directory names and the .ko name,
intentionally ignoring the function names inside the driver
for now.  Basically this is the simplest patch that gets
the rename done and will be followed by other patches that
rename the internal functions.

This patch also addresses a couple of string/name issues
and updates the Copyright year.

Also, made sure to add a MODULE_ALIAS to the old name.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-18 08:43:03 -07:00
Tobin C. Harding
a20625e49d docs: net: Remove TCP congestion document
Document is stale, let's remove it.

Remove TCP congestion document.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:42:55 -07:00
Henrik Austad
a7ddcea58a Drop all 00-INDEX files from Documentation/
This is a respin with a wider audience (all that get_maintainer returned)
and I know this spams a *lot* of people. Not sure what would be the correct
way, so my apologies for ruining your inbox.

The 00-INDEX files are supposed to give a summary of all files present
in a directory, but these files are horribly out of date and their
usefulness is brought into question. Often a simple "ls" would reveal
the same information as the filenames are generally quite descriptive as
a short introduction to what the file covers (it should not surprise
anyone what Documentation/sched/sched-design-CFS.txt covers)

A few years back it was mentioned that these files were no longer really
needed, and they have since then grown further out of date, so perhaps
it is time to just throw them out.

A short status yields the following _outdated_ 00-INDEX files, first
counter is files listed in 00-INDEX but missing in the directory, last
is files present but not listed in 00-INDEX.

List of outdated 00-INDEX:
Documentation: (4/10)
Documentation/sysctl: (0/1)
Documentation/timers: (1/0)
Documentation/blockdev: (3/1)
Documentation/w1/slaves: (0/1)
Documentation/locking: (0/1)
Documentation/devicetree: (0/5)
Documentation/power: (1/1)
Documentation/powerpc: (0/5)
Documentation/arm: (1/0)
Documentation/x86: (0/9)
Documentation/x86/x86_64: (1/1)
Documentation/scsi: (4/4)
Documentation/filesystems: (2/9)
Documentation/filesystems/nfs: (0/2)
Documentation/cgroup-v1: (0/2)
Documentation/kbuild: (0/4)
Documentation/spi: (1/0)
Documentation/virtual/kvm: (1/0)
Documentation/scheduler: (0/2)
Documentation/fb: (0/1)
Documentation/block: (0/1)
Documentation/networking: (6/37)
Documentation/vm: (1/3)

Then there are 364 subdirectories in Documentation/ with several files that
are missing 00-INDEX alltogether (and another 120 with a single file and no
00-INDEX).

I don't really have an opinion to whether or not we /should/ have 00-INDEX,
but the above 00-INDEX should either be removed or be kept up to date. If
we should keep the files, I can try to keep them updated, but I rather not
if we just want to delete them anyway.

As a starting point, remove all index-files and references to 00-INDEX and
see where the discussion is going.

Signed-off-by: Henrik Austad <henrik@austad.us>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Just-do-it-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: [Almost everybody else]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-09-09 15:08:58 -06:00
Ioana Radulescu
34ff68465a dpaa2-eth: Move DPAA2 Ethernet driver from staging to drivers/net
The DPAA2 Ethernet driver supports Freescale/NXP SoCs with DPAA2
(DataPath Acceleration Architecture v2). The driver manages
network objects discovered on the fsl-mc bus.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01 17:16:59 -07:00