Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-11-01
This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.
Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine. So change the read error into a
warn while the device is still present.
Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it. So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.
I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.
Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.
Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.
Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.
v2: Fixed alignment issue in patch 3 of the series based on community
feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These asterisks were once references to a line that said:
"* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.
Fixes: de3edab427 ("e1000: update README for e1000")
Fixes: a3fb65680f ("e100.txt: Cleanup license info in kernel doc")
Fixes: da8c01c450 ("e1000e.txt: Add e1000e documentation")
Fixes: f12a84a9f6 ("Documentation: fm10k: Add kernel documentation")
Fixes: b55c52b193 ("igb.txt: Add igb documentation")
Fixes: c4e9b56e24 ("igbvf.txt: Add igbvf Documentation")
Fixes: d7064f4c19 ("Documentation/networking/: Update Intel wired LAN driver documentation")
Fixes: c4b8c01112 ("ixgbevf.txt: Update ixgbevf documentation")
Fixes: 1e06edcc2f ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes: 105bf2fe6b ("i40evf: add driver to kernel build system")
Fixes: 1fae869bcf ("Documentation: ice: Prepare documentation for RST conversion")
Fixes: df69ba4321 ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add documentation file for the MAC/PHY support in the DPAA2
architecture. This describes the architecture and implementation of the
interface between phylink and a DPAA2 network driver.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_max_syn_backlog default value depends on memory size
and TCP ehash size. Before this patch, the max value
was 2048 [1], which is considered too small nowadays.
Increase it to 4096 to match the recent SOMAXCONN change.
[1] This is with TCP ehash size being capped to 524288 buckets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
SOMAXCONN is /proc/sys/net/core/somaxconn default value.
It has been defined as 128 more than 20 years ago.
Since it caps the listen() backlog values, the very small value has
caused numerous problems over the years, and many people had
to raise it on their hosts after beeing hit by problems.
Google has been using 1024 for at least 15 years, and we increased
this to 4096 after TCP listener rework has been completed, more than
4 years ago. We got no complain of this change breaking any
legacy application.
Many applications indeed setup a TCP listener with listen(fd, -1);
meaning they let the system select the backlog.
Raising SOMAXCONN lowers chance of the port being unavailable under
even small SYNFLOOD attack, and reduces possibilities of side channel
vulnerabilities.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
I need to pick up the independent changes made to
Documentation/core-api/memory-allocation.rst to be able to merge further
work without creating a total mess.
Some of the marvell switches have bits controlling the hash algorithm
the ATU uses for MAC addresses. In some industrial settings, where all
the devices are from the same manufacture, and hence use the same OUI,
the default hashing algorithm is not optimal. Allow the other
algorithms to be selected via devlink.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-10-27
The following pull-request contains BPF updates for your *net-next* tree.
We've added 52 non-merge commits during the last 11 day(s) which contain
a total of 65 files changed, 2604 insertions(+), 1100 deletions(-).
The main changes are:
1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF
assembly code. The work here teaches BPF verifier to recognize
kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints
such that verifier allows direct use of bpf_skb_event_output() helper
used in tc BPF et al (w/o probing memory access) that dumps skb data
into perf ring buffer. Also add direct loads to probe memory in order
to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov.
2) Big batch of changes to improve libbpf and BPF kselftests. Besides
others: generalization of libbpf's CO-RE relocation support to now
also include field existence relocations, revamp the BPF kselftest
Makefile to add test runner concept allowing to exercise various
ways to build BPF programs, and teach bpf_object__open() and friends
to automatically derive BPF program type/expected attach type from
section names to ease their use, from Andrii Nakryiko.
3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu.
4) Allow to read BTF as raw data from bpftool. Most notable use case
is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa.
5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which
manages to improve "rx_drop" performance by ~4%., from Björn Töpel.
6) Fix to restore the flow dissector after reattach BPF test and also
fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki.
7) Improve verifier's BTF ctx access for use outside of raw_tp, from
Martin KaFai Lau.
8) Improve documentation for AF_XDP with new sections and to reflect
latest features, from Magnus Karlsson.
9) Add back 'version' section parsing to libbpf for old kernels, from
John Fastabend.
10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(),
from KP Singh.
11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order
to improve insn coverage for built BPF progs, from Yonghong Song.
12) Misc minor cleanups and fixes, from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Added sections on all the bind flags, libbpf, all the setsockopts and
all the getsockopts. Also updated the document to reflect the latest
features and to correct some spelling errors.
v1 -> v2:
* Updated XDP program with latest BTF map format
* Added one more FAQ entry
* Some minor edits and corrections
v2 -> v3:
* Simplified XDP_SHARED_UMEM example XDP program
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1571648224-16889-1-git-send-email-magnus.karlsson@intel.com
The ppp_idle structure is defined in terms of __kernel_time_t, which is
defined as 'long' on all architectures, and this usage is not affected
by the y2038 problem since it transports a time interval rather than an
absolute time.
However, the ppp user space defines the same structure as time_t, which
may be 64-bit wide on new libc versions even on 32-bit architectures.
It's easy enough to just handle both possible structure layouts on
all architectures, to deal with the possibility that a user space ppp
implementation comes with its own ppp_idle structure definition, as well
as to document the fact that the driver is y2038-safe.
Doing this also avoids the need for a special compat mode translation,
since 32-bit and 64-bit kernels now support the same interfaces. The old
32-bit structure is also available on native 64-bit architectures now,
but this is harmless.
Cc: netdev@vger.kernel.org
Cc: linux-ppp@vger.kernel.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit 8960b38932 ("linux/dim: Rename externally used net_dim
members") renamed the net_dim API, removing the "net_" prefix from the
structures and functions. The patch didn't update the net_dim.txt
documentation file.
Fix the documentation so that its examples match the current code.
Fixes: 8960b38932 ("linux/dim: Rename externally used net_dim members", 2019-06-25)
Fixes: c002bd529d ("linux/dim: Rename externally exposed macros", 2019-06-25)
Fixes: 4f75da3666 ("linux/dim: Move implementation to .c files")
Cc: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
It's not about times (multiple occurences of an event) but about the
duration of a time interval.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This fixes the following Sphinx warning:
Documentation/networking/devlink-trap.rst:175: WARNING: unknown document: /devlink-trap-netdevsim
Fixes: 9e08745704 ("Documentation: Add description of netdevsim traps")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add a statistic for TLS record decryption errors.
Since devices are supposed to pass records as-is when they
encounter errors this statistic will count bad records in
both pure software and inline crypto configurations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SNMP stats for number of sockets with successfully
installed sessions. Break them down to software and
hardware ones. Note that if hardware offload fails
stack uses software implementation, and counts the
session appropriately.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These asterisks were once references to a line that said:
"* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.
Fixes: f12a84a9f6 ("Documentation: fm10k: Add kernel documentation")
Fixes: 1e06edcc2f ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes: 1fae869bcf ("Documentation: ice: Prepare documentation for RST conversion")
Fixes: df69ba4321 ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Resolving a couple of Sphinx documentation warnings
that are generated in the networking section.
- WARNING: document isn't included in any toctree
- WARNING: Title underline too short.
Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski.
2) Use bio_vec in the networking instead of custom skb_frag_t, from
Matthew Wilcox.
3) Make use of xmit_more in r8169 driver, from Heiner Kallweit.
4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen.
5) Support all variants of 5750X bnxt_en chips, from Michael Chan.
6) More RTNL avoidance work in the core and mlx5 driver, from Vlad
Buslov.
7) Add TCP syn cookies bpf helper, from Petar Penkov.
8) Add 'nettest' to selftests and use it, from David Ahern.
9) Add extack support to drop_monitor, add packet alert mode and
support for HW drops, from Ido Schimmel.
10) Add VLAN offload to stmmac, from Jose Abreu.
11) Lots of devm_platform_ioremap_resource() conversions, from
YueHaibing.
12) Add IONIC driver, from Shannon Nelson.
13) Several kTLS cleanups, from Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits)
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
mlxsw: spectrum: Register CPU port with devlink
mlxsw: spectrum_buffers: Prevent changing CPU port's configuration
net: ena: fix incorrect update of intr_delay_resolution
net: ena: fix retrieval of nonadaptive interrupt moderation intervals
net: ena: fix update of interrupt moderation register
net: ena: remove all old adaptive rx interrupt moderation code from ena_com
net: ena: remove ena_restore_ethtool_params() and relevant fields
net: ena: remove old adaptive interrupt moderation code from ena_netdev
net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
net: ena: enable the interrupt_moderation in driver_supported_features
net: ena: reimplement set/get_coalesce()
net: ena: switch to dim algorithm for rx adaptive interrupt moderation
net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
ethtool: implement Energy Detect Powerdown support via phy-tunable
xen-netfront: do not assume sk_buff_head list is empty in error handling
s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
net: ena: don't wake up tx queue when down
drop_monitor: Better sanitize notified packets
...
Pull documentation updates from Jonathan Corbet:
"It's a somewhat calmer cycle for docs this time, as the churn of the
mass RST conversion is happily mostly behind us.
- A new document on reproducible builds.
- We finally got around to zapping the documentation for hardware
support that was removed in 2004; one doesn't want to rush these
things.
- The usual assortment of fixes, typo corrections, etc"
* tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
Documentation: kbuild: Add document about reproducible builds
docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
Documentation: Add "earlycon=sbi" to the admin guide
doc🔒 remove reference to clever use of read-write lock
devices.txt: improve entry for comedi (char major 98)
docs: mtd: Update spi nor reference driver
doc: arm64: fix grammar dtb placed in no attributes region
Documentation: sysrq: don't recommend 'S' 'U' before 'B'
mailmap: Update email address for Quentin Perret
docs: ftrace: clarify when tracing is disabled by the trace file
docs: process: fix broken link
Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
Documentation/arm/sa1100: Remove some obsolete documentation
docs/zh_CN: update Chinese howto.rst for latexdocs making
Documentation: virt: Fix broken reference to virt tree's index
docs: Fix typo on pull requests guide
kernel-doc: Allow anonymous enum
Documentation: sphinx: Don't parse socket() as identifier reference
Documentation: sphinx: Add missing comma to list of strings
...
While not an exhaustive usage tutorial, this describes the details
needed to build more complex scenarios.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the phylink documentation to make it clear that phylink is
designed to be used on the MAC facing side of the link, rather than
between a SFP and PHY.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the 'fw_load_policy' devlink parameter. The FW load
policy is controlled by the 'app_fw_from_flash' hwinfo key.
Remap the values from devlink to the hwinfo key and back.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the
device reset policy on driver probe.
This parameter is useful in conjunction with the existing
'fw_load_policy' parameter.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the LAN driver documentation to include the latest feature
implementation and driver capabilities.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add the ability to use unaligned chunks in the AF_XDP umem. By
relaxing where the chunks can be placed, it allows to use an
arbitrary buffer size and place whenever there is a free
address in the umem. Helps more seamless DPDK AF_XDP driver
integration. Support for i40e, ixgbe and mlx5e, from Kevin and
Maxim.
2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
application can wake up the kernel for rx/tx processing which
avoids busy-spinning of the latter, useful when app and driver
is located on the same core. Support for i40e, ixgbe and mlx5e,
from Magnus and Maxim.
3) bpftool fixes for printf()-like functions so compiler can actually
enforce checks, bpftool build system improvements for custom output
directories, and addition of 'bpftool map freeze' command, from Quentin.
4) Support attaching/detaching XDP programs from 'bpftool net' command,
from Daniel.
5) Automatic xskmap cleanup when AF_XDP socket is released, and several
barrier/{read,write}_once fixes in AF_XDP code, from Björn.
6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
inclusion as well as libbpf versioning improvements, from Andrii.
7) Several new BPF kselftests for verifier precision tracking, from Alexei.
8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.
9) And more BPF kselftest improvements all over the place, from Stanislav.
10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-09-04 j1939
this is a pull request for net-next/master consisting of 21 patches.
the first 12 patches are by me and target the CAN core infrastructure.
They clean up the names of variables , structs and struct members,
convert can_rx_register() to use max() instead of open coding it and
remove unneeded code from the can_pernet_exit() callback.
The next three patches are also by me and they introduce and make use of
the CAN midlayer private structure. It is used to hold protocol specific
per device data structures.
The next patch is by Oleksij Rempel, switches the
&net->can.rcvlists_lock from a spin_lock() to a spin_lock_bh(), so that
it can be used from NAPI (soft IRQ) context.
The next 4 patches are by Kurt Van Dijck, he first updates his email
address via mailmap and then extends sockaddr_can to include j1939
members.
The final patch is the collective effort of many entities (The j1939
authors: Oliver Hartkopp, Bastian Stender, Elenita Hinds, kbuild test
robot, Kurt Van Dijck, Maxime Jayat, Robin van der Gracht, Oleksij
Rempel, Marc Kleine-Budde). It adds support of SAE J1939 protocol to the
CAN networking stack.
SAE J1939 is the vehicle bus recommended practice used for communication
and diagnostics among vehicle components. Originating in the car and
heavy-duty truck industry in the United States, it is now widely used in
other parts of the world.
P.S.: This pull request doesn't invalidate my last pull request:
"pull-request: can-next 2019-09-03".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a basic driver framework for the Pensando IONIC
network device. There is no functionality right now other than
the ability to load and unload.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current tag set is still rather small and needs a couple
more tags to help with ASIC identification and to have a
more generic FW version.
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
SAE J1939 is the vehicle bus recommended practice used for communication
and diagnostics among vehicle components. Originating in the car and
heavy-duty truck industry in the United States, it is now widely used in
other parts of the world.
J1939, ISO 11783 and NMEA 2000 all share the same high level protocol.
SAE J1939 can be considered the replacement for the older SAE J1708 and
SAE J1587 specifications.
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: Elenita Hinds <ecathinds@gmail.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add new parameter (flow_steering_mode) to control the flow steering
mode of the driver.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode this entities are created though the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it, for now SMFS will manages only the switchdev eswitch
steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add initial documentation of the devlink-trap mechanism, explaining the
background, motivation and the semantics of the interface.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.
Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonble floor set we've had it
enabled for the past 6 months.
The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.
Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.
Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.
Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.
Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).
See commit 0608c69c9a ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.
v2:
- remove superfluous decrypted mark copy (Willem);
- remove the stale doc entry (Boris);
- rely entirely on EOR marking to prevent coalescing (Boris);
- use an internal sendpages flag instead of marking the socket
(Boris).
v3 (Willem):
- reorganize the can_skb_orphan_partial() condition;
- fix the flag leak-in through tcp_bpf_sendmsg.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>