Merge tag 'v5.2-rc4' into mauro
We need to pick up post-rc1 changes to various document files so they don't get lost in Mauro's massive RST conversion push.
This commit is contained in:
30
Documentation/networking/device_drivers/index.rst
Normal file
30
Documentation/networking/device_drivers/index.rst
Normal file
@@ -0,0 +1,30 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
Vendor Device Drivers
|
||||
=====================
|
||||
|
||||
Contents:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
freescale/dpaa2/index
|
||||
intel/e100
|
||||
intel/e1000
|
||||
intel/e1000e
|
||||
intel/fm10k
|
||||
intel/igb
|
||||
intel/igbvf
|
||||
intel/ixgb
|
||||
intel/ixgbe
|
||||
intel/ixgbevf
|
||||
intel/i40e
|
||||
intel/iavf
|
||||
intel/ice
|
||||
|
||||
.. only:: subproject
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
@@ -11,19 +11,7 @@ Contents:
|
||||
batman-adv
|
||||
can
|
||||
can_ucan_protocol
|
||||
device_drivers/freescale/dpaa2/index
|
||||
device_drivers/intel/e100
|
||||
device_drivers/intel/e1000
|
||||
device_drivers/intel/e1000e
|
||||
device_drivers/intel/fm10k
|
||||
device_drivers/intel/igb
|
||||
device_drivers/intel/igbvf
|
||||
device_drivers/intel/ixgb
|
||||
device_drivers/intel/ixgbe
|
||||
device_drivers/intel/ixgbevf
|
||||
device_drivers/intel/i40e
|
||||
device_drivers/intel/iavf
|
||||
device_drivers/intel/ice
|
||||
device_drivers/index
|
||||
dsa/index
|
||||
devlink-info-versions
|
||||
ieee802154
|
||||
@@ -40,6 +28,8 @@ Contents:
|
||||
checksum-offloads
|
||||
segmentation-offloads
|
||||
scaling
|
||||
tls
|
||||
tls-offload
|
||||
|
||||
.. only:: subproject
|
||||
|
||||
|
@@ -560,10 +560,10 @@ tcp_comp_sack_delay_ns - LONG INTEGER
|
||||
Default : 1,000,000 ns (1 ms)
|
||||
|
||||
tcp_comp_sack_nr - INTEGER
|
||||
Max numer of SACK that can be compressed.
|
||||
Max number of SACK that can be compressed.
|
||||
Using 0 disables SACK compression.
|
||||
|
||||
Detault : 44
|
||||
Default : 44
|
||||
|
||||
tcp_slow_start_after_idle - BOOLEAN
|
||||
If set, provide RFC2861 behavior and time out the congestion
|
||||
|
@@ -18,7 +18,7 @@ The following technologies are described:
|
||||
* Generic Segmentation Offload - GSO
|
||||
* Generic Receive Offload - GRO
|
||||
* Partial Generic Segmentation Offload - GSO_PARTIAL
|
||||
* SCTP accelleration with GSO - GSO_BY_FRAGS
|
||||
* SCTP acceleration with GSO - GSO_BY_FRAGS
|
||||
|
||||
|
||||
TCP Segmentation Offload
|
||||
@@ -148,7 +148,7 @@ that the IPv4 ID field is incremented in the case that a given header does
|
||||
not have the DF bit set.
|
||||
|
||||
|
||||
SCTP accelleration with GSO
|
||||
SCTP acceleration with GSO
|
||||
===========================
|
||||
|
||||
SCTP - despite the lack of hardware support - can still take advantage of
|
||||
|
1
Documentation/networking/tls-offload-layers.svg
Normal file
1
Documentation/networking/tls-offload-layers.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 49 KiB |
1
Documentation/networking/tls-offload-reorder-bad.svg
Normal file
1
Documentation/networking/tls-offload-reorder-bad.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 6.4 KiB |
1
Documentation/networking/tls-offload-reorder-good.svg
Normal file
1
Documentation/networking/tls-offload-reorder-good.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 6.4 KiB |
482
Documentation/networking/tls-offload.rst
Normal file
482
Documentation/networking/tls-offload.rst
Normal file
@@ -0,0 +1,482 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
==================
|
||||
Kernel TLS offload
|
||||
==================
|
||||
|
||||
Kernel TLS operation
|
||||
====================
|
||||
|
||||
Linux kernel provides TLS connection offload infrastructure. Once a TCP
|
||||
connection is in ``ESTABLISHED`` state user space can enable the TLS Upper
|
||||
Layer Protocol (ULP) and install the cryptographic connection state.
|
||||
For details regarding the user-facing interface refer to the TLS
|
||||
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
|
||||
|
||||
``ktls`` can operate in three modes:
|
||||
|
||||
* Software crypto mode (``TLS_SW``) - CPU handles the cryptography.
|
||||
In most basic cases only crypto operations synchronous with the CPU
|
||||
can be used, but depending on calling context CPU may utilize
|
||||
asynchronous crypto accelerators. The use of accelerators introduces extra
|
||||
latency on socket reads (decryption only starts when a read syscall
|
||||
is made) and additional I/O load on the system.
|
||||
* Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
|
||||
on a packet by packet basis, provided the packets arrive in order.
|
||||
This mode integrates best with the kernel stack and is described in detail
|
||||
in the remaining part of this document
|
||||
(``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
|
||||
* Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
|
||||
NIC driver and firmware replace the kernel networking stack
|
||||
with its own TCP handling, it is not usable in production environments
|
||||
making use of the Linux networking stack for example any firewalling
|
||||
abilities or QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``).
|
||||
|
||||
The operation mode is selected automatically based on device configuration,
|
||||
offload opt-in or opt-out on per-connection basis is not currently supported.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
At a high level user write requests are turned into a scatter list, the TLS ULP
|
||||
intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
|
||||
mode) and then hands the modified scatter list to the TCP layer. From this
|
||||
point on the TCP stack proceeds as normal.
|
||||
|
||||
In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
|
||||
Instead packets reach a device driver, the driver will mark the packets
|
||||
for crypto offload based on the socket the packet is attached to,
|
||||
and send them to the device for encryption and transmission.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
On the receive side if the device handled decryption and authentication
|
||||
successfully, the driver will set the decrypted bit in the associated
|
||||
:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
|
||||
are handled normally. ``ktls`` is informed when data is queued to the socket
|
||||
and the ``strparser`` mechanism is used to delineate the records. Upon read
|
||||
request, records are retrieved from the socket and passed to decryption routine.
|
||||
If device decrypted all the segments of the record the decryption is skipped,
|
||||
otherwise software path handles decryption.
|
||||
|
||||
.. kernel-figure:: tls-offload-layers.svg
|
||||
:alt: TLS offload layers
|
||||
:align: center
|
||||
:figwidth: 28em
|
||||
|
||||
Layers of Kernel TLS stack
|
||||
|
||||
Device configuration
|
||||
====================
|
||||
|
||||
During driver initialization device sets the ``NETIF_F_HW_TLS_RX`` and
|
||||
``NETIF_F_HW_TLS_TX`` features and installs its
|
||||
:c:type:`struct tlsdev_ops <tlsdev_ops>`
|
||||
pointer in the :c:member:`tlsdev_ops` member of the
|
||||
:c:type:`struct net_device <net_device>`.
|
||||
|
||||
When TLS cryptographic connection state is installed on a ``ktls`` socket
|
||||
(note that it is done twice, once for RX and once for TX direction,
|
||||
and the two are completely independent), the kernel checks if the underlying
|
||||
network device is offload-capable and attempts the offload. In case offload
|
||||
fails the connection is handled entirely in software using the same mechanism
|
||||
as if the offload was never tried.
|
||||
|
||||
Offload request is performed via the :c:member:`tls_dev_add` callback of
|
||||
:c:type:`struct tlsdev_ops <tlsdev_ops>`:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
|
||||
enum tls_offload_ctx_dir direction,
|
||||
struct tls_crypto_info *crypto_info,
|
||||
u32 start_offload_tcp_sn);
|
||||
|
||||
``direction`` indicates whether the cryptographic information is for
|
||||
the received or transmitted packets. Driver uses the ``sk`` parameter
|
||||
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
|
||||
Cryptographic information in ``crypto_info`` includes the key, iv, salt
|
||||
as well as TLS record sequence number. ``start_offload_tcp_sn`` indicates
|
||||
which TCP sequence number corresponds to the beginning of the record with
|
||||
sequence number from ``crypto_info``. The driver can add its state
|
||||
at the end of kernel structures (see :c:member:`driver_state` members
|
||||
in ``include/net/tls.h``) to avoid additional allocations and pointer
|
||||
dereferences.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
After TX state is installed, the stack guarantees that the first segment
|
||||
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
|
||||
number, simplifying TCP sequence number matching.
|
||||
|
||||
TX offload being fully initialized does not imply that all segments passing
|
||||
through the driver and which belong to the offloaded socket will be after
|
||||
the expected sequence number and will have kernel record information.
|
||||
In particular, already encrypted data may have been queued to the socket
|
||||
before installing the connection state in the kernel.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
In RX direction local networking stack has little control over the segmentation,
|
||||
so the initial records' TCP sequence number may be anywhere inside the segment.
|
||||
|
||||
Normal operation
|
||||
================
|
||||
|
||||
At the minimum the device maintains the following state for each connection, in
|
||||
each direction:
|
||||
|
||||
* crypto secrets (key, iv, salt)
|
||||
* crypto processing state (partial blocks, partial authentication tag, etc.)
|
||||
* record metadata (sequence number, processing offset and length)
|
||||
* expected TCP sequence number
|
||||
|
||||
There are no guarantees on record length or record segmentation. In particular
|
||||
segments may start at any point of a record and contain any number of records.
|
||||
Assuming segments are received in order, the device should be able to perform
|
||||
crypto operations and authentication regardless of segmentation. For this
|
||||
to be possible device has to keep small amount of segment-to-segment state.
|
||||
This includes at least:
|
||||
|
||||
* partial headers (if a segment carried only a part of the TLS header)
|
||||
* partial data block
|
||||
* partial authentication tag (all data had been seen but part of the
|
||||
authentication tag has to be written or read from the subsequent segment)
|
||||
|
||||
Record reassembly is not necessary for TLS offload. If the packets arrive
|
||||
in order the device should be able to handle them separately and make
|
||||
forward progress.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
The kernel stack performs record framing reserving space for the authentication
|
||||
tag and populating all other TLS header and tailer fields.
|
||||
|
||||
Both the device and the driver maintain expected TCP sequence numbers
|
||||
due to the possibility of retransmissions and the lack of software fallback
|
||||
once the packet reaches the device.
|
||||
For segments passed in order, the driver marks the packets with
|
||||
a connection identifier (note that a 5-tuple lookup is insufficient to identify
|
||||
packets requiring HW offload, see the :ref:`5tuple_problems` section)
|
||||
and hands them to the device. The device identifies the packet as requiring
|
||||
TLS handling and confirms the sequence number matches its expectation.
|
||||
The device performs encryption and authentication of the record data.
|
||||
It replaces the authentication tag and TCP checksum with correct values.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
Before a packet is DMAed to the host (but after NIC's embedded switching
|
||||
and packet transformation functions) the device validates the Layer 4
|
||||
checksum and performs a 5-tuple lookup to find any TLS connection the packet
|
||||
may belong to (technically a 4-tuple
|
||||
lookup is sufficient - IP addresses and TCP port numbers, as the protocol
|
||||
is always TCP). If connection is matched device confirms if the TCP sequence
|
||||
number is the expected one and proceeds to TLS handling (record delineation,
|
||||
decryption, authentication for each record in the packet). The device leaves
|
||||
the record framing unmodified, the stack takes care of record decapsulation.
|
||||
Device indicates successful handling of TLS offload in the per-packet context
|
||||
(descriptor) passed to the host.
|
||||
|
||||
Upon reception of a TLS offloaded packet, the driver sets
|
||||
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
|
||||
corresponding to the segment. Networking stack makes sure decrypted
|
||||
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
|
||||
and takes care of partial decryption.
|
||||
|
||||
Resync handling
|
||||
===============
|
||||
|
||||
In presence of packet drops or network packet reordering, the device may lose
|
||||
synchronization with the TLS stream, and require a resync with the kernel's
|
||||
TCP stack.
|
||||
|
||||
Note that resync is only attempted for connections which were successfully
|
||||
added to the device table and are in TLS_HW mode. For example,
|
||||
if the table was full when cryptographic state was installed in the kernel,
|
||||
such connection will never get offloaded. Therefore the resync request
|
||||
does not carry any cryptographic connection state.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
Segments transmitted from an offloaded socket can get out of sync
|
||||
in similar ways to the receive side-retransmissions - local drops
|
||||
are possible, though network reorders are not.
|
||||
|
||||
Whenever an out of order segment is transmitted the driver provides
|
||||
the device with enough information to perform cryptographic operations.
|
||||
This means most likely that the part of the record preceding the current
|
||||
segment has to be passed to the device as part of the packet context,
|
||||
together with its TCP sequence number and TLS record number. The device
|
||||
can then initialize its crypto state, process and discard the preceding
|
||||
data (to be able to insert the authentication tag) and move onto handling
|
||||
the actual packet.
|
||||
|
||||
In this mode depending on the implementation the driver can either ask
|
||||
for a continuation with the crypto state and the new sequence number
|
||||
(next expected segment is the one after the out of order one), or continue
|
||||
with the previous stream state - assuming that the out of order segment
|
||||
was just a retransmission. The former is simpler, and does not require
|
||||
retransmission detection therefore it is the recommended method until
|
||||
such time it is proven inefficient.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
A small amount of RX reorder events may not require a full resynchronization.
|
||||
In particular the device should not lose synchronization
|
||||
when record boundary can be recovered:
|
||||
|
||||
.. kernel-figure:: tls-offload-reorder-good.svg
|
||||
:alt: reorder of non-header segment
|
||||
:align: center
|
||||
|
||||
Reorder of non-header segment
|
||||
|
||||
Green segments are successfully decrypted, blue ones are passed
|
||||
as received on wire, red stripes mark start of new records.
|
||||
|
||||
In above case segment 1 is received and decrypted successfully.
|
||||
Segment 2 was dropped so 3 arrives out of order. The device knows
|
||||
the next record starts inside 3, based on record length in segment 1.
|
||||
Segment 3 is passed untouched, because due to lack of data from segment 2
|
||||
the remainder of the previous record inside segment 3 cannot be handled.
|
||||
The device can, however, collect the authentication algorithm's state
|
||||
and partial block from the new record in segment 3 and when 4 and 5
|
||||
arrive continue decryption. Finally when 2 arrives it's completely outside
|
||||
of expected window of the device so it's passed as is without special
|
||||
handling. ``ktls`` software fallback handles the decryption of record
|
||||
spanning segments 1, 2 and 3. The device did not get out of sync,
|
||||
even though two segments did not get decrypted.
|
||||
|
||||
Kernel synchronization may be necessary if the lost segment contained
|
||||
a record header and arrived after the next record header has already passed:
|
||||
|
||||
.. kernel-figure:: tls-offload-reorder-bad.svg
|
||||
:alt: reorder of header segment
|
||||
:align: center
|
||||
|
||||
Reorder of segment with a TLS header
|
||||
|
||||
In this example segment 2 gets dropped, and it contains a record header.
|
||||
Device can only detect that segment 4 also contains a TLS header
|
||||
if it knows the length of the previous record from segment 2. In this case
|
||||
the device will lose synchronization with the stream.
|
||||
|
||||
When the device gets out of sync and the stream reaches TCP sequence
|
||||
numbers more than a max size record past the expected TCP sequence number,
|
||||
the device starts scanning for a known header pattern. For example
|
||||
for TLS 1.2 and TLS 1.3 subsequent bytes of value ``0x03 0x03`` occur
|
||||
in the SSL/TLS version field of the header. Once pattern is matched
|
||||
the device continues attempting parsing headers at expected locations
|
||||
(based on the length fields at guessed locations).
|
||||
Whenever the expected location does not contain a valid header the scan
|
||||
is restarted.
|
||||
|
||||
When the header is matched the device sends a confirmation request
|
||||
to the kernel, asking if the guessed location is correct (if a TLS record
|
||||
really starts there), and which record sequence number the given header had.
|
||||
The kernel confirms the guessed location was correct and tells the device
|
||||
the record sequence number. Meanwhile, the device had been parsing
|
||||
and counting all records since the just-confirmed one, it adds the number
|
||||
of records it had seen to the record number provided by the kernel.
|
||||
At this point the device is in sync and can resume decryption at next
|
||||
segment boundary.
|
||||
|
||||
In a pathological case the device may latch onto a sequence of matching
|
||||
headers and never hear back from the kernel (there is no negative
|
||||
confirmation from the kernel). The implementation may choose to periodically
|
||||
restart scan. Given how unlikely falsely-matching stream is, however,
|
||||
periodic restart is not deemed necessary.
|
||||
|
||||
Special care has to be taken if the confirmation request is passed
|
||||
asynchronously to the packet stream and record may get processed
|
||||
by the kernel before the confirmation request.
|
||||
|
||||
Error handling
|
||||
==============
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
Packets may be redirected or rerouted by the stack to a different
|
||||
device than the selected TLS offload device. The stack will handle
|
||||
such condition using the :c:func:`sk_validate_xmit_skb` helper
|
||||
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
|
||||
Offload maintains information about all records until the data is
|
||||
fully acknowledged, so if skbs reach the wrong device they can be handled
|
||||
by software fallback.
|
||||
|
||||
Any device TLS offload handling error on the transmission side must result
|
||||
in the packet being dropped. For example if a packet got out of order
|
||||
due to a bug in the stack or the device, reached the device and can't
|
||||
be encrypted such packet must be dropped.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
If the device encounters any problems with TLS offload on the receive
|
||||
side it should pass the packet to the host's networking stack as it was
|
||||
received on the wire.
|
||||
|
||||
For example authentication failure for any record in the segment should
|
||||
result in passing the unmodified packet to the software fallback. This means
|
||||
packets should not be modified "in place". Splitting segments to handle partial
|
||||
decryption is not advised. In other words either all records in the packet
|
||||
had been handled successfully and authenticated or the packet has to be passed
|
||||
to the host's stack as it was on the wire (recovering original packet in the
|
||||
driver if device provides precise error is sufficient).
|
||||
|
||||
The Linux networking stack does not provide a way of reporting per-packet
|
||||
decryption and authentication errors, packets with errors must simply not
|
||||
have the :c:member:`decrypted` mark set.
|
||||
|
||||
A packet should also not be handled by the TLS offload if it contains
|
||||
incorrect checksums.
|
||||
|
||||
Performance metrics
|
||||
===================
|
||||
|
||||
TLS offload can be characterized by the following basic metrics:
|
||||
|
||||
* max connection count
|
||||
* connection installation rate
|
||||
* connection installation latency
|
||||
* total cryptographic performance
|
||||
|
||||
Note that each TCP connection requires a TLS session in both directions,
|
||||
the performance may be reported treating each direction separately.
|
||||
|
||||
Max connection count
|
||||
--------------------
|
||||
|
||||
The number of connections device can support can be exposed via
|
||||
``devlink resource`` API.
|
||||
|
||||
Total cryptographic performance
|
||||
-------------------------------
|
||||
|
||||
Offload performance may depend on segment and record size.
|
||||
|
||||
Overload of the cryptographic subsystem of the device should not have
|
||||
significant performance impact on non-offloaded streams.
|
||||
|
||||
Statistics
|
||||
==========
|
||||
|
||||
Following minimum set of TLS-related statistics should be reported
|
||||
by the driver:
|
||||
|
||||
* ``rx_tls_decrypted`` - number of successfully decrypted TLS segments
|
||||
* ``tx_tls_encrypted`` - number of in-order TLS segments passed to device
|
||||
for encryption
|
||||
* ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
|
||||
but did not arrive in the expected order
|
||||
* ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because
|
||||
they arrived out of order and associated record could not be found
|
||||
(see also :ref:`pre_tls_data`)
|
||||
|
||||
Notable corner cases, exceptions and additional requirements
|
||||
============================================================
|
||||
|
||||
.. _5tuple_problems:
|
||||
|
||||
5-tuple matching limitations
|
||||
----------------------------
|
||||
|
||||
The device can only recognize received packets based on the 5-tuple
|
||||
of the socket. Current ``ktls`` implementation will not offload sockets
|
||||
routed through software interfaces such as those used for tunneling
|
||||
or virtual networking. However, many packet transformations performed
|
||||
by the networking stack (most notably any BPF logic) do not require
|
||||
any intermediate software device, therefore a 5-tuple match may
|
||||
consistently miss at the device level. In such cases the device
|
||||
should still be able to perform TX offload (encryption) and should
|
||||
fallback cleanly to software decryption (RX).
|
||||
|
||||
Out of order
|
||||
------------
|
||||
|
||||
Introducing extra processing in NICs should not cause packets to be
|
||||
transmitted or received out of order, for example pure ACK packets
|
||||
should not be reordered with respect to data segments.
|
||||
|
||||
Ingress reorder
|
||||
---------------
|
||||
|
||||
A device is permitted to perform packet reordering for consecutive
|
||||
TCP segments (i.e. placing packets in the correct order) but any form
|
||||
of additional buffering is disallowed.
|
||||
|
||||
Coexistence with standard networking offload features
|
||||
-----------------------------------------------------
|
||||
|
||||
Offloaded ``ktls`` sockets should support standard TCP stack features
|
||||
transparently. Enabling device TLS offload should not cause any difference
|
||||
in packets as seen on the wire.
|
||||
|
||||
Transport layer transparency
|
||||
----------------------------
|
||||
|
||||
The device should not modify any packet headers for the purpose
|
||||
of the simplifying TLS offload.
|
||||
|
||||
The device should not depend on any packet headers beyond what is strictly
|
||||
necessary for TLS offload.
|
||||
|
||||
Segment drops
|
||||
-------------
|
||||
|
||||
Dropping packets is acceptable only in the event of catastrophic
|
||||
system errors and should never be used as an error handling mechanism
|
||||
in cases arising from normal operation. In other words, reliance
|
||||
on TCP retransmissions to handle corner cases is not acceptable.
|
||||
|
||||
TLS device features
|
||||
-------------------
|
||||
|
||||
Drivers should ignore the changes to TLS the device feature flags.
|
||||
These flags will be acted upon accordingly by the core ``ktls`` code.
|
||||
TLS device feature flags only control adding of new TLS connection
|
||||
offloads, old connections will remain active after flags are cleared.
|
||||
|
||||
Known bugs
|
||||
==========
|
||||
|
||||
skb_orphan() leaks clear text
|
||||
-----------------------------
|
||||
|
||||
Currently drivers depend on the :c:member:`sk` member of
|
||||
:c:type:`struct sk_buff <sk_buff>` to identify segments requiring
|
||||
encryption. Any operation which removes or does not preserve the socket
|
||||
association such as :c:func:`skb_orphan` or :c:func:`skb_clone`
|
||||
will cause the driver to miss the packets and lead to clear text leaks.
|
||||
|
||||
Redirects leak clear text
|
||||
-------------------------
|
||||
|
||||
In the RX direction, if segment has already been decrypted by the device
|
||||
and it gets redirected or mirrored - clear text will be transmitted out.
|
||||
|
||||
.. _pre_tls_data:
|
||||
|
||||
Transmission of pre-TLS data
|
||||
----------------------------
|
||||
|
||||
User can enqueue some already encrypted and framed records before enabling
|
||||
``ktls`` on the socket. Those records have to get sent as they are. This is
|
||||
perfectly easy to handle in the software case - such data will be waiting
|
||||
in the TCP layer, TLS ULP won't see it. In the offloaded case when pre-queued
|
||||
segment reaches transmission point it appears to be out of order (before the
|
||||
expected TCP sequence number) and the stack does not have a record information
|
||||
associated.
|
||||
|
||||
All segments without record information cannot, however, be assumed to be
|
||||
pre-queued data, because a race condition exists between TCP stack queuing
|
||||
a retransmission, the driver seeing the retransmission and TCP ACK arriving
|
||||
for the retransmitted data.
|
@@ -1,3 +1,9 @@
|
||||
.. _kernel_tls:
|
||||
|
||||
==========
|
||||
Kernel TLS
|
||||
==========
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
@@ -12,6 +18,8 @@ Creating a TLS connection
|
||||
|
||||
First create a new TCP socket and set the TLS ULP.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
sock = socket(AF_INET, SOCK_STREAM, 0);
|
||||
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
|
||||
|
||||
@@ -21,6 +29,8 @@ handshake is complete, we have all the parameters required to move the
|
||||
data-path to the kernel. There is a separate socket option for moving
|
||||
the transmit and the receive into the kernel.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/* From linux/tls.h */
|
||||
struct tls_crypto_info {
|
||||
unsigned short version;
|
||||
@@ -58,6 +68,8 @@ After setting the TLS_TX socket option all application data sent over this
|
||||
socket is encrypted using TLS and the parameters provided in the socket option.
|
||||
For example, we can send an encrypted hello world record as follows:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
const char *msg = "hello world\n";
|
||||
send(sock, msg, strlen(msg));
|
||||
|
||||
@@ -67,6 +79,8 @@ to the encrypted kernel send buffer if possible.
|
||||
The sendfile system call will send the file's data over TLS records of maximum
|
||||
length (2^14).
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
file = open(filename, O_RDONLY);
|
||||
fstat(file, &stat);
|
||||
sendfile(sock, file, &offset, stat.st_size);
|
||||
@@ -89,6 +103,8 @@ After setting the TLS_RX socket option, all recv family socket calls
|
||||
are decrypted using TLS parameters provided. A full TLS record must
|
||||
be received before decryption can happen.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
char buffer[16384];
|
||||
recv(sock, buffer, 16384);
|
||||
|
||||
@@ -97,12 +113,12 @@ large enough, and no additional allocations occur. If the userspace
|
||||
buffer is too small, data is decrypted in the kernel and copied to
|
||||
userspace.
|
||||
|
||||
EINVAL is returned if the TLS version in the received message does not
|
||||
``EINVAL`` is returned if the TLS version in the received message does not
|
||||
match the version passed in setsockopt.
|
||||
|
||||
EMSGSIZE is returned if the received message is too big.
|
||||
``EMSGSIZE`` is returned if the received message is too big.
|
||||
|
||||
EBADMSG is returned if decryption failed for any other reason.
|
||||
``EBADMSG`` is returned if decryption failed for any other reason.
|
||||
|
||||
Send TLS control messages
|
||||
-------------------------
|
||||
@@ -113,9 +129,11 @@ These messages can be sent over the socket by providing the TLS record type
|
||||
via a CMSG. For example the following function sends @data of @length bytes
|
||||
using a record of type @record_type.
|
||||
|
||||
/* send TLS control message using record_type */
|
||||
.. code-block:: c
|
||||
|
||||
/* send TLS control message using record_type */
|
||||
static int klts_send_ctrl_message(int sock, unsigned char record_type,
|
||||
void *data, size_t length)
|
||||
void *data, size_t length)
|
||||
{
|
||||
struct msghdr msg = {0};
|
||||
int cmsg_len = sizeof(record_type);
|
||||
@@ -151,6 +169,8 @@ type passed via cmsg. If no cmsg buffer is provided, an error is
|
||||
returned if a control message is received. Data messages may be
|
||||
received without a cmsg buffer set.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
char buffer[16384];
|
||||
char cmsg[CMSG_SPACE(sizeof(unsigned char))];
|
||||
struct msghdr msg = {0};
|
||||
@@ -186,12 +206,10 @@ Integrating in to userspace TLS library
|
||||
At a high level, the kernel TLS ULP is a replacement for the record
|
||||
layer of a userspace TLS library.
|
||||
|
||||
A patchset to OpenSSL to use ktls as the record layer is here:
|
||||
A patchset to OpenSSL to use ktls as the record layer is
|
||||
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.
|
||||
|
||||
https://github.com/Mellanox/openssl/commits/tls_rx2
|
||||
|
||||
An example of calling send directly after a handshake using
|
||||
gnutls. Since it doesn't implement a full record layer, control
|
||||
messages are not supported:
|
||||
|
||||
https://github.com/ktls/af_ktls-tool/commits/RX
|
||||
`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
|
||||
of calling send directly after a handshake using gnutls.
|
||||
Since it doesn't implement a full record layer, control
|
||||
messages are not supported.
|
Reference in New Issue
Block a user