Commit Graph

35199 Commits

Author SHA1 Message Date
Ariel Levkovich
ae43033255 net/mlx5: Refactor multi chains and prios support
Decouple the chains infrastructure from eswitch and make
it generic to support other steering namespaces.

The change defines an agnostic data structure to keep
all the relevant information for maintaining flow table
chaining in any steering namespace. Each namespace that
requires table chaining will be required to allocate
such data structure.

The chains creation code will receive the steering namespace
and flow table parameters from the caller so it will operate
agnosticly when creating the required resources to
maintain the table chaining function while Parts of the code
that are relevant to eswitch specific functionality are moved
to eswitch files.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-23 15:44:34 -07:00
Zheng Yongjun
35c52c5c88 net: realtek: Remove set but not used variable
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/realtek/8139cp.c: In function cp_tx_timeout:
drivers/net/ethernet/realtek/8139cp.c:1242:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]

`rc` is never used, so remove it.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:20:29 -07:00
Luo bin
b1b6c11051 hinic: improve the comments of function header
Fix the warnings about function header comments when building hinic
driver with "W=1" option.

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:20:29 -07:00
David S. Miller
6d772f328d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-09-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).

The main changes are:

1) Full multi function support in libbpf, from Andrii.

2) Refactoring of function argument checks, from Lorenz.

3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.

4) Program metadata support, from YiFei.

5) bpf iterator optimizations, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:11:11 -07:00
Zheng Yongjun
46237bf3ee net: microchip: Make lan743x_pm_suspend function return right value
drivers/net/ethernet/microchip/lan743x_main.c: In function lan743x_pm_suspend:

`ret` is set but not used. In fact, `pci_prepare_to_sleep` function value should
be the right value of `lan743x_pm_suspend` function, therefore, fix it.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 11:45:44 -07:00
David S. Miller
573a8095f6 Merge tag 'mlx5-updates-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2020-09-21

Multi packet TX descriptor support for SKBs.

This series introduces some refactoring of the regular TX data path in
mlx5 and adds the Enhanced TX MPWQE feature support. MPWQE stands for
multi-packet work queue element, and it can serve multiple packets,
reducing the PCI bandwidth spent on control traffic. It should improve
performance in scenarios where PCI is the bottleneck, and xmit_more is
signaled by the kernel. The refactoring done in this series also
improves the packet rate on its own.

MPWQE is already implemented in the XDP tx path, this series adds the
support of MPWQE for regular kernel SKB tx path.

MPWQE is supported from ConnectX-5 and onward, for legacy devices we need
to keep backward compatibility for regular (Single packet) WQE descriptor.

MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE per SKB.

Prior to the final patch "net/mlx5e: Enhanced TX MPWQE for SKBs" that adds
the actual support, Maxim did some refactoring to the tx data path to
split it into stages and smaller helper functions that can be utilized and
reused for both legacy and new MPWQE feature.

Performance testing:

UDP performance is improved in a single stream pktgen test:
  Packet rate: 16.86 Mpps (±0.15 Mpps) -> 20.94 Mpps (±0.33 Mpps)
  Instructions per packet: 434 -> 329
  Cycles per packet: 158 -> 123
  Instructions per cycle: 2.75 -> 2.67

TCP and XDP_TX single stream tests show no performance difference.

MPWQE can reduce PCI bandwidth:
  PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
    Inbound PCI utilization with MPWQE off: 80.3%
    Inbound PCI utilization with MPWQE on: 59.0%
  PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
    Inbound PCI utilization with MPWQE off: 65.4%
    Inbound PCI utilization with MPWQE on: 49.3%

MPWQE can also reduce CPU load, increasing the packet rate in case of
CPU bottleneck:
  PCI Gen2, pktgen at full rate on 24 CPU cores:
    Packet rate with MPWQE off: 37.5 Mpps
    Packet rate with MPWQE on: 49.0 Mpps
  PCI Gen3, pktgen at full rate on 24 CPU cores:
    Packet rate with MPWQE off: 57.0 Mpps
    Packet rate with MPWQE on: 66.8 Mpps

Burst size in all pktgen tests is 32.

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 17:44:59 -07:00
David S. Miller
3ab0a7a0c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
   moving another local variable and removing it's
   initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
   One pretty prints the port mode differently, whilst another
   changes the driver to try and obtain the port mode from
   the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 16:45:34 -07:00
Maxim Mikityanskiy
5af75c747e net/mlx5e: Enhanced TX MPWQE for SKBs
This commit adds support for Enhanced TX MPWQE feature in the regular
(SKB) data path. A MPWQE (multi-packet work queue element) can serve
multiple packets, reducing the PCI bandwidth on control traffic.

Two new stats (tx*_mpwqe_blks and tx*_mpwqe_pkts) are added. The feature
is on by default and controlled by the skb_tx_mpwqe private flag.

In a MPWQE, eseg is shared among all packets, so eseg-based offloads
(IPSEC, GENEVE, checksum) run on a separate eseg that is compared to the
eseg of the current MPWQE session to decide if the new packet can be
added to the same session.

MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE.

This change has no performance impact in TCP single stream test and
XDP_TX single stream test.

UDP pktgen, 64-byte packets, single stream, MPWQE off:
  Packet rate: 16.96 Mpps (±0.12 Mpps) -> 17.01 Mpps (±0.20 Mpps)
  Instructions per packet: 421 -> 429
  Cycles per packet: 156 -> 161
  Instructions per cycle: 2.70 -> 2.67

UDP pktgen, 64-byte packets, single stream, MPWQE on:
  Packet rate: 16.96 Mpps (±0.12 Mpps) -> 20.94 Mpps (±0.33 Mpps)
  Instructions per packet: 421 -> 329
  Cycles per packet: 156 -> 123
  Instructions per cycle: 2.70 -> 2.67

Enabling MPWQE can reduce PCI bandwidth:
  PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
    Inbound PCI utilization with MPWQE off: 80.3%
    Inbound PCI utilization with MPWQE on: 59.0%
  PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
    Inbound PCI utilization with MPWQE off: 65.4%
    Inbound PCI utilization with MPWQE on: 49.3%

Enabling MPWQE can also reduce CPU load, increasing the packet rate in
case of CPU bottleneck:
  PCI Gen2, pktgen at full rate on 24 CPU cores:
    Packet rate with MPWQE off: 37.5 Mpps
    Packet rate with MPWQE on: 49.0 Mpps
  PCI Gen3, pktgen at full rate on 24 CPU cores:
    Packet rate with MPWQE off: 57.0 Mpps
    Packet rate with MPWQE on: 66.8 Mpps

Burst size in all pktgen tests is 32.

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:16 -07:00
Maxim Mikityanskiy
67044a88aa net/mlx5e: Move TX code into functions to be used by MPWQE
mlx5e_txwqe_complete performs some actions that can be taken to separate
functions:

1. Update the flags needed for hardware timestamping.

2. Stop the TX queue if it's full.

Take these actions into separate functions to be reused by the MPWQE
code in the following commit and to maintain clear responsibilities of
functions.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:16 -07:00
Maxim Mikityanskiy
b39fe61edc net/mlx5e: Rename xmit-related structs to generalize them
As preparation for the upcoming TX MPWQE support for SKBs, rename struct
mlx5e_xdp_mpwqe to mlx5e_tx_mpwqe and move it above struct mlx5e_txqsq.
This structure will be reused in the regular SQ and in the regular TX
data path. Also rename mlx5e_xdp_xmit_data to mlx5e_xmit_data - it will
be used in the upcoming TX MPWQE flow.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:16 -07:00
Maxim Mikityanskiy
530d5ce22c net/mlx5e: Generalize TX MPWQE checks for full session
As preparation for the upcoming TX MPWQE for SKBs, create a function
(mlx5e_tx_mpwqe_is_full) to check whether an MPWQE session is full. This
function will be shared by MPWQE code for XDP and for SKBs. Defines are
renamed and moved to make them not XDP-specific.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:16 -07:00
Maxim Mikityanskiy
338c46c636 net/mlx5e: Support multiple SKBs in a TX WQE
TX MPWQE support for SKBs is coming in one of the following patches, and
a single MPWQE can send multiple SKBs. This commit prepares the TX path
code to handle such cases:

1. An additional FIFO for SKBs is added, just like the FIFO for DMA
chunks.

2. struct mlx5e_tx_wqe_info will contain num_fifo_pkts. If a given WQE
contains only one packet, num_fifo_pkts will be zero, and the SKB will
be stored in mlx5e_tx_wqe_info, as usual. If num_fifo_pkts > 0, the SKB
pointer will be NULL, and the SKBs will be stored in the FIFO.

This change has no performance impact in TCP single stream test and
XDP_TX single stream test.

When compiled with a recent GCC, this change shows no visible
performance impact on UDP pktgen (burst 32) single stream test either:
  Packet rate: 16.95 Mpps (±0.15 Mpps) -> 16.96 Mpps (±0.12 Mpps)
  Instructions per packet: 429 -> 421
  Cycles per packet: 160 -> 156
  Instructions per cycle: 2.69 -> 2.70

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:15 -07:00
Maxim Mikityanskiy
56e4da669a net/mlx5e: Move the TLS resync check out of the function
Before this patch, mlx5e_ktls_tx_handle_resync_dump_comp checked for
resync_dump_frag_page. It happened for all WQEs without an SKB,
including padding WQEs, and required a function call. Normally, padding
WQEs happen more often than TLS resyncs. Take this check out of the
function and put it to an inline function to save a call on all padding
WQEs.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:15 -07:00
Maxim Mikityanskiy
97e3afd64d net/mlx5e: Unify constants for WQE_EMPTY_DS_COUNT
A constant for the number of DS in an empty WQE (i.e. a WQE without data
segments) is needed in multiple places (normal TX data path, MPWQE in
XDP), but currently we have a constant for XDP and an inline formula in
normal TX. This patch introduces a common constant.

Additionally, mlx5e_xdp_mpwqe_session_start is converted to use struct
assignment, because the code nearby is touched.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:15 -07:00
Maxim Mikityanskiy
388a2b56e5 net/mlx5e: Small improvements for XDP TX MPWQE logic
Use MLX5E_XDP_MPW_MAX_WQEBBS to reserve space for a MPWQE, because it's
actually the maximal size a MPWQE can take.

Reorganize the logic that checks when to close the MPWQE session:

1. Put all checks into a single function.

2. When inline is on, make only one comparison - if it's false, the less
strict one will also be false. The compiler probably optimized it out
anyway, but it's clearer to also reflect it in the code.

The MLX5E_XDP_INLINE_WQE_* defines are also changed to make the
calculations more correct from the logical point of view. Though
MLX5E_XDP_INLINE_WQE_MAX_DS_CNT used to be 16 and didn't change its
value, the calculation used to be DIV_ROUND_UP(max inline packet size,
MLX5_SEND_WQE_DS), and the numerator should have included sizeof(struct
mlx5_wqe_inline_seg).

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:14 -07:00
Maxim Mikityanskiy
8e4b53f60f net/mlx5e: Refactor xmit functions
A huge function mlx5e_sq_xmit was split into several to achieve multiple
goals:

1. Reuse the code in IPoIB.

2. Better intergrate with TLS, IPSEC, GENEVE and checksum offloads. Now
it's possible to reserve space in the WQ before running eseg-based
offloads, so:

2.1. It's not needed to copy cseg and eseg after mlx5e_fill_sq_frag_edge
anymore.

2.2. mlx5e_txqsq_get_next_pi will be used instead of the legacy
mlx5e_fill_sq_frag_edge for better code maintainability and reuse.

3. Prepare for the upcoming TX MPWQE for SKBs. It will intervene after
mlx5e_sq_calc_wqe_attr to check if it's possible to use MPWQE, and the
code flow will split into two paths: MPWQE and non-MPWQE.

Two high-level functions are provided to send packets:

* mlx5e_xmit is called by the networking stack, runs offloads and sends
the packet. In one of the following patches, MPWQE support will be added
to this flow.

* mlx5e_sq_xmit_simple is called by the TLS offload, runs only the
checksum offload and sends the packet.

This change has no performance impact in TCP single stream test and
XDP_TX single stream test.

When compiled with a recent GCC, this change shows no visible
performance impact on UDP pktgen (burst 32) single stream test either:
  Packet rate: 16.86 Mpps (±0.15 Mpps) -> 16.95 Mpps (±0.15 Mpps)
  Instructions per packet: 434 -> 429
  Cycles per packet: 158 -> 160
  Instructions per cycle: 2.75 -> 2.69

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:14 -07:00
Maxim Mikityanskiy
d02dfcd51f net/mlx5e: Move mlx5e_tx_wqe_inline_mode to en_tx.c
Move mlx5e_tx_wqe_inline_mode from en/txrx.h to en_tx.c as it's only
used there.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:14 -07:00
Maxim Mikityanskiy
8ba6f18399 net/mlx5e: Use struct assignment to initialize mlx5e_tx_wqe_info
Struct assignment guarantees that all fields of the structure are
initialized (those that are not mentioned are zeroed). It makes code
mode robust and reduces chances for unpredictable behavior when one
forgets to reset some field and it holds an old value from previous
iterations of using the structure.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:13 -07:00
Maxim Mikityanskiy
6d55af43fe net/mlx5e: Refactor inline header size calculation in the TX path
As preparation for the next patch, don't increase ihs to calculate
ds_cnt and then decrease it, but rather calculate the intermediate value
temporarily. This code has the same amount of arithmetic operations, but
now allows to split out ds_cnt calculation, which will be performed in
the next patch.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 19:41:13 -07:00
Vladimir Oltean
8194d8fa71 net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
The IS2 IP4_TCP_UDP key offsets do not correspond to the VSC7514
datasheet. Whether they work or not is unknown to me. On VSC9959 and
VSC9953, with the same mistake and same discrepancy from the
documentation, tc-flower src_port and dst_port rules did not work, so I
am assuming the same is true here.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 17:40:52 -07:00
David S. Miller
47cec3f68c Merge tag 'mlx5-fixes-2020-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes-2020-09-18

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

v1->v2:
 Remove missing patch from -stable list.

For -stable v5.1
 ('net/mlx5: Fix FTE cleanup')

For -stable v5.3
 ('net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported')
 ('net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported')

For -stable v5.7
 ('net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready')

For -stable v5.8
 ('net/mlx5e: Use RCU to protect rq->xdp_prog')
 ('net/mlx5e: Fix endianness when calculating pedit mask first bit')
 ('net/mlx5e: Use synchronize_rcu to sync with NAPI')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 17:32:42 -07:00
Saeed Mahameed
cb39ccc5cb net/mlx5e: mlx5e_fec_in_caps() returns a boolean
Returning errno is a bug, fix that.

Also fixes smatch warnings:
drivers/net/ethernet/mellanox/mlx5/core/en/port.c:453
mlx5e_fec_in_caps() warn: signedness bug returning '(-95)'

Fixes: 2132b71f78 ("net/mlx5e: Advertise globaly supported FEC modes")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
2020-09-21 17:22:25 -07:00
Saeed Mahameed
94c4fed710 net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
The spinlock only needed when accessing the channel's icosq, grab the lock
after the buf allocation in resync_post_get_progress_params() to avoid
kzalloc(GFP_KERNEL) in atomic context.

Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2020-09-21 17:22:25 -07:00
Saeed Mahameed
581642f32f net/mlx5e: kTLS, Fix leak on resync error flow
Resync progress params buffer and dma weren't released on error,
Add missing error unwinding for resync_post_get_progress_params().

Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2020-09-21 17:22:24 -07:00
Saeed Mahameed
66ce5fc057 net/mlx5e: kTLS, Add missing dma_unmap in RX resync
Progress params dma address is never unmapped, unmap it when completion
handling is over.

Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2020-09-21 17:22:24 -07:00
Tariq Toukan
6e8de0b6b4 net/mlx5e: kTLS, Fix napi sync and possible use-after-free
Using synchronize_rcu() is sufficient to wait until running NAPI quits.

See similar upstream fix with detailed explanation:
("net/mlx5e: Use synchronize_rcu to sync with NAPI")

This change also fixes a possible use-after-free as the NAPI
might be already released at this stage.

Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:24 -07:00
Tariq Toukan
8f0bcd19b1 net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc
is updated from all rings by using atomic ops.
This set of stats is used only in the FPGA TLS use case, not in
the Connect-X TLS one, where regular per-ring counters are used.

Do not expose them in the Connect-X use case, as this would cause
counter duplication. For example, tx_tls_drop_no_sync_data would
appear twice in the ethtool stats.

Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 17:22:24 -07:00
Alaa Hleihel
b521105b68 net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
The cited commit started to reuse function mlx5e_update_ndo_stats() for
the representors as well.
However, the function is hard-coded to work on mlx5e_nic_stats_grps only.
Due to this issue, the representors statistics were not updated in the
output of "ip -s".

Fix it to work with the correct group by extracting it from the caller's
profile.

Also, while at it and since this function became generic, move it to
en_stats.c and rename it accordingly.

Fixes: 8a236b1514 ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:23 -07:00
Ron Diskin
47c97e6b10 net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
Currently the FW does not generate events for counters other than error
counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s
uses) might run in atomic context, while the FW interface is non atomic.
Thus, 'ip' is not allowed to issue FW commands, so it will only display
cached counters in the driver.

Add a SW counter (mcast_packets) in the driver to count rx multicast
packets. The counter also counts broadcast packets, as we consider it a
special case of multicast.
Use the counter value when calling "ip -s"/"ifconfig".

Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:23 -07:00
Maor Dickman
82198d8bcd net/mlx5e: Fix endianness when calculating pedit mask first bit
The field mask value is provided in network byte order and has to
be converted to host byte order before calculating pedit mask
first bit.

Fixes: 88f30bbcba ("net/mlx5e: Bit sized fields rewrite support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 17:22:23 -07:00
Maor Dickman
6cec0229ab net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
The cited commit creates peer miss group during switchdev mode
initialization in order to handle miss packets correctly while in VF
LAG mode. This is done regardless of FW support of such groups which
could cause rules setups failure later on.

Fix by adding FW capability check before creating peer groups/rule.

Fixes: ac004b8321 ("net/mlx5e: E-Switch, Add peer miss rules")
Signed-off-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:22 -07:00
Roi Dayan
4c8594adb9 net/mlx5e: CT: Fix freeing ct_label mapping
Add missing mapping remove call when removing ct rule,
as the mapping was allocated when ct rule was adding with ct_label.
Also there is a missing mapping remove call in error flow.

Fixes: 54b154ecfb ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:22 -07:00
Jianbo Liu
12a240a414 net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
When deleting vxlan flow rule under multipath, tun_info in parse_attr is
not freed when the rule is not ready.

Fixes: ef06c9ee89 ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:22 -07:00
Maxim Mikityanskiy
9c25a22dfb net/mlx5e: Use synchronize_rcu to sync with NAPI
As described in the previous commit, napi_synchronize doesn't quite fit
the purpose when we just need to wait until the currently running NAPI
quits. Its implementation waits until NAPI is not running by polling and
waiting for 1ms in between. In cases where we need to deactivate one
queue (e.g., recovery flows) or where we deactivate them one-by-one
(deactivate channel flow), we may get stuck in napi_synchronize forever
if other queues keep NAPI active, causing a soft lockup. Depending on
kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result
in a kernel panic.

To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap
the whole NAPI in rcu_read_lock.

Fixes: acc6c5953a ("net/mlx5e: Split open/close channels to stages")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:21 -07:00
Maxim Mikityanskiy
fe45386a20 net/mlx5e: Use RCU to protect rq->xdp_prog
Currently, the RQs are temporarily deactivated while hot-replacing the
XDP program, and napi_synchronize is used to make sure rq->xdp_prog is
not in use. However, napi_synchronize is not ideal: instead of waiting
till the end of a NAPI cycle, it polls and waits until NAPI is not
running, sleeping for 1ms between the periodic checks. Under heavy
workloads, this loop will never end, which may even lead to a kernel
panic if the kernel detects the hangup. Such workloads include XSK TX
and possibly also heavy RX (XSK or normal).

The fix is inspired by commit 326fe02d1e ("net/mlx4_en: protect
ring->xdp_prog with rcu_read_lock"). As mlx5e_xdp_handle is already
protected by rcu_read_lock, and bpf_prog_put uses call_rcu to free the
program, there is no need for additional synchronization if proper RCU
functions are used to access the pointer. This patch converts all
accesses to rq->xdp_prog to use RCU functions.

Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21 17:22:21 -07:00
Maor Gottlieb
cefc23554f net/mlx5: Fix FTE cleanup
Currently, when an FTE is allocated, its refcount is decreased to 0
with the purpose it will not be a stand alone steering object and every
rule (destination) of the FTE would increase the refcount.
When mlx5_cleanup_fs is called while not all rules were deleted by the
steering users, it hit refcount underflow on the FTE once clean_tree
calls to tree_remove_node after the deleted rules already decreased
the refcount to 0.

FTE is no longer destroyed implicitly when the last rule (destination)
is deleted. mlx5_del_flow_rules avoids it by increasing the refcount on
the FTE and destroy it explicitly after all rules were deleted. So we
can avoid the refcount underflow by making FTE as stand alone object.
In addition need to set del_hw_func to FTE so the HW object will be
destroyed when the FTE is deleted from the cleanup_tree flow.

refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 15715 at lib/refcount.c:28 refcount_warn_saturate+0xd9/0xe0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 tree_put_node+0xf2/0x140 [mlx5_core]
 clean_tree+0x4e/0xf0 [mlx5_core]
 clean_tree+0x4e/0xf0 [mlx5_core]
 clean_tree+0x4e/0xf0 [mlx5_core]
 clean_tree+0x5f/0xf0 [mlx5_core]
 clean_tree+0x4e/0xf0 [mlx5_core]
 clean_tree+0x5f/0xf0 [mlx5_core]
 mlx5_cleanup_fs+0x26/0x270 [mlx5_core]
 mlx5_unload+0x2e/0xa0 [mlx5_core]
 mlx5_unload_one+0x51/0x120 [mlx5_core]
 mlx5_devlink_reload_down+0x51/0x90 [mlx5_core]
 devlink_reload+0x39/0x120
 ? devlink_nl_cmd_reload+0x43/0x220
 genl_rcv_msg+0x1e4/0x420
 ? genl_family_rcv_msg_attrs_parse+0x100/0x100
 netlink_rcv_skb+0x47/0x110
 genl_rcv+0x24/0x40
 netlink_unicast+0x217/0x2f0
 netlink_sendmsg+0x30f/0x430
 sock_sendmsg+0x30/0x40
 __sys_sendto+0x10e/0x140
 ? handle_mm_fault+0xc4/0x1f0
 ? do_page_fault+0x33f/0x630
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x48/0x130
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 718ce4d601 ("net/mlx5: Consolidate update FTE for all removal changes")
Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21 17:22:21 -07:00
Zheng Yongjun
3ba6baf64b net: natsemi: Remove set but not used variable
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/natsemi/ns83820.c: In function ns83820_get_link_ksettings:
drivers/net/ethernet/natsemi/ns83820.c:1210:11: warning: variable ‘tanar’ set but not used [-Wunused-but-set-variable]

`tanar` is never used, so remove it.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 14:54:00 -07:00
Dan Carpenter
58ed68b592 sfc: Fix error code in probe
This failure path should return a negative error code but it currently
returns success.

Fixes: 51b35a454e ("sfc: skeleton EF100 PF driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:55:13 -07:00
Shay Agroskin
f49ed500d6 net: ena: Fix all static chekers' warnings
After running Sparse checker on the driver using
    make C=1 M=drivers/net/ethernet/amazon/ena

the only error that is thrown is:
    sparse: sparse: Using plain integer as NULL pointer
about the line
    struct ena_calc_queue_size_ctx calc_queue_ctx = { 0 };

This patch fixes this warning, thus making our driver free (for now) of
Sparse errors/warnings.

To make a more complete work, this patch also fixes all static warnings
that were found using an internal static checker.

Signed-off-by: Ido Segev <idose@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:54:23 -07:00
Shay Agroskin
0deca83ff1 net: ena: Change RSS related macros and variables names
The formal name changes to "ENA_ADMIN_RSS_INDIRECTION_TABLE_CONFIG".
Indirection is the ability to reference "something" using "something else"
instead of the value itself.
Indirection table, as the name implies, is the ability to reference
CPU/Queue value using hash-to-CPU table instead of CPU/Queue itself.

This patch renames the variable keys_num, which describes the number of
words in the RSS hash key, to key_parts which makes its purpose clearer
in RSS context.

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:54:23 -07:00
Shay Agroskin
a8aea84981 net: ena: Remove redundant print of placement policy
The placement policy is printed in the process of queue creation in
ena_up(). No need to print it in ena_probe().

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:54:23 -07:00
Shay Agroskin
bf2746e849 net: ena: Capitalize all log strings and improve code readability
Capitalize all log strings printed by the ena driver to make their
format uniform across it.

Also fix indentation, spelling mistakes and comments to improve code
readability. This also includes adding comments to macros/enums whose
purpose might be difficult to understand.
Separate some code into functions to make it easier to understand the
purpose of these lines.

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:54:23 -07:00
Shay Agroskin
f0525298f3 net: ena: Change log message to netif/dev function
Make log prints in ena_netdev use the same log functions as the rest of
the driver.

For the sake of consistency, all prints in ena_netdev file were
converted into netif_* format except where netdev struct isn't yet
defined. For these places, dev_* log functions are used (similar to
the patch for ena_com files).

This commit leaves some corner cases which would be changed in a
future patch.

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:54:23 -07:00
Shay Agroskin
2246cbc2c2 net: ena: Change license into format to SPDX in all files
All ena files should now use SPDX format in their license string. This
doesn't change the license of the files, but rather states the same
license in fewer words.

Also update the license years in some of the files.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:54:22 -07:00
Qinglang Miao
b696db590f chelsio: simplify the return expression of t3_ael2020_phy_prep()
Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:49:05 -07:00
Qinglang Miao
d4b717dd20 enetc: simplify the return expression of enetc_vf_set_mac_addr()
Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:47:51 -07:00
Qinglang Miao
ccb5942add ice: simplify the return expression of ice_finalize_update()
Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:47:07 -07:00
Qinglang Miao
2595b113d9 mlxsw: spectrum_router: simplify the return expression of __mlxsw_sp_router_init()
Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:45:38 -07:00
Qinglang Miao
f621df96ac net: hns3: simplify the return expression of hclgevf_client_start()
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:44:54 -07:00
Qinglang Miao
05c3b6e79d net: qlcnic: simplify the return expression of qlcnic_83xx_shutdown
Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21 13:43:36 -07:00