Commit Graph

77704 Commits

Author SHA1 Message Date
Peng Li
96c0e8614e net: hns3: Add support to enable TX/RX promisc mode for H/W rev(0x21)
HCLGE_PROMISC_TX_EN_B and HCLGE_PROMISC_RX_EN_B are not supported
on pdev revision(0x20), new revision(0x21) supports them.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 00:04:26 -04:00
Peng Li
5b5455a9ed net: hns3: Add STRP_TAGP field support for hardware revision 0x21
Hardware Revision(0x21) Buffer Descriptor adds a field STRP_TAGP
for vlan stripped processed indication. STRP_TAGP field has 2 bits,
bit 0 is stripped indication of the vlan tag in outer vlan tag
field, bit 1 is stripped indication of the vlan tag in inner vlan
tag field. For each bit, 0 indicates the tag is not stripped and
1 indicates the tag is stripped.

This patch adds STRP_TAGP support for revision(0x21), and does not
change the revision(0x20) action.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 00:04:26 -04:00
Peng Li
dcb35ccef8 net: hns3: Add support for tx_accept_tag2 and tx_accept_untag2 config
HNS3 Hardware can support up to two VLAN tags in transmit leg, the PPP
module can handle the packets based on the tag1 and tag2 config. This
patch adds support for tag2 config for vlan handling

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 00:04:26 -04:00
Peng Li
846fcc8363 net: hns3: Updates RX packet info fetch in case of multi BD
In the latest revision of the hardware, if a packet is spanning
across multiple BDs then only VLD bit and current data size info
is valid in each BD, and rest of the information is only valid
in the last BD of the packet. In such case we should make sure
we are fetching RX packet size from the first descriptor and
information like VLAN should be fetched from last BD.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 00:04:25 -04:00
Bjørn Mork
102cd90963 qmi_wwan: apply SET_DTR quirk to the SIMCOM shared device ID
SIMCOM are reusing a single device ID for many (all of their?)
different modems, based on different chipsets and firmwares. Newer
Qualcomm chipset generations require setting DTR to wake the QMI
function.  The SIM7600E modem is using such a chipset, making it
fail to work with this driver despite the device ID match.

Fix by unconditionally enabling the SET_DTR quirk for all SIMCOM
modems using this specific device ID.  This is similar to what
we already have done for another case of device IDs recycled over
multiple chipset generations: 14cf4a771b ("drivers: net: usb:
qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201")

Initial testing on an older SIM7100 modem shows no immediate side
effects.

Reported-by: Sebastian Sjoholm <sebastian.sjoholm@gmail.com>
Cc: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:13:02 -04:00
Ard Biesheuvel
3125642695 net: netsec: reduce DMA mask to 40 bits
The netsec network controller IP can drive 64 address bits for DMA, and
the DMA mask is set accordingly in the driver. However, the SynQuacer
SoC, which is the only silicon incorporating this IP at the moment,
integrates this IP in a manner that leaves address bits [63:40]
unconnected.

Up until now, this has not resulted in any problems, given that the DDR
controller doesn't decode those bits to begin with. However, recent
firmware updates for platforms incorporating this SoC allow the IOMMU
to be enabled, which does decode address bits [47:40], and allocates
top down from the IOVA space, producing DMA addresses that have bits
set that have been left unconnected.

Both the DT and ACPI (IORT) descriptions of the platform take this into
account, and only describe a DMA address space of 40 bits (using either
dma-ranges DT properties, or DMA address limits in IORT named component
nodes). However, even though our IOMMU and bus layers may take such
limitations into account by setting a narrower DMA mask when creating
the platform device, the netsec probe() entrypoint follows the common
practice of setting the DMA mask uncondionally, according to the
capabilities of the IP block itself rather than to its integration into
the chip.

It is currently unclear what the correct fix is here. We could hack around
it by only setting the DMA mask if it deviates from its default value of
DMA_BIT_MASK(32). However, this makes it impossible for the bus layer to
use DMA_BIT_MASK(32) as the bus limit, and so it appears that a more
comprehensive approach is required to take DMA limits imposed by the
SoC as a whole into account.

In the mean time, let's limit the DMA mask to 40 bits. Given that there
is currently only one SoC that incorporates this IP, this is a reasonable
approach that can be backported to -stable and buys us some time to come
up with a proper fix going forward.

Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Masahisa Kojima <masahisa.kojima@linaro.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:12:00 -04:00
Christophe Roullier
026e57585f net: stmmac: add dwmac-4.20a compatible
Manage dwmac-4.20a version from synopsys

Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:08:29 -04:00
Christophe Roullier
6528e02cc9 net: ethernet: stmmac: add adaptation for stm32mp157c.
Glue codes to support stm32mp157c device and stay
compatible with stm32 mcu familly

Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:08:29 -04:00
Yangbo Lu
7349a74ea7 net: ethernet: gianfar_ethtool: get phc index through drvdata
Global variable gfar_phc_index was used to get and store
phc index through gianfar_ptp driver. However gianfar_ptp
had been renamed as ptp_qoriq for QorIQ common PTP driver.
This gfar_phc_index doesn't work any more, and the phc index
is stored in drvdata now. This patch is to support getting
phc index through ptp_qoriq drvdata.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:05:11 -04:00
Yangbo Lu
ceefc71d4c ptp: rework gianfar_ptp as QorIQ common PTP driver
gianfar_ptp was the PTP clock driver for 1588 timer
module of Freescale QorIQ eTSEC (Enhanced Three-Speed
Ethernet Controllers) platforms. Actually QorIQ DPAA
(Data Path Acceleration Architecture) platforms is
also using the same 1588 timer module in hardware.

This patch is to rework gianfar_ptp as QorIQ common
PTP driver to support both DPAA and eTSEC. Moved
gianfar_ptp.c to drivers/ptp/, renamed it as
ptp_qoriq.c, and renamed many variables. There were
not any function changes.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:05:11 -04:00
Jon Maxwell
b1d2e4e03f ifb: fix packets checksum
Fixup the checksum for CHECKSUM_COMPLETE when pulling skbs on RX path.
Otherwise we get splats when tc mirred is used to redirect packets to ifb.

Before fix:

nic: hw csum failure

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:02:22 -04:00
Heiner Kallweit
049ff57a2a net: phy: realtek: add suspend/resume callbacks for RTL8211B
Add RTL8211B suspend / resume callbacks.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:01:56 -04:00
Sridhar Samudrala
ba5e4426e8 virtio_net: Extend virtio to use VF datapath when available
This patch enables virtio_net to switch over to a VF datapath when STANDBY
feature is enabled and a VF netdev is present with the same MAC address.
It allows live migration of a VM with a direct attached VF without the need
to setup a bond/team between a VF and virtio net device in the guest.

It uses the API that is exported by the net_failover driver to create and
and destroy a master failover netdev. When STANDBY feature is enabled, an
additional netdev(failover netdev) is created that acts as a master device
and tracks the state of the 2 lower netdevs. The original virtio_net netdev
is marked as 'standby' netdev and a passthru device with the same MAC is
registered as 'primary' netdev.

The hypervisor needs to unplug the VF device from the guest on the source
host and reset the MAC filter of the VF to initiate failover of datapath
to virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:59:54 -04:00
Sridhar Samudrala
9805069d14 virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.

VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:59:54 -04:00
Sridhar Samudrala
cfc80d9a11 net: Introduce net_failover driver
The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev and manages a primary and
standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts a master device and controls 2 slave devices. The
original paravirtual interface gets registered as 'standby' slave netdev and
a passthru/vf device with the same MAC gets registered as 'primary' slave
netdev. Both 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via 'failover' netdev.
The 'failover' netdev chooses 'primary' netdev as default for transmits when
it is available with link up and running.

This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
direct attached VF by failing over to the paravirtual datapath when the VF
is unplugged.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:59:54 -04:00
Sridhar Samudrala
1ff78076d8 netvsc: refactor notifier/event handling code to use the failover framework
Use the registration/notification framework supported by the generic
failover infrastructure.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:59:54 -04:00
Davide Caratti
cb1603948a vrf: add CRC32c offload to device features
SCTP sockets originated in a VRF can improve their performance if CRC32c
computation is delegated to underlying devices: update device features,
setting NETIF_F_SCTP_CRC. Iterating the following command in the topology
proposed with [1],

 # ip vrf exec vrf-h2 netperf -H 192.0.2.1 -t SCTP_STREAM -- -m 10K

the measured throughput in Mbit/s improved from 2395 ± 1% to 2720 ± 1%.

[1] https://www.spinics.net/lists/netdev/msg486007.html

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:55:13 -04:00
Thierry Reding
29555fa3de net: stmmac: Use mutex instead of spinlock
Some drivers, such as DWC EQOS on Tegra, need to perform operations that
can sleep under this lock (clk_set_rate() in tegra_eqos_fix_speed()) for
proper operation. Since there is no need for this lock to be a spinlock,
convert it to a mutex instead.

Fixes: e6ea2d16fc ("net: stmmac: dwc-qos: Add Tegra186 support")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:54:15 -04:00
Sudarsana Reddy Kalluru
1b40428c04 bnx2x: Collect the device debug information during Tx timeout.
Tx-timeout mostly happens due to some issue in the device. In such cases,
debug dump would be helpful for identifying the cause of the issue.
This patch adds support to spill debug data during the Tx timeout. Here
bnx2x_panic_dump() API is used instead of bnx2x_panic(), since we still
want to allow the Tx-timeout recovery a chance to succeed.

Changes from previous version:
-------------------------------
v2: Fixed a coding error.

Please consider applying this to "net-next".

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:52:52 -04:00
Christoph Hellwig
06e9552f5f x86/pci-dma: remove the experimental forcesac boot option
Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying
the huge overhead of an IOMMU is rather pointless, and this seriously
gets in the way of dma mapping work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-28 12:48:16 +02:00
David S. Miller
5b79c2af66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of easy overlapping changes in the confict
resolutions here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26 19:46:15 -04:00
Christoph Hellwig
db5051ead6 net: convert datagram_poll users tp ->poll_mask
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26 09:16:44 +02:00
Christoph Hellwig
984652dd8b net: remove sock_no_poll
Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need
for a stub.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-26 09:16:44 +02:00
Eran Ben Elisha
05909babce net/mlx5e: Avoid reset netdev stats on configuration changes
Move all RQ, SQ and channel counters from the channel objects into the
priv structure.  With this change, counters will not be reset upon
channel configuration changes.

Channel's statistics for SQs which are associated with TCs higher than
zero will be presented in ethtool -S, only for SQs which were opened at
least once since the module was loaded (regardless of their open/close
current status).  This is done in order to decrease the total amount of
statistics presented and calculated for the common out of box use (no
QoS).

mlx5e_channel_stats is a compound of CH,RQ,SQs stats in order to
create locality for the NAPI when handling TX and RX of the same
channel.

Align the new statistics struct per ring to avoid several channels
update to the same cache line at the same time.
Packet rate was tested, no degradation sensed.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
CC: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 16:14:28 -07:00
Bjorn Helgaas
4695ca9d17 ixgbe: Report PCIe link properties with pcie_print_link_status()
Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

The dmesg change is:

  - PCI Express bandwidth of %dGT/s available
  - (Speed:%s, Width: x%d, Encoding Loss:%s)
  + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)

or, if the device is capable of better performance than is available in the
current slot:

  - This is not sufficient for optimal performance of this card.
  - For optimal performance, at least %dGT/s of bandwidth is required.
  - A slot with more lanes and/or higher speed is suggested.
  + %u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)

Note that the driver previously used dev_warn() to suggest using a
different slot, but pcie_print_link_status() uses dev_info() because if the
platform has no faster slot available, the user can't do anything about the
warning and may not want to be bothered with it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-25 17:29:49 -05:00
Bjorn Helgaas
57d12fc6f7 cxgb4: Report PCIe link properties with pcie_print_link_status()
Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

The dmesg change is:

  - PCIe link speed is %s, device supports %s
  - PCIe link width is x%d, device supports x%d
  + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)

or, if the device is capable of better performance than is available in the
current slot:

  - A slot with more lanes and/or higher speed is suggested for optimal performance.
  + %u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-25 17:29:49 -05:00
Bjorn Helgaas
af125b754e bnxt_en: Report PCIe link properties with pcie_print_link_status()
Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

The dmesg change is:

  - PCIe: Speed %s Width x%d
  + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-25 17:29:49 -05:00
Bjorn Helgaas
cc04a1dd3f bnx2x: Report PCIe link properties with pcie_print_link_status()
Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

The dmesg change is:

  - %s (%c%d) PCI-E x%d %s found at mem %lx, IRQ %d, node addr %pM
  + %s (%c%d) PCI-E found at mem %lx, IRQ %d, node addr %pM
  + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-25 17:29:49 -05:00
Shalom Lagziel
868a01a27d net/mlx5e: Introducing new statistics rwlock
Introduce a new read/write lock that will protect statistics gathering from
netdev channels configuration changes.
e.g. when channels are being replaced (increase/decrease number of rings)
prevent statistic gathering (ndo_get_stats64) to read the statistics of
in-active channels (channels that are being closed).

Plus update channels software statistics on the fly when calling
ndo_get_stats64, and remove it from stats periodic work.

Fixes: 9218b44dcc ("net/mlx5e: Statistics handling refactoring")
Signed-off-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:01 -07:00
Saeed Mahameed
6ab75516cf net/mlx5e: Move phy link down events counter out of SW stats
PHY link down events counter belongs to phy_counters group.
although it has special handling, it doesn't mean it can't be there.

Move it to phy_counters_grp handler.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:01 -07:00
Tariq Toukan
3a2f703312 net/mlx5: Use order-0 allocations for all WQ types
Complete the transition of all WQ types to use fragmented
order-0 coherent memory instead of high-order allocations.

CQ-WQ already uses order-0.
Here we do the same for cyclic and linked-list WQs.

This allows the driver to load cleanly on systems with a highly
fragmented coherent memory.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.

No degradation is sensed.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Tariq Toukan
549322f2f9 net/mlx5i: Use compilation flag in IPOIB header
If CONFIG_MLX5_CORE_IPOIB is not set, compile-out the
IPOIB related headers.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Tariq Toukan
043dc78ecf net/mlx5e: TX, Use actual WQE size for SQ edge fill
We fill SQ edge with NOPs to avoid WQEs wrap.
Here, instead of doing that in advance for the maximum possible
WQE size, we do it on-demand using the actual WQE size.
We re-order some parts in mlx5e_sq_xmit to finish the calculation
of WQE size (ds_cnt) before doing any writes to the WQE buffer.

When SQ work queue is fragmented (introduced in an downstream patch),
dealing with WQE wraps becomes more frequent. This change would drastically
reduce the overhead in this case.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.

Before: 14.9 Mpps
After:  15.8 Mpps

Improvement of 6%.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Tariq Toukan
ddf385e31f net/mlx5e: Use WQ API functions instead of direct fields access
Use the WQ API to get the WQ size, and to map a counter
into a WQ entry index.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
e4ad91f23f net/mlx5e: Split offloaded eswitch TC rules for port mirroring
If a TC rule needs to be split for mirroring, create two HW rules,
in the first level and the second level flow tables accordingly.

In the first level flow table, forward the packet to the mirror
port and forward the packet to the second level flow table for
further processing, eg. encap, vlan push or header re-write.

Currently the matching is repeated in both stages.

While here, simplify the setup of the vhca id valid indicator also
in the existing code.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
592d365159 net/mlx5e: Parse mirroring action for offloaded TC eswitch flows
Currently, we only support the mirred redirect TC sub-action. In order
to support flow based vport mirroring, add support to parse the mirred
mirror sub-action.

For mirroring, user-space will typically set the action order such that
the mirror port (mirror VF) sees packets as the original port (VF under
mirroring) sent them or as it will receive them.

In the general case, it means that packets are potentially sent to the
mirror port before or after some actions were applied on them. To
properly do that, we should follow on the exact action order as set for
the flow and make sure this will also be the case when we program the HW
offload.

We introduce a counter for the output ports (attr->out_count), which we
increase when parsing each mirred redirect/mirror sub-action and when
dealing with encap.

We introduce a counter (attr->mirror_count) telling us if split is
needed. If no split is needed and mirroring is just multicasting to
vport, the mirror count is zero, all the actions of the TC flow should
apply on that single HW flow.

If split is needed, the mirror count tells where to do the split, all
non-mirred tc actions should apply only after the split.

The mirror count is set while parsing the following actions encap/decap,
header re-write, vlan push/pop.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
a842dd04cf net/mlx5: E-switch, Create a second level FDB flow table
If firmware supports the forward action with a destination list
that includes a flow table, create a second level FDB flow table.

This is going to be used for flow based mirroring under the switchdev
offloads mode.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
52fff3274b net/mlx5: E-Switch, Reorganize and rename fdb flow tables
We have several fdb flow tables for each of the legacy and switchdev
modes. In the switchdev mode, there are fast path and slow path flow
tables. Towards adding more flow tables in upcoming patches, reorganize
and rename the various existing ones to reflect their functionality.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Florian Fainelli
e52cde7170 net: dsa: dsa_loop: Make dynamic debugging helpful
Remove redundant debug prints from phy_read/write since we can trace those
calls through trace events. Enhance dynamic debug prints to print arguments
which helps figuring how what is going on at the driver level with higher level
configuration interfaces.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:46:29 -04:00
David S. Miller
d7c52fc8bc Merge tag 'mlx5e-updates-2018-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5e-updates-2018-05-19

This series contains updates for mlx5e netdevice driver with one subject,
DSCP to priority mapping, in the first patch Huy adds the needed API in
dcbnl, the second patch adds the needed mlx5 core capability bits for the
feature, and all other patches are mlx5e (netdev) only changes to add
support for the feature.

From: Huy Nguyen

Dscp to priority mapping for Ethernet packet:

These patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.

Firmware interface:
Mellanox firmware provides two control knobs for this feature:
  QPTS register allow changing the trust state between dscp and
  pcp mode. The default is pcp mode. Once in dscp mode, firmware will
  route the packet based on its dscp value if the dscp field exists.

  QPDPM register allow mapping a specific dscp (0 to 63) to a
  specific priority (0 to 7). By default, all the dscps are mapped to
  priority zero.

Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last sent one will take effect. All the previous sent will be
deleted.

This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.

The dcb buffer configuration will be controlled by lldptool.
>> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
      maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
>> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
      sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively

After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.

The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.

When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.

When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:41:23 -04:00
Bo Chen
a45675796a 8139too: Remove unnecessary netif_napi_del()
The call to free_netdev() in __rtl8139_cleanup_dev() clears the network device
napi list, and explicit calls to netif_napi_del() are unnecessary.

Signed-off-by: Bo Chen <chenbo@pdx.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:35:45 -04:00
Thomas Falcon
eb110410b9 ibmvnic: Fix partial success login retries
In its current state, the driver will handle backing device
login in a loop for a certain number of retries while the
device returns a partial success, indicating that the driver
may need to try again using a smaller number of resources.

The variable it checks to continue retrying may change
over the course of operations, resulting in reallocation
of resources but exits without sending the login attempt.
Guard against this by introducing a boolean variable that
will retain the state indicating that the driver needs to
reattempt login with backing device firmware.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:32:48 -04:00
Manish Chopra
608e00d0a2 qed*: Support drop action classification
With this patch, User can configure for the supported
flows to be dropped. Added a stat "gft_filter_drop"
as well to be populated in ethtool for the dropped flows.

For example -

ethtool -N p5p1 flow-type udp4 dst-port 8000 action -1
ethtool -N p5p1 flow-type tcp4 scr-ip 192.168.8.1 action -1

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:10:42 -04:00
Manish Chopra
39385ab02c qede: Support flow classification to the VFs.
With the supported classification modes [4 tuples based,
udp port based, src-ip based], flows can be classified
to the VFs as well. With this patch, flows can be re-directed
to the requested VF provided in "action" field of command.

Please note that driver doesn't really care about the queue bits
in "action" field for the VFs. Since queue will be still chosen
by FW using RSS hash. [I.e., the classification would be done
according to vport-only]

For examples -

ethtool -N p5p1 flow-type udp4 dst-port 8000 action 0x100000000
ethtool -N p5p1 flow-type tcp4 src-ip 192.16.6.10 action 0x200000000
ethtool -U p5p1 flow-type tcp4 src-ip 192.168.40.100 dst-ip \
	192.168.40.200 src-port 6660 dst-port 5550 \
	action 0x100000000

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:10:42 -04:00
Manish Chopra
3893fc62b1 qed*: Support other classification modes.
Currently, driver supports flow classification to PF
receive queues based on TCP/UDP 4 tuples [src_ip, dst_ip,
src_port, dst_port] only.

This patch enables to configure different flow profiles
[For example - only UDP dest port or src_ip based] on the
adapter so that classification can be done according to
just those fields as well. Although, at a time just one
type of flow configuration is supported due to limited
number of flow profiles available on the device.

For example -

ethtool -N enp7s0f0 flow-type udp4 dst-port 45762 action 2
ethtool -N enp7s0f0 flow-type tcp4 src-ip 192.16.4.10 action 1
ethtool -N enp7s0f0 flow-type udp6 dst-port 45762 action 3

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:10:42 -04:00
Manish Chopra
89ffd14ee9 qede: Validate unsupported configurations
Validate and prevent some of the configurations for
unsupported [by firmware] inputs [for example - mac ext,
vlans, masks/prefix, tos/tclass] via ethtool -N/-U.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:10:42 -04:00
Manish Chopra
87885310c1 qede: Refactor ethtool rx classification flow.
This patch simplifies the ethtool rx flow configuration
[via ethtool -U/-N] flow code base by dividing it logically
into various APIs based on given protocols. It also separates
various validations and calculations done along the flow
in their own APIs.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 16:10:42 -04:00
Arjun Vynipadath
e2f4f4e927 cxgb4/cxgb4vf: Notify link changes to OS-dependent code
We have a confusion of two different abstractions in the Common
Code:  Physical Link (Port) and Logical Network Interface (Virtual
Interface), and we haven't been properly managing the state of the
intersection of those two abstractions.
On the one hand we have the Physical state of the Link -- up or down --
and on the other we have the logical state of the VI, enabled or not.
{ethN} refers to both the Physical and Logical State. In this case,
ifconfig only affects/interrogates the Logical State of a VI,
and ethtool only deals with the Physical State. And these are different.

So, just because we disable the VI, we don't really want to change the
Physical Link Up/Down state.  Thus, the previous hack to set
"lc->link_ok = 0" when we disable a VI is completely incorrect.

Where we get into trouble is where the Physical Link State and the
Logical VI State cross swords.  And that happens in
t4_handle_get_port_info() where we need to manage/safe the Physical
Link State, but we also need to know when the Logical VI State has
changed and pass that back up to the OS-dependent Driver routine
t4_os_link_changed() which is concerned about the Logical Interface.

So we enable a VI and that causes Firmware to send us a new Port
Information message, but if none of the Physical Link State
particulars have changed, we don't call t4_os_link_changed().

This fix uses the existing OS Contract APIs for the Common Code to
inform the OS-dependent portion of the Host Driver when the "Link" (really
Logical Network Interface) is "up" or "down". A new API
t4_enable_pi_params() is added which calls t4_enable_vi_params() and,
if that is successful, then calls back to the OS Contract API
t4_os_link_changed() notifying the OS-dependent layer of the
potential Link State change.

Original Work by : Casey Leedom <leedom@chelsio.com>

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 15:10:07 -04:00
Ganesh Goudar
e8d452923a cxgb4: clean up init_one
clean up init_one and use chip_ver consistently throughout
init_one() for chip version.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 14:59:38 -04:00
Ganesh Goudar
57ccaedb74 cxgb4/cxgb4vf: link management changes for new SFP
newer SFPs like SFP28 and QSFP28 Transceiver Modules present
several new possibilities which we haven't faced before. Fix the
assumptions in the code reflecting the more limited capabilities
of previous Transceiver Module systems

Original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 14:56:10 -04:00