The rmnet_devices information is already stored in muxed_ep, so
storing this in rmnet_devices[] again is redundant.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The endpoint is set twice: in the local_ep as well as via the mux_id and
the real_dev in the rmnet private structure. Remove the local_ep.
While these elements are equivalent, rmnet_endpoint will be
used only as part of the rmnet_port for muxed scenarios in VND mode.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mode information on the real device makes it easier to route packets
to the rmnet device or the bridged device based on the configuration.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of these constants were used in the initial patchset where
custom netlink configuration was used and hence are no longer relevant.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be rewritten in the following patches.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the error messages that are printed by the interrupt handlers
are poorly written. For example, many don't include a device prefix,
so there's no indication that they are EMAC errors.
Also use rate limiting for all messages that could be printed from
interrupt context.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EMAC has a restriction that the upper 32 bits of the base addresses
for the RFD and RRD rings must be the same. To ensure that restriction is met,
we allocate twice the space for the RRD and locate it at an appropriate
address.
We also re-arrange the allocations so that invalid addresses are even
less likely.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EMAC is capable of multiple TX and RX rings, but the driver only
supports one ring for each. One function had some left-over unused
code that supports multiple rings, but all it did was make the code
harder to read.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 64/32-bit DMA mask hackery in the EMAC driver is not actually necessary,
and is technically not accurate. The EMAC hardware is limited to a 45-bit
DMA address. Although no EMAC-enabled system can have that much DDR,
an IOMMU could possibly provide a larger address. Rather than play games
with the DMA mappings, the driver should provide a correct value and
trust the DMA/IOMMU layers to do the right thing.
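As a rough sketch of that approach (illustrative only, not the literal
patch), the driver would simply advertise the hardware's true capability
and let the core handle the rest:

    /* Advertise the EMAC's real 45-bit DMA limit; the DMA/IOMMU layers
     * then take care of any address translation or bouncing required.
     */
    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(45));
    if (ret)
        return ret;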
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warnings:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:464:5: warning:
symbol 'hns3_change_all_ring_bd_num' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:477:5: warning:
symbol 'hns3_set_ringparam' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2017-10-10
This series contains updates to e1000e and igb.
Benjamin Poirier provides several fixes for e1000e, starting with a
correction to a return status that always indicated success, even on
failure. Fixed code comments to reflect the actual
code behavior. Fixed the conditional test for the correct return
value. Fixed a potential race condition reported by Lennart Sorensen,
where the single flag get_link_status is used to signal two different
states.
Sasha fixes a buffer overrun for i219 devices, where the chipset had
reduced the round-trip latency for the LAN controller DMA accesses
which in some high performance cases caused a buffer overrun while
processing the DMA transactions.
Willem de Bruijn changes e1000e to use the burst mode settings by
default unless the user specifies the receive interrupt delay
(RxIntDelay).
Florian Fainelli updates the driver to differentiate between
e1000e_put_txbuf() being called from normal reclamation and from a DMA
mapping failure, to make the driver more "drop monitor friendly".
Christophe JAILLET fixes a potential NULL pointer dereference by
properly returning -ENOMEM on memory allocation failures.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the netif_get_num_default_rss_queues() limitation from the logic
that determines the default number of RX rings.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the ring count for ETHTOOL_GRXRINGS. The ring count,
not the TC size, should be returned for the command "ethtool -n ethx".
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for ethtool's ETHTOOL_GRXFH in hns3_get_rxnfc().
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for identifying new flash parts, and also clean up the
flash part identification and decoding code.
Based on the original work of Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1000e_put_txbuf() can be called from the normal reclamation path as well
as when a DMA mapping failure occurs, so we need to differentiate these
two cases when freeing SKBs in order to be drop monitor friendly.
e1000e_tx_hwtstamp_work() and e1000_remove() process TX timestamped SKBs,
and those should not be accounted as drops either.
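A minimal sketch of the distinction (the "dropped" flag is a hypothetical
placeholder, not the driver's actual parameter):

    /* Normal reclamation: the packet was transmitted, so "consume" the
     * skb; only genuine drops (e.g. a failed DMA mapping) should go
     * through the freeing variant that drop monitor reports.
     */
    if (dropped)
        dev_kfree_skb_any(buffer_info->skb);
    else
        dev_consume_skb_any(buffer_info->skb);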
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.
The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.
Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.
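A hedged sketch of that ordering (the sentinel and default names below
are placeholders, not the driver's identifiers):

    /* Only devices with FLAG2_DMA_BURST get the burst RDTR/RADV values,
     * and only when the user left the module parameters unset.
     */
    if (adapter->flags2 & FLAG2_DMA_BURST) {
        if (rx_int_delay == PARAM_UNSET)
            adapter->rx_int_delay = BURST_RDTR_DEFAULT;
        if (rx_abs_int_delay == PARAM_UNSET)
            adapter->rx_abs_int_delay = BURST_RADV_DEFAULT;
    }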
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into an unrecoverable Tx hang under very stressful UDP traffic
and multiple reconnections of the Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.
Please, refer to I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/
ethernet-connection-i218-family-documentation.html
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When e1000e_poll() is not fast enough to keep up with incoming traffic, the
adapter (when operating in msix mode) raises the Other interrupt to signal
Receiver Overrun.
This is a double problem because 1) at the moment e1000_msix_other()
assumes that it is only called in case of Link Status Change and 2) if the
condition persists, the interrupt is repeatedly raised again in quick
succession.
Ideally we would configure the Other interrupt to not be raised in case of
receiver overrun but this doesn't seem possible on this adapter. Instead,
we handle the first part of the problem by reverting to the practice of
reading ICR in the other interrupt handler, like before commit 16ecba59bc
("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
0a8047ac68 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
anymore. We handle the second part of the problem by not re-enabling the
Other interrupt right away when there is overrun. Instead, we wait until
traffic subsides, napi polling mode is exited and interrupts are
re-enabled.
Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Fixes: 16ecba59bc ("e1000e: Do not read ICR in Other interrupt")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lennart reported the following race condition:
\ e1000_watchdog_task
\ e1000e_has_link
\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
/* link is up */
mac->get_link_status = false;
/* interrupt */
\ e1000_msix_other
hw->mac.get_link_status = true;
link_active = !hw->mac.get_link_status
/* link_active is false, wrongly */
This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.
Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().
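A simplified, hedged sketch of the resulting check (not the driver's
literal code):

    s32 ret_val;
    bool link_active;

    if (hw->mac.get_link_status) {
        /* .check_for_link reports the link state through its return
         * value (> 0 means link detected) instead of get_link_status,
         * which the Other interrupt handler may have just flipped back.
         */
        ret_val = hw->mac.ops.check_for_link(hw);
        link_active = ret_val > 0;
    } else {
        link_active = true;
    }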
Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reading e1000e_check_for_copper_link() shows that get_link_status is set to
false after link has been detected. Therefore, it stays TRUE until then.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-10-09
This series contains updates to i40e and i40evf only.
Jake fixes a missed flag conversion from u64 to u32. He also fixes a
default ITR value issue where the driver defaults to an ITR value of half
the expected value (in terms of minimum microseconds between interrupts);
this is fixed by changing the default values to be calculated using the
ITR_REG_TO_USEC() macro, which indicates that we are converting from the
register units into microseconds. Updates the drivers to bump the tail in
increments of 8 and double the number of descriptors we will bundle into
one tail bump when receiving. With the recent kernel support for
enabling XPS and QoS at the same time, we no longer need to worry about
the number of traffic classes when enabling XPS.
Lihong converts the use of hash_for_each() to hash_for_each_safe() to
safely remove a hash entry. Adds a check of the return value of
find_first_bit() for the case where it returns the size passed to search.
Alan fixes a bug in which filters are erroneously removed if they are
removed and then added again: when adding a filter, if we find it already
exists in our list, ensure it is not marked to be removed.
Jayaprakash adds retrying of PHY reads, for a maximum period of 500ms,
when the I2C is busy.
Rami fixes a code comment typo.
Stefano Brivio simplifies the code by removing the use of a local
return code variable and simply returning the result of the read function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2017-10-09
This series contains updates to ixgbe only.
Emil fixes an issue where the semaphore bits could be stuck after a reset
or a crash, by adding the clearing of software resource bits in the
software/firmware synchronization register. Added error checks when we
attempt to identify and initialize the PHY, to prevent a crash. Fixed a
few issues in the logic of ixgbe_clean_test_rings() that were exposed by
a previous commit and were causing a crash in ethtool diagnostics.
Bhumika Goyal fixes a couple of instances which were overlooked when we
made ixgbe_mac_operations constant.
Shannon Nelson fixes an issue to restore normal operation after the
last MACVLAN offload is removed; otherwise we get stuck in single-queue
operation.
The infamous Jesper Dangaard Brouer adds a counter which counts the
number of times the recycle fails and the real page allocator is invoked.
Alex updates the adaptive ITR algorithm to better support the needs of the
network. This attempts to make it so that our ITR algorithm will try to
prevent either starving a socket buffer for memory in the case of
transmit, or overrunning a receive socket buffer on receive. We should
function better with new features like XDP which can handle small packets
at high rates without needing to lock us into NAPI polling mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a typo in i40e_vsi_alloc_arrays() documentation.
The first parameter name should be "vsi" instead of "type".
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The computed result of I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES
is used more than three times in function i40e_config_irq_link_list.
Simply declare a local variable to store it to improve readability.
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
- When the I2C is busy, the PHY reads are delayed. The firmware will
return EAGAIN in these cases with an expectation that the SW will
trigger the reads again
- This patch retries the operation for a maximum period of 500ms (see
the sketch below)
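A minimal sketch of such a bounded retry, assuming a hypothetical read
helper:

    unsigned long deadline = jiffies + msecs_to_jiffies(500);
    int err;

    do {
        err = read_phy_abilities(hw);    /* hypothetical helper */
        if (err != -EAGAIN)
            break;                       /* success or a real error */
        usleep_range(1000, 2000);        /* let the I2C bus settle */
    } while (time_before(jiffies, deadline));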
Signed-off-by: Jayaprakash Shanmugam <jayaprakash.shanmugam@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The find_first_bit function will return the size passed to search
if the first set bit is not found. This patch adds the check in case
that happens as the return value would be used as the index in an array
and that would otherwise cause an out-of-bounds access.
Detected by CoverityScan, CID 1295969 Out-of-bounds access
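The guard looks roughly like this (bitmap and array names are
illustrative):

    unsigned long bit = find_first_bit(bitmap, size);

    /* find_first_bit() returns 'size' when no bit is set; bail out
     * rather than using it as an array index.
     */
    if (bit >= size)
        return;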
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recently, the kernel gained support for enabling XPS and QoS at the
same time. Thus, we no longer need to worry about the number of
traffic classes when enabling XPS.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Double the number of descriptors we'll bundle into one tail bump when
receiving. Empirical testing has shown that we reduce CPU utilization
and don't appear to reduce throughput or packet rate. 32 seems to be the
sweet spot, as it's half the default polling budget, so we'd essentially
reduce from 4 tail writes when polling down to 2. Increasing this up to
64 appears to have negative impacts as it may become possible that we
don't bump the tail each time we get polled, which could cause a long
delay between returning descriptors to the hardware.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Hardware only fetches descriptors on cachelines of 8, essentially
ignoring the lower 3 bits of the tail register. Thus, it is pointless to
bump tail by an unaligned access as the hardware will ignore some of the
new descriptors we allocated. Thus, it's ideal if we can ensure tail
writes are always aligned to 8.
At first, it seems like we'd already do this, since we allocate
descriptors in batches which are a multiple of 8. Since we'd always
increment by a multiple of 8, it seems like the value should always be
aligned.
However, this ignores allocation failures. If we fail to allocate
a buffer, our tail register will become unaligned. Once it has become
unaligned it will essentially be stuck unaligned until a buffer
allocation happens to fail at the exact amount necessary to re-align it.
We can do better, by simply rounding down the number of buffers we're
about to allocate (cleaned_count) such that "next_to_clean
+ cleaned_count" is rounded to the nearest multiple of 8.
We do this by calculating how far off that value is and subtracting it
from the cleaned_count. This essentially defers allocation of buffers if
they're going to be ignored by hardware anyways, and re-aligns our
next_to_use and tail values after a failure to allocate a descriptor.
This calculation ensures that we always align the tail writes in a way
the hardware expects and don't unnecessarily allocate buffers which
won't be fetched immediately.
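Expressed as code, the rounding described above is simply (variable names
follow the description, not necessarily the driver):

    /* Defer however many buffers it takes for next_to_clean +
     * cleaned_count to land on a multiple of 8; hardware would have
     * ignored them anyway.
     */
    cleaned_count -= (rx_ring->next_to_clean + cleaned_count) & 0x7;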
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The lrxq thresh value tells hardware to immediately interrupt when there
are fewer than N*64 packets left in the ring.
Counterintuitively, empirical testing has shown that decreasing this
value from 2 to 1, and thus changing from an immediate interrupt at
fewer than 128 descriptors down to 64 descriptors causes a small
increase in the maximum total packets per second we can receive. This
increase occurs even when we're polling with interrupts masked, as the
hardware must still handle interrupts internally even if we've disabled
them in software.
Also reduce the value for any VFs we allocate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the past we changed driver behavior to not clear the PBA when
re-enabling interrupts. This change was motivated by the flawed belief
that clearing the PBA would cause a lost interrupt if a receive
interrupt occurred while interrupts were disabled.
According to empirical testing this isn't the case. Additionally, the
data sheet specifically says that we should set the CLEARPBA bit when
re-enabling interrupts in a polling setup.
This reverts commit 40d72a5098 ("i40e/i40evf: don't lose interrupts")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The ITR register expects to be programmed in units of 2 microseconds.
Because of this, all of the driver's I40E_ITR_* constants are in terms of
this 2 microsecond register.
Unfortunately, the rx_itr_default value is expected to be programmed in
microseconds.
Effectively the driver defaults to an ITR value of half the expected
value (in terms of minimum microseconds between interrupts).
Fix this by changing the default values to be calculated using the
ITR_REG_TO_USEC macro, which indicates that we're converting from the
register units into microseconds.
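Since the register counts in 2-microsecond units, the conversion
presumably amounts to a doubling, e.g.:

    /* Register units are 2 usec each (assumed macro shape). */
    #define ITR_REG_TO_USEC(itr_reg)    ((itr_reg) * 2)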
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Due to the asynchronous nature in which mac filters are added and
deleted, there exists a bug in which filters are erroneously removed if
removed then added again quickly.
The events are as such:
- filter marked for removal
- same filter is re-added before watchdog that cleans up filters
- we skip re-adding the filter because we have it already in the
list
- watchdog filter cleanup kicks off and filter is removed
So when we were re-adding the same filter, it didn't actually get added
because it already existed in the list, but was marked for removal and
had yet to actually be removed.
This patch fixes the issue by ensuring that, when adding a filter, if
we find it already exists in our list, it is no longer marked to be
removed.
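A hedged sketch of the add-path check (the state names are assumptions
based on the description above):

    f = i40e_find_filter(vsi, macaddr, vlan);
    if (f) {
        /* Re-adding a filter that is queued for removal: cancel the
         * pending removal instead of leaving it marked.
         */
        if (f->state == I40E_FILTER_REMOVE)
            f->state = I40E_FILTER_ACTIVE;
        return f;
    }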
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch replaces hash_for_each function with hash_for_each_safe
when calling __i40e_del_filter. The hash_for_each_safe function is
the right one to use when iterating over a hash table to safely remove
a hash entry. Otherwise, incorrect values may be read from freed memory.
Detected by CoverityScan, CID 1402048 Read from pointer after free
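For reference, the safe variant keeps a temporary cursor so the current
entry may be unlinked and freed mid-walk (names are illustrative):

    struct i40e_mac_filter *f;
    struct hlist_node *tmp;
    int bkt;

    /* 'tmp' preserves the next node, so __i40e_del_filter() is free to
     * hash_del() and kfree() the entry it is handed.
     */
    hash_for_each_safe(vsi->mac_filter_hash, bkt, tmp, f, hlist)
        __i40e_del_filter(vsi, f);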
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since we don't yet have more than 32 flags, we'll use a u32 for both the
hw_features and flag field. Should we gain more flags in the future, we
may need to convert to a u64 or separate flags out into two fields.
This was overlooked in the previous commit 2781de2134c4 ("i40e/i40evf:
organize and re-number feature flags"), where the feature flag was not
converted from u64 to u32.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In TX data-path, we intentionally do not byte-swap, as documented
in code and in the cited commit log.
This fixes sparse warning:
en_tx.c:720:23: warning: incorrect type in argument 1 (different base types)
en_tx.c:720:23: expected unsigned int [unsigned] [usertype] <noident>
en_tx.c:720:23: got restricted __be32 [usertype] doorbell_qpn
Fixes: 492f5add4b ("net/mlx4_en: Doorbell is byteswapped in Little Endian archs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following SPARSE warning, in MLX4_GET() macro:
drivers/net/ethernet/mellanox/mlx4/fw.c:233:9: warning: cast to restricted __be64
Fixes: 17d5ceb6e4 ("net/mlx4_core: Fix unaligned accesses")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>