Bootup parameters are different for 9116 device. Check added for device
model where-ever bootup parameters are being send.
Signed-off-by: Siva Rebbagondla <siva8118@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Initial frame exchange sequence has been changed for 9116 chip. Getting MAC
address using EEPROM read frame will be once common device configuration is
done and RESET_MAC frame is sending after bootup parameters confirmation is
received, which are different from RS9113 device
Signed-off-by: Siva Rebbagondla <siva8118@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
New firmware files and firmware loading method are added for 9116.
Signed-off-by: Siva Rebbagondla <siva8118@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Till software bootloader ready state, communication with device is common
for 9113 and 9116. Hence moved that part of firmware loading to separate
function rsi_prepare_fw_load(). Also LMAC_VER_OFFSET is different for 9113
and 9116, so renamed existing macro to LMAC_VER_OFFSET_9113
Signed-off-by: Siva Rebbagondla <siva8118@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
9116 device id entry is added in both SDIO and USB interfaces.
New enumberation value taken for the device model. Based on the
device model detected run time, few device specific operations
needs to be performed.
adding rsi_dev_model to get device type in run time, as we can use
same driver for 9113 and 9116 except few firmware load changes.
Signed-off-by: Siva Rebbagondla <siva8118@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Remove not used any longer queue_entry field and flag.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We do not any longer check txstatus timeout from tasklet, so do not need
this optimization.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Sometimes we can get into situation when there are pending statuses,
but we do not get INT_SOURCE_CSR_TX_FIFO_STATUS. Handle this situation
by arming timeout timer and read statuses (it will fix case when
we just do not have irq) and queue work to handle case we missed
statues from hardware FIFO.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Prepare to use rt2800mmio_fetch_txstatus() in concurrent manner and drop
return value since is not longer needed.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Move rt2800usb_txstatus_pending routine to rt2800lib. It will be reused
by rt2800mmio code.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Use new flush_queue() callback for SoC devices, what was already done for
PCIe devices.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Do not disable txstatus interrupt and add quota of processed tx statuses in
one tasklet. Quota is needed to allow to fed device with new frames during
processing of tx statuses.
Patch fixes about 15% performance degradation on some scenarios caused by
0b0d556e0e ("rt2800mmio: use txdone/txstatus routines from lib").
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
kmalloc can fail in rsi_register_rates_channels but memcpy still attempts
to write to channels. The patch replaces these calls with kmemdup and
passes the error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
For unknown reasons printk() on some context can cause CPU hung on
embedded MT7620 AP/router MIPS platforms. What can result on wifi
disconnects.
This patch move queue full messages to debug level what is consistent
with other mac80211 drivers which drop packet silently if tx queue is
full. This make MT7620 OpenWRT routers more stable, what was reported
by various users.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Some USB host devices/drivers on some conditions can always return
EPROTO error on submitted URBs. That can cause infinity loop in the
rt2x00 driver.
Since we can have single EPROTO errors we can not mark as device as
removed to avoid infinity loop. However we can count consecutive
EPROTO errors and mark device as removed if get lot of it.
I choose number 10 as threshold.
Reported-and-tested-by: Randy Oostdyk <linux-kernel@oostdyk.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
As reported by Randy we can overwhelm logs on some USB error conditions.
To avoid that use dev_warn_ratelimited() and dev_err_ratelimitd().
Reported-and-tested-by: Randy Oostdyk <linux-kernel@oostdyk.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
In case create_singlethread_workqueue fails, the fix free the
hardware and returns NULL to avoid NULL pointer dereference.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
init_ray_cs does not check value of pcmcia_register_driver,
if it fails, there maybe cause a NULL pointer dereference in
exit_ray_cs.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.
So, replace code of the following form:
sizeof(*pmkids) + max_pmkids * sizeof(pmkids->bssid_info[0])
with:
struct_size(pmkids, bssid_info, num_pmkids)
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Third batch of patches intended for v5.2
* Bump the 20000-series FW API version supported;
* Work on the new debugging infra continues;
* One clean-up to prevent a bogus warning with clang;
* A small cleanup in the PCI ID list;
* Work on new hardware continues;
* RTT confidence indication support for FTM;
* An improvement in HE rate-scaling;
Ariel Levkovich says:
====================
The series exposes the ICM address of the receive transport
interface (TIR) of Raw Packet and RSS QPs to the user since they are
required to properly create and insert steering rules that direct flows to
these QPs.
====================
For dependencies this branch is based on mlx5-next from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* branch 'mlx5_tir_icm':
IB/mlx5: Expose TIR ICM address to user space
net/mlx5: Introduce new TIR creation core API
net/mlx5: Expose TIR ICM address in command outbox
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The commit fc3a2fcaa1 ("mwifiex: use atomic bitops to represent
adapter status variables") had a fairly straightforward bug in it. It
contained this bit of diff:
- if (!adapter->is_suspended) {
+ if (test_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags)) {
As you can see the patch missed the "!" when converting to the atomic
bitops. This meant that the resume hasn't done anything at all since
that commit landed and suspend/resume for mwifiex SDIO cards has been
totally broken.
After fixing this mwifiex suspend/resume appears to work again, at
least with the simple testing I've done.
Fixes: fc3a2fcaa1 ("mwifiex: use atomic bitops to represent adapter status variables")
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algorithm ever sleeps, making this flag a no-op.
With this being the case, inevitably some users who can't sleep wrongly
pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
actually started sleeping. For example, the shash_ahash_*() functions,
which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
from the ahash API to the shash API. However, the shash functions are
called under kmap_atomic(), so actually they're assumed to never sleep.
Even if it turns out that some users do need preemption points while
hashing large buffers, we could easily provide a helper function
crypto_shash_update_large() which divides the data into smaller chunks
and calls crypto_shash_update() and cond_resched() for each chunk. It's
not necessary to have a flag in 'struct shash_desc', nor is it necessary
to make individual shash algorithms aware of this at all.
Therefore, remove shash_desc::flags, and document that the
crypto_shash_*() functions can be called from any context.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introducing new TIR creation core API which allows caller
to receive back from the call the full command outbox.
This comes as a preparation for the next patch that will
retrieve the TIR ICM address from the command outbox.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Jason Gunthorpe says:
====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
* BAR pages intended for read-only can be switched to writable via mprotect.
* Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
* Disassociate causes SIGBUS when touching the pages.
* CPU pages are being mapped through to the process via remap_pfn_range
instead of the more appropriate vm_insert_page, causing weird behaviors
during disassociation.
This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================
For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
* branch 'rdma_mmap':
RDMA: Remove rdma_user_mmap_page
RDMA/mlx5: Use get_zeroed_page() for clock_info
RDMA/ucontext: Fix regression with disassociate
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
RDMA/mlx5: Do not allow the user to write to the clock page
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It allows some of the code to be simplified.
Tested on Turris Omnia.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vrf device is not able to change mac address now because lack of
ndo_set_mac_address. Complete this in case some apps need to do
this.
Reported-by: Hui Wang <wanghui104@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
regmap_update_bits could fail and deserves a check.
The patch adds the checks and if it fails, returns its error
code upstream.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
mlxsw currently does not support v6 gateways with v4 routes. Commit
19a9d136f1 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
prevents a route from being added, but nothing stops the replace or
append. Add a catch for them too.
$ ip ro add 172.16.2.0/24 via 10.99.1.2
$ ip ro replace 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
$ ip ro append 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netdev variant is usable on any context since it disables interrupts.
The napi variant of the call should only be used within softirq context.
Replace napi_alloc_frag on driver init with the correct netdev_alloc_frag
call
Changes since v1:
- Adjusted commit message
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
Fixes: 4acb20b462 ("net: socionext: different approach on DMA")
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the R-Car Gen3 Hardware Manual Rev 1.50 of Nov 30, 2018, the
TX clock internal delay mode isn't supported on R-Car E3 (r8a77990) or D3
(r8a77995). And by extension it is also not supported by RZ/G2E (r9a774c0).
This matches all ES versions of the affected SoCs as it is
not clear if this problem will be resolved in newer chips.
This can be revisited, as necessary.
This patch does not error-out if PHY_INTERFACE_MODE_RGMII_ID or
PHY_INTERFACE_MODE_RGMII_TXID are used on SoCs where TX clock delay
mode is not supported as there is a risk of introducing a regression
when used in conjunction with older DT blobs present in the field.
Rather, a warning is logged in such cases.
Based on work by Kazuya Mizuguchi.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5-updates-2019-04-22
This series includes updates to mlx5e driver RX data path and some
significant XDP RX/TX improvements to overcome/mitigate HW and PCIE
bottlenecks.
From Tariq:
1) Some Enhancements in rq->flags
2) Stabilize RX packet rate (on Striding RQ) with
multiple outstanding UMR posts
In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.
Performance test:
As expected, huge improvement in large-scale (48 cores).
xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.
Before: Unstable, 7 to 30 Mpps
After: Stable, at 70.5 Mpps
From Shay:
3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).
When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.
Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
* Tested with hyper-threading disabled
XDP_TX:
| | before | after | |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring | 12Mpps | 12Mpps | same |
XDP_REDIRECT:
** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.
| | before | after | |
| 32 rings | 64Mpps | 92Mpps | +43% |
| 1 ring | 6.4Mpps | 6.4Mpps | same |
As we can see, feature significantly improves scaling, without
hurting single ring performance.
From Maxim:
4) Some trivial refactoring and code improvements prior to a larger series
to support AF_XDP.
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to
clarify the meaning of a magic number.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and
enter the NAPI poll on the right CPU according to the affinity. Use it
in mlx5e_activate_rq.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
the requested number of frames in the RQ (F) and the number of packets
per WQE (P). It ensures that N is not less than the minimum number of
WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
should be true. This function deals with logarithms, so it should check
that log(F) - log(P) >= log(N_min). However, if F < P, this expression
will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
instead.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This commit moves the parameter calculation functions to a separate file
for better modularity and code sharing with future features.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If the channels fail to reopen after setting an XDP program, return the
error code instead of 0. A proper fix is still needed, as now any error
while reopening the channels brings the interface down. This patch only
adds error reporting.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).
When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.
This is added to better utilize the HW resources (which now makes
one less packet data prefetch) and allow better scalability, on the
account of CPU usage (which now 'memcpy's the packet into the WQE).
To load balance between HW and CPU and get max packet rate, we use
watermarks to detect how much the HW is congested and move the work
loads back and forth between HW and CPU.
Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
* Tested with hyper-threading disabled
XDP_TX:
| | before | after | |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring | 12Mpps | 12Mpps | same |
XDP_REDIRECT:
** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.
| | before | after | |
| 32 rings | 64Mpps | 92Mpps | +43% |
| 1 ring | 6.4Mpps | 6.4Mpps | same |
As we can see, feature significantly improves scaling, without
hurting single ring performance.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This counter tracks how many TX MPWQE sessions are started in XDP SQ
in XDP TX/REDIRECT flow. It counts per-channel and global stats.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The XDP redirect flush indication belongs to the receive queue,
not to its XDP send queue.
For this, use a new bit on rq->flags.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Values in enum mlx5e_rq_flag are used as bit indixes.
Intention was to use them with no BIT(i) wrapping.
No functional bug fix here, as the same (shifted)flag bit
is used for all set, test, and clear operations.
Fixes: 121e892754 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The buffers mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per an RX MPWQE.
A single MPWQE is capable of serving many incoming packets,
usually larger than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.
When an XDP program is loaded, every MPWQE is capable of serving less
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as much for the default MTU), giving less latency
room for the UMR completions.
In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.
For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.
This is expected to improve packet rate in high CPU scale.
Performance test:
As expected, huge improvement in large-scale (48 cores).
xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.
Before: Unstable, 7 to 30 Mpps
After: Stable, at 70.5 Mpps
No degradation in other tested scenarios.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>