Expose the RoCE discard and drop counters from the HW statistics context
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
crsqe->resp would be NULL in case the host command timed out before
getting a response from HW. Check for NULL pointer to avoid a potential
crash while printing the error message.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In some FW versions, RoCE driver also receives an async notification which
was directed to L2 driver. RoCE driver does not handle this and print a
message to syslog. Drop these notifications silently.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In the failure path, nq->bar_reg_iomem gets accessed without
initializing. Avoid this by calling the bnxt_qplib_nq_stop_irq only if the
initialization is complete.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Fixes: 6e04b10356 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is reported by smatch check. rcfw->creq_bar_reg_iomem is accessed in
bnxt_qplib_rcfw_stop_irq and this variable check afterwards doesn't make
sense. Also, rcfw->creq_bar_reg_iomem will never be NULL. So Removing
this check.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6e04b10356 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Version macro is not required as the driver is not maintaining the
version. Removing the references of this macro too.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add the missing initalization of the cq_lock and qplib.flush_lock.
Fixes: 942c9b6ca8 ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
This is required to resolve dependencies of the next series of RDMA
patches.
The code motion conflicts in drivers/infiniband/core/cache.c were
resolved.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Saeed Mahameed says:
====================
mlx5e-updates-2018-10-10
IPoIB netlink support and mlx5e pre-allocated netdevice initialization
IP link was broken due to the changes in IPoIB for the rdma_netdev
support after commit cd565b4b51
("IB/IPoIB: Support acceleration options callbacks").
This patchset fixes IPoIB pkey creation and removal using rtnetlink by
adding support in both IPoIB ULP layer and mlx5 layer:
From Jason and Denis:
1) Introduces changes in the RDMA netdev code in order to
allow allocation of the netdev to be done by the rtnl netdev code.
2) Reworks IPoIB initialization to use the two step rdma_netdev
creation.
From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting
pre-allocated netdevs:
3) Adds support to initialize/cleanup netdevs that are not created
by mlx5 driver.
4) Change mlx5e netdevice layer to accept the pre-allocated netdevice
queue number.
5) Initialize mlx5e generic structures in one place to be used for all
netdevs types NIC/representors/IPoIB (both mlx5 allocated and
pre-allocted).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts were easy to resolve using immediate context mostly,
except the cls_u32.c one where I simply too the entire HEAD
chunk.
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug writes:
"RDMA fixes:
Final for-rc pull request for 4.19
We only have one bug to submit this time around. It fixes a DMA
unmap issue where we unmapped the DMA address from the IOMMU before
we did from the card, resulting in a DMAR error with IOMMU enabled,
or possible crash without."
* tag 'for-gkh' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Unmap DMA addr from HCA before IOMMU
IPoIB netlink support was broken by the below commit since integrating
the rdma_netdev support relies on an allocation flow for netdevs that
was controlled by the ipoib driver while netdev's rtnl_newlink
implementation assumes that the netdev will be allocated by netlink.
Such situation leads to crash in __ipoib_device_add, once trying to
reuse netlink device.
This patch fixes the kernel oops for both mlx4 and mlx5
devices triggered by the following command:
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
netdev has several interfaces that expect to call alloc_netdev_mqs from
the core code, with the driver only providing the arguments. This is
incompatible with the rdma_netdev interface that returns the netdev
directly.
Thus re-organize the API used by ipoib so that the verbs core code calls
alloc_netdev_mqs for the driver. This is done by allowing the drivers to
provide the allocation parameters via a 'get_params' callback and then
initializing an allocated netdev as a second step.
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Only some of these were still used by the cxgb4 driver, and that despite
the fact that the driver otherwise uses the generic DMA API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tracking CM_ID resource is performed in two stages: creation of cm_id
and connecting it to the cma_dev. It is needed because rdma-cm protocol
exports two separate user-visible calls rdma_create_id and rdma_accept.
At the time of CM_ID creation, the real owner of that object is unknown
yet and we need to grab task_struct. This task_struct is released or
reassigned in attach phase later on. but call to rdma_destroy_id left
this task_struct unreleased.
Such separation is unique to CM_ID and other restrack objects initialize
in one shot. It means that it is safe to use "res->valid" check to catch
unfinished CM_ID flow and release task_struct for that object.
Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Reported-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves hfi1_copy_sge() into rdmavt for sharing with qib.
This patch also moves all the wss_*() functions into rdmavt as
several wss_*() functions are called from hfi1_copy_sge()
When SGE copy mode is adaptive, cacheless copy may be done in some cases
for performance reasons. In those cases, X86 cacheless copy function
is called since the drivers that use rdmavt and may set SGE copy mode
to adaptive are X86 only. For this reason, this patch adds
"depends on X86_64" to rdmavt/Kconfig.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The atomic operation not supported inline. Besides, the standard atomic
operation only support a sge and the sge is placed in the wqe.
Fix: 384f881("RDMA/hns: Add atomic support")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In order to extend vlan device range, the design add two field of qp
context for checking vlan packet in sender and in recevicer.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds local invalidate Memory Region (MR) support in the kernel
space driver.
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The hip08 hardware has two version. the version id are 0x20 and 0x21
according to the pci revision. It needs to adjust some fields for
extending new features. The specific updates include:
1. Add some fields for supporting new features by enabling some reserved
fields in 0x20 version.
2. remove some fields which the user is not visiable in order to support
the extend features.
3. Init some fields with zero.
These updates is compatible with 0x20 version.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to hip08 limit, the buffer size of extend sge needs to be an
integer wqe_sge_buf_page size. For example, the value of sge_shift field
of qp context is greater or equal to eight when buffer page size is 4K
size. The value of sge_shift field of qp context assigned by
hr_qp->sge.sge_cnt.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to the IB protocol definition, the driver needs to show the
correct device information and the information will be queryed by device
attribute.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In order to compatible with the third party RoCE device, The hardware
modify the set method for the ecn field of ip header in new hip08
version. The high 6bit of tclass be assigned for dscp field of packet.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The hip08 split two hardware version. The version id are 0x20 and 0x21
according to the PCI revison. The max size of extend sge of sq is limited
to 2M for 0x20 version and 8M for 0x21 version. It may be exceeded to 2M
according to the algorithm that compute the product of wqe count and
extend sge number of every wqe. But the product always less than 8M.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It will print the warning when the MSB bit of SLID is not zero running
cm_req_handler function that test CM. It needs to fixed zero when test
RoCE device.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When user issues a RDMA read and enables sq inline, it needs to report a
bad wr to user.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It needs to include two special qps for every port. The hip08 have four
ports and the all reserved qp numbers are eight.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
All users of rdma_nl_chk_listeners() are interested to get boolean answer
if netlink socket has listeners, so update all places to boolean function.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In rxe_queue_init, q and q->buf are allocated. In do_mmap_info, q->ip is
allocated. When error occurs, rxe_srq_from_init and the later error
handler do not free these allocated memories. This will make memory leak.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix to return a negative error code from the mthca_cmd_init() error
handling case instead of 0, as done elsewhere in this function.
Fixes: 80fd823873 ("[PATCH] IB/mthca: Encapsulate command interface init")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The uapi radix tree is a write-once data structure protected by kref.
Once we get to the ioctl() fop it is not possible for anything else
to be writing to it, so the access should use rcu_dereference_protected.
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The zap_vma_ptes() is declared but not defined on NOMMU kernels, causing a
link error for the newly added uverbs code:
drivers/infiniband/core/uverbs_main.o: In function `uverbs_user_mmap_disassociate':
uverbs_main.c:(.text+0x114c): undefined reference to `zap_vma_ptes'
drivers/infiniband/core/uverbs_main.o: In function `rdma_umap_open':
uverbs_main.c:(.text+0x53c): undefined reference to `zap_vma_ptes'
Since all user access for all of our drivers depend on remapping pages to
user space disable USER_ACCESS when there is no mmu.
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
After bfcb79fca1 ("PCI/ERR: Run error recovery callbacks for all affected
devices"), AER errors are always cleared by the PCI core and drivers don't
need to do it themselves.
Remove calls to pci_cleanup_aer_uncorrect_error_status() from device
driver error recovery functions.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove PCI core changes, remove unused variables]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When RDMA CM connect request arrives for IB transport, it already contains
device, port, netdevice (optional).
Instead of traversing all the cma devices, use the cma device already
found by the cma_find_listener() for which a listener id is provided.
iWarp devices doesn't need to derive RoCE GIDs, therefore drop RoCE
specific checks from cma_acquire_dev() and rename it to
cma_iw_acquire_dev().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Light weight version of cma_acquire_dev() just for binding with rdma
device based on source IP(v4/v6) address.
This simplifies cma_acquire_dev() to avoid listen_id specific checks and
also for subsequent simplification for IB vs iWarp.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When IP failover is used between multiple ports of a given rdma device,
allow accepting CM requests from either of the ports. This is applicable
for IPv4 and IPv6 non link local addressing scheme.
IPv6 link local addresses are bound. IP failover requests for listen
cm_ids bound to specific netdev interfaces cannot be supported.
(Similar to traditional sockets).
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Subnet Management Packets (SMP) should exclusively use VL15 and their SL
is ignored (IBTA v1.3, Section 3.5.8.2). Therefore, when an SMP is posted,
the SL in the address handle can be set to 0 by a user
application. Consequently, when an address handle is created by the IB
core, some fields in struct rvt_ah may not be set correctly by using the
SL2SC and SC2VL tables at the time. Subsequently, when the request is post
sent, the incoming swqe may fail the validation check, resulting in the
rejection of the send request.
This patch fixes the problem by using VL15 for any validation, ignoring
the SL in the address handle.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since Virtual Lanes BCT credits and MTU are set through separate MADs, we
have to ensure both are valid, and data VLs are ready for transmission
before we allow port transition to Armed state.
Fixes: 5e2d6764a7 ("IB/hfi1: Verify port data VLs credits on transition to Armed")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The dlid assignment made by looking into the u_ucast_dlid array does not
do an explicit check for the size of the array. The code path to arrive at
def_port, the index value is long and complicated so its best to just have
an explicit check here.
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>