This prevents the stack from accessing userspace objects while they
are being torn down.
One possible sequence of events:
- Userspace program exits
- ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
ib_destroy_cq(), etc. and releasing/freeing the UCQ
- The QP still has tasklets running, so it isn't destroyed yet
- The CQ is referenced by the QP, so the CQ isn't destroyed yet
- The UCQ is kfree()'d anyway
- A send work request completes
- rxe_send_complete() calls cq->ibcq.comp_handler()
- ib_uverbs_comp_handler() runs and crashes; the event queue is checked
for is_closed, but it has no way to check the ib_ucq_object before
accessing it
The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and
adding reference counting to the UCQ did not look worthwhile since this
solution is much simpler.
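One way to realize such a guard is sketched below as a small userspace model. It is not taken from the patch: the names model_cq, is_dying, model_cq_disable and model_cq_post are hypothetical, and in the kernel the flag would be read and written under the CQ's lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CQ with a completion upcall. */
struct model_cq {
	bool is_dying;                       /* set once teardown has started */
	void (*comp_handler)(void *context); /* upcall into the consumer */
	void *context;                       /* consumer object, freed at teardown */
};

/* Called at the start of teardown, before the consumer object is freed. */
static void model_cq_disable(struct model_cq *cq)
{
	cq->is_dying = true;
}

/* Completion path: returns true if the upcall was made, false if skipped. */
static bool model_cq_post(struct model_cq *cq)
{
	if (cq->is_dying || cq->comp_handler == NULL)
		return false;
	cq->comp_handler(cq->context);
	return true;
}

/* Example handler that counts how many times it has been invoked. */
static void model_count_upcall(void *context)
{
	int *calls = context;

	(*calls)++;
}
```

Once model_cq_disable() has run, a late completion no longer reaches the handler, so the consumer object can be freed even though the CQ itself is still alive.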
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The logic for retrieving the netdev speed from net_device and translating
it to an IB speed is implemented separately in the rxe, usnic and bnxt
drivers. Define a new function that merges them all.
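A shared helper of this kind could look roughly like the sketch below. The enum names, the function name and the exact thresholds are assumptions made for illustration, not the code added by this patch. Each pair is picked so that lane rate times width is at least the Ethernet speed; speeds above 100000 Mb/s fall through to the last entry.

```c
/* Hypothetical stand-ins for the IB speed and width enumerations. */
enum model_ib_speed { MODEL_SDR, MODEL_DDR, MODEL_FDR10, MODEL_EDR };
enum model_ib_width { MODEL_WIDTH_1X = 1, MODEL_WIDTH_4X = 4 };

struct model_eth_to_ib {
	enum model_ib_speed speed;
	enum model_ib_width width;
};

/* Map an Ethernet link speed in Mb/s to an illustrative (speed, width) pair. */
static struct model_eth_to_ib model_eth_speed_to_ib(unsigned int mbps)
{
	struct model_eth_to_ib r;

	if (mbps <= 1000) {
		r.speed = MODEL_SDR;   r.width = MODEL_WIDTH_1X; /* 2.5 Gb/s x1 */
	} else if (mbps <= 10000) {
		r.speed = MODEL_FDR10; r.width = MODEL_WIDTH_1X; /* 10 Gb/s x1 */
	} else if (mbps <= 20000) {
		r.speed = MODEL_DDR;   r.width = MODEL_WIDTH_4X; /* 5 Gb/s x4 */
	} else if (mbps <= 25000) {
		r.speed = MODEL_EDR;   r.width = MODEL_WIDTH_1X; /* 25 Gb/s x1 */
	} else if (mbps <= 40000) {
		r.speed = MODEL_FDR10; r.width = MODEL_WIDTH_4X; /* 10 Gb/s x4 */
	} else {
		r.speed = MODEL_EDR;   r.width = MODEL_WIDTH_4X; /* 25 Gb/s x4 */
	}
	return r;
}
```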
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
It's better to use __func__ to print a function's name instead of writing
the name in the print statement.
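The pattern can be sketched in userspace C as follows. The macro and function names are hypothetical, and snprintf() stands in for the kernel's print helpers so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical logging macro: the function name comes from __func__. */
#define model_log(buf, len, fmt, ...) \
	snprintf((buf), (len), "%s: " fmt, __func__, __VA_ARGS__)

/* The message keeps the right prefix even if this function is renamed. */
static int model_open_port(char *buf, size_t len, int port)
{
	return model_log(buf, len, "invalid port %d", port);
}
```

Because the name is not spelled out in the format string, a later rename cannot leave a stale function name in the log.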
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use the DEVICE_ATTR_RO() macro and rename the show function accordingly.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
  init_send_wqe
    copy_from_user --> may sleep
There is no flow that makes "qp->is_user" true, and copy_from_user may
cause a bug when a non-user pointer is used. So the copy_from_user call
and the check of "qp->is_user" are removed.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdma_ah_attr can now be either ib or roce, allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when it's first
created. This ensures that calls such as modify_ah
don't modify the type of the address handle attribute.
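The idea of a typed attribute whose type is fixed at creation can be sketched as below. This is a simplified model under assumed names (model_ah_attr, model_create_ah, model_modify_ah and the union members are hypothetical), not the structures defined by this patch.

```c
/* Hypothetical address handle attribute types. */
enum model_ah_type { MODEL_AH_TYPE_IB, MODEL_AH_TYPE_ROCE };

struct model_ah_attr {
	enum model_ah_type type;
	union {
		struct { unsigned short dlid; } ib;     /* IB-only field */
		struct { unsigned char dmac[6]; } roce; /* RoCE-only field */
	} u;
};

struct model_ah {
	enum model_ah_type type;   /* recorded once, at creation */
	struct model_ah_attr attr;
};

/* Creation records the attribute type in the handle. */
static void model_create_ah(struct model_ah *ah,
			    const struct model_ah_attr *attr)
{
	ah->type = attr->type;
	ah->attr = *attr;
}

/* Returns 0 on success, -1 if the new attribute would change the type. */
static int model_modify_ah(struct model_ah *ah,
			   const struct model_ah_attr *attr)
{
	if (attr->type != ah->type)
		return -1;
	ah->attr = *attr;
	return 0;
}
```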
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This logic seems to be duplicated in (at least) three separate files.
Move it to one place so the code can be reused.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Expose new counters using the get_hw_stats callback.
We expose the following counters:
+---------------------+----------------------------------------+
| Name                | Description                            |
|---------------------+----------------------------------------|
|sent_pkts            | number of sent packets                 |
|---------------------+----------------------------------------|
|rcvd_pkts            | number of received packets             |
|---------------------+----------------------------------------|
|out_of_sequence      | number of errors due to packet         |
|                     | transport sequence number              |
|---------------------+----------------------------------------|
|duplicate_request    | number of received duplicated packets. |
|                     | A request that previously executed is  |
|                     | named duplicated.                      |
|---------------------+----------------------------------------|
|rcvd_rnr_err         | number of received RNR by completer    |
|---------------------+----------------------------------------|
|send_rnr_err         | number of sent RNR by responder        |
|---------------------+----------------------------------------|
|rcvd_seq_err         | number of out of sequence packets      |
|                     | received                               |
|---------------------+----------------------------------------|
|ack_deffered         | number of deferred handling of ack     |
|                     | packets.                               |
|---------------------+----------------------------------------|
|retry_exceeded_err   | number of times retry exceeded         |
|---------------------+----------------------------------------|
|completer_retry_err  | number of times completer decided to   |
|                     | retry                                  |
|---------------------+----------------------------------------|
|send_err             | number of failed send packets          |
+---------------------+----------------------------------------+
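A counter set like the one in the table is commonly kept as an enum index plus a parallel name table. The sketch below models that layout in userspace. The counter names come from the table, while the enum, struct and function names are hypothetical and do not reproduce the driver's get_hw_stats implementation.

```c
#include <stdint.h>

/* One index per counter listed in the table. */
enum model_counter {
	MODEL_CNT_SENT_PKTS,
	MODEL_CNT_RCVD_PKTS,
	MODEL_CNT_OUT_OF_SEQUENCE,
	MODEL_CNT_DUPLICATE_REQUEST,
	MODEL_CNT_RCVD_RNR_ERR,
	MODEL_CNT_SEND_RNR_ERR,
	MODEL_CNT_RCVD_SEQ_ERR,
	MODEL_CNT_ACK_DEFFERED,
	MODEL_CNT_RETRY_EXCEEDED_ERR,
	MODEL_CNT_COMPLETER_RETRY_ERR,
	MODEL_CNT_SEND_ERR,
	MODEL_NUM_COUNTERS
};

/* Names exposed to the reader of the statistics, indexed by counter. */
static const char *const model_counter_names[MODEL_NUM_COUNTERS] = {
	[MODEL_CNT_SENT_PKTS]           = "sent_pkts",
	[MODEL_CNT_RCVD_PKTS]           = "rcvd_pkts",
	[MODEL_CNT_OUT_OF_SEQUENCE]     = "out_of_sequence",
	[MODEL_CNT_DUPLICATE_REQUEST]   = "duplicate_request",
	[MODEL_CNT_RCVD_RNR_ERR]        = "rcvd_rnr_err",
	[MODEL_CNT_SEND_RNR_ERR]        = "send_rnr_err",
	[MODEL_CNT_RCVD_SEQ_ERR]        = "rcvd_seq_err",
	[MODEL_CNT_ACK_DEFFERED]        = "ack_deffered",
	[MODEL_CNT_RETRY_EXCEEDED_ERR]  = "retry_exceeded_err",
	[MODEL_CNT_COMPLETER_RETRY_ERR] = "completer_retry_err",
	[MODEL_CNT_SEND_ERR]            = "send_err",
};

struct model_dev {
	uint64_t counters[MODEL_NUM_COUNTERS];
};

/* Bump one counter from the data path. */
static void model_counter_inc(struct model_dev *dev, enum model_counter c)
{
	dev->counters[c]++;
}

/* Copy all counters to 'out'; returns the count, or -1 if 'out' is too small. */
static int model_get_stats(const struct model_dev *dev, uint64_t *out, int max)
{
	int i;

	if (max < MODEL_NUM_COUNTERS)
		return -1;
	for (i = 0; i < MODEL_NUM_COUNTERS; i++)
		out[i] = dev->counters[i];
	return MODEL_NUM_COUNTERS;
}
```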
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was similar enough
to the core DMA mapping code that with a few changes it was possible
to remove the IB DMA mapping code entirely and switch the RDMA stack
to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
Pull Mellanox rdma updates from Doug Ledford:
"Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree, I
kept it separate from the remainder of the RDMA stack submission that
is based on 4.10-rc3.
This branch contains:
- Various mlx4 and mlx5 fixes and minor changes
- Support for adding a tag match rule to flow specs
- Support for cvlan offload operation for raw ethernet QPs
- A change to the core IB code to recognize raw eth capabilities and
enumerate them (touches non-Mellanox code)
- Implicit On-Demand Paging memory registration support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
IB/mlx5: Fix configuration of port capabilities
IB/mlx4: Take source GID by index from HW GID table
IB/mlx5: Fix blue flame buffer size calculation
IB/mlx4: Remove unused variable from function declaration
IB: Query ports via the core instead of direct into the driver
IB: Add protocol for USNIC
IB/mlx4: Support raw packet protocol
IB/mlx5: Support raw packet protocol
IB/core: Add raw packet protocol
IB/mlx5: Add implicit MR support
IB/mlx5: Expose MR cache for mlx5_ib
IB/mlx5: Add null_mkey access
IB/umem: Indicate that process is being terminated
IB/umem: Update on demand page (ODP) support
IB/core: Add implicit MR flag
IB/mlx5: Support creation of a WQ with scatter FCS offload
IB/mlx5: Enable QP creation with cvlan offload
IB/mlx5: Enable WQ creation and modification with cvlan offload
IB/mlx5: Expose vlan offloads capabilities
IB/uverbs: Enable QP creation with cvlan offload
...
Change the drivers to call ib_query_port in their get port
immutable handler instead of their own query port handler.
Doing this required setting the core cap flags of the device
before the ib_query_port call is made, since the IB core might
need these caps to serve the port query.
The IB core guarantees to drivers that the port attributes passed
to the port query verb implementation are zeroed, and hence we
removed the zeroing from the drivers.
This patch doesn't add any new functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add struct ib_udata to the signature of create_ah callback that is
implemented by IB device drivers. This allows HW drivers to return extra
data to the userspace library.
This patch prepares the ground for mlx5 driver to resolve destination
mac address for a given GID and return it to userspace.
This patch was previously submitted by Knut Omang as a part of the
patch set to support Oracle's InfiniBand HCA (SIF).
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The udata->inlen error path needs to clean up the ref
added by rxe_alloc().
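The shape of such a fix can be modeled in userspace as follows. All names here are hypothetical, and the length check is only a stand-in for whatever validation fails after the object has been allocated.

```c
#include <stddef.h>
#include <stdlib.h>

struct model_obj {
	int refcount;
};

/* Number of objects currently allocated; lets a caller detect leaks. */
static int model_live_objects;

/* Allocate an object holding one reference, like a pool allocator would. */
static struct model_obj *model_alloc(void)
{
	struct model_obj *obj = calloc(1, sizeof(*obj));

	if (obj) {
		obj->refcount = 1;
		model_live_objects++;
	}
	return obj;
}

/* Drop one reference and free the object when the last one goes away. */
static void model_drop_ref(struct model_obj *obj)
{
	if (--obj->refcount == 0) {
		free(obj);
		model_live_objects--;
	}
}

/* Returns the new object, or NULL on failure with no reference leaked. */
static struct model_obj *model_create(size_t inlen, size_t expected)
{
	struct model_obj *obj = model_alloc();

	if (!obj)
		return NULL;
	if (inlen != expected) {
		/* Error path: give back the reference taken by model_alloc(). */
		model_drop_ref(obj);
		return NULL;
	}
	return obj;
}
```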
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Peek at the CQ after arming it so that we can return a hint.
This avoids missed completions due to a race between posting
CQEs and arming the CQ.
For example, CM teardown waits on MAD requests to complete with
ib_cq_poll_work(). Without this fix, the last completion might be
left on the CQ, hanging the kthread doing the teardown.
The console backtraces look like this:
[ 4199.911284] Call Trace:
[ 4199.911401] [<ffffffff9657fe95>] schedule+0x35/0x80
[ 4199.911556] [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0
[ 4199.911727] [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20
[ 4199.911891] [<ffffffff96580903>] wait_for_completion+0xb3/0x130
[ 4199.912067] [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70
[ 4199.912243] [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm]
[ 4199.912422] [<ffffffff961615d5>] ? printk+0x57/0x73
[ 4199.912578] [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm]
[ 4199.912759] [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm]
[ 4199.912941] [<ffffffffc076f2cc>] 0xffffffffc076f2cc
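The arm-then-peek behavior described above can be sketched with a small userspace model. The struct and function names are hypothetical and the queue is reduced to a counter. The point is only that arming reports whether completions are already waiting, so the consumer knows to poll again instead of sleeping.

```c
#include <stdbool.h>

/* Hypothetical CQ model: a count of queued completions and an armed flag. */
struct model_poll_cq {
	unsigned int pending; /* completions queued but not yet polled */
	bool armed;           /* next completion will raise an event */
};

/*
 * Arm the CQ, then peek at it. Returns 1 if completions are already
 * queued (the caller should poll again), 0 otherwise.
 */
static int model_req_notify(struct model_poll_cq *cq)
{
	cq->armed = true;
	return cq->pending > 0 ? 1 : 0;
}

/* Consumer loop: poll everything, arm, and repeat while the hint is set. */
static unsigned int model_drain(struct model_poll_cq *cq)
{
	unsigned int polled = 0;

	do {
		polled += cq->pending;
		cq->pending = 0;
	} while (model_req_notify(cq));
	return polled;
}
```

Without the hint, a completion posted between the last poll and the arm call would sit on the queue with nobody woken to collect it.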
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
1. Debugging qp state transitions and qp errors in loopback and
multiple QP tests is difficult without qp numbers in debug logs.
This patch adds qp number to important debug logs.
2. Instead of having the rxe: prefix in some logs and not in others,
use a uniform module name prefix via the pr_fmt macro.
3. Code cleanup for various warnings reported by checkpatch for
incomplete unsigned data type, line over 80 characters, return
statements.
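The prefix mechanism from item 2, combined with the QP number from item 1, can be modeled in userspace like this. MODEL_MODNAME, model_pr_debug and model_log_qp_state are hypothetical names, the message text is made up, and snprintf() replaces the kernel print helpers.

```c
#include <stdio.h>

#define MODEL_MODNAME "rxe"

/* Defined before the logging macros so that every message picks it up. */
#define pr_fmt(fmt) MODEL_MODNAME ": " fmt

/* Hypothetical print helper that routes its format through pr_fmt(). */
#define model_pr_debug(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

/* Every message carries the module prefix and the QP number. */
static int model_log_qp_state(char *buf, size_t len, unsigned int qpn,
			      const char *state)
{
	return model_pr_debug(buf, len, "qp#%u moved to %s state", qpn, state);
}
```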
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch avoids scheduling a tasklet for WQE and protocol processing
for user space QPs. It performs the task in the calling process context.
To improve code readability, kernel-specific post_send handling is moved
to the post_send_kernel() function.
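The difference between deferring the work and running it in the caller's context can be sketched as below. This is a userspace model with hypothetical names: the 'scheduled' flag merely stands in for a pending tasklet, and the real driver decides when to defer in its own way.

```c
#include <stdbool.h>

/* Hypothetical task: a work function plus a "deferred" marker. */
struct model_task {
	int (*func)(void *arg);
	void *arg;
	bool scheduled; /* stands in for a pending tasklet */
};

/* sched == true defers the work; sched == false runs it immediately. */
static void model_run_task(struct model_task *task, bool sched)
{
	if (sched)
		task->scheduled = true;
	else
		task->func(task->arg);
}

/* User space QPs are processed inline; kernel QPs are deferred here. */
static void model_post_send(struct model_task *task, bool is_user_qp)
{
	model_run_task(task, !is_user_qp);
}

/* Example work function that counts how often it ran. */
static int model_count_work(void *arg)
{
	int *runs = arg;

	(*runs)++;
	return 0;
}
```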
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Soft RoCE (RXE) - The software RoCE driver
ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other side, ib_rxe registers with the Linux netdev stack
as a UDP encapsulating protocol, in this case RDMA, for sending and
receiving packets over any Ethernet device. This yields an RDMA
transport over the UDP/Ethernet network layer, forming a RoCEv2
compatible device.
The configuration procedure of the Soft RoCE driver requires
binding to an existing Ethernet network device. This is done through
the /sys interface.
A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices. The use of rxe verbs in
user space requires the inclusion of librxe as a device-specific
plug-in to libibverbs. librxe is packaged separately.
Architecture:
     +-----------------------------------------------------------+
     |                        Application                        |
     +-----------------------------------------------------------+
                             +-----------------------------------+
                             |            libibverbs             |
User                         +-----------------------------------+
                             +----------------+ +----------------+
                             | librxe         | | HW RoCE lib    |
                             +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                               +------------+
     | Sockets      |                               | RDMA ULP   |
     +--------------+                               +------------+
     +--------------+                      +---------------------+
     | TCP/IP       |                      | ib_core             |
     +--------------+                      +---------------------+
                                 +------------+ +----------------+
Kernel                           | ib_rxe     | | HW RoCE driver |
                                 +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+
Soft RoCE resources:
[1] https://github.com/SoftRoCE/librxe-dev librxe - source code in
GitHub
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>