The device can support two credit modes, message based (default) and
packet based. In order to enable packet based mode, the QP should be
created with special flag that indicates this.
This patch adds support for the new DV QP creation flag that can be used
for RC QPs in order to change the credit mode.
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Flow table can be passed as a DEVX object which is a valid destination in
an EGRESS flow. Fix the original code to allow that.
Fixes: a7ee18bdee ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This fixes a compilation warning in sysfs.c
drivers/infiniband/hw/mlx4/sysfs.c:360:2: warning: 'strncpy' output may be
truncated copying 8 bytes from a string of length 31
[-Wstringop-truncation]
By eliminating the temporary stack buffer.
Signed-off-by: Qian Cai <cai@gmx.us>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit 4e045572e2 ("IB/hfi1: Add unique txwait_lock for txreq events")
laid the ground work to support per resource waiting locking.
This patch adds that with a lock unique to each sdma engine and pio
sendcontext and makes necessary changes for verbs, PSM, and vnic to use
the new locks.
This is particularly beneficial for smaller messages that will exhaust
resources at a faster rate.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The call to sdma_progress() is called outside the wait lock.
In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle. If that happens, there will be no
more sdma interrupts to cause the wakeup and the vnic_sdma xmit will hang.
Fix by moving the lock to enclose the sdma_progress() call.
Also, delete the tx_retry. The need for this was removed by:
commit bcad29137a ("IB/hfi1: Serve the most starved iowait entry first")
Fixes: 64551ede6c ("IB/hfi1: VNIC SDMA support")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds an interface to allow the driver to initialize the QP priv
struct when the QP is created and after the qpn has been assigned. A
field is added to the QP priv struct to reference the rcd and two new
files are added to contain the function to initialize the rcd field so
that more TID RDMA related code can be added here later.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, When a reserved operation is completed, its entry in the send
queue will not be unreserved, which leads to the miscalculation of
qp->s_avail and thus the triggering of a WARN_ON call trace. This patch
fixes the problem by unreserving the reserved operation when it is
completed.
Fixes: 856cc4c237 ("IB/hfi1: Add the capability for reserved operations")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ingress packet check for 16B/bypass packets should consider the port
LMC. Not doing this will result in packets sent to the LMC LIDs getting
dropped. The check is implemented in HW for 9B packets.
Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
An incorrect sge sizing in the HFI PIO path will cause an OOPs similar to
this:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1]
PGD 0
Oops: 0000 1 SMP
Call Trace:
? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1]
hfi1_verbs_send+0xdf/0x250 [hfi1]
? make_rc_ack+0xa80/0xa80 [hfi1]
hfi1_do_send+0x192/0x430 [hfi1]
hfi1_do_send_from_rvt+0x10/0x20 [hfi1]
rvt_post_send+0x369/0x820 [rdmavt]
ib_uverbs_post_send+0x317/0x570 [ib_uverbs]
ib_uverbs_write+0x26f/0x420 [ib_uverbs]
? security_file_permission+0x21/0xa0
vfs_write+0xbd/0x1e0
? mntput+0x24/0x40
SyS_write+0x7f/0xe0
system_call_fastpath+0x16/0x1b
Fix by adding the missing sizing check to correctly determine the sge
length.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
VNIC assumes that all SDMA engines have been configured for use. This is
not necessarily true (i.e. if the count was constrained by the module
parameter).
Update VNICs usage to use the configured count, rather than the hardware
count.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A CA is supposed to ignore FECN bits in multicast, ACK, and CNP
packets. This patch corrects the behavior of the HFI1 driver in this
regard by ignoring FECNs in those packet types.
While fixing the above behavior, fix the extraction of the FECN and BECN
bits from the packet headers for both 9B and 16B packets.
Furthermore, this patch corrects the driver's response to a FECN in RDMA
READ RESPONSE packets. Instead of sending an "empty" ACK, the driver now
sends a CNP packet. While editing that code path, add the missing trace
for CNP packets.
Fixes: 88733e3b84 ("IB/hfi1: Add 16B UD support")
Fixes: f59fb9e051 ("IB/hfi1: Fix handling of FECN marked multicast packet")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When it is requested to change its physical state back to Offline while in
the process to go up, DC8051 will set the ERROR field in the
DC8051_DBG_ERR_INFO_SET_BY_8051 register. This ERROR field will remain
until the next time when DC8051 transitions from Offline to Polling.
Subsequently, when the host requests DC8051 to change its physical state
to Polling again, it may receive a DC8051 interrupt with the stale ERROR
field still in DC8051_DBG_ERR_INFO_SET_BY_8051. If the host link state has
been changed to Polling, this stale ERROR will force the host to
transition to Offline state, resulting in a vicious cycle of Polling
->Offline->Polling->Offline. On the other hand, if the host link state is
still Offline when the stale ERROR is received, the stale ERROR will be
ignored, and the link will come up correctly. This patch implements the
correct behavior by changing host link state to Polling only after DC8051
changes its physical state to Polling.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch implements the process flow of SRQ asynchronous
event.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch implements the SRQ(Share Receive Queue) verbs
and update the poll cq verbs to deal with SRQ complentions.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch inits hem resource for SRQ table, includes
SRQWQE and SRQWQE index resource.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch configures the flags for enabling the
SRQ(Share Receive Queue) capacity as well as update the
verb of querying device for setting srq specifications.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allows XRC usage from the verbs flow in a DEVX context.
As XRCD is some shared kernel resource between processes it should be
created with UID=0 to point on that.
As a result once XRC QP/SRQ are created they must be used as well with
UID=0 so that firmware will allow the XRCD usage.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Update the supported DEVX commands, it includes adding to the
query/modify command's list and to the encoding handling.
In addition, a valid range for general commands was added to be used for
future commands.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enforce DEVX privilege by firmware, this enables future device
functionality without the need to make driver changes unless a new
privilege type will be introduced.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enables modify and query verbs objects via the DEVX interface.
To support this the above DEVX handlers were changed to get any
object type via the UVERBS_IDR_ANY_OBJECT mechanism.
The type checking and handling is done per object as part of the
driver code.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The enhanced devx support series needs commit:
9d43faac02 ("net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits")
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is no need to perform modify_rmp in two separate function,
while one of them uses stack as a placeholder for data while other
allocates it dynamically. Combine those two functions to one call
instead of two.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
There is no need to perform create_rmp in two separate function, while
one of them uses stack as a placeholder for data while other allocates
it dynamically. Combine those two functions to one instead of two.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Transfer initialization and cleanup from mlx5_priv struct of
mlx5_core_dev to be part of mlx5_ib_dev. This completes removal
of SRQ from mlx5_core.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by
updating function signatures do not require mlx5_core_dev as an input,
because all operations in mlx5_ib are supposed to use mlx5_ib_dev.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reuse existing infrastructure to initialize and release DEVX uid.
The DevX interface is intended for user space access, so it is supposed
to be initialized before ib_register_device(). Also it isn't supported
in switchdev mode and don't need to initialize it in that mode.
Fixes: 76dc5a8406 ("IB/mlx5: Manage device uid for DEVX white list commands")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
SRQ signature is not supported, hence no need for special static
global variable to announce it.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
There is no need to keep SRQ which is RDMA object in mlx5_core.
In this patch, we partially move the execution code, while next patches
will move table initialization/release logic too.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Ensure that both RDMA and netdev parts of SRQ implementation
has same copyright and license information annotated by SPDX
tags.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Since any page fault may be interrupted by a MMU invalidation and implicit
leaf MR may be released during this process. The check for parent value
is unreliable condition for an implicit MR.
Use other condition that we can rely on to determine if MR is implicit.
Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A recent performance enhancement introduced a latency issue in the
HFI message path. The new algorithm removed a forced call send for
PIO messages and added a forced schedule event for messages larger
than the MTU.
For PIO, the schedule path can introduce thrashing that can
significantly impact the throughput for small messages.
If a message size is within the PIO threshold, always take the send
path.
Fixes: 0b79b27748 ("IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Handle FW general event rq delay drop as it was received from FW via mlx5
notifiers API, instead of handling the processed software version of that
event. After this patch we can safely remove all software processed FW
events types and definitions.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the FW version of the port change event as forwarded via new mlx5
notifiers API.
After this patch, processed software version of the port change event
will become deprecated and will be totally removed in downstream
patches.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove the deprecated mlx5_interface->event mlx5_ib callback and use new
mlx5 notifier API to subscribe for mlx5 events.
For native mlx5_ib devices profiles pf_profile/nic_rep_profile register
the notifier callback mlx5_ib_handle_event which treats the notifier
context as mlx5_ib_dev.
For vport repesentors, don't register any notifier, same as before, they
didn't receive any mlx5 events.
For slave port (mlx5_ib_multiport_info) register a different notifier
callback mlx5_ib_event_slave_port, which knows that the event is coming
for mlx5_ib_multiport_info and prepares the event job accordingly.
Before this on the event handler work we had to ask mlx5_core if this is
a slave port mlx5_core_is_mp_slave(work->dev), now it is not needed
anymore.
mlx5_ib_multiport_info notifier registration is done on
mlx5_ib_bind_slave_port and de-registration is done on
mlx5_ib_unbind_slave_port.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current implementation of create QP requires contiguous memory, such a
requirement is problematic once the memory is fragmented or the system is
low in memory, it causes failures in dma_zalloc_coherent().
This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer. This makes the QP creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
We also use the opportunity to fix some cosmetic legacy coding convention
errors which were in the feature scope.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The current implementation of create SRQ requires contiguous memory, such
a requirement is problematic once the memory is fragmented or the system
is low in memory, it causes failures in dma_zalloc_coherent().
This patch takes the advantage of the new mlx5_core API which allocates a
fragmented buffer, and makes the SRQ creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allow a user to attach a DEVX counter via mlx5 raw flow creation. In order
to attach a counter we introduce a new attribute:
MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX
A counter can be attached to multiple flow steering rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
QIB driver was added in 2010 with many BUG_ON(), most of them were cleaned
out after years of development and usages.
It looks like that it is safe now to remove rest of BUG_ONs.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a spelling mistake in a usnic_err error message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pagefaults occurred in non-ODP MR are completely valid events, so
initialize return variable to 0.
Fixes: 4d5422a309 ("IB/mlx5: Skip non-ODP MR when handling a page fault")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The uverbs_attr_bundle already contains this pointer, and most methods
don't actually need it. Get rid of the redundant function argument.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that we can add meta-data to the description of write() methods we
need to pass the uverbs_attr_bundle into all write based handlers so
future patches can use it as a container for any new data transferred out
of the core.
This is the first step to bringing the write() and ioctl() methods to a
common interface signature.
This is a simple search/replace, and we push the attr down into the uobj
and other APIs to keep changes minimal.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Memory windows are implemented with an indirect MKey, when a page fault
event comes for a MW Mkey we need to find the MR at the end of the list of
the indirect MKeys by iterating on all items from the first to the last.
The offset calculated during this process has to be zeroed after the first
iteration or the next iteration will start from a wrong address, resulting
incorrect ODP faulting behavior.
Fixes: db570d7dea ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>