The mlx5 VF driver doesn't set QP tx port affinity because it doesn't know
if the lag is active or not, since the "lag_active" works only for PF
interfaces. In this case for VF interfaces only one lag is used which
brings performance issue.
Add a lag_tx_port_affinity CAP bit; When it is enabled and
"num_lag_ports > 1", then driver always set QP tx affinity, regardless
of lag state.
Link: https://lore.kernel.org/r/20200527055014.355093-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
pin_user_pages_fast() doesn't need that from its caller.
NB: only reachable from ->ioctl(), and only under USER_DS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The most common way to set ECE option will be during modify QP command in
INIT2RTR, RTR2RTS and RTS2RTS stages, so update mlx5 to support it.
The new bit in the comp_mask is needed to mark that kernel supports ECE
and can receive data instead of "reserved" field in the struct
mlx5_ib_modify_qp.
Link: https://lore.kernel.org/r/20200526115440.205922-8-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in following patches
* branch 'mellanox/mlx5-next':
net/mlx5: Add ability to read and write ECE options
net/mlx5: Add support for RDMA TX FT headers modifying
net/mlx5: Move iseg access helper routines close to mlx5_core driver
net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
While creating a uobject every create reaches a point where the uobject is
fully initialized. For ioctls that go on to copy_to_user this means they
need to open code the destruction of a fully created uobject - ie the
RDMA_REMOVE_DESTROY sort of flow.
Open coding this creates bugs, eg the CQ does not properly flush the
events list when it does its error unwind.
Provide a uverbs_finalize_uobj_create() function which indicates that the
uobject is fully initialized and that abort should call to destroy_hw to
destroy the uobj->object and related.
Methods can call this function if they go on to have error cases after
setting uobj->object. Once done those error cases can simply do return,
without an error unwind.
Link: https://lore.kernel.org/r/20200519072711.257271-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Linux 5.7-rc6
Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
resolved by deleting dr_cq_event, matching how netdev resolved it.
Required for dependencies in the following patches.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As described in earlier patches, ipoib netdev will share receive
contexts with existing VNIC netdev through a dummy netdev. The
following changes are made to achieve that:
- Set up netdev receive contexts after user contexts. A function is
added to count the available netdev receive contexts.
- Add functions to set/get receive map table free index.
- Rename NUM_VNIC_MAP_ENTRIES as NUM_NETDEV_MAP_ENTRIES.
- Let the dummy netdev own the receive contexts instead of VNIC.
- Allocate the dummy netdev when the hfi1 device is added and free it
when the device is removed.
- Initialize AIP RSM rules when the IpoIb rxq is initialized and
remove the rules when it is de-initialized.
- Convert VNIC to use the dummy netdev.
Link: https://lore.kernel.org/r/20200511160649.173205.4626.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is implementation of RSM rule for AIP packets.
AIP rule will use rule RSM2 and will match standard
Infiniband packet containg BTH (LNH==BTH) and
having Dest QPN prefixed with value 0x81. Spread between
receive contexts will be done using source QPN bits.
VNIC and AIP will share receive contexts, so their rules
will point to the same RMT entries and their shared
code is moved to separate functions.
If any of the rules is active RMT mapping will be skipped
for latter.
Changed function hfi1_vnic_is_rsm_full to be more general
and moved it from main header to chip.c.
Changed the order of RSM rules because AIP rule as
more specific one is needed to be placed before more
general QOS rule. Rules are occupying two last RSM
registers.
Link: https://lore.kernel.org/r/20200511160612.173205.73002.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Adds capability to create a qpn to be recognized as an accelerated
UD QP for ipoib.
This is accomplished by reserving 0x81 in byte[0] of the qpn as the
prefix for these qp types and reserving qpns between 0x810000 and
0x81ffff.
The hfi1 capability mask already contained a flag for the VNIC netdev.
This has been renamed and extended to include both VNIC and ipoib.
The rvt code to allocate qps now recognizes this flag and sets 0x81
into byte[0] of the qpn.
The code to allocate qpns is modified to reset the qpn numbering when it
is detected that a value is located in byte[0] for a UD QP and it is a
qpn being requested for net dev use. If it is a regular UD QP then it is
allowable to have bits set in byte[0] of the qpn and provide the
previously normal behavior.
The code to free the qpn now checks for the AIP prefix value of 0x81 and
removes it from the qpn before being freed so that the lower 16 bit
number can be reused.
This patch requires minor changes in the IB core and ipoib to facilitate
the creation of accelerated UP QPs.
Link: https://lore.kernel.org/r/20200511160607.173205.11757.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This implements the transmit side of the multiple transmit queue RDMA
netdev used to accelerate ipoib. The receive side remains the ipoib
internal implementation.
The init/unint/open/stop netdev operations are saved off and called by the
versions within the hfi1 netdev in order to initialize the connected mode
resources present in ipoib thus allowing us to switch modes between
datagram and connected.
The datagram queue pair instantiated by the ipoib ulp is used by this
implementation for its queue pair number and to register with multicast.
The above queue pair is not used on transmit other than its qpn as the
verbs layer is skipped and packets are directly submitted to the sdma
engines.
Link: https://lore.kernel.org/r/20200511160554.173205.1369.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch implements the mechanism to accelerate the transmit side of
a multiple transmit queue RDMA netdev by submitting the packets to
the SDMA engine directly instead of sending through the verbs layer.
This patch also changes the UD/SEND_ONLY op to output the entropy value
in byte 0 of deth[1]. UD/SEND_ONLY_WITH_IMMEDIATE uses the previous
behavior with no entropy value being output.
The code in the ipoib rdma netdev which submits tx requests upon
successful submission will call trace_sdma_output_ibhdr to output
the ibhdr to the trace buffer.
Link: https://lore.kernel.org/r/20200511160548.173205.45616.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The host info feature allows the driver to infrom the EFA device
firmware with system configuration for debugging and troubleshooting
purposes.
The host info buffer is passed as an admin command DMA mapped control
buffer, and is unmapped and freed once the command CQE is consumed.
Currently, the setting of host info is done for each device on its
probe. Failing to set the host info for the device shall not disturb the
probe flow, any errors will be discarded.
Link: https://lore.kernel.org/r/20200512152204.93091-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>