The AHG index is only accessed in the request call
from user space, so there's no need for atomic semantics.
Replace atomic operations for SDMA_REQ_HAVE_AHG bit
with a test of the AHG index.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The only context that frees user_exp_rcv data structures is the last
context closed (from a sub-context set). This leaks the allocations
from the other sub-contexts. Separate the common frees from the
specific frees and call them at the appropriate time.
Using KEDR to check for memory leaks we get:
Before test:
[leak_check] Possible leaks: 25
After test:
[leak_check] Possible leaks: 31 (6 leaked data structures)
After patch applied (before and after test have the same value)
[leak_check] Possible leaks: 25
Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context.
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Performance analysis shows benefits for PSM2 in increasing eager buffer
size from 2MB to 8MB. The change has neutral impact on verbs.
Make change to the module parameter's default value. Allocation
ring down was verified to work with the larger buffer size.
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Div instructions show costly in profiles when
the tx request header is set. Using right shift
instead of a divide operation reduces the cycles
spent in the function that sets the tx request
header as shown in the profile. Use right shift
operation instead.
Profile before change:
43.24% 009
|
|--23.41%-- user_sdma_send_pkts
| |
| |--99.90%-- hfi1_user_sdma_process_requestAfter:
Profile after change:
45.75% 009
|
|--14.81%-- user_sdma_send_pkts
| |
| |--99.95%-- hfi1_user_sdma_process_request
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When there are many RC QPs and an RDMA READ request
is sent, timeouts occur on the requester side because
of fairness among RC QPs on their relative SDMA engine
on the responder side. This also hits write and send, but
to a lesser extent.
Complicating the issue is that the current code checks if workqueue
is congested before scheduling other QPs, however, this
check is based on the number of active entries in the
workqueue, which was found to be too big to for
workqueue_congested() to be effective.
Fix by reducing the number of active entries as revealed by
experimentation from the default of num_sdma to
HFI1_MAX_ACTIVE_WORKQUEUE_ENTRIES. Retry counts were monitored
to determine the correct value.
Tracing to investigate any future issues is also added.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This field is causing excessive cache line bouncing.
There are spare bytes in the r_lock cache line so the best approach
is to make an rvt QP field and remove from the hfi1 priv field.
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable mlx5 IPoIB acceleration by declaring
mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5
IPoIB rdma_netdev callbacks.
In addition, this patch brings in sync mlx5's IPoIB parts for net and IB
trees. As a precaution, we disabled IPoIB acceleration by default (in
the mlx5_core Kconfig file).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add port_xmit_wait to the error counters read by mlx5_ib_process_mad to
ensure sysfs port counter provides correct value for PortXmitWait.
Otherwise the sysfs port_xmit_wait file always contains zero.
The previous MAD_IFC implementation populated this counter, but it was
removed during the migration to PPCNT for error counters (32-bit only).
Signed-off-by: Tim Wright <tim@binbash.co.uk>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In write to debugfs file 'resource_stats' the local buffer 'tmp_str' is
written at index 'count-1' where 'count' is the size of the write, so
potentially 0.
This patch filters odd values for the write size/position to avoid this
type of problem.
Signed-off-by: Michael Mera <dev@michaelmera.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The last two actual parameters when calling id_map_find_by_sl_id()
from id_map_get() are swapped. However, the same formal parameters to
id_map_get() have them swapped as well, inverting the effect of the
first error.
This commit improves readability, but makes no functional change to
the code.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The include file does not need any PCI specifics, so remove
that include. Also fix the places that relied on it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. We will also have ib_sa capable of querying opa class
port info and would need this definition. Move it to ib_mad.h
for everyone to use.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The process_ecn intends to return a bool value. However it is doing
so incorrectly by ANDing the fecn mask. The fecn bit is bit 31. Bool is
not a native data type and is up to the compiler to implement how it
sees fit. It is conceivable that this upper bit gets washed out.
Fix by converting to a bool properly.
Cc: stable@vger.kernel.org
Fixes: Commit fd2b562edca6 ("IB/hfi1: Pull FECN/BECN processing to a common place")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add missing braces around else blocks in a few places to make checkpatch
happy.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
According to checkpatch %Lx is not standard C so remove it and use the
suggested %llx.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Checkpatch flagged a misspelled word. Fix it.
Fixes: 8764522e52 ("staging/rdma/hfi1: Unexpected link up pkey values are not an error")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ingress and egress port P_Key checking should always be performed for
HFIs. This patch will enable ingress and egress P_Key checking when
the port is initialized and will ignore the P_Key information sent by
the FM in the port info structure which is meant to be used only by the
switch.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Neel Desai <neel.desai@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Secure data is transferred across the link during verify
cap. This includes Neighbor Guid, Type, and Port Number.
This transfer is not guaranteed to complete until the 8051
firmware has completed processing of the state_complete
frame. Move the consumption of this data from verify cap
handling to link up handling to ensure the data is finalized.
Additionally, do not notify the SM that the link is up until
after this data is actually available.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Stuart Summers <john.s.summers@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver progress routines can call cond_resched() when
a timeslice is exhausted and irqs are enabled.
If the ULP had been holding a spin lock without disabling irqs and
the post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock
on that same lock.
Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread. If the routine is not
running in a thread, avoid calling cond_resched().
CC: <stable@vger.kernel.org> # 4.7.x-
Fixes: Commit 831464ce4b ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
VL15 in the SC2VL table is used to indicate an invalid SC
for the FM, however, internally the driver remaps SCs from
VL15 to ILLEGAL_VL to prevent error counts. This mapping
confuses the FM when performing a sweep, making it return
a table mismatch error. Have SMA convert ILLEGAL_VL
to VL15 entries for the SC2VL table queries.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).
The current code only uses the MGID for identifying multicast groups.
Update the driver to be compliant with this definition.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The FM uses the values of MulticastMask and CollectiveMask to
determine the number of bits for net masks. The current values of
0 and 0 are incorrect. The values should be 4 and 1. Updated the
necessary code to reflect the specified values.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>