num_user_pages() function has been defined in both user_exp_rcv.c file
and user_sdma.c file. Move the function definition to a header file so
there is only one definition in the source repo.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In pin_vector_pages() function, if there is any error while pinning the
pages or while adding a pinned buffer to the cache, the bail out code
needs to unpin any pinned pages that are not in the cache and adjust the
n_locked counter that counts the total pages pinned. The current bail
out code doesn't seem to be doing it right in two cases:
1. Before pinning required pages for a buffer, the SDMA pinned buffer
cache is searched to see if the virtual address range that needs to be
pinned is already pinned. If there isn't a hit in the cache, a new node
is created for the buffer and is added to the cache after the buffer is
pinned. If adding the new node to the cache fails, the n_locked count is
decremented properly but the pinned pages are not freed. This commit
fixes this issue.
2. If there is a hit in the SDMA cache, but the cached buffer doesn't
have enough pages to cover the entire address range that needs to be
pinned, the node for the cached buffer is extracted from the cache,
remaining pages needed are pinned and added to the node. The node is
finally added back into the cache. If there is an error pinning the
extra pages, the bail out code frees all the pages in the node but the
n_locked count is not being decremented by the no of pages in the node
that are freed. This commit fixes this issue.
This commit fixes the above two issues by creating a new function that
frees the pages in a node and decrements the n_locked count by the
number of pages freed.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Clean up hfi1_user_exp_rcv_setup function by moving page pinning and
unpinning related code to separate functions. In order to reduce the
number of parameters passed between functions, a new data structure
struct tid_user_buf is defined and used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Performance analysis shows that the cache callback function
sdma_kmem_cache_ctor contributes to 1/2 of the kmem_cache_allocs
time.
Since all of the fields in the allocated data structure are initialized
in the code path, remove the _ctor function.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive
credits should never apply to RDMA write.
qib and hfi1 were doing that. The following situation will result
in a QP hang:
- A prior SEND or RDMA_WRITE with immmediate consumed the last
credit for a QP using RC receive buffer credits
- The prior op is acked so there are no more acks
- The peer ULP fails to post receive for some reason
- An RDMA write sees that the credits are exhausted and waits
- The peer ULP posts receive buffers
- The ULP posts a send or RDMA write that will be hung
The fix is to avoid the credit test for the RDMA write operation.
Cc: <stable@vger.kernel.org>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove HFI1_VERBS_31BIT_PSN Kconfig option leaving only 31-bit PSNs
available. The option was implemented in the early days of the driver
and is no longer needed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Do not track physical state separately from host_link_state.
Deduce physical state from host_link_state when required.
Change cache_physical_state to log_physical_state to make
sure host_link_state reflects hardwares physical state properly.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently fallback configuration is loaded once per driver instance.
With multiple HFI devices in the same system the current code may not
load the platform config data for the device. Change fallback platform
config data loading to be per device.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add flag in pport data structure to determine when platform config was
read from scratch registers. Change conditions in parse_platform_config
and get_platform_config_field to use the new flag.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
PIO/SDMA send logic now uses the hdr_type field to determine
the type of packet that has been constructed. Based on the hdr_type,
certain things such as PBC flags, padding count and the LT extra
trailing bytes are determined.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We introduce a struct hfi1_16b_header to support 16B headers.
16B bypass packets are received by the driver and processed
similar to 9B packets. Add basic support to handle 16B packets.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a mixture of mutex and spinlocks to protect receive context
(rcd/uctxt) information. This is not used consistently.
Use the mutex to protect device receive context information only.
Use the spinlock to protect sub context information only.
Protect access to items in the rcd array with a spinlock and
reference count.
Remove spinlock around dd->rcd array cleanup. Since interrupts are
disabled and cleaned up before this point, this lock is not useful.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The rcd array can be accessed from user context or during interrupts.
Protecting this with a mutex isn't a good idea because the mutex should
not be used from an IRQ.
Protect the allocation and freeing of rcd array elements with a
spinlock.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When DC is shut down (by e.g. disconnecting the cable), the
driver should use host_link_state to get port's current
physical state. This is due to the fact that physical state
is read from DC's CSRs and when DC is shut down and state is
changed, its registers are not impacted.
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Do not track logical state separately from host_link_state. Deduce
logical state from host_link_state when required. Transitions in
set_link_state and goto_offline already make sure host_link_state
reflects hardware's logical state properly.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch series primarily increases sizes of variables that hold
lid values from 16 to 32 bits. Additionally, it adds a check in
the IB mad stack to verify a properly formatted MAD when OPA
extended LIDs are used.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
we aren't safe for that context) touched the same code.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add const to bin_attribute structures as they are only passed to the
functions sysfs_{remove/create}_bin_file. The arguments passed are of
type const, so declare the structures to be const.
Done using Coccinelle.
@m disable optional_qualifier@
identifier s;
position p;
@@
static struct bin_attribute s@p={...};
@okay1@
position p;
identifier m.s;
@@
(
sysfs_create_bin_file(...,&s@p,...)
|
sysfs_remove_bin_file(...,&s@p,...)
)
@bad@
position p!={m.p,okay1.p};
identifier m.s;
@@
s@p
@change depends on !bad disable optional_qualifier@
identifier m.s;
@@
static
+const
struct bin_attribute s={...};
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
include/rdma/ib_verbs.h - Modified a function signature adjacent
to a newly added function signature from a previous merge
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Both add new code
include/rdma/ib_verbs.h - Both add new code
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.
The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.
The introduction of this define allows us to remove the string size
from get_fw_str function signature.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
slid field in struct ib_wc is increased to 32 bits.
This enables core components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Always initiate an offline transition request
when a link down occurs. The firmware will
use this request to confirm that the driver
has seen the link down message. A host version
is set to indicate this driver behavior to the
firmware.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.
Only allow one link down request to be queued at any one time.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The allocate_ctxt() function adds the context to the fd data structure.
Since the context is not completely initialized, this can cause confusion
as to whether the context is valid or not.
Move the fd reference from allocate_ctxt() to setup_base_ctxt().
Update the necessary functions to be aware of this move.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a window that allows other threads to read state of
'host_link_state' as a new, before the hardware actual state is set.
This patch closes the window by indicating a new state only after
hardware transition is complete.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A copy_to_user() call assumes that two members of a data structure
are sequential. Since this may not always be true, separate the copies
to ensure a safe copy.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>