Commit Graph

4139 Commits

Author SHA1 Message Date
Don Wood
873fcdd4bf RDMA/nes: Allocate work item for disconnect event handling
The code currently has a work structure in the QP.  This requires a
lock and a pending flag to ensure there is never more than one request
active.  When two events happen quickly (such as FIN and LLP CLOSE),
it causes unnecessary timeouts since the second one is dropped.

This fix allocates memory for the work request so the second one can
be queued.  A lock is removed since it is no longer needed.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:37 -07:00
Don Wood
c4c3f279cd RDMA/nes: Update refcnt during disconnect
During termination, it is possible for the refcnt to go to zero while
the worker thread is posting events upward.  This fix increments the
refcnt before the request is passed to the worker thread.  The thread
decrements the refcnt when the request is completed.

Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:36 -07:00
Jack Morgenstein
d841064777 IB/mthca: Don't allow userspace open while recovering from catastrophic error
Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:16 -07:00
Arputham Benjamin
d94a868901 IB/mthca: Distinguish multiple devices in /proc/interrupts
When the mthca driver uses the same name for interrupts for every
device in the system.  This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for.  Change the driver
to add the PCI name of the device to the interrupt name.

Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:15 -07:00
Roland Dreier
ffe063f32b IB/mthca: Annotate CQ locking
mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks.  Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:15 -07:00
Roland Dreier
deecb5d672 IB/mthca: Remove unnecessary include of <linux/init.h>
mthca_reset.c doesn't have any function annotations, so there's no
reason to include <linux/init.h>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:15 -07:00
Roland Dreier
fc1285585f IB/mthca: Remove unnecessary include of <asm/page.h>
mthca_config_reg.h was including <asm/page.h> for no reason -- the whole
file is just defines of constants, so it's entirely self-contained.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:36:13 -07:00
Jack Morgenstein
3b4a8cd51e IB/mlx4: Don't allow userspace open while recovering from catastrophic error
Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:50 -07:00
Roland Dreier
338a8fad27 IB/mlx4: Annotate CQ locking
mlx4_ib_lock_cqs()/mlx4_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks.  Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:49 -07:00
Roel Kluin
1493ab4083 RDMA/amso1100: Check kmalloc() result in c2_register_device()
dev->ibdev.iwcm allocation may fail, prevent a dereference.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Marcin Slusarz
f1aa78b26e IB: Use printk_once() for driver versions
Replace open-coded reimplementations with printk_once().

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Tobias Klauser
181c74e87e RDMA/amso1100: Use %pM conversion specifier
Use the %pM conversion specifier to print a MAC address.

Signed-off-by: Tobias Klauser <klto@zhaw.ch>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:23 -07:00
Roel Kluin
286b63d096 IB/ipath: strncpy() doesn't always NUL-terminate
strlcpy() will always null terminate the string.  node_desc is not
guaranteed to be NUL-terminated so just use memcpy().

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:21 -07:00
Joachim Fenkes
6303e74c69 IB/ehca: Fix CQE flags reporting
The driver was reporting CQE flags in the wrong bit positions, causing
consumers to miss incoming immediate data.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:55 -07:00
Joachim Fenkes
d706834d99 IB/ehca: Construct MAD redirect replies from request MAD
The old code used a lot of hard-coded values, which might not be valid
in all environments (especially routed fabrics or partitioned
subnets).  Copy as much information as possible from the incoming
request to correct that.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:55 -07:00
Alexander Schmidt
50d40b8e53 IB/ehca: Make port autodetect mode the default
Make port autodetect mode the default for the ehca driver. The
autodetect code has been in the kernel for several releases now and
has proved to be stable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:54 -07:00
Steve Wise
a52bf98d99 RDMA/cxgb3: Wake up any waiters on peer close/abort
A close/abort while waiting for a wr_ack during connection migration
can cause a hung process in iwch_accept_cr/iwch_reject_cr.

The fix is to set rpl_error/rpl_done and wake up the waiters when we
get a close/abort while in MPA_REQ_RCVD state.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:38 -07:00
Steve Wise
6e47fe4350 RDMA/cxgb3: Don't free endpoints early
- Keep ref on connection request endpoints until either accepted or
  rejected so it doesn't get freed early.

- Endpoint flags now need to be set via atomic bitops because they can
  be set on both the iw_cxgb3 workqueue thread and user disconnect
  threads.

- Don't move out of CLOSING too early due to multiple calls to
  iwch_ep_disconnect.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:38 -07:00
Steve Wise
fa0d4c11c4 RDMA/cxgb3: Handle port events properly
Massage the err_handler upcall into an event handler upcall, pass
netdev port events to the cxgb3 ULPs and generate RDMA port events
based on LLD port events.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:38 -07:00
Steve Wise
b496fe82d4 RDMA/cxgb3: Set the appropriate IO channel in rdma_init work requests
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:37 -07:00
Steve Wise
3793d2fc3e RDMA/cxgb3: iwch_unregister_device leaks memory
The iwcm struct mem is never freed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:22:36 -07:00
Eric Dumazet
451f144398 drivers: Kill now superfluous ->last_rx stores
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:07:36 -07:00
Stephen Hemminger
0fc0b732ea netdev: drivers should make ethtool_ops const
No need to put ethtool_ops in data, they should be const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:33 -07:00
Roland Dreier
4a7eca824c Merge branches 'ehca', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus 2009-06-23 10:38:47 -07:00
Alexander Schmidt
1d4d6da535 IB/ehca: Bump version number
Increment version number for DMEM toleration.

Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-23 10:30:04 -07:00
Roland Dreier
99987bea47 IB/mthca: Replace dma_sync_single() use with proper functions
dma_sync_single() is deprecated now, and the use in mthca is wrong:
there should be a dma_sync_single_for_cpu() before touching the memory
from the CPU, and a dma_sync_single_for_device() afterwards.  Fix
this, prompted by a kick in the pants from a patch from FUJITA
Tomonori <fujita.tomonori@lab.ntt.co.jp>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 23:04:13 -07:00
Faisal Latif
68237a0ff8 RDMA/nes: Fix FIN state handling under error conditions
During cluster testing, one QP was not closed, as FIN is not handled
properly when its rexmit count expires or in some cases when RST is is
received after sending FIN.  The reason is that the cm_id does not get
decremented under these conditions.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:53:28 -07:00
Faisal Latif
66388d67a0 RDMA/nes: Fix max_qp_init_rd_atom returned from query device
In nes_query_device(), max_qp_init_rd_atom is incorrectly set to
max_qp_wr.  This was found when a test application had a dapl async
event error.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:52:30 -07:00
Roel Kluin
af04662b4d IB/ehca: Ensure that guid_entry index is not negative
This prevents the memcpy() of a guid_entries element using a negative index.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:23:48 -07:00
Hannes Hering
0cf89dcdbc IB/ehca: Tolerate dynamic memory operations before driver load
Implement toleration of dynamic memory operations and 16 GB gigantic
pages, where "toleration" means that the driver can cope with dynamic
memory operations that happen before the driver is loaded.  While the
ehca driver is loaded, dynamic memory operations are still prohibited
by returning NOTIFY_BAD from the memory notifier.

On module load the driver walks through available system memory,
checks for available memory ranges and then registers the kernel
internal memory region accordingly.  The translation of address ranges
is implemented via a 3-level busmap.

Signed-off-by: Hannes Hering <hering2@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-22 22:18:51 -07:00
Greg Kroah-Hartman
f899c2ddd4 infiniband: ehca: remove driver_data direct access of struct device
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device.  Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.

Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: general@lists.openfabrics.org
Cc: Christoph Raisch <raisch@de.ibm.com>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:27 -07:00
Roland Dreier
8d34ff3401 Merge branches 'cxgb3', 'ehca', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus 2009-06-14 13:31:19 -07:00
Roland Dreier
9aa0a489d9 IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice.  Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time.

Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-13 15:14:09 -07:00
Jack Morgenstein
2ac6bf4ddc IB/mlx4: Add strong ordering to local inval and fast reg work requests
The ConnectX Programmer's Reference Manual states that the "SO" bit
must be set when posting Fast Register and Local Invalidate send work
requests.  When this bit is set, the work request will be executed
only after all previous work requests on the send queue have been
executed.  (If the bit is not set, Fast Register and Local Invalidate
WQEs may begin execution too early, which violates the defined
semantics for these operations)

This fixes the issue with NFS/RDMA reported in
<http://lists.openfabrics.org/pipermail/general/2009-April/059253.html>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-05 10:36:24 -07:00
Joachim Fenkes
25a5239327 IB/ehca: Remove superfluous bitmasks from QP control block
All the fields in the control block are nicely right-aligned, so no
masking is necessary.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-03 13:25:42 -07:00
Steve Wise
3026c19a14 RDMA/cxgb3: Limit fast register size based on T3 limitations
T3 firmware only supports one WRs worth of page list for fast register
work requests.  The driver currently allows 2 WRs worth, which
doesn't work for T3, so reduce the limit in the driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-27 14:43:39 -07:00
Steve Wise
7ab1a2b31d RDMA/cxgb3: Report correct port state and MTU
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-27 14:42:36 -07:00
Eli Cohen
c1f67a88bf IB/mthca: Add module parameter for number of MTTs per segment
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control.  As a result, the size of memory that can be
registered is limited too.  This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-27 14:36:16 -07:00
Roel Kluin
28e43a519b RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
With a postfix increment, i is incremented one past 10K/5K before the
loop ends, so the error messages will be displayed too soon if the
test succeeds on the last iteration.  Fix the comparisons to be >
instead of >=.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-15 10:16:45 -07:00
Jack Stone
5b891a9332 infiniband: Remove void casts
Remove uneeded casts of void *.

Signed-off-by: Jack Stone <jwjstone@fastmail.fm>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-13 16:53:39 -07:00
Stefan Roscher
bde2cfaf8f IB/ehca: Increment version number
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-13 16:52:43 -07:00
Stefan Roscher
1988d1fa1a IB/ehca: Remove unnecessary memory operations for userspace queue pairs
The queue map for flush completion circumvention is only used for
kernel space queue pairs.  This patch skips the allocation of the
queue maps in case the QP is created for userspace.  In addition, this
patch does not iomap the galpas for kernel usage if the queue pair is
only used in userspace.  These changes will improve the performance of
creation of userspace queue pairs.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-13 16:52:43 -07:00
Stefan Roscher
c94f156f63 IB/ehca: Fall back to vmalloc() for big allocations
In case of large queue pairs there is the possibillity of allocation
failures due to memory fragmentation when using kmalloc().  To ensure
the memory is allocated even if kmalloc() can not find chunks which
are big enough, we fall back to allocating the memory with vmalloc().

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-13 16:52:42 -07:00
Anton Blanchard
bf31a1a02e IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
To improve performance of driver resource allocation, replace
vmalloc() calls with kmalloc().

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-13 16:52:40 -07:00
Linus Torvalds
c98861f7de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Don't overwrite fast registration page list when posting work request
  RDMA/cxgb3: Don't complete flushed send work requests twice
2009-05-13 16:31:12 -07:00
Roland Dreier
8be741b0ac Merge branches 'cxgb3' and 'mlx4' into for-linus 2009-05-13 15:16:17 -07:00
Al Viro
265e771e81 Fix deadlock in ipathfs ->get_sb()
forgot to unlock superblock before calling deactivate_super()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-05-09 10:49:40 -04:00
Jack Morgenstein
2b6b7d4be4 IB/mlx4: Don't overwrite fast registration page list when posting work request
The low-level mlx4 driver modified the page-list addresses for fast
register work requests post send to big-endian, and set a "present"
bit.  This caused problems later when the consumer attempted to unmap
the pages using the page-list (using the list addresses which were
assumed to be still in CPU-endian order).  Fix the mlx4 driver to
allocate two buffers and use a private buffer for the hardware-format
bus addresses.

This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1571>,
an NFS/RDMA server crash.  The cause of the crash was found by Vu Pham
of Mellanox.  The fix is along the lines suggested by Steve Wise in
comment #21 in bug 1571.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-05-07 21:35:13 -07:00
Steve Wise
ec6995ddaa RDMA/cxgb3: Don't complete flushed send work requests twice
When the SQ is flushed, mark the flushed entries as not signaled so
the poll logic doesn't re-insert the CQ entry thinking its an out of
order completion.

The bug can cause the NFS/RDMA server to crash due to processing the
same completed work request twice.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-04-29 15:15:59 -07:00
Roland Dreier
9308f96c79 Merge branches 'cxgb3', 'ipoib', 'mthca', 'mlx4' and 'nes' into for-linus 2009-04-28 16:01:31 -07:00