IB/mthca: Fix race in reference counting

Fix races in in destroying various objects.  If a destroy routine
waits for an object to become free by doing

	wait_event(&obj->wait, !atomic_read(&obj->refcount));
	/* now clean up and destroy the object */

and another place drops a reference to the object by doing

	if (atomic_dec_and_test(&obj->refcount))
		wake_up(&obj->wait);

then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up().  And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.

Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock.  This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.

While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it.  So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
This commit is contained in:
Roland Dreier
2006-05-09 10:50:29 -07:00
parent d945e1df28
commit a3285aa4ee
5 changed files with 74 additions and 45 deletions

View File

@@ -139,11 +139,12 @@ struct mthca_ah {
* a qp may be locked, with the send cq locked first. No other
* nesting should be done.
*
* Each struct mthca_cq/qp also has an atomic_t ref count. The
* pointer from the cq/qp_table to the struct counts as one reference.
* This reference also is good for access through the consumer API, so
* modifying the CQ/QP etc doesn't need to take another reference.
* Access because of a completion being polled does need a reference.
* Each struct mthca_cq/qp also has an ref count, protected by the
* corresponding table lock. The pointer from the cq/qp_table to the
* struct counts as one reference. This reference also is good for
* access through the consumer API, so modifying the CQ/QP etc doesn't
* need to take another reference. Access to a QP because of a
* completion being polled does not need a reference either.
*
* Finally, each struct mthca_cq/qp has a wait_queue_head_t for the
* destroy function to sleep on.
@@ -159,8 +160,9 @@ struct mthca_ah {
* - decrement ref count; if zero, wake up waiters
*
* To destroy a CQ/QP, we can do the following:
* - lock cq/qp_table, remove pointer, unlock cq/qp_table lock
* - decrement ref count
* - lock cq/qp_table
* - remove pointer and decrement ref count
* - unlock cq/qp_table lock
* - wait_event until ref count is zero
*
* It is the consumer's responsibilty to make sure that no QP
@@ -197,7 +199,7 @@ struct mthca_cq_resize {
struct mthca_cq {
struct ib_cq ibcq;
spinlock_t lock;
atomic_t refcount;
int refcount;
int cqn;
u32 cons_index;
struct mthca_cq_buf buf;
@@ -217,7 +219,7 @@ struct mthca_cq {
struct mthca_srq {
struct ib_srq ibsrq;
spinlock_t lock;
atomic_t refcount;
int refcount;
int srqn;
int max;
int max_gs;
@@ -254,7 +256,7 @@ struct mthca_wq {
struct mthca_qp {
struct ib_qp ibqp;
atomic_t refcount;
int refcount;
u32 qpn;
int is_direct;
u8 port; /* for SQP and memfree use only */