PCI: shpchp: Use per-slot workqueues to avoid deadlock

When we have an SHPC-capable bridge with a second SHPC-capable bridge
below it, pushing the upstream bridge's attention button causes a
deadlock.

The deadlock happens because we use the shpchp_wq workqueue to run
shpchp_pushbutton_thread(), which uses shpchp_disable_slot() to remove
devices below the upstream bridge.  When we remove the downstream bridge,
we call shpc_remove(), the shpchp driver's .remove() method.  That calls
flush_workqueue(shpchp_wq), which deadlocks because the
shpchp_pushbutton_thread() work item is still running.

This patch avoids the deadlock by creating a workqueue for every slot
and removing the single shared workqueue.

Here's the call path that leads to the deadlock:

  shpchp_queue_pushbutton_work
    queue_work(shpchp_wq)		# shpchp_pushbutton_thread
    ...

  shpchp_pushbutton_thread
    shpchp_disable_slot
      remove_board
        shpchp_unconfigure_device
          pci_stop_and_remove_bus_device
            ...
              shpc_remove		# shpchp driver .remove method
                hpc_release_ctlr
                  cleanup_slots
                    flush_workqueue(shpchp_wq)

This change is based on code inspection, since we don't have hardware
with this topology.

Based-on-patch-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
This commit is contained in:
Bjorn Helgaas
2013-01-11 12:21:15 -07:00
parent d347e75847
commit f652e7d291
3 changed files with 18 additions and 16 deletions

View File

@@ -46,7 +46,6 @@
extern bool shpchp_poll_mode;
extern int shpchp_poll_time;
extern bool shpchp_debug;
extern struct workqueue_struct *shpchp_wq;
#define dbg(format, arg...) \
do { \
@@ -90,6 +89,7 @@ struct slot {
struct list_head slot_list;
struct delayed_work work; /* work for button event */
struct mutex lock;
struct workqueue_struct *wq;
u8 hp_slot;
};