rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: Lan Tianyu <tianyu.lan@intel.com>
This commit is contained in:
@@ -2940,11 +2940,6 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
|
||||
* restructure your code to batch your updates, and then use a single
|
||||
* synchronize_sched() instead.
|
||||
*
|
||||
* Note that it is illegal to call this function while holding any lock
|
||||
* that is acquired by a CPU-hotplug notifier. And yes, it is also illegal
|
||||
* to call this function from a CPU-hotplug notifier. Failing to observe
|
||||
* these restriction will result in deadlock.
|
||||
*
|
||||
* This implementation can be thought of as an application of ticket
|
||||
* locking to RCU, with sync_sched_expedited_started and
|
||||
* sync_sched_expedited_done taking on the roles of the halves
|
||||
@@ -2994,7 +2989,12 @@ void synchronize_sched_expedited(void)
|
||||
*/
|
||||
snap = atomic_long_inc_return(&rsp->expedited_start);
|
||||
firstsnap = snap;
|
||||
get_online_cpus();
|
||||
if (!try_get_online_cpus()) {
|
||||
/* CPU hotplug operation in flight, fall back to normal GP. */
|
||||
wait_rcu_gp(call_rcu_sched);
|
||||
atomic_long_inc(&rsp->expedited_normal);
|
||||
return;
|
||||
}
|
||||
WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
|
||||
|
||||
/*
|
||||
@@ -3041,7 +3041,12 @@ void synchronize_sched_expedited(void)
|
||||
* and they started after our first try, so their grace
|
||||
* period works for us.
|
||||
*/
|
||||
get_online_cpus();
|
||||
if (!try_get_online_cpus()) {
|
||||
/* CPU hotplug operation in flight, use normal GP. */
|
||||
wait_rcu_gp(call_rcu_sched);
|
||||
atomic_long_inc(&rsp->expedited_normal);
|
||||
return;
|
||||
}
|
||||
snap = atomic_long_read(&rsp->expedited_start);
|
||||
smp_mb(); /* ensure read is before try_stop_cpus(). */
|
||||
}
|
||||
|
Reference in New Issue
Block a user