Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "This cycle's RCU changes were:

   - A few more RCU flavor consolidation cleanups.

   - Updates to RCU's list-traversal macros improving lockdep usability.

   - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
     incoming callbacks during grace-period waits.

   - Forward-progress improvements for no-CBs CPUs: Use ->cblist
     structure to take advantage of others' grace periods.

   - Also added a small commit that avoids needlessly inflicting
     scheduler-clock ticks on callback-offloaded CPUs.

   - Forward-progress improvements for no-CBs CPUs: Reduce contention
     on ->nocb_lock guarding ->cblist.

   - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
     list to further reduce contention on ->nocb_lock guarding ->cblist.

   - Miscellaneous fixes.

   - Torture-test updates.

   - minor LKMM updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
  MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org
  rcu: Don't include <linux/ktime.h> in rcutiny.h
  rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
  rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
  rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
  rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
  rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
  rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
  rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
  rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
  rcu/nocb: Add bypass callback queueing
  rcu/nocb: Atomic ->len field in rcu_segcblist structure
  rcu/nocb: Unconditionally advance and wake for excessive CBs
  rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
  rcu/nocb: Reduce contention at no-CBs invocation-done time
  rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
  rcu/nocb: Round down for number of no-CBs grace-period kthreads
  rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
  rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
  rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
  ...

@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
 <li> <a href="#Hotplug CPU">Hotplug CPU</a>.
 <li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
 <li> <a href="#Tracing and RCU">Tracing and RCU</a>.
+<li> <a href="#Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a>.
 <li> <a href="#Energy Efficiency">Energy Efficiency</a>.
 <li> <a href="#Scheduling-Clock Interrupts and RCU">
 Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
 <p>
 It is possible to use tracing on RCU code, but tracing itself
 uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
 is provided for use by tracing, which avoids the destructive
 recursion that could otherwise ensue.
 This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
 The tracing folks both located the requirement and provided the
 needed fix, so this surprise requirement was relatively painless.

+<h3><a name="Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a></h3>
+
+<p>
+The kernel needs to access user-space memory, for example, to access
+data referenced by system-call parameters.
+The <tt>get_user()</tt> macro does this job.
+
+<p>
+However, user-space memory might well be paged out, which means
+that <tt>get_user()</tt> might well page-fault and thus block while
+waiting for the resulting I/O to complete.
+It would be a very bad thing for the compiler to reorder
+a <tt>get_user()</tt> invocation into an RCU read-side critical
+section.
+For example, suppose that the source code looked like this:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 v = p->value;
+ 4 rcu_read_unlock();
+ 5 get_user(user_v, user_p);
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+The compiler must not be permitted to transform this source code into
+the following:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
+ 4 v = p->value;
+ 5 rcu_read_unlock();
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+If the compiler did make this transformation in a
+<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
+page fault, the result would be a quiescent state in the middle
+of an RCU read-side critical section.
+This misplaced quiescent state could result in line 4 being
+a use-after-free access, which could be bad for your kernel's
+actuarial statistics.
+Similar examples can be constructed with the call to <tt>get_user()</tt>
+preceding the <tt>rcu_read_lock()</tt>.
+
+<p>
+Unfortunately, <tt>get_user()</tt> doesn't have any particular
+ordering properties, and in some architectures the underlying <tt>asm</tt>
+isn't even marked <tt>volatile</tt>.
+And even if it was marked <tt>volatile</tt>, the above access to
+<tt>p->value</tt> is not volatile, so the compiler would not have any
+reason to keep those two accesses in order.
+
+<p>
+Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
+and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
+at least for outermost instances of <tt>rcu_read_lock()</tt> and
+<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
+sections.
+
 <h3><a name="Energy Efficiency">Energy Efficiency</a></h3>

 <p>
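
The compiler-barrier requirement stated in the "Accesses to User Memory and RCU" text can be illustrated in user-space C. This is a minimal sketch under stated assumptions, not the kernel's implementation: every `sketch_` name is hypothetical, the nesting counter merely stands in for whatever state a real build keeps, and the empty `asm volatile` with a "memory" clobber is the common GCC/Clang idiom for a compiler-only barrier. It shows only where the barriers sit relative to the protected load and the potentially blocking user access.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang idiom): emits no instructions, but the
 * compiler may not move memory accesses across it. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the read-side nesting depth. */
static int sketch_nesting;

static inline void sketch_rcu_read_lock(void)
{
	sketch_nesting++;
	sketch_barrier();	/* Keep later accesses inside the section. */
}

static inline void sketch_rcu_read_unlock(void)
{
	sketch_barrier();	/* Keep earlier accesses inside the section. */
	sketch_nesting--;
	assert(sketch_nesting >= 0);
}

struct sketch_foo {
	int value;
};

/* Hypothetical stand-in for get_user(): treated as possibly blocking, so
 * it checks that it runs outside any read-side critical section. */
static int sketch_get_user(int *dst, const int *src)
{
	assert(sketch_nesting == 0);
	*dst = *src;
	return 0;
}

static int sketch_reader(struct sketch_foo *gp, const int *user_p)
{
	int v, user_v;

	sketch_rcu_read_lock();
	v = gp->value;		/* Must stay between the two barriers. */
	sketch_rcu_read_unlock();
	sketch_get_user(&user_v, user_p);	/* Must stay after the unlock. */
	return v + user_v;
}
```

Without the two `sketch_barrier()` calls, nothing in this sketch would tell the compiler that the load of `gp->value` and the copy in `sketch_get_user()` must keep their source order, which is the situation the documentation text describes.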
@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 CONFIG_PREEMPT_RCU case, you might see stall-warning
 messages.

+You can use the rcutree.kthread_prio kernel boot parameter to
+increase the scheduling priority of RCU's kthreads, which can
+help avoid this problem. However, please note that doing this
+can increase your system's context-switch rate and thus degrade
+performance.
+
 o A periodic interrupt whose handler takes longer than the time
 interval between successive pairs of interrupts. This can
 prevent RCU's kthreads and softirq handlers from running.
@@ -3842,12 +3842,13 @@
 RCU_BOOST is not set, valid values are 0-99 and
 the default is zero (non-realtime operation).

-rcutree.rcu_nocb_leader_stride= [KNL]
-Set the number of NOCB kthread groups, which
-defaults to the square root of the number of
-CPUs. Larger numbers reduces the wakeup overhead
-on the per-CPU grace-period kthreads, but increases
-that same overhead on each group's leader.
+rcutree.rcu_nocb_gp_stride= [KNL]
+Set the number of NOCB callback kthreads in
+each group, which defaults to the square root
+of the number of CPUs. Larger numbers reduce
+the wakeup overhead on the global grace-period
+kthread, but increases that same overhead on
+each group's NOCB grace-period kthread.

 rcutree.qhimark= [KNL]
 Set threshold of queued RCU callbacks beyond which
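
The trade-off described for rcutree.rcu_nocb_gp_stride can be made concrete with a little arithmetic. The following is an illustration only, assuming (as the parameter text says) that the default stride is about the square root of the number of CPUs. `sketch_int_sqrt()`, `sketch_default_stride()`, and `sketch_nr_groups()` are hypothetical helpers, not kernel functions, and the kernel's actual rounding may differ.

```c
#include <assert.h>

/* Integer square root by linear search; fine for CPU-count-sized inputs. */
static unsigned int sketch_int_sqrt(unsigned int n)
{
	unsigned int r = 0;

	while ((unsigned long long)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Assumed default: about sqrt(nr_cpus) callback kthreads per group. */
static unsigned int sketch_default_stride(unsigned int nr_cpus)
{
	unsigned int s = sketch_int_sqrt(nr_cpus);

	return s ? s : 1;
}

/* Number of groups, each served by one NOCB grace-period kthread. */
static unsigned int sketch_nr_groups(unsigned int nr_cpus, unsigned int stride)
{
	assert(stride > 0);
	return (nr_cpus + stride - 1) / stride;	/* Round up. */
}
```

Under these assumptions, 64 CPUs would default to a stride of 8 and thus 8 groups. A stride of 16 would instead give 4 larger groups: fewer groups for the global grace-period kthread to wake, but more callback kthreads for each group's grace-period kthread to wake.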
@@ -4052,6 +4053,10 @@
 rcutorture.verbose= [KNL]
 Enable additional printk() statements.

+rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
+Dump ftrace buffer after reporting RCU CPU
+stall warning.
+
 rcupdate.rcu_cpu_stall_suppress= [KNL]
 Suppress RCU CPU stall warning messages.
