Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "This cycle's RCU changes were:

   - A few more RCU flavor consolidation cleanups.

   - Updates to RCU's list-traversal macros improving lockdep usability.

   - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
     incoming callbacks during grace-period waits.

   - Forward-progress improvements for no-CBs CPUs: Use ->cblist
     structure to take advantage of others' grace periods.

   - Also added a small commit that avoids needlessly inflicting
     scheduler-clock ticks on callback-offloaded CPUs.

   - Forward-progress improvements for no-CBs CPUs: Reduce contention on
     ->nocb_lock guarding ->cblist.

   - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
     list to further reduce contention on ->nocb_lock guarding ->cblist.

   - Miscellaneous fixes.

   - Torture-test updates.

   - minor LKMM updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
  MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org
  rcu: Don't include <linux/ktime.h> in rcutiny.h
  rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
  rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
  rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
  rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
  rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
  rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
  rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
  rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
  rcu/nocb: Add bypass callback queueing
  rcu/nocb: Atomic ->len field in rcu_segcblist structure
  rcu/nocb: Unconditionally advance and wake for excessive CBs
  rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
  rcu/nocb: Reduce contention at no-CBs invocation-done time
  rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
  rcu/nocb: Round down for number of no-CBs grace-period kthreads
  rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
  rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
  rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
  ...
This commit is contained in:
Linus Torvalds
2019-09-16 16:28:19 -07:00
49 changed files with 1499 additions and 805 deletions

View File

@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
<li> <a href="#Hotplug CPU">Hotplug CPU</a>.
<li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li> <a href="#Tracing and RCU">Tracing and RCU</a>.
<li> <a href="#Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a>.
<li> <a href="#Energy Efficiency">Energy Efficiency</a>.
<li> <a href="#Scheduling-Clock Interrupts and RCU">
Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
<p>
It is possible to use tracing on RCU code, but tracing itself
uses RCU.
For this reason, <tt>rcu_dereference_raw_notrace()</tt>
For this reason, <tt>rcu_dereference_raw_check()</tt>
is provided for use by tracing, which avoids the destructive
recursion that could otherwise ensue.
This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
The tracing folks both located the requirement and provided the
needed fix, so this surprise requirement was relatively painless.
<h3><a name="Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a></h3>
<p>
The kernel needs to access user-space memory, for example, to access
data referenced by system-call parameters.
The <tt>get_user()</tt> macro does this job.
<p>
However, user-space memory might well be paged out, which means
that <tt>get_user()</tt> might well page-fault and thus block while
waiting for the resulting I/O to complete.
It would be a very bad thing for the compiler to reorder
a <tt>get_user()</tt> invocation into an RCU read-side critical
section.
For example, suppose that the source code looked like this:
<blockquote>
<pre>
1 rcu_read_lock();
2 p = rcu_dereference(gp);
3 v = p-&gt;value;
4 rcu_read_unlock();
5 get_user(user_v, user_p);
6 do_something_with(v, user_v);
</pre>
</blockquote>
<p>
The compiler must not be permitted to transform this source code into
the following:
<blockquote>
<pre>
1 rcu_read_lock();
2 p = rcu_dereference(gp);
3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
4 v = p-&gt;value;
5 rcu_read_unlock();
6 do_something_with(v, user_v);
</pre>
</blockquote>
<p>
If the compiler did make this transformation in a
<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
page fault, the result would be a quiescent state in the middle
of an RCU read-side critical section.
This misplaced quiescent state could result in line&nbsp;4 being
a use-after-free access, which could be bad for your kernel's
actuarial statistics.
Similar examples can be constructed with the call to <tt>get_user()</tt>
preceding the <tt>rcu_read_lock()</tt>.
<p>
Unfortunately, <tt>get_user()</tt> doesn't have any particular
ordering properties, and in some architectures the underlying <tt>asm</tt>
isn't even marked <tt>volatile</tt>.
And even if it was marked <tt>volatile</tt>, the above access to
<tt>p-&gt;value</tt> is not volatile, so the compiler would not have any
reason to keep those two accesses in order.
<p>
Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
at least for outermost instances of <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
sections.
<h3><a name="Energy Efficiency">Energy Efficiency</a></h3>
<p>

View File

@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_PREEMPT_RCU case, you might see stall-warning
messages.
You can use the rcutree.kthread_prio kernel boot parameter to
increase the scheduling priority of RCU's kthreads, which can
help avoid this problem. However, please note that doing this
can increase your system's context-switch rate and thus degrade
performance.
o A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.

View File

@@ -3842,12 +3842,13 @@
RCU_BOOST is not set, valid values are 0-99 and
the default is zero (non-realtime operation).
rcutree.rcu_nocb_leader_stride= [KNL]
Set the number of NOCB kthread groups, which
defaults to the square root of the number of
CPUs. Larger numbers reduces the wakeup overhead
on the per-CPU grace-period kthreads, but increases
that same overhead on each group's leader.
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
of the number of CPUs. Larger numbers reduce
the wakeup overhead on the global grace-period
kthread, but increases that same overhead on
each group's NOCB grace-period kthread.
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
@@ -4052,6 +4053,10 @@
rcutorture.verbose= [KNL]
Enable additional printk() statements.
rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
Dump ftrace buffer after reporting RCU CPU
stall warning.
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.