Commit Graph

29483 Commits

Author SHA1 Message Date
Paul E. McKenney
d28139c4e9 rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe
One necessary step towards consolidating the three flavors of RCU is to
make sure that the resulting consolidated "one flavor to rule them all"
correctly handles networking denial-of-service attacks.  One thing that
allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every
so often, and so something similar has to happen for consolidated RCU.

This must be done carefully.  For example, if a preemption-disabled
region of code takes an interrupt which does softirq processing before
returning, consolidated RCU must ignore the resulting rcu_bh_qs()
invocations -- preemption is still disabled, and that means an RCU
reader for the consolidated flavor.

This commit therefore creates a new rcu_softirq_qs() that is called only
from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region
problem.  This new rcu_softirq_qs() function invokes rcu_sched_qs(),
rcu_preempt_qs(), and rcu_preempt_deferred_qs().  The latter call handles
any deferred quiescent states.

Note that __do_softirq() still invokes rcu_bh_qs().  It will continue to
do so until a later stage of cleanup when the RCU-bh flavor is removed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix !SMP issue located by kbuild test robot. ]
2018-08-30 16:02:38 -07:00
Paul E. McKenney
e11ec65cc8 rcu: Add warning to detect half-interrupts
RCU's dyntick-idle code is written to tolerate half-interrupts, that it,
either an interrupt that invokes rcu_irq_enter() but never invokes the
corresponding rcu_irq_exit() on the one hand, or an interrupt that never
invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
on the other.  These things really did happen at one time, as evidenced
by this ca-2011 LKML post:

http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com

The reason why RCU tolerates half-interrupts is that usermode helpers
used exceptions to invoke a system call from within the kernel such that
the system call did a normal return (not a return from exception) to
the calling context.  This caused rcu_irq_enter() to be invoked without
a matching rcu_irq_exit().  However, usermode helpers have since been
rewritten to make much more housebroken use of workqueues, kernel threads,
and do_execve(), and therefore should no longer produce half-interrupts.
No one knows of any other source of half-interrupts, but then again,
no one seems insane enough to go audit the entire kernel to verify that
half-interrupts really are a relic of the past.

This commit therefore adds a pair of WARN_ON_ONCE() calls that will
trigger in the presence of half interrupts, which the code will continue
to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
mid-2021, then perhaps RCU can stop handling half-interrupts, which
would be a considerable simplification.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-08-30 16:02:36 -07:00
Paul E. McKenney
fcc878e4df rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union
The ->b.exp_need_qs field is now set only to false, so this commit
removes it.  The job this field used to do is now done by the rcu_data
structure's ->deferred_qs field, which is a consequence of a better
split between task-based (the rcu_node structure's ->exp_tasks field) and
CPU-based (the aforementioned rcu_data structure's ->deferred_qs field)
tracking of quiescent states for RCU-preempt expedited grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:36 -07:00
Paul E. McKenney
27c744e32a rcu: Allow processing deferred QSes for exiting RCU-preempt readers
If an RCU-preempt read-side critical section is exiting, that is,
->rcu_read_lock_nesting is negative, then it is a good time to look
at the possibility of reporting deferred quiescent states.  This
commit therefore updates the checks in rcu_preempt_need_deferred_qs()
to allow exiting critical sections to report deferred quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:35 -07:00
Paul E. McKenney
c0335743c5 rcutorture: Test extended "rcu" read-side critical sections
This commit makes the "rcu" torture type test extended read-side
critical sections in order to test the deferral of RCU-preempt
quiescent-state testing.

In CONFIG_PREEMPT=n kernels, this simply duplicates the setup already
in place for the "sched" torture type.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:35 -07:00
Paul E. McKenney
3e31009898 rcu: Defer reporting RCU-preempt quiescent states when disabled
This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled.  These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation.  Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.

This also means that disabling preemption, interrupts, and/or
softirqs will act as an RCU-preempt read-side critical section.
This is enforced by checking preempt_count() as needed.

Some special cases must be handled on an ad-hoc basis, for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption.  In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling.  Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.

In theory, this change lifts a long-standing restriction that required
that if interrupts were disabled across a call to rcu_read_unlock()
that the matching rcu_read_lock() also be contained within that
interrupts-disabled region of code.  Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until
after interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue and
priority-inheritance locks.  This may allow some code simplification that
might reduce interrupt latency a bit.  Unfortunately, in practice this
would also defer deboosting a low-priority task that had been subjected
to RCU priority boosting, so real-time-response considerations might
well force this restriction to remain in place.

Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels.  This may
require some additional plumbing to provide the network denial-of-service
guarantees that have been traditionally provided by RCU-bh.  Once these
are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
into RCU-sched.  This would mean that all kernels would have but
one flavor of RCU, which would open the door to significant code
cleanup.

Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
  from Joel Fernandes. ]
[ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
  response to bug reports from kbuild test robot. ]
[ paulmck: Fix bug located by kbuild test robot involving recursion
  via rcu_preempt_deferred_qs(). ]
2018-08-30 16:02:34 -07:00
Byungchul Park
cf7614e13c rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
When entering or exiting irq or NMI handlers, the current code uses
->dynticks_nmi_nesting to detect if it is in the outermost handler,
that is, the one interrupting or returning to an RCU-idle context (the
idle loop or nohz_full usermode execution).  When entering the outermost
handler via an interrupt (as opposed to NMI), it is necessary to invoke
rcu_dynticks_task_exit() just before the CPU is marked non-idle from an
RCU perspective and to invoke rcu_cleanup_after_idle() just after the
CPU is marked non-idle.  Similarly, when exiting the outermost handler
via an interrupt, it is necessary to invoke rcu_prepare_for_idle() just
before marking the CPU idle and to invoke rcu_dynticks_task_enter()
just after marking the CPU idle.

The decision to execute these four functions is currently taken in
rcu_irq_enter() and rcu_irq_exit() as follows:

   rcu_irq_enter()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_enter()
         /* A conditional branch with ->dynticks */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_irq_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_exit()
         /* A conditional branch with ->dynticks_nmi_nesting */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter()
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */

This works, but the conditional branches in rcu_irq_enter() and
rcu_irq_exit() are redundant with those in rcu_nmi_enter() and
rcu_nmi_exit(), respectively.  Redundant branches are not something
we want in the to/from-idle fastpaths, so this commit refactors
rcu_{nmi,irq}_{enter,exit}() so they use a common inlined function passed
a constant argument as follows:

   rcu_irq_enter() inlining rcu_nmi_enter_common(irq=true)
      /* A conditional branch with ->dynticks */

   rcu_irq_exit() inlining rcu_nmi_exit_common(irq=true)
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter() inlining rcu_nmi_enter_common(irq=false)
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit() inlining rcu_nmi_exit_common(irq=false)
      /* A conditional branch with ->dynticks_nmi_nesting */

The combination of the constant function argument and the inlining allows
the compiler to discard the conditionals that previously controlled
execution of rcu_dynticks_task_exit(), rcu_cleanup_after_idle(),
rcu_prepare_for_idle(), and rcu_dynticks_task_enter().  This reduces both
the to-idle and from-idle path lengths by two conditional branches each,
and improves readability as well.

This commit also changes order of execution from this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	trace_rcu_dyntick();
	rcu_cleanup_after_idle();

To this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	rcu_cleanup_after_idle();
	trace_rcu_dyntick();

In other words, the calls to rcu_cleanup_after_idle() and
trace_rcu_dyntick() are reversed.  This has no functional effect because
the real concern is whether a given call is before or after the call to
rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
CPU and after that call RCU is watching.

A similar switch in calling order happens on the idle-entry path, with
similar lack of effect for the same reasons.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Applied Steven Rostedt feedback. ]
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:00:46 -07:00
Jiri Olsa
bf06278c3f perf/hw_breakpoint: Simplify breakpoint enable in perf_event_modify_breakpoint
We can safely enable the breakpoint back for both the fail and success
paths by checking only the bp->attr.disabled, which either holds the new
'requested' disabled state or the original breakpoint state.

Committer testing:

At the end of the series, the 'perf test' entry introduced as the first
patch now runs to completion without finding the fixed issues:

  # perf test "bp modify"
  62: x86 bp modify                                         : Ok
  #

In verbose mode:

  # perf test -v "bp modify"
  62: x86 bp modify                                         :
  --- start ---
  test child forked, pid 5161
  rip 5950a0, bp_1 0x5950a0
  in bp_1
  rip 5950a0, bp_1 0x5950a0
  in bp_1
  test child finished with 0
  ---- end ----
  x86 bp modify: Ok

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180827091228.2878-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30 14:49:24 -03:00
Jiri Olsa
969558371b perf/hw_breakpoint: Enable breakpoint in modify_user_hw_breakpoint
Currently we enable the breakpoint back only if the breakpoint
modification was successful. If it fails we can leave the breakpoint in
disabled state with attr->disabled == 0.

We can safely enable the breakpoint back for both the fail and success
paths by checking the bp->attr.disabled, which either holds the new
'requested' disabled state or the original breakpoint state.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180827091228.2878-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30 14:49:23 -03:00
Jiri Olsa
cb45302d7c perf/hw_breakpoint: Remove superfluous bp->attr.disabled = 0
Once the breakpoint was succesfully modified, the attr->disabled value
is in bp->attr.disabled. So there's no reason to set it again, removing
that.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180827091228.2878-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30 14:49:23 -03:00
Jiri Olsa
bd14406b78 perf/hw_breakpoint: Modify breakpoint even if the new attr has disabled set
We need to change the breakpoint even if the attr with new fields has
disabled set to true.

Current code prevents following user code to change the breakpoint
address:

  ptrace(PTRACE_POKEUSER, child, offsetof(struct user, u_debugreg[0]), addr_1)
  ptrace(PTRACE_POKEUSER, child, offsetof(struct user, u_debugreg[0]), addr_2)
  ptrace(PTRACE_POKEUSER, child, offsetof(struct user, u_debugreg[7]), dr7)

The first PTRACE_POKEUSER creates the breakpoint with attr.disabled set
to true:

  ptrace_set_breakpoint_addr(nr = 0)
    struct perf_event *bp = t->ptrace_bps[nr];

    ptrace_register_breakpoint(..., disabled = true)
      ptrace_fill_bp_fields(..., disabled)
      register_user_hw_breakpoint

So the second PTRACE_POKEUSER will be omitted:

  ptrace_set_breakpoint_addr(nr = 0)
    struct perf_event *bp = t->ptrace_bps[nr];
    struct perf_event_attr attr = bp->attr;

    modify_user_hw_breakpoint(bp, &attr)
      if (!attr->disabled)
        modify_user_hw_breakpoint_check

Reported-by: Milind Chabbi <chabbi.milind@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180827091228.2878-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30 14:49:23 -03:00
Yonghong Song
c7b27c37af bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash
Added bpffs pretty print for percpu arraymap, percpu hashmap
and percpu lru hashmap.

For each map <key, value> pair, the format is:
   <key_value>: {
	cpu0: <value_on_cpu0>
	cpu1: <value_on_cpu1>
	...
	cpun: <value_on_cpun>
   }

For example, on my VM, there are 4 cpus, and
for test_btf test in the next patch:
   cat /sys/fs/bpf/pprint_test_percpu_hash

You may get:
   ...
   43602: {
	cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
   72847: {
	cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
   }
   ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30 14:03:53 +02:00
Mukesh Ojha
13ba17bee1 notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the
notifier.h includes around in some places. Remove them.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
2018-08-30 12:56:40 +02:00
Vincent Whitchurch
cb9d7fd51d watchdog: Mark watchdog touch functions as notrace
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.

Commit ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations.  This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.

Fix it by adding the necessary notrace annotations.

Fixes: ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: tj@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
2018-08-30 12:56:40 +02:00
Edward Cree
8efea21d33 bpf/verifier: display non-spill stack slot types in print_verifier_state
If a stack slot does not hold a spilled register (STACK_SPILL), then each
 of its eight bytes could potentially have a different slot_type.  This
 information can be important for debugging, and previously we either did
 not print anything for the stack slot, or just printed fp-X=0 in the case
 where its first byte was STACK_ZERO.
Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC)
 or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor
 entirely STACK_INVALID.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 18:52:12 -07:00
Edward Cree
679c782de1 bpf/verifier: per-register parent pointers
By giving each register its own liveness chain, we elide the skip_callee()
 logic.  Instead, each register's parent is the state it inherits from;
 both check_func_call() and prepare_func_exit() automatically connect
 reg states to the correct chain since when they copy the reg state across
 (r1-r5 into the callee as args, and r0 out as the return value) they also
 copy the parent pointer.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 18:52:12 -07:00
Paul E. McKenney
7c590fcca6 rcutorture: Maintain self-propagating CB only during forward-progress test
The current forward-progress testing maintains a self-propagating
callback during the full test.  This could result in false negatives
for stutter-end checking, where it might appear that RCU was clearing
out old callbacks only because it was being continually motivated by
the self-propagating callback.  This commit therefore shuts down the
self-propagating callback at the end of each forward-progress test
interval.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
474e59b476 rcutorture: Check GP completion at stutter end
The rcu_torture_writer() function invokes stutter_wait() at the end of
each writer pass, which occasionally blocks for an extended time period
in order to ensure that RCU can handle intermittent loads.  But part of
handling a busy period is invoking all the callbacks before the end of
the idle period induced by stutter_wait().

This commit therefore adds a return value to stutter_wait() indicating
whether stutter_wait() actually waited.  In addition, this commit causes
rcu_torture_writer() to test this value and if set, checks that all the
elements of the rcu_tortures[] array have been freed up.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
f4de46ed5b rcutorture: Print forward-progress test interval on error
This commit prints the duration of the forward-progress test interval in
the case that no forward progress was observed as an aid to debugging.
When forward progress does happen, it prints out the number of
rcu_torture_writer() versions and grace periods that elapsed during the
forward-progress test.  At the end of the run, it also prints the number
of attempted and actual forward-progress tests.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
c04dd09bd3 rcutorture: Adjust number of reader kthreads per CPU-hotplug operations
Currently, rcutorture provisions rcu_torture_reader() kthreads based
on the initial number of CPUs.  This can be problematic when CPU hotplug
is enabled, as a system with a very large number of CPUs will provision
a very large number of rcu_torture_reader() kthreads.  All of these
kthreads will continue running even if the CPU-hotplug operations result
in only one remaining online CPU.  This can result in all sorts of strange
artifacts due simply to massive overload.

This commit therefore causes the rcu_torture_reader() kthreads to start
blocking as the number of online CPUs decreases.  This is accomplished
by numbering these kthreads, and having each check to make sure that the
number of online CPUs is at least as large as its assigned number.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
fecad5091f rcutorture: Reduce priority of forward-progress testing
On !SMP tests, the forward-progress kthread might prevent RCU's
grace-period kthread from running, which would defeat RCU's
forward-progress measures.  On PREEMPT tests without RCU priority
boosting, the forward-progress kthread might preempt a reader for an
extended time period, which would also defeat RCU's forward-progress
measures.  This commit therefore reduced rcutorture's forward-progress
kthread's priority in those cases.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
1e69676592 rcutorture: Limit reader duration if irq or bh disabled
There are debug checks in some environments that will complain if the
duration of a bh-disabled region of code exceeds about 50 milliseconds.
Because rcu_read_delay() can produce a 50-millisecond delay and because
there could be up to eight reader segments with such delays, this commit
limits the maximum delay to 10 milliseconds if either interrupts or
softirqs are disabled.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
3cff54a830 rcutorture: Increase rcu_read_delay() longdelay_ms
RCU now takes certain actions 100 and 200 milliseconds into a grace period
by default, but rcutorture only runs RCU read-side critical sections
with durations up to 50 milliseconds.  This commit therefore increases
test coverage by increasing the maximum critical-section duration to
300 milliseconds.  Note that the existing code automatically dials down
the probability of long delays based on the maximum duration, which means
that this change should not significantly change the rate of execution
of RCU read-side critical sections.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
9fdcb9afe0 rcutorture: Add self-propagating callback to forward-progress testing
If rcutorture is run on a quiet system with the rcutorture.stutter module
parameter set high, then there can legitimately be an extended period
during which no RCU forward progress takes place.  This can result
in false-positive no-forward-progress splats.  This commit therefore
makes rcu_torture_fwd_prog() create a self-propagating RCU callback
to ensure that grace periods are in progress for the duration of the
forward-progress test.

Note that the RCU flavor under test must define ->call(), ->sync(),
and ->cb_barrier() for this self-propagating callback to be created.
If one or more of those rcu_torture_ops fields are NULL, then the
rcu_torture_fwd_prog() function will silently proceed without creating
the self-propagating callback.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
08a7a2ec68 rcutorture: Vary forward-progress test interval
Some of the Linux kernel's RCU implementations provide several mechanisms
to promote forward progress that operate over different timeframes.
This commit therefore causes rcu_torture_fwd_prog() to vary the duration
of its forward-progress testing in order to test each such mechanism.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
152f4afbfd rcutorture: Avoid no-test complaint if too few forward-progress tries
In a too-short test, random delays can cause each attempt to do
forward-progress testing to fail to complete, thus resulting in
spurious splats.  This commit therefore requires at least five tries
before complaining about rcutorture runs that failed to produce at
least one valid forward-progress testing attempt.  Note that actual
forward-progress failures will splat regardless of the number of tries.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
119248bec9 rcutorture: Also use GP sequence to judge forward progress
Currently, rcutorture relies solely on the progress of
rcu_torture_writer() to judge grace-period forward progress.  In theory,
this is the gold standard of forward progress, but in practice rcutorture
separately detects and reports rcu_torture_writer() stalls.  This commit
therefore adds the grace-period sequence number (when provided) to the
judgment of grace-period forward progress, which makes it easier to
distinguish between failure of actual grace periods to progress on the
one hand and downstream forward-progress failures on the other.

For example, given this change, if rcu_torture_writer() stalls,
but rcu_torture_fwd_prog() does not complain, then the grace-period
computation is working, which is a hint that the failure lies in callback
processing, wakeup of the rcu_torture_writer() kthread, or similar.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
1b27291b1e rcutorture: Add forward-progress tests for RCU grace periods
This commit adds a kthread that loops going into and out of RCU
read-side critical sections, but also including a cond_resched(),
optionally guarded by a check of need_resched(), in that same loop.
This commit relies solely on rcu_torture_writer() progress to judge
the forward progress of grace periods.

Note that Tasks RCU and SRCU are exempted from forward-progress testing
due their (intentionally) less-robust forward-progress guarantees.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
f028806442 rcuperf: Warn on bad perf type for built-in tests
When running a built-in rcuperf test, specifying an invalid perf type
results in what looks like a hard hang, with the error messages hidden
by other boot-time output.  This commit therefore executes a WARN_ON()
in this case so that the splat appears just following the error messages.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
e746b55857 rcutorture: Warn on bad torture type for built-in tests
When running a built-in rcutorture test, specifying an invalid torture
type results in what looks like a hard hang, with the error messages
hidden by other boot-time output.  This commit therefore executes a
WARN_ON() in this case so that the splat appears just following the
error messages.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
444da518fd rcutorture: Force occasional reader waits
Deferred quiescent states can interact with the scheduler, but
rcu_torture_reader() does not force such interaction all that frequently.
This commit therefore blocks for one jiffy after ten jiffies of read-side
runtime.  This has the beneficial effect of being most likely to block
just after long-running readers, and it is exactly these readers that
are most likely to have been preempted (in CONFIG_PREEMPT=y kernels).
This in turn helps increase the probability that a deferred quiescent
state will be seen by RCU's context-switch hooks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
YueHaibing
efbaec89c6 bpf: remove duplicated include from syscall.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-29 17:28:00 +02:00
Arnd Bergmann
49c39f8464 y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
This changes sys_rt_sigtimedwait() to use get_timespec64(), changing
the timeout type to __kernel_timespec, which will be changed to use
a 64-bit time_t in the future. Since the do_sigtimedwait() core
function changes, we also have to modify the compat version of this
system call in the same way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29 15:42:25 +02:00
Arnd Bergmann
474b9c777b y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
This is a preparation patch for converting sys_sched_rr_get_interval to
work with 64-bit time_t on 32-bit architectures. The 'interval' argument
is changed to struct __kernel_timespec, which will be redefined using
64-bit time_t in the future. The compat version of the system call in
turn is enabled for compilation with CONFIG_COMPAT_32BIT_TIME so
the individual 32-bit architectures can share the handling of the
traditional argument with 64-bit architectures providing it for their
compat mode.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29 15:42:24 +02:00
kbuild test robot
743f5cdb6c y2038: __get_old_timespec32() can be static
The kbuild test robot reports two new warnings with the previous
patch:

 kernel/time/time.c:866:5: sparse: symbol '__get_old_timespec32' was not declared. Should it be static?
 kernel/time/time.c:882:5: sparse: symbol '__put_old_timespec32' was not declared. Should it be static?

These are actually older bugs, but came up now after the
symbol got renamed. Fortunately, commit afef05cf23 ("time:
Enable get/put_compat_itimerspec64 always") makes the two functions
(__compat_get_timespec64/__compat_get_timespec64) local to time.c already,
so we can mark them as 'static'.

Fixes: ee16c8f415e4 ("y2038: Globally rename compat_time to old_time32")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
[arnd: added changelog text]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29 15:42:19 +02:00
John Fastabend
501ca81760 bpf: sockmap, decrement copied count correctly in redirect error case
Currently, when a redirect occurs in sockmap and an error occurs in
the redirect call we unwind the scatterlist once in the error path
of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then
in the error path of sendmsg we decrement the copied count by the
send size.

However, its possible we partially sent data before the error was
generated. This can happen if do_tcp_sendpages() partially sends the
scatterlist before encountering a memory pressure error. If this
happens we need to decrement the copied value (the value tracking
how many bytes were actually sent to TCP stack) by the number of
remaining bytes _not_ the entire send size. Otherwise we risk
confusing userspace.

Also we don't need two calls to free the scatterlist one is
good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and
then properly reduce copied by the number of remaining bytes which
may in fact be the entire send size if no bytes were sent.

To do this use bool to indicate if free_start_sg() should do mem
accounting or not.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-28 09:01:06 +02:00
Daniel Borkmann
15c480efab bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg
In bpf_tcp_recvmsg() we first took a reference on the psock, however
once we find that there are skbs in the normal socket's receive queue
we return with processing them through tcp_recvmsg(). Problem is that
we leak the taken reference on the psock in that path. Given we don't
really do anything with the psock at this point, move the skb_queue_empty()
test before we fetch the psock to fix this case.

Fixes: 8934ce2fd0 ("bpf: sockmap redirect ingress support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-27 20:22:05 -07:00
Daniel Borkmann
e06fa9c16c bpf, sockmap: fix potential use after free in bpf_tcp_close
bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop().
A parallel update on the sock hash map can happen between psock_map_pop()
and lookup_elem_raw() where we override the element under link->hash /
link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only
test whether an element is present, but we do not test whether the
element is infact the element we were looking for.

We lock the sock in bpf_tcp_close() during that time, so do we hold
the lock in sock_hash_update_elem(). However, the latter locks the
sock which is newly updated, not the one we're purging from the hash
table. This means that while one CPU is doing the lookup from bpf_tcp_close(),
another CPU is doing the map update in parallel, dropped our sock from
the hlist and released the psock.

Subsequently the first CPU will find the new sock and attempts to drop
and release the old sock yet another time. Fix is that we need to check
the elements for a match after lookup, similar as we do in the sock map.
Note that the hash tab elems are freed via RCU, so access to their
link->hash / link->key is fine since we're under RCU read side there.

Fixes: e9db4ef6bf ("bpf: sockhash fix omitted bucket lock in sock_close")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-27 20:22:05 -07:00
Linus Torvalds
050cdc6c95 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks.

 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong
    Wang.

 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel
    Borkmann.

 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent
    attackers using it as a side-channel. From Eric Dumazet.

 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva.

 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin
    Liu.

 7) hns and hns3 bug fixes from Huazhong Tan.

 8) Fix RIF leak in mlxsw driver, from Ido Schimmel.

 9) iova range check fix in vhost, from Jason Wang.

10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend.

11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng.

12) Memory exposure in TCA_U32_SEL handling, from Kees Cook.

13) TCP BBR congestion control fixes from Kevin Yang.

14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger.

15) qed driver fixes from Tomer Tayar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  net: sched: Fix memory exposure from short TCA_U32_SEL
  qed: fix spelling mistake "comparsion" -> "comparison"
  vhost: correctly check the iova range when waking virtqueue
  qlge: Fix netdev features configuration.
  net: macb: do not disable MDIO bus at open/close time
  Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency"
  net: macb: Fix regression breaking non-MDIO fixed-link PHYs
  mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
  i40e: fix condition of WARN_ONCE for stat strings
  i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
  ixgbe: fix driver behaviour after issuing VFLR
  ixgbe: Prevent unsupported configurations with XDP
  ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
  igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
  igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
  igb: Use an advanced ctx descriptor for launchtime
  e1000: ensure to free old tx/rx rings in set_ringparam()
  e1000: check on netif_running() before calling e1000_up()
  ixgb: use dma_zalloc_coherent instead of allocator/memset
  ice: Trivial formatting fixes
  ...
2018-08-27 11:59:39 -07:00
Arnd Bergmann
9afc5eee65 y2038: globally rename compat_time to old_time32
Christoph Hellwig suggested a slightly different path for handling
backwards compatibility with the 32-bit time_t based system calls:

Rather than simply reusing the compat_sys_* entry points on 32-bit
architectures unchanged, we get rid of those entry points and the
compat_time types by renaming them to something that makes more sense
on 32-bit architectures (which don't have a compat mode otherwise),
and then share the entry points under the new name with the 64-bit
architectures that use them for implementing the compatibility.

The following types and interfaces are renamed here, and moved
from linux/compat_time.h to linux/time32.h:

old				new
---				---
compat_time_t			old_time32_t
struct compat_timeval		struct old_timeval32
struct compat_timespec		struct old_timespec32
struct compat_itimerspec	struct old_itimerspec32
ns_to_compat_timeval()		ns_to_old_timeval32()
get_compat_itimerspec64()	get_old_itimerspec32()
put_compat_itimerspec64()	put_old_itimerspec32()
compat_get_timespec64()		get_old_timespec32()
compat_put_timespec64()		put_old_timespec32()

As we already have aliases in place, this patch addresses only the
instances that are relevant to the system call interface in particular,
not those that occur in device drivers and other modules. Those
will get handled separately, while providing the 64-bit version
of the respective interfaces.

I'm not renaming the timex, rusage and itimerval structures, as we are
still debating what the new interface will look like, and whether we
will need a replacement at all.

This also doesn't change the names of the syscall entry points, which can
be done more easily when we actually switch over the 32-bit architectures
to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-27 14:48:48 +02:00
Arnd Bergmann
33e2641819 y2038: make do_gettimeofday() and get_seconds() inline
get_seconds() and do_gettimeofday() are only used by a few modules now any
more (waiting for the respective patches to get accepted), and they are
among the last holdouts of code that is not y2038 safe in the core kernel.

Move the implementation into the timekeeping32.h header to clean up
the core kernel and isolate the old interfaces further.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-27 14:45:58 +02:00
Arnd Bergmann
976516404f y2038: remove unused time interfaces
After many small patches, at least some of the deprecated interfaces
have no remaining users any more and can be removed:

  current_kernel_time
  do_settimeofday
  get_monotonic_boottime
  get_monotonic_boottime64
  get_monotonic_coarse
  get_monotonic_coarse64
  getrawmonotonic64
  ktime_get_real_ts
  timekeeping_clocktai
  timespec_trunc
  timespec_valid_strict
  time_to_tm

For many of the remaining time functions, we are missing one or
two patches that failed to make it into 4.19, they will be removed
in the following merge window.

The replacement functions for the removed interfaces are documented in
Documentation/core-api/timekeeping.rst.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-27 12:59:56 +02:00
Linus Torvalds
d207ea8e74 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Kernel:
   - Improve kallsyms coverage
   - Add x86 entry trampolines to kcore
   - Fix ARM SPE handling
   - Correct PPC event post processing

  Tools:
   - Make the build system more robust
   - Small fixes and enhancements all over the place
   - Update kernel ABI header copies
   - Preparatory work for converting libtraceevnt to a shared library
   - License cleanups"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
  tools arch x86: Update tools's copy of cpufeatures.h
  perf python: Fix pyrf_evlist__read_on_cpu() interface
  perf mmap: Store real cpu number in 'struct perf_mmap'
  perf tools: Remove ext from struct kmod_path
  perf tools: Add gzip_is_compressed function
  perf tools: Add lzma_is_compressed function
  perf tools: Add is_compressed callback to compressions array
  perf tools: Move the temp file processing into decompress_kmodule
  perf tools: Use compression id in decompress_kmodule()
  perf tools: Store compression id into struct dso
  perf tools: Add compression id into 'struct kmod_path'
  perf tools: Make is_supported_compression() static
  perf tools: Make decompress_to_file() function static
  perf tools: Get rid of dso__needs_decompress() call in __open_dso()
  perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble()
  perf tools: Get rid of dso__needs_decompress() call in read_object_code()
  tools lib traceevent: Change to SPDX License format
  perf llvm: Allow passing options to llc in addition to clang
  perf parser: Improve error message for PMU address filters
  ...
2018-08-26 11:25:21 -07:00
Linus Torvalds
a9ce323378 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull licking update from Thomas Gleixner:
 "Mark the switch cases which fall through to the next case with the
  proper comment so the fallthrough compiler checks can be enabled"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Mark expected switch fall-throughs
2018-08-26 09:47:00 -07:00
Linus Torvalds
2923b27e54 Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm memory-failure update from Dave Jiang:
 "As it stands, memory_failure() gets thoroughly confused by dev_pagemap
  backed mappings. The recovery code has specific enabling for several
  possible page states and needs new enabling to handle poison in dax
  mappings.

  In order to support reliable reverse mapping of user space addresses:

   1/ Add new locking in the memory_failure() rmap path to prevent races
      that would typically be handled by the page lock.

   2/ Since dev_pagemap pages are hidden from the page allocator and the
      "compound page" accounting machinery, add a mechanism to determine
      the size of the mapping that encompasses a given poisoned pfn.

   3/ Given pmem errors can be repaired, change the speculatively
      accessed poison protection, mce_unmap_kpfn(), to be reversible and
      otherwise allow ongoing access from the kernel.

  A side effect of this enabling is that MADV_HWPOISON becomes usable
  for dax mappings, however the primary motivation is to allow the
  system to survive userspace consumption of hardware-poison via dax.
  Specifically the current behavior is:

     mce: Uncorrected hardware memory error in user-access at af34214200
     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
     mce: [Hardware Error]: Machine check events logged
     {1}[Hardware Error]: event severity: corrected
     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
     [..]
     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
     mce: Memory error not recovered
     <reboot>

  ...and with these changes:

     Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
     Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
     Memory failure: 0x20cb00: recovery action for dax page: Recovered

  Given all the cross dependencies I propose taking this through
  nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax
  folks"

* tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, pmem: Restore page attributes when clearing errors
  x86/memory_failure: Introduce {set, clear}_mce_nospec()
  x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses
  mm, memory_failure: Teach memory_failure() about dev_pagemap pages
  filesystem-dax: Introduce dax_lock_mapping_entry()
  mm, memory_failure: Collect mapping size in collect_procs()
  mm, madvise_inject_error: Let memory_failure() optionally take a page reference
  mm, dev_pagemap: Do not clear ->mapping on final put
  mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages
  filesystem-dax: Set page->index
  device-dax: Set page->index
  device-dax: Enable page_mapping()
  device-dax: Convert to vmf_insert_mixed and vm_fault_t
2018-08-25 18:43:59 -07:00
Linus Torvalds
596766102a Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Just one commit from Steven to take out spin lock from trace event
  handlers"

* 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/tracing: Move taking of spin lock out of trace event handlers
2018-08-24 13:19:27 -07:00
Linus Torvalds
9022ada8ab Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Over the lockdep cross-release churn, workqueue lost some of the
  existing annotations. Johannes Berg restored it and also improved
  them"

* 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: re-add lockdep dependencies for flushing
  workqueue: skip lockdep wq dependency in cancel_work_sync()
2018-08-24 13:16:36 -07:00
Linus Torvalds
4def196360 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
 "This is a set of four fairly obvious bug fixes:

   - a switch from d_find_alias to d_find_any_alias because the xattr
     code perversely takes a dentry

   - two mutex vs copy_to_user fixes from Jann Horn

   - a fix to use a sanitized size not the size userspace passed in from
     Christian Brauner"

* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  getxattr: use correct xattr length
  sys: don't hold uts_sem while accessing userspace memory
  userns: move user access out of the mutex
  cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
2018-08-24 09:25:39 -07:00
Linus Torvalds
33e17876ea Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - the rest of MM

 - various misc fixes and tweaks

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  mm: Change return type int to vm_fault_t for fault handlers
  lib/fonts: convert comments to utf-8
  s390: ebcdic: convert comments to UTF-8
  treewide: convert ISO_8859-1 text comments to utf-8
  drivers/gpu/drm/gma500/: change return type to vm_fault_t
  docs/core-api: mm-api: add section about GFP flags
  docs/mm: make GFP flags descriptions usable as kernel-doc
  docs/core-api: split memory management API to a separate file
  docs/core-api: move *{str,mem}dup* to "String Manipulation"
  docs/core-api: kill trailing whitespace in kernel-api.rst
  mm/util: add kernel-doc for kvfree
  mm/util: make strndup_user description a kernel-doc comment
  fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds
  treewide: correct "differenciate" and "instanciate" typos
  fs/afs: use new return type vm_fault_t
  drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t
  mm: soft-offline: close the race against page allocation
  mm: fix race on soft-offlining free huge pages
  namei: allow restricted O_CREAT of FIFOs and regular files
  hfs: prevent crash on exit from failed search
  ...
2018-08-23 19:20:12 -07:00
Souptick Joarder
2b74030354 mm: Change return type int to vm_fault_t for fault handlers
Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059 ("mm: change return type to vm_fault_t")

The aim is to change the return type of finish_fault() and
handle_mm_fault() to vm_fault_t type.  As part of that clean up return
type of all other recursively called functions have been changed to
vm_fault_t type.

The places from where handle_mm_fault() is getting invoked will be
change to vm_fault_t type but in a separate patch.

vmf_error() is the newly introduce inline function in 4.17-rc6.

[akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()]
Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23 18:48:44 -07:00