Commit Graph

26840 Commits

Author SHA1 Message Date
Paul E. McKenney
edf22f4ca2 softirq: Eliminate cond_resched_rcu_qs() in favor of cond_resched()
Now that cond_resched() also provides RCU quiescent states when
needed, it can be used in place of cond_resched_rcu_qs().  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
2017-12-04 10:28:58 -08:00
Paul E. McKenney
e31d28b6ab trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()
Now that cond_resched() also provides RCU quiescent states when
needed, it can be used in place of cond_resched_rcu_qs().  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
2017-12-04 10:28:58 -08:00
Paul E. McKenney
a7e6425ea5 workqueue: Eliminate cond_resched_rcu_qs() in favor of cond_resched()
Now that cond_resched() also provides RCU quiescent states when
needed, it can be used in place of cond_resched_rcu_qs().  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
2017-12-04 10:28:10 -08:00
Mark Brown
4ab53fe612 PM: Provide a config snippet for disabling PM
A frequent source of build problems is poor handling of optional PM
support: almost all development is done with the PM options enabled,
but they can be turned off.  Currently few, if any, of the build test
services do this as standard, because there is no standard config for
it and the use of selects and def_bool means that simply setting
CONFIG_PM=n doesn't do what is expected.  To make this easier, provide
a fragment that can be used with KCONFIG_ALLCONFIG to force PM off.

CONFIG_XEN is disabled as Xen uses hibernation callbacks which end up
turning on power management on architectures with Xen.  Some cpuidle
implementations on ARM select PM so CONFIG_CPU_IDLE is disabled, and
some ARM architectures unconditionally enable PM so they are also
disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-04 15:14:48 +01:00
Felipe Balbi
a773d41927 tracing: Pass export pointer as argument to ->write()
By passing an export descriptor to the write function, users don't need to
keep a global static pointer and can rely on container_of() to fetch their
own structure.
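
As a rough userspace illustration (not the kernel's code), the pattern looks like this. The exporter embeds the descriptor in its own structure, and ->write() recovers the outer structure with container_of(). The struct layouts, my_exporter and export_buffer() are simplified or hypothetical. container_of() is reimplemented with offsetof() because kernel headers are not available in userspace.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified export descriptor: ->write() now receives the descriptor. */
struct trace_export {
	void (*write)(struct trace_export *export, const void *buf,
		      unsigned int len);
};

/* Hypothetical user structure that embeds the descriptor. */
struct my_exporter {
	unsigned int bytes_written;
	struct trace_export export;
};

static void my_write(struct trace_export *export, const void *buf,
		     unsigned int len)
{
	/* No global static pointer needed: recover the outer structure. */
	struct my_exporter *me =
		container_of(export, struct my_exporter, export);

	(void)buf;
	me->bytes_written += len;
}

/* Hypothetical stand-in for the tracing core handing data to an exporter. */
static void export_buffer(struct trace_export *export, const void *buf,
			  unsigned int len)
{
	export->write(export, buf, len);
}
```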

Link: http://lkml.kernel.org/r/20170602102025.5140-1-felipe.balbi@linux.intel.com

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-12-04 07:14:30 -05:00
Matthias Kaehlcke
c4bfd39d7f ring-buffer: Remove unused function __rb_data_page_index()
This fixes the following warning when building with clang:

kernel/trace/ring_buffer.c:1842:1: error: unused function
    '__rb_data_page_index' [-Werror,-Wunused-function]

Link: http://lkml.kernel.org/r/20170518001415.5223-1-mka@chromium.org

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-12-04 07:04:01 -05:00
Arnd Bergmann
2dde6b0034 tracing: make PREEMPTIRQ_EVENTS depend on TRACING
When CONFIG_TRACING is disabled, the new preemptirq events tracer
produces a build failure:

In file included from kernel/trace/trace_irqsoff.c:17:0:
kernel/trace/trace.h: In function 'trace_test_and_set_recursion':
kernel/trace/trace.h:542:28: error: 'struct task_struct' has no member named 'trace_recursion'

Adding an explicit dependency avoids the broken configuration.

Link: http://lkml.kernel.org/r/20171103104031.270375-1-arnd@arndb.de

Fixes: d59158162e ("tracing: Add support for preempt and irq enable/disable events")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-12-04 06:52:09 -05:00
Changbin Du
90e406f96f tracing: Allocate mask_str buffer dynamically
The default NR_CPUS can be very large, but the actual number of possible
CPUs (nr_cpu_ids) is usually very small. On my x86 distribution, NR_CPUS
is 8192 and nr_cpu_ids is 4, so about 2 pages are wasted.

Most machines don't have that many CPUs, so defining an array with NR_CPUS
entries just wastes memory. Allocate the buffer dynamically when needed
instead.

With this change, the mutex tracing_cpumask_update_lock, which was used
to protect mask_str, can also be removed.
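
A userspace sketch of the sizing idiom: measure the output with snprintf(NULL, 0, ...), then allocate exactly that many bytes. The sketch assumes a mask that fits in one unsigned long long and is printed as plain hex. The kernel formats its cpumask with its own bitmap printing and allocates with kmalloc(), so treat format_mask() as hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a CPU mask into a freshly allocated buffer sized for the
 * actual output rather than for a worst-case NR_CPUS-sized array.
 * Returns NULL on failure; the caller frees the result.
 */
static char *format_mask(unsigned long long mask)
{
	/* First pass: measure.  snprintf(NULL, 0, ...) returns the length. */
	int n = snprintf(NULL, 0, "%llx\n", mask);
	char *buf;

	if (n < 0)
		return NULL;
	buf = malloc((size_t)n + 1);	/* +1 for the terminating NUL */
	if (!buf)
		return NULL;
	/* Second pass: format into a buffer of exactly the right size. */
	snprintf(buf, (size_t)n + 1, "%llx\n", mask);
	return buf;
}
```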

Link: http://lkml.kernel.org/r/1512013183-19107-1-git-send-email-changbin.du@intel.com

Fixes: 36dfe9252b ("ftrace: make use of tracing_cpumask")
Cc: stable@vger.kernel.org
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-12-04 06:52:08 -05:00
Chunyu Hu
5a93bae2c3 tracing: Fix code comments in trace.c
The names used in the code comments for tracing_snapshot,
tracing_snapshot_alloc and trace_pid_filter_add_remove_task don't match
the real function names, and latency_trace has been removed from the
tracing directory.  Fix them.

Link: http://lkml.kernel.org/r/1508394753-20887-1-git-send-email-chuhu@redhat.com

Fixes: cab5037 ("tracing/ftrace: Enable snapshot function trigger")
Fixes: 886b5b7 ("tracing: remove /debug/tracing/latency_trace")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
[ Replaced /sys/kernel/debug/tracing with /sys/kernel/tracing ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-12-04 06:52:07 -05:00
David S. Miller
c2eb6d07a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a compilation warning in xdp redirect tracepoint due to
   missing bpf.h include that pulls in struct bpf_map, from Xie.

2) Limit the maximum number of attachable BPF progs for a given
   perf event while the uabi is not yet frozen. The hard upper
   limit is now 64 and therefore the same as with BPF multi-prog
   for cgroups. Also add related error checking for the sample
   BPF loader when enabling and attaching to the perf event, from
   Yonghong.

3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
   case, so that the test case can always pass and not fail in
   some environments due to too low default limit, also from
   Yonghong.

4) Fix up a missing license header comment for kernel/bpf/offload.c,
   from Jakub.

5) Several fixes for bpftool, among others a crash on incorrect
   arguments when json output is used, error message handling
   fixes on unknown options and proper destruction of json writer
   for some exit cases, all from Quentin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 13:08:30 -05:00
Linus Torvalds
75f64f68af Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A selection of fixes/changes that should make it into this series.
  This contains:

   - NVMe, two merges, containing:
        - pci-e, rdma, and fc fixes
        - Device quirks

   - Fix for a badblocks leak in null_blk

   - bcache fix from Rui Hua for a race condition regression where
     -EINTR was returned to upper layers that didn't expect it.

   - Regression fix for blktrace for a bug introduced in this series.

   - blktrace cleanup for cgroup id.

   - bdi registration error handling.

   - Small series with cleanups for blk-wbt.

   - Various little fixes for typos and the like.

  Nothing earth shattering, most important are the NVMe and bcache fixes"

* 'for-linus' of git://git.kernel.dk/linux-block: (34 commits)
  nvme-pci: fix NULL pointer dereference in nvme_free_host_mem()
  nvme-rdma: fix memory leak during queue allocation
  blktrace: fix trace mutex deadlock
  nvme-rdma: Use mr pool
  nvme-rdma: Check remotely invalidated rkey matches our expected rkey
  nvme-rdma: wait for local invalidation before completing a request
  nvme-rdma: don't complete requests before a send work request has completed
  nvme-rdma: don't suppress send completions
  bcache: check return value of register_shrinker
  bcache: recover data from backing when data is clean
  bcache: Fix building error on MIPS
  bcache: add a comment in journal bucket reading
  nvme-fc: don't use bit masks for set/test_bit() numbers
  blk-wbt: fix comments typo
  blk-wbt: move wbt_clear_stat to common place in wbt_done
  blk-sysfs: remove NULL pointer checking in queue_wb_lat_store
  blk-wbt: remove duplicated setting in wbt_init
  nvme-pci: add quirk for delay before CHK RDY for WDC SN200
  block: remove useless assignment in bio_split
  null_blk: fix dev->badblocks leak
  ...
2017-12-01 08:05:45 -05:00
Alexei Starovoitov
914cb781ee bpf: cleanup register_is_null()
Don't pass the large struct bpf_reg_state by value;
pass it by pointer instead.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 11:25:10 +01:00
Alexei Starovoitov
3bf15921c5 bpf: improve JEQ/JNE path walking
The verifier knows how to trim paths that are known not to be
taken at run time when a register containing a run-time constant
is compared with another constant.
This was done only for JEQ comparisons.
Extend it to include JNE as well.
More cases can be added in the future.

                     before  after
bpf_lb-DLB_L3.o       2270    2051
bpf_lb-DLB_L4.o       3682    3287
bpf_lb-DUNKNOWN.o     1110    1080
bpf_lxc-DDROP_ALL.o   27876   24980
bpf_lxc-DUNKNOWN.o    38780   34308
bpf_netdev.o          16937   15404
bpf_overlay.o         7929    7191
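
Below is a simplified sketch of the decision being extended. It assumes a register is either a known constant or fully unknown, while the real verifier tracks much richer state. The enums and branch_taken() are hypothetical names used for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

enum branch_outcome { BRANCH_UNKNOWN, BRANCH_ALWAYS, BRANCH_NEVER };
enum jmp_op { OP_JEQ, OP_JNE };

/*
 * Decide whether "if (reg OP imm) goto target" is taken.  When the
 * register holds a known constant, only one path needs to be walked.
 */
static enum branch_outcome branch_taken(bool reg_is_const, uint64_t reg_val,
					enum jmp_op op, uint64_t imm)
{
	if (!reg_is_const)
		return BRANCH_UNKNOWN;	/* both paths must be explored */

	switch (op) {
	case OP_JEQ:
		return reg_val == imm ? BRANCH_ALWAYS : BRANCH_NEVER;
	case OP_JNE:	/* the case this commit adds */
		return reg_val != imm ? BRANCH_ALWAYS : BRANCH_NEVER;
	}
	return BRANCH_UNKNOWN;
}
```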

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 11:25:10 +01:00
Alexei Starovoitov
2f18f62ee1 bpf: improve verifier liveness marks
Registers with pointers filled from the stack were missing live_written marks,
which caused liveness propagation to unnecessarily mark more registers as
live_read and to miss state pruning opportunities later on.

                     before  after
bpf_lb-DLB_L3.o       2285   2270
bpf_lb-DLB_L4.o       3723   3682
bpf_lb-DUNKNOWN.o     1110   1110
bpf_lxc-DDROP_ALL.o   27954  27876
bpf_lxc-DUNKNOWN.o    38954  38780
bpf_netdev.o          16943  16937
bpf_overlay.o         7929   7929

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 11:25:10 +01:00
Alexei Starovoitov
19ceb4178d bpf: don't mark FP reg as uninit
When the verifier hits an internal bug, don't mark register R10==FP as uninit:
it is a read-only register, and it is not technically correct to let the
verifier run further, since it may assume that R10 has valid auxiliary state.

This issue was discovered while developing subsequent patches. Although the
code eventually changed so that aux reg state no longer holds pointers, it
is still safer to avoid clearing a read-only register.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 11:25:10 +01:00
Alexei Starovoitov
4e92024a48 bpf: print liveness info to verifier log
Let the verifier print register and stack liveness information
into the verifier log.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 11:25:10 +01:00
Alexei Starovoitov
12a3cc8424 bpf: fix stack state printing in verifier log
fix incorrect stack state prints in print_verifier_state()

Fixes: 638f5b90d4 ("bpf: reduce verifier memory consumption")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 11:25:10 +01:00
Yonghong Song
c8c088ba0e bpf: set maximum number of attached progs to 64 for a single perf tp
The cgroup+bpf prog array has a maximum of 64 programs.
Let us apply the same limit here.
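
A minimal sketch of such a cap, assuming a fixed-size array of attached programs. The structure and function names are hypothetical. Returning -E2BIG on overflow is an assumption about the error code; the message above does not state it.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ATTACHED_PROGS 64	/* the limit named in the message above */

/* Hypothetical per-event program list. */
struct prog_array {
	size_t count;
	const void *progs[MAX_ATTACHED_PROGS];
};

/* Attach one more program; refuse once the limit has been reached. */
static int prog_array_attach(struct prog_array *arr, const void *prog)
{
	if (arr->count >= MAX_ATTACHED_PROGS)
		return -E2BIG;	/* assumed error code */
	arr->progs[arr->count++] = prog;
	return 0;
}
```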

Fixes: e87c6bc385 ("bpf: permit multiple bpf attachments for a single perf event")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 02:56:10 +01:00
Linus Torvalds
668533dc07 kallsyms: take advantage of the new '%px' format
The conditional kallsym hex printing used a special fixed-width '%lx'
output (KALLSYM_FMT) in preparation for the hashing of %p, but that
series ended up adding a %px specifier to help with the conversions.

Use it, and avoid the "print pointer as an unsigned long" code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 10:30:13 -08:00
Ingo Molnar
6e948c67c4 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf tooling fixes from Arnaldo Carvalho de Melo:

"- Fix window dimensions change handling in 'perf top' (Jiri Olsa)

- Fix 'perf record -c/-F' options for CPU event aliases (Andi Kleen)

- Generate PERF_RECORD_{MMAP,COMM,EXEC} with 'perf record --delay'
  fixing symbol resolution for processes created, maps put in place
  while --delay happens (Arnaldo Carvalho de Melo)

- Fix up leftover perf_evsel_stat usage via evsel->priv, plugging
  a SEGV when using event groups as in:

     $ perf stat -e '{cpu-clock,instructions}' workload

- Fix 'perf script --per-event-dump' for auxtrace synth evsels (Arnaldo Carvalho de Melo)

- Ignore kptr_restrict when not sampling the kernel (Arnaldo Carvalho de Melo)

- Synchronize kernel ABI headers wrt SPDX tags and ABI changes,
  taking minimal action to handle new syscall args and silencing
  perf build warnings (Arnaldo Carvalho de Melo, Ingo Molnar)

- Fix header.size for namespace events (Jiri Olsa)

- Fix a bug during strstart() conversion in 'perf help' (Namhyung Kim)

- Do not truncate instruction names at 6 chars in 'perf annotate', there
  are really long instruction names in PPC (Ravi Bangoria)

- Fixup discontiguous/sparse numa nodes in 'perf bench numa' (Satheesh Rajendran)

- Fix an exit code of trace__symbols_init in 'perf trace' (Andrei Vagin)

- Fix 'perf test' entries on s/390 (Thomas Richter)

- Bring instruction decoder files used by Intel PT into line with the kernel,
  silencing build warning (Adrian Hunter)"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-29 07:15:09 +01:00
Ingo Molnar
4fc31ba13d Merge branch 'linus' into perf/urgent, to pick up dependent commits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-29 07:11:24 +01:00
Paul E. McKenney
2fe2582649 sched: Stop switched_to_rt() from sending IPIs to offline CPUs
The rcutorture test suite occasionally provokes a splat due to invoking
rt_mutex_lock() which needs to boost the priority of a task currently
sitting on a runqueue that belongs to an offline CPU:

WARNING: CPU: 0 PID: 12 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40
Modules linked in:
CPU: 0 PID: 12 Comm: rcub/7 Not tainted 4.14.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff9ed3de5f8cc0 task.stack: ffffbbf80012c000
RIP: 0010:native_smp_send_reschedule+0x37/0x40
RSP: 0018:ffffbbf80012fd10 EFLAGS: 00010082
RAX: 000000000000002f RBX: ffff9ed3dd9cb300 RCX: 0000000000000004
RDX: 0000000080000004 RSI: 0000000000000086 RDI: 00000000ffffffff
RBP: ffffbbf80012fd10 R08: 000000000009da7a R09: 0000000000007b9d
R10: 0000000000000001 R11: ffffffffbb57c2cd R12: 000000000000000d
R13: ffff9ed3de5f8cc0 R14: 0000000000000061 R15: ffff9ed3ded59200
FS:  0000000000000000(0000) GS:ffff9ed3dea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000080686f0 CR3: 000000001b9e0000 CR4: 00000000000006f0
Call Trace:
 resched_curr+0x61/0xd0
 switched_to_rt+0x8f/0xa0
 rt_mutex_setprio+0x25c/0x410
 task_blocks_on_rt_mutex+0x1b3/0x1f0
 rt_mutex_slowlock+0xa9/0x1e0
 rt_mutex_lock+0x29/0x30
 rcu_boost_kthread+0x127/0x3c0
 kthread+0x104/0x140
 ? rcu_report_unblock_qs_rnp+0x90/0x90
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x22/0x30
Code: f0 00 0f 92 c0 84 c0 74 14 48 8b 05 34 74 c5 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 a0 c6 fc b9 e8 d5 b5 06 00 <0f> ff 5d c3 0f 1f 44 00 00 8b 05 a2 d1 13 02 85 c0 75 38 55 48

But the target task's priority has already been adjusted, so the only
purpose of switched_to_rt() invoking resched_curr() is to wake up the
CPU running some task that needs to be preempted by the boosted task.
But the CPU is offline, which presumably means that the task must be
migrated to some other CPU, and that this other CPU will undertake any
needed preemption at the time of migration.  Because the runqueue lock
is held when resched_curr() is invoked, we know that the boosted task
cannot go anywhere, so it is not necessary to invoke resched_curr()
in this particular case.

This commit therefore makes switched_to_rt() refrain from invoking
resched_curr() when the target CPU is offline.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2017-11-28 16:00:27 -08:00
Paul E. McKenney
a0982dfa03 sched: Stop resched_cpu() from sending IPIs to offline CPUs
The rcutorture test suite occasionally provokes a splat due to invoking
resched_cpu() on an offline CPU:

WARNING: CPU: 2 PID: 8 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40
Modules linked in:
CPU: 2 PID: 8 Comm: rcu_preempt Not tainted 4.14.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff902ede9daf00 task.stack: ffff96c50010c000
RIP: 0010:native_smp_send_reschedule+0x37/0x40
RSP: 0018:ffff96c50010fdb8 EFLAGS: 00010096
RAX: 000000000000002e RBX: ffff902edaab4680 RCX: 0000000000000003
RDX: 0000000080000003 RSI: 0000000000000000 RDI: 00000000ffffffff
RBP: ffff96c50010fdb8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 00000000299f36ae R12: 0000000000000001
R13: ffffffff9de64240 R14: 0000000000000001 R15: ffffffff9de64240
FS:  0000000000000000(0000) GS:ffff902edfc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f7d4c642 CR3: 000000001e0e2000 CR4: 00000000000006e0
Call Trace:
 resched_curr+0x8f/0x1c0
 resched_cpu+0x2c/0x40
 rcu_implicit_dynticks_qs+0x152/0x220
 force_qs_rnp+0x147/0x1d0
 ? sync_rcu_exp_select_cpus+0x450/0x450
 rcu_gp_kthread+0x5a9/0x950
 kthread+0x142/0x180
 ? force_qs_rnp+0x1d0/0x1d0
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Code: 14 01 0f 92 c0 84 c0 74 14 48 8b 05 14 4f f4 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 38 89 ca 9d e8 e5 56 08 00 <0f> ff 5d c3 0f 1f 44 00 00 8b 05 52 9e 37 02 85 c0 75 38 55 48
---[ end trace 26df9e5df4bba4ac ]---

This splat cannot be generated by expedited grace periods because they
always invoke resched_cpu() on the current CPU, which is good because
expedited grace periods require that resched_cpu() unconditionally
succeed.  However, other parts of RCU can tolerate resched_cpu() acting
as a no-op, at least as long as it doesn't happen too often.

This commit therefore makes resched_cpu() invoke resched_curr() only if
the CPU is either online or is the current CPU.
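
The resulting policy can be sketched as a simple predicate. This is a userspace simulation: a hypothetical online-CPU table stands in for the kernel's cpu_online(). It only models whether a reschedule request would be sent. It does not model resched_curr() itself.

```c
#include <stdbool.h>

#define NR_SIM_CPUS 4	/* hypothetical size of the simulated system */

/* Simulated online mask: CPU 2 is offline. */
static const bool cpu_online_sim[NR_SIM_CPUS] = { true, true, false, true };

/*
 * Sketch of the new resched_cpu() guard: only poke the target CPU
 * when it is online or is the CPU we are running on.  Returns true
 * when a reschedule would be requested, false for a no-op.
 */
static bool resched_cpu_allowed(int cpu, int this_cpu)
{
	return cpu == this_cpu || cpu_online_sim[cpu];
}
```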

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2017-11-28 16:00:26 -08:00
Paul E. McKenney
dac9590600 torture: Suppress CPU stall warnings during shutdown ftrace dump
The torture_shutdown() function directly invokes ftrace_dump(), which
can result in RCU CPU stall warnings when the ftrace buffer is large,
which it usually is.  This commit therefore invokes rcu_ftrace_dump()
in place of ftrace_dump(), suppressing RCU CPU stall warnings during
this time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:54:26 -08:00
Paul E. McKenney
d633198088 srcu: Prohibit call_srcu() use under raw spinlocks
Invoking queue_delayed_work() while holding a raw spinlock is forbidden
in -rt kernels, which is exactly what __call_srcu() does, indirectly via
srcu_funnel_gp_start().  This commit therefore downgrades Tree SRCU's
locking from raw to non-raw spinlocks, which works because call_srcu()
is not ever called while holding a raw spinlock.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:52:33 -08:00
Paul E. McKenney
e68bbb266d rcu: Simplify rcu_eqs_{enter,exit}() non-idle task debug code
The code that checks for non-idle non-nohz_idle-usermode tasks invoking
rcu_eqs_enter() and rcu_eqs_exit() prints a considerable quantity of
helpful information.  However, these checks fire rarely, so the extra
complexity is no longer worth it.  This commit therefore replaces this
debug code with simple WARN_ON_ONCE() statements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney
9dd238e286 rcu: Fold rcu_eqs_exit_common() into rcu_eqs_exit()
There is now only one call to rcu_eqs_exit_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney
215bba9f59 rcu: Fold rcu_eqs_enter_common() into rcu_eqs_enter()
There is now only one call to rcu_eqs_enter_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney
2342172fd6 rcu: Avoid ->dynticks_nesting store tearing
Although ->dynticks_nesting is updated only at process level, it is
accessed from hardirq to check for interrupt-from-idle quiescent states.
Store tearing is thus possible, so this commit applies WRITE_ONCE()
to ->dynticks_nesting stores.
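
A userspace approximation of the marked stores and loads, assuming GCC or Clang for __typeof__. The volatile-cast macros below are simplified versions in the spirit of the kernel's WRITE_ONCE() and READ_ONCE(). dynticks_nesting and its helpers are hypothetical stand-ins for the RCU field.

```c
/* Simplified marked-access macros: one full-width access, no tearing. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for ->dynticks_nesting. */
static long dynticks_nesting;

/*
 * Process-level updates.  The plain read on the right-hand side is
 * fine because this is the only writer; the store is marked because
 * a concurrent reader (e.g. an interrupt handler) may observe it.
 */
static void nesting_inc(void)
{
	WRITE_ONCE(dynticks_nesting, dynticks_nesting + 1);
}

static void nesting_dec(void)
{
	WRITE_ONCE(dynticks_nesting, dynticks_nesting - 1);
}

/* Reader that might run from interrupt context. */
static long nesting_read(void)
{
	return READ_ONCE(dynticks_nesting);
}
```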

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney
914955e18c rcu: Stop duplicating lockdep checks in RCU's idle-entry code
The three RCU_LOCKDEP_WARN() calls in rcu_eqs_enter_common() are
redundant with other lockdep checks, so this commit removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney
dec98900ea rcu: Add ->dynticks field to rcu_dyntick trace event
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney
84585aa8b6 rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long
Because the ->dynticks_nesting field now only contains the process-based
nesting level instead of a value encoding both the process nesting level
and the irq "nesting" level, we no longer need a long long, even on
32-bit systems.  This commit therefore changes both the ->dynticks_nesting
and ->dynticks_nmi_nesting fields to long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney
bd2b879a1c rcu: Add tracing to irq/NMI dyntick-idle transitions
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Tycho Andersen
26500475ac ptrace, seccomp: add support for retrieving seccomp metadata
With the new SECCOMP_FILTER_FLAG_LOG, we need to be able to extract these
flags for checkpoint restore, since they describe the state of a filter.

So, let's add PTRACE_SECCOMP_GET_METADATA, similar to ..._GET_FILTER, which
returns the metadata of the nth filter (right now, just the flags).
Hopefully this will be future proof, and new per-filter metadata can be
added to this struct.

Signed-off-by: Tycho Andersen <tycho@docker.com>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-28 15:41:01 -08:00
Tycho Andersen
f06eae831f seccomp: hoist out filter resolving logic
Hoist out the nth filter resolving logic that ptrace uses into a new
function. We'll use this in the next patch to implement the new
PTRACE_SECCOMP_GET_FILTER_FLAGS command.

Signed-off-by: Tycho Andersen <tycho@docker.com>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-28 15:36:01 -08:00
Jiri Olsa
34900ec5c9 perf: Fix header.size for namespace events
Reset the header size for namespace events; otherwise it only gets bigger
across ctx iterations.
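
A sketch of the bug pattern and its fix, with hypothetical structure and field names. A size that is incremented inside a per-context loop must be reset to its base value at the start of each iteration.

```c
#include <stddef.h>

struct event_header {
	size_t size;
};

/* Hypothetical event with a remembered base size. */
struct namespace_event {
	struct event_header header;
	size_t base_size;	/* size before per-context additions */
};

/*
 * Emit the event once per context.  Each iteration adds the same
 * extra bytes, so the header must start from base_size every time;
 * without the reset it would grow on every iteration.
 */
static void emit_for_each_ctx(struct namespace_event *ev, size_t extra,
			      size_t nr_ctx, size_t *sizes_out)
{
	for (size_t i = 0; i < nr_ctx; i++) {
		ev->header.size = ev->base_size;	/* the fix */
		ev->header.size += extra;
		sizes_out[i] = ev->header.size;
	}
}
```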

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: e422267322 ("perf: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Link: http://lkml.kernel.org/n/tip-nlo4gonz9d4guyb8153ukzt0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-28 14:27:05 -03:00
Al Viro
ecf927000c ring_buffer_poll_wait() return value used as return value of ->poll()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-28 11:07:12 -05:00
Lucas Stach
52cf373c37 cgroup: properly init u64_stats
Lockdep complains that the stats update is trying to register a non-static
key. This is because u64_stats are using a seqlock on 32bit arches, which
needs to be initialized before usage.

Fixes: 041cd640b2 (cgroup: Implement cgroup2 basic CPU usage accounting)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-28 07:16:08 -08:00
Lai Jiangshan
46febd37f9 smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
Commit 31487f8328 ("smp/cfd: Convert core to hotplug state machine")
accidentally put this step in the wrong place. The step should be in
cpuhp_ap_states[] rather than in cpuhp_bp_states[].

grep smpcfd /sys/devices/system/cpu/hotplug/states
 40: smpcfd:prepare
129: smpcfd:dying

"smpcfd:dying" was missing before.
So was the invocation of the function smpcfd_dying_cpu().

Fixes: 31487f8328 ("smp/cfd: Convert core to hotplug state machine")
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com
2017-11-28 14:40:23 +01:00
Jakub Kicinski
a39e17b2d8 bpf: offload: add a license header
I forgot to add a license on kernel/bpf/offload.c.  Luckily I'm
still the only author so make it explicitly GPLv2.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-27 22:24:51 +01:00
Al Viro
9dd957485d ipc, kernel, mm: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:20:05 -05:00
Wang Long
ddf7005f32 debug cgroup: use task_css_set instead of rcu_dereference
The `task_css_set` macro verifies that the caller is inside a
proper critical section when the kernel is built with CONFIG_PROVE_RCU=y.

Signed-off-by: Wang Long <wanglong19@meituan.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 11:37:33 -08:00
Jens Axboe
2967acbb25 blktrace: fix trace mutex deadlock
A previous commit changed the locking around registration/cleanup,
but direct callers of blk_trace_remove() were missed. This means
that if we hit the error path in setup, we will deadlock on
attempting to re-acquire the queue trace mutex.

Fixes: 1f2cac107c ("blktrace: fix unlocked access to init/start-stop/teardown")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-27 12:03:58 -07:00
Tal Shorer
c98a980509 workqueue: respect isolated cpus when queueing an unbound work
Initialize wq_unbound_cpumask to exclude cpus that were isolated by
the cmdline's isolcpus parameter.
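
The intended mask arithmetic is sketched below with CPU masks simplified to a single 64-bit word, where bit N is CPU N. For example, with CPUs 0-3 possible and isolcpus=2,3, unbound work would be limited to CPUs 0 and 1. default_unbound_mask() is a hypothetical helper, not the workqueue code.

```c
#include <stdint.h>

/*
 * Default mask for unbound work: every possible CPU except those
 * isolated on the kernel command line.
 */
static uint64_t default_unbound_mask(uint64_t possible, uint64_t isolated)
{
	return possible & ~isolated;
}
```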

Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:57:00 -08:00
Prateek Sood
1599a185f0 cpuset: Make cpuset hotplug synchronous
Convert cpuset_hotplug_workfn() into a synchronous call on the CPU hotplug
path. On the memory hotplug path it is still queued as a work item.

Since cpuset_hotplug_workfn() can be made synchronous on the CPU hotplug
path, it is no longer necessary to wait for the cpuset hotplug work while
thawing processes.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:48:10 -08:00
Prateek Sood
aa24163b2e cgroup/cpuset: remove circular dependency deadlock
Remove a circular dependency deadlock in a scenario where CPU hotplug
is in progress while cgroup and cpuset updates are being triggered from
userspace.

Process A => kthreadd => Process B => Process C => Process A

Process A
cpu_subsys_offline();
  cpu_down();
    _cpu_down();
      percpu_down_write(&cpu_hotplug_lock); //held
      cpuhp_invoke_callback();
        workqueue_offline_cpu();
          queue_work_on(); // unbind_work on system_highpri_wq
            __queue_work();
              insert_work();
                wake_up_worker();
          flush_work();
            wait_for_completion();

worker_thread();
  manage_workers();
    create_worker();
      kthread_create_on_node();
        wake_up_process(kthreadd_task);

kthreadd
kthreadd();
  kernel_thread();
    do_fork();
      copy_process();
        percpu_down_read(&cgroup_threadgroup_rwsem);
          __rwsem_down_read_failed_common(); //waiting

Process B
kernfs_fop_write();
  cgroup_file_write();
    cgroup_procs_write();
      percpu_down_write(&cgroup_threadgroup_rwsem); //held
      cgroup_attach_task();
        cgroup_migrate();
          cgroup_migrate_execute();
            cpuset_can_attach();
              mutex_lock(&cpuset_mutex); //waiting

Process C
kernfs_fop_write();
  cgroup_file_write();
    cpuset_write_resmask();
      mutex_lock(&cpuset_mutex); //held
      update_cpumask();
        update_cpumasks_hier();
          rebuild_sched_domains_locked();
            get_online_cpus();
              percpu_down_read(&cpu_hotplug_lock); //waiting

Eliminate the deadlock by reversing the locking order of cpuset_mutex and
cpu_hotplug_lock.
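
A minimal userspace sketch of the resulting rule. It uses plain mutexes as hypothetical stand-ins; in the kernel, cpu_hotplug_lock is a per-CPU rwsem taken for read on this path. Any path that needs both locks takes the hotplug lock first and the cpuset lock second. This removes the inversion between these two locks in the cycle above.

```c
#include <pthread.h>

/* Hypothetical stand-ins for cpu_hotplug_lock and cpuset_mutex. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpuset_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run an update with both locks held, always in the same order. */
static void cpuset_update_locked(void (*update)(void *), void *arg)
{
	pthread_mutex_lock(&hotplug_lock);	/* outer lock first */
	pthread_mutex_lock(&cpuset_lock);	/* then the inner lock */
	update(arg);
	pthread_mutex_unlock(&cpuset_lock);	/* release in reverse order */
	pthread_mutex_unlock(&hotplug_lock);
}

/* Example update callback: increment an int counter. */
static void bump(void *arg)
{
	++*(int *)arg;
}
```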

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:48:10 -08:00
Paul E. McKenney
844ccdd7dc rcu: Eliminate rcu_irq_enter_disabled()
Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm,
rcu_irq_enter() and rcu_irq_exit() may be used from any context.  There is
thus no need for rcu_irq_enter_disabled() and for the checks using it.
This commit therefore eliminates rcu_irq_enter_disabled().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
51a1fd30f1 rcu: Make ->dynticks_nesting be a simple counter
Now that ->dynticks_nesting counts only process-level dyntick-idle
entry and exit, there is no need for the elaborate segmented counter
with its guard fields and overflow checking.  This commit therefore
makes ->dynticks_nesting be a simple counter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
58721f5da4 rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}()
RCU currently uses two different mechanisms for tracking irqs and NMIs.
This is unnecessary complexity: Given that NMIs can nest and given that
RCU's tracking handles such nesting, the NMI tracking mechanism can also
be used to track irqs.  This commit therefore defines rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_nmi_exit().

Unfortunately, callers must still distinguish between the irq and NMI
functions because additional actions are taken when an irq interrupts
idle or nohz_full usermode execution, and these actions cannot always
be taken from NMI handlers.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
6136d6e48a rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit
In preparation for merging dyntick-idle irq handling into the NMI
algorithm, clamp ->dynticks_nmi_nesting value to allow for interrupts
that enter but never leave and vice versa.

It is important that the clamping happen outside of the extended quiescent
state.  Otherwise, there will be short windows where irqs and NMIs fail
to convince RCU to start watching.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00