Nothing calls unregister_ftrace_function_probe(). Remove it as well as the
flag PROBE_TEST_DATA, as this function was the only one to set it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As the data pointer for individual ips will soon be removed and no longer
passed to the callback function probe handlers, convert the rest of the function
trigger counters over to the new ftrace_func_mapper helper functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As the data pointer for individual ips will soon be removed and no longer
passed to the callback function probe handlers, convert the snapshot
trigger counter over to the new ftrace_func_mapper helper functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to move the ops to the function probes directly, they need a way to
map function ips to their own data without depending on the infrastructure
of the function probes, as the data field will be going away.
New helper functions are added that are based on the ftrace_hash code.
ftrace_func_mapper functions are there to let the probes map ips to their
data. These can be allocated by the probe ops, and referenced in the
function callbacks.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In preparation to cleaning up the probe function registration code, the
"data" parameter will eventually be removed from the probe->func() call.
Instead it will receive its own "ops" function, in which it can set up its
own data that it needs to map.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As nothing outside the tracing directory uses the function command mechanism,
I'm moving the prototypes out of the include/linux/ftrace.h and into the
local kernel/trace/trace.h header. I plan on making them hook to the
trace_array structure which is local to kernel/trace, and I do not want to
expose it to the rest of the kernel. This requires that the command functions
must also be local to tracing. But luckily nothing else uses them.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull two more ftrace fixes from Steven Rostedt:
"While continuing my development, I uncovered two more small bugs.
One is a race condition when enabling the snapshot function probe
trigger. It enables the probe before allocating the snapshot, and if
the probe triggers first, it stops tracing with a warning that the
snapshot buffer was not allocated.
The seconds is that the snapshot file should show how to use it when
it is empty. But a bug fix from long ago broke the "is empty" test and
the snapshot file no longer displays the help message"
* tag 'trace-v4.11-rc5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Have ring_buffer_iter_empty() return true when empty
tracing: Allocate the snapshot buffer before enabling probe
A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 239aeba764 ("perf powerpc: Fix kprobe and kretprobe handling with
kallsyms on ppc64le") changed how we use the offset field in struct kprobe on
ABIv2. perf now offsets from the global entry point if an offset is specified
and otherwise chooses the local entry point.
Fix the same in kernel for kprobe API users. We do this by extending
kprobe_lookup_name() to accept an additional parameter to indicate the offset
specified with the kprobe registration. If offset is 0, we return the local
function entry and return the global entry point otherwise.
With:
# cd /sys/kernel/debug/tracing/
# echo "p _do_fork" >> kprobe_events
# echo "p _do_fork+0x10" >> kprobe_events
before this patch:
# cat ../kprobes/list
c0000000000d0748 k _do_fork+0x8 [DISABLED]
c0000000000d0758 k _do_fork+0x18 [DISABLED]
c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED]
and after:
# cat ../kprobes/list
c0000000000d04c8 k _do_fork+0x8 [DISABLED]
c0000000000d04d0 k _do_fork+0x10 [DISABLED]
c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED]
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The macro is now pretty long and ugly on powerpc. In the light of further
changes needed here, convert it to a __weak variant to be over-ridden with a
nicer looking function.
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Skip preparing optprobe if the probe is ftrace-based, since anyway, it
must not be optimized (or already optimized by ftrace).
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
But instead it just showed an empty buffer:
># cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Cc: stable@vger.kernel.org
Fixes: 651e22f270 ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writting zero into the snapshot buffer file.
Also add a check of the return status of alloc_snapshot().
Cc: stable@vger.kernel.org
Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This commit just changes a "the the" to "the" to reduce repetition.
Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the parse_rcu_nocb_poll() function assign true
(rather than the constant 1) to the bool variable rcu_nocb_poll.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The beenonline variable is declared bool so there is no need for an
explicit comparison, especially not against the constant zero.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this
commit drags this comment into the year 2017.
Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit changes lockdep splats to begin lines with "WARNING" and
to use pr_warn() instead of printk(). This change eases scripted
analysis of kernel console output.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Pull sparc fixes from David Miller:
"Two Sparc bug fixes from Daniel Jordan and Nitin Gupta"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix hugepage page table free
sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL
Pull networking fixes from David Miller:
1) BPF tail call handling bug fixes from Daniel Borkmann.
2) Fix allowance of too many rx queues in sfc driver, from Bert
Kenward.
3) Non-loopback ipv6 packets claiming src of ::1 should be dropped,
from Florian Westphal.
4) Statistics requests on KSZ9031 can crash, fix from Grygorii
Strashko.
5) TX ring handling fixes in mediatek driver, from Sean Wang.
6) ip_ra_control can deadlock, fix lock acquisition ordering to fix,
from Cong WANG.
7) Fix use after free in ip_recv_error(), from Willem de Buijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bpf: fix checking xdp_adjust_head on tail calls
bpf: fix cb access in socket filter programs on tail calls
ipv6: drop non loopback packets claiming to originate from ::1
net: ethernet: mediatek: fix inconsistency of port number carried in TXD
net: ethernet: mediatek: fix inconsistency between TXD and the used buffer
net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule
net: thunderx: Fix set_max_bgx_per_node for 81xx rgx
net-timestamp: avoid use-after-free in ip_recv_error
ipv4: fix a deadlock in ip_ra_control
sfc: limit the number of receive queues
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the
kernel text, data, and bss fit in the required 32MB limit, but this
option is not set for every config that enables lockdep.
A 4.10 kernel fails to boot with the console output
Kernel: Using 8 locked TLB entries for main kernel image.
hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f
Program terminated
with these config options
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_PROVE_LOCKING=n
To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and
enable this option with CONFIG_LOCKDEP=y so we get the reduced memory
usage every time lockdep is turned on.
Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if
CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options
that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or
CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also
enabled.
Fixes: e6b5f1be7a ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section. Of course, that is not the
case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.
However, there is a phrase for this, namely "type safety". This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
Dumazet, in order to help people familiar with the old name find
the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
This allows callers to get back at them instead of having to store it in
another variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
The TREE_SRCU rewrite is large and a bit on the non-simple side, so
this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
to be selected using a new CLASSIC_SRCU Kconfig option that depends
on RCU_EXPERT. The default is to use the new TREE_SRCU and TINY_SRCU
algorithms, in order to help get these the testing that they need.
However, if your users do not require the update-side scalability that
is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
to revert back to the old classic SRCU algorithm.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The srcu_torture_stats() function is adapted to the specific srcu_struct
layout traditionally used by SRCU. This commit therefore adds support
for Tiny SRCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
In response to automated complaints about modifications to SRCU
increasing its size, this commit creates a tiny SRCU that is
used in SMP=n && PREEMPT=n builds.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
SRCU's implementation of expedited grace periods has always assumed
that the SRCU instance is idle when the expedited request arrives.
This commit improves this a bit by maintaining a count of the number
of outstanding expedited requests, thus allowing prior non-expedited
grace periods accommodate these requests by shifting to expedited mode.
However, any non-expedited wait already in progress will still wait for
the full duration.
Improved control of expedited grace periods is planned, but one step
at a time.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
race conditions given multiple callback queues, so this commit takes
advantage of the two-bit state now available in rcu_seq counters to
store the state in the bottom two bits of ->srcu_gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit increases the number of reserved bits at the bottom of an
rcu_seq grace-period counter from one to two, as will be needed to
accommodate SRCU's three-state grace periods.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The expedited grace-period code contains several open-coded shifts
know the format of an rcu_seq grace-period counter, which is not
particularly good style. This commit therefore creates a new
rcu_seq_ctr() function that extracts the counter portion of the
counter, and an rcu_seq_state() function that extracts the low-order
state bit. This commit prepares for SRCU callback parallelization,
which will require two state bits.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the num_rcu_lvl[] array external so that SRCU can
make use of it for initializing its upcoming srcu_node tree.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves rcu_for_each_node_breadth_first(),
rcu_for_each_nonleaf_node_breadth_first(), and
rcu_for_each_leaf_node() from kernel/rcu/tree.h to
kernel/rcu/rcu.h so that SRCU can access them.
This commit is code-movement only.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves the rcu_init_levelspread() function from
kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it. This is
another step towards enabling SRCU to create its own combining tree.
This commit is code-movement only, give or take knock-on adjustments.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves the C preprocessor code that defines the default shape
of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
file as a first step towards enabling SRCU to create its own combining
tree, which in turn enables SRCU to implement per-CPU callback handling,
thus avoiding contention on the lock currently guarding the single list
of callbacks. Note that users of SRCU still need to know the size of
the srcu_struct structure, hence include/linux rather than kernel/rcu.
This commit is code-movement only.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches SRCU from custom-built callback queues to the new
rcu_segcblist structure. This change associates grace-period sequence
numbers with groups of callbacks, which will be needed for efficient
processing of per-CPU callbacks.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds grace-period sequence numbers, which will be used to
handle mid-boot grace periods and per-CPU callback lists.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current SRCU grace-period processing might never reach the last
portion of srcu_advance_batches(). This is OK given the current
implementation, as the first portion, up to the try_check_zero()
following the srcu_flip() is sufficient to drive grace periods forward.
However, it has the unfortunate side-effect of making it impossible to
determine when a given grace period has ended, and it will be necessary
to efficiently trace ends of grace periods in order to efficiently handle
per-CPU SRCU callback lists.
This commit therefore adds states to the SRCU grace-period processing,
so that the end of a given SRCU grace period is marked by the transition
to the SRCU_STATE_DONE state.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit simplifies the SRCU state machine by pushing the
srcu_advance_batches() idle-SRCU fastpath into the common case. This is
done by giving srcu_reschedule() a delay parameter, which is zero in
the call from srcu_advance_batches().
This commit is a step towards numbering callbacks in order to
efficiently handle per-CPU callback lists.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_seq_end() function increments seq signifying completion
of a grace period, after that checks that the seq is even and wakes
_synchronize_rcu_expedited(). The _synchronize_rcu_expedited() function
uses wait_event() to wait for even seq. The problem is that wait_event()
can return as soon as seq becomes even without waiting for the wakeup.
In such case the warning in rcu_seq_end() can falsely fire if the next
expedited grace period starts before the check.
Check that seq has good value before incrementing it.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: syzkaller@googlegroups.com
Cc: linux-kernel@vger.kernel.org
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: josh@joshtriplett.org
Cc: jiangshanlai@gmail.com
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
syzkaller-triggered warning:
WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events wait_rcu_exp_gp
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
panic+0x1fb/0x412 kernel/panic.c:179
__warn+0x1c4/0x1e0 kernel/panic.c:540
warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
---