Pull perf fixes from Thomas Gleixner:
"This is much bigger than typical fixes, but Peter found a category of
races that spurred more fixes and more debugging enhancements. Work
started before the merge window, but got finished only now.
Aside of that this contains the usual small fixes to perf and tools.
Nothing particular exciting"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
perf: Remove/simplify lockdep annotation
perf: Synchronously clean up child events
perf: Untangle 'owner' confusion
perf: Add flags argument to perf_remove_from_context()
perf: Clean up sync_child_event()
perf: Robustify event->owner usage and SMP ordering
perf: Fix STATE_EXIT usage
perf: Update locking order
perf: Remove __free_event()
perf/bpf: Convert perf_event_array to use struct file
perf: Fix NULL deref
perf/x86: De-obfuscate code
perf/x86: Fix uninitialized value usage
perf: Fix race in perf_event_exit_task_context()
perf: Fix orphan hole
perf stat: Do not clean event's private stats
perf hists: Fix HISTC_MEM_DCACHELINE width setting
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf tests: Remove wrong semicolon in while loop in CQM test
perf: Synchronously free aux pages in case of allocation failure
...
Mel reported stddev reporting was broken due to following commit:
106a94a0f8 ("perf stat: Introduce read_counters function")
This commit merged interval and overall counters reading into single
read_counters function.
The old interval code cleaned the stddev data for some reason (it's
never displayed in interval mode) and the mentioned commit kept on
cleaning the stddev data in merged function, which resulted in the
stddev not being displayed.
Removing the wrong stddev data cleanup init_stats call.
Reported-and-Tested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org # v4.2+
Fixes: 106a94a0f8 ("perf stat: Introduce read_counters function")
Link: http://lkml.kernel.org/r/1453290995-18485-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
While recording guest samples in host using perf kvm record, it will
populate unprocessable sample error, though samples will be recorded
properly. While generating report using perf kvm report, no samples will
be processed and same error will populate. We have seen this behaviour
with upstream perf(4.4-rc3) on x86 and ppc64 hardware.
Reason behind this failure is, when it tries to fetch machine from
rb_tree of machines, it fails. As a part of tracing a bug, we figured
out that this code was incorrectly refactored in commit 54245fdc35
("perf session: Remove wrappers to machines__find").
This patch will change the functionality such that if it can't fetch
machine in first trial, it will create one node of machine and add that to
rb_tree. So next time when it tries to fetch same machine from rb_tree,
it won't fail. Actually it was the case before refactoring of code in
aforementioned commit.
This patch is generated from acme perf/core branch.
Below I've mention an example that demonstrate the behaviour before and
after applying patch.
Before applying patch:
[Note: One needs to run guest before recording data in host]
ravi@ravi-bangoria:~$ ./perf kvm record -a
Warning:
5903 unprocessable samples recorded.
Do you have a KVM guest running and not using 'perf kvm'?
[ perf record: Captured and wrote 1.409 MB perf.data.guest (285 samples) ]
ravi@ravi-bangoria:~$ ./perf kvm report --stdio
Warning:
5903 unprocessable samples recorded.
Do you have a KVM guest running and not using 'perf kvm'?
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 285 of event 'cycles'
# Event count (approx.): 88715406
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ......
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
After applying patch:
ravi@ravi-bangoria:~$ ./perf kvm record -a
[ perf record: Captured and wrote 1.188 MB perf.data.guest (17 samples) ]
ravi@ravi-bangoria:~$ ./perf kvm report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 17 of event 'cycles'
# Event count (approx.): 700746
#
# Overhead Command Shared Object Symbol
# ........ ....... ................ ......................
#
34.19% :5758 [unknown] [g] 0xffffffff818682ab
22.79% :5758 [unknown] [g] 0xffffffff812dc7f8
22.79% :5758 [unknown] [g] 0xffffffff818650d0
14.83% :5758 [unknown] [g] 0xffffffff8161a1b6
2.49% :5758 [unknown] [g] 0xffffffff818692bf
0.48% :5758 [unknown] [g] 0xffffffff81869253
0.05% :5758 [unknown] [g] 0xffffffff81869250
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.19+
Fixes: 54245fdc35 ("perf session: Remove wrappers to machines__find")
Link: http://lkml.kernel.org/r/1449471302-11283-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf report only shows a help message "For a higher level
overview, try: perf report --sort comm,dso" unconditionally (even if
the sort keys were used). Add more help tips and show randomly.
Load tips from ${prefix}/share/doc/perf-tip/tips.txt file.
$ perf report | tail
0.10% swapper [kernel.vmlinux] [k] irq_exit
0.09% swapper [kernel.vmlinux] [k] flush_smp_call_function_queue
0.08% swapper [kernel.vmlinux] [k] native_write_msr_safe
0.03% swapper [kernel.vmlinux] [k] group_sched_in
0.01% perf [kernel.vmlinux] [k] native_write_msr_safe
#
# (Tip: Search options using a keyword: perf report -h <keyword>)
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1452166913-27046-1-git-send-email-namhyung@kernel.org
[ Renamed it to perf_tip() and the parameter dirname to dirpath to fix the build on older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event group view feature is to see related events together. To use
the group view, events should be recorded as a group with a dedicated
syntax of surrounding events by braces (-e '{ evt1, evt2, ... }').
Also 'perf report' also requires the --group option to enable it.
However it's almost always beneficial to use the group view to see the
group events as it's more expressive. And I think it's more natural to
see events together if they are recorded as a group.
Thus this patch changes the default value to enable it. If users don't
want to see like it and keep the original behavior, they can set the
report.group config variable to false and/or use --no-group option in
the 'perf report' command line.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1448807057-3506-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixing this problem, introduced recently:
$ perf test python
16: Try 'import perf' in python, checking link problems : FAILED!
In verbose mode we find out what is missing:
$ perf test -v python
16: Try 'import perf' in python, checking link problems :
--- start ---
test child forked, pid 24894
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf/python/perf.so: undefined symbol: find_next_bit
test child finished with -1
---- end ----
Try 'import perf' in python, checking link problems: FAILED!
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: f77b57ad4f ("perf cpu_map: Add cpu_map__new_event function")
Link: http://lkml.kernel.org/n/tip-rajx0zkz6czdrnvvwf0jp76p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support to get stat events data in perf python scripts.
The python script shall implement the following new interface to process
stat data:
def stat__<event_name>_[<modifier>](cpu, thread, time, val, ena, run):
- is called for every stat event for given counter,
if user monitors 'cycles,instructions:u" following
callbacks should be defined:
def stat__cycles(cpu, thread, time, val, ena, run):
def stat__instructions_u(cpu, thread, time, val, ena, run):
def stat__interval(time):
- is called for every interval with its time,
in non interval mode it's called after last
stat event with total measured time in ns
The rest of the current interface stays untouched..
Please check example CPI metrics script in following patch
with command line examples in changelogs.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1452028152-26762-8-git-send-email-jolsa@kernel.org
[ Rename 'time' parameters to 'tstamp', to fix the build in older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The dynamic sort key requires event name but specifying full event name
is rather inconvenient. This patch adds more ways to identify the event
in a more compact way.
1. If session has just one event, event name can be omitted.
2. Events can be accessed by index preceded by a percent sign.
3. A part of the name can be used, if it's not ambiguous. The partial
name should not contain ':' in it.
4. Full system + event name is still used, it should contain ':'.
So in the below example all does same thing:
$ perf record -e sched:sched_switch -a sleep 1
$ perf report -s next_pid,next_comm
$ perf report -s %1.next_pid,%1.next_comm
$ perf report -s switch.next_pid,switch.next_comm
$ perf report -s sched:sched_switch.next_pid,sched:sched_switch.next_comm
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-10-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'trace' sort key is to show tracepoint event output using either
print fmt or plugin. For example sched_switch event (using plugin) will
show output like below:
# perf record -e sched:sched_switch -a usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.197 MB perf.data (69 samples) ]
#
$ perf report -s trace --stdio
...
# Overhead Trace output
# ........ ...................................................
#
9.48% swapper/0:0 [120] R ==> transmission-gt:17773 [120]
9.48% transmission-gt:17773 [120] S ==> swapper/0:0 [120]
9.04% swapper/2:0 [120] R ==> transmission-gt:17773 [120]
8.92% transmission-gt:17773 [120] S ==> swapper/2:0 [120]
5.25% swapper/0:0 [120] R ==> kworker/0:1H:109 [100]
5.21% kworker/0:1H:109 [100] S ==> swapper/0:0 [120]
1.78% swapper/3:0 [120] R ==> transmission-gt:17773 [120]
1.78% transmission-gt:17773 [120] S ==> swapper/3:0 [120]
1.53% Xephyr:6524 [120] S ==> swapper/0:0 [120]
1.53% swapper/0:0 [120] R ==> Xephyr:6524 [120]
1.17% swapper/2:0 [120] R ==> irq/33-iwlwifi:233 [49]
1.13% irq/33-iwlwifi:233 [49] S ==> swapper/2:0 [120]
Note that the 'trace' sort key works only for tracepoint events. If
it's used to other type of events, just "N/A" will be printed.
Suggested-and-acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Each tracepoint event has format string for print to improve
readability. Try to parse the output and match the field name. If it
finds one, use that for the result. If not, fallbacks to the original
output.
For example, sort on kmem:kmalloc.gfp_flags looks like below:
(Note: libtraceevent plugins are not installed on my system. They might
affect the output below)
Before:
# Overhead Command gfp_flags
# ........ ....... ..........
#
99.89% perf 32848
0.06% sleep 208
0.03% perf 32976
0.01% perf 208
After:
# Overhead Command gfp_flags
# ........ ....... ...................
#
99.89% perf GFP_NOFS|GFP_ZERO
0.06% sleep GFP_KERNEL
0.03% perf GFP_KERNEL|GFP_ZERO
0.01% perf GFP_KERNEL
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-7-git-send-email-namhyung@kernel.org
[ Fixed clash with earlier, updated patch in this patchkit ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>