The flags field is synthesized and may have a value when Instruction
Trace decoding. The flags are "bcrosyiABEx" which stand for branch,
call, return, conditional, system, asynchronous, interrupt, transaction
abort, trace begin, trace end, and in transaction, respectively.
Change the display so that known combinations of flags are printed more
nicely e.g.: "call" for "bc", "return" for "br", "jcc" for "bo", "jmp"
for "b", "int" for "bci", "iret" for "bri", "syscall" for "bcs",
"sysret" for "brs", "async" for "by", "hw int" for "bcyi", "tx abrt" for
"bA", "tr strt" for "bB", "tr end" for "bE".
However the "x" flag will be displayed separately in those cases e.g.
"jcc (x)" for a condition branch within a transaction.
Example:
perf record -e intel_pt//u ls
perf script --ns -F comm,cpu,pid,tid,time,ip,addr,sym,dso,symoff,flags
...
ls 3689/3689 [001] 2062.020965237: jcc 7f06a958847a _dl_sysdep_start+0xfa (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a9588450 _dl_sysdep_start+0xd0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965237: jmp 7f06a9588461 _dl_sysdep_start+0xe1 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a95885a0 _dl_sysdep_start+0x220 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965237: jmp 7f06a95885a4 _dl_sysdep_start+0x224 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a9588470 _dl_sysdep_start+0xf0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965904: call 7f06a95884c3 _dl_sysdep_start+0x143 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a9589140 brk+0x0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965904: syscall 7f06a958914a brk+0xa (/lib/x86_64-linux-gnu/ld-2.19.so) => 0 [unknown] ([unknown])
ls 3689/3689 [001] 2062.020966237: tr strt 0 [unknown] ([unknown]) => 7f06a958914c brk+0xc (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: return 7f06a9589165 brk+0x25 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a95884c8 _dl_sysdep_start+0x148 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: jcc 7f06a95884d7 _dl_sysdep_start+0x157 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a95885f0 _dl_sysdep_start+0x270 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: call 7f06a95885f0 _dl_sysdep_start+0x270 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a958ac50 strlen+0x0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: jcc 7f06a958ac6e strlen+0x1e (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a958ac60 strlen+0x10 (/lib/x86_64-linux-gnu/ld-2.19.so)
...
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466689258-28493-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's a display inconsistency when there are multiple tracepoint
events, some of which have the 'call-graph' config option set but the
first one hasn't, i.e. the whole logic for call graph processing is
enabled only if the first tracepoint event has call-graph set.
For instance, if we record signal_deliver with call-graph and
signal_generate without:
$ perf record -g -a -e signal:signal_deliver -e signal:signal_generate/call-graph=no/
[ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
$ perf script
kworker/u2:1 13 [000] 6563.875949: signal:signal_generate: sig=2 errno=0 code=128 comm=perf pid=1313 grp=1 res=0 ff61cc __send_signal+0x3ec ([kernel.kallsyms])
perf 1313 [000] 6563.877584: signal:signal_deliver: sig=2 errno=0 code=128 sa_handler=43115e sa_flags=14000000
7ffff314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
...
Then we exchange the order of these two events in commandline, and keep
signal_generate without call-graph.
$ perf record -g -a -e signal:signal_generate/call-graph=no/ -e signal:signal_deliver
[ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
$ perf script
kworker/u2:2 1314 [000] 6933.353060: signal:signal_generate: sig=2 errno=0 code=128 comm=perf pid=1321 grp=1 res=0
perf 1321 [000] 6933.353872: signal:signal_deliver: sig=2 errno=0 code=128 sa_handler=43115e sa_flags=14000000
This time, the callchain of the event signal_deliver disappeared. The
problem is caused by that perf only checks for the first evsel in evlist
and decides if callchain should be printed.
This patch traverses all evsels in evlist to see if any of them have
callchains, and shows the right result:
$ perf script
kworker/u2:2 1314 [000] 6933.353060: signal:signal_generate: sig=2 errno=0 code=128 comm=perf pid=1321 grp=1 res=0 ff61cc __send_signal+0x3ec ([kernel.kallsyms])
perf 1321 [000] 6933.353872: signal:signal_deliver: sig=2 errno=0 code=128 sa_handler=43115e sa_flags=14000000
7ffff314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
...
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1463374279-97209-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This fixes a bug caused by an unitialized callchain cursor. The crash
frist appeared in:
6f736735e3 ("perf evsel: Require that callchains be resolved before
calling fprintf_{sym,callchain}")
The callchain cursor is a struct that contains pointers, that when
uninitialized will cause unpredictable behavior (usually a crash)
when trying to append to the callchain.
The existing implementation has the following issues:
1. The callchain cursor used is not initialized, resulting in
unpredictable behavior when used.
2. The cursor is declared on the stack. Even if it is properly initalized,
the implmentation will leak memory when the function returns,
since all the references to the callchain_nodes allocated by
callchain_cursor_append will be lost when the cursor goes out of
scope.
3. Storing the cursor on the stack is inefficient. Even if memory is
properly freed when it goes out of scope, a performance penalty
will be incurred due to reallocation of callchain nodes.
callchain_cursor_append is designed to avoid these reallocations
when an existing cursor is reused.
This patch fixes the crash by replacing cursor_callchain with a reference
to the global callchain_cursor which also resolves all 3 issues mentioned
above.
How to reproduce the crash:
$ perf record --call-graph=dwarf stress -t 1 -c 1
$ perf script > /dev/null
Segfault
Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 6f736735e3 ("perf evsel: Require that callchains be resolved before calling fprintf_{sym,callchain}")
Link: http://lkml.kernel.org/r/1461119531-2529-1-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# perf test -v python
16: Try 'import perf' in python, checking link problems :
--- start ---
test child forked, pid 672
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
symbol_conf
test child finished with -1
---- end ----
Try 'import perf' in python, checking link problems: FAILED!
#
To fix it just pass a parameter to perf_evsel__fprintf_sym telling if
callchains should be printed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-comrsr20bsnr8bg0n6rfwv12@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case in 'perf script', so, to avoid breaking the build
with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
See: http://man7.org/linux/man-pages/man3/readdir.3.html
"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe. In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."
Noticed while building on a Fedora Rawhide docker container.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andreas reported following command produces no output:
# cat test.py
#!/usr/bin/env python
def stat__krava(cpu, thread, time, val, ena, run):
print "event %s cpu %d, thread %d, time %d, val %d, ena %d, run %d" % \
("krava", cpu, thread, time, val, ena, run)
# perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
^C
#
The reason is that 'perf script' does not process event update events and
will never get the event name update thus the python callback is never
called.
The fix is just to add already existing callback we use in 'perf stat
report'.
Committer note:
After the patch:
# perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
event krava cpu -1, thread -1, time 1000239179, val 1789051, ena 4000690920, run 4000690920
event krava cpu -1, thread -1, time 2000479061, val 2391338, ena 4000879596, run 4000879596
event krava cpu -1, thread -1, time 3000740802, val 1939121, ena 4000977209, run 4000977209
event krava cpu -1, thread -1, time 4001006730, val 2356115, ena 4001000489, run 4001000489
^C
#
Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460013073-18444-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default script handler (the one that displays samples on screen) is
implemented scripting_ops instance with process_event callback.
This way we can't pass any script config into display function, because
we don't want perl or python handlers to be depended on perf script
internals.
Removing the default_scripting_ops and calling process event function
directly. This way it's possible to pass perf_script struct and process
configuration data in following commit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448546125-29245-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch improves perf script by enabling printing of the
branch stack via the 'brstack' and 'brstacksym' arguments to
the field selection option -F. The option is off by default
and operates only if the perf.data file has branch stack content.
The branches are printed in to/from pairs. The most recent branch
is printed first. The number of branch entries vary based on the
underlying hardware and filtering used.
The brstack prints FROM/TO addresses in raw hexadecimal format.
The brstacksym prints FROM/TO addresses in symbolic form wherever
possible.
$ perf script -F ip,brstack
5d3000 0x401aa0/0x5d2000/M/-/-/-/0 ...
$ perf script -F ip,brstacksym
4011e0 noploop+0x0/noploop+0x0/P/-/-/0
The notation F/T/M/X/A/C describes the attributes of the branch.
F=from, T=to, M/P=misprediction/prediction, X=TSX, A=TSX abort, C=cycles (SKL)
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yuanfang Chen <cyfmxc@gmail.com>
Link: http://lkml.kernel.org/r/1441039273-16260-5-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf script has a setting to set the maximum stack depth when processing
callchains. The setting defaults to the hard-coded maximum definition
PERF_MAX_STACK_DEPTH which is 127.
It is possible, when processing instruction traces, to synthesize
callchains. Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user requested a value greater that 1024. The
default value is 16.
To allow for synthesized callchains, make the scripting_max_stack value
at least the same size as the synthesized callchain size.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-21-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In a couple of cases the 'comm' member of 'union event' has been used
instead of the correct member ('fork') when processing exit events.
In the cases where it has been used incorrectly, only the 'pid' and
'tid' are affected. The 'pid' value would be correct anyway because it
is in the same position in 'comm' and 'fork' events, but the 'tid' would
have been incorrectly assigned from 'ppid'.
However, for exit events, the kernel puts the current task in the 'ppid'
and 'ttid' which is the same as the exiting task. That is 'ppid' ==
'pid' and if the task is not multi-threaded, 'pid' == 'tid' i.e. the
data goes wrong only when tracing multi-threaded programs.
It is hard to find an example of how this would produce an error in
practice. There are 3 occurences of the fix:
1. perf script is only affected if !sample_id_all which only happens on
old kernels.
2. intel_pt is only affected when decoding without timestamps
and would probably still decode correctly - the exit event is
only used to flush out data which anyway gets flushed at the
end of the session
3. intel_bts also uses the exit event to flush data which
would probably not cause errors as it would get flushed at
the end of the session instead
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1439888825-27708-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For perf report/script srcline currently only the base file name of the
source file is printed. This is a good default because it usually fits
on the screen.
But in some cases we want to know the full file name, for example to
aggregate hits per file.
In the later case we need more than the base file name to resolve file
naming collisions: for example the kernel source has ~70 files named
"core.c"
It's also useful as input to post processing tools which want to point
to the right file.
Add a flag to allow full file name output.
Add an option to perf report/script to enable this option.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1438986245-15191-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>