In preparation for using the thread stack to print an indent
representing the stack depth in perf script, add an option to tell
decoders to feed branches to the thread stack. Add support for that
option to Intel PT and Intel BTS.
The advantage of using the decoder to feed the thread stack is that it
happens before branch filtering and so can be used with different itrace
options (e.g. it still works when only showing calls, even though the
thread stack needs to see calls and returns). Also it does not conflict
with using the thread stack to get callchains.
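The ordering argument above can be illustrated with a minimal sketch. It is not perf's thread-stack code: the branch records and the helper name indent_calls are hypothetical, and the only point is that depth is updated from every call and return while only calls are displayed.

```python
# Hypothetical branch records (kind, symbol); not perf's real data structures.
BRANCHES = [
    ("call", "main"),
    ("call", "foo"),
    ("jmp", "foo"),
    ("return", "foo"),
    ("call", "bar"),
    ("return", "bar"),
    ("return", "main"),
]


def indent_calls(branches):
    """Track stack depth from every branch, but emit only the calls."""
    depth = 0
    lines = []
    for kind, sym in branches:
        if kind == "call":
            # Show the call at the current depth, then descend.
            lines.append("    " * depth + sym)
            depth += 1
        elif kind == "return":
            # Returns are never shown, but they must still be seen,
            # otherwise the depth would only ever grow.
            depth = max(depth - 1, 0)
    return lines


for line in indent_calls(BRANCHES):
    print(line)
```

If the returns were filtered out before reaching the depth tracking, "bar" would be printed one level too deep, which is why feeding the stack before branch filtering matters.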
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466689258-28493-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The flags field is synthesized and may have a value when decoding
Instruction Trace data. The flags are "bcrosyiABEx", which stand for
branch, call, return, conditional, system, asynchronous, interrupt,
transaction abort, trace begin, trace end, and in transaction, respectively.
Change the display so that known combinations of flags are printed more
nicely e.g.: "call" for "bc", "return" for "br", "jcc" for "bo", "jmp"
for "b", "int" for "bci", "iret" for "bri", "syscall" for "bcs",
"sysret" for "brs", "async" for "by", "hw int" for "bcyi", "tx abrt" for
"bA", "tr strt" for "bB", "tr end" for "bE".
However, the "x" flag will be displayed separately in those cases, e.g.
"jcc (x)" for a conditional branch within a transaction.
Example:
perf record -e intel_pt//u ls
perf script --ns -F comm,cpu,pid,tid,time,ip,addr,sym,dso,symoff,flags
...
ls 3689/3689 [001] 2062.020965237: jcc 7f06a958847a _dl_sysdep_start+0xfa (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a9588450 _dl_sysdep_start+0xd0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965237: jmp 7f06a9588461 _dl_sysdep_start+0xe1 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a95885a0 _dl_sysdep_start+0x220 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965237: jmp 7f06a95885a4 _dl_sysdep_start+0x224 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a9588470 _dl_sysdep_start+0xf0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965904: call 7f06a95884c3 _dl_sysdep_start+0x143 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a9589140 brk+0x0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020965904: syscall 7f06a958914a brk+0xa (/lib/x86_64-linux-gnu/ld-2.19.so) => 0 [unknown] ([unknown])
ls 3689/3689 [001] 2062.020966237: tr strt 0 [unknown] ([unknown]) => 7f06a958914c brk+0xc (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: return 7f06a9589165 brk+0x25 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a95884c8 _dl_sysdep_start+0x148 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: jcc 7f06a95884d7 _dl_sysdep_start+0x157 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a95885f0 _dl_sysdep_start+0x270 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: call 7f06a95885f0 _dl_sysdep_start+0x270 (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a958ac50 strlen+0x0 (/lib/x86_64-linux-gnu/ld-2.19.so)
ls 3689/3689 [001] 2062.020966237: jcc 7f06a958ac6e strlen+0x1e (/lib/x86_64-linux-gnu/ld-2.19.so) => 7f06a958ac60 strlen+0x10 (/lib/x86_64-linux-gnu/ld-2.19.so)
...
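The flag combinations listed above can be sketched as a lookup table. This is an illustrative Python sketch, not the C code in perf; the function name flags_to_name is hypothetical, and the fallback to the raw letters for unlisted combinations is an assumption.

```python
# Combinations taken from the commit message above.
FLAG_NAMES = {
    "bc": "call",
    "br": "return",
    "bo": "jcc",
    "b": "jmp",
    "bci": "int",
    "bri": "iret",
    "bcs": "syscall",
    "brs": "sysret",
    "by": "async",
    "bcyi": "hw int",
    "bA": "tx abrt",
    "bB": "tr strt",
    "bE": "tr end",
}


def flags_to_name(flags):
    """Return a readable name for a flags string such as "bo" or "box"."""
    in_tx = "x" in flags
    # The "x" (in transaction) flag is handled separately from the rest.
    base = flags.replace("x", "")
    name = FLAG_NAMES.get(base)
    if name is None:
        # Assumption: unlisted combinations fall back to the raw letters.
        return flags
    return name + " (x)" if in_tx else name


print(flags_to_name("bc"))
print(flags_to_name("box"))
```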
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466689258-28493-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's a problem in machine__findnew_vdso(): a vdso build-id generated on a
32-bit machine is stored under the name 'vdso', but when the same 'perf.data'
is processed on a 64-bit machine, perf searches for a vdso named 'vdso32' and
fails to find it.
This patch looks up existing dsos in machine->dsos according to the thread's
dso_type. A 64-bit thread looks for a vdso named 'vdso', because all 64-bit
vdsos use that name. A 32-bit thread first looks for 'vdso32', which is the
name used if the thread ran on a 64-bit machine; if that fails, it tries
'vdso', which indicates that the thread ran on a 32-bit machine when recording.
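The lookup order described above can be sketched roughly as follows. This is a Python illustration only: find_vdso is a hypothetical helper, a plain set of names stands in for machine->dsos, and the x32 case is left out.

```python
def find_vdso(dsos, thread_is_64bit):
    """Return the name of the vdso entry to use, or None if none matches.

    dsos is a set of dso names found in the data file (a stand-in for
    machine->dsos in this sketch).
    """
    if thread_is_64bit:
        # All 64-bit vdsos are recorded under the plain name.
        candidates = ["vdso"]
    else:
        # Recorded on a 64-bit machine first, then on a 32-bit machine.
        candidates = ["vdso32", "vdso"]
    for name in candidates:
        if name in dsos:
            return name
    return None


# A data file recorded on a 32-bit machine only contains 'vdso'.
print(find_vdso({"vdso"}, thread_is_64bit=False))
```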
Committer note:
Additional explanation by Adrian Hunter:
We match maps to build ids using the file name - consider
machine__findnew_[v]dso() called in map__new(). So in the context of a perf
data file, we consider the file name to be unique.
A vdso map does not have a file name - all we know is that it is vdso. We look
at the thread to tell if it is 32-bit, 64-bit or x32. Then we need to get the
build id which has been recorded using short name "[vdso]" or "[vdso32]" or
"[vdsox32]".
The problem is that on a 32-bit machine, we use the name "[vdso]". If you take
a 32-bit perf data file to a 64-bit machine, it gets hard to figure out if
"[vdso]" is 32-bit or 64-bit.
This patch solves that problem.
----
This also merges a followup patch fixing a problem introduced by the
original submission of this patch, that would crash 'perf record' when
recording samples for a 32-bit app on a 64-bit system.
Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1463475894-163531-1-git-send-email-hekuang@huawei.com
Link: http://lkml.kernel.org/r/1466578626-92406-6-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pid sort entry currently aligns pids to 5 digits, which is not
enough for the current limit of 4 million pids.
This leads to unaligned ':' in the header and data output when we
display a 7-digit pid:
# Children Self Symbol Pid:Command
# ........ ........ ...................... .....................
#
0.12% 0.12% [.] 0x0000000000147e0f 2052894:krava
...
Add 2 more digits to properly align pids up to the limit:
# Children Self Symbol Pid:Command
# ........ ........ ...................... .......................
#
0.12% 0.12% [.] 0x0000000000147e0f 2052894:krava
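The effect of the field width can be shown with a small formatting sketch. It is written in Python rather than perf's C, pid_entry is a hypothetical helper, and right alignment of the pid is assumed.

```python
def pid_entry(pid, comm, width):
    """Format a Pid:Command cell with the pid right-aligned to width."""
    return f"{pid:>{width}}:{comm}"


# With width 5 the ':' moves when a pid has more than 5 digits.
print(pid_entry(42, "ls", 5))
print(pid_entry(2052894, "krava", 5))

# With width 7 the ':' stays in the same column for both rows.
print(pid_entry(42, "ls", 7))
print(pid_entry(2052894, "krava", 7))
```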
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1466459899-1166-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add stackcollapse.py script as an example of parsing call chains, and
also of using optparse to access command line options.
The flame graph tools include a set of scripts that parse output from
various tools (including "perf script"), remove the offsets in the
function and collapse each stack to a single line. The website also
says "perf report could have a report style [...] that output folded
stacks directly, obviating the need for stackcollapse-perf.pl", so here
it is.
This script is a Python rewrite of stackcollapse-perf.pl, using the perf
scripting interface to access the perf data directly from Python.
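The folding step described above can be sketched as follows. This is not the stackcollapse.py script itself: the helper names fold and collapse are hypothetical, samples are plain tuples rather than perf scripting events, and frames are assumed to arrive leaf first with a "+0x..." offset suffix.

```python
import re
from collections import Counter

# Matches a trailing offset such as "+0x1e".
_OFFSET = re.compile(r"\+0x[0-9a-f]+$")


def fold(comm, frames):
    """Collapse one call chain (leaf first) into a single root-first line."""
    names = [_OFFSET.sub("", frame) for frame in frames]
    return ";".join([comm] + names[::-1])


def collapse(samples):
    """Count identical folded stacks and return 'stack count' lines."""
    counts = Counter(fold(comm, frames) for comm, frames in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


samples = [
    ("ls", ["strlen+0x1e", "_dl_sysdep_start+0x270"]),
    ("ls", ["strlen+0x10", "_dl_sysdep_start+0x270"]),
    ("ls", ["brk+0xa", "_dl_sysdep_start+0x143"]),
]
for line in collapse(samples):
    print(line)
```

Removing the offsets is what lets the two strlen samples merge into one line with a count of 2.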
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Brendan Gregg <bgregg@netflix.com>
Link: http://lkml.kernel.org/r/1460467573-22989-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a 'llvm.dump-obj' config option to make perf dump the BPF object
files compiled by LLVM.
This option is useful when using BPF objects on embedded platforms.
The LLVM compiler won't be deployed on these platforms, and currently we
don't support a dynamic compiling library.
Before this patch, users had to explicitly issue llvm commands to
compile BPF scripts, and couldn't use the helpers (like include path
detection and default macros) in perf. With this option, users can use
perf to compile their BPF objects and then copy them onto their embedded
platforms.
Committer note:
Testing it:
# cat ~/.perfconfig
[llvm]
dump-obj = true
#
# ls -la filter.o
ls: cannot access filter.o: No such file or directory
# cat filter.c
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
int func(void *ctx, int err, long nsec)
{
return nsec > 1000;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
# trace -e nanosleep --event filter.c usleep 6
LLVM: dumping filter.o
0.007 ( 0.007 ms): usleep/13976 nanosleep(rqtp: 0x7ffc5847f640 ) ...
0.007 ( ): perf_bpf_probe:func:(ffffffff811137d0) tv_nsec=6000)
0.070 ( 0.070 ms): usleep/13976 ... [continued]: nanosleep()) = 0
# ls -la filter.o
-rw-r--r--. 1 root root 776 Jun 20 17:01 filter.o
# readelf -SW filter.o
There are 7 section headers, starting at offset 0x148:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 0000e8 00005a 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] func=hrtimer_nanosleep rqtp->tv_nsec PROGBITS 0000000000000000 000040 000028 00 AX 0 0 8
[ 4] license PROGBITS 0000000000000000 000068 000004 00 WA 0 0 1
[ 5] version PROGBITS 0000000000000000 00006c 000004 00 WA 0 0 4
[ 6] .symtab SYMTAB 0000000000000000 000070 000078 18 1 2 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
#
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1466064161-48553-2-git-send-email-wangnan0@huawei.com
[ s/dumpping/dumping/g ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The build fails when statically cross-compiling on aarch64 with
libunwind-x86 provided:
$ file ./libunwind_for_x86_on_aarch64/lib/libunwind-x86.so.8.0.1
libunwind-x86.so.8.0.1: ELF 64-bit LSB shared object, ARM aarch64,
version 1 (SYSV), dynamically linked, not stripped
$ make LDFLAGS=-static LIBUNWIND_DIR=./libunwind_for_x86_on_aarch64
ARCH=aarch64 CROSS_COMPILE=aarch64-buildroot-linux-gnu-
~/libperf.a(libperf-in.o): In function `find_proc_info':
:(.text+0xae4ac): undefined reference to `_Ux86_dwarf_search_unwind_table'
~/libperf.a(libperf-in.o): In function `_unwind__prepare_access':
:(.text+0xaedd0): undefined reference to `_Ux86_create_addr_space'
:(.text+0xaee24): undefined reference to `_Ux86_set_caching_policy'
~/libperf.a(libperf-in.o): In function `_unwind__flush_access':
:(.text+0xaee98): undefined reference to `_Ux86_flush_cache'
~/libperf.a(libperf-in.o): In function `_unwind__finish_access':
:(.text+0xaef08): undefined reference to `_Ux86_destroy_addr_space'
~/libperf.a(libperf-in.o): In function `get_entries':
:(.text+0xaf148): undefined reference to `_Ux86_init_remote'
:(.text+0xaf184): undefined reference to `_Ux86_get_reg'
:(.text+0xaf1a4): undefined reference to `_Ux86_step'
collect2: error: ld returned 1 exit status
Makefile.perf:350: recipe for target '~/perf' failed
make[1]: *** [~/perf] Error 1
Makefile:68: recipe for target 'all' failed
make: *** [all] Error 2
This is because the detected remote libunwind library is not appended to
the EXTLIBS variable, whose contents are placed between 'start-group' and
'end-group' when linking.
The existing variable LIBUNWIND_LIBS holds the libs for local unwind;
this patch introduces a new variable, EXTLIBS_LIBUNWIND, for storing the
remote libunwind libraries instead.
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1465988636-81502-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>