Jiri Olsa
cdb6b0235f
perf tools: Add pattern name checking to rm_rf
...
Add pattern argument to rm_rf_depth() (and rename it to rm_rf_depth_pat())
to specify the name pattern files need to match inside the directory.
The function fails if it finds a file to remove whose name does not match.
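As an illustration only, the name check could be sketched as below. The helper name name_matches_any() is hypothetical, and POSIX fnmatch() is used as a stand-in for whatever glob helper perf really uses; the message above does not say which one that is.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if 'name' matches at least one glob in the NULL-terminated
 * 'pat' array, 0 otherwise. A NULL 'pat' means "no restriction".
 * Hypothetical helper, not the perf implementation.
 */
int name_matches_any(const char *name, const char *const *pat)
{
	if (pat == NULL)
		return 1;

	for (; *pat != NULL; pat++) {
		if (fnmatch(*pat, name, 0) == 0)
			return 1;
	}
	return 0;
}
```

A removal loop built on such a check would stop with an error as soon as it meets a directory entry for which the helper returns 0.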
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexey Budankov <alexey.budankov@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephane Eranian <eranian@google.com >
Link: http://lkml.kernel.org/r/20190224190656.30163-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-25 10:33:04 -03:00
Jiri Olsa
05a4865939
perf tools: Add depth checking to rm_rf
...
Add a depth argument to rm_rf() (and rename it to rm_rf_depth()) to
specify how deep we will go searching for files to remove.
It will be used to specify a single level of depth for perf.data directory
removal in a following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexey Budankov <alexey.budankov@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephane Eranian <eranian@google.com >
Link: http://lkml.kernel.org/r/20190224190656.30163-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-25 10:32:11 -03:00
Jiri Olsa
2d4f27999b
perf data: Add global path holder
...
Add a 'path' member to 'struct perf_data'. It will keep the configured
path for the data (const char *). The path in struct perf_data_file is
now dynamically allocated (duped) from it.
This scheme is used in the following patches, where struct
perf_data::path holds the configured directory path and struct
perf_data_file::path holds the allocated path for specific files.
It also makes the code a little simpler.
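A minimal sketch of that ownership split, using trimmed-down stand-in structures. All names ending in _sketch, and dup_string(), are hypothetical; the real structures have more members.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct perf_data_file and struct perf_data. */
struct data_file_sketch {
	char *path;		/* owned copy, freed on close */
};

struct data_sketch {
	const char *path;	/* configured path, not owned */
	struct data_file_sketch file;
};

/* Portable replacement for strdup(), used to keep the sketch plain C. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return copy;
}

/* Duplicate the configured path into the per-file member. */
int data_sketch__open(struct data_sketch *data)
{
	data->file.path = dup_string(data->path);
	return data->file.path ? 0 : -1;
}

/* Release the per-file copy; the configured path is left alone. */
void data_sketch__close(struct data_sketch *data)
{
	free(data->file.path);
	data->file.path = NULL;
}
```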
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexey Budankov <alexey.budankov@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephane Eranian <eranian@google.com >
Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
[ Fixup data-convert-bt.c missing conversion ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-22 16:52:07 -03:00
Jiri Olsa
45112e89a8
perf data: Move size to struct perf_data_file
...
We are about to add support for multiple files, so we need each file to
keep its size.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexey Budankov <alexey.budankov@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephane Eranian <eranian@google.com >
Link: http://lkml.kernel.org/r/20190221094145.9151-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-22 16:52:07 -03:00
Adrian Hunter
3c0cd952cf
perf thread-stack: Hide x86 retpolines
...
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.
This does not handle function return thunks, nor is there any
improvement for the handling of inline thunks or extern thunks.
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
    return -1;
}
int foo(void)
{
    return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
    int (*volatile fn)(void) = foo;
    fn();
    return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
    main
        -> __x86_indirect_thunk_rax
            -> __x86_indirect_thunk_rax
                -> foo
                    -> bar
After:
    main
        -> foo
            -> bar
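For illustration, a name-based test for the default thunks could look like the sketch below. It assumes the thunks are recognised by the "__x86_indirect_thunk_" prefix visible in the objdump output above; is_retpoline_name() is a hypothetical name and the check perf really performs may differ.

```c
#include <string.h>

/*
 * Return 1 if 'name' starts with the prefix used by the default x86
 * retpoline thunks, 0 otherwise. Hypothetical helper for illustration.
 */
int is_retpoline_name(const char *name)
{
	static const char prefix[] = "__x86_indirect_thunk_";

	return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}
```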
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
[ Remove (sym->name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-22 16:49:49 -03:00
Adrian Hunter
1f35cd6538
perf thread-stack: Improve thread_stack__no_call_return()
...
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
    return -1;
}
int foo(void)
{
    return bar() + 1;
}
__attribute__((indirect_branch("thunk"))) int main()
{
    int (*volatile fn)(void) = foo;
    fn();
    return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
    main
        -> __x86_indirect_thunk_rax
            -> __x86_indirect_thunk_rax
                -> __x86_indirect_thunk_rax
                    -> bar
After:
    main
        -> __x86_indirect_thunk_rax
            -> __x86_indirect_thunk_rax
                -> foo
                    -> bar
Committer testing:
Choose "Reports", then "Context-Sensitive Call Graph" and then go on
expanding:
Before:
simple-retpolin
PID:PID
_start
_start
__libc_start_main
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
bar
After:
Remove the "simple-retpoline.db" file, run the 'perf script' line again
to regenerate the .db file and run exported-sql-viewer.py again to
get the same output all the way to 'main', then, from there, including 'main':
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
foo
bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-22 11:42:34 -03:00
Wei Li
11db1ad451
perf annotate: Fix getting source line failure
...
The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0d
("perf annotate: No need to calculate notes->start twice") removed the
notes->start assignment in symbol__calc_lines(). The lookup now fails in
find_address_in_section(), called from symbol__tty_annotate(), because
a2l->addr is wrong. As a result the annotate summary does not report the
source line numbers correctly.
Before fix:
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
void hotspot_1(void)
{
    volatile int i;
    for (i = 0; i < 0x10000000; i++);
    for (i = 0; i < 0x10000000; i++);
    for (i = 0; i < 0x10000000; i++);
}
int main(void)
{
    hotspot_1();
    return 0;
}
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
----------------------------------------------
19.30 common_while_1[32]
19.03 common_while_1[4e]
19.01 common_while_1[16]
5.04 common_while_1[13]
4.99 common_while_1[4b]
4.78 common_while_1[2c]
4.77 common_while_1[10]
4.66 common_while_1[2f]
4.59 common_while_1[51]
4.59 common_while_1[35]
4.52 common_while_1[19]
4.20 common_while_1[56]
0.51 common_while_1[48]
Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
-----------------------------------------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000000005fa <hotspot_1>:
: hotspot_1():
: void hotspot_1(void)
: {
0.00 : 5fa: push %rbp
0.00 : 5fb: mov %rsp,%rbp
: volatile int i;
:
: for (i = 0; i < 0x10000000; i++);
0.00 : 5fe: movl $0x0,-0x4(%rbp)
0.00 : 605: jmp 610 <hotspot_1+0x16>
0.00 : 607: mov -0x4(%rbp),%eax
common_while_1[10] 4.77 : 60a: add $0x1,%eax
common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp)
common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax
common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax
0.00 : 618: jle 607 <hotspot_1+0xd>
: for (i = 0; i < 0x10000000; i++);
...
After fix:
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
----------------------------------------------
33.34 common_while_1.c:5
33.34 common_while_1.c:6
33.32 common_while_1.c:7
Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
-----------------------------------------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000000005fa <hotspot_1>:
: hotspot_1():
: void hotspot_1(void)
: {
0.00 : 5fa: push %rbp
0.00 : 5fb: mov %rsp,%rbp
: volatile int i;
:
: for (i = 0; i < 0x10000000; i++);
0.00 : 5fe: movl $0x0,-0x4(%rbp)
0.00 : 605: jmp 610 <hotspot_1+0x16>
0.00 : 607: mov -0x4(%rbp),%eax
common_while_1.c:5 4.70 : 60a: add $0x1,%eax
4.89 : 60d: mov %eax,-0x4(%rbp)
common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax
common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax
0.00 : 618: jle 607 <hotspot_1+0xd>
: for (i = 0; i < 0x10000000; i++);
0.00 : 61a: movl $0x0,-0x4(%rbp)
0.00 : 621: jmp 62c <hotspot_1+0x32>
0.00 : 623: mov -0x4(%rbp),%eax
common_while_1.c:6 4.54 : 626: add $0x1,%eax
4.73 : 629: mov %eax,-0x4(%rbp)
common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax
common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax
...
Signed-off-by: Wei Li <liwei391@huawei.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Fixes: 425859ff0d ("perf annotate: No need to calculate notes->start twice")
Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-21 17:00:35 -03:00
Jiri Olsa
b4409ae112
perf tools: Make rm_rf() remove single file
...
Let rm_rf() remove a file if it's provided by path, not just
directories.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexey Budankov <alexey.budankov@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190220122800.864-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-20 17:09:28 -03:00
Jiri Olsa
deb83da16c
perf cpumap: Increase debug level for cpu_map__snprint verbose output
...
So it does not screw up single -v verbose output.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190220122800.864-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-20 17:08:39 -03:00
Jiri Olsa
b20fe10642
perf bpf-event: Add missing new line into pr_debug call
...
Add a missing new line to the pr_debug call in perf_event__synthesize_bpf_events(),
so that the error message does not screw up the verbose output.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <songliubraving@fb.com >
Link: http://lkml.kernel.org/r/20190220122800.864-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-20 16:23:07 -03:00
Jiri Olsa
6e7e8b9fec
perf evsel: Force sample_type for slave events
...
Force sample_type setup for slave events in group leader sessions.
We don't get samples for slave events; we synthesize them when delivering
the group leader sample. Set the slave events to follow the group leader's
sample_type to make reporting easier.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190220122800.864-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-20 16:08:59 -03:00
Jiri Olsa
529c1a9e18
perf session: Don't report zero period samples for slave events
...
There's no reason to deliver a sample with zero period. It means there
was no value for slave event since its last group leader sample.
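A rough sketch of the idea, assuming the period of a slave event is the difference between its current count and the count seen at the previous group leader sample. The function name and its array-based interface are hypothetical and much simpler than the perf session code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * 'values' holds the counts read for the slave events together with the
 * leader sample, 'last' holds the counts from the previous leader sample
 * and is updated in place. Returns how many slave samples would be
 * delivered. Hypothetical helper for illustration.
 */
size_t deliver_group_members(const uint64_t *values, uint64_t *last, size_t nr)
{
	size_t delivered = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t period = values[i] - last[i];

		last[i] = values[i];
		if (period == 0)
			continue;	/* nothing counted since the last leader sample */
		delivered++;		/* a real tool would synthesize a sample here */
	}
	return delivered;
}
```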
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190220122800.864-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-20 16:07:51 -03:00
Arnaldo Carvalho de Melo
d19f856479
perf bpf: Add bpf_map dumper
...
At some point I'll suggest moving this to libbpf, for now I'll
experiment with ways to dump BPF maps set by events in 'perf trace',
starting with a very basic dumper for the current very limited needs
of the augmented_raw_syscalls code: dumping booleans.
Having functions that apply to the map keys and values and do table
lookup in things like syscall id to string tables should come next.
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Daniel Borkmann <daniel@iogearbox.net >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Martin KaFai Lau <kafai@fb.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Yonghong Song <yhs@fb.com >
Link: https://lkml.kernel.org/n/tip-lz14w0esqyt1333aon05jpwc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-19 16:11:56 -03:00
He Kuang
7346195e86
perf report: Don't shadow inlined symbol with different addr range
...
We can't assume inlined symbols with the same name are equal, because
their address ranges may be different. This causes symbols with
different addresses to be shadowed when added to the hist entry, and leads
to an ERANGE error when the symbol address is checked during sample parsing:
the addr should be within the range [sym.start, sym.end].
The error message is like: "0x36aea60 [0x8]: failed to process type: 68".
The second parameter of symbol__new() is the length of the fake symbol
for the inline frame, which is the subtraction of the end and start
address of base_sym.
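The two points above can be sketched as follows. struct sym_sketch and both helpers are hypothetical, and requiring identical ranges is an assumption made for the example; the exact comparison rule used by perf is not spelled out in this message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a symbol. */
struct sym_sketch {
	uint64_t start;
	uint64_t end;
	const char *name;
};

/* Length to give the fake symbol created for an inline frame. */
uint64_t inline_sym_len(const struct sym_sketch *base_sym)
{
	return base_sym->end - base_sym->start;
}

/* Treat two inlined symbols as equal only if name and range both match. */
int inline_sym_same(const struct sym_sketch *a, const struct sym_sketch *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       a->start == b->start && a->end == b->end;
}
```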
Signed-off-by: He Kuang <hekuang@huawei.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Milian Wolff <milian.wolff@kdab.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Fixes: aa441895f7 ("perf report: Compare symbol name for inlined frames when sorting")
Link: http://lkml.kernel.org/r/20190219130531.15692-1-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-19 12:30:12 -03:00
Jiri Olsa
e19a01c143
perf tools: Use sysfs__mountpoint() when reading cpu topology
...
Use sysfs__mountpoint() when reading sysfs files to obtain cpu/numa
topologies.
Also use scnprintf instead of sprintf as suggested by Namhyung.
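As a sketch of the bounded path construction, the helper below builds one topology file path under a given sysfs mount point. Standard snprintf() stands in for scnprintf(), the mount point is passed in rather than obtained from sysfs__mountpoint(), and both the helper name and the choice of core_siblings_list as the file are assumptions made for the example.

```c
#include <stdio.h>

/*
 * Build the sysfs path of one cpu's core sibling list under 'mnt'.
 * Returns 0 on success, -1 if the result does not fit in 'buf'.
 * Hypothetical helper for illustration.
 */
int build_core_siblings_path(char *buf, size_t size, const char *mnt, int cpu)
{
	int n = snprintf(buf, size,
			 "%s/devices/system/cpu/cpu%d/topology/core_siblings_list",
			 mnt, cpu);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Unlike sprintf(), the bounded call cannot write past the end of the buffer when the mount point is unexpectedly long.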
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190219095815.15931-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-19 12:21:10 -03:00
Jiri Olsa
48e6c5acd3
perf tools: Add numa_topology object
...
Add the numa_topology object to return the list of numa nodes together
with their cpus. It will replace the numa code in header.c and will be
used from 'perf record' in the following patches.
Add the following interface functions to load numa details:
struct numa_topology *numa_topology__new(void);
void numa_topology__delete(struct numa_topology *tp);
And replace the current (copied) local interface, with no functional
changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190219095815.15931-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-19 12:21:06 -03:00
Jiri Olsa
5135d5efcb
perf tools: Add cpu_topology object
...
Make struct cpu_topo global and rename it to 'struct cpu_topology', so
that it can be used from the 'perf record' command in the following
patches.
Add the following interface functions to load/free cpu topology details:
struct cpu_topology *cpu_topology__new(void);
void cpu_topology__delete(struct cpu_topology *tp);
Move it to a separate source file cputopo.c together with numa related
object in the following patches.
No functional change, the new interface will be used in upcoming changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190219095815.15931-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-19 12:21:01 -03:00
Jiri Olsa
b00ccb27f9
perf header: Fix wrong node write in NUMA_TOPOLOGY feature
...
We are currently passing the node index instead of the real node number.
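The difference matters whenever node numbers are not contiguous. A small sketch, with a hypothetical helper and a made-up node list: on a machine whose present nodes are 0 and 2, index 1 refers to node 2, not node 1.

```c
#include <stddef.h>

/*
 * 'nodes' lists the NUMA node numbers that are present, in order.
 * Return the real node number at position 'idx', or -1 if out of range.
 * Hypothetical helper for illustration.
 */
int node_number_at(const int *nodes, size_t nr_nodes, size_t idx)
{
	if (idx >= nr_nodes)
		return -1;
	return nodes[idx];	/* the node number, not 'idx' itself */
}
```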
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Fixes: fbe96f29ce ("perf tools: Make perf.data more self-descriptive (v8)")
Link: http://lkml.kernel.org/r/20190219095815.15931-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-19 12:20:55 -03:00
David S. Miller
3313da8188
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
...
The netfilter conflicts were rather simple overlapping
changes.
However, the cls_tcindex.c stuff was a bit more complex.
On the 'net' side, Cong is fixing several races and memory
leaks. Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.
What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU. I did it this way because I believe that either Cong's
races don't apply with how Vlad did things, or Cong will have to
implement the race fix slightly differently.
Signed-off-by: David S. Miller <davem@davemloft.net >
2019-02-15 12:38:38 -08:00
Jiri Olsa
aa4df30db5
perf header: Remove unused 'cpu_nr' field from 'struct cpu_topo'
...
Not used at all.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190213123246.4015-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:09 -03:00
Jiri Olsa
a9aeb87b98
perf header: Get rid of write_it label
...
Simplifying the code a bit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190213123246.4015-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:09 -03:00
Jiri Olsa
33bbc571ed
perf list: Display metric expressions for --details option
...
Display metric expression itself when --details is specified.
Current list with no details:
# perf list metrics
...
TopDownL1:
IPC
[Instructions Per Cycle (per logical thread)]
SLOTS
[Total issue-pipeline slots]
...
Detailed output with metric formula:
# perf list --details metrics
...
TopDownL1:
IPC
[Instructions Per Cycle (per logical thread)]
[inst_retired.any / cpu_clk_unhalted.thread]
SLOTS
[Total issue-pipeline slots]
[4*(( cpu_clk_unhalted.thread_any / 2 ) if #smt_on else cycles)]
...
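To show how the two expressions printed above evaluate, here they are written as plain C functions. The function names are hypothetical and the event counts are passed in as ordinary numbers; perf evaluates these expressions with its own metric parser.

```c
/* IPC = inst_retired.any / cpu_clk_unhalted.thread */
double metric_ipc(double inst_retired_any, double cpu_clk_unhalted_thread)
{
	return inst_retired_any / cpu_clk_unhalted_thread;
}

/* SLOTS = 4 * ((cpu_clk_unhalted.thread_any / 2) if #smt_on else cycles) */
double metric_slots(int smt_on, double cpu_clk_unhalted_thread_any,
		    double cycles)
{
	return 4.0 * (smt_on ? cpu_clk_unhalted_thread_any / 2.0 : cycles);
}
```

For example, with SMT off and 600 cycles the SLOTS expression gives 4 * 600 = 2400 issue slots.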
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190213123246.4015-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:09 -03:00
Jiri Olsa
714a92d83f
perf tools: Fix legacy events symbol separator parsing
...
Fixing legacy symbol events parsing. We can't support single slash
separator, like 'cycles/u', because it conflicts with non empty terms,
like 'cycles/period/u'.
Keeping only '//' and ':' separator for these events:
cycles//u
cycles:k
And removing '/' separator support, which is not working
anymore. Also adding automated tests for above events.
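The ambiguity can be seen in a small hand-written splitter. This is only a sketch with hypothetical names; perf parses events with a generated lexer and parser, not with code like this. Once '/' opens a term list, a second '/' is needed to close it, so 'cycles/u' cannot be told apart from an unterminated term list.

```c
#include <stddef.h>
#include <string.h>

/* Copy 'len' bytes of 'src' into 'dst' as a string; -1 if it does not fit. */
static int copy_part(char *dst, size_t size, const char *src, size_t len)
{
	if (len >= size)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/*
 * Split "name//mod", "name/terms/mod" or "name:mod" into its parts.
 * Each output buffer is 'size' bytes. Returns 0 on success, -1 for a
 * single unterminated '/' (e.g. "cycles/u") or if a part does not fit.
 */
int split_legacy_event(const char *ev, char *name, char *terms, char *mod,
		       size_t size)
{
	const char *first = strchr(ev, '/');

	if (first != NULL) {
		const char *second = strchr(first + 1, '/');

		if (second == NULL)
			return -1;	/* single '/': ambiguous, rejected */
		if (copy_part(name, size, ev, (size_t)(first - ev)) ||
		    copy_part(terms, size, first + 1,
			      (size_t)(second - first - 1)))
			return -1;
		return copy_part(mod, size, second + 1, strlen(second + 1));
	}

	const char *colon = strchr(ev, ':');
	size_t name_len = colon ? (size_t)(colon - ev) : strlen(ev);

	if (copy_part(name, size, ev, name_len) ||
	    copy_part(terms, size, "", 0))
		return -1;
	return colon ? copy_part(mod, size, colon + 1, strlen(colon + 1))
		     : copy_part(mod, size, "", 0);
}
```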
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190213123246.4015-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:08 -03:00
Jiri Olsa
5ff328836d
perf tools: Rename build libperf to perf
...
Rename build libperf to perf, because it's used to build perf.
The libperf build object name will be used for libperf library.
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20190213123246.4015-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:08 -03:00
Mathieu Poirier
8224531cf5
perf cs-etm: Modularize auxtrace_buffer fetch function
...
Make the auxtrace_buffer fetch function modular so that it can be
called from different decoding contexts (timeless vs. non-timeless),
to avoid repeating code.
No change in functionality is introduced by this patch.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-14-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:08 -03:00
Mathieu Poirier
3fa0e83e29
perf cs-etm: Modularize main packet processing loop
...
Make the main packet processing loop modular so that it can be called
from different decoding contexts (timeless vs. non-timeless), to avoid
repeating code.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-13-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:07 -03:00
Mathieu Poirier
f74f349c21
perf cs-etm: Modularize main decoder function
...
Make the main decoder block modular so that it can be called from
different decoding contexts (timeless vs. non-timeless), to avoid
repeating code.
No change in functionality is introduced by this patch.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-12-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:07 -03:00
Mathieu Poirier
23cfcd6d75
perf cs-etm: Make cs_etm__run_decoder() queue independent
...
This patch makes decoding of the auxtrace buffer centered around a struct
cs_etm_queue. This eliminates superfluous variables and is a precursor
for work that simplifies the main decoder loop.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-11-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:07 -03:00
Mathieu Poirier
4b6df11ab6
perf cs-etm: Rethink kernel address initialisation
...
Moving initialisation of the kernel start address to function
cs_etm__setup_queues(), considered to be the common denominator for
queue initialisation. That way we don't have to repeat the same code
at different places.
No change of functionality is introduced by this patch.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-10-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:07 -03:00
Mathieu Poirier
4f5b37139f
perf cs-etm: Cleaning up function cs_etm__alloc_queue()
...
Function cs_etm__alloc_queue() should only be concerned with the allocation
of memory for the etmq and accompanying decoder. Everything else should
be done in the calling function.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-9-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:07 -03:00
Mathieu Poirier
e4aa592d18
perf cs-etm: Fix erroneous comment
...
The comment just before initialising the decoder is plain wrong, since it
is part of the decoding queue setup function and the operation code
specifically mentions that trace data is to be decoded rather than printed
out.
This patch simply fixes the comment to prevent people from getting really
confused.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-8-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:07 -03:00
Mathieu Poirier
2507a3d982
perf cs-etm: Introducing function cs_etm__init_trace_params()
...
The trace parameter initialisation code is repeated in two different
places, something that bloats the file and can lead to errors. This
is fixed by introducing a helper function and calling the right
protocol initialisation code when required.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-7-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:06 -03:00
Mathieu Poirier
ae4d9f5236
perf cs-etm: Fix memory leak in error path
...
Memory allocated for variable 't_params' isn't released properly in the
error paths of functions cs_etm__alloc_queue() and
cs_etm__dump_event(), something this patch addresses.
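The usual shape of such a fix is a single cleanup label that every failure branch jumps to. The sketch below uses hypothetical structures and a 'fail_decoder' switch to simulate a decoder setup failure; it is not the cs-etm code.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-in for a decoding queue. */
struct queue_sketch {
	int *t_params;		/* stand-in for the trace parameters */
	void *decoder;
};

/*
 * Allocate a queue. 'fail_decoder' simulates the decoder failing to
 * initialise so that the error path can be exercised.
 */
struct queue_sketch *queue_sketch__alloc(size_t nr_cpus, int fail_decoder)
{
	struct queue_sketch *q = calloc(1, sizeof(*q));

	if (q == NULL)
		return NULL;

	q->t_params = calloc(nr_cpus, sizeof(*q->t_params));
	if (q->t_params == NULL)
		goto out_free;

	q->decoder = fail_decoder ? NULL : malloc(1);
	if (q->decoder == NULL)
		goto out_free;

	return q;

out_free:
	free(q->t_params);	/* free(NULL) is a no-op */
	free(q);
	return NULL;
}

/* Release everything owned by the queue. */
void queue_sketch__free(struct queue_sketch *q)
{
	if (q == NULL)
		return;
	free(q->decoder);
	free(q->t_params);
	free(q);
}
```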
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-6-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:06 -03:00
Mathieu Poirier
65963e5b4d
perf cs-etm: Introducing function cs_etm_decoder__init_dparams()
...
Introduce function cs_etm_decoder__init_dparams() to avoid repeating
code in two different places.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-5-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:06 -03:00
Mathieu Poirier
d3267ad43d
perf cs-etm: Fix wrong return values in error path
...
Function cs_etm__mem_access() is supposed to return a u32 but the error
path returns negative values at a couple of places, something that really
throws off the clients using it. Fix the situation by returning '0'.
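To see why a negative error value is a problem in a function returning an unsigned 32-bit count, consider this small sketch. The reader function, its parameters and the memory image are hypothetical and only mimic the shape of such an accessor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reader: copies up to 'len' bytes starting at 'addr' from a
 * toy memory image into 'buf' and returns the number of bytes copied.
 * Errors are reported as 0 bytes copied.
 */
static uint32_t toy_mem_access(const uint8_t *image, size_t image_len,
			       uint64_t addr, size_t len, uint8_t *buf)
{
	assert(buf);

	if (!image || addr >= image_len)
		return 0;	/* not -1: that would be read as a huge byte count */

	if (len > image_len - addr)
		len = image_len - addr;

	memcpy(buf, image + addr, len);
	return (uint32_t)len;
}

/* What a caller would actually receive if the error path returned -1. */
static uint32_t toy_broken_error_value(void)
{
	return (uint32_t)-1;
}
```

A caller that treats the return value as a length would take the converted -1 as 4294967295 bytes read, whereas 0 is naturally handled as "nothing was read".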
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-4-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:06 -03:00
Mathieu Poirier
fc7ac4138c
perf cs-etm: Remove unused structure field "time" and "timestamp"
...
Fields "time" and "timestamp" in structure cs_etm_queue are no longer
used and need to be removed.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-3-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:06 -03:00
Mathieu Poirier
b611f63bb1
perf cs-etm: Remove unused structure field "state"
...
Field "state" in structure cs_etm_queue is no longer used and needs
to be removed.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190212171618.25355-2-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 15:18:06 -03:00
Song Liu
39f4a913d6
perf utils: Silence "Couldn't synthesize bpf events" warning for EPERM
...
Synthesizing BPF events is only supported for root. Silence the warning
message when a non-root user runs perf-record.
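The decision described above can be sketched as a small predicate. This is only an illustration: the function name is hypothetical, and it assumes the synthesis step reports failure as a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a failed synthesis step deserves a
 * warning. 'err' is assumed to be 0 on success or a negative errno value.
 */
static bool toy_should_warn(int err)
{
	assert(err <= 0);

	if (err == 0)
		return false;	/* success, nothing to report */
	if (err == -EPERM)
		return false;	/* expected for non-root users, stay quiet */
	return true;		/* any other failure is worth a warning */
}
```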
Reported-by: David Carrillo-Cisneros <davidca@fb.com >
Signed-off-by: Song Liu <songliubraving@fb.com >
Tested-by: David Carrillo-Cisneros <davidca@fb.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190204193140.719740-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 13:31:11 -03:00
Thomas Richter
2187d87eac
perf report: Add s390 diagnostic sampling descriptor size
...
On IBM z13 machine types 2964 and 2965 the descriptor
sizes for sampling and diagnostic sampling entries
might be missing in the trailer entry and are set to zero.
This leads to a perf report failure when processing diagnostic
sampling entries.
This patch adds missing descriptor sizes when the trailer entry
contains zero for these fields.
Output before:
[root@s38lp82 perf]# ./perf report --stdio | fgrep Samples
0xabbf0 [0x8]: failed to process type: 68
Error:
failed to process sample
[root@s38lp82 perf]#
Output after:
[root@s38lp82 perf]# ./perf report --stdio | fgrep Samples
# Total Lost Samples: 0
# Samples: 3K of event 'SF_CYCLES_BASIC_DIAG'
# Samples: 162 of event 'CF_DIAG'
[root@s38lp82 perf]#
Fixes: 2b1444f2e2 ("perf report: Add raw report support for s390 auxiliary trace")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com >
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com >
Cc: Heiko Carstens <heiko.carstens@de.ibm.com >
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com >
Link: http://lkml.kernel.org/r/20190211100627.85714-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 13:31:08 -03:00
Mathieu Poirier
859dcf6438
perf cs-etm: Add proper header file for symbols
...
After commit e22c1c7511 ("perf thread: Don't include symbol.h,
symbol_conf.h is enough"), compilation of the perf tools is broken when
using the functionality provided by the openCSD library:
[...]
... timerfd: [ on ]
... sched_getcpu: [ on ]
... sdt: [ OFF ]
... setns: [ on ]
... libopencsd: [ on ]
[...]
CC util/arm-spe.o
CC util/arm-spe-pkt-decoder.o
CC util/s390-cpumsf.o
CC util/cs-etm.o
CC util/parse-branch-options.o
util/cs-etm.c: In function ‘cs_etm__mem_access’:
util/cs-etm.c:297:24: error: storage size of ‘al’ isn’t known
struct addr_location al;
And rightly so since file cs-etm.c doesn't include symbol.h, something
that is rectified in this patch.
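The compiler error above is the usual symptom of an incomplete type. A minimal sketch with a hypothetical structure (toy_location, not perf's addr_location) shows what a forward declaration does and does not allow:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration only: the type is incomplete at this point. */
struct toy_location;

/* Pointers to an incomplete type are fine, so this prototype compiles. */
static int toy_is_resolved(const struct toy_location *loc);

/*
 * Declaring an object needs the size of the type. With only the forward
 * declaration visible, a line such as
 *
 *	struct toy_location al;
 *
 * fails with "storage size of 'al' isn't known".
 */

/* The full definition, as the proper header would provide it. */
struct toy_location {
	unsigned long addr;
	const char *sym;
};

static int toy_is_resolved(const struct toy_location *loc)
{
	assert(loc);
	return loc->sym != NULL;
}

/* Now that the type is complete, a local object can be declared. */
static int toy_demo(void)
{
	struct toy_location al = { .addr = 0x1000, .sym = "main" };

	return toy_is_resolved(&al);
}
```

This is why a file that only passes pointers around can get by with a forward declaration, while a file that declares a local variable of the type must include the header that defines it.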
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190208223543.31836-1-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-14 13:30:52 -03:00
Ingo Molnar
6854daa07a
Merge tag 'perf-core-for-mingo-5.1-20190206' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
...
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
Hardware tracing:
Adrian Hunter:
- Handle calls optimized into jumps to a different symbol
in the thread stack routines used to process hardware traces (Adrian Hunter)
Intel PT:
Adrian Hunter:
- Fix overlap calculation for padding.
- Fix CYC timestamp calculation after OVF.
- Packet splitting can only happen in 32-bit.
- Add timestamp to auxtrace errors.
ARM CoreSight:
Leo Yan:
- Add last instruction information in packet
- Set sample flags for instruction range, exception and
return packets and for a trace discontinuity.
- Add exception number in exception packet
- Change tuple from traceID-CPU# to traceID-metadata
- Add traceID in packet
Mathieu Poirier:
- Add "sinks" group to PMU directory
- Use event attributes to send sink information to kernel
- Remove set_drv_config() API, no longer used.
perf annotate:
Jiri Olsa:
- Delay symbol annotation to the resort phase, speeding up 'perf report'
startup.
perf record:
Alexey Budankov:
- Allow binding userspace buffers to NUMA nodes.
Symbols:
Adrian Hunter:
- Fix calculating of symbol sizes when splitting kallsyms into
maps for kcore processing.
Vendor events:
William Cohen:
- Intel: Fix Load_Miss_Real_Latency on CLX
Misc:
Arnaldo Carvalho de Melo:
- Streamline headers, removing includes when all that is needed are
just forward declarations, fixup the fallout for cases where headers
should have been explicitly included but were instead obtained
indirectly, by sheer luck.
- Add fallback versions for CPU_{OR,EQUAL}(), so that code using them
continues to build on older systems where those were not yet introduced
or on systems using a libc other than the GNU one where those
helpers aren't present.
Documentation:
Changbin Du:
- Add documentation for BPF event selection.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2019-02-09 13:16:01 +01:00
Ingo Molnar
9821517a53
Merge branch 'perf/urgent' into perf/core, to pick up fixes
...
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2019-02-09 13:15:32 +01:00
David S. Miller
a655fe9f19
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
...
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.
Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net >
2019-02-08 15:00:17 -08:00
Adrian Hunter
16bd4321c2
perf auxtrace: Add timestamp to auxtrace errors
...
The timestamp can be useful to find the part of a trace that has an error
without outputting all of the trace, e.g. using the itrace 's' option to
skip an initial number of events.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Link: http://lkml.kernel.org/r/20190206103947.15750-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 11:20:32 -03:00
Adrian Hunter
26ee2bcdea
perf intel-pt: Packet splitting can happen only on 32-bit
...
Data is copied when the trace is stopped, so packets are never split
between buffers except when processing if the buffer cannot fit in the
address space which can only happen on 32-bit systems. Change the logic
to reflect that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Link: http://lkml.kernel.org/r/20190206103947.15750-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 10:27:54 -03:00
Adrian Hunter
0399761290
perf intel-pt: Fix CYC timestamp calculation after OVF
...
CYC packet timestamp calculation depends upon CBR which was being
cleared upon overflow (OVF). That can cause errors due to failing to
synchronize with sideband events. Even if a CBR change has been lost,
the old CBR is still a better estimate than zero. So remove the clearing
of CBR.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 10:27:27 -03:00
Adrian Hunter
5a99d99e33
perf intel-pt: Fix overlap calculation for padding
...
Auxtrace records might have up to 7 bytes of padding appended. Adjust
the overlap accordingly.
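A padding of up to 7 bytes is what one gets when record sizes are rounded up to a multiple of 8. The following sketch shows that arithmetic only; the macro and function names are hypothetical and the alignment of 8 is an assumption inferred from the "up to 7 bytes" statement.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment: rounding up to 8 yields between 0 and 7 pad bytes. */
#define TOY_RECORD_ALIGNMENT 8

/* Number of pad bytes appended to a payload of 'len' bytes. */
static size_t toy_padding(size_t len)
{
	return (TOY_RECORD_ALIGNMENT - len % TOY_RECORD_ALIGNMENT) %
	       TOY_RECORD_ALIGNMENT;
}

/* Size of the record as stored, payload plus padding. */
static size_t toy_padded_len(size_t len)
{
	size_t padded = len + toy_padding(len);

	assert(padded % TOY_RECORD_ALIGNMENT == 0);
	return padded;
}
```

Because the stored size can exceed the payload by up to 7 bytes, code that compares the tail of one buffer with the head of the next has to allow for those trailing bytes not being trace data.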
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 10:27:00 -03:00
Adrian Hunter
c3fcadf0bb
perf auxtrace: Define auxtrace record alignment
...
Define auxtrace record alignment so that it can be referenced elsewhere.
Note this is preparation for patch "perf intel-pt: Fix overlap calculation
for padding"
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 10:25:39 -03:00
Adrian Hunter
f08046cb30
perf thread-stack: Represent jmps to the start of a different symbol
...
The compiler might optimize a call/ret combination by making it a jmp.
However the thread-stack does not presently cater for that, so that such
control flow is not visible in the call graph. Make it visible by
recording on the stack a branch to the start of a different symbol.
Note, that means when a ret pops the stack, all jmps must be popped off
first.
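The rule in the note above can be sketched with a toy stack in which each entry remembers whether it was pushed for a call or for a jmp. This is a simplified model with hypothetical names (toy_stack, toy_push(), toy_ret()), not the thread-stack implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STACK_MAX 64

/* One recorded control-flow transfer (hypothetical layout, not perf's). */
struct toy_entry {
	const char *sym;	/* symbol that was entered */
	bool via_jmp;		/* entered by a jmp rather than a call */
};

struct toy_stack {
	struct toy_entry entries[TOY_STACK_MAX];
	size_t cnt;
};

/* Record entering 'sym', either through a call or through a jmp. */
static int toy_push(struct toy_stack *ts, const char *sym, bool via_jmp)
{
	assert(ts && sym);

	if (ts->cnt == TOY_STACK_MAX)
		return -1;
	ts->entries[ts->cnt].sym = sym;
	ts->entries[ts->cnt].via_jmp = via_jmp;
	ts->cnt++;
	return 0;
}

/*
 * Handle a ret: a ret pairs with the most recent call, so any jmp entries
 * sitting above that call are popped first. Returns the symbol of the call
 * entry that was popped, or NULL if no call entry is left.
 */
static const char *toy_ret(struct toy_stack *ts)
{
	assert(ts);

	while (ts->cnt && ts->entries[ts->cnt - 1].via_jmp)
		ts->cnt--;
	if (!ts->cnt)
		return NULL;
	return ts->entries[--ts->cnt].sym;
}
```

For a sequence in which main is called, jumps to foo, and foo calls bar, the first ret pops bar and the second ret pops the foo jmp entry together with the main call entry.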
Example:
$ cat jmp-to-fn.c
__attribute__((noinline)) int bar(void)
{
	return -1;
}
__attribute__((noinline)) int foo(void)
{
	return bar() + 1;
}
int main()
{
	return foo();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
$ objdump -d jmp-to-fn
<SNIP>
0000000000001040 <main>:
1040: 31 c0 xor %eax,%eax
1042: e9 09 01 00 00 jmpq 1150 <foo>
<SNIP>
0000000000001140 <bar>:
1140: b8 ff ff ff ff mov $0xffffffff,%eax
1145: c3 retq
<SNIP>
0000000000001150 <foo>:
1150: 31 c0 xor %eax,%eax
1152: e8 e9 ff ff ff callq 1140 <bar>
1157: 83 c0 01 add $0x1,%eax
115a: c3 retq
<SNIP>
$ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
$ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
2019-01-08 13:24:58.783069 Creating database...
2019-01-08 13:24:58.794650 Writing records...
2019-01-08 13:24:59.008050 Adding indexes
2019-01-08 13:24:59.015802 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db
Before:
main
-> bar
After:
main
-> foo
-> bar
Committer testing:
Install the python2-pyside package, then select these menu options
on the GUI:
"Reports"
"Context sensitive callgraphs"
Then keep expanding the symbols to get the full picture. Doing this
on fedora:29 with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC) gives:
jmp-to-fn
PID:TID
_start (ld-2.28.so)
__libc_start_main
main
foo
bar
This verifies that the patch does fix the problem.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Link: http://lkml.kernel.org/r/20190109091835.5570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 10:00:40 -03:00
Adrian Hunter
90c2cda705
perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables
...
Make thread_stack__no_call_return() more readable by adding more local
variables.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Link: http://lkml.kernel.org/r/20190109091835.5570-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2019-02-06 10:00:40 -03:00