Adding processing of event update events, so perf stat report can store
additional info for events - unit,scale,name.
Committer note:
Before:
# perf stat record -e power/energy-cores/ -a
^C
Performance counter stats for 'system wide':
77.41 Joules power/energy-cores/
1.597176695 seconds time elapsed
# perf stat report
Performance counter stats for '/home/acme/bin/perf stat record -e power/energy-cores/ -a':
332,488,114,176 power/energy-cores/
1.597176695 seconds time elapsed
#
After, using the same perf.data file generated in the "Before" case
above:
# perf stat report
Performance counter stats for '/home/acme/bin/perf stat record -e power/energy-cores/ -a':
77.41 Joules power/energy-cores/
1.597176695 seconds time elapsed
#
Reported-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-17-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding processing of stat and stat round events.
The stat data com in stat events, using generic function
process_stat_round_event to store data under perf_evsel object.
The stat-round events comes each interval or as last event in non
interval mode. The function process_stat_round_event process stored data
for each perf_evsel object and print it out.
Committer note:
After this patch:
$ perf stat record usleep 1
Performance counter stats for 'usleep 1':
0.498381 task-clock (msec) # 0.571 CPUs utilized
2 context-switches # 0.004 M/sec
0 cpu-migrations # 0.000 K/sec
149 page-faults # 0.299 M/sec
1,271,635 cycles # 2.552 GHz
928,712 stalled-cycles-frontend # 73.03% frontend cycles idle
663,286 stalled-cycles-backend # 52.16% backend cycles idle
792,614 instructions # 0.62 insns per cycle
# 1.17 stalled cycles per insn
136,850 branches # 274.589 M/sec
<not counted> branch-misses (0.00%)
0.000873419 seconds time elapsed
$
$ perf stat report
Performance counter stats for '/home/acme/bin/perf stat record usleep 1':
0.498381 task-clock (msec) # 0.571 CPUs utilized
2 context-switches # 0.004 M/sec
0 cpu-migrations # 0.000 K/sec
149 page-faults # 0.299 M/sec
1,271,635 cycles # 2.552 GHz
928,712 stalled-cycles-frontend # 73.03% frontend cycles idle
663,286 stalled-cycles-backend # 52.16% backend cycles idle
792,614 instructions # 0.62 insns per cycle
# 1.17 stalled cycles per insn
136,850 branches # 274.589 M/sec
<not counted> branch-misses (0.00%)
0.000873419 seconds time elapsed
$
Reported-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-16-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We currently don't support storing multiple session in perf.data,
so we can't allow -r option in stat record.
$ perf stat -e cycles -r 2 record ls
Cannot use -r option with perf stat record.
Committer note:
Before this patch we would a perf.data file such as:
$ perf stat -e cycles -r 2 record ls
<SNIP>
Performance counter stats for 'ls' (2 runs):
3,935,236 cycles
0.002353261 seconds time elapsed ( +- 4.76% )
$ perf report -D | grep PERF_RECORD | grep ROUND
0xf0 [0]: failed to process type: 16
Error:
failed to process sample
$
Reported-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allowing storing stat record data into pipe, so report tools
(report/script) could read data directly from record.
Committer note:
Before this patch:
$ perf stat record -o - usleep 1 | perf report -i -
incompatible file format (rerun with -v to learn more)
$ perf stat record -o - usleep 1 | perf script -i -
incompatible file format (rerun with -v to learn more)
$ ls -la perf.data
ls: cannot access perf.data: No such file or directory
$
After:
$ perf stat record -o - usleep 1 | perf report -i -
# To display the perf.data header info, please use
# --header/--header-only options.
#
Error:
The - file has no samples!
$ perf stat record -o - usleep 1 | perf script -i -
Display of symbols requested but neither sample IP nor sample address
is selected. Hence, no addresses to convert to symbols.
0 [0x80]: failed to process type: 64
$ ls -la perf.data
ls: cannot access perf.data: No such file or directory
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Synthesizing needed stat record data for report/script:
- cpu/thread maps
- stat config
Committer note:
New records generated on a perf.data file with this patch:
$ perf report -D | grep PERF_RECORD_
0x568 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 29097
0x590 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
0x5a2 [0x40]: PERF_RECORD_STAT_CONFIG
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-5-git-send-email-jolsa@kernel.org
[ Adjusted wrt kernel PERF_RECORD_MMAP added when introducing 'perf stat record' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'perf stat record' command support. It creates simple (header only)
perf.data file ATM.
The record command could be specified anywhere among stat options. All
stat command options are valid for stat record command with '-o' option
exception. If specified for record command it denotes the perf data file
name.
Committer note:
Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless
while avoiding that older tools show confusing messages, for instance,
with sample_type = 0, we get:
$ perf stat record usleep 1
Performance counter stats for 'usleep 1':
0.630237 task-clock (msec) # 0.528 CPUs utilized
1 context-switches # 0.002 M/sec
0 cpu-migrations # 0.000 K/sec
52 page-faults # 0.083 M/sec
978,312 cycles # 1.552 GHz
671,931 stalled-cycles-frontend # 68.68% frontend cycles idle
<not supported> stalled-cycles-backend
646,379 instructions # 0.66 insns per cycle
# 1.04 stalled cycles per insn
131,046 branches # 207.931 M/sec
7,073 branch-misses # 5.40% of all branches
0.001193240 seconds time elapsed
$ oldperf evlist
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
non matching sample_type
$
While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf
stat record usleep' we get:
$ oldperf evlist
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
task-clock
context-switches
cpu-migrations
page-faults
cycles
stalled-cycles-frontend
stalled-cycles-backend
instructions
branches
branch-misses
$
Which at least shows the names of the events in the perf.data file.
Additionally, such files, when passed to 'perf report' will produce:
$ oldperf report --stdio
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
Warning:
Kernel address maps (/proc/{kallsyms,modules}) were restricted.
Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
As no suitable kallsyms nor vmlinux was found, kernel samples
can't be resolved.
Samples in kernel modules can't be resolved as well.
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
$
Which is confusing and can be solved by just adding the kernel mmap record,
which will also remove that warning about the data size field being equal to
zero, after generating the mmap record:
$ perf stat record usleep 1
Performance counter stats for 'usleep 1':
0.600796 task-clock (msec) # 0.478 CPUs utilized
1 context-switches # 0.002 M/sec
0 cpu-migrations # 0.000 K/sec
54 page-faults # 0.090 M/sec
886,844 cycles # 1.476 GHz
582,169 stalled-cycles-frontend # 65.65% frontend cycles idle
<not supported> stalled-cycles-backend
638,344 instructions # 0.72 insns per cycle
# 0.91 stalled cycles per insn
130,204 branches # 216.719 M/sec
7,500 branch-misses # 5.76% of all branches
0.001255897 seconds time elapsed
$ oldperf evlist
task-clock
context-switches
cpu-migrations
page-faults
cycles
stalled-cycles-frontend
stalled-cycles-backend
instructions
branches
branch-misses
$ oldperf report --stdio
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
[acme@zoo linux]$
No warnings, sensible output about what are the events in the perf.data file and also
a "file has no samples" message, which indeed it doesn't.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: htp://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix cmd_stat() to release cpu_map objects (aggr_map and
cpus_aggr_map) afterwards.
refcnt debugger shows that the cmd_stat initializes cpu_map
but not puts it.
----
# ./perf stat -v ls
....
REFCNT: BUG: Unreclaimed objects found.
==== [0] ====
Unreclaimed cpu_map@0x29339c0
Refcount +1 => 1 at
./perf(cpu_map__empty_new+0x6d) [0x4e64bd]
./perf(cmd_stat+0x5fe) [0x43594e]
./perf() [0x47b785]
./perf(main+0x617) [0x422587]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2dff420af5]
./perf() [0x4226fd]
REFCNT: Total 1 objects are not reclaimed.
"cpu_map" leaks 1 objects
----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151209021127.10245.93697.stgit@localhost.localdomain
[ Remove NULL checks before calling the put operation, it checks it already ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we have 2 kinds of stat counters based on when the event is
enabled:
1) tracee command events, which are enable once the
tracee executes exec syscall (enable_on_exec bit)
2) all other events which get alive within the
perf_event_open syscall
And 2) case could raise a problem in case we want additional filter to
be attached for event. In this case we want the event to be enabled
after it's configured with filter.
Changing the behaviour of 2) events, so they all are created as disabled
(disabled bit). Adding extra enable call to make them alive once they
finish setup.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1449133606-14429-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --interval-print parameter was limited to 100ms. However, for
example, 10ms is required to do sophisticated bandwidth analysis using
uncore events.
The test shows that the overhead of the system-wide uncore monitoring
with 10ms interval is only ~2%. So this patch reduces the minimal
interval-print allowd to 10ms.
But 10ms may not work well for all cases. For example, when the
cpus/threads number is very large, for system-wide core event monitoring
the overhead could be high.
To handle this issue, a warning will be displayed when the
interval-print is set between 10ms to 100ms. So users can make a
decision according to their specific cases.
# perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1
print interval < 100ms. The overhead percentage could be high in some
cases. Please proceed with caution.
# time counts unit events
0.010200451 0.10 MiB uncore_imc_1/cas_count_read/
0.020475117 0.02 MiB uncore_imc_1/cas_count_read/
0.030692800 0.01 MiB uncore_imc_1/cas_count_read/
0.040948161 0.02 MiB uncore_imc_1/cas_count_read/
0.051159564 0.00 MiB uncore_imc_1/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com
[ Added warning about overhead when using sub 100ms intervals to the man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
print_aggr() fails to print per-core/per-socket statistics after commit
582ec0829b ("perf stat: Fix per-socket output bug for uncore events")
if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
used to get aggregated id.
Here is an example:
Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
cpumask 0,18)
$ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
Without this patch, it failes to get CPU 18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 7526851 cycles
S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/
S1-C0 0 <not counted> cycles
S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/
With this patch, it can get both CPU0 and CPU18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 6327768 cycles
S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/
S1-C0 1 330228 cycles
S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 582ec0829b ("perf stat: Fix per-socket output bug for uncore events")
Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently all the -p option PID arguments tasks values get aggregated
and printed as single values.
Adding --per-tasks option to print values per task.
$ perf stat -e cycles,instructions --per-thread -p 30190,30242
^C
Performance counter stats for process id '30190,30242':
cat-30190 0 cycles
yes-30242 3,842,525,421 cycles
cat-30190 0 instructions
yes-30242 10,370,817,010 instructions
1.143155657 seconds time elapsed
Also works under interval mode:
$ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000
# time comm-pid counts unit events
1.000073435 cat-30190 89,058 cycles
1.000073435 yes-30242 3,360,786,902 cycles (100.00%)
1.000073435 cat-30190 14,066 instructions
1.000073435 yes-30242 9,069,937,462 instructions
2.000204830 cat-30190 0 cycles
2.000204830 yes-30242 3,351,667,626 cycles
2.000204830 cat-30190 0 instructions
2.000204830 yes-30242 9,045,796,885 instructions
^C 2.771286639 cat-30190 0 cycles
2.771286639 yes-30242 2,593,884,166 cycles
2.771286639 cat-30190 0 instructions
2.771286639 yes-30242 7,001,171,191 instructions
It works only with -t and -p options, otherwise following error is
printed:
$ perf stat -e cycles --per-thread -I 1000 ls
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>