perf record has a new option -W that enables weightened sampling.
Add sorting support in top/report for the average weight per sample and the
total weight sum. This allows to both compare relative cost per event
and the total cost over the measurement period.
Add the necessary glue to perf report, record and the library.
v2: Merge with new hist refactoring.
v3: Fix manpage. Remove value check.
Rename global_weight to weight and weight to local_weight.
v4: Readd sort keys to manpage
v5: Move weight to end
v6: Move weight to template
v7: Rename weight key.
Original patch from Andi modified by Stephane Eranian <eranian@google.com>
to include ONLY the weight supporting code and apply to pristine 3.8.0-rc4.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359040242-8269-6-git-send-email-eranian@google.com
[ committer note: changed to cope with fc5871ed and the hists_link perf test entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds the --per-core option to perf stat.
This option is used to aggregate system-wide counts
on a per physical core basis. On processors with
hyperthreading, this means counts of all HT threads
running on a physical core are aggregated.
This mode is useful to find imblance between physical
cores running an uniform workload. Cores are identified
by socket: S0-C1, means physical core 1 on socket 0. Note
that cores are identified using their physical core id,
thus their numbering may not be continuous.
Per core aggregation can be combined with interval printing:
# perf stat -a --per-core -I 1000 -e cycles sleep 1000
# time core cpus counts events
1.000090030 S0-C0 1 4,765,747 cycles
1.000090030 S0-C1 1 5,580,647 cycles
1.000090030 S0-C2 1 221,181 cycles
1.000090030 S0-C3 1 266,092 cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1360846649-6411-4-git-send-email-eranian@google.com
[ committer note: Remove parts already applied on 86ee6e1 to keep bisectability ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds per-processor socket count aggregation for system-wide
mode measurements. This is a useful mode to detect imbalance between
sockets.
To enable this mode, use --aggr-socket in addition
to -a. (system-wide).
The output includes the socket number and the number of online
processors on that socket. This is useful to gauge the amount of
aggregation.
# ./perf stat -I 1000 -a --aggr-socket -e cycles sleep 2
# time socket cpus counts events
1.000097680 S0 4 5,788,785 cycles
2.000379943 S0 4 27,361,546 cycles
2.001167808 S0 4 818,275 cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1360161962-9675-3-git-send-email-eranian@google.com
[ committer note: Added missing man page entry based on above comments ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a new printing mode for perf stat. It allows interval
printing. That means perf stat can now print event deltas at regular
time interval. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. Minimum is 100ms. Once, activated perf stat prints
events deltas since last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes a test is problematic for some reason and one wants to skip it,
for instance:
[root@sandy ~]# perf test
1: vmlinux symtab matches kallsyms : Ok
2: detect open syscall event : Ok
3: detect open syscall event on all cpus : Ok
4: read samples using the mmap interface : Ok
5: parse events tests : Warning: bad op token {
Warning: bad op token {
Warning: bad op token {
Warning: bad op token {
Warning: bad op token {
Warning: function is_writable_pte not defined
Segmentation fault (core dumped)
So now we can use -s/--skip while the problematic tests are being fixed,
allowing us to test all the other entries:
[root@sandy ~]# perf test -s 5
1: vmlinux symtab matches kallsyms : Ok
2: detect open syscall event : Ok
3: detect open syscall event on all cpus : Ok
4: read samples using the mmap interface : Ok
5: parse events tests : Skip (user override)
6: x86 rdpmc test : Ok
7: Validate PERF_RECORD_* events & perf_sample fields : Ok
8: Test perf pmu format parsing : Ok
9: Test dso data interface : Ok
10: roundtrip evsel->name check : Ok
11: Check parsing of sched tracepoints fields : Ok
12: Generate and check syscalls:sys_enter_open event fields: Ok
13: struct perf_event_attr setup : Ok
14: Test matching and linking mutliple hists : Ok
15: Try 'use perf' in python, checking link problems : Ok
[root@sandy ~]#
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-klzd8p57jzdryafqkmlppcb1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
. perf build-id cache now can show DSOs present in a perf.data file that are
not in the cache, to integrate with build-id servers being put in place by
organizations such as Fedora.
. perf buildid-list -i an-elf-file-instead-of-a-perf.data is back showing its
build-id.
. No need to do feature checks when doing a 'make tags'
. Fix some 'perf test' errors and make them use the tracepoint evsel constructor.
. perf top now shares more of the evsel config/creation routines with 'record',
paving the way for further integration like 'top' snapshots, etc.
. perf top now supports DWARF callchains.
. perf evlist decodes sample_type and read_format, helping diagnose problems.
. Fix mmap limitations on 32-bit, fix from David Miller.
. perf diff fixes from Jiri Olsa.
. Ignore ABS symbols when loading data maps, fix from Namhyung Kim
. Hists improvements from Namhyung Kim
. Don't check configuration on make clean, from Namhyung Kim
. Fix dso__fprintf() print statement, from Stephane Eranian.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
In order to measure kernel builds, one has to do some pre/post cleanup
work in order to do the repeat build.
So provide --pre and --post command hooks to allow doing just that.
perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \
-- make -s -j64 O=defconfig-build/ bzImage
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Stephane Eranian <eranian@gmail.com>
Link: http://lkml.kernel.org/r/1350992414.13456.5.camel@twins
[ committer note: Added respective entries in Documentation/perf-stat.txt ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
You may want to know where and how long a task is sleeping. A callchain
may be found in sched_switch and a time slice in stat_iowait, so I add
handler in perf inject for merging this events.
My code saves sched_switch event for each process and when it meets
stat_iowait, it reports the sched_switch event, because this event
contains a correct callchain. By another words it replaces all
stat_iowait events on proper sched_switch events.
I use the next sequence of commands for testing:
perf record -e sched:sched_stat_sleep -e sched:sched_switch \
-e sched:sched_process_exit -g -o ~/perf.data.raw \
~/test-program
perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
perf report --stdio -i ~/perf.data
100.00% foo [kernel.kallsyms] [k] __schedule
|
--- __schedule
schedule
|
|--79.75%-- schedule_hrtimeout_range_clock
| schedule_hrtimeout_range
| poll_schedule_timeout
| do_select
| core_sys_select
| sys_select
| system_call_fastpath
| __select
| __libc_start_main
|
--20.25%-- do_nanosleep
hrtimer_nanosleep
sys_nanosleep
system_call_fastpath
__GI___libc_nanosleep
__libc_start_main
And here is test-program.c:
#include<unistd.h>
#include<time.h>
#include<sys/select.h>
int main()
{
struct timespec ts1;
struct timeval tv1;
int i;
long s;
for (i = 0; i < 10; i++) {
ts1.tv_sec = 0;
ts1.tv_nsec = 10000000;
nanosleep(&ts1, NULL);
tv1.tv_sec = 0;
tv1.tv_usec = 40000;
select(0, NULL, NULL, NULL,&tv1);
}
return 1;
}
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344344165-369636-4-git-send-email-avagin@openvz.org
[ committer note: Made it use evsel->handler ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's a portion in the "perf list" output refering to the exact
specification of raw hardware events.
Since this description is in the perf-list manpage, try to build and
install the man pages, warning the user when that is not possible
due to missing packages (xmlto and asciidoc).
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-ij71ysszkdvz3fy3wr331bke@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Initially should look loosely like the venerable 'strace' tool, but
using the infrastructure in the perf tools to allow tracing extra
targets:
[acme@sandy linux]$ perf trace --hell
Error: unknown option `hell'
usage: perf trace <PID>
-p, --pid <pid> trace events on existing process id
--tid <tid> trace events on existing thread id
--all-cpus system-wide collection from all CPUs
--cpu <cpu> list of cpus to monitor
--no-inherit child tasks do not inherit counters
--mmap-pages <n> number of mmap data pages
--uid <user> user to profile
[acme@sandy linux]$
Those should have the same semantics as when using with 'perf record'.
It gets stuck sometimes, but hey, it works sometimes too!
In time it should support perf.data based workloads, i.e. it should have
a:
-o filename
Command line option that will produce a perf.data file that can then be
used with 'perf trace' or any of the other perf tools (script, report,
etc).
It will also eventually have the set of functionalities described in the
previous 'trace' prototype by Thomas Gleixner:
"Announcing a new utility: 'trace'"
http://lwn.net/Articles/415728/
Also planned is to have some of the features suggested in the comments
of that LWN article.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly
Usage:
- kvm stat
run a command and gather performance counter statistics, it is the alias of
perf stat
- trace kvm events:
perf kvm stat record, or, if other tracepoints are interesting as well, we
can append the events like this:
perf kvm stat record -e timer:* -a
If many guests are running, we can track the specified guest by using -p or
--pid, -a is used to track events generated by all guests.
- show the result:
perf kvm stat report
The output example is following:
13005
13059
total 2 guests are running on the host
Then, track the guest whose pid is 13059:
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.253 MB perf.data.guest (~11065 samples) ]
See the vmexit events:
Analyze events for all VCPUs:
VM-EXIT Samples Samples% Time% Avg time
APIC_ACCESS 460 70.55% 0.01% 22.44us ( +- 1.75% )
HLT 93 14.26% 99.98% 832077.26us ( +- 10.42% )
EXTERNAL_INTERRUPT 64 9.82% 0.00% 35.35us ( +- 14.21% )
PENDING_INTERRUPT 24 3.68% 0.00% 9.29us ( +- 31.39% )
CR_ACCESS 7 1.07% 0.00% 8.12us ( +- 5.76% )
IO_INSTRUCTION 3 0.46% 0.00% 18.00us ( +- 11.79% )
EXCEPTION_NMI 1 0.15% 0.00% 5.83us ( +- -nan% )
Total Samples:652, Total events handled time:77396109.80us.
See the mmio events:
Analyze events for all VCPUs:
MMIO Access Samples Samples% Time% Avg time
0xfee00380:W 387 84.31% 79.28% 8.29us ( +- 3.32% )
0xfee00300:W 24 5.23% 9.96% 16.79us ( +- 1.97% )
0xfee00300:R 24 5.23% 7.83% 13.20us ( +- 3.00% )
0xfee00310:W 24 5.23% 2.93% 4.94us ( +- 3.84% )
Total Samples:459, Total events handled time:4044.59us.
See the ioport event:
Analyze events for all VCPUs:
IO Port Access Samples Samples% Time% Avg time
0xc050:POUT 3 100.00% 100.00% 13.75us ( +- 10.83% )
Total Samples:3, Total events handled time:41.26us.
And, --vcpu is used to track the specified vcpu and --key is used to sort the
result:
Analyze events for VCPU 0:
VM-EXIT Samples Samples% Time% Avg time
HLT 27 13.85% 99.97% 405790.24us ( +- 12.70% )
EXTERNAL_INTERRUPT 13 6.67% 0.00% 27.94us ( +- 22.26% )
APIC_ACCESS 146 74.87% 0.03% 21.69us ( +- 2.91% )
IO_INSTRUCTION 2 1.03% 0.00% 17.77us ( +- 20.56% )
CR_ACCESS 2 1.03% 0.00% 8.55us ( +- 6.47% )
PENDING_INTERRUPT 5 2.56% 0.00% 6.27us ( +- 3.94% )
Total Samples:195, Total events handled time:10959950.90us.
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
[ Dong Hao <haodong@linux.vnet.ibm.com>
Runzhen Wang <runzhen@linux.vnet.ibm.com>:
- rebase it on current acme's tree
- fix the compiling-error on i386 ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1347870675-31495-4-git-send-email-haodong@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes the following:
+ make OUTPUT=/.../.build/perf-user/ DESTDIR=/.../.install/perf-user/ man install-man
make -C Documentation man
make[1]: Entering directory `/.../.source/linux.perf/tools/perf/Documentation'
make[2]: Entering directory `/.../.source/linux.perf/tools/perf'
make[2]: *** No rule to make target `PERF-VERSION-FILE'. Stop.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1344361396-7237-2-git-send-email-robert.richter@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As Namhyung Kim pointed, there are confused namings and descriptions of words
"cycle" and "clock" in mem-memset.c and mem-memcpy.c.
With the option "-c" (or "--clock", now renamed as "--cycle"), mem subsystem
measures cost of memset() and memcpy() with cpu-cycles event.
But current mem subsystem source code contains lots of confused variable
namings and descriptions with "clock" (e.g. the variable use_clock). This is a
very bad style because there is another software event named "cpu-clock". This
patch replaces wrong usage of "clock" to "cycle".
v2: modified Documentation/perf-bench.txt for the descriptions of
--cycle option
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1341236777-18457-1-git-send-email-h.mitake@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>