This patch introduces an abstraction for exporting sample data in a
database-friendly way. The abstraction does not implement the actual
output. A subsequent patch takes this facility into use for extending
the script interface.
The abstraction is needed because static data like symbols, dsos, comms
etc need to be exported only once. That means allocating them a unique
identifier and recording it on each structure. The member 'db_id' is
used for that. 'db_id' is just a 64-bit sequence number.
Exporting centres around the db_export__sample() function which exports
the associated data structures if they have not yet been allocated a
db_id.
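
The export-once rule described above can be sketched as follows. This is
only an illustration under stated assumptions: 'struct export_obj',
'struct exporter' and export_once() are hypothetical names, not the
interface added by this patch, and the printf() stands in for whatever a
real backend would write.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a static object such as a comm, dso or symbol. */
struct export_obj {
	const char *name;
	uint64_t db_id;			/* 0 means "not exported yet" */
};

/* Hypothetical exporter state: remembers the last sequence number used. */
struct exporter {
	uint64_t last_id;
	unsigned int nr_exported;	/* objects actually written out */
};

/* Export obj on first use; later calls only return the recorded id. */
static uint64_t export_once(struct exporter *ex, struct export_obj *obj)
{
	assert(ex && obj);

	if (obj->db_id)
		return obj->db_id;

	obj->db_id = ++ex->last_id;
	ex->nr_exported++;
	/* A real backend would emit a database row here. */
	printf("export %s as id %llu\n", obj->name,
	       (unsigned long long)obj->db_id);
	return obj->db_id;
}
```

Each sample can then refer to its comm, dso and symbol by id, and the
objects themselves are written a single time.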
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1414061124-26830-6-git-send-email-adrian.hunter@intel.com
[ committer note: Stash db_id using symbol_conf.priv_size + symbol__priv() and foo->priv areas ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This new COMM infrastructure provides two features:
1) It keeps track of the lifecycle of all comms for a given thread. This
way we can associate a timeframe with any thread COMM, as long as
PERF_SAMPLE_TIME samples are joined to COMM and fork events.
As a result we should have more precise COMM sorted hists with separated
entries for pre and post exec time after a fork.
2) It also makes sure that a given COMM string is not duplicated but
rather shared among the threads that refer to it. This way a thread's
COMM can be compared against pointer values from the sort
infrastructure.
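
The sharing described in 2) can be sketched with a small intern table.
This is a minimal illustration, assuming a plain linked list and the
hypothetical names 'struct shared_str', shared_str_get() and
shared_str_put(); the data structure used by the patch itself may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared string node with a reference count. */
struct shared_str {
	struct shared_str *next;
	unsigned int refcnt;
	char str[];
};

static struct shared_str *str_table;	/* all live shared strings */

/* Return the single shared copy of s, creating it on first use. */
static struct shared_str *shared_str_get(const char *s)
{
	struct shared_str *ss;
	size_t len;

	assert(s);
	for (ss = str_table; ss; ss = ss->next) {
		if (!strcmp(ss->str, s)) {
			ss->refcnt++;
			return ss;
		}
	}

	len = strlen(s) + 1;
	ss = malloc(sizeof(*ss) + len);
	if (!ss)
		return NULL;
	memcpy(ss->str, s, len);
	ss->refcnt = 1;
	ss->next = str_table;
	str_table = ss;
	return ss;
}

/* Drop one reference; unlink and free the node when it was the last. */
static void shared_str_put(struct shared_str *ss)
{
	struct shared_str **p;

	if (--ss->refcnt)
		return;
	for (p = &str_table; *p; p = &(*p)->next) {
		if (*p == ss) {
			*p = ss->next;
			break;
		}
	}
	free(ss);
}
```

Two threads that share a COMM string then hold the same pointer, so a
sort comparison can test pointer equality before falling back to
strcmp().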
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-hwjf70b2wve9m2kosxiq8bb3@git.kernel.org
[ Rename some accessor functions ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[ Use __ as separator for class__method for private comm_str methods ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This way we can later delimit a lifecycle for the COMM and map a hist to
a precise COMM:timeslice couple.
PERF_RECORD_COMM and PERF_RECORD_FORK events that don't have
PERF_SAMPLE_TIME samples can only send 0 value as a timestamp and thus
should overwrite any previous COMM on a given thread because there is no
sensible way to keep track of all the comms lifecycles in a thread
without time information.
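
A minimal sketch of that rule, assuming a per-thread list of comms
ordered from newest to oldest. 'struct comm_entry', 'struct thread_comms',
thread_set_comm() and thread_comm_at() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical comm record: a name plus the time it took effect. */
struct comm_entry {
	struct comm_entry *prev;	/* older comm, or NULL */
	const char *str;
	uint64_t start;			/* 0 when no timestamp was available */
};

struct thread_comms {
	struct comm_entry *curr;	/* most recent comm */
};

/* Record a new comm; without a timestamp, overwrite the current one. */
static int thread_set_comm(struct thread_comms *t, const char *str,
			   uint64_t timestamp)
{
	struct comm_entry *c;

	assert(t && str);
	if (!timestamp && t->curr) {
		/* No time information: keeping a history makes no sense. */
		t->curr->str = str;
		return 0;
	}

	c = malloc(sizeof(*c));
	if (!c)
		return -1;
	c->str = str;
	c->start = timestamp;
	c->prev = t->curr;
	t->curr = c;
	return 0;
}

/* Return the comm that was in effect at the given time, if any. */
static const char *thread_comm_at(const struct thread_comms *t,
				  uint64_t timestamp)
{
	const struct comm_entry *c;

	for (c = t->curr; c; c = c->prev) {
		if (c->start <= timestamp)
			return c->str;
	}
	return NULL;
}
```

With timestamps, each COMM keeps its own timeslice; a 0 timestamp only
ever replaces the name of the current entry.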
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-6tyow99vgmmtt9qwr2u2lqd7@git.kernel.org
[ Made it cope with PERF_RECORD_MMAP2 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Occasionally events (e.g., context-switch, sched tracepoints) are losing
the conversion of sample data associated with a thread. For example:
$ perf record -e sched:sched_switch -c 1 -a -- sleep 5
$ perf script
<selected events shown>
ls 30482 [000] 1379727.583037: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
ls 30482 [000] 1379727.586339: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
:30482 30482 [000] 1379727.589462: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
The last line lost the conversion from tid to comm. If you look at the events
(perf script -D) you see why - a SAMPLE event is generated after the EXIT:
0 1379727589449774 0x1540b0 [0x38]: PERF_RECORD_EXIT(30482:30482):(30482:30482)
0 1379727589462497 0x1540e8 [0x80]: PERF_RECORD_SAMPLE(IP, 1): 30482/30482: 0xffffffff816416f1 period: 1 addr: 0
... thread: :30482:30482
When perf processes the EXIT event the thread is moved to the dead_threads
list. When the SAMPLE event is processed no thread exists for the pid so a new
one is created by machine__findnew_thread.
This patch addresses the problem by delaying the move to the dead_threads list
until the tid is re-used (per Adrian's suggestion).
With this patch the previous example shows:
ls 30482 [000] 1379727.583037: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
ls 30482 [000] 1379727.586339: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
ls 30482 [000] 1379727.589462: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
and
0 1379727589449774 0x1540b0 [0x38]: PERF_RECORD_EXIT(30482:30482):(30482:30482)
0 1379727589462497 0x1540e8 [0x80]: PERF_RECORD_SAMPLE(IP, 1): 30482/30482: 0xffffffff816416f1 period: 1 addr: 0
... thread: ls:30482
v4: per Arnaldo's request add dead flag to thread struct and set when task exits
v3: re-do from a time based check to a delayed move to dead_threads list
v2: Rebased to latest perf/core branch. Changed time comparison to use
a macro which explicitly shows the time basis
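
The delayed move can be sketched as below. This is an illustration only:
the structures are simplified linked lists and machine_find(),
machine_process_exit() and machine_process_fork() are hypothetical
helpers, not the functions touched by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified, hypothetical thread and machine structures. */
struct thread {
	struct thread *next;
	pid_t tid;
	bool dead;			/* set on EXIT; thread stays findable */
	const char *comm;
};

struct machine {
	struct thread *threads;		/* threads found by tid lookup */
	struct thread *dead_threads;	/* retired, kept for old references */
};

static struct thread *machine_find(struct machine *m, pid_t tid)
{
	struct thread *t;

	assert(m);
	for (t = m->threads; t; t = t->next) {
		if (t->tid == tid)
			return t;
	}
	return NULL;
}

/* EXIT: only flag the thread, so late samples still resolve its comm. */
static void machine_process_exit(struct machine *m, pid_t tid)
{
	struct thread *t = machine_find(m, tid);

	if (t)
		t->dead = true;
}

/* FORK: if the tid is being reused, retire the stale thread first. */
static struct thread *machine_process_fork(struct machine *m, pid_t tid,
					   const char *comm)
{
	struct thread **p, *t;

	for (p = &m->threads; (t = *p) != NULL; p = &t->next) {
		if (t->tid == tid) {
			*p = t->next;
			t->next = m->dead_threads;
			m->dead_threads = t;
			break;
		}
	}

	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->tid = tid;
	t->comm = comm;
	t->next = m->threads;
	m->threads = t;
	return t;
}
```

A SAMPLE that arrives after the EXIT still finds the flagged thread; the
move to dead_threads only happens once a new task takes over the tid.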
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1376491767-84171-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly
Usage:
- kvm stat
run a command and gather performance counter statistics; it is an alias of
perf stat
- trace kvm events:
perf kvm stat record, or, if other tracepoints are interesting as well, we
can append the events like this:
perf kvm stat record -e timer:* -a
If many guests are running, we can track the specified guest by using -p or
--pid, -a is used to track events generated by all guests.
- show the result:
perf kvm stat report
Example output:
13005
13059
total 2 guests are running on the host
Then, track the guest whose pid is 13059:
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.253 MB perf.data.guest (~11065 samples) ]
See the vmexit events:
Analyze events for all VCPUs:
VM-EXIT             Samples  Samples%    Time%      Avg time
APIC_ACCESS             460    70.55%    0.01%       22.44us ( +-  1.75% )
HLT                      93    14.26%   99.98%   832077.26us ( +- 10.42% )
EXTERNAL_INTERRUPT       64     9.82%    0.00%       35.35us ( +- 14.21% )
PENDING_INTERRUPT        24     3.68%    0.00%        9.29us ( +- 31.39% )
CR_ACCESS                 7     1.07%    0.00%        8.12us ( +-  5.76% )
IO_INSTRUCTION            3     0.46%    0.00%       18.00us ( +- 11.79% )
EXCEPTION_NMI             1     0.15%    0.00%        5.83us ( +-  -nan% )
Total Samples:652, Total events handled time:77396109.80us.
See the mmio events:
Analyze events for all VCPUs:
MMIO Access         Samples  Samples%    Time%      Avg time
0xfee00380:W            387    84.31%   79.28%        8.29us ( +-  3.32% )
0xfee00300:W             24     5.23%    9.96%       16.79us ( +-  1.97% )
0xfee00300:R             24     5.23%    7.83%       13.20us ( +-  3.00% )
0xfee00310:W             24     5.23%    2.93%        4.94us ( +-  3.84% )
Total Samples:459, Total events handled time:4044.59us.
See the ioport event:
Analyze events for all VCPUs:
IO Port Access      Samples  Samples%    Time%      Avg time
0xc050:POUT               3   100.00%  100.00%       13.75us ( +- 10.83% )
Total Samples:3, Total events handled time:41.26us.
And, --vcpu is used to track the specified vcpu and --key is used to sort the
result:
Analyze events for VCPU 0:
VM-EXIT             Samples  Samples%    Time%      Avg time
HLT                      27    13.85%   99.97%   405790.24us ( +- 12.70% )
EXTERNAL_INTERRUPT       13     6.67%    0.00%       27.94us ( +- 22.26% )
APIC_ACCESS             146    74.87%    0.03%       21.69us ( +-  2.91% )
IO_INSTRUCTION            2     1.03%    0.00%       17.77us ( +- 20.56% )
CR_ACCESS                 2     1.03%    0.00%        8.55us ( +-  6.47% )
PENDING_INTERRUPT         5     2.56%    0.00%        6.27us ( +-  3.94% )
Total Samples:195, Total events handled time:10959950.90us.
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
[ Dong Hao <haodong@linux.vnet.ibm.com>
Runzhen Wang <runzhen@linux.vnet.ibm.com>:
- rebase it on current acme's tree
- fix the compiling-error on i386 ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1347870675-31495-4-git-send-email-haodong@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To untangle it from struct thread handling, that is tied to symbols, etc.
Right now in the python bindings I'm working on I need just a subset of
the util/ files, untangling it allows me to do that.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that later, we can pass the thread_map instance instead of
(thread_num, thread_map) for things like perf_evsel__open and friends,
just like was done with cpu_map.
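
As an illustration of why a single instance is easier to pass around,
here is a minimal sketch that bundles the count with the tid array. The
layout and the helpers thread_map_new() and thread_map_has() are
assumptions made for this example, not necessarily what the patch
defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Assumed layout: the number of threads followed by their tids. */
struct thread_map {
	int nr;
	pid_t map[];
};

/* Hypothetical constructor: copy nr tids into a freshly allocated map. */
static struct thread_map *thread_map_new(const pid_t *tids, int nr)
{
	struct thread_map *threads;
	int i;

	assert(tids && nr >= 0);
	threads = malloc(sizeof(*threads) + nr * sizeof(pid_t));
	if (!threads)
		return NULL;
	threads->nr = nr;
	for (i = 0; i < nr; i++)
		threads->map[i] = tids[i];
	return threads;
}

/* Hypothetical consumer: one argument instead of a (count, array) pair. */
static bool thread_map_has(const struct thread_map *threads, pid_t tid)
{
	int i;

	for (i = 0; i < threads->nr; i++) {
		if (threads->map[i] == tid)
			return true;
	}
	return false;
}
```

Callers no longer need to keep a separate counter in sync with the
array; free() on the instance releases both.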
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For long running sessions with many threads with short lifetimes the
amount of memory that the buildid process takes is too much.
Since we don't have hist_entries that may be pointing to them, we can
just release the resources associated with each thread when the exit
(PERF_RECORD_EXIT) event is received.
For normal processing we need to annotate the maps that had hits, and
thus have hist_entries pointing to them, and drop the ones that had none.
That will be done in a followup patch.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move them to a session->dead_threads list just like we do with maps that
are replaced, because we may have hist_entries pointing to them.
This fixes a bug when inserting maps for a new thread that reused the
TID, mixing maps for two different threads, causing an endless loop.
The code for inserting maps should be made more robust but for .35 this
is the minimalistic patch.
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Through a series of refactorings, functions were renamed but not
moved to map.c, to reduce patch noise. Now let's have them in the
same place so that use of the symbol system by tools can be
constrained to building and linking fewer source files:
symbol.c, map.c and rbtree.c.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1269557941-15617-3-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Parameter --pid (or -p) of perf currently means a thread-wide
collection. For example, if a process whose id is 8888 has 10
threads, 'perf top -p 8888' just collects the main thread
statistics. That's misleading. Users are used to attaching to a
whole process when debugging it with gdb. To follow normal usage
style, the patch changes --pid to mean a process-wide collection
and adds --tid (-t) to mean a thread-wide collection.
Usage example is:
# perf top -p 8888
# perf record -p 8888 -f sleep 10
# perf stat -p 8888 -f sleep 10
The above commands collect the statistics of all threads of process
8888.
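
A process-wide collection needs the list of threads that belong to the
process. On Linux that list is visible under /proc/<pid>/task; the
sketch below reads it. It is an illustration under that assumption, not
necessarily how this patch gathers the threads, and collect_tids() and
collect_own_tids() are hypothetical helpers.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Store up to max thread ids of process pid; return the count or -1. */
static int collect_tids(pid_t pid, pid_t *tids, int max)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int n = 0;

	assert(tids && max > 0);
	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while (n < max && (de = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(de->d_name, &end, 10);

		/* Skip "." and ".." and anything else that is not a number. */
		if (end == de->d_name || *end != '\0')
			continue;
		tids[n++] = (pid_t)tid;
	}
	closedir(dir);
	return n;
}

/* Usage example: list the threads of the calling process. */
static int collect_own_tids(pid_t *tids, int max)
{
	return collect_tids(getpid(), tids, max);
}
```

Each tid found this way would then get its own counter, which is what
makes 'perf top -p 8888' cover all 10 threads in the example above.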
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sheng Yang <sheng@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: zhiteng.huang@intel.com
Cc: Zachary Amsden <zamsden@redhat.com>
LKML-Reference: <1268922965-14774-3-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If -vv is used just the map table will be printed; -vvv will
print the symbol table too. With it we can see that we have a
bug where some samples are not resolved to a map when we get
them in the perf.data stream, but after it is all processed we
can find the right map. Some reordering is probably happening.
Upcoming patches will provide ways to ask for most PERF_SAMPLE_
conditional samples to be taken for !PERF_RECORD_SAMPLE events
too, then we'll be able to ask for PERF_SAMPLE_TIME and
PERF_SAMPLE_CPU to help diagnose this.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268161097-17761-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As the parent comm is then worthless: it confuses users about the
thread where the sample really happened, leading them to think
that the sample happened in the parent and not where it really
happened, in the children of a thread for which a
PERF_RECORD_COMM event was not received.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1266627727-19715-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I noticed while writing the first test in 'perf regtest' that to
just test the symbol handling routines one needs to create a
perf session, that is a layer centered on a perf.data file,
events, etc, so I untied these layers.
This reduces complexity for users, as most of the symbols and
session APIs now take fewer parameters. It does so without adding
more state to all the map instances: only the data needed to split
the kernel maps (kallsyms and ELF symtab sections) and to do
vmlinux relocation is kept, on the main kernel map.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1265223128-11786-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We were always looking at the running machine /proc/modules,
even when processing a perf.data file, which only makes sense
when we're doing 'perf record' and 'perf report' on the same
machine, and in close succession, or if we don't use modules at
all, right Peter? ;-)
Now, at 'perf record' time we read /proc/modules, find the long
path for modules, and put them as PERF_MMAP events, just like we
did to encode the reloc reference symbol for vmlinux. Talking
about that now it is encoded in .pgoff, so that we can use
.{start,len} to store the address boundaries for the kernel so
that when we reconstruct the kmaps tree we can do lookups right
away, without having to fixup the end of the kernel maps like we
did in the past (and now only in perf record).
One more step in the 'perf archive' direction, when we'll finally
be able to collect data on one machine and analyse it on another.
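
For reference, reading one /proc/modules entry can be sketched as below.
This covers only the parsing step and assumes the usual field order
(name, size, refcount, dependencies, state, address); finding the long
path and synthesizing the PERF_MMAP event are not shown. 'struct
module_info' and parse_module_line() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the fields that matter for a module map. */
struct module_info {
	char name[64];
	unsigned long size;
	unsigned long long start;	/* load address */
};

/* Parse one /proc/modules line; return 0 on success, -1 otherwise. */
static int parse_module_line(const char *line, struct module_info *mi)
{
	assert(line && mi);

	/* Fields: name size refcount dependencies state address */
	if (sscanf(line, "%63s %lu %*d %*s %*s %llx",
		   mi->name, &mi->size, &mi->start) != 3)
		return -1;
	return 0;
}
```

Note that the address column may read as zero for unprivileged users,
depending on the kernel's pointer restriction settings.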
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1263396139-4798-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is still some more work to do to disentangle map creation
from DSO loading, but this happens only for the kernel, and for
the early adopters of perf diff, where this disentanglement
matters most, we'll be testing different kernels, so no problem
here.
Further clarification: right now we create the kernel maps for
the various modules and discontiguous kernel text maps when
loading the DSO, we should do it as a two step process, first
creating the maps, for multiple mappings with the same DSO
store, then doing the dso load just once, for the first hit on
one of the maps sharing this DSO backing store.
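
The two step process could look roughly like the sketch below: maps are
created up front and share a DSO, and the DSO is loaded on the first hit
only. The structures are reduced to the bare minimum and dso_load() and
maps_hit() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical DSO: load state plus a counter for checking. */
struct dso {
	const char *name;
	bool loaded;
	int nr_loads;
};

/* Simplified map: an address range backed by a possibly shared DSO. */
struct map {
	unsigned long start, end;
	struct dso *dso;
};

/* Stand-in for the expensive symbol table load. */
static void dso_load(struct dso *dso)
{
	dso->nr_loads++;
	dso->loaded = true;
}

/* Find the map covering addr and load its DSO if nobody did so yet. */
static struct map *maps_hit(struct map *maps, size_t nr, unsigned long addr)
{
	size_t i;

	assert(maps || !nr);
	for (i = 0; i < nr; i++) {
		if (addr >= maps[i].start && addr < maps[i].end) {
			if (!maps[i].dso->loaded)
				dso_load(maps[i].dso);
			return &maps[i];
		}
	}
	return NULL;
}
```

Creating the maps is then cheap, and the cost of loading is paid once
per DSO no matter how many maps share it.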
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260741029-4430-6-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>