In addition to using refcounts for the struct thread lifetime
management, we need to protect access to machine->threads from
concurrent access.
That happens in 'perf top', where a thread processes events, inserting
and deleting entries from that rb_tree while another thread decays
hist_entries, that end up dropping references and ultimately deleting
threads from the rb_tree and releasing its resources when no further
hist_entry (or other data structures, like in 'perf sched') references
it.
So the rule is the same for refcounts + protected trees in the kernel,
get the tree lock, find object, bump the refcount, drop the tree lock,
return, use object, drop the refcount if no more use of it is needed,
keep it if storing it in some other data structure, drop when releasing
that data structure.
I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".
The addr_location__put() one is because as we return references to
several data structures, we may end up adding more reference counting
for the other data structures and then we'll drop it at
addr_location__put() time.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is 'perf kmem' support caller statistics for page. Unlike slab case,
the tracepoints in page allocator don't provide callsite info. So it
records with callchain and extracts callsite info.
Note that the callchain contains several memory allocation functions
which has no meaning for users. So skip those functions to get proper
callsites. I used following regex pattern to skip the allocator
functions:
^_?_?(alloc|get_free|get_zeroed)_pages?
This gave me a following list of functions:
# perf kmem record --page sleep 3
# perf kmem stat --page -v
...
alloc func: __get_free_pages
alloc func: get_zeroed_page
alloc func: alloc_pages_exact
alloc func: __alloc_pages_direct_compact
alloc func: __alloc_pages_nodemask
alloc func: alloc_page_interleave
alloc func: alloc_pages_current
alloc func: alloc_pages_vma
alloc func: alloc_page_buffers
alloc func: alloc_pages_exact_nid
...
The output looks mostly same as --alloc (I also added callsite column
to that) but groups entries by callsite. Currently, the order,
migrate type and GFP flag info is for the last allocation and not
guaranteed to be same for all allocations from the callsite.
---------------------------------------------------------------------------------------------
Total_alloc (KB) | Hits | Order | Mig.type | GFP flags | Callsite
---------------------------------------------------------------------------------------------
1,064 | 266 | 0 | UNMOVABL | 000000d0 | __pollwait
52 | 13 | 0 | UNMOVABL | 002084d0 | pte_alloc_one
44 | 11 | 0 | MOVABLE | 000280da | handle_mm_fault
20 | 5 | 0 | MOVABLE | 000200da | do_cow_fault
20 | 5 | 0 | MOVABLE | 000200da | do_wp_page
16 | 4 | 0 | UNMOVABL | 000084d0 | __pmd_alloc
16 | 4 | 0 | UNMOVABL | 00000200 | __tlb_remove_page
12 | 3 | 0 | UNMOVABL | 000084d0 | __pud_alloc
8 | 2 | 0 | UNMOVABL | 00000010 | bio_copy_user_iov
4 | 1 | 0 | UNMOVABL | 000200d2 | pipe_write
4 | 1 | 0 | MOVABLE | 000280da | do_wp_page
4 | 1 | 0 | UNMOVABL | 002084d0 | pgd_alloc
---------------------------------------------------------------------------------------------
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1429592107-1807-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
0d68bc92c4 breaks compiles on RHEL6/OL6:
cc1: warnings being treated as errors
builtin-kmem.c: In function ‘search_page_alloc_stat’:
builtin-kmem.c:322: error: declaration of ‘stat’ shadows a global declaration
node = &parent->rb_left;
/usr/include/sys/stat.h:455: error: shadowed declaration is here
builtin-kmem.c: In function ‘perf_evsel__process_page_alloc_event’:
builtin-kmem.c:378: error: declaration of ‘stat’ shadows a global declaration
/usr/include/sys/stat.h:455: error: shadowed declaration is here
builtin-kmem.c: In function ‘perf_evsel__process_page_free_event’:
builtin-kmem.c:431: error: declaration of ‘stat’ shadows a global declaration
/usr/include/sys/stat.h:455: error: shadowed declaration is here
Rename local variable to pstat to avoid the name conflict.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Link: http://lkml.kernel.org/r/1429033773-31383-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
0d68bc92c4 breaks compiles on RHEL6/OL6:
cc1: warnings being treated as errors
builtin-kmem.c: In function ‘search_page_alloc_stat’:
builtin-kmem.c:322: error: declaration of ‘stat’ shadows a global declaration
node = &parent->rb_left;
/usr/include/sys/stat.h:455: error: shadowed declaration is here
builtin-kmem.c: In function ‘perf_evsel__process_page_alloc_event’:
builtin-kmem.c:378: error: declaration of ‘stat’ shadows a global declaration
/usr/include/sys/stat.h:455: error: shadowed declaration is here
builtin-kmem.c: In function ‘perf_evsel__process_page_free_event’:
builtin-kmem.c:431: error: declaration of ‘stat’ shadows a global declaration
/usr/include/sys/stat.h:455: error: shadowed declaration is here
Rename local variable to pstat to avoid the name conflict.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Link: http://lkml.kernel.org/r/1429033773-31383-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Building the perf tool for 32-bit ARM results in the following build
error due to a combination of an incorrect conversion specifier and
compiling with -Werror:
builtin-kmem.c: In function ‘print_page_summary’:
builtin-kmem.c:644:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ [-Werror=format=]
nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
^
builtin-kmem.c:647:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ [-Werror=format=]
(total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
^
cc1: all warnings being treated as errors
This patch fixes the problem by consistently using PRIu64 for printing
out u64 values.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1429796437-1790-1-git-send-email-will.deacon@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf kmem to use perf.data when it is not owned by current user
or root.
Example:
# perf kmem record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 5315665 Apr 2 10:54 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf kmem stat
File perf.data not owned by current user or root (use -f to override)
# perf kmem stat -f
Error: unknown switch `f'
usage: perf kmem [<options>] {record|stat}
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
--caller show per-callsite statistics
--alloc show per-allocation statistics
-s, --sort <key[,key2...]>
sort by keys: ptr, call_site, bytes, hit,
pingpong, frag
-l, --line <num> show n lines
--raw-ip show raw ip instead of symbol
As shown above, the -f option does not work at all.
After this patch:
# perf kmem stat
File perf.data not owned by current user or root (use -f to override)
# perf kmem stat -f
SUMMARY
=======
Total bytes requested: 437599
Total bytes allocated: 615472
Total bytes wasted on internal fragmentation: 177873
Internal fragmentation: 28.900259%
Cross CPU allocations: 6/1192
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When it tries to free 'str', it was already updated by strsep() - so it
needs to save the original pointer.
# perf kmem stat -s xxx,hit
Error: Unknown --sort key: 'xxx'
*** Error in `perf': free(): invalid pointer: 0x0000000000e9e7b6 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x7198e)[0x7fc7e6e0d98e]
/usr/lib/libc.so.6(+0x76dee)[0x7fc7e6e12dee]
/usr/lib/libc.so.6(+0x775cb)[0x7fc7e6e135cb]
./perf[0x44a1b5]
./perf[0x490b20]
./perf(parse_options_step+0x173)[0x491773]
./perf(parse_options_subcommand+0xa7)[0x491fb7]
./perf(cmd_kmem+0x2bc)[0x44ae4c]
./perf[0x47aa13]
./perf(main+0x60a)[0x427a9a]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7fc7e6dbc800]
./perf(_start+0x29)[0x427bb9]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1426145571-3065-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently vmlinux_path__init() only tries to find vmlinux file from
current directory, /boot and some canonical directories with version
number of the running kernel. This can be a problem when reporting old
data recorded on a kernel version not running currently.
We can use --symfs option for this but it's annoying for user to do it
always. As we already have the info in the perf.data file, it can be
changed to use it for the search automatically.
Before:
$ perf report
...
# Samples: 4K of event 'cpu-clock'
# Event count (approx.): 1067250000
#
# Overhead Command Shared Object Symbol
# ........ .......... ................. ..............................
71.87% swapper [kernel.kallsyms] [k] recover_probed_instruction
After:
# Overhead Command Shared Object Symbol
# ........ .......... ................. ....................
71.87% swapper [kernel.kallsyms] [k] native_safe_halt
This requires to change signature of symbol__init() to receive struct
perf_session_env *.
Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1407825645-24586-14-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently many perf commands annotate/evlist/report/script/lock etc all
support "-i" option to chose a specific perf data, and all of them
create a local "input_name" to save the file name for that perf data.
Since most of these commands need it, we can add a global variable for
it, also it can some other benefits:
1. When calling script browser inside hists/annotation browser, it needs
to know the perf data file name to run that script.
2. For further feature like runtime switching to another perf data file,
this variable can also help.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1351569369-26732-2-git-send-email-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf defines both __used and __unused variables to use for marking
unused variables. The variable __used is defined to
__attribute__((__unused__)), which contradicts the kernel definition to
__attribute__((__used__)) for new gcc versions. On Android, __used is
also defined in system headers and this leads to warnings like: warning:
'__used__' attribute ignored
__unused is not defined in the kernel and is not a standard definition.
If __unused is included everywhere instead of __used, this leads to
conflicts with glibc headers, since glibc has a variables with this name
in its headers.
The best approach is to use __maybe_unused, the definition used in the
kernel for __attribute__((unused)). In this way there is only one
definition in perf sources (instead of 2 definitions that point to the
same thing: __used and __unused) and it works on both Linux and Android.
This patch simply replaces all instances of __used and __unused with
__maybe_unused.
Signed-off-by: Irina Tirdea <irina.tirdea@intel.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
[ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We already lookup the associated event_format when reading the perf.data
header, so that we can cache the tracepoint name in evsel->name, so do
it a little further and save the event_format itself, so that we can
avoid relookups in tools that need to access it.
Change the tools to take the most obvious advantage, when they were
using pevent_find_event directly. More work is needed for further
removing the need of a pointer to pevent, such as when asking for event
field values ("common_pid" and the other common fields and per
event_format fields).
This is something that was planned but only got actually done when
Andrey Wagin needed to do this lookup at perf_tool->sample() time, when
we don't have access to pevent (session->pevent) to use with
pevent_find_event().
Cc: Andrey Wagin <avagin@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-txkvew2ckko0b594ae8fbnyk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pevent thing is per perf.data file, so I made it stop being static
and become a perf_session member, so tools processing perf.data files
use perf_session and _there_ we read the trace events description into
session->pevent and then change everywhere to stop using that single
global pevent variable and use the per session one.
Note that it _doesn't_ fall backs to trace__event_id, as we're not
interested at all in what is present in the
/sys/kernel/debug/tracing/events in the workstation doing the analysis,
just in what is in the perf.data file.
This patch also introduces perf_session__set_tracepoints_handlers that
is the perf perf.data/session way to associate handlers to tracepoint
events by resolving their IDs using the events descriptions stored in a
perf.data file. Make 'perf sched' use it.
Reported-by: Dmitry Antipov <dmitry.antipov@linaro.org>
Tested-by: Dmitry Antipov <dmitry.antipov@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linaro-dev@lists.linaro.org
Cc: patches@linaro.org
Link: http://lkml.kernel.org/r/20120625232016.GA28525@infradead.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event parsing code in perf was originally copied from trace-cmd
but never was kept up-to-date with the changes that was done there.
The trace-cmd libtraceevent.a code is much more mature than what is
currently in perf.
This updates the code to use wrappers to handle the calls to the
new event parsing code. The new code requires a handle to be pass
around, which removes the global event variables and allows
more than one event structure to be read from different files
(and different machines).
But perf still has the old global events and the code throughout
perf does not yet have a nice way to pass around a handle.
A global 'pevent' has been made for perf and the old calls have
been created as wrappers to the new event parsing code that uses
the global pevent.
With this change, perf can later incorporate the pevent handle into
the perf structures and allow more than one file to be read and
compared, that contains different events.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>