Allow to add events on the local functions without debuginfo.
(With the debuginfo, we can add events even on inlined functions)
Currently, probing on local functions requires debuginfo to
locate actual address. It is also possible without debuginfo since
we have symbol maps.
Without this change;
----
# ./perf probe -a t_show
Added new event:
probe:t_show (on t_show)
You can now use it in all perf tools, such as:
perf record -e probe:t_show -aR sleep 1
# ./perf probe -x perf -a identity__map_ip
no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package?
Failed to load map.
Error: Failed to add events. (-22)
----
As the above results, perf probe just put one event
on the first found symbol for kprobe event. Moreover,
for uprobe event, perf probe failed to find local
functions.
With this change;
----
# ./perf probe -a t_show
Added new events:
probe:t_show (on t_show)
probe:t_show_1 (on t_show)
probe:t_show_2 (on t_show)
probe:t_show_3 (on t_show)
You can now use it in all perf tools, such as:
perf record -e probe:t_show_3 -aR sleep 1
# ./perf probe -x perf -a identity__map_ip
Added new events:
probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:identity__map_ip_3 -aR sleep 1
----
Now we succeed to put events on every given local functions
for both kprobes and uprobes. :)
Note that this also introduces some symbol rbtree
iteration macros; symbols__for_each, dso__for_each_symbol,
and map__for_each_symbol. These are for walking through
the symbol list in a map.
Changes from v2:
- Fix add_exec_to_probe_trace_events() not to convert address
to tp->symbol any more.
- Fix to set kernel probes based on ref_reloc_sym.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053225.29635.15026.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It was called "data_type", but in this context "data" is way too vague,
it could mean the "data" ELF segment, or something else.
Since we have dso__read_binary_type_filename() and the values this field
receives are all DSO__BINARY_TYPE_<FOO> we may as well call it
"binary_type" for consistency sake.
It also seems more appropriate since it determines if we can do
operations like annotation and DWARF unwinding, that needs more than
just the symtab, requiring access to ELF text segments, CFI ELF
sections, etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-2lkbqrn23uc2uvnn9w9in379@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On OpenEmbedded the symbol files are located under a .debug folder on
the same folder as the binary file.
This patch adds support for such files.
Without this patch on perf top you can see:
no symbols found in /usr/lib/gstreamer-1.0/libtheoraenc.so.1.1.2, maybe
install a debug package?
84.56% libtheoraenc.so.1.1.2 [.] 0x000000000000b346
With this patch symbols are shown:
19.06% libtheoraenc.so.1.1.2 [.] oc_int_frag_satd_thresh_mmxext
9.76% libtheoraenc.so.1.1.2 [.] oc_analyze_mb_mode_luma
5.58% libtheoraenc.so.1.1.2 [.] oc_qii_state_advance
4.84% libtheoraenc.so.1.1.2 [.] oc_enc_tokenize_ac
...
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1379512574-25912-1-git-send-email-ricardo.ribalda@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The new "object code reading" test shows that it is not possible to read
object code from vmlinux. That is because the mappings do not map to
the dso. This patch fixes that.
A side-effect of changing the kernel map is that the "reloc" offset must
be taken into account. As a result of that separate map functions for
relocation are no longer needed.
Also fixing up the maps to match the symbols no longer makes sense and
so is not done.
The vmlinux dso data_type is now set to either DSO_BINARY_TYPE__VMLINUX
or DSO_BINARY_TYPE__GUEST_VMLINUX as approprite, which enables the
correct file name to be determined by dso__binary_type_file().
This patch breaks the "vmlinux symtab matches kallsyms" test. That is
fixed in a following patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1375875537-4509-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When "perf record" was used on a large machine with a lot of CPUs, the
perf post-processing time (the time after the workload was done until
the perf command itself exited) could take a lot of minutes and even
hours depending on how large the resulting perf.data file was.
While running AIM7 1500-user high_systime workload on a 80-core x86-64
system with a 3.9 kernel (with only the -s -a options used), the
workload itself took about 2 minutes to run and the perf.data file had a
size of 1108.746 MB. However, the post-processing step took more than 10
minutes.
With a gprof-profiled perf binary, the time spent by perf was as
follows:
% cumulative self self total
time seconds seconds calls s/call s/call name
96.90 822.10 822.10 192156 0.00 0.00 dsos__find
0.81 828.96 6.86 172089958 0.00 0.00 rb_next
0.41 832.44 3.48 48539289 0.00 0.00 rb_erase
So 97% (822 seconds) of the time was spent in a single dsos_find()
function. After analyzing the call-graph data below:
-----------------------------------------------
0.00 822.12 192156/192156 map__new [6]
[7] 96.9 0.00 822.12 192156 vdso__dso_findnew [7]
822.10 0.00 192156/192156 dsos__find [8]
0.01 0.00 192156/192156 dsos__add [62]
0.01 0.00 192156/192366 dso__new [61]
0.00 0.00 1/45282525 memdup [31]
0.00 0.00 192156/192230 dso__set_long_name [91]
-----------------------------------------------
822.10 0.00 192156/192156 vdso__dso_findnew [7]
[8] 96.9 822.10 0.00 192156 dsos__find [8]
-----------------------------------------------
It was found that the vdso__dso_findnew() function failed to locate
VDSO__MAP_NAME ("[vdso]") in the dso list and have to insert a new
entry at the end for 192156 times. This problem is due to the fact that
there are 2 types of name in the dso entry - short name and long name.
The initial dso__new() adds "[vdso]" to both the short and long names.
After that, vdso__dso_findnew() modifies the long name to something
like /tmp/perf-vdso.so-NoXkDj. The dsos__find() function only compares
the long name. As a result, the same vdso entry is duplicated many
time in the dso list. This bug increases memory consumption as well
as slows the symbol processing time to a crawl.
To resolve this problem, the dsos__find() function interface was
modified to enable searching either the long name or the short
name. The vdso__dso_findnew() will now search only the short name
while the other call sites search for the long name as before.
With this change, the cpu time of perf was reduced from 848.38s to
15.77s and dsos__find() only accounted for 0.06% of the total time.
0.06 15.73 0.01 192151 0.00 0.00 dsos__find
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "Chandramouleeswaran, Aswin" <aswin@hp.com>
Cc: "Norton, Scott J" <scott.norton@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368110568-64714-1-git-send-email-Waiman.Long@hp.com
[ replaced TRUE/FALSE with stdbool.h equivalents, fixing builds where
those macros are not present (NO_LIBPYTHON=1 NO_LIBPERL=1), fix from Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>