Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
* Add --repeat global option to 'perf bench' to be used in benchmarks
such as the existing 'futex' one, that was modified to use it instead
of a local option. (Davidlohr Bueso)
* Fix fd -> pathname resolution in 'trace', be it using /proc or
a vfs_getname probe point. (Arnaldo Carvalho de Melo)
* Add suggestion of how to set perf_event_paranoid sysctl, to help
non-root users trying tools like 'trace' to get a working environment.
(Arnaldo Carvalho de Melo)
Fixes:
* Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso)
* The -o and -n 'perf bench mem' options are mutually exclusive, emit error
when both are specified. (Davidlohr Bueso)
* Fix scrollbar refresh row index in the ui browser, problem exposed now
that headers will be added and will be allowed to be switched on/off.
(Jiri Olsa)
Cleanups:
* Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)
* Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo)
* No need to reimplement err() in 'perf bench sched-messaging', drop barf().
(Davidlohr Bueso).
* Remove ev_name argument from perf_evsel__hists_browse, can be obtained
from the other parameters. (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function machine__get_kernel_start_addr() was taking the first symbol
of kallsyms as the start address. This is incorrect in certain cases
where the first symbol is something at 0, while the actual kernel
functions begin at a later point (e.g. 0x80200000).
This patch fixes machine__get_kernel_start_addr() to search for the
symbol "_text" or "_stext", which marks the beginning of kernel mapping.
This was already being done in machine__create_kernel_maps(). Thus, this
patch is just a refactor, to move that code into
machine__get_kernel_start_addr().
Signed-off-by: Simon Que <sque@chromium.org>
Link: http://lkml.kernel.org/r/1402943529-13244-1-git-send-email-sque@chromium.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
There are a number of benchmarks that do single runs and as a result
does not really help users gain a general idea of how the workload
performs. So the user must either manually do multiple runs or just use
single bogus results.
This option will enable users to specify the amount of runs (arbitrarily
defaulted to 10, to use the existing benchmarks default) through the
'--repeat' option. Add it to perf-bench instead of implementing it
always in each specific benchmark.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1402942467-10671-2-git-send-email-davidlohr@hp.com
[ Kept the existing default of 10, changing it to something else should
be done on separate patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Caching dso data file descriptors to avoid expensive re-opens
especially during DWARF unwind.
We keep dsos data file descriptors open until their count reaches
the half of the current fd open limit (RLIMIT_NOFILE). In this case
we close file descriptor of the first opened dso object.
We've got overall speedup (~27% for my workload) of report:
'perf report --stdio -i perf-test.data' (3 runs)
(perf-test.data size was around 12GB)
current code:
545,640,944,228 cycles ( +- 0.53% )
785,255,798,320 instructions ( +- 0.03% )
366.340910010 seconds time elapsed ( +- 3.65% )
after change:
435,895,036,114 cycles ( +- 0.26% )
636,790,271,176 instructions ( +- 0.04% )
266.481463387 seconds time elapsed ( +- 0.13% )
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1401892622-30848-7-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
When configuring event perf checked a wrong condition that user
specified both of freq (-F) and period (-c) or the event has no
default value. This worked because most of events don't have default
value and only tracepoint events have default of 1 (and it's not
desirable to change it for those events).
However, Andi's downloadable event patch changes the situation so it
cannot change the value for those events. Fix it by allowing override
the default value if user gives one of the options.
$ perf record -a -e uops_retired.all -F 4000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.325 MB perf.data (~14185 samples) ]
$ perf evlist -F
cpu/uops_retired.all/: sample_freq=4000
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1402292617-26278-1-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Pull perf/core improvements from Arnaldo Carvalho de Melo:
User visible:
* Improve 'perf probe' error messages, moving some diagnostic messages to
only appear in --verbose mode and fixing up some error reporting related
to variables and struct members. (Masami Hiramatsu)
* Reflow 'perf timechart' man page. (Stanislav Fomichev)
Developer stuff:
* Be more precise when reporting missing libraries in a static tool build.
(Arnaldo Carvalho de Melo)
* Show error messages from the multiple make invoked from 'make build-test'.
(Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Improve error messages of 'perf probe --line' mode.
Currently 'perf probe' shows the "Debuginfo analysis failed" message with
an error code when the given symbol is not found:
-----
# perf probe -L page_cgroup_init_flatmem
Debuginfo analysis failed. (-2)
Error: Failed to show lines.
-----
But -2 (-ENOENT) means that the given source line or function was not
found. With this patch, 'perf probe' shows the correct error message:
-----
# perf probe -L page_cgroup_init_flatmem
Specified source line is not found.
Error: Failed to show lines.
-----
There is also another debug error code is shown in the same function
after get_real_path(). This removes that too.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140606071406.6788.47850.stgit@kbuild-fedora.novalocal
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix an error message when failed to find given address in --vars
mode.
Without this fix, perf probe -V doesn't show the final "Error"
message if it fails to find given source line. Moreover, it
tells it fails to find "variables" instead of the source line.
-----
# perf probe -V foo@bar
Failed to find variables at foo@bar (0)
-----
The result also shows mysterious error code. Actually the error
returns 0 or -ENOENT means that it just fails to find the address
of given source line. (0 means there is no matching address,
and -ENOENT means there is an entry(DIE) but it has no instance,
e.g. an empty inlined function)
This fixes it to show what happened and the final error message
as below.
-----
# perf probe -V foo@bar
Failed to find the address of foo@bar
Error: Failed to show vars.
-----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140606071359.6788.84716.stgit@kbuild-fedora.novalocal
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show error code and description only in verbose mode if 'perf probe'
command failed.
Current 'perf probe' shows error code with final error message, and that
is meaningless for many users.
This changes error messages to show the error code and its description
only in verbose mode (-v option).
Without this patch:
-----
# perf probe -a do_execve@hoge
Probe point 'do_execve@hoge' not found.
Error: Failed to add events. (-2)
-----
With this patch, normally the message doesn't show the misterious error
number:
-----
# perf probe -a do_execve@hoge
Probe point 'do_execve@hoge' not found.
Error: Failed to add events.
-----
And in verbose mode, it also shows additional error messages as below:
-----
# perf probe -va do_execve@hoge
probe-definition(0): do_execve@hoge
symbol:do_execve file:hoge line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (6 entries long)
Using /lib/modules/3.15.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/3.15.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Probe point 'do_execve@hoge' not found.
Error: Failed to add events. Reason: No such file or directory (Code: -2)
-----
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140606071352.6788.76943.stgit@kbuild-fedora.novalocal
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Improve the error message if we can not find given member in the given
structure. Currently perf probe shows a wrong error message as below.
-----
# perf probe getname_flags:65 "result->BOGUS"
result(type:filename) has no member BOGUS.
Failed to find 'result' in this function.
Error: Failed to add events. (-22)
-----
The first message is correct, but the second one is not, since we didn't
fail to find a variable but fails to find the member of given variable.
-----
# perf probe getname_flags:65 "result->BOGUS"
result(type:filename) has no member BOGUS.
Error: Failed to add events. (-22)
-----
With this patch, the error message shows only the first one. And if we
really failed to find given variable, it tells us so.
-----
# perf probe getname_flags:65 "BOGUS"
Failed to find 'BOGUS' in this function.
Error: Failed to add events. (-2)
-----
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140606071345.6788.23744.stgit@kbuild-fedora.novalocal
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In perf's 'mem-mode', one can get access to a whole bunch of details specific to a
particular sample instruction. A bunch of those details relate to the data
address.
One interesting thing you can do with data addresses is to convert them into a unique
cacheline they belong too. Organizing these data cachelines into similar groups and sorting
them can reveal cache contention.
This patch creates an alogorithm based on various sample details that can help group
entries together into data cachelines and allows 'perf report' to sort on it.
The algorithm relies on having proper mmap2 support in the kernel to help determine
if the memory map the data address belongs to is private to a pid or globally shared.
The alogortithm is as follows:
o group cpumodes together
o group entries with discovered maps together
o sort on major, minor, inode and inode generation numbers
o if userspace anon, then sort on pid
o sort on cachelines based on data addresses
The 'dcacheline' sort option in 'perf report' only works in 'mem-mode'.
Sample output:
#
# Samples: 206 of event 'cpu/mem-loads/pp'
# Total weight : 2534
# Sort order : dcacheline,pid
#
# Overhead Samples Data Cacheline Command: Pid
# ........ ............ ...................................................................... ..................
#
13.22% 1 [k] 0xffff88042f08ebc0 swapper: 0
9.27% 1 [k] 0xffff88082e8cea80 swapper: 0
3.59% 2 [k] 0xffffffff819ba180 swapper: 0
0.32% 1 [k] arch_trigger_all_cpu_backtrace_handler_na.23901+0xffffffffffffffe0 swapper: 0
0.32% 1 [k] timekeeper_seq+0xfffffffffffffff8 swapper: 0
Note: Added a '+1' to symlen size in hists__calc_col_len to prevent the next column
from prematurely tabbing over and mis-aligning. Not sure what the problem is.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1401208087-181977-8-git-send-email-dzickus@redhat.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
With the Sebastian's change of handling num array argument (of raw
syscall enter), the script still failed to work like this:
$ perf record -e raw_syscalls:* sleep 1
$ perf script -g python
$ perf script -s perf-script.py
...
Traceback (most recent call last):
File "perf-script.py", line 42, in raw_syscalls__sys_enter
(id, args),
TypeError: %u format: a number is required, not list
Fatal Python error: problem in Python trace event handler
Aborted (core dumped)
This is because the generated script tries to print the array arg as
unsigned integer (%u). Since the python seems to convert arguments to
strings by default, just using %s solved the problem for me.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1401338695-18837-1-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Coming in v3.16, trace events will be able to save bitmasks in raw
format in the ring buffer and output it with the __get_bitmask() macro.
In order for userspace tools to parse this, it must be able to handle
the __get_bitmask() call and be able to convert the data that's in
the ring buffer into a nice bitmask format. The output is similar to
what the kernel uses to print bitmasks, with a comma separator every
4 bytes (8 characters).
This allows for cpumasks to also be saved efficiently.
The first user is the thermal:thermal_power_limit event which has the
following output:
thermal_power_limit: cpus=0000000f freq=1900000 cdev_state=0 power=5252
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
Suggested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Javi Merino <javi.merino@arm.com>
Link: http://lkml.kernel.org/r/20140603032224.229186537@goodmis.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>