Running the following perf-stat command on an arm64 system produces the
following result...
[root@aarch64 ~]# perf stat -e kmem:mm_page_alloc -a sleep 1
Warning: [kmem:mm_page_alloc] function sizeof not defined
Warning: Error: expected type 4 but read 0
Segmentation fault
[root@aarch64 ~]#
The second warning was a result of the first warning not stopping
processing after it detected the issue.
That is, code that found the issue reported the first problem, but
because it did not exit out of the functions smoothly, it caused the
other warning to appear and not only that, it later caused the SIGSEGV.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150820151632.13927.13791.email-sent-by-dnelson@teal
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch utilizes previous introduced bpf_load_program to load
programs in the ELF file into kernel. Result is stored in 'fd' field in
'struct bpf_program'.
During loading, it allocs a log buffer and free it before return. Note
that that buffer is not passed to bpf_load_program() if the first
loading try is successful. Doesn't use a statically allocated log buffer
to avoid potention multi-thread problem.
Instructions collected during opening is cleared after loading.
load_program() is created for loading a 'struct bpf_insn' array into
kernel, bpf_program__load() calls it. By this design we have a function
loads instructions into kernel. It will be used by further patches,
which creates different instances from a program and load them into
kernel.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-20-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If an eBPF program accesses a map, LLVM generates a load instruction
which loads an absolute address into a register, like this:
ld_64 r1, <MCOperand Expr:(mymap)>
...
call 2
That ld_64 instruction will be recorded in relocation section.
To enable the usage of that map, relocation must be done by replacing
the immediate value by real map file descriptor so it can be found by
eBPF map functions.
This patch to the relocation work based on information collected by
patches:
'bpf tools: Collect symbol table from SHT_SYMTAB section',
'bpf tools: Collect relocation sections from SHT_REL sections'
and
'bpf tools: Record map accessing instructions for each program'.
For each instruction which needs relocation, it inject corresponding
file descriptor to imm field. As a part of protocol, src_reg is set to
BPF_PSEUDO_MAP_FD to notify kernel this is a map loading instruction.
This is the final part of map relocation patch. The principle of map
relocation is described in commit message of 'bpf tools: Collect symbol
table from SHT_SYMTAB section'.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-18-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch records the indices of instructions which are needed to be
relocated. That information is saved in the 'reloc_desc' field in
'struct bpf_program'. In the loading phase (this patch takes effect in
the opening phase), the collected instructions will be replaced by map
loading instructions.
Since we are going to close the ELF file and clear all data at the end
of the 'opening' phase, the ELF information will no longer be valid in
the 'loading' phase. We have to locate the instructions before maps are
loaded, instead of directly modifying the instruction.
'struct bpf_map_def' is introduced in this patch to let us know how many
maps are defined in the object.
This is the third part of map relocation. The principle of map relocation
is described in commit message of 'bpf tools: Collect symbol table from
SHT_SYMTAB section'.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-15-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch collects symbols section. This section is useful when linking
BPF maps.
What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.
BPF programs should be written in this way:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(unsigned long),
.value_size = sizeof(unsigned long),
.max_entries = 1000000,
};
SEC("my_func=sys_write")
int my_func(void *ctx)
{
...
bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
...
}
Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.
However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:
1. In relocation section (type == SHT_REL), positions of each such
'ldr_64' instruction are recorded with a reference of an entry in
symbol table (SHT_SYMTAB);
2. From records in symbol table we can find the indics of map
variables.
Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.
This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch defines basic interface of libbpf. 'struct bpf_object' will
be the handler of each object file. Its internal structure is hide to
user. eBPF object files are compiled by LLVM as ELF format. In this
patch, libelf is used to open those files, read EHDR and do basic
validation according to e_type and e_machine.
All elf related staffs are grouped together and reside in efile field of
'struct bpf_object'. bpf_object__elf_finish() is introduced to clear it.
After all eBPF programs in an object file are loaded, related ELF
information is useless. Close the object file and free those memory.
The zfree() and zclose() functions are introduced to ensure setting NULL
pointers and negative file descriptors after resources are released.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf tools have a symbol resolver that includes solving kernel
symbols using either kallsyms or ELF symtabs, and it also is using
libtraceevent to format the trace events fields, including via
subsystem specific plugins, like the "timer" one.
To solve fields like "timer:hrtimer_start"'s "function", libtraceevent
needs a way to map from its value to a function name and addr.
This patch provides a way for tools that already have symbol resolving
facilities to ask libtraceevent to use it when needing to resolve
kernel symbols.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-fdx1fazols17w5py26ia3bwh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that we have two mountpoints, one for debugfs and another, for
tracefs, we end up needing to check permissions for both, so, on
a system with default config we were always asking the user to
check the permission of the debugfs mountpoint, even when it was
already sufficient. Fix it.
E.g.:
$ trace -e nanosleep usleep 1
Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'
$ sudo mount -o remount,mode=755 /sys/kernel/debug
$ trace -e nanosleep usleep 1
Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
$ trace -e nanosleep usleep 1
0.326 ( 0.061 ms): usleep/11961 nanosleep(rqtp: 0x7ffef1081c50) = 0
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-0viljeuhc7q84ic8kobsna43@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Traceevent plugins need dynamic symbols exported from libtraceevent.a,
otherwise a dlopen error will occur during plugins loading.
This patch uses dynamic-list-file to export dynamic symbols which will
be used in plugins to perf executable.
The problem is covered up if feature-libpython is enabled, because
PYTHON_EMBED_LDOPTS contains '-Xlinker --export-dynamic' which adds all
symbols to the dynamic symbol table. So we should reproduce the problem
by setting NO_LIBPYTHON=1.
Before this patch:
(Prepare plugins)
$ ls /root/.traceevent/plugins/
plugin_sched_switch.so
plugin_function.so
...
$ perf record -e 'ftrace:function' ls
$ perf script
Warning: could not load plugin '/mnt/data/root/.traceevent/plugins/plugin_sched_switch.so'
/root/.traceevent/plugins/plugin_sched_switch.so: undefined symbol: pevent_unregister_event_handler
Warning: could not load plugin '/root/.traceevent/plugins/plugin_function.so'
/root/.traceevent/plugins/plugin_function.so: undefined symbol: warning
...
:1049 1049 [000] 9666.754487: ftrace:function: ffffffff8118bc50 <-- ffffffff8118c5b3
:1049 1049 [000] 9666.754487: ftrace:function: ffffffff818e2440 <-- ffffffff8118bc75
:1049 1049 [000] 9666.754487: ftrace:function: ffffffff8106eee0 <-- ffffffff811212e2
After this patch:
$ perf record -e 'ftrace:function' ls
$ perf script
:1049 1049 [000] 9666.754487: ftrace:function: __set_task_comm
:1049 1049 [000] 9666.754487: ftrace:function: _raw_spin_lock
:1049 1049 [000] 9666.754487: ftrace:function: task_tgid_nr_ns
...
Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1432819735-35040-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Before this patch, 'make install' installs libraries into bindir:
$ make install DESTDIR=./tree
INSTALL trace_plugins
INSTALL libtraceevent.a
INSTALL libtraceevent.so
$ find ./tree
./tree/
./tree/usr
./tree/usr/local
./tree/usr/local/bin
./tree/usr/local/bin/libtraceevent.a
./tree/usr/local/bin/libtraceevent.so
...
/usr/local/lib( or lib64) should be a better place.
This patch replaces 'bin' with libdir. For __LP64__ building, libraries
are installed to /usr/local/lib64. For other building, to
/usr/local/lib instead.
After applying this patch:
$ make install DESTDIR=./tree
INSTALL trace_plugins
INSTALL libtraceevent.a
INSTALL libtraceevent.so
$ find ./tree
./tree
./tree/usr
./tree/usr/local
./tree/usr/local/lib64
./tree/usr/local/lib64/libtraceevent.a
./tree/usr/local/lib64/traceevent
./tree/usr/local/lib64/traceevent/plugins
./tree/usr/local/lib64/traceevent/plugins/plugin_mac80211.so
./tree/usr/local/lib64/traceevent/plugins/plugin_hrtimer.so
...
./tree/usr/local/lib64/libtraceevent.so
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pi3orama@163.com
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1431860222-61636-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we try to cross compile liblockdep, even if we set the CROSS_COMPILE variable
the linker error can occur because LD is not set with CROSS_COMPILE.
This patch adds "LD" can be set automatically with CROSS_COMPILE variable so
fixes linker error problem.
Signed-off-by: Eunbong Song <eunb.song@samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
PMU driver hardware-enablement addition"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix segfault if passed with ''.
perf report: Fix -T/--threads option to work again
perf bench numa: Fix immediate meeting of convergence condition
perf bench numa: Fixes of --quiet argument
perf bench futex: Fix hung wakeup tasks after requeueing
perf probe: Fix bug with global variables handling
perf top: Fix a segfault when kernel map is restricted.
tools lib traceevent: Fix build failure on 32-bit arch
perf kmem: Fix compiles on RHEL6/OL6
tools lib api: Undefine _FORTIFY_SOURCE before setting it
perf kmem: Consistently use PRIu64 for printing u64 values
perf trace: Disable events and drain events when forked workload ends
perf trace: Enable events when doing system wide tracing and starting a workload
perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Improve --filter support for 'perf probe', allowing using its arguments
on other commands, as --add, --del, etc (Masami Hiramatsu)
- Show warning when running 'perf kmem stat' on a unsuitable perf.data file,
i.e. one with events that are not the ones required for the stat variant
used (Namhyung Kim).
Infrastructure changes:
- Auxtrace support patches, paving the way to support Intel PT and BTS (Adrian Hunter)
- hists browser (top, report) refactorings (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In my i386 build, it failed like this:
CC event-parse.o
event-parse.c: In function 'print_str_arg':
event-parse.c:3868:5: warning: format '%lu' expects argument of type 'long unsigned int',
but argument 3 has type 'uint64_t' [-Wformat]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Link: http://lkml.kernel.org/r/20150424020218.GF1905@sejong
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf changes from Ingo Molnar:
"Core kernel changes:
- One of the more interesting features in this cycle is the ability
to attach eBPF programs (user-defined, sandboxed bytecode executed
by the kernel) to kprobes.
This allows user-defined instrumentation on a live kernel image
that can never crash, hang or interfere with the kernel negatively.
(Right now it's limited to root-only, but in the future we might
allow unprivileged use as well.)
(Alexei Starovoitov)
- Another non-trivial feature is per event clockid support: this
allows, amongst other things, the selection of different clock
sources for event timestamps traced via perf.
This feature is sought by people who'd like to merge perf generated
events with external events that were measured with different
clocks:
- cluster wide profiling
- for system wide tracing with user-space events,
- JIT profiling events
etc. Matching perf tooling support is added as well, available via
the -k, --clockid <clockid> parameter to perf record et al.
(Peter Zijlstra)
Hardware enablement kernel changes:
- x86 Intel Processor Trace (PT) support: which is a hardware tracer
on steroids, available on Broadwell CPUs.
The hardware trace stream is directly output into the user-space
ring-buffer, using the 'AUX' data format extension that was added
to the perf core to support hardware constraints such as the
necessity to have the tracing buffer physically contiguous.
This patch-set was developed for two years and this is the result.
A simple way to make use of this is to use BTS tracing, the PT
driver emulates BTS output - available via the 'intel_bts' PMU.
More explicit PT specific tooling support is in the works as well -
will probably be ready by 4.2.
(Alexander Shishkin, Peter Zijlstra)
- x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
feature of Intel Xeon CPUs that allows the measurement and
allocation/partitioning of caches to individual workloads.
These kernel changes expose the measurement side as a new PMU
driver, which exposes various QoS related PMU events. (The
partitioning change is work in progress and is planned to be merged
as a cgroup extension.)
(Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
Waskiewicz Jr)
- x86 Intel Haswell LBR call stack support: this is a new Haswell
feature that allows the hardware recording of call chains, plus
tooling support. To activate this feature you have to enable it
via the new 'lbr' call-graph recording option:
perf record --call-graph lbr
perf report
or:
perf top --call-graph lbr
This hardware feature is a lot faster than stack walk or dwarf
based unwinding, but has some limitations:
- It reuses the current LBR facility, so LBR call stack and
branch record can not be enabled at the same time.
- It is only available for user-space callchains.
(Yan, Zheng)
- x86 Intel Broadwell CPU support and various event constraints and
event table fixes for earlier models.
(Andi Kleen)
- x86 Intel HT CPUs event scheduling workarounds. This is a complex
CPU bug affecting the SNB,IVB,HSW families that results in counter
value corruption. The mitigation code is automatically enabled and
is transparent.
(Maria Dimakopoulou, Stephane Eranian)
The perf tooling side had a ton of changes in this cycle as well, so
I'm only able to list the user visible changes here, in addition to
the tooling changes outlined above:
User visible changes affecting all tools:
- Improve support of compressed kernel modules (Jiri Olsa)
- Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
- Bash completion for subcommands (Yunlong Song)
- Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
- Support missing -f to override perf.data file ownership. (Yunlong Song)
- Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
User visible changes in individual tools:
'perf data':
New tool for converting perf.data to other formats, initially
for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
Sebastian Siewior)
'perf diff':
Add --kallsyms option (David Ahern)
'perf list':
Allow listing events with 'tracepoint' prefix (Yunlong Song)
Sort the output of the command (Yunlong Song)
'perf kmem':
Respect -i option (Jiri Olsa)
Print big numbers using thousands' group (Namhyung Kim)
Allow -v option (Namhyung Kim)
Fix alignment of slab result table (Namhyung Kim)
'perf probe':
Support multiple probes on different binaries on the same command line (Masami Hiramatsu)
Support unnamed union/structure members data collection. (Masami Hiramatsu)
Check kprobes blacklist when adding new events. (Masami Hiramatsu)
'perf record':
Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)
Support recording running/enabled time (Andi Kleen)
'perf sched':
Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)
'perf report' and 'perf top':
Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)
Indicate which callchain entries are annotated in the
TUI hists browser (Arnaldo Carvalho de Melo)
Add pid/tid filtering to 'report' and 'script' commands (David Ahern)
Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
events (Arnaldo Carvalho de Melo)
'perf stat':
Report unsupported events properly (Suzuki K. Poulose)
Output running time and run/enabled ratio in CSV mode (Andi Kleen)
'perf trace':
Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)
Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)
Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)
Dump stack on segfaults (Arnaldo Carvalho de Melo)
No need to explicitely enable evsels for workload started from perf, let it
be enabled via perf_event_attr.enable_on_exec, removing some events that take
place in the 'perf trace' before a workload is really started by it.
(Arnaldo Carvalho de Melo)
Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)
There's also been a ton of infrastructure work done, such as the
split-out of perf's build system into tools/build/ and other changes -
see the shortlog and changelog for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
perf evlist: Fix type for references to data_head/tail
perf probe: Check the orphaned -x option
perf probe: Support multiple probes on different binaries
perf buildid-list: Fix segfault when show DSOs with hits
perf tools: Fix cross-endian analysis
perf tools: Fix error path to do closedir() when synthesizing threads
perf tools: Fix synthesizing fork_event.ppid for non-main thread
perf tools: Add 'I' event modifier for exclude_idle bit
perf report: Don't call map__kmap if map is NULL.
perf tests: Fix attr tests
perf probe: Fix ARM 32 building error
perf tools: Merge all perf_event_attr print functions
perf record: Add clockid parameter
perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
perf sched replay: Support using -f to override perf.data file ownership
perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
...
Currently it ignores operator priority and just sets processed args as a
right operand. But it could result in priority inversion in case that
the right operand is also a operator arg and its priority is lower.
For example, following print format is from new kmem events.
"page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
But this was treated as below:
REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
In this case, the right arg was '?' operator which has lower priority.
But it just sets the whole arg so making the output confusing - page was
always 0 or 1 since that's the result of logical operation.
With this patch, it can handle it properly like following:
((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-10-git-send-email-namhyung@kernel.org
[ Replaced 'swap' with 'rotate' in a comment as requested by Steve and agreed by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Guilherme Cox found that:
There is, however, a potential bug if there is an item with code zero
that is not the first one in the symbol list, since eval_flag(..)
returns 0 when it doesn't find anything.
That is, if you have the following enums:
enum {
FOO_START = 0,
FOO_GO = 1,
FOO_END = 2
}
and then have:
__print_symbolic(foo, FOO_GO, "go", FOO_START, "start",
FOO_END, "end")
If none of the enums are known to pevent, then eval_flag() will return
zero, and it will match it to the first item in the list, which would be
FOO_GO, which is not zero.
Luckily, in most cases, the first element would be zero, and the parsing
would match out of sheer luck.
Reported-by: Guilherme Cox <cox@computer.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324145813.0bfe95ba@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a plugin option is defined, by default it is a boolean (true or false).
If the option is something else, then it needs to set its "value" field to
a default string other than NULL (can be just "").
If the value is not set then the option is considered boolean, and the
updating of the option value will be handled accordingly.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324135923.308372986@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a pevent_data_comm_from_pid() that returns the cmdline stored for
a given pid in order for users to map pids to comms, but there's no method
to convert a comm back to a pid. This is useful for filters that specify
a comm instead of a PID (it's faster than searching each individual event).
Add a way to retrieve a comm from a pid. Since there can be more than one
pid associated to a comm, it returns a data structure that lets the user
iterate over all the saved comms for a given pid.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324135923.001103479@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>