Pull perf fixes from Ingo Molnar:
"This tree contains various perf fixes on the kernel side, plus three
hw/event-enablement late additions:
- Intel Memory Bandwidth Monitoring events and handling
- the AMD Accumulated Power Mechanism reporting facility
- more IOMMU events
... and a final round of perf tooling updates/fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf llvm: Use strerror_r instead of the thread unsafe strerror one
perf llvm: Use realpath to canonicalize paths
perf tools: Unexport some methods unused outside strbuf.c
perf probe: No need to use formatting strbuf method
perf help: Use asprintf instead of adhoc equivalents
perf tools: Remove unused perf_pathdup, xstrdup functions
perf tools: Do not include stringify.h from the kernel sources
tools include: Copy linux/stringify.h from the kernel
tools lib traceevent: Remove redundant CPU output
perf tools: Remove needless 'extern' from function prototypes
perf tools: Simplify die() mechanism
perf tools: Remove unused DIE_IF macro
perf script: Remove lots of unused arguments
perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
perf machine: Rename perf_event__preprocess_sample to machine__resolve
perf tools: Add cpumode to struct perf_sample
perf tests: Forward the perf_sample in the dwarf unwind test
perf tools: Remove misplaced __maybe_unused
perf list: Fix documentation of :ppp
perf bench numa: Fix assertion for nodes bitfield
...
Pull 'objtool' stack frame validation from Ingo Molnar:
"This tree adds a new kernel build-time object file validation feature
(ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
It was written by and is maintained by Josh Poimboeuf.
The motivation: there's a category of hard to find kernel bugs, most
of them in assembly code (but also occasionally in C code), that
degrades the quality of kernel stack dumps/backtraces. These bugs are
hard to detect at the source code level. Such bugs result in
incorrect/incomplete backtraces most of time - but can also in some
rare cases result in crashes or other undefined behavior.
The build time correctness checking is done via the new 'objtool'
user-space utility that was written for this purpose and which is
hosted in the kernel repository in tools/objtool/. The tool's (very
simple) UI and source code design is shaped after Git and perf and
shares quite a bit of infrastructure with tools/perf (which tooling
infrastructure sharing effort got merged via perf and is already
upstream). Objtool follows the well-known kernel coding style.
Objtool does not try to check .c or .S files, it instead analyzes the
resulting .o generated machine code from first principles: it decodes
the instruction stream and interprets it. (Right now objtool supports
the x86-64 architecture.)
From tools/objtool/Documentation/stack-validation.txt:
"The kernel CONFIG_STACK_VALIDATION option enables a host tool named
objtool which runs at compile time. It has a "check" subcommand
which analyzes every .o file and ensures the validity of its stack
metadata. It enforces a set of rules on asm code and C inline
assembly code so that stack traces can be reliable.
Currently it only checks frame pointer usage, but there are plans to
add CFI validation for C files and CFI generation for asm files.
For each function, it recursively follows all possible code paths
and validates the correct frame pointer state at each instruction.
It also follows code paths involving special sections, like
.altinstructions, __jump_table, and __ex_table, which can add
alternative execution paths to a given instruction (or set of
instructions). Similarly, it knows how to follow switch statements,
for which gcc sometimes uses jump tables."
When this new kernel option is enabled (it's disabled by default), the
tool, if it finds any suspicious assembly code pattern, outputs
warnings in compiler warning format:
warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer
... so that scripts that pick up compiler warnings will notice them.
All known warnings triggered by the tool are fixed by the tree, most
of the commits in fact prepare the kernel to be warning-free. Most of
them are bugfixes or cleanups that stand on their own, but there are
also some annotations of 'special' stack frames for justified cases
such entries to JIT-ed code (BPF) or really special boot time code.
There are two other long-term motivations behind this tool as well:
- To improve the quality and reliability of kernel stack frames, so
that they can be used for optimized live patching.
- To create independent infrastructure to check the correctness of
CFI stack frames at build time. CFI debuginfo is notoriously
unreliable and we cannot use it in the kernel as-is without extra
checking done both on the kernel side and on the build side.
The quality of kernel stack frames matters to debuggability as well,
so IMO we can merge this without having to consider the live patching
or CFI debuginfo angle"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
objtool: Only print one warning per function
objtool: Add several performance improvements
tools: Copy hashtable.h into tools directory
objtool: Fix false positive warnings for functions with multiple switch statements
objtool: Rename some variables and functions
objtool: Remove superflous INIT_LIST_HEAD
objtool: Add helper macros for traversing instructions
objtool: Fix false positive warnings related to sibling calls
objtool: Compile with debugging symbols
objtool: Detect infinite recursion
objtool: Prevent infinite recursion in noreturn detection
objtool: Detect and warn if libelf is missing and don't break the build
tools: Support relative directory path for 'O='
objtool: Support CROSS_COMPILE
x86/asm/decoder: Use explicitly signed chars
objtool: Enable stack metadata validation on 64-bit x86
objtool: Add CONFIG_STACK_VALIDATION option
objtool: Add tool to perform compile-time stack metadata validation
x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
sched: Always inline context_switch()
...
In tracepoints, it's possible to print gfp flags in a human-friendly
format through a macro show_gfp_flags(), which defines a translation
array and passes is to __print_flags(). Since the following patch will
introduce support for gfp flags printing in printk(), it would be nice
to reuse the array. This is not straightforward, since __print_flags()
can't simply reference an array defined in a .c file such as mm/debug.c
- it has to be a macro to allow the macro magic to communicate the
format to userspace tools such as trace-cmd.
The solution is to create a macro __def_gfpflag_names which is used both
in show_gfp_flags(), and to define the gfpflag_names[] array in
mm/debug.c.
On the other hand, mm/debug.c also defines translation tables for page
flags and vma flags, and desire was expressed (but not implemented in
this series) to use these also from tracepoints. Thus, this patch also
renames the events/gfpflags.h file to events/mmflags.h and moves the
table definitions there, using the same macro approach as for gfpflags.
This allows translating all three kinds of mm-specific flags both in
tracepoints and printk.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The topology test case of 'perf test' seems to be broken on my x86
system - due to the comparison of a "core-id" with # of CPUs online.
There are 8 online CPUs:
$ cat /sys/devices/system/cpu/online
0-7
but core-ids are not sequential and some core-ids exceed the number
of online CPUs.
$ cat /sys/devices/system/cpu/cpu?/topology/core_id
0
1
9
10
0
1
9
10
Looks like we can safely remove the check. Output before:
$ perf --version
perf version 4.4.rc1.g34258a
$ perf test -v topo
36: Test topology in session :
--- start ---
test child forked, pid 5906
templ file: /tmp/perf-test-vCwWG3
core_id number is too big.You may need to upgrade the perf tool.
test child interrupted
---- end ----
Test topology in session: FAILED!
and after:
$ perf test -v topo
36: Test topology in session :
--- start ---
test child forked, pid 6532
templ file: /tmp/perf-test-y10wFJ
CPU 0, core 0, socket 0
CPU 1, core 1, socket 0
CPU 2, core 9, socket 0
CPU 3, core 10, socket 0
CPU 4, core 0, socket 1
CPU 5, core 1, socket 1
CPU 6, core 9, socket 1
CPU 7, core 10, socket 1
test child finished with 0
---- end ----
Test topology in session: Ok
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/20151203233219.GA27696@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new mode to only print metrics. Sometimes we don't care about the
raw values, just want the computed metrics. This allows more compact
printing, so with -I each sample is only a single line. This also
allows easier plotting and processing with other tools.
The main target is with using --topdown, but it also works with -T and
standard perf stat. A few metrics are not supported.
To avoiding having to hardcode all the metrics in the code it uses a two
pass approach: first compute dummy metrics and only print the headers in
the print_metric callback. Then use the callback to print the actual
values.
There are some additional changes in the stat printout code to handle
all metrics being on a single line.
One issue is that the column code doesn't know in advance what events
are not supported by the CPU, and it would be hard to find out as this
could change based on dynamic conditions. That causes empty columns in
some cases.
The output can be fairly wide, often you may need more than 80 columns.
Example:
% perf stat -a -I 1000 --metric-only
1.001452803 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches
1.001452803 158.91% 0.66 2.39 2.92%
2.002192321 180.63% 0.76 2.08 2.96%
3.003088282 150.59% 0.62 2.57 2.84%
4.004369835 196.20% 0.98 1.62 3.79%
5.005227314 231.98% 0.84 1.90 4.71%
v2: Lots of updates.
v3: Use slightly narrower columns
v4: Add comment
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1457049458-28956-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With all the recently added fields in the perf stat CSV output we should
finally document them in the man page. Do this here.
v2: Fix fields in documentation (Jiri)
v3: fix order of fields again (Jiri)
v4: Change order again.
v5: Document more fields (Jiri)
v6: Move time stamp first
v7: More fixes (Jiri)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1457049458-28956-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The context menu in TUI hists browser checks corresponding sort keys
when creating the menu item. But hotkey actions lacks these checks so
it can filter using incorrect info.
For example, default sort key of 'perf top' doesn't contain 'comm' or
'pid' sort key so each hist entry's thread info is not reliable. Thus
it should prohibit using thread filter on 't' key.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1457533253-21419-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The commit 2eafd410e6 ("perf hists browser: Only 'Zoom into thread'
only when sort order has 'pid'") disabled thread filtering in hist
browser for the default sort key. However the he->thread is still valid
even if 'pid' sort key is not given. Only thing it should not use is
the pid (or tid) of the thread. So allow to filter by thread when
'comm' sort key is given and show pid only if 'pid' sort key is given.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1457536490-24084-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When hierarchy mode is enabled, each entry in a hierarchy level shares
the period. IOW an upper level entry's period is the sum of lower level
entries. Thus perf uses only one of them to calculate the total period
of hists. It was lowest-level (leaf) entries but it has a problem when
it comes to filters.
If a filter is applied, entries in the same level will be filtered or
not. But upper level entries still have period of their sum including
filtered one. So total sum of upper level entries will not be same as
sum of lower level entries.
This resulted in entries having more than 100% of overhead and it can be
produced using perf top with filter(s).
Reported-and-Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a command-line filter is applied in hierarchy mode, output is
broken especially when filtering on lower level. The higher level
entries doesn't show up so it's hard to see the results.
Also it needs to handle multi sort keys in a single hierarchy level.
Before:
$ perf report --hierarchy -s 'cpu,{dso,comm}' --comms swapper --stdio
...
# Overhead CPU / Shared Object+Command
# ........... ...........................
#
13.79% [kernel.vmlinux] swapper
31.71% 000
13.80% [kernel.vmlinux] swapper
0.43% [e1000e] swapper
11.89% [kernel.vmlinux] swapper
9.18% [kernel.vmlinux] swapper
After:
# Overhead CPU / Shared Object+Command
# ........... ...............................
#
33.09% 003
13.79% [kernel.vmlinux] swapper
31.71% 000
13.80% [kernel.vmlinux] swapper
0.43% [e1000e] swapper
21.90% 002
11.89% [kernel.vmlinux] swapper
13.30% 001
9.18% [kernel.vmlinux] swapper
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove the union in evsel so that the database id and priv pointer can
be used simultainously without conflicting and crashing.
Detailed Description for the fixed bug follows:
perf script crashes with a segmentation fault on user space tool version
4.5.rc7.ge2857b when using the python database export API. It works
properly in 4.4 and prior versions.
the crash fist appeared in:
cfc8874a48 ("perf script: Process cpu/threads maps")
How to reproduce the bug:
Remove any temporary files left over from a previous crash (if you have
already attemped to reproduce the bug):
$ rm -r test_db-perf-data
$ dropdb test_db
$ perf record timeout 1 yes >/dev/null
$ perf script -s scripts/python/export-to-postgresql.py test_db
Stack Trace:
Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x1) at malloc.c:2929
2929 malloc.c: No such file or directory.
(gdb) bt
at util/stat.c:122
argv=<optimized out>, prefix=<optimized out>) at builtin-script.c:2231
argc=argc@entry=4, argv=argv@entry=0x7fffffffdf70) at perf.c:390
at perf.c:451
Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: cfc8874a48 ("perf script: Process cpu/threads maps")
Link: http://lkml.kernel.org/r/1457500314-8912-1-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While building on a Docker container for ubuntu and installing package
by package one ends up with:
MKDIR /tmp/build/util/
CC /tmp/build/util/genelf.o
util/genelf.c:22:19: fatal error: dwarf.h: No such file or directory
#include <dwarf.h>
^
compilation terminated.
mv: cannot stat '/tmp/build/util/.genelf.o.tmp': No such file or directory
Because the jitdump code needs the DWARF related development packages to
be installed. So make it dependent on that so that the build can succeed
without jitdump support.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-le498robnmxd40237wej3w62@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now hpp formats are linked using perf_hpp_list_node when hierarchy is
enabled. Use this info to print entries with multiple sort keys in a
single hierarchy properly.
For example, the below example shows using 4 sort keys with 2 levels.
$ perf report --hierarchy -s '{prev_pid,prev_comm},{next_pid,next_comm}' \
--percent-limit 1 -i perf.data.sched
...
# Overhead prev_pid+prev_comm / next_pid+next_comm
# ........... .......................................
#
22.36% 0 swapper/0
9.48% 17773 transmission-gt
5.25% 109 kworker/0:1H
1.53% 6524 Xephyr
21.39% 17773 transmission-gt
9.52% 0 swapper/0
9.04% 0 swapper/2
1.78% 0 swapper/3
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-6-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This implements having multiple sort keys in a single hierarchy level.
Originally only single sort key is supported for each level, but now
using the group syntax with '{ }', it can set more than one sort key in
one level. Note that now it needs to quote in order to prevent shell
interpretation.
For example:
$ perf report --hierarchy -s '{comm,dso},sym'
...
# Overhead Command / Shared Object / Symbol
# .............. ..........................................
#
48.67% swapper [kernel.vmlinux]
34.42% [k] intel_idle
1.30% [k] __tick_nohz_idle_enter
1.03% [k] cpuidle_reflect
8.87% firefox libpthread-2.22.so
6.60% [.] __GI___libc_recvmsg
1.18% [.] pthread_cond_signal@@GLIBC_2.3.2
1.09% [.] 0x000000000000ff4b
6.11% Xorg libc-2.22.so
5.27% [.] __memcpy_sse2_unaligned
In the above example, the command name and the shared object name are
shown on the same line but the symbol name is on the different line.
Since the first two are grouped by '{}', they are in the same level.
Suggested-and-Tested=by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-4-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>