As specified in tools/perf/Documentation/perf-config.txt, perf
configuration items must be in 'key = value' format, otherwise the
following error message occurs:
$ perf record -e intel_pt//u -- ls
bad config file line 2 in ~/.perfconfig
$ cat .perfconfig
[intel-pt]
mispred-all
Changing to assigning a value to the key 'mispred-all' fixes the issue:
$ perf record -e intel_pt//u -- ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Capured and wrote 0.031 MB perf.data]
$ cat .perfconfig
[intel-pt]
mispred-all = true
Signed-off-by: Jack Henschel <jackdev@mailbox.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170831080535.2157-1-jackdev@mailbox.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add header record types to pipe-mode, reusing the functions
used in file-mode and leveraging the new struct feat_fd.
For alignment, check that synthesized events don't exceed
pagesize.
Add the perf_event__synthesize_feature event call back to
process the new header records.
Before this patch:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
...
After this patch:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
# ========
# captured on: Mon May 22 16:33:43 2017
# ========
#
# hostname : my_hostname
# os release : 4.11.0-dbx-up_perf
# perf version : 4.11.rc6.g6277c80
# arch : x86_64
# nrcpus online : 72
# nrcpus avail : 72
# cpudesc : Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz
# cpuid : GenuineIntel,6,63,2
# total memory : 263457192 kB
# cmdline : /root/perf record -o - -e cycles -c 100000 sleep 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: intel_bts = 6, uncore_imc_4 = 22, uncore_sbox_1 = 47, uncore_cbox_5 = 33, uncore_ha_0 = 16, uncore_cbox
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
...
Support added for the subcommands: report, inject, annotate and script.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170718042549.145161-16-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implementing a new --smi-cost mode in perf stat to measure SMI cost.
During the measurement, the /sys/device/cpu/freeze_on_smi will be set.
The measurement can be done with one counter (unhalted core cycles), and
two free running MSR counters (IA32_APERF and SMI_COUNT).
In practice, the percentages of SMI core cycles should be more useful
than absolute value. So the output will be the percentage of SMI core
cycles and SMI#. metric_only will be set by default.
SMI cycles% = (aperf - unhalted core cycles) / aperf
Here is an example output.
Performance counter stats for 'sudo echo ':
SMI cycles% SMI#
0.1% 1
0.010858678 seconds time elapsed
Users who wants to get the actual value can apply additional
--no-metric-only.
Signed-off-by: Kan Liang <Kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The idea here is to make AutoFDO easier in cloud environment with ASLR.
It's easiest to show how this is useful by example. I built a small test
akin to "while(1) { do_nothing(); }" where the do_nothing function is
loaded from a dso:
$ cat burncpu.cpp
#include <dlfcn.h>
int main() {
void* handle = dlopen("./dso.so", RTLD_LAZY);
if (!handle) return -1;
typedef void (*fp)();
fp do_nothing = (fp) dlsym(handle, "do_nothing");
while(1) {
do_nothing();
}
}
$ cat dso.cpp
extern "C" void do_nothing() {}
$ cat build.sh
#!/bin/bash
g++ -shared dso.cpp -o dso.so
g++ burncpu.cpp -o burncpu -ldl
I sampled the execution of this program with perf record -b.
Using the existing "brstack,dso", we get absolute addresses that are
affected by ASLR, and could be different on different hosts. The address
does not uniquely identify a branch/target in the binary:
$ perf script -F brstack,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
0x7f967139b6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0
Using the existing "brstacksym,dso" is a little better, because the
symbol plus offset and dso name *does* uniquely identify a branch/target
in the binary. Ultimately, however, AutoFDO wants a simple offset into
the binary, so we'd have to undo all the work perf did to symbolize in
the first place:
$ perf script -F brstacksym,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
do_nothing+0x5(/tmp/burncpu/dso.so)/main+0x44(/tmp/burncpu/exe)/P/-/-/0
With the new "brstackoff,dso" we get what we need: a simple offset into a
specific dso/binary that uniquely identifies a branch/target:
$ perf script -F brstackoff,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
0x6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0
Signed-off-by: Mark Santaniello <marksan@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170619163825.2012979-2-marksan@fb.com
[ Updated documentation about 'brstackoff' using text from above ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With 'perf script' it is common that we just want to add or remove a field.
Currently this requires figuring out the long list of default fields and
specifying them first, and then adding/removing the new field.
This patch adds a new + - syntax to merely add or remove fields,
that allows more succint and clearer command lines
For example to remove the comm field from PMU samples:
Previously
$ perf script -F tid,cpu,time,event,sym,ip,dso,period | head -1
swapper 0 [000] 504345.383126: 1 cycles: ffffffff90060c66 native_write_msr ([kernel.kallsyms])
with the new syntax
perf script -F -comm | head -1
0 [000] 504345.383126: 1 cycles: ffffffff90060c66 native_write_msr ([kernel.kallsyms])
The new syntax cannot be mixed with normal overriding.
v2: Fix example in description. Use tid vs pid. No functional changes.
v3: Don't skip initialization when user specified explicit type.
v4: Rebase. Remove empty line.
Committer testing:
# perf record -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.748 MB perf.data (14 samples) ]
Without a explicit field list specified via -F, defaults to:
# perf script | head -2
perf 6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
swapper 0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
#
Which is equivalent to:
# perf script -F comm,tid,cpu,time,period,event,ip,sym,dso | head -2
perf 6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
swapper 0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
#
So if we want to remove the comm, as in your original example, we would have to
figure out the default field list and remove ' comm' from it:
# perf script -F tid,cpu,time,period,event,ip,sym,dso | head -2
6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
#
With your patch this becomes simpler, one can remove fields by prefixing them
with '-':
# perf script -F -comm | head -2
6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Link: http://lkml.kernel.org/r/20170602154810.15875-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace supports --no-syscalls option but it's not listed in the man
page. (Though, I see an example using --no-syscalls in EXAMPLES
section.)
Committer note:
The --no-syscalls option tells 'perf trace' not to automagically ask for
raw_syscalls:sys_{enter,exit} to then format it in a strace like way.
This become more used as 'perf trace' got support for arbitrary events,
such as tracepoints, so more and more we use:
# perf trace --no-syscalls -e nmi:*
0.000 nmi:nmi_handler:perf_event_nmi_handler() delta_ns: 36649 handled: 1)
0.019 nmi:nmi_handler:nmi_cpu_backtrace_handler() delta_ns: 2907 handled: 0)
0.676 nmi:nmi_handler:perf_event_nmi_handler() delta_ns: 9401 handled: 1)
0.680 nmi:nmi_handler:nmi_cpu_backtrace_handler() delta_ns: 288 handled: 0)
0.701 nmi:nmi_handler:perf_event_nmi_handler() delta_ns: 4977 handled: 1)
0.703 nmi:nmi_handler:nmi_cpu_backtrace_handler() delta_ns: 67 handled: 0)
0.736 nmi:nmi_handler:perf_event_nmi_handler() delta_ns: 8549 handled: 1)
^C#
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1492063332-5745-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.
Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ work loads.
The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:
~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g address
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--60.31%--hypot +20
| | |
| | |--8.52%--__hypot_finite +273
| | |
| | |--7.32%--__hypot_finite +411
...
--35.34%--_start +4194346
__libc_start_main +241
|
|--6.65%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--2.70%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--1.69%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~
With this patch and `-g srcline` we instead get the following output:
~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g srcline
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--64.02%--hypot
| | |
| | --59.81%--__hypot_finite
| |
| --0.53%--cabs
|
--35.34%--_start
__libc_start_main
|
|--12.48%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the printing of perf expressions and internal events to a new
clearer --details flag, instead of lumping it together with other debug
options in --debug. This makes it clearer to use.
Before
perf list --debug
...
unc_m_power_critical_throttle_cycles
[Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
uncore_imc_2/event=0x86/ MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
after
perf list --details
...
unc_m_power_critical_throttle_cycles
[Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
uncore_imc_2/event=0x86/ MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20170320201711.14142-14-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The uncore PMU has a lot of duplicated PMUs for different subsystems.
When expanding an uncore alias we usually end up with a large
number of identically named aliases, which makes perf stat
output difficult to read.
Automatically sum them up in perf stat, unless --no-merge is specified.
This can be default because only the uncores generally have duplicated
aliases. Other PMUs have unique names.
Before:
% perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1
Performance counter stats for 'system wide':
694,976 Bytes unc_c_llc_lookup.any
706,304 Bytes unc_c_llc_lookup.any
956,608 Bytes unc_c_llc_lookup.any
782,720 Bytes unc_c_llc_lookup.any
605,696 Bytes unc_c_llc_lookup.any
442,816 Bytes unc_c_llc_lookup.any
659,328 Bytes unc_c_llc_lookup.any
509,312 Bytes unc_c_llc_lookup.any
263,936 Bytes unc_c_llc_lookup.any
592,448 Bytes unc_c_llc_lookup.any
672,448 Bytes unc_c_llc_lookup.any
608,640 Bytes unc_c_llc_lookup.any
641,024 Bytes unc_c_llc_lookup.any
856,896 Bytes unc_c_llc_lookup.any
808,832 Bytes unc_c_llc_lookup.any
684,864 Bytes unc_c_llc_lookup.any
710,464 Bytes unc_c_llc_lookup.any
538,304 Bytes unc_c_llc_lookup.any
1.002577660 seconds time elapsed
After:
% perf stat -a -e unc_c_llc_lookup.any sleep 1
Performance counter stats for 'system wide':
2,685,120 Bytes unc_c_llc_lookup.any
1.002648032 seconds time elapsed
v2: Split collect_aliases. Rename alias flag.
v3: Make sure unsupported/not counted is always printed.
v4: Factor out callback change into separate patch.
v5: Move check for bad results here
Move merged check into collect_data
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement printing instruction sequences as hex dump for branch stacks.
This relies on the x86 instruction decoder used by the PT decoder to
find the lengths of instructions to dump them individually.
This is good enough for pattern matching.
This allows to study hot paths for individual samples, together with
branch misprediction and cycle count / IPC information if available (on
Skylake systems).
% perf record -b ...
% perf script -F brstackinsn
...
read_hpet+67:
ffffffff9905b843 insn: 74 ea # PRED
ffffffff9905b82f insn: 85 c9
ffffffff9905b831 insn: 74 12
ffffffff9905b833 insn: f3 90
ffffffff9905b835 insn: 48 8b 0f
ffffffff9905b838 insn: 48 89 ca
ffffffff9905b83b insn: 48 c1 ea 20
ffffffff9905b83f insn: 39 f2
ffffffff9905b841 insn: 89 d0
ffffffff9905b843 insn: 74 ea # PRED
Only works when no special branch filters are specified.
Occasionally the path does not reach up to the sample IP, as the LBRs
may be frozen before executing a final jump. In this case we print a
special message.
The instruction dumper piggy backs on the existing infrastructure from
the IP PT decoder.
An earlier iteration of this patch relied on a disassembler, but this
version only uses the existing instruction decoder.
Committer note:
Added hint about how to get suitable perf.data files for use with
'-F brstackinsm':
$ perf record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (8 samples) ]
$
$ perf script -F brstackinsn
Display of branch stack assembler requested, but non all-branch filter set
Hint: run 'perf record -b ...'
$
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/20170223234634.583-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of cgroup namespace, included in perf data with
the new PERF_RECORD_NAMESPACES event, as cgroup identifier.
With the assumption that each container is created with it's own cgroup
namespace, this allows assessment/analysis of multiple containers at
once.
A simple test for this would be to clone a few processes passing
SIGCHILD & CLONE_NEWCROUP flags to each of them, execute shell and run
different workloads on each of those contexts, while running perf
record command with --namespaces option.
Shown below is the output of perf report, sorted with cgroup identifier,
on perf.data generated with the above test scenario, clearly indicating
one context's considerable use of kernel memory in comparison with
others:
$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 5K of event 'kmem:kmalloc'
# Event count (approx.): 5965
#
# Overhead cgroup id (dev/inode) Samples
# ........ ..................... ............
#
81.27% 3/0xeffffffb 4848
16.24% 3/0xf00000d0 969
1.16% 3/0xf00000ce 69
0.82% 3/0xf00000cf 49
0.50% 0/0x0 30
While this is a start, there is further scope of improving this. For
example, instead of cgroup namespace's device and inode numbers, dev
and inode numbers of some or all namespaces may be used to distinguish
which processes are running in a given container context.
Also, scripts to map device and inode info to containers sounds
plausible for better tracing of containers.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked. And update
perf-record documentation with the new option to record namespace
events.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and be able to test the feature introduced in this patch. Had to move
here also perf_ns__name(), that was introduced in another later patch.
Also used PRIu64 and PRIx64 to fix the build in some enfironments wrt:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -p (--pid) option enables to trace existing process by its pid.
Committer notes:
Testing it:
Using the function_graph tracer on a process that is just waiting for user
input and thus will make 'perf ftrace' sit there waiting for that, then press
any key on that mutt session and see what happens:
# perf ftrace -t function_graph -p `pidof mutt` | head -40
2) 1.038 us | switch_mm_irqs_off();
------------------------------------------
2) <idle>-0 => mutt-3595
------------------------------------------
2) | finish_task_switch() {
2) | smp_irq_work_interrupt() {
2) | irq_enter() {
2) 0.180 us | rcu_irq_enter();
2) 1.248 us | }
2) | __wake_up() {
2) 0.126 us | _raw_spin_lock_irqsave();
2) | __wake_up_common() {
2) | pollwake() {
2) | default_wake_function() {
2) | try_to_wake_up() {
2) 0.662 us | _raw_spin_lock_irqsave();
2) | select_task_rq_fair() {
2) 1.719 us | effective_load.isra.41();
2) 1.343 us | effective_load.isra.41();
2) | select_idle_sibling() {
2) 0.331 us | idle_cpu();
2) 1.458 us | }
2) 8.350 us | }
2) 0.200 us | _raw_spin_lock();
2) | ttwu_do_activate() {
2) | activate_task() {
2) 0.136 us | update_rq_clock.part.77();
2) | enqueue_task_fair() {
2) | enqueue_entity() {
2) 0.146 us | update_curr();
2) 0.330 us | account_entity_enqueue();
2) 0.280 us | update_cfs_shares();
2) 0.321 us | place_entity();
2) 0.206 us | __enqueue_entity();
2) 6.926 us | }
2) | enqueue_entity() {
2) 0.105 us | update_curr();
2) 0.175 us | account_entity_enqueue();
2) 0.531 us | update_cfs_shares();
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170224011251.14946-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add new sort key 'symbol_size' to allow user to sort by symbol size, or
(more usefully) display the symbol size using --fields=...,symbol_size.
Committer note:
Testing it together with the recently added -q, to remove the headers,
and using the '+' sign with -s, to add the symbol_size sort order to
the default, which is '-s/--sort comm,dso,symbol':
# perf report -q -s +symbol_size | head -10
10.39% swapper [kernel.vmlinux] [k] intel_idle 270
3.45% swapper [kernel.vmlinux] [k] update_blocked_averages 1546
2.61% swapper [kernel.vmlinux] [k] update_load_avg 1292
2.36% swapper [kernel.vmlinux] [k] update_cfs_shares 240
1.83% swapper [kernel.vmlinux] [k] __hrtimer_run_queues 606
1.74% swapper [kernel.vmlinux] [k] update_cfs_rq_load_avg. 1187
1.66% swapper [kernel.vmlinux] [k] apic_timer_interrupt 152
1.60% CPU 0/KVM [kvm] [k] kvm_set_msr_common 3046
1.60% gnome-shell libglib-2.0.so.0 [.] g_slist_find 37
1.46% gnome-termina libglib-2.0.so.0 [.] g_hash_table_lookup 370
#
Signed-off-by: Charles Baylis <charles.baylis@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1487943176-13840-1-git-send-email-charles.baylis@linaro.org
[ Use symbol__size(), remove needless %lld + (long long) casting ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>