Occassionally events (e.g., context-switch, sched tracepoints) are losing
the conversion of sample data associated with a thread. For example:
$ perf record -e sched:sched_switch -c 1 -a -- sleep 5
$ perf script
<selected events shown>
ls 30482 [000] 1379727.583037: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
ls 30482 [000] 1379727.586339: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
:30482 30482 [000] 1379727.589462: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
The last line lost the conversion from tid to comm. If you look at the events
(perf script -D) you see why - a SAMPLE event is generated after the EXIT:
0 1379727589449774 0x1540b0 [0x38]: PERF_RECORD_EXIT(30482:30482):(30482:30482)
0 1379727589462497 0x1540e8 [0x80]: PERF_RECORD_SAMPLE(IP, 1): 30482/30482: 0xffffffff816416f1 period: 1 addr: 0
... thread: :30482:30482
When perf processes the EXIT event the thread is moved to the dead_threads
list. When the SAMPLE event is processed no thread exists for the pid so a new
one is created by machine__findnew_thread.
This patch address the problem by delaying the move to the dead_threads list
until the tid is re-used (per Adrian's suggestion).
With this patch we get the previous example shows:
ls 30482 [000] 1379727.583037: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
ls 30482 [000] 1379727.586339: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
ls 30482 [000] 1379727.589462: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
and
0 1379727589449774 0x1540b0 [0x38]: PERF_RECORD_EXIT(30482:30482):(30482:30482)
0 1379727589462497 0x1540e8 [0x80]: PERF_RECORD_SAMPLE(IP, 1): 30482/30482: 0xffffffff816416f1 period: 1 addr: 0
... thread: ls:30482
v4: per Arnaldo's request add dead flag to thread struct and set when task exits
v3: re-do from a time based check to a delayed move to dead_threads list
v2: Rebased to latest perf/core branch. Changed time comparison to use
a macro which explicitly shows the time basis
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1376491767-84171-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf stat -a needs 10 open file descriptors per logical CPU
perf stat -a -dddd needs 20 open fds for each.
This implies that stat -a doesn't work on any system with the default
ulimit -n 1024 which has more than ~100 CPUs and stat -a -dddd doesn't
work on anything with more than 46 CPUs.
Longer term there needs to be probably some way to lower the file
descriptor requirements. This would need some changes in the kernel/user
interface.
But short term this patch just tries to increase the file descriptor
limit in perf itself, when it runs into a EMFILE.
It first sets it to the hard limit, and then tries to increase the hard
limit.
On Fedora systems the default seems to be soft limit 1024 and hard limit
4*1024. So even non root can support 409 or 186 CPUs respectively. root
can go far higher.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375670486-15480-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit b55ae0a9 added code-reading.c which fails to compile on Fedora 16
with compiler version:
$ gcc --version
gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2)
Failure message is:
tests/code-reading.c: In function ‘do_sort_something’:
tests/code-reading.c:305:13: error: stack protector not protecting local variables: variable length buffer [-Werror=stack-protector]
cc1: all warnings being treated as errors
make: *** [/tmp/junk/tests/code-reading.o] Error 1
make: *** Waiting for unfinished jobs....
v2: as Adrian noticed changed sizeof to ARRAY_SIZE
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1376454732-83728-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit adds a test of instruction counting using the PMU on powerpc.
Although the bulk of the code is architecture agnostic, the code needs to
run a precisely sized loop which is implemented in assembler.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This commit adds a powerpc subdirectory to tools/testing/selftests,
for tests that are powerpc specific.
On other architectures nothing is built. The makefile supports cross
compilation if the user sets ARCH and CROSS_COMPILE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is no need to have a nlmsghdr pointer to another temporary buffer.
Instead use a full struct nlmsghdr.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
netlink_send is supposed to send just the cn_msg+hv_kvp_msg via netlink.
Currently it sets an incorrect iovec size, as reported by valgrind.
In the case of registering with the kernel the allocated buffer is large
enough to hold nlmsghdr+cn_msg+hv_kvp_msg, no overrun happens. In the
case of responding to the kernel the cn_msg is located in the middle of
recv_buffer, after the nlmsghdr. Currently the code in netlink_send adds
also the size of nlmsghdr to the payload. But nlmsghdr is a separate
iovec. This leads to an (harmless) out-of-bounds access when the kernel
processes the iovec. Correct the iovec size of the cn_msg to be just
cn_msg + its payload.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The next commit gets conflicts because it relies on patches which were
cc:stable and thus had to be merged into Linus' tree before the coming
merge window. So pull in master now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For some types of work loads and special guest environments, you might
have a kernel that has no kernel modules. The perf kvm record tool
fails instantiate vmlinux maps when the kernel modules directory cannot
be opened, even though the kallsyms has been properly processed. This
leads to a perf kvm report that has no guest symbols resolved.
This patch changes the failure to locate kernel modules to be non-fatal.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1373920073-4874-1-git-send-email-jason.wessel@windriver.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit adds support for a new modifier "D", which requests that the
event, or group of events, be pinned to the PMU.
The "p" modifier is already taken for precise, and "P" may be used in
future to mean "fully precise".
So we use "D", which stands for pinneD - and looks like a padlock, or if
you're using the ":D" syntax perf smiles at you.
This is an oft-requested feature from our HW folks, who want to be able
to run a large number of events, but also want 100% accurate results for
instructions per cycle.
Comparison of results with and without pinning:
$ perf stat -e '{cycles,instructions}:D' -e cycles,instructions,...
79,590,480,683 cycles # 0.000 GHz
166,123,716,524 instructions # 2.09 insns per cycle
# 0.11 stalled cycles per insn
79,352,134,463 cycles # 0.000 GHz [11.11%]
165,178,301,818 instructions # 2.08 insns per cycle
# 0.11 stalled cycles per insn [11.13%]
As you can see although perf does a very good job of scaling the values
in the non-pinned case, there is some small discrepancy.
The patch is fairly straight forward, the one detail is that we need to
make sure we only request pinning for the group leader when we have a
group.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1375795686-4226-1-git-send-email-michael@ellerman.id.au
[ Use perf_evsel__is_group_leader instead of open coded equivalent, as
suggested by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf kvm stat currently requires back to back record and report commands
to see stats. e.g,.
perf kvm stat record -p $pid -- sleep 1
perf kvm stat report
This is inconvenvient for on box monitoring of a VM. This patch
introduces a 'live' mode that in effect combines the record plus report
into one command. e.g., to monitor a single VM:
perf kvm stat live -p $pid
or all VMs:
perf kvm stat live
Same stats options for the record+report path work with the live mode.
Display rate defaults to 1 second and can be changed using the -d
option.
v4:
- address comments from Xiao -- verify_vcpu check should not look at
processors on line for the host, prune configurable options.
- set attr->{mmap,comm,task} to 0 - don't need task events so trim events
we have to deal with
- better control of time for queue event flushing to reduce frequency of
"Timestamp below last timeslice flush" failures.
v3:
updated to use existing tracepoint parsing code
v2:
removed ABSTIME arg from timerfd_settime as mentioned by Namhyung
only call perf_kvm__handle_stdin when poll returns activity.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1375753297-69645-3-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>