Merge reason: This is an older commit under testing that was not pushed yet - merge it.
Also fix up the merge in command-list.txt.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
The ordered sample code allocates singular reference objects struct
sample_queue which have 48byte size on 64bit and 20 bytes on 32bit. That's
silly. Allocate ~64k sized chunks and hand them out.
Performance gain: ~ 15%
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.398713983@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the sample queue is flushed we free the sample reference objects. Though
we need to malloc new objects when we process further. Stop the malloc/free
orgy and cache the already allocated object for resuage. Only allocate when
the cache is empty.
Performance gain: ~ 10%
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.338488630@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Profiling perf with perf revealed that a large part of the processing time is
spent in malloc/memcpy/free in the sample ordering code. That code copies the
data from the mmap into malloc'ed memory. That's silly. We can keep the mmap
and just store the pointer in the queuing data structure. For 64 bit this is
not a problem as we map the whole file anyway. On 32bit we keep 8 maps around
and unmap the oldest before mmaping the next chunk of the file.
Performance gain: 2.95s -> 1.23s (Faktor 2.4)
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.278787719@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On 64bit we can map the whole file in one go, on 32bit we can at least map
32MB and not map/unmap tiny chunks of the file.
Base the progress bar on 1/16 of the data size.
Preparatory patch to get rid of the malloc/memcpy/free of trace data.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.213687773@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replace the pseudo C++ self argument with session and give the mmap related
variables a sensible name. shift is a complete misnomer - it took me several
rounds of cursing to figure out that it's not a shift value.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.029687218@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The homebrewn sort algorithm fails to sort in time order. One of the problem
spots is that it fails to deal with equal timestamps correctly.
My first gut reaction was to replace the fancy list with an rbtree, but the
performance is 3 times worse.
Rewrite it so it works.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163819.908482530@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This primarily fixes perf-report, which didn't report the correct type
of event if perf-record was called to record one event different from
'cycles':
$ perf record -e instructions true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB perf.data (~295 samples) ]
$ perf report | head -n1
# Events: 7 cycles
LPU-Reference: <m3mxor6nex.fsf@gmail.com>
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
On ARM, module symbol start address is ahead of kernel symbol start address, so
we can't suppose that the start address of kernel map always is zero, otherwise
may cause incorrect .start and .end of kernel map (caused by fixup) when there
are modules loaded, then map_groups__find may return incorrect map for symbol
query.
This patch always figures out the start address of kernel map from
/proc/kallsyms if the file is available, so fix the issues on ARM for module
loaded case.
This patch fixes the following issues on ARM when modules are loaded:
- vmlinux symbol can't be found by kallsyms maps doing 'perf test'
- module symbols are parsed mistakenlly when doing 'perf top'/'perf report'
Cc: Ian Munsie <imunsie@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20101125192725.62d31b42@tom-lei>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix it by explaining what can be happening and giving the number of processed
and lost events.
Also holler if unknown events were found, that can be due to processing a
perf.data file collected using a newer tool where newer events got added on
reporting using an older perf tool, that or a bug, so ask for a report to be
made.
Works on both --tui and --stdio.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tool developers have to fill in a 'perf_event_ops' method table to
specify how to handle each event, so far the ones that were not
explicitely especified would get a stub that would just discard the
event.
Change that so that tool developers can get the lost event details and
the total number of such events at the end of 'perf report -D' output.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Collecting build-ids for long running sessions may take a long time
because it needs to traverse the whole just collected perf.data stream
of events, marking the DSOs that had hits and then looking for the
.note.gnu.build-id ELF section.
For things like the 'trace' tool that records and right away consumes
the data on systems where its unlikely that the DSOs being monitored
will change while 'trace' runs, it is desirable to remove build id
collection, so add a -B/--no-buildid option to perf record to allow such
use case.
Longer term we'll avoid all this if we, at DSO load time, in the kernel,
take advantage of this slow code path to collect the build-id and stash
it somewhere, so that we can insert it in the PERF_RECORD_MMAP event.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
At least on ARM, padding is inserted between rb_node and sym in struct
symbol_name_rb_node, causing "((void *)sym) - sizeof(struct rb_node)" to
point inside rb_node rather than to the symbol_name_rb_node. Fix this
by converting the code to use container_of().
Cc: Ian Munsie <imunsie@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20101123163106.GA25677@debian>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 59365d1 commit, even being reverted by 33e0d57, showed a non robust
behavior in 'perf record': it really should just warn the user that some
functionality will not be available.
The new behavior then becomes:
[acme@felicio linux]$ ls -la /proc/{kallsyms,modules}
-r-------- 1 root root 0 Nov 22 12:19 /proc/kallsyms
-r-------- 1 root root 0 Nov 22 12:19 /proc/modules
[acme@felicio linux]$ perf record ls -R > /dev/null
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (~161 samples) ]
[acme@felicio linux]$ perf report --stdio
[kernel.kallsyms] with build id 77b05e00e64e4de1c9347d83879779b540d69f00 not found, continuing without symbols
# Events: 98 cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ............... ....................
#
48.26% ls [kernel] [k] ffffffff8102b92b
22.49% ls libc-2.12.90.so [.] __strlen_sse2
8.35% ls libc-2.12.90.so [.] __GI___strcoll_l
8.17% ls ls [.] 11580
3.35% ls libc-2.12.90.so [.] _IO_new_file_xsputn
3.33% ls libc-2.12.90.so [.] _int_malloc
1.88% ls libc-2.12.90.so [.] _int_free
0.84% ls libc-2.12.90.so [.] malloc_consolidate
0.84% ls libc-2.12.90.so [.] __readdir64
0.83% ls ls [.] strlen@plt
0.83% ls libc-2.12.90.so [.] __GI_fwrite_unlocked
0.83% ls libc-2.12.90.so [.] __memcpy_sse2
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
[acme@felicio linux]$
It still has the build-ids for DSOs in the maps with hits:
[acme@felicio linux]$ perf buildid-list
77b05e00e64e4de1c9347d83879779b540d69f00 [kernel.kallsyms]
09c4a431a4a8b648fcfc2c2bdda70f56050ddff1 /bin/ls
af75ea9ad951d25e0f038901a11b3846dccb29a4 /lib64/libc-2.12.90.so
[acme@felicio linux]$
That can be used in another machine to resolve kernel symbols.
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Marcus Meissner <meissner@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch makes several changes to "perf stat":
- "perf stat" will no longer go ahead and run the application when one or
more of the specified events could not be opened.
- Use error() and die() instead of pr_err() so that the output is more
consistent with "perf top" and "perf record".
- Handle permission errors in a more robust way, and in a similar way to
"perf record" and "perf top".
In addition, the sys_perf_event_open() error handling of "perf top" and "perf
record" is made more consistent and adds the following phrase when an event
doesn't open (with something ther than an access or permission error):
"/bin/dmesg may provide additional information."
This is added because kernel code doesn't have a good way of expressing
detailed errors to user space, so its only avenue is to use printk's. However,
many users may not think of looking at dmesg to find out why an event is being
rejected.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <fweisbec@gmail.com>
Cc: Ian Munsie <ianmunsi@au1.ibm.com>
Cc: Michael Ellerman <michaele@au1.ibm.com>
LKML-Reference: <1290217044-26293-1-git-send-email-cjashfor@linux.vnet.ibm.com>
Signed-off-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This change removes the use of hardcoded absolute "/usr/include/elfutils" paths
from the perf build. The problem with hardcoded paths is that it prevents them
from being overridden by $prefix or by -I in CFLAGS (e.g., for cross-compiling
purposes).
Instead, just include the "elfutils/" subdirectory as a relative path when
files are needed from that directory.
Tested by building perf:
- Cross-compiled for ARM on x86_64
- Built natively on x86_64
- Built on x86_64 with /usr/include/elfutils moved to another location
and manually included in CFLAGS
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1289945793-31441-1-git-send-email-rmorell@nvidia.com>
Signed-off-by: Robert Morell <rmorell@nvidia.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a new -A option to perf stat. If specified then perf stat does
not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
using -a. This option is not supported in per-thread mode.
Being able to get a per-cpu breakdown is useful to detect imbalances between
CPUs when running a uniform workload than spans all monitored CPUs.
The second version corrects the missing cpumap[] support, so that it works when
the -C option is used.
The third version fixes a missing cpumap[] in print_counter() and removes a
stray patch in builtin-trace.c.
Examples on a 4-way system:
# perf stat -a -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
9592808135 cycles
3490380006 instructions # 0.364 IPC
1.001584632 seconds time elapsed
# perf stat -a -A -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
CPU0 2398163767 cycles
CPU1 2398180817 cycles
CPU2 2398217115 cycles
CPU3 2398247483 cycles
CPU0 872282046 instructions # 0.364 IPC
CPU1 873481776 instructions # 0.364 IPC
CPU2 872638127 instructions # 0.364 IPC
CPU3 872437789 instructions # 0.364 IPC
1.001556052 seconds time elapsed
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In keeping with the notion that all tools should be simple for
all to use. I've changed ktest.pl to ask for mandatory options
instead of just failing. It will append (or create) the options
the user types in, onto the config file.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
During the config_bisect, in case of failure, it is nice to have
the last good and bad .configs that were used. This would let
us restart the config_bisect from those configs.
Copy the last good config into the output dir as config_good,
and the last bad config as config_bad.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The run_ssh handles the ssh variable $SSH_COMMAND, which was not
being used by the run_command in reboot_to function.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Added the options STOP_AFTER_SUCCESS and STOP_AFTER_FAILURE to
allow the user to give a time (in seconds) to stop the monitor
after a stack trace or login has been detected. Sometimes the
kernel constantly prints out to the console and this may cause
the test to run indefinitely.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When we store failures, we create a directory that has the build_type
in it. For useconfig, it also contains the name path of the config
file it uses. This unfortunately gets its own directory on failure.
Parse off the directory name when creating the directory to store
the failures.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
By using the "use_config" for minconfig and addconfig we risk
trying to copy itself to itself, which will cause an unexpected failure.
Use a different name instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a compare script that makes sure that all the options in
sample.conf are used in ktest.pl, and all the options in
ktest.pl are described in sample.conf.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Added the ability to do a config_bisect. It starts with a bad
config and does the following loop.
Enable half the configs.
if none of the configs to check are not enabled
(caused by missing dependencies) enable the other half.
Run the test
if the test passes, remove the configs from the check
but enabled them for further tests (to satisfy
dependencies).
else
Remove any config that was not enabled, as we have found
a new config that can cause a failure.
loop till we have only one config left.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Updated to version 0.2.
Now have SSH_EXEC options.
Also added some cleanups for keeping track of success and
reading the config file.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>