The current version of perf detects whether or not the perf.data file is
written in a different endianness using the attr_size field in the
header of the file. This field represents sizeof(struct perf_event_attr)
as known to perf record. If the sizes do not match, then perf tries the
byte-swapped version. If they match, then the tool assumes a different
endianness.
The issue with the approach is that it assumes the size of
perf_event_attr always has to match between perf record and perf report.
However, the kernel perf_event ABI is extensible. New fields can be
added to struct perf_event_attr. Consequently, it is not possible to use
attr_size to detect endianness.
This patch takes another approach by using the magic number written at
the beginning of the perf.data file to detect endianness. The magic
number is an eight-byte signature. It's primary purpose is to identify
(signature) a perf.data file. But it could also be used to encode the
endianness.
The patch introduces a new value for this signature. The key difference
is that the signature is written differently in the file depending on
the endianness. Thus, by comparing the signature from the file with the
tool's own signature it is possible to detect endianness. The new
signature is "PERFILE2".
Backward compatiblity with existing perf.data file is ensured.
Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roberto Agostino Vitillo <ravitillo@lbl.gov>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>
Link: http://lkml.kernel.org/r/1328187288-24395-15-git-send-email-eranian@google.com
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we can put the object files in a different directory by using
'O=' comand line argument.
However the generated documentation files don't honor this directive,
This patch fixes that. It's been tested for man target but the others
seems currently broken so no tests have been done on them so far.
Link: http://lkml.kernel.org/r/1328541443-18003-1-git-send-email-fbuihuu@gmail.com
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can get the perf bench exec stack fixes and then apply the
remaining fix for the files added after what is in perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fixes an issue where perf report shows nan% for certain
perf.data files. The below is from a report for a do_fork probe:
-nan% sshd [kernel.kallsyms] [k] do_fork
-nan% packagekitd [kernel.kallsyms] [k] do_fork
-nan% dbus-daemon [kernel.kallsyms] [k] do_fork
-nan% bash [kernel.kallsyms] [k] do_fork
A git bisect shows commit f3bda2c as the cause. However, looking back
through the git history, I saw commit 640c03c which seems to have
removed the required initialization for perf_sample->period. The problem
only started showing after commit f3bda2c. The below patch re-introduces
the initialization and it fixes the problem for me.
With the below patch, for the same perf.data:
73.08% bash [kernel.kallsyms] [k] do_fork
8.97% 11-dhclient [kernel.kallsyms] [k] do_fork
6.41% sshd [kernel.kallsyms] [k] do_fork
3.85% 20-chrony [kernel.kallsyms] [k] do_fork
2.56% sendmail [kernel.kallsyms] [k] do_fork
This patch applies over current linux-tip commit 9949284.
Problem introduced in:
$ git describe 640c03c
v2.6.37-rc3-83-g640c03c
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/20120203170113.5190.25558.stgit@localhost6.localdomain6
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some perf ancient versions we used '[kernel.kallsyms._text]' as the
name for the kernel map.
This got changed with commit:
perf: 'perf kvm' tool for monitoring guest performance from host
commit a1645ce12a
Author: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
and we started to use following name '[kernel.kallsyms]_text'.
This name change is important for the report code dealing with ancient
perf data. When processing the kernel map event, we need to recognize
the old naming (dont match the last ']') and initialize the kernel map
correctly.
The subsequent call to maps__set_kallsyms_ref_reloc_sym deals with the
superfluous ']' to get correct symbol name.
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328461865-6127-1-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
distclean as an alias for clean was removed from the perf Makefile by
commit a3d1ee10d1
However, that commit neglected to remove it from the help output of
the perf Makefile, which could result in a user trying the following.
$ cd tools/perf/
$ make help | grep distclean
distclean - alias to clean
$ make distclean
make: *** No rule to make target `distclean'. Stop.
This patch removes it from the Makefile help output.
Link: http://lkml.kernel.org/r/1328134591-19851-1-git-send-email-jkacur@redhat.com
Signed-off-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In recent versions of perf top, pressing the 'e' key to change the
number of displayed samples had no effect.
The number of samples was still dictated by the size of the terminal
(stdio mode). That was quite annoying because typically only the first
dozen samples really matter.
This patch fixes this.
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120130105037.GA5160@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event_type record has a max length for the event name.
It's called MAX_EVENT_NAME.
The name may be truncated to fit the max length. But the header.size still
reflects the original name length. If that length is > MAX_EVENT_NAME, then the
header.size field is bogus. Fix this by using the length of the name after the
potential truncation.
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120120094912.GA4882@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When building on my Debian/mips system, util/util.c fails to build
because commit 1aed267173 (perf kvm: Do
guest-only counting by default) indirectly includes stdio.h before the
feature selection in util.h is done. This prevents _GNU_SOURCE in
util.h from enabling the declaration of getline(), from now second
inclusion of stdio.h, and the build is broken.
There is another breakage in util/evsel.c caused by include ordering,
but I didn't fully track down the commit that caused it.
The root cause of all this is an inconsistent definition of _GNU_SOURCE,
so I move the definition into the Makefile so that it is passed to all
invocations of the compiler and used uniformly for all system header
files. All other #define and #undef of _GNU_SOURCE are removed as they
cause conflicts with the definition passed to the compiler.
All the features.h definitions (_LARGEFILE64_SOURCE _FILE_OFFSET_BITS=64
and _GNU_SOURCE) are needed by the python glue code too, so they are
moved to BASIC_CFLAGS, and the misleading comments about BASIC_CFLAGS
are removed.
This gives me a clean build on x86_64 (fc12) and mips (Debian).
Cc: David Daney <david.daney@cavium.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1326836461-11952-1-git-send-email-ddaney.cavm@gmail.com
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In recent versions of perf top, pressing the 'e' key to change the
number of displayed samples had no effect.
The number of samples was still dictated by the size of the terminal
(stdio mode). That was quite annoying because typically only the first
dozen samples really matter.
This patch fixes this.
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120130105037.GA5160@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add the option get the path of [kernel.kallsyms].
Specify '--show-kernel-path' option to use this function.
This patch enables other applications to use this output easily.
Without --show-kernel-path option
ffffffff81467612 irq_return ([kernel.kallsyms])
ffffffff81467612 irq_return ([kernel.kallsyms])
7f24fc02a6b3 _start (/lib64/ld-2.14.so)
[snip]
With --show-kernel-path option
ffffffff81467612 irq_return (/lib/modules/3.2.0+/build/vmlinux)
ffffffff81467612 irq_return (/lib/modules/3.2.0+/build/vmlinux)
7f24fc02a6b3 _start (/lib64/ld-2.14.so)
[snip]
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20120130044320.2384.73322.stgit@linux3
Signed-off-by: Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event_type record has a max length for the event name.
It's called MAX_EVENT_NAME.
The name may be truncated to fit the max length. But the header.size still
reflects the original name length. If that length is > MAX_EVENT_NAME, then the
header.size field is bogus. Fix this by using the length of the name after the
potential truncation.
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120120094912.GA4882@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When building on my Debian/mips system, util/util.c fails to build
because commit 1aed267173 (perf kvm: Do
guest-only counting by default) indirectly includes stdio.h before the
feature selection in util.h is done. This prevents _GNU_SOURCE in
util.h from enabling the declaration of getline(), from now second
inclusion of stdio.h, and the build is broken.
There is another breakage in util/evsel.c caused by include ordering,
but I didn't fully track down the commit that caused it.
The root cause of all this is an inconsistent definition of _GNU_SOURCE,
so I move the definition into the Makefile so that it is passed to all
invocations of the compiler and used uniformly for all system header
files. All other #define and #undef of _GNU_SOURCE are removed as they
cause conflicts with the definition passed to the compiler.
All the features.h definitions (_LARGEFILE64_SOURCE _FILE_OFFSET_BITS=64
and _GNU_SOURCE) are needed by the python glue code too, so they are
moved to BASIC_CFLAGS, and the misleading comments about BASIC_CFLAGS
are removed.
This gives me a clean build on x86_64 (fc12) and mips (Debian).
Cc: David Daney <david.daney@cavium.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1326836461-11952-1-git-send-email-ddaney.cavm@gmail.com
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Intended to be able to support the current selection of the preferred
memcpy() implementation, this patch adds the ability to also measure the
two alternative implementations, again by way of using some
pre-processsor replacement.
While on my Westmere system this proves that the movsb based variant is
worse than the movsq based one (since the ERMS feature isn't there), it
also shows that here for the default as well as small sizes the unrolled
variant outperforms the movsq one.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F16D728020000780006D732@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since arch/x86/lib/memcpy_64.S implements not only __memcpy, but also
memcpy, without further precautions this function will get chose by the
static linker for resolving all references, and hence the "default"
measurement didn't really measure anything else than the
"x86-64-unrolled" one.
Fix this by renaming (through the pre-processor) the conflicting symbol.
On my Westmere system, the glibc variant turns out to require about 4%
less instructions, but 15% more cycles for the default 1Mb block size
measured.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F16D6FD020000780006D72F@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
perf tools: Fix compile error on x86_64 Ubuntu
perf report: Fix --stdio output alignment when --showcpuutilization used
perf annotate: Get rid of field_sep check
perf annotate: Fix usage string
perf kmem: Fix a memory leak
perf kmem: Add missing closedir() calls
perf top: Add error message for EMFILE
perf test: Change type of '-v' option to INCR
perf script: Add missing closedir() calls
tracing: Fix compile error when static ftrace is enabled
recordmcount: Fix handling of elf64 big-endian objects.
perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again
perf tools: Add support for guest/host-only profiling
perf kvm: Do guest-only counting by default
perf top: Don't update total_period on process_sample
perf hists: Stop using 'self' for struct hist_entry
perf hists: Rename total_session to total_period
x86: Add counter when debug stack is used with interrupts enabled
x86: Allow NMIs to hit breakpoints in i386
x86: Keep current stack in NMI breakpoints
...
The ctype.h include is not needed here and it breaks build on some systems (at
least 64bit Ubuntu 10.04) like below. Just get rid of it.
CC util/trace-event-info.o
cc1: warnings being treated as errors
util/trace-event-info.c: In function ‘record_file’:
util/trace-event-info.c:192: error: implicit declaration of function ‘pwrite’
util/trace-event-info.c:192: error: nested extern declaration of ‘pwrite’
make: *** [util/trace-event-info.o] Error 1
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1326035430-7621-1-git-send-email-namhyung@gmail.com
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
powerpc: fix compile error with 85xx/p1010rdb.c
powerpc: fix compile error with 85xx/p1023_rds.c
powerpc/fsl: add MSI support for the Freescale hypervisor
arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
powerpc/fsl: Add support for Integrated Flash Controller
powerpc/fsl: update compatiable on fsl 16550 uart nodes
powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
powerpc/fsl: Update defconfigs to enable some standard FSL HW features
powerpc: Add TBI PHY node to first MDIO bus
sbc834x: put full compat string in board match check
powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
offb: Fix setting of the pseudo-palette for >8bpp
offb: Add palette hack for qemu "standard vga" framebuffer
offb: Fix bug in calculating requested vram size
powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
powerpc/44x: Fix build error on currituck platform
powerpc/boot: Change the load address for the wrapper to fit the kernel
powerpc/44x: Enable CRASH_DUMP for 440x
...
Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
the additional sparse-checking code for cputime_t.