Currently when using ordered events we parse the sample twice (the
perf_evlist__parse_sample function). Once before we queue the sample for
sorting:
perf_session__process_event
perf_evlist__parse_sample(sample)
perf_session__queue_event(sample.time)
And then when we deliver the sorted sample:
ordered_events__deliver_event
perf_evlist__parse_sample
perf_session__deliver_event
We can skip the initial full sample parsing by using
perf_evlist__parse_sample_timestamp function, which got introduced
earlier. The new path looks like:
perf_session__process_event
perf_evlist__parse_sample_timestamp
perf_session__queue_event
ordered_events__deliver_event
perf_session__deliver_event
perf_evlist__parse_sample
It saves some instructions and is slightly faster:
Before:
Performance counter stats for './perf.old report --stdio' (5 runs):
64,396,007,225 cycles:u ( +- 0.97% )
105,882,112,735 instructions:u # 1.64 insn per cycle ( +- 0.00% )
21.618103465 seconds time elapsed ( +- 1.12% )
After:
Performance counter stats for './perf report --stdio' (5 runs):
60,567,807,182 cycles:u ( +- 0.40% )
104,853,333,514 instructions:u # 1.73 insn per cycle ( +- 0.00% )
20.168895243 seconds time elapsed ( +- 0.32% )
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-cjp2tuk0qkjs9dxzlpmm34ua@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull KVM updates from Radim Krčmář:
"First batch of KVM changes for 4.15
Common:
- Python 3 support in kvm_stat
- Accounting of slabs to kmemcg
ARM:
- Optimized arch timer handling for KVM/ARM
- Improvements to the VGIC ITS code and introduction of an ITS reset
ioctl
- Unification of the 32-bit fault injection logic
- More exact external abort matching logic
PPC:
- Support for running hashed page table (HPT) MMU mode on a host that
is using the radix MMU mode; single threaded mode on POWER 9 is
added as a pre-requisite
- Resolution of merge conflicts with the last second 4.14 HPT fixes
- Fixes and cleanups
s390:
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
x86:
- Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
and after-reset state
- Refined dependencies for VMX features
- Fixes for nested SMI injection
- A lot of cleanups"
* tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
KVM: s390: provide a capability for AIS state migration
KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
KVM: s390: abstract conversion between isc and enum irq_types
KVM: s390: vsie: use common code functions for pinning
KVM: s390: SIE considerations for AP Queue virtualization
KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
KVM: arm/arm64: fix the incompatible matching for external abort
KVM: arm/arm64: Unify 32bit fault injection
KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
KVM: arm/arm64: vgic-its: New helper functions to free the caches
KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
arm/arm64: KVM: Load the timer state when enabling the timer
KVM: arm/arm64: Rework kvm_timer_should_fire
KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
KVM: arm/arm64: Move phys_timer_emulate function
KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
...
Pull powerpc updates from Michael Ellerman:
"A bit of a small release, I suspect in part due to me travelling for
KS. But my backlog of patches to review is smaller than usual, so I
think in part folks just didn't send as much this cycle.
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs
in our implementation and to bring the semantics exactly into line
with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a
true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors
can be reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM
to notify the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on
some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
- Support for emulating vector loads form cache-inhibited memory, on
some Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a
CONFIG), we believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting
for long running operations performed by OPAL firmware, and changes
to the powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are
using transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on
Power9.
- Improvements to the VAS facility used to access coprocessors on
Power9, and related improvements to the way the NX crypto driver
handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
Kennington III"
* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
powerpc/64s: Fix masking of SRR1 bits on instruction fault
powerpc/64s: mm_context.addr_limit is only used on hash
powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
powerpc/64s/hash: Fix fork() with 512TB process address space
powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Fix 512T hint detection to use >= 128T
powerpc: Fix DABR match on hash based systems
powerpc/signal: Properly handle return value from uprobe_deny_signal()
powerpc/fadump: use kstrtoint to handle sysfs store
powerpc/lib: Implement UACCESS_FLUSHCACHE API
powerpc/lib: Implement PMEM API
powerpc/powernv/npu: Don't explicitly flush nmmu tlb
powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
powerpc/powernv/idle: Round up latency and residency values
powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
...
Help identify to the user the event with the unsupported sampling error.
Also suggest a corrective action.
BEFORE:
$ sudo ./oldperf record -e armv8_pmuv3/mem_access/,ccn/cycles/,armv8_pmuv3/l2d_cache/ true
Error:
PMU Hardware doesn't support sampling/overflow-interrupts.
AFTER:
$ sudo ./newperf record -e armv8_pmuv3/mem_access/,ccn/cycles/,armv8_pmuv3/l2d_cache/ true
Error:
ccn/cycles/: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171114150452.e846f2e23684c7d7d8ee706f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we're not sampling the kernel, we shouldn't care about kptr_restrict
neither synthesize anything for assisting in resolving kernel samples,
like the reference relocation symbol or kernel modules information.
Before:
$ cat /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
2
2
$ perf record sleep 1
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
$ perf evlist -v
cycles:uppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
$
After:
$ perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (10 samples) ]
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-t025e9zftbx2b8cq2w01g5e5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf test' case "probe libc's inet_pton & backtrace it with ping"
fails on s390x. The reason is the 'realpath /lib64/ld*.so.* | uniq' line
which returns 2 libraries:
root@s35lp76 shell]# realpath /lib64/ld*.so.* | uniq
/usr/lib64/ld-2.26.so
/usr/lib64/ld_pre_smc.so.1.0.1
[root@s35lp76 shell]
This output makes the "perf probe" command lines invalid.
Use ldd tool to find out the libraries required by "bash" and check if
symbol "inet_pton" is part of the "libc" library. Some distros do not
have a /lib64 directory.
I have also added a check for the existence of an IPv6 network interface
before it is being used.
Committer changes:
We can't really use ldd for libc, as in some systems, such as x86_64, it
has hardlinks and then ldd sees one and the kernel the other, so grep
for libc in /proc/self/maps to get the one we'll receive from
PERF_RECORD_MMAP.
Thomas checked this change and acked it.
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Suggested-by: Hendrik Brückner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brückner <brueckner@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20171114133409.GN8836@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This 'perf test' case fails on s390x. The 'touch' command on s390x uses
the 'openat' system call to open the file named on the command line:
[root@s35lp76 perf]# perf probe -l
probe:vfs_getname (on getname_flags:72@fs/namei.c with pathname)
[root@s35lp76 perf]# perf trace -e open touch /tmp/abc
0.400 ( 0.015 ms): touch/27542 open(filename:
/usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
[root@s35lp76 perf]#
There is no 'open' system call for file '/tmp/abc'. Instead the 'openat'
system call is used:
[root@s35lp76 perf]# strace touch /tmp/abc
execve("/usr/bin/touch", ["touch", "/tmp/abc"], 0x3ffd547ec98
/* 30 vars */) = 0
[...]
openat(AT_FDCWD, "/tmp/abc", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
[...]
On s390x the 'egrep' command does not find a matching pattern and
returns an error.
Fix this for s390x create a platform dependent command line to enable
the 'perf probe' call to listen to the 'openat' system call and get the
expected output.
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20171114071847.2381-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-3qf38jk0prz54rhmhyu871my@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A recent fix for 'perf trace' introduced a bug where
machine__exit(trace->host) could be called while trace->host was still
NULL, so make this more robust by guarding against NULL, just like
free() does.
The problem happens, for instance, when !root users try to run 'perf
trace':
[acme@jouet linux]$ trace
Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
perf: Segmentation fault
Obtained 7 stack frames.
[0x4f1b2e]
/lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f]
[0x4f3fec]
[0x47468b]
[0x42a2db]
/lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509]
[0x42a6c9]
Segmentation fault (core dumped)
[acme@jouet linux]$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 33974a414c ("perf trace: Call machine__exit() at exit")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When using leader sampling the values of the not sampled but counted
events are shown by perf script in "period".
Currently printing period is only allowed when the main event has a
period, that is it is in frequency mode.
This implies that we cannot dump the values of counted events when the
leader event is not in frequency mode.
Just remove the check that the period must be set on all events. It will
just be printed as 0 instead if it's not available.
This fixes the following:
$ perf record -c 100000 -e '{cycles,branches}:S'
$ perf script -F event,period
Further commentary by Jiri Olsa:
The period will be the value of configured period, not 0:
int perf_evsel__parse_sample(struct ...
...
data->period = evsel->attr.sample_period;
$ perf record -c 100000
$ perf script -F event,period | head -3
Failed to open /tmp/perf-2048.map, continuing without symbols
100000 cycles:ppp:
100000 cycles:ppp:
other than that I think we can remove that check, because we will have
always sane number in period
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171109145528.23371-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When processing PERF_RECORD_AUXTRACE_INFO several perf_evsel entries
will be synthesized and inserted into session->evlist, eventually ending
in perf_script.tool.sample(), which ends up calling builtin-script.c's
process_event(), that expects evsel->priv to be a perf_evsel_script
object with a valid FILE pointer in fp.
So we need to intercept the processing of PERF_RECORD_AUXTRACE_INFO and
then setup evsel->priv for these newly created perf_evsel instances, do
it to fix the segfault in process_event() trying to use a NULL for that
FILE pointer.
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Fixes: a14390fde6 ("perf script: Allow creating per-event dump files")
Link: http://lkml.kernel.org/n/tip-bthnur8r8de01gxvn2qayx6e@git.kernel.org
[ Merge fix by Ravi Bangoria before pushing upstream to preserv bisectability ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use a typed enum for the perf_evsel_config_term type enum. This allows
gcc to do much stronger type checks, and also check for missing case
statements.
I removed the unused _MAX member from the number.
It found one missing case. I'm not sure it's a real problem, so I just
turned it into a BUG_ON for now.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171020202755.21410-1-andi@firstfloor.org
[ Renamed the enum name to term_type as per jolsa's request ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Intel PMU event aliases have a implicit period= specifier to set the
default period.
Unfortunately this breaks overriding these periods with -c or -F,
because the alias terms look like they are user specified to the
internal parser, and user specified event qualifiers override the
command line options.
Track that they are coming from aliases by adding a "weak" state to the
term. Any weak terms don't override command line options.
I only did it for -c/-F for now, I think that's the only case that's
broken currently.
Before:
$ perf record -c 1000 -vv -e uops_issued.any
...
{ sample_period, sample_freq } 2000003
After:
$ perf record -c 1000 -vv -e uops_issued.any
...
{ sample_period, sample_freq } 1000
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>