123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347 |
- perf-trace(1)
- =============
- NAME
- ----
- perf-trace - strace inspired tool
- SYNOPSIS
- --------
- [verse]
- 'perf trace'
- 'perf trace record'
- DESCRIPTION
- -----------
- This command will show the events associated with the target, initially
- syscalls, but other system events like pagefaults, task lifetime events,
- scheduling events, etc.
- This is a live mode tool in addition to working with perf.data files like
- the other perf tools. Files can be generated using the 'perf record' command
- but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*').
- Alternatively, 'perf trace record' can be used as a shortcut to
- automatically include the raw_syscalls events when writing events to a file.
- The following options apply to perf trace; options to perf trace record are
- found in the perf record man page.
- OPTIONS
- -------
- -a::
- --all-cpus::
- System-wide collection from all CPUs.
- -e::
- --expr::
- --event::
- List of syscalls and other perf events (tracepoints, HW cache events,
- etc) to show. Globbing is supported, e.g.: "epoll_*", "*msg*", etc.
- See 'perf list' for a complete list of events.
- Prefixing with ! shows all syscalls but the ones specified. You may
- need to escape it.
- --filter=<filter>::
- Event filter. This option should follow an event selector (-e) which
- selects tracepoint event(s).
- -D msecs::
- --delay msecs::
- After starting the program, wait msecs before measuring. This is useful to
- filter out the startup phase of the program, which is often very different.
- -o::
- --output=::
- Output file name.
- -p::
- --pid=::
- Record events on existing process ID (comma separated list).
- -t::
- --tid=::
- Record events on existing thread ID (comma separated list).
- -u::
- --uid=::
- Record events in threads owned by uid. Name or number.
- -G::
- --cgroup::
- Record events in threads in a cgroup.
- Look for cgroups to set at the /sys/fs/cgroup/perf_event directory, then
- remove the /sys/fs/cgroup/perf_event/ part and try:
- perf trace -G A -e sched:*switch
- Will set all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc
- _and_ sched:sched_switch to the 'A' cgroup, while:
- perf trace -e sched:*switch -G A
- will only set the sched:sched_switch event to the 'A' cgroup, all the
- other events (raw_syscalls:sys_{enter,exit}, etc are left "without"
- a cgroup (on the root cgroup, sys wide, etc).
- Multiple cgroups:
- perf trace -G A -e sched:*switch -G B
- the syscall ones go to the 'A' cgroup, the sched:sched_switch goes
- to the 'B' cgroup.
- --filter-pids=::
- Filter out events for these pids and for 'trace' itself (comma separated list).
- -v::
- --verbose::
- Increase the verbosity level.
- --no-inherit::
- Child tasks do not inherit counters.
- -m::
- --mmap-pages=::
- Number of mmap data pages (must be a power of two) or size
- specification with appended unit character - B/K/M/G. The
- size is rounded up to have nearest pages power of two value.
- -C::
- --cpu::
- Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a
- comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
- In per-thread mode with inheritance mode on (default), Events are captured only when
- the thread executes on the designated CPUs. Default is to monitor all CPUs.
- --duration::
- Show only events that had a duration greater than N.M ms.
- --sched::
- Accrue thread runtime and provide a summary at the end of the session.
- --failure::
- Show only syscalls that failed, i.e. that returned < 0.
- -i::
- --input::
- Process events from a given perf data file.
- -T::
- --time::
- Print full timestamp rather time relative to first sample.
- --comm::
- Show process COMM right beside its ID, on by default, disable with --no-comm.
- -s::
- --summary::
- Show only a summary of syscalls by thread with min, max, and average times
- (in msec) and relative stddev.
- -S::
- --with-summary::
- Show all syscalls followed by a summary by thread with min, max, and
- average times (in msec) and relative stddev.
- --errno-summary::
- To be used with -s or -S, to show stats for the errnos experienced by
- syscalls, using only this option will trigger --summary.
- --tool_stats::
- Show tool stats such as number of times fd->pathname was discovered thru
- hooking the open syscall return + vfs_getname or via reading /proc/pid/fd, etc.
- -f::
- --force::
- Don't complain, do it.
- -F=[all|min|maj]::
- --pf=[all|min|maj]::
- Trace pagefaults. Optionally, you can specify whether you want minor,
- major or all pagefaults. Default value is maj.
- --syscalls::
- Trace system calls. This options is enabled by default, disable with
- --no-syscalls.
- --call-graph [mode,type,min[,limit],order[,key][,branch]]::
- Setup and enable call-graph (stack chain/backtrace) recording.
- See `--call-graph` section in perf-record and perf-report
- man pages for details. The ones that are most useful in 'perf trace'
- are 'dwarf' and 'lbr', where available, try: 'perf trace --call-graph dwarf'.
- Using this will, for the root user, bump the value of --mmap-pages to 4
- times the maximum for non-root users, based on the kernel.perf_event_mlock_kb
- sysctl. This is done only if the user doesn't specify a --mmap-pages value.
- --kernel-syscall-graph::
- Show the kernel callchains on the syscall exit path.
- --max-events=N::
- Stop after processing N events. Note that strace-like events are considered
- only at exit time or when a syscall is interrupted, i.e. in those cases this
- option is equivalent to the number of lines printed.
- --switch-on EVENT_NAME::
- Only consider events after this event is found.
- --switch-off EVENT_NAME::
- Stop considering events after this event is found.
- --show-on-off-events::
- Show the --switch-on/off events too.
- --max-stack::
- Set the stack depth limit when parsing the callchain, anything
- beyond the specified depth will be ignored. Note that at this point
- this is just about the presentation part, i.e. the kernel is still
- not limiting, the overhead of callchains needs to be set via the
- knobs in --call-graph dwarf.
- Implies '--call-graph dwarf' when --call-graph not present on the
- command line, on systems where DWARF unwinding was built in.
- Default: /proc/sys/kernel/perf_event_max_stack when present for
- live sessions (without --input/-i), 127 otherwise.
- --min-stack::
- Set the stack depth limit when parsing the callchain, anything
- below the specified depth will be ignored. Disabled by default.
- Implies '--call-graph dwarf' when --call-graph not present on the
- command line, on systems where DWARF unwinding was built in.
- --print-sample::
- Print the PERF_RECORD_SAMPLE PERF_SAMPLE_ info for the
- raw_syscalls:sys_{enter,exit} tracepoints, for debugging.
- --proc-map-timeout::
- When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
- because the file may be huge. A time out is needed in such cases.
- This option sets the time out limit. The default value is 500 ms.
- --sort-events::
- Do sorting on batches of events, use when noticing out of order events that
- may happen, for instance, when a thread gets migrated to a different CPU
- while processing a syscall.
- --libtraceevent_print::
- Use libtraceevent to print tracepoint arguments. By default 'perf trace' uses
- the same beautifiers used in the strace-like enter+exit lines to augment the
- tracepoint arguments.
- --map-dump::
- Dump BPF maps setup by events passed via -e, for instance the augmented_raw_syscalls
- living in tools/perf/examples/bpf/augmented_raw_syscalls.c. For now this
- dumps just boolean map values and integer keys, in time this will print in hex
- by default and use BTF when available, as well as use functions to do pretty
- printing using the existing 'perf trace' syscall arg beautifiers to map integer
- arguments to strings (pid to comm, syscall id to syscall name, etc).
- PAGEFAULTS
- ----------
- When tracing pagefaults, the format of the trace is as follows:
- <min|maj>fault [<ip.symbol>+<ip.offset>] => <[email protected]> (<map type><addr level>).
- - min/maj indicates whether fault event is minor or major;
- - ip.symbol shows symbol for instruction pointer (the code that generated the
- fault); if no debug symbols available, perf trace will print raw IP;
- - addr.dso shows DSO for the faulted address;
- - map type is either 'd' for non-executable maps or 'x' for executable maps;
- - addr level is either 'k' for kernel dso or '.' for user dso.
- For symbols resolution you may need to install debugging symbols.
- Please be aware that duration is currently always 0 and doesn't reflect actual
- time it took for fault to be handled!
- When --verbose specified, perf trace tries to print all available information
- for both IP and fault address in the form of dso@symbol+offset.
- EXAMPLES
- --------
- Trace only major pagefaults:
- $ perf trace --no-syscalls -F
- Trace syscalls, major and minor pagefaults:
- $ perf trace -F all
- 1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.)
- As you can see, there was major pagefault in python process, from
- CRYPTO_push_info_ routine which faulted somewhere in libcrypto.so.
- Trace the first 4 open, openat or open_by_handle_at syscalls (in the future more syscalls may match here):
- $ perf trace -e open* --max-events 4
- [root@jouet perf]# trace -e open* --max-events 4
- 2272.992 ( 0.037 ms): gnome-shell/1370 openat(dfd: CWD, filename: /proc/self/stat) = 31
- 2277.481 ( 0.139 ms): gnome-shell/3039 openat(dfd: CWD, filename: /proc/self/stat) = 65
- 3026.398 ( 0.076 ms): gnome-shell/3039 openat(dfd: CWD, filename: /proc/self/stat) = 65
- 4294.665 ( 0.015 ms): sed/15879 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
- $
- Trace the first minor page fault when running a workload:
- # perf trace -F min --max-stack=7 --max-events 1 sleep 1
- 0.000 ( 0.000 ms): sleep/18006 minfault [__clear_user+0x1a] => 0x5626efa56080 (?k)
- __clear_user ([kernel.kallsyms])
- load_elf_binary ([kernel.kallsyms])
- search_binary_handler ([kernel.kallsyms])
- __do_execve_file.isra.33 ([kernel.kallsyms])
- __x64_sys_execve ([kernel.kallsyms])
- do_syscall_64 ([kernel.kallsyms])
- entry_SYSCALL_64 ([kernel.kallsyms])
- #
- Trace the next min page page fault to take place on the first CPU:
- # perf trace -F min --call-graph=dwarf --max-events 1 --cpu 0
- 0.000 ( 0.000 ms): Web Content/17136 minfault [js::gc::Chunk::fetchNextDecommittedArena+0x4b] => 0x7fbe6181b000 (?.)
- js::gc::FreeSpan::initAsEmpty (inlined)
- js::gc::Arena::setAsNotAllocated (inlined)
- js::gc::Chunk::fetchNextDecommittedArena (/usr/lib64/firefox/libxul.so)
- js::gc::Chunk::allocateArena (/usr/lib64/firefox/libxul.so)
- js::gc::GCRuntime::allocateArena (/usr/lib64/firefox/libxul.so)
- js::gc::ArenaLists::allocateFromArena (/usr/lib64/firefox/libxul.so)
- js::gc::GCRuntime::tryNewTenuredThing<JSString, (js::AllowGC)1> (inlined)
- js::AllocateString<JSString, (js::AllowGC)1> (/usr/lib64/firefox/libxul.so)
- js::Allocate<JSThinInlineString, (js::AllowGC)1> (inlined)
- JSThinInlineString::new_<(js::AllowGC)1> (inlined)
- AllocateInlineString<(js::AllowGC)1, unsigned char> (inlined)
- js::ConcatStrings<(js::AllowGC)1> (/usr/lib64/firefox/libxul.so)
- [0x18b26e6bc2bd] (/tmp/perf-17136.map)
- #
- Trace the next two sched:sched_switch events, four block:*_plug events, the
- next block:*_unplug and the next three net:*dev_queue events, this last one
- with a backtrace of at most 16 entries, system wide:
- # perf trace -e sched:*switch/nr=2/,block:*_plug/nr=4/,block:*_unplug/nr=1/,net:*dev_queue/nr=3,max-stack=16/
- 0.000 :0/0 sched:sched_switch:swapper/2:0 [120] S ==> rcu_sched:10 [120]
- 0.015 rcu_sched/10 sched:sched_switch:rcu_sched:10 [120] R ==> swapper/2:0 [120]
- 254.198 irq/50-iwlwifi/680 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff93498051f600 len=66
- __dev_queue_xmit ([kernel.kallsyms])
- 273.977 :0/0 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff93498051f600 len=78
- __dev_queue_xmit ([kernel.kallsyms])
- 274.007 :0/0 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff93498051ff00 len=78
- __dev_queue_xmit ([kernel.kallsyms])
- 2930.140 kworker/u16:58/2722 block:block_plug:[kworker/u16:58]
- 2930.162 kworker/u16:58/2722 block:block_unplug:[kworker/u16:58] 1
- 4466.094 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8]
- 8050.123 kworker/u16:30/2694 block:block_plug:[kworker/u16:30]
- 8050.271 kworker/u16:30/2694 block:block_plug:[kworker/u16:30]
- #
- SEE ALSO
- --------
- linkperf:perf-record[1], linkperf:perf-script[1]
|