When perf report on TUI shows callchain it checks first node has
siblings to determine whether it needs to print percentage value.
But it missed a case that first node is NULL. So sometimes it segfaults
like below:
$ perf top -g
perf: Segmentation fault
-------- backtrace --------
perf[0x4fcefb]
/usr/lib/libc.so.6(+0x33b20)[0x7f2a35839b20]
perf(rb_next+0x8)[0x47d3d8]
perf[0x4f6058]
perf[0x4f833b]
perf[0x4f8610]
perf[0x4f209e]
perf(ui_browser__run+0x3a)[0x4f2e6a]
perf[0x4f94ee]
perf(perf_evlist__tui_browse_hists+0x94)[0x4fbbf4]
perf[0x444d10]
/usr/lib/libpthread.so.0(+0x7314)[0x7f2a37070314]
/usr/lib/libc.so.6(clone+0x6d)[0x7f2a358ee5bd]
$ addr2line -e `which perf` 0x4f6058
/home/namhyung/project/linux/tools/perf/ui/browsers/hists.c:553
I don't know why the backtrace didn't print some symbols..
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 4087d11cd9 ("perf hists browser: Print overhead percent value for first-level callchain")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1419401076-21700-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf report on TUI doesn't print percent for first-level
callchain entry.
I guess it (wrongly) assumes that there's only a single callchain in the
first level.
This patch fixes it by handling the first level callchains same as
others - if it's not 100% it should print the percent value.
Also it'll affect other callchains in the other way around - if it's
100% (single callchain) it should not print the percentage.
Before:
- 30.95% 6.84% abc2 abc2 [.] a
- a
- 70.00% c
- 100.00% apic_timer_interrupt
smp_apic_timer_interrupt
local_apic_timer_interrupt
hrtimer_interrupt
...
+ 30.00% b
+ __libc_start_main
After:
- 30.95% 6.84% abc2 abc2 [.] a
- 77.90% a
- 70.00% c
- apic_timer_interrupt
smp_apic_timer_interrupt
local_apic_timer_interrupt
hrtimer_interrupt
...
+ 30.00% b
+ 22.10% __libc_start_main
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1416816807-6495-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When 'perf top' is run, one can't easily find a difference
between -z option and normal output.
So I added a visual cue to know whether it is the zeroing or not.
Output is as below.
Before:
$ perf top
Samples: 61K of event 'cycles', Event count (approx.): 3908136933
Overhead Shared Object Symbol
1.42% firefox [.] 0x0000000000011e76
1.32% libpthread-2.17.so [.] pthread_mutex_lock
If you press key 'z' or run with zero option like '$ perf top --zero', it is as below.
After:
Samples: 61K of event 'cycles', Event count (approx.): 3908136933 [z]
Overhead Shared Object Symbol
1.42% firefox [.] 0x0000000000011e76
1.32% libpthread-2.17.so [.] pthread_mutex_lock
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1412665995-26359-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The hist_browser__show_callchain() and friends don't need to be that
complex. They're splitted in 3 pieces - one for traversing top-level
tree, other one for special casing first chains in the top-level
entries, and last one for recursive traversing inner trees. It led to
code duplication and unnecessary complexity IMHO.
Simplify the function and consolidate the logic into a single function
- it can recursively call itself. A little difference in printing
callchains in top-level tree can be handled with a small change.
It should have no functional change.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1408583746-5540-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The currently when perf TUI report shows callchain, the first level
chains have bogus '+' sign even though only the last one has children.
Since they are on a single line of the chain, toggling intermediate
entries has no effect. Fix it to show '+' sign at the last entry only.
Note that non-first level callchain entries don't have this problem.
Before:
---------------------------------------------------------------------------
Children Self Command Shared Object Symbols
- 40.70% 0.00% swapper [kernel.kallsyms] [k] cpuidle_wrap_enter
+ cpuidle_wrap_enter
+ cpuidle_enter_tk
+ cpuidle_idle_call
+ cpu_idle
After:
---------------------------------------------------------------------------
Children Self Command Shared Object Symbols
- 40.70% 0.00% swapper [kernel.kallsyms] [k] cpuidle_wrap_enter
cpuidle_wrap_enter
cpuidle_enter_tk
cpuidle_idle_call
+ cpu_idle
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1407909761-10822-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The ui_browser->height is about the whole browser "window", including
any header, status lines or any other space needed for some "Yes", "No",
etc buttons a descendent browser, like hist_browser, may have.
Since the navigation is done mostly on the ui_browser methods, it needs
to know how many rows are on the screen, while details about what other
components are, say, if a header (that may be composed of multiple
lines, etc) is present.
Besides this we'll need to add a ui_browser->refresh_dimensions() hook
so that browsers like hist_browser can update ->rows in response to
screen resizes, this will come in a follow up patch.
This patch just adds ->rows and updates it when updating ->height, keeps
using ->height for the only other widget that can come with ui_browser,
the scrollbar, that goes on using all the height on the rightmost column
in the screen, using ->rows for the keyboard navigation needs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-xexmwg1mv7u03j5imn66jdak@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
* Add --repeat global option to 'perf bench' to be used in benchmarks
such as the existing 'futex' one, that was modified to use it instead
of a local option. (Davidlohr Bueso)
* Fix fd -> pathname resolution in 'trace', be it using /proc or
a vfs_getname probe point. (Arnaldo Carvalho de Melo)
* Add suggestion of how to set perf_event_paranoid sysctl, to help
non-root users trying tools like 'trace' to get a working environment.
(Arnaldo Carvalho de Melo)
Fixes:
* Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso)
* The -o and -n 'perf bench mem' options are mutually exclusive, emit error
when both are specified. (Davidlohr Bueso)
* Fix scrollbar refresh row index in the ui browser, problem exposed now
that headers will be added and will be allowed to be switched on/off.
(Jiri Olsa)
Cleanups:
* Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)
* Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo)
* No need to reimplement err() in 'perf bench sched-messaging', drop barf().
(Davidlohr Bueso).
* Remove ev_name argument from perf_evsel__hists_browse, can be obtained
from the other parameters. (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After output/sort fields refactoring, it's expensive
to check the elide bool in its current location inside
the 'struct sort_entry'.
The perf_hpp__should_skip function gets highly noticable in
workloads with high number of output/sort fields, like for:
$ perf report -i perf-test.data -F overhead,sample,period,comm,pid,dso,symbol,cpu --stdio
Performance report:
9.70% perf [.] perf_hpp__should_skip
Moving the elide bool into the 'struct perf_hpp_fmt', which
makes the perf_hpp__should_skip just single struct read.
Got speedup of around 22% for my test perf.data workload.
The change should not harm any other workload types.
Performance counter stats for (10 runs):
before:
358,319,732,626 cycles ( +- 0.55% )
467,129,581,515 instructions # 1.30 insns per cycle ( +- 0.00% )
150.943975206 seconds time elapsed ( +- 0.62% )
now:
278,785,972,990 cycles ( +- 0.12% )
370,146,797,640 instructions # 1.33 insns per cycle ( +- 0.00% )
116.416670507 seconds time elapsed ( +- 0.31% )
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20140601142622.GA9131@krava.brq.redhat.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The hists__filter_entries() function is called when down arrow key is
pressed for navigating through the entries in TUI. It has a check for
filtering out entries that have very small overhead (under min_pcnt).
However it just assumed the entries are sorted by the overhead so when
it saw such a small overheaded entry, it just stopped navigating as an
optimization. But it's not true anymore due to new --fields and
--sort optoin behavior and this case users cannot go down to a next
entry if ther's an entry with small overhead in-between.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-14-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Until now the hpp and sort functions do similar jobs different ways.
Since the sort functions converted/wrapped to hpp formats it can do
the job in a uniform way.
The perf_hpp__sort_list has a list of hpp formats to sort entries and
the perf_hpp__list has a list of hpp formats to print output result.
To have a backward compatibility, it automatically adds 'overhead'
field in front of sort list. And then all of fields in sort list
added to the output list (if it's not already there).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-7g3h86woz2sckg3h1lj42ygj@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The hist_browser__reset() is only called right after a filter is
applied so it needs to udpate browser->nr_entries properly. We cannot
use hists->nr_non_filtered_entreis directly since it's possible that
such entries are also filtered out by minimum percentage limit.
In addition when a filter is used for perf top, hist browser's
nr_entries field was not updated after applying the filter. But it
needs to be updated as new samples are coming.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1398327843-31845-11-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The nr_entries variable is increased inside the loop in the function
but it always count the first entry regardless of it's filtered or
not; caused an off-by-one error.
It'd become a problem especially there's no entry at all - it'd get a
segfault during referencing a NULL pointer.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1398327843-31845-9-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute".
"relative" means it's relative to filtered entries only so that the
sum of shown entries will be always 100%. "absolute" means it retains
the original value before and after the filter is applied.
$ perf report -s comm
# Overhead Command
# ........ ............
#
74.19% cc1
7.61% gcc
6.11% as
4.35% sh
4.14% make
1.13% fixdep
...
$ perf report -s comm -c cc1,gcc --percentage absolute
# Overhead Command
# ........ ............
#
74.19% cc1
7.61% gcc
$ perf report -s comm -c cc1,gcc --percentage relative
# Overhead Command
# ........ ............
#
90.69% cc1
9.31% gcc
Note that it has zero effect if no filter was applied.
Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-3-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>