For perf list, the CPU core PMU HW event ordering is such that not all
events may will be listed adjacent - consider this example:
$ tools/perf/perf list
List of pre-defined events (to be used in -e):
duration_time [Tool event]
branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
branch-misses OR cpu/branch-misses/ [Kernel PMU event]
bus-cycles OR cpu/bus-cycles/ [Kernel PMU event]
cache-misses OR cpu/cache-misses/ [Kernel PMU event]
cache-references OR cpu/cache-references/ [Kernel PMU event]
cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
cstate_core/c3-residency/ [Kernel PMU event]
cstate_core/c6-residency/ [Kernel PMU event]
cstate_core/c7-residency/ [Kernel PMU event]
cstate_pkg/c2-residency/ [Kernel PMU event]
cstate_pkg/c3-residency/ [Kernel PMU event]
cstate_pkg/c6-residency/ [Kernel PMU event]
cstate_pkg/c7-residency/ [Kernel PMU event]
cycles-ct OR cpu/cycles-ct/ [Kernel PMU event]
cycles-t OR cpu/cycles-t/ [Kernel PMU event]
el-abort OR cpu/el-abort/ [Kernel PMU event]
el-capacity OR cpu/el-capacity/ [Kernel PMU event]
Notice in the above example how the cstate_core PMU events are mixed in
the middle of the CPU core events.
For my arm64 platform, all the uncore events get mixed in, making the list
very disorganised:
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
br_mis_pred OR armv8_pmuv3_0/br_mis_pred/ [Kernel PMU event]
br_mis_pred_retired OR armv8_pmuv3_0/br_mis_pred_retired/ [Kernel PMU event]
br_pred OR armv8_pmuv3_0/br_pred/ [Kernel PMU event]
br_retired OR armv8_pmuv3_0/br_retired/ [Kernel PMU event]
br_return_retired OR armv8_pmuv3_0/br_return_retired/ [Kernel PMU event]
bus_access OR armv8_pmuv3_0/bus_access/ [Kernel PMU event]
bus_cycles OR armv8_pmuv3_0/bus_cycles/ [Kernel PMU event]
cid_write_retired OR armv8_pmuv3_0/cid_write_retired/ [Kernel PMU event]
cpu_cycles OR armv8_pmuv3_0/cpu_cycles/ [Kernel PMU event]
dtlb_walk OR armv8_pmuv3_0/dtlb_walk/ [Kernel PMU event]
exc_return OR armv8_pmuv3_0/exc_return/ [Kernel PMU event]
exc_taken OR armv8_pmuv3_0/exc_taken/ [Kernel PMU event]
hisi_sccl1_ddrc0/act_cmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_rcmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_rd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_wcmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_wr/ [Kernel PMU event]
hisi_sccl1_ddrc0/pre_cmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/rnk_chg/ [Kernel PMU event]
...
hisi_sccl7_l3c21/wr_hit_cpipe/ [Kernel PMU event]
hisi_sccl7_l3c21/wr_hit_spipe/ [Kernel PMU event]
hisi_sccl7_l3c21/wr_spipe/ [Kernel PMU event]
inst_retired OR armv8_pmuv3_0/inst_retired/ [Kernel PMU event]
inst_spec OR armv8_pmuv3_0/inst_spec/ [Kernel PMU event]
itlb_walk OR armv8_pmuv3_0/itlb_walk/ [Kernel PMU event]
l1d_cache OR armv8_pmuv3_0/l1d_cache/ [Kernel PMU event]
l1d_cache_refill OR armv8_pmuv3_0/l1d_cache_refill/ [Kernel PMU event]
l1d_cache_wb OR armv8_pmuv3_0/l1d_cache_wb/ [Kernel PMU event]
l1d_tlb OR armv8_pmuv3_0/l1d_tlb/ [Kernel PMU event]
l1d_tlb_refill OR armv8_pmuv3_0/l1d_tlb_refill/ [Kernel PMU event]
So the events are list alphabetically. However, CPU core event listing is
special from commit dc098b35b5 ("perf list: List kernel supplied event
aliases"), in that the alias and full event is shown (in that order).
As such, the core events may become sparse.
Improve this by grouping the CPU core events and ensure that they are
listed first for kernel PMU events. For the first example, above, this
now looks like:
duration_time [Tool event]
branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
branch-misses OR cpu/branch-misses/ [Kernel PMU event]
bus-cycles OR cpu/bus-cycles/ [Kernel PMU event]
cache-misses OR cpu/cache-misses/ [Kernel PMU event]
cache-references OR cpu/cache-references/ [Kernel PMU event]
cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
cycles-ct OR cpu/cycles-ct/ [Kernel PMU event]
cycles-t OR cpu/cycles-t/ [Kernel PMU event]
el-abort OR cpu/el-abort/ [Kernel PMU event]
el-capacity OR cpu/el-capacity/ [Kernel PMU event]
el-commit OR cpu/el-commit/ [Kernel PMU event]
el-conflict OR cpu/el-conflict/ [Kernel PMU event]
el-start OR cpu/el-start/ [Kernel PMU event]
instructions OR cpu/instructions/ [Kernel PMU event]
mem-loads OR cpu/mem-loads/ [Kernel PMU event]
mem-stores OR cpu/mem-stores/ [Kernel PMU event]
ref-cycles OR cpu/ref-cycles/ [Kernel PMU event]
topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event]
topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event]
topdown-slots-issued OR cpu/topdown-slots-issued/ [Kernel PMU event]
topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event]
topdown-total-slots OR cpu/topdown-total-slots/ [Kernel PMU event]
tx-abort OR cpu/tx-abort/ [Kernel PMU event]
tx-capacity OR cpu/tx-capacity/ [Kernel PMU event]
tx-commit OR cpu/tx-commit/ [Kernel PMU event]
tx-conflict OR cpu/tx-conflict/ [Kernel PMU event]
tx-start OR cpu/tx-start/ [Kernel PMU event]
cstate_core/c3-residency/ [Kernel PMU event]
cstate_core/c6-residency/ [Kernel PMU event]
cstate_core/c7-residency/ [Kernel PMU event]
cstate_pkg/c2-residency/ [Kernel PMU event]
cstate_pkg/c3-residency/ [Kernel PMU event]
cstate_pkg/c6-residency/ [Kernel PMU event]
cstate_pkg/c7-residency/ [Kernel PMU event]
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1592384514-119954-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These are broadly useful but required to handle TMA metrics. For example
encoding Ports_Utilization from:
https://download.01.org/perfmon/TMA_Metrics.csv
requires '<'.
{
"BriefDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency among contiguous instructions would manifest in this metric - such cases are often referred to as low Instruction Level Parallelism (ILP). (2) Contention on some hardware execution unit other than Divider. For example; when there are too many multiply operations.",
"MetricExpr": "( ( cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + cpu@EXE_ACTIVITY.1_PORTS_UTIL@ + ( cpu@EXE_ACTIVITY.2_PORTS_UTIL@ * ( ( ( cpu@UOPS_RETIRED.RETIRE_SLOTS@ ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) ) / ( ( 4.000000 ) + 1.000000 ) ) ) ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) if ( cpu@ARITH.DIVIDER_ACTIVE\\,cmask\\=1@ < cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ ) else ( ( cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + cpu@EXE_ACTIVITY.1_PORTS_UTIL@ + ( cpu@EXE_ACTIVITY.2_PORTS_UTIL@ * ( ( ( cpu@UOPS_RETIRED.RETIRE_SLOTS@ ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) ) / ( ( 4.000000 ) + 1.000000 ) ) ) ) - cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) )",
"MetricGroup": "Topdown_Group_Ports_Utilization",
"MetricName": "Topdown_Metric_Ports_Utilization"
},
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200610235823.52557-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the past this wasn't needed as the libaudit based code would use just
one field, and the alternative constructor would fill in all the fields,
but now that even when using the libaudit based method we need the other
fields, switch to zalloc() to make sure the other fields are zeroed at
instantiation time.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we moved to a syscalltbl generated from the kernel syscall tables
(arch/..../syscall*.tbl) the idea was to either use it, when having the
generator (e.g. tools/perf/arch/x86/entry/syscalls/syscalltbl.sh), or
falling back to the previous audit-libs based way of mapping syscall ids
to strings and the other way around.
At first we just needed the audit_detect_machine() return to then use it
to the str->id/id->str, or the other fields for the now used by default
in the most well developed arches method of using the syscall table
generator.
The problem is that then the libaudit code fell into disrepair, and
architectures where it is the method used are not working.
Now, with NO_SYSCALL_TABLE=1 being possible to pass on the make command
line we can automate the testing of that method even on x86-64, arm64,
etc.
And doing it I noted that we actually use fields in both entries in the
union, oops, so ditch the union, as we need all those fields at the same
time.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Context switch events are added automatically by Intel PT and Coresight.
Make it possible to suppress them. That is useful for tracing the
scheduler without the disturbance that the switch event processing
creates.
Example:
Prerequisites:
$ which perf
~/bin/perf
$ sudo setcap "cap_sys_rawio,cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_ipc_lock=ep" ~/bin/perf
$ sudo chmod +r /proc/kcore
Before:
$ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.938 MB perf.data ]
$ perf script -D | grep PERF_RECORD_SWITCH | wc -l
572
After:
$ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001
Warning:
Intel Processor Trace decoding will not be possible except for kernel tracing!
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.838 MB perf.data ]
$ perf script -D | grep PERF_RECORD_SWITCH | wc -l
0
$ sudo chmod go-r /proc/kcore
$ sudo setcap -r ~/bin/perf
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: http://lore.kernel.org/lkml/20200528120859.21604-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For a Java method signature like:
Ljava/lang/AbstractStringBuilder;appendChars(Ljava/lang/String;II)V
The demangler produces:
void class java.lang.AbstractStringBuilder.appendChars(class java.lang., shorttring., int, int)
The arguments should be (java.lang.String, int, int) but the demangler
interprets the "S" in String as the type code for "short". Correct this
and two other minor things:
- There is no "bool" type in Java, should be "boolean".
- The demangler prepends "class" to every Java class name. This is not
standard Java syntax and it wastes a lot of horizontal space if the
signature is long. Remove this as there isn't any ambiguity between
class names and primitives.
Committer notes:
This was split from a larger patch that also added a java demangler
'perf test' entry, that, before this patch shows the error being fixed
by it:
$ perf test java
65: Demangle Java : FAILED!
$ perf test -v java
Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
65: Demangle Java :
--- start ---
test child forked, pid 307264
FAILED: Ljava/lang/StringLatin1;equals([B[B)Z: bool class java.lang.StringLatin1.equals(byte[], byte[]) != boolean java.lang.StringLatin1.equals(byte[], byte[])
FAILED: Ljava/util/zip/ZipUtils;CENSIZ([BI)J: long class java.util.zip.ZipUtils.CENSIZ(byte[], int) != long java.util.zip.ZipUtils.CENSIZ(byte[], int)
FAILED: Ljava/util/regex/Pattern$BmpCharProperty;match(Ljava/util/regex/Matcher;ILjava/lang/CharSequence;)Z: bool class java.util.regex.Pattern$BmpCharProperty.match(class java.util.regex.Matcher., int, class java.lang., charhar, shortequence) != boolean java.util.regex.Pattern$BmpCharProperty.match(java.util.regex.Matcher, int, java.lang.CharSequence)
FAILED: Ljava/lang/AbstractStringBuilder;appendChars(Ljava/lang/String;II)V: void class java.lang.AbstractStringBuilder.appendChars(class java.lang., shorttring., int, int) != void java.lang.AbstractStringBuilder.appendChars(java.lang.String, int, int)
FAILED: Ljava/lang/Object;<init>()V: void class java.lang.Object<init>() != void java.lang.Object<init>()
test child finished with -1
---- end ----
Demangle Java: FAILED!
$
After applying this patch:
$ perf test java
65: Demangle Java : Ok
$
Signed-off-by: Nick Gasson <nick.gasson@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200427061520.24905-4-nick.gasson@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A metric group contains multiple metrics. These metrics may use the same
events. If metrics use separate events then it leads to more
multiplexing and overall metric counts fail to sum to 100%.
Modify how metrics are associated with events so that if the events in
an earlier group satisfy the current metric, the same events are used.
A record of used events is kept and at the end of processing unnecessary
events are eliminated.
Before:
$ perf stat -a -M TopDownL1 sleep 1
Performance counter stats for 'system wide':
920,211,343 uops_issued.any # 0.5 Backend_Bound (16.56%)
1,977,733,128 idq_uops_not_delivered.core (16.56%)
51,668,510 int_misc.recovery_cycles (16.56%)
732,305,692 uops_retired.retire_slots (16.56%)
1,497,621,849 cycles (16.56%)
721,098,274 uops_issued.any # 0.1 Bad_Speculation (16.79%)
1,332,681,791 cycles (16.79%)
552,475,482 uops_retired.retire_slots (16.79%)
47,708,340 int_misc.recovery_cycles (16.79%)
1,383,713,292 cycles
# 0.4 Frontend_Bound (16.76%)
2,013,757,701 idq_uops_not_delivered.core (16.76%)
1,373,363,790 cycles
# 0.1 Retiring (33.54%)
577,302,589 uops_retired.retire_slots (33.54%)
392,766,987 inst_retired.any # 0.3 IPC (50.24%)
1,351,873,350 cpu_clk_unhalted.thread (50.24%)
1,332,510,318 cycles
# 5330041272.0 SLOTS (49.90%)
1.006336145 seconds time elapsed
After:
$ perf stat -a -M TopDownL1 sleep 1
Performance counter stats for 'system wide':
765,949,145 uops_issued.any # 0.1 Bad_Speculation
# 0.5 Backend_Bound (50.09%)
1,883,830,591 idq_uops_not_delivered.core # 0.3 Frontend_Bound (50.09%)
48,237,080 int_misc.recovery_cycles (50.09%)
581,798,385 uops_retired.retire_slots # 0.1 Retiring (50.09%)
1,361,628,527 cycles
# 5446514108.0 SLOTS (50.09%)
391,415,714 inst_retired.any # 0.3 IPC (49.91%)
1,336,486,781 cpu_clk_unhalted.thread (49.91%)
1.005469298 seconds time elapsed
Note: Bad_Speculation + Backend_Bound + Frontend_Bound + Retiring = 100%
after, where as before it is 110%. After there are 2 groups, whereas
before there are 6. After the cycles event appears once, before it
appeared 5 times.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200520182011.32236-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>