Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
Commit 71d6de64fe ("perf test: Fix hist testcases when kptr_restrict is on")
solves a double free problem when 'perf test hist' calling
setup_fake_machine(). However, the result is still incorrect. For example:
$ ./perf test -v 'filtering hist entries'
25: Test filtering hist entries :
--- start ---
test child forked, pid 4186
Cannot create kernel maps
test child finished with 0
---- end ----
Test filtering hist entries: Ok
In this case the body of this test is not get executed at all, but the
result is 'Ok'.
Actually, in setup_fake_machine() there's no need to create real kernel
maps. What we want are the fake maps. This patch removes the
machine__create_kernel_maps() in setup_fake_machine(), so it won't be
affected by kptr_restrict setting.
Test result:
$ cat /proc/sys/kernel/kptr_restrict
1
$ ~/perf test -v hist
15: Test matching and linking multiple hists :
--- start ---
test child forked, pid 24031
test child finished with 0
---- end ----
Test matching and linking multiple hists: Ok
[SNIP]
Suggested-and-Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452520124-2073-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On some system the perf-config is broken, causes link failure like this:
/usr/lib64/python2.7/config/libpython2.7.a(posixmodule.o): In function `posix_forkpty':
/opt/wangnan/yocto-build/tmp-eglibc/work/x86_64-oe-linux/python/2.7.3-r0.3.1/Python-2.7.3/./Modules/posixmodule.c:3816: undefined reference to `forkpty'
/usr/lib64/python2.7/config/libpython2.7.a(posixmodule.o): In function `posix_openpty':
/opt/wangnan/yocto-build/tmp-eglibc/work/x86_64-oe-linux/python/2.7.3-r0.3.1/Python-2.7.3/./Modules/posixmodule.c:3756: undefined reference to `openpty'
collect2: error: ld returned 1 exit status
make[1]: *** [/home/wangnan/kernel-hydrogen/tools/perf/out/perf] Error 1
make: *** [all] Error 2
$ python-config --libs
-lpthread -ldl -lpthread -lutil -lm -lpython2.7
In this case a '-lutil' should be appended to -lpython2.7.
(I know we have --start-group and --end-group. I can see them in command
line of collect2 by strace. However it doesn't work. Seems I have a
broken environment?)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452520124-2073-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf report only shows a help message "For a higher level
overview, try: perf report --sort comm,dso" unconditionally (even if
the sort keys were used). Add more help tips and show randomly.
Load tips from ${prefix}/share/doc/perf-tip/tips.txt file.
$ perf report | tail
0.10% swapper [kernel.vmlinux] [k] irq_exit
0.09% swapper [kernel.vmlinux] [k] flush_smp_call_function_queue
0.08% swapper [kernel.vmlinux] [k] native_write_msr_safe
0.03% swapper [kernel.vmlinux] [k] group_sched_in
0.01% perf [kernel.vmlinux] [k] native_write_msr_safe
#
# (Tip: Search options using a keyword: perf report -h <keyword>)
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1452166913-27046-1-git-send-email-namhyung@kernel.org
[ Renamed it to perf_tip() and the parameter dirname to dirpath to fix the build on older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event group view feature is to see related events together. To use
the group view, events should be recorded as a group with a dedicated
syntax of surrounding events by braces (-e '{ evt1, evt2, ... }').
Also 'perf report' also requires the --group option to enable it.
However it's almost always beneficial to use the group view to see the
group events as it's more expressive. And I think it's more natural to
see events together if they are recorded as a group.
Thus this patch changes the default value to enable it. If users don't
want to see like it and keep the original behavior, they can set the
report.group config variable to false and/or use --no-group option in
the 'perf report' command line.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1448807057-3506-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Before:
$ perf test -v cqm
48: Test intel cqm nmi context read :
--- start ---
test child forked, pid 1681
parse_events failed
test child finished with -2
---- end ----
Test intel cqm nmi context read: Skip
$
After:
$ perf test -v cqm
48: Test intel cqm nmi context read :
--- start ---
test child forked, pid 1681
parse_events failed, is "intel_cqm/llc_occupancy/" available?
test child finished with -2
---- end ----
Test intel cqm nmi context read: Skip
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-eidpiv5x4nkbsx37xwikbnir@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were asking for a 4kHz sample_freq, making the test fail needlessly
when the system reduced /proc/sys/kernel/perf_event_max_sample_rate
below that.
Before:
# perf test -vv dummy
23: Test using a dummy software event to keep tracking :
--- start ---
test child forked, pid 32421
------------------------------------------------------------
perf_event_attr:
type 1
size 112
config 0x9
{ sample_period, sample_freq } 4000
sample_type IP|TID|ID|PERIOD
<SNIP>
sys_perf_event_open failed, error -22
Unable to open dummy and cycles event
test child finished with -2
---- end ----
Test using a dummy software event to keep tracking: Skip
#
[root@zoo ~]# cat /proc/sys/kernel/perf_event_max_sample_rate
1000
After:
[root@zoo ~]# perf test dummy
23: Test using a dummy software event to keep tracking : Ok
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-487iquegrs2379e5n0pi0tcp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixing this problem, introduced recently:
$ perf test python
16: Try 'import perf' in python, checking link problems : FAILED!
In verbose mode we find out what is missing:
$ perf test -v python
16: Try 'import perf' in python, checking link problems :
--- start ---
test child forked, pid 24894
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf/python/perf.so: undefined symbol: find_next_bit
test child finished with -1
---- end ----
Try 'import perf' in python, checking link problems: FAILED!
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: f77b57ad4f ("perf cpu_map: Add cpu_map__new_event function")
Link: http://lkml.kernel.org/n/tip-rajx0zkz6czdrnvvwf0jp76p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were asking for a 4kHz sample_freq, making the test fail needlessly
when the system reduced /proc/sys/kernel/perf_event_max_sample_rate
below that.
In this test we only look at the PERF_SAMPLE_TIME fields in PERF_RECORD_
meta events, no need to set sample_freq.
Thanks to Namhyung for suggesting that max_sample_rate could be the
reason for the test failure, seeing the 'perf test -vv' output I sent.
Before:
# echo 1000 > /proc/sys/kernel/perf_event_max_sample_rate
# perf test TSC
45: Test converting perf time to TSC : FAILED!
After:
# perf test TSC
45: Test converting perf time to TSC : Ok
# cat /proc/sys/kernel/perf_event_max_sample_rate
1000
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-lcob05qhawkuvsyuu9g1fld5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding stat-cpi.py as an example of how to do stat scripting.
It computes the CPI metrics from cycles and instructions events.
The CPI is based performance metric showing the Cycles Per Instructions
ratio, which helps to identify cycles-hungry code.
Following stat record/report/script combinations could be used:
- get CPI for given workload
$ perf stat -e cycles,instructions record ls
SNIP
Performance counter stats for 'ls':
2,904,431 cycles
3,346,878 instructions # 1.15 insns per cycle
0.001782686 seconds time elapsed
$ perf script -s ./scripts/python/stat-cpi.py
0.001783: cpu -1, thread -1 -> cpi 0.867803 (2904431/3346878)
$ perf stat -e cycles,instructions record ls | perf script -s ./scripts/python/stat-cpi.py
SNIP
0.001730: cpu -1, thread -1 -> cpi 0.869026 (2928292/3369627)
- get CPI systemwide:
$ perf stat -e cycles,instructions -a -I 1000 record sleep 3
# time counts unit events
1.000158618 594,274,711 cycles (100.00%)
1.000158618 441,898,250 instructions
2.000350973 567,649,705 cycles (100.00%)
2.000350973 432,669,206 instructions
3.000559210 561,940,430 cycles (100.00%)
3.000559210 420,403,465 instructions
3.000670798 780,105 cycles (100.00%)
3.000670798 326,516 instructions
$ perf script -s ./scripts/python/stat-cpi.py
1.000159: cpu -1, thread -1 -> cpi 1.344823 (594274711/441898250)
2.000351: cpu -1, thread -1 -> cpi 1.311972 (567649705/432669206)
3.000559: cpu -1, thread -1 -> cpi 1.336669 (561940430/420403465)
3.000671: cpu -1, thread -1 -> cpi 2.389178 (780105/326516)
$ perf stat -e cycles,instructions -a -I 1000 record sleep 3 | perf script -s ./scripts/python/stat-cpi.py
1.000202: cpu -1, thread -1 -> cpi 1.035091 (940778881/908885530)
2.000392: cpu -1, thread -1 -> cpi 1.442600 (627493992/434974455)
3.000545: cpu -1, thread -1 -> cpi 1.353612 (741463930/547766890)
3.000622: cpu -1, thread -1 -> cpi 2.642110 (784083/296764)
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1452077397-31958-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support to get stat events data in perf python scripts.
The python script shall implement the following new interface to process
stat data:
def stat__<event_name>_[<modifier>](cpu, thread, time, val, ena, run):
- is called for every stat event for given counter,
if user monitors 'cycles,instructions:u" following
callbacks should be defined:
def stat__cycles(cpu, thread, time, val, ena, run):
def stat__instructions_u(cpu, thread, time, val, ena, run):
def stat__interval(time):
- is called for every interval with its time,
in non interval mode it's called after last
stat event with total measured time in ns
The rest of the current interface stays untouched..
Please check example CPI metrics script in following patch
with command line examples in changelogs.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1452028152-26762-8-git-send-email-jolsa@kernel.org
[ Rename 'time' parameters to 'tstamp', to fix the build in older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>