Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().
Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.
Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.
This makes the task migration event working and fix the context
switch callchains and origin ip.
Example: perf record -a -e cs
Before:
10.91% ksoftirqd/0 0 [k] 0000000000000000
|
--- (nil)
perf_callchain
perf_prepare_sample
__perf_event_overflow
perf_swevent_overflow
perf_swevent_add
perf_swevent_ctx_event
do_perf_sw_event
__perf_sw_event
perf_event_task_sched_out
schedule
run_ksoftirqd
kthread
kernel_thread_helper
After:
23.77% hald-addon-stor [kernel.kallsyms] [k] schedule
|
--- schedule
|
|--60.00%-- schedule_timeout
| wait_for_common
| wait_for_completion
| blk_execute_rq
| scsi_execute
| scsi_execute_req
| sr_test_unit_ready
| |
| |--66.67%-- sr_media_change
| | media_changed
| | cdrom_media_changed
| | sr_block_media_changed
| | check_disk_change
| | cdrom_open
v2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Implement the workaround for Intel Errata AAK100 and AAP53.
Also, remove the Core-i7 name for Nehalem events since there are
also Westmere based i7 chips.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1269608924.12097.147.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.
It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.
Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.
So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The adding of raw event support lead to complete code
refactoring. I hope is became more readable then it was.
The list of changes:
1) The 64bit config field is enough to hold all information we need
to track event details. To achieve it we used *own* enum for
events selection in ESCR register and map this key into proper
value at moment of event enabling.
For the same reason we use 12LSB bits in CCCR register -- to track
which exactly cache trace event was requested. And we cear this bits
at real 'write' moment.
2) There is no per-cpu area reserved for P4 PMU anymore. We
don't need it. All is held by config.
3) Now we may use any available counter, ie we try to grab any
possible counter.
v2:
- Lin Ming reported the lack of ESCR selector in CCCR for cache events
v3:
- Don't loose cache event codes at config unpacking procedure, we may
need it one day so no obscure hack behind our back, better to clear
reserved bits explicitly when needed (thanks Ming for pointing out)
- Lin Ming fixed misplaced opcodes in cache events
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1269403766.3409.6.camel@minggr.sh.intel.com>
[ v4: did a few whitespace fixlets ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 3f6da39053
(perf: Rework and fix the arch CPU-hotplug hooks) broke suspend to
RAM on my HP nx6325 (and most likely on other AMD-based boxes too)
by allowing amd_pmu_cpu_offline() to be executed for CPUs that are
going offline as part of the suspend process. The problem is that
cpuhw->amd_nb may be NULL already, so the function should make sure
it's not NULL before accessing the object pointed to by it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
perf: Fix unexported generic perf_arch_fetch_caller_regs
perf record: Don't try to find buildids in a zero sized file
perf: export perf_trace_regs and perf_arch_fetch_caller_regs
perf, x86: Fix hw_perf_enable() event assignment
perf, ppc: Fix compile error due to new cpu notifiers
perf: Make the install relative to DESTDIR if specified
kprobes: Calculate the index correctly when freeing the out-of-line execution slot
perf tools: Fix sparse CPU numbering related bugs
perf_event: Fix oops triggered by cpu offline/online
perf: Drop the obsolete profile naming for trace events
perf: Take a hot regs snapshot for trace events
perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
perf/x86-64: Use frame pointer to walk on irq and process stacks
lockdep: Move lock events under lockdep recursion protection
perf report: Print the map table just after samples for which no map was found
perf report: Add multiple event support
perf session: Change perf_session post processing functions to take histogram tree
perf session: Add storage for seperating event types in report
perf session: Change add_hist_entry to take the tree root instead of session
perf record: Add ID and to recorded event data when recording multiple events
...
- A few ESCR have escaped fixing at previous attempt.
- p4_escr_map is read only, make it const.
Nothing serious.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100318211256.GH5062@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the HT bit setting code from p4_pmu_event_map to
p4_hw_config. So the cache events can get HT bit set correctly.
Tested on my P4 desktop, below 6 cache events work:
L1-dcache-load-misses
LLC-load-misses
dTLB-load-misses
dTLB-store-misses
iTLB-loads
iTLB-load-misses
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908392.13901.128.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, we use opcode(Event and Event-Selector) + emask to
look up template in p4_templates.
But cache events (L1-dcache-load-misses, LLC-load-misses, etc)
use the same event(P4_REPLAY_EVENT) to do the counting, ie, they
have the same opcode and emask. So we can not use current lookup
mechanism to find the template for cache events.
This patch introduces a "key", which is the index into
p4_templates. The low 12 bits of CCCR are reserved, so we can
hide the "key" in the low 12 bits of hwc->config.
We extract the key from hwc->config and then quickly find the
template.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_arch_fetch_caller_regs() is exported for the overriden x86
version, but not for the generic weak version.
As a general rule, weak functions should not have their symbol
exported in the same file they are defined.
So let's export it on trace_event_perf.c as it is used by trace
events only.
This fixes:
ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_arch_fetch_caller_regs() is exported for the overriden x86
version, but not for the generic weak version.
As a general rule, weak functions should not have their symbol
exported in the same file they are defined.
So let's export it on trace_event_perf.c as it is used by trace
events only.
This fixes:
ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This should turn on instruction counting on P4s, which was missing in
the first version of the new PMU driver.
It's inaccurate for now, we still need dependant event to tag mops
before we can count them precisely. The result is that the number of
instruction may be lifted up.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix pick_next_highest_task_rt() for cgroups
sched: Cleanup: remove unused variable in try_to_wake_up()
x86: Fix sched_clock_cpu for systems with unsynchronized TSC
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems
x86, UV: Fix target_cpus() in x2apic_uv_x.c
x86: Reduce per cpu warning boot up messages
x86: Reduce per cpu MCA boot up messages
x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking: Make sparse work with inline spinlocks and rwlocks
x86/mce: Fix RCU lockdep splats
rcu: Increase RCU CPU stall timeouts if PROVE_RCU
ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
rcu: Suppress RCU lockdep warnings during early boot
rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
rcu: Suppress __mpol_dup() false positive from RCU lockdep
rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
rcu: Add control variables to lockdep_rcu_dereference() diagnostics
rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
rcu: Use wrapper function instead of exporting tasklist_lock
sched, rcu: Fix rcu_dereference() for RCU-lockdep
rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
x86/gart: Unexport gart_iommu_aperture
Fix trivial conflicts in kernel/trace/ftrace.c
Ingo reported:
|
| There's a build failure on -tip with the P4 driver, on UP 32-bit, if
| PERF_EVENTS is enabled but UP_APIC is disabled:
|
| arch/x86/built-in.o: In function `p4_pmu_handle_irq':
| perf_event.c:(.text+0xa756): undefined reference to `apic'
| perf_event.c:(.text+0xa76e): undefined reference to `apic'
|
So we have to unmask LVTPC only if we're configured to have one.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100313081116.GA5179@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The netburst PMU is way different from the "architectural
perfomance monitoring" specification that current CPUs use.
P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle
perfomance monitoring events.
A few implementational details:
1) We need a separate x86_pmu::hw_config helper in struct
x86_pmu since register bit-fields are quite different from P6,
Core and later cpu series.
2) For the same reason is a x86_pmu::schedule_events helper
introduced.
3) hw_perf_event::config consists of packed ESCR+CCCR values.
It's allowed since in reality both registers only use a half
of their size. Of course before making a real write into a
particular MSR we need to unpack the value and extend it to
a proper size.
4) The tuple of packed ESCR+CCCR in hw_perf_event::config
doesn't describe the memory address of ESCR MSR register
so that we need to keep a mapping between these tuples
used and available ESCR (various P4 events may use same
ESCRs but not simultaneously), for this sake every active
event has a per-cpu map of hw_perf_event::idx <--> ESCR
addresses.
5) Since hw_perf_event::idx is an offset to counter/control register
we need to lift X86_PMC_MAX_GENERIC up, otherwise kernel
strips it down to 8 registers and event armed may never be turned
off (ie the bit in active_mask is set but the loop never reaches
this index to check), thanks to Peter Zijlstra
Restrictions:
- No cascaded counters support (do we ever need them?)
- No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS
doesn't work for now)
- There are events with same counters which can't work simultaneously
(need to use intersected ones due to broken counter 1)
- No PERF_COUNT_HW_CACHE_ events yet
Todo:
- Implement dependent events
- Need proper hashing for event opcodes (no linear search, good for
debugging stage but not in real loads)
- Some events counted during a clock cycle -- need to set threshold
for them and count every clock cycle just to get summary statistics
(ie to behave the same way as other PMUs do)
- Need to swicth to use event_constraints
- To support RAW events we need to encode a global list of P4 events
into p4_templates
- Cache events need to be added
Event support status matrix:
Event status
-----------------------------
cycles works
cache-references works
cache-misses works
branch-misses works
bus-cycles partially (does not work on 64bit cpu with HT enabled)
instruction doesnt work (needs dependent event [mop tagging])
branches doesnt work
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100311165439.GB5129@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
What happens is that we schedule badly like:
<...>-1987 [019] 280.252808: x86_pmu_start: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252811: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252812: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252813: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252814: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252834: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252834: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252835: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252836: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*
This happens because we only iterate the n_running events in the first
pass, and reset their index to -1 if they don't match to force a
re-assignment.
Now, in our RR example, n_running == 0 because we fully unscheduled, so
event-50 will retain its idx==32, even though in scheduling it will have
gotten idx=0, and we don't trigger the re-assign path.
The easiest way to fix this is the below patch, which simply validates
the full assignment in the second pass.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268311069.5037.31.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.
It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.
v2: rename perf_save_regs to perf_fetch_caller_regs as per
Masami's suggestion.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
If we reset the LBR on each first counter, simple counter rotation which
first deschedules all counters and then reschedules the new ones will
lead to LBR reset, even though we're still in the same task context.
Reduce this by not flushing on the first counter but only flushing on
different task contexts.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I overlooked the perf_disable()/perf_enable() calls in
intel_pmu_handle_irq(), (pointed out by Markus) so we should not
explicitly disable_all/enable_all pebs counters in the drain functions,
these are already disabled and enabling them early is confusing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>