Use the LBR to fix up the PEBS IP+1 issue.
As said, PEBS reports the next instruction, here we use the LBR to find
the last branch and from that construct the actual IP. If the IP matches
the LBR-TO, we use LBR-FROM, otherwise we use the LBR-TO address as the
beginning of the last basic block and decode forward.
Once we find a match to the current IP, we use the previous location.
This patch introduces a new ABI element: PERF_RECORD_MISC_EXACT, which
conveys that the reported IP (PERF_SAMPLE_IP) is the exact instruction
that caused the event (barring CPU errata).
The fixup can fail due to various reasons:
1) LBR contains invalid data (quite possible)
2) part of the basic block got paged out
3) the reported IP isn't part of the basic block (see 1)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.619375431@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement simple suport Intel Last-Branch-Record, it supports all
hardware that implements FREEZE_LBRS_ON_PMI, but does not (yet) implement
the LBR config register.
The Intel LBR is a FIFO of From,To addresses describing the last few
branches the hardware took.
This patch does not add perf interface to the LBR, but merely provides an
interface for internal use.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.544191154@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hw_perf_enable() would enable already enabled events.
This causes problems with code that assumes that ->enable/->disable calls
are balanced (like the LBR code does).
What happens is that events that were already running and left in place
would get enabled again.
Avoid this by only enabling new events that match their previous
assignment.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hw_perf_enable() would disable events that were not yet enabled.
This causes problems with code that assumes that ->enable/->disable calls
are balanced (like the LBR code does).
What happens is that we disable newly added counters that match their
previous assignment, even though they are not yet programmed on the
hardware.
Avoid this by only doing the first pass over the existing events.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the amd_iommu_domain_destroy the protection_domain_free
function is partly reimplemented. The 'partly' is the bug
here because the domain is not deleted from the domain list.
This results in use-after-free errors and data-corruption.
Fix it by just using protection_domain_free instead.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
These are the non-static sysfs attributes that exist on
my test machine. Fix them to use sysfs_attr_init or
sysfs_bin_attr_init as appropriate. It simply requires
making a sysfs attribute present to see this. So this
is a little bit tedious but otherwise not too bad.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These functions are not longer used and can be removed
savely. There functionality is now provided by the
iommu_{un}map functions which are also capable of multiple
page sizes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch implements the new callbacks for the IOMMU-API
with functions that can handle different page sizes in the
IOMMU page table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch extends the amd_iommu_iova_to_phys() function to
handle different page sizes correctly. It doesn't use
fetch_pte() anymore because we don't know (or care about)
the page_size used for mapping the given iova.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch extends the functionality of iommu_unmap_page
and fetch_pte to support arbitrary page sizes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch changes the old map_size parameter of alloc_pte
to a page_size parameter which can be used more easily to
alloc a pte for intermediate page sizes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The new function pointer names match better with the
top-level functions of the iommu-api which are using them.
Main intention of this change is to make the ->{un}map
pointer names free for two new mapping functions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Rename for_each_bit to for_each_set_bit in the kernel source tree. To
permit for_each_clear_bit(), should that ever be added.
The patch includes a macro to map the old for_each_bit() onto the new
for_each_set_bit(). This is a (very) temporary thing to ease the migration.
[akpm@linux-foundation.org: add temporary for_each_bit()]
Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix missing kernel-doc notation in mtrr/main.c:
Warning(arch/x86/kernel/cpu/mtrr/main.c:152): No description found for parameter 'info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Issue at least one memory barrier in stop_machine_text_poke()
perf probe: Correct probe syntax on command line help
perf probe: Add lazy line matching support
perf probe: Show more lines after last line
perf probe: Check function address range strictly in line finder
perf probe: Use libdw callback routines
perf probe: Use elfutils-libdw for analyzing debuginfo
perf probe: Rename probe finder functions
perf probe: Fix bugs in line range finder
perf probe: Update perf probe document
perf probe: Do not show --line option without dwarf support
kprobes: Add documents of jump optimization
kprobes/x86: Support kprobes jump optimization on x86
x86: Add text_poke_smp for SMP cross modifying code
kprobes/x86: Cleanup save/restore registers
kprobes/x86: Boost probes when reentering
kprobes: Jump optimization sysctl interface
kprobes: Introduce kprobes jump optimization
kprobes: Introduce generic insn_slot framework
kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
Checkin bb24c47161:
"Moorestown APB system timer driver" suffered from severe whitespace
damage in arch/x86/kernel/apb_timer.c due to using Microsoft Lookout
to send a patch. Fix the whitespace breakage.
Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The current APB timer code incorrectly registers a static copy of the
clockevent device for the boot CPU. The per cpu clockevent should be
used instead.
This bug was hidden by zero-initialized data; as such it did not get
exposed in testing, but was discovered by code review.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1267592494-7723-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
x86: Fix out of order of gsi
x86: apic: Fix mismerge, add arch_probe_nr_irqs() again
x86, irq: Keep chip_data in create_irq_nr and destroy_irq
xen: Remove unnecessary arch specific xen irq functions.
smp: Use nr_cpus= to set nr_cpu_ids early
x86, irq: Remove arch_probe_nr_irqs
sparseirq: Use radix_tree instead of ptrs array
sparseirq: Change irq_desc_ptrs to static
init: Move radix_tree_init() early
irq: Remove unnecessary bootmem code
x86: Add iMac9,1 to pci_reboot_dmi_table
x86: Convert i8259_lock to raw_spinlock
x86: Convert nmi_lock to raw_spinlock
x86: Convert ioapic_lock and vector_lock to raw_spinlock
x86: Avoid race condition in pci_enable_msix()
x86: Fix SCI on IOAPIC != 0
x86, ia32_aout: do not kill argument mapping
x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path
x86, irq: Update the vector domain for legacy irqs handled by io-apic
x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's
...
* 'x86-bootmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
early_res: Need to save the allocation name in drop_range_partial()
sparsemem: Fix compilation on PowerPC
early_res: Add free_early_partial()
x86: Fix non-bootmem compilation on PowerPC
core: Move early_res from arch/x86 to kernel/
x86: Add find_fw_memmap_area
Move round_up/down to kernel.h
x86: Make 32bit support NO_BOOTMEM
early_res: Enhance check_and_double_early_res
x86: Move back find_e820_area to e820.c
x86: Add find_early_area_size
x86: Separate early_res related code from e820.c
x86: Move bios page reserve early to head32/64.c
sparsemem: Put mem map for one node together.
sparsemem: Put usemap for one node together
x86: Make 64 bit use early_res instead of bootmem before slab
x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMA
x86: Make early_node_mem get mem > 4 GB if possible
x86: Dynamically increase early_res array size
x86: Introduce max_early_res and early_res_count
...
Callers of a stacktrace might pass bad frame pointers. Those
are usually checked for safety in stack walking helpers before
any dereferencing, but this is not the case when we need to go
through one more frame pointer that backlinks the irq stack to
the previous one, as we don't have any reliable address boudaries
to compare this frame pointer against.
This raises crashes when we record callchains for ftrace events
with perf because we don't use the right helpers to capture
registers there. We get wrong frame pointers as we call
task_pt_regs() even on kernel threads, which is a wrong thing
as it gives us the initial state of any kernel threads freshly
created. This is even not what we want for user tasks. What we want
is a hot snapshot of registers when the ftrace event triggers, not
the state before a task entered the kernel.
This requires more thoughts to do it correctly though.
So first put a guardian to ensure the given frame pointer
can be dereferenced to avoid crashes. We'll think about how to fix
the callers in a subsequent patch.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: 2.6.33.x <stable@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Patch 1da53e0230 ("perf_events, x86: Improve x86 event scheduling")
lost us one of the fixed purpose counters and then ed8777fc13
("perf_events, x86: Fix event constraint masks") broke it even
further.
Widen the fixed event mask to event+umask and specify the full config
for each of the 3 fixed purpose counters. Then let the init code fill
out the placement for the GP regs based on the cpuid info.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The ANY flag can show SMT data of another task (like 'top'),
so we want to disable it when system-wide profiling is
disabled.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>