Commit Graph

2352 Commits

Author SHA1 Message Date
Thomas Renninger
fad12ac8c8 CPU: Introduce ARCH_HAS_CPU_AUTOPROBE and X86 parts
This patch is based on Andi Kleen's work:
Implement autoprobing/loading of modules serving CPU
specific features (x86cpu autoloading).

And Kay Siever's work to get rid of sysdev cpu structures
and making use of struct device instead.

Before, the cpuid driver had to be loaded to get the x86cpu
autoloading feature. With this patch autoloading works through
the /sys/devices/system/cpu object

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-26 16:49:08 -08:00
Thomas Renninger
2f1e097e24 X86: Introduce HW-Pstate scattered cpuid feature
It is rather similar to CPB (boot capability) feature
and exists since fam10h (can be looked up in AMD's BKDG).

The feature is needed for powernow-k8 to cleanup init functions and to
provide proper autoloading matching with the new x86cpu modalias
feature.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Borislav Petkov <bp@amd64.org>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-26 16:49:06 -08:00
Andi Kleen
644e9cbbe3 Add driver auto probing for x86 features v4
There's a growing number of drivers that support a specific x86 feature
or CPU.  Currently loading these drivers currently on a generic
distribution requires various driver specific hacks and it often
doesn't work.

This patch adds auto probing for drivers based on the x86 cpuid
information, in particular based on vendor/family/model number
and also based on CPUID feature bits.

For example a common issue is not loading the SSE 4.2 accelerated
CRC module: this can significantly lower the performance of BTRFS
which relies on fast CRC.

Another issue is loading the right CPUFREQ driver for the current CPU.
Currently distributions often try all all possible driver until
one sticks, which is not really a good way to do this.

It works with existing udev without any changes. The code
exports the x86 information as a generic string in sysfs
that can be matched by udev's pattern matching.

This scheme does not support numeric ranges, so if you want to
handle e.g. ranges of model numbers they have to be encoded
in ASCII or simply all models or families listed. Fixing
that would require changing udev.

Another issue is that udev will happily load all drivers that match,
there is currently no nice way to stop a specific driver from
being loaded if it's not needed (e.g. if you don't need fast CRC)
But there are not that many cpu specific drivers around and they're
all not that bloated, so this isn't a particularly serious issue.

Originally this patch added the modalias to the normal cpu
sysdevs. However sysdevs don't have all the infrastructure
needed for udev, so it couldn't really autoload drivers.
This patch instead adds the CPU modaliases to the cpuid devices,
which are real devices with full support for udev. This implies
that the cpuid driver has to be loaded to use this.

This patch just adds infrastructure, some driver conversions
in followups.

Thanks to Kay for helping with some sysfs magic.

v2: Constifcation, some updates
v4: (trenn@suse.de):
    - Use kzalloc instead of kmalloc to terminate modalias buffer
    - Use uppercase hex values to match correctly against hex values containing
      letters

Cc: Dave Jones <davej@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Jen Axboe <axboe@kernel.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-26 16:44:41 -08:00
Tony Luck
08dda402d6 x86/mce: Replace hard coded hex constants with symbolic defines
Magic constants like 0x0134 in code just invite questions on
where they come from, what they mean, can they be changed.

Provide #defines for the architecturally defined MCACOD values
with a reference to the Intel Software Developers manual which
describes them.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-01-26 16:02:22 -08:00
Ingo Molnar
4e9f44ba29 Merge tag 'mce-recovery-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce
Implement MCE recovery for the data load error path and assorted cleanups.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26 11:40:13 +01:00
Greg Kroah-Hartman
e032d80774 mce: fix warning messages about static struct mce_device
When suspending, there was a large list of warnings going something like:

	Device 'machinecheck1' does not have a release() function, it is broken and must be fixed

This patch turns the static mce_devices into dynamically allocated, and
properly frees them when they are removed from the system.  It solves
the warning messages on my laptop here.

Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Djalal Harouni <tixxdz@opendz.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@amd64.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-16 17:08:42 -08:00
Linus Torvalds
83c2f912b4 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  perf tools: Fix compile error on x86_64 Ubuntu
  perf report: Fix --stdio output alignment when --showcpuutilization used
  perf annotate: Get rid of field_sep check
  perf annotate: Fix usage string
  perf kmem: Fix a memory leak
  perf kmem: Add missing closedir() calls
  perf top: Add error message for EMFILE
  perf test: Change type of '-v' option to INCR
  perf script: Add missing closedir() calls
  tracing: Fix compile error when static ftrace is enabled
  recordmcount: Fix handling of elf64 big-endian objects.
  perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again
  perf tools: Add support for guest/host-only profiling
  perf kvm: Do guest-only counting by default
  perf top: Don't update total_period on process_sample
  perf hists: Stop using 'self' for struct hist_entry
  perf hists: Rename total_session to total_period
  x86: Add counter when debug stack is used with interrupts enabled
  x86: Allow NMIs to hit breakpoints in i386
  x86: Keep current stack in NMI breakpoints
  ...
2012-01-15 11:26:35 -08:00
Srivatsa S. Bhat
a3301b751b x86/mce: Fix CPU hotplug and suspend regression related to MCE
Commit 8a25a2fd12 ("cpu: convert 'cpu' and 'machinecheck' sysdev_class
to a regular subsystem") changed how things are dealt with in the MCE
subsystem.  Some of the things that got broken due to this are CPU
hotplug and suspend/hibernate.

MCE uses per_cpu allocations of struct device.  So, when a CPU goes
offline and comes back online, in order to ensure that we start from a
clean slate with respect to the MCE subsystem, zero out the entire
per_cpu device structure to 0 before using it.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-13 19:11:35 -08:00
Ingo Molnar
636f0c70f2 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core 2012-01-08 12:31:24 +01:00
Linus Torvalds
7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Ingo Molnar
03f70388c3 Merge branch 'tip/x86/core-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core 2012-01-07 13:25:49 +01:00
Linus Torvalds
edf7c8148e Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: add IRQ context simulation in module mce-inject
  x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog
  x86, MCE: Drain mcelog buffer
  x86, mce: Add wrappers for registering on the decode chain
2012-01-06 15:02:37 -08:00
Linus Torvalds
2c364faabb Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, centaur: Enable cx8 for VIA Eden too
2012-01-06 14:00:44 -08:00
Linus Torvalds
cf3f33551b Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Use "do { } while(0)" for empty lock_cmos()/unlock_cmos() macros
  x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro
  x86, CPU: Drop superfluous get_cpu_cap() prototype
  arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static
  arch/x86/kernel/ptrace.c: Quiet sparse noise
  x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation
  x86: Replace the EVT_TO_HPET_DEV() macro with an inline function
2012-01-06 14:00:12 -08:00
Linus Torvalds
69734b644b Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86: Fix atomic64_xxx_cx8() functions
  x86: Fix and improve cmpxchg_double{,_local}()
  x86_64, asm: Optimise fls(), ffs() and fls64()
  x86, bitops: Move fls64.h inside __KERNEL__
  x86: Fix and improve percpu_cmpxchg{8,16}b_double()
  x86: Report cpb and eff_freq_ro flags correctly
  x86/i386: Use less assembly in strlen(), speed things up a bit
  x86: Use the same node_distance for 32 and 64-bit
  x86: Fix rflags in FAKE_STACK_FRAME
  x86: Clean up and extend do_int3()
  x86: Call do_notify_resume() with interrupts enabled
  x86/div64: Add a micro-optimization shortcut if base is power of two
  x86-64: Cleanup some assembly entry points
  x86-64: Slightly shorten line system call entry and exit paths
  x86-64: Reduce amount of redundant code generated for invalidate_interruptNN
  x86-64: Slightly shorten int_ret_from_sys_call
  x86, efi: Convert efi_phys_get_time() args to physical addresses
  x86: Default to vsyscall=emulate
  x86-64: Set siginfo and context on vsyscall emulation faults
  x86: consolidate xchg and xadd macros
  ...
2012-01-06 13:59:14 -08:00
Linus Torvalds
67b0243131 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Skip cpus with apic-ids >= 255 in !x2apic_mode
  x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS
  x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping
  x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present
  x86, apic: Add probe() for apic_flat
  x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'
  x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat
  x86: Add per-cpu stat counter for APIC ICR read tries
  pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86
  x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support
  x86: Add NumaChip support
  x86: Add x86_init platform override to fix up NUMA core numbering
  x86: Make flat_init_apic_ldr() available
2012-01-06 13:58:21 -08:00
Greg Kroah-Hartman
ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Linus Torvalds
35b740e466 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
  perf kvm: Fix copy & paste error in description
  perf script: Kill script_spec__delete
  perf top: Fix a memory leak
  perf stat: Introduce get_ratio_color() helper
  perf session: Remove impossible condition check
  perf tools: Fix feature-bits rework fallout, remove unused variable
  perf script: Add generic perl handler to process events
  perf tools: Use for_each_set_bit() to iterate over feature flags
  perf tools: Unify handling of features when writing feature section
  perf report: Accept fifos as input file
  perf tools: Moving code in some files
  perf tools: Fix out-of-bound access to struct perf_session
  perf tools: Continue processing header on unknown features
  perf tools: Improve macros for struct feature_ops
  perf: builtin-record: Document and check that mmap_pages must be a power of two.
  perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
  perf tools: Fix truncated annotation
  perf script: look up thread using tid instead of pid
  perf tools: Look up thread names for system wide profiling
  perf tools: Fix comm for processes with named threads
  ...
2012-01-06 08:02:58 -08:00
Linus Torvalds
423d091dfe Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  cpu: Export cpu_up()
  rcu: Apply ACCESS_ONCE() to rcu_boost() return value
  Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
  docs: Additional LWN links to RCU API
  rcu: Augment rcu_batch_end tracing for idle and callback state
  rcu: Add rcutorture tests for srcu_read_lock_raw()
  rcu: Make rcutorture test for hotpluggability before offlining CPUs
  driver-core/cpu: Expose hotpluggability to the rest of the kernel
  rcu: Remove redundant rcu_cpu_stall_suppress declaration
  rcu: Adaptive dyntick-idle preparation
  rcu: Keep invoking callbacks if CPU otherwise idle
  rcu: Irq nesting is always 0 on rcu_enter_idle_common
  rcu: Don't check irq nesting from rcu idle entry/exit
  rcu: Permit dyntick-idle with callbacks pending
  rcu: Document same-context read-side constraints
  rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
  rcu: Remove dynticks false positives and RCU failures
  rcu: Reduce latency of rcu_prepare_for_idle()
  rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
  rcu: Avoid needlessly IPIing CPUs at GP end
  ...
2012-01-06 08:02:40 -08:00
Ingo Molnar
adaf4ed2ab Merge commit 'v3.2-rc7' into x86/asm
Merge reason: Update from -rc4 to -rc7.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-04 15:01:28 +01:00
Tony Luck
5f7b88d51e x86/mce: Recognise machine check bank signature for data path error
Action required data path signature is defined in table 15-19 of SDM:

+-----------------------------------------------------------------------------+
| SRAR Error | Valid | OVER | UC | EN | MISCV | ADDRV | PCC | S | AR | MCACOD |
| Data Load  |     1 |    0 |  1 |  1 |     1 |     1 |   0 | 1 |  1 |  0x134 |
+-----------------------------------------------------------------------------+

Recognise this, and pass MCE_AR_SEVERITY code back to do_machine_check() if
we have the action handler configured (CONFIG_MEMORY_FAILURE=y)

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-01-03 12:07:07 -08:00
Tony Luck
a8c321fbf9 x86/mce: Handle "action required" errors
All non-urgent actions (reporting low severity errors and handling
"action-optional" errors) are now handled by a work queue. This
means that TIF_MCE_NOTIFY can be used to block execution for a
thread experiencing an "action-required" fault until we get all
cpus out of the machine check handler (and the thread that hit
the fault into mce_notify_process().

We use the new mce_{save,find,clear}_info() API to get information
from do_machine_check() to mce_notify_process(), and then use the
newly improved memory_failure(..., MF_ACTION_REQUIRED) to handle
the error (possibly signalling the process).

Update some comments to make the new code flows clearer.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-01-03 12:07:01 -08:00
Tony Luck
af104e394e x86/mce: Add mechanism to safely save information in MCE handler
Machine checks on Intel cpus interrupt execution on all cpus, regardless
of interrupt masking.  We have a need to save some data about the cause
of the machine check (physical address) in the machine check handler that
can be retrieved later to attempt recovery in a more flexible execution
state.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-01-03 12:06:53 -08:00
Tony Luck
85f92694af x86/mce: Create helper function to save addr/misc when needed
The MCI_STATUS_MISCV and MCI_STATUS_ADDRV bits in the bank status
registers define whether the MISC and ADDR registers respectively
contain valid data - provide a helper function to check these bits
and read the registers when needed.

In addition, processors that support software error recovery (as
indicated by the MCG_SER_P bit in the MCG_CAP register) may include
some undefined bits in the ADDR register - mask these out.

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-01-03 12:06:45 -08:00
Tony Luck
cd42f4a3b2 HWPOISON: Clean up memory_failure() vs. __memory_failure()
There is only one caller of memory_failure(), all other users call
__memory_failure() and pass in the flags argument explicitly. The
lone user of memory_failure() will soon need to pass flags too.

Add flags argument to the callsite in mce.c. Delete the old memory_failure()
function, and then rename __memory_failure() without the leading "__".

Provide clearer message when action optional memory errors are ignored.

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-01-03 12:06:32 -08:00
Robert Richter
2e64694de2 perf/x86: Fix raw_spin_unlock_irqrestore() usage
Use raw_spin_unlock_irqrestore() as equivalent to
raw_spin_lock_irqsave().

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1324646665-13334-1-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-23 17:57:01 +01:00
Kay Sievers
8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Steven Rostedt
42181186ad x86: Add counter when debug stack is used with interrupts enabled
Mathieu Desnoyers pointed out a case that can cause issues with
NMIs running on the debug stack:

  int3 -> interrupt -> NMI -> int3

Because the interrupt changes the stack, the NMI will not see that
it preempted the debug stack. Looking deeper at this case,
interrupts only happen when the int3 is from userspace or in
an a location in the exception table (fixup).

  userspace -> int3 -> interurpt -> NMI -> int3

All other int3s that happen in the kernel should be processed
without ever enabling interrupts, as the do_trap() call will
panic the kernel if it is called to process any other location
within the kernel.

Adding a counter around the sections that enable interrupts while
using the debug stack allows the NMI to also check that case.
If the NMI sees that it either interrupted a task using the debug
stack or the debug counter is non-zero, then it will have to
change the IDT table to make the int3 not change stacks (which will
corrupt the stack if it does).

Note, I had to move the debug_usage functions out of processor.h
and into debugreg.h because of the static inlined functions to
inc and dec the debug_usage counter. __get_cpu_var() requires
smp.h which includes processor.h, and would fail to build.

Link: http://lkml.kernel.org/r/1323976535.23971.112.camel@gandalf.stny.rr.com

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-21 15:38:56 -05:00
Steven Rostedt
228bdaa95f x86: Keep current stack in NMI breakpoints
We want to allow NMI handlers to have breakpoints to be able to
remove stop_machine from ftrace, kprobes and jump_labels. But if
an NMI interrupts a current breakpoint, and then it triggers a
breakpoint itself, it will switch to the breakpoint stack and
corrupt the data on it for the breakpoint processing that it
interrupted.

Instead, have the NMI check if it interrupted breakpoint processing
by checking if the stack that is currently used is a breakpoint
stack. If it is, then load a special IDT that changes the IST
for the debug exception to keep the same stack in kernel context.
When the NMI is done, it puts it back.

This way, if the NMI does trigger a breakpoint, it will keep
using the same stack and not stomp on the breakpoint data for
the breakpoint it interrupted.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-21 15:38:55 -05:00
Peter Zijlstra
e3f3541c19 perf: Extend the mmap control page with time (TSC) fields
Extend the mmap control page with fields so that userspace can compute
time deltas relative to the provided time fields.

Currently only implemented for x86 with constant and nonstop TSC.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/n/tip-3u1jucza77j3wuvs0x2bic0f@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 11:01:13 +01:00
Peter Zijlstra
0c9d42ed4c perf, x86: Provide means for disabling userspace RDPMC
Allow the disabling of RDPMC via a pmu specific attribute:

  echo 0 > /sys/bus/event_source/devices/cpu/rdpmc

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/n/tip-pqeog465zo5hsimtkfz73f27@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 11:01:11 +01:00
Peter Zijlstra
fe4a330885 perf, x86: Implement user-space RDPMC support, to allow fast, user-space access to self-monitoring counters
Implement a correct pmu::event_idx for the x86 counter index rules and
set CR4.PCE on CPU_STARTING.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-mwxab34dibqgzk5zywutfnha@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 11:01:10 +01:00
Stephane Eranian
9c1497ea59 perf events: Add Intel x86 mapping for PERF_COUNT_HW_REF_CPU_CYCLES
Add event maps for Intel x86 processors (with architected PMU v2 or later).

On AMD, there is frequency scaling but no Turbo. There is no core
cycle event not subject to frequency scaling, therefore we do not
provide a mapping.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323559734-3488-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 10:26:39 +01:00
Stephane Eranian
cd09c0c40a perf events: Enable raw event support for Intel unhalted_reference_cycles event
This patch adds the encoding and definitions necessary for the
unhalted_reference_cycles event avaialble since Intel Core 2 processors.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323559734-3488-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 10:26:32 +01:00
Kevin Winchester
141168c36c x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'
Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space.  However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files.  The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:

	   text    data     bss     dec     hex filename
	4737168	 506459	 972040	6215667	 5ed7f3	vmlinux.o.before
	4737444	 506459	 972040	6215943	 5ed907	vmlinux.o.after

for a difference of 276 bytes for an example UP config.

If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 09:25:09 +01:00
Ingo Molnar
d87f69a16e Merge commit 'v3.2-rc6' into perf/core
Merge reason: Update with the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-20 20:32:11 +01:00
Ingo Molnar
a228b5892b Merge branch 'mce-inject' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce 2011-12-18 09:18:45 +01:00
Chen Gong
2c29d9dd57 x86: add IRQ context simulation in module mce-inject
mce-inject provides a mechanism to simulate errors so that test
scripts can check for correct operation of the kernel without
requiring any specialized hardware to create rare events.

The existing code can simulate events in normal process context
and also in NMI context - but not in IRQ context. This patch
fills that gap.

Link: https://lkml.org/lkml/2011/12/7/537
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-12-16 11:20:02 -08:00
Timo Teräs
cb3f718de8 x86, centaur: Enable cx8 for VIA Eden too
My box with following cpuinfo needs the cx8 enabling still:

vendor_id	: CentaurHauls
cpu family	: 6
model		: 13
model name	: VIA Eden Processor 1200MHz
stepping	: 0
cpu MHz		: 1199.940
cache size	: 128 KB

This fixes valgrind to work on my box (it requires and checks
cx8 from cpuinfo).

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Link: http://lkml.kernel.org/r/1323961888-10223-1-git-send-email-timo.teras@iki.fi
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-12-15 08:04:42 -08:00
Joerg Roedel
969df4b829 x86: Report cpb and eff_freq_ro flags correctly
Add the flags to get rid of the [9] and [10] feature names
in cpuinfo's 'power management' fields and replace them with
meaningful names.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1323875574-17881-1-git-send-email-joerg.roedel@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15 08:14:49 +01:00
Ingo Molnar
715a43182a Merge branch 'early-mce-decode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/mce 2011-12-15 08:13:40 +01:00
Fenghua Yu
29e9bf1841 x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog
Thermal throttle and power limit events are not defined as MCE errors in x86
architecture and should not generate MCE errors in mcelog.

Current kernel generates fake software defined MCE errors for these events.
This may confuse users because they may think the machine has real MCE errors
while actually only thermal throttle or power limit events happen.

To make it worse, buggy firmware on some platforms may falsely generate
the events. Therefore, kernel reports MCE errors which users think as real
hardware errors. Although the firmware bugs should be fixed, on the other hand,
kernel should not report MCE errors either.

So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count,
package_power_limit_count, core_throttle_count, and package_throttle_count) in
/sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters
for each event on each CPU. Please note that all CPU's on one package report
duplicate counters. It's user application's responsibity to retrieve a package
level counter for one package.

This patch doesn't report package level power limit, core level power limit, and
package level thermal throttle events in mcelog. When the events happen, only
report them in respective counters in sysfs.

Since core level thermal throttle has been legacy code in kernel for a while and
users accepted it as MCE error in mcelog, core level thermal throttle is still
reported in mcelog. In the mean time, the event is counted in a counter in sysfs
as well.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Borislav Petkov <bp@amd64.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-14 16:25:26 -08:00
Borislav Petkov
0937195715 x86, MCE: Drain mcelog buffer
Add a function which drains whatever MCEs were logged in already during
boot and before the decoder chains were registered.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-14 12:50:13 +01:00
Borislav Petkov
3653ada5d3 x86, mce: Add wrappers for registering on the decode chain
No functionality change, this is done so that in a follow-on patch all
queued-up MCEs can be decoded after registering on the chain.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-14 12:50:12 +01:00
Frederic Weisbecker
98ad1cc14a x86: Call idle notifier after irq_enter()
Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while rcu is in an extended quiescent state. We need
to wait for irq_enter() -> rcu_idle_exit() to be called before
doing so otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:38 -08:00
Borislav Petkov
d059f24a96 x86, CPU: Drop superfluous get_cpu_cap() prototype
The get_cpu_cap() external function prototype was declared twice
so lose one of them.

Clean up the header guard while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1322594083-14507-1-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-09 08:00:53 +01:00
Gleb Natapov
b3d9468a8b perf, x86: Expose perf capability to other modules
KVM needs to know perf capability to decide which PMU it can expose to a
guest.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:41:08 +01:00
Peter Zijlstra
c1d6f42f1a perf, x86: Implement arch event mask as quirk
Implement the disabling of arch events as a quirk so that we can print
a message along with it. This creates some visibility into the problem
space and could allow us to work on adding more work-around like the
AAJ80 one.

Requested-by: Ingo Molnar <mingo@elte.hu>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-wcja2z48wklzu1b0nkz0a5y7@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:41:06 +01:00
Gleb Natapov
ffb871bc91 x86, perf: Disable non available architectural events
Intel CPUs report non-available architectural events in cpuid leaf
0AH.EBX. Use it to disable events that are not available according
to CPU.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:41:05 +01:00
Peter Zijlstra
4defea8559 perf, x86: Prefer fixed-purpose counters when scheduling
This avoids a scheduling failure for cases like:

  cycles, cycles, instructions, instructions (on Core2)

Which would end up being programmed like:

  PMC0, PMC1, FP-instructions, fail

Because all events will have the same weight.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 08:33:58 +01:00