Commit Graph

10011 Commits

Author SHA1 Message Date
H. Peter Anvin
b5660ba76b x86, platforms: Remove NUMAQ
The NUMAQ support seems to be unmaintained, remove it.

Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com
2014-02-27 08:07:39 -08:00
H. Peter Anvin
c5f9ee3d66 x86, platforms: Remove SGI Visual Workstation
The SGI Visual Workstation seems to be dead; remove support so we
don't have to continue maintaining it.

Cc: Andrey Panin <pazke@donpac.ru>
Cc: Michael Reed <mdr@sgi.com>
Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-27 08:07:39 -08:00
Peter Zijlstra
c347a2f179 perf/x86: Add a few more comments
Add a few comments on the ->add(), ->del() and ->*_txn()
implementation.

Requested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-he3819318c245j7t5e1e22tr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:43:25 +01:00
Ingo Molnar
ff5a7088f0 Merge branch 'perf/urgent' into perf/core
Merge the latest fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:41:17 +01:00
Peter Zijlstra
26e61e8939 perf/x86: Fix event scheduling
Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.

This is I think the relevant bit:

	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]).

At this point we should have:

  n_events = 2, n_added = 1, n_txn = 1

	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.

	group_sched_in()
	  pmu->start_txn() /* nop - BP pmu */
	  event_sched_in()
	     event->pmu->add()

So here we should end up with:

  0: n_events = 3, n_added = 2, n_txn = 2
  4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.

Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.

But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have
succeeded!  Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:

	event_sched_out()
	  event->pmu->del()

on 0 and the BP event.

Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:

 n_event = 2, n_added = 2, n_txn = 2

	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:38:02 +01:00
Rafael J. Wysocki
38953d3945 Merge back earlier 'acpi-processor' material. 2014-02-27 00:22:42 +01:00
Kees Cook
e2b32e6785 x86, kaslr: randomize module base load address
Randomize the load address of modules in the kernel to make kASLR
effective for modules.  Modules can only be loaded within a particular
range of virtual address space.  This patch adds 10 bits of entropy to
the load address by adding 1-1024 * PAGE_SIZE to the beginning range
where modules are loaded.

The single base offset was chosen because randomizing each module
load ends up wasting/fragmenting memory too much. Prior approaches to
minimizing fragmentation while doing randomization tend to result in
worse entropy than just doing a single base address offset.

Example kASLR boot without this change, with a single module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0001000           4K     ro     GLB x  pte
0xffffffffc0001000-0xffffffffc0002000           4K     ro     GLB NX pte
0xffffffffc0002000-0xffffffffc0004000           8K     RW     GLB NX pte
0xffffffffc0004000-0xffffffffc0200000        2032K                   pte
0xffffffffc0200000-0xffffffffff000000        1006M                   pmd
---[ End Modules ]---

Example kASLR boot after this change, same module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0200000           2M                   pmd
0xffffffffc0200000-0xffffffffc03bf000        1788K                   pte
0xffffffffc03bf000-0xffffffffc03c0000           4K     ro     GLB x  pte
0xffffffffc03c0000-0xffffffffc03c1000           4K     ro     GLB NX pte
0xffffffffc03c1000-0xffffffffc03c3000           8K     RW     GLB NX pte
0xffffffffc03c3000-0xffffffffc0400000         244K                   pte
0xffffffffc0400000-0xffffffffff000000        1004M                   pmd
---[ End Modules ]---

Signed-off-by: Andy Honig <ahonig@google.com>
Link: http://lkml.kernel.org/r/20140226005916.GA27083@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 17:07:26 -08:00
Eugene Surovegin
b6085a8657 x86, kaslr: export offset in VMCOREINFO ELF notes
Include kASLR offset in VMCOREINFO ELF notes to assist in debugging.

[ hpa: pushing this for v3.14 to avoid having a kernel version with
  kASLR where we can't debug output. ]

Signed-off-by: Eugene Surovegin <surovegin@google.com>
Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 16:57:47 -08:00
Linus Torvalds
208937fdcf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - a bugfix which prevents a divide by 0 panic when the newly introduced
   try_msr_calibrate_tsc() fails

 - enablement of the Baytrail platform to utilize the newfangled msr
   based calibration

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: tsc: Add missing Baytrail frequency to the table
  x86, tsc: Fallback to normal calibration if fast MSR calibration fails
2014-02-23 14:15:46 -08:00
Linus Torvalds
9b3e7c9b9a Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixlets from all around the place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table
  perf/x86: Correctly use FEATURE_PDCM
  perf, nmi: Fix unknown NMI warning
  perf trace: Fix ioctl 'request' beautifier build problems on !(i386 || x86_64) arches
  perf trace: Add fallback definition of EFD_SEMAPHORE
  perf list: Fix checking for supported events on older kernels
  perf tools: Handle PERF_RECORD_HEADER_EVENT_TYPE properly
  perf probe: Do not add offset twice to uprobe address
  perf/x86: Fix Userspace RDPMC switch
  perf/x86/intel/p6: Add userspace RDPMC quirk for PPro
2014-02-22 12:11:54 -08:00
Fernando Luis Vázquez Cao
0d75de4a65 kvm: remove redundant registration of BSP's hv_clock area
These days hv_clock allocation is memblock based (i.e. the percpu
allocator is not involved), which means that the physical address
of each of the per-cpu hv_clock areas is guaranteed to remain
unchanged through all its lifetime and we do not need to update
its location after CPU bring-up.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-22 15:53:32 +01:00
Stephane Eranian
337397f3af perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table
This patch updates the CBOX PMU filters mapping tables for SNB-EP
and IVT (model 45 and 62 respectively).

The NID umask always comes in addition to another umask.
When set, the NID filter is applied.

The current mapping tables were missing some code/umask
combinations to account for the NID umask. This patch
fixes that.

Cc: mingo@elte.hu
Cc: ak@linux.intel.com
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140219131018.GA24475@quad
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 22:09:01 +01:00
Peter Zijlstra
c9b08884c9 perf/x86: Correctly use FEATURE_PDCM
The current code simply assumes Intel Arch PerfMon v2+ to have
the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.

This was found by KVM which implements v2+ but didn't provide the
capabilities MSR. Change the code to DTRT; KVM will also implement the
MSR and return 0.

Cc: pbonzini@redhat.com
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 22:09:01 +01:00
Markus Metzger
a3ef2229c9 perf, nmi: Fix unknown NMI warning
When using BTS on Core i7-4*, I get the below kernel warning.

$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317920] Do you have a strange power saving mode enabled?

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317945] Dazed and confused, but trying to continue

Make intel_pmu_handle_irq() take the full exit path when returning early.

Cc: eranian@google.com
Cc: peterz@infradead.org
Cc: mingo@kernel.org
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 22:09:01 +01:00
Stephane Eranian
e9d9768824 perf/x86/uncore: use MiB unit for events for SNB/IVB/HSW IMC
This patch makes perf use Mebibytes to display the counts
of uncore_imc/data_reads/ and uncore_imc/data_writes.

1MiB = 1024*1024 bytes.

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-9-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:08 +01:00
Stephane Eranian
ced2efb099 perf/x86/uncore: add hrtimer to SNB uncore IMC PMU
This patch is needed because that PMU uses 32-bit free
running counters with no interrupt capabilities.

On SNB/IVB/HSW, we used 20GB/s theoretical peak to calculate
the hrtimer timeout necessary to avoid missing an overflow.
That delay is set to 5s to be on the cautious side.

The SNB IMC uses free running counters, which are handled
via pseudo fixed counters. The SNB IMC PMU implementation
supports an arbitrary number of events, because the counters
are read-only. Therefore it is not possible to track active
counters. Instead we put active events on a linked list which
is then used by the hrtimer handler to update the SW counts.

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-8-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:08 +01:00
Stephane Eranian
b9e1ab6d4c perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support
This patch adds a new uncore PMU for Intel SNB/IVB/HSW client
CPUs. It adds the Integrated Memory Controller (IMC) PMU. This
new PMU provides a set of events to measure memory bandwidth utilization.

The IMC on those processor is PCI-space based. This patch
exposes a new uncore PMU on those processor: uncore_imc

Two new events are defined:
  - name: data_reads
  - code: 0x1
  - unit: 64 bytes
  - number of full cacheline read requests to the IMC

  - name: data_writes
  - code: 0x2
  - unit: 64 bytes
  - number of full cacheline write requests to the IMC

Documentation available at:
http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-7-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:08 +01:00
Stephane Eranian
001e413f7e perf/x86/uncore: move uncore_event_to_box() and uncore_pmu_to_box()
Move a couple of functions around to avoid forward declarations
when we add code later on.

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-6-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:07 +01:00
Stephane Eranian
79859cce5a perf/x86/uncore: make hrtimer timeout configurable per box
This patch makes the hrtimer timeout configurable per PMU
box. Not all counters have necessarily the same width and
rate, thus the default timeout of 60s may need to be adjusted.

This patch adds box->hrtimer_duration. It is set to default
when the box is allocated. It can be overriden when the box
is initialized.

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-5-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:07 +01:00
Stephane Eranian
d64b25b6a0 perf/x86/uncore: add ability to customize pmu callbacks
This patch enables custom struct pmu callbacks per uncore
PMU types. This feature may be used to simplify counter
setup for certain uncore PMUs which have free running
counters for instance. It becomes possible to bypass
the event scheduling phase of the configuration.

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-3-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:07 +01:00
Stephane Eranian
411cf180fa perf/x86/uncore: fix initialization of cpumask
On certain processors, the uncore PMU boxes may only be
msr-bsed or PCI-based. But in both cases, the cpumask,
suggesting on which CPUs to monitor to get full coverage
of the particular PMU, must be created.

However with the current code base, the cpumask was only
created on processor which had at least one MSR-based
uncore PMU. This patch removes that restriction and
ensures the cpumask is created even when there is no
msr-based PMU. For instance, on SNB client where only
a PCI-based memory controller PMU is supported.

Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: peterz@infradead.org
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392132015-14521-2-git-send-email-eranian@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:07 +01:00
Thomas Gleixner
d97a860c4f Merge branch 'linus' into sched/core
Reason: Bring bakc upstream modification to resolve conflicts

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:37:09 +01:00
Jiang Liu
896dc50640 x86, acpi: Fix bug in associating hot-added CPUs with corresponding NUMA node
Current ACPI cpu hotplug driver fails to associate hot-added CPUs with
corresponding NUMA node when doing socket online. The code path to
associate CPU with NUMA node is as below:
acpi_processor_add()
    ->acpi_processor_get_info()
	->acpi_processor_hotadd_init()
	    ->acpi_map_lsapic()
		->_acpi_map_lsapic()
		    ->acpi_map_cpu2node()
cpu_subsys_online()
    ->try_online_node()
	->node_set_online()

When doing socket online, a new NUMA node is introduced in addition to
hot-added CPU and memory device. And the new NUMA node is marked as
online when onlining hot-added CPUs through sysfs interface
/sys/devices/system/cpu/cpuxx/online.

On the other hand, acpi_map_cpu2node() will only build the CPU to node
map if corresponding NUMA node is already online, so it always fails
to associate hot-added CPUs with corresponding NUMA node because the
NUMA node is still in offline state.

For the fix, we could safely remove the "node_online(node)" check in
function acpi_map_cpu2node() because it's only called for hot-added CPUs
by acpi_processor_hotadd_init().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1390185115-26850-1-git-send-email-jiang.liu@linux.intel.com
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-20 19:01:22 -08:00
Linus Torvalds
d49649615d Merge branch 'fixes-for-v3.14' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA-mapping fixes from Marek Szyprowski:
 "This contains fixes for incorrect atomic test in dma-mapping subsystem
  for ARM and x86 architecture"

* 'fixes-for-v3.14' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  x86: dma-mapping: fix GFP_ATOMIC macro usage
  ARM: dma-mapping: fix GFP_ATOMIC macro usage
2014-02-20 11:58:56 -08:00
Len Brown
2194324d8b ACPI idle: permit sparse C-state sub-state numbers
Linux uses CPUID.MWAIT.EDX to validate the C-states
reported by ACPI, silently discarding states which
are not supported by the HW.

This test is too restrictive, as some HW now uses
sparse sub-state numbering, so the sub-state number
may be higher than the number of sub-states...

Also, rather than silently ignoring an invalid state,
we should complain about a firmware bug.

In practice...

Bay Trail systems originally supported C6-no-shrink as
MWAIT sub-state 0x58, and in CPUID.MWAIT.EDX 0x03000000
indicated that there were 3 MWAIT-C6 sub-states.
So acpi_idle would discard that C-state because 8 >= 3.

Upon discovering this issue, the ucode was updated so that
C6-no-shrink was also exported as 0x51, and the BIOS was
updated to match.  However, systems shipped with 0x58,
will never get a BIOS update, and this patch allows
Linux to see C6-no-shrink on early Bay Trail.

Signed-off-by: Len Brown <len.brown@intel.com>
2014-02-19 17:33:18 -05:00
Mika Westerberg
3e11e818bf x86: tsc: Add missing Baytrail frequency to the table
Intel Baytrail is based on Silvermont core so MSR_FSB_FREQ[2:0] == 0 means
that the CPU reference clock runs at 83.3MHz. Add this missing frequency to
the table.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19 17:12:24 +01:00
Thomas Gleixner
5f0e030930 x86, tsc: Fallback to normal calibration if fast MSR calibration fails
If we cannot calibrate TSC via MSR based calibration
try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
to the caller. This value gets then propagated further to clockevents
code resulting division by zero oops like the one below:

 divide error: 0000 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
 RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
 RSP: 0000:ffff880075507e58  EFLAGS: 00010246
 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
 RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
 R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
 Stack:
  ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
  ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
  ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
 Call Trace:
  [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
  [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
  [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
  [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
  [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
  [<ffffffff8177c910>] ? rest_init+0x90/0x90
  [<ffffffff8177c91e>] kernel_init+0xe/0x120
  [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
  [<ffffffff8177c910>] ? rest_init+0x90/0x90

Prevent this from happening by:
 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero
    if it fails.
 2) Check this return value in native_calibrate_tsc() and in case of zero
    fallback to use normal non-MSR based calibration.

[mw: Added subject and changelog]

Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bin Gao <bin.gao@linux.intel.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19 17:12:24 +01:00
Hanjun Guo
328281b1cd ACPI: Move BAD_MADT_ENTRY() to linux/acpi.h
BAD_MADT_ENTRY() is arch independent and will be used for all
architectures which parse MADT, so move it to linux/acpi.h to
reduce code duplication.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-19 00:56:07 +01:00
Ard Biesheuvel
2b9c1f0327 x86: align x86 arch with generic CPU modalias handling
The x86 CPU feature modalias handling existed before it was reimplemented
generically. This patch aligns the x86 handling so that it
(a) reuses some more code that is now generic;
(b) uses the generic format for the modalias module metadata entry, i.e., it
    now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of
    the 'x86cpu:vendor:VVVV👪FFFF:model:MMMM:feature:,XXXX,YYYY' that was
    used before.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-18 12:45:38 -08:00
Linus Torvalds
9bd01b9bbd Merge tag 'trace-fixes-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull twi tracing fixes from Steven Rostedt:
 "Two urgent fixes in the tracing utility.

  The first is a fix for the way the ring buffer stores timestamps.
  After a restructure of the code was done, the ring buffer timestamp
  logic missed the fact that the first event on a sub buffer is to have
  a zero delta, as the full timestamp is stored on the sub buffer
  itself.  But because the delta was not cleared to zero, the timestamp
  for that event will be calculated as the real timestamp + the delta
  from the last timestamp.  This can skew the timestamps of the events
  and have them say they happened when they didn't really happen.
  That's bad.

  The second fix is for modifying the function graph caller site.  When
  the stop machine was removed from updating the function tracing code,
  it missed updating the function graph call site location.  It is still
  modified as if it is being done via stop machine.  But it's not.  This
  can lead to a GPF and kernel crash if the function graph call site
  happens to lie between cache lines and one CPU is executing it while
  another CPU is doing the update.  It would be a very hard condition to
  hit, but the result is severe enough to have it fixed ASAP"

* tag 'trace-fixes-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/x86: Use breakpoints for converting function graph caller
  ring-buffer: Fix first commit on sub-buffer having non-zero delta
2014-02-15 15:03:34 -08:00
Andi Kleen
40747ffa5a asmlinkage: Make jiffies visible
Jiffies is referenced by the linker script, so it has to be visible.

Handled both the generic and the x86 version.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-3-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 18:12:09 -08:00
H. Peter Anvin
03bbd596ac x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
If SMAP support is not compiled into the kernel, don't enable SMAP in
CR4 -- in fact, we should clear it, because the kernel doesn't contain
the proper STAC/CLAC instructions for SMAP support.

Found by Fengguang Wu's test system.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.7+
2014-02-13 07:50:25 -08:00
David Rientjes
7cf6c94591 x86, apic: Remove support for IBM Summit/EXA chipset
There should no longer be any IBM x440 systems or those using the
Summit/EXA chipset out in the wild, so remove support for it.

We've done our due diligence in reaching out to any contact information
listed for this chipset and no indication was given that it should be
kept around.

Signed-off-by: David Rientjes <rientjes@google.com>
2014-02-11 18:11:13 -08:00
David Rientjes
58f5d2d448 x86, apic: Remove support for ia32-based Unisys ES7000
There should no longer be any ia32-based Unisys ES7000 systems out in
the wild, so remove support for it.

We've done our due diligence in reaching out to any contact information
listed for this system and no indication was given that it should be
kept around.

Signed-off-by: David Rientjes <rientjes@google.com>
2014-02-11 17:47:48 -08:00
Steven Rostedt (Red Hat)
87fbb2ac60 ftrace/x86: Use breakpoints for converting function graph caller
When the conversion was made to remove stop machine and use the breakpoint
logic instead, the modification of the function graph caller is still
done directly as though it was being done under stop machine.

As it is not converted via stop machine anymore, there is a possibility
that the code could be layed across cache lines and if another CPU is
accessing that function graph call when it is being updated, it could
cause a General Protection Fault.

Convert the update of the function graph caller to use the breakpoint
method as well.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org # 3.5+
Fixes: 08d636b6d4 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11 20:19:44 -05:00
Nicolas Pitre
16f8b05abe sched/idle, x86: Remove redundant cpuidle_idle_call()
The core idle loop now takes care of it.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linaro-kernel@lists.linaro.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-3ioazimg4j5iq6kdefks04i8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11 09:58:28 +01:00
Marek Szyprowski
c091c71ad2 x86: dma-mapping: fix GFP_ATOMIC macro usage
GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other
flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller
wants to perform an atomic allocation, the code must test for a lack of the
__GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1.

CC: stable@vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-02-11 09:40:15 +01:00
David Rientjes
dc9788f40a x86/apic: Always define nox2apic and define it as initdata
The "nox2apic" variable can be defined as __initdata since it is
only used for bootstrap.  It can now unconditionally be defined
since it will later be freed.

At the same time, it is also better off as a bool.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354380.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:11 +01:00
David Rientjes
465822cfc8 x86/apic: Switch wait_for_init_deassert() to a bool flag
Now that there is only a single wait_for_init_deassert()
function, just convert the member of struct apic to a bool to
determine whether we need to wait for init_deassert to become
non-zero.

There are no more callers of default_wait_for_init_deassert(),
so fold it into the caller.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354010.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:08 +01:00
David Rientjes
d3c63ae1e2 x86/apic: Only use default_wait_for_init_deassert()
es7000_wait_for_init_deassert() is functionally equivalent to
default_wait_for_init_deassert(), so remove the duplicate code
and use only a single function.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042353030.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:07 +01:00
Ville Syrjälä
c71ef7b3c3 x86/gpu: Print the Intel graphics stolen memory range
Print an informative message when reserving the graphics stolen
memory region in the early quirk.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-4-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:11:31 +01:00
Ville Syrjälä
a4dff76924 x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms
There isn't an explicit stolen memory base register on gen2.
Some old comment in the i915 code suggests we should get it via
max_low_pfn_mapped, but that's clearly a bad idea on my MGM.

The e820 map in said machine looks like this:

	BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
	BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
	BIOS-e820: [mem 0x00000000000ce000-0x00000000000cffff] reserved
	BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
	BIOS-e820: [mem 0x0000000000100000-0x000000001f6effff] usable
	BIOS-e820: [mem 0x000000001f6f0000-0x000000001f6f7fff] ACPI data
	BIOS-e820: [mem 0x000000001f6f8000-0x000000001f6fffff] ACPI NVS
	BIOS-e820: [mem 0x000000001f700000-0x000000001fffffff] reserved
	BIOS-e820: [mem 0x00000000fec10000-0x00000000fec1ffff] reserved
	BIOS-e820: [mem 0x00000000ffb00000-0x00000000ffbfffff] reserved
	BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved

That makes max_low_pfn_mapped = 1f6f0000, so assuming our stolen
memory would start there would place it on top of some ACPI
memory regions. So not a good idea as already stated.

The 9MB region after the ACPI regions at 0x1f700000 however
looks promising given that the macine reports the stolen memory
size to be 8MB. Looking at the PGTBL_CTL register, the GTT
entries are at offset 0x1fee00000, and given that the GTT
entries occupy 128KB, it looks like the stolen memory could
start at 0x1f700000 and the GTT entries would occupy the last
128KB of the stolen memory.

After some more digging through chipset documentation, I've
determined the BIOS first allocates space for something called
TSEG (something to do with SMM) from the top of memory, and then
it allocates the graphics stolen memory below that. Accordind to
the chipset documentation TSEG has a fixed size of 1MB on 855.
So that explains the top 1MB in the e820 region. And it also
confirms that the GTT entries are in fact at the end of the the
stolen memory region.

Derive the stolen memory base address on gen2 the same as the
BIOS does (TOM-TSEG_SIZE-stolen_size). There are a few
differences between the registers on various gen2 chipsets, so a
few different codepaths are required.

865G is again bit more special since it seems to support enough
memory to hit 4GB address space issues. This means the PCI
allocations will also affect the location of the stolen memory.
Fortunately there appears to be the TOUD register which may give
us the correct answer directly. But the chipset docs are a bit
unclear, so I'm not 100% sure that the graphics stolen memory is
always the last thing the BIOS steals. Someone would need to
verify it on a real system.

I tested this on the my 830 and 855 machines, and so far
everything looks peachy.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-3-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:11:30 +01:00
Ville Syrjälä
52ca70454e x86/gpu: Add vfunc for Intel graphics stolen memory base address
For gen2 devices we're going to need another way to determine
the stolen memory base address. Make that into a vfunc as well.

Also drop the bogus inline keyword from gen8_stolen_size().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-2-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:11:30 +01:00
Don Zickus
90ed5b0fa5 perf/x86/p4: Block PMIs on init to prevent a stream of unkown NMIs
A bunch of unknown NMIs have popped up on a Pentium4 recently when booting
into a kdump kernel.  This was exposed because the watchdog timer went
from 60 seconds down to 10 seconds (increasing the ability to reproduce
this problem).

What is happening is on boot up of the second kernel (the kdump one),
the previous nmi_watchdogs were enabled on thread 0 and thread 1.  The
second kernel only initializes one cpu but the perf counter on thread 1
still counts.

Normally in a kdump scenario, the other cpus are blocking in an NMI loop,
but more importantly their local apics have the performance counters disabled
(iow LVTPC is masked).  So any counters that fire are masked and never get
through to the second kernel.

However, on a P4 the local apic is shared by both threads and thread1's PMI
(despite being configured to only interrupt thread1) will generate an NMI on
thread0.  Because thread0 knows nothing about this NMI, it is seen as an
unknown NMI.

This would be fine because it is a kdump kernel, strange things happen
what is the big deal about a single unknown NMI.

Unfortunately, the P4 comes with another quirk: clearing the overflow bit
to prevent a stream of NMIs.  This is the problem.

The kdump kernel can not execute because of the endless NMIs that happen.

To solve this, I instrumented the p4 perf init code, to walk all the counters
and zero them out (just like a normal reset would).

Now when the counters go off, they do not generate anything and no unknown
NMIs are seen.

I tested this on a P4 we have in our lab.  After two or three crashes, I could
normally reproduce the problem.  Now after 10 crashes, everything continues
to boot correctly.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120154115.GZ25953@redhat.com
[ Fixed a stylistic detail. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:20:35 +01:00
Don Zickus
13beacee81 perf/x86/p4: Fix counter corruption when using lots of perf groups
On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform, led me to notice cross cpu counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0 specific counters was being used by
cpu1 and caused all sorts of random unknown nmis.

I am not entirely sure on the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1
on swap to tell follow up group scheduling to find a new index.

Of course if the transaction fails rolling this back will be difficult, but
that is not different than how the current code works. :-)  And I wasn't sure
how much effort to cleanup the code I should do for a platform that is almost
10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:23 +01:00
Peter Zijlstra
e90c785352 x86/nmi: Push duration printk() to irq context
Calling printk() from NMI context is bad (TM), so move it to IRQ
context.

In doing so we slightly change (probably wreck) the debugfs
nmi_longest_ns thingy, in that it doesn't update to reflect the
longest, nor does writing to it reset the count.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rdw0au56a5ymis1u8p48c12d@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:22 +01:00
Ingo Molnar
3c3d7cb1db Merge branch 'linus' into perf/core
Refresh the branch to a v3.14-rc base before queueing up new devel patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:13:45 +01:00
Steven Rostedt
569d6557ab x86: Use preempt_disable_notrace() in cycles_2_ns()
When debug preempt is enabled, preempt_disable() can be traced by
function and function graph tracing.

There's a place in the function graph tracer that calls trace_clock()
which eventually calls cycles_2_ns() outside of the recursion
protection. When cycles_2_ns() calls preempt_disable() it gets traced
and the graph tracer will go into a recursive loop causing a crash or
worse, a triple fault.

Simple fix is to use preempt_disable_notrace() in cycles_2_ns, which
makes sense because the preempt_disable() tracing may use that code
too, and it tracing it, even with recursion protection is rather
pointless.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140204141315.2a968a72@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:09:08 +01:00
Peter Zijlstra
0e9f2204cf perf/x86: Fix Userspace RDPMC switch
The current code forgets to change the CR4 state on the current CPU.
Use on_each_cpu() instead of smp_call_function().

Reported-by: Mark Davies <junk@eslaf.co.uk>
Suggested-by: Mark Davies <junk@eslaf.co.uk>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/n/tip-69efsat90ibhnd577zy3z9gh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:08:25 +01:00
Peter Zijlstra
e97df76377 perf/x86/intel/p6: Add userspace RDPMC quirk for PPro
PPro machines can die hard when PCE gets enabled due to a CPU erratum.
The safe way it so disable it by default and keep it disabled.

See erratum 26 in:

  http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf

Reported-and-Tested-by: Mark Davies <junk@eslaf.co.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140206170815.GW2936@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:08:24 +01:00