Commit Graph

11545 Commits

Author SHA1 Message Date
Oleg Nesterov
02be46fba4 ptrace/x86: revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit 87dc669ba2 ("hw_breakpoints: Fix racy access to
ptrace breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f65 ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

The patch only removes ptrace_get_breakpoints/ptrace_put_breakpoints and
does a couple of "while at it" cleanups, it doesn't remove other changes
from the reverted commit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:25 -07:00
Naveen N. Rao
9ad95879cd mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
Add a boot option to disable firmware first mode for corrected errors.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-08 11:54:28 -07:00
Naveen N. Rao
c3d1fb567a mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first. In this scenario, the OS is expected to not
monitor for corrected errors (through CMCI/polling). Instead, the firmware
notifies the OS on corrected error events through GHES.

Linux already has support for GHES. This patch adds support for parsing CMC
structure and to disable CMCI/polling if the firmware first flag is set.

Further, the list of machine check bank structures at the end of CMC is used
to determine which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-08 11:53:01 -07:00
Linus Torvalds
21884a83b2 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
 "The timer changes contain:

   - posix timer code consolidation and fixes for odd corner cases

   - sched_clock implementation moved from ARM to core code to avoid
     duplication by other architectures

   - alarm timer updates

   - clocksource and clockevents unregistration facilities

   - clocksource/events support for new hardware

   - precise nanoseconds RTC readout (Xen feature)

   - generic support for Xen suspend/resume oddities

   - the usual lot of fixes and cleanups all over the place

  The parts which touch other areas (ARM/XEN) have been coordinated with
  the relevant maintainers.  Though this results in an handful of
  trivial to solve merge conflicts, which we preferred over nasty cross
  tree merge dependencies.

  The patches which have been committed in the last few days are bug
  fixes plus the posix timer lot.  The latter was in akpms queue and
  next for quite some time; they just got forgotten and Frederic
  collected them last minute."

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  hrtimer: Remove unused variable
  hrtimers: Move SMP function call to thread context
  clocksource: Reselect clocksource when watchdog validated high-res capability
  posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  posix_timers: fix racy timer delta caching on task exit
  posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
  selftests: add basic posix timers selftests
  posix_cpu_timers: consolidate expired timers check
  posix_cpu_timers: consolidate timer list cleanups
  posix_cpu_timer: consolidate expiry time type
  tick: Sanitize broadcast control logic
  tick: Prevent uncontrolled switch to oneshot mode
  tick: Make oneshot broadcast robust vs. CPU offlining
  x86: xen: Sync the CMOS RTC as well as the Xen wallclock
  x86: xen: Sync the wallclock when the system time is set
  timekeeping: Indicate that clock was set in the pvclock gtod notifier
  timekeeping: Pass flags instead of multiple bools to timekeeping_update()
  xen: Remove clock_was_set() call in the resume path
  hrtimers: Support resuming with two or more CPUs online (but stopped)
  timer: Fix jiffies wrap behavior of round_jiffies_common()
  ...
2013-07-06 14:09:38 -07:00
Linus Torvalds
2cb7b5a38c Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux
Pull irqdomain refactoring from Grant Likely:
 "This is the long awaited simplification of irqdomain.  It gets rid of
  the different types of irq domains and instead both linear and tree
  mappings can be supported in a single domain.  Doing this removes a
  lot of special case code and makes irq domains simpler to understand
  overall"

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux:
  irq: fix checkpatch error
  irqdomain: Include hwirq number in /proc/interrupts
  irqdomain: make irq_linear_revmap() a fast path again
  irqdomain: remove irq_domain_generate_simple()
  irqdomain: Refactor irq_domain_associate_many()
  irqdomain: Beef up debugfs output
  irqdomain: Clean up aftermath of irq_domain refactoring
  irqdomain: Eliminate revmap type
  irqdomain: merge linear and tree reverse mappings.
  irqdomain: Add a name field
  irqdomain: Replace LEGACY mapping with LINEAR
  irqdomain: Relax failure path on setting up mappings
2013-07-06 12:37:04 -07:00
Peter Zijlstra
100ac53315 perf/x86/amd: Do not print an error when the device is not present
As Linus said its not an error to not have an AMD IOMMU; esp.
when you're not even running on an AMD platform.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Link: http://lkml.kernel.org/r/20130703075542.GF23916@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-05 08:27:15 +02:00
Thomas Gleixner
2b0f89317e Merge branch 'timers/posix-cpu-timers-for-tglx' of
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-04 23:11:22 +02:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Linus Torvalds
7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Jiang Liu
46a841329a mm/x86: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:38 -07:00
Linus Torvalds
f991fae5c6 Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Linus Torvalds
a9f4a7005f Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull thermal power-limit update from Tony Luck:
 "Thermal limit warnings are too scary and cause unnecessary concern"

* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  x86 thermal: Disable power limit notification interrupt by default
  x86 thermal: Delete power-limit-notification console messages
2013-07-03 11:16:09 -07:00
Linus Torvalds
7c6809ff2b Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV update from Ingo Molnar:
 "There's a single commit in this tree, which adds support for a new SGI
  UV GRU (Global Reference Unit - fast NUMA messaging ASIC) hardware
  feature to scale up and beyond: an optional distributed mode that will
  allow per-node address mapping of local GRU space, as opposed to
  mapping all GRU hardware to the same contiguous high space"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/UV: Add GRU distributed mode mappings
2013-07-02 16:33:05 -07:00
Linus Torvalds
96a3d998fb Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tracing updates from Ingo Molnar:
 "This tree adds IRQ vector tracepoints that are named after the handler
  and which output the vector #, based on a zero-overhead approach that
  relies on changing the IDT entries, by Seiji Aguchi.

  The new tracepoints look like this:

   # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry                      [Tracepoint event]
    irq_vectors:local_timer_exit                       [Tracepoint event]
    irq_vectors:reschedule_entry                       [Tracepoint event]
    irq_vectors:reschedule_exit                        [Tracepoint event]
    irq_vectors:spurious_apic_entry                    [Tracepoint event]
    irq_vectors:spurious_apic_exit                     [Tracepoint event]
    irq_vectors:error_apic_entry                       [Tracepoint event]
    irq_vectors:error_apic_exit                        [Tracepoint event]
   [...]"

* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing: Add config option checking to the definitions of mce handlers
  trace,x86: Do not call local_irq_save() in load_current_idt()
  trace,x86: Move creation of irq tracepoints from apic.c to irq.c
  x86, trace: Add irq vector tracepoints
  x86: Rename variables for debugging
  x86, trace: Introduce entering/exiting_irq()
  tracing: Add DEFINE_EVENT_FN() macro
2013-07-02 16:31:49 -07:00
Linus Torvalds
3045f94a20 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this tree are:

   - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen
     Gong
   - misc MCE fixes/cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Update MCE severity condition check
  mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
  ACPI/APEI: Update einj documentation for param1/param2
  ACPI/APEI: Add parameter check before error injection
  ACPI, APEI, EINJ: Fix error return code in einj_init()
  x86, mce: Fix "braodcast" typo
2013-07-02 16:30:46 -07:00
Linus Torvalds
1982269a5c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc improvements:

   - Fix /proc/mtrr reporting
   - Fix ioremap printout
   - Remove the unused pvclock fixmap entry on 32-bit
   - misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Correct function name output
  x86: Fix /proc/mtrr with base/size more than 44bits
  ix86: Don't waste fixmap entries
  x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
  x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02 16:29:05 -07:00
Linus Torvalds
fdd78889aa Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading update from Ingo Molnar:
 "Two main changes that improve microcode loading on AMD CPUs:

   - Add support for all-in-one binary microcode files that concatenate
     the microcode images of multiple processor families, by Jacob Shin

   - Add early microcode loading (embedded in the initrd) support, also
     by Jacob Shin"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, amd: Another early loading fixup
  x86, microcode, amd: Allow multiple families' bin files appended together
  x86, microcode, amd: Make find_ucode_in_initrd() __init
  x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
  x86, microcode, amd: Early microcode patch loading support for AMD
  x86, microcode, amd: Refactor functions to prepare for early loading
  x86, microcode: Vendor abstract out save_microcode_in_initrd()
  x86, microcode, intel: Correct typo in printk
2013-07-02 16:28:10 -07:00
Linus Torvalds
d652df0b2f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU changes from Ingo Molnar:
 "There are two bigger changes in this tree:

   - Add an [early-use-]safe static_cpu_has() variant and other
     robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
     configurable debugging facility, motivated by recent obscure FPU
     code bugs, by Borislav Petkov

   - Reimplement FPU detection code in C and drop the old asm code, by
     Peter Anvin."

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Use static_cpu_has_safe before alternatives
  x86: Add a static_cpu_has_safe variant
  x86: Sanity-check static_cpu_has usage
  x86, cpu: Add a synthetic, always true, cpu feature
  x86: Get rid of ->hard_math and all the FPU asm fu
2013-07-02 16:26:44 -07:00
Linus Torvalds
55a0d3ff60 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug update from Ingo Molnar:
 "Misc debuggability improvements:

   - Optimize the x86 CPU register printout a bit
   - Expose the tboot TXT log via debugfs
   - Small do_debug() cleanup"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tboot: Provide debugfs interfaces to access TXT log
  x86: Remove weird PTR_ERR() in do_debug
  x86/debug: Only print out DR registers if they are not power-on defaults
2013-07-02 16:25:06 -07:00
Linus Torvalds
35c23d5d79 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes:

   - Extend 32-bit double fault debugging aid to 64-bit
   - Fix a build warning"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up last long-standing warning
  x86: Extend #DF debugging aid to 64-bit
2013-07-02 16:24:24 -07:00
Linus Torvalds
57935b262c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc x86 cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
  x86, cleanups: Remove extra tab in __flush_tlb_one()
  x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
2013-07-02 16:23:50 -07:00
Linus Torvalds
002e44bfb5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull asm/x86 changes from Ingo Molnar:
 "Misc changes, with a bigger processor-flags cleanup/reorganization by
  Peter Anvin"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, asm, cleanup: Replace open-coded control register values with symbolic
  x86, processor-flags: Fix the datatypes and add bit number defines
  x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  linux/const.h: Add _BITUL() and _BITULL()
  x86/vdso: Convert use of typedef ctl_table to struct ctl_table
  x86: __force_order doesn't need to be an actual variable
2013-07-02 16:21:45 -07:00
Linus Torvalds
f0bb4c0ab0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel improvements:

   - watchdog driver improvements by Li Zefan
   - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
   - event multiplexing via hrtimers and other improvements by Stephane
     Eranian
   - kernel stack use optimization by Andrew Hunter
   - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
   - NMI handling rate-limits by Dave Hansen
   - various hw_breakpoint fixes by Oleg Nesterov
   - hw_breakpoint overflow period sampling and related signal handling
     fixes by Jiri Olsa
   - Intel Haswell PMU support by Andi Kleen

  Tooling improvements:

   - Reset SIGTERM handler in workload child process, fix from David
     Ahern.
   - Makefile reorganization, prep work for Kconfig patches, from Jiri
     Olsa.
   - Add automated make test suite, from Jiri Olsa.
   - Add --percent-limit option to 'top' and 'report', from Namhyung
     Kim.
   - Sorting improvements, from Namhyung Kim.
   - Expand definition of sysfs format attribute, from Michael Ellerman.

  Tooling fixes:

   - 'perf tests' fixes from Jiri Olsa.
   - Make Power7 CPI stack events available in sysfs, from Sukadev
     Bhattiprolu.
   - Handle death by SIGTERM in 'perf record', fix from David Ahern.
   - Fix printing of perf_event_paranoid message, from David Ahern.
   - Handle realloc failures in 'perf kvm', from David Ahern.
   - Fix divide by 0 in variance, from David Ahern.
   - Save parent pid in thread struct, from David Ahern.
   - Handle JITed code in shared memory, from Andi Kleen.
   - Fixes for 'perf diff', from Jiri Olsa.
   - Remove some unused struct members, from Jiri Olsa.
   - Add missing liblk.a dependency for python/perf.so, fix from Jiri
     Olsa.
   - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
   - No need to do locking when adding hists in perf report, only 'top'
     needs that, from Namhyung Kim.
   - Fix alignment of symbol column in in the hists browser (top,
     report) when -v is given, from NAmhyung Kim.
   - Fix 'perf top' -E option behavior, from Namhyung Kim.
   - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
   - Fix compile errors in bp_signal 'perf test', from Sukadev
     Bhattiprolu.

  ... and more things"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
  perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
  perf/x86: Fix shared register mutual exclusion enforcement
  perf/x86/intel: Support full width counting
  x86: Add NMI duration tracepoints
  perf: Drop sample rate when sampling is too slow
  x86: Warn when NMI handlers take large amounts of time
  hw_breakpoint: Introduce "struct bp_cpuinfo"
  hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  hw_breakpoint: Introduce cpumask_of_bp()
  hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  perf/x86/intel: Add mem-loads/stores support for Haswell
  perf/x86/intel: Support Haswell/v4 LBR format
  perf/x86/intel: Move NMI clearing to end of PMI handler
  perf/x86/intel: Add Haswell PEBS support
  perf/x86/intel: Add simple Haswell PMU support
  perf/x86/intel: Add Haswell PEBS record support
  perf/x86/intel: Fix sparse warning
  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
  perf/x86/amd: Add IOMMU Performance Counter resource management
  ...
2013-07-02 16:15:23 -07:00
Seiji Aguchi
4787c368a9 x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()
Reschedule vector tracepoints may be called in cpu idle state.
This causes lockdep check warning below.

The tracepoint requires rcu but for accuracy it also
requires irq_enter() (tracepoints record the irq context), thus,
the tracepoint interrupt handler should be calling irq_enter()
and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).

So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
with common pre/post processing functions, smp_entering_irq()
and exiting_irq() (exiting_irq() calls just irq_exit()
 in arch/x86/include/asm/apic.h),
because these can be shared among reschedule, call_function,
and call_function_single vectors.

[   50.720557] Testing event reschedule_exit:
[   50.721349]
[   50.721502] ===============================
[   50.721835] [ INFO: suspicious RCU usage. ]
[   50.722169] 3.10.0-rc6-00004-gcf910e8 #190 Not tainted
[   50.722582] -------------------------------
[   50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage!
[   50.723770]
[   50.723770] other info that might help us debug this:
[   50.723770]
[   50.724385]
[   50.724385] RCU used illegally from idle CPU!
[   50.724385] rcu_scheduler_active = 1, debug_locks = 0
[   50.725232] RCU used illegally from extended quiescent state!
[   50.725690] no locks held by swapper/0/0.
[   50.726010]
[   50.726010] stack backtrace:
[...]

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/51CDCFA3.9080101@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-02 09:52:31 +02:00
Wang YanQing
237d154854 x86: Fix override new_cpu_data.x86 with 486
We should set X86 to 486 before use cpuid to detect the cpu type, if
we set X86 to 486 after cpuid, then we will get 486 until cpu_detect
runs.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Link: http://lkml.kernel.org/r/20130628144516.GA2177@udknight
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:27:29 -07:00
H. Peter Anvin
9f84b6267c Merge remote-tracking branch 'origin/x86/fpu' into queue/x86/cpu
Use the union of 3.10 x86/cpu and x86/fpu as baseline.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:26:17 -07:00
Rafael J. Wysocki
3b4550e0e0 Merge branch 'acpi-pm'
* acpi-pm:
  ACPI / PM: Rework and clean up acpi_dev_pm_get_state()
  ACPI / PM: Replace ACPI_STATE_D3 with ACPI_STATE_D3_COLD in device_pm.c
  ACPI / PM: Rename function acpi_device_power_state() and make it static
  ACPI / PM: acpi_processor_suspend() can be static
  xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
  x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
2013-06-28 12:58:30 +02:00
Ingo Molnar
fb476cffd5 Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull MCE cleanup from Tony Luck:

 "Changes to simplify the SDM means that we can also simplify
  the code for SRAR (software recoverable action required) errors."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:09:26 +02:00
Qiaowei Ren
13bfd47a0e x86/tboot: Provide debugfs interfaces to access TXT log
These logs come from tboot (Trusted Boot, an open source,
pre-kernel/VMM module that uses Intel TXT to perform a
measured and verified launch of an OS kernel/VMM.).

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gang Wei <gang.wei@intel.com>
Link: http://lkml.kernel.org/r/1372053333-21788-1-git-send-email-qiaowei.ren@intel.com
[ Beautified the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:05:16 +02:00
Chen Gong
33d7885b59 x86/mce: Update MCE severity condition check
Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27 13:45:52 -07:00
Dave Airlie
4300a0f8bd Merge tag 'v3.10-rc7' into drm-next
Linux 3.10-rc7

The sdvo lvds fix in this -fixes pull

commit c3456fb3e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jun 10 09:47:58 2013 +0200

    drm/i915: prefer VBT modes for SVDO-LVDS over EDID

has a silent functional conflict with

commit 990256aec2
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri May 31 12:17:07 2013 +0000

    drm: Add probed modes in probe order

in drm-next. W simply need to add the vbt modes before edid modes, i.e. the
other way round than now.

Conflicts:
	drivers/gpu/drm/drm_prime.c
	drivers/gpu/drm/i915/intel_sdvo.c
2013-06-27 20:40:44 +10:00
Jacob Shin
9608d33b82 x86, microcode, amd: Another early loading fixup
commit cd1c32ca96 is an early premature
rendition of the patch. Augment it with this delta patch to:
  * correctly mark offset and size of the matching bin file
  * use __pa instead of __pa_nodebug during AP load
  * check for !initrd_start before using it

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130620152414.GA6676@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-26 14:55:37 -07:00
Stephane Eranian
983433b581 perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable()
are symmetrical w.r.t. PEBS-LL and precise store.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:51 +02:00
Stephane Eranian
2f7f73a520 perf/x86: Fix shared register mutual exclusion enforcement
This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.

There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:

group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE

The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.

This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:49 +02:00
Linus Torvalds
54faf77d06 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
  hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
  kprobes: Fix arch_prepare_kprobe to handle copy insn failures
2013-06-26 08:51:44 -10:00
Andi Kleen
069e0c3c40 perf/x86/intel: Support full width counting
Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.

Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)

When the new capability is set use the alternative range which
do not have these restrictions.

This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.

( See the patch "perf/x86/intel: Avoid checkpointed counters
  causing excessive TSX aborts" for more details. )

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 11:59:25 +02:00
Ingo Molnar
ca02c21674 Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull MCE updates from Tony Luck:

 "Better comments so we understand our existing machine check
  bank bitmaps - prelude to adding another bitmap soon."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 10:53:45 +02:00
H. Peter Anvin
a3d7b7dddc x86, asm, cleanup: Replace open-coded control register values with symbolic
Clean up an unnecessary open-coded control register values.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
2013-06-25 16:26:06 -07:00
H. Peter Anvin
1adfa76a95 x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
actually tries to explain what it is all about, rather than being a
tautology.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
2013-06-25 16:25:32 -07:00
Naveen N. Rao
0644414e62 mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25 13:53:27 -07:00
Yinghai Lu
d5c78673b1 x86: Fix /proc/mtrr with base/size more than 44bits
On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-25 13:08:10 -07:00
H. Peter Anvin
5236eb968e Merge remote-tracking branch 'trace/tip/x86/trace' into x86/trace
Fix from Steven Rostedt.
2013-06-24 11:01:09 -07:00
Grant Likely
ddaf144c61 irqdomain: Refactor irq_domain_associate_many()
Originally, irq_domain_associate_many() was designed to unwind the
mapped irqs on a failure of any individual association. However, that
proved to be a problem with certain IRQ controllers. Some of them only
support a subset of irqs, and will fail when attempting to map a
reserved IRQ. In those cases we want to map as many IRQs as possible, so
instead it is better for irq_domain_associate_many() to make a
best-effort attempt to map irqs, but not fail if any or all of them
don't succeed. If a caller really cares about how many irqs got
associated, then it should instead go back and check that all of the
irqs is cares about were mapped.

The original design open-coded the individual association code into the
body of irq_domain_associate_many(), but with no longer needing to
unwind associations, the code becomes simpler to split out
irq_domain_associate() to contain the bulk of the logic, and
irq_domain_associate_many() to be a simple loop wrapper.

This patch also adds a new error check to the associate path to make
sure it isn't called for an irq larger than the controller can handle,
and adds locking so that the irq_domain_mutex is held while setting up a
new association.

v3: Fixup missing change to irq_domain_add_tree()
v2: Fixup x86 warning. irq_domain_associate_many() no longer returns an
    error code, but reports errors to the printk log directly. In the
    majority of cases we don't actually want to fail if there is a
    problem, but rather log it and still try to boot the system.

Signed-off-by: Grant Likely <grant.likely@linaro.org>

irqdomain: Fix flubbed irq_domain_associate_many refactoring

commit d39046ec72, "irqdomain: Refactor irq_domain_associate_many()" was
missing the following hunk which causes a boot failure on anything using
irq_domain_add_tree() to allocate an irq domain.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2013-06-24 14:01:42 +01:00
Dave Hansen
0c4df02d73 x86: Add NMI duration tracepoints
This patch has been invaluable in my adventures finding
issues in the perf NMI handler.  I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.

Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:58 +02:00
Dave Hansen
14c63f17b1 perf: Drop sample rate when sampling is too slow
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:57 +02:00
Dave Hansen
2ab00456ea x86: Warn when NMI handlers take large amounts of time
I have a system which is causing all kinds of problems.  It has
8 NUMA nodes, and lots of cores that can fight over cachelines.
If things are not working _perfectly_, then NMIs can take longer
than expected.

If we get too many of them backed up to each other, we can
easily end up in a situation where we are doing nothing *but*
running NMIs.  The biggest problem, though, is that this happens
_silently_.  You might be lucky to get an hrtimer warning, but
most of the time system simply hangs.

This patch should at least give us some warning before we fall
off the cliff.  the warnings look like this:

	nmi_handle: perf_event_nmi_handler() took: 26095071 ns

The message is triggered whenever we notice the longest NMI
we've seen to date.  You can always view and reset this value
via the debugfs interface if you like.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:56 +02:00
Seiji Aguchi
33e5ff634f x86/tracing: Add config option checking to the definitions of mce handlers
In case CONFIG_X86_MCE_THRESHOLD and CONFIG_X86_THERMAL_VECTOR
are disabled, kernel build fails as follows.

   arch/x86/built-in.o: In function `trace_threshold_interrupt':
   (.entry.text+0x122b): undefined reference to `smp_trace_threshold_interrupt'
   arch/x86/built-in.o: In function `trace_thermal_interrupt':
   (.entry.text+0x132b): undefined reference to `smp_trace_thermal_interrupt'

In this case, trace_threshold_interrupt/trace_thermal_interrupt
are not needed to define.

So, add config option checking to their definitions in entry_64.S.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/51C58B8A.2080808@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:41:36 +02:00
Steven Rostedt (Red Hat)
2b4bc78956 trace,x86: Do not call local_irq_save() in load_current_idt()
As load_current_idt() is now what is used to update the IDT for the
switches needed for NMI, lockdep debug, and for tracing, it must not
call local_irq_save(). This is because one of the users of this is
lockdep, which does tracing of local_irq_save() and when the debug
trap is hit, we need to update the IDT before tracing interrupts
being disabled. As load_current_idt() is used to do this, calling
local_irq_save() which lockdep traces, defeats the point of calling
load_current_idt().

As interrupts are already disabled when used by lockdep and NMI, the
only other user is tracing that can disable interrupts itself. Simply
have the tracing update disable interrupts before calling load_current_idt()
instead of breaking the other users.

Here's the dump that happened:

------------[ cut here ]------------
WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398()
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
Modules linked in:
CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90
 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011
 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0
Call Trace:
 [<ffffffff8192822b>] dump_stack+0x19/0x1b
 [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80
 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff810341f7>] copy_process+0x2c3/0x1398
 [<ffffffff8103539d>] do_fork+0xa8/0x260
 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f
 [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff810355cf>] SyS_clone+0x16/0x18
 [<ffffffff81938369>] stub_clone+0x69/0x90
 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 8b157a9d20ca1aa2 ]---

in fork.c:

 #ifdef CONFIG_PROVE_LOCKING
	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here
	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 13:16:19 -04:00
Linus Torvalds
f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Linus Torvalds
9d0be540d7 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "Three one-line fixes for my first pull request; one for x86 host, one
  for x86 guest, one for PPC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: kvmclock: zero initialize pvclock shared memory area
  kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
  KVM: x86: remove vcpu's CPL check in host-invoked XCR set
2013-06-21 06:29:22 -10:00