Commit Graph

9054 Commits

Author SHA1 Message Date
Joerg Roedel
94fe79e2f1 x86/amd-iommu: Move inv-dte command building to own function
This patch moves command building for the invalidate-dte
command into its own function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-04-06 11:44:15 +02:00
Joerg Roedel
ded467374a x86/amd-iommu: Move compl-wait command building to own function
This patch introduces a seperate function for building
completion-wait commands.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-04-06 10:53:48 +02:00
Matthew Garrett
660e34cebf x86: Reorder reboot method preferences
We have a never ending stream of 'reboot quirks' for new boxes
that will not reboot properly under Linux (they will hang on
reboot).

The reason is widespread 'Windows compatible' assumption of modern
x86 hardware, which expects the following reboot sequence:

 - hitting the ACPI reboot vector (if available)
 - trying the keyboard controller
 - hitting the ACPI reboot vector again
 - then giving the keyboard controller one last go

This sequence expectation gets more and more embedded in modern
hardware, which often lacks a keyboard controller and may even
lock up if the legacy io ports are hit - and which hardware is
often not tested with Linux during development.

The end result is that reboot works under Windows-alike OSs but not
under Linux.

Rework our reboot process to meet this hardware externality a little
better and match this assumption of newer x86 hardware.

In addition to the ACPI,kbd,ACPI,kbd sequence we'll still fall
through to attempting a legacy triple fault if nothing else
works - and keep trying that and the kbd reset.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
[ this commit will also save special casing Oaktrail boards ]
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Leann Ogasawara <leann.ogasawara@canonical.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <1301939705-2404-1-git-send-email-mjg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-06 10:36:50 +02:00
Jason Baron
d430d3d7e6 jump label: Introduce static_branch() interface
Introduce:

static __always_inline bool static_branch(struct jump_label_key *key);

instead of the old JUMP_LABEL(key, label) macro.

In this way, jump labels become really easy to use:

Define:

        struct jump_label_key jump_key;

Can be used as:

        if (static_branch(&jump_key))
                do unlikely code

enable/disale via:

        jump_label_inc(&jump_key);
        jump_label_dec(&jump_key);

that's it!

For the jump labels disabled case, the static_branch() becomes an
atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
atomic_dec() operations. We show testing results for this change below.

Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.

Since we now require a 'struct jump_label_key *key', we can store a pointer into
the jump table addresses. In this way, we can enable/disable jump labels, in
basically constant time. This change allows us to completely remove the previous
hashtable scheme. Thanks to Peter Zijlstra for this re-write.

Testing:

I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
configurations, where tracepoints were disabled.

jump label configured in
avg: 815.6

jump label *not* configured in (using atomic reads)
avg: 800.1

jump label *not* configured in (regular reads)
avg: 803.4

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110316212947.GA8792@redhat.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Suggested-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-04 12:48:08 -04:00
Linus Torvalds
d7c764c4c7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Fix kdump reboot
  x86, amd-nb: Rename CPU PCI id define for F4
  sound: Add delay.h to sound/soc/codecs/sn95031.c
  x86, mtrr, pat: Fix one cpu getting out of sync during resume
  x86, microcode: Unregister syscore_ops after microcode unloaded
  x86: Stop including <linux/delay.h> in two asm header files
2011-04-04 08:37:45 -07:00
Linus Torvalds
fb9a7d76da Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: create new rcu_access_index() and use in mce
  WARN_ON_SMP(): Add comment to explain ({0;})
2011-04-04 08:36:15 -07:00
Rakib Mullick
64d21fc194 x86, mpparse: Put check_slot() into .init section
check_slot() is only called from replace_intsrc_all() - which is
in the .init section.

So, put check_slot into the .init section as well, so it can be freed
after system boot.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <AANLkTing52ntzRcHkODCWDKOfRF=0uhXw5-cCUhx6M54@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-04 17:17:19 +02:00
Ingo Molnar
9402efaa96 Merge branch 'x86-mm' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into x86/mm 2011-04-04 16:39:20 +02:00
Paul E. McKenney
a4dd99250d rcu: create new rcu_access_index() and use in mce
The MCE subsystem needs to sample an RCU-protected index outside of
any protection for that index.  If this was a pointer, we would use
rcu_access_pointer(), but there is no corresponding rcu_access_index().
This commit therefore creates an rcu_access_index() and applies it
to MCE.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
2011-04-01 07:27:31 -07:00
Cliff Wickman
818987e9a1 x86, UV: Fix kdump reboot
After a crash dump on an SGI Altix UV system the crash kernel
fails to cause a reboot.  EFI mode is disabled in the kdump
kernel, so only the reboot_type of BOOT_ACPI works.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: rja@sgi.com
LKML-Reference: <E1Q5Iuo-00013b-UK@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 18:44:03 +02:00
Borislav Petkov
cb6c8520f6 x86, amd-nb: Rename CPU PCI id define for F4
With increasing number of PCI function ids, add the PCI function
id in the define name instead of its symbolic name in the BKDG
for more clarity. This renames function 4 define.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <20110330183447.GA3668@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 08:51:38 +02:00
Suresh Siddha
84ac7cdbdd x86, mtrr, pat: Fix one cpu getting out of sync during resume
On laptops with core i5/i7, there were reports that after resume
graphics workloads were performing poorly on a specific AP, while
the other cpu's were ok. This was observed on a 32bit kernel
specifically.

Debug showed that the PAT init was not happening on that AP
during resume and hence it contributing to the poor workload
performance on that cpu.

On this system, resume flow looked like this:

1. BP starts the resume sequence and we reinit BP's MTRR's/PAT
   early on using mtrr_bp_restore()

2. Resume sequence brings all AP's online

3. Resume sequence now kicks off the MTRR reinit on all the AP's.

4. For some reason, between point 2 and 3, we moved from BP
   to one of the AP's. My guess is that printk() during resume
   sequence is contributing to this. We don't see similar
   behavior with the 64bit kernel but there is no guarantee that
   at this point the remaining resume sequence (after AP's bringup)
   has to happen on BP.

5. set_mtrr() was assuming that we are still on BP and skipped the
   MTRR/PAT init on that cpu (because of 1 above)

6. But we were on an AP and this led to not reprogramming PAT
   on this cpu leading to bad performance.

Fix this by doing unconditional mtrr_if->set_all() in set_mtrr()
during MTRR/PAT init. This might be unnecessary if we are still
running on BP. But it is of no harm and will guarantee that after
resume, all the cpu's will be in sync with respect to the
MTRR/PAT registers.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org	[v2.6.32+]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-03-29 16:17:42 -07:00
Thomas Gleixner
86cc8dfc21 x86: apb_timer: Fixup genirq fallout
The lonely user of the internal interface was not in the coccinelle
script.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-30 00:13:30 +02:00
Xiaotian Feng
4ac5fc6a3e x86, microcode: Unregister syscore_ops after microcode unloaded
Currently, microcode doesn't unregister syscore_ops after it's
unloaded. So if we modprobe then rmmod microcode, the stale
microcode syscore_ops info will stay on syscore_ops_list.

Later when we're trying to reboot/halt/shutdown the machine, kernel
will panic on syscore_shutdown().

With the patch applied, I can reboot/halt/shutdown my machine successfully.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1301387672-23661-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-29 11:12:04 +02:00
Christoph Lameter
fe5042138b x86: Use this_cpu_has for thermal_interrupt current cpu
It is more effective to use a segment prefix instead of calculating the
address of the current cpu area amd then testing flags.

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-29 10:18:30 +02:00
Christoph Lameter
349c004e3d x86: A fast way to check capabilities of the current cpu
Add this_cpu_has() which determines if the current cpu has a certain
ability using a segment prefix and a bit test operation.

For that we need to add bit operations to x86s percpu.h.

Many uses of cpu_has use a pointer passed to a function to determine
the current flags. That is no longer necessary after this patch.

However, this patch only converts the straightforward cases where
cpu_has is used with this_cpu_ptr. The rest is work for later.

-tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
     of percpu_read_stable().

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-29 10:18:30 +02:00
Jean Delvare
ca444564a9 x86: Stop including <linux/delay.h> in two asm header files
Stop including <linux/delay.h> in x86 header files which don't
need it. This will let the compiler complain when this header is
not included by source files when it should, so that
contributors can fix the problem before building on other
architectures starts to fail.

Credits go to Geert for the idea.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: James E.J. Bottomley <James.Bottomley@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20110325152014.297890ec@endymion.delvare>
[ this also fixes an upstream build bug in drivers/media/rc/ite-cir.c ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-29 09:37:42 +02:00
Cyrill Gorcunov
1d321881af perf, x86: P4 PMU - clean up the code a bit
No change on the functional level, just align the table properly.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <4D8FA213.5050108@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-29 09:36:33 +02:00
Linus Torvalds
16c29dafcc Merge branch 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  Introduce ARCH_NO_SYSDEV_OPS config option (v2)
  cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
  KVM: Use syscore_ops instead of sysdev class and sysdev
  PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
  timekeeping: Use syscore_ops instead of sysdev class and sysdev
  x86: Use syscore_ops instead of sysdev classes and sysdevs
2011-03-25 21:07:59 -07:00
Linus Torvalds
95e14ed7fc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kdb: add usage string of 'per_cpu' command
  kgdb,x86_64: fix compile warning found with sparse
  kdb: code cleanup to use macro instead of value
  kgdboc,kgdbts: strlen() doesn't count the terminator
2011-03-25 21:04:56 -07:00
Linus Torvalds
2a20b02c05 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
  perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows
  perf symbols: Look at .dynsym again if .symtab not found
  perf build-id: Add quirk to deal with perf.data file format breakage
  perf session: Pass evsel in event_ops->sample()
  perf: Better fit max unprivileged mlock pages for tools needs
  perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()
  perf top: Fix uninitialized 'counter' variable
  tracing: Fix set_ftrace_filter probe function display
  perf, x86: Fix Intel fixed counters base initialization
2011-03-25 17:53:09 -07:00
Linus Torvalds
94df491c4a Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Fix WARN_ON() test for UP
  WARN_ON_SMP(): Allow use in if() statements on UP
  x86, dumpstack: Use %pB format specifier for stack trace
  vsprintf: Introduce %pB format specifier
  lockdep: Remove unused 'factor' variable from lockdep_stats_show()
2011-03-25 17:52:22 -07:00
Linus Torvalds
26ff6801f7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional
  x86: DT: Fix return condition in irq_create_of_mapping()
  x86, mpparse: Move check_slot into CONFIG_X86_IO_APIC context
2011-03-25 17:51:51 -07:00
Jason Wessel
21431c2900 kgdb,x86_64: fix compile warning found with sparse
Fix sparse warning:

arch/x86/kernel/kgdb.c:123:9: warning: switch with no cases

Reported-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2011-03-25 16:37:31 -05:00
Ingo Molnar
45daae575e perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
Eric Dumazet reported that hardware PMU events do not work on his
system, due to the BIOS corrupting PMU state:

    Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
    [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)

Linus suggested that we continue in the face of such BIOS-induced CPU
state corruption:

   http://lkml.org/lkml/2011/3/24/608

Such BIOSes will have to be fixed - Linux developers rely on a working and
fully capable PMU and the BIOS interfering with the CPU's PMU state is simply
not acceptable.

So this patch changes perf to continue when it detects such BIOS
interaction, some hardware events may be unreliable due to the BIOS
writing and re-writing them - there's not much the kernel can do
about that but to detect the corruption and report it.

Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-25 11:23:41 +01:00
Thomas Gleixner
07611dbda5 x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional
That call escaped the name space cleanup. Fix it up.

We really want to call there. The chip might have changed since the
irq was setup initially. So let the core code and the chip decide what
to do. The status is just an unreliable snapshot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2011-03-24 23:17:56 +01:00
Thomas Gleixner
00a30b254b x86: DT: Fix return condition in irq_create_of_mapping()
The xlate() function returns 0 or a negative error code. Returning the
error code blindly will be seen as an huge irq number by the calling
function because irq_create_of_mapping() returns an unsigned value.

Return 0 (NO_IRQ) as required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2011-03-24 23:17:56 +01:00
Don Zickus
242214f9c1 perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows
The read of a proper MSR register was missed and instead of
counter the configration register was tested (it has
ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI
hitting the system. As result the user may obtain "Dazed and
confused, but trying to continue" message. Fix it by reading a
proper MSR register.

When an NMI happens on a P4, the perf nmi handler checks the
configuration register to see if the overflow bit is set or not
before taking appropriate action.  Unfortunately, various P4
machines had a broken overflow bit, so a backup mechanism was
implemented.  This mechanism checked to see if the counter
rolled over or not.

A previous commit that implemented this backup mechanism was
broken. Instead of reading the counter register, it used the
configuration register to determine if the counter rolled over
or not. Reading that bit would give incorrect results.

This would lead to 'Dazed and confused' messages for the end
user when using the perf tool (or if the nmi watchdog is
running).

The fix is to read the counter register before determining if
the counter rolled over or not.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <4D8BAB49.3080701@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-24 21:40:01 +01:00
Tejun Heo
0415b00d17 percpu: Always align percpu output section to PAGE_SIZE
Percpu allocator honors alignment request upto PAGE_SIZE and both the
percpu addresses in the percpu address space and the translated kernel
addresses should be aligned accordingly.  The calculation of the
former depends on the alignment of percpu output section in the kernel
image.

The linker script macros PERCPU_VADDR() and PERCPU() are used to
define this output section and the latter takes @align parameter.
Several architectures are using @align smaller than PAGE_SIZE breaking
percpu memory alignment.

This patch removes @align parameter from PERCPU(), renames it to
PERCPU_SECTION() and makes it always align to PAGE_SIZE.  While at it,
add PCPU_SETUP_BUG_ON() checks such that alignment problems are
reliably detected and remove percpu alignment comment recently added
in workqueue.c as the condition would trigger BUG way before reaching
there.

For um, this patch raises the alignment of percpu area.  As the area
is in .init, there shouldn't be any noticeable difference.

This problem was discovered by David Howells while debugging boot
failure on mn10300.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
2011-03-24 18:50:09 +01:00
Linus Torvalds
047f61c5d1 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (42 commits)
  ACPI: minor printk format change in acpi_pad
  ACPI: make acpi_pad /sys output more readable
  ACPICA: Update version to 20110316
  ACPICA: Header support for SLIC table
  ACPI: Make sure the FADT is at least rev 2 before using the reset register
  ACPI: Bug compatibility for Windows on the ACPI reboot vector
  ACPICA: Fix access width for reset vector
  ACPI battery: fribble sysfs files from a resume notifier
  ACPI button: remove unused procfs I/F
  ACPI, APEI, Add PCIe AER error information printing support
  PCIe, AER, use pre-generated prefix in error information printing
  ACPI, APEI, Add ERST record ID cache
  ACPI: Use syscore_ops instead of sysdev class and sysdev
  ACPI: Remove the unused EC sysdev class
  ACPI: use __cpuinit for the acpi_processor_set_pdc() call tree
  ACPI: use __init where possible in processor driver
  Thermal_Framework-Fix_crash_during_hwmon_unregister
  ACPICA: Update version to 20110211.
  ACPICA: Add mechanism to defer _REG methods for some installed handlers
  ACPICA: Add support for FunctionalFixedHW in acpi_ut_get_region_name
  ...
2011-03-24 08:25:15 -07:00
Rakib Mullick
9f1f1bfd8d x86, mpparse: Remove unnecessary variable
'ret' isn't used by check_slot(), gets initialized but has no real use,
so remove it.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <AANLkTikh3y+is3xixKBdyHhr_cHxzPFJF729Fcvt8+Ns@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-24 09:05:40 +01:00
Namhyung Kim
71f9e59800 x86, dumpstack: Use %pB format specifier for stack trace
Improve noreturn function entries in call traces:

Before:

 Call Trace:
  [<ffffffff812a8502>] panic+0x8c/0x18d
  [<ffffffffa000012a>] deep01+0x0/0x38 [test_panic]  <--- bad
  [<ffffffff81104666>] proc_file_write+0x73/0x8d
  [<ffffffff811000b3>] proc_reg_write+0x8d/0xac
  [<ffffffff810c7d32>] vfs_write+0xa1/0xc5
  [<ffffffff810c7e0f>] sys_write+0x45/0x6c
  [<ffffffff8f02943b>] system_call_fastpath+0x16/0x1b

After:

 Call Trace:
  [<ffffffff812bce69>] panic+0x8c/0x18d
  [<ffffffffa000012a>] panic_write+0x20/0x20 [test_panic] <--- good
  [<ffffffff81115fab>] proc_file_write+0x73/0x8d
  [<ffffffff81111a5f>] proc_reg_write+0x8d/0xac
  [<ffffffff810d90ee>] vfs_write+0xa1/0xc5
  [<ffffffff810d91cb>] sys_write+0x45/0x6c
  [<ffffffff812c07fb>] system_call_fastpath+0x16/0x1b

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1300934550-21394-2-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-24 08:36:10 +01:00
Linus Torvalds
b81a618dcd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  deal with races in /proc/*/{syscall,stack,personality}
  proc: enable writing to /proc/pid/mem
  proc: make check_mem_permission() return an mm_struct on success
  proc: hold cred_guard_mutex in check_mem_permission()
  proc: disable mem_write after exec
  mm: implement access_remote_vm
  mm: factor out main logic of access_process_vm
  mm: use mm_struct to resolve gate vma's in __get_user_pages
  mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm
  mm: arch: make in_gate_area take an mm_struct instead of a task_struct
  mm: arch: make get_gate_vma take an mm_struct instead of a task_struct
  x86: mark associated mm when running a task in 32 bit compatibility mode
  x86: add context tag to mark mm when running a task in 32-bit compatibility mode
  auxv: require the target to be tracable (or yourself)
  close race in /proc/*/environ
  report errors in /proc/*/*map* sanely
  pagemap: close races with suid execve
  make sessionid permissions in /proc/*/task/* match those in /proc/*
  fix leaks in path_lookupat()

Fix up trivial conflicts in fs/proc/base.c
2011-03-23 20:51:42 -07:00
Olaf Hering
93a72052be crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn
The Xen PV drivers in a crashed HVM guest can not connect to the dom0
backend drivers because both frontend and backend drivers are still in
connected state.  To run the connection reset function only in case of a
crashdump, the is_kdump_kernel() function needs to be available for the PV
driver modules.

Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel()
usable for modules.

Leave 'elfcorehdr' as early_param().  This changes powerpc from __setup()
to early_param().  It adds an address range check from x86 also on ia64
and powerpc.

[akpm@linux-foundation.org: additional #includes]
[akpm@linux-foundation.org: remove elfcorehdr_addr export]
[akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:47:19 -07:00
Rafael J. Wysocki
f3c6ea1b06 x86: Use syscore_ops instead of sysdev classes and sysdevs
Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose.  This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects for
this purpose instead.

Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c).  In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
2011-03-23 22:15:54 +01:00
Stephen Wilson
375906f876 x86: mark associated mm when running a task in 32 bit compatibility mode
This patch simply follows the same practice as for setting the TIF_IA32 flag.
In particular, an mm is marked as holding 32-bit tasks when a 32-bit binary is
exec'ed.  Both ELF and a.out formats are updated.

Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-23 16:36:53 -04:00
Len Brown
02e2407858 Merge branch 'linus' into release
Conflicts:
	arch/x86/kernel/acpi/sleep.c

Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-23 02:34:54 -04:00
Len Brown
5c129a8600 Merge commit 'v2.6.38' into release 2011-03-23 02:33:54 -04:00
David Rientjes
4061d68e1a x86: only compile 8237A if CONFIG_ISA_DMA_API is enabled
8237A utilizes the interface provided by CONFIG_ISA_DMA_API, specifically
claim_dma_lock() and release_dma_lock().  Thus, there's a strict
dependency on the config option and the module should only be loaded if
the kernel supports ISA-style DMA.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 17:44:16 -07:00
Olaf Hering
d404ab0a11 move x86 specific oops=panic to generic code
The oops=panic cmdline option is not x86 specific, move it to generic code.
Update documentation.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 17:44:11 -07:00
Linus Torvalds
73d5a8675f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xen: update mask_rw_pte after kernel page tables init changes
  xen: set max_pfn_mapped to the last pfn mapped
  x86: Cleanup highmap after brk is concluded

Fix up trivial onflict (added header file includes) in
arch/x86/mm/init_64.c
2011-03-22 10:41:36 -07:00
Rakib Mullick
cbb84c4cc1 x86, mpparse: Move check_slot into CONFIG_X86_IO_APIC context
When CONFIG_X86_MPPARSE=y and CONFIG_X86_IO_APIC=n, then we get
the following warning:

  arch/x86/kernel/mpparse.c:723: warning: 'check_slot' defined but not used

So, put check_slot into CONFIG_X86_IO_APIC context. Its only
called from CONFIG_X86_IO_APIC=y context.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <AANLkTinsUfGc=NG_GeH_B+zFVu+DXJzZbJKdQLscqfuH@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-22 13:00:22 +01:00
Len Brown
25076246e8 Merge branch 'apei-release' into release 2011-03-22 01:41:47 -04:00
Huang Ying
885b976fad ACPI, APEI, Add ERST record ID cache
APEI ERST firmware interface and implementation has no multiple users
in mind.  For example, if there is four records in storage with ID: 1,
2, 3 and 4, if two ERST readers enumerate the records via
GET_NEXT_RECORD_ID as follow,

reader 1		reader 2
1
			2
3
			4
-1
			-1

where -1 signals there is no more record ID.

Reader 1 has no chance to check record 2 and 4, while reader 2 has no
chance to check record 1 and 3.  And any other GET_NEXT_RECORD_ID will
return -1, that is, other readers will has no chance to check any
record even they are not cleared by anyone.

This makes raw GET_NEXT_RECORD_ID not suitable for used by multiple
users.

To solve the issue, an in-memory ERST record ID cache is designed and
implemented.  When enumerating record ID, the ID returned by
GET_NEXT_RECORD_ID is added into cache in addition to be returned to
caller.  So other readers can check the cache to get all record ID
available.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-21 22:59:06 -04:00
Sage Weil
b7ed78f565 introduce sys_syncfs to sync a single file system
It is frequently useful to sync a single file system, instead of all
mounted file systems via sync(2):

 - On machines with many mounts, it is not at all uncommon for some of
   them to hang (e.g. unresponsive NFS server).  sync(2) will get stuck on
   those and may never get to the one you do care about (e.g., /).
 - Some applications write lots of data to the file system and then
   want to make sure it is flushed to disk.  Calling fsync(2) on each
   file introduces unnecessary ordering constraints that result in a large
   amount of sub-optimal writeback/flush/commit behavior by the file
   system.

There are currently two ways (that I know of) to sync a single super_block:

 - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
   mapping, which isn't usually desirable, and doesn't work for non-block
   file systems.
 - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
   current implemention.  Relying on this little-known side effect for
   something like data safety sounds foolish.

Both of these approaches require root privileges, which some applications
do not have (nor should they need?) given that sync(2) is an unprivileged
operation.

This patch introduces a new system call syncfs(2) that takes an fd and
syncs only the file system it references.  Maybe someday we can

 $ sync /some/path

and not get

 sync: ignoring all arguments

The syscall is motivated by comments by Al and Christoph at the last LSF.
syncfs(2) seems like an appropriate name given statfs(2).

A similar ioctl was also proposed a while back, see
	http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-21 00:40:29 -04:00
Yinghai Lu
e5f15b45dd x86: Cleanup highmap after brk is concluded
Now cleanup_highmap actually is in two steps: one is early in head64.c
and only clears above _end; a second one is in init_memory_mapping() and
tries to clean from _brk_end to _end.
It should check if those boundaries are PMD_SIZE aligned but currently
does not.
Also init_memory_mapping() is called several times for numa or memory
hotplug, so we really should not handle initial kernel mappings there.

This patch moves cleanup_highmap() down after _brk_end is settled so
we can do everything in one step.
Also we honor max_pfn_mapped in the implementation of cleanup_highmap.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-03-19 11:58:19 -07:00
Stephane Eranian
fc66c5210e perf, x86: Fix Intel fixed counters base initialization
The following patch solves the problems introduced by Robert's
commit 41bf498 and reported by Arun Sharma. This commit gets rid
of the base + index notation for reading and writing PMU msrs.

The problem is that for fixed counters, the new calculation for
the base did not take into account the fixed counter indexes,
thus all fixed counters were read/written from fixed counter 0.
Although all fixed counters share the same config MSR, they each
have their own counter register.

Without:

 $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds

  242202299 unhalted_core_cycles (0.00% scaling, ena=1000790892, run=1000790892)
 2389685946 instructions_retired (0.00% scaling, ena=1000790892, run=1000790892)
      49473 baclears             (0.00% scaling, ena=1000790892, run=1000790892)

With:

 $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds

 2392703238 unhalted_core_cycles (0.00% scaling, ena=1000840809, run=1000840809)
 2389793744 instructions_retired (0.00% scaling, ena=1000840809, run=1000840809)
      47863 baclears             (0.00% scaling, ena=1000840809, run=1000840809)

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ming.m.lin@intel.com
Cc: robert.richter@amd.com
Cc: asharma@fb.com
Cc: perfmon2-devel@lists.sf.net
LKML-Reference: <20110319172005.GB4978@quad>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-19 19:00:49 +01:00
Len Brown
05534c9ffc Merge branch 'acpica' into release 2011-03-18 18:06:08 -04:00
Linus Torvalds
f2e1fbb5f2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Flush TLB if PGD entry is changed in i386 PAE mode
  x86, dumpstack: Correct stack dump info when frame pointer is available
  x86: Clean up csum-copy_64.S a bit
  x86: Fix common misspellings
  x86: Fix misspelling and align params
  x86: Use PentiumPro-optimized partial_csum() on VIA C7
2011-03-18 10:45:21 -07:00
Linus Torvalds
619297855a Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
  trace, filters: Initialize the match variable in process_ops() properly
  trace, documentation: Fix branch profiling location in debugfs
  oprofile, s390: Cleanups
  oprofile, s390: Remove hwsampler_files.c and merge it into init.c
  perf: Fix tear-down of inherited group events
  perf: Reorder & optimize perf_event_context to remove alignment padding on 64 bit builds
  perf: Handle stopped state with tracepoints
  perf: Fix the software events state check
  perf, powerpc: Handle events that raise an exception without overflowing
  perf, x86: Use INTEL_*_CONSTRAINT() for all PEBS event constraints
  perf, x86: Clean up SandyBridge PEBS events
  perf lock: Fix sorting by wait_min
  perf tools: Version incorrect with some versions of grep
  perf evlist: New command to list the names of events present in a perf.data file
  perf script: Add support for H/W and S/W events
  perf script: Add support for dumping symbols
  perf script: Support custom field selection for output
  perf script: Move printing of 'common' data from print_event and rename
  perf tracing: Remove print_graph_cpu and print_graph_proc from trace-event-parse
  perf script: Change process_event prototype
  ...
2011-03-18 10:38:34 -07:00