Commit Graph

14679 Commits

Author SHA1 Message Date
Michael Wang
78feefc512 sched: using dst_rq instead of this_rq during load balance
As we already have dst_rq in lb_env, using or changing "this_rq" do not
make sense.

This patch will replace "this_rq" with dst_rq in load_balance, and we
don't need to change "this_rq" while process LBF_SOME_PINNED any more.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/501F8357.3070102@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:58:15 +02:00
Pekka Enberg
edde96eafc sched: Document schedule() entry points
This patch adds a comment on top of the schedule() function to explain
to scheduler newbies how the main scheduler function is entered.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Explained-by: Ingo Molnar <mingo@kernel.org>
Explained-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344070187-2420-1-git-send-email-penberg@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:58:15 +02:00
Borislav Petkov
532b1858c5 sched: Fix __sched_period comment
It should be sched_nr_latency so fix it before it annoys me more.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344435364-18632-1-git-send-email-bp@amd64.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:58:15 +02:00
Thomas Gleixner
a4133765c1 Merge branch 'sched/urgent' into sched/core 2012-08-13 18:56:46 +02:00
Mike Galbraith
8f6189684e sched: Fix migration thread runtime bogosity
Make stop scheduler class do the same accounting as other classes,

Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained ->se.exec_start.  The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.

%CPU   PID USER     CMD
99.3    45 root     [migration/10]
97.7    53 root     [migration/12]
97.0    57 root     [migration/13]
90.1    49 root     [migration/11]
89.6    65 root     [migration/15]
88.7    17 root     [migration/3]
80.4    37 root     [migration/8]
78.1    41 root     [migration/9]
44.2    13 root     [migration/2]

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:41:55 +02:00
Mike Galbraith
e221d028bb sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled
Root task group bandwidth replenishment must service all CPUs, regardless of
where the timer was last started, and regardless of the isolation mechanism,
lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:41:55 +02:00
Mike Galbraith
35cf4e50b1 sched,cgroup: Fix up task_groups list
With multiple instances of task_groups, for_each_rt_rq() is a noop,
no task groups having been added to the rt.c list instance.  This
renders __enable/disable_runtime() and print_rt_stats() noop, the
user (non) visible effect being that rt task groups are missing in
/proc/sched_debug.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: stable@kernel.org # v3.3+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:41:54 +02:00
Stanislaw Gruszka
bea6832cc8 sched: fix divide by zero at {thread_group,task}_times
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   #3 [ffff880472a51cd0] die at ffffffff8100f26b
   #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   #8 [ffff880472a51f40] sys_times at ffffffff81088524
   #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:41:54 +02:00
Peter Zijlstra
a35b6466aa sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies
Peter Portante reported that for large cgroup hierarchies (and or on
large CPU counts) we get immense lock contention on rq->lock and stuff
stops working properly.

His workload was a ton of processes, each in their own cgroup,
everybody idling except for a sporadic wakeup once every so often.

It was found that:

  schedule()
    idle_balance()
      load_balance()
        local_irq_save()
        double_rq_lock()
        update_h_load()
          walk_tg_tree(tg_load_down)
            tg_load_down()

Results in an entire cgroup hierarchy walk under rq->lock for every
new-idle balance and since new-idle balance isn't throttled this
results in a lot of work while holding the rq->lock.

This patch does two things, it removes the work from under rq->lock
based on the good principle of race and pray which is widely employed
in the load-balancer as a whole. And secondly it throttles the
update_h_load() calculation to max once per jiffy.

I considered excluding update_h_load() for new-idle balance
all-together, but purely relying on regular balance passes to update
this data might not work out under some rare circumstances where the
new-idle busiest isn't the regular busiest for a while (unlikely, but
a nightmare to debug if someone hits it and suffers).

Cc: pjt@google.com
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Reported-by: Peter Portante <pportant@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 18:41:54 +02:00
Paul E. McKenney
62ab707247 rcu: Use smp_hotplug_thread facility for RCUs per-CPU kthread
Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
kthread code to use the new smp_hotplug_thread facility.

[ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
    
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:08 +02:00
Thomas Gleixner
bcd951cf10 watchdog: Use hotplug thread infrastructure
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20120716103948.563736676@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:07 +02:00
Thomas Gleixner
3e339b5dae softirq: Use hotplug thread infrastructure
[ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20120716103948.456416747@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:07 +02:00
Paul E. McKenney
3180d89b47 hotplug: Fix UP bug in smpboot hotplug code
Because kernel subsystems need their per-CPU kthreads on UP systems as
well as on SMP systems, the smpboot hotplug kthread functions must be
provided in UP builds as well as in SMP builds.  This commit therefore
adds smpboot.c to UP builds and excludes irrelevant code via #ifdef.
    
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:07 +02:00
Thomas Gleixner
f97f8f06a4 smpboot: Provide infrastructure for percpu hotplug threads
Provide a generic interface for setting up and tearing down percpu
threads.

On registration the threads for already online cpus are created and
started. On deregistration (modules) the threads are stoppped.

During hotplug operations the threads are created, started, parked and
unparked. The datastructure for registration provides a pointer to
percpu storage space and optional setup, cleanup, park, unpark
functions. These functions are called when the thread state changes.

Each implementation has to provide a function which is queried and
returns whether the thread should run and the thread function itself.

The core code handles all state transitions and avoids duplicated code
in the call sites.

[ paulmck: Preemption leak fix ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:07 +02:00
Thomas Gleixner
2a1d446019 kthread: Implement park/unpark facility
To avoid the full teardown/setup of per cpu kthreads in the case of
cpu hot(un)plug, provide a facility which allows to put the kthread
into a park position and unpark it when the cpu comes online again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120716103948.236618824@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:06 +02:00
Thomas Gleixner
5d01bbd111 rcu: Yield simpler
The rcu_yield() code is amazing. It's there to avoid starvation of the
system when lots of (boosting) work is to be done.

Now looking at the code it's functionality is:

 Make the thread SCHED_OTHER and very nice, i.e. get it out of the way
 Arm a timer with 2 ticks
 schedule()

Now if the system goes idle the rcu task returns, regains SCHED_FIFO
and plugs on. If the systems stays busy the timer fires and wakes a
per node kthread which in turn makes the per cpu thread SCHED_FIFO and
brings it back on the cpu. For the boosting thread the "make it FIFO"
bit is missing and it just runs some magic boost checks. Now this is a
lot of code with extra threads and complexity.

It's way simpler to let the tasks when they detect overload schedule
away for 2 ticks and defer the normal wakeup as long as they are in
yielded state and the cpu is not idle.

That solves the same problem and the only difference is that when the
cpu goes idle it's not guaranteed that the thread returns right away,
but it won't be longer out than two ticks, so no harm is done. If
that's an issue than it is way simpler just to wake the task from
idle as RCU has callbacks there anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120716103948.131256723@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:06 +02:00
Linus Torvalds
e4e139bebd Merge tag 'pm-for-3.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael J. Wysocki:

 - Fix for two recent regressions in the generic PM domains framework.

 - Revert of a commit that introduced a resume regression and is
   conceptually incorrect in my opinion.

 - Fix for a return value in pcc-cpufreq.c from Julia Lawall.

 - RTC wakeup signaling fix from Neil Brown.

 - Suppression of compiler warnings for CONFIG_PM_SLEEP unset in ACPI,
   platform/x86 and TPM drivers.

* tag 'pm-for-3.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  tpm_tis / PM: Fix unused function warning for CONFIG_PM_SLEEP
  platform / x86 / PM: Fix unused function warnings for CONFIG_PM_SLEEP
  ACPI / PM: Fix unused function warnings for CONFIG_PM_SLEEP
  Revert "NMI watchdog: fix for lockup detector breakage on resume"
  PM: Make dev_pm_get_subsys_data() always return 0 on success
  drivers/cpufreq/pcc-cpufreq.c: fix error return code
  RTC: Avoid races between RTC alarm wakeup and suspend.
2012-08-12 21:34:09 +03:00
Jeff Mahoney
e3756477ae printk: Fix calculation of length used to discard records
While tracking down a weird buffer overflow issue in a program that
looked to be sane, I started double checking the length returned by
syslog(SYSLOG_ACTION_READ_ALL, ...) to make sure it wasn't overflowing
the buffer.

Sure enough, it was.  I saw this in strace:

  11339 syslog(SYSLOG_ACTION_READ_ALL, "<5>[244017.708129] REISERFS (dev"..., 8192) = 8279

It turns out that the loops that calculate how much space the entries
will take when they're copied don't include the newlines and prefixes
that will be included in the final output since prev flags is passed as
zero.

This patch properly accounts for it and fixes the overflow.

CC: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-12 21:25:50 +03:00
Frederic Weisbecker
d077526485 perf: Add attribute to filter out callchains
Introducing following bits to the the perf_event_attr struct:

  - exclude_callchain_kernel to filter out kernel callchain
    from the sample dump

  - exclude_callchain_user to filter out user callchain
    from the sample dump

We need to be able to disable standard user callchain dump when we use
the dwarf cfi callchain mode, because frame pointer based user
callchains are useless in this mode.

Implementing also exclude_callchain_kernel to have complete set of
options.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
[ Added kernel callchains filtering ]
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-7-git-send-email-jolsa@redhat.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 12:40:57 -03:00
Jiri Olsa
c5ebcedb56 perf: Add ability to attach user stack dump to sample
Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
of the user level stack on sample. The size of the dump is specified by
sample_stack_user value.

Being able to dump parts of the user stack, starting from the stack
pointer, will be useful to make a post mortem dwarf CFI based stack
unwinding.

Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
architecture provides user stack dump on perf event samples.  This needs
access to the user stack pointer which is not unified across
architectures. Enabling this for x86 architecture.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 12:17:58 -03:00
Jiri Olsa
5685e0ff45 perf: Add perf_output_skip function to skip bytes in sample
Introducing perf_output_skip function to be able to skip data within the
perf ring buffer.

When writing data into perf ring buffer we first reserve needed place in
ring buffer and then copy the actual data.

There's a possibility we won't be able to fill all the reserved size
with data, so we need a way to skip the remaining bytes.

This is going to be useful when storing the user stack dump, where we
might end up with less data than we originally requested.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 12:16:22 -03:00
Frederic Weisbecker
91d7753a45 perf: Factor __output_copy to be usable with specific copy function
Adding a generic way to use __output_copy function with specific copy
function via DEFINE_PERF_OUTPUT_COPY macro.

Using this to add new __output_copy_user function, that provides output
copy from user pointers. For x86 the copy_from_user_nmi function is used
and __copy_from_user_inatomic for the rest of the architectures.

This new function will be used in user stack dump on sample, coming in
next patches.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 11:44:06 -03:00
Jiri Olsa
4018994f3d perf: Add ability to attach user level registers dump to sample
Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
user level registers on sample. Registers we want to dump are specified
by sample_regs_user bitmask.

Only user level registers are dumped at the moment. Meaning the register
values of the user space context as it was before the user entered the
kernel for whatever reason (syscall, irq, exception, or a PMI happening
in userspace).

The layout of the sample_regs_user bitmap is described in
asm/perf_regs.h for archs that support register dump.

This is going to be useful to bring Dwarf CFI based stack unwinding on
top of samples.

Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Dump registers ABI specification. ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-10 11:31:26 -03:00
Rafael J. Wysocki
300d3739e8 Revert "NMI watchdog: fix for lockup detector breakage on resume"
Revert commit 45226e9 (NMI watchdog: fix for lockup detector breakage
on resume) which breaks resume from system suspend on my SH7372
Mackerel board (by causing a NULL pointer dereference to happen) and
is generally wrong, because it abuses the CPU hotplug functionality
in a shamelessly blatant way.

The original issue should be addressed through appropriate syscore
resume callback instead.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-08-08 20:49:45 +02:00
Wang Tianhong
87abb3b15c tracing/trivial: Fix some typos in kernel/trace
Fix some typos in kernel/trace.

Link: http://lkml.kernel.org/r/1343887320.2228.9.camel@louis-ThinkPad-T410

Signed-off-by: Wang Tianhong <wangthbj@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-08-07 09:43:32 -04:00
Jiri Olsa
92d8d4a8b0 tracing/filter: Add missing initialization
Add missing initialization for ret variable. Its initialization
is based on the re_cnt variable, which is being set deep down
in the ftrace_function_filter_re function.

I'm not sure compilers would be smart enough to see this in near
future, so killing the warning this way.

Link: http://lkml.kernel.org/r/1340120894-9465-2-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-08-07 09:42:47 -04:00
Steven Rostedt
3c18c10bde tracing: Fix wakeup_rt self test on virtual machines
The warkeup_rt self test used msleep() calls to wait for real time
tasks to wake up and run. On bare-metal hardware, this was enough as
the scheduler should let the RT task run way before the non-RT task
wakes up from the msleep(). If it did not, then that would mean the
scheduler was broken.

But when dealing with virtual machines, this is a different story.
If the RT task wakes up on a VCPU, it's up to the host to decide when
that task gets to schedule, which can be far behind the time that the
non-RT task wakes up. In this case, the test would fail incorrectly.

As we are not testing the scheduler, but instead the wake up tracing,
we can use completions to wait and not depend on scheduler timings
to see if events happen on time.

Link: http://lkml.kernel.org/r/1343663105.3847.7.camel@fedora

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-08-07 09:40:51 -04:00
Gleb Natapov
a181dc14ed jump_label: Export jump_label_rate_limit()
CC: Jason Baron <jbaron@redhat.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06 19:00:35 +03:00
Ingo Molnar
1d17d17484 time: Fix adjustment cleanup bug in timekeeping_adjust()
Tetsuo Handa reported that sporadically the system clock starts
counting up too quickly which is enough to confuse the hangcheck
timer to print a bogus stall warning.

Commit 2a8c0883 "time: Move xtime_nsec adjustment underflow handling
timekeeping_adjust" overlooked this exit path:

        } else
                return;

which should really be a proper exit sequence, fixing the bug as a
side effect.

Also make the flow more readable by properly balancing curly
braces.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: john.stultz@linaro.org
Cc: a.p.zijlstra@chello.nl
Cc: richardcochran@gmail.com
Cc: prarit@redhat.com
Link: http://lkml.kernel.org/r/20120804192114.GA28347@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-05 12:37:14 +02:00
Linus Torvalds
c4e62d6785 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex fixes from Ingo Molnar:
 "A couple of futex fixes from Darren Hart: two bugs reported by Dave
  Jones (found with his trinity test) and Dan Carpenter through static
  analysis.  The third found while debugging the first two."

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
  futex: Fix bug in WARN_ON for NULL q.pi_state
  futex: Test for pi_mutex on fault in futex_wait_requeue_pi()
2012-08-03 11:00:26 -07:00
Linus Torvalds
ddc5057c1c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "One regression fix, and a couple of cleanups that clean up the code
  flow in areas that had high-profile bugs recently."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Remove all direct references to timekeeper
  time: Clean up offs_real/wall_to_mono and offs_boot/total_sleep_time updates
  time: Clean up stray newlines
  time/jiffies: Rename ACTHZ to SHIFTED_HZ
  time/jiffies: Allow CLOCK_TICK_RATE to be undefined
  time: Fix casting issue in tk_set_xtime and tk_xtime_add
2012-08-03 10:58:57 -07:00
Linus Torvalds
fcc1d2a9ce Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Fixes and two late cleanups"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cleanups: Add load balance cpumask pointer to 'struct lb_env'
  sched: Fix comment about PREEMPT_ACTIVE bit location
  sched: Fix minor code style issues
  sched: Use task_rq_unlock() in __sched_setscheduler()
  sched/numa: Add SD_PERFER_SIBLING to CPU domain
2012-08-03 10:58:13 -07:00
Linus Torvalds
bd463a0606 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Fix merge window fallout and fix sleep profiling (this was always
  broken, so it's not a fix for the merge window - we can skip this one
  from the head of the tree)."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/trace: Add ability to set a target task for events
  perf/x86: Fix USER/KERNEL tagging of samples properly
  perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit
2012-08-03 10:57:20 -07:00
Linus Torvalds
148311d2ad Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Allow irq chips to mark themself oneshot safe
2012-08-03 10:56:44 -07:00
Linus Torvalds
d97e1dcde5 Merge tag 'for_linux-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull KGDB/KDB/usb-dbgp fixes and cleanups from Jason Wessel:
 "There are no new features, those will be delayed to the 3.7 window.
  There are only fixes/cleanup against the usual kernel churn and we are
  removing more lines than we add:

   - usb-dbgp - increase the controller wait time to come out of halt.
   - kdb - Remove unused KDB_FLAG_ONLY_DO_DUMP code and cpu in more prompt
   - debug core - pass NMI type on archs that provide NMI types"

* tag 'for_linux-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  USB: echi-dbgp: increase the controller wait time to come out of halt.
  kernel/debug: Make use of KGDB_REASON_NMI
  kdb: Remove cpu from the more prompt
  kdb: Remove unused KDB_FLAG_ONLY_DO_DUMP
2012-08-03 10:53:47 -07:00
Tejun Heo
8376fe22c7 workqueue: implement mod_delayed_work[_on]()
Workqueue was lacking a mechanism to modify the timeout of an already
pending delayed_work.  delayed_work users have been working around
this using several methods - using an explicit timer + work item,
messing directly with delayed_work->timer, and canceling before
re-queueing, all of which are error-prone and/or ugly.

This patch implements mod_delayed_work[_on]() which behaves similarly
to mod_timer() - if the delayed_work is idle, it's queued with the
given delay; otherwise, its timeout is modified to the new value.
Zero @delay guarantees immediate execution.

v2: Updated to reflect try_to_grab_pending() changes.  Now safe to be
    called from bh context.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
2012-08-03 10:30:47 -07:00
Tejun Heo
bbb68dfaba workqueue: mark a work item being canceled as such
There can be two reasons try_to_grab_pending() can fail with -EAGAIN.
One is when someone else is queueing or deqeueing the work item.  With
the previous patches, it is guaranteed that PENDING and queued state
will soon agree making it safe to busy-retry in this case.

The other is if multiple __cancel_work_timer() invocations are racing
one another.  __cancel_work_timer() grabs PENDING and then waits for
running instances of the target work item on all CPUs while holding
PENDING and !queued.  try_to_grab_pending() invoked from another task
will keep returning -EAGAIN while the current owner is waiting.

Not distinguishing the two cases is okay because __cancel_work_timer()
is the only user of try_to_grab_pending() and it invokes
wait_on_work() whenever grabbing fails.  For the first case, busy
looping should be fine but wait_on_work() doesn't cause any critical
problem.  For the latter case, the new contender usually waits for the
same condition as the current owner, so no unnecessarily extended
busy-looping happens.  Combined, these make __cancel_work_timer()
technically correct even without irq protection while grabbing PENDING
or distinguishing the two different cases.

While the current code is technically correct, not distinguishing the
two cases makes it difficult to use try_to_grab_pending() for other
purposes than canceling because it's impossible to tell whether it's
safe to busy-retry grabbing.

This patch adds a mechanism to mark a work item being canceled.
try_to_grab_pending() now disables irq on success and returns -EAGAIN
to indicate that grabbing failed but PENDING and queued states are
gonna agree soon and it's safe to busy-loop.  It returns -ENOENT if
the work item is being canceled and it may stay PENDING && !queued for
arbitrary amount of time.

__cancel_work_timer() is modified to mark the work canceling with
WORK_OFFQ_CANCELING after grabbing PENDING, thus making
try_to_grab_pending() fail with -ENOENT instead of -EAGAIN.  Also, it
invokes wait_on_work() iff grabbing failed with -ENOENT.  This isn't
necessary for correctness but makes it consistent with other future
users of try_to_grab_pending().

v2: try_to_grab_pending() was testing preempt_count() to ensure that
    the caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

v3: Updated so that try_to_grab_pending() disables irq on success
    rather than requiring preemption disabled by the caller.  This
    makes busy-looping easier and will allow try_to_grap_pending() to
    be used from bh/irq contexts.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
2012-08-03 10:30:46 -07:00
Tejun Heo
36e227d242 workqueue: reorganize try_to_grab_pending() and __cancel_timer_work()
* Use bool @is_dwork instead of @timer and let try_to_grab_pending()
  use to_delayed_work() to determine the delayed_work address.

* Move timer handling from __cancel_work_timer() to
  try_to_grab_pending().

* Make try_to_grab_pending() use -EAGAIN instead of -1 for
  busy-looping and drop the ret local variable.

* Add proper function comment to try_to_grab_pending().

This makes the code a bit easier to understand and will ease further
changes.  This patch doesn't make any functional change.

v2: Use @is_dwork instead of @timer.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
7beb2edf44 workqueue: factor out __queue_delayed_work() from queue_delayed_work_on()
This is to prepare for mod_delayed_work[_on]() and doesn't cause any
functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
b549007727 workqueue: introduce WORK_OFFQ_FLAG_*
Low WORK_STRUCT_FLAG_BITS bits of work_struct->data contain
WORK_STRUCT_FLAG_* and flush color.  If the work item is queued, the
rest point to the cpu_workqueue with WORK_STRUCT_CWQ set; otherwise,
WORK_STRUCT_CWQ is clear and the bits contain the last CPU number -
either a real CPU number or one of WORK_CPU_*.

Scheduled addition of mod_delayed_work[_on]() requires an additional
flag, which is used only while a work item is off queue.  There are
more than enough bits to represent off-queue CPU number on both 32 and
64bits.  This patch introduces WORK_OFFQ_FLAG_* which occupy the lower
part of the @work->data high bits while off queue.  This patch doesn't
define any actual OFFQ flag yet.

Off-queue CPU number is now shifted by WORK_OFFQ_CPU_SHIFT, which adds
the number of bits used by OFFQ flags to WORK_STRUCT_FLAG_SHIFT, to
make room for OFFQ flags.

To avoid shift width warning with large WORK_OFFQ_FLAG_BITS, ulong
cast is added to WORK_STRUCT_NO_CPU and, just in case, BUILD_BUG_ON()
to check that there are enough bits to accomodate off-queue CPU number
is added.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
bf4ede014e workqueue: move try_to_grab_pending() upwards
try_to_grab_pending() will be used by to-be-implemented
mod_delayed_work[_on]().  Move try_to_grab_pending() and related
functions above queueing functions.

This patch only moves functions around.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
715f130080 workqueue: fix zero @delay handling of queue_delayed_work_on()
If @delay is zero and the dealyed_work is idle, queue_delayed_work()
queues it for immediate execution; however, queue_delayed_work_on()
lacks this logic and always goes through timer regardless of @delay.

This patch moves 0 @delay handling logic from queue_delayed_work() to
queue_delayed_work_on() so that both functions behave the same.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
57469821fd workqueue: unify local CPU queueing handling
Queueing functions have been using different methods to determine the
local CPU.

* queue_work() superflously uses get/put_cpu() to acquire and hold the
  local CPU across queue_work_on().

* delayed_work_timer_fn() uses smp_processor_id().

* queue_delayed_work() calls queue_delayed_work_on() with -1 @cpu
  which is interpreted as the local CPU.

* flush_delayed_work[_sync]() were using raw_smp_processor_id().

* __queue_work() interprets %WORK_CPU_UNBOUND as local CPU if the
  target workqueue is bound one but nobody uses this.

This patch converts all functions to uniformly use %WORK_CPU_UNBOUND
to indicate local CPU and use the local binding feature of
__queue_work().  unlikely() is dropped from %WORK_CPU_UNBOUND handling
in __queue_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:45 -07:00
Tejun Heo
d8e794dfd5 workqueue: set delayed_work->timer function on initialization
delayed_work->timer.function is currently initialized during
queue_delayed_work_on().  Export delayed_work_timer_fn() and set
delayed_work timer function during delayed_work initialization
together with other fields.

This ensures the timer function is always valid on an initialized
delayed_work.  This is to help mod_delayed_work() implementation.

To detect delayed_work users which diddle with the internal timer,
trigger WARN if timer function doesn't match on queue.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:45 -07:00
Tejun Heo
8930caba3d workqueue: disable irq while manipulating PENDING
Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access
to the target work item.  They first try to claim the bit and proceed
with queueing only after that succeeds and there's a window between
PENDING being set and the actual queueing where the task can be
interrupted or preempted.

There's also a similar window in process_one_work() when clearing
PENDING.  A work item is dequeued, gcwq->lock is released and then
PENDING is cleared and the worker might get interrupted or preempted
between releasing gcwq->lock and clearing PENDING.

cancel[_delayed]_work_sync() tries to claim or steal PENDING.  The
function assumes that a work item with PENDING is either queued or in
the process of being [de]queued.  In the latter case, it busy-loops
until either the work item loses PENDING or is queued.  If canceling
coincides with the above described interrupts or preemptions, the
canceling task will busy-loop while the queueing or executing task is
preempted.

This patch keeps irq disabled across claiming PENDING and actual
queueing and moves PENDING clearing in process_one_work() inside
gcwq->lock so that busy looping from PENDING && !queued doesn't wait
for interrupted/preempted tasks.  Note that, in process_one_work(),
setting last CPU and clearing PENDING got merged into single
operation.

This removes possible long busy-loops and will allow using
try_to_grab_pending() from bh and irq contexts.

v2: __queue_work() was testing preempt_count() to ensure that the
    caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

v3: Disable irq instead of preemption.  IRQ will be disabled while
    grabbing gcwq->lock later anyway and this allows using
    try_to_grab_pending() from bh and irq contexts.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
2012-08-03 10:30:45 -07:00
Tejun Heo
959d1af8cf workqueue: add missing smp_wmb() in process_one_work()
WORK_STRUCT_PENDING is used to claim ownership of a work item and
process_one_work() releases it before starting execution.  When
someone else grabs PENDING, all pre-release updates to the work item
should be visible and all updates made by the new owner should happen
afterwards.

Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
however, clearing doesn't have a matching wmb.  Given the preceding
spin_unlock and use of clear_bit, I don't believe this can be a
problem on an actual machine and there hasn't been any related report
but it still is theretically possible for clear_pending to permeate
upwards and happen before work->entry update.

Add an explicit smp_wmb() before work_clear_pending().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
2012-08-03 10:30:45 -07:00
Tejun Heo
d4283e9378 workqueue: make queueing functions return bool
All queueing functions return 1 on success, 0 if the work item was
already pending.  Update them to return bool instead.  This signifies
better that they don't return 0 / -errno.

This is cleanup and doesn't cause any functional difference.

While at it, fix comment opening for schedule_work_on().

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:44 -07:00
Tejun Heo
0a13c00e9d workqueue: reorder queueing functions so that _on() variants are on top
Currently, queue/schedule[_delayed]_work_on() are located below the
counterpart without the _on postifx even though the latter is usually
implemented using the former.  Swap them.

This is cleanup and doesn't cause any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:44 -07:00
Tetsuo Handa
9f99798ff4 ptrace: mark __ptrace_may_access() static
__ptrace_may_access() is used within only kernel/ptrace.c.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-08-03 14:47:17 +10:00
Linus Torvalds
a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00