Commit Graph

9281 Commits

Author SHA1 Message Date
Ingo Molnar
ae7f6711d6 Merge branch 'perf/urgent' into perf/core
Merge reason: We want to queue up a dependent patch. Also update to
              later -rc's.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 10:36:22 +01:00
John Stultz
7e1b584774 ntp: Cleanup xtime references in ntp.c
ntp.c doesn't need to access timekeeping internals directly, so change
xtime references to use the get_seconds() timekeeping interface.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: richard@rsk.demon.co.uk
LKML-Reference: <1264738844-21935-1-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-29 10:15:19 +01:00
john stultz
1f5b8f8a20 ntp: Make time_esterror and time_maxerror static
Make time_esterror and time_maxerror static as no one uses them
outside of ntp.c
    
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: richard@rsk.demon.co.uk
LKML-Reference: <1264719761.3437.47.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-29 10:15:19 +01:00
Peter Zijlstra
75c9f3284a perf_events: Fix sample_period transfer on inherit
One problem with frequency driven counters is that we cannot
predict the rate at which they trigger, therefore we have to
start them at period=1, this causes a ramp up effect. However,
if we fail to propagate the stable state on fork each new child
will have to ramp up again. This can lead to significant
artifacts in sample data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: eranian@google.com
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1264752266.4283.2121.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 09:15:26 +01:00
Xiao Guangrong
1e12a4a7a3 tracing/kprobe: Cleanup unused return value of tracing functions
The return values of the kprobe's tracing functions are meaningless,
lets remove these.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B60E9A3.2040505@cn.fujitsu.com>
[fweisbec@gmail: whitespace fixes, drop useless void returns in end
of functions]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29 02:14:40 +01:00
Xiao Guangrong
430ad5a600 perf: Factorize trace events raw sample buffer operations
Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
gather the common code that operates on raw events sampling buffer.
This cleans up redundant code between regular trace events, syscall
events and kprobe events.

Changelog v1->v2:
- Rename function name as per Masami and Frederic's suggestion
- Add __kprobes for ftrace_perf_buf_prepare() and make
  ftrace_perf_buf_submit() inline as per Masami's suggestion
- Export ftrace_perf_buf_prepare since modules will use it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B60E92D.9000808@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29 02:02:57 +01:00
Lai Jiangshan
ea2c68a08f tracing: Simplify test for function_graph tracing start point
In the function graph tracer, a calling function is to be traced
only when it is enabled through the set_graph_function file,
or when it is nested in an enabled function.

Current code uses TSK_TRACE_FL_GRAPH to test whether it is nested
or not. Looking at the code, we can get this:
(trace->depth > 0) <==> (TSK_TRACE_FL_GRAPH is set)

trace->depth is more explicit to tell that it is nested.
So we use trace->depth directly and simplify the code.

No functionality is changed.
TSK_TRACE_FL_GRAPH is not removed yet, it is left for future usage.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B4DB0B6.7040607@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29 01:05:12 +01:00
Mahesh Salgaonkar
b23ff0e933 hw_breakpoints: Release the bp slot if arch_validate_hwbkpt_settings() fails.
On a given architecture, when hardware breakpoint registration fails
due to un-supported access type (read/write/execute), we lose the bp
slot since register_perf_hw_breakpoint() does not release the bp slot
on failure.
Hence, any subsequent hardware breakpoint registration starts failing
with 'no space left on device' error.

This patch introduces error handling in register_perf_hw_breakpoint()
function and releases bp slot on error.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
LKML-Reference: <20100121125516.GA32521@in.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-28 14:15:51 +01:00
Frans Pop
9d3cfc4c1d sched: Correct printk whitespace in warning from cpu down task check
Due to an incorrect line break the output currently contains tabs.
Also remove trailing space.

The actual output that logcheck sent me looked like this:
 Task events/1 (pid = 10) is on cpu 1^I^I^I^I(state = 1, flags = 84208040)

After this patch it becomes:
 Task events/1 (pid = 10) is on cpu 1 (state = 1, flags = 84208040)

Signed-off-by: Frans Pop <elendilplanet.nl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201001251456.34996.elendil@planet.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-28 06:59:55 +01:00
Peter Zijlstra
11854247e2 sched: Fix incorrect sanity check
We moved to migrate on wakeup, which means that sleeping tasks could
still be present on offline cpus. Amend the check to only test running
tasks.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-28 06:59:51 +01:00
Peter Zijlstra
abd5071394 perf: Reimplement frequency driven sampling
There was a bug in the old period code that caused intel_pmu_enable_all()
or native_write_msr_safe() to show up quite high in the profiles.

In staring at that code it made my head hurt, so I rewrote it in a
hopefully simpler fashion. Its now fully symetric between tick and
overflow driven adjustments and uses less data to boot.

The only complication is that it basically wants to do a u128 division.
The code approximates that in a rather simple truncate until it fits
fashion, taking care to balance the terms while truncating.

This version does not generate that sampling artefact.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-27 08:39:33 +01:00
Oleg Nesterov
48d5067417 lockdep: Fix check_usage_backwards() error message
Lockdep has found the real bug, but the output doesn't look right to me:

> =========================================================
> [ INFO: possible irq lock inversion dependency detected ]
> 2.6.33-rc5 #77
> ---------------------------------------------------------
> emacs/1609 just changed the state of lock:
>  (&(&tty->ctrl_lock)->rlock){+.....}, at: [<ffffffff8127c648>] tty_fasync+0xe8/0x190
> but this lock took another, HARDIRQ-unsafe lock in the past:
>  (&(&sighand->siglock)->rlock){-.....}

"HARDIRQ-unsafe" and "this lock took another" looks wrong, afaics.

>   ... key      at: [<ffffffff81c054a4>] __key.46539+0x0/0x8
>   ... acquired at:
>    [<ffffffff81089af6>] __lock_acquire+0x1056/0x15a0
>    [<ffffffff8108a0df>] lock_acquire+0x9f/0x120
>    [<ffffffff81423012>] _raw_spin_lock_irqsave+0x52/0x90
>    [<ffffffff8127c1be>] __proc_set_tty+0x3e/0x150
>    [<ffffffff8127e01d>] tty_open+0x51d/0x5e0

The stack-trace shows that this lock (ctrl_lock) was taken under
->siglock (which is hopefully irq-safe).

This is a clear typo in check_usage_backwards() where we tell the print a
fancy routine we're forwards.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100126181641.GA10460@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-27 08:34:02 +01:00
Mike Frysinger
0368897034 tracing/documentation: Cover new frame pointer semantics
Update the graph tracer examples to cover the new frame pointer semantics
(in terms of passing it along).  Move the HAVE_FUNCTION_GRAPH_FP_TEST docs
out of the Kconfig, into the right place, and expand on the details.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
LKML-Reference: <1264165967-18938-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 17:00:39 -05:00
Steven Rostedt
3c05d74827 ring-buffer: Check for end of page in iterator
If the iterator comes to an empty page for some reason, or if
the page is emptied by a consuming read. The iterator code currently
does not check if the iterator is pass the contents, and may
return a false entry.

This patch adds a check to the ring buffer iterator to test if the
current page has been completely read and sets the iterator to the
next page if necessary.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 16:14:08 -05:00
Steven Rostedt
492a74f421 ring-buffer: Check if ring buffer iterator has stale data
Usually reads of the ring buffer is performed by a single task.
There are two types of reads from the ring buffer.

One is a consuming read which will consume the entry that was read
and the next read will be the entry that follows.

The other is an iterator that will let the user read the contents of
the ring buffer without modifying it. When an iterator is allocated,
writes to the ring buffer are disabled to protect the iterator.

The problem exists when consuming reads happen while an iterator is
allocated. Specifically, the kind of read that swaps out an entire
page (used by splice) and replaces it with a new read. If the iterator
is on the page that is swapped out, then the next read may read
from this swapped out page and return garbage.

This patch adds a check when reading the iterator to make sure that
the iterator contents are still valid. If a consuming read has taken
place, the iterator is reset.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-26 16:09:30 -05:00
Thomas Gleixner
7b7422a566 clocksource: Prevent potential kgdb dead lock
commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
logic) introduced a potential kgdb dead lock. When the kernel is
stopped by kgdb inside code which holds watchdog_lock then kgdb dead
locks in clocksource_resume_watchdog().

clocksource_resume_watchdog() is called from kbdg via
clocksource_touch_watchdog() to avoid that the clock source watchdog
marks TSC unstable after the kernel has been stopped.

Solve this by replacing spin_lock with a spin_trylock and just return
in case the lock is held. Not resetting the watchdog might result in
TSC becoming marked unstable, but that's an acceptable penalty for
using kgdb.

The timekeeping is anyway easily screwed up by kgdb when the system
uses either jiffies or a clock source which wraps in short intervals
(e.g. pm_timer wraps about every 4.6s), so we really do not have to
worry about that occasional TSC marked unstable side effect.

The second caller of clocksource_resume_watchdog() is
clocksource_resume(). The trylock is safe here as well because the
system is UP at this point, interrupts are disabled and nothing else
can hold watchdog_lock().

Reported-by: Jason Wessel <jason.wessel@windriver.com>
LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-26 14:53:16 +01:00
Steven Rostedt
74bf4076f2 tracing: Prevent kernel oops with corrupted buffer
If the contents of the ftrace ring buffer gets corrupted and the trace
file is read, it could create a kernel oops (usualy just killing the user
task thread). This is caused by the checking of the pid in the buffer.
If the pid is negative, it still references the cmdline cache array,
which could point to an invalid address.

The simple fix is to test for negative PIDs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-25 15:11:53 -05:00
Linus Torvalds
f6760aa024 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clockevent: Don't remove broadcast device when cpu is dead
2010-01-24 10:38:07 -08:00
Linus Torvalds
b8be634e01 Merge git://git.infradead.org/~dwmw2/mtd-2.6.33
* git://git.infradead.org/~dwmw2/mtd-2.6.33:
  mtd: tests: fix read, speed and stress tests on NOR flash
  mtd: Really add ARM pismo support
  kmsg_dump: Dump on crash_kexec as well
2010-01-24 10:31:34 -08:00
Thomas Gleixner
60db48cacb sched: Queue a deboosted task to the head of the RT prio queue
rtmutex_set_prio() is used to implement priority inheritance for
futexes. When a task is deboosted it gets enqueued at the tail of its
RT priority list. This is violating the POSIX scheduling semantics:

rt priority list X contains two runnable tasks A and B

task A	 runs with priority X and holds mutex M
task C	 preempts A and is blocked on mutex M 
     	 -> task A is boosted to priority of task C (Y)
task A	 unlocks the mutex M and deboosts itself
     	 -> A is dequeued from rt priority list Y
	 -> A is enqueued to the tail of rt priority list X
task C	 schedules away
task B	 runs

This is wrong as task A did not schedule away and therefor violates
the POSIX scheduling semantics.

Enqueue the task to the head of the priority list instead. 

Reported-by: Mathias Weber <mathias.weber.mw1@roche.com>
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.809074113@linutronix.de>
2010-01-22 18:09:59 +01:00
Thomas Gleixner
37dad3fce9 sched: Implement head queueing for sched_rt
The ability of enqueueing a task to the head of a SCHED_FIFO priority
list is required to fix some violations of POSIX scheduling policy.

Implement the functionality in sched_rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.772169931@linutronix.de>
2010-01-22 18:09:59 +01:00
Thomas Gleixner
ea87bb7853 sched: Extend enqueue_task to allow head queueing
The ability of enqueueing a task to the head of a SCHED_FIFO priority
list is required to fix some violations of POSIX scheduling policy.

Extend the related functions with a "head" argument.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.734886007@linutronix.de>
2010-01-22 18:09:59 +01:00
Peter Zijlstra
fabf318e5e sched: Fix fork vs hotplug vs cpuset namespaces
There are a number of issues:

1) TASK_WAKING vs cgroup_clone (cpusets)

copy_process():

  sched_fork()
    child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
  if (current->nsproxy != p->nsproxy)
     ns_cgroup_clone()
       cgroup_clone()
         mutex_lock(inode->i_mutex)
         mutex_lock(cgroup_mutex)
         cgroup_attach_task()
	   ss->can_attach()
           ss->attach() [ -> cpuset_attach() ]
             cpuset_attach_task()
               set_cpus_allowed_ptr();
                 while (child->state == TASK_WAKING)
                   cpu_relax();
will deadlock the system.


2) cgroup_clone (cpusets) vs copy_process

So even if the above would work we still have:

copy_process():

  if (current->nsproxy != p->nsproxy)
     ns_cgroup_clone()
       cgroup_clone()
         mutex_lock(inode->i_mutex)
         mutex_lock(cgroup_mutex)
         cgroup_attach_task()
	   ss->can_attach()
           ss->attach() [ -> cpuset_attach() ]
             cpuset_attach_task()
               set_cpus_allowed_ptr();
  ...

  p->cpus_allowed = current->cpus_allowed

over-writing the modified cpus_allowed.


3) fork() vs hotplug

  if we unplug the child's cpu after the sanity check when the child
  gets attached to the task_list but before wake_up_new_task() shit
  will meet with fan.

Solve all these issues by moving fork cpu selection into
wake_up_new_task().

Reported-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264106190.4283.1314.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-21 23:25:31 +01:00
Linus Torvalds
e80b135985 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: x86: Add support for the ANY bit
  perf: Change the is_software_event() definition
  perf: Honour event state for aux stream data
  perf: Fix perf_event_do_pending() fallback callsite
  perf kmem: Print usage help for unknown commands
  perf kmem: Increase "Hit" column length
  hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
  perf timechart: Use tid not pid for COMM change
2010-01-21 08:50:04 -08:00
Peter Zijlstra
22e190851f perf: Honour event state for aux stream data
Anton reported that perf record kept receiving events even after calling
ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP
events didn't respect the disabled state and kept flowing in.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Anton Blanchard <anton@samba.org>
LKML-Reference: <1263459187.4244.265.camel@laptop>
CC: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:40 +01:00
Peter Zijlstra
fe432200ab perf: Fix perf_event_do_pending() fallback callsite
Paul questioned the context in which we should call
perf_event_do_pending(). After looking at that I found that it should be
called from IRQ context these days, however the fallback call-site is
placed in softirq context. Ammend this by placing the callback in the IRQ
timer path.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263374859.4244.192.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:39 +01:00
Dhaval Giani
7c9414385e sched: Remove USER_SCHED
Remove the USER_SCHED feature. It has been scheduled to be removed in
2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2

Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263990378.24844.3.camel@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:18 +01:00
Gautham R Shenoy
871e35bc97 sched: Fix the place where group powers are updated
We want to update the sched_group_powers when balance_cpu == this_cpu.

Currently the group powers are updated only if the balance_cpu is the
first CPU in the local group. But balance_cpu = this_cpu could also be
the first idle cpu in the group. Hence fix the place where the group
powers are updated.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264017764.5717.127.camel@jschopp-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:17 +01:00
Peter Zijlstra
8f190fb3f7 sched: Assume *balance is valid
Since all load_balance() callers will have !NULL balance parameters we
can now assume so and remove a few checks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:15 +01:00
Peter Zijlstra
f492e12ef0 sched: Remove load_balance_newidle()
The two functions: load_balance{,_newidle}() are very similar, with the
following differences:

 - rq->lock usage
 - sb->balance_interval updates
 - *balance check

So remove the load_balance_newidle() call with load_balance(.idle =
CPU_NEWLY_IDLE), explicitly unlock the rq->lock before calling (would be
done by double_lock_balance() anyway), and ignore the other differences
for now.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:14 +01:00
Peter Zijlstra
1af3ed3ddf sched: Unify load_balance{,_newidle}()
load_balance() and load_balance_newidle() look remarkably similar, one
key point they differ in is the condition on when to active balance.

So split out that logic into a separate function.

One side effect is that previously load_balance_newidle() used to fail
and return -1 under these conditions, whereas now it doesn't. I've not
yet fully figured out the whole -1 return case for either
load_balance{,_newidle}().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:13 +01:00
Peter Zijlstra
baa8c1102f sched: Add a lock break for PREEMPT=y
Since load-balancing can hold rq->locks for quite a long while, allow
breaking out early when there is lock contention.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:13 +01:00
Peter Zijlstra
230059de77 sched: Remove from fwd decls
Move code around to get rid of fwd declarations.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:12 +01:00
Peter Zijlstra
897c395f4c sched: Remove rq_iterator from move_one_task
Again, since we only iterate the fair class, remove the abstraction.

Since this is the last user of the rq_iterator, remove all that too.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:11 +01:00
Peter Zijlstra
ee00e66fff sched: Remove rq_iterator usage from load_balance_fair
Since we only ever iterate the fair class, do away with this abstraction.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:10 +01:00
Peter Zijlstra
3d45fd804a sched: Remove the sched_class load_balance methods
Take out the sched_class methods for load-balancing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:09 +01:00
Peter Zijlstra
1e3c88bdeb sched: Move load balance code into sched_fair.c
Straight fwd code movement.

Since non of the load-balance abstractions are used anymore, do away with
them and simplify the code some. In preparation move the code around.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:08 +01:00
Yong Zhang
6d558c3ac9 sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
Assume A->B schedule is processing, if B have acquired BKL before and it
need reschedule this time. Then on B's context, it will go to
need_resched_nonpreemptible for reschedule. But at this time, prev and
switch_count are related to A. It's wrong and will lead to incorrect
scheduler statistics.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001102238w7b0ddcadref00d345e2181d11@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:39:04 +01:00
Mike Galbraith
50b926e439 sched: Fix vmark regression on big machines
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to.  Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.

Reported-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1262612696.15495.15.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:39:03 +01:00
Xiaotian Feng
ea9d8e3f45 clockevent: Don't remove broadcast device when cpu is dead
Marc reported that the BUG_ON in clockevents_notify() triggers on his
system. This happens because the kernel tries to remove an active
clock event device (used for broadcasting) from the device list.

The handling of devices which can be used as per cpu device and as a
global broadcast device is suboptimal.

The simplest solution for now (and for stable) is to check whether the
device is used as global broadcast device, but this needs to be
revisited.

[ tglx: restored the cpuweight check and massaged the changelog ]

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1262834564-13033-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2010-01-18 14:44:50 +01:00
Milton Miller
e03bcb6862 generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data
The smp ipi data is passed around and given write access by
other cpus and should be separated from per-cpu data consumed by
this cpu.

Looking for hot lines, I saw call_function_data shared with
tick_cpu_sched.

Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Anton Blanchard <anton@samba.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: : Nick Piggin <npiggin@suse.de>
LKML-Reference: <20100118020051.GR12666@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-18 09:02:59 +01:00
Ingo Molnar
f426a7e029 Merge branch 'perf/scheduling' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-01-18 08:56:41 +01:00
James Morris
2457552d1e Merge branch 'master' into next 2010-01-18 09:56:22 +11:00
Frederic Weisbecker
329c0e012b perf: Better order flexible and pinned scheduling
When a task gets scheduled in. We don't touch the cpu bound events
so the priority order becomes:

	cpu pinned, cpu flexible, task pinned, task flexible.

So schedule out cpu flexibles when a new task context gets in
and correctly order the groups to schedule in:

	task pinned, cpu flexible, task flexible.

Cpu pinned groups don't need to be touched at this time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:11:05 +01:00
Frederic Weisbecker
7defb0f879 perf: Don't schedule out/in pinned events on task tick
We don't need to schedule in/out pinned events on task tick,
now that pinned and flexible groups can be scheduled separately.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:09:51 +01:00
Frederic Weisbecker
5b0311e1f2 perf: Allow pinned and flexible groups to be scheduled separately
Tune the scheduling helpers so that we can choose to schedule either
pinned and/or flexible groups from a context.

And while at it, refactor a bit the naming of these helpers to make
these more consistent and flexible.

There is no (intended) change in scheduling behaviour in this
patch.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:08:57 +01:00
Frederic Weisbecker
42cce92f4d perf: Make __perf_event_sched_out static
__perf_event_sched_out doesn't need to be globally available, make
it static.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:08:01 +01:00
Masami Hiramatsu
231e36f4d2 tracing/kprobe: Update kprobe tracing self test for new syntax
Update kprobe tracing self test for new syntax (it supports
deleting individual probes, and drops $argN support)
and behavior change (new probes are disabled in default).

This selftest includes the following checks:

 - Adding function-entry probe and return probe with arguments.
 - Enabling these probes.
 - Deleting it individually.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100114051211.7814.29436.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-17 08:15:35 +01:00
H Hartley Sweeten
6d686f4564 sched: Don't expose local functions
kernel/sched: don't expose local functions

The get_rr_interval_* functions are all class methods of
struct sched_class. They are not exported so make them
static.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <201001132021.53253.hartleys@visionengravers.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-17 08:09:45 +01:00
Frederic Weisbecker
24a53652e3 tracing: Drop the tr check from the graph tracing path
Each time we save a function entry from the function graph
tracer, we check if the trace array is set, which is wasteful
because it is set anyway before we start the tracer. All we need
is to ensure we have good read and write orderings. When we set
the trace array, we just need to guarantee it to be visible
before starting tracing.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
LKML-Reference: <1263453795-7496-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-17 08:06:25 +01:00