* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: disable __do_IRQ support
sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
genirq: deprecate obsolete typedefs and defines
genirq: deprecate __do_IRQ
genirq: add doc to struct irqaction
genirq: use kzalloc instead of explicit zero initialization
genirq: make irqreturn_t an enum
genirq: remove redundant if condition
genirq: remove unused hw_irq_controller typedef
irq: export remove_irq() and setup_irq() symbols
irq: match remove_irq() args with setup_irq()
irq: add remove_irq() for freeing of setup_irq() irqs
genirq: assert that irq handlers are indeed running in hardirq context
irq: name 'p' variables a bit better
irq: further clean up the free_irq() code flow
irq: refactor and clean up the free_irq() code flow
irq: clean up manage.c
irq: use GFP_KERNEL for action allocation in request_irq()
kernel/irq: fix sparse warning: make symbol static
irq: optimize init_kstat_irqs/init_copy_kstat_irqs
...
* 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
sched: Add comments to find_busiest_group() function
sched: Refactor the power savings balance code
sched: Optimize the !power_savings_balance during fbg()
sched: Create a helper function to calculate imbalance
sched: Create helper to calculate small_imbalance in fbg()
sched: Create a helper function to calculate sched_domain stats for fbg()
sched: Define structure to store the sched_domain statistics for fbg()
sched: Create a helper function to calculate sched_group stats for fbg()
sched: Define structure to store the sched_group statistics for fbg()
sched: Fix indentations in find_busiest_group() using gotos
sched: Simple helper functions for find_busiest_group()
sched: remove unused fields from struct rq
sched: jiffies not printed per CPU
sched: small optimisation of can_migrate_task()
sched: fix typos in documentation
sched: add avg_overlap decay
x86, sched_clock(): mark variables read-mostly
sched: optimize ttwu vs group scheduling
sched: TIF_NEED_RESCHED -> need_reshed() cleanup
sched: don't rebalance if attached on NULL domain
...
Impact: fix crash (hang) when using TRACE_EVENT_FORMAT filter files
filters are only hooked up to the tracepoint events defined using
TRACE_EVENT but not the tracers that use TRACE_EVENT_FORMAT, such
as ftrace.
Do not display the filter files at all for TRACE_EVENT_FORMAT events
for the time being.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878882.8339.61.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Show the average time in the function (Time / Hit)
Function Hit Time Avg
-------- --- ---- ---
mwait_idle 51 140326.6 us 2751.503 us
smp_apic_timer_interrupt 47 3517.735 us 74.845 us
schedule 10 2738.754 us 273.875 us
__schedule 10 2732.857 us 273.285 us
hrtimer_interrupt 47 1896.104 us 40.342 us
irq_exit 56 1711.833 us 30.568 us
__run_hrtimer 47 1315.589 us 27.991 us
tick_sched_timer 47 1138.690 us 24.227 us
do_softirq 56 1116.829 us 19.943 us
__do_softirq 56 1066.932 us 19.052 us
do_IRQ 9 926.153 us 102.905 us
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: safer code
The on the fly allocator for the function profiler was to save
memory. But at the expense of stability. Although it survived several
tests, allocating from the function tracer is just too risky, just
to save space.
This patch removes the allocator and simply allocates enough entries
at start up.
Each function gets a profiling structure of 40 bytes. With an average
of 20K functions, and this is for each CPU, we have 800K per online
CPU. This is not too bad, at least for non-embedded.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
"Because when we call ftrace_free_rec we change the rec->ip to point to the
next record in the chain. Something is very wrong if rec->ip >= s &&
rec->ip < e and the record is already free."
"Note, use FTRACE_WARN_ON() macro. This way it shuts down ftrace if it is
hit and helps to avoid further damage later."
-- Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: make trace_stat files show items with the original order
trace_stat tracer reverse the items, it makes the output
looks a little ugly.
Example, when we read trace_stat/workqueues, we get cpu#7's stat.
at first, and then cpu#6... cpu#0.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C9F23F.5040307@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar suggested clean ups for the profiling code. This patch
makes those updates.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: speed enhancement
By making the function profiler record in per cpu data we not only
get better readings, avoid races, we also do not have to take any
locks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
If the function graph trace is enabled, the function profiler will
use it to take the timing of the functions.
cat /debug/tracing/trace_stat/functions
Function Hit Time
-------- --- ----
mwait_idle 127 183028.4 us
schedule 26 151997.7 us
__schedule 31 151975.1 us
sys_wait4 2 74080.53 us
do_wait 2 74077.80 us
sys_newlstat 138 39929.16 us
do_path_lookup 179 39845.79 us
vfs_lstat_fd 138 39761.97 us
user_path_at 153 39469.58 us
path_walk 179 39435.76 us
__link_path_walk 189 39143.73 us
[...]
Note the times are skewed due to the function graph tracer not taking
into account schedules.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: reduce size of memory in function profiler
The function profiler originally introduces its counters into the
function records itself. There is 20 thousand different functions on
a normal system, and that is adding 20 thousand counters for profiling
event when not needed.
A normal run of the profiler yields only a couple of thousand functions
executed, depending on what is being profiled. This means we have around
18 thousand useless counters.
This patch rectifies this by moving the data out of the function
records used by dynamic ftrace. Data is preallocated to hold the functions
when the profiling begins. Checks are made during profiling to see if
more recorcds should be allocated, and they are allocated if it is safe
to do so.
This also removes the dependency from using dynamic ftrace, and also
removes the overhead by having it enabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new profiling feature
This patch adds a function profiler. In debugfs/tracing/ two new
files are created.
function_profile_enabled - to enable or disable profiling
trace_stat/functions - the profiled functions.
For example:
echo 1 > /debugfs/tracing/function_profile_enabled
./hackbench 50
echo 0 > /debugfs/tracing/function_profile_enabled
yields:
cat /debugfs/tracing/trace_stat/functions
Function Hit
-------- ---
_spin_lock 10106442
_spin_unlock 10097492
kfree 6013704
_spin_unlock_irqrestore 4423941
_spin_lock_irqsave 4406825
__phys_addr 4181686
__slab_free 4038222
dput 4030130
path_put 4023387
unroll_tree_refs 4019532
[...]
The most hit functions are listed first. Functions that are not
hit are not listed.
This feature depends on and uses dynamic function tracing. When the
function profiling is disabled, no overhead occurs. But it still
takes up around 300KB to hold the data, thus it is not recomended
to keep it enabled for systems low on memory.
When a '1' is echoed into the function_profile_enabled file, the
counters for is function is reset back to zero. Thus you can see what
functions are hit most by different programs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Currently, if a trace_stat user wants a handle to some private data,
the trace_stat infrastructure does not supply a way to do that.
This patch passes the trace_stat structure to the start function of
the trace_stat code.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch combines Greg Bank's dprintk() work with the existing dynamic
printk patchset, we are now calling it 'dynamic debug'.
The new feature of this patchset is a richer /debugfs control file interface,
(an example output from my system is at the bottom), which allows fined grained
control over the the debug output. The output can be controlled by function,
file, module, format string, and line number.
for example, enabled all debug messages in module 'nf_conntrack':
echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
to disable them:
echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
A further explanation can be found in the documentation patch.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: cleanup, new schedstat ABI
Since they are used on in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.
Both Sched Debug and SCHEDSTAT_VERSION versions has also been
incremented since ABIs have been changed.
The schedtop tool has been updated to properly handle new version of
schedstat:
http://rt.wiki.kernel.org/index.php/Schedtop_utility
Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090324221002.GA10061@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix PID output under namespaces
When current namespace is not the global namespace,
pid read from set_ftrace_pid is no correct.
# ~/newpid_namespace_run bash
# echo $$
1
# echo 1 > set_ftrace_pid
# cat set_ftrace_pid
3756
Since we write virtual PID to set_ftrace_pid, we need get
virtual PID when we read it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C84D65.9050606@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: give user a choice to show times spent while sleeping
The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.
echo sleep-time > /debug/tracing/trace_options
produces:
------------------------------------------
2) avahi-d-3428 => <idle>-0
------------------------------------------
2) | finish_task_switch() {
2) 0.621 us | _spin_unlock_irq();
2) 2.202 us | }
2) ! 1002.197 us | }
2) ! 1003.521 us | }
where as,
echo nosleep-time > /debug/tracing/trace_options
produces:
0) <idle>-0 => yum-upd-3416
------------------------------------------
0) | finish_task_switch() {
0) 0.643 us | _spin_unlock_irq();
0) 2.342 us | }
0) + 41.302 us | }
0) + 42.453 us | }
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: more accurate timings
The current method of function graph tracing does not take into
account the time spent when a task is not running. This shows functions
that call schedule have increased costs:
3) + 18.664 us | }
------------------------------------------
3) <idle>-0 => kblockd-123
------------------------------------------
3) | finish_task_switch() {
3) 1.441 us | _spin_unlock_irq();
3) 3.966 us | }
3) ! 2959.433 us | }
3) ! 2961.465 us | }
This patch uses the tracepoint in the scheduling context switch to
account for time that has elapsed while a task is scheduled out.
Now we see:
------------------------------------------
3) <idle>-0 => edac-po-1067
------------------------------------------
3) | finish_task_switch() {
3) 0.685 us | _spin_unlock_irq();
3) 2.331 us | }
3) + 41.439 us | }
3) + 42.663 us | }
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: prevent crash due to multiple function graph tracers
The function graph tracer can currently only handle a single tracer
being registered. If another tracer registers with the function
graph tracer it can crash the system.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch move the timestamp from happening in the arch specific
code into the general code. This allows for better control by the tracer
to time manipulation.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
If the function profiler does not have any items recorded and one were
to cat the function stat file, the kernel would take a BUG with a NULL
pointer dereference.
Looking further into this, I found that returning NULL from stat_start
did not stop the stat logic, and would later call stat_next. This breaks
from the way seq_file works, so I looked into fixing the stat code.
This is where I noticed that the last next_entry is never freed.
It is allocated, and if the stat_next returns NULL, the code breaks out
of the loop, unlocks the mutex and exits. We never link the next_entry
nor do we free it. Thus it is a real memory leak.
This patch rearranges the code a bit to not only fix the memory leak,
but also to act more like seq_file where nothing is printed if there
is nothing to print. That is, stat_start returns NULL.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new feature, allow symbolic values in /debug/tracing/act_mask
Print stringified act_mask instead of hex value:
# cat act_mask
read,write,barrier,sync,queue,requeue,issue,complete,fs,pc,ahead,meta,
discard,drv_data
# echo "meta,write" > act_mask
# cat act_mask
write,meta
Also:
- make act_mask accept "ahead", "meta", "discard" and "drv_data"
- use strsep() instead of strchr() to parse user input
- return -EINVAL if a token is not found in the mask map
- fix a bug that 'value' is unsigned, so it can < 0
- propagate error value of blk_trace_mask2str() to userspace, but not
always return -ENXIO.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C8AB42.1000802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Delta patch to address the review comments.
- Implement warning when IRQ_WAKE_THREAD is requested and no
thread handler installed
- coding style fixes
Pointed-out-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some devices use devres_request_irq() for to install their interrupt
handler. Add support for threaded interrupts to devres as well.
[tglx - simplified and adapted to latest threadirq version]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>