Both settimeofday() and clock_settime() promise with a 'const'
attribute not to alter the arguments passed in. This patch adds the
missing 'const' attribute into the various kernel functions
implementing these calls.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20110201134417.545698637@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The only user for this hook was selinux. sysctl routes every call
through /proc/sys/. Selinux and other security modules use the file
system checks for sysctl too, so no need for this hook any more.
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
All callers of do_timer() are converted to xtime_update(). The only
users of xtime_lock are in kernel/time/. Make both local to
kernel/time/ and remove them from the global header files.
[ tglx: Reuse tick-internal.h instead of creating another local header
file. Massaged changelog ]
Signed-off-by: Torben Hohn <torbenh@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The hrtimer code accesses timekeeping variables under
xtime_lock. Provide a sensible accessor function and use it.
[ tglx: Removed the conditionals, unused variable, fixed codingstyle
and massaged changelog ]
Signed-off-by: Torben Hohn <torbenh@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
LKML-Reference: <20110127145905.23248.30458.stgit@localhost>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
do_timer() is primary timekeeping related. calc_global_load() is
called from do_timer() as well, but that's more for historical
reasons.
[ tglx: Fixed up the calc_global_load() reject andmassaged changelog ]
Signed-off-by: Torben Hohn <torbenh@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
LKML-Reference: <20110127145855.23248.56933.stgit@localhost>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Passing nowatchdog to kernel disables 2 things: creation of
watchdog threads AND initialization of percpu watchdog_hrtimer.
As hrtimers are initialized only at boot it's not possible to
enable watchdog later - for me all watchdog threads started to
eat 100% of CPU time, but they could just crash.
Additionally, even if these threads would start properly,
watchdog_disable_all_cpus was guarded by no_watchdog check, so
you couldn't disable watchdog.
To fix this, remove no_watchdog variable and use already
existing watchdog_enabled variable.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
[ removed another no_watchdog instance ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1296230433-6261-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The new code of commit cd7eab44e(genirq: Add IRQ affinity notifiers)
references irq_desc.affinity which fails to compile with
CONFIG_GENERIC_HARDIRQS_NO_DEPRECATED=y.
Use irq_desc.irq_data.affinity instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Since check_prlimit_permission always fails in the case of SUID/GUID
processes, such processes are not able to read or set their own limits.
This commit changes this by assuming that process can always read/change
its own limits.
Signed-off-by: Kacper Kornet <kornet@camk.edu.pl>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In current rtmutex, the pending owner may be boosted by the tasks
in the rtmutex's waitlist when the pending owner is deboosted
or a task in the waitlist is boosted. This boosting is unrelated,
because the pending owner does not really take the rtmutex.
It is not reasonable.
Example.
time1:
A(high prio) onwers the rtmutex.
B(mid prio) and C (low prio) in the waitlist.
time2
A release the lock, B becomes the pending owner
A(or other high prio task) continues to run. B's prio is lower
than A, so B is just queued at the runqueue.
time3
A or other high prio task sleeps, but we have passed some time
The B and C's prio are changed in the period (time2 ~ time3)
due to boosting or deboosting. Now C has the priority higher
than B. ***Is it reasonable that C has to boost B and help B to
get the rtmutex?
NO!! I think, it is unrelated/unneed boosting before B really
owns the rtmutex. We should give C a chance to beat B and
win the rtmutex.
This is the motivation of this patch. This patch *ensures*
only the top waiter or higher priority task can take the lock.
How?
1) we don't dequeue the top waiter when unlock, if the top waiter
is changed, the old top waiter will fail and go to sleep again.
2) when requiring lock, it will get the lock when the lock is not taken and:
there is no waiter OR higher priority than waiters OR it is top waiter.
3) In any time, the top waiter is changed, the top waiter will be woken up.
The algorithm is much simpler than before, no pending owner, no
boosting for pending owner.
Other advantage of this patch:
1) The states of a rtmutex are reduced a half, easier to read the code.
2) the codes become shorter.
3) top waiter is not dequeued until it really take the lock:
they will retain FIFO when it is stolen.
Not advantage nor disadvantage
1) Even we may wakeup multiple waiters(any time when top waiter changed),
we hardly cause "thundering herd",
the number of wokenup task is likely 1 or very little.
2) two APIs are changed.
rt_mutex_owner() will not return pending owner, it will return NULL when
the top waiter is going to take the lock.
rt_mutex_next_owner() always return the top waiter.
will not return NULL if we have waiters
because the top waiter is not dequeued.
I have fixed the code that use these APIs.
need updated after this patch is accepted
1) Document/*
2) the testcase scripts/rt-tester/t4-l2-pi-deboost.tst
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D3012D5.4060709@cn.fujitsu.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Commit 927c7a9e92 ("perf: Fix race in callchains") introduced
a mismatch in the sizing of struct callchain_cpus_entries.
nr_cpu_ids must be used instead of num_possible_cpus(), or we
might get out of bound memory accesses on some machines.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephane Eranian <eranian@google.com>
CC: stable@kernel.org
LKML-Reference: <1295980851.3588.351.camel@edumazet-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When a task is taken out of the fair class we must ensure the vruntime
is properly normalized because when we put it back in it will assume
to be normalized.
The case that goes wrong is when changing away from the fair class
while sleeping. Sleeping tasks have non-normalized vruntime in order
to make sleeper-fairness work. So treat the switch away from fair as a
wakeup and preserve the relative vruntime.
Also update sysrq-n to call the ->switch_{to,from} methods.
Reported-by: Onkalo Samu <samu.p.onkalo@nokia.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since commit 48c5ccae88 (sched: Simplify cpu-hot-unplug task
migration) this should no longer happen, so remove the code.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_IRQ_TIME_ACCOUNTING adds ns granularity irq time on each CPU.
This info is already used in scheduler to do proper task chargeback
(earlier patches). This patch retro-fits this ns granularity
hardirq and softirq information to /proc/stat irq and softirq fields.
The update is still done on timer tick, where we look at accumulated
ns hardirq/softirq time and account the tick to user/system/irq/hardirq/guest
accordingly.
No new interface added.
Earlier versions looked at adding this as new fields in some /proc
files. This one seems to be the best in terms of impact to existing
apps, even though it has somewhat more kernel code than earlier versions.
Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-5-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Refactor account_system_time, to separate out the logic of
identifying the update needed and code that does actual update.
This is used by following patch for IRQ_TIME_ACCOUNTING,
which has different identification logic and same update logic.
Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-4-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since cfs->{load_stamp,load_last} are zero-initalized the initial load update
will consider the delta to be 'since the beginning of time'.
This results in a lot of pointless divisions to bring this large period to be
within the sysctl_sched_shares_window.
Fix this by initializing load_stamp to be 1 at cfs_rq initialization, this
allows for an initial load_stamp > load_last which then lets standard idle
truncation proceed.
We avoid spinning (and slightly improve consistency) by fixing delta to be
[period - 1] in this path resulting in a slightly more predictable shares ramp.
(Previously the amount of idle time preserved by the overflow would range between
[period/2,period-1].)
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044852.102126037@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Re-visiting this: Since update_cfs_shares will now only ever re-weight an
entity that is a relative parent of the current entity in enqueue_entity; we
can safely issue the account_entity_enqueue relative to that cfs_rq and avoid
the requirement for special handling of the enqueue case in update_cfs_shares.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.915214637@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The delta in clock_task is a more fair attribution of how much time a tg has
been contributing load to the current cpu.
While not really important it also means we're more in sync (by magnitude)
with respect to periodic updates (since __update_curr deltas are clock_task
based).
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044852.007092349@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since updates are against an entity's queuing cfs_rq it's not possible to
enter update_cfs_{shares,load} with a NULL cfs_rq. (Indeed, update_cfs_load
would crash prior to the check if we did anyway since we load is examined
during the initializers).
Also, in the update_cfs_load case there's no point
in maintaining averages for rq->cfs_rq since we don't perform shares
distribution at that level -- NULL check is replaced accordingly.
Thanks to Dan Carpenter for pointing out the deference before NULL check.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.825284940@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While care is taken around the zero-point in effective_load to not exceed
the instantaneous rq->weight, it's still possible (e.g. using wake_idx != 0)
for (load + effective_load) to underflow.
In this case the comparing the unsigned values can result in incorrect balanced
decisions.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.734245014@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: wacom - pass touch resolution to clients through input_absinfo
Input: wacom - add 2 Bamboo Pen and touch models
Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent
Input: sparse-keymap - fix KEY_VSW handling in sparse_keymap_setup
Input: tegra-kbc - add tegra keyboard driver
Input: gpio_keys - switch to using request_any_context_irq
Input: serio - allow registered drivers to get status flag
Input: ct82710c - return proper error code for ct82c710_open
Input: bu21013_ts - added regulator support
Input: bu21013_ts - remove duplicate resolution parameters
Input: tnetv107x-ts - don't treat NULL clk as an error
Input: tnetv107x-keypad - don't treat NULL clk as an error
Fix up trivial conflicts in drivers/input/keyboard/Makefile due to
additions of tc3589x/Tegra drivers
The -rt patches change the console_semaphore to console_mutex. As a
result, a quite large chunk of the patches changes all
acquire/release_console_sem() to acquire/release_console_mutex()
This commit makes things use more neutral function names which dont make
implications about the underlying lock.
The only real change is the return value of console_trylock which is
inverted from try_acquire_console_sem()
This patch also paves the way to switching console_sem from a semaphore to
a mutex.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
Signed-off-by: Torben Hohn <torbenh@gmx.de>
Cc: Thomas Gleixner <tglx@tglx.de>
Cc: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix time function double declaration with glibc
perf tools: Fix build by checking if extra warnings are supported
perf tools: Fix build when using gcc 3.4.6
perf tools: Add missing header, fixes build
perf tools: Fix 64 bit integer format strings
perf test: Fix build on older glibcs
perf: perf_event_exit_task_context: s/rcu_dereference/rcu_dereference_raw/
perf test: Use cpu_map->[cpu] when setting affinity
perf symbols: Fix annotation of thumb code
perf: Annotate cpuctx->ctx.mutex to avoid a lockdep splat
powerpc, perf: Fix frequency calculation for overflowing counters (FSL version)
perf: Fix perf_event_init_task()/perf_event_free_task() interaction
perf: Fix find_get_context() vs perf_event_exit_task() race
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix poor interactivity on UP systems due to group scheduler nice tune bug
Currently sysrq_enabled and __sysrq_enabled are initialised separately
and inconsistently, leading to sysrq being actually enabled by reported
as not enabled in sysfs. The first change to the sysfs configurable
synchronises these two:
static int __read_mostly sysrq_enabled = 1;
static int __sysrq_enabled;
Add a common define to carry the default for these preventing them becoming
out of sync again. Default this to 1 to mirror previous behaviour.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Michael Witten and Christian Kujau reported that the autogroup
scheduling feature hurts interactivity on their UP systems.
It turns out that this is an older bug in the group scheduling code,
and the wider appeal provided by the autogroup feature exposed it
more prominently.
When on UP with FAIR_GROUP_SCHED enabled, tune shares
only affect tg->shares, but is not reflected in
tg->se->load. The reason is that update_cfs_shares()
does nothing on UP.
So introduce update_cfs_shares() for UP && FAIR_GROUP_SCHED.
This issue was found when enable autogroup scheduling was enabled,
but it is an older bug that also exists on cgroup.cpu on UP.
Reported-and-Tested-by: Michael Witten <mfwitten@gmail.com>
Reported-and-Tested-by: Christian Kujau <christian@nerdbynature.de>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110124073352.GA24186@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently only drivers that are built as modules have their versions
shown in /sys/module/<module_name>/version, but this information might
also be useful for built-in drivers as well. This especially important
for drivers that do not define any parameters - such drivers, if
built-in, are completely invisible from userspace.
This patch changes MODULE_VERSION() macro so that in case when we are
compiling built-in module, version information is stored in a separate
section. Kernel then uses this data to create 'version' sysfs attribute
in the same fashion it creates attributes for module parameters.
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU. This requires a reverse-map of IRQ affinity. Add a
notification mechanism to support this.
This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: linux-net-drivers@solarflare.com
Cc: Tom Herbert <therbert@google.com>
Cc: David Miller <davem@davemloft.net>
LKML-Reference: <1295470904.11126.84.camel@bwh-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: note the nested NOT_RUNNING test in worker_clr_flags() isn't a noop
workqueue: relax lockdep annotation on flush_work()
Lockdep spotted:
loop_1b_instruc/1899 is trying to acquire lock:
(event_mutex){+.+.+.}, at: [<ffffffff810e1908>] perf_trace_init+0x3b/0x2f7
but task is already holding lock:
(&ctx->mutex){+.+.+.}, at: [<ffffffff810eb45b>] perf_event_init_context+0xc0/0x218
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&ctx->mutex){+.+.+.}:
-> #2 (cpu_hotplug.lock){+.+.+.}:
-> #1 (module_mutex){+.+...}:
-> #0 (event_mutex){+.+.+.}:
But because the deadlock would be cpuhotplug (cpu-event) vs fork
(task-event) it cannot, in fact, happen. We can annotate this by giving the
perf_event_context used for the cpuctx a different lock class from those
used by tasks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c
lockdep: Move early boot local IRQ enable/disable status to init/main.c