Commit Graph

20264 Commits

Author SHA1 Message Date
Dan Streetman
b51d23e4e9 module: add per-module param_lock
Add a "param_lock" mutex to each module, and update params.c to use
the correct built-in or module mutex while locking kernel params.
Remove the kparam_block_sysfs_r/w() macros, replace them with direct
calls to kernel_param_[un]lock(module).

The kernel param code currently uses a single mutex to protect
modification of any and all kernel params.  While this generally works,
there is one specific problem with it; a module callback function
cannot safely load another module, i.e. with request_module() or even
with indirect calls such as crypto_has_alg().  If the module to be
loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
config file), then the attempt will result in a deadlock between the
first module param callback waiting for modprobe, and modprobe trying to
lock the single kernel param mutex to set the new module's param.

This fixes that by using per-module mutexes, so that each individual module
is protected against concurrent changes in its own kernel params, but is
not blocked by changes to other module params.  All built-in modules
continue to use the built-in mutex, since they will always be loaded at
runtime and references (e.g. request_module(), crypto_has_alg()) to them
will never cause load-time param changing.

This also simplifies the interface used by modules to block sysfs access
to their params; while there are currently functions to block and unblock
sysfs param access which are split up by read and write and expect a single
kernel param to be passed, their actual operation is identical and applies
to all params, not just the one passed to them; they simply lock and unlock
the global param mutex.  They are replaced with direct calls to
kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock, or
if the module is built-in, it locks the built-in mutex.

Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-06-23 15:27:38 +09:30
Dan Streetman
5104b7d767 module: make perm const
Change the struct kernel_param.perm field to a const, as it should never
be changed.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cut from larger patch)
2015-06-23 15:27:37 +09:30
Rusty Russell
74c3dea355 params: suppress unused variable error, warn once just in case code changes.
It shouldn't fail due to OOM (it's boot time), and already warns if we
get two identical names.  But you never know what the future holds, and
WARN_ON_ONCE() keeps gcc happy with minimal code.

Reported-by: Louis Langholtz <lou_langholtz@me.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-06-23 15:27:37 +09:30
Luis R. Rodriguez
6727bb9c6a kernel/module.c: avoid ifdefs for sig_enforce declaration
There's no need to require an ifdef over the declaration
of sig_enforce as IS_ENABLED() can be used. While at it,
there's no harm in exposing this kernel parameter outside of
CONFIG_MODULE_SIG as it'd be a no-op on non module sig
kernels.

Now, technically we should in theory be able to remove
the #ifdef'ery over the declaration of the module parameter
as we are also trusting the bool_enable_only code for
CONFIG_MODULE_SIG kernels but for now remain paranoid
and keep it.

With time if no one can put a bullet through bool_enable_only
and if there are no technical requirements over not exposing
CONFIG_MODULE_SIG_FORCE with the measures in place by
bool_enable_only we could remove this last ifdef.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:13 +09:30
Luis R. Rodriguez
552f530cbc kernel/workqueue.c: remove ifdefs over wq_power_efficient
We can avoid an ifdef over wq_power_efficient's declaration
by just using IS_ENABLED().

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:12 +09:30
Luis R. Rodriguez
154be21c58 kernel/params.c: export param_ops_bool_enable_only
This will grant access to this helper to code built as modules.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Kyle McMartin <kyle@kernel.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:12 +09:30
Luis R. Rodriguez
d19f05d8a8 kernel/params.c: generalize bool_enable_only
This takes out the bool_enable_only implementation from
the module loading code and generalizes it so that others
can make use of it.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:11 +09:30
Luis R. Rodriguez
05f408dddb kernel/module.c: use generic module param operaters for sig_enforce
We're directly checking and modifying sig_enforce when needed instead
of using the generic helpers. This prevents us from generalizing this
helper so that others can use it. Use indirect helpers to allow us
to generalize this code a bit and to make it a bit more clear what
this is doing.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:11 +09:30
Luis R. Rodriguez
9c27847dda kernel/params: constify struct kernel_param_ops uses
Most code already uses consts for the struct kernel_param_ops,
sweep the kernel for the last offending stragglers. Other than
include/linux/moduleparam.h and kernel/params.c all other changes
were generated with the following Coccinelle SmPL patch. Merge
conflicts between trees can be handled with Coccinelle.

In the future git could get Coccinelle merge support to deal with
patch --> fail --> grammar --> Coccinelle --> new patch conflicts
automatically for us on patches where the grammar is available and
the patch is of high confidence. Consider this a feature request.

Test compiled on x86_64 against:

	* allnoconfig
	* allmodconfig
	* allyesconfig

@ const_found @
identifier ops;
@@

const struct kernel_param_ops ops = {
};

@ const_not_found depends on !const_found @
identifier ops;
@@

-struct kernel_param_ops ops = {
+const struct kernel_param_ops ops = {
};

Generated-by: Coccinelle SmPL
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: cocci@systeme.lip6.fr
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:10 +09:30
Peter Zijlstra
4f666546d0 module: Rework module_addr_{min,max}
__module_address() does an initial bound check before doing the
{list/tree} iteration to find the actual module. The bound variables
are nowhere near the mod_tree cacheline, in fact they're nowhere near
one another.

module_addr_min lives in .data while module_addr_max lives in .bss
(smarty pants GCC thinks the explicit 0 assignment is a mistake).

Rectify this by moving the two variables into a structure together
with the latch_tree_root to guarantee they all share the same
cacheline and avoid hitting two extra cachelines for the lookup.

While reworking the bounds code, move the bound update from allocation
to insertion time, this avoids updating the bounds for a few error
paths.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:09 +09:30
Peter Zijlstra
b7df4d1b23 module: Use __module_address() for module_address_lookup()
Use the generic __module_address() addr to struct module lookup
instead of open coding it once more.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:08 +09:30
Peter Zijlstra
6c9692e2d6 module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
Andrew worried about the overhead on small systems; only use the fancy
code when either perf or tracing is enabled.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:07 +09:30
Peter Zijlstra
93c2e105f6 module: Optimize __module_address() using a latched RB-tree
Currently __module_address() is using a linear search through all
modules in order to find the module corresponding to the provided
address. With a lot of modules this can take a lot of time.

One of the users of this is kernel_text_address() which is employed
in many stack unwinders; which in turn are used by perf-callchain and
ftrace (possibly from NMI context).

So by optimizing __module_address() we optimize many stack unwinders
which are used by both perf and tracing in performance sensitive code.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:07 +09:30
Peter Zijlstra
7fc26327b7 seqlock: Introduce raw_read_seqcount_latch()
Because with latches there is a strict data dependency on the seq load
we can avoid the rmb in favour of a read_barrier_depends.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:06 +09:30
Peter Zijlstra
6695b92a60 seqlock: Better document raw_write_seqcount_latch()
Improve the documentation of the latch technique as used in the
current timekeeping code, such that it can be readily employed
elsewhere.

Borrow from the comments in timekeeping and replace those with a
reference to this more generic comment.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:04 +09:30
Peter Zijlstra
0be964be0d module: Sanitize RCU usage and locking
Currently the RCU usage in module is an inconsistent mess of RCU and
RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
does not imply synchronize_sched().

Most usage sites use preempt_{dis,en}able() which is RCU-sched, but
(most of) the modification sites use synchronize_rcu(). With the
exception of the module bug list, which actually uses RCU.

Convert everything over to RCU-sched.

Furthermore add lockdep asserts to all sites, because it's not at all
clear to me the required locking is observed, esp. on exported
functions.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:31:52 +09:30
Peter Zijlstra
bed831f9a2 module, jump_label: Fix module locking
As per the module core lockdep annotations in the coming patch:

[   18.034047] ---[ end trace 9294429076a9c673 ]---
[   18.047760] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   18.059228]  ffffffff817d8676 ffff880036683c38 ffffffff8157e98b 0000000000000001
[   18.067541]  0000000000000000 ffff880036683c78 ffffffff8105fbc7 ffff880036683c68
[   18.075851]  ffffffffa0046b08 0000000000000000 ffffffffa0046d00 ffffffffa0046cc8
[   18.084173] Call Trace:
[   18.086906]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
[   18.092649]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
[   18.099361]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
[   18.105880]  [<ffffffff810ee502>] __module_address+0x1d2/0x1e0
[   18.112400]  [<ffffffff81161153>] jump_label_module_notify+0x143/0x1e0
[   18.119710]  [<ffffffff810814bf>] notifier_call_chain+0x4f/0x70
[   18.126326]  [<ffffffff8108160e>] __blocking_notifier_call_chain+0x5e/0x90
[   18.134009]  [<ffffffff81081656>] blocking_notifier_call_chain+0x16/0x20
[   18.141490]  [<ffffffff810f0f00>] load_module+0x1b50/0x2660
[   18.147720]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
[   18.154045]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
[   18.160748] ---[ end trace 9294429076a9c674 ]---

Jump labels is not doing it right; fix this.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-27 11:09:50 +09:30
Peter Zijlstra
926a59b1df module: Annotate module version magic
Due to the new lockdep checks in the coming patch, we go:

[    9.759380] ------------[ cut here ]------------
[    9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130()
[    9.759391] Modules linked in:
[    9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65
[    9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    9.759396]  ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001
[    9.759398]  0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8
[    9.759400]  0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000
[    9.759400] Call Trace:
[    9.759407]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
[    9.759410]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
[    9.759412]  [<ffffffff810ec160>] ? section_objs+0x60/0x60
[    9.759414]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
[    9.759415]  [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130
[    9.759417]  [<ffffffff810eda01>] find_symbol+0x31/0x70
[    9.759420]  [<ffffffff810ef5bf>] load_module+0x20f/0x2660
[    9.759422]  [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0
[    9.759426]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759427]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759433]  [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
[    9.759437]  [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    9.759439]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759441]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
[    9.759443]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
[    9.759445] ---[ end trace 9294429076a9c644 ]---

As per the comment this site should be fine, but lets wrap it in
preempt_disable() anyhow to placate lockdep.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-27 11:09:50 +09:30
Linus Torvalds
c5db6a3bde Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "One more fix from the timer departement:

    - Handle division of negative nanosecond values proper on 32bit.

      A recent cleanup wrecked the sign handling of the dividend and
      dropped the check for negative divisors"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Fix ktime_divns to do signed division
2015-05-23 17:57:40 -07:00
Linus Torvalds
1c8df7bd48 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three small fixes that have been picked up the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with malignant user
     constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio
2015-05-22 15:15:30 -07:00
Michal Hocko
1173ff09b9 watchdog: fix double lock in watchdog_nmi_enable_all
Commit ab992dc38f ("watchdog: Fix merge 'conflict'") has introduced an
obvious deadlock because of a typo.  watchdog_proc_mutex should be
unlocked on exit.

Thanks to Miroslav Benes who was staring at the code with me and noticed
this.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Duh-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-19 10:57:03 -07:00
Shaohua Li
10d784eae2 sched: always use blk_schedule_flush_plug in io_schedule_out
block plug callback could sleep, so we introduce a parameter
'from_schedule' and corresponding drivers can use it to destinguish a
schedule plug flush or a plug finish. Unfortunately io_schedule_out
still uses blk_flush_plug(). This causes below output (Note, I added a
might_sleep() in raid1_unplug to make it trigger faster, but the whole
thing doesn't matter if I add might_sleep). In raid1/10, this can cause
deadlock.

This patch makes io_schedule_out always uses blk_schedule_flush_plug.
This should only impact drivers (as far as I know, raid 1/10) which are
sensitive to the 'from_schedule' parameter.

[  370.817949] ------------[ cut here ]------------
[  370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90()
[  370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [<ffffffff81092fcf>] prepare_to_wait+0x2f/0x90
[  370.817971] Modules linked in: raid1
[  370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G        W       4.0.0+ #361
[  370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[  370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[  370.817985]  ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001
[  370.817988]  ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8
[  370.817990]  ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28
[  370.817993] Call Trace:
[  370.817999]  [<ffffffff819dd7af>] dump_stack+0x4f/0x7b
[  370.818002]  [<ffffffff81051afc>] warn_slowpath_common+0x8c/0xd0
[  370.818004]  [<ffffffff81051b86>] warn_slowpath_fmt+0x46/0x50
[  370.818006]  [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90
[  370.818008]  [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90
[  370.818010]  [<ffffffff810776ef>] __might_sleep+0x7f/0x90
[  370.818014]  [<ffffffffa0000c03>] raid1_unplug+0xd3/0x170 [raid1]
[  370.818024]  [<ffffffff81421d9a>] blk_flush_plug_list+0x8a/0x1e0
[  370.818028]  [<ffffffff819e3550>] ? bit_wait+0x50/0x50
[  370.818031]  [<ffffffff819e21b0>] io_schedule_timeout+0x130/0x140
[  370.818033]  [<ffffffff819e3586>] bit_wait_io+0x36/0x50
[  370.818034]  [<ffffffff819e31b5>] __wait_on_bit+0x65/0x90
[  370.818041]  [<ffffffff8125b67c>] ? ext4_read_block_bitmap_nowait+0xbc/0x630
[  370.818043]  [<ffffffff819e3550>] ? bit_wait+0x50/0x50
[  370.818045]  [<ffffffff819e3302>] out_of_line_wait_on_bit+0x72/0x80
[  370.818047]  [<ffffffff810935e0>] ? autoremove_wake_function+0x40/0x40
[  370.818050]  [<ffffffff811de744>] __wait_on_buffer+0x44/0x50
[  370.818053]  [<ffffffff8125ae80>] ext4_wait_block_bitmap+0xe0/0xf0
[  370.818058]  [<ffffffff812975d6>] ext4_mb_init_cache+0x206/0x790
[  370.818062]  [<ffffffff8114bc6c>] ? lru_cache_add+0x1c/0x50
[  370.818064]  [<ffffffff81297c7e>] ext4_mb_init_group+0x11e/0x200
[  370.818066]  [<ffffffff81298231>] ext4_mb_load_buddy+0x341/0x360
[  370.818068]  [<ffffffff8129a1a3>] ext4_mb_find_by_goal+0x93/0x2f0
[  370.818070]  [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818072]  [<ffffffff8129ab67>] ext4_mb_regular_allocator+0x67/0x460
[  370.818074]  [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818076]  [<ffffffff8129ca4b>] ext4_mb_new_blocks+0x4cb/0x620
[  370.818079]  [<ffffffff81290956>] ext4_ext_map_blocks+0x4c6/0x14d0
[  370.818081]  [<ffffffff812a4d4e>] ? ext4_es_lookup_extent+0x4e/0x290
[  370.818085]  [<ffffffff8126399d>] ext4_map_blocks+0x14d/0x4f0
[  370.818088]  [<ffffffff81266fbd>] ext4_writepages+0x76d/0xe50
[  370.818094]  [<ffffffff81149691>] do_writepages+0x21/0x50
[  370.818097]  [<ffffffff811d5c00>] __writeback_single_inode+0x60/0x490
[  370.818099]  [<ffffffff811d630a>] writeback_sb_inodes+0x2da/0x590
[  370.818103]  [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50
[  370.818105]  [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50
[  370.818107]  [<ffffffff811d665f>] __writeback_inodes_wb+0x9f/0xd0
[  370.818109]  [<ffffffff811d69db>] wb_writeback+0x34b/0x3c0
[  370.818111]  [<ffffffff811d70df>] bdi_writeback_workfn+0x23f/0x550
[  370.818116]  [<ffffffff8106bbd8>] process_one_work+0x1c8/0x570
[  370.818117]  [<ffffffff8106bb5b>] ? process_one_work+0x14b/0x570
[  370.818119]  [<ffffffff8106c09b>] worker_thread+0x11b/0x470
[  370.818121]  [<ffffffff8106bf80>] ? process_one_work+0x570/0x570
[  370.818124]  [<ffffffff81071868>] kthread+0xf8/0x110
[  370.818126]  [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210
[  370.818129]  [<ffffffff819e9322>] ret_from_fork+0x42/0x70
[  370.818131]  [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210
[  370.818132] ---[ end trace 7b4deb71e68b6605 ]---

V2: don't change ->in_iowait

Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-18 16:06:41 -06:00
Peter Zijlstra
ab992dc38f watchdog: Fix merge 'conflict'
Two watchdog changes that came through different trees had a non
conflicting conflict, that is, one changed the semantics of a variable
but no actual code conflict happened. So the merge appeared fine, but
the resulting code did not behave as expected.

Commit 195daf665a ("watchdog: enable the new user interface of the
watchdog mechanism") changes the semantics of watchdog_user_enabled,
which thereafter is only used by the functions introduced by
b3738d2932 ("watchdog: Add watchdog enable/disable all functions").

There further appears to be a distinct lack of serialization between
setting and using watchdog_enabled, so perhaps we should wrap the
{en,dis}able_all() things in watchdog_proc_mutex.

This patch fixes a s2r failure reported by Michal; which I cannot
readily explain. But this does make the code internally consistent
again.

Reported-and-tested-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-18 10:08:29 -07:00
Linus Torvalds
14db1e8dc0 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two fixes: a suspend/resume related regression fix, and an RT priority
  boosting fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix regression in cpuset_cpu_inactive() for suspend
  sched: Handle priority boosted tasks proper in setscheduler()
2015-05-15 12:42:33 -07:00
Linus Torvalds
ef4a293a44 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also a lockdep annotation fix, a PMU event
  list fix and a new model addition"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools/liblockdep: Fix compilation error
  tools/liblockdep: Fix linker error in case of cross compile
  perf tools: Use getconf to determine number of online CPUs
  tools: Fix tools/vm build
  perf/x86/rapl: Enable Broadwell-U RAPL support
  perf/x86/intel: Fix SLM cache event list
  perf: Annotate inherited event ctx->mutex recursion
2015-05-15 12:38:21 -07:00
John Stultz
f7bcb70eba ktime: Fix ktime_divns to do signed division
It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.

And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
  https://bugzilla.redhat.com/show_bug.cgi?id=1200353

This patch fixes the problem in ktime_divns by checking
and preserving the sign bit, and then reapplying it if
appropriate after the division, it also changes the return
type to a s64 to make it more obvious this is expected.

Nicolas also pointed out that negative dividers would
cause infinite loops on 32bit systems, negative dividers
is unlikely for users of this function, but out of caution
this patch adds checks for negative dividers for both
32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
no such use cases creep in.

[ tglx: Hand an u64 to do_div() to avoid the compiler warning ]

Fixes: 166afb6451 'ktime: Sanitize ktime_to_us/ms conversion'
Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-13 10:19:35 +02:00
Linus Torvalds
9d88f22a81 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "Two patches from the irq departement:

   - a simple fix to make dummy_irq_chip usable for wakeup scenarios

   - removal of the gic arch_extn hackery.  Now that all users are
     converted we really want to get rid of the interface so people wont
     come up with new use cases"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: gic: Drop support for gic_arch_extn
  genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
2015-05-09 14:59:05 -07:00
Linus Torvalds
95f3b1f4b1 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A simple fix to actually shut down a detached device instead of
  keeping it active"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Shutdown detached clockevent device
2015-05-09 14:57:49 -07:00
Linus Torvalds
26b293e854 Merge tag 'trace-fixes-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
 "The newly added ftrace_print_array_seq() function had a bug in it.
  Luckily, the only user of it didn't make the 4.1 merge window.

  But the helper function should be fixed before 4.2 when the users
  start coming in"

* tag 'trace-fixes-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Make ftrace_print_array_seq compute buf_len
2015-05-08 18:22:05 -07:00
Peter Zijlstra
8b10c5e2b5 perf: Annotate inherited event ctx->mutex recursion
While fuzzing Sasha tripped over another ctx->mutex recursion lockdep
splat. Annotate this.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:59:40 +02:00
Omar Sandoval
533445c6e5 sched/core: Fix regression in cpuset_cpu_inactive() for suspend
Commit 3c18d447b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

	$ ./a.out
	sched_setaffinity: Success
	$ systemctl suspend
	$ ./a.out
	sched_setaffinity: Invalid argument

... where ./a.out is:

	#define _GNU_SOURCE
	#include <errno.h>
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		long num_cores;
		cpu_set_t cpu_set;
		int ret;

		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
		CPU_ZERO(&cpu_set);
		CPU_SET(num_cores - 1, &cpu_set);
		errno = 0;
		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
		perror("sched_setaffinity");
		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
	}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 3c18d447b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:53:56 +02:00
Thomas Gleixner
0782e63bc6 sched: Handle priority boosted tasks proper in setscheduler()
Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d0 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: c365c292d0 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:53:55 +02:00
Alex Bennée
ac01ce1410 tracing: Make ftrace_print_array_seq compute buf_len
The only caller to this function (__print_array) was getting it wrong by
passing the array length instead of buffer length. As the element size
was already being passed for other reasons it seems reasonable to push
the calculation of buffer length into the function.

Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.org

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-06 23:03:23 -04:00
Linus Torvalds
02f0f5721e Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU fix from Ingo Molnar:
 "An RCU Kconfig fix that eliminates an annoying interactive kconfig
  question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Control grace-period delays directly from value
2015-05-06 10:26:37 -07:00
Linus Torvalds
6c3c1eb3c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Receive packet length needs to be adjust by 2 on RX to accomodate
    the two padding bytes in altera_tse driver.  From Vlastimil Setka.

 2) If rx frame is dropped due to out of memory in macb driver, we leave
    the receive ring descriptors in an undefined state.  From Punnaiah
    Choudary Kalluri

 3) Some netlink subsystems erroneously signal NLM_F_MULTI.  That is
    only for dumps.  Fix from Nicolas Dichtel.

 4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via
    the ipv4_mtu() helper.  From Herbert Xu.

 5) Fix null deref in bridge netfilter, and miscalculated lengths in
    jump/goto nf_tables verdicts.  From Florian Westphal.

 6) Unhash ping sockets properly.

 7) Software implementation of BPF divide did 64/32 rather than 64/64
    bit divide.  The JITs got it right.  Fix from Alexei Starovoitov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
  ipv4: Missing sk_nulls_node_init() in ping_unhash().
  net: fec: Fix RGMII-ID mode
  net/mlx4_en: Schedule napi when RX buffers allocation fails
  netxen_nic: use spin_[un]lock_bh around tx_clean_lock
  net/mlx4_core: Fix unaligned accesses
  mlx4_en: Use correct loop cursor in error path.
  cxgb4: Fix MC1 memory offset calculation
  bnx2x: Delay during kdump load
  net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table
  net: dsa: Fix scope of eeprom-length property
  net: macb: Fix race condition in driver when Rx frame is dropped
  hv_netvsc: Fix a bug in netvsc_start_xmit()
  altera_tse: Correct rx packet length
  mlx4: Fix tx ring affinity_mask creation
  tipc: fix problem with parallel link synchronization mechanism
  tipc: remove wrong use of NLM_F_MULTI
  bridge/nl: remove wrong use of NLM_F_MULTI
  bridge/mdb: remove wrong use of NLM_F_MULTI
  net: sched: act_connmark: don't zap skb->nfct
  trivial: net: systemport: bcmsysport.h: fix 0x0x prefix
  ...
2015-05-01 20:51:04 -07:00
Linus Torvalds
4a152c3913 Merge tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
 "Three regression fixes this time, one for a recent regression in the
  cpuidle core affecting multiple systems, one for an inadvertently
  added duplicate typedef in ACPICA that breaks compilation with GCC 4.5
  and one for an ACPI Smart Battery Subsystem driver regression
  introduced during the 3.18 cycle (stable-candidate).

  Specifics:

   - Fix for a regression in the cpuidle core introduced by one of the
     recent commits in the clockevents_notify() removal series that put
     a call to a function which had to be executed with disabled
     interrupts into a code path running with enabled interrupts (Rafael
     J Wysocki)

   - Fix for a build problem in ACPICA (with GCC 4.5) introduced by one
     of the recent ACPICA tools commits that added a duplicate typedef
     to one of the ACPICA's header files by mistake (Olaf Hering)

   - Fix for a regression in the ACPI SBS (Smart Battery Subsystem)
     driver introduced during the 3.18 development cycle causing the
     smart battery manager to be marked as not present when it should be
     marked as present (Chris Bainbridge)"

* tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: Run tick_broadcast_exit() with disabled interrupts
  ACPI / SBS: Enable battery manager when present
  ACPICA: remove duplicate u8 typedef
2015-04-30 14:23:31 -07:00
Linus Torvalds
9dbbe3cfc3 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm changes from Paolo Bonzini:
 "Remove from guest code the handling of task migration during a pvclock
  read; instead use the correct protocol in KVM.

  This removes the need for task migration notifiers in core scheduler
  code"

[ The scheduler people really hated the migration notifiers, so this was
  kind of required  - Linus ]

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: pvclock: Really remove the sched notifier for cross-cpu migrations
  kvm: x86: fix kvmclock update protocol
2015-04-30 09:44:04 -07:00
David Howells
9c4249c8e0 modsign: change default key details
Change default key details to be more obviously unspecified.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-30 09:35:41 -07:00
Rafael J. Wysocki
df8d9eeadd cpuidle: Run tick_broadcast_exit() with disabled interrupts
Commit 335f49196f (sched/idle: Use explicit broadcast oneshot
control function) replaced clockevents_notify() invocations in
cpuidle_idle_call() with direct calls to tick_broadcast_enter()
and tick_broadcast_exit(), but it overlooked the fact that
interrupts were already enabled before calling the latter which
led to functional breakage on systems using idle states with the
CPUIDLE_FLAG_TIMER_STOP flag set.

Fix that by moving the invocations of tick_broadcast_enter()
and tick_broadcast_exit() down into cpuidle_enter_state() where
interrupts are still disabled when tick_broadcast_exit() is
called.  Also ensure that interrupts will be disabled before
running tick_broadcast_exit() even if they have been enabled by
the idle state's ->enter callback.  Trigger a WARN_ON_ONCE() in
that case, as we generally don't want that to happen for states
with CPUIDLE_FLAG_TIMER_STOP set.

Fixes: 335f49196f (sched/idle: Use explicit broadcast oneshot control function)
Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-29 15:19:21 +02:00
Linus Torvalds
14bc84ce0b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "One additional new feature for 4.1, a new PRNG based on SHA-512 for
  the zcrypt driver.

  Two memory management related changes, the page table reallocation for
  KVM is removed, and with file ptes gone the encoding of page table
  entries is improved.

  And three bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: Introduce new SHA-512 based Pseudo Random Generator.
  s390/mm: change swap pte encoding and pgtable cleanup
  s390/mm: correct transfer of dirty & young bits in __pmd_to_pte
  s390/bpf: add dependency to z196 features
  s390/3215: free memory in error path
  s390/kvm: remove delayed reallocation of page tables for KVM
  kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP
2015-04-28 09:58:46 -07:00
Alexei Starovoitov
876a7ae65b bpf: fix 64-bit divide
ALU64_DIV instruction should be dividing 64-bit by 64-bit,
whereas do_div() does 64-bit by 32-bit divide.
x64 and arm64 JITs correctly implement 64 by 64 unsigned divide.
llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64.

Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-27 23:11:49 -04:00
Paolo Bonzini
73459e2a1a x86: pvclock: Really remove the sched notifier for cross-cpu migrations
This reverts commits 0a4e6be9ca
and 80f7fdb1c7.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 15:49:30 +02:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Viresh Kumar
149aabcc44 clockevents: Shutdown detached clockevent device
A clockevent device is marked DETACHED when it is replaced by another
clockevent device.

The device is shutdown properly for drivers that implement legacy
->set_mode() callback, as we call ->set_mode() for CLOCK_EVT_MODE_UNUSED
as well.

But for the new per-state callback interface, we skip shutting down the
device, as we thought its an internal state change. That wasn't correct.

The effect is that the device is left programmed in oneshot or periodic
mode.

Fall-back to 'case CLOCK_EVT_STATE_SHUTDOWN', to shutdown the device.

Fixes: bd624d75db "clockevents: Introduce mode specific callbacks"
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/eef0a91c51b74d4e52c8e5a95eca27b5a0563f07.1428650683.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 21:01:48 +02:00
Roger Quadros
10a50f1ab5 genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
Without this system suspend is broken on systems that have
drivers calling enable/disable_irq_wake() for interrupts based off
the dummy irq hook. (e.g. drivers/gpio/gpio-pcf857x.c)

Signed-off-by: Roger Quadros <rogerq@ti.com>
Cc: <cw00.choi@samsung.com>
Cc: <balbi@ti.com>
Cc: <tony@atomide.com>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Link: http://lkml.kernel.org/r/552E1DD3.4040106@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 20:57:06 +02:00
Martin Schwidefsky
7e01b5acd8 kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP
Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code
to override the gfp flags of the allocation for the kexec control
page. The loop in kimage_alloc_normal_control_pages allocates pages
with GFP_KERNEL until a page is found that happens to have an
address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems
with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT
the loop will keep allocating memory until the oom killer steps in.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-23 16:52:01 +02:00
Linus Torvalds
27cf3a16b2 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
 "Seven audit patches for v4.1, all bug fixes.

  The largest, and perhaps most significant commit helps resolve some
  memory pressure issues related to the inode cache and audit, there are
  also a few small commits which help resolve some timing issues with
  the audit log queue, and the rest fall into the always popular "code
  clean-up" category.

  In general, nothing really substantial, just a nice set of maintenance
  patches"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: Remove condition which always evaluates to false
  audit: reduce mmap_sem hold for mm->exe_file
  audit: consolidate handling of mm->exe_file
  audit: code clean up
  audit: don't reset working wait time accidentally with auditd
  audit: don't lose set wait time on first successful call to audit_log_start()
  audit: move the tree pruning to a dedicated thread
2015-04-22 14:49:23 -07:00
Linus Torvalds
4f2112351b Merge tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "This adds three fixes for the tracing code.

  The first is a bug when ftrace_dump_on_oops is triggered in atomic
  context and function graph tracer is the tracer that is being
  reported.

  The second fix is bad parsing of the trace_events from the kernel
  command line, where it would ignore specific events if the system name
  is used with defining the event(it enables all events within the
  system).

  The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was
  missing to see if the ptr was incremented to the end of the string,
  but the loop increments it again and can miss the nul delimiter to
  stop processing"

* tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix possible out of bounds memory access when parsing enums
  tracing: Fix incorrect enabling of trace events by boot cmdline
  tracing: Handle ftrace_dump() atomic context in graph_trace_open()
2015-04-22 11:27:36 -07:00
Linus Torvalds
15ce2658dd Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
 "Quentin opened a can of worms by adding extable entry checking to
  modpost, but most architectures seem fixed now.  Thanks to all
  involved.

  Last minute rebase because I noticed a "[PATCH]" had snuck into a
  commit message somehow"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modpost: don't emit section mismatch warnings for compiler optimizations
  modpost: expand pattern matching to support substring matches
  modpost: do not try to match the SHT_NUL section.
  modpost: fix extable entry size calculation.
  modpost: fix inverted logic in is_extable_fault_address().
  modpost: handle -ffunction-sections
  modpost: Whitelist .text.fixup and .exception.text
  params: handle quotes properly for values not of form foo="bar".
  modpost: document the use of struct section_check.
  modpost: handle relocations mismatch in __ex_table.
  scripts: add check_extable.sh script.
  modpost: mismatch_handler: retrieve tosym information only when needed.
  modpost: factorize symbol pretty print in get_pretty_name().
  modpost: add handler function pointer to sectioncheck.
  modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list.
  modpost: add strict white-listing when referencing sections.
  module: do not print allocation-fail warning on bogus user buffer size
  kernel/module.c: fix typos in message about unused symbols
2015-04-22 09:49:24 -07:00
Linus Torvalds
1fc149933f Merge tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
 "Here's the big char/misc driver patchset for 4.1-rc1.

  Lots of different driver subsystem updates here, nothing major, full
  details are in the shortlog.

  All of this has been in linux-next for a while"

* tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits)
  mei: trace: remove unused TRACE_SYSTEM_STRING
  DTS: ARM: OMAP3-N900: Add lis3lv02d support
  Documentation: DT: lis302: update wakeup binding
  lis3lv02d: DT: add wakeup unit 2 and wakeup threshold
  lis3lv02d: DT: use s32 to support negative values
  Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
  Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case
  mei: replace check for connection instead of transitioning
  mei: use mei_cl_is_connected consistently
  mei: fix mei_poll operation
  hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg()
  Drivers: hv: hv_balloon: survive ballooning request with num_pages=0
  Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor function
  Drivers: hv: hv_balloon: do not online pages in offline blocks
  hv: remove the per-channel workqueue
  hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind()
  hv: run non-blocking message handlers in the dispatch tasklet
  coresight: moving to new "hwtracing" directory
  coresight-tmc: Adding a status interface to sysfs
  coresight: remove the unnecessary configuration coresight-default-sink
  ...
2015-04-21 09:42:58 -07:00