Commit Graph

Christoph Hellwig
f11bb3e244 nvme: add a local nvme.h header
Add a new drivers/block/nvme.h which contains all of the driver-internal
interfaces.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 10:40:37 -06:00
Keith Busch
0a7385ad69 NVMe: Simplify device resume on io queue failure
Releasing IO queues and disks was done in a work queue outside the
controller resume context to delete namespaces if the controller failed
after a resume from suspend. This is unnecessary since we can resume
a device asynchronously.

This patch makes resume use probe_work so it can directly remove
namespaces if the device is manageable but not IO capable. Since
deleting disks was the only reason we had the convoluted "reset_workfn",
this patch removes that unnecessary indirection.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 10:40:36 -06:00
Keith Busch
188c3568f8 NVMe: Reference count open namespaces
Dynamic namespace attachment means the namespace may be removed at any
time, so the namespace reference count cannot be tied to the device
reference count. This fixes a NULL dereference if an opened namespace
is detached from a controller.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 10:40:36 -06:00
Marc Zyngier
10abc7df92 irqdomain: Add an accessor for the of_node field
As we're about to remove the of_node field from the irqdomain
structure, introduce an accessor for it. Subsequent patches
will take care of the actual repainting.
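
For illustration, the accessor is expected to be a trivial wrapper
around the field that is about to go away (a sketch, not necessarily
the exact merged form):

        static inline struct device_node *irq_domain_get_of_node(struct irq_domain *d)
        {
                return d->of_node;
        }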

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1444402211-1141-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09 17:17:30 +02:00
Yaowei Bai
0cbf334376 net/core: lockdep_rtnl_is_held can be boolean
This patch makes lockdep_rtnl_is_held() return bool, since this
particular function only ever returns one or zero.

Another patch makes lockdep_is_held() return bool as well.

No functional change.
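
The change itself is a one-word type swap; a sketch (the body is
recalled from net/core/rtnetlink.c and may differ in detail), which the
later conversions in this series all mirror:

        /* before */
        int lockdep_rtnl_is_held(void)
        {
                return lockdep_is_held(&rtnl_mutex);
        }

        /* after: same logic, self-documenting return type */
        bool lockdep_rtnl_is_held(void)
        {
                return lockdep_is_held(&rtnl_mutex);
        }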

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:06 -07:00
Yaowei Bai
f06cc7b284 net/inetdevice: bad_mask can be boolean
This patch makes bad_mask() return bool, since the function only ever
returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:05 -07:00
Yaowei Bai
c3225164cf net/inetdevice: inet_ifa_match can be boolean
This patch makes inet_ifa_match() return bool, since the function only
ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:03 -07:00
Yaowei Bai
0c6119d99b net/dccp: dccp_list_has_service can be boolean
This patch makes dccp_list_has_service() return bool, since the
function only ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:02 -07:00
Yaowei Bai
d6fbaea5f6 net/can: can_dropped_invalid_skb can be boolean
This patch makes can_dropped_invalid_skb() return bool, since the
function only ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:01 -07:00
Yaowei Bai
875e082949 net/nfnetlink: lockdep_nfnl_is_held can be boolean
This patch makes lockdep_nfnl_is_held() return bool to improve
readability, since the function only ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:00 -07:00
Yaowei Bai
35498edc64 net/ieee80211: ieee80211_is_* can be boolean
This patch makes the ieee80211_is_* helpers return bool to improve
readability, since these functions only ever return one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:48:59 -07:00
Yaowei Bai
61d03535e4 net/netlink: lockdep_genl_is_held can be boolean
This patch makes lockdep_genl_is_held() return bool to improve
readability, since the function only ever returns one or zero.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:48:59 -07:00
Eli Cohen
ac6ea6e81a net/mlx5_core: Use private health thread for each device
Use a single-threaded work queue for each device in the system instead
of one thread shared by all devices. This is required so that system
error handling can be processed concurrently for all devices that need
it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:27:49 -07:00
Eli Cohen
020446e01e net/mlx5_core: Prepare cmd interface to system errors handling
In preparation to handling system errors at the mlx5_core level, change the
interface of cmd_work_handler to accept a 64 bit argument for the vector.

This allows encoding a flag that signifies when the handler is called
as a result of driver logic that wishes to terminate commands that the
hardware may not be able to terminate. Such command completions are
detected in the handler and a proper return status is encoded.

To be able to terminate page handler commands, we make sure to set
the corresponding bit in the bitmask.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:27:48 -07:00
Punit Agrawal
38a1bdc9ff firmware: arm_scpi: Extend to support sensors
ARM System Control Processor (SCP) provides an API to query and use
the sensors available in the system. Extend the SCPI driver to support
sensor messages.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
2015-10-09 11:05:52 +01:00
Arnd Bergmann
f3c65c2892 Merge tag 'at91-cleanup-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux into next/drivers
Merge "First batch of cleanups for 4.4:" from Alexandre Belloni:
 - properly get the slow clock from timer-atmel-st, tcb_clksrc and pwm-atmel-tcb
 - small fix in an error path for tcb_clksrc

* tag 'at91-cleanup-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  misc: atmel_tclib: get and use slow clock
  clocksource: tcb_clksrc: fix setup_clkevents error path
  clocksource: atmel-st: get and use slow clock
2015-10-08 17:26:27 +02:00
Trond Myklebust
120bf961b9 Merge branch 'sunrpc'
* sunrpc:
  SUNRPC: Use MSG_SENDPAGE_NOTLAST in xs_send_pagedata()
  SUNRPC: Move AF_LOCAL receive data path into a workqueue context
  SUNRPC: Move UDP receive data path into a workqueue context
  SUNRPC: Move TCP receive data path into a workqueue context
  SUNRPC: Refactor TCP receive
2015-10-08 10:46:19 -04:00
Trond Myklebust
516285ebe0 NFSv4: nfs4_async_handle_error should take a non-const nfs_server
For symmetry with the synchronous handler, and so that we can potentially
handle errors such as NFS4ERR_BADNAME.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 10:45:52 -04:00
Trond Myklebust
edc1b01cd3 SUNRPC: Move TCP receive data path into a workqueue context
Stream protocols such as TCP can often build up a backlog of data to be
read due to ordering. Combine this with the fact that some workloads,
such as NFS read()-intensive ones, need to receive a lot of data per RPC
call, and it turns out that receiving the data from inside a softirq
context can cause starvation.

The following patch moves the TCP data receive into a workqueue context.
We still end up calling tcp_read_sock(), but we do so from a process
context, meaning that softirqs are enabled for most of the time.

With this patch, I see a doubling of read bandwidth when running a
multi-threaded iozone workload between a virtual client and server setup.
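
Schematically, the pattern is the following (a sketch; function and
field names only approximate the actual xprtsock.c changes):

        /* sk->sk_data_ready callback: runs in softirq, so only kick the work item */
        static void xs_tcp_data_ready(struct sock *sk)
        {
                struct sock_xprt *transport = sk->sk_user_data;

                queue_work(rpciod_workqueue, &transport->recv_worker);
        }

        /* workqueue handler: process context, softirqs enabled most of the time */
        static void xs_tcp_data_receive_workfn(struct work_struct *work)
        {
                struct sock_xprt *transport =
                        container_of(work, struct sock_xprt, recv_worker);

                /* drain the socket via tcp_read_sock() and our recv actor here */
        }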

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 08:27:04 -04:00
Daniel Borkmann
3ad0040573 bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs
While recently arguing on a seccomp discussion that raw prandom_u32()
access shouldn't be exposed to unprivileged user space, I forgot the
fact that SKF_AD_RANDOM extension actually already does it for some time
in cBPF via commit 4cd3675ebf ("filter: added BPF random opcode").

Since prandom_u32() is being used in a lot of critical networking code,
let's be more conservative and split their states. Furthermore, consolidate
eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF,
bpf_get_prandom_u32() was only accessible for privileged users, but
should that change one day, we also don't want to leak raw sequences
through things like eBPF maps.

One thought was also to have our own per-bpf_prog states, but due to ABI
reasons this is not easily possible, i.e. the program code currently
cannot access bpf_prog itself, and copying the rnd_state to/from the
stack scratch space whenever a program uses the prng seems not really
worth the trouble and seems too hacky. If needed, taus113 could in such
cases be implemented within eBPF using a map entry to keep the state
space, or get_random_bytes() could become a second helper in cases where
performance would not be critical.

Both sides can trigger a one-time late init via prandom_init_once() on
the shared state. Performance-wise, there should even be a tiny gain
as bpf_user_rnd_u32() saves one function call. The PRNG needs to live
inside the BPF core since kernels could have a NET-less config as well.
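
Boiled down, the BPF side gets a per-CPU prandom state of its own (a
sketch with a simplified signature; the real helper matches the BPF
calling convention):

        static DEFINE_PER_CPU(struct rnd_state, bpf_user_rnd_state);

        u32 bpf_user_rnd_u32(void)
        {
                struct rnd_state *state = &get_cpu_var(bpf_user_rnd_state);
                u32 res = prandom_u32_state(state);

                put_cpu_var(bpf_user_rnd_state);
                return res;
        }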

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:39 -07:00
Daniel Borkmann
897ece56e7 random32: add prandom_init_once helper for own rngs
Add a prandom_init_once() facility that works on the rnd_state, so that
users that are keeping their own state independent from prandom_u32() can
initialize their taus113 per-CPU states.

The motivation here is similar to net_get_random_once(): initialize the
state as late as possible in the hope that enough entropy has been
collected for the seeding. prandom_init_once() makes use of the recently
introduced prandom_seed_full_state() helper and is generic enough so that
it could also be used on fast-paths due to the DO_ONCE().
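
In terms of the DO_ONCE() machinery from the once patch, the helper can
be pictured as (sketch):

        #define prandom_init_once(pcpu_state)                   \
                DO_ONCE(prandom_seed_full_state, (pcpu_state))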

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:38 -07:00
Hannes Frederic Sowa
c90aeb9482 once: make helper generic for calling functions once
Generalize the get_random_once() helper so that arbitrary functions
can be called exactly once; one user of this is then
net_get_random_once().

The only implementation-specific call is to get_random_bytes(); all
the rest of this *_once() facility would otherwise be duplicated among
different subsystems. The new DO_ONCE() helper will be used by prandom()
later on, but might also be useful for other scenarios/subsystems where
a one-time initialization has to happen in often-called, possibly
fast-path code.
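
Intended usage looks roughly like this (hypothetical caller; after the
first call a static key turns the DO_ONCE() into a no-op branch):

        static void init_my_tables(size_t entries)      /* hypothetical */
        {
                /* expensive one-time setup */
        }

        static void fast_path(void)
        {
                DO_ONCE(init_my_tables, 128);
                /* per-call work follows */
        }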

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:36 -07:00
Hannes Frederic Sowa
46234253b9 net: move net_get_random_once to lib
There's no good reason why users outside of networking should not
be using this facility, e.g. for initializing their seeds.

Therefore, make it accessible from there as get_random_once().

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:35 -07:00
Arun Parameswaran
8e185d6997 net: phy: Broadcom Cygnus internal Ethernet PHY driver
Add support for the Broadcom Cygnus SoCs' internal PHYs.
The PHYs are 1000M/100M/10M capable with support for 'EEE'
and 'APD' (Auto Power Down).

This driver supports the following Broadcom Cygnus SoCs:
 - BCM583XX (BCM58300, BCM58302, BCM58303, BCM58305)
 - BCM113XX (BCM11300, BCM11320, BCM11350, BCM11360)

The PHYs on these SoCs require some workarounds for
stable operation, both at configuration time and
during suspend/resume. This driver handles applying
the workarounds.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:45:52 -07:00
Arun Parameswaran
a1cba5613e net: phy: Add Broadcom phy library for common interfaces
This patch adds the Broadcom PHY library to consolidate common
interfaces shared by Broadcom PHYs.

Move the common interfaces into 'bcm-phy-lib.c' and update the
Broadcom PHY drivers to use the new APIs.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:45:46 -07:00
David S. Miller
61d0372028 Merge tag 'regmap-offload-update-bits' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
regmap: Allow buses to provide a custom update_bits() operation

Some buses provide a native _update_bits() operation which, for uncached
registers, is faster than doing a read/modify/write cycle, since it is a
single bus transaction.  Add support to regmap for implementing this.
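
A bus could hook this roughly as follows (hypothetical driver; the
callback shape is a sketch):

        static int my_bus_reg_update_bits(void *context, unsigned int reg,
                                          unsigned int mask, unsigned int val)
        {
                /* issue the native read-modify-write as one bus transaction */
                return 0;
        }

        static const struct regmap_bus my_bus = {
                .reg_update_bits = my_bus_reg_update_bits,
        };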
2015-10-08 04:01:28 -07:00
Ingo Molnar
d3df65c198 Merge branch 'perf/urgent' into perf/core, before pulling new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-08 10:52:18 +02:00
Paul E. McKenney
39cd2dd39a Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD
doc.2015.10.06a:  Documentation updates.
percpu-rwsem.2015.10.06a:  Optimization of per-CPU reader-writer semaphores.
torture.2015.10.06a:  Torture-test updates.
2015-10-07 16:06:25 -07:00
Paul E. McKenney
d2856b046d Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEAD
exp.2015.10.07a:  Reduce OS jitter of RCU-sched expedited grace periods.
fixes.2015.10.06a:  Miscellaneous fixes.
2015-10-07 16:05:21 -07:00
Paul E. McKenney
02ef3c4a2a cpu: Remove try_get_online_cpus()
Now that synchronize_sched_expedited() no longer uses it, there are
no users of try_get_online_cpus() in mainline.  This commit therefore
removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-10-07 16:02:49 -07:00
Emilio López
18800fc7a0 platform/chrome: Support reading/writing the vboot context
Some EC implementations include a small nvram space used to store
verified boot context data. This patch offers a way to expose this
data to userspace.

Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Emilio López <emilio.lopez@collabora.co.uk>
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-10-07 15:05:53 -07:00
Emilio López
7f5028cf61 sysfs: Support is_visible() on binary attributes
According to the sysfs header file:

    "The returned value will replace static permissions defined in
     struct attribute or struct bin_attribute."

but this isn't the case, as is_visible is only called for struct
attribute. This patch introduces a new is_bin_visible() function to
implement the same functionality for binary attributes, and updates
the documentation accordingly.

Note that to keep functionality and code similar to that of normal
attributes, the mode is now checked as well to ensure it contains only
read/write permissions or SYSFS_PREALLOC.
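
The resulting structure is expected to look like this (sketch):

        struct attribute_group {
                const char              *name;
                umode_t                 (*is_visible)(struct kobject *,
                                                      struct attribute *, int);
                umode_t                 (*is_bin_visible)(struct kobject *,
                                                          struct bin_attribute *, int);
                struct attribute        **attrs;
                struct bin_attribute    **bin_attrs;
        };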

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Emilio López <emilio.lopez@collabora.co.uk>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-10-07 15:05:31 -07:00
Alan Tull
6a8c3be7ec add FPGA manager core
Add an API to support programming FPGAs.

The following functions are exported as GPL:
* fpga_mgr_buf_load
   Load an FPGA from an image in a buffer.

* fpga_mgr_firmware_load
   Request firmware and load it to the FPGA.

* fpga_mgr_register
* fpga_mgr_unregister
   FPGA device drivers can be added by calling
   fpga_mgr_register() to register a set of
   fpga_manager_ops to do device specific stuff.

* of_fpga_mgr_get
* fpga_mgr_put
   Get/put a reference to an FPGA manager.

The following sysfs files are created:
* /sys/class/fpga_manager/<fpga>/name
  Name of the low-level driver.

* /sys/class/fpga_manager/<fpga>/state
  State of the FPGA manager.
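
A consumer would use the API roughly like this (hedged sketch built
from the exported functions above; the image name is hypothetical):

        static int program_my_fpga(struct device_node *np)
        {
                struct fpga_manager *mgr = of_fpga_mgr_get(np);
                int ret;

                if (IS_ERR(mgr))
                        return PTR_ERR(mgr);

                ret = fpga_mgr_firmware_load(mgr, 0, "myimage.rbf");
                fpga_mgr_put(mgr);
                return ret;
        }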

Signed-off-by: Alan Tull <atull@opensource.altera.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-07 18:08:15 +01:00
Mathieu Poirier
1d27ff5aba coresight: fixing typographical error
Tracing gets enabled _for_ a source rather than _from_ a source.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-07 17:54:09 +01:00
David S. Miller
2579c98f0d Merge tag 'mac80211-next-for-davem-2015-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
For the current cycle, we have the following right now:
 * many internal fixes, API improvements, cleanups, etc.
 * full AP client state tracking in cfg80211/mac80211 from Ayala
 * VHT support (in mac80211) for mesh
 * some A-MSDU in A-MPDU support from Emmanuel
 * show current TX power to userspace (from Rafał)
 * support for netlink dump in vendor commands (myself)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:29:18 -07:00
David Ahern
fee6d4c777 net: Add netif_is_l3_slave
IPv6 addrconf keys off of IFF_SLAVE, so that flag cannot be reused for
L3 slaves. Add a new private flag and a netif_is_l3_slave() function
for checking it.
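
The helper amounts to a flag test (sketch; the private flag name here
is illustrative):

        static inline bool netif_is_l3_slave(const struct net_device *dev)
        {
                return dev->priv_flags & IFF_L3MDEV_SLAVE;
        }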

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:27:43 -07:00
Paul Gortmaker
b4eb6cdbbd PCI: Add builtin_pci_driver() to avoid registration boilerplate
In f309d44431 ("platform_device: better support builtin boilerplate
avoidance"), we introduced the builtin_driver() macro.

Here we use that support and extend it to PCI driver registration, so where
a driver is clearly non-modular and builtin-only, we can register it in a
similar fashion.  Existing code that is clearly non-modular can be updated
with the simple mapping of

  module_pci_driver(...)  ---> builtin_pci_driver(...)

We've essentially cloned the former to make the latter, and taken out the
remove/module_exit parts since those never get used in a non-modular build
of the code.
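
The new macro is expected to be a thin wrapper (sketch, modeled on
module_pci_driver(); registration only, no remove/exit path):

        #define builtin_pci_driver(__pci_driver) \
                builtin_driver(__pci_driver, pci_register_driver)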

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-10-06 14:24:18 -05:00
Oleg Nesterov
4bace7344d rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
   change it to use rcu_lockdep_assert().

2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
   unconditionally; this way we can remove the same check from
   rcu_sync_lockdep_assert() and clearly isolate the debugging
   code.

Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
this needs some simple preparations in the core RCU code to avoid the
code duplication.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:45 -07:00
Oleg Nesterov
001dac627f locking/percpu-rwsem: Make use of the rcu_sync infrastructure
Currently down_write()/up_write() call synchronize_sched_expedited()
twice, which is evil.  Change this code to rely on rcu-sync primitives.
This avoids the _expedited "big hammer", and this can be faster in
the contended case or even in the case when a single thread does
down_write/up_write in a loop.

Of course, a single down_write() will take more time, but otoh it
will be much more friendly to the whole system.

To simplify the review this patch doesn't update the comments, fixed
by the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:31 -07:00
Oleg Nesterov
07899a6e5f rcu_sync: Introduce rcu_sync_dtor()
This commit allows rcu_sync structures to be safely deallocated.
The trick is to add a new ->wait field to the gp_ops array.
This field is a pointer to the rcu_barrier() function corresponding
to the flavor of RCU in question.  This allows a new rcu_sync_dtor()
to wait for any outstanding callbacks before freeing the rcu_sync
structure.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:21 -07:00
Oleg Nesterov
3a518b76af rcu_sync: Add CONFIG_PROVE_RCU checks
This commit validates that the caller of rcu_sync_is_idle() holds the
corresponding type of RCU read-side lock, but only in kernels built
with CONFIG_PROVE_RCU=y.  This validation is carried out via a new
rcu_sync_ops->held() method that is checked within rcu_sync_is_idle().

Note that although this does add code to the fast path, it only does so
in kernels built with CONFIG_PROVE_RCU=y.

Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:16 -07:00
Oleg Nesterov
82e8c565be rcu_sync: Simplify rcu_sync using new rcu_sync_ops structure
This commit adds the new struct rcu_sync_ops which holds sync/call
methods, and turns the function pointers in rcu_sync_struct into an array
of struct rcu_sync_ops.  This simplifies the "init" helpers by collapsing
a switch statement and explicit multiple definitions into a simple
assignment and a helper macro, respectively.
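
The shape of the new structure (sketch):

        struct rcu_sync_ops {
                void (*sync)(void);
                void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
        };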

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:10 -07:00
Oleg Nesterov
cc44ca848f rcu: Create rcu_sync infrastructure
The rcu_sync infrastructure can be thought of as infrastructure to be
used to implement reader-writer primitives having extremely lightweight
readers during times when there are no writers.  The first use is in
the percpu_rwsem used by the VFS subsystem.

This infrastructure is functionally equivalent to

        struct rcu_sync_struct {
                atomic_t counter;
        };

        /* Check possibility of fast-path read-side operations. */
        static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
        {
                return atomic_read(&rss->counter) == 0;
        }

        /* Tell readers to use slowpaths. */
        static inline void rcu_sync_enter(struct rcu_sync_struct *rss)
        {
                atomic_inc(&rss->counter);
                synchronize_sched();
        }

        /* Allow readers to once again use fastpaths. */
        static inline void rcu_sync_exit(struct rcu_sync_struct *rss)
        {
                synchronize_sched();
                atomic_dec(&rss->counter);
        }

The main difference is that it records the state and only calls
synchronize_sched() if required.  At least some of the calls to
synchronize_sched() will be optimized away when rcu_sync_enter() and
rcu_sync_exit() are invoked repeatedly in quick succession.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:04 -07:00
Paul E. McKenney
7f5f873c6a rculist: Use WRITE_ONCE() when deleting from reader-visible list
The various RCU list-deletion macros (list_del_rcu(),
hlist_del_init_rcu(), hlist_del_rcu(), hlist_bl_del_init_rcu(),
hlist_bl_del_rcu(), hlist_nulls_del_init_rcu(), and hlist_nulls_del_rcu())
do plain stores into the ->next pointer of the preceding list element.
Unfortunately, the compiler is within its rights to (for example) use
byte-at-a-time writes to update the pointer, which would fatally confuse
concurrent readers.  This patch therefore adds the needed WRITE_ONCE()
macros.

KernelThreadSanitizer (KTSAN) reported the __hlist_del() issue, which
is a problem when __hlist_del() is invoked by hlist_del_rcu().
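
The change is mechanical; on __list_del() it looks like this (sketch):

        static inline void __list_del(struct list_head *prev, struct list_head *next)
        {
                next->prev = prev;
                WRITE_ONCE(prev->next, next);   /* was: prev->next = next; */
        }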

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:16:42 -07:00
Paul E. McKenney
e62e3f620b rcu: Remove deprecated rcu_lockdep_assert()
The old rcu_lockdep_assert() was retained to ease handling of incoming
patches, but any use results in deprecation warnings.  However, its
replacement, RCU_LOCKDEP_WARN(), is now upstream.  It is therefore
time to remove rcu_lockdep_assert(), which this commit does.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:16:31 -07:00
Patrick Marlier
8db70b132d rculist: Make list_entry_rcu() use lockless_dereference()
The current list_entry_rcu() implementation copies the pointer to a stack
variable, then invokes rcu_dereference_raw() on it.  This results in an
additional store-load pair.  Now, most compilers will emit normal store
and load instructions, which might seem to be of negligible overhead,
but this results in a load-hit-store situation that can cause surprisingly
long pipeline stalls, even on modern microprocessors.  The problem is
that it takes time for the store to get the store buffer updated, which
can delay the subsequent load, which immediately follows.

This commit therefore switches to the lockless_dereference() primitive,
which does not expect the __rcu annotations (that are anyway not present
in the list_head structure) and which, like rcu_dereference_raw(),
does not check for an enclosing RCU read-side critical section.
Most importantly, it does not copy the pointer, thus avoiding the
load-hit-store overhead.
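
After this change the macro reduces to a single marked load (sketch):

        #define list_entry_rcu(ptr, type, member) \
                container_of(lockless_dereference(ptr), type, member)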

Signed-off-by: Patrick Marlier <patrick.marlier@gmail.com>
[ paulmck: Switched to lockless_dereference() to suppress sparse warnings. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:16:18 -07:00
Paul E. McKenney
c3ac7cf184 rcu: Add rcu_pointer_handoff()
This commit adds an rcu_pointer_handoff() that is intended to mark
situations where a structure's protection transitions from RCU to some
other mechanism (locking, reference counting, whatever).  These markings
should allow external tools to more easily spot bugs involving leaking
pointers out of RCU read-side critical sections.
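
The marking is a no-op for the compiler; intended usage follows the
description above (sketch; gp and ->refcnt are hypothetical):

        retry:
                rcu_read_lock();
                p = rcu_dereference(gp);
                if (p && !atomic_inc_not_zero(&p->refcnt)) {
                        rcu_read_unlock();
                        goto retry;
                }
                p = rcu_pointer_handoff(p);     /* protection is now the refcount */
                rcu_read_unlock();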

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:16:18 -07:00
Paul E. McKenney
49f5903b47 rcu: Move preemption disabling out of __srcu_read_lock()
Currently, __srcu_read_lock() cannot be invoked from restricted
environments because it contains calls to preempt_disable() and
preempt_enable(), both of which can invoke lockdep, which is a bad
idea in some restricted execution modes.  This commit therefore moves
the preempt_disable() and preempt_enable() from __srcu_read_lock()
to srcu_read_lock().  It also inserts the preempt_disable() and
preempt_enable() around the call to __srcu_read_lock() in do_exit().
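
After the move, the wrapper looks roughly like this (sketch; lockdep
bookkeeping omitted):

        static inline int srcu_read_lock(struct srcu_struct *sp)
        {
                int retval;

                preempt_disable();
                retval = __srcu_read_lock(sp);
                preempt_enable();
                return retval;
        }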

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:15:43 -07:00
Boqun Feng
bb73c52bad rcu: Don't disable preemption for Tiny and Tree RCU readers
Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers.  Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
RCU and bloats Tiny RCU.

This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively.  However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.

However, Tiny and Tree RCU operate by coalescing all RCU read-side
critical sections on a given CPU that lie between successive quiescent
states.  It is therefore necessary to compensate for removing barriers
from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
couple of the RCU functions invoked during quiescent states, namely to
rcu_all_qs() and rcu_note_context_switch().  However, note that the latter
is more paranoia than necessity, at least until link-time optimizations
become more aggressive.

This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.
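
The resulting non-preemptible reader is expected to reduce to (sketch):

        static inline void __rcu_read_lock(void)
        {
                if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
                        preempt_disable();      /* kept for sleep-in-atomic debugging */
        }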

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:08:23 -07:00
Boqun Feng
b6a4ae766e rcu: Use rcu_callback_t in call_rcu*() and friends
Now that we have the rcu_callback_t typedef as the type of RCU
callbacks, we should use it as the parameter type in call_rcu*() and
friends. This saves a few lines of code and makes it clear which
functions require an RCU callback rather than some other callback as
their argument.

Besides, this can also help cscope to generate a better database for
code reading.
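
For reference, the typedef and one converted prototype (sketch):

        typedef void (*rcu_callback_t)(struct rcu_head *head);

        void call_rcu(struct rcu_head *head, rcu_callback_t func);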

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:05 -07:00