Commit Graph

37483 Commits

Author SHA1 Message Date
Thomas Huehn
6d48851779 mac80211: add new Minstrel statistic output via csv
This patch adds a new debugfs file "rc_stats_csv" to output Minstrels
statistics in a common csv format that is easy to parse.

Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Stefan Venz <ikstream86@gmail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
[remove printing current time of day]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 20:41:43 +02:00
David S. Miller
af3e09e666 Merge tag 'mac80211-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
This contains just a single fix for a crash I happened to randomly
run into today during testing. It's clearly been around for a while,
but is pretty hard to trigger, even when I tried explicitly (and
modified the code to make it more likely) it rarely did.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-01 14:19:22 -04:00
Linus Torvalds
1e848913f0 Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Two main issues:

   - We found that turning on pNFS by default (when it's configured at
     build time) was too aggressive, so we want to switch the default
     before the 4.0 release.

   - Recent client changes to increase open parallelism uncovered a
     serious bug lurking in the server's open code.

  Also fix a krb5/selinux regression.

  The rest is mainly smaller pNFS fixes"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  sunrpc: make debugfs file creation failure non-fatal
  nfsd: require an explicit option to enable pNFS
  NFSD: Fix bad update of layout in nfsd4_return_file_layout
  NFSD: Take care the return value from nfsd4_encode_stateid
  NFSD: Printk blocklayout length and offset as format 0x%llx
  nfsd: return correct lockowner when there is a race on hash insert
  nfsd: return correct openowner when there is a race to put one in the hash
  NFSD: Put exports after nfsd4_layout_verify fail
  NFSD: Error out when register_shrinker() fail
  NFSD: Take care the return value from nfsd4_decode_stateid
  NFSD: Check layout type when returning client layouts
  NFSD: restore trace event lost in mismerge
2015-04-01 09:45:47 -07:00
David Howells
44ba06987c RxRPC: Handle VERSION Rx protocol packets
Handle VERSION Rx protocol packets.  We should respond to a VERSION packet
with a string indicating the Rx version.  This is a maximum of 64 characters
and is padded out to 65 chars with NUL bytes.

Note that other AFS clients use the version request as a NAT keepalive so we
need to handle it rather than returning an abort.

The standard formulation seems to be:

	<project> <version> built <yyyy>-<mm>-<dd>

for example:

	" OpenAFS 1.6.2 built  2013-05-07 "

(note the three extra spaces) as obtained with:

	rxdebug grand.mit.edu -version

from the openafs package.

Signed-off-by: David Howells <dhowells@redhat.com>
2015-04-01 16:31:26 +01:00
David Howells
382d7974de RxRPC: Use iov_iter_count() in rxrpc_send_data() instead of the len argument
Use iov_iter_count() in rxrpc_send_data() to get the remaining data length
instead of using the len argument as the len argument is now redundant.

Signed-off-by: David Howells <dhowells@redhat.com>
2015-04-01 15:49:26 +01:00
David Howells
aab94830a7 RxRPC: Don't call skb_add_data() if there's no data to copy
Don't call skb_add_data() in rxrpc_send_data() if there's no data to copy and
also skip the calculations associated with it in such a case.

Signed-off-by: David Howells <dhowells@redhat.com>
2015-04-01 15:48:00 +01:00
David Howells
3af6878eca RxRPC: Fix the conversion to iov_iter
This commit:

	commit af2b040e47
	Author: Al Viro <viro@zeniv.linux.org.uk>
	Date:   Thu Nov 27 21:44:24 2014 -0500
	Subject: rxrpc: switch rxrpc_send_data() to iov_iter primitives

incorrectly changes a do-while loop into a while loop in rxrpc_send_data().

Unfortunately, at least one pass through the loop is required - even if
there is no data - so that the packet the closes the send phase can be
sent if MSG_MORE is not set.

Signed-off-by: David Howells <dhowells@redhat.com>
2015-04-01 14:06:00 +01:00
Johannes Berg
788211d81b mac80211: fix RX A-MPDU session reorder timer deletion
There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:

 * tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
 * station is destroyed
 * reorder timer fires before ieee80211_free_tid_rx() runs,
   accessing the station, thus potentially crashing due to
   the use-after-free

The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.

Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 14:35:01 +02:00
Pablo Neira Ayuso
c5035c77f8 netfilter: nft_meta: fix cgroup matching
We have to stop iterating on the rule expressions if the cgroup
mismatches. Moreover, make sure a non-full socket from the input path
leads us to a crash.

Fixes: ce67417 ("netfilter: nft_meta: add cgroup support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:33:00 +02:00
Oliver Hartkopp
a5581ef4c2 can: introduce new raw socket option to join the given CAN filters
The CAN_RAW socket can set multiple CAN identifier specific filters that lead
to multiple filters in the af_can.c filter processing. These filters are
indenpendent from each other which leads to logical OR'ed filters when applied.

This socket option joines the given CAN filters in the way that only CAN frames
are passed to user space that matched *all* given CAN filters. The semantic for
the applied filters is therefore changed to a logical AND.

This is useful especially when the filterset is a combination of filters where
the CAN_INV_FILTER flag is set in order to notch single CAN IDs or CAN ID
ranges from the incoming traffic.

As the raw_rcv() function is executed from NET_RX softirq the introduced
variables are implemented as per-CPU variables to avoid extensive locking at
CAN frame reception time.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-04-01 11:28:22 +02:00
Oliver Hartkopp
514ac99c64 can: fix multiple delivery of a single CAN frame for overlapping CAN filters
The CAN_RAW socket can set multiple CAN identifier specific filters that lead
to multiple filters in the af_can.c filter processing. These filters are
indenpendent from each other which leads to logical OR'ed filters when applied.

This patch makes sure that every CAN frame which is filtered for a specific
socket is only delivered once to the user space. This is independent from the
number of matching CAN filters of this socket.

As the raw_rcv() function is executed from NET_RX softirq the introduced
variables are implemented as per-CPU variables to avoid extensive locking at
CAN frame reception time.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-04-01 11:27:41 +02:00
Daniel Borkmann
afb7718016 netfilter: x_tables: fix cgroup matching on non-full sks
While originally only being intended for outgoing traffic, commit
a00e76349f ("netfilter: x_tables: allow to use cgroup match for
LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
as well, in order to allow for nfacct accounting.

Besides being currently limited to early demuxes only, commit
a00e76349f forgot to add a check if we deal with full sockets,
i.e. in this case not with time wait sockets. TCP time wait sockets
do not have the same memory layout as full sockets, a lower memory
footprint and consequently also don't have a sk_classid member;
probing for sk_classid member there could potentially lead to a
crash.

Fixes: a00e76349f ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks")
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:26:42 +02:00
Thomas Huehn
9c00bb7210 mac80211: enhance readability of Minstrel-HTs rc_stats output
This patch restructures the rc_stats debugfs table of Minstrel-HT in
order to achieve better human readability. A new layout of the
statistics and a new header is added. In addition to the old layout
there are two new columns of information added:
idx	- representing the rate index of each rate in mac80211 which
	  can be used to set specific rates as fixed rate via debugfs
airtime	- the tx-time in micro seconds that a 1200 Byte packet
	  takes to be transmitted over the air at the given rate

The old layout of rc_stats:

type           rate      tpt eprob *prob ret  *ok(*cum)        ok(      cum)
HT20/LGI       MCS0      5.6 100.0 100.0   1    0(   0)         1(        1)
HT20/LGI   B   MCS1     10.5 100.0 100.0   0    0(   0)         1(        1)
HT20/LGI  A    MCS2     14.8 100.0 100.0   0    0(   0)         1(        1)
...

is changed into this new layout:

            best   ________rate______    __statistics__    ________last_______    ______sum-of________
mode guard #  rate  [name   idx airtime]  [ ø(tp) ø(prob)]  [prob.|retry|suc|att]  [#success | #attempts]
HT20  LGI  1         MCS0     0    1480      0.0      0.0      0.0   1     0 0             0   0
HT20  LGI  1     B   MCS1     1     740     10.5    100.0    100.0   0     0 0             1   1
HT20  LGI  1    A    MCS2     2     496     14.8    100.0    100.0   0     0 0             1   1
...

Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Stefan Venz <ikstream86@gmail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 11:22:39 +02:00
Thomas Huehn
e161f7f6c4 mac80211: enhance readability of Minstrels rc_stats output
This patch restructures the rc_stats debugfs table of Minstrel in
order to achieve better human readability. A new layout of the
statistics and a new header is added. In addition to the old layout
there are two new columns of information added:
idx	- representing the rate index of each rate in mac80211 which
	  can be used to set specific rates as fixed rate via debugfs
airtime - the tx-time in micro seconds that a 1200 Byte packet
	  takes to be transmitted over the air at the given rate

The old layout of rc_stats:

    rate      tpt  eprob *prob ret  *ok(*cum)        ok(      cum)
 DP 1          0.9  93.5 100.0   1    0(   0)         2(        2)
    2          0.4  40.0 100.0   0    0(   0)         4(        10)
    5.5        0.0   0.0   0.0   0    0(   0)         0(        0)
...

is changed into this new layout:

best   _______rate_____    __statistics__    ________last_______    ______sum-of________
rate  [name idx tx-time]  [ ø(tp) ø(prob)]  [prob.|retry|suc|att]  [#success | #attempts]
 DP   1     0     9738      0.9    93.5     100.0   1     1 1             2   2
      2     1     4922      0.4    40.0     100.0   1     0 0             4   10
      5.5   2     1858      0.0     0.0       0.0   2     0 0             0   0
...

Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Stefan Venz <ikstream86@gmail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 11:22:39 +02:00
Ilan peer
c37722bd19 cfg80211: Stop calling crda if it is not responsive
Patch eeca9fce1d (cfg80211: Schedule
timeout for all CRDA call) introduced a regression, where in case
that crda is not installed (or not configured properly etc.), the
regulatory core will needlessly continue to call it, polluting the
log with the following log:

"cfg80211: Calling CRDA to update world regulatory domain"

Fix this by limiting the number of continuous CRDA request failures.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 11:22:38 +02:00
Patrick McHardy
9d0982927e netfilter: nft_hash: add support for timeouts
Add support for element timeouts to nft_hash. The lookup and walking
functions are changed to ignore timed out elements, a periodic garbage
collection task cleans out expired entries.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:49 +02:00
Patrick McHardy
6908665826 netfilter: nf_tables: add GC synchronization helpers
GC is expected to happen asynchrously to the netlink interface. In the
netlink path, both insertion and removal of elements consist of two
steps, insertion followed by activation or deactivation followed by
removal, during which the element must not be freed by GC.

The synchronization helpers use an unused bit in the genmask field to
atomically mark an element as "busy", meaning it is either currently
being handled through the netlink API or by GC.

Elements being processed by GC will never survive, netlink will simply
ignore them. Elements being currently processed through netlink will be
skipped by GC and reprocessed during the next run.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:29 +02:00
Patrick McHardy
cfed7e1b1f netfilter: nf_tables: add set garbage collection helpers
Add helpers for GC batch destruction: since element destruction needs
a RCU grace period for all set implementations, add some helper functions
for asynchronous batch destruction. Elements are collected in a batch
structure, which is asynchronously released using RCU once its full.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:29 +02:00
Patrick McHardy
c3e1b005ed netfilter: nf_tables: add set element timeout support
Add API support for set element timeouts. Elements can have a individual
timeout value specified, overriding the sets' default.

Two new extension types are used for timeouts - the timeout value and
the expiration time. The timeout value only exists if it differs from
the default value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:28 +02:00
Patrick McHardy
761da2935d netfilter: nf_tables: add set timeout API support
Add set timeout support to the netlink API. Sets with timeout support
enabled can have a default timeout value and garbage collection interval
specified.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-01 11:17:28 +02:00
Johannes Berg
7bedd0cfad mac80211: use rhashtable for station table
We currently have a hand-rolled table with 256 entries and are
using the last byte of the MAC address as the hash. This hash
is obviously very fast, but collisions are easily created and
we waste a lot of space in the common case of just connecting
as a client to an AP where we just have a single station. The
other common case of an AP is also suboptimal due to the size
of the hash table and the ease of causing collisions.

Convert all of this to use rhashtable with jhash, which gives
us the advantage of a far better hash function (with random
perturbation to avoid hash collision attacks) and of course
that the hash table grows and shrinks dynamically with chain
length, improving both cases above.

Use a specialised hash function (using jhash, but with fixed
length) to achieve better compiler optimisation as suggested
by Sergey Ryazanov.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-01 10:06:26 +02:00
Ying Xue
7e43690578 tipc: fix a slab object leak
When remove TIPC module, there is a warning to remind us that a slab
object is leaked like:

root@localhost:~# rmmod tipc
[   19.056226] =============================================================================
[   19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
[   19.058736] -----------------------------------------------------------------------------
[   19.058736]
[   19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
[   19.061915] INFO: Object 0xffff880014668000 @offset=0
[   19.062717] kmem_cache_destroy TIPC: Slab cache still has objects

This is because the listening socket of TIPC topology server is not
closed before TIPC proto handler is unregistered with proto_unregister().
However, as the socket is closed in tipc_exit_net() which is called by
unregister_pernet_subsys() during unregistering TIPC namespace operation,
the warning can be eliminated if calling unregister_pernet_subsys() is
moved before calling proto_unregister().

Fixes: e05b31f4bf ("tipc: make tipc socket support net namespace")
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 23:10:08 -04:00
David S. Miller
7b6249bba9 Merge tag 'mac80211-next-for-davem-2015-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
Lots of updates for net-next; along with the usual flurry
of small fixes, cleanups and internal features we have:
 * VHT support for TDLS and IBSS (conditional on drivers though)
 * first TX performance improvements (the biggest will come later)
 * many suspend/resume (race) fixes
 * name_assign_type support from Tom Gundersen
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:39:04 -04:00
Jiri Pirko
fbcb217059 net: rename dev to orig_dev in deliver_ptype_list_skb
Unlike other places, this function uses name "dev" for what should be
"orig_dev", which might be a bit confusing. So fix this.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:37:43 -04:00
Eugene Crosser
ed4ac42217 af_iucv: fix AF_IUCV sendmsg() errno
When sending over AF_IUCV socket, errno was incorrectly set to
ENOMEM even when other values where appropriate, notably EAGAIN.
With this patch, error indicator returned by sock_alloc_send_skb()
is passed to the caller, rather than being overwritten with ENOMEM.

Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:06:50 -04:00
Thomas Graf
fa2d8ff4e3 openvswitch: Return vport module ref before destruction
Return module reference before invoking the respective vport
->destroy() function. This is needed as ovs_vport_del() is not
invoked inside an RCU read side critical section so the kfree
can occur immediately before returning to ovs_vport_del().

Returning the module reference before ->destroy() is safe because
the module unregistration is blocked on ovs_lock which we hold
while destroying the datapath.

Fixes: 62b9c8d037 ("ovs: Turn vports with dependencies into separate modules")
Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 15:59:50 -04:00
Jeff Layton
f9c72d10d6 sunrpc: make debugfs file creation failure non-fatal
We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
     please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functins, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Symptoms were failing krb5 mounts on systems using gss-proxy and
selinux.

Fixes: 388f0c7767 "sunrpc: add a debugfs rpc_xprt directory..."
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-31 14:15:08 -04:00
Jiri Benc
67b61f6c13 netlink: implement nla_get_in_addr and nla_get_in6_addr
Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
930345ea63 netlink: implement nla_put_in_addr and nla_put_in6_addr
IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
15e318bdc6 xfrm: simplify xfrm_address_t use
In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and
get rid of the typecasting.

Modifying the uapi header is okay, the union has still the same size.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
8f55db4860 tcp: simplify inetpeer_addr_base use
In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and get rid
of the typecasting.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Ian Morris
53b24b8f94 ipv6: coding style: comparison for inequality with NULL
The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x != NULL and sometimes as x. x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:51:54 -04:00
Ian Morris
63159f29be ipv6: coding style: comparison for equality with NULL
The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:51:54 -04:00
Alexander Duyck
6e47d6caff fib_trie: Cleanup ip_fib_net_exit code path
While fixing a recent issue I noticed that we are doing some unnecessary
work inside the loop for ip_fib_net_exit.  As such I am pulling out the
initialization to NULL for the locally stored fib_local, fib_main, and
fib_default.

In addition I am restoring the original code for flushing the table as
there is no need to split up the fib_table_flush and hlist_del work since
the code for packing the tnodes with multiple key vectors was dropped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:18:56 -04:00
Alexander Duyck
ad88d05136 fib_trie: Fix warning on fib4_rules_exit
This fixes the following warning:

 BUG: sleeping function called from invalid context at mm/slub.c:1268
 in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
 INFO: lockdep is turned off.
 CPU: 3 PID: 6 Comm: kworker/u8:0 Tainted: G        W       4.0.0-rc5+ #895
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: netns cleanup_net
  0000000000000006 ffff88011953fa68 ffffffff81a203b6 000000002c3a2c39
  ffff88011952a680 ffff88011953fa98 ffffffff8109daf0 ffff8801186c6aa8
  ffffffff81fbc9e5 00000000000004f4 0000000000000000 ffff88011953fac8
 Call Trace:
  [<ffffffff81a203b6>] dump_stack+0x4c/0x65
  [<ffffffff8109daf0>] ___might_sleep+0x1c3/0x1cb
  [<ffffffff8109db70>] __might_sleep+0x78/0x80
  [<ffffffff8117a60e>] slab_pre_alloc_hook+0x31/0x8f
  [<ffffffff8117d4f6>] __kmalloc+0x69/0x14e
  [<ffffffff818ed0e1>] ? kzalloc.constprop.20+0xe/0x10
  [<ffffffff818ed0e1>] kzalloc.constprop.20+0xe/0x10
  [<ffffffff818ef622>] fib_trie_table+0x27/0x8b
  [<ffffffff818ef6bd>] fib_trie_unmerge+0x37/0x2a6
  [<ffffffff810b06e1>] ? arch_local_irq_save+0x9/0xc
  [<ffffffff818e9793>] fib_unmerge+0x2d/0xb3
  [<ffffffff818f5f56>] fib4_rule_delete+0x1f/0x52
  [<ffffffff817f1c3f>] ? fib_rules_unregister+0x30/0xb2
  [<ffffffff817f1c8b>] fib_rules_unregister+0x7c/0xb2
  [<ffffffff818f64a1>] fib4_rules_exit+0x15/0x18
  [<ffffffff818e8c0a>] ip_fib_net_exit+0x23/0xf2
  [<ffffffff818e91f8>] fib_net_exit+0x32/0x36
  [<ffffffff817c8352>] ops_exit_list+0x45/0x57
  [<ffffffff817c8d3d>] cleanup_net+0x13c/0x1cd
  [<ffffffff8108b05d>] process_one_work+0x255/0x4ad
  [<ffffffff8108af69>] ? process_one_work+0x161/0x4ad
  [<ffffffff8108b4b1>] worker_thread+0x1cd/0x2ab
  [<ffffffff8108b2e4>] ? process_scheduled_works+0x2f/0x2f
  [<ffffffff81090686>] kthread+0xd4/0xdc
  [<ffffffff8109ec8f>] ? local_clock+0x19/0x22
  [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
  [<ffffffff81a2c0c8>] ret_from_fork+0x58/0x90
  [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83

The issue was that as a part of exiting the default rules were being
deleted which resulted in the local trie being unmerged.  By moving the
freeing of the FIB tables up we can avoid the unmerge since there is no
local table left when we call the fib4_rules_exit function.

Fixes: 0ddcf43d5d ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:18:56 -04:00
David S. Miller
89a3f3c55b Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-27

Here's another set of Bluetooth & 802.15.4 patches for 4.1:

 - New API to control LE advertising data (i.e. peripheral role)
 - mac802154 & at86rf230 cleanups
 - Support for toggling quirks from debugfs (useful for testing)
 - Memory leak fix for LE scanning
 - Extra version info reading support for Broadcom controllers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 12:45:58 -04:00
Chuck Lever
d654788e98 xprtrdma: Make rpcrdma_{un}map_one() into inline functions
These functions are called in a loop for each page transferred via
RDMA READ or WRITE. Extract loop invariants and inline them to
reduce CPU overhead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:53 -04:00
Chuck Lever
e46ac34c3c xprtrdma: Handle non-SEND completions via a callout
Allow each memory registration mode to plug in a callout that handles
the completion of a memory registration operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:53 -04:00
Chuck Lever
3968cb5850 xprtrdma: Add "open" memreg op
The open op determines the size of various transport data structures
based on device capabilities and memory registration mode.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:53 -04:00
Chuck Lever
4561f347d4 xprtrdma: Add "destroy MRs" memreg op
Memory Region objects associated with a transport instance are
destroyed before the instance is shutdown and destroyed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:53 -04:00
Chuck Lever
31a701a947 xprtrdma: Add "reset MRs" memreg op
This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:53 -04:00
Chuck Lever
91e70e70e4 xprtrdma: Add "init MRs" memreg op
This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.

Memory Regions are not needed for "physical" registration, since
->prepare and ->release are no-ops for that mode.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
6814baead8 xprtrdma: Add a "deregister_external" op for each memreg mode
There is very little common processing among the different external
memory deregistration functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
9c1b4d775f xprtrdma: Add a "register_external" op for each memreg mode
There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
1c9351ee0e xprtrdma: Add a "max_payload" op for each memreg mode
The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
a0ce85f595 xprtrdma: Add vector of ops for each memory registration strategy
Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.

Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
41f9702896 xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
If a provider advertizes a zero max_fast_reg_page_list_len, FRWR
depth detection loops forever. Instead of just failing the mount,
try other memory registration modes.

Fixes: 0fc6c4e7bb ("xprtrdma: mind the device's max fast . . .")
Reported-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
805272406a xprtrdma: Byte-align FRWR registration
The RPC/RDMA transport's FRWR registration logic registers whole
pages. This means areas in the first and last pages that are not
involved in the RDMA I/O are needlessly exposed to the server.

Buffered I/O is typically page-aligned, so not a problem there. But
for direct I/O, which can be byte-aligned, and for reply chunks,
which are nearly always smaller than a page, the transport could
expose memory outside the I/O buffer.

FRWR allows byte-aligned memory registration, so let's use it as
it was intended.

Reported-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
e23779451e xprtrdma: Perform a full marshal on retransmit
Commit 6ab59945f2 ("xprtrdma: Update rkeys after transport
reconnect" added logic in the ->send_request path to update the
chunk list when an RPC/RDMA request is retransmitted.

Note that rpc_xdr_encode() resets and re-encodes the entire RPC
send buffer for each retransmit of an RPC. The RPC send buffer
is not preserved from the previous transmission of an RPC.

Revert 6ab59945f2, and instead, just force each request to be
fully marshaled every time through ->send_request. This should
preserve the fix from 6ab59945f2, while also performing pullup
during retransmits.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00
Chuck Lever
0dd39cae26 xprtrdma: Display IPv6 addresses and port numbers correctly
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31 09:52:52 -04:00