Commit Graph

45659 Commits

Author SHA1 Message Date
Ilya Dryomov
f01d5cb24e libceph: rename ceph_entity_name_encode() -> ceph_auth_entity_name_encode()
Clear up EntityName vs entity_name_t confusion.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-24 23:49:15 +02:00
David S. Miller
6546c78ea6 Merge tag 'rxrpc-rewrite-20160824-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
rxrpc: Add better client conn management strategy

These two patches add a better client connection management strategy.  They
need to be applied on top of the just-posted fixes.

 (1) Duplicate the connection list and separate out procfs iteration from
     garbage collection.  This is necessary for the next patch as with that
     client connections no longer appear on a single list and may not
     appear on a list at all - and really don't want to be exposed to the
     old garbage collector.

     (Note that client conns aren't left dangling, they're also in a tree
     rooted in the local endpoint so that they can be found by a user
     wanting to make a new client call.  Service conns do not appear in
     this tree.)

 (2) Implement a better lifetime management and garbage collection strategy
     for client connections.

     In this, a client connection can be in one of five cache states
     (inactive, waiting, active, culled and idle).  Limits are set on the
     number of client conns that may be active at any one time and makes
     users wait if they want to start a new call when there isn't capacity
     available.

     To make capacity available, active and idle connections can be culled,
     after a short delay (to allow for retransmission).  The delay is
     reduced if the capacity exceeds a tunable threshold.

     If there is spare capacity, client conns are permitted to hang around
     a fair bit longer (tunable) so as to allow reuse of negotiated
     security contexts.

     After this patch, the client conn strategy is separate from that of
     service conns (which continues to use the old code for the moment).

     This difference in strategy is because the client side retains control
     over when it allows a connection to become active, whereas the service
     side has no control over when it sees a new connection or a new call
     on an old connection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:43:44 -07:00
David S. Miller
d3c10db138 Merge tag 'rxrpc-rewrite-20160824-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
rxrpc: More fixes

Here are a couple of fix patches:

 (1) Fix the conn-based retransmission patch posted yesterday.  This breaks
     if it actually has to retransmit.  However, it seems the likelihood of
     this happening is really low, despite the server I'm testing against
     being located >3000 miles away, and sometime of the time it's handled
     in the call background processor before we manage to disconnect the
     call - hence why I didn't spot it.

 (2) /proc/net/rxrpc_calls can cause a crash it accessed whilst a call is
     being torn down.  The window of opportunity is pretty small, however,
     as calls don't stay in this state for long.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:42:57 -07:00
Daniel Borkmann
dbb50887c8 Bluetooth: split sk_filter in l2cap_sock_recv_cb
During an audit for sk_filter(), we found that rx_busy_skb handling
in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as
intended.

The assumption from commit e328140fda ("Bluetooth: Use event-driven
approach for handling ERTM receive buffer") is that errors returned
from sock_queue_rcv_skb() are due to receive buffer shortage. However,
nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
the socket, that could drop some of the incoming skbs when handled in
sock_queue_rcv_skb().

In that case sock_queue_rcv_skb() will return with -EPERM, propagated
from sk_filter() and if in L2CAP_MODE_ERTM mode, wrong assumption was
that we failed due to receive buffer being full. From that point onwards,
due to the to-be-dropped skb being held in rx_busy_skb, we cannot make
any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
due to the filter drop verdict over and over coming from sk_filter().
Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
dropped due to rx_busy_skb being occupied.

Instead, just use __sock_queue_rcv_skb() where an error really tells that
there's a receive buffer issue. Split the sk_filter() and enable it for
non-segmented modes at queuing time since at this point in time the skb has
already been through the ERTM state machine and it has been acked, so dropping
is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
l2cap_data_rcv() so the packet can be dropped before the state machine sees it.

Fixes: e328140fda ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-08-24 16:55:04 +02:00
Frederic Dalleau
9afee94939 Bluetooth: Fix memory leak at end of hci requests
In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
pass the skb to the caller, or __hci_req_sync which leaks.

unreferenced object 0xffff880005339a00 (size 256):
  comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
  backtrace:
    [<ffffffff818d89d9>] kmemleak_alloc+0x49/0xa0
    [<ffffffff8116bba8>] kmem_cache_alloc+0x128/0x180
    [<ffffffff8167c1df>] skb_clone+0x4f/0xa0
    [<ffffffff817aa351>] hci_event_packet+0xc1/0x3290
    [<ffffffff8179a57b>] hci_rx_work+0x18b/0x360
    [<ffffffff810692ea>] process_one_work+0x14a/0x440
    [<ffffffff81069623>] worker_thread+0x43/0x4d0
    [<ffffffff8106ead4>] kthread+0xc4/0xe0
    [<ffffffff818dd38f>] ret_from_fork+0x1f/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Frédéric Dalleau <frederic.dalleau@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-08-24 16:49:29 +02:00
David Howells
45025bceef rxrpc: Improve management and caching of client connection objects
Improve the management and caching of client rxrpc connection objects.
From this point, client connections will be managed separately from service
connections because AF_RXRPC controls the creation and re-use of client
connections but doesn't have that luxury with service connections.

Further, there will be limits on the numbers of client connections that may
be live on a machine.  No direct restriction will be placed on the number
of client calls, excepting that each client connection can support a
maximum of four concurrent calls.

Note that, for a number of reasons, we don't want to simply discard a
client connection as soon as the last call is apparently finished:

 (1) Security is negotiated per-connection and the context is then shared
     between all calls on that connection.  The context can be negotiated
     again if the connection lapses, but that involves holding up calls
     whilst at least two packets are exchanged and various crypto bits are
     performed - so we'd ideally like to cache it for a little while at
     least.

 (2) If a packet goes astray, we will need to retransmit a final ACK or
     ABORT packet.  To make this work, we need to keep around the
     connection details for a little while.

 (3) The locally held structures represent some amount of setup time, to be
     weighed against their occupation of memory when idle.


To this end, the client connection cache is managed by a state machine on
each connection.  There are five states:

 (1) INACTIVE - The connection is not held in any list and may not have
     been exposed to the world.  If it has been previously exposed, it was
     discarded from the idle list after expiring.

 (2) WAITING - The connection is waiting for the number of client conns to
     drop below the maximum capacity.  Calls may be in progress upon it
     from when it was active and got culled.

     The connection is on the rxrpc_waiting_client_conns list which is kept
     in to-be-granted order.  Culled conns with waiters go to the back of
     the queue just like new conns.

 (3) ACTIVE - The connection has at least one call in progress upon it, it
     may freely grant available channels to new calls and calls may be
     waiting on it for channels to become available.

     The connection is on the rxrpc_active_client_conns list which is kept
     in activation order for culling purposes.

 (4) CULLED - The connection got summarily culled to try and free up
     capacity.  Calls currently in progress on the connection are allowed
     to continue, but new calls will have to wait.  There can be no waiters
     in this state - the conn would have to go to the WAITING state
     instead.

 (5) IDLE - The connection has no calls in progress upon it and must have
     been exposed to the world (ie. the EXPOSED flag must be set).  When it
     expires, the EXPOSED flag is cleared and the connection transitions to
     the INACTIVE state.

     The connection is on the rxrpc_idle_client_conns list which is kept in
     order of how soon they'll expire.

A connection in the ACTIVE or CULLED state must have at least one active
call upon it; if in the WAITING state it may have active calls upon it;
other states may not have active calls.

As long as a connection remains active and doesn't get culled, it may
continue to process calls - even if there are connections on the wait
queue.  This simplifies things a bit and reduces the amount of checking we
need do.


There are a couple flags of relevance to the cache:

 (1) EXPOSED - The connection ID got exposed to the world.  If this flag is
     set, an extra ref is added to the connection preventing it from being
     reaped when it has no calls outstanding.  This flag is cleared and the
     ref dropped when a conn is discarded from the idle list.

 (2) DONT_REUSE - The connection should be discarded as soon as possible and
     should not be reused.


This commit also provides a number of new settings:

 (*) /proc/net/rxrpc/max_client_conns

     The maximum number of live client connections.  Above this number, new
     connections get added to the wait list and must wait for an active
     conn to be culled.  Culled connections can be reused, but they will go
     to the back of the wait list and have to wait.

 (*) /proc/net/rxrpc/reap_client_conns

     If the number of desired connections exceeds the maximum above, the
     active connection list will be culled until there are only this many
     left in it.

 (*) /proc/net/rxrpc/idle_conn_expiry

     The normal expiry time for a client connection, provided there are
     fewer than reap_client_conns of them around.

 (*) /proc/net/rxrpc/idle_conn_fast_expiry

     The expedited expiry time, used when there are more than
     reap_client_conns of them around.


Note that I combined the Tx wait queue with the channel grant wait queue to
save space as only one of these should be in use at once.

Note also that, for the moment, the service connection cache still uses the
old connection management code.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24 15:17:14 +01:00
David Howells
4d028b2c82 rxrpc: Dup the main conn list for the proc interface
The main connection list is used for two independent purposes: primarily it
is used to find connections to reap and secondarily it is used to list
connections in procfs.

Split the procfs list out from the reap list.  This allows us to stop using
the reap list for client connections when they acquire a separate
management strategy from service collections.

The client connections will not be on a management single list, and sometimes
won't be on a management list at all.  This doesn't leave them floating,
however, as they will also be on an rb-tree rooted on the socket so that the
socket can find them to dispatch calls.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24 15:17:14 +01:00
David Howells
df5d8bf70f rxrpc: Make /proc/net/rxrpc_calls safer
Make /proc/net/rxrpc_calls safer by stashing a copy of the peer pointer in
the rxrpc_call struct and checking in the show routine that the peer
pointer, the socket pointer and the local pointer obtained from the socket
pointer aren't NULL before we use them.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24 15:15:59 +01:00
David Howells
2266ffdef5 rxrpc: Fix conn-based retransmit
If a duplicate packet comes in for a call that has just completed on a
connection's channel then there will be an oops in the data_ready handler
because it tries to examine the connection struct via a call struct (which
we don't have - the pointer is unset).

Since the connection struct pointer is available to us, go direct instead.

Also, the ACK packet to be retransmitted needs three octets of padding
between the soft ack list and the ackinfo.

Fixes: 18bfeba50d ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24 13:06:14 +01:00
Florian Westphal
35db57bbc4 xfrm: state: remove per-netns gc task
After commit 5b8ef3415a
("xfrm: Remove ancient sleeping when the SA is in acquire state")
gc does not need any per-netns data anymore.

As far as gc is concerned all state structs are the same, so we
can use a global work struct for it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-24 13:16:06 +02:00
Steffen Klassert
4141b36ab1 xfrm: Fix xfrm_policy_lock imbalance
An earlier patch accidentally replaced a write_lock_bh
with a spin_unlock_bh. Fix this by using spin_lock_bh
instead.

Fixes: 9d0380df62 ("xfrm: policy: convert policy_lock to spinlock")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-24 13:13:08 +02:00
Eric Dumazet
ba2489b0e0 net: remove clear_sk() method
We no longer use this handler, we can delete it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:25:29 -07:00
Eric Dumazet
391bb6be65 ipv6: tcp: get rid of tcp_v6_clear_sk()
Now RCU lookups of IPv6 TCP sockets no longer dereference pinet6,
we do not need tcp_v6_clear_sk() anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:25:29 -07:00
Eric Dumazet
4cac820466 udp: get rid of sk_prot_clear_portaddr_nulls()
Since we no longer use SLAB_DESTROY_BY_RCU for UDP,
we do not need sk_prot_clear_portaddr_nulls() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:25:29 -07:00
Eric Dumazet
6a6ad2a4e5 ipv6: udp: remove udp_v6_clear_sk()
Now RCU lookups of ipv6 udp sockets no longer dereference
pinet6 field, we can get rid of udp_v6_clear_sk() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:23:50 -07:00
David Ahern
5d77dca828 net: diag: support SOCK_DESTROY for UDP sockets
This implements SOCK_DESTROY for UDP sockets similar to what was done
for TCP with commit c1e64e298b ("net: diag: Support destroying TCP
sockets.") A process with a UDP socket targeted for destroy is awakened
and recvmsg fails with ECONNABORTED.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:12:27 -07:00
David Ahern
d7226c7a4d net: diag: Fix refcnt leak in error path destroying socket
inet_diag_find_one_icsk takes a reference to a socket that is not
released if sock_diag_destroy returns an error. Fix by changing
tcp_diag_destroy to manage the refcnt for all cases and remove
the sock_put calls from tcp_abort.

Fixes: c1e64e298b ("net: diag: Support destroying TCP sockets")
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:11:36 -07:00
Wei Yongjun
5128b18522 tipc: use kfree_skb() instead of kfree()
Use kfree_skb() instead of kfree() to free sk_buff.

Fixes: 0d051bf93c ("tipc: make bearer packet filtering generic")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 23:08:25 -07:00
Eric Dumazet
75d855a5e9 udp: get rid of SLAB_DESTROY_BY_RCU allocations
After commit ca065d0cf8 ("udp: no longer use SLAB_DESTROY_BY_RCU")
we do not need this special allocation mode anymore, even if it is
harmless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:46:17 -07:00
Lance Richardson
232cb53a45 sctp: fix overrun in sctp_diag_dump_one()
The function sctp_diag_dump_one() currently performs a memcpy()
of 64 bytes from a 16 byte field into another 16 byte field. Fix
by using correct size, use sizeof to obtain correct size instead
of using a hard-coded constant.

Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:22:53 -07:00
David S. Miller
85d2c92051 Merge tag 'rxrpc-rewrite-20160823-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
rxrpc: Miscellaneous improvements

Here are some improvements that are part of the AF_RXRPC rewrite.  They
need to be applied on top of the just posted cleanups.

 (1) Set the connection expiry on the connection becoming idle when its
     last currently active call completes rather than each time put is
     called.

     This means that the connection isn't held open by retransmissions,
     pings and duplicate packets.  Future patches will limit the number of
     live connections that the kernel will support, so making sure that old
     connections don't overstay their welcome is necessary.

 (2) Calculate packet serial skew in the UDP data_ready callback rather
     than in the call processor on a work queue.  Deferring it like this
     causes the skew to be elevated by further packets coming in before we
     get to make the calculation.

 (3) Move retransmission of the terminal ACK or ABORT packet for a
     connection to the connection processor, using the terminal state
     cached in the rxrpc_connection struct.  This means that once last_call
     is set in a channel to the current call's ID, no more packets will be
     routed to that rxrpc_call struct.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:20:59 -07:00
David S. Miller
3a69101595 Merge tag 'rxrpc-rewrite-20160823-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
rxrpc: Cleanups

Here are some cleanups for the AF_RXRPC rewrite:

 (1) Remove some unused bits.

 (2) Call releasing on socket closure is now done in the order in which
     calls progress through the phases so that we don't miss a call
     actively moving list.

 (3) The rxrpc_call struct's channel number field is redundant and replaced
     with accesses to the masked off cid field instead.

 (4) Use a tracepoint for socket buffer accounting rather than printks.

     Unfortunately, since this would require currently non-existend
     arch-specific help to divine the current instruction location, the
     accounting functions are moved out of line so that
     __builtin_return_address() can be used.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:19:59 -07:00
Phil Sutter
f8edcd127b net: rtnetlink: Don't export empty RTAX_FEATURES
Since the features bit field has bits for internal only use as well, it
may happen that the kernel exports RTAX_FEATURES attribute with zero
value which is pointless.

Fix this by making sure the attribute is added only if the exported
value is non-zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:09:28 -07:00
Yuchung Cheng
cebc5cbab4 net-tcp: retire TFO_SERVER_WO_SOCKOPT2 config
TFO_SERVER_WO_SOCKOPT2 was intended for debugging purposes during
Fast Open development. Remove this config option and also
update/clean-up the documentation of the Fast Open sysctl.

Reported-by: Piotr Jurkiewicz <piotr.jerzy.jurkiewicz@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:01:01 -07:00
Eric Dumazet
20a2b49fc5 tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
When sending an ack in SYN_RECV state, we must scale the offered
window if wscale option was negotiated and accepted.

Tested:
 Following packetdrill test demonstrates the issue :

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
+0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>

+0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
// check that window is properly scaled !
+0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 16:55:49 -07:00
Gao Feng
54c151d9ed l2tp: Refactor the codes with existing macros instead of literal number
Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
0x03, and 2 separately.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 16:49:57 -07:00
Eric Dumazet
e83c6744e8 udp: fix poll() issue with zero sized packets
Laura tracked poll() [and friends] regression caused by commit
e6afc8ace6 ("udp: remove headers from UDP packets before queueing")

udp_poll() needs to know if there is a valid packet in receive queue,
even if its payload length is 0.

Change first_packet_length() to return an signed int, and use -1
as the indication of an empty queue.

Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 16:39:14 -07:00
Tom Herbert
1616b38f20 kcm: Fix locking issue
Lock the lower socket in kcm_unattach. Release during call to strp_done
since that function cancels the RX timers and work queue with sync.

Also added some status information in psock reporting.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 16:23:12 -07:00
Tom Herbert
cff6a334e6 strparser: Queue work when being unpaused
When the upper layer unpauses a stream parser connection we need to
queue rx_work to make sure no events are missed.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 16:23:12 -07:00
Pablo Neira Ayuso
6133740d6e netfilter: nf_tables: reject hook configuration updates on existing chains
Currently, if you add a base chain whose name clashes with an existing
non-base chain, nf_tables doesn't complain about this. Similarly, if you
update the chain type, the hook number and priority.

With this patch, nf_tables bails out in case any of this unsupported
operations occur by returning EBUSY.

 # nft add table x
 # nft add chain x y
 # nft add chain x y { type nat hook input priority 0\; }
 <cmdline>:1:1-49: Error: Could not process rule: Device or resource busy
 add chain x y { type nat hook input priority 0; }
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-23 17:44:23 +02:00
Pablo Neira Ayuso
508f8ccdab netfilter: nf_tables: introduce nft_chain_parse_hook()
Introduce a new function to wrap the code that parses the chain hook
configuration so we can reuse this code to validate chain updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-23 17:04:25 +02:00
David Howells
18bfeba50d rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor
Perform terminal call ACK/ABORT retransmission in the connection processor
rather than in the call processor.  With this change, once last_call is
set, no more incoming packets will be routed to the corresponding call or
any earlier calls on that channel (call IDs must only increase on a channel
on a connection).

Further, if a packet's callNumber is before the last_call ID or a packet is
aimed at successfully completed service call then that packet is discarded
and ignored.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 16:02:35 +01:00
David Howells
563ea7d5d4 rxrpc: Calculate serial skew on packet reception
Calculate the serial number skew in the data_ready handler when a packet
has been received and a connection looked up.  The skew is cached in the
sk_buff's priority field.

The connection highest received serial number is updated at this time also.
This can be done without locks or atomic instructions because, at this
point, the code is serialised by the socket.

This generates more accurate skew data because if the packet is offloaded
to a work queue before this is determined, more packets may come in,
bumping the highest serial number and thereby increasing the apparent skew.

This also removes some unnecessary atomic ops.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 16:02:35 +01:00
David Howells
f51b448002 rxrpc: Set connection expiry on idle, not put
Set the connection expiry time when a connection becomes idle rather than
doing this in rxrpc_put_connection().  This makes the put path more
efficient (it is likely to be called occasionally whilst a connection has
outstanding calls because active workqueue items needs to be given a ref).

The time is also preset in the connection allocator in case the connection
never gets used.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 16:02:35 +01:00
David Howells
df844fd46b rxrpc: Use a tracepoint for skb accounting debugging
Use a tracepoint to log various skb accounting points to help in debugging
refcounting errors.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 15:27:24 +01:00
David Howells
01a90a4598 rxrpc: Drop channel number field from rxrpc_call struct
Drop the channel number (channel) field from the rxrpc_call struct to
reduce the size of the call struct.  The field is redundant: if the call is
attached to a connection, the channel can be obtained from there by AND'ing
with RXRPC_CHANNELMASK.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 15:27:24 +01:00
David Howells
f36b5e444c rxrpc: When clearing a socket, clear the call sets in the right order
When clearing a socket, we should clear the securing-in-progress list
first, then the accept queue and last the main call tree because that's the
order in which a call progresses.  Not that a call should move from the
accept queue to the main tree whilst we're shutting down a socket, but it a
call could possibly move from sequreq to acceptq whilst we're clearing up.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 15:27:24 +01:00
David Howells
dabe5a7906 rxrpc: Tidy up the rxrpc_call struct a bit
Do a little tidying of the rxrpc_call struct:

 (1) in_clientflag is no longer compared against the value that's in the
     packet, so keeping it in this form isn't necessary.  Use a flag in
     flags instead and provide a pair of wrapper functions.

 (2) We don't read the epoch value, so that can go.

 (3) Move what remains of the data that were used for hashing up in the
     struct to be with the channel number.

 (4) Get rid of the local pointer.  We can get at this via the socket
     struct and we only use this in the procfs viewer.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 15:27:24 +01:00
David Howells
26164e77ca rxrpc: Remove RXRPC_CALL_PROC_BUSY
Remove RXRPC_CALL_PROC_BUSY as work queue items are now 100% non-reentrant.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23 15:27:23 +01:00
Dave Watson
a01512dbe3 net: strparser: fix strparser sk_user_data check
sk_user_data mismatch between what kcm expects (psock) and what strparser expects (strparser).

Queued rx_work, for example calling strp_check_rcv after socket buffer changes, will never complete.

sk_user_data is unused in strparser, so just remove the check.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 22:43:16 -07:00
Andrew Lunn
7b314362a2 net: dsa: Allow the DSA driver to indicate the tag protocol
DSA drivers may drive different families of switches which need
different tag protocol. Rather than hard code the tag protocol in the
driver structure, have a callback for the DSA core to call.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 21:08:08 -07:00
Geert Uytterhoeven
1ae292a245 net: ipconfig: Fix NULL pointer dereference on RARP/BOOTP/DHCP timeout
If no RARP, BOOTP, or DHCP response is received, ic_dev is never set,
causing a NULL pointer dereference in ic_close_devs():

    Sending DHCP requests ...... timed out!
    Unable to handle kernel NULL pointer dereference at virtual address 00000004

To fix this, add a check to avoid dereferencing ic_dev if it is still
NULL.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 2647cffb2b ("net: ipconfig: Support using "delayed" DHCP replies")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 21:04:41 -07:00
Jamal Hadi Salim
28a10c426e net sched: fix encoding to use real length
Encoding of the metadata was using the padded length as opposed to
the real length of the data which is a bug per specification.
This has not been an issue todate because all metadatum specified
so far has been 32 bit where aligned and data length are the same width.
This also includes a bug fix for validating the length of a u16 field.
But since there is no metadata of size u16 yes we are fine to include it
here.

While at it get rid of magic numbers.

Fixes: ef6980b6be ("net sched: introduce IFE action")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 21:01:57 -07:00
David S. Miller
8b7ac60a5d Merge tag 'batadv-next-for-davem-20160822' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This feature patchset includes the following changes:

 - place kref_get near usage of referenced objects, separate patches
   for various used objects to improve readability and maintainability
   by Sven Eckelmann (18 patches)

 - Keep batadv net device when all hard interfaces disappear, to
   improve situations where tools currently use work arounds, by
   Sven Eckelmann

 - Add an option to disable debugfs support to minimize footprint when
   userspace uses netlink only, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 20:38:25 -07:00
Shmulik Ladkani
c0451fe1f2 net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset
In b8247f095e,

   "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"

gso skbs arriving from an ingress interface that go through UDP
tunneling, are allowed to be fragmented if the resulting encapulated
segments exceed the dst mtu of the egress interface.

This aligned the behavior of gso skbs to non-gso skbs going through udp
encapsulation path.

However the non-gso vs gso anomaly is present also in the following
cases of a GRE tunnel:
 - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
   (e.g. OvS vport-gre with df_default=false)
 - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set

In both of the above cases, the non-gso skbs get fragmented, whereas the
gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
as they don't go through the segment+fragment code path.

Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.

Tunnels that do set IP_DF, will not go to fragmentation of segments.
This preserves behavior of ip_gre in (the default) pmtudisc mode.

Fixes: b8247f095e ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
Reported-by: wenxu <wenxu@ucloud.cn>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Tested-by: wenxu <wenxu@ucloud.cn>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 17:11:01 -07:00
WANG Cong
b9a24bb76b net_sched: properly handle failure case of tcf_exts_init()
After commit 22dc13c837 ("net_sched: convert tcf_exts from list to pointer array")
we do dynamic allocation in tcf_exts_init(), therefore we need
to handle the ENOMEM case properly.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 17:02:31 -07:00
Mike Manning
85b51b1211 net: ipv6: Remove addresses for failures with strict DAD
If DAD fails with accept_dad set to 2, global addresses and host routes
are incorrectly left in place. Even though disable_ipv6 is set,
contrary to documentation, the addresses are not dynamically deleted
from the interface. It is only on a subsequent link down/up that these
are removed. The fix is not only to set the disable_ipv6 flag, but
also to call addrconf_ifdown(), which is the action to carry out when
disabling IPv6. This results in the addresses and routes being deleted
immediately. The DAD failure for the LL addr is determined as before
via netlink, or by the absence of the LL addr (which also previously
would have had to be checked for in case of an intervening link down
and up). As the call to addrconf_ifdown() requires an rtnl lock, the
logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().

Previous behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2000::10/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
root@vm1:/# ip -6 route show dev eth3
2000::/64  proto kernel  metric 256
fe80::/64  proto kernel  metric 256
root@vm1:/# ip link set down eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

New behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 16:59:37 -07:00
Wei Yongjun
a5e5733645 netfilter: nft_hash: fix non static symbol warning
Fixes the following sparse warning:

net/netfilter/nft_hash.c:40:25: warning:
 symbol 'nft_hash_policy' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-22 11:45:41 +02:00
Colin Ian King
8d6c0eaa9e netfilter: fix spelling mistake: "delimitter" -> "delimiter"
trivial fix to spelling mistake in pr_debug message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-22 11:43:27 +02:00
Laura Garcia Liebana
91dbc6be0a netfilter: nf_tables: add number generator expression
This patch adds the numgen expression that allows us to generated
incremental and random numbers, this generator is bound to a upper limit
that is specified by userspace.

This expression is useful to distribute packets in a round-robin fashion
as well as randomly.

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-22 11:42:22 +02:00