Commit Graph

44719 Commits

Author SHA1 Message Date
Daniel Borkmann
479ffcccef bpf: fix checksum fixups on bpf_skb_store_bytes
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.

Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().

Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().
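
The odd-offset problem described above can be reproduced in user space. The following is a minimal sketch, not the kernel implementation: it works on folded 16-bit sums in big-endian word order, and all names ending in _sketch (plus fold16 and swap16) are hypothetical helpers introduced only for illustration. The byte swap plays the role of the rotation that csum_block_add()/csum_block_sub() apply for unaligned offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Sum a short buffer as big-endian 16-bit words, as if it started at an
 * even offset. A trailing odd byte is treated as the high byte of a word.
 */
static uint16_t csum_buf_sketch(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return fold16(sum);
}

/* Swap the two bytes of a folded sum (the "rotation" for odd offsets). */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Add a partial sum that was computed for data living at 'offset'. */
static uint16_t block_add_sketch(uint16_t sum, uint16_t part, size_t offset)
{
	if (offset & 1)
		part = swap16(part);
	return fold16((uint32_t)sum + part);
}

/* Subtract a partial sum: ones' complement subtraction adds the inverse. */
static uint16_t block_sub_sketch(uint16_t sum, uint16_t part, size_t offset)
{
	return block_add_sketch(sum, (uint16_t)~part, offset);
}

/* Fix up 'sum' after one byte at 'offset' changed from old_val to new_val. */
static uint16_t fixup_byte_sketch(uint16_t sum, uint8_t old_val,
				  uint8_t new_val, size_t offset)
{
	sum = block_sub_sketch(sum, csum_buf_sketch(&old_val, 1), offset);
	return block_add_sketch(sum, csum_buf_sketch(&new_val, 1), offset);
}
```

With the first 8 bytes of an IPv6 header, changing the hoplimit at offset 7 and passing that offset to fixup_byte_sketch() gives the same sum as recomputing from scratch, while passing offset 0 (the equivalent of plain csum_sub()/csum_add()) does not.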

Fixes: 608cd71a9c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 13:11:43 -07:00
Daniel Borkmann
a2bfe6bf09 bpf: also call skb_postpush_rcsum on xmit occasions
Follow-up to commit f8ffad69c9 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.

For the same reasons as described in f8ffad69c9 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.

Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.

Also here we have to address bpf_redirect() and bpf_clone_redirect().

Fixes: 3896d655f4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f6305 ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 13:11:43 -07:00
Phil Sutter
1ba8d77f41 sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states
Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
rely upon TCPF_LISTEN flag solely being present when listening sockets
are requested.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:51:58 -07:00
Phil Sutter
12474e8e58 sctp_diag: Fix T3_rtx timer export
The asoc's timer value is not kept in the asoc->timeouts array but in its
primary transport instead.

Furthermore, we must export the timer only if it is pending, otherwise
the value will underrun when stored in an unsigned variable and
user space will only see a very large timeout value.
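
The underrun mentioned above is ordinary unsigned wraparound. A minimal user-space sketch, assuming a 32-bit exported field; the function names and the plain integer timestamps are hypothetical and stand in for the kernel's jiffies and timer_pending() check:

```c
#include <stdint.h>

/* Export "time left" without checking whether the timer is pending. */
static uint32_t timer_left_naive(unsigned long expires, unsigned long now)
{
	/* If expires < now, the subtraction wraps to a huge value. */
	return (uint32_t)(expires - now);
}

/* Export "time left" only for a pending timer, otherwise report 0. */
static uint32_t timer_left_checked(int pending, unsigned long expires,
				   unsigned long now)
{
	return pending ? (uint32_t)(expires - now) : 0;
}
```

For an already expired timer (expires = 100, now = 150) the naive variant reports 4294967246 rather than a small or zero value.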

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:51:58 -07:00
Eric W. Biederman
703286608a netns: Add a limit on the number of net namespaces
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-08-08 14:42:04 -05:00
Wei Yongjun
864364a29c libceph: using kfree_rcu() to simplify the code
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-08-08 21:41:42 +02:00
Wei Yongjun
f52ec33cbd libceph: make cancel_generic_request() static
Fixes the following sparse warning:

net/ceph/mon_client.c:577:6: warning:
 symbol 'cancel_generic_request' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-08-08 21:41:42 +02:00
Wei Yongjun
c22e853a2e libceph: fix return value check in alloc_msg_with_page_vector()
In case of error, the function ceph_alloc_page_vector() returns
ERR_PTR() and never returns NULL. The NULL test in the return value
check should be replaced with IS_ERR().
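
Why a NULL test misses such errors can be shown with a user-space imitation of the error-pointer convention. This is a sketch under assumptions: err_ptr_sketch(), ptr_err_sketch(), is_err_sketch() and alloc_vector_sketch() are hypothetical stand-ins, not the kernel's ERR_PTR()/IS_ERR() or ceph_alloc_page_vector().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr_sketch(long err)
{
	return (void *)err;
}

/* Recover the errno value from an error pointer. */
static long ptr_err_sketch(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO_SKETCH addresses. */
static int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Allocator that reports failure as an error pointer, never as NULL. */
static void *alloc_vector_sketch(size_t count)
{
	void *vec = count ? calloc(count, sizeof(void *)) : NULL;

	if (!vec)
		return err_ptr_sketch(-ENOMEM);
	return vec;
}
```

A caller that tests the result of alloc_vector_sketch(0) against NULL sees a non-NULL pointer and carries on; only is_err_sketch() detects the failure.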

Fixes: 1907920324 ('libceph: support for sending notifies')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-08-08 21:41:41 +02:00
Eric W. Biederman
13bcc6a285 sysctl: Stop implicitly passing current into sysctl_table_root.lookup
Passing nsproxy into sysctl_table_root.lookup was a premature
optimization in an attempt to avoid depending on current.  The
directory /proc/self/sys has not appeared and if and when
it does this code will need to be reviewed closely and reworked
anyway.  So remove the premature optimization.

Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-08-08 09:17:16 -05:00
Christophe Leroy
0d35d0815b netfilter: nf_conntrack_sip: CSeq 0 is a valid CSeq
Do not drop packet when CSeq is 0 as 0 is also a valid value for CSeq.

simple_strtoul() will return 0 either when all digits are 0
or if there are no digits at all. Therefore when simple_strtoul()
returns 0 we check whether the first character is the digit 0.
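
The ambiguity and the check described above can be sketched in user space as follows. parse_digits_sketch() is a hypothetical stand-in that only mimics the "returns 0 when there are no digits" behaviour of simple_strtoul(), and parse_cseq_sketch() is likewise an illustrative name, not the conntrack helper's code.

```c
#include <ctype.h>

/* Parse leading decimal digits; returns 0 if there are none. */
static unsigned long parse_digits_sketch(const char *s)
{
	unsigned long value = 0;

	while (isdigit((unsigned char)*s))
		value = value * 10 + (unsigned long)(*s++ - '0');
	return value;
}

/* Returns 1 and stores the CSeq if buf starts with a number, else 0. */
static int parse_cseq_sketch(const char *buf, unsigned long *cseq)
{
	unsigned long value = parse_digits_sketch(buf);

	/* 0 is only valid if it came from a real '0' digit. */
	if (value == 0 && buf[0] != '0')
		return 0;
	*cseq = value;
	return 1;
}
```

Here "0 INVITE" is accepted with CSeq 0, while "INVITE" is rejected even though both parse to 0.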

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-08 11:58:43 +02:00
Pablo Neira Ayuso
c1eda3c639 netfilter: nft_rbtree: ignore inactive matching element with no descendants
If we find a matching element that is inactive with no descendants, we
jump to the found label, then crash because of a NULL dereference on the
left branch.

Fix this by checking that the element is active and not an interval end
and skipping the logic that only applies to the tree iteration.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Anders K. Pedersen <akp@akp.dk>
2016-08-08 11:27:37 +02:00
Liping Zhang
707e6835f8 netfilter: nf_ct_h323: do not re-activate already expired timer
Commit 96d1327ac2 ("netfilter: h323: Use mod_timer instead of
set_expect_timeout") simplified the code from
    if (!del_timer(&exp->timeout))
        return 0;
    add_timer(&exp->timeout);
to mod_timer(&exp->timeout, jiffies + info->timeout * HZ);

This is not correct, and introduces a race condition:
    CPU0                     CPU1
     -                     timer expire
  process_rcf              expectation_timed_out
  lock(exp_lock)              -
  find_exp                 waiting exp_lock...
  re-activate timer!!      waiting exp_lock...
  unlock(exp_lock)         lock(exp_lock)
     -                     unlink expect
     -                     free(expect)
     -                     unlock(exp_lock)
So when the timer expires again, we will access the memory that
was already freed.

Replace mod_timer with mod_timer_pending here to fix this problem.
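
The difference between the two calls can be modelled with a toy timer. This is a simplified single-threaded sketch with hypothetical toy_* names; it only captures the documented rule that mod_timer() (re)activates a timer whether or not it is pending, while mod_timer_pending() leaves an inactive timer alone.

```c
/* Toy timer: 'pending' is nonzero while the timer is armed. */
struct toy_timer {
	int pending;
	unsigned long expires;
};

/* Like mod_timer(): always arms the timer; returns whether it was pending. */
static int toy_mod_timer(struct toy_timer *timer, unsigned long expires)
{
	int was_pending = timer->pending;

	timer->pending = 1;
	timer->expires = expires;
	return was_pending;
}

/* Like mod_timer_pending(): only updates a timer that is still pending. */
static int toy_mod_timer_pending(struct toy_timer *timer,
				 unsigned long expires)
{
	if (!timer->pending)
		return 0;
	timer->expires = expires;
	return 1;
}
```

In the race shown above, the timer on CPU1 has already fired, so it is no longer pending: toy_mod_timer() would arm it again for an object that is about to be freed, whereas toy_mod_timer_pending() does nothing.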

Fixes: 96d1327ac2 ("netfilter: h323: Use mod_timer instead of set_expect_timeout")
Cc: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-08 09:26:40 +02:00
Johannes Berg
4d0bd46a4d Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"
This reverts commit 3d5fdff46c.

Ben Hutchings pointed out that the commit isn't safe since it assumes
that the structure used by the driver is iw_point, when in fact there's
no way to know about that.

Fortunately, the only driver in the tree that ever runs this code path
is the wilc1000 staging driver, so it doesn't really matter.

Clearly I should have investigated this better before applying, sorry.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org [though I guess it doesn't matter much]
Fixes: 3d5fdff46c ("wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-08 08:49:50 +02:00
David S. Miller
ca25ebe550 Merge tag 'mac80211-for-davem-2016-08-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
First set of fixes for the current cycle:
 * fix 80+80 bandwidth warning
 * fix powersave with mac80211 TXQ implementation
 * use correct way to free SKBs from multicast buffering
 * mesh: fix operation ordering to work with all drivers
 * mesh: end service period even when peer goes away
 * mesh: correct HT opmode validity checks
 * pass hw pointer from mac80211 to driver in TPT method,
   fixing a bug (in a bit the wrong way, but that's what
   we have right now)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 20:52:00 -04:00
Linus Torvalds
0803e04011 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:

 - new vsock device support in host and guest

 - platform IOMMU support in host and guest, including compatibility
   quirks for legacy systems.

 - misc fixes and cleanups.

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  VSOCK: Use kvfree()
  vhost: split out vringh Kconfig
  vhost: detect 32 bit integer wrap around
  vhost: new device IOTLB API
  vhost: drop vringh dependency
  vhost: convert pre sorted vhost memory array to interval tree
  vhost: introduce vhost memory accessors
  VSOCK: Add Makefile and Kconfig
  VSOCK: Introduce vhost_vsock.ko
  VSOCK: Introduce virtio_transport.ko
  VSOCK: Introduce virtio_vsock_common.ko
  VSOCK: defer sock removal to transports
  VSOCK: transport-specific vsock_transport functions
  vhost: drop vringh dependency
  vop: pull in vhost Kconfig
  virtio: new feature to detect IOMMU device quirk
  balloon: check the number of available pages in leak balloon
  vhost: lockless enqueuing
  vhost: simplify work flushing
2016-08-06 09:20:13 -04:00
David Forster
94d9f1c590 ipv4: panic in leaf_walk_rcu due to stale node pointer
Panic occurs when issuing "cat /proc/net/route" whilst
populating FIB with > 1M routes.

Use of cached node pointer in fib_route_get_idx is unsafe.

 BUG: unable to handle kernel paging request at ffffc90001630024
 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
 Oops: 0000 [#1] SMP
 Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
 acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
 CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
 RIP: 0010:[<ffffffff814cf6a0>]  [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
 RSP: 0018:ffff88011a05fda0  EFLAGS: 00010202
 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
 FS:  00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
 Stack:
  ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
  ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
  ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
 Call Trace:
  [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0
  [<ffffffff811dd8b9>] ? seq_read+0x149/0x380
  [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500
  [<ffffffff8138dce0>] ? process_echoes+0x70/0x70
  [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70
  [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0
  [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0
  [<ffffffff811bbe61>] ? vfs_read+0x81/0x120
  [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0
  [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75
 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
 RIP  [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0
  RSP <ffff88011a05fda0>
 CR2: ffffc90001630024

Signed-off-by: Dave Forster <dforster@brocade.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 00:10:05 -04:00
David Howells
372ee16386 rxrpc: Fix races between skb free, ACK generation and replying
Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.

Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released.  The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb.  ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.

To this end, the following changes are made:

 (1) kernel_rxrpc_data_consumed() is added.  This should be called whenever
     an skb is emptied so as to crank the ACK and call states.  This does
     not release the skb, however.  kernel_rxrpc_free_skb() must now be
     called to achieve that.  These together replace
     rxrpc_kernel_data_delivered().

 (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

     This makes afs_deliver_to_call() easier to work with, as the skb can simply
     be discarded unconditionally here without trying to work out what the
     return value of the ->deliver() function means.

     The ->deliver() functions can, via afs_data_complete(),
     afs_transfer_reply() and afs_extract_data() mark that an skb has been
     consumed (thereby cranking the state) without the need to
     conditionally free the skb to make sure the state is correct on an
     incoming call for when the call processor tries to send the reply.

 (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
     has finished with a packet and MSG_PEEK isn't set.

 (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

     Because of this, we no longer need to clear the destructor and put the
     call before we free the skb in cases where we don't want the ACK/call
     state to be cranked.

 (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
     than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
     the delivery function already), and the caller is now responsible for
     producing an abort if that was the last packet.

 (6) There are many bits of unmarshalling code where:

 		ret = afs_extract_data(call, skb, last, ...);
		switch (ret) {
		case 0:		break;
		case -EAGAIN:	return 0;
		default:	return ret;
		}

     is to be found.  As -EAGAIN can now be passed back to the caller, we
     now just return if ret < 0:

 		ret = afs_extract_data(call, skb, last, ...);
		if (ret < 0)
			return ret;

 (7) Checks for trailing data and empty final data packets have been
     consolidated as afs_data_complete().  So:

		if (skb->len > 0)
			return -EBADMSG;
		if (!last)
			return 0;

     becomes:

		ret = afs_data_complete(call, skb, last);
		if (ret < 0)
			return ret;

 (8) afs_transfer_reply() now checks the amount of data it has against the
     amount of data desired and the amount of data in the skb and returns
     an error to induce an abort if we don't get exactly what we want.

Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:

general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
 0000000000000006 000000000be04930 0000000000000000 ffff880400000000
 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
 [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
 [<ffffffff814c928f>] skb_dequeue+0x18/0x61
 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
 [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
 [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
 [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
 [<ffffffff81064ac2>] worker_thread+0x24a/0x385
 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
 [<ffffffff810696f5>] kthread+0xf3/0xfb
 [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 00:08:40 -04:00
Ian Wienand
5ef9f289c4 OVS: Ignore negative headroom value
net_device->ndo_set_rx_headroom (introduced in
871b642ade) says

  "Setting a negtaive value reset the rx headroom
   to the default value".

It seems that the OVS implementation in
3a927bc7cf overlooked this and sets
dev->needed_headroom unconditionally.

This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack.  For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.
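
How a negative request can corrupt the headroom may be sketched as below. This is an illustration under assumptions, not the OVS code: struct toy_dev and both functions are hypothetical, the unsigned short field is assumed to mirror the type of needed_headroom, and 0 is assumed as the default value.

```c
/* Toy device with an unsigned headroom field. */
struct toy_dev {
	unsigned short needed_headroom;
};

/* Unconditional store: a negative request wraps to a large value. */
static void toy_set_headroom_naive(struct toy_dev *dev, int new_hr)
{
	dev->needed_headroom = (unsigned short)new_hr;
}

/* Treat a negative request as "reset to the default" (assumed to be 0). */
static void toy_set_headroom_checked(struct toy_dev *dev, int new_hr)
{
	dev->needed_headroom = (unsigned short)(new_hr < 0 ? 0 : new_hr);
}
```

With new_hr = -1 the naive variant stores 65535, which is the kind of value that would later inflate a reserved-space calculation.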

Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.

Thanks to Ben for some pointers from the crash dumps!

Cc: Benjamin Poirier <bpoirier@suse.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 00:06:11 -04:00
Trond Myklebust
8d480326c3 NFSv4: Cap the transport reconnection timer at 1/2 lease period
We don't want to miss a lease period renewal due to the TCP connection
failing to reconnect in a timely fashion. To ensure this doesn't happen,
cap the reconnection timer so that we retry the connection attempt
at least every 1/2 lease period.
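
The cap amounts to taking the smaller of the current reconnect timeout and half the lease period. A minimal sketch with a hypothetical function name and unit-free integer arguments:

```c
/* Clamp a reconnect timeout so it never exceeds half the lease period. */
static unsigned long cap_reconnect_timeout_sketch(unsigned long timeout,
						  unsigned long lease_period)
{
	unsigned long cap = lease_period / 2;

	return timeout < cap ? timeout : cap;
}
```

For a lease period of 90 and a backed-off timeout of 300, the capped timeout is 45; a timeout of 10 is left unchanged.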

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 19:22:22 -04:00
Trond Myklebust
3851f1cdb2 SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
...and ensure that we propagate it to new transports on the same
client.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 14:12:09 -04:00
Trond Myklebust
02910177ae SUNRPC: Fix reconnection timeouts
When the connect attempt fails and backs off, we should start the clock
at the last connection attempt, not the time at which we queue up the
reconnect job.
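
One way to picture the change is to compute the remaining delay from the time of the last attempt rather than from the time the job is queued. The sketch below uses a hypothetical function name and plain integer timestamps, and ignores counter wraparound:

```c
/*
 * Remaining delay before the next reconnect, measured from the last
 * connection attempt. Returns 0 if the backoff has already elapsed.
 */
static unsigned long reconnect_delay_sketch(unsigned long last_attempt,
					    unsigned long backoff,
					    unsigned long now)
{
	unsigned long due = last_attempt + backoff;

	return now < due ? due - now : 0;
}
```

If the last attempt was at time 100 with a backoff of 30, a job queued at time 110 waits 20 more ticks, not a fresh 30.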

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 12:18:10 -04:00
NeilBrown
d88e4d82ef SUNRPC: disable the use of IPv6 temporary addresses.
If the net.ipv6.conf.*.use_temp_addr sysctl is set to '2',
then TCP connections over IPv6 will prefer a 'private' source
address.
These eventually expire and become invalid, typically after a week,
but the time is configurable.

When the local address becomes invalid the client will not be able to
receive replies from the server.  Eventually the connection will time out
or break and a new connection will be established, but this can take
half an hour (typically TCP connection break time).

RFC 4941, which describes private IPv6 addresses, acknowledges that some
applications might not work well with them and that the application may
explicitly request a non-temporary (i.e. "public") address.

I believe this is correct for SUNRPC clients.  Without this change, a
client will occasionally experience a long delay if private addresses
have been enabled.

The privacy offered by private addresses is of little value for an NFS
server which requires client authentication.

For NFSv3 this will often not be a problem because idle connections are
closed after 5 minutes.  For NFSv4 connections never go idle due to the
periodic RENEW (or equivalent) request.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 11:29:59 -04:00
Olga Kornievskaia
9130b8dbc6 SUNRPC: allow for upcalls for same uid but different gss service
It's possible to have simultaneous upcalls for the same UIDs but
different GSS service. In that case, we need to allow for the
upcall to gssd to proceed so that the same context is not used
by two different GSS services. Some servers lock the use of context
to the GSS service.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 11:29:59 -04:00
Maxim Altshul
2439ca0402 mac80211: Add ieee80211_hw pointer to get_expected_throughput
The variable is added to allow the driver easy access to
its own hw->priv when the op is invoked.

This fixes a crash in wlcore because it was relying on a
station pointer that wasn't initialized yet. It's the wrong
way to fix the crash, but it solves the problem for now and
it does make sense to have the hw pointer here.

Signed-off-by: Maxim Altshul <maxim.altshul@ti.com>
[rewrite commit message, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05 14:23:25 +02:00
Masashi Honma
9757235f45 nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value
Previously, NL80211_MESHCONF_HT_OPMODE validation rejected correct
flag combinations, e.g. IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED |
IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT.

Doing just a range-check allows setting flags that don't exist (0x8)
and invalid flag combinations.

Implements some checks based on IEEE 802.11 2012 8.4.2.59 "HT
Operation element".

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[reword commit message, simplify a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05 14:14:54 +02:00
Masashi Honma
71f2c3470f mac80211: End the MPSP even if EOSP frame was not acked
If a QoS frame with the EOSP (end of service period) subfield=1 sent by the
local peer was not acked by the remote peer, the local peer did not end the
MPSP. This prevents the local peer from going to DOZE state. And if the
remote peer goes away without closing the connection, the local peer stays
in AWAKE state and wastes battery.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05 14:06:29 +02:00
Felix Fietkau
6b07d9ca9b mac80211: fix purging multicast PS buffer queue
The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05 14:06:28 +02:00
Linus Torvalds
a71e36045e Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:

   - Trond made a change to the server's tcp logic that allows a fast
     client to better take advantage of high bandwidth networks, but may
     increase the risk that a single client could starve other clients;
     a new sunrpc.svc_rpc_per_connection_limit parameter should help
     mitigate this in the (hopefully unlikely) event this becomes a
     problem in practice.

   - Tom Haynes added a minimal flex-layout pnfs server, which is of no
     use in production for now--don't build it unless you're doing
     client testing or further server development"

* tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
  nfsd: remove some dead code in nfsd_create_locked()
  nfsd: drop unnecessary MAY_EXEC check from create
  nfsd: clean up bad-type check in nfsd_create_locked
  nfsd: remove unnecessary positive-dentry check
  nfsd: reorganize nfsd_create
  nfsd: check d_can_lookup in fh_verify of directories
  nfsd: remove redundant zero-length check from create
  nfsd: Make creates return EEXIST instead of EACCES
  SUNRPC: Detect immediate closure of accepted sockets
  SUNRPC: accept() may return sockets that are still in SYN_RECV
  nfsd: allow nfsd to advertise multiple layout types
  nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
  nfsd/blocklayout: Make sure calculate signature/designator length aligned
  xfs: abstract block export operations from nfsd layouts
  SUNRPC: Remove unused callback xpo_adjust_wspace()
  SUNRPC: Change TCP socket space reservation
  SUNRPC: Add a server side per-connection limit
  SUNRPC: Micro optimisation for svc_data_ready
  SUNRPC: Call the default socket callbacks instead of open coding
  SUNRPC: lock the socket while detaching it
  ...
2016-08-04 19:59:06 -04:00
Masahiro Yamada
97f2645f35 tree-wide: replace config_enabled() with IS_ENABLED()
The use of config_enabled() against config options is ambiguous.  In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED().  Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

 - config_enabled(CONFIG_HWMON)
  [ drivers/net/wireless/ath/ath10k/thermal.c ]

 - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
  [ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.
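
The difference for tristate options can be demonstrated with a user-space imitation of the macro trick. This is a sketch under assumptions: the sk_/SK_ names are hypothetical stand-ins for the kernel macros, CONFIG_FOO, CONFIG_BAR and CONFIG_BAZ are made-up options, and an option set to 'm' is assumed to define only the _MODULE symbol.

```c
/* Pretend Kconfig output: FOO=y, BAR=m, BAZ is not set. */
#define CONFIG_FOO 1
#define CONFIG_BAR_MODULE 1

/*
 * If the option is defined to 1, the placeholder expands to "0," and
 * shifts a 1 into the second argument slot; otherwise the 0 stays there.
 */
#define SK_ARG_PLACEHOLDER_1 0,
#define sk_take_second_arg(ignored, val, ...) val
#define sk_is_defined(x) sk_is_defined_(x)
#define sk_is_defined_(val) sk_is_defined__(SK_ARG_PLACEHOLDER_##val)
#define sk_is_defined__(arg1_or_junk) \
	sk_take_second_arg(arg1_or_junk 1, 0, 0)

#define SK_IS_BUILTIN(option) sk_is_defined(option)
#define SK_IS_MODULE(option) sk_is_defined(option##_MODULE)
#define SK_IS_ENABLED(option) \
	(SK_IS_BUILTIN(option) || SK_IS_MODULE(option))

/* The old helper behaves like the built-in test only. */
#define sk_config_enabled(option) sk_is_defined(option)

/* Small wrappers so the results can be inspected at run time. */
static int bar_enabled_old(void)
{
	return sk_config_enabled(CONFIG_BAR);
}

static int bar_enabled_new(void)
{
	return SK_IS_ENABLED(CONFIG_BAR);
}
```

For the module-only option CONFIG_BAR, bar_enabled_old() yields 0 while bar_enabled_new() yields 1, which is why converting a tristate user to IS_ENABLED() would change its logic.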

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will Drewry <wad@chromium.org>
Cc: Nikolay Martynov <mar.kolya@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Rafal Milecki <zajec5@gmail.com>
Cc: James Cowgill <James.Cowgill@imgtec.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tony Wu <tung7970@gmail.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: "Maciej W. Rozycki" <macro@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Jarno Rajahalme
bce91f8a42 openvswitch: Remove incorrect WARN_ONCE().
ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen.  However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen.  This
has resulted in somewhat confusing kernel log messages.  This patch
removes this check and warning.

Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-03 11:50:40 -07:00
Linus Torvalds
f0936155f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix several cases of missing of_node_put() calls in various
    networking drivers.  From Peter Chen.

 2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
    Mintz.

 3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

 4) Fix lockups in CPDMA driver, from Grygorii Strashko.

 5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

 6) Fix MAC address setting in r8169 during runtime suspend, from
    Chun-Hao Lin.

 7) Various printf format specifier fixes, from Heinrich Schuchardt.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
  qed: Fail driver load in 100g MSI mode.
  ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
  ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
  ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
  ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
  ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
  ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
  ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
  ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
  ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
  ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
  ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
  ethernet: altera: add missing of_node_put
  8139too: fix system hang when there is a tx timeout event.
  qed: Fix error return code in qed_resc_alloc()
  net: qlcnic: avoid superfluous assignement
  dsa: b53: remove redundant if
  ...
2016-08-03 07:26:11 -04:00
Maital Hahn
c37a54ac37 mac80211: mesh: flush stations before beacons are stopped
Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.

Tested also on Ralink RT3572 chipset.

Signed-off-by: Maital Hahn <maitalm@ti.com>
Signed-off-by: Yaniv Machani <yanivma@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-03 08:45:15 +02:00
Linus Torvalds
72b5ac54d6 Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
 "The highlights are:

   - RADOS namespace support in libceph and CephFS (Zheng Yan and
     myself).  The stopgaps added in 4.5 to deny access to inodes in
     namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
     bit is now fully supported

   - A large rework of the MDS cap flushing code (Zheng Yan)

   - Handle some of ->d_revalidate() in RCU mode (Jeff Layton).  We were
     overly pessimistic before, bailing at the first sight of LOOKUP_RCU

  On top of that we've got a few CephFS bug fixes, a couple of cleanups
  and Arnd's workaround for a weird genksyms issue"

* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
  ceph: fix symbol versioning for ceph_monc_do_statfs
  ceph: Correctly return NXIO errors from ceph_llseek
  ceph: Mark the file cache as unreclaimable
  ceph: optimize cap flush waiting
  ceph: cleanup ceph_flush_snaps()
  ceph: kick cap flushes before sending other cap message
  ceph: introduce an inode flag to indicates if snapflush is needed
  ceph: avoid sending duplicated cap flush message
  ceph: unify cap flush and snapcap flush
  ceph: use list instead of rbtree to track cap flushes
  ceph: update types of some local varibles
  ceph: include 'follows' of pending snapflush in cap reconnect message
  ceph: update cap reconnect message to version 3
  ceph: mount non-default filesystem by name
  libceph: fsmap.user subscription support
  ceph: handle LOOKUP_RCU in ceph_d_revalidate
  ceph: allow dentry_lease_is_valid to work under RCU walk
  ceph: clear d_fsinfo pointer under d_lock
  ceph: remove ceph_mdsc_lease_release
  ceph: don't use ->d_time
  ...
2016-08-02 19:39:09 -04:00
Trond Myklebust
ad3331acb1 SUNRPC: Fix up socket autodisconnect
Ensure that we don't forget to set up the disconnection timer for the
case when a connect request is fulfilled after the RPC request that
initiated it has timed out or been interrupted.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-02 13:47:43 -04:00
Felix Fietkau
4e3f21bc7b mac80211: fix check for buffered powersave frames with txq
The logic was inverted here; set the bit if frames are pending.

Fixes: ba8c3d6f16 ("mac80211: add an intermediate software queue implementation")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-02 09:50:26 +02:00
Colin Ian King
680682d4d5 cfg80211: fix missing break in NL80211_CHAN_WIDTH_80P80 case
The switch on chandef->width is missing a break on the
NL80211_CHAN_WIDTH_80P80 case; currently we get a WARN_ON when
center_freq2 is non-zero because of the missing break.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-02 09:50:25 +02:00
Asias He
304ba62fd4 VSOCK: Add Makefile and Kconfig
Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-02 02:57:30 +03:00
Asias He
0ea9e1d3a9 VSOCK: Introduce virtio_transport.ko
VM sockets virtio transport implementation.  This driver runs in the
guest.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-02 02:57:29 +03:00
Asias He
06a8fc7836 VSOCK: Introduce virtio_vsock_common.ko
This module contains the common code and header files for the following
virtio_transport and vhost_vsock kernel modules.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-02 02:57:29 +03:00
Stefan Hajnoczi
6773b7dc39 VSOCK: defer sock removal to transports
The virtio transport will implement graceful shutdown and the related
SO_LINGER socket option.  This requires orphaning the sock but keeping
it in the table of connections after .release().

This patch adds the vsock_remove_sock() function and leaves it up to the
transport when to remove the sock.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-02 02:57:28 +03:00
Stefan Hajnoczi
0b01aeb3d2 VSOCK: transport-specific vsock_transport functions
struct vsock_transport contains function pointers called by AF_VSOCK
core code.  The transport may want its own transport-specific function
pointers and they can be added after struct vsock_transport.

Allow the transport to fetch vsock_transport.  It can downcast it to
access transport-specific function pointers.

The virtio transport will use this.
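
A minimal user-space sketch of the downcast described above, assuming the
core struct is embedded as a member of the transport's own struct.
struct example_transport, example_get_transport(), to_example() and the
send_pkt callback are hypothetical names used only for illustration, and
struct vsock_transport is reduced to a single member here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the core ops; the real struct has many more members. */
struct vsock_transport {
	int (*init)(void);
};

/* Hypothetical transport struct: core ops embedded, extra callbacks after them. */
struct example_transport {
	struct vsock_transport transport;
	int (*send_pkt)(int len);	/* hypothetical transport-specific callback */
};

static int example_init(void) { return 0; }
static int example_send_pkt(int len) { return len; }

static struct example_transport example = {
	.transport = { .init = example_init },
	.send_pkt = example_send_pkt,
};

/* Hypothetical getter standing in for whatever the core exposes. */
static const struct vsock_transport *example_get_transport(void)
{
	return &example.transport;
}

/* Downcast: step back from the embedded member to the enclosing struct. */
static const struct example_transport *
to_example(const struct vsock_transport *t)
{
	return (const struct example_transport *)
		((const char *)t - offsetof(struct example_transport, transport));
}

/* Reach a transport-specific callback through the generic pointer. */
static int example_send(int len)
{
	const struct vsock_transport *t = example_get_transport();

	assert(t != NULL);
	return to_example(t)->send_pkt(len);
}
```

Subtracting the member offset keeps the downcast correct even if the
embedded struct is not the first member.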

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-02 02:57:28 +03:00
Trond Myklebust
c7995f8a70 SUNRPC: Detect immediate closure of accepted sockets
This modification is useful for debugging issues that happen while
the socket is being initialised.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-01 17:53:42 -04:00
Trond Myklebust
b2f21f7d85 SUNRPC: accept() may return sockets that are still in SYN_RECV
We're seeing traces of the following form:

 [10952.396347] svc: transport ffff88042ba4a000 dequeued, inuse=2
 [10952.396351] svc: tcp_accept ffff88042ba4a000 sock ffff88042a6e4c80
 [10952.396362] nfsd: connect from 10.2.6.1, port=187
 [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
 [10952.396368] setting up TCP socket for reading
 [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
 [10952.396373] svc: transport ffff8803eb10a000 put into queue
 [10952.396375] svc: transport ffff88042ba4a000 put into queue
 [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
 [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
 [10952.396381] svc_recv: found XPT_CLOSE
 [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
 [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
 [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
 [10952.396412] svc: svc_sock_free(ffff8803eb10a000)

i.e. an immediate close of the socket after initialisation.

The culprit appears to be the test at the end of svc_tcp_init, which
checks if the newly created socket is in the TCP_ESTABLISHED state,
and immediately closes it if not. The evidence appears to suggest that
the socket might still be in the SYN_RECV state at this time.

The fix is to check for both states, and then to add a check in
svc_tcp_state_change() to ensure we don't close the socket when
it transitions into TCP_ESTABLISHED.
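
The two checks can be sketched roughly as below. This is an illustration
of the logic only, not the patched svcsock code: close_after_init() and
close_on_state_change() are hypothetical helpers, and the enum lists only
the TCP states needed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the kernel's TCP state values. */
enum {
	TCP_ESTABLISHED = 1,
	TCP_SYN_SENT = 2,
	TCP_SYN_RECV = 3,
	TCP_CLOSE = 7,
};

/* Check at the end of socket setup: tolerate a handshake still completing. */
static bool close_after_init(int sk_state)
{
	assert(sk_state > 0);	/* TCP states start at 1 */

	switch (sk_state) {
	case TCP_SYN_RECV:
	case TCP_ESTABLISHED:
		return false;
	default:
		return true;
	}
}

/* State-change callback: moving into TCP_ESTABLISHED is not a reason to close. */
static bool close_on_state_change(int new_state)
{
	return new_state != TCP_ESTABLISHED;
}
```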

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-01 17:53:41 -04:00
Trond Myklebust
1f4c17a03b SUNRPC: Handle EADDRNOTAVAIL on connection failures
If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
that means our choice of source port number was bad.
This error is expected when we set the SO_REUSEPORT socket option and we
have 2 sockets sharing the same source and destination address and port
combinations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 402e23b4ed ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport")
Cc: stable@vger.kernel.org # v4.0+
2016-08-01 15:03:02 -04:00
Xin Long
e08786942e sctp: allow receiving msg when TCP-style sk is in CLOSED state
Commit 141ddefce7 ("sctp: change sk state to CLOSED instead of
CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the
assoc is closed when sctp_accept clones a new sk.

If there is still data in sk receive queue, users will not be able
to read it any more, as sctp_recvmsg returns directly if sk state
is CLOSED.

This patch adds a CLOSED state check in sctp_recvmsg to allow
reading data from a TCP-style sk in CLOSED state, as TCP does.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 22:06:22 -07:00
Xin Long
a0fc6843f9 sctp: allow delivering notifications after receiving SHUTDOWN
Prior to this patch, once sctp received SHUTDOWN or shutdown with RD,
sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would
be dropped in sctp_ulpq_tail_event(). It would cause:

1. some notifications couldn't be received by users, such as
   SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C().

2. sctp would also never trigger sk_data_ready when the association
   was closed, making it harder to identify the end of the association
   by calling recvmsg() and getting an EOF. It was not convenient for
   kernel users.

The check here should be stopping delivering DATA chunks after receiving
SHUTDOWN, and stopping delivering ANY chunks after sctp_close().

So this patch is to allow notifications to enqueue into receive queue
even if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event,
but if sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all
events.
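
The resulting drop decision can be sketched as a small predicate. This is
a user-space illustration under the rules stated above, not the code in
sctp_ulpq_tail_event(); drop_event() is a hypothetical helper and the flag
values are defined locally.

```c
#include <assert.h>
#include <stdbool.h>

#define RCV_SHUTDOWN	1
#define SEND_SHUTDOWN	2

/* Returns true when the event should be dropped instead of queued. */
static bool drop_event(unsigned int sk_shutdown, bool is_notification)
{
	/* Only the two shutdown bits are meaningful in this sketch. */
	assert((sk_shutdown & ~(RCV_SHUTDOWN | SEND_SHUTDOWN)) == 0);

	if (!(sk_shutdown & RCV_SHUTDOWN))
		return false;		/* receive side still open */
	if (sk_shutdown & SEND_SHUTDOWN)
		return true;		/* both directions shut: drop everything */
	return !is_notification;	/* RCV_SHUTDOWN only: drop DATA, keep notifications */
}
```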

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 22:06:22 -07:00
Xin Long
1aa25ec227 sctp: fix the issue sctp requeue auth chunk incorrectly
sctp needs to queue auth chunk back when we know that we are going
to generate another segment. But commit f1533cce60 ("sctp: fix
panic when sending auth chunks") requeues the last chunk processed
which is probably not the auth chunk.

It causes a panic when calculating the MAC in sctp_auth_calculate_hmac(),
because the offset of the auth chunk in skb->data is then incorrect.

This fix is to requeue it by using packet->auth.

Fixes: f1533cce60 ("sctp: fix panic when sending auth chunks")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 22:06:22 -07:00
Soheil Hassas Yeganeh
f626300a3e tcp: consider recv buf for the initial window scale
tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().

To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.
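
As a rough sketch of the effect, the scale can be derived from the largest
of the three values by shifting until the window fits in 16 bits.
pick_wscale() and max3() are hypothetical helpers, the sysctl values are
passed in as plain parameters, and any window clamp handling is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WSCALE 14	/* largest shift count allowed for TCP window scaling */

static uint32_t max3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t m = a > b ? a : b;

	return m > c ? m : c;
}

/*
 * Pick the smallest scale that lets a 16-bit window field cover the
 * largest receive space the socket could end up using.
 */
static int pick_wscale(uint32_t tcp_rmem_max, uint32_t core_rmem_max,
		       uint32_t rcvbuf_space)
{
	uint32_t space = max3(tcp_rmem_max, core_rmem_max, rcvbuf_space);
	int wscale = 0;

	while (space > 65535 && wscale < MAX_WSCALE) {
		space >>= 1;
		wscale++;
	}
	assert(wscale <= MAX_WSCALE);
	return wscale;
}
```

With limits of 6291456 and 212992 bytes, a socket whose buffer was forced
to 64 MiB gets a scale of 11 in this sketch, where ignoring the buffer
would give 7.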

Fixes: b0573dea1f ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 21:21:57 -07:00
Wei Yongjun
c882219ae4 net: ipv6: use list_move instead of list_del/list_add
Using list_move() instead of list_del() + list_add().
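
For reference, a simplified user-space model of why the two forms are
equivalent. These are cut-down versions of the list helpers written for
illustration (list_init() is a hypothetical name), not the definitions
from include/linux/list.h.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty circular list. */
static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever list it is on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Same effect as list_del() followed by list_add(), in one call. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	assert(entry != NULL && head != NULL);
	list_del(entry);
	list_add(entry, head);
}
```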

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 20:41:59 -07:00
Wei Yongjun
6b65bc2972 tipc: fix imbalance read_unlock_bh in __tipc_nl_add_monitor()
In the error handling path taken when nla_nest_start() fails,
read_unlock_bh() is called to unlock a lock that has not been taken
yet. sparse warns about the context imbalance as follows:

net/tipc/monitor.c:799:23: warning:
 context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block
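
The balanced shape of such an error path can be modelled as below. This is
a toy with a counter in place of the real lock, not the tipc code:
struct fake_lock, fake_read_lock(), fake_read_unlock() and add_monitor()
are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Counter standing in for a read lock so the balance can be observed. */
struct fake_lock {
	int readers;
};

static void fake_read_lock(struct fake_lock *l)
{
	l->readers++;
}

static void fake_read_unlock(struct fake_lock *l)
{
	assert(l->readers > 0);	/* unlocking a lock that is not held is the bug */
	l->readers--;
}

static int add_monitor(struct fake_lock *l, bool nest_start_fails,
		       bool put_fails)
{
	if (nest_start_fails)
		return -1;	/* lock not taken yet, so nothing to unlock */

	fake_read_lock(l);
	if (put_fails)
		goto unlock_and_fail;
	fake_read_unlock(l);
	return 0;

unlock_and_fail:
	fake_read_unlock(l);
	return -1;
}
```

Every exit taken after the lock is acquired releases it exactly once, and
the early exit never touches it.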

Fixes: cf6f7e1d51 ('tipc: dump monitor attributes')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 20:38:22 -07:00