Commit Graph

57846 Commits

Author SHA1 Message Date
YueHaibing
0ba9480cff bridge: remove duplicated include from br_multicast.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:49:57 -08:00
Zhaolong Zhang
8eab6dac8d tipc: remove dead code in struct tipc_topsrv
max_rcvbuf_size is no longer used since commit "414574a0af36".

Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:31:23 -08:00
Priyaranjan Jha
78dc70ebaa tcp_bbr: adapt cwnd based on ack aggregation estimation
Aggregation effects are extremely common with wifi, cellular, and cable
modem link technologies, ACK decimation in middleboxes, and LRO and GRO
in receiving hosts. The aggregation can happen in either direction,
data or ACKs, but in either case the aggregation effect is visible
to the sender in the ACK stream.

Previously BBR's sending was often limited by cwnd under severe ACK
aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
the bottleneck idle for potentially long periods. Note that loss-based
congestion control does not have this issue because when facing
aggregation it continues increasing cwnd after bursts of ACKs, growing
cwnd until the buffer is full.

To achieve good throughput in the presence of aggregation effects, this
algorithm allows the BBR sender to put extra data in flight to keep the
bottleneck utilized during silences in the ACK stream that it has evidence
to suggest were caused by aggregation.

A summary of the algorithm: when a burst of packets are acked by a
stretched ACK or a burst of ACKs or both, BBR first estimates the expected
amount of data that should have been acked, based on its estimated
bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
filter to estimate the recent level of observed ACK aggregation. Then cwnd
is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
being cwnd-limited in the face of ACK silences that recent history suggests
were caused by aggregation. As a sanity check, the ACK aggregation degree
is upper-bounded by the cwnd (at the time of measurement) and a global max
of BW * 100ms. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

In our internal testing, we observed a significant increase in BBR
throughput (measured using netperf), in a basic wifi setup.
- Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
- 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
- 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps

Also, this code is running globally on YouTube TCP connections and produced
significant bandwidth increases for YouTube traffic.

This is based on Ian Swett's max_ack_height_ algorithm from the
QUIC BBR implementation.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:27:27 -08:00
Priyaranjan Jha
232aa8ec3e tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning
Because bbr_target_cwnd() is really a general-purpose BBR helper for
computing some volume of inflight data as a function of the estimated
BDP, refactor it into following helper functions:
- bbr_bdp()
- bbr_quantization_budget()
- bbr_inflight()

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:27:26 -08:00
David S. Miller
9620d6f683 Merge tag 'linux-can-fixes-for-5.0-20190122' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2019-01-22

this is a pull request of 4 patches for net/master.

The first patch by is by Manfred Schlaegl and reverts a patch that caused wrong
warning messages in certain use cases. The next patch is by Oliver Hartkopp for
the bcm that adds sanity checks for the timer value before using it to detect
potential interger overflows. The last two patches are for the flexcan driver,
YueHaibing's patch fixes the the return value in the error path of the
flexcan_setup_stop_mode() function. The second patch is by Uwe Kleine-König and
fixes a NULL pointer deref on older flexcan cores in flexcan_chip_start().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:52:37 -08:00
Xin Long
ecf938fe7d sctp: set flow sport from saddr only when it's 0
Now sctp_transport_pmtu() passes transport->saddr into .get_dst() to set
flow sport from 'saddr'. However, transport->saddr is set only when
transport->dst exists in sctp_transport_route().

If sctp_transport_pmtu() is called without transport->saddr set, like
when transport->dst doesn't exists, the flow sport will be set to 0
from transport->saddr, which will cause a wrong route to be got.

Commit 6e91b578bf ("sctp: re-use sctp_transport_pmtu in
sctp_transport_route") made the issue be triggered more easily
since sctp_transport_pmtu() would be called in sctp_transport_route()
after that.

In gerneral, fl4->fl4_sport should always be set to
htons(asoc->base.bind_addr.port), unless transport->asoc doesn't exist
in sctp_v4/6_get_dst(), which is the case:

  sctp_ootb_pkt_new() ->
    sctp_transport_route()

For that, we can simply handle it by setting flow sport from saddr only
when it's 0 in sctp_v4/6_get_dst().

Fixes: 6e91b578bf ("sctp: re-use sctp_transport_pmtu in sctp_transport_route")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
Xin Long
4ff40b8626 sctp: set chunk transport correctly when it's a new asoc
In the paths:

  sctp_sf_do_unexpected_init() ->
    sctp_make_init_ack()
  sctp_sf_do_dupcook_a/b()() ->
    sctp_sf_do_5_1D_ce()

The new chunk 'retval' transport is set from the incoming chunk 'chunk'
transport. However, 'retval' transport belong to the new asoc, which
is a different one from 'chunk' transport's asoc.

It will cause that the 'retval' chunk gets set with a wrong transport.
Later when sending it and because of Commit b9fd683982 ("sctp: add
sctp_packet_singleton"), sctp_packet_singleton() will set some fields,
like vtag to 'retval' chunk from that wrong transport's asoc.

This patch is to fix it by setting 'retval' transport correctly which
belongs to the right asoc in sctp_make_init_ack() and
sctp_sf_do_5_1D_ce().

Fixes: b9fd683982 ("sctp: add sctp_packet_singleton")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
Xin Long
8220c870cb sctp: improve the events for sctp stream adding
This patch is to improve sctp stream adding events in 2 places:

  1. In sctp_process_strreset_addstrm_out(), move up SCTP_MAX_STREAM
     and in stream allocation failure checks, as the adding has to
     succeed after reconf_timer stops for the in stream adding
     request retransmission.

  3. In sctp_process_strreset_addstrm_in(), no event should be sent,
     as no in or out stream is added here.

Fixes: 50a41591f1 ("sctp: implement receiver-side procedures for the Add Outgoing Streams Request Parameter")
Fixes: c5c4ebb3ab ("sctp: implement receiver-side procedures for the Add Incoming Streams Request Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
Xin Long
2e6dc4d951 sctp: improve the events for sctp stream reset
This patch is to improve sctp stream reset events in 4 places:

  1. In sctp_process_strreset_outreq(), the flag should always be set with
     SCTP_STREAM_RESET_INCOMING_SSN instead of OUTGOING, as receiver's in
     stream is reset here.
  2. In sctp_process_strreset_outreq(), move up SCTP_STRRESET_ERR_WRONG_SSN
     check, as the reset has to succeed after reconf_timer stops for the
     in stream reset request retransmission.
  3. In sctp_process_strreset_inreq(), no event should be sent, as no in
     or out stream is reset here.
  4. In sctp_process_strreset_resp(), SCTP_STREAM_RESET_INCOMING_SSN or
     OUTGOING event should always be sent for stream reset requests, no
     matter it fails or succeeds to process the request.

Fixes: 8105447645 ("sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter")
Fixes: 16e1a91965 ("sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter")
Fixes: 11ae76e67a ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
wenxu
d71b57532d ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
ip l add dev tun type gretap key 1000
ip a a dev tun 10.0.0.1/24

Packets with tun-id 1000 can be recived by tun dev. But packet can't
be sent through dev tun for non-tunnel-dst

With this patch: tunnel-dst can be get through lwtunnel like beflow:
ip r a 10.0.0.7 encap ip dst 172.168.0.11 dev tun

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 17:54:12 -08:00
Björn Töpel
a36b38aa2a xsk: add sock_diag interface for AF_XDP
This patch adds the sock_diag interface for querying sockets from user
space. Tools like iproute2 ss(8) can use this interface to list open
AF_XDP sockets.

The user-space ABI is defined in linux/xdp_diag.h and includes netlink
request and response structs. The request can query sockets and the
response contains socket information about the rings, umems, inode and
more.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-25 01:50:03 +01:00
Björn Töpel
50e74c0131 xsk: add id to umem
This commit adds an id to the umem structure. The id uniquely
identifies a umem instance, and will be exposed to user-space via the
socket monitoring interface.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-25 01:50:03 +01:00
Björn Töpel
1d0dc06930 net: xsk: track AF_XDP sockets on a per-netns list
Track each AF_XDP socket in a per-netns list. This will be used later
by the sock_diag interface for querying sockets from userspace.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-25 01:50:03 +01:00
ZhangXiaoxu
53ab60baa1 ipvs: Fix signed integer overflow when setsockopt timeout
There is a UBSAN bug report as below:
UBSAN: Undefined behaviour in net/netfilter/ipvs/ip_vs_ctl.c:2227:21
signed integer overflow:
-2147483647 * 1000 cannot be represented in type 'int'

Reproduce program:
	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/socket.h>

	#define IPPROTO_IP 0
	#define IPPROTO_RAW 255

	#define IP_VS_BASE_CTL		(64+1024+64)
	#define IP_VS_SO_SET_TIMEOUT	(IP_VS_BASE_CTL+10)

	/* The argument to IP_VS_SO_GET_TIMEOUT */
	struct ipvs_timeout_t {
		int tcp_timeout;
		int tcp_fin_timeout;
		int udp_timeout;
	};

	int main() {
		int ret = -1;
		int sockfd = -1;
		struct ipvs_timeout_t to;

		sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
		if (sockfd == -1) {
			printf("socket init error\n");
			return -1;
		}

		to.tcp_timeout = -2147483647;
		to.tcp_fin_timeout = -2147483647;
		to.udp_timeout = -2147483647;

		ret = setsockopt(sockfd,
				 IPPROTO_IP,
				 IP_VS_SO_SET_TIMEOUT,
				 (char *)(&to),
				 sizeof(to));

		printf("setsockopt return %d\n", ret);
		return ret;
	}

Return -EINVAL if the timeout value is negative or max than 'INT_MAX / HZ'.

Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-24 13:38:54 +01:00
Eric Dumazet
d9ff286a0f bpf: allow BPF programs access skb_shared_info->gso_segs field
This adds the ability to read gso_segs from a BPF program.

v3: Use BPF_REG_AX instead of BPF_REG_TMP for the temporary register,
    as suggested by Martin.

v2: refined Eddie Hao patch to address Alexei feedback.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eddie Hao <eddieh@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-24 10:49:05 +01:00
Eric Dumazet
63530aba78 ax25: fix possible use-after-free
syzbot found that ax25 routes where not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The current two ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23 11:18:00 -08:00
Gustavo A. R. Silva
6317950c1b Bluetooth: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

net/bluetooth/rfcomm/core.c:479:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/bluetooth/l2cap_core.c:4223:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-01-23 18:27:16 +01:00
Gustavo A. R. Silva
f79e3365bc tipc: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

net/tipc/link.c:1125:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/tipc/socket.c:736:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/tipc/socket.c:2418:7: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23 09:06:35 -08:00
Marcel Holtmann
7c9cbd0b5e Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
The function l2cap_get_conf_opt will return L2CAP_CONF_OPT_SIZE + opt->len
as length value. The opt->len however is in control over the remote user
and can be used by an attacker to gain access beyond the bounds of the
actual packet.

To prevent any potential leak of heap memory, it is enough to check that
the resulting len calculation after calling l2cap_get_conf_opt is not
below zero. A well formed packet will always return >= 0 here and will
end with the length value being zero after the last option has been
parsed. In case of malformed packets messing with the opt->len field the
length value will become negative. If that is the case, then just abort
and ignore the option.

In case an attacker uses a too short opt->len value, then garbage will
be parsed, but that is protected by the unknown option handling and also
the option parameter size checks.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-01-23 13:35:07 +02:00
Yafang Shao
c9e4576743 bpf: sock recvbuff must be limited by rmem_max in bpf_setsockopt()
When sock recvbuff is set by bpf_setsockopt(), the value must by
limited by rmem_max. It is the same with sendbuff.

Fixes: 8c4b4c7e9f ("bpf: Add setsockopt helper function to bpf")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-23 12:22:22 +01:00
Marcel Holtmann
af3d5d1c87 Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
When doing option parsing for standard type values of 1, 2 or 4 octets,
the value is converted directly into a variable instead of a pointer. To
avoid being tricked into being a pointer, check that for these option
types that sizes actually match. In L2CAP every option is fixed size and
thus it is prudent anyway to ensure that the remote side sends us the
right option size along with option paramters.

If the option size is not matching the option type, then that option is
silently ignored. It is a protocol violation and instead of trying to
give the remote attacker any further hints just pretend that option is
not present and proceed with the default values. Implementation
following the specification and its qualification procedures will always
use the correct size and thus not being impacted here.

To keep the code readable and consistent accross all options, a few
cosmetic changes were also required.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-01-23 12:53:20 +02:00
Gustavo A. R. Silva
3bbe8b1a4a 9p: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

net/9p/trans_xen.c:514:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Link: http://lkml.kernel.org/r/20190123071632.GA8039@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
2019-01-23 16:29:54 +09:00
Nathan Chancellor
33a0efa4ba devlink: Use DIV_ROUND_UP_ULL in DEVLINK_HEALTH_SIZE_TO_BUFFERS
When building this code on a 32-bit platform such as ARM, there is a
link time error (lld error shown, happpens with ld.bfd too):

ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by devlink.c
>>>               net/core/devlink.o:(devlink_health_buffers_create) in archive built-in.a

This happens when using a regular division symbol with a u64 dividend.
Use DIV_ROUND_UP_ULL, which wraps do_div, to avoid this situation.

Fixes: cb5ccfbe73 ("devlink: Add health buffer support")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 21:00:37 -08:00
Lubomir Rintel
7c62b8dd5c net/ipv6: lower the level of "link is not ready" messages
This message gets logged far too often for how interesting is it.

Most distributions nowadays configure NetworkManager to use randomly
generated MAC addresses for Wi-Fi network scans. The interfaces end up
being periodically brought down for the address change. When they're
subsequently brought back up, the message is logged, eventually flooding
the log.

Perhaps the message is not all that helpful: it seems to be more
interesting to hear when the addrconf actually start, not when it does
not. Let's lower its level.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-By: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 20:42:39 -08:00
YueHaibing
ed175d9c6f devlink: Add missing check of nlmsg_put
nlmsg_put may fail, this fix add a check of its return value.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 20:08:23 -08:00
Jakub Kicinski
1518039f6b net/ipv6: don't return positive numbers when nothing was dumped
in6_dump_addrs() returns a positive 1 if there was nothing to dump.
This return value can not be passed as return from inet6_dump_addr()
as is, because it will confuse rtnetlink, resulting in NLMSG_DONE
never getting set:

$ ip addr list dev lo
EOF on netlink
Dump terminated

v2: flip condition to avoid a new goto (DaveA)

Fixes: 7c1e8a3817 ("netlink: fixup regression in RTM_GETADDR")
Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:24:18 -08:00
Linus Lüssing
4b3087c7e3 bridge: Snoop Multicast Router Advertisements
When multiple multicast routers are present in a broadcast domain then
only one of them will be detectable via IGMP/MLD query snooping. The
multicast router with the lowest IP address will become the selected and
active querier while all other multicast routers will then refrain from
sending queries.

To detect such rather silent multicast routers, too, RFC4286
("Multicast Router Discovery") provides a standardized protocol to
detect multicast routers for multicast snooping switches.

This patch implements the necessary MRD Advertisement message parsing
and after successful processing adds such routers to the internal
multicast router list.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:09 -08:00
Linus Lüssing
4effd28c12 bridge: join all-snoopers multicast address
Next to snooping IGMP/MLD queries RFC4541, section 2.1.1.a) recommends
to snoop multicast router advertisements to detect multicast routers.

Multicast router advertisements are sent to an "all-snoopers"
multicast address. To be able to receive them reliably, we need to
join this group.

Otherwise other snooping switches might refrain from forwarding these
advertisements to us.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:08 -08:00
Linus Lüssing
a2e2ca3beb bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() internals
With this patch the internal use of the skb_trimmed is reduced to
the ICMPv6/IGMP checksum verification. And for the length checks
the newly introduced helper functions are used instead of calculating
and checking with skb->len directly.

These changes should hopefully make it easier to verify that length
checks are performed properly.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:08 -08:00
Linus Lüssing
ba5ea61462 bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and
their callers (more precisely, the Linux bridge) to not rely on
the skb_trimmed parameter anymore.

An skb with its tail trimmed to the IP packet length was initially
introduced for the following three reasons:

1) To be able to verify the ICMPv6 checksum.
2) To be able to distinguish the version of an IGMP or MLD query.
   They are distinguishable only by their size.
3) To avoid parsing data for an IGMPv3 or MLDv2 report that is
   beyond the IP packet but still within the skb.

The first case still uses a cloned and potentially trimmed skb to
verfiy. However, there is no need to propagate it to the caller.
For the second and third case explicit IP packet length checks were
added.

This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier
to read and verfiy, as well as easier to use.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:08 -08:00
David S. Miller
0da2b1832c Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-01-22

This series contains updates to i40e and xsk.

Jan exports xdp_get_umem_from_qid() for other drivers/modules to use.
Refactored the code use the netdev provided umems, instead of containing
them inside our i40e_vsi.

Aleksandr fixes an issue where RSS queues were misconfigured, so limit
the RSS queue number to the online CPU number.

Damian adds support for ethtool's setting and getting the FEC
configuration.

Grzegorz fixes a type mismatch, where the return value was not matching
the function declaration.

Sergey adds checks in the queue configuration handler to ensure the
number of queue pairs requested by the VF is less than maximum possible.

Lihong cleans up code left around from earlier silicon validation in the
i40e debugfs code.

Julia Lawall and Colin Ian King clean up white space indentation issues
found.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 14:32:58 -08:00
Florian Westphal
e2f7cc72cb netfilter: conntrack: fix bogus port values for other l4 protocols
We must only extract l4 proto information if we can track the layer 4
protocol.

Before removal of pkt_to_tuple callback, the code to extract port
information was only reached for TCP/UDP/LITE/DCCP/SCTP.

The other protocols were handled by the indirect call, and the
'generic' tracker took care of other protocols that have no notion
of 'ports'.

After removal of the callback we must be more strict here and only
init port numbers for those protocols that have ports.

Fixes: df5e162908 ("netfilter: conntrack: remove pkt_to_tuple callback")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-22 23:28:40 +01:00
Florian Westphal
81e01647fd netfilter: conntrack: fix IPV6=n builds
Stephen Rothwell reports:
 After merging the netfilter-next tree, today's linux-next build
 (powerpc ppc64_defconfig) failed like this:

 ERROR: "nf_conntrack_invert_icmpv6_tuple" [nf_conntrack.ko] undefined!
 ERROR: "nf_conntrack_icmpv6_packet" [nf_conntrack.ko] undefined!
 ERROR: "nf_conntrack_icmpv6_init_net" [nf_conntrack.ko] undefined!
 ERROR: "icmpv6_pkt_to_tuple" [nf_conntrack.ko] undefined!
 ERROR: "nf_ct_gre_keymap_destroy" [nf_conntrack.ko] undefined!

icmpv6 related errors are due to lack of IS_ENABLED(CONFIG_IPV6) (no
icmpv6 support is builtin if kernel has CONFIG_IPV6=n), the
nf_ct_gre_keymap_destroy error is due to lack of PROTO_GRE check.

Fixes: a47c540481 ("netfilter: conntrack: handle builtin l4proto packet functions via direct calls")
Fixes: e2e48b4716 ("netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls")
Fixes: 197c4300ae ("netfilter: conntrack: remove invert_tuple callback")
Fixes: 2a389de86e ("netfilter: conntrack: remove l4proto init and get_net callbacks")
Fixes: e56894356f ("netfilter: conntrack: remove l4proto destroy hook")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-22 23:28:24 +01:00
Lorenzo Bianconi
cb73ee40b1 net: ip_gre: use erspan key field for tunnel lookup
Use ERSPAN key header field as tunnel key in gre_parse_header routine
since ERSPAN protocol sets the key field of the external GRE header to
0 resulting in a tunnel lookup fail in ip6gre_err.
In addition remove key field parsing and pskb_may_pull check in
erspan_rcv and ip6erspan_rcv

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 11:52:17 -08:00
Atul Gupta
76f7164d02 net/tls: free ctx in sock destruct
free tls context in sock destruct. close may not be the last
call to free sock but force releasing the ctx in close
will result in GPF when ctx referred again in tcp_done

[  515.330477] general protection fault: 0000 [#1] SMP PTI
[  515.330539] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc7+ #10
[  515.330657] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0b
11/07/2013
[  515.330844] RIP: 0010:tls_hw_unhash+0xbf/0xd0
[
[  515.332220] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.332340] CR2: 00007fab32c55000 CR3: 000000009261e000 CR4:
00000000000006e0
[  515.332519] Call Trace:
[  515.332632]  <IRQ>
[  515.332793]  tcp_set_state+0x5a/0x190
[  515.332907]  ? tcp_update_metrics+0xe3/0x350
[  515.333023]  tcp_done+0x31/0xd0
[  515.333130]  tcp_rcv_state_process+0xc27/0x111a
[  515.333242]  ? __lock_is_held+0x4f/0x90
[  515.333350]  ? tcp_v4_do_rcv+0xaf/0x1e0
[  515.333456]  tcp_v4_do_rcv+0xaf/0x1e0

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 11:30:54 -08:00
Atul Gupta
63a6b3fee4 net/tls: build_protos moved to common routine
build protos is required for tls_hw_prot also hence moved to
'tls_build_proto' and called as required from tls_init
and tls_hw_proto. This is required since build_protos
for v4 is moved from tls_register to tls_init in
commit <28cb6f1eaffdc5a6a9707cac55f4a43aa3fd7895>

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 11:30:54 -08:00
Cong Wang
856c395cfa net: introduce a knob to control whether to inherit devconf config
There have been many people complaining about the inconsistent
behaviors of IPv4 and IPv6 devconf when creating new network
namespaces.  Currently, for IPv4, we inherit all current settings
from init_net, but for IPv6 we reset all setting to default.

This patch introduces a new /proc file
/proc/sys/net/core/devconf_inherit_init_net to control the
behavior of whether to inhert sysctl current settings from init_net.
This file itself is only available in init_net.

As demonstrated below:

Initial setup in init_net:
 # cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # cat /proc/sys/net/ipv6/conf/all/accept_dad
 1

Default value 0 (current behavior):
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 0

Set to 1 (inherit from init_net):
 # echo 1 > /proc/sys/net/core/devconf_inherit_init_net
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 1

Set to 2 (reset to default):
 # echo 2 > /proc/sys/net/core/devconf_inherit_init_net
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 0
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 0

Set to a value out of range (invalid):
 # echo 3 > /proc/sys/net/core/devconf_inherit_init_net
 -bash: echo: write error: Invalid argument
 # echo -1 > /proc/sys/net/core/devconf_inherit_init_net
 -bash: echo: write error: Invalid argument

Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 11:07:21 -08:00
Jan Sokolowski
5f4f3b2d19 xsk: export xdp_get_umem_from_qid
Export xdp_get_umem_from_qid for other modules to use.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-01-22 06:57:04 -08:00
Oliver Hartkopp
93171ba6f1 can: bcm: check timer values before ktime conversion
Kyungtae Kim detected a potential integer overflow in bcm_[rx|tx]_setup()
when the conversion into ktime multiplies the given value with NSEC_PER_USEC
(1000).

Reference: https://marc.info/?l=linux-can&m=154732118819828&w=2

Add a check for the given tv_usec, so that the value stays below one second.
Additionally limit the tv_sec value to a reasonable value for CAN related
use-cases of 400 days and ensure all values to be positive.

Reported-by: Kyungtae Kim <kt0755@gmail.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org> # >= 2.6.26
Tested-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Andre Naujoks <nautsch2@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-01-22 11:33:46 +01:00
Matthias Kaehlcke
c4f5627f7e Bluetooth: Fix locking in bt_accept_enqueue() for BH context
With commit e163376220 ("Bluetooth: Handle bt_accept_enqueue() socket
atomically") lock_sock[_nested]() is used to acquire the socket lock
before manipulating the socket. lock_sock[_nested]() may block, which
is problematic since bt_accept_enqueue() can be called in bottom half
context (e.g. from rfcomm_connect_ind()):

[<ffffff80080d81ec>] __might_sleep+0x4c/0x80
[<ffffff800876c7b0>] lock_sock_nested+0x24/0x58
[<ffffff8000d7c27c>] bt_accept_enqueue+0x48/0xd4 [bluetooth]
[<ffffff8000e67d8c>] rfcomm_connect_ind+0x190/0x218 [rfcomm]

Add a parameter to bt_accept_enqueue() to indicate whether the
function is called from BH context, and acquire the socket lock
with bh_lock_sock_nested() if that's the case.

Also adapt all callers of bt_accept_enqueue() to pass the new
parameter:

- l2cap_sock_new_connection_cb()
  - uses lock_sock() to lock the parent socket => process context

- rfcomm_connect_ind()
  - acquires the parent socket lock with bh_lock_sock() => BH
    context

- __sco_chan_add()
  - called from sco_chan_add(), which is called from sco_connect().
    parent is NULL, hence bt_accept_enqueue() isn't called in this
    code path and we can ignore it
  - also called from sco_conn_ready(). uses bh_lock_sock() to acquire
    the parent lock => BH context

Fixes: e163376220 ("Bluetooth: Handle bt_accept_enqueue() socket atomically")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2019-01-22 09:51:20 +01:00
YueHaibing
5e053534be 6lowpan: fix debugfs_simple_attr.cocci warnings
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-01-22 09:51:19 +01:00
YueHaibing
e250fab655 Bluetooth: 6lowpan: Fix debugfs_simple_attr.cocci warnings
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-01-22 09:51:19 +01:00
David S. Miller
fa7f3a8d56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Completely minor snmp doc conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21 14:41:32 -08:00
Ilya Dryomov
4aac9228d1 libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
con_fault() can transition the connection into STANDBY right after
ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

ceph_con_keepalive()
  mutex_lock(&con->mutex)
  clear_standby(con)
  mutex_unlock(&con->mutex)
                                mutex_lock(&con->mutex)
                                con_fault()
                                  ...
                                  if KEEPALIVE_PENDING isn't set
                                    set state to STANDBY
                                  ...
                                mutex_unlock(&con->mutex)
  set KEEPALIVE_PENDING
  set WRITE_PENDING

This triggers warnings in clear_standby() when either ceph_con_send()
or ceph_con_keepalive() get to clearing STANDBY next time.

I don't see a reason to condition queue_con() call on the previous
value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
could have been a non-atomic flag.

Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Myungho Jung <mhjungk@gmail.com>
2019-01-21 14:53:12 +01:00
Linus Torvalds
7d0ae236ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix endless loop in nf_tables, from Phil Sutter.

 2) Fix cross namespace ip6_gre tunnel hash list corruption, from
    Olivier Matz.

 3) Don't be too strict in phy_start_aneg() otherwise we might not allow
    restarting auto negotiation. From Heiner Kallweit.

 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.

 5) Memory leak in act_tunnel_key, from Davide Caratti.

 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.

 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.

 8) Missing udplite rehash callbacks, from Alexey Kodanev.

 9) Log dirty pages properly in vhost, from Jason Wang.

10) Use consume_skb() in neigh_probe() as this is a normal free not a
    drop, from Yang Wei. Likewise in macvlan_process_broadcast().

11) Missing device_del() in mdiobus_register() error paths, from Thomas
    Petazzoni.

12) Fix checksum handling of short packets in mlx5, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
  bpf: in __bpf_redirect_no_mac pull mac only if present
  virtio_net: bulk free tx skbs
  net: phy: phy driver features are mandatory
  isdn: avm: Fix string plus integer warning from Clang
  net/mlx5e: Fix cb_ident duplicate in indirect block register
  net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
  net/mlx5e: Fix wrong error code return on FEC query failure
  net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
  tools: bpftool: Cleanup license mess
  bpf: fix inner map masking to prevent oob under speculation
  bpf: pull in pkt_sched.h header for tooling to fix bpftool build
  selftests: forwarding: Add a test case for externally learned FDB entries
  selftests: mlxsw: Test FDB offload indication
  mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
  net: bridge: Mark FDB entries that were added by user as such
  mlxsw: spectrum_fid: Update dummy FID index
  mlxsw: pci: Return error on PCI reset timeout
  mlxsw: pci: Increase PCI SW reset timeout
  mlxsw: pci: Ring CQ's doorbell before RDQ's
  MAINTAINERS: update email addresses of liquidio driver maintainers
  ...
2019-01-21 12:52:31 +13:00
David S. Miller
6436408e81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-01-20

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a out-of-bounds access in __bpf_redirect_no_mac, from Willem.

2) Fix bpf_setsockopt to reset sock dst on SO_MARK changes, from Peter.

3) Fix map in map masking to prevent out-of-bounds access under
   speculative execution, from Daniel.

4) Fix bpf_setsockopt's SO_MAX_PACING_RATE to support TCP internal
   pacing, from Yuchung.

5) Fix json writer license in bpftool, from Thomas.

6) Fix AF_XDP to check if an actually queue exists during umem
   setup, from Krzysztof.

7) Several fixes to BPF stackmap's build id handling. Another fix
   for bpftool build to account for libbfd variations wrt linking
   requirements, from Stanislav.

8) Fix BPF samples build with clang by working around missing asm
   goto, from Yonghong.

9) Fix libbpf to retry program load on signal interrupt, from Lorenz.

10) Various minor compile warning fixes in BPF code, from Mathieu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 16:38:12 -08:00
Willem de Bruijn
e7c87bd6cc bpf: in __bpf_redirect_no_mac pull mac only if present
Syzkaller was able to construct a packet of negative length by
redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT:

    BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline]
    BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
    BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
    Read of size 4294967282 at addr ffff8801d798009c by task syz-executor2/12942

    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    check_memory_region_inline mm/kasan/kasan.c:260 [inline]
    check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
    memcpy+0x23/0x50 mm/kasan/kasan.c:302
    memcpy include/linux/string.h:345 [inline]
    skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
    __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
    __pskb_copy include/linux/skbuff.h:1053 [inline]
    pskb_copy include/linux/skbuff.h:2904 [inline]
    skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539
    ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline]
    sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029
    __netdev_start_xmit include/linux/netdevice.h:4325 [inline]
    netdev_start_xmit include/linux/netdevice.h:4334 [inline]
    xmit_one net/core/dev.c:3219 [inline]
    dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235
    __dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
    __bpf_tx_skb net/core/filter.c:2016 [inline]
    __bpf_redirect_common net/core/filter.c:2054 [inline]
    __bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061
    ____bpf_clone_redirect net/core/filter.c:2094 [inline]
    bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066
    bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000

The generated test constructs a packet with mac header, network
header, skb->data pointing to network header and skb->len 0.

Redirecting to a sit0 through __bpf_redirect_no_mac pulls the
mac length, even though skb->data already is at skb->network_header.
bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2.

Update the offset calculation to pull only if skb->data differs
from skb->network_header, which is not true in this case.

The test itself can be run only from commit 1cf1cae963 ("bpf:
introduce BPF_PROG_TEST_RUN command"), but the same type of packets
with skb at network header could already be built from lwt xmit hooks,
so this fix is more relevant to that commit.

Also set the mac header on redirect from LWT_XMIT, as even after this
change to __bpf_redirect_no_mac that field is expected to be set, but
is not yet in ip_finish_output2.

Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-20 01:11:48 +01:00
Cong Wang
5954894ba3 net_sched: add performance counters for basic filter
Similar to u32 filter, it is useful to know how many times
we reach each basic filter and how many times we pass the
ematch attached to it.

Sample output:

filter protocol arp pref 49152 basic chain 0
filter protocol arp pref 49152 basic chain 0 handle 0x1  (rule hit 3 success 3)
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1 installed 81 sec used 4 sec
	Action statistics:
	Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 16:05:42 -08:00
Linus Torvalds
b0efca46b5 Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
 "These are mostly fixes for SUNRPC bugs, with a single v4.2
  copy_file_range() fix mixed in.

  Stable bugfixes:
   - Fix TCP receive code on archs with flush_dcache_page()

  Other bugfixes:
   - Fix error code in rpcrdma_buffer_create()
   - Fix a double free in rpcrdma_send_ctxs_create()
   - Fix kernel BUG at kernel/cred.c:825
   - Fix unnecessary retry in nfs42_proc_copy_file_range()
   - Ensure rq_bytes_sent is reset before request transmission
   - Ensure we respect the RPCSEC_GSS sequence number limit
   - Address Kerberos performance/behavior regression"

* tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Address Kerberos performance/behavior regression
  SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit
  SUNRPC: Ensure rq_bytes_sent is reset before request transmission
  NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
  sunrpc: kernel BUG at kernel/cred.c:825!
  SUNRPC: Fix TCP receive code on archs with flush_dcache_page()
  xprtrdma: Double free in rpcrdma_sendctxs_create()
  xprtrdma: Fix error code in rpcrdma_buffer_create()
2019-01-20 09:27:38 +12:00
Yafang Shao
0726f558d8 net: sock: do not set sk_cookie in sk_clone_lock()
The only call site of sk_clone_lock is in inet_csk_clone_lock,
and sk_cookie will be set there.
So we don't need to set sk_cookie in sk_clone_lock().

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 10:34:59 -08:00