Commit Graph

57846 Commits

Author SHA1 Message Date
Matteo Croce
e84fb4b366 netfilter: conntrack: use shared sysctl constants
Use shared sysctl variables for zero and one constants, as in commit
eec4844fae ("proc/sysctl: add shared variables for range check")

Fixes: 8f14c99c7e ("netfilter: conntrack: limit sysctl setting for boolean options")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-03 18:39:08 +02:00
Fernando Fernandez Mancera
8c0bb78738 netfilter: synproxy: rename mss synproxy_options field
After introduce "mss_encode" field in the synproxy_options struct the field
"mss" is a little confusing. It has been renamed to "mss_option".

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-03 18:39:08 +02:00
Dexuan Cui
685703b497 hv_sock: Fix hang when a connection is closed
There is a race condition for an established connection that is being closed
by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
'remove_sock' is false):

1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.

After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
decrease the refcnt to 3.

Concurrently, hvs_close_connection() runs in another thread:
  calls vsock_remove_sock() to decrease the refcnt by 2;
  call sock_put() to decrease the refcnt to 0, and free the sk;
  next, the "release_sock(sk)" may hang due to use-after-free.

In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.

The issue can be resolved if an extra reference is taken when the
connection is established.

Fixes: a9eeb998c2 ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-02 17:26:27 -07:00
Jon Maloy
7c5b420559 tipc: reduce risk of wakeup queue starvation
In commit 365ad353c2 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 18:19:28 -04:00
Taras Kondratiuk
4da5f0018e tipc: compat: allow tipc commands without arguments
Commit 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
broke older tipc tools that use compat interface (e.g. tipc-config from
tipcutils package):

% tipc-config -p
operation not supported

The commit started to reject TIPC netlink compat messages that do not
have attributes. It is too restrictive because some of such messages are
valid (they don't need any arguments):

% grep 'tx none' include/uapi/linux/tipc_config.h
#define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
#define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
#define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
#define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
#define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
#define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
#define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
#define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */

This patch relaxes the original fix and rejects messages without
arguments only if such arguments are expected by a command (reg_type is
non zero).

Fixes: 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
Cc: stable@vger.kernel.org
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 18:14:00 -04:00
Thomas Gleixner
9dd8813ed9 hrtimer/treewide: Use hrtimer_sleeper_start_expires()
hrtimer_sleepers will gain a scheduling class dependent treatment on
PREEMPT_RT. Use the new hrtimer_sleeper_start_expires() function to make
that possible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-08-01 17:43:16 +02:00
Sebastian Andrzej Siewior
dbc1625fc9 hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls
hrtimer_init_sleeper() calls require prior initialisation of the hrtimer
object which is embedded into the hrtimer_sleeper.

Combine the initialization and spare a function call. Fixup all call sites.

This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper
specific initializations of the embedded hrtimer without modifying any of
the call sites.

No functional change.

[ anna-maria: Minor cleanups ]
[ tglx: Adopted to the removal of the task argument of
  	hrtimer_init_sleeper() and trivial polishing.
	Folded a fix from Stephen Rothwell for the vsoc code ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.887468908@linutronix.de
2019-08-01 17:43:15 +02:00
Nikolay Aleksandrov
3247b27204 net: bridge: mcast: add delete due to fast-leave mdb flag
In user-space there's no way to distinguish why an mdb entry was deleted
and that is a problem for daemons which would like to keep the mdb in
sync with remote ends (e.g. mlag) but would also like to converge faster.
In almost all cases we'd like to age-out the remote entry for performance
and convergence reasons except when fast-leave is enabled. In that case we
want explicit immediate remote delete, thus add mdb flag which is set only
when the entry is being deleted due to fast-leave.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:13:40 -04:00
Nikolay Aleksandrov
5c725b6b65 net: bridge: mcast: don't delete permanent entries when fast leave is enabled
When permanent entries were introduced by the commit below, they were
exempt from timing out and thus igmp leave wouldn't affect them unless
fast leave was enabled on the port which was added before permanent
entries existed. It shouldn't matter if fast leave is enabled or not
if the user added a permanent entry it shouldn't be deleted on igmp
leave.

Before:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent

< join and leave 229.1.1.1 on eth4 >

$ bridge mdb show
$

After:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent

< join and leave 229.1.1.1 on eth4 >

$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent

Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:03:01 -04:00
David S. Miller
ac5fe22636 Merge tag 'mac80211-next-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
We have a reasonably large number of changes:
 * lots more HE (802.11ax) support, particularly things
   relevant for the the AP side, but also mesh support
 * debugfs cleanups from Greg
 * some more work on extended key ID
 * start using genl parallel_ops, as preparation for
   weaning ourselves off RTNL and getting parallelism
 * various other changes all over
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:59:41 -07:00
David S. Miller
f86a677e57 Merge tag 'mac80211-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
Just a few fixes:
 * revert NETIF_F_LLTX usage as it caused problems
 * avoid warning on WMM parameters from AP that are too short
 * fix possible null-ptr dereference in hwsim
 * fix interface combinations with 4-addr and crypto control
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:51:34 -07:00
David S. Miller
fa9586aff9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) memleak in ebtables from the error path for the 32/64 compat layer,
   from Florian Westphal.

2) Fix inverted meta ifname/ifidx matching when no interface is set
   on either from the input/output path, from Phil Sutter.

3) Remove goto label in nft_meta_bridge, also from Phil.

4) Missing include guard in xt_connlabel, from Masahiro Yamada.

5) Two patch to fix ipset destination MAC matching coming from
   Stephano Brivio, via Jozsef Kadlecsik.

6) Fix set rename and listing concurrency problem, from Shijie Luo.
   Patch also coming via Jozsef Kadlecsik.

7) ebtables 32/64 compat missing base chain policy in rule count,
   from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:49:09 -07:00
Shay Bar
f39b07fdfb mac80211: HE STA disassoc due to QOS NULL not sent
In case of HE AP-STA link, ieee80211_send_nullfunc() will not
send the QOS NULL packet to check if AP is still associated.

In this case, probe_send_count will be non-zero and
ieee80211_sta_work() will later disassociate the AP, even
though no packet was ever sent.

Fix this by decrementing probe_send_count and not calling
ieee80211_send_nullfunc() in case of HE link, so that we
still wait for some time for the AP beacon to reappear and
don't disconnect right away.

Signed-off-by: Shay Bar <shay.bar@celeno.com>
Link: https://lore.kernel.org/r/20190703131848.22879-1-shay.bar@celeno.com
[clarify commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 13:26:41 +02:00
John Crispin
1ced169cc1 mac80211: allow setting spatial reuse parameters from bss_conf
Store the OBSS PD parameters inside bss_conf when bringing up an AP and/or
when a station connects to an AP. This allows the driver to configure the
HW accordingly.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-3-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Johannes Berg
6d4dd4ef1a nl80211: add strict start type
Add a strict start type so all new attributes starting from
NL80211_ATTR_HE_OBSS_PD are validated strictly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
John Crispin
796e90f42b cfg80211: add support for parsing OBBS_PD attributes
Add the data structure, policy and parsing code allowing userland to send
the OBSS PD information into the kernel.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Karthikeyan Periyasamy
52dba8d7d5 mac80211: reject zero MAC address in add station
This came up in fuzz testing, and really we don't consider
all-zeroes to be a valid MAC address in most places, so
also reject it here to avoid confusion later on.

Signed-off-by: Karthikeyan Periyasamy <periyasa@codeaurora.org>
Link: https://lore.kernel.org/r/1563959770-21570-1-git-send-email-periyasa@codeaurora.org
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Johannes Berg
50508d941c cfg80211: use parallel_ops for genl
Over time, we really need to get rid of all of our global locking.
One of the things needed is to use parallel_ops. This isn't really
the most important (RTNL is much more important) but OTOH we just
keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to
disallow this.

Reviewed-By: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190729143109.18683-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:46 +02:00
Colin Ian King
f12cac539f mac80211: add missing null return check from call to ieee80211_get_sband
The return from ieee80211_get_sband can potentially be a null pointer, so
it seems prudent to add a null check to avoid a null pointer dereference
on sband.

Addresses-Coverity: ("Dereference null return")
Fixes: 2ab4587675 ("mac80211: add support for the ADDBA extension element")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20190730143205.14261-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 10:51:17 +02:00
Petar Penkov
70d6624431 bpf: add bpf_tcp_gen_syncookie helper
This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-30 21:03:05 -07:00
Petar Penkov
9349d600fb tcp: add skb-less helpers to retrieve SYN cookie
This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-30 21:03:05 -07:00
Petar Penkov
965112785e tcp: tcp_syn_flood_action read port from socket
This allows us to call this function before an SKB has been
allocated.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-30 21:03:05 -07:00
Tristram Ha
016e43a26b net: dsa: ksz: Add KSZ8795 tag code
Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler
and the tag is only 1 byte, instead of 2 as is the case with KSZ9477.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:12:50 -07:00
Stefano Garzarella
0038ff357f vsock/virtio: change the maximum packet size allowed
Since now we are able to split packets, we can avoid limiting
their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
6dbd3e66e7 vhost/vsock: split packets to send using multiple buffers
If the packets to sent to the guest are bigger than the buffer
available, we can split them, using multiple buffers and fixing
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
9632e9f61b vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also if we are in the TX path.

Move also buf_alloc under the same lock.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
b89d882dc9 vsock/virtio: reduce credit update messages
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
473c7391ce vsock/virtio: limit the memory used per-socket
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.

This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Thomas Gleixner
b744948725 hrtimer: Remove task argument from hrtimer_init_sleeper()
All callers hand in 'current' and that's the only task pointer which
actually makes sense. Remove the task argument and set current in the
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.791885290@linutronix.de
2019-07-30 23:57:51 +02:00
Arnd Bergmann
055d88242a compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
linux-2.5.69 along with hundreds of other commands, but was always broken
sincen only the structure is compatible, but the command number is not,
due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
sockaddr_pppox)), which is different on 64-bit architectures.

Guillaume Nault adds:

  And the implementation was broken until 2016 (see 29e73269aa ("pppoe:
  fix reference counting in PPPoE proxy")), and nobody ever noticed. I
  should probably have removed this ioctl entirely instead of fixing it.
  Clearly, it has never been used.

Fix it by adding a compat_ioctl handler for all pppoe variants that
translates the command number and then calls the regular ioctl function.

All other ioctl commands handled by pppoe are compatible between 32-bit
and 64-bit, and require compat_ptr() conversion.

This should apply to all stable kernels.

Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:42:13 -07:00
Jon Maloy
2948a1fcd7 tipc: fix unitilized skb list crash
Our test suite somtimes provokes the following crash:

Description of problem:
[ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
[ 1092.605072] PGD 0 P4D 0
[ 1092.607620] Oops: 0000 [#1] SMP PTI
[ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1
[ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018
[ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
[ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
[ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
[ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045
[ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000
[ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a
[ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000
[ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a
[ 1092.692614] FS:  0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000
[ 1092.700702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0
[ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1092.727843] PKRU: 55555554
[ 1092.730556] Call Trace:
[ 1092.733010]  <IRQ>
[ 1092.735034]  tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
[ 1092.739828]  ? __kmalloc_node_track_caller+0x1cb/0x290
[ 1092.744974]  ? dev_hard_start_xmit+0xa5/0x210
[ 1092.749332]  tipc_sk_rcv+0x389/0x640 [tipc]
[ 1092.753519]  tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
[ 1092.758224]  tipc_rcv+0x57a/0xf20 [tipc]
[ 1092.762154]  ? ktime_get_real_ts64+0x40/0xe0
[ 1092.766432]  ? tpacket_rcv+0x50/0x9f0
[ 1092.770098]  tipc_l2_rcv_msg+0x4a/0x70 [tipc]
[ 1092.774452]  __netif_receive_skb_core+0xb62/0xbd0
[ 1092.779164]  ? enqueue_entity+0xf6/0x630
[ 1092.783084]  ? kmem_cache_alloc+0x158/0x1c0
[ 1092.787272]  ? __build_skb+0x25/0xd0
[ 1092.790849]  netif_receive_skb_internal+0x42/0xf0
[ 1092.795557]  napi_gro_receive+0xba/0xe0
[ 1092.799417]  mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
[ 1092.804564]  mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
[ 1092.809536]  mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
[ 1092.814415]  ? __wake_up_common_lock+0x89/0xc0
[ 1092.818861]  net_rx_action+0x149/0x3b0
[ 1092.822616]  __do_softirq+0xe3/0x30a
[ 1092.826193]  irq_exit+0x100/0x110
[ 1092.829512]  do_IRQ+0x85/0xd0
[ 1092.832483]  common_interrupt+0xf/0xf
[ 1092.836147]  </IRQ>
[ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
[ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
[ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f
[ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000
[ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940
[ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78
[ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000
[ 1092.905196]  ? cpuidle_enter_state+0x92/0x2a0
[ 1092.909555]  do_idle+0x236/0x280
[ 1092.912785]  cpu_startup_entry+0x6f/0x80
[ 1092.916715]  start_secondary+0x1a7/0x200
[ 1092.920642]  secondary_startup_64+0xb7/0xc0
[...]

The reason is that the skb list tipc_socket::mc_method.deferredq only
is initialized for connectionless sockets, while nothing stops arriving
multicast messages from being filtered by connection oriented sockets,
with subsequent access to the said list.

We fix this by initializing the list unconditionally at socket creation.
This eliminates the crash, while the message still is dropped further
down in tipc_sk_filter_rcv() as it should be.

Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:39:36 -07:00
Jonathan Lemon
b54c9d5bd6 net: Use skb_frag_off accessors
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:21:32 -07:00
Xin Long
a64e59c72c sctp: factor out sctp_connect_add_peer
In this function factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), it adds a peer with the other addr into the
asoc after this asoc is created with the 1st addr.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
f26f995122 sctp: factor out sctp_connect_new_asoc
In this function factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), it creates the asoc and adds a peer with the
1st addr.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
dd8378b3af sctp: clean up __sctp_connect
__sctp_connect is doing quit similar things as sctp_sendmsg_new_asoc.
To factor out common functions, this patch is to clean up their code
to make them look more similar:

  1. create the asoc and add a peer with the 1st addr.
  2. add peers with the other addrs into this asoc one by one.

while at it, also remove the unused 'addrcnt'.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
f40f1177c3 sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx
Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
sctp_inet_connect(), the latter has done addr_size check with size
of sa_family_t.

In the next patch to clean up __sctp_connect(), we will remove
addr_size check with size of sa_family_t from __sctp_connect()
for the 1st address.

So before doing that, __sctp_setsockopt_connectx() should do
this check first, as sctp_inet_connect() does.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
4c31bc6b1e sctp: only copy the available addr data in sctp_transport_init
'addr' passed to sctp_transport_init is not always a whole size
of union sctp_addr, like the path:

  sctp_sendmsg() ->
  sctp_sendmsg_new_asoc() ->
  sctp_assoc_add_peer() ->
  sctp_transport_new() -> sctp_transport_init()

In the next patches, we will also pass the address length of data
only to sctp_assoc_add_peer().

So sctp_transport_init() should copy the only available data from
addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may
cause slab-out-of-bounds.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
David Howells
1db88c5343 rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto
rxkad sometimes triggers a warning about oversized stack frames when
building with clang for a 32-bit architecture:

net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=]
net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=]

The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in
rxkad_verify_packet()/rxkad_secure_packet() with the relatively large
scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt().

The warning does not show up when using gcc, which does not inline the
functions as aggressively, but the problem is still the same.

Allocate the cipher buffers from the slab instead, caching the allocated
packet crypto request memory used for DATA packet crypto in the rxrpc_call
struct.

Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 10:32:35 -07:00
Dave Wysochanski
64a38e840c SUNRPC: Track writers of the 'channel' file to improve cache_listeners_exist
The sunrpc cache interface is susceptible to being fooled by a rogue
process just reading a 'channel' file.  If this happens the kernel
may think a valid daemon exists to service the cache when it does not.
For example, the following may fool the kernel:
cat /proc/net/rpc/auth.unix.gid/channel

Change the tracking of readers to writers when considering whether a
listener exists as all valid daemon processes either open a channel
file O_RDWR or O_WRONLY.  While this does not prevent a rogue process
from "stealing" a message from the kernel, it does at least improve
the kernels perception of whether a valid process servicing the cache
exists.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-30 11:55:53 -04:00
David Howells
c69565ee66 rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet
Fix the fact that a notification isn't sent to the recvmsg side to indicate
a call failed when sendmsg() fails to transmit a DATA packet with the error
ENETUNREACH, EHOSTUNREACH or ECONNREFUSED.

Without this notification, the afs client just sits there waiting for the
call to complete in some manner (which it's not now going to do), which
also pins the rxrpc call in place.

This can be seen if the client has a scope-level IPv6 address, but not a
global-level IPv6 address, and we try and transmit an operation to a
server's IPv6 address.

Looking in /proc/net/rxrpc/calls shows completed calls just sat there with
an abort code of RX_USER_ABORT and an error code of -ENETUNREACH.

Fixes: c54e43d752 ("rxrpc: Fix missing start of call timeout")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-07-30 15:27:59 +01:00
David Howells
60034d3d14 rxrpc: Fix potential deadlock
There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby
rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces
the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which
the tries to take the already held lock.

Fix this by providing a version of rxrpc_put_peer() that can be called in
situations where the lock is already held.

The bug may produce the following lockdep report:

============================================
WARNING: possible recursive locking detected
5.2.0-next-20190718 #41 Not tainted
--------------------------------------------
kworker/0:3/21678 is trying to acquire lock:
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
/./include/linux/spinlock.h:343 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
__rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435

but task is already holding lock:
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
/./include/linux/spinlock.h:343 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430

Fixes: 330bdcfadc ("rxrpc: Fix the keepalive generator [ver #2]")
Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-07-30 14:42:50 +01:00
Johannes Berg
eef347f846 Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues"
Revert this for now, it has been reported multiple times that it
completely breaks connectivity on various devices.

Cc: stable@vger.kernel.org
Fixes: 8dbb000ee7 ("mac80211: set NETIF_F_LLTX when using intermediate tx queues")
Reported-by: Jean Delvare <jdelvare@suse.de>
Reported-by: Peter Lebbing <peter@digitalbrains.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30 14:52:50 +02:00
Pablo Neira Ayuso
7cdc441228 Merge branch 'master' of git://blackhole.kfki.hu/nf
Jozsef Kadlecsik says:

====================
ipset patches for the nf tree

- When the support of destination MAC addresses for hash:mac sets was
  introduced, it was forgotten to add the same functionality to hash:ip,mac
  types of sets. The patch from Stefano Brivio adds the missing part.
- When the support of destination MAC addresses for hash:mac sets was
  introduced, a copy&paste error was made in the code of the hash:ip,mac
  and bitmap:ip,mac types: the MAC address in these set types is in
  the second position and not in the first one. Stefano Brivio's patch
  fixes the issue.
- There was still a not properly handled concurrency handling issue
  between renaming and listing sets at the same time, reported by
  Shijie Luo.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-30 13:39:20 +02:00
Florian Westphal
3b48300d5c netfilter: ebtables: also count base chain policies
ebtables doesn't include the base chain policies in the rule count,
so we need to add them manually when we call into the x_tables core
to allocate space for the comapt offset table.

This lead syzbot to trigger:
WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649

Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
Fixes: 2035f3ff8e ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-30 13:37:44 +02:00
Suzuki K Poulose
6cda08a20d drivers: Introduce device lookup variants by name
Add a helper to match the device name for device lookup. Also
reuse this generic exported helper for the existing bus_find_device_by_name().
and add similar variants for driver/class.

Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: linux-leds@vger.kernel.org
Cc: linux-rtc@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wpan@vger.kernel.org
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20190723221838.12024-2-suzuki.poulose@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-30 13:07:41 +02:00
Oliver Hartkopp
473d924d7d can: fix ioctl function removal
Commit 60649d4e0a ("can: remove obsolete empty ioctl() handler") replaced the
almost empty can_ioctl() function with sock_no_ioctl() which always returns
-EOPNOTSUPP.

Even though we don't have any ioctl() functions on socket/network layer we need
to return -ENOIOCTLCMD to be able to forward ioctl commands like SIOCGIFINDEX
to the network driver layer.

This patch fixes the wrong return codes in the CAN network layer protocols.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 60649d4e0a ("can: remove obsolete empty ioctl() handler")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29 14:12:35 -07:00
Enrico Weigelt
d4e575ba9f net: sctp: drop unneeded likely() call around IS_ERR()
IS_ERR() already calls unlikely(), so this extra unlikely() call
around IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29 13:57:58 -07:00
Toke Høiland-Jørgensen
6f9d451ab1 xdp: Add devmap_hash map type for looking up devices by hashed index
A common pattern when using xdp_redirect_map() is to create a device map
where the lookup key is simply ifindex. Because device maps are arrays,
this leaves holes in the map, and the map has to be sized to fit the
largest ifindex, regardless of how many devices actually are actually
needed in the map.

This patch adds a second type of device map where the key is looked up
using a hashmap, instead of being used as an array index. This allows maps
to be densely packed, so they can be smaller.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29 13:50:48 -07:00
Jozsef Kadlecsik
6c1f7e2c1b netfilter: ipset: Fix rename concurrency with listing
Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result

ipset <version>: Broken LIST kernel message: missing DATA part!

error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.

Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-07-29 21:18:07 +02:00
Stefano Brivio
1b4a75108d netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.

This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.

Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.

This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:

  ip netns add A
  ip link add veth1 type veth peer name veth2 netns A
  ip addr add 192.0.2.1/24 dev veth1
  ip -net A addr add 192.0.2.2/24 dev veth2
  ip link set veth1 up
  ip -net A link set veth2 up

  dst=$(ip netns exec A cat /sys/class/net/veth2/address)

  ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
  ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP

  ip netns exec A ipset create test_hash hash:ip,mac
  ip netns exec A ipset add test_hash 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

ipset correctly matches a test packet:

  # ping -c1 192.0.2.2 >/dev/null
  # echo $?
  0

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-07-29 21:17:30 +02:00