Commit Graph

57846 Commits

Author SHA1 Message Date
Yousuk Seung
e8bd8fca67 tcp: add SRTT to SCM_TIMESTAMPING_OPT_STATS
Add TCP_NLA_SRTT to SCM_TIMESTAMPING_OPT_STATS that reports the smoothed
round trip time in microseconds (tcp_sock.srtt_us >> 3).

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 20:34:36 -08:00
Taehee Yoo
b4e955e9f3 netfilter: xt_hashlimit: fix a possible memory leak in htable_create()
In the htable_create(), hinfo is allocated by vmalloc()
So that if error occurred, hinfo should be freed.

Fixes: 11d5f15723 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-17 19:13:58 +01:00
Jakub Kicinski
7211101502 net: sched: gred: allow manipulating per-DP RED flags
Allow users to set and dump RED flags (ECN enabled and harddrop)
on per-virtual queue basis.  Validation of attributes is split
from changes to make sure we won't have to undo previous operations
when we find out configuration is invalid.

The objective is to allow changing per-Qdisc parameters without
overwriting the per-vq configured flags.

Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and
per-Qdisc flags will always get propagated to the virtual queues.

New user space which wants to make use of per-vq flags should set
per-Qdisc flags to 0 and then configure per-vq flags as it
sees fit.  Once per-vq flags are set per-Qdisc flags can't be
changed to non-zero.  Vice versa - if the per-Qdisc flags are
non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted
or set to the same value as per-Qdisc flags.

Update per-Qdisc parameters:
per-Qdisc | per-VQ | result
        0 |      0 | all vq flags updated
	0 |  non-0 | error (vq flags in use)
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | all vq flags updated

Update per-VQ state (flags parameter not specified):
   no change to flags

Update per-VQ state (flags parameter set):
per-Qdisc | per-VQ | result
        0 |   any  | per-vq flags updated
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | error (per-Qdisc flags in use)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski
25fc198907 net: sched: gred: store red flags per virtual queue
Right now ECN marking and HARD drop (the common RED flags) can only
be configured for the entire Qdisc.  In preparation for per-vq flags
store the values in the virtual queue structure.  Setting per-vq
flags will only be allowed when no flags are set for the entire Qdisc.
For the new flags we will also make sure undefined bits are 0.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski
80e22e961d net: sched: gred: provide a better structured dump and expose stats
Currently all GRED's virtual queue data is dumped in a single
array in a single attribute.  This makes it pretty much impossible
to add new fields.  In order to expose more detailed stats add a
new set of attributes.  We can now expose the 64 bit value of bytesin
and all the mark stats which were not part of the original design.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski
9f5cd0c806 net: sched: gred: store bytesin as a 64 bit value
32 bit counters for bytes are not really going to last long in modern
world.  Make sch_gred count bytes on a 64 bit counter.  It will still
get truncated during dump but follow up patch will add set of new
stat dump attributes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski
4777be08b8 net: sched: gred: use extack to provide more details on configuration errors
Add extack messages to -EINVAL errors, to help users identify
their mistakes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski
79c59fe01e net: sched: gred: pass extack to nla_parse_nested()
In case netlink wants to provide parsing error pass extack
to nla_parse_nested().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski
255f4803ec net: sched: gred: separate error and non-error path in gred_change()
We will soon want to add more code to the non-error path, separate
it from the error handling flow.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Paolo Abeni
9c48060141 udp: fix jump label misuse
The commit 60fb9567bf ("udp: implement complete book-keeping for
encap_needed") introduced a severe misuse of jump label APIs, which
syzbot, as reported by Eric, was able to exploit.

When multiple sockets/process can concurrently request (and than
disable) the udp encap, we need to track the activation counter with
*_inc()/*_dec() jump label variants, or we can experience bad things
at disable time.

Fixes: 60fb9567bf ("udp: implement complete book-keeping for encap_needed")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:01:56 -08:00
Jesus Sanchez-Palencia
37342bdaf5 etf: Drop all expired packets
Currently on dequeue() ETF only drops the first expired packet, which
causes a problem if the next packet is already expired. When this
happens, the watchdog will be configured with a time in the past, fire
straight way and the packet will finally be dropped once the dequeue()
function of the qdisc is called again.

We can save quite a few cycles and improve the overall behavior of the
qdisc if we drop all expired packets if the next packet is expired.
This should allow ETF to recover faster from bad situations. But
packet drops are still a very serious warning that the requirements
imposed on the system aren't reasonable.

This was inspired by how the implementation of hrtimers use the
rb_tree inside the kernel.

Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:39:34 -08:00
Jesus Sanchez-Palencia
cbeeb8efec etf: Split timersortedlist_erase()
This is just a refactor that will simplify the implementation of the
next patch in this series which will drop all expired packets on the
dequeue flow.

Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:39:34 -08:00
Jesus Sanchez-Palencia
09fd4860ea etf: Use cached rb_root
ETF's peek() operation is heavily used so use an rb_root_cached instead
and leverage rb_first_cached() which will run in O(1) instead of
O(log n).

Even if on 'timesortedlist_clear()' we could be using rb_erase(), we
choose to use rb_erase_cached(), because if in the future we allow
runtime changes to ETF parameters, and need to do a '_clear()', this
might cause some hard to debug issues.

Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:39:34 -08:00
Jesus Sanchez-Palencia
3fcbdaee3b etf: Cancel timer if there are no pending skbs
There is no point in firing the qdisc watchdog if there are no future
skbs pending in the queue and the watchdog had been set previously.

Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:39:34 -08:00
Yafang Shao
213d7767af tcp: clean up STATE_TRACE
Currently we can use bpf or tcp tracepoint to conveniently trace the tcp
state transition at the run time.
So we don't need to do this stuff at the compile time anymore.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:28:00 -08:00
David S. Miller
5594923f6e Merge tag 'batadv-net-for-davem-20181114' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
Here are two batman-adv bugfixes:

 - Explicitly pad short ELP packets with zeros, by Sven Eckelmann

 - Fix packet size calculation when merging fragments,
   by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:26:30 -08:00
Davide Caratti
19ab69107d net/sched: act_pedit: fix memory leak when IDR allocation fails
tcf_idr_check_alloc() can return a negative value, on allocation failures
(-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.

Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:53:45 -08:00
Jiri Pirko
32764c66fa net: 8021q: move vlan offload registrations into vlan_core
Currently, the vlan packet offloads are registered only upon 8021q module
load. However, even without this module loaded, the offloads could be
utilized, for example by openvswitch datapath. As reported by Michael,
that causes 2x to 5x performance improvement, depending on a testcase.

So move the vlan offload registrations into vlan_core and make this
available even without 8021q module loaded.

Reported-by: Michael Shteinbok <michaelsh86@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Michael Shteinbok <michaelsh86@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:51:08 -08:00
Xin Long
761f60261b ipv6: fix a dst leak when removing its exception
These is no need to hold dst before calling rt6_remove_exception_rt().
The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
which has been removed in Commit 93531c6743 ("net/ipv6: separate
handling of FIB entries from dst based routes"). Otherwise, it will
cause a dst leak.

This patch is to simply remove the dst_hold_safe() call before calling
rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
It's safe, because the removal of the exception that holds its dst's
refcnt is protected by rt6_exception_lock.

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Fixes: 23fb93a4d3 ("net/ipv6: Cleanup exception and cache route handling")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:44:15 -08:00
Colin Ian King
99310e732a net/decnet: add missing indentation
There is a missing indentation before the declaration of port. Add
it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:42:49 -08:00
Cong Wang
7f600f14df net: remove unused skb_send_sock()
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:32:33 -08:00
Michał Mirosław
0c4b2d3705 net: remove VLAN_TAG_PRESENT
Replace VLAN_TAG_PRESENT with single bit flag and free up
VLAN.CFI overload. Now VLAN.CFI is visible in networking stack
and can be passed around intact.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:25:29 -08:00
Michał Mirosław
9c21225597 net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:25:28 -08:00
Andrey Ignatov
6c49e65e0d bpf: Support socket lookup in CGROUP_SOCK_ADDR progs
Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers
available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR.

Such programs operate on sockets and have access to socket and struct
sockaddr passed by user to system calls such as sys_bind, sys_connect,
sys_sendmsg.

It's useful to be able to lookup other sockets from these programs.
E.g. sys_connect may lookup IP:port endpoint and if there is a server
socket bound to that endpoint ("server" can be defined by saddr & sport
being zero), redirect client connection to it by rewriting IP:port in
sockaddr passed to sys_connect.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 17:54:29 -08:00
Andrey Ignatov
cac6cc2f5a bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp
Lookup functions in sk_lookup have different expectations about byte
order of provided arguments.

Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
expect dport to be in network byte order and do ntohs(dport) internally.

At the same time __inet6_lookup expects dport to be in host byte order
and correspondingly name the argument hnum.

sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
__inet6_lookup with regard to dport. But in __udp6_lib_lookup case it
uses host instead of expected network byte order. It makes result
returned by bpf_sk_lookup_udp for IPv6 incorrect.

The patch fixes byte order of dport passed to __udp6_lib_lookup.

Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f0
fixes TCPv6 but breaks UDPv6.

Fixes: 5ef0ae84f0 ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 17:54:29 -08:00
David S. Miller
5aa25c05be Merge tag 'batadv-next-for-davem-20181114' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - Bump version strings, by Simon Wunderlich

 - Fixup includes, by Sven Eckelmann (3 patches)

 - Separate BATMAN_ADV_DEBUG from DEBUGFS, by Sven Eckelmann

 - Fixup tracing log documentation, by Sven Eckelmann

 - Use exclusive locks to secure netlink information dump transfers,
   by Sven Eckelmann (8 patches)

 - Move CRC16 dependency, by Sven Eckelmann

 - Enable MCAST by default, by Linus Luessing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 16:12:19 -08:00
Li RongQing
45cf7959c3 net: slightly optimize eth_type_trans
netperf udp stream shows that eth_type_trans takes certain cpu,
so adjust the mac address check order, and firstly check if it
is device address, and only check if it is multicast address
only if not the device address.

After this change:
To unicast, and skb dst mac is device mac, this is most of time
reduce a comparision
To unicast, and skb dst mac is not device mac, nothing change
To multicast, increase a comparision

Before:
1.03%  [kernel]          [k] eth_type_trans

After:
0.78%  [kernel]          [k] eth_type_trans

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 15:10:59 -08:00
Li RongQing
982c17b9e3 net: remove BUG_ON from __pskb_pull_tail
if list is NULL pointer, and the following access of list
will trigger panic, which is same as BUG_ON

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 15:07:50 -08:00
Eric Dumazet
08e14fe429 net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows
When EDT conversion happened, fq lost the ability to enfore a maxrate
for all flows. It kept it for non EDT flows.

This commit restores the functionality.

Tested:

tc qd replace dev eth0 root fq maxrate 500Mbit
netperf -P0 -H host -- -O THROUGHPUT
489.75

Fixes: ab408b6dc7 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:42:12 -08:00
Amritha Nambiar
5c72299fba net: sched: cls_flower: Classify packets using port ranges
Added support in tc flower for filtering based on port ranges.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v4:
1. Added condition before setting port key.
2. Organized setting and dumping port range keys into functions
   and added validation of input range.

v3:
1. Moved new fields in UAPI enum to the end of enum.
2. Removed couple of empty lines.

v2:
Addressed Jiri's comments:
1. Added separate functions for dst and src comparisons.
2. Removed endpoint enum.
3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
  lookup.
4. Cleaned up fl_lookup function.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:38:23 -08:00
Cong Wang
7fe50ac83f net: dump more useful information in netdev_rx_csum_fault()
Currently netdev_rx_csum_fault() only shows a device name,
we need more information about the skb for debugging csum
failures.

Sample output:

 ens3: hw csum failure
 dev features: 0x0000000000014b89
 skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0

Note, I use pr_err() just to be consistent with the existing one.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:37:04 -08:00
David Howells
7150ceaacb rxrpc: Fix life check
The life-checking function, which is used by kAFS to make sure that a call
is still live in the event of a pending signal, only samples the received
packet serial number counter; it doesn't actually provoke a change in the
counter, rather relying on the server to happen to give us a packet in the
time window.

Fix this by adding a function to force a ping to be transmitted.

kAFS then keeps track of whether there's been a stall, and if so, uses the
new function to ping the server, resetting the timeout to allow the reply
to come back.

If there's a stall, a ping and the call is *still* stalled in the same
place after another period, then the call will be aborted.

Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Fixes: f4d15fb6f9 ("rxrpc: Provide functions for allowing cleaner handling of signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:35:40 -08:00
Florian Westphal
39aa6928d4 xfrm: policy: fix netlink/pf_key policy lookups
Colin Ian King says:
 Static analysis with CoverityScan found a potential issue [..]
 It seems that pointer pol is set to NULL and then a check to see if it
 is non-null is used to set pol to tmp; howeverm this check is always
 going to be false because pol is always NULL.

Fix this and update test script to catch this.  Updated script only:
./xfrm_policy.sh ; echo $?
RTNETLINK answers: No such file or directory
FAIL: ip -net ns3 xfrm policy get src 10.0.1.0/24 dst 10.0.2.0/24 dir out
RTNETLINK answers: No such file or directory
[..]
PASS: policy before exception matches
PASS: ping to .254 bypassed ipsec tunnel
PASS: direct policy matches
PASS: policy matches
1

Fixes: 6be3b0db6d ("xfrm: policy: add inexact policy search tree infrastructure")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-15 18:09:32 +01:00
Colin Ian King
7759d6a837 xfrm: policy: add missing indentation
There is a missing indentation before the goto statement. Add it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-15 18:09:32 +01:00
Colin Ian King
4fc735d934 af_key: fix indentation on declaration statement
There is an indentation issue before the declaration of xfrm_ctx. Remove
spaces and replace with a tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-11-15 18:09:32 +01:00
Linus Torvalds
94ca5c18e1 Merge tag 'nfs-for-4.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

 Stable fixes:

   - Don't exit the NFSv4 state manager without clearing
     NFS4CLNT_MANAGER_RUNNING

  Bugfixes:

   - Fix an Oops when destroying the RPCSEC_GSS credential cache

   - Fix an Oops during delegation callbacks

   - Ensure that the NFSv4 state manager exits the loop on SIGKILL

   - Fix a bogus get/put in generic_key_to_expire()"

* tag 'nfs-for-4.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix an Oops during delegation callbacks
  SUNRPC: Fix a bogus get/put in generic_key_to_expire()
  SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cache
  NFSv4: Ensure that the state manager exits the loop on SIGKILL
  NFSv4: Don't exit the state manager without clearing NFS4CLNT_MANAGER_RUNNING
2018-11-15 10:59:37 -06:00
Xin Long
f8504f4ca0 l2tp: fix a sock refcnt leak in l2tp_tunnel_register
This issue happens when trying to add an existent tunnel. It
doesn't call sock_put() before returning -EEXIST to release
the sock refcnt that was held by calling sock_hold() before
the existence check.

This patch is to fix it by holding the sock after doing the
existence check.

Fixes: f6cd651b05 ("l2tp: fix race in duplicate tunnel detection")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 22:49:31 -08:00
Linus Torvalds
4e4490d438 Merge tag 'nfsd-4.20-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Three nfsd bugfixes.

  None are new bugs, but they all take a little effort to hit, which
  might explain why they weren't found sooner"

* tag 'nfsd-4.20-1' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer()
  nfsd: COPY and CLONE operations require the saved filehandle to be set
  sunrpc: correct the computation for page_ptr when truncating
2018-11-14 15:31:15 -06:00
Jakub Kicinski
c0b7490b19 net: sched: red: notify drivers about RED's limit parameter
RED qdisc's limit parameter changes the behaviour of the qdisc,
for instance if it's set to 0 qdisc will drop all the packets.

When replace operation happens and parameter is set to non-0
a new fifo qdisc will be instantiated and replace the old child
qdisc which will be destroyed.

Drivers need to know the parameter, even if they don't impose
the actual limit to be able to reliably reconstruct the Qdisc
hierarchy.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:28 -08:00
Jakub Kicinski
d577a3d279 net: sched: mq: offload a graft notification
Drivers offloading Qdiscs should have reasonable certainty
the offloaded behaviour matches the SW path.  This is impossible
if the driver does not know about all Qdiscs or when Qdiscs move
and are reused.  Send a graft notification from MQ.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:28 -08:00
Jakub Kicinski
bf2a752bea net: sched: red: offload a graft notification
Drivers offloading Qdiscs should have reasonable certainty
the offloaded behaviour matches the SW path.  This is impossible
if the driver does not know about all Qdiscs or when Qdiscs move
and are reused.  Send a graft notification from RED.  The drivers
are expected to simply stop offloading the Qdisc, if a non-standard
child is ever grafted onto it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:28 -08:00
Jakub Kicinski
98b0e5f684 net: sched: provide notification for graft on root
Drivers are currently not notified when a Qdisc is grafted as root.
This requires special casing Qdiscs added with parent = TC_H_ROOT in
the driver.  Also there is no notification sent to the driver when
an existing Qdisc is grafted as root.

Add this very simple notifications, drivers should now be able to
track their Qdisc tree fully.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:27 -08:00
David S. Miller
11123ab9d9 Merge tag 'linux-can-fixes-for-4.20-20181109' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2018-11-09

this is a pull request of 20 patches for net/master.

First we have a patch by Oliver Hartkopp which changes the raw socket's
raw_sendmsg() to return an error value if the user tries to send a CANFD
frame to a CAN-2.0 device.

The next two patches are by Jimmy Assarsson and fix potential problems
in the kvaser_usb driver.

YueHaibing's patches for the ucan driver fix a compile time warning and
remove a duplicate include.

Eugeniu Rosca patch adds more binding documentation to the rcar_can
driver bindings. The next two patches are by Fabrizio Castro for the
rcar_can driver and fixes a problem in the driver's probe function and
document the r8a774a1 binding.

Lukas Wunner's patch fixes a recpetion problem in hi311x driver by
switching from edge to level triggered interruts.

The next three patches all target the flexcan driver. Pankaj Bansal's
patch unconditionally unlocks the last mailbox used for RX. Alexander
Stein provides a better workaround for a hardware limitation when
sending RTR frames, by using the last mailbox for TX, resulting in fewer
lost frames. The patch by me simplyfies the driver, by making a runtime
value a compile time constant.

The following 4 patches are by me and provide the groundwork for the
next patches by Oleksij Rempel. To avoid code duplication common code in
the common CAN driver infrastructure is factured out and error handling
is cleaned up.

The next 4 patches are by Oleksij Rempel and fix the problem in the
flexcan driver that other processes see TX frames arrive out of order
with ragards to a RX'ed frame (which are send by a different system on
the CAN bus as the result of our TX frame).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-13 08:43:05 -08:00
Florian Westphal
29e3880109 netfilter: nf_tables: fix use-after-free when deleting compat expressions
nft_compat ops do not have static storage duration, unlike all other
expressions.

When nf_tables_expr_destroy() returns, expr->ops might have been
free'd already, so we need to store next address before calling
expression destructor.

For same reason, we can't deref match pointer after nft_xt_put().

This can be easily reproduced by adding msleep() before
nft_match_destroy() returns.

Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-13 09:57:33 +01:00
Taehee Yoo
0fbcc5b568 netfilter: xt_RATEEST: remove netns exit routine
xt_rateest_net_exit() was added to check whether rules are flushed
successfully. but ->net_exit() callback is called earlier than
->destroy() callback.
So that ->net_exit() callback can't check that.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
	   --dport 1111 -j RATEEST --rateest-name ap \
	   --rateest-interval 250ms --rateest-ewma 0.5s
   %ip netns del vm1

splat looks like:
[  668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[  668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables
[  668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
[  668.813518] Workqueue: netns cleanup_net
[  668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[  668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
[  668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282
[  668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000
[  668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74
[  668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be
[  668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08
[  668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000
[  668.813518] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[  668.813518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0
[  668.813518] Call Trace:
[  668.813518]  ? unregister_netdevice_many+0xe0/0xe0
[  668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
[  668.813518]  ? default_device_exit+0x1ca/0x270
[  668.813518]  ? remove_proc_entry+0x1cd/0x390
[  668.813518]  ? dev_change_net_namespace+0xd00/0xd00
[  668.813518]  ? __init_waitqueue_head+0x130/0x130
[  668.813518]  ops_exit_list.isra.10+0x94/0x140
[  668.813518]  cleanup_net+0x45b/0x900
[  668.813518]  ? net_drop_ns+0x110/0x110
[  668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
[  668.813518]  ? save_trace+0x300/0x300
[  668.813518]  ? lock_acquire+0x196/0x470
[  668.813518]  ? lock_acquire+0x196/0x470
[  668.813518]  ? process_one_work+0xb60/0x1de0
[  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[  668.813518]  ? __lock_acquire+0x4500/0x4500
[  668.813518]  ? __lock_is_held+0xb4/0x140
[  668.813518]  process_one_work+0xc13/0x1de0
[  668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[  668.813518]  ? set_load_weight+0x270/0x270
[ ... ]

Fixes: 3427b2ab63 ("netfilter: make xt_rateest hash table per net")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-13 09:57:29 +01:00
Trond Myklebust
e3d5e573a5 SUNRPC: Fix a bogus get/put in generic_key_to_expire()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-12 16:39:13 -05:00
Trond Myklebust
a652a4bc21 SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cache
Commit 07d02a67b7 causes a use-after free in the RPCSEC_GSS credential
destroy code, because the call to get_rpccred() in gss_destroying_context()
will now always fail to increment the refcount.

While we could just replace the get_rpccred() with a refcount_set(), that
would have the unfortunate consequence of resurrecting a credential in
the credential cache for which we are in the process of destroying the
RPCSEC_GSS context. Rather than do this, we choose to make a copy that
is never added to the cache and use that to destroy the context.

Fixes: 07d02a67b7 ("SUNRPC: Simplify lookup code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-12 16:39:13 -05:00
Xin Long
6ba8457402 sctp: process sk_reuseport in sctp_get_port_local
When socks' sk_reuseport is set, the same port and address are allowed
to be bound into these socks who have the same uid.

Note that the difference from sk_reuse is that it allows multiple socks
to listen on the same port and address.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-12 09:09:51 -08:00
Xin Long
76c6d988ae sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
This is a part of sk_reuseport support for sctp. It defines a helper
sctp_bind_addrs_check() to check if the bind_addrs in two socks are
matched. It will add sock_reuseport if they are completely matched,
and return err if they are partly matched, and alloc sock_reuseport
if all socks are not matched at all.

It will work until sk_reuseport support is added in
sctp_get_port_local() in the next patch.

v1->v2:
  - use 'laddr->valid && laddr2->valid' check instead as Marcelo
    pointed in sctp_bind_addrs_check().

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-12 09:09:51 -08:00
Xin Long
532ae2f10e sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
This is a part of sk_reuseport support for sctp, and it selects a
sock by the hashkey of lport, paddr and dport by default. It will
work until sk_reuseport support is added in sctp_get_port_local()
in the next patch.

v1->v2:
  - define lport as __be16 instead of __be32 as Marcelo pointed in
    __sctp_rcv_lookup_endpoint().

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-12 09:09:51 -08:00