Commit Graph

55938 Commits

Author SHA1 Message Date
Julian Anastasov
0261ea1bd1 ipvs: do not schedule icmp errors from tunnels
We can receive ICMP errors from client or from
tunneling real server. While the former can be
scheduled to real server, the latter should
not be scheduled, they are decapsulated only when
existing connection is found.

Fixes: 6044eeffaf ("ipvs: attempt to schedule icmp packets")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-13 14:52:57 +02:00
Alexander Potapenko
8176c83327 netfilter: conntrack: initialize ct->timeout
KMSAN started reporting an error when accessing ct->timeout for the
first time without initialization:

 BUG: KMSAN: uninit-value in __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765
 ...
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:624
 __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
 __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765
 nf_ct_refresh_acct ./include/net/netfilter/nf_conntrack.h:201
 nf_conntrack_udp_packet+0xb44/0x1040 net/netfilter/nf_conntrack_proto_udp.c:122
 nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1605
 nf_conntrack_in+0x1250/0x26c9 net/netfilter/nf_conntrack_core.c:1696
 ...
 Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205
 kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159
 kmsan_kmalloc+0xa9/0x130 mm/kmsan/kmsan_hooks.c:173
 kmem_cache_alloc+0x554/0xb10 mm/slub.c:2789
 __nf_conntrack_alloc+0x16f/0x690 net/netfilter/nf_conntrack_core.c:1342
 init_conntrack+0x6cb/0x2490 net/netfilter/nf_conntrack_core.c:1421

Signed-off-by: Alexander Potapenko <glider@google.com>
Fixes: cc16921351 ("netfilter: conntrack: avoid same-timeout update")
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-13 14:52:57 +02:00
Florian Westphal
1025ce7521 netfilter: conntrack: don't set related state for different outer address
Luca Moro says:
 ------
The issue lies in the filtering of ICMP and ICMPv6 errors that include an
inner IP datagram.
For these packets, icmp_error_message() extract the ICMP error and inner
layer to search of a known state.
If a state is found the packet is tagged as related (IP_CT_RELATED).

The problem is that there is no correlation check between the inner and
outer layer of the packet.
So one can encapsulate an error with an inner layer matching a known state,
while its outer layer is directed to a filtered host.
In this case the whole packet will be tagged as related.
This has various implications from a rule bypass (if a rule to related
trafic is allow), to a known state oracle.

Unfortunately, we could not find a real statement in a RFC on how this case
should be filtered.
The closest we found is RFC5927 (Section 4.3) but it is not very clear.

A possible fix would be to check that the inner IP source is the same than
the outer destination.

We believed this kind of attack was not documented yet, so we started to
write a blog post about it.
You can find it attached to this mail (sorry for the extract quality).
It contains more technical details, PoC and discussion about the identified
behavior.
We discovered later that
https://www.gont.com.ar/papers/filtering-of-icmp-error-messages.pdf
described a similar attack concept in 2004 but without the stateful
filtering in mind.
 -----

This implements above suggested fix:
In icmp(v6) error handler, take outer destination address, then pass
that into the common function that does the "related" association.

After obtaining the nf_conn of the matching inner-headers connection,
check that the destination address of the opposite direction tuple
is the same as the outer address and only set RELATED if thats the case.

Reported-by: Luca Moro <luca.moro@synacktiv.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-13 14:52:57 +02:00
Stephen Suryaputra
ed0de45a10 ipv4: recompile ip options in ipv4_link_failure
Recompile IP options since IPCB may not be valid anymore when
ipv4_link_failure is called from arp_error_report.

Refer to the commit 3da1ed7ac3 ("net: avoid use IPCB in cipso_v4_error")
and the commit before that (9ef6b42ad6) for a similar issue.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:23:46 -07:00
David Ahern
1deeb6408c ipv6: Remove flowi6_oif compare from __ip6_route_redirect
In the review of 0b34eb0043 ("ipv6: Refactor __ip6_route_redirect"),
Martin noted that the flowi6_oif compare is moved to the new helper and
should be removed from __ip6_route_redirect. Fix the oversight.

Fixes: 0b34eb0043 ("ipv6: Refactor __ip6_route_redirect")
Reported-by: Martin Lau <kafai@fb.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:05:38 -07:00
Jeffrey Altman
1a2391c30c rxrpc: Fix detection of out of order acks
The rxrpc packet serial number cannot be safely used to compute out of
order ack packets for several reasons:

 1. The allocation of serial numbers cannot be assumed to imply the order
    by which acks are populated and transmitted.  In some rxrpc
    implementations, delayed acks and ping acks are transmitted
    asynchronously to the receipt of data packets and so may be transmitted
    out of order.  As a result, they can race with idle acks.

 2. Serial numbers are allocated by the rxrpc connection and not the call
    and as such may wrap independently if multiple channels are in use.

In any case, what matters is whether the ack packet provides new
information relating to the bounds of the window (the firstPacket and
previousPacket in the ACK data).

Fix this by discarding packets that appear to wind back the window bounds
rather than on serial number procession.

Fixes: 298bc15b20 ("rxrpc: Only take the rwind and mtu values from latest ACK")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
David Howells
39ce675575 rxrpc: Trace received connection aborts
Trace received calls that are aborted due to a connection abort, typically
because of authentication failure.  Without this, connection aborts don't
show up in the trace log.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
8e8715aaa9 rxrpc: Allow errors to be returned from rxrpc_queue_packet()
Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.

This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
4611da30d6 rxrpc: Make rxrpc_kernel_check_life() indicate if call completed
Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.

Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
56d282d9db rxrpc: Clear socket error
When an ICMP or ICMPV6 error is received, the error will be attached
to the socket (sk_err) and the report function will get called.
Clear any pending error here by calling sock_error().

This would cause the following attempt to use the socket to fail with
the error code stored by the ICMP error, resulting in unexpected errors
with various side effects depending on the context.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Karsten Graul
7a62725a50 net/smc: improve smc_conn_create reason codes
Rework smc_conn_create() to always return a valid DECLINE reason code.
This removes the need to translate the return codes on 4 different
places and allows to easily add more detailed return codes by changing
smc_conn_create() only.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Karsten Graul
9aa68d298c net/smc: improve smc_listen_work reason codes
Rework smc_listen_work() to provide improved reason codes when an
SMC connection is declined. This allows better debugging on user side.
This also adds 3 more detailed reason codes in smc_clc.h to indicate
what type of device was not found (ism or rdma or both), or if ism
cannot talk to the peer.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Karsten Graul
228bae05be net/smc: code cleanup smc_listen_work
In smc_listen_work() the variables rc and reason_code are defined which
have the same meaning. Eliminate reason_code in favor of the shorter
name rc. No functional changes.
Rename the functions smc_check_ism() and smc_check_rdma() into
smc_find_ism_device() and smc_find_rdma_device() to make there purpose
more clear. No functional changes.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Karsten Graul
fba7e8ef51 net/smc: cleanup of get vlan id
The vlan_id of the underlying CLC socket was retrieved two times
during processing of the listen handshaking. Change this to get the
vlan id one time in connect and in listen processing, and reuse the id.
And add a new CLC DECLINE return code for the case when the retrieval
of the vlan id failed.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Karsten Graul
bc36d2fc93 net/smc: consolidate function parameters
During initialization of an SMC socket a lot of function parameters need
to get passed down the function call path. Consolidate the parameters
in a helper struct so there are less enough parameters to get all passed
by register.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Karsten Graul
598866974c net/smc: check for ip prefix and subnet
The check for a matching ip prefix and subnet was only done for SMC-R
in smc_listen_rdma_check() but not when an SMC-D connection was
possible. Rename the function into smc_listen_prfx_check() and move its
call to a place where it is called for both SMC variants.
And add a new CLC DECLINE reason for the case when the IP prefix or
subnet check fails so the reason for the failing SMC connection can be
found out more easily.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Karsten Graul
4ada81fddf net/smc: fallback to TCP after connect problems
Correct the CLC decline reason codes for internal problems to not have
the sign bit set, negative reason codes are interpreted as not eligible
for TCP fallback.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Ursula Braun
50717a37db net/smc: nonblocking connect rework
For nonblocking sockets move the kernel_connect() from the connect
worker into the initial smc_connect part to return kernel_connect()
errors other than -EINPROGRESS to user space.

Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:50:56 -07:00
Tetsuo Handa
bddc028a4f udpv6: Check address length before reading address family
KMSAN will complain if valid address length passed to udpv6_pre_connect()
is shorter than sizeof("struct sockaddr"->sa_family) bytes.

(This patch is bogus if it is guaranteed that udpv6_pre_connect() is
always called after checking "struct sockaddr"->sa_family. In that case,
we want a comment why we don't need to check valid address length here.)

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
ba024f2574 bpf: Check address length before reading address family
KMSAN will complain if valid address length passed to bpf_bind() is
shorter than sizeof("struct sockaddr"->sa_family) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
c68e747d0a llc: Check address length before reading address field
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_llc) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
bd7d46ddca Bluetooth: Check address length before reading address field
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_sco) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
a9107a14a9 rxrpc: Check address length before reading srx_service field
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_rxrpc) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
d852be8477 net: netlink: Check address length before reading groups field
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_nl) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
175f7c1f01 sctp: Check address length before reading address family
KMSAN will complain if valid address length passed to connect() is shorter
than sizeof("struct sockaddr"->sa_family) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
Tetsuo Handa
dd3ac9a684 net/rds: Check address length before reading address family
syzbot is reporting uninitialized value at rds_connect() [1] and
rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
these functions expect that it is safe to access sockaddr->family field
in order to determine minimal address length for validation.

[1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
[2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38f

Reported-by: syzbot <syzbot+0049bebbf3042dbd2e8f@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+915c9f99f3dbc4bd6cd1@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 10:25:03 -07:00
David Miller
013b96ec64 sctp: Pass sk_buff_head explicitly to sctp_ulpq_tail_event().
Now the SKB list implementation assumption can be removed.

And now that we know that the list head is always non-NULL
we can remove the code blocks dealing with that as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:33:31 -07:00
David Miller
178ca044aa sctp: Make sctp_enqueue_event tak an skb list.
Pass this, instead of an event.  Then everything trickles down and we
always have events a non-empty list.

Then we needs a list creating stub to place into .enqueue_event for sctp_stream_interleave_1.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:33:31 -07:00
David Miller
5e8f641db6 sctp: Use helper for sctp_ulpq_tail_event() when hooked up to ->enqueue_event
This way we can make sure events sent this way to
sctp_ulpq_tail_event() are on a list as well.  Now all such code paths
are fully covered.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:33:31 -07:00
David Miller
925b937422 sctp: Always pass skbs on a list to sctp_ulpq_tail_event().
This way we can simplify the logic and remove assumptions
about the implementation of skb lists.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:33:31 -07:00
David Miller
0eff105243 sctp: Remove superfluous test in sctp_ulpq_reasm_drain().
Inside the loop, we always start with event non-NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:33:31 -07:00
Vlad Buslov
9994677c96 net: sched: flower: fix filter net reference counting
Fix net reference counting in fl_change() and remove redundant call to
tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
before releasing exts and deallocating a filter, so this code caused flower
classifier to obtain net twice per filter that is being deleted.

Implementation of __fl_delete() called tcf_exts_get_net() to pass its
result as 'async' flag to fl_mask_put(). However, 'async' flag is redundant
and only complicates fl_mask_put() implementation. This functionality seems
to be copied from filter cleanup code, where it was added by Cong with
following explanation:

    This patchset tries to fix the race between call_rcu() and
    cleanup_net() again. Without holding the netns refcnt the
    tc_action_net_exit() in netns workqueue could be called before
    filter destroy works in tc filter workqueue. This patchset
    moves the netns refcnt from tc actions to tcf_exts, without
    breaking per-netns tc actions.

This doesn't apply to flower mask, which doesn't call any tc action code
during cleanup. Simplify fl_mask_put() by removing the flag parameter and
always use tcf_queue_work() to free mask objects.

Fixes: 061775583e ("net: sched: flower: introduce reference counting for filters")
Fixes: 1f17f7742e ("net: sched: flower: insert filter to ht before offloading it to hw")
Fixes: 05cd271fd6 ("cls_flower: Support multiple masks per priority")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:32:27 -07:00
Eric Dumazet
e305845096 dctcp: more accurate tracking of packets delivery
After commit e21db6f69a ("tcp: track total bytes delivered with ECN CE marks")
core TCP stack does a very good job tracking ECN signals.

The "sender's best estimate of CE information" Yuchung mentioned in his
patch is indeed the best we can do.

DCTCP can use tp->delivered_ce and tp->delivered to not duplicate the logic,
and use the existing best estimate.

This solves some problems, since current DCTCP logic does not deal with losses
and/or GRO or ack aggregation very well.

This also removes a dubious use of inet_csk(sk)->icsk_ack.rcv_mss
(this should have been tp->mss_cache), and a 64 bit divide.

Finally, we can see that the DCTCP logic, calling dctcp_update_alpha() for
every ACK could be done differently, calling it only once per RTT.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Abdul Kabbani <akabbani@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:31:03 -07:00
David S. Miller
bb23581b9b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-12

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Improve BPF verifier scalability for large programs through two
   optimizations: i) remove verifier states that are not useful in pruning,
   ii) stop walking parentage chain once first LIVE_READ is seen. Combined
   gives approx 20x speedup. Increase limits for accepting large programs
   under root, and add various stress tests, from Alexei.

2) Implement global data support in BPF. This enables static global variables
   for .data, .rodata and .bss sections to be properly handled which allows
   for more natural program development. This also opens up the possibility
   to optimize program workflow by compiling ELFs only once and later only
   rewriting section data before reload, from Daniel and with test cases and
   libbpf refactoring from Joe.

3) Add config option to generate BTF type info for vmlinux as part of the
   kernel build process. DWARF debug info is converted via pahole to BTF.
   Latter relies on libbpf and makes use of BTF deduplication algorithm which
   results in 100x savings compared to DWARF data. Resulting .BTF section is
   typically about 2MB in size, from Andrii.

4) Add BPF verifier support for stack access with variable offset from
   helpers and add various test cases along with it, from Andrey.

5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
   so that L2 encapsulation can be used for tc tunnels, from Alan.

6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
   users can define a subset of allowed __sk_buff fields that get fed into
   the test program, from Stanislav.

7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
   various UBSAN warnings in bpftool, from Yonghong.

8) Generate a pkg-config file for libbpf, from Luca.

9) Dump program's BTF id in bpftool, from Prashant.

10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
    program, from Magnus.

11) kallsyms related fixes for the case when symbols are not present in
    BPF selftests and samples, from Daniel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 17:00:05 -07:00
Florian Westphal
223fd0adfa bridge: broute: make broute a real ebtables table
This makes broute a normal ebtables table, hooking at PREROUTING.
The broute hook is removed.

It uses skb->cb to signal to bridge rx handler that the skb should be
routed instead of being bridged.

This change is backwards compatible with ebtables as no userspace visible
parts are changed.

This means we can also remove the !ops test in ebt_register_table,
it was only there for broute table sake.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:47:50 +02:00
Florian Westphal
971502d77f bridge: netfilter: unroll NF_HOOK helper in bridge input path
Replace NF_HOOK() based invocation of the netfilter hooks with a private
copy of nf_hook_slow().

This copy has one difference: it can return the rx handler value expected
by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

This is needed by the next patch to invoke the ebtables
"broute" table via the standard netfilter hooks rather than the custom
"br_should_route_hook" indirection that is used now.

When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
bridge rx input handler, but there is no way to indicate this via
NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
netfilter core or a percpu flag.

  text    data     bss     dec   filename
  3369      56       0    3425   net/bridge/br_input.o.before
  3458      40       0    3498   net/bridge/br_input.o.after

This allows removal of the "br_should_route_hook" in the next patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:47:39 +02:00
Florian Westphal
f12064d1b4 bridge: reduce size of input cb to 16 bytes
Reduce size of br_input_skb_cb from 24 to 16 bytes by
using bitfield for those values that can only be 0 or 1.

igmp is the igmp type value, so it needs to be at least u8.

Furthermore, the bridge currently relies on step-by-step initialization
of br_input_skb_cb fields as the skb passes through the stack.

Explicitly zero out the bridge input cb instead, this avoids having to
review/validate that no BR_INPUT_SKB_CB(skb)->foo test can see a
'random' value from previous protocol cb.

AFAICS all current fields are always set up before they are read again,
so this is not a bug fix.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:47:27 +02:00
Stanislav Fomichev
947e8b595b bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN
This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case
and be sure that nobody is erroneously setting ctx_{in,out}.

Fixes: b0b9395d86 ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-12 00:53:00 +02:00
David Ahern
0b34eb0043 ipv6: Refactor __ip6_route_redirect
Move the nexthop evaluation of a fib entry to a helper that can be
leveraged for each fib6_nh in a multipath nexthop object.

In the move, 'continue' statements means the helper returns false
(loop should continue) and 'break' means return true (found the entry
of interest).

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:07 -07:00
David Ahern
0c59d00675 ipv6: Refactor rt6_device_match
Move the device and gateway checks in the fib6_next loop to a helper
that can be called per fib6_nh entry.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:07 -07:00
David Ahern
d83009d462 ipv6: Move fib6_multipath_select down in ip6_pol_route
Move the siblings and fib6_multipath_select after the null entry check
since a null entry can not have siblings.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
af52a52cba ipv6: Be smarter with null_entry handling in ip6_pol_route_lookup
Clean up the fib6_null_entry handling in ip6_pol_route_lookup.
rt6_device_match can return fib6_null_entry, but fib6_multipath_select
can not. Consolidate the fib6_null_entry handling and on the final
null_entry check set rt and goto out - no need to defer to a second
check after rt6_find_cached_rt.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
30c15f0338 ipv6: Refactor find_rr_leaf
find_rr_leaf has 3 loops over fib_entries calling find_match. The loops
are very similar with differences in start point and whether the metric
is evaluated:
    1. start at rr_head, no extra loop compare, check fib metric
    2. start at leaf, compare rt against rr_head, check metric
    3. start at cont (potential saved point from earlier loops), no
       extra loop compare, no metric check

Create 1 loop that is called 3 different times. This will make a
later change with multipath nexthop objects much simpler.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
28679ed104 ipv6: Refactor find_match
find_match primarily needs a fib6_nh (and fib6_flags which it passes
through to rt6_score_route). Move fib6_check_expired up to the call
sites so find_match is only called for relevant entries. Remove the
match argument which is mostly a pass through and use the return
boolean to decide if match gets set in the call sites.

The end result is a helper that can be called per fib6_nh struct
which is needed once fib entries reference nexthop objects that
have more than one fib6_nh.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
702cea5685 ipv6: Pass fib6_nh and flags to rt6_score_route
rt6_score_route only needs the fib6_flags and nexthop data. Change
it accordingly. Allows re-use later for nexthop based fib6_nh.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
cc3a86c802 ipv6: Change rt6_probe to take a fib6_nh
rt6_probe sends probes for gateways in a nexthop. As such it really
depends on a fib6_nh, not a fib entry. Move last_probe to fib6_nh and
update rt6_probe to a fib6_nh struct.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
6e1809a564 ipv6: Remove rt6_check_dev
rt6_check_dev is a simpler helper with only 1 caller. Fold the code
into rt6_score_route.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
David Ahern
1ba9a89517 ipv6: Only call rt6_check_neigh for nexthop with gateway
Change rt6_check_neigh to take a fib6_nh instead of a fib entry.
Move the check on fib_flags and whether the nexthop has a gateway
up to the one caller.

Remove the inline from the definition as well. Not necessary.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:24:06 -07:00
Colin Ian King
62720b12d2 dns: remove redundant zero length namelen check
The zero namelen check is redundant as it has already been checked
for zero at the start of the function.  Remove the redundant check.

Addresses-Coverity: ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:01:08 -07:00
YueHaibing
d3706566ae net: netrom: Fix error cleanup path of nr_proto_init
Syzkaller report this:

BUG: unable to handle kernel paging request at fffffbfff830524b
PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
Oops: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
FS:  00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 __list_add include/linux/list.h:60 [inline]
 list_add include/linux/list.h:79 [inline]
 proto_register+0x444/0x8f0 net/core/sock.c:3375
 nr_proto_init+0x73/0x4b3 [netrom]
 ? 0xffffffffc1628000
 ? 0xffffffffc1628000
 do_one_initcall+0xbc/0x47d init/main.c:887
 do_init_module+0x1b5/0x547 kernel/module.c:3456
 load_module+0x6405/0x8c10 kernel/module.c:3804
 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
 bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: fffffbfff830524b
---[ end trace 039ab24b305c4b19 ]---

If nr_proto_init failed, it may forget to call proto_unregister,
tiggering this issue.This patch rearrange code of nr_proto_init
to avoid such issues.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 13:59:49 -07:00