This converts the storage and layout of netfilter hook entries from a
linked list to an array. After this commit, hook entries will be
stored adjacent in memory. The next pointer is no longer required.
The ops pointers are stored at the end of the array as they are only
used in the register/unregister path and in the legacy br_netfilter code.
nf_unregister_net_hooks() is slower than needed as it just calls
nf_unregister_net_hook in a loop (i.e. at least n synchronize_net()
calls), this will be addressed in followup patch.
Test setup:
- ixgbe 10gbit
- netperf UDP_STREAM, 64 byte packets
- 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):
empty mangle and raw prerouting, mangle and filter input hooks:
353.9
this patch:
364.2
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/netfilter/nft_payload.c:187:18: warning: incorrect type in return expression (expected bool got restricted __sum16 [usertype] check)
net/netfilter/nft_exthdr.c:222:14: warning: cast to restricted __be32
net/netfilter/nft_rt.c:49:23: warning: incorrect type in assignment (different base types expected unsigned int got restricted __be32)
net/netfilter/nft_rt.c:70:25: warning: symbol 'nft_rt_policy' was not declared. Should it be static?
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the sa_id before filling it.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Fixes: d51d081d65 ("[IPSEC]: Sync series - user")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The memory reserved to dump the expired xfrm state includes padding
bytes in struct xfrm_user_expire added by the compiler for alignment. To
prevent the heap info leak, memset(0) the remainder of the struct.
Initializing the whole structure isn't needed as copy_to_user_state()
already takes care of clearing the padding bytes within the 'state'
member.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the whole struct before filling
it.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 0603eac0d6 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The memory reserved to dump the xfrm offload state includes padding
bytes of struct xfrm_user_offload added by the compiler for alignment.
Add an explicit memset(0) before filling the buffer to avoid the heap
info leak.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Currently, in the udp6 code, the dst cookie is not initialized/updated
concurrently with the RX dst used by early demux.
As a result, the dst_check() in the early_demux path always fails,
the rx dst cache is always invalidated, and we can't really
leverage significant gain from the demux lookup.
Fix it adding udp6 specific variant of sk_rx_dst_set() and use it
to set the dst cookie when the dst entry is really changed.
The issue is there since the introduction of early demux for ipv6.
Fixes: 5425077d73 ("net: ipv6: Add early demux handler for UDP unicast")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd fixes from Bruce Fields:
"Two nfsd bugfixes, neither 4.13 regressions, but both potentially
serious"
* tag 'nfsd-4.13-2' of git://linux-nfs.org/~bfields/linux:
net: sunrpc: svcsock: fix NULL-pointer exception
nfsd: Limit end of page list when decoding NFSv4 WRITE
syszkaller got a hang in tcp stack, related to a bug in
tcp_sendpage_locked()
root@syzkaller:~# cat /proc/3059/stack
[<ffffffff83de926c>] __lock_sock+0x1dc/0x2f0
[<ffffffff83de9473>] lock_sock_nested+0xf3/0x110
[<ffffffff8408ce01>] tcp_sendmsg+0x21/0x50
[<ffffffff84163b6f>] inet_sendmsg+0x11f/0x5e0
[<ffffffff83dd8eea>] sock_sendmsg+0xca/0x110
[<ffffffff83dd9547>] kernel_sendmsg+0x47/0x60
[<ffffffff83de35dc>] sock_no_sendpage+0x1cc/0x280
[<ffffffff8408916b>] tcp_sendpage_locked+0x10b/0x160
[<ffffffff84089203>] tcp_sendpage+0x43/0x60
[<ffffffff841641da>] inet_sendpage+0x1aa/0x660
[<ffffffff83dd4fcd>] kernel_sendpage+0x8d/0xe0
[<ffffffff83dd50ac>] sock_sendpage+0x8c/0xc0
[<ffffffff81b63300>] pipe_to_sendpage+0x290/0x3b0
[<ffffffff81b67243>] __splice_from_pipe+0x343/0x750
[<ffffffff81b6a459>] splice_from_pipe+0x1e9/0x330
[<ffffffff81b6a5e0>] generic_splice_sendpage+0x40/0x50
[<ffffffff81b6b1d7>] SyS_splice+0x7b7/0x1610
[<ffffffff84d77a01>] entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: 306b13eb3c ("proto_ops: Add locked held versions of sendmsg and sendpage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is ugly to hide a u32-filter-specific pointer inside Qdisc,
this breaks the TC layers:
1. Qdisc is a generic representation, should not have any specific
data of any type
2. Qdisc layer is above filter layer, should only save filters in
the list of struct tcf_proto.
This pointer is used as the head of the chain of u32 hash tables,
that is struct tc_u_hnode, because u32 filter is very special,
it allows to create multiple hash tables within one qdisc and
across multiple u32 filters.
Instead of using this ugly pointer, we can just save it in a global
hash table key'ed by (dev ifindex, qdisc handle), therefore we can
still treat it as a per qdisc basis data structure conceptually.
Of course, because of network namespaces, this key is not unique
at all, but it is fine as we already have a pointer to Qdisc in
struct tc_u_common, we can just compare the pointers when collision.
And this only affects slow paths, has no impact to fast path,
thanks to the pointer ->tp_c.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For TC classes, their ->get() and ->put() are always paired, and the
reference counting is completely useless, because:
1) For class modification and dumping paths, we already hold RTNL lock,
so all of these ->get(),->change(),->put() are atomic.
2) For filter bindiing/unbinding, we use other reference counter than
this one, and they should have RTNL lock too.
3) For ->qlen_notify(), it is special because it is called on ->enqueue()
path, but we already hold qdisc tree lock there, and we hold this
tree lock when graft or delete the class too, so it should not be gone
or changed until we release the tree lock.
Therefore, this patch removes ->get() and ->put(), but:
1) Adds a new ->find() to find the pointer to a class by classid, no
refcnt.
2) Move the original class destroy upon the last refcnt into ->delete(),
right after releasing tree lock. This is fine because the class is
already removed from hash when holding the lock.
For those who also use ->put() as ->unbind(), just rename them to reflect
this change.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like for TC actions, ->delete() is a special case,
we have to prepare and fill the notification before delete
otherwise would get use-after-free after we remove the
reference count.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a few bugs around refcnt handling in the new BPF congestion
control setsockopt:
- The new ca is assigned to icsk->icsk_ca_ops even in the case where we
cannot get a reference on it. This would lead to a use after free,
since that ca is going away soon.
- Changing the congestion control case doesn't release the refcnt on
the previous ca.
- In the reinit case, we first leak a reference on the old ca, then we
call tcp_reinit_congestion_control on the ca that we have just
assigned, leading to deinitializing the wrong ca (->release of the
new ca on the old ca's data) and releasing the refcount on the ca
that we actually want to use.
This is visible by building (for example) BIC as a module and setting
net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from
samples/bpf.
This patch fixes the refcount issues, and moves reinit back into tcp
core to avoid passing a ca pointer back to BPF.
Fixes: 91b5b21c7c ("bpf: Add support for changing congestion control")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the following seg6local actions.
- SEG6_LOCAL_ACTION_END_T: regular SRH processing and forward to the
next-hop looked up in the specified routing table.
- SEG6_LOCAL_ACTION_END_DX2: decapsulate an L2 frame and forward it to
the specified network interface.
- SEG6_LOCAL_ACTION_END_DX4: decapsulate an IPv4 packet and forward it,
possibly to the specified next-hop.
- SEG6_LOCAL_ACTION_END_DT6: decapsulate an IPv6 packet and forward it
to the next-hop looked up in the specified routing table.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds three helper functions to be used with the seg6local packet
processing actions.
The decap_and_validate() function will be used by the End.D* actions, that
decapsulate an SR-enabled packet.
The advance_nextseg() function applies the fundamental operations to update
an SRH for the next segment.
The lookup_nexthop() function helps select the next-hop for the processed
SR packets. It supports an optional next-hop address to route the packet
specifically through it, and an optional routing table to use.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that the seg6local lightweight tunnel is used solely
with IPv6 routes and processes only IPv6 packets.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the L2 frame encapsulation mechanism, referred to
as T.Encaps.L2 in the SRv6 specifications [1].
A new type of SRv6 tunnel mode is added (SEG6_IPTUN_MODE_L2ENCAP). It only
accepts packets with an existing MAC header (i.e., it will not work for
locally generated packets). The resulting packet looks like IPv6 -> SRH ->
Ethernet -> original L3 payload. The next header field of the SRH is set to
NEXTHDR_NONE.
[1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables the SRv6 encapsulation mode to carry an IPv4 payload.
All the infrastructure was already present, I just had to add a parameter
to seg6_do_srh_encap() to specify the inner packet protocol, and perform
some additional checks.
Usage example:
ip route add 1.2.3.4 encap seg6 mode encap segs fc00::1,fc00::2 dev eth0
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
rt_cookie might be used uninitialized, fix this by
initializing it.
Fixes: c5cff8561d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make this const as it is only stored as a reference in a const field of
a net_device structure.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We use skb_availroom to calculate the skb tailroom for the
ESP trailer. skb_availroom calculates the tailroom and
subtracts this value by reserved_tailroom. However
reserved_tailroom is a union with the skb mark. This means
that we subtract the tailroom by the skb mark if set.
Fix this by using skb_tailroom instead.
Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We allocate the page fragment for the ESP trailer inside
a spinlock, but consume it outside of the lock. This
is racy as some other cou could get the same page fragment
then. Fix this by consuming the page fragment inside the
lock too.
Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If we fail to find a valid bearer in tipc_node_get_linkname(),
node_read_unlock() is called without holding the node read lock.
This commit fixes this error.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tipc_msg_reverse(), we assign skb attributes to local pointers
in stack at startup. This is followed by skb_linearize() and for
cloned buffers we perform skb relocation using pskb_expand_head().
Both these methods may update the skb attributes and thus making
the pointers incorrect.
In this commit, we fix this error by ensuring that the pointers
are re-assigned after any of these skb operations.
Fixes: 29042e19f2 ("tipc: let function tipc_msg_reverse() expand header
when needed")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tipc_rcv(), we linearize only the header and usually the packets
are consumed as the nodes permit direct reception. However, if the
skb contains tunnelled message due to fail over or synchronization
we parse it in tipc_node_check_state() without performing
linearization. This will cause link disturbances if the skb was
non linear.
In this commit, we perform linearization for the above messages.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzkaller reported a refcount_t warning [1]
Issue here is that noop_qdisc refcnt was never really considered as
a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
if (qdisc->flags & TCQ_F_BUILTIN ||
!refcount_dec_and_test(&qdisc->refcnt)))
return;
Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
really needed, but harmless until refcount_t came.
To fix this problem, we simply need to not increment noop_qdisc.refcnt,
since we never decrement it.
[1]
refcount_t: increment on 0; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:180
__warn+0x1c4/0x1d9 kernel/panic.c:541
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
__dev_open+0x227/0x330 net/core/dev.c:1380
__dev_change_flags+0x695/0x990 net/core/dev.c:6726
dev_change_flags+0x88/0x140 net/core/dev.c:6792
dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
sock_do_ioctl+0x94/0xb0 net/socket.c:968
sock_ioctl+0x2c2/0x440 net/socket.c:1058
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
Fixes: 7b93640502 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Reshetova, Elena <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter <dan.carpenter@oracle.com> observed that the while()
loop in svc_rdma_build_read_chunk() does not document the assumption
that the loop interior is always executed at least once.
Defensive: the function now returns -EINVAL if this assumption
fails.
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Close an attack vector by moving the arrays of per-server methods to
read-only memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Close an attack vector by moving the arrays of server-side transport
methods to read-only memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow our callers to influence the choice of ECMP link by honoring the
hash passed together with the flow info. This allows for special
treatment of ICMP errors which we would like to route over the same path
as the IPv6 datagram that triggered the error.
Also go through rt6_multipath_hash(), in the usual case when we aren't
dealing with an ICMP error, so that there is one central place where
multipath hash is computed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 644d0e6569 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
sense to keep it around. Also remove the accompanying comment that has
become outdated.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When forwarding or sending out an ICMPv6 error, look at the embedded
packet that triggered the error and compute a flow hash over its
headers.
This let's us route the ICMP error together with the flow it belongs to
when multipath (ECMP) routing is in use, which in turn makes Path MTU
Discovery work in ECMP load-balanced or anycast setups (RFC 7690).
Granted, end-hosts behind the ECMP router (aka servers) need to reflect
the IPv6 Flow Label for PMTUD to work.
The code is organized to be in parallel with ipv4 stack:
ip_multipath_l3_keys -> ip6_multipath_l3_keys
fib_multipath_hash -> rt6_multipath_hash
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reflecting IPv6 Flow Label at server nodes is useful in environments
that employ multipath routing to load balance the requests. As "IPv6
Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error
messages generated in response to a downstream packets from the server
can be routed by a load balancer back to the original server without
looking at transport headers, if the server applies the flow label
reflection. This enables the Path MTU Discovery past the ECMP router in
load-balance or anycast environments where each server node is reachable
by only one path.
Introduce a sysctl to enable flow label reflection per net namespace for
all newly created sockets. Same could be earlier achieved only per
socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
socket option.
[1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 9dbbfb0ab6 function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.
However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:
* ...Note that we take the RCU lock in all
* cases including when we return an error. So you must always call
* rhashtable_walk_stop to clean up.
This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is too much overhead in the current trace_xdp_redirect
tracepoint as it does strcpy and strlen on the net_device names.
Besides, exposing the ifindex/index is actually the information that
is needed in the tracepoint to diagnose issues. When a lookup fails
(either ifindex or devmap index) then there is a need for saying which
to_index that have issues.
V2: Adjust args to be aligned with trace_xdp_exception.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the xdp_do_generic_redirect() call fails, it trigger the
trace_xdp_exception tracepoint. It seems better to use the same
tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given there is a tracepoint that can track the error code
of xdp_do_redirect calls, the WARN_ONCE in bpf_warn_invalid_xdp_redirect
doesn't seem relevant any longer. Simply remove the function.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix use after free of struct proc_dir_entry in ipt_CLUSTERIP, patch
from Sabrina Dubroca.
2) Fix spurious EINVAL errors from iptables over nft compatibility layer.
3) Reload pointer to ip header only if there is non-terminal verdict,
ie. XT_CONTINUE, otherwise invalid memory access may happen, patch
from Taehee Yoo.
4) Fix interaction between SYNPROXY and NAT, SYNPROXY adds sequence
adjustment already, however from nf_nat_setup() assumes there's not.
Patch from Xin Long.
5) Fix burst arithmetics in nft_limit as Joe Stringer mentioned during
NFWS in Faro. Patch from Andy Zhou.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The returns on some if statements are not indented correctly,
add in the missing tab.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Doesn't change generated code, but will make it easier to eventually
make the actual trackers themselvers const.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONFIG_NF_CONNTRACK_PROCFS is deprecated, no need to use a function
pointer in the trackers for this. Place the printf formatting in
the one place that uses it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
no need to waste storage for something that is only needed
in one place and can be deduced from protocol number.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
no need to waste storage for something that is only needed
in one place and can be deduced from protocol number.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
avoids a pointer and allows struct to be const later on.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>