netif_napi_add() could report an error like this below due to it allows
to pass a format string for wildcarding before calling
dev_get_valid_name(),
"netif_napi_add() called with weight 256 on device eth%d"
For example, hns_enet_drv module does this.
hns_nic_try_get_ae
hns_nic_init_ring_data
netif_napi_add
register_netdev
dev_get_valid_name
Hence, make it a bit more human-readable by using netdev_err_once()
instead.
Signed-off-by: Qian Cai <cai@gmx.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
Release of its last reference triggers a completion notification.
The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
the skbs, because it can build, send and free skbs within its loop,
possibly reaching refcount zero and freeing the ubuf_info too soon.
The UDP stack currently also takes this extra ref, but does not need
it as all skbs are sent after return from __ip(6)_append_data.
Avoid the extra refcount_inc and refcount_dec_and_test, and generally
the sock_zerocopy_put in the common path, by passing the initial
reference to the first skb.
This approach is taken instead of initializing the refcount to 0, as
that would generate error "refcount_t: increment on 0" on the
next skb_zcopy_set.
Changes
v3 -> v4
- Move skb_zcopy_set below the only kfree_skb that might cause
a premature uarg destroy before skb_zerocopy_put_abort
- Move the entire skb_shinfo assignment block, to keep that
cacheline access in one place
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
interpret flag MSG_ZEROCOPY.
This patch was previously part of the zerocopy RFC patchsets. Zerocopy
is not effective at small MTU. With segmentation offload building
larger datagrams, the benefit of page flipping outweights the cost of
generating a completion notification.
tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
ipv4 udp -t 1
tx=191312 (11938 MB) txc=0 zc=n
rx=191312 (11938 MB)
ipv4 udp -z -t 1
tx=304507 (19002 MB) txc=304507 zc=y
rx=304507 (19002 MB)
ok
ipv6 udp -t 1
tx=174485 (10888 MB) txc=0 zc=n
rx=174485 (10888 MB)
ipv6 udp -z -t 1
tx=294801 (18396 MB) txc=294801 zc=y
rx=294801 (18396 MB)
ok
Changes
v1 -> v2
- Fixup reverse christmas tree violation
v2 -> v3
- Split refcount avoidance optimization into separate patch
- Fix refcount leak on error in fragmented case
(thanks to Paolo Abeni for pointing this one out!)
- Fix refcount inc on zero
- Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
This is needed since commit 5cf4a8532c ("tcp: really ignore
MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_hash_transport/sctp_epaddr_lookup_transport, it dereferences
a transport's asoc under rcu_read_lock while asoc is freed not after
a grace period, which leads to a use-after-free panic.
This patch fixes it by calling kfree_rcu to make asoc be freed after
a grace period.
Note that only the asoc's memory is delayed to free in the patch, it
won't cause sk to linger longer.
Thanks Neil and Marcelo to make this clear.
Fixes: 7fda702f93 ("sctp: use new rhlist interface on sctp transport rhashtable")
Fixes: cd2b708750 ("sctp: check duplicate node before inserting a new transport")
Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com
Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already have of_get_nvmem_mac_address() but some non-DT systems want
to read the MAC address from NVMEM too. Implement a generalized routine
that takes struct device as argument.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Existing functions to retreive the l3mdev of a device did not walk the
master chain to find the upper master. This patch adds a function to
find the l3mdev, even indirect through e.g. a bridge:
+----------+
| |
| vrf-blue |
| |
+----+-----+
|
|
+----+-----+
| |
| br-blue |
| |
+----+-----+
|
|
+----+-----+
| |
| eth0 |
| |
+----------+
This will properly resolve the l3mdev of eth0 to vrf-blue.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP tunnel sockets are always opened unbound to a specific device. This
patch allow the socket to be bound on a custom device, which
incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.
'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.
This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pkt_len field in qdisc_skb_cb stores the skb length as it will
appear on the wire after segmentation. For byte accounting, this value
is more accurate than skb->len. It is computed on entry to the TC
layer, so only valid there.
Allow read access to this field from BPF tc classifier and action
programs. The implementation is analogous to tc_classid, aside from
restricting to read access.
To distinguish it from skb->len and self-describe export as wire_len.
Changes v1->v2
- Rename pkt_len to wire_len
Signed-off-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Vlad Dumitrescu <vladum@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If an asynchronous connection attempt completes while another task is
in xprt_connect(), then the call to rpc_sleep_on() could end up
racing with the call to xprt_wake_pending_tasks().
So add a second test of the connection state after we've put the
task to sleep and set the XPRT_CONNECTING flag, when we know that there
can be no asynchronous connection attempts still in progress.
Fixes: 0b9e794313 ("SUNRPC: Move the test for XPRT_CONNECTING into...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we retransmit an RPC request, we currently end up clobbering the
value of req->rq_rcv_buf.bvec that was allocated by the initial call to
xprt_request_prepare(req).
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
call_encode can be invoked more than once per RPC call. Ensure that
each call to gss_wrap_req_priv does not overwrite pointers to
previously allocated memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a task failed to get the write lock in the call to xprt_connect(), then
it will be queued on xprt->sending. In that case, it is possible for it
to get transmitted before the call to call_connect_status(), in which
case it needs to be handled by call_transmit_status() instead.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Now that call_rcu()'s callback is not invoked until after all
preempt-disable regions of code have completed (in addition to explicitly
marked RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_sched(). This commit therefore makes that change.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Now that call_rcu()'s callback is not invoked until after all bh-disable
regions of code have completed (in addition to explicitly marked
RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_bh(). Similarly, rcu_barrier() can be used in place of
rcu_barrier_bh(). This commit therefore makes these changes.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <bridge@lists.linux-foundation.org>
Cc: <netdev@vger.kernel.org>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Now that call_rcu()'s callback is not invoked until after all bh-disable
regions of code have completed (in addition to explicitly marked
RCU read-side critical sections), call_rcu() can be used in place of
call_rcu_bh(). Similarly, synchronize_rcu() can be used in place of
synchronize_rcu_bh(). This commit therefore makes these changes.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <netdev@vger.kernel.org>
Now that call_rcu()'s callback is not invoked until after bh-disable
regions of code have completed (in addition to explicitly marked
RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_bh(). Similarly, rcu_barrier() can be used in place o
frcu_barrier_bh(). This commit therefore makes these changes.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
After commit f42ee093be ("bpf/test_run: support cgroup local
storage") the bpf_test_run() function may fail with -ENOMEM, if
it's not possible to allocate memory for a cgroup local storage.
This error shouldn't be mixed with the return value of the testing
program. Let's add an additional argument with a pointer where to
store the testing program's result; and make bpf_test_run()
return either 0 or -ENOMEM.
Fixes: f42ee093be ("bpf/test_run: support cgroup local storage")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This is a leftover from days where single-cpu systems were common:
Store last port used to resolve a clash to use it as a starting point when
the next conflict needs to be resolved.
When we have parallel attempt to connect to same address:port pair,
its likely that both cores end up computing the same "available" port,
as both use same starting port, and newly used ports won't become
visible to other cores until the conntrack gets confirmed later.
One of the cores then has to drop the packet at insertion time because
the chosen new tuple turns out to be in use after all.
Lets simplify this: remove port rover and use a pseudo-random starting
point.
Note that this doesn't make netfilter default to 'fully random' mode;
the 'rover' was only used if NAT could not reuse source port as-is.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that call_rcu()'s callback is not invoked until after bh-disable
regions of code have completed (in addition to explicitly marked
RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_bh(). Similarly, rcu_barrier() can be used in place of
rcu_barrier_bh() and synchronize_rcu() in place of synchronize_rcu_bh().
This commit therefore makes these changes.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Previously the SNMP TCPTIMEOUTS counter has inconsistent accounting:
1. It counts all SYN and SYN-ACK timeouts
2. It counts timeouts in other states except recurring timeouts and
timeouts after fast recovery or disorder state.
Such selective accounting makes analysis difficult and complicated. For
example the monitoring system needs to collect many other SNMP counters
to infer the total amount of timeout events. This patch makes TCPTIMEOUTS
counter simply counts all the retransmit timeout (SYN or data or FIN).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the SNMP counter LINUX_MIB_TCPRETRANSFAIL is not counting
the TSO/GSO properly on failed retransmission. This patch fixes that.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously there is an off-by-one bug on determining when to abort
a stalled window-probing socket. This patch fixes that so it is
consistent with tcp_write_timeout().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While introducing the DSA tagging protocol attribute, it was added to the DSA
slave network devices, but those actually see untagged traffic (that is their
whole purpose). Correct this mistake by putting the tagging sysfs attribute
under the DSA master network device where this is the information that we need.
While at it, also correct the sysfs documentation mistake that missed the
"dsa/" directory component of the attribute.
Fixes: 98cdb48071 ("net: dsa: Expose tagging protocol to user-space")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern and Nicolas Dichtel report that the handling of the netns id
0 is incorrect for the BPF socket lookup helpers: rather than finding
the netns with id 0, it is resolving to the current netns. This renders
the netns_id 0 inaccessible.
To fix this, adjust the API for the netns to treat all negative s32
values as a lookup in the current netns (including u64 values which when
truncated to s32 become negative), while any values with a positive
value in the signed 32-bit integer space would result in a lookup for a
socket in the netns corresponding to that id. As before, if the netns
with that ID does not exist, no socket will be found. Any netns outside
of these ranges will fail to find a corresponding socket, as those
values are reserved for future usage.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the function only works for the bridge device itself, but
subsequent patches will need to be able to query the PVID of a given
bridge port, so extend the function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, pointer offsets in three BPF context structures are
broken in two scenarios: i) 32 bit compiled applications running
on 64 bit kernels, and ii) LLVM compiled BPF programs running
on 32 bit kernels. The latter is due to BPF target machine being
strictly 64 bit. So in each of the cases the offsets will mismatch
in verifier when checking / rewriting context access. Fix this by
providing a helper macro __bpf_md_ptr() that will enforce padding
up to 64 bit and proper alignment, and for context access a macro
bpf_ctx_range_ptr() which will cover full 64 bit member range on
32 bit archs. For flow_keys, we additionally need to force the
size check to sizeof(__u64) as with other pointer types.
Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Standard kernel compilation produces the following warning:
net/core/rtnetlink.c: In function ‘rtnl_newlink’:
net/core/rtnetlink.c:3232:1: warning: the frame size of 1288 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
This should not really be an issue, as rtnl_newlink() stack is
generally quite shallow.
Fix the warning by allocating attributes with kmalloc() in a wrapper
and passing it down to rtnl_newlink(), avoiding complexities on error
paths.
Alternatively we could kmalloc() some structure within rtnl_newlink(),
slave attributes look like a good candidate. In practice it adds to
already rather high complexity and length of the function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_newlink() used to create VLAs based on link kind. Since
commit ccf8dbcd06 ("rtnetlink: Remove VLA usage") statically
sized array is created on the stack, so there is no more use
for a separate code block that used to be the VLA's live range.
While at it christmas tree the variables. Note that there is
a goto-based retry so to be on the safe side the variables can
no longer be initialized in place. It doesn't seem to matter,
logically, but why make the code harder to read..
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most linux hosts never setup TCP MD5 keys. We can avoid a
cache line miss (accessing tp->md5ig_info) on RX and TX
using a jump label.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case GRO is not as efficient as it should be or disabled,
we might have a user thread trapped in __release_sock() while
softirq handler flood packets up to the point we have to drop.
This patch balances work done from user thread and softirq,
to give more chances to __release_sock() to complete its work
before new packets are added the the backlog.
This also helps if we receive many ACK packets, since GRO
does not aggregate them.
This patch brings ~60% throughput increase on a receiver
without GRO, but the spectacular gain is really on
1000x release_sock() latency reduction I have measured.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal pointed out that non sack flows might suffer from ACK compression
added in the following patch ("tcp: implement coalescing on backlog queue")
Instead of tweaking tcp_add_backlog() we can take into
account how many ACK were coalesced, this information
will be available in skb_shinfo(skb)->gso_segs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trace events are already present for the receive entry points, to indicate
how the reception entered the stack.
This patch adds the corresponding exit trace events that will bound the
reception such that all events occurring between the entry and the exit
can be considered as part of the reception context. This greatly helps
for dependency and root cause analyses.
Without this, it is not possible with tracepoint instrumentation to
determine whether a sched_wakeup event following a netif_receive_skb
event is the result of the packet reception or a simple coincidence after
further processing by the thread. It is possible using other mechanisms
like kretprobes, but considering the "entry" points are already present,
it would be good to add the matching exit events.
In addition to linking packets with wakeups, the entry/exit event pair
can also be used to perform network stack latency analyses.
Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side)
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_assoc_update_frag_point() should be called whenever asoc->pathmtu
changes, but we missed one place in sctp_association_init(). It would
cause frag_point is zero when sending data.
As says in Jakub's reproducer, if sp->pathmtu is set by socketopt, the
new asoc->pathmtu inherits it in sctp_association_init(). Later when
transports are added and their pmtu >= asoc->pathmtu, it will never
call sctp_assoc_update_frag_point() to set frag_point.
This patch is to fix it by updating frag_point after asoc->pathmtu is
set as sp->pathmtu in sctp_association_init(). Note that it moved them
after sctp_stream_init(), as stream->si needs to be set first.
Frag_point's calculation is also related with datachunk's type, so it
needs to update frag_point when stream->si may be changed in
sctp_process_init().
v1->v2:
- call sctp_assoc_update_frag_point() separately in sctp_process_init
and sctp_association_init, per Marcelo's suggestion.
Fixes: 2f5e3c9df6 ("sctp: introduce sctp_assoc_update_frag_point")
Reported-by: Jakub Audykowicz <jakub.audykowicz@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
bpf-next 2018-11-30
The following pull-request contains BPF updates for your *net-next* tree.
(Getting out bit earlier this time to pull in a dependency from bpf.)
The main changes are:
1) Add libbpf ABI versioning and document API naming conventions
as well as ABI versioning process, from Andrey.
2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
programs that is used in conjunction with sk_msg_push_data()
for adding / removing meta data to the msg data, from John.
3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
lib and testsuite build failures on 32 bit, from David.
4) Make BPF prog dump for !JIT identical to how we dump subprogs
when JIT is in use, from Yonghong.
5) Rename btf_get_from_id() to make it more conform with libbpf
API naming conventions, from Martin.
6) Add a missing BPF kselftest config item, from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a session in X25_STATE_1 (Awaiting Call Accept) receives a call
request, the session will be closed (x25_disconnect), cause=0x01
(Number Busy) and diag=0x48 (Call Collision) will be set and a clear
request will be send.
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
o x25_find_listener(): the compare for the null_x25_address was wrong.
We have to check the x25_addr of the listener socket instead of the
x25_addr of the incomming call.
o x25_bind(): it was not possible to bind a socket to null_x25_address
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The length of the called and calling address was not calculated
correctly (BCD encoding).
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can remove the loop and conditional branches
and compute wscale efficiently thanks to ilog2()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 04157469b7 ("net: Use static_key for XPS maps") introduced a
static key for XPS, but the increments/decrements don't match.
First, the static key's counter is incremented once for each queue, but
only decremented once for a whole batch of queues, leading to large
unbalances.
Second, the xps_rxqs_needed key is decremented whenever we reset a batch
of queues, whether they had any rxqs mapping or not, so that if we setup
cpu-XPS on em1 and RXQS-XPS on em2, resetting the queues on em1 would
decrement the xps_rxqs_needed key.
This reworks the accounting scheme so that the xps_needed key is
incremented only once for each type of XPS for all the queues on a
device, and the xps_rxqs_needed key is incremented only once for all
queues. This is sufficient to let us retrieve queues via
get_xps_queue().
This patch introduces a new reset_xps_maps(), which reinitializes and
frees the appropriate map (xps_rxqs_map or xps_cpus_map), and drops a
reference to the needed keys:
- both xps_needed and xps_rxqs_needed, in case of rxqs maps,
- only xps_needed, in case of CPU maps.
Now, we also need to call reset_xps_maps() at the end of
__netif_set_xps_queue() when there's no active map left, for example
when writing '00000000,00000000' to all queues' xps_rxqs setting.
Fixes: 04157469b7 ("net: Use static_key for XPS maps")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before commit 80d19669ec ("net: Refactor XPS for CPUs and Rx queues"),
netif_reset_xps_queues() did netdev_queue_numa_node_write() for all the
queues being reset. Now, this is only done when the "active" variable in
clean_xps_maps() is false, ie when on all the CPUs, there's no active
XPS mapping left.
Fixes: 80d19669ec ("net: Refactor XPS for CPUs and Rx queues")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial conflict in net/core/filter.c, a locally computed
'sdif' is now an argument to the function.
Signed-off-by: David S. Miller <davem@davemloft.net>
o Select the R_key to invalidate while the CPU cache still contains
the received RPC Call transport header, rather than waiting until
we're about to send the RPC Reply.
o Choose Send With Invalidate if there is exactly one distinct R_key
in the received transport header. If there's more than one, the
client will have to perform local invalidation after it has
already waited for remote invalidation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This adds a BPF SK_MSG program helper so that we can pop data from a
msg. We use this to pop metadata from a previous push data call.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>