Check for net_device_ops structures that are only stored in the netdev_ops
field of a net_device structure. This field is declared const, so
net_device_ops structures that have this property can be declared as const
also.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p = { ... };
@ok@
identifier r.i;
struct net_device e;
position p;
@@
e.netdev_ops = &i@p;
@bad@
position p != {r.p,ok.p};
identifier r.i;
struct net_device_ops e;
@@
e@i@p
@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
struct net_device_ops i = { ... };
// </smpl>
The result of size on this file before the change is:
text data bss dec hex filename
3401 931 44 4376 1118 net/l2tp/l2tp_eth.o
and after the change it is:
text data bss dec hex filename
3993 347 44 4384 1120 net/l2tp/l2tp_eth.o
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
With large BDP TCP flows and lossy networks, it is very important
to keep a low number of skbs in the write queue.
RACK and SACK processing can perform a linear scan of it.
We should avoid putting any payload in skb->head, so that SACK
shifting can be done if needed.
With this patch, we allow to pack ~0.5 MB per skb instead of
the 64KB initially cooked at tcp_sendmsg() time.
This gives a reduction of number of skbs in write queue by eight.
tcp_rack_detect_loss() likes this.
We still allow payload in skb->head for first skb put in the queue,
to not impact RPC workloads.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb is not freed if newsk is NULL. Rework the error path so free_skb is
unconditionally called on function exit.
Fixes: c3ea9fa274 ("[IrDA] af_irda: IRDA_ASSERT cleanups")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a TCP socket gets a large write queue, an overflow can happen
in a test in __tcp_retransmit_skb() preventing all retransmits.
The flow then stalls and resets after timeouts.
Tested:
sysctl -w net.core.wmem_max=1000000000
netperf -H dest -- -s 1000000000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a configuration option to inject packet loss by discarding
approximately every 8th packet received and approximately every 8th DATA
packet transmitted.
Note that no locking is used, but it shouldn't really matter.
Signed-off-by: David Howells <dhowells@redhat.com>
Improve sk_buff tracing within AF_RXRPC by the following means:
(1) Use an enum to note the event type rather than plain integers and use
an array of event names rather than a big multi ?: list.
(2) Distinguish Rx from Tx packets and account them separately. This
requires the call phase to be tracked so that we know what we might
find in rxtx_buffer[].
(3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
event type.
(4) A pair of 'rotate' events are added to indicate packets that are about
to be rotated out of the Rx and Tx windows.
(5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
packet loss injection recording.
Signed-off-by: David Howells <dhowells@redhat.com>
Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data() of which one
uses an uninitialised variable.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.
Signed-off-by: David Howells <dhowells@redhat.com>
Add additional call tracepoint points for noting call-connected,
call-released and connection-failed events.
Also fix one tracepoint that was using an integer instead of the
corresponding enum value as the point type.
Signed-off-by: David Howells <dhowells@redhat.com>
Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the basic transmit DATA packet content size at 1412 bytes so that they
can be arbitrarily assembled into jumbo packets.
In the future, I'm thinking of moving to keeping a jumbo packet header at
the beginning of each packet in the Tx queue and creating the packet header
on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can
be assembled on the spur of the moment for (re-)transmission.
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_send_call_packet() should use type in both its switch-statements
rather than using pkt->whdr.type. This might give the compiler an easier
job of uninitialised variable checking.
Signed-off-by: David Howells <dhowells@redhat.com>
Don't transmit an ACK if call->ackr_reason in unset. There's the
possibility of a race between recvmsg() sending an ACK and the background
processing thread trying to send the same one.
Signed-off-by: David Howells <dhowells@redhat.com>
Make the retransmission algorithm use for-loops instead of do-loops and
move the counter increments into the for-statement increment slots.
Though the do-loops are slighly more efficient since there will be at least
one pass through the each loop, the counter increments are harder to get
right as the continue-statements skip them.
Without this, if there are any positive acks within the loop, the do-loop
will cycle forever because the counter increment is never done.
Signed-off-by: David Howells <dhowells@redhat.com>
The soft-ACK parser doesn't increment the pointer into the soft-ACK list,
resulting in the first ACK/NACK value being applied to all the relevant
packets in the Tx queue. This has the potential to miss retransmissions
and cause excessive retransmissions.
Fix this by incrementing the pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
If the last call on a client connection is release after the connection has
had a bunch of calls allocated but before any DATA packets are sent (so
that it's not yet marked RXRPC_CONN_EXPOSED), an assertion will happen in
rxrpc_disconnect_client_call().
af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false
------------[ cut here ]------------
kernel BUG at ../net/rxrpc/conn_client.c:753!
This is because it's expecting the conn to have been exposed and to have 2
or more refs - but this isn't necessarily the case.
Simply remove the assertion. This allows the conn to be moved into the
inactive state and deleted if it isn't resurrected before the final put is
called.
Signed-off-by: David Howells <dhowells@redhat.com>
Call rxrpc_release_call() on getting an error in rxrpc_new_client_call()
rather than trying to do the cleanup ourselves. This isn't a problem,
provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to
the calls tree as cleanup code fragments that would otherwise cause
problems are conditional.
Without this, we miss some of the cleanup.
Signed-off-by: David Howells <dhowells@redhat.com>
In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set
on it, then it's accounted for in rxrpc_nr_client_conns and may be on
various lists - and this is cleaned up correctly.
However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then
the put routine returns rather than just skipping the extra bit of cleanup.
Fix this by making the extra bit of clean up conditional instead and always
killing off the connection.
This manifests itself as connections with a zero usage count hanging around
in /proc/net/rxrpc_conns because the connection allocated, but discarded,
due to a race with another process that set up a parallel connection, which
was then shared instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Purge the queue of to_be_accepted calls on socket release. Note that
purging sock_calls doesn't release the ref owned by to_be_accepted.
Probably the sock_calls list is redundant given a purges of the recvmsg_q,
the to_be_accepted queue and the calls tree.
Signed-off-by: David Howells <dhowells@redhat.com>
Record calls that need to be accepted using sk_acceptq_added() otherwise
the backlog counter goes negative because sk_acceptq_removed() is called.
This causes the preallocator to malfunction.
Calls that are preaccepted by AFS within the kernel aren't affected by
this.
Signed-off-by: David Howells <dhowells@redhat.com>
The code for determining the last packet in rxrpc_recvmsg_data() has been
using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points
to the last packet or not. This isn't a good idea, however, as the input
code may be running simultaneously on another CPU and that sets the flag
*before* updating the top pointer.
Fix this by the following means:
(1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only.
There's otherwise a synchronisation problem between detecting the flag
and checking tx_top. This could probably be dealt with by appropriate
application of memory barriers, but there's a simpler way.
(2) Set RXRPC_CALL_RX_LAST after setting rx_top.
(3) Make rxrpc_rotate_rx_window() consult the flags header field of the
DATA packet it's about to discard to see if that was the last packet.
Use this as the basis for ending the Rx phase. This shouldn't be a
problem because the recvmsg side of things is guaranteed to see the
packets in order.
(4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if:
(a) the packet it has just processed is marked as RXRPC_LAST_PACKET
(b) the call's Rx phase has been ended.
Signed-off-by: David Howells <dhowells@redhat.com>
Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller,
rxrpc_recvmsg_data(), so that it's more clear what's going on there.
Signed-off-by: David Howells <dhowells@redhat.com>
Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
This is then made conditional on CONFIG_IPV6.
Without this, the following can be seen:
net/built-in.o: In function `rxrpc_init_peer':
>> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd bugfix from Bruce Fields:
"Fix a memory corruption bug that I introduced in 4.7"
* tag 'nfsd-4.8-2' of git://linux-nfs.org/~bfields/linux:
svcauth_gss: Revert 64c59a3726 ("Remove unnecessary allocation")
There are a few places where an IE that matches not only the EID, but
also other bytes inside the element, needs to be found. To simplify
that and reduce the amount of similar code, implement a new helper
function to match the EID and an extra array of bytes.
Additionally, simplify cfg80211_find_vendor_ie() by using the new
match function.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In 46fa38e84b ("mac80211: allow software PS-Poll/U-APSD with
AP_LINK_PS"), Johannes allowed to use mac80211's code for handling
stations that go to PS or send PS-Poll / uAPSD trigger frames for
devices that enable RSS.
This means that mac80211 doesn't look at frames anymore but rather
relies on a notification that will come from the device when a PS
transition occurs or when a PS-Poll / trigger frame is detected by
the device.
iwlwifi will need this capability but still needs mac80211 to take
care of the TIM IE. Today, if a driver sets AP_LINK_PS, mac80211
will not update the TIM IE. Change mac80211 to check existence of
the set_tim driver callback rather than using AP_LINK_PS to decide
if the driver handles the TIM IE internally or not.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[reword commit message a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for the 2-bytes Qualcomm tag that gigabit switches such as
the QCA8337/N might insert when receiving packets, or that we need
to insert while targeting specific switch ports. The tag is inserted
directly behind the ethernet header.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ip_rcv_finish() calls l3mdev_ip_rcv(). On any VRF except
the global VRF, this replaces skb->dev with the VRF master interface.
When calling ip_route_input_noref() from here, the checks for forwarding
look at this master device instead of the initial ingress interface.
This will allow packets to be routed which normally would be dropped.
For example, an interface that is not assigned an IP address should
drop packets, but because the checking is against the master device, the
packet will be forwarded.
The fix here is to still call l3mdev_ip_rcv(), but remember the initial
net_device. This is passed to the other functions within ip_rcv_finish,
so they still see the original interface.
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich says:
====================
Here are two batman-adv bugfix patches:
- Fix reference counting for last_bonding_candidate, by Sven Eckelmann
- Fix head room reservation for ELP packets, by Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb replaces another one in ooo queue, I forgot to also
update tp->ooo_last_skb as well, if the replaced skb was the last one
in the queue.
To fix this, we simply can re-use the code that runs after an insertion,
trying to merge skbs at the right of current skb.
This not only fixes the bug, but also remove all small skbs that might
be a subset of the new one.
Example:
We receive segments 2001:3001, 4001:5001
Then we receive 2001:8001 : We should replace 2001:3001 with the big
skb, but also remove 4001:50001 from the queue to save space.
packetdrill test demonstrating the bug
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
+0.100 < . 1:1(0) ack 1 win 1024
+0 accept(3, ..., ...) = 4
+0.01 < . 1001:2001(1000) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 1001:2001>
+0.01 < . 1001:3001(2000) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 1001:2001 1001:3001>
Fixes: 9f5afeae51 ("tcp: use an RB tree for ooo receive queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Support IPv6
Here is a set of patches that add IPv6 support. They need to be applied on
top of the just-posted miscellaneous fix patches. They are:
(1) Make autobinding of an unconnected socket work when sendmsg() is
called to initiate a client call.
(2) Don't specify the protocol when creating the client socket, but rather
take the default instead.
(3) Use rxrpc_extract_addr_from_skb() in a couple of places that were
doing the same thing manually. This allows the IPv6 address
extraction to be done in fewer places.
(4) Add IPv6 support. With this, calls can be made to IPv6 servers from
userspace AF_RXRPC programs; AFS, however, can't use IPv6 yet as the
RPC calls need to be upgradeable.
====================
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here's a set of miscellaneous fix patches. There are a couple of points of
note:
(1) There is one non-fix patch that adjusts the call ref tracking
tracepoint to make kernel API-held refs on calls more obvious. This
is a prerequisite for the patch that fixes prealloc refcounting.
(2) The final patch alters how jumbo packets that partially exceed the
receive window are handled. Previously, space was being left in the
Rx buffer for them, but this significantly hurts performance as the Rx
window can't be increased to match the OpenAFS Tx window size.
Instead, the excess subpackets are discarded and an EXCEEDS_WINDOW ACK
is generated for the first. To avoid the problem of someone trying to
run the kernel out of space by feeding the kernel a series of
overlapping maximal jumbo packets, we stop allowing jumbo packets on a
call if we encounter more than three jumbo packets with duplicate or
excessive subpackets.
====================
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
A few more fixes:
* better mesh path fixing, from Thomas
* fix TIM IE recalculation after sending frames
to a sleeping station, from Felix
* fix sequence number assignment while sending
frames to a sleeping station, also from Felix
* validate number of probe response CSA counter
offsets, fixing a copy/paste bug (from myself)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ovs kernel data path currently defers the execution of all
recirc actions until stack utilization is at a minimum.
This is too limiting for some packet forwarding scenarios due to
the small size of the deferred action FIFO (10 entries). For
example, broadcast traffic sent out more than 10 ports with
recirculation results in packet drops when the deferred action
FIFO becomes full, as reported here:
http://openvswitch.org/pipermail/dev/2016-March/067672.html
Since the current recursion depth is available (it is already tracked
by the exec_actions_level pcpu variable), we can use it to determine
whether to execute recirculation actions immediately (safe when
recursion depth is low) or defer execution until more stack space is
available.
With this change, the deferred action fifo size becomes a non-issue
for currently failing scenarios because it is no longer used when
there are three or fewer recursions through ovs_execute_actions().
Suggested-by: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c3f8324188 "net: Add full IPv6 addresses to flow_keys" added an
unused instance of struct flow_dissector_key_addrs into struct fl_flow_key,
remove it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This action is intended to be an upgrade from a usability perspective
from pedit (as well as operational debugability).
Compare this:
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action pedit munge offset -14 u8 set 0x02 \
munge offset -13 u8 set 0x15 \
munge offset -12 u8 set 0x15 \
munge offset -11 u8 set 0x15 \
munge offset -10 u16 set 0x1515 \
pipe
to:
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod dmac 02:15:15:15:15:15
Also try to do a MAC address swap with pedit or worse
try to debug a policy with destination mac, source mac and
etherype. Then make few rules out of those and you'll get my point.
In the future common use cases on pedit can be migrated to this action
(as an example different fields in ip v4/6, transports like tcp/udp/sctp
etc). For this first cut, this allows modifying basic ethernet header.
The most important ethernet use case at the moment is when redirecting or
mirroring packets to a remote machine. The dst mac address needs a re-write
so that it doesnt get dropped or confuse an interconnecting (learning) switch
or dropped by a target machine (which looks at the dst mac). And at times
when flipping back the packet a swap of the MAC addresses is needed.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a small skb_at_tc_ingress() helper for testing for ingress, so
make use of it. cls_bpf already uses it and so should act_bpf.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb_mac_header_was_set() test in cls_bpf's and act_bpf's fast-path is
actually unnecessary and can be removed altogether. This was added by
commit a166151cbe ("bpf: fix bpf helpers to use skb->mac_header relative
offsets"), which was later on improved by 3431205e03 ("bpf: make programs
see skb->data == L2 for ingress and egress"). We're always guaranteed to
have valid mac header at the time we invoke cls_bpf_classify() or tcf_bpf().
Reason is that since 6d1ccff627 ("net: reset mac header in dev_start_xmit()")
we do skb_reset_mac_header() in __dev_queue_xmit() before we could call
into sch_handle_egress() or any subsequent enqueue. sch_handle_ingress()
always sees a valid mac header as well (things like skb_reset_mac_len()
would badly fail otherwise). Thus, drop the unnecessary test in classifier
and action case.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove rcu_read_lock protection from tunnel_key_dump and use
rtnl_dereference, dump operation is protected by rtnl lock.
Also, remove rcu_read_lock from tunnel_key_release and use
rcu_dereference_protected.
Both operations are running exclusively and a writer couldn't modify
t->params while those functions are executed.
Fixes: 54d94fd89d90 ('net/sched: Introduce act_tunnel_key')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For an array, there's no need to use &array, so just use the
plain wiphy->addresses[i].addr here to silence smatch.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Based on consecutive msdu failures, mac80211 triggers CQM packet-loss
mechanism. Drivers like ath10k that have its own connection monitoring
algorithm, offloaded to firmware for triggering station kickout. In case
of station kickout, driver will report low ack status by mac80211 API
(ieee80211_report_low_ack).
This flag will enable the driver to completely rely on firmware events
for station kickout and bypass mac80211 packet loss mechanism.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>