pppol2tp_getsockopt() doesn't take into account the error code returned
by pppol2tp_tunnel_getsockopt() or pppol2tp_session_getsockopt(). If
error occurs there, pppol2tp_getsockopt() continues unconditionally and
reports erroneous values.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
pppol2tp_setsockopt() unconditionally overwrites the error value
returned by pppol2tp_tunnel_setsockopt() or
pppol2tp_session_setsockopt(), thus hiding errors from userspace.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the support for the 4-bytes tag for DSA port distinguishing inserted
allowing receiving and transmitting the packet via the particular port.
The tag is being added after the source MAC address in the ethernet
header.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Landen Chao <Landen.Chao@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've added a considerable amount of fixes for stalls and issues
with the blk-mq scheduling in the 4.11 series since forking
off the for-4.12/block branch. We need to do improvements on
top of that for 4.12, so pull in the previous fixes to make
our lives easier going forward.
Signed-off-by: Jens Axboe <axboe@fb.com>
The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO
to catch more spurious timeouts") interacts badly with certain
broken middle-boxes. These broken boxes modify and falsely raise
the receive window on the ACKs. During a timeout induced recovery,
F-RTO would send new data packets to probe if the timeout is false
or not. Since the receive window is falsely raised, the receiver
would silently drop these F-RTO packets. The recovery would take N
(exponentially backoff) timeouts to repair N packet losses. A TCP
performance killer.
Due to this unfortunate situation, this patch removes this extension
to revert F-RTO back to the RFC specification.
Fixes: 89fe18e44f ("tcp: extend F-RTO to catch more spurious timeouts")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove & from function pointers to conform to the style found elsewhere
in the file. Done using the following semantic patch
// <smpl>
@r@
identifier f;
@@
f(...) { ... }
@@
identifier r.f;
@@
- &f
+ f
// </smpl>
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch replace list_entry with list_prev_entry as it makes the
code more clear to read.
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The following Coccinelle script was used to detect this:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T*)x)->f
|
- (T*)
e
)
Unnecessary parantheses are also remove.
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
ip_route_input when iif is given. If a multipath route is present for
the designated destination, fib_multipath_hash ends up being called with
that skb. However, as that skb contains no information beyond the
protocol type, the calculated hash does not match the one we would see
for a real packet.
There is currently no way to fix this for layer 4 hashing, as
RTM_GETROUTE doesn't have the necessary information to create layer 4
headers. To fix this for layer 3 hashing, set appropriate saddr/daddrs
in the skb and also change the protocol to UDP to avoid special
treatment for ICMP.
Signed-off-by: Florian Larysch <fl@n621.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add and use nfnl_msg_type() function to replace opencoded nfnetlink
message type. I suggested this change, Arushi Singhal made an initial
patch to address this but was missing several spots.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Code and Style cleanups, by Sven Eckelmann (5 patches)
- Remove an unneccessary memset, by Tobias Klauser
- DAT and BLA optimizations for various corner cases, by Andreas Pape
(5 patches)
- forward/rebroadcast packet restructuring, by Linus Luessing
(2 patches)
- ethtool cleanup and remove unncessary code, by Sven Eckelmann
(4 patches)
- use net_device_stats from net_device instead of private copy,
by Tobias Klauser
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Miscellany
Here's a set of patches that make some minor changes to AF_RXRPC:
(1) Store error codes in struct rxrpc_call::error as negative codes and
only convert to positive in recvmsg() to avoid confusion inside the
kernel.
(2) Note the result of trying to abort a call (this fails if the call is
already 'completed').
(3) Don't abort on temporary errors whilst processing challenge and
response packets, but rather drop the packet and wait for
retransmission.
And also adds some more tracing:
(4) Protocol errors.
(5) Received abort packets.
(6) Changes in the Rx window size due to ACK packet information.
(7) Client call initiation (to allow the rxrpc_call struct pointer, the
wire call ID and the user ID/afs_call pointer to be cross-referenced).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp doesn't check sock's state before listening on it. It could
even cause changing a sock with any state to become a listening sock
when doing sctp_listen.
This patch is to fix it by checking sock's state in sctp_listen, so
that it will listen on the sock with right state.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Existing L2TP kernel code does not derive the optimal MTU for Ethernet
pseudowires and instead leaves this to a userspace L2TP daemon or
operator. If an MTU is not specified, the existing kernel code chooses
an MTU that does not take account of all tunnel header overheads, which
can lead to unwanted IP fragmentation. When L2TP is used without a
control plane (userspace daemon), we would prefer that the kernel does a
better job of choosing a default pseudowire MTU, taking account of all
tunnel header overheads, including IP header options, if any. This patch
addresses this.
Change-set here uses the new kernel function, kernel_sock_ip_overhead(),
to factor the outer IP overhead on the L2TP tunnel socket (including
IP Options, if any) when calculating the default MTU for an Ethernet
pseudowire, along with consideration of the inner Ethernet header.
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new function, kernel_sock_ip_overhead(), is provided
to calculate the cumulative overhead imposed by the IP
Header and IP options, if any, on a socket's payload.
The new function returns an overhead of zero for sockets
that do not belong to the IPv4 or IPv6 address families.
This is used in the L2TP code path to compute the
total outer IP overhead on the L2TP tunnel socket when
calculating the default MTU for Ethernet pseudowires.
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The expect check function __nf_ct_expect_check() asks the master_help is
necessary. So it is unnecessary to go ahead in ctnetlink_alloc_expect
when there is no help.
Actually the commit bc01befdcf ("netfilter: ctnetlink: add support for
user-space expectation helpers") permits ctnetlink create one expect
even though there is no master help. But the latter commit 3d058d7bc2
("netfilter: rework user-space expectation helper support") disables it
again.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
successful insert into the bysource hash sets IPS_SRC_NAT_DONE status bit
so we can check that instead of presence of nat extension which requires
extra deref.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_nat_mangle_{udp,tcp}_packet() returns int. However, it is used as
bool type in many spots. Fix this by consistently handle this return
value as a boolean.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, and the initializer fixes
were extracted from grsecurity. In this case, NULL initialize with { }
instead of undesignated NULLs.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry reported a crash when injecting faults in
attach_one_default_qdisc() and dev->qdisc is still
a noop_disc, the check before qdisc_hash_add() fails
to catch it because it tests NULL. We should test
against noop_qdisc since it is the default qdisc
at this point.
Fixes: 59cc1f61f0 ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
ip_route_input when iif is given. If a multipath route is present for
the designated destination, ip_multipath_icmp_hash ends up being called,
which uses the source/destination addresses within the skb to calculate
a hash. However, those are not set in the synthetic skb, causing it to
return an arbitrary and incorrect result.
Instead, use UDP, which gets no such special treatment.
Signed-off-by: Florian Larysch <fl@n621.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When remove one expect, it needs three statements. And there are
multiple duplicated codes in current code. So add one common function
nf_ct_remove_expect to consolidate this.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Because the type of expecting, the member of nf_conn_help, is u8, it
would overflow after reach U8_MAX(255). So it doesn't work when we
configure the max_expected exceeds 255 with expect policy.
Now add the check for max_expected. Return the -EINVAL when it exceeds
the limit.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a tracepoint (rxrpc_connect_call) to log the combination of rxrpc_call
pointer, afs_call pointer/user data and wire call parameters to make it
easier to match the tracebuffer contents to captured network packets.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint (rxrpc_rx_rwind_change) to log changes in a call's receive
window size as imposed by the peer through an ACK packet.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
packets. The following changes are made:
(1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
call and mark the call aborted. This is wrapped by
rxrpc_abort_eproto() that makes the why string usable in trace.
(2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
generation points, replacing rxrpc_abort_call() with the latter.
(3) Only send an abort packet in rxkad_verify_packet*() if we actually
managed to abort the call.
Note that a trace event is also emitted if a kernel user (e.g. afs) tries
to send data through a call when it's not in the transmission phase, though
it's not technically a receive event.
Signed-off-by: David Howells <dhowells@redhat.com>
In the rxkad security module, when we encounter a temporary error (such as
ENOMEM) from which we could conceivably recover, don't abort the
connection, but rather permit retransmission of the relevant packets to
induce a retry.
Note that I'm leaving some places that could be merged together to insert
tracing in the next patch.
Signed-off-by; David Howells <dhowells@redhat.com>
Make rxrpc_kernel_abort_call() return an indication as to whether it
actually aborted the operation or not so that kafs can trace the failure of
the operation. Note that 'success' in this context means changing the
state of the call, not necessarily successfully transmitting an ABORT
packet.
Signed-off-by: David Howells <dhowells@redhat.com>
Use negative error codes in struct rxrpc_call::error because that's what
the kernel normally deals with and to make the code consistent. We only
turn them positive when transcribing into a cmsg for userspace recvmsg.
Signed-off-by: David Howells <dhowells@redhat.com>
Pull networking fixes from David Miller:
1) Reject invalid updates to netfilter expectation policies, from Pablo
Neira Ayuso.
2) Fix memory leak in nfnl_cthelper, from Jeffy Chen.
3) Don't do stupid things if we get a neigh_probe() on a neigh entry
whose ops lack a solicit method. From Eric Dumazet.
4) Don't transmit packets in r8152 driver when the carrier is off, from
Hayes Wang.
5) Fix ipv6 packet type detection in aquantia driver, from Pavel
Belous.
6) Don't write uninitialized data into hw registers in bna driver, from
Arnd Bergmann.
7) Fix locking in ping_unhash(), from Eric Dumazet.
8) Make BPF verifier range checks able to understand certain sequences
emitted by LLVM, from Alexei Starovoitov.
9) Fix use after free in ipconfig, from Mark Rutland.
10) Fix refcount leak on force commit in openvswitch, from Jarno
Rajahalme.
11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov.
12) Fix endianness bug in be2net driver, from Suresh Reddy.
13) Don't forget to wake TX queues when processing a timeout, from
Grygorii Strashko.
14) ARP header on-stack storage is wrong in flow dissector, from Simon
Horman.
15) Lost retransmit and reordering SNMP stats in TCP can be
underreported. From Yuchung Cheng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits)
nfp: fix potential use after free on xdp prog
tcp: fix reordering SNMP under-counting
tcp: fix lost retransmit SNMP under-counting
sctp: get sock from transport in sctp_transport_update_pmtu
net: ethernet: ti: cpsw: fix race condition during open()
l2tp: fix PPP pseudo-wire auto-loading
bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
l2tp: take reference on sessions being dumped
tcp: minimize false-positives on TCP/GRO check
sctp: check for dst and pathmtu update in sctp_packet_config
flow dissector: correct size of storage for ARP
net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout
l2tp: take a reference on sessions used in genetlink handlers
l2tp: hold session while sending creation notifications
l2tp: fix duplicate session creation
l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
l2tp: fix race in l2tp_recv_common()
sctp: use right in and out stream cnt
bpf: add various verifier test cases for self-tests
bpf, verifier: fix rejection of unaligned access checks for map_value_adj
...
Currently the reordering SNMP counters only increase if a connection
sees a higher degree then it has previously seen. It ignores if the
reordering degree is not greater than the default system threshold.
This significantly under-counts the number of reordering events
and falsely convey that reordering is rare on the network.
This patch properly and faithfully records the number of reordering
events detected by the TCP stack, just like the comment says "this
exciting event is worth to be remembered". Note that even so TCP
still under-estimate the actual reordering events because TCP
requires TS options or certain packet sequences to detect reordering
(i.e. ACKing never-retransmitted sequence in recovery or disordered
state).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lost retransmit SNMP stat is under-counting retransmission
that uses segment offloading. This patch fixes that so all
retransmission related SNMP counters are consistent.
Fixes: 10d3be5692 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2017-03-03
this is a pull request of 5 patches for net-next/master.
There are two patches by Yegor Yefremov which convert the ti_hecc
driver into a DT only driver, as there is no in-tree user of the old
platform driver interface anymore. The next patch by Mario Kicherer
adds network namespace support to the can subsystem. The last two
patches by Akshay Bhat add support for the holt_hi311x SPI CAN driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When netdev events happen, a rtnetlink_event() handler will send
messages for every event in it's white list. These messages contain
current information about a particular device, but they do not include
the iformation about which event just happened. The consumer of
the message has to try to infer this information. In some cases
(ex: NETDEV_NOTIFY_PEERS), that is not possible.
This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
that would have an encoding of the which event triggered this
message. This would allow the the message consumer to easily determine
if it is interested in a particular event or not.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rtnetlink_event currently functions as a blacklist where
we block cerntain netdev events from being sent to user space.
As a result, events have been added to the system that userspace
probably doesn't care about.
This patch converts the implementation to the white list so that
newly events would have to be specifically added to the list to
be sent to userspace. This would force new event implementers to
consider whether a given event is usefull to user space or if it's
just a kernel event.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define one new macro TCP_MAX_WSCALE instead of literal number '14',
and use U16_MAX instead of 65535 as the max value of TCP window.
There is another minor change, use rounddown(space, mss) instead of
(space / mss) * mss;
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is almost to revert commit 02f3d4ce9e ("sctp: Adjust PMTU
updates to accomodate route invalidation."). As t->asoc can't be NULL
in sctp_transport_update_pmtu, it could get sk from asoc, and no need
to pass sk into that function.
It is also to remove some duplicated codes from that function.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cb_running is reported in /proc/self/net/netlink and it is reported by
the ss tool, when it gets information from the proc files.
sock_diag is a new interface which is used instead of proc files, so it
looks reasonable that this interface has to report no less information
about sockets than proc files.
We use these flags to dump and restore netlink sockets.
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using a private copy of struct net_device_stats in struct
batadv_priv, use stats from struct net_device.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
It looks like a typo to assign a return code to a variable which is not
used. Found due to a compiler warning:
net/nfc/netlink.c: In function ‘nfc_genl_activate_target’:
net/nfc/netlink.c:903:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
int rc;
^~
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP).
Fixes: f1f39f9110 ("l2tp: auto load type modules")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.
For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes: 0ad6614048 ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>