Commit Graph

45659 Commits

Author SHA1 Message Date
Vivien Didelot
9f91484f6f net: dsa: make "label" property optional for dsa2
In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
phandles are respectively mandatory and exclusive to CPU port and DSA
link device tree nodes.

Simplify dsa2.c a bit by checking the presence of such phandle instead
of checking the redundant "label" property.

Then the Linux philosophy for Ethernet switch ports is to expose them to
userspace as standard NICs by default. Thus use the standard enumerated
"eth%d" device name if no "label" property is provided for a user port.
This allows to save DTS files from subjective net device names.

If one wants to rename an interface, udev rules can be used as usual.

Of course the current behavior is unchanged, and the optional "label"
property for user ports has precedence over the enumerated name.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:26:15 -05:00
Eric Dumazet
7cfd5fd5a9 gro: use min_t() in skb_gro_reset_offset()
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 08:15:40 -05:00
Jorge Ramirez-Ortiz
7acec26cec cfg80211: wext does not need to set monitor channel in managed mode
There is not a valid reason to attempt setting the monitor channel
while in managed mode. Since this code path only deals with this mode,
remove the code block.

Johannes: I'll note that the comment indicated it was for backward
compatibility, but the code wasn't functional since switching the
monitor channel isn't supported (any more?) when in managed mode, as
that mode owns the channel configuration. Additionally, since monitor
can't be done on a managed mode interface, this would only have had
any effect to start with if a separate monitor interface is present,
in which case it's better to change the channel through that anyway,
if even possible.

Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 14:10:44 +01:00
Alexander Duyck
8c2dd3e4a4 mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent.  First
we move the page_frag_ portion of the name to the front of the functions
names.  Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this.  I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs.  Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix.  This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Herbert Xu
57ea52a865 gro: Disable frag0 optimization on IPv6 ext headers
The GRO fast path caches the frag0 address.  This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.

This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.

This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.

Fixes: 78a478d0ef ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:30:33 -05:00
Herbert Xu
1272ce87fa gro: Enter slow-path if there is no tailroom
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978cbf ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:26:12 -05:00
Julian Wiedmann
dc5367bcc5 net/af_iucv: don't use paged skbs for TX on HiperSockets
With commit e53743994e
("af_iucv: use paged SKBs for big outbound messages"),
we transmit paged skbs for both of AF_IUCV's transport modes
(IUCV or HiperSockets).
The qeth driver for Layer 3 HiperSockets currently doesn't
support NETIF_F_SG, so these skbs would just be linearized again
by the stack.
Avoid that overhead by using paged skbs only for IUCV transport.

cc stable, since this also circumvents a significant skb leak when
sending large messages (where the skb then needs to be linearized).

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: e53743994e ("af_iucv: use paged SKBs for big outbound messages")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:08:29 -05:00
Sowmini Varadhan
a505e58252 packet: pdiag_put_ring() should return TX_RING info for TPACKET_V3
Commit 7f953ab2ba ("af_packet: TX_RING support for TPACKET_V3")
now makes it possible to use TX_RING with TPACKET_V3, so make the
the relevant information available via 'ss -e -a --packet'

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:02:42 -05:00
Anna, Suman
5d722b3024 net: add the AF_QIPCRTR entries to family name tables
Commit bdabad3e36 ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.

Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:50:59 -05:00
Stephen Boyd
3512a1ad56 net: qrtr: Mark 'buf' as little endian
Failure to mark this pointer as __le32 causes checkers like
sparse to complain:

net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:274:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:274:16:    got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:275:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:275:16:    got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:276:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:276:16:    got restricted __le32 [usertype] <noident>

Silence it.

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:45:04 -05:00
Florian Fainelli
faf3a932fb net: dsa: Ensure validity of dst->ds[0]
It is perfectly possible to have non zero indexed switches being present
in a DSA switch tree, in such a case, we will be deferencing a NULL
pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
and ensure that dst->ds[0] is valid before doing anything with it.

Fixes: 0c73c523cf ("net: dsa: Initialize CPU port ethtool ops per tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:18:52 -05:00
Eric Dumazet
d9584d8ccc net: skb_flow_get_be16() can be static
Removes following sparse complain :

net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?

Fixes: 972d3876fa ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 13:30:13 -05:00
Tobias Klauser
dc647ec88e net: socket: Make unnecessarily global sockfs_setattr() static
Make sockfs_setattr() static as it is not used outside of net/socket.c

This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 11:29:50 -05:00
Florian Westphal
75cda62d9c xfrm: state: simplify rcu_read_unlock handling in two spots
Instead of:
  if (foo) {
      unlock();
      return bar();
   }
   unlock();
do:
   unlock();
   if (foo)
       return bar();

This is ok because rcu protected structure is only dereferenced before
the conditional.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:14 +01:00
Florian Westphal
711059b975 xfrm: add and use xfrm_state_afinfo_get_rcu
xfrm_init_tempstate is always called from within rcu read side section.
We can thus use a simpler function that doesn't call rcu_read_lock
again.

While at it, also make xfrm_init_tempstate return value void, the
return value was never tested.

A followup patch will replace remaining callers of xfrm_state_get_afinfo
with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
get_afinfo interface.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
af5d27c4e1 xfrm: remove xfrm_state_put_afinfo
commit 44abdc3047
("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
xfrm_state_put_afinfo equivalent to rcu_read_unlock.

Use spatch to replace it with direct calls to rcu_read_unlock:

@@
struct xfrm_state_afinfo *a;
@@

-  xfrm_state_put_afinfo(a);
+  rcu_read_unlock();

old:
 text    data     bss     dec     hex filename
22570      72     424   23066    5a1a xfrm_state.o
 1612       0       0    1612     64c xfrm_output.o
new:
22554      72     424   23050    5a0a xfrm_state.o
 1596       0       0    1596     63c xfrm_output.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
423826a7b1 xfrm: avoid rcu sparse warning
xfrm/xfrm_state.c:1973:21: error: incompatible types in comparison expression (different address spaces)

Harmless, but lets fix it to reduce the noise.

While at it, get rid of unneeded NULL check, its never hit:

net/ipv4/xfrm4_state.c: xfrm_state_register_afinfo(&xfrm4_state_afinfo);
net/ipv6/xfrm6_state.c: return xfrm_state_register_afinfo(&xfrm6_state_afinfo);
net/ipv6/xfrm6_state.c: xfrm_state_unregister_afinfo(&xfrm6_state_afinfo);

... are the only callsites.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
961a9c0a4a xfrm: remove unused function
Has been ifdef'd out for more than 10 years, remove it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:12 +01:00
Vivien Didelot
3a89eaa65d net: dsa: select NET_SWITCHDEV
The support for DSA Ethernet switch chips depends on TCP/IP networking,
thus explicit that HAVE_NET_DSA depends on INET.

DSA uses SWITCHDEV, thus select it instead of depending on it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 17:17:30 -05:00
Eric Dumazet
b369e7fd41 tcp: make TCP_INFO more consistent
tcp_get_info() has to lock the socket, so lets lock it
for an extended critical section, so that various fields
have consistent values.

This solves an annoying issue that some applications
reported when multiple counters are updated during one
particular rx/rx event, and TCP_INFO was called from
another cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 17:07:54 -05:00
Alexei Starovoitov
39f19ebbf5 bpf: rename ARG_PTR_TO_STACK
since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:56:27 -05:00
Eric Dumazet
6bb629db5e tcp: do not export tcp_peer_is_proven()
After commit 1fb6f159fd ("tcp: add tcp_conn_request"),
tcp_peer_is_proven() no longer needs to be exported.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:39 -05:00
Pavel Tikhomirov
b007f09072 ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:38 -05:00
Alexander Alemayhu
67c408cfa8 ipv6: fix typos
o s/approriate/appropriate
o s/discouvery/discovery

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:15 -05:00
Ursula Braun
f16a7dd5cf smc: netlink interface for SMC sockets
Support for SMC socket monitoring via netlink sockets of protocol
NETLINK_SOCK_DIAG.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:41 -05:00
Ursula Braun
b38d732477 smc: socket closing and linkgroup cleanup
smc_shutdown() and smc_release() handling
delayed linkgroup cleanup for linkgroups without connections

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun
952310ccf2 smc: receive data from RMBE
move RMBE data into user space buffer and update managing cursors

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun
e6727f3900 smc: send data (through RDMA)
copy data to kernel send buffer, and trigger RDMA write

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun
5f08318f61 smc: connection data control (CDC)
send and receive CDC messages (via IB message send and CQE)

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun
9bf9abead2 smc: link layer control (LLC)
send and receive LLC messages CONFIRM_LINK (via IB message send and CQE)

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun
bd4ad57718 smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR
Prepare the link for RDMA transport:
Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun
f38ba179c6 smc: work request (WR) base for use by LLC and CDC
The base containers for RDMA transport are work requests and completion
queue entries processed through Infiniband verbs:
* allocate and initialize these areas
* map these areas to DMA
* implement the basic communication consisting of work request posting
  and receival of completion queue events

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun
cd6851f303 smc: remote memory buffers (RMBs)
* allocate data RMB memory for sending and receiving
* size depends on the maximum socket send and receive buffers
* allocated RMBs are kept during life time of the owning link group
* map the allocated RMBs to DMA

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun
0cfdd8f92c smc: connection and link group creation
* create smc_connection for SMC-sockets
* determine suitable link group for a connection
* create a new link group if necessary

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun
a046d57da1 smc: CLC handshake (incl. preparation steps)
* CLC (Connection Layer Control) handshake

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Thomas Richter
6812baabf2 smc: establish pnet table management
Connection creation with SMC-R starts through an internal
TCP-connection. The Ethernet interface for this TCP-connection is not
restricted to the Ethernet interface of a RoCE device. Any existing
Ethernet interface belonging to the same physical net can be used, as
long as there is a defined relation between the Ethernet interface and
some RoCE devices. This relation is defined with the help of an
identification string called "Physical Net ID" or short "pnet ID".
Information about defined pnet IDs and their related Ethernet
interfaces and RoCE devices is stored in the SMC-R pnet table.

A pnet table entry consists of the identifying pnet ID and the
associated network and IB device.
This patch adds pnet table configuration support using the
generic netlink message interface referring to network and IB device
by their names. Commands exist to add, delete, and display pnet table
entries, and to flush or display the entire pnet table.

There are cross-checks to verify whether the ethernet interfaces
or infiniband devices really exist in the system. If either device
is not available, the pnet ID entry is not created.
Loss of network devices and IB devices is also monitored;
a pnet ID entry is removed when an associated network or
IB device is removed.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:38 -05:00
Ursula Braun
a4cf0443c4 smc: introduce SMC as an IB-client
* create a list of SMC IB-devices

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:38 -05:00
Ursula Braun
ac7138746e smc: establish new socket family
* enable smc module loading and unloading
 * register new socket family
 * basic smc socket creation and deletion
 * use backing TCP socket to run CLC (Connection Layer Control)
   handshake of SMC protocol
 * Setup for infiniband traffic is implemented in follow-on patches.
   For now fallback to TCP socket is always used.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:38 -05:00
Ursula Braun
4b9d07a440 net: introduce keepalive function in struct proto
Direct call of tcp_set_keepalive() function from protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering. And newly introduced protocol (SMC-R) will need its own
keepalive function. Therefore, add "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:37 -05:00
Jesper Dangaard Brouer
7ba91ecb16 net: for rate-limited ICMP replies save one atomic operation
It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.

The icmp_global_allow() call states it must be called with BH
disabled.  This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock.  Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:49:12 -05:00
Jesper Dangaard Brouer
c0303efeab net: reduce cycles spend on ICMP replies that gets rate limited
This patch split the global and per (inet)peer ICMP-reply limiter
code, and moves the global limit check to earlier in the packet
processing path.  Thus, avoid spending cycles on ICMP replies that
gets limited/suppressed anyhow.

The global ICMP rate limiter icmp_global_allow() is a good solution,
it just happens too late in the process.  The kernel goes through the
full route lookup (return path) for the ICMP message, before taking
the rate limit decision of not sending the ICMP reply.

Details: The kernels global rate limiter for ICMP messages got added
in commit 4cdf507d54 ("icmp: add a global rate limitation").  It is
a token bucket limiter with a global lock.  It brilliantly avoids
locking congestion by only updating when 20ms (HZ/50) were elapsed. It
can then avoids taking lock when credit is exhausted (when under
pressure) and time constraint for refill is not yet meet.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:49:12 -05:00
Jesper Dangaard Brouer
8d9ba388f3 Revert "icmp: avoid allocating large struct on stack"
This reverts commit 9a99d4a50c ("icmp: avoid allocating large struct
on stack"), because struct icmp_bxm no really a large struct, and
allocating and free of this small 112 bytes hurts performance.

Fixes: 9a99d4a50c ("icmp: avoid allocating large struct on stack")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:49:12 -05:00
David S. Miller
aaa9c1071d Merge tag 'rxrpc-rewrite-20170109' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
afs: Refcount afs_call struct

These patches provide some tracepoints for AFS and fix a potential leak by
adding refcounting to the afs_call struct.

The patches are:

 (1) Add some tracepoints for logging incoming calls and monitoring
     notifications from AF_RXRPC and data reception.

 (2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
     initially expected.  It can be brought back later if needed.  This
     clears some stuff out that I don't then need to fix up in (4).

 (3) Allow listen(..., 0) to be used to disable listening.  This makes
     shutting down the AFS cache manager server in the kernel much easier
     and the accounting simpler as we can then be sure that (a) all
     preallocated afs_call structs are relesed and (b) no new incoming
     calls are going to be started.

     For the moment, listening cannot be reenabled.

 (4) Add refcounting to the afs_call struct to fix a potential multiple
     release detected by static checking and add a tracepoint to follow the
     lifecycle of afs_call objects.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:47:52 -05:00
Florian Fainelli
a82f67afe8 net: dsa: Make dsa_switch_ops const
Now that we have properly encapsulated and made drivers utilize exported
functions, we can switch dsa_switch_ops to be a annotated with const.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:44:50 -05:00
Florian Fainelli
ab3d408d3f net: dsa: Encapsulate legacy switch drivers into dsa_switch_driver
In preparation for making struct dsa_switch_ops const, encapsulate it
within a dsa_switch_driver which has a list pointer and a pointer to
dsa_switch_ops. This allows us to take the list_head pointer out of
dsa_switch_ops, which is written to by {un,}register_switch_driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:44:50 -05:00
David S. Miller
bb1d303444 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-09 15:39:11 -05:00
Davide Caratti
c008b33f3e net/sched: act_csum: compute crc32c on SCTP packets
modify act_csum to compute crc32c on IPv4/IPv6 packets having SCTP in
their payload, and extend UAPI definitions accordingly.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 14:36:57 -05:00
Davide Caratti
ab9d226e3e net/sched: Kconfig: select LIBCRC32C if NET_ACT_CSUM is selected
LIBCRC32C is needed to compute crc32c on SCTP packets.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 14:36:57 -05:00
Alexandru Moise
58fa118f3d cls_u32: don't bother explicitly initializing ->divisor to zero
This struct member is already initialized to zero upon root_ht's
allocation via kzalloc().

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 14:27:16 -05:00
Jason A. Donenfeld
fe62d05b29 syncookies: use SipHash in place of SHA1
SHA1 is slower and less secure than SipHash, and so replacing syncookie
generation with SipHash makes natural sense. Some BSDs have been doing
this for several years in fact.

The speedup should be similar -- and even more impressive -- to the
speedup from the sequence number fix in this series.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 13:58:57 -05:00