Commit Graph

44719 Commits

Author SHA1 Message Date
Phil Sutter
cdf7370391 net: batman-adv: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 11:55:07 -07:00
Phil Sutter
0a5f107b67 net: dsa: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 11:55:06 -07:00
Phil Sutter
4afbc0db72 net: 6lowpan: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 11:55:06 -07:00
Phil Sutter
ccecb2a47c net: bridge: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 11:55:06 -07:00
Phil Sutter
2e659c0551 net: 8021q: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 11:55:06 -07:00
Tom Herbert
65d7ab8de5 net: Identifier Locator Addressing module
Adding new module name ila. This implements ILA translation. Light
weight tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in destination locator of the packet. e.g.

ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
      encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0

Sets a route where 3333:0:0:1 will be overwritten by
2001:0:0:1 on output.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 21:33:06 -07:00
Tom Herbert
abc5d1ff3e net: Add inet_proto_csum_replace_by_diff utility function
This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 21:33:06 -07:00
Tom Herbert
4b048d6d9d net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
the checksum field carries a pseudo header. This argument should be a
boolean instead of an int.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 21:33:06 -07:00
Tom Herbert
2536862311 lwt: Add support to redirect dst.input
This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.

Also, save the original dst.input and and dst.out when setting up
lwtunnel redirection. These can be called by the client as a pass-
through.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 21:33:05 -07:00
Daniel Borkmann
5e8018fc61 netfilter: nf_conntrack: add efficient mark to zone mapping
This work adds the possibility of deriving the zone id from the skb->mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.

In order to not add/expose an extra ct->status bit, the zone structure has
been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18 01:24:05 +02:00
Daniel Borkmann
deedb59039 netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                        src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                        src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.

I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.

  [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18 01:22:50 +02:00
David Ahern
dc028da54e inet: Move VRF table lookup to inlined function
Table lookup compiles out when VRF is not enabled.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:58:57 -07:00
David S. Miller
0aa65cc0c2 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-08-16

Here's what's likely the last bluetooth-next pull request for 4.3:

 - 6lowpan/802.15.4 refactoring, cleanups & fixes
 - Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
 - Support for UART based QCA Bluetooth controllers
 - Power management support for Broeadcom Bluetooth controllers
 - Change LE connection initiation to always use passive scanning first
 - Support for new Silicon Wave USB ID

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:41:21 -07:00
David S. Miller
2ea273d76a net: Export bpf_prog_create_from_user().
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:37:06 -07:00
Ian Morris
ec120da6f0 ipv6: trivial whitespace fix
Change brace placement to be in line with coding standards

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:34:48 -07:00
David S. Miller
c1f066d4ee Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:

====================
Included changes:
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use
  clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by puring packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:31:42 -07:00
Martin KaFai Lau
9c7370a166 ipv6: Fix a potential deadlock when creating pcpu rt
rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock).
rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then
calls dst_alloc().  dst_alloc() _may_ call ip6_dst_gc() which takes
the write_lock(&tabl->tb6_lock).  A visualized version:

read_lock(&table->tb6_lock);
rt6_make_pcpu_route();
=> ip6_rt_pcpu_alloc();
=> dst_alloc();
=> ip6_dst_gc();
=> write_lock(&table->tb6_lock); /* oops */

The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc().

A reported stack:

[141625.537638] INFO: rcu_sched self-detected stall on CPU { 27}  (t=60000 jiffies g=4159086 c=4159085 q=2139)
[141625.547469] Task dump for CPU 27:
[141625.550881] mtr             R  running task        0 22121  22081 0x00000008
[141625.558069]  0000000000000000 ffff88103f363d98 ffffffff8106e488 000000000000001b
[141625.565641]  ffffffff81684900 ffff88103f363db8 ffffffff810702b0 0000000008000000
[141625.573220]  ffffffff81684900 ffff88103f363de8 ffffffff8108df9f ffff88103f375a00
[141625.580803] Call Trace:
[141625.583345]  <IRQ>  [<ffffffff8106e488>] sched_show_task+0xc1/0xc6
[141625.589650]  [<ffffffff810702b0>] dump_cpu_task+0x35/0x39
[141625.595144]  [<ffffffff8108df9f>] rcu_dump_cpu_stacks+0x6a/0x8c
[141625.601320]  [<ffffffff81090606>] rcu_check_callbacks+0x1f6/0x5d4
[141625.607669]  [<ffffffff810940c8>] update_process_times+0x2a/0x4f
[141625.613925]  [<ffffffff8109fbee>] tick_sched_handle+0x32/0x3e
[141625.619923]  [<ffffffff8109fc2f>] tick_sched_timer+0x35/0x5c
[141625.625830]  [<ffffffff81094a1f>] __hrtimer_run_queues+0x8f/0x18d
[141625.632171]  [<ffffffff81094c9e>] hrtimer_interrupt+0xa0/0x166
[141625.638258]  [<ffffffff8102bf2a>] local_apic_timer_interrupt+0x4e/0x52
[141625.645036]  [<ffffffff8102c36f>] smp_apic_timer_interrupt+0x39/0x4a
[141625.651643]  [<ffffffff8140b9e8>] apic_timer_interrupt+0x68/0x70
[141625.657895]  <EOI>  [<ffffffff81346ee8>] ? dst_destroy+0x7c/0xb5
[141625.664188]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
[141625.670272]  [<ffffffff81082b45>] ? queue_write_lock_slowpath+0x60/0x6f
[141625.677140]  [<ffffffff8140aa33>] _raw_write_lock_bh+0x23/0x25
[141625.683218]  [<ffffffff813d4553>] __fib6_clean_all+0x40/0x82
[141625.689124]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
[141625.695207]  [<ffffffff813d6058>] fib6_clean_all+0xe/0x10
[141625.700854]  [<ffffffff813d60d3>] fib6_run_gc+0x79/0xc8
[141625.706329]  [<ffffffff813d0510>] ip6_dst_gc+0x85/0xf9
[141625.711718]  [<ffffffff81346d68>] dst_alloc+0x55/0x159
[141625.717105]  [<ffffffff813d09b5>] __ip6_dst_alloc.isra.32+0x19/0x63
[141625.723620]  [<ffffffff813d1830>] ip6_pol_route+0x36a/0x3e8
[141625.729441]  [<ffffffff813d18d6>] ip6_pol_route_output+0x11/0x13
[141625.735700]  [<ffffffff813f02c8>] fib6_rule_action+0xa7/0x1bf
[141625.741698]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
[141625.748043]  [<ffffffff81357c48>] fib_rules_lookup+0xb5/0x12a
[141625.754050]  [<ffffffff81141628>] ? poll_select_copy_remaining+0xf9/0xf9
[141625.761002]  [<ffffffff813f0535>] fib6_rule_lookup+0x37/0x5c
[141625.766914]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
[141625.773260]  [<ffffffff813d008c>] ip6_route_output+0x7a/0x82
[141625.779177]  [<ffffffff813c44c8>] ip6_dst_lookup_tail+0x53/0x112
[141625.785437]  [<ffffffff813c45c3>] ip6_dst_lookup_flow+0x2a/0x6b
[141625.791604]  [<ffffffff813ddaab>] rawv6_sendmsg+0x407/0x9b6
[141625.797423]  [<ffffffff813d7914>] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2
[141625.804464]  [<ffffffff8139d4b4>] inet_sendmsg+0x57/0x8e
[141625.810028]  [<ffffffff81329ba3>] sock_sendmsg+0x2e/0x3c
[141625.815588]  [<ffffffff8132be57>] SyS_sendto+0xfe/0x143
[141625.821063]  [<ffffffff813dd551>] ? rawv6_setsockopt+0x5e/0x67
[141625.827146]  [<ffffffff8132c9f8>] ? sock_common_setsockopt+0xf/0x11
[141625.833660]  [<ffffffff8132c08c>] ? SyS_setsockopt+0x81/0xa2
[141625.839565]  [<ffffffff8140ac17>] entry_SYSCALL_64_fastpath+0x12/0x6a

Fixes: d52d3997f8 ("pv6: Create percpu rt6_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:03 -07:00
Martin KaFai Lau
a73e419563 ipv6: Add rt6_make_pcpu_route()
It is a prep work for fixing a potential deadlock when creating
a pcpu rt.

The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
exist.  This patch moves the pcpu rt creation logic into another function,
rt6_make_pcpu_route().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:03 -07:00
Martin KaFai Lau
ad70686289 ipv6: Remove un-used argument from ip6_dst_alloc()
After 4b32b5ad31 ("ipv6: Stop rt6_info from using inet_peer's metrics"),
ip6_dst_alloc() does not need the 'table' argument.  This patch
cleans it up.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:03 -07:00
David S. Miller
2bd736fa0d Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
Another pull request for the next cycle, this time with quite
a bit of content:
 * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
 * TDLS higher bandwidth support (Arik)
 * OCB fixes from Bertold Van den Bergh
 * suspend/resume fixes from Eliad
 * dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
 * VHT bitrate mask support (Lorenzo Bianconi)
 * better regulatory support for 5/10 MHz channels (Matthias May)
 * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
along with a number of other cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:25:04 -07:00
Willem de Bruijn
f2e520956a packet: add extended BPF fanout mode
Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF
program to select a socket.

Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:48 -07:00
Willem de Bruijn
47dceb8ecd packet: add classic BPF fanout mode
Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.

This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.

Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:47 -07:00
Jiri Benc
a1c234f95c lwtunnel: rename ip lwtunnel attributes
We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
very similar, yet they serve very different purpose. This is confusing for
anyone trying to implement a user space tool supporting lwt.

As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
logical to have them in lwtunnel.h together with the encap enum.

Fixes: 3093fbe7ff ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:07:15 -07:00
David S. Miller
c87acb2558 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-08-17

1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
   From Thomas Egerer.

2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
   From Andrzej Hajda.

3) Pass oif to the xfrm lookups so that it gets set on the flow
   and the resolver routines can match based on oif.
   From David Ahern.

4) Add documentation for the new xfrm garbage collector threshold.
   From Alexander Duyck.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:05:14 -07:00
David S. Miller
a27cc68b9b Merge tag 'mac80211-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
We have a single bugfix for an invalid memory read.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:03:10 -07:00
Calvin Owens
5d37852bf7 Revert "net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN"
Commit 8133534c76 ("net: limit tcp/udp rmem/wmem to
SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
written to them are not less than SOCK_MIN_{RCV,SND}BUF.

That change causes 4096 to no longer be accepted as a valid value for
'min' in tcp_wmem and udp_wmem_min. 4096 has been the default for both
of those sysctls for a long time, and unfortunately seems to be an
extremely popular setting. This change breaks a large number of sysctl
configurations at Facebook.

That commit referred to b1cb59cf2e ("net: sysctl_net_core: check
SNDBUF and RCVBUF for min length"), which choose to use the SOCK_MIN
constants as the lower limits to avoid nasty bugs. But AFAICS, a limit
of SOCK_MIN_SNDBUF isn't necessary to do that: the BUG_ON cited in the
commit message seems to have happened because unix_stream_sendmsg()
expects a minimum of a full page (ie SK_MEM_QUANTUM) and the math broke,
not because it had less than SOCK_MIN_SNDBUF allocated.

This particular issue doesn't seem to affect TCP however: using a
setting of "1 1 1" for tcp_{r,w}mem works, although it's obviously
suboptimal. SK_MEM_QUANTUM would be a nice minimum, but it's 64K on
some archs, so there would still be breakage.

Since a value of one doesn't seem to cause any problems, we can drop the
minimum 8133534c added to fix this.

This reverts commit 8133534c76.

Fixes: 8133534c76 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 12:10:30 -07:00
Phil Sutter
4b46995568 net: sch_generic: react upon IFF_NO_QUEUE flag
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 11:50:18 -07:00
Trond Myklebust
37bfcc14b2 Merge branch 'bugfixes'
* bugfixes:
  SUNRPC: Fix a thinko in xs_connect()
  NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()
  NFS: nfs_set_pgio_error sometimes misses errors
2015-08-17 13:36:22 -05:00
Trond Myklebust
eed889b161 Merge tag 'nfs-rdma-for-4.3' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFS over RDMA Client Side Changes

These patches improve both client performance and scalability, most notably
by increasing the maixmum allowed rsize and wsize and by increasing the number
of RDMA "credits".  There are also several bugfixes, such as correcting how
WRITE compounds are encoded and fixing large NFS symlink operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-17 13:33:58 -05:00
Trond Myklebust
99b1a4c32a SUNRPC: Fix a thinko in xs_connect()
It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:05:49 -05:00
Richard Alpe
8f8ff9135b tipc: don't sanity check non-existing TLV (NL compat)
A zero length payload means that no TLV (Type Length Value) data has
been passed. Prior to this patch a non-existing TLV could be sanity
checked with TLV_OK() resulting in random behavior where a user
sending an empty message occasionally got a incorrect "operation not
supported" message back.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 10:39:54 -07:00
Herbert Xu
de0ded77b9 ipsec: Replace seqniv with seqiv
Now that seqniv is identical with seqiv we no longer need it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-17 16:53:42 +08:00
Christophe Ricard
adca3c38d8 nfc: netlink: Warning fix
When NFC_ATTR_VENDOR_DATA is not set, data_len is 0 and data is NULL.

Fixes the following warning:

net/nfc/netlink.c:1536:3: warning: 'data' may be used uninitialized
+in this function [-Wmaybe-uninitialized]
      return cmd->doit(dev, data, data_len);

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-08-17 10:45:19 +02:00
Eric Dumazet
1e31367899 ipv4: fix refcount leak in fib_check_nh()
fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.

This patch solves the typical message at reboot time or device
dismantle :

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-16 22:14:32 -07:00
Christophe Ricard
fe202fe955 nfc: netlink: Add check on NFC_ATTR_VENDOR_DATA
NFC_ATTR_VENDOR_DATA is an optional vendor_cmd argument.
The current code was potentially using a non existing argument
leading to potential catastrophic results.

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-08-17 01:36:17 +02:00
Alexander Aring
c0015bf3a3 ieee802154: 6lowpan: fix non-lowpan wpan interfaces
We receive all 802.15.4 frames on the packet handler "lowpan_rcv" this
patch checks if the wpan device belongs to a lowpan interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-15 23:28:09 +02:00
Alexander Aring
0751272880 ieee802154: 6lowpan: fix packet layer registration
This patch fixes 802.15.4 packet layer registration when mutliple
lowpan interfaces will be added. We need to register the packet layer at
the first lowpan interface and deregister it at the last interface. This
done by open_count variable which is protected by rtnl.

Additional do a quiet fix by adding dev_put(real_dev) when netdev
registration fails, which fix the refcount for the wpan dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-15 23:28:09 +02:00
Linus Lüssing
53cf037bf8 batman-adv: Fix potentially broken skb network header access
The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
need a correctly set skb network header.

Unfortunately we cannot rely on the device drivers to set it for us.
Therefore setting it in the beginning of the according ndo_start_xmit
handler.

Fixes: 1d8ab8d3c1 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3d ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:10 +02:00
Simon Wunderlich
3f1e08d0ae batman-adv: remove broadcast packets scheduled for purged outgoing if
When an interface is purged, the broadcast packets scheduled for this
interface should get purged as well.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:09 +02:00
Marek Lindner
1f15510164 batman-adv: protect tt request from double deletion
The list_del() calls were changed to list_del_init() to prevent
an accidental double deletion in batadv_tt_req_node_new().

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:09 +02:00
Linus Lüssing
8a4023c5b5 batman-adv: Fix potential synchronization issues in mcast tvlv handler
So far the mcast tvlv handler did not anticipate the processing of
multiple incoming OGMs from the same originator at the same time. This
can lead to various issues:

* Broken refcounting: For instance two mcast handlers might both assume
  that an originator just got multicast capabilities and will together
  wrongly decrease mcast.num_disabled by two, potentially leading to
  an integer underflow.

* Potential kernel panic on hlist_del_rcu(): Two mcast handlers might
  one after another try to do an
  hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will
  cause memory corruption / crashes.
  (Reported by: Sven Eckelmann <sven@narfation.org>)

Right in the beginning the code path makes assumptions about the current
multicast related state of an originator and bases all updates on that. The
easiest and least error prune way to fix the issues in this case is to
serialize multiple mcast handler invocations with a spinlock.

Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:52:08 +02:00
Linus Lüssing
9c936e3f4c batman-adv: Make MCAST capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:51:40 +02:00
Linus Lüssing
ac4eebd484 batman-adv: Make TT capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: e17931d1a6 ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:50:43 +02:00
Linus Lüssing
4635469f5c batman-adv: Make NC capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 3f4841ffb3 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:50:43 +02:00
Linus Lüssing
65d7d46050 batman-adv: Make DAT capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 17cf0ea455 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-14 22:50:42 +02:00
Johannes Berg
40d9a38ad3 mac80211: use DECLARE_EWMA
Instead of using the out-of-line average calculation, use the new
DECLARE_EWMA() macro to declare a signal EWMA, and use that.

This actually *reduces* the code size slightly (on x86-64) while
also reducing the station info size by 80 bytes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:53 +02:00
Lorenzo Bianconi
b119ad6e72 mac80211: add rate mask logic for vht rates
Define rc_rateidx_vht_mcs_mask array and rate_idx_match_vht_mcs_mask()
method in order to apply mcs mask for vht rates

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:51 +02:00
Lorenzo Bianconi
e910867bd2 mac80211: define rate_control_apply_mask_ratetbl()
Define rate_control_apply_mask_ratetbl() in order to apply ratemask in
rate_control_set_rates() for station rate table

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:51 +02:00
Lorenzo Bianconi
90c66bd223 mac80211: remove ieee80211_tx_rate dependency in rate mask code
Remove ieee80211_tx_rate dependency in rate_idx_match_legacy_mask(),
rate_idx_match_mcs_mask() and rate_idx_match_mask() in order to use the
previous logic to define a ratemask in rate_control_set_rates() for
station rate table. Moreover move rate mask definition logic in
rate_control_cap_mask()

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:50 +02:00
Lorenzo Bianconi
35225eb7a5 mac80211: remove ieee80211_tx_info from rate_control_apply_mask signature
Remove unnecessary ieee80211_tx_info pointer from rate_control_apply_mask
signature. rate_control_apply_mask() will be used to define a ratemask in
rate_control_set_rates() for station rate table

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-14 17:49:50 +02:00