Commit Graph

36881 Commits

Author SHA1 Message Date
Marcel Holtmann
516018a9c0 Bluetooth: Introduce hci_dev_test_and_change_flag helper macro
Instead of manually coding test_and_change_bit on hdev->dev_flags all the
time, use hci_dev_test_and_change_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:31 +02:00
Marcel Holtmann
ce05d603af Bluetooth: Introduce hci_dev_change_flag helper macro
Instead of manually coding change_bit on hdev->dev_flags all the time,
use hci_dev_change_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:29 +02:00
Marcel Holtmann
a358dc11d8 Bluetooth: Introduce hci_dev_clear_flag helper macro
Instead of manually coding clear_bit on hdev->dev_flags all the time,
use hci_dev_clear_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:27 +02:00
Marcel Holtmann
a1536da255 Bluetooth: Introduce hci_dev_set_flag helper macro
Instead of manually coding set_bit on hdev->dev_flags all the time,
use hci_dev_set_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:26 +02:00
Marcel Holtmann
d7a5a11d7f Bluetooth: Introduce hci_dev_test_flag helper macro
Instead of manually coding test_bit on hdev->dev_flags all the time,
use hci_dev_test_flag helper macro.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:09:25 +02:00
Marcel Holtmann
cc91cb042c Bluetooth: Add support connectable advertising setting
The patch adds a second advertising setting that allows switching of the
controller into connectable mode independent of the global connectable
setting.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13 12:07:54 +02:00
Eric W. Biederman
098a697b49 tcp_metrics: Use a single hash table for all network namespaces.
Now that all of the operations are safe on a single hash table
accross network namespaces, allocate a single global hash table
and update the code to use it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric W. Biederman
04f721c671 tcp_metrics: Rewrite tcp_metrics_flush_all
Rewrite tcp_metrics_flush_all so that it can cope with entries from
different network namespaces on it's hash chain.

This is based on the logic in tcp_metrics_nl_cmd_del for deleting
a selection of entries from a tcp metrics hash chain.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric W. Biederman
8a4bff714f tcp_metrics: Remove the unused return code from tcp_metrics_flush_all
tcp_metrics_flush_all always returns 0.  Remove the unnecessary return code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric W. Biederman
849e8a0ca8 tcp_metrics: Add a field tcpm_net and verify it matches on lookup
In preparation for using one tcp metrics hash table for all network
namespaces add a field tcpm_net to struct tcp_metrics_block, and
verify that field on all hash table lookups.

Make the field tcpm_net of type possible_net_t so it takes no space
when network namespaces are disabled.

Further add a function tm_net to read that field so we can be
efficient when network namespaces are disabled and concise
the rest of the time.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric W. Biederman
3e5da62d0b tcp_metrics: Mix the network namespace into the hash function.
In preparation for using one hash table for all network namespaces
mix the network namespace into the hash value.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Eric W. Biederman
6493517eae tcp_metrics: panic when tcp_metrics_init fails.
There is not a practical way to cleanup during boot so
just panic if there is a problem initializing tcp_metrics.

That will at least give us a clear place to start debugging
if something does go wrong.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-13 01:57:07 -04:00
Michael S. Tsirkin
8051a2a518 9p/trans_virtio: fix hot-unplug
On device hot-unplug, 9p/virtio currently will kfree channel while
it might still be in use.

Of course, it might stay used forever, so it's an extremely ugly hack,
but it seems better than use-after-free that we have now.

[ Unused variable removed, whitespace cleanup, msg single-lined --RR ]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-13 15:55:41 +10:30
Eric W. Biederman
76fecd8275 mpls: In mpls_egress verify the packet length.
Reobert Shearman noticed that mpls_egress is failing to verify that
the bytes to be examined are in fact present in the packet before
mpls_egress reads those bytes.

As suggested by David Miller reduce this to a single pskb_may_pull
call so that we don't do unnecessary work in the fast path.

Reported-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:05:04 -04:00
Eric Dumazet
3f66b083a5 inet: introduce ireq_family
Before inserting request socks into general hash table,
fill their socket family.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
d4f06873b6 inet: get_openreq4() & get_openreq6() do not need listener
ireq->ir_num contains local port, use it.

Also, get_openreq4() dumping listen_sk->refcnt makes litle sense.

inet_diag_fill_req() can also use ireq->ir_num

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
41b822c59e inet: prepare sock_edemux() & sock_gen_put() for new SYN_RECV state
sock_edemux() & sock_gen_put() should be ready to cope with request socks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
0159dfd3d7 net: add req_prot_cleanup() & req_prot_init() helpers
Make proto_register() & proto_unregister() a bit nicer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:13 -04:00
Eric Dumazet
bd337c581b ipv6: add missing ireq_net & ir_cookie initializations
I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie

Lets clear ireq->ir_cookie in inet_reqsk_alloc()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe ("net: add real socket cookies")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 22:58:12 -04:00
Daniel Borkmann
54720df130 cls_bpf: do eBPF invocation under non-bh RCU lock variant for maps
Currently, it is possible in cls_bpf to access eBPF maps only under
rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(),
the classifier would be called from __netif_receive_skb_core() under
rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via
__dev_queue_xmit().

This rcu/rcu_bh mix doesn't work together with eBPF maps as they require
soley to be called under rcu_read_lock(). eBPF maps could also be shared
among various other eBPF programs (possibly even with other eBPF program
types, f.e. tracing) and user space processes, so any context is assumed.

Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program
invocation under non-bh RCU lock variant.

Fixes: e2e9b6541d ("cls_bpf: add initial eBPF support for programmable classifiers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:33:15 -04:00
Alexander Duyck
0b65bd97ba fib_trie: Provide a deterministic order for fib_alias w/ tables merged
This change makes it so that we should always have a deterministic ordering
for the main and local aliases within the merged table when two leaves
overlap.

So for example if we have a leaf with a key of 192.168.254.0.  If we
previously added two aliases with a prefix length of 24 from both local and
main the first entry would be first and the second would be second.  When I
was coding this I had added a WARN_ON should such a situation occur as I
wasn't sure how likely it would be.  However this WARN_ON has been
triggered so this is something that should be addressed.

With this patch the ordering of the aliases is as follows.  First they are
sorted on prefix length, then on their table ID, then tos, and finally
priority.  This way what we end up doing is essentially interleaving the
two tables on what used to be leaf_info structure boundaries.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:26:51 -04:00
Alexander Duyck
3c9e9f7320 fib_trie: Avoid NULL pointer if local table is not allocated
The function fib_unmerge assumed the local table had already been
allocated.  If that is not the case however when custom rules are applied
then this can result in a NULL pointer dereference.

In order to prevent this we must check the value of the local table pointer
and if it is NULL simply return 0 as there is no local table to separate
from the main.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 18:26:51 -04:00
Eric W. Biederman
0c5c9fb551 net: Introduce possible_net_t
Having to say
> #ifdef CONFIG_NET_NS
> 	struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
>       struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Eric W. Biederman
efd7ef1c19 net: Kill hold_net release_net
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008.  Kill the code it is long past due.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:39:40 -04:00
Marcel Holtmann
983f9814c0 Bluetooth: Remove two else branches that are not needed
The SMP code contains two else branches that are not needed since the
successful test will actually leave the function.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-12 09:00:48 +02:00
Arnd Bergmann
f862e07cf9 rds: avoid potential stack overflow
The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit withint the 1024 byte stack size warning limit on x86, but just
exceed that limit on ARM, which gives us this warning:

net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 00:28:01 -04:00
Willem de Bruijn
3a8dd9711e sock: fix possible NULL sk dereference in __skb_tstamp_tx
Test that sk != NULL before reading sk->sk_tsflags.

Fixes: 49ca0d8bfa ("net-timestamp: no-payload option")
Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 00:09:55 -04:00
Eric Dumazet
c29390c6df xps: must clear sender_cpu before forwarding
John reported that my previous commit added a regression
on his router.

This is because sender_cpu & napi_id share a common location,
so get_xps_queue() can see garbage and perform an out of bound access.

We need to make sure sender_cpu is cleared before doing the transmit,
otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger
this bug.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John <jw@nuclearfallout.net>
Bisected-by: John <jw@nuclearfallout.net>
Fixes: 2bd82484bb ("xps: fix xps for stacked devices")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:51:18 -04:00
Eric Dumazet
d77c555d32 net: fix CONFIG_NET_NS=n compilation
I forgot to use write_pnet() in three locations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe ("net: add real socket cookies")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:28:49 -04:00
Lubomir Rintel
c78ba6d64c ipv6: expose RFC4191 route preference via rtnetlink
This makes it possible to retain the route preference when RAs are handled in
userspace.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:28:09 -04:00
Simon Horman
ac70c05b6f switchdev: correct spelling of notifier in comments
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:06:39 -04:00
Eric Dumazet
33cf7c90fe net: add real socket cookies
A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.

1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
   no guarantee a cookie is used once and identify
   a flow.

3) request sock, establish sock, and timewait socks
   for a given flow have different cookies.

Part of our effort to bring better TCP statistics requires
to switch to a different allocator.

In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)

Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:55:28 -04:00
Alexey Kodanev
b1cb59cf2e net: sysctl_net_core: check SNDBUF and RCVBUF for min length
sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, incorrectly set buffer length could result to memory
allocation failures. For example, set them as follows:

    # sysctl net.core.rmem_default=64
      net.core.wmem_default = 64
    # sysctl net.core.wmem_default=64
      net.core.wmem_default = 64
    # ping localhost -s 1024 -i 0 > /dev/null

This could result to the following failure:

skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
kernel BUG at net/core/skbuff.c:102!
invalid opcode: 0000 [#1] SMP
...
task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
Stack:
 ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
 ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
Call Trace:
 [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
 [<ffffffff81556f56>] sock_write_iter+0x146/0x160
 [<ffffffff811d9612>] new_sync_write+0x92/0xd0
 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
 [<ffffffff811da499>] SyS_write+0x59/0xd0
 [<ffffffff81651532>] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
      00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
      eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP <ffff88003ae8bc68>
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
...
BUG: unable to handle kernel paging request at ffff88013caee5c0
IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
...

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:25:13 -04:00
Alexander Duyck
654eff4516 fib_trie: Only display main table in /proc/net/route
When we merged the tries for local and main I had overlooked the iterator
for /proc/net/route.  As a result it was outputting both local and main
when the two tries were merged.

This patch resolves that by only providing output for aliases that are
actually in the main trie.  As a result we should go back to the original
behavior which I assume will be necessary to maintain legacy support.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:24:32 -04:00
Florian Fainelli
cd28a1a9ba net: dsa: fully divert PHY reads/writes if requested
In case a PHY is found via Device Tree, and is also flagged by the
switch driver as needing indirect reads/writes using the switch driver
implemented MDIO bus, make sure that we bind this PHY to the slave MII
bus in order for this to happen.

Without this, we would succeed in having the PHY driver probe()'s
function to use slave MII bus read/write functions, because this is done
during dsa_slave_mii_init(), but past that point, the PHY driver would
not go through these diverted reads and writes.

Fixes: 0d8bcdd383 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:29 -04:00
Florian Fainelli
c305c1651c net: dsa: move PHY setup on DSA MII bus to its own function
In preparation for dealing with indirect reads and writes towards
certain PHY devices, move the code which deals with binding the PHY
device to the slave MII bus created by DSA to its own function:
dsa_slave_phy_connect().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:28 -04:00
Alexander Duyck
61f0d861fc fib_trie: Fix uninitialized variable warning
The 0-day kernel test infrastructure reported a use of uninitialized
variable warning for local_table due to the fact that the local and main
allocations had been swapped from the original setup.  This change corrects
that by making it so that we free the main table if the local table
allocation fails.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:33:44 -04:00
Neal Cardwell
d578e18ce9 tcp: restore 1.5x per RTT limit to CUBIC cwnd growth in congestion avoidance
Commit 814d488c61 ("tcp: fix the timid additive increase on stretch
ACKs") fixed a bug where tcp_cong_avoid_ai() would either credit a
connection with an increase of snd_cwnd_cnt, or increase snd_cwnd, but
not both, resulting in cwnd increasing by 1 packet on at most every
alternate invocation of tcp_cong_avoid_ai().

Although the commit correctly implemented the CUBIC algorithm, which
can increase cwnd by as much as 1 packet per 1 packet ACKed (2x per
RTT), in practice that could be too aggressive: in tests on network
paths with small buffers, YouTube server retransmission rates nearly
doubled.

This commit restores CUBIC to a maximum cwnd growth rate of 1 packet
per 2 packets ACKed (1.5x per RTT). In YouTube tests this restored
retransmit rates to low levels.

Testing: This patch has been tested in datacenter netperf transfers
and live youtube.com and google.com servers.

Fixes: 9cd981dcf1 ("tcp: fix stretch ACK bugs in CUBIC")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:51:51 -04:00
Neal Cardwell
9949afa42b tcp: fix tcp_cong_avoid_ai() credit accumulation bug with decreases in w
The recent change to tcp_cong_avoid_ai() to handle stretch ACKs
introduced a bug where snd_cwnd_cnt could accumulate a very large
value while w was large, and then if w was reduced snd_cwnd could be
incremented by a large delta, leading to a large burst and high packet
loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt =
100 * cwnd".

This bug crept in while preparing the upstream version of
814d488c61.

Testing: This patch has been tested in datacenter netperf transfers
and live youtube.com and google.com servers.

Fixes: 814d488c61 ("tcp: fix the timid additive increase on stretch ACKs")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:51:51 -04:00
Sabrina Dubroca
6dede75b7e fib_trie: call fib_table_flush_external under RTNL
Move rtnl_lock() before the call to fib4_rules_exit so that
fib_table_flush_external is called under RTNL.

Fixes: 104616e74e ("switchdev: don't support custom ip rules, for now")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:46:26 -04:00
Robert Shearman
8a08919f43 mpls: Allow mpls_gso and mpls_router to be built as modules
CONFIG_MPLS=m doesn't result in a kernel module being built because it
applies to the net/mpls directory, rather than to .o files.

So revert the MPLS menuitem to being a boolean and make MPLS_GSO and
MPLS_ROUTING tristates to allow mpls_gso and mpls_router modules to be
produced as desired.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:38:54 -04:00
Alexander Duyck
0ddcf43d5d ipv4: FIB Local/MAIN table collapse
This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:22:14 -04:00
Johan Hedberg
4ba9faf35f Bluetooth: Check for matching IRK when looking for paired LE devices
If we're given an RPA when checking whether we're paired or not, we
should consult the local RPA storage whether there's a matching IRK.
This we we ensure that hci_bdaddr_is_paired() gives the right result
even when trying to pair a second time with the same device with an RPA.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-11 15:54:23 +01:00
Johan Hedberg
87c8b28d29 Bluetooth: Fix missing rcu_read_unlock() in hci_bdaddr_is_paired()
When finding a matching LTK the rcu_read_unlock() function was failing
to release the RCU read lock. This patch adds the missing call to
rcu_reaD_unlock().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-11 08:52:32 +01:00
Marcel Holtmann
beb1c21b8e Bluetooth: Increment management interface revision
This patch increments the management interface revision due to
introduction of new static address setting and fixes for the
fast connectable feature.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-11 09:28:41 +02:00
David S. Miller
4363890079 net: Handle unregister properly when netdev namespace change fails.
If rtnl_newlink() fails on it's call to dev_change_net_namespace(), we
have to make use of the ->dellink() method, if present, just like we
do when rtnl_configure_link() fails.

Fixes: 317f4810e4 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 21:59:46 -04:00
Jon Paul Maloy
169bf9121b tipc: ensure that idle links are deleted when a bearer is disabled
commit afaa3f65f6
(tipc: purge links when bearer is disabled) was an attempt to resolve
a problem that turned out to have a more profound reason.

When we disable a bearer, we delete all its pertaining links if
there is no other bearer to perform failover to, or if the module
is shutting down. In case there are dual bearers, we wait with
deleting links until the failover procedure is finished.

However, this misses the case when a link on the removed bearer
was already down, so that there will be no failover procedure to
finish the link delete. This causes confusion if a new bearer is
added to replace the removed one, and also entails a small memory
leak.

This commit takes the current state of the link into account when
deciding when to delete it, and also reverses the above-mentioned
commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:37:36 -04:00
Alexander Duyck
ddb4b9a132 fib_trie: Address possible NULL pointer dereference in resize
If the inflate call failed it would return NULL.  As a result tp would be
set to NULL and cause use to trigger a NULL pointer dereference in
should_halve if the inflate failed on the first attempt.

In order to prevent this we should decrement max_work before we actually
attempt to inflate as this will force us to exit before attempting to halve
a node we should have inflated.  In order to keep things symmetric between
inflate and halve I went ahead and also moved the decrement of max_work for
the halve case as well so we take care of that before we actually attempt
to halve the tnode.

Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:36:56 -04:00
Johan Hedberg
55e76b3898 Bluetooth: Add 'Already Paired' error for Pair Device command
To make the behavior predictable when attempting to pair with a device
for which we already have a Link Key or Long Term Key, this patch adds a
new 'Already Paired' error which gets sent in such a scenario.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 21:42:05 +01:00
Alexander Duyck
3ec320dd5c fib_trie: Correctly handle case of key == 0 in leaf_walk_rcu
In the case of a trie that had no tnodes with a key of 0 the initial
look-up would fail resulting in an out-of-bounds cindex on the first tnode.
This resulted in an entire trie being skipped.

In order resolve this I have updated the cindex logic in the initial
look-up so that if the key is zero we will always traverse the child zero
path.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 16:13:55 -04:00