Commit Graph

37483 Commits

Author SHA1 Message Date
Lubomir Rintel
c78ba6d64c ipv6: expose RFC4191 route preference via rtnetlink
This makes it possible to retain the route preference when RAs are handled in
userspace.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:28:09 -04:00
Simon Horman
ac70c05b6f switchdev: correct spelling of notifier in comments
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:06:39 -04:00
Eric Dumazet
33cf7c90fe net: add real socket cookies
A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.

1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
   no guarantee a cookie is used once and identify
   a flow.

3) request sock, establish sock, and timewait socks
   for a given flow have different cookies.

Part of our effort to bring better TCP statistics requires
to switch to a different allocator.

In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)

Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:55:28 -04:00
Alexey Kodanev
b1cb59cf2e net: sysctl_net_core: check SNDBUF and RCVBUF for min length
sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, incorrectly set buffer length could result to memory
allocation failures. For example, set them as follows:

    # sysctl net.core.rmem_default=64
      net.core.wmem_default = 64
    # sysctl net.core.wmem_default=64
      net.core.wmem_default = 64
    # ping localhost -s 1024 -i 0 > /dev/null

This could result to the following failure:

skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
kernel BUG at net/core/skbuff.c:102!
invalid opcode: 0000 [#1] SMP
...
task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
Stack:
 ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
 ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
Call Trace:
 [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
 [<ffffffff81556f56>] sock_write_iter+0x146/0x160
 [<ffffffff811d9612>] new_sync_write+0x92/0xd0
 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
 [<ffffffff811da499>] SyS_write+0x59/0xd0
 [<ffffffff81651532>] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
      00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
      eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP <ffff88003ae8bc68>
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
...
BUG: unable to handle kernel paging request at ffff88013caee5c0
IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
...

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:25:13 -04:00
Alexander Duyck
654eff4516 fib_trie: Only display main table in /proc/net/route
When we merged the tries for local and main I had overlooked the iterator
for /proc/net/route.  As a result it was outputting both local and main
when the two tries were merged.

This patch resolves that by only providing output for aliases that are
actually in the main trie.  As a result we should go back to the original
behavior which I assume will be necessary to maintain legacy support.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 21:24:32 -04:00
Florian Fainelli
cd28a1a9ba net: dsa: fully divert PHY reads/writes if requested
In case a PHY is found via Device Tree, and is also flagged by the
switch driver as needing indirect reads/writes using the switch driver
implemented MDIO bus, make sure that we bind this PHY to the slave MII
bus in order for this to happen.

Without this, we would succeed in having the PHY driver probe()'s
function to use slave MII bus read/write functions, because this is done
during dsa_slave_mii_init(), but past that point, the PHY driver would
not go through these diverted reads and writes.

Fixes: 0d8bcdd383 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:29 -04:00
Florian Fainelli
c305c1651c net: dsa: move PHY setup on DSA MII bus to its own function
In preparation for dealing with indirect reads and writes towards
certain PHY devices, move the code which deals with binding the PHY
device to the slave MII bus created by DSA to its own function:
dsa_slave_phy_connect().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:56:28 -04:00
Alexander Duyck
61f0d861fc fib_trie: Fix uninitialized variable warning
The 0-day kernel test infrastructure reported a use of uninitialized
variable warning for local_table due to the fact that the local and main
allocations had been swapped from the original setup.  This change corrects
that by making it so that we free the main table if the local table
allocation fails.

Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 17:33:44 -04:00
Neal Cardwell
d578e18ce9 tcp: restore 1.5x per RTT limit to CUBIC cwnd growth in congestion avoidance
Commit 814d488c61 ("tcp: fix the timid additive increase on stretch
ACKs") fixed a bug where tcp_cong_avoid_ai() would either credit a
connection with an increase of snd_cwnd_cnt, or increase snd_cwnd, but
not both, resulting in cwnd increasing by 1 packet on at most every
alternate invocation of tcp_cong_avoid_ai().

Although the commit correctly implemented the CUBIC algorithm, which
can increase cwnd by as much as 1 packet per 1 packet ACKed (2x per
RTT), in practice that could be too aggressive: in tests on network
paths with small buffers, YouTube server retransmission rates nearly
doubled.

This commit restores CUBIC to a maximum cwnd growth rate of 1 packet
per 2 packets ACKed (1.5x per RTT). In YouTube tests this restored
retransmit rates to low levels.

Testing: This patch has been tested in datacenter netperf transfers
and live youtube.com and google.com servers.

Fixes: 9cd981dcf1 ("tcp: fix stretch ACK bugs in CUBIC")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:51:51 -04:00
Neal Cardwell
9949afa42b tcp: fix tcp_cong_avoid_ai() credit accumulation bug with decreases in w
The recent change to tcp_cong_avoid_ai() to handle stretch ACKs
introduced a bug where snd_cwnd_cnt could accumulate a very large
value while w was large, and then if w was reduced snd_cwnd could be
incremented by a large delta, leading to a large burst and high packet
loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt =
100 * cwnd".

This bug crept in while preparing the upstream version of
814d488c61.

Testing: This patch has been tested in datacenter netperf transfers
and live youtube.com and google.com servers.

Fixes: 814d488c61 ("tcp: fix the timid additive increase on stretch ACKs")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:51:51 -04:00
Sabrina Dubroca
6dede75b7e fib_trie: call fib_table_flush_external under RTNL
Move rtnl_lock() before the call to fib4_rules_exit so that
fib_table_flush_external is called under RTNL.

Fixes: 104616e74e ("switchdev: don't support custom ip rules, for now")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:46:26 -04:00
Robert Shearman
8a08919f43 mpls: Allow mpls_gso and mpls_router to be built as modules
CONFIG_MPLS=m doesn't result in a kernel module being built because it
applies to the net/mpls directory, rather than to .o files.

So revert the MPLS menuitem to being a boolean and make MPLS_GSO and
MPLS_ROUTING tristates to allow mpls_gso and mpls_router modules to be
produced as desired.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:38:54 -04:00
Alexander Duyck
0ddcf43d5d ipv4: FIB Local/MAIN table collapse
This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 16:22:14 -04:00
Johan Hedberg
4ba9faf35f Bluetooth: Check for matching IRK when looking for paired LE devices
If we're given an RPA when checking whether we're paired or not, we
should consult the local RPA storage whether there's a matching IRK.
This we we ensure that hci_bdaddr_is_paired() gives the right result
even when trying to pair a second time with the same device with an RPA.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-11 15:54:23 +01:00
Johan Hedberg
87c8b28d29 Bluetooth: Fix missing rcu_read_unlock() in hci_bdaddr_is_paired()
When finding a matching LTK the rcu_read_unlock() function was failing
to release the RCU read lock. This patch adds the missing call to
rcu_reaD_unlock().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-11 08:52:32 +01:00
Marcel Holtmann
beb1c21b8e Bluetooth: Increment management interface revision
This patch increments the management interface revision due to
introduction of new static address setting and fixes for the
fast connectable feature.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-11 09:28:41 +02:00
David S. Miller
4363890079 net: Handle unregister properly when netdev namespace change fails.
If rtnl_newlink() fails on it's call to dev_change_net_namespace(), we
have to make use of the ->dellink() method, if present, just like we
do when rtnl_configure_link() fails.

Fixes: 317f4810e4 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 21:59:46 -04:00
Jon Paul Maloy
169bf9121b tipc: ensure that idle links are deleted when a bearer is disabled
commit afaa3f65f6
(tipc: purge links when bearer is disabled) was an attempt to resolve
a problem that turned out to have a more profound reason.

When we disable a bearer, we delete all its pertaining links if
there is no other bearer to perform failover to, or if the module
is shutting down. In case there are dual bearers, we wait with
deleting links until the failover procedure is finished.

However, this misses the case when a link on the removed bearer
was already down, so that there will be no failover procedure to
finish the link delete. This causes confusion if a new bearer is
added to replace the removed one, and also entails a small memory
leak.

This commit takes the current state of the link into account when
deciding when to delete it, and also reverses the above-mentioned
commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:37:36 -04:00
Alexander Duyck
ddb4b9a132 fib_trie: Address possible NULL pointer dereference in resize
If the inflate call failed it would return NULL.  As a result tp would be
set to NULL and cause use to trigger a NULL pointer dereference in
should_halve if the inflate failed on the first attempt.

In order to prevent this we should decrement max_work before we actually
attempt to inflate as this will force us to exit before attempting to halve
a node we should have inflated.  In order to keep things symmetric between
inflate and halve I went ahead and also moved the decrement of max_work for
the halve case as well so we take care of that before we actually attempt
to halve the tnode.

Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 18:36:56 -04:00
Johan Hedberg
55e76b3898 Bluetooth: Add 'Already Paired' error for Pair Device command
To make the behavior predictable when attempting to pair with a device
for which we already have a Link Key or Long Term Key, this patch adds a
new 'Already Paired' error which gets sent in such a scenario.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 21:42:05 +01:00
Alexander Duyck
3ec320dd5c fib_trie: Correctly handle case of key == 0 in leaf_walk_rcu
In the case of a trie that had no tnodes with a key of 0 the initial
look-up would fail resulting in an out-of-bounds cindex on the first tnode.
This resulted in an entire trie being skipped.

In order resolve this I have updated the cindex logic in the initial
look-up so that if the key is zero we will always traverse the child zero
path.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 16:13:55 -04:00
Oliver Hartkopp
7768eed8bf net: add comment for sock_efree() usage
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 16:12:20 -04:00
Johan Hedberg
406ef2a67b Bluetooth: Make Fast Connectable available while powered off
To maximize the usability of the Fast Connectable feature we should make
it possible to set (or unset) it at any given moment. This means
removing the dependency on the 'connectable' setting as well as the
'powered' setting. The former makes also sense since page scan may get
enabled through add_device even if 'connectable' is false. To keep the
setting available over power cycles its flag also needs to be removed
from the flags that are cleared upon HCI_Reset.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10 19:37:02 +01:00
Eric Dumazet
34160ea3f9 inet_diag: add const to inet_diag_req_v2
diag dumpers should not modify the request.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 13:45:28 -04:00
Eric Dumazet
e31c5e0e48 inet_diag: cleanups
Remove all inline keywords, add some const, and cleanup style.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 13:45:28 -04:00
Eric Dumazet
491da2a477 net: constify sock_diag_check_cookie()
sock_diag_check_cookie() second parameter is constant

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 13:45:28 -04:00
David S. Miller
515fb5c317 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

The following batch contains a couple of fixes to address some fallout
from the previous pull request, they are:

1) Address link problems in the bridge code after e5de75b. Fix it by
   using rcu hook to address to avoid ifdef pollution and hard
   dependency between bridge and br_netfilter.

2) Address sparse warnings in the netfilter reject code, patch from
   Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10 12:48:47 -04:00
Pablo Neira Ayuso
1a4ba64d16 netfilter: bridge: use rcu hook to resolve br_netfilter dependency
e5de75b ("netfilter: bridge: move DNAT helper to br_netfilter") results
in the following link problem:

net/bridge/br_device.c:29: undefined reference to `br_nf_prerouting_finish_bridge`

Moreover it creates a hard dependency between br_netfilter and the
bridge core, which is what we've been trying to avoid so far.

Resolve this problem by using a hook structure so we reduce #ifdef
pollution and keep bridge netfilter specific code under br_netfilter.c
which was the original intention.

Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-10 15:03:02 +01:00
Florian Westphal
a03a8dbe20 netfilter: fix sparse warnings in reject handling
make C=1 CF=-D__CHECK_ENDIAN__ shows following:

net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types)
net/bridge/netfilter/nft_reject_bridge.c:65:50:    expected restricted __be16 [usertype] protocol [..]
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..]

Caused by two (harmless) errors:
1. htons() instead of ntohs()
2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-10 15:01:32 +01:00
Scott Feldman
f8f2147150 switchdev: add netlink flags to IPv4 FIB add op
Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:56:52 -04:00
Florian Fainelli
769a020289 net: dsa: utilize of_find_net_device_by_node
Using of_find_device_by_node() restricts the search to platform_device that
match the specified device_node pointer. This is not even remotely true for
network devices backed by a pci_device for instance.

of_find_net_device_by_node() allows us to do a more thorough lookup to find the
struct net_device corresponding to a particular device_node pointer.

For symetry with the non-OF code path, we hold the net_device pointer in
dsa_probe() just like what dev_to_net_dev() does when we call this
function.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:50:21 -04:00
Florian Fainelli
aa836df958 net: core: add of_find_net_device_by_node()
Add a helper function which allows getting the struct net_device pointer
associated with a given struct device_node pointer. This is useful for
instance for DSA Ethernet devices not backed by a platform_device, but a PCI
device.

Since we need to access net_class which is not accessible outside of
net/core/net-sysfs.c, this helper function is also added here and gated
with CONFIG_OF_NET.

Network devices initialized with SET_NETDEV_DEV() are also taken into
account by checking for dev->parent first and then falling back to
checking the device pointer within struct net_device.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:50:20 -04:00
WANG Cong
5778d39d07 net_sched: fix struct tc_u_hnode layout in u32
We dynamically allocate divisor+1 entries for ->ht[] in tc_u_hnode:

  ht = kzalloc(sizeof(*ht) + divisor*sizeof(void *), GFP_KERNEL);

So ->ht is supposed to be the last field of this struct, however
this is broken, since an rcu head is appended after it.

Fixes: 1ce87720d4 ("net: sched: make cls_u32 lockless")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:44:31 -04:00
David S. Miller
3cef5c5b0b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:38:02 -04:00
Linus Torvalds
36bef88380 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) nft_compat accidently truncates ethernet protocol to 8-bits, from
    Arturo Borrero.

 2) Memory leak in ip_vs_proc_conn(), from Julian Anastasov.

 3) Don't allow the space required for nftables rules to exceed the
    maximum value representable in the dlen field.  From Patrick
    McHardy.

 4) bcm63xx_enet can accidently leave interrupts permanently disabled
    due to errors in the NAPI polling exit logic.  Fix from Nicolas
    Schichan.

 5) Fix OOPSes triggerable by the ping protocol module, due to missing
    address family validations etc.  From Lorenzo Colitti.

 6) Don't use RCU locking in sleepable context in team driver, from Jiri
    Pirko.

 7) xen-netback miscalculates statistic offset pointers when reporting
    the stats to userspace.  From David Vrabel.

 8) Fix a leak of up to 256 pages per VIF destroy in xen-netaback, also
    from David Vrabel.

 9) ip_check_defrag() cannot assume that skb_network_offset(),
    particularly when it is used by the AF_PACKET fanout defrag code.
    From Alexander Drozdov.

10) gianfar driver doesn't query OF node names properly when trying to
    determine the number of hw queues available.  Fix it to explicitly
    check for OF nodes named queue-group.  From Tobias Waldekranz.

11) MID field in macb driver should be 12 bits, not 16.  From Punnaiah
    Choudary Kalluri.

12) Fix unintentional regression in traceroute due to timestamp socket
    option changes.  Empty ICMP payloads should be allowed in
    non-timestamp cases.  From Willem de Bruijn.

13) When devices are unregistered, we have to get rid of AF_PACKET
    multicast list entries that point to it via ifindex.  Fix from
    Francesco Ruggeri.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  tipc: fix bug in link failover handling
  net: delete stale packet_mclist entries
  net: macb: constify macb configuration data
  MAINTAINERS: add Marc Kleine-Budde as co maintainer for CAN networking layer
  MAINTAINERS: linux-can moved to github
  can: kvaser_usb: Read all messages in a bulk-in URB buffer
  can: kvaser_usb: Avoid double free on URB submission failures
  can: peak_usb: fix missing ctrlmode_ init for every dev
  can: add missing initialisations in CAN related skbuffs
  ip: fix error queue empty skb handling
  bgmac: Clean warning messages
  tcp: align tcp_xmit_size_goal() on tcp_tso_autosize()
  net: fec: fix unbalanced clk disable on driver unbind
  net: macb: Correct the MID field length value
  net: gianfar: correctly determine the number of queue groups
  ipv4: ip_check_defrag should not assume that skb_network_offset is zero
  net: bcmgenet: properly disable password matching
  net: eth: xgene: fix booting with devicetree
  bnx2x: Force fundamental reset for EEH recovery
  xen-netback: refactor xenvif_handle_frag_list()
  ...
2015-03-09 18:17:21 -07:00
Jon Paul Maloy
e6441bae32 tipc: fix bug in link failover handling
In commit c637c10355
("tipc: resolve race problem at unicast message reception") we
introduced a new mechanism for delivering buffers upwards from link
to socket layer.

That code contains a bug in how we handle the new link input queue
during failover. When a link is reset, some of its users may be blocked
because of congestion, and in order to resolve this, we add any pending
wakeup pseudo messages to the link's input queue, and deliver them to
the socket. This misses the case where the other, remaining link also
may have congested users. Currently, the owner node's reference to the
remaining link's input queue is unconditionally overwritten by the
reset link's input queue. This has the effect that wakeup events from
the remaining link may be unduely delayed (but not lost) for a
potentially long period.

We fix this by adding the pending events from the reset link to the
input queue that is currently referenced by the node, whichever one
it is.

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 16:20:41 -04:00
Francesco Ruggeri
82f17091e6 net: delete stale packet_mclist entries
When an interface is deleted from a net namespace the ifindex in the
corresponding entries in PF_PACKET sockets' mclists becomes stale.
This can create inconsistencies if later an interface with the same ifindex
is moved from a different namespace (not that unlikely since ifindexes are
per-namespace).
In particular we saw problems with dev->promiscuity, resulting
in "promiscuity touches roof, set promiscuity failed. promiscuity
feature of device might be broken" warnings and EOVERFLOW failures of
setsockopt(PACKET_ADD_MEMBERSHIP).
This patch deletes the mclist entries for interfaces that are deleted.
Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with
EADDRNOTAVAIL if called after the interface is deleted, also make
packet_mc_drop not fail.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 16:17:43 -04:00
Eric W. Biederman
ddb3b6033c net: Remove protocol from struct dst_ops
After my change to neigh_hh_init to obtain the protocol from the
neigh_table there are no more users of protocol in struct dst_ops.
Remove the protocol field from dst_ops and all of it's initializers.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 16:06:10 -04:00
David S. Miller
5428aef811 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, improvements for the packet rejection infrastructure,
deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for
br_netfilter. More specifically they are:

1) Send packet to reset flow if checksum is valid, from Florian Westphal.

2) Fix nf_tables reject bridge from the input chain, also from Florian.

3) Deprecate the CLUSTERIP target, the cluster match supersedes it in
   functionality and it's known to have problems.

4) A couple of cleanups for nf_tables rule tracing infrastructure, from
   Patrick McHardy.

5) Another cleanup to place transaction declarations at the bottom of
   nf_tables.h, also from Patrick.

6) Consolidate Kconfig dependencies wrt. NF_TABLES.

7) Limit table names to 32 bytes in nf_tables.

8) mac header copying in bridge netfilter is already required when
   calling ip_fragment(), from Florian Westphal.

9) move nf_bridge_update_protocol() to br_netfilter.c, also from
   Florian.

10) Small refactor in br_netfilter in the transmission path, again from
    Florian.

11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:58:21 -04:00
Geert Uytterhoeven
26c459a807 mpls: Spelling: s/conceved/conceived/, s/as/a/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:51:59 -04:00
Erik Hugne
143fe22f50 tipc: fix inconsistent signal handling regression
Commit 9bbb4ecc68 ("tipc: standardize recvmsg routine") changed
the sleep/wakeup behaviour for sockets entering recv() or accept().
In this process the order of reporting -EAGAIN/-EINTR was reversed.
This caused problems with wrong errno being reported back if the
timeout expires. The same problem happens if the socket is
nonblocking and recv()/accept() is called when the process have
pending signals. If there is no pending data read or connections to
accept, -EINTR will be returned instead of -EAGAIN.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reported-by László Benedek <laszlo.benedek@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:42:19 -04:00
Jiri Pirko
f4427bc3e2 switchdev: use gpl variant of symbol export
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:37:39 -04:00
Erik Hugne
4fee6be813 tipc: sparse: fix htons conversion warnings
Commit d0f91938be ("tipc: add ip/udp media type") introduced
some new sparse warnings. Clean them up.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:37:04 -04:00
Cong Wang
1e052be69d net_sched: destroy proto tp when all filters are gone
Kernel automatically creates a tp for each
(kind, protocol, priority) tuple, which has handle 0,
when we add a new filter, but it still is left there
after we remove our own, unless we don't specify the
handle (literally means all the filters under
the tuple). For example this one is left:

  # tc filter show dev eth0
  filter parent 8001: protocol arp pref 49152 basic

The user-space is hard to clean up these for kernel
because filters like u32 are organized in a complex way.
So kernel is responsible to remove it after all filters
are gone.  Each type of filter has its own way to
store the filters, so each type has to provide its
way to check if all filters are gone.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 15:35:55 -04:00
Pablo Neira Ayuso
e5de75bf88 netfilter: bridge: move DNAT helper to br_netfilter
Only one caller, there is no need to keep this in a header.
Move it to br_netfilter.c where this belongs to.

Based on patch from Florian Westphal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09 17:56:07 +01:00
Florian Westphal
7a8d831df5 netfilter: bridge: refactor conditional in br_nf_dev_queue_xmit
simpilifies followup patch that re-works brnf ip_fragment handling.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09 13:22:58 +01:00
Florian Westphal
4a9d2f2008 netfilter: bridge: move nf_bridge_update_protocol to where its used
no need to keep it in a header file.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09 13:21:31 +01:00
Florian Westphal
8bd63cf1a4 bridge: move mac header copying into br_netfilter
The mac header only has to be copied back into the skb for
fragments generated by ip_fragment(), which only happens
for bridge forwarded packets with nf-call-iptables=1 && active nf_defrag.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09 13:20:48 +01:00
Oliver Hartkopp
969439016d can: add missing initialisations in CAN related skbuffs
When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient
this can lead to a skb_under_panic due to missing skb initialisations.

Add the missing initialisations at the CAN skbuff creation times on driver
level (rx path) and in the network layer (tx path).

Reported-by: Austin Schuh <austin@peloton-tech.com>
Reported-by: Daniel Steer <daniel.steer@mclaren.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-03-09 10:22:24 +01:00
Willem de Bruijn
c247f0534c ip: fix error queue empty skb handling
When reading from the error queue, msg_name and msg_control are only
populated for some errors. A new exception for empty timestamp skbs
added a false positive on icmp errors without payload.

`traceroute -M udpconn` only displayed gateways that return payload
with the icmp error: the embedded network headers are pulled before
sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise.

Fix this regression by refining when msg_name and msg_control
branches are taken. The solutions for the two fields are independent.

msg_name only makes sense for errors that configure serr->port and
serr->addr_offset. Test the first instead of skb->len. This also fixes
another issue. saddr could hold the wrong data, as serr->addr_offset
is not initialized  in some code paths, pointing to the start of the
network header. It is only valid when serr->port is set (non-zero).

msg_control support differs between IPv4 and IPv6. IPv4 only honors
requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The
skb->len test can simply be removed, because skb->dev is also tested
and never true for empty skbs. IPv6 honors requests for all errors
aside from local errors and timestamps on empty skbs.

In both cases, make the policy more explicit by moving this logic to
a new function that decides whether to process msg_control and that
optionally prepares the necessary fields in skb->cb[]. After this
change, the IPv4 and IPv6 paths are more similar.

The last case is rxrpc. Here, simply refine to only match timestamps.

Fixes: 49ca0d8bfa ("net-timestamp: no-payload option")

Reported-by: Jan Niehusmann <jan@gondor.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1->v2
  - fix local origin test inversion in ip6_datagram_support_cmsg
  - make v4 and v6 code paths more similar by introducing analogous
    ipv4_datagram_support_cmsg
  - fix compile bug in rxrpc
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-08 23:01:54 -04:00