Commit Graph

44719 Commits

Author SHA1 Message Date
Eric W. Biederman
d815d90bbb netfilter: Push struct net down into nf_afinfo.reroute
The network namespace is needed when routing a packet.
Stop making nf_afinfo.reroute guess which network namespace
is the proper namespace to route the packet in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-29 20:21:31 +02:00
Eric W. Biederman
372892ec11 ipv4: Push struct net down into nf_send_reset
This is needed so struct net can be pushed down into
ip_route_me_harder.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-29 20:21:31 +02:00
Steve Wise
c91aed9896 svcrdma: handle rdma read with a non-zero initial page offset
The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length.  This resulted in a read who's starting address
and length exceeded the base/bounds of the frmr.

The server gets an async error from the rdma device and kills the
connection, and the client then reconnects and resends.  This repeats
indefinitely, and the application hangs.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths.  I didn't see it with mlx4, for instance.

Fixes: 0bf4828983 ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-29 12:55:44 -04:00
Johannes Berg
076cdcb12f mac80211: use bool argument to ieee80211_send_nullfunc
Instead of int with 0/1, use bool with false/true for the
powersave argument to ieee80211_send_nullfunc().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:52 +02:00
Johannes Berg
90d13e8f5b mac80211: reduce indentation by inlining a check
Instead of nesting two if statements, inline the second
check into the first if statement and to indentation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:52 +02:00
Felix Fietkau
50f36ae61a mac80211: fix tx sequence number assignment with software queue + fast-xmit
When using software queueing, tx sequence number assignment happens at
ieee80211_tx_dequeue time, so the fast-xmit codepath must not do that.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:51 +02:00
Ayala Beker
44674d9c22 mac80211: advertise support for full station state in AP mode
This enables adding stations in unauthenticated mode, just after
receiving the first authentication frame; which in turn allows
sending a negative authentication reply if the station cannot be
added.

In addition init rate control for unassociated station only when
it becomes associated, prior to that low rates will be used.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:51 +02:00
Ayala Beker
47edb11b52 cfg80211: allow changing station capabilities for unassociated stations
Currently, cfg80211 rejects capability updates for existing entries
and as a result it's impossible to update entries that were added
unassociated, but that is necessary to go through the full station
states from userspace, adding a station before authentication etc.

Fix this by allowing updates to capabilities for stations that the
driver (or mac80211) assigned unassociated state. Drivers setting
the full station state support flag must use the new station type
for proper operation.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:50 +02:00
Johannes Berg
d0a77c6569 mac80211: allow writing TX PN in debugfs
For certain tests, for example replay detection, it can be useful
to be able to influence/set the PN used in outgoing packets. Make
it possible to change the TX PN in debugfs.

For now, this doesn't support TKIP since I haven't needed it, but
there's no reason it couldn't be added if necessary.

Note that this must be used very carefully: it could, for example,
be used to make "valid replays" where the PN reuse happens on a
different TID. This couldn't be done by an attacker since the TID
is protected as part of the AAD.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:50 +02:00
Denys Vlasenko
416eb9fc29 mac80211: Deinline drv_get/set/reset_tsf()
With this .config: http://busybox.net/~vda/kernel_config_ALLYES_Os,
after deinlining these functions have sizes and callsite counts
as follows:

drv_get_tsf: 634 bytes, 6 calls
drv_set_tsf: 626 bytes, 2 calls
drv_reset_tsf: 617 bytes, 2 calls

Total size reduction is about 4.2 kbytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
CC: John Linville <linville@tuxdriver.com>
CC: Michal Kazior <michal.kazior@tieto.com>
CC: linux-wireless@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:49 +02:00
Denys Vlasenko
6db9683897 mac80211: Deinline drv_ampdu_action()
With this .config: http://busybox.net/~vda/kernel_config_ALLYES_Os,
after deinlining the function size is 755 bytes and there are
6 callsites.

Total size reduction is about 3.3 kbytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
CC: John Linville <linville@tuxdriver.com>
CC: Michal Kazior <michal.kazior@tieto.com>
CC: linux-wireless@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:49 +02:00
Denys Vlasenko
42677ed33a mac80211: Deinline drv_switch_vif_chanctx()
With this .config: http://busybox.net/~vda/kernel_config_ALLYES_Os,
after deinlining the function size is 821 bytes and there are
2 callsites, reducing code size by about 800 bytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
CC: John Linville <linville@tuxdriver.com>
CC: Michal Kazior <michal.kazior@tieto.com>
CC: linux-wireless@vger.kernel.org
CC: linux-kernel@vger.kernel.org
[adjust code-style a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:48 +02:00
Johannes Berg
0e5c371aa0 mac80211: improve __rate_control_send_low warning
If there are no supported rates in the rate mask with the required
flags, we warn, but it's not clear which part causes the warning.

Add the relevant data to the warning to understand why it happens.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:48 +02:00
Johannes Berg
a0c391b134 mac80211: minstrel[_ht]: remove non-ascii debugfs characters
Replace the average symbol by "avg" to avoid being warned about the
non-ASCII symbol all the time, line up the columns properly.

(I changed my mind - the warnings are getting annoying)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:47 +02:00
Eliad Peller
2739271954 mac80211: don't tear down aggregation on suspend in case of wowlan->any
In case of "any" wowlan trigger, there is no reason to tear down
aggregations, as we want the device to continue working normally.

Similarly, there's no reason to tear down aggregations on resume,
as they should have been torn down on suspend if needed.
However, since the reconfiguration flow is shared with HW restart,
tear down aggregations on reconfiguration when we are not resuming.

To keep things working after non-wowlan suspend, keep clearing the
WLAN_STA_BLOCK_BA flag.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:47 +02:00
Fu, Zhonghui
9f0e13546e net/wireless: enable wiphy device to suspend/resume asynchronously
Now, PM core supports asynchronous suspend/resume mode for devices
during system suspend/resume, and the power state transition of one
device may be completed in separate kernel thread. PM core ensures
all power state transition timing dependency between devices. This
patch enables wiphy device to suspend/resume asynchronously. This can
take advantage of multicore and improve system suspend/resume speed.

Signed-off-by: Zhonghui Fu <zhonghui.fu@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:46 +02:00
Denys Vlasenko
9aae296a62 mac80211: Deinline drv_add/remove/change_interface()
With this .config: http://busybox.net/~vda/kernel_config_ALLYES_Os,
after deinlining these functions have sizes and callsite counts
as follows:

drv_add_interface: 638 bytes, 5 calls
drv_remove_interface: 611 bytes, 6 calls
drv_change_interface: 658 bytes, 1 call

Total size reduction is about 9 kbytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: John Linville <linville@tuxdriver.com>
CC: Michal Kazior <michal.kazior@tieto.com>
CC: Johannes Berg <johannes.berg@intel.com>
CC: linux-wireless@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:46 +02:00
Denys Vlasenko
4fbd572c29 mac80211: Deinline drv_sta_rc_update()
With this .config: http://busybox.net/~vda/kernel_config_ALLYES_Os,
after deinlining the function size is 706 bytes and there are
2 callsites, reducing code size by about 700 bytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: John Linville <linville@tuxdriver.com>
CC: Michal Kazior <michal.kazior@tieto.com>
CC: Johannes Berg <johannes.berg@intel.com>
CC: linux-wireless@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:45 +02:00
Denys Vlasenko
b23dcd4aca mac80211: Deinline drv_conf_tx()
With this .config: http://busybox.net/~vda/kernel_config_ALLYES_Os,
after deinlining the function size is 785 bytes and there are
7 callsites.

Total size reduction is about 3.5 kbytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: John Linville <linville@tuxdriver.com>
CC: Michal Kazior <michal.kazior@tieto.com>
CC: Johannes Berg <johannes.berg@intel.com>
CC: linux-wireless@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:45 +02:00
Helmut Schaa
35afa58862 mac80211: Copy tx'ed beacons to monitor mode
When debugging wireless powersave issues on the AP side it's quite helpful
to see our own beacons that are transmitted by the hardware/driver. However,
this is not that easy since beacons don't pass through the regular TX queues.

Preferably drivers would call ieee80211_tx_status also for tx'ed beacons
but that's not always possible. Hence, just send a copy of each beacon
generated by ieee80211_beacon_get_tim to monitor devices when they are
getting fetched by the driver.

Also add a HW flag IEEE80211_HW_BEACON_TX_STATUS that can be used by
drivers to indicate that they report TX status for beacons.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
(with a fix from Christian Lamparted rolled in)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-29 15:56:33 +02:00
Loic Poulain
fbef168fec Bluetooth: Add hci_cmd_sync function
Send a HCI command and wait for command complete event.
This function serializes the requests by grabbing the req_lock.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-29 15:16:11 +02:00
Viresh Kumar
b5ffe63442 net: Drop unlikely before IS_ERR(_OR_NULL)
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-29 15:15:40 +02:00
Michael Rossberg
4e077237cf xfrm: Fix state threshold configuration from userspace
Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.

Signed-off-by: Michael Rossberg <michael.rossberg@tu-ilmenau.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-09-29 11:45:55 +02:00
Steffen Klassert
c386578f1c xfrm: Let the flowcache handle its size by default.
The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whatever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse
new allocations if we reach the limit, so new flows are dropped.
On syslems with 16 or less cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.

Tested-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-09-29 11:44:16 +02:00
Denys Vlasenko
2103d6b818 net: sctp: Don't use 64 kilobyte lookup table for four elements
Seemingly innocuous sctp_trans_state_to_prio_map[] array
is way bigger than it looks, since
"[SCTP_UNKNOWN] = 2" expands into "[0xffff] = 2" !

This patch replaces it with switch() statement.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: linux-sctp@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 22:52:21 -07:00
Jesper Dangaard Brouer
8a4683a5e0 net: help compiler generate better code in eth_get_headlen
Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC))
generated suboptimal assembler code in eth_get_headlen().

This early return coding style is usually not an issue, on super scalar CPUs,
but the compiler choose to put the return statement after this very unlikely
branch, thus creating larger jump down to the likely code path.

Performance wise, I could measure slightly less L1-icache-load-misses
and less branch-misses, and an improvement of 1 nanosec with an IP-forwarding
use-case with 257 bytes packets with ixgbe (CPU i7-4790K @ 4.00GHz).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 22:51:15 -07:00
Alexander Couzens
06a15f51cf l2tp: protect tunnel->del_work by ref_count
There is a small chance that tunnel_free() is called before tunnel->del_work scheduled
resulting in a zero pointer dereference.

Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 22:39:10 -07:00
Bendik Rønning Opstad
d2e1339f40 tcp: Fix CWV being too strict on thin streams
Application limited streams such as thin streams, that transmit small
amounts of payload in relatively few packets per RTT, can be prevented
from growing the CWND when in congestion avoidance. This leads to
increased sojourn times for data segments in streams that often transmit
time-dependent data.

Currently, a connection is considered CWND limited only after having
successfully transmitted at least one packet with new data, while at the
same time failing to transmit some unsent data from the output queue
because the CWND is full. Applications that produce small amounts of
data may be left in a state where it is never considered to be CWND
limited, because all unsent data is successfully transmitted each time
an incoming ACK opens up for more data to be transmitted in the send
window.

Fix by always testing whether the CWND is fully used after successful
packet transmissions, such that a connection is considered CWND limited
whenever the CWND has been filled. This is the correct behavior as
specified in RFC2861 (section 3.1).

Cc: Andreas Petlund <apetlund@simula.no>
Cc: Carsten Griwodz <griff@simula.no>
Cc: Jonas Markussen <jonassm@ifi.uio.no>
Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Cc: Mads Johannessen <madsjoh@ifi.uio.no>
Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 22:36:30 -07:00
David Ahern
17fb0b2b90 net: Remove redundant oif checks in rt6_device_match
The oif has already been checked that it is non-zero; the 2 additional
checks on oif within that if (oif) {...} block are redundant.

CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 22:30:24 -07:00
Eric Dumazet
7c85af8810 tcp: avoid reorders for TFO passive connections
We found that a TCP Fast Open passive connection was vulnerable
to reorders, as the exchange might look like

[1] C -> S S <FO ...> <request>
[2] S -> C S. ack request <options>
[3] S -> C . <answer>

packets [2] and [3] can be generated at almost the same time.

If C receives the 3rd packet before the 2nd, it will drop it as
the socket is in SYN_SENT state and expects a SYNACK.

S will have to retransmit the answer.

Current OOO avoidance in linux is defeated because SYNACK
packets are attached to the LISTEN socket, while DATA packets
are attached to the children. They might be sent by different cpus,
and different TX queues might be selected.

It turns out that for TFO, we created a child, which is a
full blown socket in TCP_SYN_RECV state, and we simply can attach
the SYNACK packet to this socket.

This means that at the time tcp_sendmsg() pushes DATA packet,
skb->ooo_okay will be set iff the SYNACK packet had been sent
and TX completed.

This removes the reorder source at the host level.

We also removed the export of tcp_try_fastopen(), as it is no
longer called from IPv6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 22:11:19 -07:00
Karl Heiss
635682a144 sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake.  Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.

Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.

 BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
 ...
 RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
 RSP: 0018:ffff880028323b20  EFLAGS: 00000206
 RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
 RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
 R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
 FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
 CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
 Stack:
 ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
 <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
 <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
 Call Trace:
 <IRQ>
 [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
 [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
 [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81497255>] ? ip_rcv+0x275/0x350
 [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
 ...

With lockdep debugging:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 CslRx/12087 is trying to release lock (slock-AF_INET) at:
 [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
 but there are no more locks to release!

 other info that might help us debug this:
 2 locks held by CslRx/12087:
 #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
 #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]

Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.

Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 21:03:40 -07:00
Karl Heiss
f05940e618 sctp: Whitespace fix
Fix indentation in sctp_generate_heartbeat_event.

Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28 21:03:40 -07:00
Steve Wise
72c0217382 xprtrdma: disconnect and flush cqs before freeing buffers
Otherwise a FRMR completion can cause a touch-after-free crash.

In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
rpcrdma_ep_destroy().

In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-09-28 10:42:35 -04:00
Ian Wilson
34c2d9fb04 bridge: Allow forward delay to be cfgd when STP enabled
Allow bridge forward delay to be configured when Spanning Tree is enabled.

Signed-off-by: Ian Wilson <iwilson@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-27 19:09:38 -07:00
Jiri Benc
b1be00a6c3 vxlan: support both IPv4 and IPv6 sockets in a single vxlan device
For metadata based vxlan interface, open both IPv4 and IPv6 socket. This is
much more user friendly: it's not necessary to create two vxlan interfaces
and pay attention to using the right one in routing rules.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-26 22:40:55 -07:00
David S. Miller
4963ed48f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv4/arp.c

The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-26 16:08:27 -07:00
Linus Torvalds
518a7cb698 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
    instead of cloning them.  From Daniel Borkmann.

 2) When converting classical BPF into eBPF, fix the setting of the
    source reg to BPF_REG_X.  From Tycho Andersen.

 3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
    Linus Lussing.

 4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.

 5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
    Roopa Prabhu.

 6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.

 7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
    Florian Westphal.

 8) Check DMA mapping errors in bna driver, from Ivan Vecera.

 9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
    misclearing of interrupt status in TX timeout handler, etc.) from
    David Woodhouse.

10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
    Hugne.

11) Fix autobind races et al. in netlink code, from Herbert Xu with
    help from Tejun Heo and others.

12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.

13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
    Eric Dumazet.

14) Fix array overruns in mac80211, from Johannes Berg and Dan
    Carpenter.

15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.

16) Fix race between poll_one_napi and napi_disable, from Neil Horman.

17) Fix byte order in geneve tunnel port config, from John W Linville.

18) Fix handling of ARP replies over lightweight tunnels, from Jiri
    Benc.

19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
    Kok and Roopa Prabhu.

20) Several reference count handling bug fixes in the PHY/MDIO layer
    from Russel King.

21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.

22) Fix crash in icmp_route_lookup(), from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net: Fix panic in icmp_route_lookup
  net: update docbook comment for __mdiobus_register()
  ppp: fix lockdep splat in ppp_dev_uninit()
  net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
  phy: marvell: add link partner advertised modes
  net: fix net_device refcounting
  phy: add phy_device_remove()
  phy: fixed-phy: properly validate phy in fixed_phy_update_state()
  net: fix phy refcounting in a bunch of drivers
  of_mdio: fix MDIO phy device refcounting
  phy: add proper phy struct device refcounting
  phy: fix mdiobus module safety
  net: dsa: fix of_mdio_find_bus() device refcount leak
  phy: fix of_mdio_find_bus() device refcount leak
  ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
  ip6_gre: Reduce log level in ip6gre_err() to debug
  fib_rules: fix fib rule dumps across multiple skbs
  bnx2x: byte swap rss_key to comply to Toeplitz specs
  net: revert "net_sched: move tp->root allocation into fw_init()"
  lwtunnel: remove source and destination UDP port config option
  ...
2015-09-26 06:01:33 -04:00
David Ahern
bdb06cbf77 net: Fix panic in icmp_route_lookup
Andrey reported a panic:

[ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
[ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
[ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
[ 7249.865637] Oops: 0000 [#1]
...
[ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.3.0-999-generic #201509220155
[ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
[ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
[ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
[ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
[ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
[ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
[ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
[ 7249.867141] Stack:
[ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00
f3920240 f0c69180
[ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000
e659266c f483ba54
[ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0
c16b0e00 00000064
[ 7249.867448] Call Trace:
[ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
[ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
[ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
[ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
[ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
[ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
[ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0
[nf_conntrack]
[ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
[ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
[ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
[ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
[ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
[ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
[ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
[ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
[ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
[ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
[ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
[ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
[ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
[ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
[ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
[ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
[ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
[ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
...

Prior to the VRF change the oif was not set in the flow struct, so the
VRF support should really have only added the vrf_master_ifindex lookup.

Fixes: 613d09b30f ("net: Use VRF device index for lookups on TX")
Cc: Andrey Melnikov <temnota.am@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 21:44:02 -07:00
Eric Dumazet
1b70e977ce inet: constify inet_rtx_syn_ack() sock argument
SYNACK packets are sent on behalf on unlocked listeners
or fastopen sockets. Mark socket as const to catch future changes
that might break the assumption.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:39 -07:00
Eric Dumazet
ea3bea3a1d tcp/dccp: constify rtx_synack() and friends
This is done to make sure we do not change listener socket
while sending SYNACK packets while socket lock is not held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:39 -07:00
Eric Dumazet
802885fc04 dccp: constify dccp_make_response() socket argument
Like tcp_make_synack() the only time we might change the socket is
when calling sock_wmalloc(), which is using atomic operation to
update sk->sk_wmem_alloc

Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:39 -07:00
Eric Dumazet
0f935dbedc tcp: constify tcp_v{4|6}_send_synack() socket argument
This documents fact that listener lock might not be held
at the time SYNACK are sent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:39 -07:00
Eric Dumazet
1c1e9d2b67 ipv6: constify ip6_xmit() sock argument
This is to document that socket lock might not be held at this point.

skb_set_owner_w() and ipv6_local_error() are using proper atomic ops
or spinlocks, so we promote the socket to non const when calling them.

netfilter hooks should never assume socket lock is held,
we also promote the socket to non const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:38 -07:00
Eric Dumazet
5d062de7f8 tcp: constify tcp_make_synack() socket argument
listener socket is not locked when tcp_make_synack() is called.

We better make sure no field is written.

There is one exception : Since SYNACK packets are attached to the listener
at this moment (or SYN_RECV child in case of Fast Open),
sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using
atomic operations so this is safe.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:38 -07:00
Eric Dumazet
6ac705b180 tcp: remove tcp_ecn_make_synack() socket argument
SYNACK packets might be sent without holding socket lock.

For DCTCP/ECN sake, we should call INET_ECN_xmit() while
socket lock is owned, and only when we init/change congestion control.

This also fixies a bug if congestion module is changed from
dctcp to another one on a listener : we now clear ECN bits
properly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:38 -07:00
Eric Dumazet
37bfbdda0b tcp: remove tcp_synack_options() socket argument
We do not use the socket in this function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:38 -07:00
Eric Dumazet
cfe673b0ae ip: constify ip_build_and_send_pkt() socket argument
This function is used to build and send SYNACK packets,
possibly on behalf of unlocked listener socket.

Make sure we did not miss a write by making this socket const.

We no longer can use ip_select_ident() and have to either
set iph->id to 0 or directly call __ip_select_ident()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:38 -07:00
Eric Dumazet
b83e3deb97 tcp: md5: constify tcp_md5_do_lookup() socket argument
When TCP new listener is done, these functions will be called
without socket lock being held. Make sure they don't change
anything.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:38 -07:00
Eric Dumazet
30d50c61df ipv6: constify inet6_csk_route_req() socket argument
socket is not modified, make it const so that callers can
do the same if they need.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:37 -07:00
Eric Dumazet
3aef934f4d ipv6: constify ip6_dst_lookup_{flow|tail}() sock arguments
ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch
socket, lets add a const qualifier.

This will permit the same change in inet6_csk_route_req()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 13:00:37 -07:00