Commit Graph

44719 Commits

Author SHA1 Message Date
Erik Hugne
9871b27f67 tipc: fix random link reset problem
In the function tipc_sk_rcv(), the stack variable 'err'
is only initialized to TIPC_ERR_NO_PORT for the first
iteration over the link input queue. If a chain of messages
are received from a link, failure to lookup the socket for
any but the first message will cause the message to bounce back
out on a random link.
We fix this by properly initializing err.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-23 11:50:34 -04:00
Ying Xue
def81f69bf tipc: fix topology server broken issue
When a new topology server is launched in a new namespace, its
listening socket is inserted into the "init ns" namespace's socket
hash table rather than the one owned by the new namespace. Although
the socket's namespace is forcedly changed to the new namespace later,
the socket is still stored in the socket hash table of "init ns"
namespace. When a client created in the new namespace connects
its own topology server, the connection is failed as its server's
socket could not be found from its own namespace's socket table.

If __sock_create() instead of original sock_create_kern() is used
to create the server's socket through specifying an expected namesapce,
the socket will be inserted into the specified namespace's socket
table, thereby avoiding to the topology server broken issue.

Fixes: 76100a8a64 ("tipc: fix netns refcnt leak")

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-23 11:50:34 -04:00
Johannes Berg
ebd82b39bf mac80211: make station hash table max_size configurable
Allow debug builds to configure the station hash table maximum
size in order to run with hash collisions in limited scenarios
such as hwsim testing. The default remains 0 which effectively
means no limit.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-23 17:26:06 +02:00
Johannes Berg
60f4b626d5 mac80211: fix rhashtable conversion
My conversion of the mac80211 station hash table to rhashtable
completely broke the lookup in sta_info_get() as it no longer
took into account the virtual interface. Fix that.

Fixes: 7bedd0cfad ("mac80211: use rhashtable for station table")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-23 16:40:40 +02:00
Li RongQing
f31e8d4f7b xfrm: fix the return code when xfrm_*_register_afinfo failed
If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo->family] had the
value, return the -EEXIST, not -ENOBUFS

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-04-23 11:37:52 +02:00
Li RongQing
800777026e xfrm: optimise the use of walk list header in xfrm_policy/state_walk
The walk from input is the list header, and marked as dead, and will
be skipped in loop.

list_first_entry() can be used to return the true usable value from
walk if walk is not empty

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-04-23 11:36:06 +02:00
Li RongQing
1ee5e6676b xfrm: remove the xfrm_queue_purge definition
The task of xfrm_queue_purge is same as skb_queue_purge, so remove it

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-04-23 11:35:47 +02:00
Herbert Xu
8a1a2b717e mac802154: Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

This patch also removes a bogus inclusion of algapi.h which should
only be used by algorithm/driver implementors and not crypto users.

Instead linux/crypto.h is added which is necessary because mac802154
also uses blkcipher in addition to aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
2015-04-23 14:18:11 +08:00
Herbert Xu
d8fe0ddd02 mac80211: Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
2015-04-23 14:18:11 +08:00
Eric Dumazet
79930f5892 net: do not deplete pfmemalloc reserve
build_skb() should look at the page pfmemalloc status.
If set, this means page allocator allocated this page in the
expectation it would help to free other pages. Networking
stack can do that only if skb->pfmemalloc is also set.

Also, we must refrain using high order pages from the pfmemalloc
reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
them. Under memory pressure, using order-0 pages is probably the best
strategy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 16:24:59 -04:00
Johannes Berg
26349c71b4 ip6_gre: use netdev_alloc_pcpu_stats()
The code there just open-codes the same, so use the provided macro instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 15:39:05 -04:00
Linus Torvalds
1204c46445 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "This time around we have a collection of CephFS fixes from Zheng
  around MDS failure handling and snapshots, support for a new CRUSH
  straw2 algorithm (to sync up with userspace) and several RBD cleanups
  and fixes from Ilya, an error path leak fix from Taesoo, and then an
  assorted collection of cleanups from others"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
  rbd: rbd_wq comment is obsolete
  libceph: announce support for straw2 buckets
  crush: straw2 bucket type with an efficient 64-bit crush_ln()
  crush: ensuring at most num-rep osds are selected
  crush: drop unnecessary include from mapper.c
  ceph: fix uninline data function
  ceph: rename snapshot support
  ceph: fix null pointer dereference in send_mds_reconnect()
  ceph: hold on to exclusive caps on complete directories
  libceph: simplify our debugfs attr macro
  ceph: show non-default options only
  libceph: expose client options through debugfs
  libceph, ceph: split ceph_show_options()
  rbd: mark block queue as non-rotational
  libceph: don't overwrite specific con error msgs
  ceph: cleanup unsafe requests when reconnecting is denied
  ceph: don't zero i_wrbuffer_ref when reconnecting is denied
  ceph: don't mark dirty caps when there is no auth cap
  ceph: keep i_snap_realm while there are writers
  libceph: osdmap.h: Add missing format newlines
  ...
2015-04-22 11:30:10 -07:00
Robert Shearman
5a9ab01761 mpls: Prevent use of implicit NULL label as outgoing label
The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 14:24:54 -04:00
Robert Shearman
37bde79979 mpls: Per-device enabling of packet input
An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not allow forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 14:24:54 -04:00
Robert Shearman
03c57747a7 mpls: Per-device MPLS state
Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine that this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 14:24:54 -04:00
Eric Dumazet
d83769a580 tcp: fix possible deadlock in tcp_send_fin()
Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.

To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.

In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.

This patch reverts d22e153718 ("tcp: fix tcp fin memory accounting")

Fixes: d22e153718 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 14:13:11 -04:00
Ilya Dryomov
958a27658d crush: straw2 bucket type with an efficient 64-bit crush_ln()
This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-22 18:33:43 +03:00
Ilya Dryomov
45002267e8 crush: ensuring at most num-rep osds are selected
Crush temporary buffers are allocated as per replica size configured
by the user.  When there are more final osds (to be selected as per
rule) than the replicas, buffer overlaps and it causes crash.  Now, it
ensures that at most num-rep osds are selected even if more number of
osds are allowed by the rule.

Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
                          234b066ba04976783d15ff2abc3e81b6cc06fb10.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-22 18:33:42 +03:00
Ilya Dryomov
9be6df215a crush: drop unnecessary include from mapper.c
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-22 18:33:42 +03:00
Thomas Gleixner
46ac2f53df net: core: pktgen: Remove bogus hrtimer_active() check
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefor the task pointer is already NULL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20150414203503.165258315@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
1e3176885c net: sched: Use hrtimer_resolution instead of hrtimer_get_res()
No point in converting a timespec now that the value is directly
accessible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Link: http://lkml.kernel.org/r/20150414203500.720623028@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:48 +02:00
Johannes Berg
80616c0db8 mac80211: allow segmentation offloads
Implement the necessary software segmentation on the normal
TX path so that fast-xmit can use segmentation offload if
the hardware (or driver) supports it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 11:25:30 +02:00
Johannes Berg
680a0daba7 mac80211: allow drivers to support S/G
If drivers want to support S/G (really just gather DMA on TX) then
we can now easily support this on the fast-xmit path since it just
needs to write to the ethernet header (and already has a check for
that being possible.)

However, disallow this on the regular TX path (which has to handle
fragmentation, software crypto, etc.) by calling skb_linearize().

Also allow the related HIGHDMA since that's not interesting to the
code in mac80211 at all anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 11:25:30 +02:00
Johannes Berg
2d981fddb0 mac80211: allow checksum offload only in fast-xmit
When we go through the complete TX processing, there are a number
of things like fragmentation and software crypto that require the
checksum to be calculated already.

In favour of maintainability, instead of adding the necessary call
to skb_checksum_help() in all the places that need it, just do it
once before the regular TX processing.

Right now this only affects the TI wlcore and QCA ath10k drivers
since they're the only ones using checksum offload. The previous
commits enabled fast-xmit for them in almost all cases.

For wlcore this even fixes a corner case: when a key fails to be
programmed to hardware software encryption gets used, encrypting
frames with a bad checksum.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 11:25:29 +02:00
Johannes Berg
3ffd884012 mac80211: extend fast-xmit to cover IBSS
IBSS can be supported very easily since it uses the standard station
authorization state etc. so it just needs to be covered by the header
building switch statement.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 10:07:38 +02:00
Johannes Berg
e495c24731 mac80211: extend fast-xmit for more ciphers
When crypto is offloaded then in some cases it's all handled
by the device, and in others only some space for the IV must
be reserved in the frame. Handle both of these cases in the
fast-xmit path, up to a limit of 18 bytes of space for IVs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 10:07:37 +02:00
Johannes Berg
725b812c83 mac80211: extend fast-xmit to driver fragmentation
If the driver handles fragmentation then it wouldn't
be done in software so we can still use the fast-xmit
path in that case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 10:07:36 +02:00
Johannes Berg
17c18bf880 mac80211: add TX fastpath
In order to speed up mac80211's TX path, add the "fast-xmit" cache
that will cache the data frame 802.11 header and other data to be
able to build the frame more quickly. This cache is rebuilt when
external triggers imply changes, but a lot of the checks done per
packet today are simplified away to the check for the cache.

There's also a more detailed description in the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-22 10:02:25 +02:00
Linus Torvalds
8aaa51b63c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Just a few fixes trickling in at this point.

  1) If we see an attached socket on an skb in the ipv4 forwarding path,
     bail.  This can happen due to races with FIB rule addition, and
     deletion, and we should just drop such frames.  From Sebastian
     Pöhn.

  2) pppoe receive should only accept packets destined for this hosts's
     MAC address.  From Joakim Tjernlund.

  3) Handle checksum unwrapping properly in ppp receive properly when
     it's encapsulated in UDP in some way, fix from Tom Herbert.

  4) Fix some bugs in mv88e6xxx DSA driver resulting from the conversion
     from register offset constants to mnenomic macros.  From Vivien
     Didelot.

  5) Fix handling of HCA max message size in mlx4 adapters, from Eran
     Ben ELisha"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
  tcp: add memory barriers to write space paths
  altera tse: Error-Bit on tx-avalon-stream always set.
  net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN
  net: dsa: mv88e6xxx: fix setup of port control 1
  ppp: call skb_checksum_complete_unset in ppp_receive_frame
  net: add skb_checksum_complete_unset
  pppoe: Lacks DST MAC address check
  ip_forward: Drop frames with attached skb->sk
2015-04-21 22:37:27 -07:00
jbaron@akamai.com
3c7151275c tcp: add memory barriers to write space paths
Ensure that we either see that the buffer has write space
in tcp_poll() or that we perform a wakeup from the input
side. Did not run into any actual problem here, but thought
that we should make things explicit.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-21 15:57:34 -04:00
Sebastian Pöhn
2ab957492d ip_forward: Drop frames with attached skb->sk
Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-20 14:07:33 -04:00
Ilya Dryomov
5cf7bd3012 libceph: expose client options through debugfs
Add a client_options attribute for showing libceph options.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:39 +03:00
Ilya Dryomov
ff40f9ae95 libceph, ceph: split ceph_show_options()
Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph.  This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:38 +03:00
Ilya Dryomov
67c64eb742 libceph: don't overwrite specific con error msgs
- specific con->error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d5c ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con->error_msg assignments and pr_errs while at it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:37 +03:00
Johannes Berg
35c347ac53 mac80211: lock rate control
Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
control aren't properly taking concurrency into account. It's
likely that the same is true for other rate control algorithms.

In the case of minstrel this manifests itself in crashes when an
update and other data access are run concurrently, for example
when the stations change bandwidth or similar. In iwlwifi, this
can cause firmware crashes.

Since fixing all rate control algorithms will be very difficult,
just provide locking for invocations. This protects the internal
data structures the algorithms maintain.

I've manipulated hostapd to test this, by having it change its
advertised bandwidth roughly ever 150ms. At the same time, I'm
running a flood ping between the client and the AP, which causes
this race of update vs. get_rate/status to easily happen on the
client. With this change, the system survives this test.

Reported-by: Sven Eckelmann <sven@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-20 13:05:29 +02:00
Bob Copeland
48bf6beddf mac80211: introduce plink lock for plink fields
The mesh plink code uses sta->lock to serialize access to the
plink state fields between the peer link state machine and the
peer link timer.  Some paths (e.g. those involving
mps_qos_null_tx()) unfortunately hold this spinlock across
frame tx, which is soon to be disallowed.  Add a new spinlock
just for plink access.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-20 13:05:29 +02:00
Johannes Berg
5eb8f4d742 mac80211: don't warn when stopping VLAN with stations
Stations assigned to an AP_VLAN type interface are flushed
when the interface is stopped, but then we warn about it.
Suppress the warning since there's nothing else that would
ensure those stations are already removed at this point.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-04-20 13:04:39 +02:00
Linus Torvalds
dba94f2155 Merge tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9pfs updates from Eric Van Hensbergen:
 "Some accumulated cleanup patches for kerneldoc and unused variables as
  well as some lock bug fixes and adding privateport option for RDMA"

* tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  net/9p: add a privport option for RDMA transport.
  fs/9p: Initialize status in v9fs_file_do_lock.
  net/9p: Initialize opts->privport as it should be.
  net/9p: use memcpy() instead of snprintf() in p9_mount_tag_show()
  9p: use unsigned integers for nwqid/count
  9p: do not crash on unknown lock status code
  9p: fix error handling in v9fs_file_do_lock
  9p: remove unused variable in p9_fd_create()
  9p: kerneldoc warning fixes
2015-04-18 17:45:30 -04:00
Marcel Holtmann
1f5014d6a7 Bluetooth: hidp: Fix regression with older userspace and flags validation
While it is not used by newer userspace anymore, the older userspace was
utilizing HIDP_VIRTUAL_CABLE_UNPLUG and HIDP_BOOT_PROTOCOL_MODE flags
when adding a new HIDP connection.

The flags validation is important, but we can not break older userspace
and with that allow providing these flags even if newer userspace does
not use them anymore.

Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-18 11:01:08 -04:00
Linus Torvalds
388f997620 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix verifier memory corruption and other bugs in BPF layer, from
    Alexei Starovoitov.

 2) Add a conservative fix for doing BPF properly in the BPF classifier
    of the packet scheduler on ingress.  Also from Alexei.

 3) The SKB scrubber should not clear out the packet MARK and security
    label, from Herbert Xu.

 4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue.

 5) Pause handling is not correct in the stmmac driver because it
    doesn't take into consideration the RX and TX fifo sizes.  From
    Vince Bridgers.

 6) Failure path missing unlock in FOU driver, from Wang Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  net: dsa: use DEVICE_ATTR_RW to declare temp1_max
  netns: remove BUG_ONs from net_generic()
  IB/ipoib: Fix ndo_get_iflink
  sfc: Fix memcpy() with const destination compiler warning.
  altera tse: Fix network-delays and -retransmissions after high throughput.
  net: remove unused 'dev' argument from netif_needs_gso()
  act_mirred: Fix bogus header when redirecting from VLAN
  inet_diag: fix access to tcp cc information
  tcp: tcp_get_info() should fetch socket fields once
  net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()
  skbuff: Do not scrub skb mark within the same name space
  Revert "net: Reset secmark when scrubbing packet"
  bpf: fix two bugs in verification logic when accessing 'ctx' pointer
  bpf: fix bpf helpers to use skb->mac_header relative offsets
  stmmac: Configure Flow Control to work correctly based on rxfifo size
  stmmac: Enable unicast pause frame detect in GMAC Register 6
  stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree
  stmmac: Add defines and documentation for enabling flow control
  stmmac: Add properties for transmit and receive fifo sizes
  stmmac: fix oops on rmmod after assigning ip addr
  ...
2015-04-17 16:31:08 -04:00
Vivien Didelot
e3122b7fae net: dsa: use DEVICE_ATTR_RW to declare temp1_max
Since commit da4759c (sysfs: Use only return value from is_visible for
the file mode), it is possible to reduce the permissions of a file.

So declare temp1_max with the DEVICE_ATTR_RW macro and remove the write
permission in dsa_hwmon_attrs_visible if set_temp_limit isn't provided.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 15:58:37 -04:00
Johannes Berg
8b86a61da3 net: remove unused 'dev' argument from netif_needs_gso()
In commit 04ffcb255f ("net: Add ndo_gso_check") Tom originally
added the 'dev' argument to be able to call ndo_gso_check().

Then later, when generalizing this in commit 5f35227ea3
("net: Generalize ndo_gso_check to ndo_features_check")
Jesse removed the call to ndo_gso_check() in netif_needs_gso()
by calling the new ndo_features_check() in a different place.
This made the 'dev' argument unused.

Remove the unused argument and go back to the code as before.

Cc: Tom Herbert <therbert@google.com>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 13:29:41 -04:00
Herbert Xu
f40ae91307 act_mirred: Fix bogus header when redirecting from VLAN
When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb->mac_len.  So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.

This patch fixes this by only pushing skb->mac_len.  We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 13:29:28 -04:00
Eric Dumazet
521f1cf1db inet_diag: fix access to tcp cc information
Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
   icsk->icsk_ca_ops can change under us and module be unloaded.
   -> Access to freed memory.
   Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
   but again this is not safe against icsk->icsk_ca_ops
   change and nla_put() errors were ignored. Some sockets
   could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 13:28:31 -04:00
Eric Dumazet
fad9dfefea tcp: tcp_get_info() should fetch socket fields once
tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.

Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate

Fixes: 977cb0ecf8 ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17 13:28:31 -04:00
Herbert Xu
213dd74aee skbuff: Do not scrub skb mark within the same name space
On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit ea23192e8e ("tunnels:
> Maybe add a Fixes tag?
> Fixes: ea23192e8e ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels.  While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.

Sure.

PS I used the wrong email for James the first time around.  So
let me repeat the question here.  Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?

---8<---
The commit ea23192e8e ("tunnels:
harmonize cleanup done on skb on rx path") broke anyone trying to
use netfilter marking across IPv4 tunnels.  While most of the
fields that are cleared by skb_scrub_packet don't matter, the
netfilter mark must be preserved.

This patch rearranges skb_scrub_packet to preserve the mark field.

Fixes: ea23192e8e ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 14:20:40 -04:00
Herbert Xu
4c0ee414e8 Revert "net: Reset secmark when scrubbing packet"
This patch reverts commit b8fb4e0648
because the secmark must be preserved even when a packet crosses
namespace boundaries.  The reason is that security labels apply to
the system as a whole and is not per-namespace.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 14:19:02 -04:00
Alexei Starovoitov
a166151cbe bpf: fix bpf helpers to use skb->mac_header relative offsets
For the short-term solution, lets fix bpf helper functions to use
skb->mac_header relative offsets instead of skb->data in order to
get the same eBPF programs with cls_bpf and act_bpf work on ingress
and egress qdisc path. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from below referenced discussion.

More long term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this, and implemented later
as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb() as has been suggested
as second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwriteable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a9c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3 ("tc: bpf: add checksum helpers")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 14:08:49 -04:00
Wei Yongjun
5a950ad58d netns: remove duplicated include from net_namespace.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 12:14:24 -04:00
WANG Cong
540207ae69 fou: avoid missing unlock in failure path
Fixes: 7a6c8c34e5 ("fou: implement FOU_CMD_GET")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 12:11:19 -04:00