Commit Graph

753602 Commits

Author SHA1 Message Date
Colin Ian King
a343e3f89e qedr: Fix spelling mistake: "hanlde" -> "handle"
Trivial fix to spelling mistake in DP_ERR message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:19:23 -06:00
Al Viro
cbd4a5bcb2 d_genocide: move export to definition
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:08:21 -04:00
Al Viro
42177007aa fold dentry_lock_for_move() into its sole caller and clean it up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:49 -04:00
Al Viro
076515fc92 make non-exchanging __d_move() copy ->d_parent rather than swap them
Currently d_move(from, to) does the following:
	* name/parent of from <- old name/parent of to, from hashed there
	* to is unhashed
	* name of to is preserved
	* if from used to be detached, to gets detached
	* if from used to be attached, parent of to <- old parent of from.

That's both user-visibly bogus and complicates reasoning a lot.
Much saner semantics would be
	* name/parent of from <- name/parent of to, from hashed there.
	* to is unhashed
	* name/parent of to is unchanged.

The price, of course, is that old parent of from might lose a reference.
However,
	* all potentially cross-directory callers of d_move() have both
parents pinned directly; typically, dentries themselves are grabbed
only after we have grabbed and locked both parents.  IOW, the decrement
of old parent's refcount in case of d_move() won't reach zero.
	* __d_move() from d_splice_alias() is done to detached alias.
No refcount decrements in that case
	* __d_move() from __d_unalias() *can* get the refcount to zero.
So let's grab a reference to alias' old parent before calling __d_unalias()
and dput() it after we'd dropped rename_lock.

That does make d_splice_alias() potentially blocking.  However, it has
no callers in non-sleepable contexts (and the case where we'd grown
that dget/dput pair is _very_ rare, so performance is not an issue).

Another thing that needs adjustment is unlocking in the end of __d_move();
folded it in.  And cleaned the remnants of bogus ordering from the
"lock them in the beginning" counterpart - it's never been right and
now (well, for 7 years now) we have that thing always serialized on
rename_lock anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:49 -04:00
Al Viro
a749896833 oprofilefs: don't oops on allocation failure
... just short-circuit the creation of potential children

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:48 -04:00
Al Viro
5bf1ddf7ee lustre: get rid of pointless casts to struct dentry *
... when feeding const struct dentry * to primitives taking
exactly that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:48 -04:00
Al Viro
cd1c0c9321 debugfs_lookup(): switch to lookup_one_len_unlocked()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:47 -04:00
Al Viro
a03ece5ff2 fold lookup_real() into __lookup_hash()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:47 -04:00
Al Viro
903ddaf493 take out orphan externs (empty_string/slash_string)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:46 -04:00
Al Viro
7a5cf791a7 split d_path() and friends into a separate file
Those parts of fs/dcache.c are pretty much self-contained.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:46 -04:00
Al Viro
43986d63b6 dcache.c: trim includes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:45 -04:00
John Ogness
8f04da2adb fs/dcache: Avoid a try_lock loop in shrink_dentry_list()
shrink_dentry_list() holds dentry->d_lock and needs to acquire
dentry->d_inode->i_lock. This cannot be done with a spin_lock()
operation because it's the reverse of the regular lock order.
To avoid ABBA deadlocks it is done with a trylock loop.

Trylock loops are problematic in two scenarios:

  1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are
     preemptible. As a consequence the i_lock holder can be preempted
     by a higher priority task. If that task executes the trylock loop
     it will do so forever and live lock.

  2) In virtual machines trylock loops are problematic as well. The
     VCPU on which the i_lock holder runs can be scheduled out and a
     task on a different VCPU can loop for a whole time slice. In the
     worst case this can lead to starvation. Commits 47be61845c
     ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45
     ("shrink_dentry_list(): take parent's d_lock earlier") are
     addressing exactly those symptoms.

Avoid the trylock loop by using dentry_kill(). When pruning ancestors,
the same code applies that is used to kill a dentry in dput(). This
also has the benefit that the locking order is now the same. First
the inode is locked, then the parent.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:44 -04:00
Al Viro
f657a666fd get rid of trylock loop around dentry_kill()
In case when trylock in there fails, deal with it directly in
dentry_kill().  Note that in cases when we drop and retake
->d_lock, we need to recheck whether to retain the dentry.
Another thing is that dropping/retaking ->d_lock might have
ended up with negative dentry turning into positive; that,
of course, can happen only once...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:44 -04:00
Al Viro
62d9956cef handle move to LRU in retain_dentry()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:43 -04:00
Al Viro
a338579f2f dput(): consolidate the "do we need to retain it?" into an inlined helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:43 -04:00
Al Viro
8b987a46a1 split the slow part of lock_parent() off
Turn the "trylock failed" part into uninlined __lock_parent().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:42 -04:00
Al Viro
65d8eb5a8f now lock_parent() can't run into killed dentry
all remaining callers hold either a reference or ->i_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:42 -04:00
Al Viro
3b3f09f48b get rid of trylock loop in locking dentries on shrink list
In case of trylock failure don't re-add to the list - drop the locks
and carefully get them in the right order.  For shrink_dentry_list(),
somebody having grabbed a reference to dentry means that we can
kick it off-list, so if we find dentry being modified under us we
don't need to play silly buggers with retries anyway - off the list
it is.

The locking logics taken out into a helper of its own; lock_parent()
is no longer used for dentries that can be killed under us.

[fix from Eric Biggers folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:02 -04:00
David S. Miller
e3e67a4f07 Merge branch 'dsa-Add-ATU-VTU-statistics'
Andrew Lunn says:

====================
Add ATU/VTU statistics

Previous patches have added basic support for Address Translation Unit
and VLAN translation Unit violation interrupts. Add statistics
counters for when these occur, which can be accessed using
ethtool. Downgrade one of the particularly spammy warnings from VTU
violations to debug only, now that we have a counter for it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 15:04:22 -04:00
Andrew Lunn
7f20d834ea net: dsa: mv88e6xxx: Make VTU miss violations less spammy
VTU miss violations can happen under normal conditions. Don't spam the
kernel log, downgrade the output to debug level only. The statistics
counter will indicate it is happening, if anybody not debugging is
interested.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 15:04:22 -04:00
Andrew Lunn
65f60e4582 net: dsa: mv88e6xxx: Keep ATU/VTU violation statistics
Count the numbers of various ATU and VTU violation statistics and
return them as part of the ethtool -S statistics.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 15:04:22 -04:00
Arnd Bergmann
7ae665f132 sctp: fix unused lable warning
The proc file cleanup left a label possibly unused:

net/sctp/protocol.c: In function 'sctp_defaults_init':
net/sctp/protocol.c:1304:1: error: label 'err_init_proc' defined but not used [-Werror=unused-label]

This adds an #ifdef around it to match the respective 'goto'.

Fixes: d47d08c8ca ("sctp: use proc_remove_subtree()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:38:27 -04:00
Wei Yongjun
75498aa1af net: cavium: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:36:33 -04:00
Wei Yongjun
335ab8ba61 net: bcmgenet: return NULL instead of plain integer
Fixes the following sparse warning:

drivers/net/ethernet/broadcom/genet/bcmgenet.c:1351:16: warning:
 Using plain integer as NULL pointer

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:35:51 -04:00
Dan Carpenter
99fe29d3a2 test_bpf: Fix NULL vs IS_ERR() check in test_skb_segment()
The skb_segment() function returns error pointers on error.  It never
returns NULL.

Fixes: 76db8087c4 ("net: bpf: add a test for skb_segment in test_bpf module")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:33:29 -04:00
Manish Chopra
58f101bf87 qede: Do not drop rx-checksum invalidated packets.
Today, driver drops received packets which are indicated as
invalid checksum by the device. Instead of dropping such packets,
pass them to the stack with CHECKSUM_NONE indication in skb.

Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:32:15 -04:00
Eric Biggers
ce3fd194fc ext4: limit xattr size to INT_MAX
ext4 isn't validating the sizes of xattrs where the value of the xattr
is stored in an external inode.  This is problematic because
->e_value_size is a u32, but ext4_xattr_get() returns an int.  A very
large size is misinterpreted as an error code, which ext4_get_acl()
translates into a bogus ERR_PTR() for which IS_ERR() returns false,
causing a crash.

Fix this by validating that all xattrs are <= INT_MAX bytes.

This issue has been assigned CVE-2018-1095.

https://bugzilla.kernel.org/show_bug.cgi?id=199185
https://bugzilla.redhat.com/show_bug.cgi?id=1560793

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Fixes: e50e5129f3 ("ext4: xattr-in-inode support")
2018-03-29 14:31:42 -04:00
Russell King
981f1f8035 sfp: allow cotsworks modules
Cotsworks modules fail the checksums - it appears that Cotsworks
reprograms the EEPROM at the end of production with the final product
information (serial, date code, and exact part number for module
options) and fails to update the checksum.

Work around this by detecting the Cotsworks name in the manufacturer
field, and reducing the checksum failures to warnings rather than a
hard error.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:30:41 -04:00
David S. Miller
5593e3340f Merge branch 'qed-flash-upgrade-support'
Sudarsana Reddy Kalluru says:

====================
qed*: Flash upgrade support.

The patch series adds adapter flash upgrade support for qed/qede drivers.

Please consider applying it to net-next branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:56 -04:00
Sudarsana Reddy Kalluru
ccfa110cfe qede: Ethtool flash update support.
The patch adds ethtool callback implementation for flash update.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:56 -04:00
Sudarsana Reddy Kalluru
3a69cae80c qed: Adapter flash update support.
This patch adds the required driver support for updating the flash or
non volatile memory of the adapter. At highlevel, flash upgrade comprises
of reading the flash images from the input file, validating the images and
writing them to the respective paritions.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Sudarsana Reddy Kalluru
62e4d4386a qed: Add APIs for flash access.
This patch adds APIs for flash access.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Sudarsana Reddy Kalluru
d8bf47af24 qed: Fix PTT entry leak in the selftest error flow.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Sudarsana Reddy Kalluru
43645ce03e qed: Populate nvm image attribute shadow.
This patch adds support for populating the flash image attributes.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Radim Krčmář
27aa896281 Merge tag 'kvm-ppc-next-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
KVM PPC update for 4.17

- Improvements for the radix page fault handler for HV KVM on POWER9.
2018-03-29 20:20:13 +02:00
Michal Kalderon
50bc60cb15 qed*: Utilize FW 8.33.11.0
This FW contains several fixes and features

RDMA Features
- SRQ support
- XRC support
- Memory window support
- RDMA low latency queue support
- RDMA bonding support

RDMA bug fixes
- RDMA remote invalidate during retransmit fix
- iWARP MPA connect interop issue with RTR fix
- iWARP Legacy DPM support
- Fix MPA reject flow
- iWARP error handling
- RQ WQE validation checks

MISC
- Fix some HSI types endianity
- New Restriction: vlan insertion in core_tx_bd_data can't be set
  for LB packets

ETH
- HW QoS offload support
- Fix vlan, dcb and sriov flow of VF sending a packet with
  inband VLAN tag instead of default VLAN
- Allow GRE version 1 offloads in RX flow
- Allow VXLAN steering

iSCSI / FcoE
- Fix bd availability checking flow
- Support 256th sge proerly in iscsi/fcoe retransmit
- Performance improvement
- Fix handle iSCSI command arrival with AHS and with immediate
- Fix ipv6 traffic class configuration

DEBUG
- Update debug utilities

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:18:02 -04:00
Eric Dumazet
18dcbe12fe ipv6: export ip6 fragments sysctl to unprivileged users
IPv4 was changed in commit 52a773d645 ("net: Export ip fragment
sysctl to unprivileged users")

The only sysctl that is not per-netns is not used :
ip6frag_secret_interval

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:14:03 -04:00
Intiyaz Basha
697fefc7c1 liquidio: Prioritize control messages
During heavy tx traffic, control messages (sent by liquidio driver to NIC
firmware) sometimes do not get processed in a timely manner.  Reason is:
the low-level metadata of control messages and that of egress network
packets indicate that they have the same priority.

Fix it by setting a higher priority for control messages through the new
ctrl_qpg field in the oct_txpciq struct.  It is the NIC firmware that does
the actual setting of priority by writing to the new ctrl_qpg field; the
host driver treats that value as opaque and just assigns it to pki_ih3->qpg

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:13:49 -04:00
David S. Miller
b349e0b5ec Merge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace'
David Ahern says:

====================
net: Allow FIB notifiers to fail add and replace

I wanted to revisit how resource overload is handled for hardware offload
of FIB entries and rules. At the moment, the in-kernel fib notifier can
tell a driver about a route or rule add, replace, and delete, but the
notifier can not affect the action. Specifically, in the case of mlxsw
if a route or rule add is going to overflow the ASIC resources the only
recourse is to abort hardware offload. Aborting offload is akin to taking
down the switch as the path from data plane to the control plane simply
can not support the traffic bandwidth of the front panel ports. Further,
the current state of FIB notifiers is inconsistent with other resources
where a driver can affect a user request - e.g., enslavement of a port
into a bridge or a VRF.

As a result of the work done over the past 3+ years, I believe we are
at a point where we can bring consistency to the stack and offloads,
and reliably allow the FIB notifiers to fail a request, pushing an error
along with a suitable error message back to the user. Rather than
aborting offload when the switch is out of resources, userspace is simply
prevented from adding more routes and has a clear indication of why.

This set does not resolve the corner case where rules or routes not
supported by the device are installed prior to the driver getting loaded
and registering for FIB notifications. In that case, hardware offload has
not been established and it can refuse to offload anything, sending
errors back to userspace via extack. Since conceptually the driver owns
the netdevices associated with its asic, this corner case mainly applies
to unsupported rules and any races during the bringup phase.

Patch 1 fixes call_fib_notifiers to extract the errno from the encoded
response from handlers.

Patches 2-5 allow the call to call_fib_notifiers to fail the add or
replace of a route or rule.

Patch 6 adds a simple resource controller to netdevsim to illustrate
how a FIB resource controller can limit the number of route entries.

Changes since RFC
- correct return code for call_fib_notifier
- dropped patch 6 exporting devlink symbols
- limited example resource controller to init_net only
- updated Kconfig for netdevsim to use MAY_USE_DEVLINK
- updated cover letter regarding startup case noted by Ido
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:31 -04:00
David Ahern
37923ed6b8 netdevsim: Add simple FIB resource controller via devlink
Add devlink support to netdevsim and use it to implement a simple,
profile based resource controller. Only one controller is needed
per namespace, so the first netdevsim netdevice in a namespace
registers with devlink. If that device is deleted, the resource
settings are deleted.

The resource controller allows a user to limit the number of IPv4 and
IPv6 FIB entries and FIB rules. The resource paths are:
    /IPv4
    /IPv4/fib
    /IPv4/fib-rules
    /IPv6
    /IPv6/fib
    /IPv6/fib-rules

The IPv4 and IPv6 top level resources are unlimited in size and can not
be changed. From there, the number of FIB entries and FIB rule entries
are unlimited by default. A user can specify a limit for the fib and
fib-rules resources:

    $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96
    $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16
    $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64
    $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16
    $ devlink dev reload netdevsim/netdevsim0

such that the number of rules or routes is limited (96 ipv4 routes in the
example above):
    $ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done
    Error: netdevsim: Exceeded number of supported fib entries.

    $ devlink resource show netdevsim/netdevsim0
    netdevsim/netdevsim0:
      name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non
        resources:
          name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables
    ...

With this template in place for resource management, it is fairly trivial
to extend and shows one way to implement a simple counter based resource
controller typical of network profiles.

Currently, devlink only supports initial namespace. Code is in place to
adapt netdevsim to a per namespace controller once the network namespace
issues are resolved.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:31 -04:00
David Ahern
2233000cba net/ipv6: Move call_fib6_entry_notifiers up for route adds
Move call to call_fib6_entry_notifiers for new IPv6 routes to right
before the insertion into the FIB. At this point notifier handlers can
decide the fate of the new route with a clean path to delete the
potential new entry if the notifier returns non-0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:31 -04:00
David Ahern
c1d7ee67ac net/ipv4: Allow notifier to fail route replace
Add checking to call to call_fib_entry_notifiers for IPv4 route replace.
Allows a notifier handler to fail the replace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David Ahern
6635f311ea net/ipv4: Move call_fib_entry_notifiers up for new routes
Move call to call_fib_entry_notifiers for new IPv4 routes to right
before the call to fib_insert_alias. At this point the only remaining
failure path is memory allocations in fib_insert_node. Handle that
very unlikely failure with a call to call_fib_entry_notifiers to
tell drivers about it.

At this point notifier handlers can decide the fate of the new route
with a clean path to delete the potential new entry if the notifier
returns non-0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David Ahern
9776d32537 net: Move call_fib_rule_notifiers up in fib_nl_newrule
Move call_fib_rule_notifiers up in fib_nl_newrule to the point right
before the rule is inserted into the list. At this point there are no
more failure paths within the core rule code, so if the notifier
does not fail then the rule will be inserted into the list.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David Ahern
c30d9356e9 net: Fix fib notifer to return errno
Notifier handlers use notifier_from_errno to convert any potential error
to an encoded format. As a consequence the other side, call_fib_notifier{s}
in this case, needs to use notifier_to_errno to return the error from
the handler back to its caller.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David S. Miller
6e2135ce54 Merge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2018-03-27 (Misc updates & SQ recovery)

This series contains Misc updates and cleanups for mlx5e rx path
and SQ recovery feature for tx path.

From Tariq: (RX updates)
    - Disable Striding RQ when PCI devices, striding RQ limits the use
      of CQE compression feature, which is very critical for slow PCI
      devices performance, in this change we will prefer CQE compression
      over Striding RQ only on specific "slow"  PCIe links.
    - RX path cleanups
    - Private flag to enable/disable striding RQ

From Eran: (TX fast recovery)
    - TX timeout logic improvements, fast SQ recovery and TX error reporting
      if a HW error occurs while transmitting on a specific SQ, the driver will
      ignore such error and will wait for TX timeout to occur and reset all
      the rings. Instead, the current series improves the resiliency for such
      HW errors by detecting TX completions with errors, which will report them
      and perform a fast recover for the specific faulty SQ even before a TX
      timeout is detected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:01:40 -04:00
David S. Miller
038d49baab Merge branch 'Introduce-net_rwsem-to-protect-net_namespace_list'
Kirill Tkhai says:

====================
Introduce net_rwsem to protect net_namespace_list

The series introduces fine grained rw_semaphore, which will be used
instead of rtnl_lock() to protect net_namespace_list.

This improves scalability and allows to do non-exclusive sleepable
iteration for_each_net(), which is enough for most cases.

scripts/get_maintainer.pl gives enormous list of people, and I add
all to CC.

Note, that this patch is independent of "Close race between
{un, }register_netdevice_notifier and pernet_operations":
https://patchwork.ozlabs.org/project/netdev/list/?series=36495

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:54 -04:00
Kirill Tkhai
152f253152 net: Remove rtnl_lock() in nf_ct_iterate_destroy()
rtnl_lock() doesn't protect net::ct::count,
and it's not needed for__nf_ct_unconfirmed_destroy()
and for nf_queue_nf_hook_drop().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:54 -04:00
Kirill Tkhai
ec9c780925 ovs: Remove rtnl_lock() from ovs_exit_net()
Here we iterate for_each_net() and removes
vport from alive net to the exiting net.

ovs_net::dps are protected by ovs_mutex(),
and the others, who change it (ovs_dp_cmd_new(),
__dp_destroy()) also take it.
The same with datapath::ports list.

So, we remove rtnl_lock() here.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:54 -04:00
Kirill Tkhai
350311aab4 security: Remove rtnl_lock() in selinux_xfrm_notify_policyload()
rt_genid_bump_all() consists of ipv4 and ipv6 part.
ipv4 part is incrementing of net::ipv4::rt_genid,
and I see many places, where it's read without rtnl_lock().

ipv6 part calls __fib6_clean_all(), and it's also
called without rtnl_lock() in other places.

So, rtnl_lock() here was used to iterate net_namespace_list only,
and we can remove it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:53 -04:00