Commit Graph

855799 Commits

Author SHA1 Message Date
Andy Lutomirski
dffb3f9db6 x86/entry/64: Don't compile ignore_sysret if 32-bit emulation is enabled
It's only used if !CONFIG_IA32_EMULATION, so disable it in normal
configs.  This will save a few bytes of text and reduce confusion.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc:  "BaeChang Seok" <chang.seok.bae@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Bae, Chang Seok" <chang.seok.bae@intel.com>
Link: https://lkml.kernel.org/r/0f7dafa72fe7194689de5ee8cfe5d83509fabcf5.1562035429.git.luto@kernel.org
2019-07-02 08:45:20 +02:00
Andy Lutomirski
9402eaf4c1 selftests/x86: Test SYSCALL and SYSENTER manually with TF set
Make sure that both variants of the nasty TF-in-compat-syscall are
exercised regardless of what vendor's CPU is running the tests.

Also change the intentional signal after SYSCALL to use ud2, which
is a lot more comprehensible.

This crashes the kernel due to an FSGSBASE bug right now.

This test *also* detects a bug in KVM when run on an Intel host.  KVM
people, feel free to use it to help debug.  There's a bunch of code in this
test to warn instead of going into an infinite looping when the bug gets
triggered.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc:  "BaeChang Seok" <chang.seok.bae@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: "Bae, Chang Seok" <chang.seok.bae@intel.com>
Link: https://lkml.kernel.org/r/5f5de10441ab2e3005538b4c33be9b1965d1bb63.1562035429.git.luto@kernel.org
2019-07-02 08:45:20 +02:00
YueHaibing
ede7c247ab rslib: Make some functions static
Fix sparse warnings:

lib/reed_solomon/test_rslib.c:313:5: warning: symbol 'ex_rs_helper' was not declared. Should it be static?
lib/reed_solomon/test_rslib.c:349:5: warning: symbol 'exercise_rs' was not declared. Should it be static?
lib/reed_solomon/test_rslib.c:407:5: warning: symbol 'exercise_rs_bc' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <ferdinand.blomqvist@gmail.com>
Link: https://lkml.kernel.org/r/20190702061847.26060-1-yuehaibing@huawei.com
2019-07-02 08:41:37 +02:00
David S. Miller
337d1ccb3d Merge branch 'Add-gve-driver'
Catherine Sullivan says:

====================
Add gve driver

This patch series adds the gve driver which will support the
Compute Engine Virtual NIC that will be available in the future.

v2:
- Patch 1:
  - Remove gve_size_assert.h and use static_assert instead.
  - Loop forever instead of bugging if the device won't reset
  - Use module_pci_driver
- Patch 2:
  - Use be16_to_cpu in the RX Seq No define
  - Remove unneeded ndo_change_mtu
- Patch 3:
  - No Changes
- Patch 4:
  - Instead of checking netif_carrier_ok in ethtool stats, just make sure

v3:
- Patch 1:
  - Remove X86 dep
- Patch 2:
  - No changes
- Patch 3:
  - No changes
- Patch 4:
  - Remove unneeded memsets in ethtool stats

v4:
- Patch 1:
  - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
  - Explicitly add padding to gve_adminq_set_driver_parameter
  - Use static where appropriate
- Patch 2:
  - Use u64_stats_sync
  - Explicity add padding to gve_adminq_create_rx_queue
  - Fix some enianness typing issues found by kbuild
  - Use static where appropriate
  - Remove unused variables
- Patch 3:
  - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
- Patch 4:
  - Use u64_stats_sync
  - Use static where appropriate
Warnings reported by:
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
e5b845dc79 gve: Add ethtool support
Add support for the following ethtool commands:

ethtool -s|--change devname [msglvl N] [msglevel type on|off]
ethtool -S|--statistics devname
ethtool -i|--driver devname
ethtool -l|--show-channels devname
ethtool -L|--set-channels devname
ethtool -g|--show-ring devname
ethtool --reset devname

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
9e5f7d26a4 gve: Add workqueue and reset support
Add support for the workqueue to handle management interrupts and
support for resets.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
f5cedc84a3 gve: Add transmit and receive support
Add support for passing traffic.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
893ce44df5 gve: Add basic driver framework for Compute Engine Virtual NIC
Add a driver framework for the Compute Engine Virtual NIC that will be
available in the future.

At this point the only functionality is loading the driver.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
David S. Miller
2a8d8e0fec Merge branch 'blackhole-device-to-invalidate-dst'
Mahesh Bandewar says:

====================
blackhole device to invalidate dst

When we invalidate dst or mark it "dead", we assign 'lo' to
dst->dev. First of all this assignment is racy and more over,
it has MTU implications.

The standard dev MTU is 1500 while the Loopback MTU is 64k. TCP
code when dereferencing the dst don't check if the dst is valid
or not. TCP when dereferencing a dead-dst while negotiating a
new connection, may use dst device which is 'lo' instead of
using the correct device. Consider the following scenario:

A SYN arrives on an interface and tcp-layer while processing
SYNACK finds a dst and associates it with SYNACK skb. Now before
skb gets passed to L3 for processing, if that dst gets "dead"
(because of the virtual device getting disappeared & then reappeared),
the 'lo' gets assigned to that dst (lo MTU = 64k). Let's assume
the SYN has ADV_MSS set as 9k while the output device through
which this SYNACK is going to go out has standard MTU of 1500.
The MTU check during the route check passes since MIN(9K, 64K)
is 9k and TCP successfully negotiates 9k MSS. The subsequent
data packet; bigger in size gets passed to the device and it
won't be marked as GSO since the assumed MTU of the device is
9k.

This either crashes the NIC and we have seen fixes that went
into drivers to handle this scenario. 8914a59511 ('bnx2x:
disable GSO where gso_size is too big for hardware') and
2b16f04872 ('net: create skb_gso_validate_mac_len()') and
with those fixes TCP eventually recovers but not before
few dropped segments.

Well, I'm not a TCP expert and though we have experienced
these corner cases in our environment, I could not reproduce
this case reliably in my test setup to try this fix myself.
However, Michael Chan <michael.chan@broadcom.com> had a setup
where these fixes helped him mitigate the issue and not cause
the crash.

The idea here is to not alter the data-path with additional
locks or smb()/rmb() barriers to avoid racy assignments but
to create a new device that has really low MTU that has
.ndo_start_xmit essentially a kfree_skb(). Make use of this
device instead of 'lo' when marking the dst dead.

First patch implements the blackhole device and second
patch uses it in IPv4 and IPv6 stack while the third patch
is the self test that ensures the sanity of this device.

v1->v2
  fixed the self-test patch to handle the conflict

v2 -> v3
  fixed Kconfig text/string.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Mahesh Bandewar
509e56b37c blackhole_dev: add a selftest
Since this is not really a device with all capabilities, this test
ensures that it has *enough* to make it through the data path
without causing unwanted side-effects (read crash!).

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Mahesh Bandewar
8d7017fd62 blackhole_netdev: use blackhole_netdev to invalidate dst entries
Use blackhole_netdev instead of 'lo' device with lower MTU when marking
dst "dead".

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Mahesh Bandewar
4de83b88c6 loopback: create blackhole net device similar to loopack.
Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b)  MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registered it won't have ifindex.

Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Hayes Wang
13e04fbf0b r8152: fix the setting of detecting the linking change for runtime suspend
1. Rename r8153b_queue_wake() to r8153_queue_wake().

2. Correct the setting. The enable bit should be 0xd38c bit 0. Besides,
   the 0xd38a bit 0 and 0xd398 bit 8 have to be cleared for both enabled
   and disabled.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:31:57 -07:00
Hariprasad Kelam
8909783cb5 net: ethernet: broadcom: bcm63xx_enet: Remove unneeded memset
Remove unneeded memset as alloc_etherdev is using kvzalloc which uses
__GFP_ZERO flag

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:29:19 -07:00
David S. Miller
fec3b9ec47 Merge branch 'net-netsec-Add-XDP-Support'
Ilias Apalodimas says:

====================
net: netsec: Add XDP Support

This is a respin of https://www.spinics.net/lists/netdev/msg526066.html
Since page_pool API fixes are merged into net-next we can now safely use
it's DMA mapping capabilities.

First patch changes the buffer allocation from napi/netdev_alloc_frag()
to page_pool API. Although this will lead to slightly reduced performance
(on raw packet drops only) we can use the API for XDP buffer recycling.
Another side effect is a slight increase in memory usage, due to using a
single page per packet.

The second patch adds XDP support on the driver.
There's a bunch of interesting options that come up due to the single
Tx queue.
Locking is needed(to avoid messing up the Tx queues since ndo_xdp_xmit
and the normal stack can co-exist). We also need to track down the
'buffer type' for TX and properly free or recycle the packet depending
on it's nature.

Changes since RFC:
- Bug fixes from Jesper and Maciej
- Added page pool API to retrieve the DMA direction

Changes since v1:
- Use page_pool_free correctly if xdp_rxq_info_reg() failed
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas
ba2b232108 net: netsec: add XDP support
The interface only supports 1 Tx queue so locking is introduced on
the Tx queue if XDP is enabled to make sure .ndo_start_xmit and
.ndo_xdp_xmit won't corrupt Tx ring

- Performance (SMMU off)

Benchmark   XDP_SKB     XDP_DRV
xdp1        291kpps     344kpps
rxdrop      282kpps     342kpps

- Performance (SMMU on)
Benchmark   XDP_SKB     XDP_DRV
xdp1        167kpps     324kpps
rxdrop      164kpps     323kpps

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas
bb005f2a70 net: page_pool: add helper function for retrieving dma direction
Since the dma direction is stored in page pool params, offer an API
helper for driver that choose not to keep track of it locally

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas
5c67bf0ec4 net: netsec: Use page_pool API
Use page_pool and it's DMA mapping capabilities for Rx buffers instead
of netdev/napi_alloc_frag()

Although this will result in a slight performance penalty on small sized
packets (~10%) the use of the API will allow to easily add XDP support.
The penalty won't be visible in network testing i.e ipef/netperf etc, it
only happens during raw packet drops.
Furthermore we intend to add recycling capabilities on the API
in the future. Once the recycling is added the performance penalty will
go away.
The only 'real' penalty is the slightly increased memory usage, since we
now allocate a page per packet instead of the amount of bytes we need +
skb metadata (difference is roughly 2kb per packet).
With a minimum of 4BG of RAM on the only SoC that has this NIC the
extra memory usage is negligible (a bit more on 64K pages)

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Jakub Kicinski
acd3e96d53 net/tls: make sure offload also gets the keys wiped
Commit 86029d10af ("tls: zero the crypto information from tls_context
before freeing") added memzero_explicit() calls to clear the key material
before freeing struct tls_context, but it missed tls_device.c has its
own way of freeing this structure. Replace the missing free.

Fixes: 86029d10af ("tls: zero the crypto information from tls_context before freeing")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:22:36 -07:00
Jakub Kicinski
618bac4593 net/tls: reject offload of TLS 1.3
Neither drivers nor the tls offload code currently supports TLS
version 1.3. Check the TLS version when installing connection
state. TLS 1.3 will just fallback to the kernel crypto for now.

Fixes: 130b392c6c ("net: tls: Add tls 1.3 support")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:21:31 -07:00
Roman Mashak
a8488b7026 tc-testing: added tdc tests for prio qdisc
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:20:43 -07:00
David S. Miller
c8881faf6e Merge branch 'mirred-batch-fixes'
Roman Mashak says:

====================
Fix batched event generation for mirred action

When adding or deleting a batch of entries, the kernel sends upto
TCA_ACT_MAX_PRIO entries in an event to user space. However it does not
consider that the action sizes may vary and require different skb sizes.

For example :

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action mirred
for i in `seq 1 $1`;
do
   cmd="action mirred egress redirect dev lo index $i "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

patch 1 adds callback in tc_action_ops of mirred action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().

patch 2 updates the TDC test suite with relevant test cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:18:04 -07:00
Roman Mashak
5d15a8ec2a tc-testing: updated mirred action tests with batch create/delete
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:18:04 -07:00
Roman Mashak
b84b2d4e38 net sched: update mirred action for batched events operations
Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:18:03 -07:00
David S. Miller
8a534f8fb0 Merge branch 'idr-fix-overflow-cases-on-32-bit-CPU'
Cong Wang says:

====================
idr: fix overflow cases on 32-bit CPU

idr_get_next_ul() is problematic by design, it can't handle
the following overflow case well on 32-bit CPU:

u32 id = UINT_MAX;
idr_alloc_u32(&id);
while (idr_get_next_ul(&id) != NULL)
 id++;

when 'id' overflows and becomes 0 after UINT_MAX, the loop
goes infinite.

Fix this by eliminating external users of idr_get_next_ul()
and migrating them to idr_for_each_entry_continue_ul(). And
add an additional parameter for these iteration macros to detect
overflow properly.

Please merge this through networking tree, as all the users
are in networking subsystem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:15:46 -07:00
Davide Caratti
95b9395ba1 selftests: add a test case for cls_lower handle overflow
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:15:46 -07:00
Cong Wang
d39d714969 idr: introduce idr_for_each_entry_continue_ul()
Similarly, other callers of idr_get_next_ul() suffer the same
overflow bug as they don't handle it properly either.

Introduce idr_for_each_entry_continue_ul() to help these callers
iterate from a given ID.

cls_flower needs more care here because it still has overflow when
does arg->cookie++, we have to fold its nested loops into one
and remove the arg->cookie++.

Fixes: 01683a1469 ("net: sched: refactor flower walk to iterate over idr")
Fixes: 12d6066c3b ("net/mlx5: Add flow counters idr")
Reported-by: Li Shuang <shuali@redhat.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:15:46 -07:00
Cong Wang
e33d2b74d8 idr: fix overflow case for idr_for_each_entry_ul()
idr_for_each_entry_ul() is buggy as it can't handle overflow
case correctly. When we have an ID == UINT_MAX, it becomes an
infinite loop. This happens when running on 32-bit CPU where
unsigned long has the same size with unsigned int.

There is no better way to fix this than casting it to a larger
integer, but we can't just 64 bit integer on 32 bit CPU. Instead
we could just use an additional integer to help us to detect this
overflow case, that is, adding a new parameter to this macro.
Fortunately tc action is its only user right now.

Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Reported-by: Li Shuang <shuali@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:15:46 -07:00
Jason A. Donenfeld
362b87f5b1 netlink: use 48 byte ctx instead of 6 signed longs for callback
People are inclined to stuff random things into cb->args[n] because it
looks like an array of integers. Sometimes people even put u64s in there
with comments noting that a certain member takes up two slots. The
horror! Really this should mirror the usage of skb->cb, which are just
48 opaque bytes suitable for casting a struct. Then people can create
their usual casting macros for accessing strongly typed members of a
struct.

As a plus, this also gives us the same amount of space on 32bit and 64bit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:12:10 -07:00
Jon Maloy
53962bcea9 tipc: embed jiffies in macro TIPC_BC_RETR_LIM
The macro TIPC_BC_RETR_LIM is always used in combination with 'jiffies',
so we can just as well perform the addition in the macro itself. This
way, we get a few shorter code lines and one less line break.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:10:57 -07:00
David S. Miller
eb1f5c02dd Merge branch 'vsock-virtio-fixes'
Stefano Garzarella says:

====================
vsock/virtio: several fixes in the .probe() and .remove()

During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.

This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
  'the_virtio_vsock' pointer.
- Patch 2 stops workers before to call vdev->config->reset(vdev) to
  be sure that no one is accessing the device.
- Patch 3 moves the works flush at the end of the .remove() to avoid
  use-after-free of 'vsock' object.

v2:
- Patch 1: use RCU to protect 'the_virtio_vsock' pointer
- Patch 2: no changes
- Patch 3: flush works only at the end of .remove()
- Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
  allocated.

v1: https://patchwork.kernel.org/cover/10964733/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:09:07 -07:00
Stefano Garzarella
0d20e56ecc vsock/virtio: fix flush of works during the .remove()
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.

Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.

Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:09:07 -07:00
Stefano Garzarella
17dd136738 vsock/virtio: stop workers during the .remove()
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().

This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:09:07 -07:00
Stefano Garzarella
9c7a5582f5 vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.

To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.

We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.

We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:09:07 -07:00
Clement Leger
72f64cabc4 remoteproc: copy parent dma_pfn_offset for vdev
When preparing the subdevice for the vdev, also copy dma_pfn_offset
since this is used for sub device dma allocations. Without that, there
is incoherency between the parent dma settings and the childs one,
potentially leading to dma_alloc_coherent failure (due to phys_to_dma
using dma_pfn_offset for translation).

Fixes: 086d08725d ("remoteproc: create vdev subdevice with specific dma memory pool")
Signed-off-by: Clement Leger <cleger@kalray.eu>
Acked-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-07-01 19:09:01 -07:00
Taehee Yoo
7c31e54aee vxlan: do not destroy fdb if register_netdevice() is failed
__vxlan_dev_create() destroys FDB using specific pointer which indicates
a fdb when error occurs.
But that pointer should not be used when register_netdevice() fails because
register_netdevice() internally destroys fdb when error occurs.

This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev
internally.
Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan
dev.

vxlan_fdb_insert() is called after calling register_netdevice().
This routine can avoid situation that ->ndo_uninit() destroys fdb entry
in error path of register_netdevice().
Hence, error path of __vxlan_dev_create() routine can have an opportunity
to destroy default fdb entry by hand.

Test command
    ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \
	    dev enp0s9 dstport 4789

Splat looks like:
[  213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ #256
[  213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan]
[  213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d
[  213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202
[  213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000
[  213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0
[  213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000
[  213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200
[  213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0
[  213.402178] FS:  00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
[  213.402178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0
[  213.402178] Call Trace:
[  213.402178]  __vxlan_dev_create+0x3a9/0x7d0 [vxlan]
[  213.402178]  ? vxlan_changelink+0x740/0x740 [vxlan]
[  213.402178]  ? rcu_read_unlock+0x60/0x60 [vxlan]
[  213.402178]  ? __kasan_kmalloc.constprop.3+0xa0/0xd0
[  213.402178]  vxlan_newlink+0x8d/0xc0 [vxlan]
[  213.402178]  ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan]
[  213.554119]  ? __netlink_ns_capable+0xc3/0xf0
[  213.554119]  __rtnl_newlink+0xb75/0x1180
[  213.554119]  ? rtnl_link_unregister+0x230/0x230
[ ... ]

Fixes: 0241b83673 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:06:02 -07:00
Eiichi Tsukata
00dc3307c0 net/ipv6: Fix misuse of proc_dointvec "flowlabel_reflect"
/proc/sys/net/ipv6/flowlabel_reflect assumes written value to be in the
range of 0 to 3. Use proc_dointvec_minmax instead of proc_dointvec.

Fixes: 323a53c412 ("ipv6: tcp: enable flowlabel reflection in some RST packets")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:04:48 -07:00
Yunsheng Lin
27ba4059e0 net: link_watch: prevent starvation when processing linkwatch wq
When user has configured a large number of virtual netdev, such
as 4K vlans, the carrier on/off operation of the real netdev
will also cause it's virtual netdev's link state to be processed
in linkwatch. Currently, the processing is done in a work queue,
which may cause rtnl locking starvation problem and worker
starvation problem for other work queue, such as irqfd_inject wq.

This patch releases the cpu when link watch worker has processed
a fixed number of netdev' link watch event, and schedule the
work queue again when there is still link watch event remaining.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:02:47 -07:00
Bjorn Andersson
f04b913834 remoteproc: qcom: q6v5-mss: Support loading non-split images
In some software releases the firmware images are not split up with each
loadable segment in it's own file. Check the size of the loaded firmware
to see if it still contains each segment to be loaded, before falling
back to the split-out segments.

Reviewed-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-07-01 19:02:32 -07:00
Bjorn Andersson
498b98e939 soc: qcom: mdt_loader: Support loading non-split images
In some software releases the firmware images are not split up with each
loadable segment in it's own file. Check the size of the loaded firmware
to see if it still contains each segment to be loaded, before falling
back to the split-out segments.

Acked-by: Andy Gross <agross@kernel.org>
Reviewed-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-07-01 19:02:28 -07:00
Marcelo Ricardo Leitner
4d1415811e sctp: fix error handling on stream scheduler initialization
It allocates the extended area for outbound streams only on sendmsg
calls, if they are not yet allocated.  When using the priority
stream scheduler, this initialization may imply into a subsequent
allocation, which may fail.  In this case, it was aborting the stream
scheduler initialization but leaving the ->ext pointer (allocated) in
there, thus in a partially initialized state.  On a subsequent call to
sendmsg, it would notice the ->ext pointer in there, and trip on
uninitialized stuff when trying to schedule the data chunk.

The fix is undo the ->ext initialization if the stream scheduler
initialization fails and avoid the partially initialized state.

Although syzkaller bisected this to commit 4ff40b8626 ("sctp: set
chunk transport correctly when it's a new asoc"), this bug was actually
introduced on the commit I marked below.

Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:01:47 -07:00
Cong Wang
c8c8218ec5 netrom: fix a memory leak in nr_rx_frame()
When the skb is associated with a new sock, just assigning
it to skb->sk is not sufficient, we have to set its destructor
to free the sock properly too.

Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:00:52 -07:00
David S. Miller
0d0bcacc54 Merge branch 'mlxsw-PTP-timestamping-support'
Ido Schimmel says:

====================
mlxsw: PTP timestamping support

This is the second patchset adding PTP support in mlxsw. Next patchset
will add PTP shapers which are required to maintain accuracy under rates
lower than 40Gb/s, while subsequent patchsets will add tracepoints and
selftests.

Petr says:

This patch set introduces support for retrieving and processing hardware
timestamps for PTP packets.

The way PTP timestamping works on Spectrum-1 is that there are two queues
associated with each front panel port. When a packet is timestamped, the
timestamp is put to one of the queues: timestamps for transmitted packets
to one and for received packets to the other. Activity on these queues is
signaled through the events PTP_ING_FIFO and PTP_EGR_FIFO.

Packets themselves arrive through two traps: PTP0 and PTP1. It is possible
to configure which PTP messages should be trapped under which PTP trap. On
Spectrum systems, mlxsw will use PTP0 for event messages (which need
timestamping), and PTP1 for general messages (which do not).

There are therefore four relevant traps: receive of PTP event resp. general
message, and receive of timestamp for a transmitted resp. received PTP
packet. The obvious point where to put the new logic is a custom listener
to the mentioned traps.

Besides handling ingress traffic (be in packets or timestamps), the driver
also needs to handle timestamping of transmitted packets. One option would
be to invoke the relevant logic from mlxsw_core_ptp_transmitted(). However
on Spectrum-2, the timestamps are actually delivered through the completion
queue, and for that reason this patchset opts to invoke the logic from the
PCI code, via core and the driver, to a chip-specific operation. That way
the invocation will be done in a place where a Spectrum-2 implementation
will have an opportunity to extract the timestamp.

As indicated above, the PTP FIFO signaling happens independently from
packet delivery. A packet corresponding to any given timestamp could be
delivered sooner or later than the timestamp itself. Additionally, the
queues are only four elements deep, and it is therefore possible that the
timestamp for a delivered packet never arrives at all. Similarly a PTP
packet might be dropped due to CPU traffic pressure, and never be delivered
even if the corresponding timestamp was.

The driver thus needs to hold a cache of as-yet-unmatched SKBs and
timestamps. The first piece to arrive (be it timestamp or SKB) is put to
this cache. When the other piece arrives, the timestamp is attached to the
SKB and that is passed on. A delayed work is run at regular intervals to
prune the old unmatched entries.

As mentioned above, the mechanism for timestamp delivery changes on
Spectrum-2, where timestamps are part of completion queue elements, and all
packets are timestamped. All this bookkeeping is therefore unnecessary on
Spectrum-2. For this reason, this patchset spends some time introducing
Spectrum-1 specific artifacts such as a possibility to register a given
trap only on Spectrum-1.

Patches #1-#4 describe new registers.

Patches #5 and #6 introduce the possibility to register certain traps
only on some systems. The list of Spectrum-1 specific traps is left empty
at this point.

Patch #7 hooks into packet receive path by registering PTP traps
and appropriate handlers (that however do nothing of substance yet).

Patch #8 adds a helper to allow storing custom data to SKB->cb.

Patch #9 adds a call into the PCI completion queue handler that invokes,
via core and spectrum code, a PTP transmit handler. (Which also does not do
anything interesting yet.)

Patch #10 introduces code to invoke PTP initialization and adds data types
for the cache of unmatched entries.

Patches #11 and #12 implement the timestamping itself. In #11, the PHC
spin_locks are converted to _bh variants, because unlike normal PHC path,
which runs in process context, timestamp processing runs as soft interrupt.
Then #12 introduces the code for saving and retrieval of unmatched entries,
invokes PTP classifier to identify packets of interest, registers timestamp
FIFO events, and handles decoding and attaching timestamps to packets.

Patch #13 introduces a garbage collector for left-behind entries that have
not been matched for about a second.

In patch #14, PTP message types are configured to arrive as PTP0
(events) or PTP1 (everything else) as appropriate. At this point, the PTP
packets start arriving through the traps, but because PTP is disabled and
there is no way to enable it yet, they are always just passed to the usual
receive path right away.

Finally patches #15 and #16 add the plumbing to actually make it possible
to enable this code through SIOCSHWTSTAMP ioctl, and to advertise the
hardware timestamping capabilities through ethtool.

v2:
- Patch #12:
    - In mlxsw_sp1_ptp_fifo_event_func(), post-increment when iterating over PTP
      FIFO records.
- Patch #14:
    - Change namespace of message type enumerators from MLXSW_ to MLXSW_SP_.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:35 -07:00
Petr Machata
87ee07f8e2 mlxsw: spectrum: PTP: Support ethtool get_ts_info
The get_ts_info callback is used for obtaining information about
timestamping capabilities of a network device. On Spectrum-1, implement
it to advertise the PHC and the capability to do HW timestamping, and
the supported RX and TX filters.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:35 -07:00
Petr Machata
8748642751 mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls
The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port.
Dispatch the ioctls to per-chip handler (which add to ptp_ops). Find
which PTP messages need to be timestamped and configure MTPPPC
accordingly.

The SIOCGHWTSTAMP ioctl is getter for the current configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata
a773c76cb8 mlxsw: spectrum: PTP: Configure PTP traps and FIFO events
Configure MTPTPT to set which message types should arrive under which
PTP trap, and MOGCR to clear the timestamp queue after its contents are
reported through PTP_ING_FIFO or PTP_EGR_FIFO.

With this configuration, PTP packets start arriving through the PTP
traps. However since timestamping is disabled by default and there is
currently no way to enable it, they will not be timestamped.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata
5d23e41597 mlxsw: spectrum: PTP: Garbage-collect unmatched entries
On Spectrum-1, timestamped PTP packets and the corresponding timestamps
need to be kept in caches until both are available, at which point they are
matched up and packets forwarded as appropriate. However, not all packets
will ever see their timestamp, and not all timestamps will ever see their
packet. It is therefore necessary to dispose of such abandoned entries.

To that end, introduce a garbage collector to collect entries that have
not had their counterpart turn up within about a second. The GC
maintains a monotonously-increasing value of GC cycle. Every entry that
is put to the hash table is annotated with the GC cycle at which it
should be collected. When the GC runs, it walks the hash table, and
collects the objects according to their GC cycle annotation.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata
d92e4e6e33 mlxsw: spectrum: PTP: Support timestamping on Spectrum-1
On Spectrum-1, timestamps arrive through a pair of dedicated events:
MLXSW_TRAP_ID_PTP_ING_FIFO and _EGR_FIFO. The payload delivered with
those traps is contents of the timestamp FIFO at a given port in a given
direction. Add a Spectrum-1-specific handler for these two events which
decodes the timestamps and forwards them to the PTP module.

Add a function that parses a packet, dispatching to ptp_classify_raw(),
and decodes PTP message type, domain number, and sequence ID. Add a new
mlxsw dependency on the PTP classifier.

Add helpers that can store and retrieve unmatched timestamps and SKBs to
the hash table added in a preceding patch.

Add the matching code itself: upon arrival of a timestamp or a packet,
look up the corresponding unmatched entry, and match it up. If there is
none, add a new unmatched entry. This logic is the same on ingress as on
egress.

Packets and timestamps that never matched need to be eventually disposed
of. A garbage collector added in a follow-up patch will take care of
that. Since currently all this code is turned off, no crud will
accumulate in the hash table.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata
89e602ee6e mlxsw: spectrum: PTP: Disable BH when working with PHC
Up until now, the PTP hardware clock code was only invoked in the process
context (SYS_clock_adjtime -> do_clock_adjtime -> k_clock::clock_adj ->
pc_clock_adjtime -> posix_clock_operations::clock_adjtime ->
ptp_clock_info::adjtime -> mlxsw_spectrum).

In order to enable HW timestamping, which is tied into trap handling, it
will be necessary to take the clock lock from the PCI queue handler
tasklets as well.

Therefore use the _bh variants when handling the clock lock. Incidentally,
Documentation/ptp/ptp.txt recommends _irqsave variants, but that's
unnecessarily strong for our needs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata
810256cec1 mlxsw: spectrum: PTP: Add PTP initialization / finalization
Add two ptp_ops: init and fini, to initialize and finalize the PTP
subsystem. Call as appropriate from mlxsw_sp_init() and _fini().

Lay the groundwork for Spectrum-1 support. On Spectrum-1, the received
timestamped packets and their corresponding timestamps arrive
independently, and need to be matched up. Introduce the related data types
and add to struct mlxsw_sp_ptp_state the hash table that will keep the
unmatched entries.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00