Commit Graph

10199 Commits

Author SHA1 Message Date
Jason Gunthorpe
d384742ed1 RDMA/uverbs: Use offsetofend instead of opencoding
Discovered this was available already.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 21:52:39 -04:00
Leon Romanovsky
da3929218a RDMa/hns: Don't stuck in endless timeout loop
The "end" variable is declared as unsigned and can't be negative, it
leads to the situation where timeout limit is not honored, so let's
convert logic to ensure that loop is bounded.

drivers/infiniband/hw/hns/hns_roce_hw_v1.c: In function _hns_roce_v1_clear_hem_:
drivers/infiniband/hw/hns/hns_roce_hw_v1.c:2471:12: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
 2471 |    if (end < 0) {
      |            ^

Fixes: 669cefb654 ("RDMA/hns: Remove jiffies operation in disable interrupt context")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 15:39:43 -04:00
Bart Van Assche
fd56141244 scsi: RDMA/srp: Fix a sleep-in-invalid-context bug
The previous patch guarantees that srp_queuecommand() does not get
invoked while reconnecting occurs. Hence remove the code from
srp_queuecommand() that prevents command queueing while reconnecting.
This patch avoids that the following can appear in the kernel log:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9
1 lock held by scsi_eh_9/5600:
 #0:  (rcu_read_lock){....}, at: [<00000000cbb798c7>] __blk_mq_run_hw_queue+0xf1/0x1e0
Preemption disabled at:
[<00000000139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0
CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: G        W        4.15.0-rc4-dbg+ #1
Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
Call Trace:
 dump_stack+0x67/0x99
 ___might_sleep+0x16a/0x250 [ib_srp]
 __mutex_lock+0x46/0x9d0
 srp_queuecommand+0x356/0x420 [ib_srp]
 scsi_dispatch_cmd+0xf6/0x3f0
 scsi_queue_rq+0x4a8/0x5f0
 blk_mq_dispatch_rq_list+0x73/0x440
 blk_mq_sched_dispatch_requests+0x109/0x1a0
 __blk_mq_run_hw_queue+0x131/0x1e0
 __blk_mq_delay_run_hw_queue+0x9a/0xf0
 blk_mq_run_hw_queue+0xc0/0x1e0
 blk_mq_start_hw_queues+0x2c/0x40
 scsi_run_queue+0x18e/0x2d0
 scsi_run_host_queues+0x22/0x40
 scsi_error_handler+0x18d/0x5f0
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-20 15:37:02 -04:00
Leon Romanovsky
836a0fbb3e RDMA: Check umem pointer validity prior to release
Update ib_umem_release() to behave similarly to kfree() and allow
submitting NULL pointer as safe input to this function.

Fixes: a52c8e2469 ("RDMA: Clean destroy CQ in drivers do not return errors")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 15:17:59 -04:00
Lang Cheng
89a6da3cb8 RDMA/hns: reset function when removing module
During removing the driver, we needs to notify the roce engine to
stop working immediately,and symmetrically recycle the hardware
resources requested during initialization.

The hardware provides a command called function clear that can package
these operations,so that the driver can only focus on releasing
resources that applied from the operating system.
This patch implements the call of this command.

Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 15:05:48 -04:00
Leon Romanovsky
a49b1dc7ae RDMA: Convert destroy_wq to be void
All callers of destroy WQ are always success and there is no need
to check their return value, so convert destroy_wq to be void.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 14:37:02 -04:00
Lijun Ou
8d18ad83f1 RDMA/hns: Fix bug when wqe num is larger than 16K
hip08 can support up to 32768 wqes in one qp. currently if the wqe num
is larger than 16384, the driver will lead a calltrace as follows.

[21361.393725] Call trace:
[21361.398605]  hns_roce_v2_modify_qp+0xbcc/0x1360 [hns_roce_hw_v2]
[21361.410627]  hns_roce_modify_qp+0x1d8/0x2f8 [hns_roce]
[21361.420906]  _ib_modify_qp+0x70/0x118
[21361.428222]  ib_modify_qp+0x14/0x1c
[21361.435193]  rt_ktest_modify_qp+0xb8/0x650 [rdma_test]
[21361.445472]  exec_modify_qp_cmd+0x110/0x4d8 [rdma_test]
[21361.455924]  rt_ktest_dispatch_cmd_3+0xa94/0x2edc [rdma_test]
[21361.467422]  rt_ktest_dispatch_cmd_2+0x9c/0x108 [rdma_test]
[21361.478570]  rt_ktest_dispatch_cmd+0x138/0x904 [rdma_test]
[21361.489545]  rt_ktest_dev_write+0x328/0x4b0 [rdma_test]
[21361.499998]  __vfs_write+0x38/0x15c
[21361.506966]  vfs_write+0xa8/0x1a0
[21361.513586]  ksys_write+0x50/0xb0
[21361.520206]  sys_write+0xc/0x14
[21361.526479]  el0_svc_naked+0x30/0x34
[21361.533622] Code: 1ac10841 d37d7c22 0b000021 d37df021 (f86268c0)
[21361.545815] ---[ end trace e2a1feb2c3d7f13c ]---

When the wqe num is larger than 16384, hns_roce_table_find will return an
invalid mtt, this will lead an kernel paging requet error if the driver try
to access it. It's the mtt design defect which can't support up to the max
wqe num of hip08.

This patch fixs it by replacing mtt with mtr for wqe.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 12:56:34 -04:00
Lijun Ou
2ac0bc5e72 RDMA/hns: Add a group interfaces for optimizing buffers getting flow
Currently, the code for getting umem and kmem buffers exist many files,
this patch adds a group interfaces to simplify the buffers getting flow.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 12:56:34 -04:00
Lijun Ou
38389eaa4d RDMA/hns: Add mtr support for mixed multihop addressing
Currently, the MTT(memory translate table) design required a buffer
space must has the same hopnum, but the hip08 hw can support mixed
hopnum config in a buffer space.

This patch adds the MTR(memory translate region) design for supporting
mixed multihop.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 12:56:34 -04:00
Doug Ledford
696de2e9cc RDMA/netlink: Resort policy array
Sort the netlink policy array by netlink attribute name.  This will make
it easier in the future to find the entry you are looking for when you
need to make changes, or to make sure you don't add the same entry
twice.

Fix the whitespace while we are there.

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-19 18:05:15 -04:00
Maor Gottlieb
09d985bea9 RDMA/mlx5: Enable decap and packet reformat on FDB
If FDB flow tables support decap operation, enable it on creation,
This allows to perform decapsulation of tunnelled packets by steering
rules. If FDB flow tables support reformat operation, enable it on
creation as well.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:52 -04:00
Maor Gottlieb
cecae747b6 RDMA/mlx5: Consider eswitch encap mode
When flow steering is created, then the encap support should
consider the eswitch encap mode. If the eswitch flow table (FDB)
supports encap then it shouldn't be supported on NIC RX flow tables.

Fixes: 4adda1122c ('RDMA/mlx5: Enable decap and packet reformat on flow tables')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:52 -04:00
Doug Ledford
12dbc04db0 Merge remote-tracking branch 'mlx5-next/mlx5-next' into HEAD
Take mlx5-next so we can take a dependent two patch series next.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:36 -04:00
Jason Gunthorpe
7608bf40cf RDMA/odp: Fix missed unlock in non-blocking invalidate_start
If invalidate_start returns with EAGAIN then the umem_rwsem needs to be
unlocked as no invalidate_end will be called.

Cc: <stable@vger.kernel.org>
Fixes: ca748c39ea ("RDMA/umem: Get rid of per_mm->notifier_count")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:35 -04:00
Geert Uytterhoeven
5a3113d19c IB/hfi1: Spelling s/statisfied/satisfied/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:35 -04:00
Jason Gunthorpe
8f71bb0030 RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEV
Update the struct ib_client for all modules exporting cdevs related to the
ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now
autoloadable and discoverable by userspace over netlink instead of relying
on sysfs.

uverbs also exposes the DRIVER_ID for drivers that are able to support
driver id binding in rdma-core.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:08 -04:00
Jason Gunthorpe
0e2d00eb6f RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload
Allow userspace to issue a netlink query against the ib_device for
something like "uverbs" and get back the char dev name, inode major/minor,
and interface ABI information for "uverbs0".

Since we are now in netlink this can also trigger a module autoload to
make the uverbs device come into existence.

Largely this will let us replace searching and reading inside sysfs to
setup devices, and provides an alternative (using driver_id) to device
name based provider binding for things like rxe.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:41:05 -04:00
Gal Pressman
7a5834e456 RDMA/efa: Handle mmap insertions overflow
When inserting a new mmap entry to the xarray we should check for
'mmap_page' overflow as it is limited to 32 bits.

Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 16:27:24 -04:00
Denis Kirjanov
64d701c608 ipoib: correcly show a VF hardware address
in the case of IPoIB with SRIOV enabled hardware
ip link show command incorrecly prints
0 instead of a VF hardware address.

Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
    link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state disable,
trust off, query_rss off
...
After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
    link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0     link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
checking off, link-state disable, trust off, query_rss off

v1->v2: just copy an address without modifing ifla_vf_mac
v2->v3: update the changelog

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18 10:41:27 -07:00
David S. Miller
13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
Gal Pressman
529254340c RDMA/efa: Fix success return value in case of error
Existing code would mistakenly return success in case of error instead
of a proper return value.

Fixes: e9c6c53730 ("RDMA/efa: Add common command handlers")
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:35:21 -04:00
Mike Marciniszyn
942a899335 IB/hfi1: Handle port down properly in pio
The call to sc_buffer_alloc currently returns NULL (no buffer) or
a buffer descriptor.

There is a third case when the port is down.  Currently that
returns NULL and this prevents the caller from properly handling the
sc_buffer_alloc() failure.  A verbs code link test after the call is
racy so the indication needs to come from the state check inside the allocation
routine to be valid.

Fix by encoding the ECOMM failure like SDMA.   IS_ERR_OR_NULL() tests
are added at all call sites.  For verbs send, this needs to treat any
error by returning a completion without any MMIO copy.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
099a884ba4 IB/hfi1: Handle wakeup of orphaned QPs for pio
Once a send context is taken down due to a link failure, any QPs waiting
for pio credits will stay on the waitlist indefinitely.

Fix by wakeing up all QPs linked to piowait list.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
f972775b1c IB/hfi1: Wakeup QPs orphaned on wait list after flush
Once an SDMA engine is taken down due to a link failure, any waiting QPs
that do not have outstanding descriptors in the ring will stay
on the dmawait list as long as the port is down.

Since there is no timer running, they will stay there for a long time.

The fix is to wake up all iowaits linked to dmawait. The send engine
will build and post packets that get flushed back.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
4bb02e9572 IB/hfi1: Use aborts to trigger RC throttling
SDMA and pio flushes will cause a lot of packets to be transmitted
after a link has gone down, using a lot of CPU to retransmit
packets.

Fix for RC QPs by recognizing the flush status and:
- Forcing a timer start
- Putting the QP into a "send one" mode

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
9755f72496 IB/hfi1: Create inline to get extended headers
This paves the way for another patch that reacts to a
flush sdma completion for RC.

Fixes: 81cd3891f0 ("IB/hfi1: Add support for 16B Management Packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
3230f4a8d4 IB/hfi1: Silence txreq allocation warnings
The following warning can happen when a memory shortage
occurs during txreq allocation:

[10220.939246] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[10220.939246] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[10220.939247]   cache: mnt_cache, object size: 384, buffer size: 384, default order: 2, min order: 0
[10220.939260] Workqueue: hfi0_0 _hfi1_do_send [hfi1]
[10220.939261]   node 0: slabs: 1026568, objs: 43115856, free: 0
[10220.939262] Call Trace:
[10220.939262]   node 1: slabs: 820872, objs: 34476624, free: 0
[10220.939263]  dump_stack+0x5a/0x73
[10220.939265]  warn_alloc+0x103/0x190
[10220.939267]  ? wake_all_kswapds+0x54/0x8b
[10220.939268]  __alloc_pages_slowpath+0x86c/0xa2e
[10220.939270]  ? __alloc_pages_nodemask+0x2fe/0x320
[10220.939271]  __alloc_pages_nodemask+0x2fe/0x320
[10220.939273]  new_slab+0x475/0x550
[10220.939275]  ___slab_alloc+0x36c/0x520
[10220.939287]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939299]  ? __get_txreq+0x54/0x160 [hfi1]
[10220.939310]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939312]  __slab_alloc+0x40/0x61
[10220.939323]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939325]  kmem_cache_alloc+0x181/0x1b0
[10220.939336]  hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939348]  ? hfi1_verbs_send_dma+0x386/0xa10 [hfi1]
[10220.939359]  ? find_prev_entry+0xb0/0xb0 [hfi1]
[10220.939371]  hfi1_do_send+0x1d9/0x3f0 [hfi1]
[10220.939372]  process_one_work+0x171/0x380
[10220.939374]  worker_thread+0x49/0x3f0
[10220.939375]  kthread+0xf8/0x130
[10220.939377]  ? max_active_store+0x80/0x80
[10220.939378]  ? kthread_bind+0x10/0x10
[10220.939379]  ret_from_fork+0x35/0x40
[10220.939381] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)

The shortage is handled properly so the message isn't needed. Silence by
adding the no warn option to the slab allocation.

Fixes: 45842abbb2 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
cf131a8196 IB/hfi1: Avoid hardlockup with flushlist_lock
Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.

Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Jason Gunthorpe
5d60c11154 RDMA: Move rdma_node_type to uapi/
This enum is exposed over the sysfs file 'node_type' and over netlink via
RDMA_NLDEV_ATTR_DEV_NODE_TYPE, so declare it in the uapi headers.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 14:58:59 -04:00
Ingo Molnar
23da766ab1 Merge tag 'v5.2-rc5' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17 12:12:27 +02:00
Nikolay Borisov
9ffbe8ac05 locking/lockdep: Rename lockdep_assert_held_exclusive() -> lockdep_assert_held_write()
All callers of lockdep_assert_held_exclusive() use it to verify the
correct locking state of either a semaphore (ldisc_sem in tty,
mmap_sem for perf events, i_rwsem of inode for dax) or rwlock by
apparmor. Thus it makes sense to rename _exclusive to _write since
that's the semantics callers care. Additionally there is already
lockdep_assert_held_read(), which this new naming is more consistent with.

No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17 12:09:24 +02:00
Janne Karhunen
42df744c41 LSM: switch to blocking policy update notifiers
Atomic policy updaters are not very useful as they cannot
usually perform the policy updates on their own. Since it
seems that there is no strict need for the atomicity,
switch to the blocking variant. While doing so, rename
the functions accordingly.

Signed-off-by: Janne Karhunen <janne.karhunen@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-06-14 09:02:42 -04:00
Yuval Avnery
1f8a7bee27 net/mlx5: Add EQ enable/disable API
Previously, EQ joined the chain notifier on creation.
This forced the caller to be ready to handle events before creating
the EQ through eq_create_generic interface.

To help the caller control when the created EQ will be attached to the
IRQ, add enable/disable API.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Ariel Levkovich
81bfa20603 net/mlx5: Use a single IRQ for all async EQs
The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ resulting in more available IRQs for
completion EQs.

The changes are using the support for IRQ sharing and EQ polling budget
that was introduced in previous patches so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll in order to prevent
starvation.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Yuval Avnery
24163189da net/mlx5: Separate IRQ request/free from EQ life cycle
Instead of requesting IRQ with eq creation, IRQs will be requested
before EQ table creation.
Instead of freeing the IRQs after EQ destroy, free IRQs after eq
table destroy.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Yuval Avnery
ca390799c2 net/mlx5: Change interrupt handler to call chain notifier
Multiple EQs may share the same IRQ in subsequent patches.

Instead of calling the IRQ handler directly, the EQ will register
to an atomic chain notfier.

The Linux built-in shared IRQ is not used because it forces the caller
to disable the IRQ and clear affinity before free_irq() can be called.

This patch is the first step in the separation of IRQ and EQ logic.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Jason Gunthorpe
2d3c72ed50 rdma: Remove nes
This driver was first merged over 10 years ago and has not seen major
activity by the authors in the last 7 years. However, in that time it has
been patched 150 times to adapt it to changing kernel APIs.

Further, the hardware has several issues, like not supporting 64 bit DMA,
that make it rather uninteresting for use with modern systems and RDMA.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-13 09:59:49 -04:00
Kamal Heib
cbdc666f3e RDMA/ipoib: Remove check for ETH_SS_TEST
The default action for unlisted tests is "not-supported", so given that
ipoib doesn't support ETH_SS_TEST, there is no need to check for it
in the case statements, just let it get caught by the default: case.

Fixes: e3614bc9dc ("IB/ipoib: Add readout of statistics using ethtool")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 19:00:10 -04:00
Leon Romanovsky
e39afe3d6d RDMA: Convert CQ allocations to be under core responsibility
Ensure that CQ is allocated and freed by IB/core and not by drivers.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:39:49 -04:00
Leon Romanovsky
a52c8e2469 RDMA: Clean destroy CQ in drivers do not return errors
Like all other destroy commands, .destroy_cq() call is not supposed
to fail. In all flows, the attempt to return earlier caused to memory
leaks.

This patch converts .destroy_cq() to do not return any errors.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:17:10 -04:00
Leon Romanovsky
147b308e6a RDMA/nes: Avoid memory allocation during CQ destroy
The memory allocation call can fail and cause to early return
from nes_desotroy_cq() function. This situation will cause to
memory leak of struct nes_cq. Rewrite function to avoid memory
allocation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:12:34 -04:00
Mike Marciniszyn
cc78076af1 IB/hfi1: Correct tid qp rcd to match verbs context
The qp priv rcd pointer doesn't match the context being used for verbs
causing issues when 9B and kdeth packets are processed by different
receive contexts and hence different CPUs.

When running on different CPUs the following panic can occur:

 WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
 list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230
 CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.2.3.el7_lustre.x86_64 #1
 Call Trace:
  <IRQ>  [<ffffffffb7b0d78e>] dump_stack+0x19/0x1b
  [<ffffffffb74916d8>] __warn+0xd8/0x100
  [<ffffffffb749175f>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffffb7768671>] __list_del_entry+0xa1/0xd0
  [<ffffffffc0c7a945>] process_rcv_qp_work+0xb5/0x160 [hfi1]
  [<ffffffffc0c7bc2b>] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1]
  [<ffffffffc0c70683>] receive_context_interrupt+0x23/0x40 [hfi1]
  [<ffffffffb7540a94>] __handle_irq_event_percpu+0x44/0x1c0
  [<ffffffffb7540c42>] handle_irq_event_percpu+0x32/0x80
  [<ffffffffb7540ccc>] handle_irq_event+0x3c/0x60
  [<ffffffffb7543a1f>] handle_edge_irq+0x7f/0x150
  [<ffffffffb742d504>] handle_irq+0xe4/0x1a0
  [<ffffffffb7b23f7d>] do_IRQ+0x4d/0xf0
  [<ffffffffb7b16362>] common_interrupt+0x162/0x162
  <EOI>  [<ffffffffb775a326>] ? memcpy+0x6/0x110
  [<ffffffffc109210d>] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs]
  [<ffffffffc10920f0>] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs]
  [<ffffffffc1093257>] abd_iterate_func+0x97/0x120 [zfs]
  [<ffffffffc10934d9>] abd_copy_from_buf_off+0x39/0x60 [zfs]
  [<ffffffffc109b828>] arc_write_ready+0x178/0x300 [zfs]
  [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
  [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
  [<ffffffffc1164d05>] zio_ready+0x65/0x3d0 [zfs]
  [<ffffffffc04d725e>] ? tsd_get_by_thread+0x2e/0x50 [spl]
  [<ffffffffc04d1318>] ? taskq_member+0x18/0x30 [spl]
  [<ffffffffc115ef22>] zio_execute+0xa2/0x100 [zfs]
  [<ffffffffc04d1d2c>] taskq_thread+0x2ac/0x4f0 [spl]
  [<ffffffffb74cee80>] ? wake_up_state+0x20/0x20
  [<ffffffffc115ee80>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
  [<ffffffffc04d1a80>] ? taskq_thread_spawn+0x60/0x60 [spl]
  [<ffffffffb74bae31>] kthread+0xd1/0xe0
  [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40
  [<ffffffffb7b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21
  [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40

Fix by reading the map entry in the same manner as the hardware so that
the kdeth and verbs contexts match.

Cc: <stable@vger.kernel.org>
Fixes: 5190f052a3 ("IB/hfi1: Allow the driver to initialize QP priv struct")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:45 -03:00
Mike Marciniszyn
da9de5f852 IB/hfi1: Close PSM sdma_progress sleep window
The call to sdma_progress() is called outside the wait lock.

In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle.  If that happens, there will be no
more sdma interrupts to cause the wakeup and the user_sdma xmit will hang.

Fix by moving the lock to enclose the sdma_progress() call.

Also, delete busycount. The need for this was removed by:
commit bcad29137a ("IB/hfi1: Serve the most starved iowait entry first")

Cc: <stable@vger.kernel.org>
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:45 -03:00
Kaike Wan
5f90677ed3 IB/hfi1: Validate fault injection opcode user input
The opcode range for fault injection from user should be validated before
it is applied to the fault->opcodes[] bitmap to avoid out-of-bound
error.

Cc: <stable@vger.kernel.org>
Fixes: a74d5307ca ("IB/hfi1: Rework fault injection machinery")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:37 -03:00
Jason Gunthorpe
7a15414252 RDMA: Move owner into struct ib_device_ops
This more closely follows how other subsytems work, with owner being a
member of the structure containing the function pointers.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:03 -03:00
Jason Gunthorpe
72c6ec18eb RDMA: Move uverbs_abi_ver into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Jason Gunthorpe
b9560a419b RDMA: Move driver_id into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Jason Gunthorpe
a1a8e4a85c rdma: Delete the ib_ucm module
This has been marked CONFIG_BROKEN for over a year now with no complaints.
Delete the whole thing for good.

The module provided the /dev/infiniband/ucmX interface.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 15:02:01 -03:00
Linus Torvalds
9331b6740f Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
 "Another round of SPDX header file fixes for 5.2-rc4

  These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
  added, based on the text in the files. We are slowly chipping away at
  the 700+ different ways people tried to write the license text. All of
  these were reviewed on the spdx mailing list by a number of different
  people.

  We now have over 60% of the kernel files covered with SPDX tags:
	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
	Files checked:            64533
	Files with SPDX:          40392
	Files with errors:            0

  I think the majority of the "easy" fixups are now done, it's now the
  start of the longer-tail of crazy variants to wade through"

* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
  ...
2019-06-08 12:52:42 -07:00
Matthew Wilcox
afcafe07af ucma: Convert ctx_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-07 16:43:02 -03:00