Commit Graph

9683 Commits

Author SHA1 Message Date
Adit Ranadive
6325e01b6c RDMA/vmw_pvrdma: Return the correct opcode when creating WR
Since the IB_WR_REG_MR opcode value changed, let's set the PVRDMA device
opcodes explicitly.

Reported-by: Ruishuang Wang <ruishuangw@vmware.com>
Fixes: 9a59739bd0 ("IB/rxe: Revise the ib_wr_opcode enum")
Cc: stable@vger.kernel.org
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Ruishuang Wang <ruishuangw@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:00:28 -07:00
Steve Wise
917cb8a72a RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type
A recent regression causes a null ptr crash when dumping cm_id resources.
The cma is incorrectly adding all cm_id restrack resources as kernel mode.

Fixes: af8d70375d ("RDMA/restrack: Resource-tracker should not use uobject pointers")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 17:12:33 -07:00
Leon Romanovsky
13859d5df4 RDMA/mlx5: Embed into the code flow the ODP config option
Convert various places to more readable code, which embeds
CONFIG_INFINIBAND_ON_DEMAND_PAGING into the code flow.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
8b4d5bc5cf RDMA/mlx5: Introduce and reuse helper to identify ODP MR
Consolidate various checks if MR is ODP backed to one simple helper and
update call sites to use it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
e502b8b011 RDMA/core: Don't depend device ODP capabilities on kconfig option
Device capability bits are exposing what specific device supports from HW
perspective. Those bits are not dependent on kernel configurations and
RDMA/core should ensure that proper interfaces to users will be disabled
if CONFIG_INFINIBAND_ON_DEMAND_PAGING is not set.

Fixes: f4056bfd8c ("IB/core: Add on demand paging caps to ib_uverbs_ex_query_device")
Fixes: 8cdd312cfe ("IB/mlx5: Implement the ODP capability query verb")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
96f87ee181 RDMA: Clean structures from CONFIG_INFINIBAND_ON_DEMAND_PAGING
CONFIG_INFINIBAND_ON_DEMAND_PAGING is used in general structures to
micro-optimize the memory footprint. Remove it, so it will allow us to
simplify various ODP device flows.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Gustavo A. R. Silva
7a7b0fea6f IB/srp: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 15:46:59 -07:00
Luis Chamberlain
750afb08ca cross-tree: phase out dma_zalloc_coherent()
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.

This change was generated with the following Coccinelle SmPL patch:

@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@

-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08 07:58:37 -05:00
Leon Romanovsky
a9666c1cae RDMA/nldev: Don't expose unsafe global rkey to regular user
Unsafe global rkey is considered dangerous because it exposes memory
registered for all memory in the system. Only users with a QP on the same
PD can use the rkey, and generally those QPs will already know the
value. However, out of caution, do not expose the value to unprivleged
users on the local system. Require CAP_NET_ADMIN instead.

Cc: <stable@vger.kernel.org> # 4.16
Fixes: 29cf1351d4 ("RDMA/nldev: provide detailed PD information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 13:35:57 -07:00
Lijun Ou
91fb4d83b8 RDMA/hns: Modify the pbl ba page size for hip08
Modify the pbl ba page size to 16K for in order to support 4G MR size.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:14:42 -07:00
Lijun Ou
44754b95dd RDMA/hns: Add constraint on the setting of local ACK timeout
According to IB protocol, local ACK timeout shall be a 5 bit
value. Currently, hip08 could not support the possible max value 31. Fail
the request in this case.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:13:34 -07:00
Lijun Ou
4d103905eb RDMA/hns: Bugfix for the scene without receiver queue
In some application scenario, the user could not have receive queue when
run rdma write or read operation.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:12:50 -07:00
Lijun Ou
9c6ccc035c RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
When flush cqe with srq, the driver disable to update the rq head pointer
into the hardware.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:12:36 -07:00
Potnuri Bharat Teja
e6b7b7d8a9 iw_cxgb4: Check for send WR also while posting write with completion WR
Inorder to optimize the NVMEoF read IOPs, iw_cxgb4 posts a FW Write with
Completion WQE that combines an RDMA Write WR and the subsequent RDMA Send
with Invalidate WR.

This patch is an extension to it, where it posts a Write with completion
for RDMA WRITE WR + RDMA SEND WR combination as well.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:02:45 -07:00
Gustavo A. R. Silva
5aad26a7ea IB/core: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:59:33 -07:00
Gustavo A. R. Silva
02fc184841 IB/usnic: Use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:43:03 -07:00
Gustavo A. R. Silva
b5c61b968d IB/cm: Use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:43:03 -07:00
Gal Pressman
f687ccea10 RDMA/uverbs: Fix post send success return value in case of error
If get QP object fails 'ret' must be assigned with a proper error code.

Fixes: 9a0738575f ("RDMA/uverbs: Use uverbs_response() for remaining response copying")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:42:22 -07:00
Linus Torvalds
3954e1d031 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "Over the break a few defects were found, so this is a -rc style pull
  request of various small things that have been posted.

   - An attempt to shorten RCU grace period driven delays showed crashes
     during heavier testing, and has been entirely reverted

   - A missed merge/rebase error between the advise_mr and ib_device_ops
     series

   - Some small static analysis driven fixes from Julia and Aditya

   - Missed ability to create a XRC_INI in the devx verbs interop
     series"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  infiniband/qedr: Potential null ptr dereference of qp
  infiniband: bnxt_re: qplib: Check the return value of send_message
  IB/ipoib: drop useless LIST_HEAD
  IB/core: Add advise_mr to the list of known ops
  Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads"
  IB/mlx5: Allow XRC INI usage via verbs in DEVX context
2019-01-05 18:20:51 -08:00
Linus Torvalds
96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Aditya Pakki
9c6260de50 infiniband/qedr: Potential null ptr dereference of qp
idr_find() may fail and return a NULL pointer. The fix checks the return
value of the function and returns an error in case of NULL.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 16:08:38 -07:00
Aditya Pakki
94edd87a1c infiniband: bnxt_re: qplib: Check the return value of send_message
In bnxt_qplib_map_tc2cos(), bnxt_qplib_rcfw_send_message() can return an
error value but it is lost. Propagate this error to the callers.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 16:04:24 -07:00
Julia Lawall
2fb458953a IB/ipoib: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares is never used.

Commit 31c02e2157 ("IPoIB: Avoid using stale last_send counter
when reaping AHs") removed the uses, but not the declaration.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 31c02e2157 ("IPoIB: Avoid using stale last_send counter when reaping AHs")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 15:42:43 -07:00
Moni Shoua
2f1927b090 IB/core: Add advise_mr to the list of known ops
We need to add advise_mr to the list of operation setters on the ib_device
or otherwise callers to ib_set_device_ops() for advise_mr operation will
not have their callback registered.

When the advise_mr series was merged with the device ops series the
SET_DEVICE_OPS() was missed.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Leon Romanovsky
ccffa54548 Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads"
Longer term testing shows this patch didn't play well with MR cache and
caused to call traces during remove_mkeys().

This reverts commit bb7e22a8ab.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Yishai Hadas
7422edce73 IB/mlx5: Allow XRC INI usage via verbs in DEVX context
From device point of view both XRC target and initiator are XRC transport
type.

Fix to use the expected UID as was handled for the XRC target case to
allow its usage via verbs in DEVX context.

Fixes: 5aa3771ded ("IB/mlx5: Allow XRC usage via verbs in DEVX context")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Linus Torvalds
f346b0becb Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - large KASAN update to use arm's "software tag-based mode"

 - a few misc things

 - sh updates

 - ocfs2 updates

 - just about all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
  kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
  memcg, oom: notify on oom killer invocation from the charge path
  mm, swap: fix swapoff with KSM pages
  include/linux/gfp.h: fix typo
  mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
  hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  memory_hotplug: add missing newlines to debugging output
  mm: remove __hugepage_set_anon_rmap()
  include/linux/vmstat.h: remove unused page state adjustment macro
  mm/page_alloc.c: allow error injection
  mm: migrate: drop unused argument of migrate_page_move_mapping()
  blkdev: avoid migration stalls for blkdev pages
  mm: migrate: provide buffer_migrate_page_norefs()
  mm: migrate: move migrate_page_lock_buffers()
  mm: migrate: lock buffers before migrate_page_move_mapping()
  mm: migration: factor out code to compute expected number of page references
  mm, page_alloc: enable pcpu_drain with zone capability
  kmemleak: add config to select auto scan
  mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
  ...
2018-12-28 16:55:46 -08:00
Linus Torvalds
5d24ae67a9 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "This has been a fairly typical cycle, with the usual sorts of driver
  updates. Several series continue to come through which improve and
  modernize various parts of the core code, and we finally are starting
  to get the uAPI command interface cleaned up.

   - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
     mlx5, qib, rxe, usnic

   - Rework the entire syscall flow for uverbs to be able to run over
     ioctl(). Finally getting past the historic bad choice to use
     write() for command execution

   - More functional coverage with the mlx5 'devx' user API

   - Start of the HFI1 series for 'TID RDMA'

   - SRQ support in the hns driver

   - Support for new IBTA defined 2x lane widths

   - A big series to consolidate all the driver function pointers into a
     big struct and have drivers provide a 'static const' version of the
     struct instead of open coding initialization

   - New 'advise_mr' uAPI to control device caching/loading of page
     tables

   - Support for inline data in SRPT

   - Modernize how umad uses the driver core and creates cdev's and
     sysfs files

   - First steps toward removing 'uobject' from the view of the drivers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
  RDMA/srpt: Use kmem_cache_free() instead of kfree()
  RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
  IB/uverbs: Signedness bug in UVERBS_HANDLER()
  IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
  IB/umad: Start using dev_groups of class
  IB/umad: Use class_groups and let core create class file
  IB/umad: Refactor code to use cdev_device_add()
  IB/umad: Avoid destroying device while it is accessed
  IB/umad: Simplify and avoid dynamic allocation of class
  IB/mlx5: Fix wrong error unwind
  IB/mlx4: Remove set but not used variable 'pd'
  RDMA/iwcm: Don't copy past the end of dev_name() string
  IB/mlx5: Fix long EEH recover time with NVMe offloads
  IB/mlx5: Simplify netdev unbinding
  IB/core: Move query port to ioctl
  RDMA/nldev: Expose port_cap_flags2
  IB/core: uverbs copy to struct or zero helper
  IB/rxe: Reuse code which sets port state
  IB/rxe: Make counters thread safe
  IB/mlx5: Use the correct commands for UMEM and UCTX allocation
  ...
2018-12-28 14:57:10 -08:00
Linus Torvalds
938edb8a31 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
  megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.

  Additionally, we have a pile of annotation, unused variable and minor
  updates.

  The big API change is the updates for Christoph's DMA rework which
  include removing the DISABLE_CLUSTERING flag.

  And finally there are a couple of target tree updates"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
  scsi: isci: request: mark expected switch fall-through
  scsi: isci: remote_node_context: mark expected switch fall-throughs
  scsi: isci: remote_device: Mark expected switch fall-throughs
  scsi: isci: phy: Mark expected switch fall-through
  scsi: iscsi: Capture iscsi debug messages using tracepoints
  scsi: myrb: Mark expected switch fall-throughs
  scsi: megaraid: fix out-of-bound array accesses
  scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
  scsi: fcoe: remove set but not used variable 'port'
  scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
  scsi: smartpqi: fix build warnings
  scsi: smartpqi: update driver version
  scsi: smartpqi: add ofa support
  scsi: smartpqi: increase fw status register read timeout
  scsi: smartpqi: bump driver version
  scsi: smartpqi: add smp_utils support
  scsi: smartpqi: correct lun reset issues
  scsi: smartpqi: correct volume status
  scsi: smartpqi: do not offline disks for transient did no connect conditions
  scsi: smartpqi: allow for larger raid maps
  ...
2018-12-28 14:48:06 -08:00
Jérôme Glisse
5d6527a784 mm/mmu_notifier: use structure for invalidate_range_start/end callback
Patch series "mmu notifier contextual informations", v2.

This patchset adds contextual information, why an invalidation is
happening, to mmu notifier callback.  This is necessary for user of mmu
notifier that wish to maintains their own data structure without having to
add new fields to struct vm_area_struct (vma).

For instance device can have they own page table that mirror the process
address space.  When a vma is unmap (munmap() syscall) the device driver
can free the device page table for the range.

Today we do not have any information on why a mmu notifier call back is
happening and thus device driver have to assume that it is always an
munmap().  This is inefficient at it means that it needs to re-allocate
device page table on next page fault and rebuild the whole device driver
data structure for the range.

Other use case beside munmap() also exist, for instance it is pointless
for device driver to invalidate the device page table when the
invalidation is for the soft dirtyness tracking.  Or device driver can
optimize away mprotect() that change the page table permission access for
the range.

This patchset enables all this optimizations for device drivers.  I do not
include any of those in this series but another patchset I am posting will
leverage this.

The patchset is pretty simple from a code point of view.  The first two
patches consolidate all mmu notifier arguments into a struct so that it is
easier to add/change arguments.  The last patch adds the contextual
information (munmap, protection, soft dirty, clear, ...).

This patch (of 3):

To avoid having to change many callback definition everytime we want to
add a parameter use a structure to group all parameters for the
mmu_notifier invalidate_range_start/end callback.  No functional changes
with this patch.

[akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>	[infiniband]
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:50 -08:00
Wei Yongjun
f617e5ffe0 RDMA/srpt: Use kmem_cache_free() instead of kfree()
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Fixes: 5dabcd0456 ("RDMA/srpt: Add support for immediate data")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:47 -07:00
Dan Carpenter
58f7c0bfb4 RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
The "num_actions" variable needs to be signed for the error handling to
work.  The maximum number of actions is less than 256 so int type is large
enough for that.

Fixes: cbfdd442c4 ("IB/uverbs: Add helper to get array size from ptr attribute")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:13 -07:00
Dan Carpenter
573671a5f6 IB/uverbs: Signedness bug in UVERBS_HANDLER()
The "num_sge" variable needs to be signed for the error handling to work.
The uverbs_attr_ptr_get_array_size() returns int so this change is safe.

Fixes: ad8a449675 ("IB/uverbs: Add support to advise_mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:13 -07:00
Yishai Hadas
aa74be6eea IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
The per-port Q counter is some kernel resource and as such may be used by
few UID(s) upon DEVX usage.

To enable using it for QP/RQ when DEVX context is used need to allocate it
with a sharing mode indication to let firmware allows its usage.

The UID = 0xffff was chosen to mark it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 12:15:07 -07:00
Parav Pandit
75bf8a2a2f IB/umad: Start using dev_groups of class
Start using core defined dev_groups of a class which allows to add device
attributes to the core kernel and simplify the umad module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 11:39:41 -07:00
Parav Pandit
cdb53b65ae IB/umad: Use class_groups and let core create class file
Use class->class_groups core kernel facility to create the abi version
file instead of open coding.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 11:39:40 -07:00
Parav Pandit
e9dd5daf88 IB/umad: Refactor code to use cdev_device_add()
Refactor code to use cdev_device_add() and do other minor refactors while
modifying these functions as below.

1. Instead of returning generic -1, return an actual error for
   ib_umad_init_port().

2. Introduce and use ib_umad_init_port_dev() for sm and umad char devices.

3. Instead of kobj, use more light weight kref to refcount ib_umad_device.

4. Use modern cdev_device_add() single code cut down three steps of
   cdev_add(), device_create(). This further helps to move device sysfs
   files to class attributes in subsequent patch.

5. Remove few empty lines while refactoring these functions.

6. Use sizeof() instead of sizeof to avoid checkpatch warning.

7. Use struct_size() for calculation of ib_umad_port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 11:39:38 -07:00
Parav Pandit
cf7ad30302 IB/umad: Avoid destroying device while it is accessed
ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in
dev_notice(), while concurrently, ib_umad_kill_port() can destroy the
device using device_destroy().

        cpu-0                               cpu-1
        -----                               -----
    ib_umad_ioctl()
        [...]                            ib_umad_kill_port()
                                              device_destroy(dev)

        ib_umad_reg_agent()
            dev_notice(dev)

Therefore, first mark ib_dev as NULL, to block any further access in file
ops, unregister the mad agent and destroy the device at the end after
mutex is unlocked.

This ensures that device doesn't get destroyed, while it may get accessed.

Fixes: 0f29b46d49 ("IB/mad: add new ioctl to ABI to support new registration options")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 10:39:36 -07:00
Parav Pandit
900d07c12d IB/umad: Simplify and avoid dynamic allocation of class
Simplify code to have a static structure instance for umad class
allocation.

This will allow to have class attributes defined along with class
registration in subsequent patch and allows more class methods definition
similar to ib_core module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 10:39:36 -07:00
Jason Gunthorpe
623d154305 IB/mlx5: Fix wrong error unwind
The destroy_workqueue on error unwind is missing, and the code jumps to
the wrong exit label.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:49:31 -07:00
YueHaibing
e7c4d8e604 IB/mlx4: Remove set but not used variable 'pd'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/mlx4/qp.c: In function '_mlx4_ib_destroy_qp':
drivers/infiniband/hw/mlx4/qp.c:1612:22: warning:
 variable 'pd' set but not used [-Wunused-but-set-variable]

Fixes: e00b64f7c5 ("RDMA: Cleanup undesired pd->uobject usage")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:47:42 -07:00
Steve Wise
d53ec8af56 RDMA/iwcm: Don't copy past the end of dev_name() string
We now use dev_name(&ib_device->dev) instead of ib_device->name in iwpm
messages.  The name field in struct device is a const char *, where as
ib_device->name is a char array of size IB_DEVICE_NAME_MAX, and it is
pre-initialized to zeros.

Since iw_cm_map() was using memcpy() to copy in the device name, and
copying IWPM_DEVNAME_SIZE bytes, it ends up copying past the end of the
source device name string and copying random bytes.  This results in iwpmd
failing the REGISTER_PID request from iwcm.  Thus port mapping is broken.

Validate the device and if names, and use strncpy() to inialize the entire
message field.

Fixes: 896de0090a ("RDMA/core: Use dev_name instead of ibdev->name")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:45:56 -07:00
Huy Nguyen
bb7e22a8ab IB/mlx5: Fix long EEH recover time with NVMe offloads
On NVMe offloads connection with many IO queues, EEH takes long time to
recover. The culprit is the synchronize_srcu in the destroy_mkey. The
solution is to use synchronize_srcu only for ODP mkey.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 16:28:47 -07:00
Or Gerlitz
842a9c837e IB/mlx5: Simplify netdev unbinding
When dealing with netdev unregister events, we just need to know that this
is our currently bounded netdev. There's no need to do any further
checks/queries.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Michael Guralnik
641d1207d2 IB/core: Move query port to ioctl
Add a method for query port under the uverbs global methods.  Current
ib_port_attr struct is passed as a single attribute and port_cap_flags2 is
added as a new attribute to the function.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Michael Guralnik
4fa2813d26 RDMA/nldev: Expose port_cap_flags2
port_cap_flags2 represents IBTA PortInfo:CapabilityMask2.

The field safely extends the RDMA_NLDEV_ATTR_CAP_FLAGS operand as it was
exported as 64 bit to allow this kind of extension.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Michael Guralnik
2e8039c656 IB/core: uverbs copy to struct or zero helper
Add a helper to zero fill fields before copying data to
UVERBS_ATTR_STRUCT.

As UVERBS_ATTR_STRUCT can be used as an extensible struct, we want to make
sure that if the user supplies us with a struct that has new fields that
we are not aware of, we return them zeroed to the user.

This helper should be used when using UVERBS_ATTR_STRUCT for an extendable
data structure and there is a need to make sure that extended members of
the struct, that the kernel doesn't handle, are returned zeroed to the
user. This is needed due to the fact that UVERBS_ATTR_STRUCT allows
non-zero values for members after 'last' member.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:18 -07:00
Yuval Shaia
f55c3ec42a IB/rxe: Reuse code which sets port state
Same code is executed in both rxe_param_set_add and rxe_notify functions.
Make one function and call it from both places.

Since both callers already have a rxe object use it directly instead of
deriving it from the net device.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com> 
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 14:20:58 -07:00
Parav Pandit
d5108e69fe IB/rxe: Make counters thread safe
Current rxe device counters are not thread safe.
When multiple QPs are used, they can be racy.
Make them thread safe by making it atomic64.

Fixes: 0b1e5b99a4 ("IB/rxe: Add port protocol stats")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 14:09:45 -07:00
Yishai Hadas
6e3722baac IB/mlx5: Use the correct commands for UMEM and UCTX allocation
During testing the command format was changed to close a security
hole. Revise the driver to use the command format that will actually be
supported in GA firmware.

Both the UMEM and UCTX are intended only for use by the kernel and cannot
be executed using a general command.

Since the UMEM and CTX are not part of the general object the caps bits
were moved to be some log_xxx location in the general HCA caps.

The firmware code was adapted as well to match the above.

Fixes: a8b92ca1b0 ("IB/mlx5: Introduce DEVX")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 13:49:48 -07:00