Commit Graph

11436 Commits

Author SHA1 Message Date
David S. Miller
a57066b1a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25 17:49:04 -07:00
Gustavo A. R. Silva
6f24b15925 IB/hfi1: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the
new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7-rc7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Link: https://lore.kernel.org/r/20200721133455.GA14363@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:59:55 -03:00
Leon Romanovsky
9b8d846924 RDMA/uverbs: Silence shiftTooManyBitsSigned warning
Fix reported by kbuild warning.

   drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned]
    BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31));
                                                 ^
Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:53:02 -03:00
Leon Romanovsky
29f3fe1d68 RDMA/uverbs: Remove redundant assignments
The kbuild reported the following warning, so clean whole uverbs_cmd.c
file.

   drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment]
    ret = uverbs_request(attrs, &cmd, sizeof(cmd));
        ^
   drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used.
    int    ret = -EINVAL;
   ^

Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:53:02 -03:00
Maor Gottlieb
d4d7f59643 RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flow
According to the locking scheme, mlx5_ib_update_xlt() should be called
with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the
below WARN from lockdep_assert_held():

  WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core
  CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G        W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00
  RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940
  RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100
  RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001
  R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec
  R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000
  FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib]
   pagefault_mr+0x315/0x440 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib]
   process_one_work+0x215/0x5c0
   worker_thread+0x3c/0x380
   ? process_one_work+0x5c0/0x5c0
   kthread+0x133/0x150
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30

Hold the SRCU during prefetch, even though it strictly isn't needed since
prefetch is holding the num_deferred_work it does make it easier to reason
about.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:44:06 -03:00
Leon Romanovsky
16e51f78a9 RDMA/core: Update write interface to use automatic object lifetime
The automatic object lifetime model allows us to change the write()
interface to have the same logic as the ioctl() path. Update the
create/alloc functions to be in the following format, so the code flow
will be the same:

 * Allocate objects
 * Initialize them
 * Call to the drivers, this is last step that is allowed to fail
 * Finalize object
 * Return response and allow to core code to handle abort/commit
   respectively.

Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:22:30 -03:00
Leon Romanovsky
0f63ef1dd5 RDMA/core: Align abort/commit object scheme for write() and ioctl() paths
Create the same logic flow for the write() interface as we have for the
ioctl() path by making sure that the object is committed or aborted
automatically after HW object creation.

Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:57:22 -03:00
Maor Gottlieb
c94e272b57 RDMA/mlx5: Allow SQ modification
Currently the SQ is set to a ready state when the RAW QP is modified to
INIT.  When the TIS is modified, e.g. to change the lag_tx_affinity, then
SQs which are already in the ready state will not be affected.

Open a window to modify the SQ behavior by setting the SQ as ready only
when QP was modified to RTS.

Link: https://lore.kernel.org/r/20200716105416.1423826-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:49:19 -03:00
Alexander Lobakin
155065866b qed: add support for different page sizes for chains
Extend current infrastructure to store chain page size in a struct
and use it in all functions instead of fixed QED_CHAIN_PAGE_SIZE.
Its value remains the default one, but can be overridden in
qed_chain_init_params before chain allocation.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
b6db3f71c9 qed: simplify chain allocation with init params struct
To simplify qed_chain_alloc() prototype and call sites, introduce struct
qed_chain_init_params to specify chain params, and pass a pointer to
filled struct to the actual qed_chain_alloc() instead of a long list
of separate arguments.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Jason Gunthorpe
a862192e92 RDMA/mlx5: Prevent prefetch from racing with implicit destruction
Prefetch work in mlx5_ib_prefetch_mr_work can be queued and able to run
concurrently with destruction of the implicit MR. The num_deferred_work
was intended to serialize this, but there is a race:

       CPU0                                          CPU1

    mlx5_ib_free_implicit_mr()
      xa_erase(odp_mkeys)
      synchronize_srcu()
      __xa_erase(implicit_children)
                                      mlx5_ib_prefetch_mr_work()
                                        pagefault_mr()
                                         pagefault_implicit_mr()
                                          implicit_get_child_mr()
                                           xa_cmpxchg()
                                        atomic_dec_and_test(num_deferred_mr)
      wait_event(imr->q_deferred_work)
      ib_umem_odp_release(odp_imr)
        kfree(odp_imr)

At this point in mlx5_ib_free_implicit_mr() the implicit_children list is
supposed to be empty forever so that destroy_unused_implicit_child_mr()
and related are not and will not be running.

Since it is not empty the destroy_unused_implicit_child_mr() flow ends up
touching deallocated memory as mlx5_ib_free_implicit_mr() already tore down the
imr parent.

The solution is to flush out the prefetch wq by driving num_deferred_work
to zero after creation of new prefetch work is blocked.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065435.130722-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-21 13:51:35 -03:00
Devesh Sharma
2bb3c32c5c RDMA/bnxt_re: Change wr posting logic to accommodate variable wqes
Modifying the post-send and post-recv to initialize the wqes slot by slot
dynamically depending on the number of max sges requested by consumer at
the time of QP creation.

Changed the QP creation logic to determine the size of SQ and RQ in 16B
slots based on the number of wqe and number of SGEs requested by consumer

Link: https://lore.kernel.org/r/1594822619-4098-6-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:50 -03:00
Devesh Sharma
54ace98443 RDMA/bnxt_re: Add helper data structures
Adding few helper data structure which are useful to initialize hardware
send wqe in variable wqe mode.

Adding a qp flag in HSI to indicate variable wqe is enabled for this qp.

Link: https://lore.kernel.org/r/1594822619-4098-5-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:50 -03:00
Devesh Sharma
5ac5396a6c RDMA/bnxt_re: Pull psn buffer dynamically based on prod
Changing the PSN management memory buffers from statically initialized to
dynamic pull scheme.

During create qp only the start pointers are initialized and during
post-send the psn buffer is pulled based on current producer index.

Adjusting post_send code to accommodate dynamic psn-pull and changing
post_recv code to match post-send code wrt pseudo flush wqe generation.

Link: https://lore.kernel.org/r/1594822619-4098-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Devesh Sharma
159fb4ceac RDMA/bnxt_re: introduce a function to allocate swq
The bnxt_re driver now allocates shadow sq and rq to maintain per wqe
wr_id and few other flags required to support variable wqe. Segregated the
allocation of shadow queue in a separate function and adjust the cqe
polling logic. The new polling logic is based on shadow queue indices.

Link: https://lore.kernel.org/r/1594822619-4098-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Devesh Sharma
1da968e0ef RDMA/bnxt_re: introduce wqe mode to select execution path
The bnxt_re driver need to decide on how much SQ and RQ memory should to
be allocated and which wqe posting/polling algorithm to use.

Making changes to set the wqe-mode to a default value during device
registration sequence. The wqe-mode is passed to the lower layer driver as
well. Going forward in the lower layer driver wqe-mode will be used to
decide execution path. Initializing the wqe-mode to static wqe type for
now.

Link: https://lore.kernel.org/r/1594822619-4098-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Kamal Heib
ca4beeee98 RDMA/qedr: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed from the common ops and moved to
the RoCE only ops within the qedr driver.

Link: https://lore.kernel.org/r/20200714183414.61069-8-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:17 -03:00
Kamal Heib
c1c5e9fd3a RDMA/i40iw: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-7-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
ce07f1c6a8 RDMA/cxgb4: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-6-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
c4995bd354 RDMA/siw: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
ab75a6cb8c RDMA/core: Remove query_pkey from the mandatory ops
The query_pkey() isn't mandatory for the iwarp providers, so remove this
requirement from the RDMA core.

Link: https://lore.kernel.org/r/20200714183414.61069-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
6f38efca9b RDMA/core: Allocate the pkey cache only if the pkey_tbl_len is set
Allocate the pkey cache only if the pkey_tbl_len is set by the provider,
also add checks to avoid accessing the pkey cache when it not initialized.

Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
90efc8b2d4 RDMA/core: Expose pkeys sysfs files only if pkey_tbl_len is set
Expose the pkeys sysfs files only if the pkey_tbl_len is set by the
providers.

Link: https://lore.kernel.org/r/20200714183414.61069-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Christoph Hellwig
2f9237d4f6 dma-mapping: make support for dma ops optional
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-19 09:29:23 +02:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Mikhail Malygin
5f0b2a6093 RDMA/rxe: Prevent access to wr->next ptr afrer wr is posted to send queue
rxe_post_send_kernel() iterates over linked list of wr's, until the
wr->next ptr is NULL.  However if we've got an interrupt after last wr is
posted, control may be returned to the code after send completion callback
is executed and wr memory is freed.

As a result, wr->next pointer may contain incorrect value leading to
panic. Store the wr->next on the stack before posting it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200716190340.23453-1-m.malygin@yadro.com
Signed-off-by: Mikhail Malygin <m.malygin@yadro.com>
Signed-off-by: Sergey Kojushev <s.kojushev@yadro.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:12:07 -03:00
Michal Kalderon
eb7f84e379 RDMA/qedr: Add EDPM max size to alloc ucontext response
User space should receive the maximum edpm size from kernel driver,
similar to other edpm/ldpm related limits.  Add an additional parameter to
the alloc_ucontext_resp structure for the edpm maximum size.

In addition, pass an indication from user-space to kernel
(and not just kernel to user) that the DPM sizes are supported.

This is for supporting backward-forward compatibility between driver and
lib for everything related to DPM transaction and limit sizes.

This should have been part of commit mentioned in Fixes tag.

Link: https://lore.kernel.org/r/20200707063100.3811-3-michal.kalderon@marvell.com
Fixes: 93a3d05f9d ("RDMA/qedr: Add kernel capability flags for dpm enabled mode")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:01:55 -03:00
Michal Kalderon
bbe4f42452 RDMA/qedr: Add EDPM mode type for user-fw compatibility
In older FW versions the completion flag was treated as the ack flag in
edpm messages.  commit ff937b916e ("qed: Add EDPM mode type for user-fw
compatibility") exposed the FW option of setting which mode the QP is in
by adding a flag to the qedr <-> qed API.

This patch adds the qedr <-> libqedr interface so that the libqedr can set
the flag appropriately and qedr can pass it down to FW.  Flag is added for
backward compatibility with libqedr.

For older libs, this flag didn't exist and therefore set to zero.

Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20200707063100.3811-2-michal.kalderon@marvell.com
Signed-off-by: Yuval Bason <yuval.bason@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:01:55 -03:00
Christophe JAILLET
3e9fed7fb6 RDMA/usnic: switch from 'pci_' to 'dma_' API
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script bellow.
It has been compile tested.

When memory is allocated, GFP_ATOMIC should be used to be consistent with
the surrounding code.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Link: https://lore.kernel.org/r/20200711073120.249146-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 15:42:05 -03:00
Gustavo A. R. Silva
535ee8cdbc IB/hfi1: Remove unnecessary fall-through markings
Reorganize the code a bit in a more standard way[1] and remove
unnecessary fall-through markings.

[1] https://lore.kernel.org/lkml/20200708054703.GR207186@unreal/

Link: https://lore.kernel.org/r/20200709235250.GA26678@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 15:00:02 -03:00
Yuval Basson
acca72e2b0 RDMA/qedr: SRQ's bug fixes
QP's with the same SRQ, working on different CQs and running in parallel
on different CPUs could lead to a race when maintaining the SRQ consumer
count, and leads to FW running out of SRQs. Update the consumer
atomically.  Make sure the wqe_prod is updated after the sge_prod due to
FW requirements.

Fixes: 3491c9e799 ("qedr: Add support for kernel mode SRQ's")
Link: https://lore.kernel.org/r/20200708195526.31040-1-ybason@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:57:08 -03:00
Max Gurtovoy
317000b926 IB/isert: allocate RW ctxs according to max IO size
Current iSER target code allocates MR pool budget based on queue size.
Since there is no handshake between iSER initiator and target on max IO
size, we'll set the iSER target to support upto 16MiB IO operations and
allocate the correct number of RDMA ctxs according to the factor of MR's
per IO operation. This would guarantee sufficient size of the MR pool for
the required IO queue depth and IO size.

Link: https://lore.kernel.org/r/20200708091908.162263-1-maxg@mellanox.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:23:22 -03:00
Daria Velikovsky
0829d2da60 RDMA/mlx5: Init dest_type when create flow
When using action drop dest_type was never assigned to any value.  Add
initialization of dest_type to -1 since 0 is valid.

Fixes: f29de9eee7 ("RDMA/mlx5: Add support for drop action in DV steering")
Link: https://lore.kernel.org/r/20200707110259.882276-1-leon@kernel.org
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:11:53 -03:00
Kamal Heib
420bd9e2d9 RDMA/rxe: Remove rxe_link_layer()
Instead of returning IB_LINK_LAYER_ETHERNET from rxe_link_layer, return it
directly from get_link_layer callback and remove rxe_link_layer().

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Kamal Heib
293d8440a0 RDMA/rxe: Return void from rxe_mem_init_dma()
The return value from rxe_mem_init_dma() is always 0 - change it to be
void and fix the callers accordingly.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Kamal Heib
9d576eac63 RDMA/rxe: Return void from rxe_init_port_param()
The return value from rxe_init_port_param() is always 0 - change it to be
void.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:20 -03:00
Kamal Heib
6112ef6282 RDMA/rxe: Drop pointless checks in rxe_init_ports
Both pkey_tbl_len and gid_tbl_len are set in rxe_init_port_param() - so no
need to check if they aren't set.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:20 -03:00
Maor Gottlieb
87c4c774cb RDMA/cm: Protect access to remote_sidr_table
cm.lock must be held while accessing remote_sidr_table. This fixes the
below NULL pointer dereference.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP PTI
  CPU: 2 PID: 7288 Comm: udaddy Not tainted 5.7.0_for_upstream_perf_2020_06_09_15_14_20_38 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:rb_erase+0x10d/0x360
  Code: 00 00 00 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 af 48 8b 7a 10 48 89 c1 48 83 c9 01 48 89 78 08 48 89 42 10 <48> 89 0f 48 8b 08 48 89 0a 48 83 e1 fc 48 89 10 0f 84 b1 00 00 00
  RSP: 0018:ffffc90000f77c30 EFLAGS: 00010086
  RAX: ffff8883df27d458 RBX: ffff8883df27da58 RCX: ffff8883df27d459
  RDX: ffff8883d183fa58 RSI: ffffffffa01e8d00 RDI: 0000000000000000
  RBP: ffff8883d62ac800 R08: 0000000000000000 R09: 00000000000000ce
  R10: 000000000000000a R11: 0000000000000000 R12: ffff8883df27da00
  R13: ffffc90000f77c98 R14: 0000000000000130 R15: 0000000000000000
  FS:  00007f009f877740(0000) GS:ffff8883f1a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000003d467e003 CR4: 0000000000160ee0
  Call Trace:
   cm_send_sidr_rep_locked+0x15a/0x1a0 [ib_cm]
   ib_send_cm_sidr_rep+0x2b/0x50 [ib_cm]
   cma_send_sidr_rep+0x8b/0xe0 [rdma_cm]
   __rdma_accept+0x21d/0x2b0 [rdma_cm]
   ? ucma_get_ctx+0x2b/0xe0 [rdma_ucm]
   ? _copy_from_user+0x30/0x60
   ucma_accept+0x13e/0x1e0 [rdma_ucm]
   ucma_write+0xb4/0x130 [rdma_ucm]
   vfs_write+0xad/0x1a0
   ksys_write+0x9d/0xb0
   do_syscall_64+0x48/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f009ef60924
  Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 2a ef 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
  RSP: 002b:00007fff843edf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000055743042e1d0 RCX: 00007f009ef60924
  RDX: 0000000000000130 RSI: 00007fff843edf40 RDI: 0000000000000003
  RBP: 00007fff843ee0e0 R08: 0000000000000000 R09: 0000557430433090
  R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007fff843edf40 R14: 000000000000038c R15: 00000000ffffff00
  CR2: 0000000000000000

Fixes: 6a8824a74b ("RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock")
Link: https://lore.kernel.org/r/20200716105519.1424266-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:58:53 -03:00
Leon Romanovsky
0d1fd39bb2 RDMA/core: Fix race in rdma_alloc_commit_uobject()
The FD should not be installed until all of the setup is completed as the
fd_install() transfers ownership of the kref to the FD table. A thread can
race a close() and trigger concurrent rdma_alloc_commit_uobject() and
uverbs_uobject_fd_release() which, at least, triggers a safety WARN_ON:

  WARNING: CPU: 4 PID: 6913 at drivers/infiniband/core/rdma_core.c:768 uverbs_uobject_fd_release+0x202/0x230
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 4 PID: 6913 Comm: syz-executor.3 Not tainted 5.7.0-rc2 #22
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  [..]
  RIP: 0010:uverbs_uobject_fd_release+0x202/0x230
  Code: fe 4c 89 e7 e8 af 23 fe ff e9 2a ff ff ff e8 c5 fa 61 fe be 03 00 00 00 4c 89 e7 e8 68 eb f5 fe e9 13 ff ff ff e8 ae fa 61 fe <0f> 0b eb ac e8 e5 aa 3c fe e8 50 2b 86 fe e9 6a fe ff ff e8 46 2b
  RSP: 0018:ffffc90008117d88 EFLAGS: 00010293
  RAX: ffff88810e146580 RBX: 1ffff92001022fb1 RCX: ffffffff82d5b902
  RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811951b040
  RBP: ffff88811951b000 R08: ffffed10232a3609 R09: ffffed10232a3609
  R10: ffff88811951b043 R11: 0000000000000001 R12: ffff888100a7c600
  R13: ffff888100a7c650 R14: ffffc90008117da8 R15: ffffffff82d5b700
   ? __uverbs_cleanup_ufile+0x270/0x270
   ? uverbs_uobject_fd_release+0x202/0x230
   ? uverbs_uobject_fd_release+0x202/0x230
   ? __uverbs_cleanup_ufile+0x270/0x270
   ? locks_remove_file+0x282/0x3d0
   ? security_file_free+0xaa/0xd0
   __fput+0x2be/0x770
   task_work_run+0x10e/0x1b0
   exit_to_usermode_loop+0x145/0x170
   do_syscall_64+0x2d0/0x390
   ? prepare_exit_to_usermode+0x17a/0x230
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x414da7
  Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 f4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 36 fc ff ff 8b 44 24
  RSP: 002b:00007fff39d379d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
  RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000414da7
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003
  RBP: 00007fff39d37a3c R08: 0000000400000000 R09: 0000000400000000
  R10: 00007fff39d37910 R11: 0000000000000293 R12: 0000000000000001
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003

Reorder so that fd_install() is the last thing done in
rdma_alloc_commit_uobject().

Fixes: aba94548c9 ("IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit")
Link: https://lore.kernel.org/r/20200716102059.1420681-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:58:53 -03:00
Xi Wang
79d5208386 RDMA/hns: Fix wrong PBL offset when VA is not aligned to PAGE_SIZE
ROCE uses "VA % buf_page_size" to caclulate the offset in the PBL's first
page, the actual PA corresponding to the MR's VA is equal to MR's PA plus
this offset. The first PA in PBL has already been aligned to PAGE_SIZE
after calling ib_umem_get(), but the MR's VA may not. If the buf_page_size
is smaller than the PAGE_SIZE, this will lead the HW to access the wrong
memory because the offset is smaller than expected.

Fixes: 9b2cf76c9f ("RDMA/hns: Optimize PBL buffer allocation process")
Link: https://lore.kernel.org/r/1594726935-45666-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:55:01 -03:00
Weihang Li
7b9bd73ed1 RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC
The RoCE Engine will schedule to another QP after one has sent
(2 ^ lp_pktn_ini) packets. lp_pktn_ini is set in QPC and should be
calculated from 2 factors:

1. current MTU as a integer
2. the RoCE Engine's maximum slice length 64KB

But the driver use MTU as a enum ib_mtu and the max inline capability, the
lp_pktn_ini will be much bigger than expected which may cause traffic of
some QPs to never get scheduled.

Fixes: b713128de7 ("RDMA/hns: Adjust lp_pktn_ini dynamically")
Link: https://lore.kernel.org/r/1594726138-49294-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:52:14 -03:00
Maor Gottlieb
c3d6057e07 RDMA/mlx5: Use xa_lock_irq when access to SRQ table
SRQ table is accessed both from interrupt and process context,
therefore we must use xa_lock_irq.

   inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
   kworker/u17:9/8573   takes:
   ffff8883e3503d30 (&xa->xa_lock#13){?...}-{2:2}, at: mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
   {IN-HARDIRQ-W} state was registered at:
     lock_acquire+0xb9/0x3a0
     _raw_spin_lock+0x25/0x30
     srq_event_notifier+0x2b/0xc0 [mlx5_ib]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     forward_event+0x36/0xc0 [mlx5_core]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     mlx5_eq_async_int+0xc5/0x160 [mlx5_core]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     mlx5_irq_int_handler+0x19/0x30 [mlx5_core]
     __handle_irq_event_percpu+0x43/0x2a0
     handle_irq_event_percpu+0x30/0x70
     handle_irq_event+0x34/0x60
     handle_edge_irq+0x7c/0x1b0
     do_IRQ+0x60/0x110
     ret_from_intr+0x0/0x2a
     default_idle+0x34/0x160
     do_idle+0x1ec/0x220
     cpu_startup_entry+0x19/0x20
     start_secondary+0x153/0x1a0
     secondary_startup_64+0xa4/0xb0
   irq event stamp: 20907
   hardirqs last  enabled at (20907):   _raw_spin_unlock_irq+0x24/0x30
   hardirqs last disabled at (20906):   _raw_spin_lock_irq+0xf/0x40
   softirqs last  enabled at (20746):   __do_softirq+0x2c9/0x436
   softirqs last disabled at (20681):   irq_exit+0xb3/0xc0

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&xa->xa_lock#13);
     <Interrupt>
       lock(&xa->xa_lock#13);

    *** DEADLOCK ***

   2 locks held by kworker/u17:9/8573:
    #0: ffff888295218d38 ((wq_completion)mlx5_ib_page_fault){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
    #1: ffff888401647e78 ((work_completion)(&pfault->work)){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0

   stack backtrace:
   CPU: 0 PID: 8573 Comm: kworker/u17:9 Tainted: GO      5.7.0_for_upstream_min_debug_2020_06_14_11_31_46_41 #1
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
   Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
   Call Trace:
    dump_stack+0x71/0x9b
    mark_lock+0x4f2/0x590
    ? print_shortest_lock_dependencies+0x200/0x200
    __lock_acquire+0xa00/0x1eb0
    lock_acquire+0xb9/0x3a0
    ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    _raw_spin_lock+0x25/0x30
    ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    mlx5_ib_eqe_pf_action+0x257/0xa30 [mlx5_ib]
    ? process_one_work+0x209/0x5f0
    process_one_work+0x27b/0x5f0
    ? __schedule+0x280/0x7e0
    worker_thread+0x2d/0x3c0
    ? process_one_work+0x5f0/0x5f0
    kthread+0x111/0x130
    ? kthread_park+0x90/0x90
    ret_from_fork+0x24/0x30

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200712102641.15210-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:28:16 -03:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Mark Zhang
cbeb7d896c RDMA/counter: Allow manually bind QPs with different pids to same counter
In manual mode allow bind user QPs with different pids to same counter,
since this is allowed in auto mode.
Bind kernel QPs and user QPs to the same counter are not allowed.

Fixes: 1bd8e0a9d0 ("RDMA/counter: Allow manual mode configuration support")
Link: https://lore.kernel.org/r/20200702082933.424537-4-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:50:53 -03:00
Mark Zhang
c9f557421e RDMA/counter: Only bind user QPs in auto mode
In auto mode only bind user QPs to a dynamic counter, since this feature
is mainly used for system statistic and diagnostic purpose, while there's
no need to counter kernel QPs so far.

Fixes: 99fa331dc8 ("RDMA/counter: Add "auto" configuration mode support")
Link: https://lore.kernel.org/r/20200702082933.424537-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:50:53 -03:00
Mark Zhang
7c97f3aded RDMA/counter: Add PID category support in auto mode
With the "PID" category QPs have same PID will be bound to same counter;
If this category is not set then QPs have different PIDs will be bound
to same counter.

This is implemented for 2 reasons:
1. The counter is a limited resource, while there may be dozens of
   applications, each of which creates several types of QPs, which means
   it may doesn't have enough counter.
2. The system administrator needs all QPs created by all applications
   with same type bound to one counter.

The counter name and PID is only make sense when "PID" category are
configured.

This category can also be used in combine with others, e.g. QP type.

Link: https://lore.kernel.org/r/20200702082933.424537-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:50:53 -03:00
Gal Pressman
6c72a038bf RDMA/mlx5: Remove unused to_mibmr function
The to_mibmr function is unused, remove it.

Link: https://lore.kernel.org/r/20200705141143.47303-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:40:39 -03:00
Leon Romanovsky
0a03715068 RDMA/mlx5: Set PD pointers for the error flow unwind
ib_pd is accessed internally during destroy of the TIR/TIS, but PD
can be not set yet. This leading to the following kernel panic.

  BUG: kernel NULL pointer dereference, address: 0000000000000074
  PGD 8000000079eaa067 P4D 8000000079eaa067 PUD 7ae81067 PMD 0 Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 709 Comm: syz-executor.0 Not tainted 5.8.0-rc3 #41 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline]
  RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline]
  RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397
  Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48
  RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140
  RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00
  R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000
  R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000
  FS:  00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_create_qp+0xf36/0xf90 drivers/infiniband/hw/mlx5/qp.c:3014
   _ib_create_qp drivers/infiniband/core/core_priv.h:333 [inline]
   create_qp+0x57f/0xd20 drivers/infiniband/core/uverbs_cmd.c:1443
   ib_uverbs_create_qp+0xcf/0x100 drivers/infiniband/core/uverbs_cmd.c:1564
   ib_uverbs_write+0x5fa/0x780 drivers/infiniband/core/uverbs_main.c:664
   __vfs_write+0x3f/0x90 fs/read_write.c:495
   vfs_write+0xc7/0x1f0 fs/read_write.c:559
   ksys_write+0x5e/0x110 fs/read_write.c:612
   do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x466479
  Code: Bad RIP value.
  RSP: 002b:00007ffd057b62b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
  RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
  RBP: 00000000019bf8fc R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
  R13: 0000000000000bf6 R14: 00000000004cb859 R15: 00000000006fefc0

Fixes: 6c41965d64 ("RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path")
Link: https://lore.kernel.org/r/20200707110612.882962-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 20:15:59 -03:00
Aya Levin
530c8632b5 IB/mlx5: Fix 50G per lane indication
Some released FW versions mistakenly don't set the capability that 50G per
lane link-modes are supported for VFs (ptys_extended_ethernet capability
bit).

Use PTYS.ext_eth_proto_capability instead, as this indication is always
accurate. If PTYS.ext_eth_proto_capability is valid
(has a non-zero value) conclude that the HCA supports 50G per lane.

Otherwise, conclude that the HCA doesn't support 50G per lane.

Fixes: 08e8676f16 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Link: https://lore.kernel.org/r/20200707110612.882962-3-leon@kernel.org
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 20:15:58 -03:00
Kamal Heib
04340645f6 RDMA/siw: Fix reporting vendor_part_id
Move the initialization of the vendor_part_id to be before calling
ib_register_device(), this is needed because the query_device() callback
is called from the context of ib_register_device() before initializing the
vendor_part_id, so the reported value is wrong.

Fixes: bdcf26bf9b ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20200707130931.444724-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 09:24:45 -03:00