[ Upstream commit 56c5dab20a6391604df9521f812c01d1e3fe1bd0 ]
Since gcc13, each member of an enum has the same type as the enum [1]. And
that is inherited from its members. Provided these two:
SRP_TAG_NO_REQ = ~0U,
SRP_TAG_TSK_MGMT = 1U << 31
all other members are unsigned ints.
Esp. with SRP_MAX_SGE and SRP_TSK_MGMT_SQ_SIZE and their use in min(),
this results in the following warnings:
include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast
drivers/infiniband/ulp/srp/ib_srp.c:563:42: note: in expansion of macro 'min'
include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast
drivers/infiniband/ulp/srp/ib_srp.c:2369:27: note: in expansion of macro 'min'
So move the large values away to a separate enum, so that they don't
affect other members.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113
Link: https://lore.kernel.org/r/20221212120411.13750-1-jirislaby@kernel.org
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dbc94a0fb81771a38733c0e8f2ea8c4fa6934dc1 ]
There are 2 ways to create IPoIB PKEY child interfaces:
1) Writing a PKEY to /sys/class/net/<ib parent interface>/create_child.
2) Using netlink with iproute.
While with sysfs the child interface has the same number of tx and
rx queues as the parent, with netlink there will always be 1 tx
and 1 rx queue for the child interface. That's because the
get_num_tx/rx_queues() netlink ops are missing and the default value
of 1 is taken for the number of queues (in rtnl_create_link()).
This change adds the get_num_tx/rx_queues() ops which allows for
interfaces with multiple queues to be created over netlink. This
constant only represents the max number of tx and rx queues on that
net device.
Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/f4a42c8aa43c02d5ae5559a60c3e5e0f18c82531.1670485816.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed461b30b22c8fa85c25189c14cb89f29595cd14 ]
In the previous iteration of the while loop, the "ret" may have been
assigned a value of 0, so the error return code -EINVAL may have been
incorrectly set to 0. To fix set valid return code before calling to
goto. Also investigate each case separately as Andy suggessted.
Fixes: e711f968c4 ("IB/srp: replace custom implementation of hex2bin()")
Fixes: 2a174df0c6 ("IB/srp: Use kstrtoull() instead of simple_strtoull()")
Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/r/1669953638-11747-2-git-send-email-wangyufen@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5605148e6ce36bb21020d49010b617693933128 ]
Change the LIO port members inside struct srpt_port from regular members
into pointers. Allocate the LIO port data structures from inside
srpt_make_tport() and free these from inside srpt_make_tport(). Keep
struct srpt_device as long as either an RDMA port or a LIO target port is
associated with it. This patch decouples the lifetime of struct srpt_port
(controlled by the RDMA core) and struct srpt_port_id (controlled by LIO).
This patch fixes the following KASAN complaint:
BUG: KASAN: use-after-free in srpt_enable_tpg+0x31/0x70 [ib_srpt]
Read of size 8 at addr ffff888141cc34b8 by task check/5093
Call Trace:
<TASK>
show_stack+0x4e/0x53
dump_stack_lvl+0x51/0x66
print_address_description.constprop.0.cold+0xea/0x41e
print_report.cold+0x90/0x205
kasan_report+0xb9/0xf0
__asan_load8+0x69/0x90
srpt_enable_tpg+0x31/0x70 [ib_srpt]
target_fabric_tpg_base_enable_store+0xe2/0x140 [target_core_mod]
configfs_write_iter+0x18b/0x210
new_sync_write+0x1f2/0x2f0
vfs_write+0x3e3/0x540
ksys_write+0xbb/0x140
__x64_sys_write+0x42/0x50
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>
Link: https://lore.kernel.org/r/20220727193415.1583860-4-bvanassche@acm.org
Reported-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0e8558476faf02ec51256cad9c487c93c346198c ]
drivers/infiniband/ulp/rtrs/rtrs-clt.c:1786:19: warning: result of comparison of
constant 'MAX_SESS_QUEUE_DEPTH' (65536) with expression of type 'u16'
(aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
To fix it, limit MAX_SESS_QUEUE_DEPTH to u16 max, which is 65535, and
drop the check in rtrs-clt, as it's the type u16 max.
Link: https://lore.kernel.org/r/20210531122835.58329-1-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 891e2639deae721dc43764a44fa255890dc34313 ]
During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new
I/O but every driver except cxgbi and iscsi_tcp can still get I/O from
__iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before
ep_disconnect. This could happen if we were terminating the session, and
the logout timed out before it was even sent to libiscsi.
Fix the issue by adding a helper which reverses the bind_conn call that
allows new I/O to be queued. Drivers implementing ep_disconnect can use this
to make sure new I/O is not queued to them when handling the disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c46fa8911b17e3f808679061a8af8bee219f4602 ]
Error path of rtrs_clt_open() calls free_clt(), where free_permit is
called. This is wrong since error path of rtrs_clt_open() does not need
to call free_permit().
Also, moving free_permits() call to rtrs_clt_close(), makes it more
aligned with the call to alloc_permit() in rtrs_clt_open().
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20220217030929.323849-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8700af2cc18c919b2a83e74e0479038fd113c15d ]
Callback function rtrs_clt_dev_release() for put_device() calls kfree(clt)
to free memory. We shouldn't call kfree(clt) again, and we can't use the
clt after kfree too.
Replace device_register() with device_initialize() and device_add() so that
dev_set_name can() be used appropriately.
Move mutex_destroy() to the release function so it can be called in
the alloc_clt err path.
Fixes: eab098246625 ("RDMA/rtrs-clt: Refactor the failure cases in alloc_clt")
Link: https://lore.kernel.org/r/20220217030929.323849-1-haris.iqbal@ionos.com
Reported-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 109d19a5eb3ddbdb87c43bfd4bcf644f4569da64 ]
Since the Linux iser initiator default max I/O size set to 512KB and since
there is no handshake procedure for this size in iser protocol, set the
default max IO size of the target to 512KB as well.
For changing the default values, there is a module parameter for both
drivers.
Link: https://lore.kernel.org/r/20210524085215.29005-1-mgurtovoy@nvidia.com
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e91eabf66c854f16ca2e954e5c68939bc81601e ]
Currently rtrs when create_qp use a coarse numbers (bigger in general),
which leads to hardware create more resources which only waste memory with
no benefits.
For max_send_wr, we don't really need alway max_qp_wr size when creating
qp, reduce it to cq_size.
For max_recv_wr, cq_size is enough.
With the patch when sess_queue_depth=128, per session (2 paths) memory
consumption reduced from 188 MB to 65MB
When always_invalidate is enabled, we need send more wr, so treat it
special.
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210614090337.29557-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ecd7e290bee0ab9cf75b79a367a4cc113cf8292 ]
sess->stats and sess->stats->pcpu_stats objects are freed
when sysfs entry is removed. If something wrong happens and
session is closed before sysfs entry is created,
sess->stats and sess->stats->pcpu_stats objects are not freed.
This patch adds freeing of them at three places:
1. When client uses wrong address and session creation fails.
2. When client fails to create a sysfs entry.
3. When client adds wrong address via sysfs add_path.
Fixes: 215378b838 ("RDMA/rtrs: client: sysfs interface functions")
Link: https://lore.kernel.org/r/20210528113018.52290-21-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b73b799c25c68a4703cd6c5ac4518006d9865b8 ]
The queue_depth is a module parameter for rtrs_server. It is used on the
client side to determing the queue_depth of the request queue for the RNBD
virtual block device.
During a reconnection event for an already mapped device, in case the
rtrs_server module queue_depth has changed, fail the reconnect attempt.
Also stop further auto reconnection attempts. A manual reconnect via
sysfs has to be triggerred.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-20-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2371c40354509746e4a4dad09a752e027a30f148 ]
When closing a session, currently the rtrs_srv_stats object in the
closing session is freed by kobject release. But if it failed
to create a session by various reasons, it must free the rtrs_srv_stats
object directly because kobject is not created yet.
This problem is found by kmemleak as below:
1. One client machine maps /dev/nullb0 with session name 'bla':
root@test1:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb0" > /sys/devices/virtual/rnbd-client/ctl/map_device
2. Another machine failed to create a session with the same name 'bla':
root@test2:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb1" > /sys/devices/virtual/rnbd-client/ctl/map_device
-bash: echo: write error: Connection reset by peer
3. The kmemleak on server machine reported an error:
unreferenced object 0xffff888033cdc800 (size 128):
comm "kworker/2:1", pid 83, jiffies 4295086585 (age 2508.680s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000a72903b2>] __alloc_sess+0x1d4/0x1250 [rtrs_server]
[<00000000d1e5321e>] rtrs_srv_rdma_cm_handler+0xc31/0xde0 [rtrs_server]
[<00000000bb2f6e7e>] cma_ib_req_handler+0xdc5/0x2b50 [rdma_cm]
[<00000000e896235d>] cm_process_work+0x2d/0x100 [ib_cm]
[<00000000b6866c5f>] cm_req_handler+0x11bc/0x1c40 [ib_cm]
[<000000005f5dd9aa>] cm_work_handler+0xe65/0x3cf2 [ib_cm]
[<00000000610151e7>] process_one_work+0x4bc/0x980
[<00000000541e0f77>] worker_thread+0x78/0x5c0
[<00000000423898ca>] kthread+0x191/0x1e0
[<000000005a24b239>] ret_from_fork+0x3a/0x50
Fixes: 39c2d639ca ("RDMA/rtrs-srv: Set .release function for rtrs srv device during device init")
Link: https://lore.kernel.org/r/20210528113018.52290-18-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64bce1ee978491a779eb31098b21c57d4e431d6a ]
When re-connecting, it resets hb_missed_max to 0.
Before the first re-connecting, client will trigger re-connection
when it gets hb-ack more than 5 times. But after the first
re-connecting, clients will do re-connection whenever it does
not get hb-ack because hb_missed_max is 0.
There is no need to reset hb_missed_max when re-connecting.
hb_missed_max should be kept until closing the session.
Fixes: c0894b3ea6 ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20210528113018.52290-16-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41db63a7efe1c8c2dd282c1849a6ebfbbedbaf67 ]
When get_next_path_min_inflight is called to select the next path, it
iterates over the list of available rtrs_clt_sess (paths). It then reads
the number of inflight IOs for that path to select one which has the least
inflight IO.
But it may so happen that rtrs_clt_sess (path) is no longer in the
connected state because closing or error recovery paths can change the status
of the rtrs_clt_Sess.
For example, the client sent the heart-beat and did not get the
response, it would change the session status and stop IO processing.
The added checking of this patch can prevent accessing the broken path
and generating duplicated error messages.
It is ok if the status is changed after checking the status because
the error recovery path does not free memory and only tries to
reconnection. And also it is ok if the session is closed after checking
the status because closing the session changes the session status and
flush all IO beforing free memory. If the session is being accessed for
IO processing, the closing session will wait.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-13-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a3e74fb9247cd530dca246699d5eb5a691884d32 upstream.
After the commit 5ce2dced8e ("RDMA/ipoib: Set rtnl_link_ops for ipoib
interfaces"), if the IPoIB device is moved to non-initial netns,
destroying that netns lets the device vanish instead of moving it back to
the initial netns, This is happening because default_device_exit() skips
the interfaces due to having rtnl_link_ops set.
Steps to reporoduce:
ip netns add foo
ip link set mlx5_ib0 netns foo
ip netns delete foo
WARNING: CPU: 1 PID: 704 at net/core/dev.c:11435 netdev_exit+0x3f/0x50
Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun d
fuse
CPU: 1 PID: 704 Comm: kworker/u64:3 Tainted: G S W 5.13.0-rc1+ #1
Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016
Workqueue: netns cleanup_net
RIP: 0010:netdev_exit+0x3f/0x50
Code: 48 8b bb 30 01 00 00 e8 ef 81 b1 ff 48 81 fb c0 3a 54 a1 74 13 48
8b 83 90 00 00 00 48 81 c3 90 00 00 00 48 39 d8 75 02 5b c3 <0f> 0b 5b
c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00
RSP: 0018:ffffb297079d7e08 EFLAGS: 00010206
RAX: ffff8eb542c00040 RBX: ffff8eb541333150 RCX: 000000008010000d
RDX: 000000008010000e RSI: 000000008010000d RDI: ffff8eb440042c00
RBP: ffffb297079d7e48 R08: 0000000000000001 R09: ffffffff9fdeac00
R10: ffff8eb5003be000 R11: 0000000000000001 R12: ffffffffa1545620
R13: ffffffffa1545628 R14: 0000000000000000 R15: ffffffffa1543b20
FS: 0000000000000000(0000) GS:ffff8ed37fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005601b5f4c2e8 CR3: 0000001fc8c10002 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
ops_exit_list.isra.9+0x36/0x70
cleanup_net+0x234/0x390
process_one_work+0x1cb/0x360
? process_one_work+0x360/0x360
worker_thread+0x30/0x370
? process_one_work+0x360/0x360
kthread+0x116/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x22/0x30
To avoid the above warning and later on the kernel panic that could happen
on shutdown due to a NULL pointer dereference, make sure to set the
netns_refund flag that was introduced by commit 3a5ca857079e ("can: dev:
Move device back to init netns on owning netns delete") to properly
restore the IPoIB interfaces to the initial netns.
Fixes: 5ce2dced8e ("RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces")
Link: https://lore.kernel.org/r/20210525150134.139342-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7f4a8592ff29f19c5a2ca549d0973821319afaad ]
A session can be removed dynamically by sysfs interface "remove_path" that
eventually calls rtrs_clt_remove_path_from_sysfs function. The current
rtrs_clt_remove_path_from_sysfs first removes the sysfs interfaces and
frees sess->stats object. Second it removes the session from the active
list.
Therefore some functions could access non-connected session and access the
freed sess->stats object even-if they check the session status before
accessing the session.
For instance rtrs_clt_request and get_next_path_min_inflight check the
session status and try to send IO to the session. The session status
could be changed when they are trying to send IO but they could not catch
the change and update the statistics information in sess->stats object,
and generate use-after-free problem.
(see: "RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its
stats")
This patch changes the rtrs_clt_remove_path_from_sysfs to remove the
session from the active session list and then destroy the sysfs
interfaces.
Each function still should check the session status because closing or
error recovery paths can change the status.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210412084002.33582-1-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit adb76a520d068a54ee5ca82e756cf8e5a47363a4 ]
The device is got by isert_device_get() with refcount is 1, and is
assigned to isert_conn by
isert_conn->device = device.
When isert_create_qp() failed, device will be freed with
isert_device_put().
Later, the device is used in isert_free_login_buf(isert_conn) by the
isert_conn->device->ib_device statement.
Free the device in the correct order.
Fixes: ae9ea9ed38 ("iser-target: Split some logic in isert_connect_request to routines")
Link: https://lore.kernel.org/r/20210322161325.7491-1-lyl2019@mail.ustc.edu.cn
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2b5715fc17386a6223490d5b8f08d031999b0c0b upstream.
The current code computes a number of channels per SRP target and spreads
them equally across all online NUMA nodes. Each channel is then assigned
a CPU within this node.
In the case of unbalanced, or even unpopulated nodes, some channels do not
get a CPU associated and thus do not get connected. This causes the SRP
connection to fail.
This patch solves the issue by rewriting channel computation and
allocation:
- Drop channel to node/CPU association as it had no real effect on
locality but added unnecessary complexity.
- Tweak the number of channels allocated to reduce CPU contention when
possible:
- Up to one channel per CPU (instead of up to 4 by node)
- At least 4 channels per node, unless ch_count module parameter is
used.
Link: https://lore.kernel.org/r/9cb4d9d3-30ad-2276-7eff-e85f7ddfb411@suse.com
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ed408529679737a9a7ad816c8de5d59ba104bb11 ]
smatch gives the warning:
drivers/infiniband/ulp/rtrs/rtrs-srv.c:1805 rtrs_rdma_connect() warn: passing zero to 'PTR_ERR'
Which is trying to say smatch has shown that srv is not an error pointer
and thus cannot be passed to PTR_ERR.
The solution is to move the list_add() down after full initilization of
rtrs_srv. To avoid holding the srv_mutex too long, only hold it during the
list operation as suggested by Leon.
Fixes: 03e9b33a0fd6 ("RDMA/rtrs: Only allow addition of path to an already established session")
Link: https://lore.kernel.org/r/20210216143807.65923-1-jinpu.wang@cloud.ionos.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7452a7e96c120d73100387d5f87de9fce7133cb ]
kmemleak reported an error as below:
unreferenced object 0xffff8880674b7640 (size 64):
comm "kworker/4:1H", pid 113, jiffies 4296403507 (age 507.840s)
hex dump (first 32 bytes):
69 70 3a 31 39 32 2e 31 36 38 2e 31 32 32 2e 31 ip:192.168.122.1
31 30 40 69 70 3a 31 39 32 2e 31 36 38 2e 31 32 10@ip:192.168.12
backtrace:
[<0000000054413611>] kstrdup+0x2e/0x60
[<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0
[<00000000ca2be3ee>] kobject_init_and_add+0xb0/0x120
[<0000000062ba5e78>] rtrs_srv_create_sess_files+0x14c/0x314 [rtrs_server]
[<00000000b45b7217>] rtrs_srv_info_req_done+0x5b1/0x800 [rtrs_server]
[<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core]
[<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core]
[<00000000cfc376be>] process_one_work+0x4bc/0x980
[<0000000016e5c96a>] worker_thread+0x78/0x5c0
[<00000000c20b8be0>] kthread+0x191/0x1e0
[<000000006c9c0003>] ret_from_fork+0x3a/0x50
It is caused by the not-freed kobject of rtrs_srv_sess. The kobject
embedded in rtrs_srv_sess has ref-counter 2 after calling
process_info_req(). Therefore it must call kobject_put twice. Currently
it calls kobject_put only once at rtrs_srv_destroy_sess_files because
kobject_del removes the state_in_sysfs flag and then kobject_put in
free_sess() is not called.
This patch moves kobject_del() into free_sess() so that the kobject of
rtrs_srv_sess can be freed. And also this patch adds the missing call of
sysfs_remove_group() to clean-up the sysfs directory.
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-4-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03e9b33a0fd677f554b03352646c13459bf60458 ]
While adding a path from the client side to an already established
session, it was possible to provide the destination IP to a different
server. This is dangerous.
This commit adds an extra member to the rtrs_msg_conn_req structure, named
first_conn; which is supposed to notify if the connection request is the
first for that session or not.
On the server side, if a session does not exist but the first_conn
received inside the rtrs_msg_conn_req structure is 1, the connection
request is failed. This signifies that the connection request is for an
already existing session, and since the server did not find one, it is an
wrong connection request.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-3-jinpu.wang@cloud.ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6daa8f61d8def10f0619fe51b4c794f69598e4f ]
BUG: KASAN: stack-out-of-bounds in _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
Read of size 4 at addr ffff8880d5a7f980 by task kworker/0:1H/565
CPU: 0 PID: 565 Comm: kworker/0:1H Tainted: G O 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Call Trace:
dump_stack+0x96/0xe0
print_address_description.constprop.4+0x1f/0x300
? irq_work_claim+0x2e/0x50
__kasan_report.cold.8+0x78/0x92
? _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
kasan_report+0x10/0x20
_mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
? check_chain_key+0x1d7/0x2e0
? _mlx4_ib_post_recv+0x630/0x630 [mlx4_ib]
? lockdep_hardirqs_on+0x1a8/0x290
? stack_depot_save+0x218/0x56e
? do_profile_hits.isra.6.cold.13+0x1d/0x1d
? check_chain_key+0x1d7/0x2e0
? save_stack+0x4d/0x80
? save_stack+0x19/0x80
? __kasan_slab_free+0x125/0x170
? kfree+0xe7/0x3b0
rdma_write_sg+0x5b0/0x950 [rtrs_server]
The problem is when we send imm_wr, the type should be ib_rdma_wr, so hw
driver like mlx4 can do rdma_wr(wr), so fix it by use the ib_rdma_wr as
type for imm_wr.
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-2-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f47e4e3e71724f625958b0059f6c8ac5d44d27ef ]
Currently rtrs when create_qp use a coarse numbers (bigger in general),
which leads to hardware create more resources which only waste memory
with no benefits.
- SERVICE con,
For max_send_wr/max_recv_wr, it's 2 times SERVICE_CON_QUEUE_DEPTH + 2
- IO con
For max_send_wr/max_recv_wr, it's sess->queue_depth * 3 + 1
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20201217141915.56989-6-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>