android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Dinghao Liu	b9c4b3ca90	scsi: zfcp: Fix a double put in zfcp_port_enqueue() commit b481f644d9174670b385c3a699617052cd2a79d3 upstream. When device_register() fails, zfcp_port_release() will be called after put_device(). As a result, zfcp_ccw_adapter_put() will be called twice: once in zfcp_port_release() and once in the error path after device_register(). So the reference on the adapter object is put twice, which may lead to a premature free. Fix this by adjusting the error label after device_register(). Fixes: `f3450c7b91` ("[SCSI] zfcp: Replace local reference counting with common kref") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20230923103723.10320-1-dinghao.liu@zju.edu.cn Acked-by: Benjamin Block <bblock@linux.ibm.com> Cc: stable@vger.kernel.org # v2.6.33+ Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-10-10 21:53:37 +02:00
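The reference flow described in this commit can be illustrated with a small userspace model. This is a hedged sketch, not zfcp or driver-core code: `struct adapter`, `struct port`, `port_release()` and `port_enqueue()` are hypothetical stand-ins, and a plain `int` counter replaces kref/struct device. It shows why, once put_device() has run the release callback, the error path must not drop the adapter reference a second time.

```c
#include <stdbool.h>

/* Hypothetical model: a plain int stands in for kref/struct device. */
struct adapter { int refcount; };
struct port { struct adapter *adapter; int refcount; };

static void adapter_get(struct adapter *a) { a->refcount++; }
static void adapter_put(struct adapter *a) { a->refcount--; }

/* Release callback, run when the port's last reference goes away.
 * It drops the adapter reference the port was holding. */
static void port_release(struct port *p)
{
	adapter_put(p->adapter);
}

/* Stand-in for put_device(). */
static void port_put(struct port *p)
{
	if (--p->refcount == 0)
		port_release(p);
}

/* Models the enqueue path: `register_fails` simulates device_register()
 * failing, `buggy` reproduces the old error path with its extra put. */
static int port_enqueue(struct port *p, struct adapter *a,
			bool register_fails, bool buggy)
{
	adapter_get(a);          /* the port pins its adapter */
	p->adapter = a;
	p->refcount = 1;         /* initial reference, as after device init */

	if (register_fails) {
		port_put(p);         /* release callback already puts the adapter */
		if (buggy)
			adapter_put(a);  /* second put: the count drops too low */
		return -1;
	}
	return 0;
}
```

Starting from one caller-held adapter reference, the fixed path returns the count to 1 after a failed registration, while the buggy path drops it to 0 even though the caller still holds its reference, which is the premature free the commit describes.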
Steffen Maier	181274d2f3	scsi: zfcp: Defer fc_rport blocking until after ADISC response commit e65851989001c0c9ba9177564b13b38201c0854c upstream. Storage devices are free to send RSCNs, e.g. for internal state changes. If this happens on all connected paths, zfcp risks temporarily losing all paths at the same time. This places strong requirements on the multipath configuration, such as "no_path_retry queue". Avoid such situations by deferring fc_rport blocking until after the ADISC response, once any actual state change of the remote port has become clear. The already existing port recovery triggers explicitly block the fc_rport. The triggers are: on ADISC reject or timeout (typical cable pull case), and on ADISC indicating that the remote port has changed its WWPN or the port is meanwhile no longer open. As a side effect, this also removes a confusing direct function call to another work item function zfcp_scsi_rport_work() instead of scheduling that other work item. It was probably done that way to make the rport block side effect immediate and synchronous to the caller. Fixes: `a2fa0aede0` ("[SCSI] zfcp: Block FC transport rports early on errors") Cc: stable@vger.kernel.org #v2.6.30+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Link: https://lore.kernel.org/r/20230724145156.3920244-1-maier@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:57:51 +02:00
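The deferred decision can be pictured as a pure function evaluated only when the ADISC completes. This is a hedged sketch: `enum adisc_result`, `struct adisc_outcome` and `adisc_needs_port_recovery()` are hypothetical names, and the real driver acts through its ERP and work-item machinery rather than returning a boolean. The four trigger conditions are the ones the commit message lists.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of an ADISC exchange; not zfcp's real types. */
enum adisc_result { ADISC_ACCEPTED, ADISC_REJECTED, ADISC_TIMED_OUT };

struct adisc_outcome {
	enum adisc_result result;
	uint64_t wwpn;      /* WWPN reported in the ADISC accept */
	bool port_open;     /* is the port still open when the response arrives? */
};

/* Decide, once the ADISC response (or its timeout) is known, whether
 * port recovery, and with it fc_rport blocking, is needed. */
static bool adisc_needs_port_recovery(const struct adisc_outcome *o,
				      uint64_t expected_wwpn)
{
	if (o->result != ADISC_ACCEPTED)
		return true;    /* reject or timeout: typical cable pull */
	if (o->wwpn != expected_wwpn)
		return true;    /* remote port changed its WWPN */
	if (!o->port_open)
		return true;    /* port meanwhile no longer open */
	return false;       /* benign RSCN: leave the fc_rport unblocked */
}
```

A benign RSCN on every path then costs one ADISC round trip per path, instead of blocking all rports up front.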
Benjamin Block	d2c7d8f58e	scsi: zfcp: Fix double free of FSF request when qdio send fails commit 0954256e970ecf371b03a6c9af2cf91b9c4085ff upstream. We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache the FSF request ID when sending a new FSF request. This is used in case the sending fails and we need to remove the request from our internal hash table again (so we don't keep an invalid reference and use it when we free the request again). In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32 bit wide), but the rest of the zfcp code (and the firmware specification) handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x ELF ABI]). For one this has the obvious problem that when the ID grows past 32 bit (this can happen reasonably fast) it is truncated to 32 bit when storing it in the cache variable and so doesn't match the original ID anymore. The second less obvious problem is that even when the original ID has not yet grown past 32 bit, as soon as the 32nd bit is set in the original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we cast it back to 'unsigned long'. As the cached variable is of a signed type, the compiler will choose a sign-extending instruction to load the 32 bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the request again all the leading zeros will be flipped to ones to extend the sign and won't match the original ID anymore (this has been observed in practice). If we can't successfully remove the request from the hash table again after 'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify the adapter about new work because the adapter is already gone during e.g. a ChpID toggle) we will end up with a double free. 
We unconditionally free the request in the calling function when 'zfcp_fsf_req_send()' fails, but because the request is still in the hash table we end up with a stale memory reference, and once the zfcp adapter is either reset during recovery or shut down we end up freeing the same memory twice. The resulting stack traces vary depending on the kernel and have no direct correlation to the place where the bug occurs. Here are three examples that have been seen in practice:

list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00) ------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336 00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8 #00000003cbeea1f4: af000000 mc 0,0 >00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78 00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48 00000003cbeea204: b9040043 lgr %r4,%r3 00000003cbeea208: b9040051 lgr %r5,%r1 00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops

or:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8 000003ff7febaf82: e32020000004 lg %r2,0(%r2) #000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8 >000003ff7febaf8e: e3b020100020 cg %r11,16(%r2) 000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82 000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8 000003ff7febaf9e: e31020080004 lg %r1,8(%r2) 000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt

or:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa 0000000076fc0308: b904001b lgr %r1,%r11 #0000000076fc030c: e3106020001a algf %r1,32(%r6) >0000000076fc0312: e31010000082 xg %r1,0(%r1) 0000000076fc0318: b9040001 lgr %r0,%r1 0000000076fc031c: e30061700082 xg %r0,368(%r6) 0000000076fc0322: ec59000100d9 aghik %r5,%r9,1 0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops

To fix this, simply change the type of the cache variable to 'unsigned long', like the rest of zfcp and also the argument for 'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension, so the request can be removed from the hash table successfully. Fixes: `e60a6d69f1` ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe") Cc: <stable@vger.kernel.org> #v2.6.34+ Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-25 17:45:53 +01:00
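Both failure modes of caching the request ID in an `int` can be reproduced in a few lines of C. The sketch assumes an LP64 target (such as s390x or x86_64) and GCC/Clang behavior for out-of-range conversions to `int` (wrap modulo 2^32), which the C standard leaves implementation-defined; `cache_as_int()` and `cache_as_ulong()` are hypothetical helpers, not zfcp functions.

```c
/* Sketch assumes LP64: 64-bit unsigned long, 32-bit int. */
_Static_assert(sizeof(unsigned long) == 8, "sketch assumes LP64");

/* Buggy pattern: cache the 64-bit request ID in a signed int. */
static unsigned long cache_as_int(unsigned long req_id)
{
	int cached = (int)req_id;      /* keeps only the low 32 bits */
	return (unsigned long)cached;  /* sign-extends when widened back */
}

/* Fixed pattern: cache the ID in the same type it is hashed with. */
static unsigned long cache_as_ulong(unsigned long req_id)
{
	unsigned long cached = req_id;
	return cached;
}
```

An ID above 32 bits loses its upper half, and an ID with bit 31 set comes back as 0xffffffff80000000, so a hash-table lookup keyed by the cached value misses the request. The `unsigned long` round trip preserves every ID.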
Steffen Maier	b8aad5eba7	scsi: zfcp: Fix missing auto port scan and thus missing target ports commit 4da8c5f76825269f28d6a89fa752934a4bcb6dfa upstream. Case (1): The only waiter on wka_port->completion_wq is zfcp_fc_wka_port_get() trying to open a WKA port. As such, it should only be woken up by WKA port open responses, not by WKA port close responses. Case (2): A close WKA port response coming in just after having sent a new open WKA port request and before blocking for the open response with wait_event() in zfcp_fc_wka_port_get() erroneously renders the wait_event a NOP because the close handler overwrites wka_port->status. Hence the wait_event condition is erroneously true and it does not enter blocking state. Without this fix, and depending on timing, the following time-space sequence happens with non-negligible probability: user process ERP thread zfcp work queue tasklet system work queue ============ ========== =============== ======= ================= $ echo 1 > online zfcp_ccw_set_online zfcp_ccw_activate zfcp_erp_adapter_reopen msleep scan backoff zfcp_erp_strategy | ... | zfcp_erp_action_cleanup | ... | queue delayed scan_work | queue ns_up_work | ns_up_work: | zfcp_fc_wka_port_get | open wka request | open response | GSPN FC-GS | RSPN FC-GS [NPIV-only] | zfcp_fc_wka_port_put | (--wka->refcount==0) | sched delayed wka->work | ~~~Case (1)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ zfcp_erp_wait flush scan_work | wka->work: | wka->status=CLOSING | close wka request | scan_work: | zfcp_fc_wka_port_get | (wka->status==CLOSING) | wka->status=OPENING | open wka request | wait_event | | close response | | wka->status=OFFLINE | | wake_up /WRONG/ ~~~Case (2)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | wka->work: | wka->status=CLOSING | close wka request zfcp_erp_wait flush scan_work | scan_work: | zfcp_fc_wka_port_get | (wka->status==CLOSING) | wka->status=OPENING | open wka request | close response | wka->status=OFFLINE | wake_up /WRONG&NOP/ | wait_event /NOP/ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | (wka->status!=ONLINE) | return -EIO | return early open response wka->status=ONLINE wake_up /NOP/ So we erroneously end up with no automatic port scan. This is a big problem when it happens during boot. The timing is influenced by v3.19 commit `18f87a67e6` ("zfcp: auto port scan resiliency"). Fix it by fully mutually excluding zfcp_fc_wka_port_get() and zfcp_fc_wka_port_offline(). For that to work, we make the latter block until we get the response for a close WKA port. In order not to penalize the system workqueue, we move wka_port->work to our own adapter workqueue. Note that before v2.6.30 commit `828bc1212a` ("[SCSI] zfcp: Set WKA-port to offline on adapter deactivation"), zfcp did block in zfcp_fc_wka_port_offline() as well, but with a different condition. While at it, make non-functional cleanups to improve the readability of zfcp_fc_wka_port_get(). 
If we cannot send the WKA port open request, don't rely on the subsequent wait_event condition to immediately let this case pass without blocking. We also don't want to rely on the additional condition handling the refcount to be skipped just to finally return with -EIO. Link: https://lore.kernel.org/r/20220729162529.1620730-1-maier@linux.ibm.com Fixes: `5ab944f97e` ("[SCSI] zfcp: attach and release SAN nameserver port on demand") Cc: <stable@vger.kernel.org> #v2.6.28+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-21 15:16:13 +02:00
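The ordering in Case (2) can be replayed deterministically. This is a hedged sketch: `struct wka_port_model`, `replay_case2()` and the handler functions are hypothetical, only the status field is modeled, and the real concurrency (work items, the tasklet, the wait queue) is collapsed into a fixed event order. With the old behavior, the late close response turns the opener's wait into a NOP and the opener returns -EIO; when the offline path first waits for its close response, the opener blocks until the open response and succeeds.

```c
#include <errno.h>

/* Hypothetical single-threaded replay of Case (2). State names loosely
 * follow the wka_port states in the commit; this is not zfcp code. */
enum wka_status { WKA_OFFLINE, WKA_OPENING, WKA_ONLINE, WKA_CLOSING };

struct wka_port_model { enum wka_status status; };

static void open_response(struct wka_port_model *w)  { w->status = WKA_ONLINE; }
static void close_response(struct wka_port_model *w) { w->status = WKA_OFFLINE; }

/* The opener's wait_event() condition: done once we leave OPENING. */
static int open_wait_done(const struct wka_port_model *w)
{
	return w->status != WKA_OPENING;
}

/* Replays the event order of Case (2) and returns what the opener returns.
 * With `fixed`, the offline path is assumed to block until its close
 * response, so that response is processed before the opener runs. */
static int replay_case2(int fixed)
{
	struct wka_port_model w = { WKA_CLOSING };  /* close request in flight */

	if (fixed)
		close_response(&w);        /* offline waited for its response */

	w.status = WKA_OPENING;        /* opener sends the open request */

	if (!fixed)
		close_response(&w);        /* late close response overwrites status */

	if (!open_wait_done(&w))       /* wait_event() only blocks while OPENING */
		open_response(&w);         /* ...and is satisfied by the open response */

	return w.status == WKA_ONLINE ? 0 : -EIO;
}
```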
Christoph Hellwig	1cb3032406	block: remove the request_queue to argument request based tracepoints [ Upstream commit a54895fa057c67700270777f7661d8d3c7fda88a ] The request_queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-21 15:15:36 +02:00
Steffen Maier	f08801252d	scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices commit 8c9db6679be4348b8aae108e11d4be2f83976e30 upstream. Suppose we have an environment with a number of non-NPIV FCP devices (virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical FCP channel (HBA port) and its I_T nexus. Plus a number of storage target ports zoned to such shared channel. Now one target port logs out of the fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port recovery depending on the ADISC result. This happens on all such FCP devices (in different Linux images) concurrently as they all receive a copy of this RSCN. In the following we look at one of those FCP devices. Requests other than FSF_QTCB_FCP_CMND can be slow until they get a response. Depending on which requests are affected by slow responses, there are different recovery outcomes. Here we want to fix failed recoveries on port or adapter level by avoiding recovery requests that can be slow. We need the cached N_Port_ID for the remote port "link" test with ADISC. Just before sending the ADISC, we now intentionally forget the old cached N_Port_ID. The idea is that on receiving an RSCN for a port, we have to assume that any cached information about this port is stale. This forces a fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for the same port. Since we typically can still communicate with the nameserver efficiently, we now reach steady state quicker: Either the nameserver still does not know about the port so we stop recovery, or the nameserver already knows the port potentially with a new N_Port_ID and we can successfully and quickly perform open port recovery. For the one case, where ADISC returns successfully, we re-initialize port->d_id because that case does not involve any port recovery. This also solves a problem if the storage WWPN quickly logs into the fabric again but with a different N_Port_ID. 
This happens, for example, on virtual WWPN takeover during target NPIV failover. [https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the RSCN from the storage FDISC was ignored by zfcp and we could not successfully recover the failover. On some later failback on the storage, we could have been lucky if the virtual WWPN got the same old N_Port_ID from the SAN switch as we still had cached. Then the related RSCN triggered a successful port reopen recovery. However, there is no guarantee of getting the same N_Port_ID on NPIV FDISC. Even though NPIV-enabled FCP devices are not affected by this problem, this code change optimizes recovery time for gone remote ports as a side effect. The timely drop of cached N_Port_IDs prevents unnecessary slow open port attempts. While the problem might have been in code before v2.6.32 commit `799b76d09a` ("[SCSI] zfcp: Decouple gid_pn requests from erp"), this fix depends on the gid_pn_work introduced with that commit, so we mark it as culprit to satisfy fix dependencies. Note: A point-to-point remote port is already handled separately and gets its N_Port_ID from the cached peer_d_id. So resetting port->d_id in general does not affect PtP. Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com Fixes: `799b76d09a` ("[SCSI] zfcp: Decouple gid_pn requests from erp") Cc: <stable@vger.kernel.org> #2.6.32+ Suggested-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-01 17:25:39 +01:00
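The cache handling in this fix boils down to three small steps: hand the old N_Port_ID to the ADISC, forget the cached copy, and restore it only if the ADISC succeeds. This is a hedged model: `struct port_model`, the helper names, and the use of 0 as the "not cached" marker are assumptions for illustration, not zfcp's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 0 is used here to mean "no cached N_Port_ID". */
struct port_model { uint32_t d_id; };

/* On RSCN, right before sending ADISC: keep the old N_Port_ID only for
 * the ADISC itself and forget the cached copy. */
static uint32_t adisc_prepare(struct port_model *port)
{
	uint32_t adisc_d_id = port->d_id;

	port->d_id = 0;
	return adisc_d_id;
}

/* ADISC succeeded: no port recovery follows, so restore the cache. */
static void adisc_success(struct port_model *port, uint32_t adisc_d_id)
{
	port->d_id = adisc_d_id;
}

/* Any later recovery without a cached ID has to ask the nameserver
 * (GID_PN) for the current N_Port_ID. */
static bool recovery_needs_gid_pn(const struct port_model *port)
{
	return port->d_id == 0;
}
```

If the ADISC fails instead, the port keeps no cached ID, so the next recovery picks up whatever N_Port_ID the nameserver currently reports, including a new one after NPIV failover.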
Steffen Maier	e1261c7a84	scsi: zfcp: Report port fc_security as unknown early during remote cable pull commit 8b3bdd99c092bbaeaa7d9eecb1a3e5dc9112002b upstream. On remote cable pull, a zfcp_port keeps its status and only gets ZFCP_STATUS_PORT_LINK_TEST added. Only after an ADISC timeout would we actually start port recovery and remove ZFCP_STATUS_COMMON_UNBLOCKED, which zfcp_sysfs_port_fc_security_show() detected and reported as "unknown" instead of the old and possibly stale zfcp_port->connection_info. Add a check for ZFCP_STATUS_PORT_LINK_TEST for a timely "unknown" report. Link: https://lore.kernel.org/r/20210702160922.2667874-1-maier@linux.ibm.com Fixes: `a17c784600` ("scsi: zfcp: report FC Endpoint Security in sysfs") Cc: <stable@vger.kernel.org> #5.7+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:05:36 +02:00
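The resulting check can be sketched as a small function. The flag values and `port_fc_security_show()` are hypothetical stand-ins for the ZFCP_STATUS_* bits and zfcp_sysfs_port_fc_security_show(), and the `cached_info` string stands in for zfcp_port->connection_info.

```c
/* Hypothetical flag values; the real ZFCP_STATUS_* constants differ. */
#define STATUS_COMMON_UNBLOCKED 0x1u
#define STATUS_PORT_LINK_TEST   0x2u

static const char *port_fc_security_show(unsigned int status,
					 const char *cached_info)
{
	/* Port recovery running (not unblocked), or a link test pending
	 * after a remote cable pull: the cached info may be stale. */
	if (!(status & STATUS_COMMON_UNBLOCKED) ||
	    (status & STATUS_PORT_LINK_TEST))
		return "unknown";
	return cached_info;
}
```

The added link-test condition reports "unknown" as soon as the ADISC is in flight, instead of only after the ADISC timeout has triggered port recovery.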
Linus Torvalds	847d4287a0	Merge tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Remove address space overrides using set_fs() - Convert to generic vDSO - Convert to generic page table dumper - Add ARCH_HAS_DEBUG_WX support - Add leap seconds handling support - Add NVMe firmware-assisted kernel dump support - Extend NVMe boot support with memory clearing control and addition of kernel parameters - AP bus and zcrypt api code rework. Add adapter configure/deconfigure interface. Extend debug features. Add failure injection support - Add ECC secure private keys support - Add KASan support for running protected virtualization host with 4-level paging - Utilize destroy page ultravisor call to speed up secure guests shutdown - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code - Various checksum improvements - Other small various fixes and improvements all over the code * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits) s390/uaccess: fix indentation s390/uaccess: add default cases for __put_user_fn()/__get_user_fn() s390/zcrypt: fix wrong format specifications s390/kprobes: move insn_page to text segment s390/sie: fix typo in SIGP code description s390/lib: fix kernel doc for memcmp() s390/zcrypt: Introduce Failure Injection feature s390/zcrypt: move ap_msg param one level up the call chain s390/ap/zcrypt: revisit ap and zcrypt error handling s390/ap: Support AP card SCLP config and deconfig operations s390/sclp: Add support for SCLP AP adapter config/deconfig s390/ap: add card/queue deconfig state s390/ap: add error response code field for ap queue devices s390/ap: split ap queue state machine state from device state s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG s390/zcrypt: introduce msg tracking in zcrypt functions s390/startup: correct early pgm check info formatting s390: remove orphaned extern variables declarations s390/kasan: make sure int handler always run with DAT on s390/ipl: add support to control memory clearing for nvme re-IPL ...	2020-10-16 12:36:38 -07:00
Julian Wiedmann	d251193d17	scsi: zfcp: Clarify access to erp_action in zfcp_fsf_req_complete() While reviewing commit `936e6b85da` ("scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action"), I stumbled over zfcp_fsf_req_complete() and wondered whether it has similar issues with respect to concurrent modification of req->erp_action by zfcp_erp_strategy_check_fsfreq(). But a closer look shows that both of its callers [zfcp_fsf_reqid_check(), zfcp_fsf_req_dismiss_all()] remove the request from the adapter's req_list under the req_list's lock. Hence we can trust that if zfcp_erp_strategy_check_fsfreq() concurrently looks up the corresponding req_id, it won't find this request and is thus unable to modify it while it's being processed by zfcp_fsf_req_complete(). Add a code comment that hopefully makes this easier for future readers, and condense the two accesses to ->erp_action that made me trip over this code path in the first place. Link: https://lore.kernel.org/r/c500eac301fcbba5af942bbd200f2d6b14e46994.1599765652.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-15 18:01:58 -04:00
Julian Wiedmann	addf137296	scsi: zfcp: Use list_first_entry_or_null() in zfcp_erp_thread() Use the right helper to avoid poking around in the list's internals. Link: https://lore.kernel.org/r/ed669555c73aab95b29444c10066f492c0c43391.1599765652.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-15 18:01:57 -04:00
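list_first_entry_or_null() returns the first entry of a list, or NULL when the list is empty, so a caller tests one pointer instead of comparing head->next against the head itself. The sketch reimplements a tiny subset of <linux/list.h> in userspace for illustration only; the simplified macro omits the kernel version's READ_ONCE(), and `struct erp_action` and `pop_ready()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-ins for <linux/list.h>, for illustration only. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* First entry, or NULL if the list is empty (simplified: no READ_ONCE). */
#define list_first_entry_or_null(head, type, member) \
	(list_empty(head) ? NULL : container_of((head)->next, type, member))

/* Hypothetical ERP action with an embedded list node. */
struct erp_action { int id; struct list_head list; };

/* Hypothetical helper: take the next ready action off the list, if any. */
static struct erp_action *pop_ready(struct list_head *ready)
{
	struct erp_action *act =
		list_first_entry_or_null(ready, struct erp_action, list);

	if (act)
		list_del_init(&act->list);
	return act;
}
```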
Julian Wiedmann	180a4c42e5	s390/qdio: always use dev_name() for device name in QIB Passing a custom name from the device driver is nice, but in practice only zfcp has been using it. So we might as well hard-code a naming scheme in the qdio layer, so that qeth also benefits from it. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-09-14 10:30:07 +02:00
Steffen Maier	2d9a2c5f58	scsi: zfcp: Fix use-after-free in request timeout handlers Before v4.15 commit `75492a5156` ("s390/scsi: Convert timers to use timer_setup()"), we intentionally only passed zfcp_adapter as context argument to zfcp_fsf_request_timeout_handler(). Since we only trigger adapter recovery, it was unnecessary to sync against races between timeout and (late) completion. Likewise, we only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Since we only wake up an ERP action, it was unnecessary to sync against races between timeout and (late) completion. Meanwhile, the timeout handlers get the timer_list as context argument and do a timer-specific container-of to a zfcp_fsf_req, which may already have been freed. Fix it by using del_timer_sync() instead, which makes sure that any request timeout handler that might just have started before del_timer() has completed. This ensures the request free happens afterwards. Space time diagram of potential use-after-free: The basic idea is to have 2 or more pending requests whose timeouts run out at almost the same time. req 1 timeout ERP thread req 2 timeout ---------------- ---------------- --------------------------------------- zfcp_fsf_request_timeout_handler fsf_req = from_timer(fsf_req, t, timer) adapter = fsf_req->adapter zfcp_qdio_siosl(adapter) zfcp_erp_adapter_reopen(adapter,...) zfcp_erp_strategy ... 
zfcp_fsf_req_dismiss_all list_for_each_entry_safe zfcp_fsf_req_complete 1 del_timer 1 zfcp_fsf_req_free 1 zfcp_fsf_req_complete 2 zfcp_fsf_request_timeout_handler del_timer 2 fsf_req = from_timer(fsf_req, t, timer) zfcp_fsf_req_free 2 adapter = fsf_req->adapter ^^^^^^^ already freed Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com Fixes: `75492a5156` ("s390/scsi: Convert timers to use timer_setup()") Cc: <stable@vger.kernel.org> #4.15+ Suggested-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:12:09 -04:00
Linus Torvalds	dfdf16ecfd	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, qla2xxx, tcmu, lpfc, hpsa, zfcp, scsi_debug) and minor bug fixes. We also have a huge docbook fix update like most other subsystems and no major update to the core (the few non trivial updates are either minor fixes or removing an unused feature [scsi_sdb_cache])" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (307 commits) scsi: scsi_transport_srp: Sanitize scsi_target_block/unblock sequences scsi: ufs-mediatek: Apply DELAY_AFTER_LPM quirk to Micron devices scsi: ufs: Introduce device quirk "DELAY_AFTER_LPM" scsi: virtio-scsi: Correctly handle the case where all LUNs are unplugged scsi: scsi_debug: Implement tur_ms_to_ready parameter scsi: scsi_debug: Fix request sense scsi: lpfc: Fix typo in comment for ULP scsi: ufs-mediatek: Prevent LPM operation on undeclared VCC scsi: iscsi: Do not put host in iscsi_set_flashnode_param() scsi: hpsa: Correct ctrl queue depth scsi: target: tcmu: Make TMR notification optional scsi: target: tcmu: Implement tmr_notify callback scsi: target: tcmu: Fix and simplify timeout handling scsi: target: tcmu: Factor out new helper ring_insert_padding scsi: target: tcmu: Do not queue aborted commands scsi: target: tcmu: Use priv pointer in se_cmd scsi: target: Add tmr_notify backend function scsi: target: Modify core_tmr_abort_task() scsi: target: iscsi: Fix inconsistent debug message scsi: target: iscsi: Fix login error when receiving ...	2020-08-06 16:50:07 -07:00
Julian Wiedmann	c3bfffa5ec	scsi: zfcp: Avoid benign overflow of the Request Queue's free-level zfcp_qdio_send() and zfcp_qdio_int_req() run concurrently, adding and completing SBALs on the Request Queue. There's a theoretical race where zfcp_qdio_int_req() completes a number of SBALs & increments the queue's free-level _before_ zfcp_qdio_send() was able to decrement it. This can cause ->req_q_free to momentarily hold a value larger than QDIO_MAX_BUFFERS_PER_Q. Luckily zfcp_qdio_send() is always called under ->req_q_lock, and all readers of the free-level also take this lock. So we can trust that zfcp_qdio_send() will clean up such a temporary overflow before anyone can actually observe it. But it's still confusing and annoying to worry about. So adjust the code to avoid this race. Link: https://lore.kernel.org/r/7f61f59a1f8db270312e64644f9173b8f1ac895f.1593780621.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-08 00:50:56 -04:00
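The ordering problem can be replayed deterministically in a single thread by letting the completion run right after submission. This is a hedged model, not zfcp code: `struct req_q_model`, `send_sbals()`, `complete_sbals()` and `MAX_BUFFERS` are hypothetical, and ->req_q_lock is not modeled. It compares decrementing the free-level after submission with reserving the SBALs before handing them to the device, which is one way to close the window; a real implementation would also need to give the SBALs back if submission fails.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BUFFERS 128   /* stands in for QDIO_MAX_BUFFERS_PER_Q */

struct req_q_model {
	atomic_int free;      /* free SBALs, like ->req_q_free */
	int max_seen;         /* highest free-level ever observed */
};

static void req_q_init(struct req_q_model *q)
{
	atomic_init(&q->free, MAX_BUFFERS);
	q->max_seen = MAX_BUFFERS;
}

/* Completion side: SBALs come back and the free-level rises. */
static void complete_sbals(struct req_q_model *q, int n)
{
	int now = atomic_fetch_add(&q->free, n) + n;

	if (now > q->max_seen)
		q->max_seen = now;
}

/* Send side. The completion is replayed immediately after submission to
 * model the race; `reserve_first` selects decrement-before-submit. */
static void send_sbals(struct req_q_model *q, int n, bool reserve_first)
{
	if (reserve_first)
		atomic_fetch_sub(&q->free, n);  /* account before the device sees them */

	complete_sbals(q, n);               /* device finishes them right away */

	if (!reserve_first)
		atomic_fetch_sub(&q->free, n);  /* old order: accounted too late */
}
```

Both orders end with the same free-level, but only the late decrement lets the counter briefly exceed MAX_BUFFERS, which matches the benign, transient overflow the commit describes.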
Julian Wiedmann	6bcb7c171a	scsi: zfcp: Replace open-coded list move Instead of manually moving each element of the unit and port lists into our temporary on-stack lists, splice them over in one go. Link: https://lore.kernel.org/r/cacb179f49ece50fd4dce119c61252d632cdc1d4.1593780621.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-08 00:50:55 -04:00
Julian Wiedmann	b43cdb5ac8	scsi: zfcp: Clean up zfcp_erp_action_ready() We already maintain a pointer to act->adapter. Use it consistently to avoid any confusion about whose ->erp_ready_head and ->erp_ready_wq we are accessing. Link: https://lore.kernel.org/r/d1bb04322f240dee32f4c4a551bc93bc736f4b01.1593780621.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-08 00:50:55 -04:00
Julian Wiedmann	459ad085d8	scsi: zfcp: Fix an outdated comment for zfcp_qdio_send() zfcp no longer uses the qdio PCI flag, update the comment. Link: https://lore.kernel.org/r/6717c26fc986bff8776d110e27c199b523684c63.1593780621.git.bblock@linux.ibm.com Fixes: `21ddaa53f9` ("[SCSI] zfcp: Remove PCI flag") Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-08 00:50:53 -04:00
George Spelvin	0cd0e57ec8	scsi: zfcp: Use prandom_u32_max() for backoff We don't need crypto-grade random numbers for randomized backoffs. Instead use prandom_u32_max(ep_ro) which generates a pseudo-random number uniformly distributed in the interval [0, ep_ro). Link: https://lore.kernel.org/r/8fc7c4c4069ff1783f4a9ccd84a923f581a09ec5.1593780621.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: George Spelvin <lkml@sdf.org> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-08 00:50:52 -04:00
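The block below is a minimal sketch of drawing a bounded pseudo-random value without a division. To the best of our understanding, it uses the same multiply-shift reduction that `prandom_u32_max()` used at the time. `xorshift32()` is a hypothetical stand-in for `prandom_u32()`, and `rand_u32_max()` is a hypothetical name. The reduction is only approximately uniform when `ep_ro` does not divide 2^32, which is harmless for backoff jitter.

```c
#include <stdint.h>

/* Hypothetical non-cryptographic generator (Marsaglia xorshift32), standing
 * in for prandom_u32(). The state must never be zero. */
static uint32_t xorshift_state = 2463534242u;

static uint32_t xorshift32(void)
{
	uint32_t x = xorshift_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	xorshift_state = x;
	return x;
}

/*
 * Map a 32-bit pseudo-random value into [0, ep_ro) by multiplying in
 * 64 bits and keeping the high 32 bits of the product (no division).
 */
static uint32_t rand_u32_max(uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)xorshift32() * ep_ro) >> 32);
}
```

A randomized backoff would then add `rand_u32_max(spread)` to a base delay, where `spread` is a hypothetical tuning parameter.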
Steffen Maier	936e6b85da	scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action Suppose that, for unrelated reasons, FSF requests on behalf of recovery are very slow and can run into the ERP timeout. In the case at hand, we did adapter recovery to a large degree. However due to the slowness a LUN open is pending so the corresponding fc_rport remains blocked. After fast_io_fail_tmo we trigger close physical port recovery for the port under which the LUN should have been opened. The new higher order port recovery dismisses the pending LUN open ERP action and dismisses the pending LUN open FSF request. Such dismissal decouples the ERP action from the pending corresponding FSF request by setting zfcp_fsf_req->erp_action to NULL (among other things) [zfcp_erp_strategy_check_fsfreq()]. If now the ERP timeout for the pending open LUN request runs out, we must not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a problem since v4.15 commit `75492a5156` ("s390/scsi: Convert timers to use timer_setup()"). Before that we intentionally only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Note: The lifetime of the corresponding zfcp_fsf_req object continues until a (late) response or an (unrelated) adapter recovery. Just like the regular response path ignores dismissed requests [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the ERP timeout handler now needs to ignore dismissed requests. So simply return early in the ERP timeout handler if the FSF request is marked as dismissed in its status flags. To protect against the race where zfcp_erp_strategy_check_fsfreq() dismisses and sets zfcp_fsf_req->erp_action to NULL after our previous status flag check, return early if zfcp_fsf_req->erp_action is NULL. After all, the former ERP action does not need to be woken up as that was already done as part of the dismissal above [zfcp_erp_action_dismiss()]. 
This fixes the following panic due to kernel page fault in IRQ context: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d Oops: 0004 ilc:2 [#1] SMP Modules linked in: ... CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G E X ... Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp]) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000 000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02 0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000 0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8 Krnl Code: 001fffff80549bd4: a7180000 lhi %r1,0 001fffff80549bd8: 4120a0f0 la %r2,240(%r10) #001fffff80549bdc: a53e0003 llilh %r3,3 >001fffff80549be0: ba132000 cs %r1,%r3,0(%r2) 001fffff80549be4: a7740037 brc 7,1fffff80549c52 001fffff80549be8: e320b0180004 lg %r2,24(%r11) 001fffff80549bee: e31020e00004 lg %r1,224(%r2) 001fffff80549bf4: 412020e0 la %r2,224(%r2) Call Trace: [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp] [<00000985915e26f0>] call_timer_fn+0x38/0x190 [<00000985915e2944>] expire_timers+0xfc/0x190 [<00000985915e2ac4>] run_timer_softirq+0xec/0x218 [<0000098591ca7c4c>] __do_softirq+0x144/0x398 [<00000985915110aa>] do_softirq_own_stack+0x72/0x88 [<0000098591551b58>] irq_exit+0xb0/0xb8 [<0000098591510c6a>] do_IRQ+0x82/0xb0 [<0000098591ca7140>] ext_int_handler+0x128/0x12c [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60 ([<000009859172ae4c>] clear_huge_page+0xec/0x250) [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768 [<000009859172a712>] __handle_mm_fault+0x88a/0x900 [<000009859172a860>] handle_mm_fault+0xd8/0x1b0 
[<0000098591529ef6>] do_dat_exception+0x136/0x3e8 [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220 Last Breaking-Event-Address: [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com Fixes: `75492a5156` ("s390/scsi: Convert timers to use timer_setup()") Cc: <stable@vger.kernel.org> #4.15+ Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-06-24 00:01:09 -04:00
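The two early returns described in this commit can be sketched as a userspace model. `struct fsf_req`, `REQ_STATUS_DISMISSED`, `erp_timeout_handler()` and the `timed_out` field are hypothetical simplifications of the FSF request, its dismissed status flag, and the ERP timeout handler. The handler reads the back pointer once into a local variable, so the NULL check and the later use see the same value. A concurrent kernel implementation also needs suitable memory ordering between the flag test and the pointer read, which this single-threaded model leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bit marking a dismissed request. */
#define REQ_STATUS_DISMISSED 0x1u

struct erp_action {
	int timed_out; /* set when the timeout handler notifies the action */
};

struct fsf_req {
	uint32_t status;
	struct erp_action *erp_action; /* set to NULL on dismissal */
};

/* Timer callback: ignore dismissed requests and NULL back pointers. */
void erp_timeout_handler(struct fsf_req *req)
{
	struct erp_action *act;

	if (req->status & REQ_STATUS_DISMISSED)
		return;
	act = req->erp_action; /* read once, then check and use */
	if (!act)
		return;
	act->timed_out = 1; /* stands in for zfcp_erp_notify() */
}
```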
Benjamin Block	d0dff2ac98	scsi: zfcp: Move allocation of the shost object to after xconf- and xport-data At the moment we allocate and register the Scsi_Host object corresponding to a zfcp adapter (FCP device) very early in the life cycle of the adapter - even before we fully discover and initialize the underlying firmware/hardware. This had the advantage that we could already use the Scsi_Host object, and fill in all its information during said discovery and initialization. Due to commit `737eb78e82` ("block: Delay default elevator initialization") (first released in v5.4), we noticed a regression that would prevent us from using any storage volume if zfcp is configured with support for DIF or DIX (zfcp.dif=1 || zfcp.dix=1). Doing so would result in an illegal memory access as soon as the first request is sent with such a configuration. As an example of a crash resulting from this: scsi host0: scsi_eh_0: sleeping scsi host0: zfcp qdio: 0.0.1900 ZFCP on SC 4bd using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36 Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:0000000035c7c007 R3:00000001effcc007 S:00000001effd1000 P:000000000000003d Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: ... CPU: 1 PID: 783 Comm: kworker/u760:5 Kdump: loaded Not tainted 5.6.0-rc2-bb-next+ #1 Hardware name: ...
Workqueue: scsi_wq_0 fc_scsi_scan_rport [scsi_transport_fc] Krnl PSW : 0704e00180000000 000003ff801fcdae (scsi_queue_rq+0x436/0x740 [scsi_mod]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0fffffffffffffff 0000000000000000 0000000187150120 0000000000000000 000003ff80223d20 000000000000018e 000000018adc6400 0000000187711000 000003e0062337e8 00000001ae719000 0000000187711000 0000000187150000 00000001ab808100 0000000187150120 000003ff801fcd74 000003e0062336a0 Krnl Code: 000003ff801fcd9e: e310a35c0012 lt %r1,860(%r10) 000003ff801fcda4: a7840010 brc 8,000003ff801fcdc4 #000003ff801fcda8: e310b2900004 lg %r1,656(%r11) >000003ff801fcdae: d71710001000 xc 0(24,%r1),0(%r1) 000003ff801fcdb4: e310b2900004 lg %r1,656(%r11) 000003ff801fcdba: 41201018 la %r2,24(%r1) 000003ff801fcdbe: e32010000024 stg %r2,0(%r1) 000003ff801fcdc4: b904002b lgr %r2,%r11 Call Trace: [<000003ff801fcdae>] scsi_queue_rq+0x436/0x740 [scsi_mod] ([<000003ff801fcd74>] scsi_queue_rq+0x3fc/0x740 [scsi_mod]) [<00000000349c9970>] blk_mq_dispatch_rq_list+0x390/0x680 [<00000000349d1596>] blk_mq_sched_dispatch_requests+0x196/0x1a8 [<00000000349c7a04>] __blk_mq_run_hw_queue+0x144/0x160 [<00000000349c7ab6>] __blk_mq_delay_run_hw_queue+0x96/0x228 [<00000000349c7d5a>] blk_mq_run_hw_queue+0xd2/0xe0 [<00000000349d194a>] blk_mq_sched_insert_request+0x192/0x1d8 [<00000000349c17b8>] blk_execute_rq_nowait+0x80/0x90 [<00000000349c1856>] blk_execute_rq+0x6e/0xb0 [<000003ff801f8ac2>] __scsi_execute+0xe2/0x1f0 [scsi_mod] [<000003ff801fef98>] scsi_probe_and_add_lun+0x358/0x840 [scsi_mod] [<000003ff8020001c>] __scsi_scan_target+0xc4/0x228 [scsi_mod] [<000003ff80200254>] scsi_scan_target+0xd4/0x100 [scsi_mod] [<000003ff802d8b96>] fc_scsi_scan_rport+0x96/0xc0 [scsi_transport_fc] [<0000000034245ce8>] process_one_work+0x458/0x7d0 [<00000000342462a2>] worker_thread+0x242/0x448 [<0000000034250994>] kthread+0x15c/0x170 [<0000000034e1979c>] ret_from_fork+0x30/0x38 INFO: lockdep is turned off. 
Last Breaking-Event-Address: [<000003ff801fbc36>] scsi_add_cmd_to_list+0x9e/0xa8 [scsi_mod] Kernel panic - not syncing: Fatal exception: panic_on_oops While this issue is exposed by the commit named above, this is only by accident. The real issue has existed for longer already - basically since it's possible to use blk-mq via scsi-mq, and blk-mq pre-allocates all requests for a tag-set during initialization of the same. For a given Scsi_Host object this is done when adding the object to the midlayer (`scsi_add_host()` and such). In `scsi_mq_setup_tags()` the midlayer calculates how much memory is required for a single scsi_cmnd, and its additional data, which also might include space for additional protection data - depending on whether the Scsi_Host has any form of protection capabilities (`scsi_host_get_prot()`). The problem is this: because zfcp does this step before we actually know whether the firmware/hardware has these capabilities, we don't set any protection capabilities in the Scsi_Host object. And so, no space is allocated for additional protection data for requests in the Scsi_Host tag-set. Once we fully discover and initialize the FCP device firmware/hardware (this is done via the firmware commands "Exchange Config Data" and "Exchange Port Data") we find out whether it actually supports DIF and DIX, and we set the corresponding capabilities in the Scsi_Host object (in `zfcp_scsi_set_prot()`). Now the Scsi_Host potentially has protection capabilities, but the already allocated requests in the tag-set don't have any space allocated for that. When we then trigger target scanning or add scsi_devices manually, the midlayer will use requests from that tag-set, and before sending most requests, it will also call `scsi_mq_prep_fn()`.
To prepare the scsi_cmnd this function will check again whether the used Scsi_Host has any protection capabilities - and now it potentially has - and if so, it will try to initialize the structures assumed to be preallocated and thus causes the crash shown above. Before delaying the default elevator initialization with the commit named above, we would always also allocate an elevator for any scsi_device before ever sending any requests - in contrast to now, where we do it after device-probing. That elevator in turn would have its own tag-set, and that is initialized after we went through discovery and initialization of the underlying firmware/hardware. So requests from that tag-set can be allocated properly, and if used - unless the user changes/disables the default elevator - this would hide the underlying issue. To fix this for any configuration - with or without an elevator - we move the allocation and registration of the Scsi_Host object for a given FCP device to after the first complete discovery and initialization of the underlying firmware/hardware. By doing that we can make all basic properties of the Scsi_Host known to the midlayer by the time we call `scsi_add_host()`, including whether we have any protection capabilities. To do that we have to delay all the accesses that we would have done in the past during discovery and initialization, and do them instead once we are finished with it. The previous patches ramp up to this by fencing and factoring out all these accesses, making it possible to re-do them later on.
In addition, we also make use of the diagnostic buffers we recently added with commit `92953c6e0a` ("scsi: zfcp: signal incomplete or error for sync exchange config/port data") commit `7e418833e6` ("scsi: zfcp: diagnostics buffer caching and use for exchange port data") commit `088210233e` ("scsi: zfcp: add diagnostics buffer for exchange config data") (first released in v5.5), because these already cache all the information we need for that "re-do operation" - the cached information is always updated during xconf or xport data, so it won't be stale. In addition to the move and re-do, this patch also updates the function-documentation of `zfcp_scsi_adapter_register()` and changes how it reports if a Scsi_Host object already exists. In that case future recovery-operations can skip this step completely and behave much like they would do in the past - zfcp does not release a once allocated Scsi_Host object unless the corresponding FCP device is deconstructed completely. Link: https://lore.kernel.org/r/030dd6da318bbb529f0b5268ec65cebcd20fc0a3.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:52 -04:00
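The ordering problem described in this commit reduces to a few lines. In this hypothetical model, `setup_tags()` plays the role of `scsi_mq_setup_tags()`, which fixes the per-request size at `scsi_add_host()` time. `prep_cmd_ok()` plays the role of the preparation step that re-checks the protection capabilities. The sizes and field names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes for a command and its optional protection data. */
#define CMD_BASE_SIZE 256u
#define CMD_PROT_SIZE 64u

struct host {
	unsigned int prot_caps; /* DIF/DIX capabilities, 0 if none */
	size_t cmd_size;        /* per-request size fixed when tags are set up */
};

/* Like registration: size all preallocated requests from current caps. */
static void setup_tags(struct host *h)
{
	h->cmd_size = CMD_BASE_SIZE + (h->prot_caps ? CMD_PROT_SIZE : 0);
}

/*
 * Like per-command preparation: if the host now claims protection
 * capabilities, the preallocated request must have room for that data.
 * Returns false where the real code would touch memory it never got.
 */
static bool prep_cmd_ok(const struct host *h)
{
	size_t needed = CMD_BASE_SIZE + (h->prot_caps ? CMD_PROT_SIZE : 0);

	return h->cmd_size >= needed;
}
```

In the old order (register, then discover), `prep_cmd_ok()` turns false as soon as capabilities appear. In the new order (discover, then register), the requests are sized correctly from the start.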
Benjamin Block	71159b6ecb	scsi: zfcp: Fence early sysfs interfaces for accesses of shost objects When setting an adapter online for the first time, we also create a couple of entries for it in the sysfs device tree. This is also true even if the adapter has never yet successfully gone through exchange config and exchange port data. When moving the scsi host object allocation and registration to after the first exchange config and exchange port data, this makes the `port_rescan` attribute susceptible to invalid pointer-dereferences of the shost field before the adapter is fully initialized. When written to, it schedules a `scan_work` item that will in turn make use of the associated fibre channel host object to check the topology used for this FCP device. Because scanning for remote ports can't be done successfully without completing exchange config and exchange port data first, we can simply fence `port_rescan`, and so prevent the illegal access. As with cases where we can't get a reference to the adapter, we also return -ENODEV here. Applications need to handle that errno today already. After a successful allocation of the scsi host object nothing changes in the work flow. Link: https://lore.kernel.org/r/ef65366d309993ca91b6917727590ca7ca166c8f.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:50 -04:00
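The block below is a minimal sketch of the `port_rescan` fence. It assumes a simplified adapter whose `shost` pointer stays NULL until the first exchange config and exchange port data complete. `port_rescan_store()`, `struct scsi_host_model` and the `scans_scheduled` counter are hypothetical, and a real sysfs store callback has a different signature.

```c
#include <errno.h>
#include <stddef.h>

struct scsi_host_model {
	int scans_scheduled; /* stands in for queued scan_work items */
};

struct adapter {
	struct scsi_host_model *shost; /* NULL until first xconf/xport done */
};

/*
 * Hypothetical store handler: refuse with -ENODEV while the host object
 * does not exist yet, the same errno used when no adapter is available.
 */
long port_rescan_store(struct adapter *adapter, size_t count)
{
	if (!adapter || !adapter->shost)
		return -ENODEV;
	adapter->shost->scans_scheduled++;
	return (long)count;
}
```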
Benjamin Block	971f2abb4c	scsi: zfcp: Fence adapter status propagation for common statuses Common status flags that all main objects - adapter, port, and unit - support are propagated to sub-objects when set or cleared. For instance, when setting the status ZFCP_STATUS_COMMON_ERP_INUSE for an adapter object, we will propagate this to all its child ports and units - same for when clearing a common status flag. Units of an adapter object are enumerated via __shost_for_each_device() over the scsi host object of the corresponding adapter. Once we move the scsi host object allocation and registration to after the first exchange config and exchange port data, this won't be possible for cases where we set or clear common statuses during the very first adapter recovery. But since we won't have any port or unit objects yet at that point of time, we can just fence the status propagation for cases where the scsi host object is not yet set in the adapter object. It won't change any effective status propagations, but will prevent us from dereferencing invalid pointers. For any later point in the work flow the scsi host object will be set and thus nothing is changed then. Link: https://lore.kernel.org/r/f51fe5f236a1e3d1ce53379c308777561bfe35e1.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:50 -04:00
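The block below sketches the propagation fence under a simplifying assumption: units are reachable only through the host object. The real driver also propagates to ports through its own list. All names are hypothetical. The adapter's own status is always updated, and only the walk over child objects is skipped while the host object is missing.

```c
#include <stddef.h>

#define MAX_UNITS 4

struct unit {
	unsigned int status;
};

struct shost_model {
	struct unit *units[MAX_UNITS];
	size_t nr_units;
};

struct adapter {
	unsigned int status;
	struct shost_model *shost; /* NULL during the very first recovery */
};

/*
 * Set a common status bit on the adapter and propagate it to its units.
 * No units can exist before the host object does, so skipping the walk
 * in that case changes no effective status.
 */
void adapter_set_common_status(struct adapter *a, unsigned int mask)
{
	a->status |= mask;
	if (!a->shost)
		return;
	for (size_t i = 0; i < a->shost->nr_units; i++)
		a->shost->units[i]->status |= mask;
}
```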
Benjamin Block	ac007adc4d	scsi: zfcp: Move p-t-p port allocation to after xport data When doing the very first adapter recovery - initialization - for a FCP device in a point-to-point topology we also allocate the port object corresponding to the attached remote port, and trigger a port recovery for it that will run after the adapter recovery finished. Right now this happens right after we finished with the exchange config data command, and uses the fibre channel host object corresponding to the FCP device to determine whether a point-to-point topology is used. When moving the scsi host object allocation and registration - and thus also the fibre channel host object allocation - to after the first exchange config and exchange port data, this use of the fc_host object is not possible anymore at that point in the work flow. But the allocation and recovery trigger doesn't have notable side-effects on the following exchange port data processing, so we can move those to after xport data, and thus also to after the scsi host object allocation, once we move it. Then the fc_host object can be used again, like it is now. For any further adapter recoveries this doesn't change anything, because at that point the port object already exists and recovery is triggered elsewhere for existing port objects. Link: https://lore.kernel.org/r/73e5d4ac21e2b37bf0c3ca8e530bc5a5c6e74f8f.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:49 -04:00
Benjamin Block	990486f3a8	scsi: zfcp: Fence fc_host updates during link-down handling When receiving a notification that a FCP device lost its local link we usually update the fibre channel host object which represents that FCP device to reflect that. This notification/information can also surface when the FCP device is running through adapter recovery (exchange config and exchange port data return incomplete). When moving the scsi host object allocation and registration - and thus also the fibre channel host object allocation - to after the first exchange config and exchange port data, and this happens during the very first adapter recovery, these updates can not be done until after the scsi host object is allocated. Reorder the fc_host updates in zfcp_fsf_fc_host_link_down() so that they only happen after a check of whether the scsi host object is already allocated or not. During the first adapter recovery this will cause the skip of these updates if a link-down condition is detected, but we can repeat them after we allocated the scsi host object, if necessary. For any further link-down handling the only changes in the work flow are the slightly reordered assignments in zfcp_fsf_fc_host_link_down(). Link: https://lore.kernel.org/r/f841f2cda61dcd7b8549910c44e1831927459edf.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:48 -04:00
Benjamin Block	52e61fde5e	scsi: zfcp: Move fc_host updates during xport data handling into fenced function When executing exchange port data for a FCP device for the first time, or after an adapter recovery, we update several properties of the fibre channel host object which represents that FCP device. When moving the scsi host object allocation and registration - and thus also the fibre channel host object allocation - to after the first exchange config and exchange port data, this is not possible for the former case. Move all these updates into a separate, fenced function that first checks whether the scsi host object already exists or not, before making the updates. During the first ever exchange port data in the adapter life cycle this will make the exchange port data handler skip over this update step, but we can repeat it later, after we allocated the scsi host object. For any further recovery of that adapter the work flow is only changed slightly because then the scsi host object already exists and we don't free it until we release the adapter completely at the end of its life cycle. Link: https://lore.kernel.org/r/ae454c2dc6da0b02907c489af91d0b211d331825.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:48 -04:00
Benjamin Block	bd1684817d	scsi: zfcp: Move shost updates during xconfig data handling into fenced function When executing exchange config data for a FCP device for the first time, or after an adapter recovery, we update several properties of the scsi host or fibre channel host object that represent that FCP device. When moving the scsi host object allocation and registration - and thus also the fibre channel host object allocation - to after the first exchange config and exchange port data, this is not possible for the former case. Move all these updates into a separate, fenced function that first checks whether the scsi host object already exists or not, before making the updates. During the first ever exchange config data in the adapter life cycle this will make the exchange config data handler skip over this update step, but we can repeat it later, after we allocated the scsi host object. For any further recovery of that adapter the work flow is only changed slightly because then the scsi host object already exists and we don't free it until we release the adapter completely at the end of its life cycle. Link: https://lore.kernel.org/r/5fc3f4d38d4334f7aa595497c6f7865fb1102e0f.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:47 -04:00
Benjamin Block	978857c7e3	scsi: zfcp: Move shost modification after QDIO (re-)open into fenced function When establishing and activating the QDIO queue pair for a FCP device for the first time, or after an adapter recovery, we publish some of its characteristics to the scsi host object representing that FCP device. When moving the scsi host object allocation and registration to after the first exchange config and exchange port data, this is not possible for the former case - QDIO open for the first time - because that happens before exchange config and exchange port data. Move the scsi host object update into a fenced function that checks whether the object already exists or not. This way we can repeat that step later, once we are past the allocation. Once the first recovery succeeds we don't release the scsi host object anymore, so further recoveries do work as before. Link: https://lore.kernel.org/r/a214ebf508f71e3690113e3e90edab1cea0e24e3.1588956679.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-11 23:19:46 -04:00
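The commits that move host updates into fenced functions share one pattern. The update function does nothing while the host object is missing, and it can simply be called again once the object exists. This is a minimal sketch with hypothetical names. `max_sectors` stands in for whatever property is being published, and `cached_max_sectors` for a value kept from discovery.

```c
#include <stdbool.h>
#include <stddef.h>

struct shost_model {
	unsigned int max_sectors;
};

struct adapter {
	struct shost_model *shost;       /* allocated after first xconf/xport */
	unsigned int cached_max_sectors; /* value cached during discovery */
};

/*
 * Fenced update: copy cached properties into the host object only if it
 * exists. Returns whether the update was applied, so a caller can tell
 * that it still has to be repeated after allocation.
 */
bool adapter_update_shost(struct adapter *a)
{
	if (!a->shost)
		return false;
	a->shost->max_sectors = a->cached_max_sectors;
	return true;
}
```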
Linus Torvalds	93f3321f65	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull more SCSI updates from James Bottomley: "This is a batch of changes that didn't make it in the initial pull request because the lpfc series had to be rebased to redo an incorrect split. It's basically driver updates to lpfc, target, bnx2fc and ufs with the rest being minor updates except the sr_block_release one which fixes a use after free introduced by the removal of the global mutex in the first patch set" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (35 commits) scsi: core: Add DID_ALLOC_FAILURE and DID_MEDIUM_ERROR to hostbyte_table scsi: ufs: Use ufshcd_config_pwr_mode() when scaling gear scsi: bnx2fc: fix boolreturn.cocci warnings scsi: zfcp: use fallthrough; scsi: aacraid: do not overwrite retval in aac_reset_adapter() scsi: sr: Fix sr_block_release() scsi: aic7xxx: Remove more FreeBSD-specific code scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug scsi: ufs: set device as active power mode after resetting device scsi: iscsi: Report unbind session event when the target has been removed scsi: lpfc: Change default SCSI LUN QD to 64 scsi: libfc: rport state move to PLOGI if all PRLI retry exhausted scsi: libfc: If PRLI rejected, move rport to PLOGI state scsi: bnx2fc: Update the driver version to 2.12.13 scsi: bnx2fc: Fix SCSI command completion after cleanup is posted scsi: bnx2fc: Process the RQE with CQE in interrupt context scsi: target: use the stack for XCOPY passthrough cmds scsi: target: increase XCOPY I/O size scsi: target: avoid per-loop XCOPY buffer allocations scsi: target: drop xcopy DISK BLOCK LENGTH debug ...	2020-04-10 12:21:11 -07:00
Julian Wiedmann	1da1092dbf	s390/qdio: remove cdev from init_data It's no longer needed. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-04-06 13:13:50 +02:00
Julian Wiedmann	d8564e19da	s390/qdio: allow for non-contiguous SBAL array in init_data Upper-layer drivers allocate their SBALs by calling qdio_alloc_buffers() for each individual queue. But when later passing the SBAL addresses to qdio_establish(), they need to be in a single array of pointers. So if the driver uses multiple Input or Output queues, it needs to allocate a temporary array just to present all its SBAL pointers in this layout. This patch slightly changes the format of the QDIO initialization data, so that drivers can pass a per-queue array where each element points to a queue's SBAL array. zfcp doesn't use multiple queues, so the impact there is trivial. For qeth this brings a nice reduction in complexity, and removes a page-sized allocation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-04-06 13:13:50 +02:00
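The layout change can be illustrated with a pointer-to-pointer table. There is one entry per queue, and each entry points at that queue's own SBAL pointer array, so the per-queue arrays need not be contiguous. `struct init_data_model`, `struct sbal`, `BUFFERS_PER_Q` and `lookup_sbal()` are hypothetical and do not reproduce the actual QDIO init_data definition.

```c
#include <stddef.h>

#define BUFFERS_PER_Q 4 /* hypothetical; kept small for illustration */

struct sbal {
	int id;
};

/*
 * One pointer per queue, each pointing at that queue's own array of SBAL
 * pointers: sbal_addrs[queue][buffer].
 */
struct init_data_model {
	size_t nr_queues;
	struct sbal ***sbal_addrs;
};

/* Look up buffer 'b' of queue 'q', or NULL if out of range. */
struct sbal *lookup_sbal(const struct init_data_model *d, size_t q, size_t b)
{
	if (q >= d->nr_queues || b >= BUFFERS_PER_Q)
		return NULL;
	return d->sbal_addrs[q][b];
}
```

With a single flat array, a driver with several queues would have to copy every queue's pointers into one combined array indexed as `q * BUFFERS_PER_Q + b`. Here, each queue's existing array is referenced directly.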
Julian Wiedmann	ad96401cdb	zfcp: inline zfcp_qdio_setup_init_data() In preparation for a subsequent patch, move the setup of init_data into the only caller. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-04-06 13:13:50 +02:00
Julian Wiedmann	3db1db93e3	s390/qdio: cleanly split alloc and establish All that qdio_allocate() actually uses from the init_data is the cdev, and the number of Input and Output Queues. Have the driver pass those as parameters, and defer the init_data processing into qdio_establish(). This includes writing per-device(!) trace entries, and most of the sanity checks. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-04-06 13:13:50 +02:00
Linus Torvalds	79f51b7b9c	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series has a huge amount of churn because it pulls in Mauro's doc update changing all our txt files to rst ones. Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc, zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and some other minor updates. The major core change is Hannes moving functions out of the aacraid driver and into the core" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits) scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code scsi: ufs: Do not rely on prefetched data scsi: dc395x: remove dc395x_bios_param scsi: libiscsi: Fix error count for active session scsi: hpsa: correct race condition in offload enabled scsi: message: fusion: Replace zero-length array with flexible-array member scsi: qedi: Add PCI shutdown handler support scsi: qedi: Add MFW error recovery process scsi: ufs: Enable block layer runtime PM for well-known logical units scsi: ufs-qcom: Override devfreq parameters scsi: ufshcd: Let vendor override devfreq parameters scsi: ufshcd: Update the set frequency to devfreq scsi: ufs: Resume ufs host before accessing ufs device scsi: ufs-mediatek: customize the delay for enabling host scsi: ufs: make HCE polling more compact to improve initialization latency scsi: ufs: allow custom delay prior to host enabling scsi: ufs-mediatek: use common delay function scsi: ufs: introduce common and flexible delay function scsi: ufs: use an enum for host capabilities scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc() ...	2020-04-02 17:03:53 -07:00
Joe Perches	cec9cbac52	scsi: zfcp: use fallthrough; Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> [bblock@linux.ibm.com: resolved merge conflict with recently upstream-sent patch "zfcp: expose fabric name as common fc_host sysfs attribute"] Link: https://lore.kernel.org/r/d14669a67a17392490d3184117941123765db1a4.1585663010.git.bblock@linux.ibm.com Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-31 22:24:02 -04:00
Jens Remus	42cabdaf10	scsi: zfcp: log FC Endpoint Security errors Log any FC Endpoint Security errors to the kernel ring buffer with rate-limiting. Link: https://lore.kernel.org/r/20200312174505.51294-11-maier@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:42 -04:00
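The commit does not spell out the limiting mechanism. As a generic illustration only, the block below sketches an interval/burst limiter, similar in spirit to the kernel's `printk_ratelimited()` family. `struct ratelimit` and `ratelimit_allow()` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```c
#include <stdbool.h>

/* Allow at most 'burst' messages per window of 'interval' time units. */
struct ratelimit {
	long interval;
	int burst;
	long begin;  /* start of the current window */
	int printed; /* messages allowed in the current window */
};

bool ratelimit_allow(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a new one. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false; /* suppressed */
}
```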
Jens Remus	e53d92856e	scsi: zfcp: enhance handling of FC Endpoint Security errors Enable for explicit FCP channel FC Endpoint Security error reporting and handle any FSF security errors according to specification. Take the following recovery actions when a FSF_SECURITY_ERROR is reported for the specified FSF commands: - Open Port: Retry the command if possible - Send FCP : Physically close the remote port and reopen For Open Port the command status is set to error, which triggers a retry. For Send FCP the command status is set to error and recovery is triggered to physically reopen the remote port. Link: https://lore.kernel.org/r/20200312174505.51294-10-maier@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:41 -04:00
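The two recovery actions can be summarized as a small decision function. In both cases the request status is set to error, which for Open Port leads to a retry. Only Send FCP additionally requests a physical close and reopen of the remote port. The enum, struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two FSF commands covered above. */
enum fsf_cmd { CMD_OPEN_PORT, CMD_SEND_FCP };

struct sec_err_handling {
	bool mark_error;       /* request status set to error */
	bool reopen_port_phys; /* physically close and reopen the remote port */
};

/* Handling of an FSF security error, per the command that reported it. */
struct sec_err_handling handle_security_error(enum fsf_cmd cmd)
{
	struct sec_err_handling h = { .mark_error = true };

	if (cmd == CMD_SEND_FCP)
		h.reopen_port_phys = true;
	return h;
}
```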
Jens Remus	616da39e00	scsi: zfcp: trace FC Endpoint Security of FCP devices and connections Trace changes in Fibre Channel Endpoint Security capabilities of FCP devices as well as changes in Fibre Channel Endpoint Security state of their connections to FC remote ports as FC Endpoint Security changes with trace level 3 in HBA DBF. A change in FC Endpoint Security capabilities of FCP devices is traced in response to FSF command FSF_QTCB_EXCHANGE_PORT_DATA with a trace tag of "fsfcesa" and a WWPN of ZFCP_DBF_INVALID_WWPN = 0x0000000000000000 (see FC-FS-4 §18 "Name_Identifier Formats", NAA field). A change in FC Endpoint Security state of connections between FCP devices and FC remote ports is traced in response to FSF command FSF_QTCB_OPEN_PORT_WITH_DID with a trace tag of "fsfcesp". Example trace record of FC Endpoint Security capability change of FCP device formatted with zfcpdbf from s390-tools: Timestamp : ... Area : HBA Subarea : 00 Level : 3 Exception : - CPU ID : ... Caller : 0x... Record ID : 5 ZFCP_DBF_HBA_FCES Tag : fsfcesa FSF FC Endpoint Security adapter Request ID : 0x... Request status : 0x00000010 FSF cmnd : 0x0000000e FSF_QTCB_EXCHANGE_PORT_DATA FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF_GOOD FSF stat qual : n/a Prot stat : n/a Prot stat qual : n/a Port handle : 0x00000000 none (invalid) LUN handle : n/a WWPN : 0x0000000000000000 ZFCP_DBF_INVALID_WWPN FCES old : 0x00000000 old FC Endpoint Security FCES new : 0x00000007 new FC Endpoint Security Example trace record of FC Endpoint Security change of connection to FC remote port formatted with zfcpdbf from s390-tools: Timestamp : ... Area : HBA Subarea : 00 Level : 3 Exception : - CPU ID : ... Caller : 0x... Record ID : 5 ZFCP_DBF_HBA_FCES Tag : fsfcesp FSF FC Endpoint Security port Request ID : 0x... Request status : 0x00000010 FSF cmnd : 0x00000005 FSF_QTCB_OPEN_PORT_WITH_DID FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF_GOOD FSF stat qual : n/a Prot stat : n/a Prot stat qual : n/a Port handle : 0x... WWPN : 0x500507630401120c WWPN FCES old : 0x00000000 old FC Endpoint Security FCES new : 0x00000004 new FC Endpoint Security Link: https://lore.kernel.org/r/20200312174505.51294-9-maier@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:40 -04:00
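The "trace only on change" rule and the use of an all-zero WWPN for device-level records can be sketched as follows. The `fces_record` struct and both functions are hypothetical simplifications, not the real ZFCP_DBF_HBA_FCES layout. The printed fields follow the order of the zfcpdbf example records in the commit message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define INVALID_WWPN UINT64_C(0x0000000000000000)

/* Simplified stand-in for an FCES trace record (hypothetical layout). */
struct fces_record {
	const char *tag;    /* "fsfcesa" for the FCP device, "fsfcesp" for a port */
	uint64_t wwpn;      /* INVALID_WWPN for device-level records */
	uint32_t fces_old;
	uint32_t fces_new;
};

/* Fill *rec and return 1 only if the FC Endpoint Security value changed. */
static int fces_trace_if_changed(const char *tag, uint64_t wwpn,
				 uint32_t old_val, uint32_t new_val,
				 struct fces_record *rec)
{
	if (old_val == new_val)
		return 0;
	rec->tag = tag;
	rec->wwpn = wwpn;
	rec->fces_old = old_val;
	rec->fces_new = new_val;
	return 1;
}

/* Print the record fields in zfcpdbf-like "name : value" form. */
static void fces_print(const struct fces_record *rec)
{
	printf("Tag            : %s\n", rec->tag);
	printf("WWPN           : 0x%016" PRIx64 "\n", rec->wwpn);
	printf("FCES old       : 0x%08" PRIx32 "\n", rec->fces_old);
	printf("FCES new       : 0x%08" PRIx32 "\n", rec->fces_new);
}
```

For the device-level example (old 0x0, new 0x7), `fces_print()` prints the WWPN as `0x0000000000000000`.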
Jens Remus	f0d26ae847	scsi: zfcp: log FC Endpoint Security of connections Log the usage of and subsequent changes in FC Endpoint Security of connections between FCP devices and FC remote ports to the kernel ring buffer. Activation of FC Endpoint Security is logged as informational. Change and deactivation are logged as warnings. No logging takes place if FC Endpoint Security is not used (i.e. never activated) on a connection or if it does not change during the reopen of a port (e.g. due to adapter or port recovery). Link: https://lore.kernel.org/r/20200312174505.51294-8-maier@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:39 -04:00
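The logging policy described above reduces to a three-way classification of an old/new pair. This is a minimal sketch that assumes a value of 0 means "no FC Endpoint Security in use". The enum and `fces_log_level()` are illustrative names, not zfcp functions.

```c
#include <stdint.h>

enum log_level { LOG_NONE, LOG_INFO, LOG_WARNING };

/* Classify an FC Endpoint Security transition on one connection.
 * Assumption: 0 encodes "FC Endpoint Security not in use". */
static enum log_level fces_log_level(uint32_t old_val, uint32_t new_val)
{
	if (old_val == new_val)
		return LOG_NONE;     /* never activated, or unchanged across a reopen */
	if (old_val == 0)
		return LOG_INFO;     /* activation */
	return LOG_WARNING;      /* change to another value, or deactivation */
}
```

Handling the unchanged case first covers both silent situations from the commit, "never activated" (0 to 0) and "no change during port reopen".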
Jens Remus	a17c784600	scsi: zfcp: report FC Endpoint Security in sysfs Add an interface to read Fibre Channel Endpoint Security information of FCP channels and their connections to FC remote ports. It comes in the form of new sysfs attributes that are attached to the CCW device representing the FCP device and its zfcp port objects. The read-only sysfs attribute "fc_security" of a CCW device representing an FCP device shows the FC Endpoint Security capabilities of the device. Possible values are: "unknown", "unsupported", "none", or a comma-separated list of one or more mnemonics and/or one hexadecimal value representing the supported FC Endpoint Security: Authentication: Authentication supported; Encryption: Encryption supported. The read-only sysfs attribute "fc_security" of a zfcp port object shows the FC Endpoint Security used on the connection between its parent FCP device and the FC remote port. Possible values are: "unknown", "unsupported", "none", or a mnemonic or hexadecimal value representing the FC Endpoint Security used: Authentication: Connection has been authenticated; Encryption: Connection is encrypted. Both sysfs attributes may return hexadecimal values instead of mnemonics if the mnemonic lookup table does not contain an entry for the FC Endpoint Security reported by the FCP device. Link: https://lore.kernel.org/r/20200312174505.51294-7-maier@linux.ibm.com Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:38 -04:00
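The value format of the `fc_security` attribute (a keyword, or mnemonics plus one hex value for bits without a table entry) can be sketched as a formatter. The bit values `FC_SEC_AUTH`/`FC_SEC_ENC`, the state enum, the `", "` separator, and `fc_security_show()` are all assumptions for illustration. They are not the real FSF definitions or the zfcp show function.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit values; the real FSF definitions may differ. */
#define FC_SEC_AUTH 0x00000001u
#define FC_SEC_ENC  0x00000002u

enum fc_sec_state { FC_SEC_UNKNOWN, FC_SEC_UNSUPPORTED, FC_SEC_VALID };

static const struct {
	uint32_t bit;
	const char *mnemonic;
} fc_sec_table[] = {
	{ FC_SEC_AUTH, "Authentication" },
	{ FC_SEC_ENC,  "Encryption" },
};

/* Render a keyword, or known mnemonics followed by any bits without a
 * table entry as one hexadecimal value. Assumes len > 0. */
static void fc_security_show(enum fc_sec_state state, uint32_t mask,
			     char *buf, size_t len)
{
	size_t used = 0, i;
	int n;

	if (state == FC_SEC_UNKNOWN) {
		snprintf(buf, len, "unknown");
		return;
	}
	if (state == FC_SEC_UNSUPPORTED) {
		snprintf(buf, len, "unsupported");
		return;
	}
	if (mask == 0) {
		snprintf(buf, len, "none");
		return;
	}
	buf[0] = '\0';
	for (i = 0; i < sizeof(fc_sec_table) / sizeof(fc_sec_table[0]); i++) {
		if (!(mask & fc_sec_table[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? ", " : "", fc_sec_table[i].mnemonic);
		if (n < 0 || (size_t)n >= len - used)
			return;             /* truncated: stop appending */
		used += (size_t)n;
		mask &= ~fc_sec_table[i].bit;
	}
	if (mask)
		snprintf(buf + used, len - used, "%s0x%08x",
			 used ? ", " : "", (unsigned int)mask);
}
```

With the assumed bits, a mask of `FC_SEC_AUTH | 0x4` renders as `Authentication, 0x00000004`. This matches the commit's fallback to hex when the lookup table has no entry.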
Jens Remus	185f2d2d59	scsi: zfcp: auto variables for dereferenced structs in open port handler Introduce automatic variables for adapter and QTCB bottom in zfcp_fsf_open_port_handler(). This facilitates subsequent changes to meet the 80 character per line limit. Link: https://lore.kernel.org/r/20200312174505.51294-6-maier@linux.ibm.com Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:36 -04:00
Steffen Maier	7e0e4e0958	scsi: zfcp: fix fc_host attributes that should be unknown on local link down When we get an unsolicited notification that the local link went down, zfcp_fsf_status_read_link_down() calls zfcp_fsf_link_down_info_eval(). This only blocks rports and sets ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED and ZFCP_STATUS_COMMON_ERP_FAILED. Only the fc_host port_state changes to "Linkdown", because zfcp_scsi_get_host_port_state() is an active callback and uses the adapter status. The other fc_host attributes model, port_id, port_type, speed, and fabric_name (and the zfcp device attributes card_version, peer_wwpn, peer_wwnn, and peer_d_id), which depend on a local link, continued to show their last known "good" value. Only if something triggered an exchange config data were some values updated to their unknown equivalent via case FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE due to local link down. Triggers for exchange config data are adapter recovery or reading any of the following zfcp-specific scsi host sysfs attributes: "requests", "megabytes", or "seconds_active" in /sys/devices/css/../../host/scsi_host/host/. The other fc_host attributes active_fc4s and permanent_port_name continued to show their last known "good" value. Only if something triggered an exchange port data did some values change. Active_fc4s became all zeros as the unknown equivalent during link down. Permanent_port_name does not depend on a local link. But for non-NPIV FCP devices, permanent_port_name erroneously became whatever value fc_host port_name had at that point in time (see previous paragraph). Triggers for exchange port data are the zfcp-specific scsi host sysfs attribute "utilization", or [{reset,get}_fc_host_stats] writing anything into "reset_statistics" or reading any of the other attributes under /sys/devices/css/../../host/fc_host/host/statistics/. (cf. v4.9 commit `bd77befa5b` ("zfcp: fix fc_host port_type with NPIV")) This is particularly confusing when using "lszfcp -b <fcpdevbusid> -Ha" or dbginfo.sh, which read both fc_host and scsi_host attributes. After link down, the first invocation produces (abbreviated): Class = "fc_host" active_fc4s = "0x00 0x00 0x01 0x00 ..." ... fabric_name = "0x10000027f8e04c49" ... permanent_port_name = "0xc05076e4588059c1" port_id = "0x244800" port_state = "Linkdown" port_type = "NPort (fabric via point-to-point)" ... speed = "16 Gbit" Class = "scsi_host" ... megabytes = "0 0" ... requests = "0 0 0" seconds_active = "37" ... utilization = "0 0 0" The second and subsequent invocations produce (abbreviated): Class = "fc_host" active_fc4s = "0x00 0x00 0x00 0x00 ..." ... fabric_name = "0x0" ... permanent_port_name = "0x0" port_id = "0x000000" port_state = "Linkdown" port_type = "Unknown" ... speed = "unknown" Class = "scsi_host" ... megabytes = "0 0" ... requests = "0 0 0" seconds_active = "38" ... utilization = "0 0 0" Factor out the resetting of local-link-dependent fc_host attributes from zfcp_fsf_exchange_config_data_handler() case FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE into a new helper function zfcp_fsf_fc_host_link_down(). All code places that detect local link down (SRB, FSF_PROT_LINK_DOWN, xconf data/port incomplete) call zfcp_fsf_link_down_info_eval(). Call the new helper from there. This works because zfcp_fsf_link_down_info_eval(), and thus the helper, is called before zfcp_fsf_exchange_{config,port}_evaluate(). Port_name and node_name are always valid, so never reset them. Get the permanent_port_name from exchange port data unconditionally, as it always has a valid known good value, even during link down. Note: Rather than being hardcoded in zfcp_fsf_exchange_config_evaluate(), fc_host supported_classes could theoretically get its value from fsf_qtcb_bottom_port.class_of_service in zfcp_fsf_exchange_port_evaluate(). When the link comes back, we get a different notification and perform adapter recovery, which triggers an implicit exchange config data followed by exchange port data, filling in the link-dependent fc_host attributes with known good values again. Link: https://lore.kernel.org/r/20200312174505.51294-5-maier@linux.ibm.com Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:36 -04:00
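The split between "reset on link down" and "always valid" attributes can be modeled with a plain struct. This is a sketch, not the real `zfcp_fsf_fc_host_link_down()`. The struct, the enum encodings for "unknown", and `fc_host_link_down_model()` are hypothetical, and only a subset of the link-dependent attributes the commit lists is modeled.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; field names mirror fc_host attribute names, but the
 * struct and the "unknown" encodings are assumptions of this sketch. */
enum port_type { PORT_TYPE_UNKNOWN, PORT_TYPE_NPORT };
enum link_speed { SPEED_UNKNOWN, SPEED_16GBIT };

struct fc_host_model {
	/* always valid: never reset on link down */
	uint64_t port_name;
	uint64_t node_name;
	uint64_t permanent_port_name;
	/* meaningful only while a local link exists */
	uint32_t port_id;
	enum port_type port_type;
	enum link_speed speed;
	uint64_t fabric_name;
	uint8_t active_fc4s[32];
};

/* Models the described helper: reset only the link-dependent attributes.
 * In the commit, every link-down path reaches it through one common caller. */
static void fc_host_link_down_model(struct fc_host_model *h)
{
	h->port_id = 0;
	h->port_type = PORT_TYPE_UNKNOWN;
	h->speed = SPEED_UNKNOWN;
	h->fabric_name = 0;
	memset(h->active_fc4s, 0, sizeof(h->active_fc4s));
}
```

Keeping the reset in one helper that all link-down paths reach is what makes the first `lszfcp` invocation after link down already show the unknown values.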
Steffen Maier	538c6e910b	scsi: zfcp: wire previously driver-specific sysfs attributes also to fc_host Manufacturer, HBA model, firmware version, and hardware version. Use the same value format as for the driver-specific attributes. Keep the driver-specific attributes for stable user space sysfs API. Link: https://lore.kernel.org/r/20200312174505.51294-4-maier@linux.ibm.com Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:35 -04:00
Steffen Maier	e05a10a055	scsi: zfcp: expose fabric name as common fc_host sysfs attribute FICON Express8S or older, as well as card features newer than FICON Express16S+, have no specific firmware level requirement. FICON Express16S and FICON Express16S+ have the following minimum firmware level requirements to show a proper fabric name value: z13 machine: FICON Express16S, MCL P08424.005, LIC version 0x00000721; z14 machine: FICON Express16S, MCL P42611.008, LIC version 0x10200069; FICON Express16S+, MCL P42625.010, LIC version 0x10300147. Otherwise, the value read is not the fabric name. Each FCP channel of these card features might need one SAN fabric re-login after a concurrent microcode update in order to show the proper fabric name. Possible ways to trigger a SAN fabric re-login are: pull the fibres between FCP channel port and SAN switch port on either side and re-plug them; disable the SAN switch port adjacent to the FCP channel port and re-enable it; or, at the Service Element, toggle off all CHPIDs of the FCP channel over all LPARs and toggle the CHPIDs on again. Zfcp operating subchannels (FCP devices) on such an FCP channel recover from a fabric re-login. Initialize the fabric name for any topology and set it to the invalid WWPN 0x0 for anything but fabric topology. Otherwise, for e.g. point-to-point topology, one could see the initial -1 from fc_host_setup(), and after a link unplug our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix fc_host attributes that should be unknown on local link down")) and stay 0x0 on link replug. I did not initialize it to 0x0 even earlier in the code path, so that it would not flap from real to 0x0 to real on, e.g., an exchange config data with fabric topology. Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:34 -04:00
Steffen Maier	819732be9f	scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point v2.6.27 commit `cc8c282963` ("[SCSI] zfcp: Automatically attach remote ports") introduced zfcp automatic port scan. Before that, the user had to use the sysfs attribute "port_add" of an FCP device (adapter) to add and open remote (target) ports, even for the remote peer port in point-to-point topology. That code path did a proper port open recovery trigger taking the erp_lock. Since above commit, a new helper function zfcp_erp_open_ptp_port() performed an UNlocked port open recovery trigger. This can race with other parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt e.g. adapter->erp_total_count or adapter->erp_ready_head. As already found for fabric topology in v4.17 commit `fa89adba19` ("scsi: zfcp: fix infinite iteration on ERP ready list"), there was an endless loop during tracing of rport (un)block. A subsequent v4.18 commit `9e156c54ac` ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger") introduced a lockdep assertion for that case. As a side effect, that lockdep assertion now uncovered the unlocked code path for PtP. 
It is from within an adapter ERP action: zfcp_erp_strategy[1479] intentionally DROPs erp lock around zfcp_erp_strategy_do_action() zfcp_erp_strategy_do_action[1441] NO erp lock zfcp_erp_adapter_strategy[876] NO erp lock zfcp_erp_adapter_strategy_open[855] NO erp lock zfcp_erp_adapter_strategy_open_fsf[806] NO erp lock zfcp_erp_adapter_strat_fsf_xconf[772] erp lock only around zfcp_erp_action_to_running(), BUT _not_ around zfcp_erp_enqueue_ptp_port() zfcp_erp_enqueue_ptp_port[728] BUG: _not_ taking erp lock _zfcp_erp_port_reopen[432] assumes to be called with erp lock zfcp_erp_action_enqueue[314] assumes to be called with erp lock zfcp_dbf_rec_trig[288] _checks_ to be called with erp lock: lockdep_assert_held(&adapter->erp_lock); It causes the following lockdep warning: WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x16a/0x188 no locks held by zfcperp0.0.17c0/775. Fix this by using the proper locked recovery trigger helper function. Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com Fixes: `cc8c282963` ("[SCSI] zfcp: Automatically attach remote ports") Cc: <stable@vger.kernel.org> #v2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-17 13:12:33 -04:00
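The locking pattern behind this fix (an internal helper that requires the lock, plus a wrapper that takes it) can be sketched in user space. This is a model under stated assumptions. A pthread mutex stands in for the adapter's `erp_lock`, and an explicit flag plays the role of `lockdep_assert_held()`. The names `adapter_model`, `port_reopen_nolock()`, and `port_reopen()` are hypothetical; the wrapper corresponds to the "proper locked recovery trigger helper" the commit switches to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: a mutex stands in for the adapter's erp_lock, and a
 * flag lets the helper check its precondition like lockdep_assert_held(). */
struct adapter_model {
	pthread_mutex_t erp_lock;
	bool erp_lock_held;     /* only written while erp_lock is held */
	int erp_total_count;    /* shared ERP state that unlocked callers corrupt */
};

/* Caller must hold erp_lock (in the role of _zfcp_erp_port_reopen()). */
static void port_reopen_nolock(struct adapter_model *a)
{
	assert(a->erp_lock_held);
	a->erp_total_count++;
}

/* Locked trigger: what the point-to-point path should call. */
static void port_reopen(struct adapter_model *a)
{
	pthread_mutex_lock(&a->erp_lock);
	a->erp_lock_held = true;
	port_reopen_nolock(a);
	a->erp_lock_held = false;
	pthread_mutex_unlock(&a->erp_lock);
}
```

Calling `port_reopen_nolock()` directly, as the buggy PtP path effectively did with its unlocked trigger, trips the assertion here, just as the lockdep assertion exposed the bug in the kernel.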
Linus Torvalds	7557c1b3f7	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four small fixes. Three are in drivers for fairly obvious bugs. The fourth is a set of regressions introduced by the compat_ioctl changes because some of the compat updates wrongly replaced .ioctl instead of .compat_ioctl" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places scsi: zfcp: fix wrong data and display format of SFP+ temperature scsi: sd_sbc: Fix sd_zbc_report_zones() scsi: libfc: free response frame from GPN_ID	2020-02-29 09:58:47 -06:00
Benjamin Block	a3fd4bfe85	scsi: zfcp: fix wrong data and display format of SFP+ temperature When implementing support for retrieval of local diagnostic data from the FCP channel, the wrong data format was assumed for the temperature of the local SFP+ connector. The Fibre Channel Link Services (FC-LS-3) specification is not clear on the format of the stored integer, and only after consulting the SNIA specification SFF-8472 did we realize it is stored as two's complement. Thus, the used data and display format is wrong, and highly misleading for users when the temperature should drop below 0°C (however unlikely that may be). To fix this, change the data format in `struct fsf_qtcb_bottom_port` from unsigned to signed, and change the printf format string used to generate `zfcp_sysfs_adapter_diag_sfp_temperature_show()` from `%hu` to `%hd`. Link: https://lore.kernel.org/r/d6e3be5428da5c9490cfff4df7cae868bc9f1a7e.1582039501.git.bblock@linux.ibm.com Fixes: `a10a61e807` ("scsi: zfcp: support retrieval of SFP Data via Exchange Port Data") Fixes: `6028f7c4cd` ("scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver") Cc: <stable@vger.kernel.org> # 5.5+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-02-24 12:51:15 -05:00
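The difference between the old unsigned and the fixed two's complement interpretation is easiest to see on a raw value below zero. This sketch uses hypothetical function names and does the signed conversion arithmetically, to avoid relying on implementation-defined narrowing. Like the fixed `%hd` format, it prints the raw field value without scaling.

```c
#include <stdint.h>
#include <stdio.h>

/* Pre-fix interpretation: the raw 16-bit field read as unsigned (%hu). */
static unsigned int sfp_temp_as_unsigned(uint16_t raw)
{
	return raw;
}

/* Fixed interpretation: two's complement per SFF-8472 (%hd), computed
 * without a narrowing cast to a signed type. */
static int sfp_temp_as_signed(uint16_t raw)
{
	return raw < 0x8000u ? (int)raw : (int)raw - 0x10000;
}

/* Format the signed value with "%hd", in the spirit of the fixed show
 * function; returns the snprintf() result. */
static int sfp_temp_format(char *buf, size_t len, uint16_t raw)
{
	return snprintf(buf, len, "%hd", (short)sfp_temp_as_signed(raw));
}
```

For the raw value 0xFFF6, the old interpretation yields 65526, while the fixed one yields -10.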
Julian Wiedmann	2db01da8d2	s390/qdio: fill SBALEs with absolute addresses sbale->addr holds an absolute address (or for some FCP usage, an opaque request ID), and should only be used with proper virt/phys translation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-02-19 17:26:32 +01:00
Steffen Maier	100843f176	scsi: zfcp: trace channel log even for FCP command responses While v2.6.26 commit `b75db73159` ("[SCSI] zfcp: Add qtcb dump to hba debug trace") is right that we don't want to flood the (payload) trace ring buffer, we don't trace successful FCP command responses by default. So we can include the channel log for problem determination with failed responses of any FSF request type. Fixes: `b75db73159` ("[SCSI] zfcp: Add qtcb dump to hba debug trace") Fixes: `a54ca0f62f` ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable@vger.kernel.org> #2.6.38+ Link: https://lore.kernel.org/r/e37597b5c4ae123aaa85fd86c23a9f71e994e4a9.1572018132.git.bblock@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-10-28 22:16:15 -04:00
Steffen Maier	e76acc5194	scsi: zfcp: proper indentation to reduce confusion in zfcp_erp_required_act No functional change. The unary not operator only applies to the sub expression before the logical or. So we return early if (not running) or failed. Link: https://lore.kernel.org/r/df4f897f6e83eaa528465d0858d5a22daac47a2f.1572018132.git.bblock@linux.ibm.com Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-10-28 22:16:15 -04:00
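The precedence point above can be checked directly. This sketch uses hypothetical bit values `ST_RUNNING` and `ST_ERP_FAILED` to stand in for the status flags tested in `zfcp_erp_required_act()`. It contrasts the actual parse with the misreading that misleading indentation invites.

```c
#include <stdbool.h>

/* Hypothetical status bits standing in for the "running" and
 * "ERP failed" flags. */
#define ST_RUNNING    0x1u
#define ST_ERP_FAILED 0x2u

/* Actual meaning: unary ! binds tighter than ||, so this is
 * (!(status & ST_RUNNING)) || (status & ST_ERP_FAILED). */
static bool returns_early(unsigned int status)
{
	return !(status & ST_RUNNING) ||
	       (status & ST_ERP_FAILED);
}

/* The misreading: ! applied to the whole logical or. */
static bool returns_early_misread(unsigned int status)
{
	return !((status & ST_RUNNING) || (status & ST_ERP_FAILED));
}
```

The two differ for a running but failed object: `returns_early(ST_RUNNING | ST_ERP_FAILED)` is true, while the misread version is false.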
Benjamin Block	48910f8c35	scsi: zfcp: move maximum age of diagnostic buffers into a per-adapter variable Replace the static define (ZFCP_DIAG_MAX_AGE) with a per-adapter variable (${adapter}->diagnostics->max_age). This new variable is exported via sysfs, along with other already existing adapter variables, and can be both read and written. This way users can choose how much time should pass between refreshes of diagnostic buffers. The default value for the age remains five seconds. By setting this new variable to 0, the caching of diagnostic buffers for userspace accesses can be disabled completely. All diagnostic buffers of a given adapter are subject to this setting in the same way. Link: https://lore.kernel.org/r/b1d0977cc884b16dd4ca6418e4320c56a4c31d63.1572018132.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-10-28 22:16:15 -04:00
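The caching rule above (refresh once the buffer is at least `max_age` old; a `max_age` of 0 disables caching) can be sketched as a single predicate. This is a hypothetical model, not the zfcp implementation. The struct and function names are invented, times are assumed to be in milliseconds, and treating an age exactly equal to `max_age` as stale is a choice of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached diagnostic buffer; millisecond timestamps are an
 * assumption of this sketch. */
struct diag_buffer_model {
	uint64_t refreshed_ms;  /* when the buffer was last refreshed */
	bool valid;             /* false until the first refresh */
};

/* Whether a userspace read must refresh the buffer before returning it.
 * max_age_ms == 0 disables caching: every read refreshes. */
static bool diag_needs_refresh(const struct diag_buffer_model *b,
			       uint64_t now_ms, uint64_t max_age_ms)
{
	if (!b->valid || max_age_ms == 0)
		return true;
	return now_ms - b->refreshed_ms >= max_age_ms;
}
```

With the five-second default expressed as 5000 ms, a buffer refreshed at t=1000 is served from cache at t=3000 and refreshed at t=6000.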


720 Commits