Many disks implement the SCSI PRE-FETCH commands. One use case might be a
disk-to-disk compare, say between disks A and B. Then this sequence of
commands might be used: PRE-FETCH(from B, IMMED), READ(from A), VERIFY
(BYTCHK=1 on B with data returned from READ). The PRE-FETCH (which returns
quickly due to the IMMED) fetches the data from the media into B's cache
which should speed the trailing VERIFY command. The next chunk of the
compare might be done in parallel, with A and B reversed.
The implementation tries to bring the specified range in main memory into
the cache(s) associated with this machine's CPU(s) using the
prefetch_range() function.
Link: https://lore.kernel.org/r/20200421151424.32668-7-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Previously the code did the work implied by the given SCSI command and
after that it waited for a timer based on the user specified command
duration to be exhausted before informing the mid-level that the command
was complete. For short command durations, the time to complete the work
implied by the SCSI command could be significant compared to the user
specified command duration.
For example a WRITE of 128 blocks (say 512 bytes each) on a machine that
can copy from main memory to main memory at a rate of 10 GB/sec will take
around 6.4 microseconds to do that copy. If the user specified a command
duration of 5 microseconds (ndelay=5000), should the driver do a further
delay of 5 microseconds after the copy or return immediately because 6.4 >
5 ?
The action prior to this patch was to always do the timer based
delay. After this patch, for ndelay values less than 1 millisecond, this
driver will complete the command immediately. And in the case where the
user specified delay was 7 microseconds, a timer delay of 600 nanoseconds
will be set ((7 - 6.4) * 1000).
Link: https://lore.kernel.org/r/20200421151424.32668-6-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The design of this driver is to do any ramdisk access on the same thread
that invoked the queuecommand() call. That is assumed to be user space
context. The command duration is implemented by setting the delay with a
high resolution timer. The hr timer's callback may well be in interrupt
context, but it doesn't touch the ramdisk. So try removing the
_irqsave()/_irqrestore() portion on the read-write lock that protects
ramdisk access.
Link: https://lore.kernel.org/r/20200421151424.32668-5-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With the addition of the per_host_store option, the ability to check
whether two different ramdisk images are the same or not becomes
practical. Prior to this patch VERIFY(10) always returned true (i.e. the
SCSI GOOD status) without checking. This option adds support for BYTCHK
equal to 0, 1 and 3. If the comparison fails, then a sense key of
MISCOMPARE is returned as per the T10 standards. Also add support for the
VERIFY(16) command.
Link: https://lore.kernel.org/r/20200421151424.32668-4-dgilbert@interlog.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi_debug driver has always been restricted to using one ramdisk image
(or none) for its storage. This means that thousands of scsi_debug devices
can be created without exhausting the host machine's RAM. The downside is
that all scsi_debug devices share the same ramdisk image. This option
changes the way a following write to the add_host parameter (or an add_host
in the module/driver invocation) operates. For each new host that is
created while per_host_store is true, a new store (of dev-size_mb MiB) is
created and associated with all the LUs that belong to that new host. The
user (who will need root permissions) needs to take care not to exhaust all
the machine's available RAM.
One reason for doing this is to check that (partial) disk to disk copies
based on scsi_debug devices have actually copied accurately. To test this
the add_host=<n> parameter where <n> is 2 or greater can be used when the
scsi_debug module is loaded. Let us assume that /dev/sdb and /dev/sg1 are
the same scsi_debug device, while /dev/sdc and /dev/sg2 are the same
scsi_debug device. With per_host_store=1 add_host=2 they will have
different ramdisk images. Then the following pseudocode could be executed
to check if the sgh_dd copy worked:
dd if=/dev/urandom of=/dev/sdb
sgh_dd if=/dev/sg1 of=/dev/sg2 [plus option(s) to test]
cmp /dev/sdb /dev/sdc
If the cmp fails then the copy has failed (or some other mechanism wrote to
/dev/sdb or /dev/sdc in the interim).
[mkp: use kstrtobool()]
Link: https://lore.kernel.org/r/20200421151424.32668-3-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a new command line option (e.g. random=1) and sysfs attribute that
causes subsequent command completion times to be between the current
command delay setting and 0. A uniformly distributed 32 bit, kernel
provided integer is used for this purpose.
Since the existing 'delay' whose units are jiffies (typically milliseconds)
and 'ndelay' (units: nanoseconds) options (and sysfs attributes) span a
range greater than 32 bits, some scaling is required.
The purpose of this patch is to widen the range of testing cases that are
visited in long running tests. Put simply: rarely struct race conditions
are more likely to be found when this facility is used.
The default is the previous case in which all command completions were
roughly equal to (if not, slightly longer) than the value given by the
'delay' or 'ndelay' settings (or their defaults). This option's default is
equivalent to setting 'random=0' .
[mkp: use kstrtobool()]
Link: https://lore.kernel.org/r/20200421151424.32668-2-dgilbert@interlog.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a pointer to the CDROM information structure to struct gendisk.
This will allow various removable media file systems to call directly
into the CDROM layer instead of abusing ioctls with kernel pointers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Smatch complains that the "path_data->handle" variable is user controlled.
It comes from iscsi_set_path() so that seems possible. It's harmless to
add a limit check.
The qedi->ep_tbl[] array has qedi->max_active_conns elements (which is
always ISCSI_MAX_SESS_PER_HBA (4096) elements). The array is allocated in
the qedi_cm_alloc_mem() function.
Link: https://lore.kernel.org/r/20200428131939.GA696531@mwanda
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case scsi_setup_fs_cmnd() fails we're not freeing the sgtables allocated
by scsi_init_io(), thus we leak the allocated memory.
Free the sgtables allocated by scsi_init_io() in case scsi_setup_fs_cmnd()
fails.
Technically scsi_setup_scsi_cmnd() does not suffer from this problem as it
can only fail if scsi_init_io() fails, so it does not have sgtables
allocated. But to maintain symmetry and as a measure of defensive
programming, free the sgtables on scsi_setup_scsi_cmnd() failure as well.
scsi_mq_free_sgtables() has safeguards against double-freeing of memory so
this is safe to do.
While we're at it, rename scsi_mq_free_sgtables() to scsi_free_sgtables().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205595
Link: https://lore.kernel.org/r/20200428104605.8143-2-johannes.thumshirn@wdc.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit ed830385a2 ("scsi: ibmvfc: Avoid loss of all paths during SVC node
reboot") introduced a regression where when the client resets or re-enables
its CRQ with the hypervisor there is a chance that if the server side
doesn't issue its INIT handshake quick enough the client can issue an
Implicit Logout prior to doing an NPIV Login. The server treats this
scenario as a protocol violation and closes the CRQ on its end forcing the
client through a reset that gets the client host state and next host action
out of agreement leading to a BUG assert.
ibmvfc 30000003: Partner initialization complete
ibmvfc 30000002: Partner initialization complete
ibmvfc 30000002: Host partner adapter deregistered or failed (rc=2)
ibmvfc 30000002: Partner initialized
------------[ cut here ]------------
kernel BUG at ../drivers/scsi/ibmvscsi/ibmvfc.c:4489!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Supported: No, Unreleased kernel
CPU: 16 PID: 1290 Comm: ibmvfc_0 Tainted: G OE X 5.3.18-12-default
NIP: c00800000d84a2b4 LR: c00800000d84a040 CTR: c00800000d84a2a0
REGS: c00000000cb57a00 TRAP: 0700 Tainted: G OE X (5.3.18-12-default)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24000848 XER: 00000001
CFAR: c00800000d84a070 IRQMASK: 1
GPR00: c00800000d84a040 c00000000cb57c90 c00800000d858e00 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000000a0
GPR08: c00800000d84a074 0000000000000001 0000000000000014 c00800000d84d7d0
GPR12: 0000000000000000 c00000001ea28200 c00000000016cd98 0000000000000000
GPR16: c00800000d84b7b8 0000000000000000 0000000000000000 c00000542c706d68
GPR20: 0000000000000005 c00000542c706d88 5deadbeef0000100 5deadbeef0000122
GPR24: 000000000000000c 000000000000000b c00800000d852180 0000000000000001
GPR28: 0000000000000000 c00000542c706da0 c00000542c706860 c00000542c706828
NIP [c00800000d84a2b4] ibmvfc_work+0x3ac/0xc90 [ibmvfc]
LR [c00800000d84a040] ibmvfc_work+0x138/0xc90 [ibmvfc]
This scenario can be prevented by rejecting any attempt to send an Implicit
Logout if the client adapter is not logged in yet.
Link: https://lore.kernel.org/r/20200427214824.6890-1-tyreld@linux.ibm.com
Fixes: ed830385a2 ("scsi: ibmvfc: Avoid loss of all paths during SVC node reboot")
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The write performance of TLC NAND is considerably lower than SLC NAND.
Using SLC NAND as a WriteBooster Buffer enables the write request to be
processed with lower latency and improves the overall write performance.
Adds support for shared-buffer mode WriteBooster.
WriteBooster enable: SW enables it when clocks are scaled up, thus it's
enabled only in high load conditions.
WriteBooster disable: SW will disable the feature, when clocks are scaled
down. Thus writes would go as normal writes.
To keep the endurance of the WriteBooster Buffer at a maximum, this
load-based toggling is adopted.
Link: https://lore.kernel.org/r/2871444d9083b0e9323ef6d8ff1b544b7784adc9.1587591527.git.asutoshd@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the following coccicheck warning:
drivers/scsi/isci/isci.h:515:1-12: WARNING: Assignment of 0/1 to bool
variable
drivers/scsi/isci/isci.h:503:1-12: WARNING: Assignment of 0/1 to bool
variable
drivers/scsi/isci/isci.h:509:1-12: WARNING: Assignment of 0/1 to bool
variable
Link: https://lore.kernel.org/r/20200421034050.28193-1-yanaijie@huawei.com
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_host_block() calls scsi_internal_device_block() for each scsi_device and
scsi_internal_device_block() calls blk_mq_quiesce_queue() for each LUN.
Since synchronize_rcu() is called from blk_mq_quiesce_queue(), this can cause
substantial slowdowns on systems with many LUNs.
Use scsi_internal_device_block_nowait() to implement scsi_host_block() so it
is sufficient to run synchronize_rcu() once. This is safe since SCSI does not
set the BLK_MQ_F_BLOCKING flag.
[mkp: commit desc and comment tweaks]
Link: https://lore.kernel.org/r/20200423020713.332743-1-ming.lei@redhat.com
Cc: Steffen Maier <maier@linux.ibm.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the following coccicheck warning:
drivers/scsi/megaraid/megaraid_sas_fusion.c:4242:6-16: WARNING:
Assignment of 0/1 to bool variable
drivers/scsi/megaraid/megaraid_sas_fusion.c:4786:1-29: WARNING:
Assignment of 0/1 to bool variable
drivers/scsi/megaraid/megaraid_sas_fusion.c:4791:1-29: WARNING:
Assignment of 0/1 to bool variable
drivers/scsi/megaraid/megaraid_sas_fusion.c:4716:1-29: WARNING:
Assignment of 0/1 to bool variable
drivers/scsi/megaraid/megaraid_sas_fusion.c:4721:1-29: WARNING:
Assignment of 0/1 to bool variable
Link: https://lore.kernel.org/r/20200421034111.28353-1-yanaijie@huawei.com
Acked-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For INVADER_SERIES, each set of 8 reply queues (0 - 7, 8 - 15,..), and for
VENTURA_SERIES, each set of 16 reply queues (0 - 15, 16 - 31,..) need to be
within the same 4 GB boundary. Driver uses limitation of VENTURA_SERIES to
manage INVADER_SERIES as well. The driver is allocating the DMA able
memory for RDPQs accordingly.
1) At driver load, set DMA mask to 64 and allocate memory for RDPQs
2) Check if allocated resources for RDPQ are in the same 4GB range
3) If #2 is true, continue with 64 bit DMA and go to #6
4) If #2 is false, then free all the resources from #1
5) Set DMA mask to 32 and allocate RDPQs
6) Proceed with driver loading and other allocations
Link: https://lore.kernel.org/r/1587626596-1044-5-git-send-email-suganath-prabu.subramani@broadcom.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In NPIV environment, a NPIV host may use a queue pair created by base host
or other NPIVs, so the check for a queue pair created by this NPIV is not
correct, and can cause an abort to fail, which in turn means the NVME
command not returned. This leads to hang in nvme_fc layer in
nvme_fc_delete_association() which waits for all I/Os to be returned, which
is seen as hang in the application.
Link: https://lore.kernel.org/r/20200331104015.24868-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Today, upon an MPI failure AEN, on top of collecting an MPI dump, a regular
firmware dump is also taken and then chip reset. This is disruptive to IOs
and not required. Make the firmware dump collection, followed by chip
reset, optional (not done by default).
Firmware dump buffer and MPI dump buffer are independent of each
other with this change and each can have dump that was taken at two
different times for two different issues. The MPI dump is saved in a
separate buffer and is retrieved differently from firmware dump.
To collect full dump on MPI failure AEN, a module parameter is
introduced:
ql2xfulldump_on_mpifail (default: 0)
Link: https://lore.kernel.org/r/20200331104015.24868-2-njavali@marvell.com
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sure, we'd done copy_from_user() on the same range, so we can
skip access_ok()... and it's not worth bothering. Just use
copy_to_user().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are only two callers of blk_rq_map_sg/__blk_rq_map_sg that set
the dma_pad value in the queue. Move the handling into those callers
instead of burdening the common code, and move the ->extra_len field
from struct request to struct scsi_cmnd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>