This patch will do proper error handling for DCMD timeout failure cases
for Fusion adapters:
1. For MFI adapters, in case of DCMD timeout (DCMD which must return
SUCCESS) driver will call kill adapter.
2. What action needs to be taken in case of DCMD timeout is decided by
function dcmd_timeout_ocr_possible(). DCMD timeout causing OCR is
applicable to the following commands:
MR_DCMD_PD_LIST_QUERY
MR_DCMD_LD_GET_LIST
MR_DCMD_LD_LIST_QUERY
MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS
MR_DCMD_SYSTEM_PD_MAP_GET_INFO
MR_DCMD_LD_MAP_GET_INFO
3. If DCMD fails from driver init path there are certain DCMDs which
must return SUCCESS. If those DCMDs fail, driver bails out. For optional
DCMDs like pd_info etc., driver continues without executing certain
functionality.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch will do synhronization between OCR function and AEN function
using "reset_mutex" lock. reset_mutex will be acquired only in the
first half of the AEN function which issues a DCMD. Second half of the
function which calls SCSI API (scsi_add_device/scsi_remove_device)
should be out of reset_mutex to avoid deadlock between scsi_eh thread
and driver.
During chip reset (inside OCR function), there should not be any PCI
access and AEN function (which is called in delayed context) may be
firing DCMDs (doing PCI writes) when chip reset is happening in parallel
which will cause FW fault. This patch will solve the problem by making
AEN thread and OCR thread mutually exclusive.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The hpsa driver uses this function to cleanup inquiry data. Our new pqi
driver will also use this function. This function was copied into both
drivers.
This patch exports sanitize_inquiry_string so the hpsa and the pqi
drivers can use this function directly.
Suggested-by: Hannes Reinecke <hare@suse.de>
Suggested-by: Matthew R. Ochs mrochs@linux.vnet.ibm.com
Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
Reviewed-by: Justin Lindley <justin.lindley@pmcs.com>
Reviewed-by: Scott Teel <scott.teel@pmcs.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Don Brace <don.brace@pmcs.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we fail to queue an srb for an async login we should set the
relogin flag so it will be retried as the reason for the queuing
failure was most likely transient. Failure to do this can lead to
failed paths as login is never retried if the relogin flag is not
set.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Login/Logout loop was hanging after few hours. /var/log/message showed
that alloc_wrb_handle() function was not able to allocate any new WRB.
Sep 11 11:25:22 Jhelum10 kernel: connection32513:0: Could not send nopout
Sep 11 11:25:22 Jhelum10 kernel: scsi host10: BM_4989 : Alloc of WRB_HANDLE
Failed for the CID : 384
Sep 11 11:25:22 Jhelum10 kernel: connection32513:0: Could not allocate pdu for
mgmt task.
Driver allocates WRB to pass login negotiated parameters information to FW
in beiscsi_offload_connection(). This allocated WRB was not freed so there
was WRB_Leak happening.
Put WRB used for posting the login-negotiated parameters back in pool.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use only port_link_status only to determine link state change. Only bit
0 is used to get the state. Remove code for processing port_fault.
Fixed get_nic_conf structure definition. Removed rsvd[23] field in
be_cmd_get_nic_conf_resp.
Moved definitions of struct field values below the field.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Async link event provides port_speed info. Cache the port_speed info
and use the same to report in ISCSI_HOST_PARAM_PORT_SPEED query.
Removed link status query IOCTL used to do the same.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
OS not responding when running 2 port traffic on 72 CPUs system.
be2iscsi IRQs gets affined to CPU0 when irqbalancer is disabled.
be_iopoll processing completions in BLOCK_IOPOLL_SOFTIRQ hogged CPU0.
1. Use budget to exit the polling loop. beiscsi_process_cq didn't honour
it.
2. Rearming of EQ is done only after iopoll completes.
[mkp: Fixed up blk_iopoll -> irq_poll transition]
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crash in I+T card personality.
Fix to add validation for ULP in initiator mode, physical port number,
and supported queue, icd, cid counts.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Log messages for misconfigured transceivers reported by FW.
Register async events that driver handles using MCC_CREATE_EXT ioctl.
Errors messages for faulted/uncertified/unqualified optics are logged.
Added IOCTL to get port_name to be displayed in error message.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Null pointer dereference in shutdown path after taking dump.
Shutdown path is not needed as FW comes up clean every time during probe
after issuing FUNCTION reset MBOX command.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FW recommended timeout for all mbox command is 30s. Use msleep instead
mdelay to relinquish CPU when polling for mbox completion.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
beiscsi_mccq_compl sets MCC_TAG_STATE_TIMEOUT before setting up
tag_mem_state. be_mcc_compl_process_isr checks for MCC_TAG_STATE_TIMEOUT
first then accesses tag_mem_state which might be still getting populated
in the process context.
Fix: Set MCC_TAG_STATE_TIMEOUT after tag_mem_state is populated.
Removed MCC_TAG_STATE_COMPLETED. When posted its in running state and
the running state is cleared in be_mcc_compl_process_isr. be_mcc_notify
now takes tag argument to set it to running state. Use bit operations
for tag_state. Use barriers before setting the state.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is second part of actual fix for soft lockup.
All mbox cmds issued using BMBX and MCC are synchronized using mutex
mbox_lock instead of spin_lock. Used mutex_lock_interruptible where ever
possible.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We are taking mbox_lock spinlock which disables pre-emption before we
poll for mbox completion. Waiting there with spinlock held in excess of
20s will cause soft lockup.
Actual fix is to change mbox_lock to mutex. The changes are done in
phases. This is the first part.
1. Changed mgmt_get_all_if_id to use MCC where after posting lock is
released.
2. Changed be_mbox_db_ready_wait to busy wait for 12s max and removed
wait_event_timeout. Added error handling code for IO reads.
OPCODE_COMMON_QUERY_FIRMWARE_CONFIG mbox command takes 8s time when
unreachable boot targets configured.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Runtime PM of the SCSI host is already handled by calls to
scsi_autopm_get_host() and scsi_autopm_put_host() from appropriate places
whenever the host needs to be powered on. This works fine when there is
device connected to the host as once it runtime suspends the host will too.
However, if there is no device connected the host is never runtime
suspended (the usage counter is always 0).
Allow runtime suspend of host even if it has no devices connected by
calling scsi_autopm_put_host() at the end of scsi_add_host_with_dma(). We
temporarily increase runtime PM usage counter first so call to
scsi_autopm_put_host() will result idle request to be scheduled for the
device.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We treat system suspend of SCSI devices pretty much the same as runtime
suspend. If the device is already runtime suspended we leave it to that
state during system suspend. On resume from system sleep we then resume the
device and correct the runtime PM status back to "active".
There is a problem with this because runtime PM status of the request queue
in question is not changed (it will be in "suspended" state). When SCSI
disk driver (sd.c) resumes the disk it sends START message to the device
and because the request queue is still in "suspended" state
blk_pm_peek_request() returns NULL preventing resume of the disk.
The issue can be reproduced with following commands:
# echo auto > /sys/block/sda/device/power/control
# echo 15000 > /sys/block/sda/device/power/autosuspend_delay_ms
[ 57.191706] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 57.380015] sd 0:0:0:0: [sda] Stopping disk
Now suspend the machine:
# rtcwake -s10 -mmem
This ends up in soft lockup because resume is not proceeding accordingly
and userspace is never restarted. Also there is nothing printed to the
console.
Fix this by forcing request queue status to "active" before the disk is
resumed.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull SCSI fixes from James Bottomley:
"Two simple fixes.
One prevents a soft lockup on some target removal scenarios and the
other prevents us trying to probe the marvell console device, which
causes it to time out and need the bus resetting"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fix soft lockup in scsi_remove_target() on module removal
SCSI: Add Marvell configuration device to VPD blacklist
Pull SCSI target fixes from Nicholas Bellinger:
"This includes the long awaited series to address a set of bugs around
active I/O remote-port LUN_RESET, as well as properly handling this
same case with concurrent fabric driver session disconnect ->
reconnect.
Note this set of LUN_RESET bug-fixes has been surviving extended
testing on both v4.5-rc1 and v3.14.y code over the last weeks, and is
CC'ed for stable as it's something folks using multiple ESX connected
hosts with slow backends can certainly trigger.
The highlights also include:
- Fix WRITE_SAME/DISCARD emulation 4k sector conversion in
target/iblock (Mike Christie)
- Fix TMR abort interaction and AIO type TMR response in qla2xxx
target (Quinn Tran + Swapnil Nagle)
- Fix >= v3.17 stale descriptor pointer regression in qla2xxx target
(Quinn Tran)
- Fix >= v4.5-rc1 return regression with unmap_zeros_data_store new
configfs store handler (nab)
- Add CPU affinity flag + convert qla2xxx to use bit (Quinn + HCH +
Bart)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity
target/transport: add flag to indicate CPU Affinity is observed
target: Fix incorrect unmap_zeroes_data_store return
qla2xxx: Use ATIO type to send correct tmr response
qla2xxx: Fix stale pointer access.
target/user: Fix cast from pointer to phys_addr_t
target: Drop legacy se_cmd->task_stop_comp + REQUEST_STOP usage
target: Fix race with SCF_SEND_DELAYED_TAS handling
target: Fix remote-port TMR ABORT + se_cmd fabric stop
target: Fix TAS handling for multi-session se_node_acls
target: Fix LUN_RESET active TMR descriptor handling
target: Fix LUN_RESET active I/O handling for ACK_KREF
qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM
qla2xxx: Fix warning reported by static checker
target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
Pull SCSI fixes from James Bottomley:
"A set of seven fixes:
Two regressions in the new hisi_sas arm driver, a blacklist entry for
the marvell console which was causing a reset cascade without it, a
race fix in the WRITE_SAME/DISCARD routines, a retry fix for the rdac
driver, without which, it would prematurely return EIO and a couple of
fixes for the hyper-v storvsc driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
SCSI: Add Marvell Console to VPD blacklist
scsi_dh_rdac: always retry MODE SELECT on command lock violation
storvsc: Use the specified target ID in device lookup
storvsc: Install the storvsc specific timeout handler for FC devices
hisi_sas: fix v1 hw check for slot error
hisi_sas: add dependency for HAS_IOMEM
The Marvell 91xx configuration device also needs to be on the VPD
blacklist.
[mkp: Match all revisions]
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function value inside se_cmd can change if the TMR is cancelled.
Use original ATIO Type to correctly determine CTIO response.
Signed-off-by: Swapnil Nagle <swapnil.nagle@purestroage.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ Upstream Commit 84e32a06f4 ]
Commit 84e32a0 ("qla2xxx: Use pci_enable_msix_range() instead of
pci_enable_msix()") introduced a regression when target mode is enabled.
In qla24xx_enable_msix(), ha->max_rsp_queues was incorrectly set
to a value higher than the number of response queues allocated causing
an invalid dereference. Specifically here in qla2x00_init_rings():
*rsp->in_ptr = 0;
Add additional check to make sure the pointer is valid. following
call stack will be seen
---- 8< ----
RIP: 0010:[<ffffffffa02ccadc>] [<ffffffffa02ccadc>] qla2x00_init_rings+0xdc/0x320 [qla2xxx]
RSP: 0018:ffff880429447dd8 EFLAGS: 00010082
....
Call Trace:
[<ffffffffa02ceb40>] qla2x00_abort_isp+0x170/0x6b0 [qla2xxx]
[<ffffffffa02c6f77>] qla2x00_do_dpc+0x357/0x7f0 [qla2xxx]
[<ffffffffa02c6c20>] ? qla2x00_relogin+0x260/0x260 [qla2xxx]
[<ffffffff8107d2c9>] kthread+0xc9/0xe0
[<ffffffff8107d200>] ? flush_kthread_worker+0x90/0x90
[<ffffffff8172cc6f>] ret_from_fork+0x3f/0x70
[<ffffffff8107d200>] ? flush_kthread_worker+0x90/0x90
---- 8< ----
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Drivers should include asm/pci-bridge.h only when they need the arch-
specific things provided there. Outside of the arch/ directories, the only
drivers that actually need things provided by asm/pci-bridge.h are the
powerpc RPA hotplug drivers in drivers/pci/hotplug/rpa*.
Remove the includes of asm/pci-bridge.h from the other drivers, adding an
include of linux/pci.h if necessary.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When a storage device rejects a WRITE SAME command we will disable write
same functionality for the device and return -EREMOTEIO to the block
layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
failing the path.
Yiwen Jiang discovered a small race where WRITE SAME requests issued
simultaneously would cause -EIO to be returned. This happened because
any requests being prepared after WRITE SAME had been disabled for the
device caused us to return BLKPREP_KILL. The latter caused the block
layer to return -EIO upon completion.
To overcome this we introduce BLKPREP_INVALID which indicates that this
is an invalid request for the device. blk_peek_request() is modified to
return -EREMOTEIO in that case.
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I have a Marvell 88SE9230 SATA Controller that has some sort of
integrated console SCSI device attached to one of the ports.
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
ata14.00: configured for UDMA/66
scsi 13:0:0:0: Processor Marvell Console 1.01 PQ: 0 ANSI: 5
Sending it VPD INQUIRY command seem to always fail with following error:
ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata14.00: irq_stat 0x40000001
ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 2 dma 16640 in
Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
ata14: hard resetting link
This has been minor annoyance (only error printed on dmesg) until commit
09e2b0b146 ("scsi: rescan VPD attributes") added call to scsi_attach_vpd()
in scsi_rescan_device(). The commit causes the system to splat out
following errors continuously without ever reaching the UI:
ata14.00: configured for UDMA/66
ata14: EH complete
ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata14.00: irq_stat 0x40000001
ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 6 dma 16640 in
Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
ata14: hard resetting link
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata14.00: configured for UDMA/66
ata14: EH complete
ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata14.00: irq_stat 0x40000001
ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 7 dma 16640 in
Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
Without in-depth understanding of SCSI layer and the Marvell controller,
I suspect this happens because when the link goes down (because of an
error) we schedule scsi_rescan_device() which again fails to read VPD
data... ad infinitum.
Since VPD data cannot be read from the device anyway we prevent the SCSI
layer from even trying by blacklisting the device. This gets away the
error and the system starts up normally.
[mkp: Widened the match to all revisions of this device]
Cc: <stable@vger.kernel.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If MODE SELECT returns with sense '05/91/36' (command lock violation)
it should always be retried without counting the number of retries.
During an HBA upgrade or similar circumstances one might see a flood
of MODE SELECT command from various HBAs, which will easily trigger
the sense code and exceed the retry count.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>