Pull ia64 updates from Tony Luck:
"The big change here is removal of support for SGI Altix"
* tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits)
genirq: remove the is_affinity_mask_valid hook
ia64: remove CONFIG_SWIOTLB ifdefs
ia64: remove support for machvecs
ia64: move the screen_info setup to common code
ia64: move the ROOT_DEV setup to common code
ia64: rework iommu probing
ia64: remove the unused sn_coherency_id symbol
ia64: remove the SGI UV simulator support
ia64: remove the zx1 swiotlb machvec
ia64: remove CONFIG_ACPI ifdefs
ia64: remove CONFIG_PCI ifdefs
ia64: remove the hpsim platform
ia64: remove now unused machvec indirections
ia64: remove support for the SGI SN2 platform
drivers: remove the SGI SN2 IOC4 base support
drivers: remove the SGI SN2 IOC3 base support
qla2xxx: remove SGI SN2 support
qla1280: remove SGI SN2 support
misc/sgi-xp: remove SGI SN2 support
char/mspec: remove SGI SN2 support
...
Currently blk_set_runtime_active() is checking if q->dev is null by
itself, thus remove the same checking in its user: scsi_dev_type_resume().
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When device gone, it will check whether it is during reset, if not, it will
send internal task abort. Before internal task abort returned, reset
begins, and it will check whether SAS_PHY_UNUSED is set, if not, it will
call hisi_sas_init_device(), but at that time domain_device may already be
freed or part of it is freed, so it may referenece null pointer in
hisi_sas_init_device(). It may occur as follows:
thread0 thread1
hisi_sas_dev_gone()
check whether in RESET(no)
internal task abort
reset prep
soft_reset
... (part of reset_done)
internal task abort failed
release resource anyway
clear_itct
device->lldd_dev=NULL
hisi_sas_reset_init_all_device
check sas_dev->dev_type is SAS_PHY_UNUSED and
!device
set dev_type SAS_PHY_UNUSED
sas_free_device
hisi_sas_init_device
...
Semaphore hisi_hba.sema is used to sync the processes of device gone and
host reset.
To solve the issue, expand the scope that semaphore protects and let them
never occur together.
And also some places will check whether domain_device is NULL to judge
whether the device is gone. So when device gone, need to clear
sas_dev->sas_device.
Link: https://lore.kernel.org/r/1567774537-20003-14-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the NCQ tag is only assigned for FPDMA READ and FPDMA WRITE
commands, and for other NCQ commands (such as FPDMA SEND), their NCQ tags
are set in the delivery command to 0.
So for all the NCQ commands, we also need to assign normal NCQ tag for
them, so drop the command type check in hisi_sas_get_ncq_tag() [drop
hisi_sas_get_ncq_tag() altogether actually], and always use the ATA command
NCQ tag when appropriate.
Link: https://lore.kernel.org/r/1567774537-20003-8-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When init device for SAS disks, it will send TMF IO to clear disks. At that
time TMF IO is broken by some operations such as injecting controller reset
from HW RAs event, the TMF IO will be timeout, and at last device will be
gone. Print is as followed:
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] found
...
hisi_sas_v3_hw 0000:74:02.0: controller resetting...
hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: controller reset complete
hisi_sas_v3_hw 0000:74:02.0: abort tmf: TMF task timeout and not done
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] is gone
sas: driver on host 0000:74:02.0 cannot handle device 5000c500a75a860d,
error:5
To improve the reliability, retry TMF IO max of 3 times for SAS disks which
is the same as softreset does.
Link: https://lore.kernel.org/r/1567774537-20003-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The event handler calls scsi_scan_host() when events are missed, which will
hotplug new LUNs. However, this function won't remove any unplugged LUNs.
The result is that hotunplug doesn't work properly when the number of
unplugged LUNs exceeds the event queue size (currently 8).
Scan existing LUNs when events are missed to check if they are still
present. If not, remove them.
Link: https://lore.kernel.org/r/20190905181903.29756-1-mlupfer@ddn.com
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
cdb in send_mode_select() is not zeroed and is only partially filled in
rdac_failover_get(), which leads to some random data getting to the
device. Users have reported storage responding to such commands with
INVALID FIELD IN CDB. Code before commit 3278255741 was not affected, as
it called blk_rq_set_block_pc().
Fix this by zeroing out the cdb first.
Identified & fix proposed by HPE.
Fixes: 3278255741 ("scsi_dh_rdac: switch to scsi_execute_req_flags()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20190904155205.1666-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Ales Novak <alnovak@suse.cz>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A recent patch unconditionally marks the hba as in error as part of
resetting the adapter. The driver flow that called the adapter reset was a
recovery path, which expects the adapter to not be in an error state in
order to finish the recovery. Given the new error state being set, the
recovery fails and the adapter is left in limbo.
Revise the adapter reset routine so that it will only mark the adapter in
error if it was unable to reset the adapter.
Fixes: 8c24a4f643 ("scsi: lpfc: Fix crash due to port reset racing vs adapter error handling")
Link: https://lore.kernel.org/r/20190903215441.10490-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch updates log message which indicates number of vectors used by
the driver instead of displaying failure to get maximum requested
vectors. Driver will always request maximum vectors during
initialization. In the event driver is not able to get maximum requested
vectors, it will adjust the allocated vectors. This is normal and does not
imply failure in driver.
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Link: https://lore.kernel.org/r/20190830222402.23688-2-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI fix from James Bottomley:
"Just a single lpfc fix adjusting the number of available queues for
high CPU count systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Raise config max for lpfc_fcp_mq_threshold variable
Using the helper blk_queue_required_elevator_features(), set the
elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request
queue of SCSI ZBC disks.
This feature requirement can always be satisfied as the mq-deadline
elevator is always selected for in-kernel compilation when
CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Port speed printing was added by commit d948e6383e ("scsi: fnic: Add port
speed stat to fnic debug stats"). As currently configured, this will cause
the port speed to be printed to syslog every 2 seconds. To prevent log
spamming, only print the vnic port speed at driver initialization and if
the speed changes. Also clean up a small typo in fnic_trace.c.
Fixes: d948e6383e ("scsi: fnic: Add port speed stat to fnic debug stats")
Signed-off-by: John Pittman <jpittman@redhat.com>
Reviewed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_process_unsol_compl:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:636:30: warning: variable task set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_process_ofld_cmpl:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:1125:21: warning: variable port set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_init_seq_cleanup_task:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:1468:30: warning: variable orig_task set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_initiate_seq_cleanup:
drivers/scsi/bnx2fc/bnx2fc_io.c:932:19: warning: variable lport set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_initiate_cleanup:
drivers/scsi/bnx2fc/bnx2fc_io.c:1001:19: warning: variable lport set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_process_scsi_cmd_compl:
drivers/scsi/bnx2fc/bnx2fc_io.c:1882:20: warning: variable host set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c: In function bnx2fc_rcv:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:431:26: warning: variable fh set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a race b/w fipvlan request and response path:
=====
qedf_fcoe_process_vlan_resp:113]:2: VLAN response, vid=0xffd.
qedf_initiate_fipvlan_req:165]:2: vlan = 0x6ffd already set.
qedf_set_vlan_id:139]:2: Setting vlan_id=0ffd prio=3.
======
The request thread sees that vlan is already set and fails to call
ctrl_link_up.
Fix:
- While setting vlan_id use local variable and before setting vlan_id.
- Call fcoe_ctlr_link_up in next iteration of fipvlan request.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The list of rports might become stale so we should rather traverse the
discovery list when trying relogin.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Problem Statement:
- Driver has fc_id of 0xcc0200
- Driver gets link down (due to test) and calls fcoe_ctlr_link_down().
- At this point, the fc_id of the initiator port is zeroed out.
- Driver gets a link up 14 seconds later.
- Driver performs FIP VLAN request, gets a response from the switch.
- No change in VLAN is detected.
- Driver then notifies libfcoe via fcoe_ctlr_link_up().
- Libfcoe then issues a multicast discovery solicitation as expected.
- Cisco FCF responds to that correctly.
- Libfcoe at this point starts a 3 sec count-down to allow any other FCFs
to be discovered. However, at this point, it has been 20 seconds since
the last FKA from the driver (which would have been sent prior to
backlink toggle), which causes the CVL to be issued from Cisco CVL from
the switch is dropped by the driver as the vx_port identification
descriptor is present and has value of 0xcc0200, which does not match
the driver's value of 0. Libfcoe completes the 3 sec count down and
proceeds to issue FLOGI as per protocol. Switch rejects FLogi request.
All subsequent FLOGI requests from libfc are rejected by the switch
(possibly because it is now expecting a new solicitation). This
situation will continue until the next link toggle.
Solution:
The Vx_port descriptor in the CVL has three fields:
MAC address
Fabric ID
Port Name
Today, the code checks for both #1 and #2 above. In the case where we went
through a link down, both these will be zero until FLOGI succeeds.
We should change our code to check if any one of these 3 is valid and if
so, handle the CVL (basically switching from AND to OR). The port name
field is definitely expected to be valid always.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Log s_id, d_id, type and command to the log message.
[mkp: fixed warning]
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current code doeesn't support 20Gbps speed for current and supported
speed. Add support for it.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver was wrongly interpreting the supported cap value returned by qed.
Solution: Use QED define macros instead of OS defined for interpreting
supporting speeds.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>