Changes in 5.10.65
locking/mutex: Fix HANDOFF condition
regmap: fix the offset of register error log
regulator: tps65910: Silence deferred probe error
crypto: mxs-dcp - Check for DMA mapping errors
sched/deadline: Fix reset_on_fork reporting of DL tasks
power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors
crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
sched/deadline: Fix missing clock update in migrate_task_rq_dl()
rcu/tree: Handle VM stoppage in stall detection
EDAC/mce_amd: Do not load edac_mce_amd module on guests
posix-cpu-timers: Force next expiration recalc after itimer reset
hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
hrtimer: Ensure timerfd notification for HIGHRES=n
udf: Check LVID earlier
udf: Fix iocharset=utf8 mount option
isofs: joliet: Fix iocharset=utf8 mount option
bcache: add proper error unwinding in bcache_device_init
blk-throtl: optimize IOPS throttle for large IO scenarios
nvme-tcp: don't update queue count when failing to set io queues
nvme-rdma: don't update queue count when failing to set io queues
nvmet: pass back cntlid on successful completion
power: supply: smb347-charger: Add missing pin control activation
power: supply: max17042_battery: fix typo in MAx17042_TOFF
s390/cio: add dev_busid sysfs entry for each subchannel
s390/zcrypt: fix wrong offset index for APKA master key valid state
libata: fix ata_host_start()
crypto: omap - Fix inconsistent locking of device lists
crypto: qat - do not ignore errors from enable_vf2pf_comms()
crypto: qat - handle both source of interrupt in VF ISR
crypto: qat - fix reuse of completion variable
crypto: qat - fix naming for init/shutdown VF to PF notifications
crypto: qat - do not export adf_iov_putmsg()
fcntl: fix potential deadlock for &fasync_struct.fa_lock
udf_get_extendedattr() had no boundary checks.
s390/kasan: fix large PMD pages address alignment check
s390/pci: fix misleading rc in clp_set_pci_fn()
s390/debug: keep debug data on resize
s390/debug: fix debug area life cycle
s390/ap: fix state machine hang after failure to enable irq
power: supply: cw2015: use dev_err_probe to allow deferred probe
m68k: emu: Fix invalid free in nfeth_cleanup()
sched/numa: Fix is_core_idle()
sched: Fix UCLAMP_FLAG_IDLE setting
rcu: Fix to include first blocked task in stall warning
rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees
rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
m68k: Fix invalid RMW_INSNS on CPUs that lack CAS
block: return ELEVATOR_DISCARD_MERGE if possible
spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config
spi: spi-pic32: Fix issue with uninitialized dma_slave_config
genirq/timings: Fix error return code in irq_timings_test_irqs()
irqchip/loongson-pch-pic: Improve edge triggered interrupt support
lib/mpi: use kcalloc in mpi_resize
clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel
block: nbd: add sanity check for first_minor
spi: coldfire-qspi: Use clk_disable_unprepare in the remove function
irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
crypto: qat - use proper type for vf_mask
certs: Trigger creation of RSA module signing key if it's not an RSA key
tpm: ibmvtpm: Avoid error message when process gets signal while waiting
x86/mce: Defer processing of early errors
spi: davinci: invoke chipselect callback
blk-crypto: fix check for too-large dun_bytes
regulator: vctrl: Use locked regulator_get_voltage in probe path
regulator: vctrl: Avoid lockdep warning in enable/disable ops
spi: sprd: Fix the wrong WDG_LOAD_VAL
spi: spi-zynq-qspi: use wait_for_completion_timeout to make zynq_qspi_exec_mem_op not interruptible
EDAC/i10nm: Fix NVDIMM detection
drm/panfrost: Fix missing clk_disable_unprepare() on error in panfrost_clk_init()
drm/gma500: Fix end of loop tests for list_for_each_entry
ASoC: mediatek: mt8183: Fix Unbalanced pm_runtime_enable in mt8183_afe_pcm_dev_probe
media: TDA1997x: enable EDID support
leds: is31fl32xx: Fix missing error code in is31fl32xx_parse_dt()
soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally
media: cxd2880-spi: Fix an error handling path
drm/of: free the right object
bpf: Fix a typo of reuseport map in bpf.h.
bpf: Fix potential memleak and UAF in the verifier.
drm/of: free the iterator object on failure
gve: fix the wrong AdminQ buffer overflow check
libbpf: Fix the possible memory leak on error
ARM: dts: aspeed-g6: Fix HVI3C function-group in pinctrl dtsi
arm64: dts: renesas: r8a77995: draak: Remove bogus adv7511w properties
i40e: improve locking of mac_filter_hash
soc: qcom: rpmhpd: Use corner in power_off
libbpf: Fix removal of inner map in bpf_object__create_map
gfs2: Fix memory leak of object lsi on error return path
firmware: fix theoretical UAF race with firmware cache and resume
driver core: Fix error return code in really_probe()
ionic: cleanly release devlink instance
media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init
media: dvb-usb: fix uninit-value in vp702x_read_mac_addr
media: dvb-usb: Fix error handling in dvb_usb_i2c_init
media: go7007: fix memory leak in go7007_usb_probe
media: go7007: remove redundant initialization
media: rockchip/rga: use pm_runtime_resume_and_get()
media: rockchip/rga: fix error handling in probe
media: coda: fix frame_mem_ctrl for YUV420 and YVU420 formats
media: atomisp: fix the uninitialized use and rename "retvalue"
Bluetooth: sco: prevent information leak in sco_conn_defer_accept()
6lowpan: iphc: Fix an off-by-one check of array index
drm/amdgpu/acp: Make PM domain really work
tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
ARM: dts: meson8: Use a higher default GPU clock frequency
ARM: dts: meson8b: odroidc1: Fix the pwm regulator supply properties
ARM: dts: meson8b: mxq: Fix the pwm regulator supply properties
ARM: dts: meson8b: ec100: Fix the pwm regulator supply properties
net/mlx5e: Prohibit inner indir TIRs in IPoIB
net/mlx5e: Block LRO if firmware asks for tunneled LRO
cgroup/cpuset: Fix a partition bug with hotplug
drm: mxsfb: Enable recovery on underflow
drm: mxsfb: Increase number of outstanding requests on V4 and newer HW
drm: mxsfb: Clear FIFO_CLEAR bit
net: cipso: fix warnings in netlbl_cipsov4_add_std
Bluetooth: mgmt: Fix wrong opcode in the response for add_adv cmd
arm64: dts: renesas: rzg2: Convert EtherAVB to explicit delay handling
arm64: dts: renesas: hihope-rzg2-ex: Add EtherAVB internal rx delay
devlink: Break parameter notification sequence to be before/after unload/load driver
net/mlx5: Fix missing return value in mlx5_devlink_eswitch_inline_mode_set()
i2c: highlander: add IRQ check
leds: lt3593: Put fwnode in any case during ->probe()
leds: trigger: audio: Add an activate callback to ensure the initial brightness is set
media: em28xx-input: fix refcount bug in em28xx_usb_disconnect
media: venus: venc: Fix potential null pointer dereference on pointer fmt
PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
PCI: PM: Enable PME if it can be signaled from D3cold
bpf, samples: Add missing mprog-disable to xdp_redirect_cpu's optstring
soc: qcom: smsm: Fix missed interrupts if state changes while masked
debugfs: Return error during {full/open}_proxy_open() on rmmod
Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow
PM: EM: Increase energy calculation precision
selftests/bpf: Fix bpf-iter-tcp4 test to print correctly the dest IP
drm/msm/mdp4: refactor HW revision detection into read_mdp_hw_revision
drm/msm/mdp4: move HW revision detection to earlier phase
drm/msm/dpu: make dpu_hw_ctl_clear_all_blendstages clear necessary LMs
arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7
counter: 104-quad-8: Return error when invalid mode during ceiling_write
cgroup/cpuset: Miscellaneous code cleanup
cgroup/cpuset: Fix violation of cpuset locking rule
ASoC: Intel: Fix platform ID matching
Bluetooth: fix repeated calls to sco_sock_kill
drm/msm/dsi: Fix some reference counted resource leaks
net/mlx5: Register to devlink ingress VLAN filter trap
net/mlx5: Fix unpublish devlink parameters
ASoC: rt5682: Implement remove callback
ASoC: rt5682: Properly turn off regulators if wrong device ID
usb: dwc3: meson-g12a: add IRQ check
usb: dwc3: qcom: add IRQ check
usb: gadget: udc: at91: add IRQ check
usb: gadget: udc: s3c2410: add IRQ check
usb: phy: fsl-usb: add IRQ check
usb: phy: twl6030: add IRQ checks
usb: gadget: udc: renesas_usb3: Fix soc_device_match() abuse
selftests/bpf: Fix test_core_autosize on big-endian machines
devlink: Clear whole devlink_flash_notify struct
samples: pktgen: add missing IPv6 option to pktgen scripts
Bluetooth: Move shutdown callback before flushing tx and rx queue
PM: cpu: Make notifier chain use a raw_spinlock_t
usb: host: ohci-tmio: add IRQ check
usb: phy: tahvo: add IRQ check
libbpf: Re-build libbpf.so when libbpf.map changes
mac80211: Fix insufficient headroom issue for AMSDU
locking/lockdep: Mark local_lock_t
locking/local_lock: Add missing owner initialization
lockd: Fix invalid lockowner cast after vfs_test_lock
nfsd4: Fix forced-expiry locking
arm64: dts: marvell: armada-37xx: Extend PCIe MEM space
clk: staging: correct reference to config IOMEM to config HAS_IOMEM
i2c: synquacer: fix deferred probing
firmware: raspberrypi: Keep count of all consumers
firmware: raspberrypi: Fix a leak in 'rpi_firmware_get()'
usb: gadget: mv_u3d: request_irq() after initializing UDC
mm/swap: consider max pages in iomap_swapfile_add_extent
lkdtm: replace SCSI_DISPATCH_CMD with SCSI_QUEUE_RQ
Bluetooth: add timeout sanity check to hci_inquiry
i2c: iop3xx: fix deferred probing
i2c: s3c2410: fix IRQ check
i2c: fix platform_get_irq.cocci warnings
i2c: hix5hd2: fix IRQ check
gfs2: init system threads before freeze lock
rsi: fix error code in rsi_load_9116_firmware()
rsi: fix an error code in rsi_probe()
ASoC: Intel: kbl_da7219_max98927: Fix format selection for max98373
ASoC: Intel: Skylake: Leave data as is when invoking TLV IPCs
ASoC: Intel: Skylake: Fix module resource and format selection
mmc: sdhci: Fix issue with uninitialized dma_slave_config
mmc: dw_mmc: Fix issue with uninitialized dma_slave_config
mmc: moxart: Fix issue with uninitialized dma_slave_config
bpf: Fix possible out of bound write in narrow load handling
CIFS: Fix a potencially linear read overflow
i2c: mt65xx: fix IRQ check
i2c: xlp9xx: fix main IRQ check
usb: ehci-orion: Handle errors of clk_prepare_enable() in probe
usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available
usb: bdc: Fix a resource leak in the error handling path of 'bdc_probe()'
tty: serial: fsl_lpuart: fix the wrong mapbase value
ASoC: wcd9335: Fix a double irq free in the remove function
ASoC: wcd9335: Fix a memory leak in the error handling path of the probe function
ASoC: wcd9335: Disable irq on slave ports in the remove function
iwlwifi: follow the new inclusive terminology
iwlwifi: skip first element in the WTAS ACPI table
ice: Only lock to update netdev dev_addr
ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point()
atlantic: Fix driver resume flow.
bcma: Fix memory leak for internally-handled cores
brcmfmac: pcie: fix oops on failure to resume and reprobe
ipv6: make exception cache less predictible
ipv4: make exception cache less predictible
net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed
net: qualcomm: fix QCA7000 checksum handling
octeontx2-af: Fix loop in free and unmap counter
octeontx2-af: Fix static code analyzer reported issues
octeontx2-af: Set proper errorcode for IPv4 checksum errors
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
ASoC: rt5682: Remove unused variable in rt5682_i2c_remove()
iwlwifi Add support for ax201 in Samsung Galaxy Book Flex2 Alpha
f2fs: guarantee to write dirty data when enabling checkpoint back
time: Handle negative seconds correctly in timespec64_to_ns()
io_uring: IORING_OP_WRITE needs hash_reg_file set
bio: fix page leak bio_add_hw_page failure
tty: Fix data race between tiocsti() and flush_to_ldisc()
perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
x86/resctrl: Fix a maybe-uninitialized build warning treated as error
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter
ARM: dts: at91: add pinctrl-{names, 0} for all gpios
fuse: truncate pagecache on atomic_o_trunc
fuse: flush extending writes
IMA: remove -Wmissing-prototypes warning
IMA: remove the dependency on CRYPTO_MD5
fbmem: don't allow too huge resolutions
backlight: pwm_bl: Improve bootloader/kernel device handover
clk: kirkwood: Fix a clocking boot regression
Linux 5.10.65
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie0b9306ba6ee4193de3200df7cdacaeba152b83e
[ Upstream commit 866663b7b52d2da267b28e12eed89ee781b8fed1 ]
When merging one bio to request, if they are discard IO and the queue
supports multi-range discard, we need to return ELEVATOR_DISCARD_MERGE
because both block core and related drivers(nvme, virtio-blk) doesn't
handle mixed discard io merge(traditional IO merge together with
discard merge) well.
Fix the issue by returning ELEVATOR_DISCARD_MERGE in this situation,
so both blk-mq and drivers just need to handle multi-range discard.
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 2705dfb20947 ("block: fix discard request merge")
Link: https://lore.kernel.org/r/20210729034226.1591070-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Add ANDROID_OEM_DATA to block_device_operations which allows a new
vendor specific function call.
Bug: 193106408
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Change-Id: I472f1cc25698c841841822908c4827545b8593df
There are a lot of different structures that need to have a "frozen" abi
for the next 5+ years. Add padding to a lot of them in order to be able
to handle any future changes that might be needed due to LTS and
security fixes that might come up.
It's a best guess, based on what has happened in the past from the
5.4.0..5.4.129 release (1 1/2 years). Yes, past changes do not mean
that future changes will also be needed in the same area, but that is a
hint that those areas are both well maintained and looked after, and
there have been previous problems found in them.
Also the list of structures that are being required based on OEM usage
in the android/ symbol lists were consulted as that's a larger list than
what has been changed in the past.
Hopefully we caught everything we need to worry about, only time will
tell...
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I880bbcda0628a7459988eeb49d18655522697664
Commit "block: Do not accept any requests while suspended" broke the UFS
driver. In the upstream kernel this has been fixed by commit b294ff3e3449
("scsi: ufs: core: Enable power management for wlun"). Backporting that
commit or backporting the entire v5.14-rc1 UFS driver is too risky.
Hence revert the block layer patch that is incompatible with the v5.10
UFS driver power management code.
This reverts commit d55d15a332.
Bug: 193181075
Change-Id: Ic50d4e1df98d7ed393bf9797787225ae22e5d7a3
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Add ANDROID_OEM_DATA for implement of oem gki
Bug: 188749221
Change-Id: I96b1c690fda172d0c490e944557a674a37620742
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Remove this structure member since it is no longer used.
Cc: Changheun Lee <nanich.lee@samsung.com>
Cc: Jaegeuk Kim <jaegeuk@google.com>
Bug: 182716953
Change-Id: I01475bd875fd17c36d02203dc6de5293d7c8db38
(cherry picked from commit 35c820e71565d1fa835b82499359218b219828ac)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
bio size can grow up to 4GB when muli-page bvec is enabled.
but sometimes it would lead to inefficient behaviors.
in case of large chunk direct I/O, - 32MB chunk read in user space -
all pages for 32MB would be merged to a bio structure if the pages
physical addresses are contiguous. it makes some delay to submit
until merge complete. bio max size should be limited to a proper size.
When 32MB chunk read with direct I/O option is coming from userspace,
kernel behavior is below now in do_direct_IO() loop. it's timeline.
| bio merge for 32MB. total 8,192 pages are merged.
| total elapsed time is over 2ms.
|------------------ ... ----------------------->|
| 8,192 pages merged a bio.
| at this time, first bio submit is done.
| 1 bio is split to 32 read request and issue.
|--------------->
|--------------->
|--------------->
......
|--------------->
|--------------->|
total 19ms elapsed to complete 32MB read done from device. |
If bio max size is limited with 1MB, behavior is changed below.
| bio merge for 1MB. 256 pages are merged for each bio.
| total 32 bio will be made.
| total elapsed time is over 2ms. it's same.
| but, first bio submit timing is fast. about 100us.
|--->|--->|--->|---> ... -->|--->|--->|--->|--->|
| 256 pages merged a bio.
| at this time, first bio submit is done.
| and 1 read request is issued for 1 bio.
|--------------->
|--------------->
|--------------->
......
|--------------->
|--------------->|
total 17ms elapsed to complete 32MB read done from device. |
As a result, read request issue timing is faster if bio max size is limited.
Current kernel behavior with multipage bvec, super large bio can be created.
And it lead to delay first I/O request issue.
Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210503095203.29076-1-nanich.lee@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 182716953
(cherry picked from commit cd2c7545ae1beac3b6aae033c7f31193b3255946)
Change-Id: Ie3876daa495535dc7f856ed9a281e65d72a437c1
Signed-off-by: Bart Van Assche <bvanassche@google.com>
* aosp/upstream-f2fs-stable-linux-5.10.y:
fs-verity: support reading signature with ioctl
fs-verity: support reading descriptor with ioctl
fs-verity: support reading Merkle tree with ioctl
fs-verity: add FS_IOC_READ_VERITY_METADATA ioctl
fs-verity: don't pass whole descriptor to fsverity_verify_signature()
fs-verity: factor out fsverity_get_descriptor()
fs: simplify freeze_bdev/thaw_bdev
f2fs: remove FAULT_ALLOC_BIO
f2fs: use blkdev_issue_flush in __submit_flush_wait
f2fs: remove a few bd_part checks
Documentation: f2fs: fix typo s/automaic/automatic
f2fs: give a warning only for readonly partition
f2fs: don't grab superblock freeze for flush/ckpt thread
f2fs: add ckpt_thread_ioprio sysfs node
f2fs: introduce checkpoint_merge mount option
f2fs: relocate inline conversion from mmap() to mkwrite()
f2fs: fix a wrong condition in __submit_bio
f2fs: remove unnecessary initialization in xattr.c
f2fs: fix to avoid inconsistent quota data
f2fs: flush data when enabling checkpoint back
f2fs: deprecate f2fs_trace_io
f2fs: Remove readahead collision detection
f2fs: remove unused stat_{inc, dec}_atomic_write
f2fs: introduce sb_status sysfs node
f2fs: fix to use per-inode maxbytes
f2fs: compress: fix potential deadlock
libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()
f2fs: fix to set/clear I_LINKABLE under i_lock
f2fs: fix null page reference in redirty_blocks
f2fs: clean up post-read processing
f2fs: trival cleanup in move_data_block()
f2fs: fix out-of-repair __setattr_copy()
f2fs: fix to tag FIEMAP_EXTENT_MERGED in f2fs_fiemap()
f2fs: introduce a new per-sb directory in sysfs
f2fs: compress: support compress level
f2fs: compress: deny setting unsupported compress algorithm
f2fs: relocate f2fs_precache_extents()
f2fs: enforce the immutable flag on open files
f2fs: enhance to update i_mode and acl atomically in f2fs_setattr()
f2fs: fix to set inode->i_mode correctly for posix_acl_update_mode
f2fs: Replace expression with offsetof()
f2fs: handle unallocated section and zone on pinned/atgc
Bug: 178226640
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I95112779a0a75f3cdbc222126a198d54f1e378ac
Store the frozen superblock in struct block_device to avoid the awkward
interface that can return a sb only used a cookie, an ERR_PTR or NULL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[ Upstream commit 52abca64fd9410ea6c9a3a74eab25663b403d7da ]
blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime
power management state. Now that SCSI domain validation no longer depends
on this behavior, modify the behavior of blk_queue_enter() as follows:
- Do not accept any requests while suspended.
- Only process power management requests while suspending or resuming.
Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended
causes runtime-suspended devices not to resume as they should. The request
which should cause a runtime resume instead gets issued directly, without
resuming the device first. Of course the device can't handle it properly,
the I/O fails, and the device remains suspended.
The problem is fixed by checking that the queue's runtime-PM status isn't
RPM_SUSPENDED before allowing a request to be issued, and queuing a
runtime-resume request if it is. In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the
code is unified by merging the surrounding checks into the routine. If the
queue isn't set up for runtime PM, or there currently is no restriction on
allowed requests, the request is allowed. Likewise if the BLK_MQ_REQ_PM
flag is set and the status isn't RPM_SUSPENDED. Otherwise a runtime resume
is queued and the request is blocked until conditions are more suitable.
[ bvanassche: modified commit message and removed Cc: stable because
without the previous patches from this series this patch would break
parallel SCSI domain validation + introduced queue_rpm_status() ]
Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that
override will be incorrectly ignored.
Old blk_max_size_offset() branching, prior to commit 3ee16db390,
must be used only if passed 'chunk_sectors' override is zero.
Fixes: 3ee16db390 ("dm: fix IO splitting")
Cc: stable@vger.kernel.org # 5.9
Reported-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
chunk_sectors must reflect the most limited of all devices in the
IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
.iterate_devices method no longer had there IO split properly.
And commit 5091cdec56 ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).
Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
add optional chunk_sectors override argument to blk_max_size_offset()
and update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.
Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.
Fixes: 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56 ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Pull block driver updates from Jens Axboe:
"Here are the driver updates for 5.10.
A few SCSI updates in here too, in coordination with Martin as they
depend on core block changes for the shared tag bitmap.
This contains:
- NVMe pull requests via Christoph:
- fix keep alive timer modification (Amit Engel)
- order the PCI ID list more sensibly (Andy Shevchenko)
- cleanup the open by controller helper (Chaitanya Kulkarni)
- use an xarray for the CSE log lookup (Chaitanya Kulkarni)
- support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
- fix nvme_ns_report_zones (Christoph Hellwig)
- add a sanity check to nvmet-fc (James Smart)
- fix interrupt allocation when too many polled queues are
specified (Jeffle Xu)
- small nvmet-tcp optimization (Mark Wunderlich)
- fix a controller refcount leak on init failure (Chaitanya
Kulkarni)
- misc cleanups (Chaitanya Kulkarni)
- major refactoring of the scanning code (Christoph Hellwig)
- MD updates via Song:
- Bug fixes in bitmap code, from Zhao Heming
- Fix a work queue check, from Guoqing Jiang
- Fix raid5 oops with reshape, from Song Liu
- Clean up unused code, from Jason Yan
- Discard improvements, from Xiao Ni
- raid5/6 page offset support, from Yufen Yu
- Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
Hannes)
- null_blk open/active zone limit support (Niklas)
- Set of bcache updates (Coly, Dongsheng, Qinglang)"
* tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
md/raid5: fix oops during stripe resizing
md/bitmap: fix memory leak of temporary bitmap
md: fix the checking of wrong work queue
md/bitmap: md_bitmap_get_counter returns wrong blocks
md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
md/raid0: remove unused function is_io_in_chunk_boundary()
nvme-core: remove extra condition for vwc
nvme-core: remove extra variable
nvme: remove nvme_identify_ns_list
nvme: refactor nvme_validate_ns
nvme: move nvme_validate_ns
nvme: query namespace identifiers before adding the namespace
nvme: revalidate zone bitmaps in nvme_update_ns_info
nvme: remove nvme_update_formats
nvme: update the known admin effects
nvme: set the queue limits in nvme_update_ns_info
nvme: remove the 0 lba_shift check in nvme_update_ns_info
nvme: clean up the check for too large logic block sizes
nvme: freeze the queue over ->lba_shift updates
nvme: factor out a nvme_configure_metadata helper
...
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
Always return BLK_ZONED_NONE if zoned device support is not enabled.
This allows various compiler optimizations including the dead code
elimination that we so like for avoiding ifdefs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Also move the definition from the public blkdev.h to the private
block/blk.h header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Also move the definition from the public blkdev.h to the private
block/blk.h header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The field of 'q_usage_counter' is always fetched in fast path of every
block driver, and move it into front of 'request_queue', so it can be
fetched into 1st cacheline of 'request_queue' instance.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All remaining callers of bdget() outside of fs/block_dev.c want to get a
reference to the struct block_device for a given struct hd_struct. Add
a helper just for that and then mark bdget static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for
REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where
applicable.
Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT. Also
update submit_bio_checks() to verify it is set for REQ_NOWAIT bios.
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a littler helper to make the somewhat arcane bd_contains checks a
little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it. This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.
One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore. It is replaced with a queue attribute which
also is writable for easier testing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Drivers shouldn't really mess with the readahead size, as that is a VM
concept. Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk. Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors. To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is possible, albeit more unlikely, for a block device to have a non
power-of-2 for chunk_sectors (e.g. 10+2 RAID6 with 128K chunk_sectors,
which results in a full-stripe size of 1280K. This causes the RAID6's
io_opt to be advertised as 1280K, and a stacked device _could_ then be
made to use a blocksize, aka chunk_sectors, that matches non power-of-2
io_opt of underlying RAID6 -- resulting in stacked device's
chunk_sectors being a non power-of-2).
Update blk_queue_chunk_sectors() and blk_max_size_offset() to
accommodate drivers that need a non power-of-2 chunk_sectors.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as
regular disks. In this case, ensure that command completion is correctly
executed by changing sd_zbc_complete() to return good_bytes instead of 0
and causing a hang during device probe (endless retries).
When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to
have partitions, it will be used as a regular disk. In this case, make sure
to not do anything in sd_zbc_revalidate_zones() as that triggers warnings.
Since all these different cases result in subtle settings of the disk queue
zoned model, introduce the block layer helper function
blk_queue_set_zoned() to generically implement setting up the effective
zoned model according to the disk type, the presence of partitions on the
disk and CONFIG_BLK_DEV_ZONED configuration.
Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
Fixes: b72053072c ("block: allow partitions on host aware zone devices")
Cc: <stable@vger.kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
These functions can be used to enable iostat for partitions on devices
like md, bcache.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Discarding blocks and buffers under a mounted filesystem is hardly
anything admin wants to do. Usually it will confuse the filesystem and
sometimes the loss of buffer_head state (including b_private field) can
even cause crashes like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O --------- - - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
Call Trace:
__jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
kjournald2+0xbd/0x270 [jbd2]
So if we don't have block device open with O_EXCL already, claim the
block device while we truncate buffer cache. This makes sure any
exclusive block device user (such as filesystem) cannot operate on the
device while we are discarding buffer cache.
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: fix !CONFIG_BLOCK error in truncate_bdev_range()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For when using a shared sbitmap, no longer should the number of active
request queues per hctx be relied on for when judging how to share the tag
bitmap.
Instead maintain the number of active request queues per tag_set, and make
the judgement based on that.
Originally-from: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The per-hctx nr_active value can no longer be used to fairly assign a share
of tag depth per request queue for when using a shared sbitmap, as it does
not consider that the tags are shared tags over all hctx's.
For this case, record the nr_active_requests per request_queue, and make
the judgement based on that value.
Co-developed-with: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The alignment offset is only used in slow path callers, so just calculate
it on the fly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The alignment offset is only used in slow path callers, so just calculate
it on the fly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
request_queue.rpm_status is assigned values of the rpm_status enum only,
so reflect that in its type.
Note that including <linux/pm.h> is (currently) a no-op, as it is
already included through <linux/genhd.h> and <linux/device.h>, but it is
better to play it safe.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block stacking updates from Jens Axboe:
"The stacking related fixes depended on both the core block and drivers
branches, so here's a topic branch with that change.
Outside of that, a late fix from Johannes for zone revalidation"
* tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-block:
block: don't do revalidate zones on invalid devices
block: remove blk_queue_stack_limits
block: remove bdev_stack_limits
block: inherit the zoned characteristics in blk_stack_limits
Pull block driver updates from Jens Axboe:
- NVMe:
- ZNS support (Aravind, Keith, Matias, Niklas)
- Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David,
Dongli, Max, Sagi)
- null_blk zone capacity support (Aravind)
- MD:
- raid5/6 fixes (ChangSyun)
- Warning fixes (Damien)
- raid5 stripe fixes (Guoqing, Song, Yufen)
- sysfs deadlock fix (Junxiao)
- raid10 deadlock fix (Vitaly)
- struct_size conversions (Gustavo)
- Set of bcache updates/fixes (Coly)
* tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits)
md/raid5: Allow degraded raid6 to do rmw
md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
raid5: don't duplicate code for different paths in handle_stripe
raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show
md: print errno in super_written
md/raid5: remove the redundant setting of STRIPE_HANDLE
md: register new md sysfs file 'uuid' read-only
md: fix max sectors calculation for super 1.0
nvme-loop: remove extra variable in create ctrl
nvme-loop: set ctrl state connecting after init
nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths
nvme-multipath: fix logic for non-optimized paths
nvme-rdma: fix controller reset hang during traffic
nvme-tcp: fix controller reset hang during traffic
nvmet: introduce the passthru Kconfig option
nvmet: introduce the passthru configfs interface
nvmet: Add passthru enable/disable helpers
nvmet: add passthru code to process commands
nvme: export nvme_find_get_ns() and nvme_put_ns()
nvme: introduce nvme_ctrl_get_by_path()
...
Pull io_uring updates from Jens Axboe:
"Lots of cleanups in here, hardening the code and/or making it easier
to read and fixing bugs, but a core feature/change too adding support
for real async buffered reads. With the latter in place, we just need
buffered write async support and we're done relying on kthreads for
the fast path. In detail:
- Cleanup how memory accounting is done on ring setup/free (Bijan)
- sq array offset calculation fixup (Dmitry)
- Consistently handle blocking off O_DIRECT submission path (me)
- Support proper async buffered reads, instead of relying on kthread
offload for that. This uses the page waitqueue to drive retries
from task_work, like we handle poll based retry. (me)
- IO completion optimizations (me)
- Fix race with accounting and ring fd install (me)
- Support EPOLLEXCLUSIVE (Jiufei)
- Get rid of the io_kiocb unionizing, made possible by shrinking
other bits (Pavel)
- Completion side cleanups (Pavel)
- Cleanup REQ_F_ flags handling, and kill off many of them (Pavel)
- Request environment grabbing cleanups (Pavel)
- File and socket read/write cleanups (Pavel)
- Improve kiocb_set_rw_flags() (Pavel)
- Tons of fixes and cleanups (Pavel)
- IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)"
* tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block: (127 commits)
io_uring: flip if handling after io_setup_async_rw
fs: optimise kiocb_set_rw_flags()
io_uring: don't touch 'ctx' after installing file descriptor
io_uring: get rid of atomic FAA for cq_timeouts
io_uring: consolidate *_check_overflow accounting
io_uring: fix stalled deferred requests
io_uring: fix racy overflow count reporting
io_uring: deduplicate __io_complete_rw()
io_uring: de-unionise io_kiocb
io-wq: update hash bits
io_uring: fix missing io_queue_linked_timeout()
io_uring: mark ->work uninitialised after cleanup
io_uring: deduplicate io_grab_files() calls
io_uring: don't do opcode prep twice
io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works
io_uring: batch put_task_struct()
tasks: add put_task_struct_many()
io_uring: return locked and pinned page accounting
io_uring: don't miscount pinned memory
io_uring: don't open-code recv kbuf managment
...
Pull core block updates from Jens Axboe:
"Good amount of cleanups and tech debt removals in here, and as a
result, the diffstat shows a nice net reduction in code.
- Softirq completion cleanups (Christoph)
- Stop using ->queuedata (Christoph)
- Cleanup bd claiming (Christoph)
- Use check_events, moving away from the legacy media change
(Christoph)
- Use inode i_blkbits consistently (Christoph)
- Remove old unused writeback congestion bits (Christoph)
- Cleanup/unify submission path (Christoph)
- Use bio_uninit consistently, instead of bio_disassociate_blkg
(Christoph)
- sbitmap cleared bits handling (John)
- Request merging blktrace event addition (Jan)
- sysfs add/remove race fixes (Luis)
- blk-mq tag fixes/optimizations (Ming)
- Duplicate words in comments (Randy)
- Flush deferral cleanup (Yufen)
- IO context locking/retry fixes (John)
- struct_size() usage (Gustavo)
- blk-iocost fixes (Chengming)
- blk-cgroup IO stats fixes (Boris)
- Various little fixes"
* tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
block: blk-timeout: delete duplicated word
block: blk-mq-sched: delete duplicated word
block: blk-mq: delete duplicated word
block: genhd: delete duplicated words
block: elevator: delete duplicated word and fix typos
block: bio: delete duplicated words
block: bfq-iosched: fix duplicated word
iocost_monitor: start from the oldest usage index
iocost: Fix check condition of iocg abs_vdebt
block: Remove callback typedefs for blk_mq_ops
block: Use non _rcu version of list functions for tag_set_list
blk-cgroup: show global disk stats in root cgroup io.stat
blk-cgroup: make iostat functions visible to stat printing
block: improve discard bio alignment in __blkdev_issue_discard()
block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
block: defer flush request no matter whether we have elevator
block: make blk_timeout_init() static
block: remove retry loop in ioc_release_fn()
block: remove unnecessary ioc nested locking
block: integrate bd_start_claiming into __blkdev_get
...
This function is just a tiny wrapper around blk_stack_limit and has
two callers. Simplify the stack a bit by open coding it in the two
callers.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lift the code from device mapper into blk_stack_limits to inherity
the stacking limitations. This ensures we do the right thing for
all stacked zoned block devices.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-5.9/drivers: (38 commits)
block: add max_active_zones to blk-sysfs
block: add max_open_zones to blk-sysfs
s390/dasd: Use struct_size() helper
s390/dasd: fix inability to use DASD with DIAG driver
md-cluster: fix wild pointer of unlock_all_bitmaps()
md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing stripes
md: fix deadlock causing by sysfs_notify
md: improve io stats accounting
md: raid0/linear: fix dereference before null check on pointer mddev
rsxx: switch from 'pci_free_consistent()' to 'dma_free_coherent()'
nvme: remove ns->disk checks
nvme-pci: use standard block status symbolic names
nvme-pci: use the consistent return type of nvme_pci_iod_alloc_size()
nvme-pci: add a blank line after declarations
nvme-pci: fix some comments issues
nvme-pci: remove redundant segment validation
nvme: document quirked Intel models
nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs
nvme: support for zoned namespaces
nvme: support for multiple Command Sets Supported and Effects log pages
...
* for-5.9/block: (124 commits)
blk-cgroup: show global disk stats in root cgroup io.stat
blk-cgroup: make iostat functions visible to stat printing
block: improve discard bio alignment in __blkdev_issue_discard()
block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
block: defer flush request no matter whether we have elevator
block: make blk_timeout_init() static
block: remove retry loop in ioc_release_fn()
block: remove unnecessary ioc nested locking
block: integrate bd_start_claiming into __blkdev_get
block: use bd_prepare_to_claim directly in the loop driver
block: refactor bd_start_claiming
block: simplify the restart case in __blkdev_get
Revert "blk-rq-qos: remove redundant finish_wait to rq_qos_wait."
block: always remove partitions from blk_drop_partitions()
block: relax jiffies rounding for timeouts
blk-mq: remove redundant validation in __blk_mq_end_request()
blk-mq: Remove unnecessary local variable
writeback: remove bdi->congested_fn
writeback: remove struct bdi_writeback_congested
writeback: remove {set,clear}_wb_congested
...
The arcane magic in bd_start_claiming is only needed to be able to claim
a block_device that hasn't been fully set up. Switch the loop driver
that claims from the ioctl path with a fully set up struct block_device
to just use the much simpler bd_prepare_to_claim directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new max_active zones definition in the sysfs documentation.
This definition will be common for all devices utilizing the zoned block
device support in the kernel.
Export max_active_zones according to this new definition for NVMe Zoned
Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
the kernel), and ZBC SCSI devices.
Add the new max_active_zones member to struct request_queue, rather
than as a queue limit, since this property cannot be split across stacking
drivers.
For SCSI devices, even though max active zones is not part of the ZBC/ZAC
spec, export max_active_zones as 0, signifying "no limit".
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new max_open_zones definition in the sysfs documentation.
This definition will be common for all devices utilizing the zoned block
device support in the kernel.
Export max open zones according to this new definition for NVMe Zoned
Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
the kernel), and ZBC SCSI devices.
Add the new max_open_zones member to struct request_queue, rather
than as a queue limit, since this property cannot be split across stacking
drivers.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
md is the last driver using the legacy media_changed method. Switch
it over to (not so) new ->clear_events approach, which also removes the
need for the ->revalidate_disk method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: remove unused 'bdops' variable in disk_clear_events()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>