Merge tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - The largest change for this cycle is the DM zoned target's metadata
   version 2 feature that adds support for pairing regular block devices
   with a zoned device to ease the performance impact associated with
   the finite number of random zones on a zoned device.

   The changes came in three batches: the first prepared for and then
   added the ability to pair a single regular block device, the second
   was a batch of fixes to improve zoned's reclaim heuristic, and the
   third removed the limitation of only adding a single additional
   regular block device to allow many devices.

   Testing has shown linear scaling as more devices are added.

 - Add new emulated block size (ebs) target that emulates a smaller
   logical_block_size than a block device supports.

   The primary use-case is to emulate "512e" devices that have 512 byte
   logical_block_size and 4KB physical_block_size. This is useful to
   some legacy applications that otherwise wouldn't be able to be used
   on 4K devices because they depend on issuing IO in 512 byte
   granularity.

 - Add discard interfaces to DM bufio. First consumer of the interface
   is the dm-ebs target that makes heavy use of dm-bufio.

 - Fix DM crypt's block queue_limits stacking to not truncate
   logical_block_size.

 - Add Documentation for DM integrity's status line.

 - Switch DMDEBUG from a compile time config option to instead use
   dynamic debug via pr_debug.

 - Fix DM multipath target's heuristic for how it manages
   "queue_if_no_path" state internally.

   DM multipath now avoids disabling "queue_if_no_path" unless it is
   actually needed (e.g. in response to a configured timeout or an
   explicit "fail_if_no_path" message).

   This fixes reports of spurious -EIO being returned to userspace
   applications during fault tolerance testing with an NVMe backend.
   Added various dynamic DMDEBUG messages to assist with debugging
   queue_if_no_path in the future.

 - Add a new DM multipath "Historical Service Time" Path Selector.

 - Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.

 - Improve DM writecache target performance by using explicit cache
   flushing for target's single-threaded usecase and a small cleanup to
   remove unnecessary test in persistent_memory_claim.

 - Other small cleanups in DM core, dm-persistent-data, and DM
   integrity.

* tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
  dm crypt: avoid truncating the logical block size
  dm mpath: add DM device name to Failing/Reinstating path log messages
  dm mpath: enhance queue_if_no_path debugging
  dm mpath: restrict queue_if_no_path state machine
  dm mpath: simplify __must_push_back
  dm zoned: check superblock location
  dm zoned: prefer full zones for reclaim
  dm zoned: select reclaim zone based on device index
  dm zoned: allocate zone by device index
  dm zoned: support arbitrary number of devices
  dm zoned: move random and sequential zones into struct dmz_dev
  dm zoned: per-device reclaim
  dm zoned: add metadata pointer to struct dmz_dev
  dm zoned: add device pointer to struct dm_zone
  dm zoned: allocate temporary superblock for tertiary devices
  dm zoned: convert to xarray
  dm zoned: add a 'reserved' zone flag
  dm zoned: improve logging messages for reclaim
  dm zoned: avoid unnecessary device recalulation for secondary superblock
  dm zoned: add debugging message for reading superblocks
  ...
This commit is contained in:
Linus Torvalds
2020-06-05 15:45:03 -07:00
30 changed files with 2779 additions and 649 deletions


@@ -0,0 +1,51 @@
======
dm-ebs
======
This target is similar to the linear target except that it emulates
a smaller logical block size on a device with a larger logical block
size. Its main purpose is to provide emulation of 512 byte sectors on
devices that do not provide this emulation (i.e. 4K native disks).
Supported emulated logical block sizes are 512, 1024, 2048 and 4096 bytes.
Underlying block size can be set to > 4K to test buffering larger units.
Table parameters
----------------
<dev path> <offset> <emulated sectors> [<underlying sectors>]
Mandatory parameters:
<dev path>:
Full pathname to the underlying block-device,
or a "major:minor" device-number.
<offset>:
Starting sector within the device;
has to be a multiple of <emulated sectors>.
<emulated sectors>:
Number of sectors defining the logical block size to be emulated;
1, 2, 4, 8 sectors of 512 bytes supported.
Optional parameter:
<underlying sectors>:
Number of sectors defining the logical block size of <dev path>.
2^N supported, e.g. 8 = emulate 8 sectors of 512 bytes = 4KiB.
If not provided, the logical block size of <dev path> will be used.
Examples:
Emulate 1 sector = 512 bytes logical block size on /dev/sda starting at
offset 1024 sectors with the underlying device's block size set automatically:
ebs /dev/sda 1024 1
Emulate 2 sectors = 1KiB logical block size on /dev/sda starting at
offset 128 sectors, enforcing a 2KiB underlying device block size.
This presumes a logical block size of 2KiB or less on /dev/sda:
ebs /dev/sda 128 2 4
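
The parameter rules above can be sketched as a small shell helper that checks the arguments and prints a complete table line. This is only an illustration: the function name `ebs_table` and its first argument (the mapping length in 512-byte sectors) are assumptions made for this example, and the leading `0 <length>` fields follow the general dmsetup table convention rather than anything stated in this file.

```shell
# Hypothetical helper: validate dm-ebs parameters and print a table line.
# Usage: ebs_table <length> <dev path> <offset> <emulated sectors> [<underlying sectors>]
ebs_table() {
    length=$1   # mapping length in 512-byte sectors (assumed input)
    dev=$2
    offset=$3
    ebs=$4
    ubs=${5:-}

    # Only 1, 2, 4 or 8 emulated sectors are documented as supported.
    case "$ebs" in
        1|2|4|8) ;;
        *) echo "emulated sectors must be 1, 2, 4 or 8" >&2; return 1 ;;
    esac

    # The offset has to be a multiple of <emulated sectors>.
    if [ $(( offset % ebs )) -ne 0 ]; then
        echo "offset must be a multiple of emulated sectors" >&2
        return 1
    fi

    # The optional underlying block size must be a power of two.
    if [ -n "$ubs" ]; then
        if [ "$ubs" -le 0 ] || [ $(( ubs & (ubs - 1) )) -ne 0 ]; then
            echo "underlying sectors must be a power of two" >&2
            return 1
        fi
    fi

    echo "0 $length ebs $dev $offset $ebs${ubs:+ $ubs}"
}

# Mirrors the two examples above, with an arbitrary length of 2048 sectors.
ebs_table 2048 /dev/sda 1024 1
ebs_table 2048 /dev/sda 128 2 4
```

The printed line could then be piped to `dmsetup create <name>`, which reads a table from standard input when none is given on the command line.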


@@ -193,6 +193,14 @@ should not be changed when reloading the target because the layout of disk
data depend on them and the reloaded target would be non-functional.
Status line:
1. the number of integrity mismatches
2. provided data sectors - that is the number of sectors that the user
could use
3. the current recalculating position (or '-' if we didn't recalculate)
The layout of the formatted block device:
* reserved sectors


@@ -37,9 +37,13 @@ Algorithm
dm-zoned implements an on-disk buffering scheme to handle non-sequential
write accesses to the sequential zones of a zoned block device.
Conventional zones are used for caching as well as for storing internal
metadata. It can also use a regular block device together with the zoned
block device; in that case the regular block device will be split logically
in zones with the same size as the zoned block device. These zones will be
placed in front of the zones from the zoned block device and will be handled
just like conventional zones.
The zones of the device(s) are separated into 2 types:
1) Metadata zones: these are conventional zones used to store metadata.
Metadata zones are not reported as useable capacity to the user.
@@ -127,6 +131,13 @@ resumed. Flushing metadata thus only temporarily delays write and
discard requests. Read requests can be processed concurrently while
metadata flush is being executed.
If a regular device is used in conjunction with the zoned block device,
a third set of metadata (without the zone bitmaps) is written to the
start of the zoned block device. This metadata has a generation counter of
'0' and will never be updated during normal operation; it just serves for
identification purposes. The first and second copy of the metadata
are located at the start of the regular block device.
Usage
=====
@@ -138,9 +149,46 @@ Ex::
dmzadm --format /dev/sdxx
For a formatted device, the target can be created normally with the
dmsetup utility. The only parameter that dm-zoned requires is the
underlying zoned block device name. Ex::
echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
dmsetup create dmz-`basename ${dev}`
If two drives are to be used, both devices must be specified, with the
regular block device as the first device.
Ex::
dmzadm --format /dev/sdxx /dev/sdyy
Formatted device(s) can also be started with the dmzadm utility:
Ex::
dmzadm --start /dev/sdxx /dev/sdyy
Information about the internal layout and current usage of the zones can
be obtained with the 'status' callback from dmsetup:
Ex::
dmsetup status /dev/dm-X
will return a line
0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential
where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
of unmapped (i.e. free) random zones, <nr_rnd> the total number of random
zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq>
the total number of sequential zones.
Normally the reclaim process will be started once fewer than 50 percent
of the random zones are free. In order to start the reclaim process manually
even before reaching this threshold the 'dmsetup message' function can be
used:
Ex::
dmsetup message /dev/dm-X 0 reclaim
will start the reclaim process and random zones will be moved to sequential
zones.
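
The status line format and the 50 percent threshold described above can be combined in a small shell sketch that reports the share of free random zones. The function name `dmz_free_rnd_percent` and the sample numbers are made up for illustration, and the sketch assumes the status output is exactly the single line shown above; on a real system the line would come from `dmsetup status`.

```shell
# Hypothetical helper: print the percentage of unmapped (free) random zones
# from a dm-zoned status line in the format documented above.
dmz_free_rnd_percent() {
    # Split the status line into positional fields:
    # 1=start 2=size 3=zoned 4=nr_zones 5=zones
    # 6=nr_unmap_rnd/nr_rnd 7=random 8=nr_unmap_seq/nr_seq 9=sequential
    set -- $1
    [ "$3" = "zoned" ] || return 1

    unmap_rnd=${6%/*}   # part before the slash
    nr_rnd=${6#*/}      # part after the slash
    [ "$nr_rnd" -gt 0 ] || return 1

    echo $(( unmap_rnd * 100 / nr_rnd ))
}

# Sample status line with invented values.
status="0 1000000 zoned 100 zones 20/50 random 30/50 sequential"
pct=$(dmz_free_rnd_percent "$status")

if [ "$pct" -lt 50 ]; then
    echo "free random zones: ${pct}% (below the 50 percent reclaim threshold)"
else
    echo "free random zones: ${pct}%"
fi
```

Integer division is used for simplicity, so the percentage is rounded down.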