Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer: - A major update for DM cache that reduces the latency for deciding whether blocks should migrate to/from the cache. The bio-prison-v2 interface supports this improvement by enabling direct dispatch of work to workqueues rather than having to delay the actual work dispatch to the DM cache core. So the dm-cache policies are much more nimble by being able to drive IO as they see fit. One immediate benefit from the improved latency is a cache that should be much more adaptive to changing workloads. - Add a new DM integrity target that emulates a block device that has additional per-sector tags that can be used for storing integrity information. - Add a new authenticated encryption feature to the DM crypt target that builds on the capabilities provided by the DM integrity target. - Add MD interface for switching the raid4/5/6 journal mode and update the DM raid target to use it to enable aid4/5/6 journal write-back support. - Switch the DM verity target over to using the asynchronous hash crypto API (this helps work better with architectures that have access to off-CPU algorithm providers, which should reduce CPU utilization). - Various request-based DM and DM multipath fixes and improvements from Bart and Christoph. - A DM thinp target fix for a bio structure leak that occurs for each discard IFF discard passdown is enabled. - A fix for a possible deadlock in DM bufio and a fix to re-check the new buffer allocation watermark in the face of competing admin changes to the 'max_cache_size_bytes' tunable. - A couple DM core cleanups. * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits) dm bufio: check new buffer allocation watermark every 30 seconds dm bufio: avoid a possible ABBA deadlock dm mpath: make it easier to detect unintended I/O request flushes dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH dm: introduce enum dm_queue_mode to cleanup related code dm mpath: verify __pg_init_all_paths locking assumptions at runtime dm: verify suspend_locking assumptions at runtime dm block manager: remove an unused argument from dm_block_manager_create() dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() dm mpath: delay requeuing while path initialization is in progress dm mpath: avoid that path removal can trigger an infinite loop dm mpath: split and rename activate_path() to prepare for its expanded use dm ioctl: prevent stack leak in dm ioctl call dm integrity: use previously calculated log2 of sectors_per_block dm integrity: use hex2bin instead of open-coded variant dm crypt: replace custom implementation of hex2bin() dm crypt: remove obsolete references to per-CPU state dm verity: switch to using asynchronous hash crypto API dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues ...
This commit is contained in:
@@ -11,14 +11,31 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
|
||||
<offset> [<#opt_params> <opt_params>]
|
||||
|
||||
<cipher>
|
||||
Encryption cipher and an optional IV generation mode.
|
||||
(In format cipher[:keycount]-chainmode-ivmode[:ivopts]).
|
||||
Examples:
|
||||
des
|
||||
aes-cbc-essiv:sha256
|
||||
twofish-ecb
|
||||
Encryption cipher, encryption mode and Initial Vector (IV) generator.
|
||||
|
||||
/proc/crypto contains supported crypto modes
|
||||
The cipher specifications format is:
|
||||
cipher[:keycount]-chainmode-ivmode[:ivopts]
|
||||
Examples:
|
||||
aes-cbc-essiv:sha256
|
||||
aes-xts-plain64
|
||||
serpent-xts-plain64
|
||||
|
||||
Cipher format also supports direct specification with kernel crypt API
|
||||
format (selected by capi: prefix). The IV specification is the same
|
||||
as for the first format type.
|
||||
This format is mainly used for specification of authenticated modes.
|
||||
|
||||
The crypto API cipher specifications format is:
|
||||
capi:cipher_api_spec-ivmode[:ivopts]
|
||||
Examples:
|
||||
capi:cbc(aes)-essiv:sha256
|
||||
capi:xts(aes)-plain64
|
||||
Examples of authenticated modes:
|
||||
capi:gcm(aes)-random
|
||||
capi:authenc(hmac(sha256),xts(aes))-random
|
||||
capi:rfc7539(chacha20,poly1305)-random
|
||||
|
||||
The /proc/crypto contains a list of curently loaded crypto modes.
|
||||
|
||||
<key>
|
||||
Key used for encryption. It is encoded either as a hexadecimal number
|
||||
@@ -93,6 +110,32 @@ submit_from_crypt_cpus
|
||||
thread because it benefits CFQ to have writes submitted using the
|
||||
same context.
|
||||
|
||||
integrity:<bytes>:<type>
|
||||
The device requires additional <bytes> metadata per-sector stored
|
||||
in per-bio integrity structure. This metadata must by provided
|
||||
by underlying dm-integrity target.
|
||||
|
||||
The <type> can be "none" if metadata is used only for persistent IV.
|
||||
|
||||
For Authenticated Encryption with Additional Data (AEAD)
|
||||
the <type> is "aead". An AEAD mode additionally calculates and verifies
|
||||
integrity for the encrypted device. The additional space is then
|
||||
used for storing authentication tag (and persistent IV if needed).
|
||||
|
||||
sector_size:<bytes>
|
||||
Use <bytes> as the encryption unit instead of 512 bytes sectors.
|
||||
This option can be in range 512 - 4096 bytes and must be power of two.
|
||||
Virtual device will announce this size as a minimal IO and logical sector.
|
||||
|
||||
iv_large_sectors
|
||||
IV generators will use sector number counted in <sector_size> units
|
||||
instead of default 512 bytes sectors.
|
||||
|
||||
For example, if <sector_size> is 4096 bytes, plain64 IV for the second
|
||||
sector will be 8 (without flag) and 1 if iv_large_sectors is present.
|
||||
The <iv_offset> must be multiple of <sector_size> (in 512 bytes units)
|
||||
if this flag is specified.
|
||||
|
||||
Example scripts
|
||||
===============
|
||||
LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
|
||||
|
199
Documentation/device-mapper/dm-integrity.txt
Normal file
199
Documentation/device-mapper/dm-integrity.txt
Normal file
@@ -0,0 +1,199 @@
|
||||
The dm-integrity target emulates a block device that has additional
|
||||
per-sector tags that can be used for storing integrity information.
|
||||
|
||||
A general problem with storing integrity tags with every sector is that
|
||||
writing the sector and the integrity tag must be atomic - i.e. in case of
|
||||
crash, either both sector and integrity tag or none of them is written.
|
||||
|
||||
To guarantee write atomicity, the dm-integrity target uses journal, it
|
||||
writes sector data and integrity tags into a journal, commits the journal
|
||||
and then copies the data and integrity tags to their respective location.
|
||||
|
||||
The dm-integrity target can be used with the dm-crypt target - in this
|
||||
situation the dm-crypt target creates the integrity data and passes them
|
||||
to the dm-integrity target via bio_integrity_payload attached to the bio.
|
||||
In this mode, the dm-crypt and dm-integrity targets provide authenticated
|
||||
disk encryption - if the attacker modifies the encrypted device, an I/O
|
||||
error is returned instead of random data.
|
||||
|
||||
The dm-integrity target can also be used as a standalone target, in this
|
||||
mode it calculates and verifies the integrity tag internally. In this
|
||||
mode, the dm-integrity target can be used to detect silent data
|
||||
corruption on the disk or in the I/O path.
|
||||
|
||||
|
||||
When loading the target for the first time, the kernel driver will format
|
||||
the device. But it will only format the device if the superblock contains
|
||||
zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
|
||||
target can't be loaded.
|
||||
|
||||
To use the target for the first time:
|
||||
1. overwrite the superblock with zeroes
|
||||
2. load the dm-integrity target with one-sector size, the kernel driver
|
||||
will format the device
|
||||
3. unload the dm-integrity target
|
||||
4. read the "provided_data_sectors" value from the superblock
|
||||
5. load the dm-integrity target with the the target size
|
||||
"provided_data_sectors"
|
||||
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
|
||||
with the size "provided_data_sectors"
|
||||
|
||||
|
||||
Target arguments:
|
||||
|
||||
1. the underlying block device
|
||||
|
||||
2. the number of reserved sector at the beginning of the device - the
|
||||
dm-integrity won't read of write these sectors
|
||||
|
||||
3. the size of the integrity tag (if "-" is used, the size is taken from
|
||||
the internal-hash algorithm)
|
||||
|
||||
4. mode:
|
||||
D - direct writes (without journal) - in this mode, journaling is
|
||||
not used and data sectors and integrity tags are written
|
||||
separately. In case of crash, it is possible that the data
|
||||
and integrity tag doesn't match.
|
||||
J - journaled writes - data and integrity tags are written to the
|
||||
journal and atomicity is guaranteed. In case of crash,
|
||||
either both data and tag or none of them are written. The
|
||||
journaled mode degrades write throughput twice because the
|
||||
data have to be written twice.
|
||||
R - recovery mode - in this mode, journal is not replayed,
|
||||
checksums are not checked and writes to the device are not
|
||||
allowed. This mode is useful for data recovery if the
|
||||
device cannot be activated in any of the other standard
|
||||
modes.
|
||||
|
||||
5. the number of additional arguments
|
||||
|
||||
Additional arguments:
|
||||
|
||||
journal_sectors:number
|
||||
The size of journal, this argument is used only if formatting the
|
||||
device. If the device is already formatted, the value from the
|
||||
superblock is used.
|
||||
|
||||
interleave_sectors:number
|
||||
The number of interleaved sectors. This values is rounded down to
|
||||
a power of two. If the device is already formatted, the value from
|
||||
the superblock is used.
|
||||
|
||||
buffer_sectors:number
|
||||
The number of sectors in one buffer. The value is rounded down to
|
||||
a power of two.
|
||||
|
||||
The tag area is accessed using buffers, the buffer size is
|
||||
configurable. The large buffer size means that the I/O size will
|
||||
be larger, but there could be less I/Os issued.
|
||||
|
||||
journal_watermark:number
|
||||
The journal watermark in percents. When the size of the journal
|
||||
exceeds this watermark, the thread that flushes the journal will
|
||||
be started.
|
||||
|
||||
commit_time:number
|
||||
Commit time in milliseconds. When this time passes, the journal is
|
||||
written. The journal is also written immediatelly if the FLUSH
|
||||
request is received.
|
||||
|
||||
internal_hash:algorithm(:key) (the key is optional)
|
||||
Use internal hash or crc.
|
||||
When this argument is used, the dm-integrity target won't accept
|
||||
integrity tags from the upper target, but it will automatically
|
||||
generate and verify the integrity tags.
|
||||
|
||||
You can use a crc algorithm (such as crc32), then integrity target
|
||||
will protect the data against accidental corruption.
|
||||
You can also use a hmac algorithm (for example
|
||||
"hmac(sha256):0123456789abcdef"), in this mode it will provide
|
||||
cryptographic authentication of the data without encryption.
|
||||
|
||||
When this argument is not used, the integrity tags are accepted
|
||||
from an upper layer target, such as dm-crypt. The upper layer
|
||||
target should check the validity of the integrity tags.
|
||||
|
||||
journal_crypt:algorithm(:key) (the key is optional)
|
||||
Encrypt the journal using given algorithm to make sure that the
|
||||
attacker can't read the journal. You can use a block cipher here
|
||||
(such as "cbc(aes)") or a stream cipher (for example "chacha20",
|
||||
"salsa20", "ctr(aes)" or "ecb(arc4)").
|
||||
|
||||
The journal contains history of last writes to the block device,
|
||||
an attacker reading the journal could see the last sector nubmers
|
||||
that were written. From the sector numbers, the attacker can infer
|
||||
the size of files that were written. To protect against this
|
||||
situation, you can encrypt the journal.
|
||||
|
||||
journal_mac:algorithm(:key) (the key is optional)
|
||||
Protect sector numbers in the journal from accidental or malicious
|
||||
modification. To protect against accidental modification, use a
|
||||
crc algorithm, to protect against malicious modification, use a
|
||||
hmac algorithm with a key.
|
||||
|
||||
This option is not needed when using internal-hash because in this
|
||||
mode, the integrity of journal entries is checked when replaying
|
||||
the journal. Thus, modified sector number would be detected at
|
||||
this stage.
|
||||
|
||||
block_size:number
|
||||
The size of a data block in bytes. The larger the block size the
|
||||
less overhead there is for per-block integrity metadata.
|
||||
Supported values are 512, 1024, 2048 and 4096 bytes. If not
|
||||
specified the default block size is 512 bytes.
|
||||
|
||||
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
|
||||
be changed when reloading the target (load an inactive table and swap the
|
||||
tables with suspend and resume). The other arguments should not be changed
|
||||
when reloading the target because the layout of disk data depend on them
|
||||
and the reloaded target would be non-functional.
|
||||
|
||||
|
||||
The layout of the formatted block device:
|
||||
* reserved sectors (they are not used by this target, they can be used for
|
||||
storing LUKS metadata or for other purpose), the size of the reserved
|
||||
area is specified in the target arguments
|
||||
* superblock (4kiB)
|
||||
* magic string - identifies that the device was formatted
|
||||
* version
|
||||
* log2(interleave sectors)
|
||||
* integrity tag size
|
||||
* the number of journal sections
|
||||
* provided data sectors - the number of sectors that this target
|
||||
provides (i.e. the size of the device minus the size of all
|
||||
metadata and padding). The user of this target should not send
|
||||
bios that access data beyond the "provided data sectors" limit.
|
||||
* flags - a flag is set if journal_mac is used
|
||||
* journal
|
||||
The journal is divided into sections, each section contains:
|
||||
* metadata area (4kiB), it contains journal entries
|
||||
every journal entry contains:
|
||||
* logical sector (specifies where the data and tag should
|
||||
be written)
|
||||
* last 8 bytes of data
|
||||
* integrity tag (the size is specified in the superblock)
|
||||
every metadata sector ends with
|
||||
* mac (8-bytes), all the macs in 8 metadata sectors form a
|
||||
64-byte value. It is used to store hmac of sector
|
||||
numbers in the journal section, to protect against a
|
||||
possibility that the attacker tampers with sector
|
||||
numbers in the journal.
|
||||
* commit id
|
||||
* data area (the size is variable; it depends on how many journal
|
||||
entries fit into the metadata area)
|
||||
every sector in the data area contains:
|
||||
* data (504 bytes of data, the last 8 bytes are stored in
|
||||
the journal entry)
|
||||
* commit id
|
||||
To test if the whole journal section was written correctly, every
|
||||
512-byte sector of the journal ends with 8-byte commit id. If the
|
||||
commit id matches on all sectors in a journal section, then it is
|
||||
assumed that the section was written correctly. If the commit id
|
||||
doesn't match, the section was written partially and it should not
|
||||
be replayed.
|
||||
* one or more runs of interleaved tags and data. Each run contains:
|
||||
* tag area - it contains integrity tags. There is one tag for each
|
||||
sector in the data area
|
||||
* data area - it contains data sectors. The number of data sectors
|
||||
in one run must be a power of two. log2 of this value is stored
|
||||
in the superblock.
|
@@ -170,6 +170,13 @@ The target is named "raid" and it accepts the following parameters:
|
||||
Takeover/reshape is not possible with a raid4/5/6 journal device;
|
||||
it has to be deconfigured before requesting these.
|
||||
|
||||
[journal_mode <mode>]
|
||||
This option sets the caching mode on journaled raid4/5/6 raid sets
|
||||
(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
|
||||
If 'writeback' is selected the journal device has to be resilient
|
||||
and must not suffer from the 'write hole' problem itself (e.g. use
|
||||
raid1 or raid10) to avoid a single point of failure.
|
||||
|
||||
<#raid_devs>: The number of devices composing the array.
|
||||
Each device consists of two entries. The first is the device
|
||||
containing the metadata (if any); the second is the one containing the
|
||||
@@ -254,7 +261,8 @@ recovery. Here is a fuller description of the individual fields:
|
||||
<data_offset> The current data offset to the start of the user data on
|
||||
each component device of a raid set (see the respective
|
||||
raid parameter to support out-of-place reshaping).
|
||||
<journal_char> 'A' - active raid4/5/6 journal device.
|
||||
<journal_char> 'A' - active write-through journal device.
|
||||
'a' - active write-back journal device.
|
||||
'D' - dead journal device.
|
||||
'-' - no journal device.
|
||||
|
||||
@@ -331,3 +339,7 @@ Version History
|
||||
'D' on the status line. If '- -' is passed into the constructor, emit
|
||||
'- -' on the table line and '-' as the status line health character.
|
||||
1.10.0 Add support for raid4/5/6 journal device
|
||||
1.10.1 Fix data corruption on reshape request
|
||||
1.11.0 Fix table line argument order
|
||||
(wrong raid10_copies/raid10_format sequence)
|
||||
1.11.1 Add raid4/5/6 journal write-back support via journal_mode option
|
||||
|
Reference in New Issue
Block a user