Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block

Pull core block layer changes from Jens Axboe:
 "This is the core block IO pull request for 3.18.  Apart from the new
  and improved flush machinery for blk-mq, this is all mostly bug fixes
  and cleanups.

   - blk-mq timeout updates and fixes from Christoph.

   - Removal of REQ_END, also from Christoph.  We pass it through the
     ->queue_rq() hook for blk-mq instead, freeing up one of the request
     bits.  The space was overly tight on 32-bit, so Martin also killed
     REQ_KERNEL since it's no longer used.

   - blk integrity updates and fixes from Martin and Gu Zheng.

   - Update to the flush machinery for blk-mq from Ming Lei.  Now we
     have a per hardware context flush request, which both cleans up the
     code should scale better for flush intensive workloads on blk-mq.

   - Improve the error printing, from Rob Elliott.

   - Backing device improvements and cleanups from Tejun.

   - Fixup of a misplaced rq_complete() tracepoint from Hannes.

   - Make blk_get_request() return error pointers, fixing up issues
     where we NULL deref when a device goes bad or missing.  From Joe
     Lawrence.

   - Prep work for drastically reducing the memory consumption of dm
     devices from Junichi Nomura.  This allows creating clone bio sets
     without preallocating a lot of memory.

   - Fix a blk-mq hang on certain combinations of queue depths and
     hardware queues from me.

   - Limit memory consumption for blk-mq devices for crash dump
     scenarios and drivers that use crazy high depths (certain SCSI
     shared tag setups).  We now just use a single queue and limited
     depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
  block: Remove REQ_KERNEL
  blk-mq: allocate cpumask on the home node
  bio-integrity: remove the needless fail handle of bip_slab creating
  block: include func name in __get_request prints
  block: make blk_update_request print prefix match ratelimited prefix
  blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
  block: fix alignment_offset math that assumes io_min is a power-of-2
  blk-mq: Make bt_clear_tag() easier to read
  blk-mq: fix potential hang if rolling wakeup depth is too high
  block: add bioset_create_nobvec()
  block: use bio_clone_fast() in blk_rq_prep_clone()
  block: misplaced rq_complete tracepoint
  sd: Honor block layer integrity handling flags
  block: Replace strnicmp with strncasecmp
  block: Add T10 Protection Information functions
  block: Don't merge requests if integrity flags differ
  block: Integrity checksum flag
  block: Relocate bio integrity flags
  block: Add a disk flag to block integrity profile
  block: Add prefix to block integrity profile flags
  ...
This commit is contained in:
Linus Torvalds
2014-10-18 11:53:51 -07:00
65 changed files with 1210 additions and 1172 deletions

View File

@@ -53,6 +53,14 @@ Description:
512 bytes of data.
What: /sys/block/<disk>/integrity/device_is_integrity_capable
Date: July 2014
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Indicates whether a storage device is capable of storing
integrity metadata. Set if the device is T10 PI-capable.
What: /sys/block/<disk>/integrity/write_generate
Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com>

View File

@@ -129,11 +129,11 @@ interface for this is being worked on.
4.1 BIO
The data integrity patches add a new field to struct bio when
CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
to a struct bip which contains the bio integrity payload. Essentially
a bip is a trimmed down struct bio which holds a bio_vec containing
the integrity metadata and the required housekeeping information (bvec
pool, vector count, etc.)
CONFIG_BLK_DEV_INTEGRITY is enabled. bio_integrity(bio) returns a
pointer to a struct bip which contains the bio integrity payload.
Essentially a bip is a trimmed down struct bio which holds a bio_vec
containing the integrity metadata and the required housekeeping
information (bvec pool, vector count, etc.)
A kernel subsystem can enable data integrity protection on a bio by
calling bio_integrity_alloc(bio). This will allocate and attach the
@@ -192,16 +192,6 @@ will require extra work due to the application tag.
supported by the block device.
int bdev_integrity_enabled(block_device, int rw);
bdev_integrity_enabled() will return 1 if the block device
supports integrity metadata transfer for the data direction
specified in 'rw'.
bdev_integrity_enabled() honors the write_generate and
read_verify flags in sysfs and will respond accordingly.
int bio_integrity_prep(bio);
To generate IMD for WRITE and to set up buffers for READ, the
@@ -216,36 +206,6 @@ will require extra work due to the application tag.
bio_integrity_enabled() returned 1.
int bio_integrity_tag_size(bio);
If the filesystem wants to use the application tag space it will
first have to find out how much storage space is available.
Because tag space is generally limited (usually 2 bytes per
sector regardless of sector size), the integrity framework
supports interleaving the information between the sectors in an
I/O.
Filesystems can call bio_integrity_tag_size(bio) to find out how
many bytes of storage are available for that particular bio.
Another option is bdev_get_tag_size(block_device) which will
return the number of available bytes per hardware sector.
int bio_integrity_set_tag(bio, void *tag_buf, len);
After a successful return from bio_integrity_prep(),
bio_integrity_set_tag() can be used to attach an opaque tag
buffer to a bio. Obviously this only makes sense if the I/O is
a WRITE.
int bio_integrity_get_tag(bio, void *tag_buf, len);
Similarly, at READ I/O completion time the filesystem can
retrieve the tag buffer using bio_integrity_get_tag().
5.3 PASSING EXISTING INTEGRITY METADATA
Filesystems that either generate their own integrity metadata or
@@ -298,8 +258,6 @@ will require extra work due to the application tag.
.name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
.generate_fn = my_generate_fn,
.verify_fn = my_verify_fn,
.get_tag_fn = my_get_tag_fn,
.set_tag_fn = my_set_tag_fn,
.tuple_size = sizeof(struct my_tuple_size),
.tag_size = <tag bytes per hw sector>,
};
@@ -321,7 +279,5 @@ will require extra work due to the application tag.
are available per hardware sector. For DIF this is either 2 or
0 depending on the value of the Control Mode Page ATO bit.
See 6.2 for a description of get_tag_fn and set_tag_fn.
----------------------------------------------------------------------
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>