Commit Graph

6153 Commits

Author SHA1 Message Date
Shaohua Li
5d8817833c MD: fix null pointer deference
The md device might not have personality (for example, ddf raid array). The
issue is introduced by 8430e7e0af9a15(md: disconnect device from personality
before trying to remove it)

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-07-28 09:06:34 -07:00
Linus Torvalds
f7e6816994 Merge tag 'dm-4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:

 - initially based on Jens' 'for-4.8/core' (given all the flag churn)
   and later merged with 'for-4.8/core' to pickup the QUEUE_FLAG_DAX
   commits that DM depends on to provide its DAX support

 - clean up the bio-based vs request-based DM core code by moving the
   request-based DM core code out to dm-rq.[hc]

 - reinstate bio-based support in the DM multipath target (done with the
   idea that fast storage like NVMe over Fabrics could benefit) -- while
   preserving support for request_fn and blk-mq request-based DM mpath

 - SCSI and DM multipath persistent reservation fixes that were
   coordinated with Martin Petersen.

 - the DM raid target saw the most extensive change this cycle; it now
   provides reshape and takeover support (by layering ontop of the
   corresponding MD capabilities)

 - DAX support for DM core and the linear, stripe and error targets

 - a DM thin-provisioning block discard vs allocation race fix that
   addresses potential for corruption

 - a stable fix for DM verity-fec's block calculation during decode

 - a few cleanups and fixes to DM core and various targets

* tag 'dm-4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (73 commits)
  dm: allow bio-based table to be upgraded to bio-based with DAX support
  dm snap: add fake origin_direct_access
  dm stripe: add DAX support
  dm error: add DAX support
  dm linear: add DAX support
  dm: add infrastructure for DAX support
  dm thin: fix a race condition between discarding and provisioning a block
  dm btree: fix a bug in dm_btree_find_next_single()
  dm raid: fix random optimal_io_size for raid0
  dm raid: address checkpatch.pl complaints
  dm: call PR reserve/unreserve on each underlying device
  sd: don't use the ALL_TG_PT bit for reservations
  dm: fix second blk_delay_queue() parameter to be in msec units not jiffies
  dm raid: change logical functions to actually return bool
  dm raid: use rdev_for_each in status
  dm raid: use rs->raid_disks to avoid memory leaks on free
  dm raid: support delta_disks for raid1, fix table output
  dm raid: enhance reshape check and factor out reshape setup
  dm raid: allow resize during recovery
  dm raid: fix rs_is_recovering() to allow for lvextend
  ...
2016-07-26 17:12:11 -07:00
Toshi Kani
b5ab4a9ba5 dm: allow bio-based table to be upgraded to bio-based with DAX support
Allow table type DM_TYPE_BIO_BASED to extend with DM_TYPE_DAX_BIO_BASED
since DM_TYPE_DAX_BIO_BASED supports bio-based requests.

This is needed to allow a snapshot of an LV with DAX support to be
removed.  One of the intermediate table reloads that lvm2 does switches
from DM_TYPE_BIO_BASED to DM_TYPE_DAX_BIO_BASED.  No known reason to
disallow this so...

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:52 -04:00
Toshi Kani
f6e629bd23 dm snap: add fake origin_direct_access
dax-capable mapped-device is marked as DM_TYPE_DAX_BIO_BASED,
which supports both dax and bio-based operations.  dm-snap
needs to work with dax-capable device when bio-based operation
is used.

Add fake origin_direct_access() to origin device so that its
origin device is also marked as DM_TYPE_DAX_BIO_BASED for
dax-capable device.  This allows to extend target's DM table.
dm-snap works normally when bio-based operation is used.

dm-snap does not support dax operation, and mount with dax
option to a target device or snapshot device fails.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:51 -04:00
Toshi Kani
beec25b457 dm stripe: add DAX support
Change dm-stripe to implement direct_access function,
stripe_direct_access(), which maps bdev and sector and
calls direct_access function of its physical target device.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:51 -04:00
Mike Snitzer
f8df1fdf18 dm error: add DAX support
Allow the error target to replace an existing DAX-enabled target.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:50 -04:00
Toshi Kani
84b22f8378 dm linear: add DAX support
Change dm-linear to implement direct_access function,
linear_direct_access(), which maps sector and calls direct_access
function of its physical target device.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:49 -04:00
Toshi Kani
545ed20e6d dm: add infrastructure for DAX support
Change mapped device to implement direct_access function,
dm_blk_direct_access(), which calls a target direct_access function.
'struct target_type' is extended to have target direct_access interface.
This function limits direct accessible size to the dm_target's limit
with max_io_len().

Add dm_table_supports_dax() to iterate all targets and associated block
devices to check for DAX support.  To add DAX support to a DM target the
target must only implement the direct_access function.

Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that mapped
device supports DAX and is bio based.  This new type is used to assure
that all target devices have DAX support and remain that way after
QUEUE_FLAG_DAX is set in mapped device.

At initial table load, QUEUE_FLAG_DAX is set to mapped device when setting
DM_TYPE_DAX_BIO_BASED to the type.  Any subsequent table load to the
mapped device must have the same type, or else it fails per the check in
table_load().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:49 -04:00
Christoph Hellwig
ed996a52c8 block: simplify and cleanup bvec pool handling
Instead of a flag and an index just make sure an index of 0 means
no need to free the bvec array.  Also move the constants related
to the bvec pools together and use a consistent naming scheme for
them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:02 -06:00
Christoph Hellwig
70246286e9 block: get rid of bio_rw and READA
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:01 -06:00
Joe Thornber
2a0fbffb1e dm thin: fix a race condition between discarding and provisioning a block
The discard passdown was being issued after the block was unmapped,
which meant the block could be reprovisioned whilst the passdown discard
was still in flight.

We can only identify unshared blocks (safe to do a passdown a discard
to) once they're unmapped and their ref count hits zero.  Block ref
counts are now used to guard against concurrent allocation of these
blocks that are being discarded.  So now we unmap the block, issue
passdown discards, and the immediately increment ref counts for regions
that have been discarded via passed down (this is safe because
allocation occurs within the same thread).  We then decrement ref counts
once the passdown discard IO is complete -- signaling these blocks may
now be allocated.

This fixes the potential for corruption that was reported here:
https://www.redhat.com/archives/dm-devel/2016-June/msg00311.html

Reported-by: Dennis Yang <dennisyang@qnap.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 12:43:35 -04:00
Joe Thornber
e7e0f73047 dm btree: fix a bug in dm_btree_find_next_single()
dm_btree_find_next_single() can short-circuit the search for a block
with a return of -ENODATA if all entries are higher than the search key
passed to lower_bound().

This hasn't been a problem because of the way the btree has been used by
DM thinp.  But it must be fixed now in preparation for fixing the race
in DM thinp's handling of simultaneous block discard vs allocation.
Otherwise, once that fix is in place, some of the blocks in a discard
would not be unmapped as expected.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 12:43:34 -04:00
Tomasz Majchrzak
0e5313e2d4 raid10: improve random reads performance
RAID10 random read performance is lower than expected due to excessive spinlock
utilisation which is required mostly for rebuild/resync. Simplify allow_barrier
as it's in IO path and encounters a lot of unnecessary congestion.

As lower_barrier just takes a lock in order to decrement a counter, convert
counter (nr_pending) into atomic variable and remove the spin lock. There is
also a congestion for wake_up (it uses lock internally) so call it only when
it's really needed. As wake_up is not called constantly anymore, ensure process
waiting to raise a barrier is notified when there are no more waiting IOs.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-07-19 15:20:28 -07:00
Tomasz Majchrzak
573275b58e md: add missing sysfs_notify on array_state update
Changeset 6791875e2e has added early return from a function so there is no
sysfs notification for 'active' and 'clean' state change.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-07-19 11:28:39 -07:00
Alexey Obitotskiy
4cb9da7d9c Fix kernel module refcount handling
md loads raidX modules and increments module refcount each time level
has changed but does not decrement it. You are unable to unload raid0
module after reshape because raid0 reshape changes level to raid4
and back to raid0.

Signed-off-by: Aleksey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-07-19 11:17:31 -07:00
Arnd Bergmann
0e3ef49eda md: use seconds granularity for error logging
The md code stores the exact time of the last error in the
last_read_error variable using a timespec structure. It only
ever uses the seconds portion of that though, so we can
use a scalar for it.

There won't be an overflow in 2038 here, because it already
used monotonic time and 32-bit is enough for that, but I've
decided to use time64_t for consistency in the conversion.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-07-19 11:00:47 -07:00
Heinz Mauelshagen
89d3d9a1e3 dm raid: fix random optimal_io_size for raid0
raid_io_hints() was retrieving the number of data stripes used for the
calculation of io_opt from struct r5conf, which is not defined for raid0
mappings.

Base the calculation on the in-core raid_set structure instead.

Also, adjust to use to_bytes() for the sector -> bytes conversion
throughout.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-19 11:37:08 -04:00
Heinz Mauelshagen
094f394df6 dm raid: address checkpatch.pl complaints
Use 'unsigned int' where appropriate.
Return negative errors.
Correct an indentation.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-19 11:37:07 -04:00
Christoph Hellwig
9c72bad1f3 dm: call PR reserve/unreserve on each underlying device
So far we tried to rely on the SCSI 'all target ports' bit to register
all path, but for many setups this didn't work properly as the different
paths are seen as separate initiators to the target instead of multiple
ports of the same initiator.  Because of that we'll stop setting the
'all target ports' bit in SCSI, and let device mapper handle iterating
over the device for each path and register them manually.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:35 -04:00
Tahsin Erdogan
bd9f55ea1c dm: fix second blk_delay_queue() parameter to be in msec units not jiffies
Commit d548b34b06 ("dm: reduce the queue delay used in dm_request_fn
from 100ms to 10ms") always intended the value to be 10 msecs -- it
just expressed it in jiffies because earlier commit 7eaceaccab ("block:
remove per-queue plugging") did.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: d548b34b06 ("dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms")
Cc: stable@vger.kernel.org # 4.1+ -- stable@ backports must be applied to drivers/md/dm.c
2016-07-18 15:37:34 -04:00
Heinz Mauelshagen
d7ccc2e2a0 dm raid: change logical functions to actually return bool
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:33 -04:00
Heinz Mauelshagen
326824099f dm raid: use rdev_for_each in status
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:33 -04:00
Heinz Mauelshagen
ffeeac7515 dm raid: use rs->raid_disks to avoid memory leaks on free
Also makes code more consistent throughout.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:32 -04:00
Heinz Mauelshagen
7a7c330fc2 dm raid: support delta_disks for raid1, fix table output
Add "delta_disks" constructor argument support to raid1 to allow for
consistent userspace disk addition/removal handling.

Fix raid_status() to report all raid disks with status and table output
on disk adding reshapes, not just the ones listed on the mddev; optimize
its rebuild and writemostly output.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:31 -04:00
Heinz Mauelshagen
469b304b58 dm raid: enhance reshape check and factor out reshape setup
Enhance rs_reshape_requested() check function to be more transparent and
fix its raid10 check.

Streamline the constructor by factoring out reshaping preparation into
fucntion rs_prepare_reshape().

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:31 -04:00
Heinz Mauelshagen
2a5556c2a8 dm raid: allow resize during recovery
Resizing a RAID set during recovery can be allowed, because the MD
resynchronization thread will either stop any ongoing recovery in case
of shrinking below the current recovery position or carry on recovery
to the new size if the set is growing.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:30 -04:00
Heinz Mauelshagen
345a6cdc25 dm raid: fix rs_is_recovering() to allow for lvextend
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:29 -04:00
Heinz Mauelshagen
37f10be150 dm raid: fix rebuild and catch bogus sync/resync flags
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:28 -04:00
Heinz Mauelshagen
b1956dc4fa dm raid: fix ctr memory leaks on error paths
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:28 -04:00
Heinz Mauelshagen
65359ee6b1 dm raid: fix typo in write_mostly flag
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:27 -04:00
Heinz Mauelshagen
4348309a8b dm raid: also reject size change during recovery
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:26 -04:00
Heinz Mauelshagen
f6895fd505 dm raid: fix new superblock/bitmap creation on disk addition
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:26 -04:00
Heinz Mauelshagen
2527b56e0d dm raid: add comments and fix typos
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:25 -04:00
Heinz Mauelshagen
fbe6365bb4 dm raid: fix raid10 device size error on out-of-place reshape
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:24 -04:00
Heinz Mauelshagen
2d92a3c2a4 dm raid: prohibit 'nosync' on new raid6 and reject resize during reshape
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:24 -04:00
Heinz Mauelshagen
4dff2f1e26 dm raid: clarify and fix recovery
Add function rs_setup_recovery() to allow for defined setup of RAID set
recovery in the constructor.

Will be called with dev_sectors={0, rdev->sectors, MaxSectors} to
recover a new or enforced sync, grown or not to be synhronized RAID set
respectively.

Prevents recovery on raid0, which doesn't support it.

Enforces recovery on raid6 to ensure properly defined Syndromes
mandatory for that MD personality are being created.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:23 -04:00
Heinz Mauelshagen
0095dbc98b dm raid: fix rs_set_capacity on growing reshape
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:22 -04:00
Heinz Mauelshagen
9d9d939c80 dm raid: make rs_set_capacity to work on shrinking reshape
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:22 -04:00
Heinz Mauelshagen
6ee0bae9c8 dm raid: enhance comments in takeover checks
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:21 -04:00
Heinz Mauelshagen
ae3c6cfff9 dm raid: remove bogus comment and fix comment typos
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:20 -04:00
Heinz Mauelshagen
75dd3b9ecb dm raid: more restricting data_offset value checks
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:19 -04:00
Heinz Mauelshagen
5fa146b25b dm raid: reject too many write_mostly devices
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:19 -04:00
Heinz Mauelshagen
0a7b818892 dm raid: the sync_page_io() metadata_op argument is bool
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:18 -04:00
Heinz Mauelshagen
0d851d14b8 dm raid: prohibit to pass in both sync and nosync ctr flags
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:17 -04:00
Heinz Mauelshagen
ff4a88bf1c dm raid: avoid superfluous memory barriers on static metadata
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:17 -04:00
Mike Snitzer
7193a9defc dm rq: check kthread_run return for .request_fn request-based DM
Check return value of kthread_run() in dm_old_init_request_queue().

Reported-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-06 09:06:37 -04:00
Yijing Wang
89b920e003 bcache: Remove redundant block_size assignment
We have assigned sb->block_size before the switch,
so remove the redundant one.

Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Eric Wheeler <bcache@lists.ewheeler.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05 11:34:50 -06:00
Yijing Wang
7abc70d700 bcache: update document info
There is no return in continue_at(), update the documentation.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05 11:34:49 -06:00
Yijing Wang
c50d4d5dd3 bcache: Remove redundant parameter for cache_alloc()
Cache_sb is not used in cache_alloc, and we have copied
sb info to cache->sb already, remove it.

Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05 11:34:47 -06:00
Sami Tolvanen
602d1657c6 dm verity fec: fix block calculation
do_div was replaced with div64_u64 at some point, causing a bug with
block calculation due to incompatible semantics of the two functions.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Fixes: a739ff3f54 ("dm verity: add support for forward error correction")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-01 23:29:08 -04:00