android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Guoqing Jiang	1cdd125794	md/raid10: fix FailFast test for wrong device We need to test FailFast flag for replacement device here since the set up for writing is for the replacement, so we need fix it like: - if (test_bit(FailFast, &conf->mirrors[d].rdev->flags)) + if (test_bit(FailFast, &conf->mirrors[d].replacement->flags)) Since commit `f90145f317` ("md/raid10: add rcu protection to rdev access in raid10_sync_request.") had added the rcu protection for the part, so let's extend the range protected by rcu and use rdev directly. Fixes: `1919cbb` ("md/raid10: add failfast handling for writes.") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-16 12:04:08 -07:00
Jens Axboe	c27b2d634f	Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-4.13/block Pull NVMe changes for 4.13 from Christoph: Highlights: - UUID identifier support from Johannes - Lots of cleanups from Sagi - Host Memory Buffer support from me And lots of cleanups and smaller fixes of course. Note that the UUID identifier changes are based on top of the uuid tree. I am the maintainer of that tree and will send it to Linus as soon as 4.12 is released as various other trees depend on it as well (and the diffstat includes those changes unfortunately)	2017-06-16 10:14:59 -06:00
Dan Williams	abebfbe2f7	dm: add ->flush() dax operation support Allow device-mapper to route flush operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying flush implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-15 14:34:59 -07:00
Mike Snitzer	cd15fb64ee	Revert "dm mirror: use all available legs on multiple failures" This reverts commit `12a7cf5ba6`. This commit apparently attempted to fix an issue that didn't really exist, furthermore: this commit is the source of deadlocks and crashes seen in multiple cases related to failing the primary mirror dev while syncing. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-06-15 08:39:15 -04:00
Dan Carpenter	047385b3dd	dm: missing break in process_queued_bios() his used to be a fall through case, but we shifted code around and I think we want a break here now. Fixes: `4e4cbee93d` ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-14 08:22:55 -06:00
Mikulas Patocka	f9c79bc05a	md: don't use flush_signals in userspace processes The function flush_signals clears all pending signals for the process. It may be used by kernel threads when we need to prepare a kernel thread for responding to signals. However using this function for an userspaces processes is incorrect - clearing signals without the program expecting it can cause misbehavior. The raid1 and raid5 code uses flush_signals in its request routine because it wants to prepare for an interruptible wait. This patch drops flush_signals and uses sigprocmask instead to block all signals (including SIGKILL) around the schedule() call. The signals are not lost, but the schedule() call won't respond to them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Acked-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-13 10:18:02 -07:00
NeilBrown	cc27b0c78c	md: fix deadlock between mddev_suspend() and md_write_start() If mddev_suspend() races with md_write_start() we can deadlock with mddev_suspend() waiting for the request that is currently in md_write_start() to complete the ->make_request() call, and md_write_start() waiting for the metadata to be updated to mark the array as 'dirty'. As metadata updates done by md_check_recovery() only happen then the mddev_lock() can be claimed, and as mddev_suspend() is often called with the lock held, these threads wait indefinitely for each other. We fix this by having md_write_start() abort if mddev_suspend() is happening, and ->make_request() aborts if md_write_start() aborted. md_make_request() can detect this abort, decrease the ->active_io count, and wait for mddev_suspend(). Reported-by: Nix <nix@esperi.org.uk> Fix: 68866e425be2(MD: no sync IO while suspended) Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-13 10:18:01 -07:00
Christoph Hellwig	fdd050b5b3	Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base	2017-06-13 11:45:14 +02:00
Ondrej Mosnáček	2ad50606f8	dm integrity: reject mappings too large for device dm-integrity would successfully create mappings with the number of sectors greater than the provided data sector count. Attempts to read sectors of this mapping that were beyond the provided data sector count would then yield run-time messages of the form "device-mapper: integrity: Too big sector number: ...". Fix this by emitting an error when the requested mapping size is bigger than the provided data sector count. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-06-12 17:05:55 -04:00
Jens Axboe	8f66439eec	Merge tag 'v4.12-rc5' into for-4.13/block We've already got a few conflicts and upcoming work depends on some of the changes that have gone into mainline as regression fixes for this series. Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream trees to continue working on 4.13 changes. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-12 08:30:13 -06:00
Dan Williams	7e026c8c0a	dm: add ->copy_from_iter() dax operation support Allow device-mapper to route copy_from_iter operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying copy_from_iter implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-09 09:22:21 -07:00
Christoph Hellwig	4e4cbee93d	block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	fc17b6534e	blk-mq: switch ->queue_rq return value to blk_status_t Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	2a842acab1	block: introduce new block status code type Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	1be5690984	dm: change ->end_io calling convention Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	846785e6a5	dm: don't return errnos from ->map Instead use the special DM_MAPIO_KILL return value to return -EIO just like we do for the request based path. Note that dm-log-writes returned -ENOMEM in a few places, which now becomes -EIO instead. No consumer treats -ENOMEM special so this shouldn't be an issue (and it should use a mempool to start with to make guaranteed progress). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	14ef1e4826	dm mpath: merge do_end_io_bio into multipath_end_io_bio This simplifies the code and especially the error passing a bit and will help with the next patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	9966afaf91	dm: fix REQ_RAHEAD handling A few (but not all) dm targets use a special EWOULDBLOCK error code for failing REQ_RAHEAD requests that fail due to a lack of available resources. But no one else knows about this magic code, and lower level drivers also don't generate it when failing read-ahead requests for similar reasons. So remove this special casing and ignore all additional error handling for REQ_RAHEAD - if this was a real underlying error we'd get a normal read once the real read comes in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
NeilBrown	a415c0f106	md: initialise ->writes_pending in personality modules. The new per-cpu counter for writes_pending is initialised in md_alloc(), which is not called by dm-raid. So dm-raid fails when md_write_start() is called. Move the initialization to the personality modules that need it. This way it is always initialised when needed, but isn't unnecessarily initialized (requiring memory allocation) when the personality doesn't use writes_pending. Reported-by: Heinz Mauelshagen <heinzm@redhat.com> Fixes: `4ad23a9764` ("MD: use per-cpu counter for writes_pending") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-05 16:04:35 -07:00
Amir Goldstein	e6fd2093a8	md: namespace private helper names The md private helper uuid_equal() collides with a generic helper of the same name. Rename the md private helper to md_uuid_equal() and do the same for md_sb_equal(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Shaohua Li <shli@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>	2017-06-05 16:56:37 +02:00
Linus Torvalds	60c42a31dc	Merge tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a DM verity fix for a mode when no salt is used - a fix to DM to account for the possibility that PREFLUSH or FUA are used without the SYNC flag if the underlying storage doesn't have a volatile write-cache - a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow emergency forward progress (by using memory reserves as last resort) - a small DM integrity cleanup to use kvmalloc() instead of duplicating the same * tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: make flush bios explicitly sync dm ioctl: restore __GFP_HIGH in copy_params() dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc() dm verity: fix no salt use case	2017-06-02 11:50:37 -07:00
Jan Kara	5a8948f8a3	md: Make flush bios explicitely sync Commit `b685d3d65a` "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA\|PREFLUSH\|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. CC: linux-raid@vger.kernel.org CC: Shaohua Li <shli@kernel.org> Fixes: `b685d3d65a` CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-31 09:25:53 -07:00
Jan Kara	ff0361b34a	dm: make flush bios explicitly sync Commit `b685d3d65a` ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous") removed REQ_SYNC flag from WRITE_{FUA\|PREFLUSH\|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions. Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: `b685d3d65a` ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous") Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-31 10:50:23 -04:00
Nix	e153903686	md: report sector of stripes with check mismatches This makes it possible, with appropriate filesystem support, for a sysadmin to tell what is affected by the mismatch, and whether it should be ignored (if it's inside a swap partition, for instance). We ratelimit to prevent log flooding: if there are so many mismatches that ratelimiting is necessary, the individual messages are relatively unlikely to be important (either the machine is swapping like crazy or something is very wrong with the disk). Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-24 16:31:13 -07:00
Kyungchan Koh	4179bc30b2	md: uuid debug statement now in processor byte order. Previously, the uuid debug statements were printed in little-endian format, which wasn't consistent in machines that might not be in little-endian byte order. With this change, the output will be consistent for all machines with different byte-ordering. Signed-off-by: Kyungchan Koh <kkc6196@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-24 15:58:43 -07:00
Junaid Shahid	8c1e2162f2	dm ioctl: restore __GFP_HIGH in copy_params() Commit `d224e93818` ("drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant") left out the __GFP_HIGH flag when converting from __vmalloc to kvmalloc. This can cause the DM ioctl to fail in some low memory situations where it wouldn't have failed earlier. Add __GFP_HIGH back to avoid any potential regression. Fixes: `d224e93818` ("drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant") Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-22 19:30:03 -04:00
Mikulas Patocka	702a6204f8	dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc() Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-22 14:09:52 -04:00
Gilad Ben-Yossef	f52236e0b0	dm verity: fix no salt use case DM-Verity has an (undocumented) mode where no salt is used. This was never handled directly by the DM-Verity code, instead working due to the fact that calling crypto_shash_update() with a zero length data is an implicit noop. This is no longer the case now that we have switched to crypto_ahash_update(). Fix the issue by introducing explicit handling of the no salt use case to DM-Verity. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Reported-by: Marian Csontos <mcsontos@redhat.com> Fixes: `d1ac3ff` ("dm verity: switch to using asynchronous hash crypto API") Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-22 13:49:03 -04:00
Guoqing Jiang	2dffdc0724	md-cluster: fix potential lock issue in add_new_disk The add_new_disk returns with communication locked if __sendmsg returns failure, fix it with call unlock_comm before return. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> CC: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-21 20:37:09 -07:00
Linus Torvalds	8b4822de59	Merge tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: - Several bug fixes for raid5-cache from Song Liu, mainly handle journal disk error - Fix bad block handling in choosing raid1 disk from Tomasz Majchrzak - Simplify external metadata array sysfs handling from Artur Paszkiewicz - Optimize raid0 discard handling from me, now raid0 will dispatch large discard IO directly to underlayer disks. * tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid1: prefer disk without bad blocks md/r5cache: handle sync with data in write back cache md/r5cache: gracefully handle journal device errors for writeback mode md/raid1/10: avoid unnecessary locking md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first md/md0: optimize raid0 discard handling md: don't return -EAGAIN in md_allow_write for external metadata arrays md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock	2017-05-18 12:04:41 -07:00
Colin Ian King	7e1b9521f5	dm cache: handle kmalloc failure allocating background_tracker struct Currently there is no kmalloc failure check on the allocation of the background_tracker struct in btracker_create(), and so a NULL return will lead to a NULL pointer dereference. Add a NULL check. Detected by CoverityScan, CID#1416587 ("Dereference null return value") Fixes: `b29d4986d` ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-17 09:44:53 -04:00
Mikulas Patocka	13840d3801	dm bufio: make the parameter "retain_bytes" unsigned long Change the type of the parameter "retain_bytes" from unsigned to unsigned long, so that on 64-bit machines the user can set more than 4GiB of data to be retained. Also, change the type of the variable "count" in the function "__evict_old_buffers" to unsigned long. The assignment "count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];" could result in unsigned long to unsigned overflow and that could result in buffers not being freed when they should. While at it, avoid division in get_retain_buffers(). Division is slow, we can change it to shift because we have precalculated the log2 of block size. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-16 15:12:08 -04:00
Christoph Hellwig	f98e0eb680	dm mpath: multipath_clone_and_map must not return -EIO Since `412445ac` ("dm: introduce a new DM_MAPIO_KILL return value"), the clone_and_map_rq methods must not return errno values, so fix it up to properly return DM_MAPIO_KILL, instead of the -EIO value that snuck in due to a conflict between two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:53 -04:00
Christoph Hellwig	18a482f524	dm mpath: don't return -EIO from dm_report_EIO Instead just turn the macro into a helper for the warning message. This removes an unnecessary assignment and will allow the next commit to fix a place where -EIO is the wrong return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:52 -04:00
Christoph Hellwig	ece0728037	dm rq: add a missing break to map_request We don't want to bug when receiving a DM_MAPIO_KILL value.. Fixes: `412445ac` ("dm: introduce a new DM_MAPIO_KILL return value") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:51 -04:00
Joe Thornber	0377a07c7a	dm space map disk: fix some book keeping in the disk space map When decrementing the reference count for a block, the free count wasn't being updated if the reference count went to zero. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:50 -04:00
Joe Thornber	91bcdb92d3	dm thin metadata: call precommit before saving the roots These calls were the wrong way round in __write_initial_superblock. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:49 -04:00
Joe Thornber	2e63309507	dm cache policy smq: don't do any writebacks unless IDLE If there are no clean blocks to be demoted the writeback will be triggered at that point. Preemptively writing back can hurt high IO load scenarios. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	49b7f76890	dm cache: simplify the IDLE vs BUSY state calculation Drop the MODERATE state since it wasn't buying us much. Also, in check_migrations(), prepare for the next commit ("dm cache policy smq: don't do any writebacks unless IDLE") by deferring to the policy to make the final decision on whether writebacks can be serviced. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	701e03e4e1	dm cache: track all IO to the cache rather than just the origin device's IO IO tracking used to throttle writebacks when the origin device is busy. Even if all the IO is going to the fast device, writebacks can significantly degrade performance. So track all IO to gauge whether the cache is busy or not. Otherwise, synthetic IO tests (e.g. fio) that might send all IO to the fast device wouldn't cause writebacks to get throttled. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	6cf4cc8f8b	dm cache policy smq: stop preemptively demoting blocks It causes a lot of churn if the working set's size is close to the fast device's size. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	4d44ec5ab7	dm cache policy smq: put newly promoted entries at the top of the multiqueue This stops entries bouncing in and out of the cache quickly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	78c45607b9	dm cache policy smq: be more aggressive about triggering a writeback If there are no clean entries to demote we really want to writeback immediately. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:32 -04:00
Joe Thornber	a8cd1eba61	dm cache policy smq: only demote entries in bottom half of the clean multiqueue Heavy IO load may mean there are very few clean blocks in the cache, and we risk demoting entries that get hit a lot. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:32 -04:00
Joe Thornber	072792dcdf	dm cache: fix incorrect 'idle_time' reset in IO tracker Some bios have no payload (eg, a FLUSH), don't reset the idle_time when these come in. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:53:11 -04:00
Tomasz Majchrzak	d82dd0e34d	raid1: prefer disk without bad blocks If an array consists of two drives and the first drive has the bad block, the read request to the region overlapping the bad block chooses the same disk (with bad block) as device to read from over and over and the request gets stuck. If the first disk only partially overlaps with bad block, it becomes a candidate ("best disk") for shorter range of sectors. The second disk is capable of reading the entire requested range and it is updated accordingly, however it is not recorded as a best device for the request. In the end the request is sent to the first disk to read entire range of sectors. It fails and is re-tried in a moment but with the same outcome. Actually it is quite likely scenario but it had little exposure in my test until commit 715d40b93b10 ("md/raid1: add failfast handling for reads.") removed preference for idle disk. Such scenario had been passing as second disk was always chosen when idle. Reset a candidate ("best disk") to read from if disk can read entire range. Do it only if other disk has already been chosen as a candidate for a smaller range. The head position / disk type logic will select the best disk to read from - it is fine as disk with bad block won't be considered for it. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-12 14:41:15 -07:00
Song Liu	5ddf0440a1	md/r5cache: handle sync with data in write back cache Currently, sync of raid456 array cannot make progress when hitting data in writeback r5cache. This patch fixes this issue by flushing cached data of the stripe before processing the sync request. This is achived by: 1. In handle_stripe(), do not set STRIPE_SYNCING if the stripe is in write back cache; 2. In r5c_try_caching_write(), handle the stripe in sync with write through; 3. In do_release_stripe(), make stripe in sync write out and send it to the state machine. Shaohua: explictly set STRIPE_HANDLE after write out completed Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-11 22:14:40 -07:00
Song Liu	70d466f760	md/r5cache: gracefully handle journal device errors for writeback mode For the raid456 with writeback cache, when journal device failed during normal operation, it is still possible to persist all data, as all pending data is still in stripe cache. However, it is necessary to handle journal failure gracefully. During journal failures, the following logic handles the graceful shutdown of journal: 1. raid5_error() marks the device as Faulty and schedules async work log->disable_writeback_work; 2. In disable_writeback_work (r5c_disable_writeback_async), the mddev is suspended, set to write through, and then resumed. mddev_suspend() flushes all cached stripes; 3. All cached stripes need to be flushed carefully to the RAID array. This patch fixes issues within the process above: 1. In r5c_update_on_rdev_error() schedule disable_writeback_work for journal failures; 2. In r5c_disable_writeback_async(), wait for MD_SB_CHANGE_PENDING, since raid5_error() updates superblock. 3. In handle_stripe(), allow stripes with data in journal (s.injournal > 0) to make progress during log_failed; 4. In delay_towrite(), if log failed only process data in the cache (skip new writes in dev->towrite); 5. In __get_priority_stripe(), process loprio_list during journal device failures. 6. In raid5_remove_disk(), wait for all cached stripes are flushed before calling log_exit(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-11 22:11:11 -07:00
Shaohua Li	23b245c04d	md/raid1/10: avoid unnecessary locking If we add bios to block plugging list, locking is unnecessry, since the block unplug is guaranteed not to run at that time. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-11 15:32:17 -07:00
Song Liu	bb3338d347	md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first In r5l_do_submit_io(), it is necessary to check io->split_bio before submit io->current_bio. This is because, endio of current_bio may free the whole IO unit, and thus change io->split_bio. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-10 10:07:55 -07:00

... 8 9 10 11 12 ...

5319 Commits