btrfs: dio iomap DSYNC workaround
iomap dio will run generic_write_sync() for us if the iocb is DSYNC.
This is problematic for us because of 2 reasons:

1. we hold the inode_lock() during this operation, and we take it in
   generic_write_sync()
2. we hold a read lock on the dio_sem but take the write lock in fsync

Since we don't want to rip out this code right now, but reworking the
locking is a bit much to do at this point, work around this problem with
this masterpiece of a patch.

First, we clear DSYNC on the iocb so that the iomap stuff doesn't know
that it needs to handle the sync. We save this fact in
current->journal_info, because we need to do special things once we're
in iomap_begin, and we have no way to pass private information into
iomap_dio_rw().

Next we specify a separate iomap_dio_ops for sync, which implements an
->end_io() callback that gets called when the dio completes. This is
important for AIO, because we really do need to run generic_write_sync()
if we complete asynchronously. However if we're still in the submitting
context when we enter ->end_io() we clear the flag so that the submitter
knows they're the ones that need to run generic_write_sync().

This is meant to be temporary. We need to work out how to eliminate the
inode_lock() and the dio_sem in our fsync and use another mechanism to
protect these operations.

Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
committed by David Sterba
parent f85781fb50
commit 0eb79294db
@@ -2023,7 +2023,40 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
		atomic_inc(&BTRFS_I(inode)->sync_writers);

	if (iocb->ki_flags & IOCB_DIRECT) {
		/*
		 * 1. We must always clear IOCB_DSYNC in order to not deadlock
		 *    in iomap, as it calls generic_write_sync() in this case.
		 * 2. If we are async, we can call iomap_dio_complete() either
		 *    in
		 *
		 *    2.1. A worker thread from the last bio completed.  In this
		 *	   case we need to mark the btrfs_dio_data that it is
		 *	   async in order to call generic_write_sync() properly.
		 *	   This is handled by setting BTRFS_DIO_SYNC_STUB in the
		 *	   current->journal_info.
		 *    2.2  The submitter context, because all IO completed
		 *         before we exited iomap_dio_rw().  In this case we can
		 *         just re-set the IOCB_DSYNC on the iocb and we'll do
		 *         the sync below.  If our ->end_io() gets called and
		 *         current->journal_info is set, then we know we're in
		 *         our current context and we will clear
		 *         current->journal_info to indicate that we need to
		 *         sync below.
		 */
		if (sync) {
			ASSERT(current->journal_info == NULL);
			iocb->ki_flags &= ~IOCB_DSYNC;
			current->journal_info = BTRFS_DIO_SYNC_STUB;
		}
		num_written = __btrfs_direct_write(iocb, from);

		/*
		 * As stated above, we cleared journal_info, so we need to do
		 * the sync ourselves.
		 */
		if (sync && current->journal_info == NULL)
			iocb->ki_flags |= IOCB_DSYNC;
		current->journal_info = NULL;
	} else {
		num_written = btrfs_buffered_write(iocb, from);
		if (num_written > 0)
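
The handoff described in the commit message and in the comment in this hunk can be modelled in user space. The following is a minimal sketch, not the kernel code: every `sketch_*` / `SKETCH_*` name is hypothetical, a thread-local pointer stands in for current->journal_info, and counters stand in for calls to generic_write_sync(). It only shows who ends up responsible for the sync in the two completion cases, assuming the sync-specific ->end_io() is installed only for DSYNC writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
#define SKETCH_DSYNC 0x1u
static int sketch_stub_obj;
#define SKETCH_SYNC_STUB ((void *)&sketch_stub_obj)

/* Models current->journal_info: one slot per thread. */
static _Thread_local void *sketch_journal_info;

struct sketch_iocb {
	unsigned int flags;
	int syncs_in_end_io;    /* syncs issued from a completion context */
	int syncs_by_submitter; /* syncs issued after the write returns */
};

/* Models the sync-specific ->end_io() callback. */
static void sketch_end_io(struct sketch_iocb *iocb)
{
	if (sketch_journal_info == SKETCH_SYNC_STUB) {
		/* Still in the submitter: clear the marker so it syncs. */
		sketch_journal_info = NULL;
	} else {
		/* Some other context: nobody else will sync, do it here. */
		iocb->syncs_in_end_io++;
	}
}

/* Models the IOCB_DIRECT branch of the write path. */
static void sketch_write(struct sketch_iocb *iocb, bool completes_inline)
{
	bool sync = (iocb->flags & SKETCH_DSYNC) != 0;

	if (sync) {
		assert(sketch_journal_info == NULL);
		iocb->flags &= ~SKETCH_DSYNC;           /* hide DSYNC from the dio layer */
		sketch_journal_info = SKETCH_SYNC_STUB; /* remember it on the side */
	}

	/* Stand-in for the direct write: all IO may finish before we return. */
	if (sync && completes_inline)
		sketch_end_io(iocb);

	/* Marker cleared by end_io: the sync is ours, so restore the flag. */
	if (sync && sketch_journal_info == NULL)
		iocb->flags |= SKETCH_DSYNC;
	sketch_journal_info = NULL;

	/* Stand-in for the caller's later sync, done without the locks held. */
	if (iocb->flags & SKETCH_DSYNC)
		iocb->syncs_by_submitter++;
}
```

Calling `sketch_write(&iocb, true)` models case 2.2: the submitter performs the sync exactly once. Calling `sketch_write(&iocb, false)` and then `sketch_end_io(&iocb)` models case 2.1: the marker is no longer set in the completing context, so the sync happens in the callback instead.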