read_block_for_search() can be simplified as:
tmp = find_extent_buffer();
if (tmp)
return;
...
free_extent_buffer();
read_tree_block();
Apparently, @tmp must be NULL at this point, free_extent_buffer() is not
needed.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since commit dc2d3005d2 ("btrfs: remove dead create_space_info
calls"), there is only one caller btrfs_init_space_info. However, it
doesn't need create_space_info to return space_info at all.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As verify_level_key() is checked after verify_parent_transid(), i.e.
if (verify_parent_transid())
ret = -EIO;
else if (verify_level_key())
ret = -EUCLEAN;
if parent_transid is 0, verify_parent_transid() skips verifying
parent_transid and considers eb as valid, and if verify_level_key()
reports something wrong, we're not going to know if it's caused by
corrupted metadata or non-checkecd eb (e.g. stale eb).
The stale eb can be from an outdated raid1 mirror after a degraded
mount, see eg "btrfs: fix reading stale metadata blocks after degraded
raid1 mounts" (02a3307aa9) for more details.
@parent_transid is able to tell whether the eb's generation has been
verified by the caller.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Error message from qgroup_rescan_init() mostly looks like:
BTRFS info (device nvme0n1p1): qgroup_rescan_init failed with -115
Which is far from meaningful, and sometimes confusing as for above
-EINPROGRESS it's mostly (despite the init race) harmless, but sometimes
it can also indicate problem if the return value is -EINVAL.
Change it to some more meaningful messages like:
BTRFS info (device nvme0n1p1): qgroup rescan is already in progress
And
BTRFS err(device nvme0n1p1): qgroup rescan init failed, qgroup is not enabled
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ update the messages and level ]
Signed-off-by: David Sterba <dsterba@suse.com>
If we have invalid flags set, when we error out we must drop our writer
counter and free the buffer we allocated for the arguments. This bug is
trivially reproduced with the following program on 4.7+:
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>
int main(int argc, char **argv)
{
struct btrfs_ioctl_vol_args_v2 vol_args = {
.flags = UINT64_MAX,
};
int ret;
int fd;
if (argc != 2) {
fprintf(stderr, "usage: %s PATH\n", argv[0]);
return EXIT_FAILURE;
}
fd = open(argv[1], O_WRONLY);
if (fd == -1) {
perror("open");
return EXIT_FAILURE;
}
ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
if (ret == -1)
perror("ioctl");
close(fd);
return EXIT_SUCCESS;
}
When unmounting the filesystem, we'll hit the
WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the
filesystem to be remounted read-only as the writer count will stay
lifted.
Fixes: 6b526ed70c ("btrfs: introduce device delete by devid")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For inlined extent, we only have one segment, thus less things to check.
And further more, inlined extent always has the csum in its leaf header,
it's less probable to have corrupted data.
Anyway, still check header and segment header.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
James Harvey reported that some corrupted compressed extent data can
lead to various kernel memory corruption.
Such corrupted extent data belongs to inode with NODATASUM flags, thus
data csum won't help us detecting such bug.
If lucky enough, KASAN could catch it like:
BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
Write of size 4096 at addr ffff8800606cb0f8 by task kworker/u16:0/2338
CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G O 4.17.0-rc5-custom+ #50
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Call Trace:
dump_stack+0xc2/0x16b
print_address_description+0x6a/0x270
kasan_report+0x260/0x380
memcpy+0x34/0x50
lzo_decompress_bio+0x384/0x7a0 [btrfs]
end_compressed_bio_read+0x99f/0x10b0 [btrfs]
bio_endio+0x32e/0x640
normal_work_helper+0x15a/0xea0 [btrfs]
process_one_work+0x7e3/0x1470
worker_thread+0x1b0/0x1170
kthread+0x2db/0x390
ret_from_fork+0x22/0x40
...
The offending compressed data has the following info:
Header: length 32768 (looks completely valid)
Segment 0 Header: length 3472882419 (obviously out of bounds)
Then when handling segment 0, since it's over the current page, we need
the copy the compressed data to temporary buffer in workspace, then such
large size would trigger out-of-bounds memory access, screwing up the
whole kernel.
Fix it by adding extra checks on header and segment headers to ensure we
won't access out-of-bounds, and even checks the decompressed data won't
be out-of-bounds.
Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ updated comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
as it is, the logics in native io_submit(2) is "if asked for
more than LONG_MAX/sizeof(pointer) iocbs to submit, don't
bother with more than LONG_MAX/sizeof(pointer)" (i.e.
512M requests on 32bit and 1E requests on 64bit) while
compat io_submit(2) goes with "stop after the first
PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so. Which is
* inconsistent
* *way* too much in native case
* possibly too little in compat one
and
* wrong anyway, since the natural point where we
ought to stop bothering is ctx->nr_events
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
get rid of insane "copy array of 32bit pointers into an array of
native ones" glue.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The logics for 'avail' is
* not past the tail of cyclic buffer
* no more than asked
* not past the end of buffer
* not past the end of a page
Unobfuscate the last part.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... so just make them return 0 when caller does not need to destroy iocb
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull AFS fixes from David Howells:
- fix a BUG triggerable from faccessat()
- fix the mounting of backup volumes
* tag 'afs-fixes-20180529' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix mounting of backup volumes
afs: Fix directory permissions check
For non-atomic files, this patch adds an option to give nobarrier which
doesn't issue flush commands to the device.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The fstrim gathers huge number of large discard commands, and tries to issue
without IO awareness, which results in long user-perceive IO latencies on
READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
seconds due to long discard latency.
This patch limits the maximum size to 2MB per candidate, and check IO congestion
when issuing them to disk.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In inode_init_always(), we clear the inode mapping flags, which clears
any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
also clear wb_err, which means that old mapping errors can leak through
to new inodes.
This is crucial for the XFS inode allocation path because we recycle old
in-core inodes and we do not want error state from an old file to leak
into the new file. This bug was discovered by running generic/036 and
generic/047 in a loop and noticing that the EIOs generated by the
collision of direct and buffered writes in generic/036 would survive the
remount between 036 and 047, and get reported to the fsyncs (on
different files!) in generic/047.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
It seems the first error assignment in if branch is redundant.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Although it's not that complex, but such comment could still save
several minutes for newer reader/reviewer instead of inferring that from
the code.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor wording updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
Since compression.h is using the SZ_* macros, and if some file includes
only compression.h without linux/sizes.h, it will cause compile error.
One example is lzo.c, if it uses BTRFS_MAX_COMPRESSED. Fix it by adding
linux/sizes.h in compression.h
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_clone_files(), we must check the NODATASUM flag while the
inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
will change the flags after we check and we can end up with a party
checksummed file.
The race window is only a few instructions in size, between the if and
the locks which is:
3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
3835 return -EISDIR;
where the setflags must be run and toggle the NODATASUM flag (provided
the file size is 0). The clone will block on the inode lock, segflags
takes the inode lock, changes flags, releases log and clone continues.
Not impossible but still needs a lot of bad luck to hit unintentionally.
Fixes: 0e7b824c4e ("Btrfs: don't make a file partly checksummed through file clone")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Function btrfs_exclude_logged_extents may call __exclude_logged_extent
which may fail.
Propagate the failures of __exclude_logged_extent to upper caller.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of setting "parent" to ref->parent only when dealing with
a shared ref and subsequently performing another check to see
if (parent > 0), check the "node->type" directly and act accordingly.
This makes the code more streamline. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of taking only specific member of this structure, which results
in 2 extra arguments, just take the delayed_extent_op struct and
reference the arguments inside the functions. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function currently takes 7 parameters, most of which are proxies
for values from btrfs_delayed_ref_node struct which is not passed. This
patch simplifies the interface of the function by simply passing said
delayed ref node struct to the function. This enables us to:
1. Move locals variables and init code related to them from
run_delayed_tree_ref which should only be used inside
alloc_reserved_tree_block, such as skinny_metadata and the btrfs_key,
representing the extent being inserted. This removes the need for the
"ins" argument. Instead, it's replaced by a local var with a more
verbose name - extent_key.
2. Now that we have a reference to the node in alloc_reserved_tree_block
the delayed_tree_ref struct can be referenced inside the function and
this enable removing the "ref->level", "parent" and "ref_root"
arguments.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function already takes a transaction handle which contains a
reference to the fs_info. So use this and remove the extra argument.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The test failures are not clearly visible in the system log as they're
printed at INFO level. Add a new helper that is level ERROR. As this
touches almost all strings, I took the opportunity to unify them:
- decapitalize the first letter as there's a prefix and the text
continues after ":"
- glue strings split to more lines and un-indent so they fit to 80
columns
- use %llu instead of %Lu
- drop \n from the modified messages (test_msg is left untouched)
Signed-off-by: David Sterba <dsterba@suse.com>
We should remove O_NONBLOCK flag when calling sock->ops->connect()
in sctp_connect_to_sock() function.
Why?
1. up to now, sctp socket connect() function ignores the flag argument,
that means O_NONBLOCK flag does not take effect, then we should remove
it to avoid the confusion (but is not urgent).
2. for the future, there will be a patch to fix this problem, then the flag
argument will take effect, the patch has been queued at https://git.kernel.o
rg/pub/scm/linux/kernel/git/davem/net.git/commit/net/sctp?id=644fbdeacf1d3ed
d366e44b8ba214de9d1dd66a9.
But, the O_NONBLOCK flag will make sock->ops->connect() directly return
without any wait time, then the connection will not be established, DLM kernel
module will call sock->ops->connect() again and again, the bad results are,
CPU usage is almost 100%, even trigger soft_lockup problem if the related
configurations are enabled,
DLM kernel module also prints lots of messages like,
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
The upper application (e.g. ocfs2 mount command) is hanged at new_lockspace(),
the whole backtrace is as below,
tb0307-nd2:~ # cat /proc/2935/stack
[<0>] new_lockspace+0x957/0xac0 [dlm]
[<0>] dlm_new_lockspace+0xae/0x140 [dlm]
[<0>] user_cluster_connect+0xc3/0x3a0 [ocfs2_stack_user]
[<0>] ocfs2_cluster_connect+0x144/0x220 [ocfs2_stackglue]
[<0>] ocfs2_dlm_init+0x215/0x440 [ocfs2]
[<0>] ocfs2_fill_super+0xcb0/0x1290 [ocfs2]
[<0>] mount_bdev+0x173/0x1b0
[<0>] mount_fs+0x35/0x150
[<0>] vfs_kern_mount.part.23+0x54/0x100
[<0>] do_mount+0x59a/0xc40
[<0>] SyS_mount+0x80/0xd0
[<0>] do_syscall_64+0x76/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[<0>] 0xffffffffffffffff
So, I think we should remove O_NONBLOCK flag here, since DLM kernel module can
not handle non-block sockect in connect() properly.
Signed-off-by: Gang He <ghe@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
The information about a size change in this case just creates confusion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only used in block_dev.c and the partitions code, and it should remain
that way..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can optimise away any lookup for a rename target, unless we're
being asked to revalidate a dentry that might be in use.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If nfs_lookup_revalidate() is called with LOOKUP_REVAL because a
previous path lookup failed, then we ought to force a full lookup
of the component name.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFSv4 should not need to perform an extra close-to-open GETATTR as part
of the process of looking up a regular file, since the OPEN call will
do that for us.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
on INFINIBAND. However, when with CONFIG_INIFIBAND=m, this leads to a
link error when another driver using it is built-in. The
INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
a 'bool' symbol that does not force anything to be a module in turn.
fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_request':
trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_destroy_trans':
trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
Fixes: 9533b292a7 ("IB: remove redundant INFINIBAND kconfig dependencies")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
btrfs_read_fs_root_no_name() may return ERR_PTR(-ENOENT) or
ERR_PTR(-ENOMEM) and therefore search_ioctl() and
btrfs_search_path_in_tree() should use PTR_ERR() instead of -ENOENT,
which all other callers of btrfs_read_fs_root_no_name() do.
Drop the error message as it would be confusing, the caller of ioctl
will likely interpret the error code and not look into the syslog.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I got a report that after upgrading to 4.16, someone's filesystems
weren't mounting:
[ 23.845852] BTRFS info (device loop0): unrecognized mount option 'subvol='
Before 4.16, this mounted the default subvolume. It turns out that this
empty "subvol=" is actually an application bug, but it was causing the
application to fail, so it's an ABI break if you squint.
The generic parsing code we use for mount options (match_token())
doesn't match an empty string as "%s". Previously, setup_root_args()
removed the "subvol=" string, but the mount path was cleaned up to not
need that. Add a dummy Opt_subvol_empty to fix this.
The simple workaround is to use / or . for the value of 'subvol=' .
Fixes: 312c89fbca ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
CC: stable@vger.kernel.org # 4.16+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Looks like the original idea was to print the hex of the flags which is
not coded with their flag name. So use the current buf pointer bp
instead of buf.
Reaching the uknown flags should never happen, it's there just in case.
Fixes: ebce0e01b9 ("btrfs: make block group flags in balance printks human-readable")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The dedupe range is 16 MiB, with 4 KiB pages and 8 byte pointers, the
arrays can be 32KiB large. To avoid allocation failures due to
fragmented memory, use the allocation with fallback to vmalloc.
The arrays are allocated and freed only inside btrfs_extent_same and
reused for all the ranges.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We support big dedup requests by splitting range to smaller parts, and
call dedupe logic on each of them.
Instead of repeated allocation and deallocation, allocate once at the
beginning and reuse in the iteration.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs_dedupe_file_range silently restricts the dedupe range to
to 16MiB to limit locking and working memory size and is documented in
manual page as implementation specific.
Let's remove that restriction by iterating over the dedup range in 16MiB
steps. This is backward compatible and will not change anything for
requests smaller then 16MiB.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Split btrfs_extent_same() to two parts where one is the main EXTENT_SAME
entry and a helper that can be repeatedly called on a range. This will
be used in following patches.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_link() calls btrfs_orphan_del() if it's linking an O_TMPFILE but
it doesn't reserve space to do so. Even before the removal of the
orphan_block_rsv it wasn't using it.
Fixes: ef3b9af50b ("Btrfs: implement inode_operations callback tmpfile")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and
BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make
them consecutive again.
Signed-off-by: Omar Sandoval <osandov@fb.com>
[ switch them enums so we don't have to do that again ]
Signed-off-by: David Sterba <dsterba@suse.com>
Now that we don't keep long-standing reservations for orphan items,
root->orphan_block_rsv isn't used. We can git rid of it, along with:
- root->orphan_lock, which was used to protect root->orphan_block_rsv
- root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
- BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations
in root->orphan_block_rsv
- btrfs_orphan_commit_root(), which was the last user of any of these
and does nothing else
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, we keep space reserved for all inode orphan items until the
inode is evicted (i.e., all references to it are dropped). We hit an
issue where an application would keep a bunch of deleted files open (by
design) and thus keep a large amount of space reserved, causing ENOSPC
errors when other operations tried to reserve space. This long-standing
reservation isn't absolutely necessary for a couple of reasons:
- We can almost always make the reservation we need or steal from the
global reserve for the orphan item
- If we can't, it's not the end of the world if we drop the orphan item
on the floor and let the next mount clean it up
So, get rid of persistent reservation and just reserve space in
btrfs_evict_inode().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The truncate loop in btrfs_evict_inode() does two things at once:
- It refills the temporary block reserve, potentially stealing from the
global reserve or committing
- It calls btrfs_truncate_inode_items()
The tangle of continues hides the fact that these two steps are actually
separate. Split the first step out into a separate function both for
clarity and so that we can reuse it in a later patch.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>