This function is internal to btrfs and doesn't really deal with any
VFS members, as such it needn't take a struct inode refrence but
btrfs_inode.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
The expression is open-coded in several places, this asks for a wrapper.
As we know the MAX_EXTENT fits to u32, we can use the appropirate
division helper. This cascades to the result type updates.
Compiler is clever enough to use shift instead of integer division, so
there's no change in the generated assembly.
Signed-off-by: David Sterba <dsterba@suse.com>
A proposed patch in https://marc.info/?l=linux-btrfs&m=147859791003837
pointed out bad limit threshold in cow_file_range_async, but it turned
out that the whole logic is not necessary and is done by writeback. We
agreed to remove it.
Signed-off-by: David Sterba <dsterba@suse.com>
As of now writes smaller than 64k for non compressed extents and 16k
for compressed extents inside eof are considered as candidate
for auto defrag, put them together at a place.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since btrfs_defrag_leaves() does not support extent_root, remove its
corresponding call. The user can use the file based defrag to defrag
extents as of now.
No change in behaviour as extent_root is explicitly skipped in
btrfs_defrag_leaves and this has never worked as expected.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ ehnance changelong ]
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The check for a null inode is redundant since the function
is a callback for exportfs, which will itself crash if
dentry->d_inode or parent->d_inode is NULL. Removing the
null check makes this consistent with other file systems.
Also remove the redundant null dir check too.
Found with static analysis by CoverityScan, CID 1389472
Kudos to Jeff Mahoney for reviewing and explaining the error in
my original patch (most of this explanation went into the above
commit message) and David Sterba for pointing out that the dir
check is also redundant.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This replaces ACCESS_ONCE macro with the corresponding
READ|WRITE macros
Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This cleans up the cases where the min/max macros were used with a cast
rather than using directly min_t/max_t.
Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
try_release_extent_state reduces the gfp mask to GFP_NOFS if it is
compatible. This is true for GFP_KERNEL as well. There is no real
reason to do that though. There is no new lock taken down the
the only consumer of the gfp mask which is
try_release_extent_state
clear_extent_bit
__clear_extent_bit
alloc_extent_state
So this seems just unnecessary and confusing.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
b335b0034e ("Btrfs: Avoid using __GFP_HIGHMEM with slab allocator")
has reduced the allocation mask in btrfs_releasepage to GFP_NOFS just
to prevent from giving an unappropriate gfp mask to the slab allocator
deeper down the callchain (in alloc_extent_state). This is wrong for
two reasons a) GFP_NOFS might be just too restrictive for the calling
context b) it is better to tweak the gfp mask down when it needs that.
So just remove the mask tweaking from btrfs_releasepage and move it
down to alloc_extent_state where it is needed.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs
qgroup reserved space, and leads to non-writable fs.
This reminds us that we don't have enough underflow check for qgroup
reserved space.
For underflow case, we should not really underflow the numbers but warn
and keeps qgroup still work.
So add more check on qgroup reserved space and add WARN_ON() and
btrfs_warn() for any underflow case.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file).
This in turn returns inode of lower filesystem (in a stacked filesystem
setup).
I was playing with modified patches of shiftfs posted by james bottomley
and realized that through shiftfs setuid bit does not take effect. And
reason being that we fetch uid/gid from inode of lower fs (and not from
shiftfs inode). And that results in following checks failing.
/* We ignore suid/sgid if there are no mappings for them in the ns */
if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
!kgid_has_mapping(bprm->cred->user_ns, gid))
return;
uid/gid fetched from lower fs inode might not be mapped inside the user
namespace of container. So we need to look at uid/gid fetched from
upper filesystem (shiftfs in this particular case) and these should be
mapped and setuid bit can take affect.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
David Howells says:
====================
afs: Use system UUID generation
There is now a general function for generating a UUID and AFS should make
use of it. It's also been recommended to me that I switch to using random
rather than time plus MAC address-based UUIDs which this function does.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of needing additional checks in callers for unallocated przs,
perform the check in the walker, which gives us a more universal way to
handle the situation.
Signed-off-by: Kees Cook <keescook@chromium.org>
Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.
For example, command:
# for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
creates a millions of stale denties around sysctls of loopback interface:
# sysctl fs.dentry-state
fs.dentry-state = 25812579 24724135 45 0 0 0
All of them have matching names thus lookup have to scan though whole
hash chain and call d_compare (proc_sys_compare) which checks them
under system-wide spinlock (sysctl_lock).
# time sysctl -a > /dev/null
real 1m12.806s
user 0m0.016s
sys 1m12.400s
Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.
This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> On 10.02.2017 10:47, Al Viro wrote:
>> how about >> the matching stats *after* that patch?
>
> dcache size doesn't grow endlessly, so stats are fine
>
> # sysctl fs.dentry-state
> fs.dentry-state = 92712 58376 45 0 0 0
>
> # time sysctl -a &>/dev/null
>
> real 0m0.013s
> user 0m0.004s
> sys 0m0.008s
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Pull btrfs fixes from Chris Mason:
"This has two last minute fixes. The highest priority here is a
regression fix for the decompression code, but we also fixed up a
problem with the 32-bit compat ioctls.
The decompression bug could hand back the wrong data on big reads when
zlib was used. I have a larger cleanup to make the math here less
error prone, but at this stage in the release Omar's patch is the best
choice"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix btrfs_decompress_buf2page()
btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.
Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.
A case I saw when testing with zlib was:
buf_start = 42769
total_out = 46865
working_bytes = total_out - buf_start = 4096
start_byte = 45056
The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:
buf_offset = start_byte - buf_start = 2287
working_bytes -= buf_offset = 1809
current_buf_start = buf_start = 42769
Then, we copy
bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
buf_offset += bytes = 4096
working_bytes -= bytes = 0
current_buf_start += bytes = 44578
After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:
buf_offset = start_byte - buf_start = 2287
working_bytes = total_out - start_byte = 1809
current_buf_start = buf_start + buf_offset = 45056
But note that working_bytes was already zero before this, so we should
have stopped copying.
Fixes: 974b1adc3b ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley <pat-lkml@erley.org>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
Pull nfsd revert from Bruce Fields:
"This patch turned out to have a couple problems. The problems are
fixable, but at least one of the fixes is a little ugly. The original
bug has always been there, so we can wait another week or two to get
this right"
* tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux:
nfsd: Revert "nfsd: special case truncates some more"
AFS uses a time based UUID to identify the host itself. This requires
getting a timestamp which is currently done through the getnstimeofday()
interface that we want to eventually get rid of.
Instead of replacing it with a ktime-based interface, simply remove the
entire function and use generate_random_uuid() instead, which has a v4
("completely random") UUID instead of the time-based one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Move the afs_uuid struct to linux/uuid.h, rename it to uuid_v1 and change
the u16/u32 fields to __be16/__be32 instead so that the structure can be
cast to a 16-octet network-order buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de