It's a user pointer, and while the permissions of the file are pretty
questionable (should it really be readable to everybody), hashing the
pointer isn't going to be the solution.
We should take a closer look at more of the /proc/<pid> file permissions
in general. Sure, we do want many of them to often be readable (for
'ps' and friends), but I think we should probably do a few conversions
from S_IRUGO to S_IRUSR.
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The name char array passed to btrfs_search_path_in_tree is of size
BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes
are in the range of [0, 4079]. Currently the code uses the define but this
represents an off-by-one.
Implications:
Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be
written to extra space, not some padding that could be provided by the
allocator.
btrfs-progs store the arguments on stack, but kernel does own copy of
the ioctl buffer and the off-by-one overwrite does not affect userspace,
but the ending 0 might be lost.
Kernel ioctl buffer is allocated dynamically so we're overwriting
somebody else's memory, and the ioctl is privileged if args.objectid is
not 256. Which is in most cases, but resolving a subvolume stored in
another directory will trigger that path.
Before this patch the buffer was one byte larger, but then the -1 was
not added.
Fixes: ac8e9819d7 ("Btrfs: add search and inode lookup ioctls")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ added implications ]
Signed-off-by: David Sterba <dsterba@suse.com>
I was seeing disk flushes still happening when I mounted a Btrfs
filesystem with nobarrier for testing. This is because we use FUA to
write out the first super block, and on devices without FUA support, the
block layer translates FUA to a flush. Even on devices supporting true
FUA, using FUA when we asked for no barriers is surprising.
Fixes: 387125fc72 ("Btrfs: fix barrier flushes")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
error but then return 0 anyway due to mixing err and ret.
Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
Cc: <stable@vger.kernel.org> # v3.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since commit fb235dc06f (btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans) the assumption that
btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
been false. The qgroup operations call into btrfs_search_slot
and friends and can now return the full spectrum of error codes.
Fortunately, the fix here is easy since update_ref_for_cow failing
is already handled so we just need to bail early with the error
code.
Fixes: fb235dc06f (btrfs: qgroup: Move half of the qgroup accounting ...)
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit c6887cd111 ("Btrfs: don't do nocow check unless we have to")
changed the behavior of __btrfs_buffered_write() so that it first tries
to get a data space reservation, and then skips the relatively expensive
nocow check if the reservation succeeded.
If we have quotas enabled, the data space reservation also includes a
quota reservation. But in the rewrite case, the space has already been
accounted for in qgroups. So btrfs_check_data_free_space() increases
the quota reservation, but it never gets decreased when the data
actually gets written and overwrites the pre-existing data. So we're
left with both the qgroup and qgroup reservation accounting for the same
space.
This commit adds the missing btrfs_qgroup_free_data() call in the case
of BTRFS_ORDERED_PREALLOC extents.
Fixes: c6887cd111 ("Btrfs: don't do nocow check unless we have to")
Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
cifs.ko makes DFS queries regardless of the type of the server and
non-DFS servers are common. This often results in superfluous logging of
non-critical errors.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
This resolves the merge issue pointed out by Stephen in
drivers/iio/adc/meson_saradc.c.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code in __d_alloc() carefully orders filling in the NUL character
of the name (and the length, hash, and the name itself) with assigning
of the name itself. However, prepend_name() does not order the accesses
to the ->name and ->len fields, other than on TSO systems. This commit
therefore replaces prepend_name()'s READ_ONCE() of ->name with an
smp_load_acquire(), which orders against the subsequent READ_ONCE() of
->len. Because READ_ONCE() now incorporates smp_read_barrier_depends(),
prepend_name()'s smp_read_barrier_depends() is removed. Finally,
to save a line, the smp_wmb()/store pair in __d_alloc() is replaced
by smp_store_release().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2)
then crash, we'll see wrong allocated block number (stat -c %b), the
blocks allocated beyond EOF are all lost. fstests generic/468
exposes this bug.
Commit 67a7d5f561 ("ext4: fix fdatasync(2) after extent
manipulation operations") fixed all the other extent manipulation
operation paths such as hole punch, zero range, collapse range etc.,
but forgot the fallocate case.
So similarly, fix it by recording the correct journal tid in ext4
inode in fallocate(2) path, so that ext4_sync_file() will wait for
the right tid to be committed on fdatasync(2).
This addresses the test failure in xfstests test generic/468.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
407cd7fb83 (ext4: change fast symlink test to not rely on i_blocks)
broke ~10 years old ext3 file systems created by 2.6.17. Any ELF
executable fails because the /lib/ld-linux.so.2 fast symlink
cannot be read anymore.
The patch assumed fast symlinks were created in a specific way,
but that's not true on these really old file systems.
The new behavior is apparently needed only with the large EA inode
feature.
Revert to the old behavior if the large EA inode feature is not set.
This makes my old VM boot again.
Fixes: 407cd7fb83 (ext4: change fast symlink test to not rely on i_blocks)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org
Pull NFS client fixes from Anna Schumaker:
"These patches fix a problem with compiling using an old version of
gcc, and also fix up error handling in the SUNRPC layer.
- NFSv4: Ensure gcc 4.4.4 can compile initialiser for
"invalid_stateid"
- SUNRPC: Allow connect to return EHOSTUNREACH
- SUNRPC: Handle ENETDOWN errors"
* tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Handle ENETDOWN errors
SUNRPC: Allow connect to return EHOSTUNREACH
NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
Pull xfs fixes from Darrick Wong:
"Here are some bug fixes for 4.15-rc2.
- fix memory leaks that appeared after removing ifork inline data
buffer
- recover deferred rmap update log items in correct order
- fix memory leaks when buffer construction fails
- fix memory leaks when bmbt is corrupt
- fix some uninitialized variables and math problems in the quota
scrubber
- add some omitted attribution tags on the log replay commit
- fix some UBSAN complaints about integer overflows with large sparse
files
- implement an effective inode mode check in online fsck
- fix log's inability to retry quota item writeout due to transient
errors"
* tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Properly retry failed dquot items in case of error during buffer writeback
xfs: scrub inode mode properly
xfs: remove unused parameter from xfs_writepage_map
xfs: ubsan fixes
xfs: calculate correct offset in xfs_scrub_quota_item
xfs: fix uninitialized variable in xfs_scrub_quota
xfs: fix leaks on corruption errors in xfs_bmap.c
xfs: fortify xfs_alloc_buftarg error handling
xfs: log recovery should replay deferred ops in order
xfs: always free inline data before resetting inode fork during ifree
When an AFS inode is allocated by afs_alloc_inode(), the allocated
afs_vnode struct isn't necessarily reset from the last time it was used as
an inode because the slab constructor is only invoked once when the memory
is obtained from the page allocator.
This means that information can leak from one inode to the next because
we're not calling kmem_cache_zalloc(). Some of the information isn't
reset, in particular the permit cache pointer.
Bring the clearances up to date.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Fix four refcount bugs in afs_cache_permit():
(1) When checking the result of the kzalloc(), we can't just return, but
must put 'permits'.
(2) We shouldn't put permits immediately after hashing a new permit as we
need to keep the pointer stable so that we can check to see if
vnode->permit_cache has changed before we decide whether to assign to
it.
(3) 'permits' is being put twice.
(4) We need to put either the replacement or the thing replaced after the
assignment to vnode->permit_cache.
Without this, lots of the following are seen:
Kernel BUG at ffffffffa039857b [verbose debug info unavailable]
------------[ cut here ]------------
Kernel BUG at ffffffffa039858a [verbose debug info unavailable]
------------[ cut here ]------------
The addresses are in the .text..refcount section of the kafs.ko module.
Following the relocation records for the __ex_table section shows one to be
due to the decrement in afs_put_permits() and the other to be key_get() in
afs_cache_permit().
Occasionally, the following is seen:
refcount_t overflow at afs_cache_permit+0x57d/0x5c0 [kafs] in cc1[562], uid/euid: 0/0
WARNING: CPU: 0 PID: 562 at kernel/panic.c:657 refcount_error_report+0x9c/0xac
...
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
The code used to do that and then I mucked with it and never quite put
the code back. Today the code references current_cred()->uid and
current_cred()->gid which is equivalent but more wordy, and not
idiomatic.
Fixes: 93faccbbfa ("fs: Better permission checking for submounts")
Fixes: 069d5ac9ae ("autofs: Fix automounts by using current_real_cred()->uid")
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When vfs_submount was added the test to limit automounts from
filesystems that with s_user_ns != &init_user_ns accidentially left
in follow_automount. The test was never about any security concerns
and was always about how do we implement this for filesystems whose
s_user_ns != &init_user_ns.
At the moment this check makes no difference as there are no
filesystems that both set FS_USERNS_MOUNT and implement d_automount.
Remove this check now while I am thinking about it so there will not
be odd booby traps for someone who does want to make this combination
work.
vfs_submount still needs improvements to allow this combination to work,
and vfs_submount contains a check that presents a warning.
The autofs4 filesystem could be modified to set FS_USERNS_MOUNT and it would
need not work on this code path, as userspace performs the mounts.
Fixes: 93faccbbfa ("fs: Better permission checking for submounts")
Fixes: aeaa4a79ff ("fs: Call d_automount with the filesystems creds")
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Once the inode item writeback errors is already fixed, it's time to fix the same
problem in dquot code.
Although there were no reports of users hitting this bug in dquot code (at least
none I've seen), the bug is there and I was already planning to fix it when the
correct approach to fix the inodes part was decided.
This patch aims to fix the same problem in dquot code, regarding failed buffers
being unable to be resubmitted once they are flush locked.
Tested with the recently test-case sent to fstests list by Hou Tao.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Since we've used up all the bits in i_mode, the existing mode check
doesn't actually do anything useful. However, we've not used all the
bit values in the format portion of i_mode, so we /do/ need to test
that for bad values.
Fixes: 80e4e1268 ("xfs: scrub inodes")
Fixes-coverity-id: 1423992
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
The first thing that xfs_writepage_map does is clobber the offset
parameter. Since we never use the passed-in value, turn the parameter
into a local variable. This gets rid of an UBSAN warning in generic/466.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Fix some complaints from the UBSAN about signed integer addition overflows.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Mergr misc fixes from Andrew Morton:
"28 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
autofs: revert "autofs: take more care to not update last_used on path walk"
fs/fat/inode.c: fix sb_rdonly() change
mm, memcg: fix mem_cgroup_swapout() for THPs
mm: migrate: fix an incorrect call of prep_transhuge_page()
kmemleak: add scheduling point to kmemleak_scan()
scripts/bloat-o-meter: don't fail with division by 0
fs/mbcache.c: make count_objects() more robust
Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
mm/madvise.c: fix madvise() infinite loop under special circumstances
exec: avoid RLIMIT_STACK races with prlimit()
IB/core: disable memory registration of filesystem-dax vmas
v4l2: disable filesystem-dax mapping support
mm: fail get_vaddr_frames() for filesystem-dax mappings
mm: introduce get_user_pages_longterm
device-dax: implement ->split() to catch invalid munmap attempts
mm, hugetlbfs: introduce ->split() to vm_operations_struct
scripts/faddr2line: extend usage on generic arch
...
hugetlfs_fallocate() currently performs put_page() before unlock_page().
This scenario opens a small time window, from the time the page is added
to the page cache, until it is unlocked, in which the page might be
removed from the page-cache by another core. If the page is removed
during this time windows, it might cause a memory corruption, as the
wrong page will be unlocked.
It is arguable whether this scenario can happen in a real system, and
there are several mitigating factors. The issue was found by code
inspection (actually grep), and not by actually triggering the flow.
Yet, since putting the page before unlocking is incorrect it should be
fixed, if only to prevent future breakage or someone copy-pasting this
code.
Mike said:
"I am of the opinion that this does not need to be sent to stable.
Although the ordering is current code is incorrect, there is no way
for this to be a problem with current locking. In addition, I verified
that the perhaps bigger issue with sys_fadvise64(POSIX_FADV_DONTNEED)
for hugetlbfs and other filesystems is addressed in 3a77d21480 ("mm:
fadvise: avoid fadvise for fs without backing device")"
Link: http://lkml.kernel.org/r/20170826191124.51642-1-namit@vmware.com
Fixes: 70c3547e36 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
flag but introduced a semantic change.
In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().
This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.
This has caused more problems than I expected so reverting the change is
needed.
In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.
Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Neil Brown <neilb@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While commit 092a53452b ("autofs: take more care to not update
last_used on path walk") helped (partially) resolve a problem where
automounts were not expiring due to aggressive accesses from user space
it has a side effect for very large environments.
This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.
It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire
policy. So the patch being reverted introduces a regression which
should be fixed.
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.themaw.net
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org> [4.11+]
Cc: Colin Walters <walters@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd fixes from Bruce Fields:
"I screwed up my merge window pull request; I only sent half of what I
meant to.
There were no new features, just bugfixes of various importance and
some very minor cleanup, so I think it's all still appropriate for
-rc2.
Highlights:
- Fixes from Trond for some races in the NFSv4 state code.
- Fix from Naofumi Honda for a typo in the blocked lock notificiation
code
- Fixes from Vasily Averin for some problems starting and stopping
lockd especially in network namespaces"
* tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
lockd: fix "list_add double add" caused by legacy signal interface
nlm_shutdown_hosts_net() cleanup
race of nfsd inetaddr notifiers vs nn->nfsd_serv change
race of lockd inetaddr notifiers vs nlmsvc_rqst change
SUNRPC: make cache_detail structures const
NFSD: make cache_detail structures const
sunrpc: make the function arg as const
nfsd: check for use of the closed special stateid
nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
lockd: lost rollback of set_grace_period() in lockd_down_net()
lockd: added cleanup checks in exit_net hook
grace: replace BUG_ON by WARN_ONCE in exit_net hook
nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
lockd: remove net pointer from messages
nfsd: remove net pointer from debug messages
nfsd: Fix races with check_stateid_generation()
nfsd: Ensure we check stateid validity in the seqid operation checks
nfsd: Fix race in lock stateid creation
nfsd4: move find_lock_stateid
nfsd: Ensure we don't recognise lock stateids after freeing them
...
Pull btrfs fixes from David Sterba:
"We've collected some fixes in since the pre-merge window freeze.
There's technically only one regression fix for 4.15, but the rest
seems important and candidates for stable.
- fix missing flush bio puts in error cases (is serious, but rarely
happens)
- fix reporting stat::st_blocks for buffered append writes
- fix space cache invalidation
- fix out of bound memory access when setting zlib level
- fix potential memory corruption when fsync fails in the middle
- fix crash in integrity checker
- incremetnal send fix, path mixup for certain unlink/rename
combination
- pass flags to writeback so compressed writes can be throttled
properly
- error handling fixes"
* tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: incremental send, fix wrong unlink path after renaming file
btrfs: tree-checker: Fix false panic for sanity test
Btrfs: fix list_add corruption and soft lockups in fsync
btrfs: Fix wild memory access in compression level parser
btrfs: fix deadlock when writing out space cache
btrfs: clear space cache inode generation always
Btrfs: fix reported number of inode blocks after buffered append writes
Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
Btrfs: bail out gracefully rather than BUG_ON
btrfs: dev_alloc_list is not protected by RCU, use normal list_del
btrfs: add missing device::flush_bio puts
btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
Btrfs: add write_flags for compression bio
register_shrinker() might return -ENOMEM error since Linux 3.12.
Call panic() as with other failure checks in this function if
register_shrinker() failed.
Fixes: 1d3d4437ea ("vmscan: per-node deferred work")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
two callers that do so don't need to bother - we'd already
masked it with epi->event.events, which
* couldn't have changed since we are holding ->mtx
* had been set to event->events
* is still equal to event->events, since *event is never
changed by anything.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's only used for tracepoints so it's relatively harmless,
but the offset is calculated incorrectly in xfs_scrub_quota_item.
qi_dqperchunk is the nr. of dquots per "chunk" which we have
conveniently *cough* defined to always be 1 FSB. Therefore
block_offset * qi_dqperchunk == first id in that chunk,
and so offset = id / qi_dqperchunk
id * dqperchunk is ... meaningless.
Fixes-coverity-id: 1423965
Fixes: c2fc338c ("xfs: scrub quota information")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
On the first pass through the while(1) loop, we get to
xfs_scrub_should_terminate() which can test the uninitialized
error variable.
Fixes-coverity-id: 1423737
Fixes: c2fc338c ("xfs: scrub quota information")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Use _GOTO instead of _RETURN so we can free the allocated
cursor on error.
Fixes: bf80628 ("xfs: remove xfs_bmse_shift_one")
Fixes-coverity-id: 1423813, 1423676
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
percpu_counter_init failure path doesn't clean up &btp->bt_lru list.
Call list_lru_destroy in that error path. Similarly register_shrinker
error path is not handled.
While it is unlikely to trigger these error path, it is not impossible
especially the later might fail with large NUMAs. Let's handle the
failure to make the code more robust.
Noticed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Under some circumstances, an incremental send operation can issue wrong
paths for unlink commands related to files that have multiple hard links
and some (or all) of those links were renamed between the parent and send
snapshots. Consider the following example:
Parent snapshot
. (ino 256)
|---- a/ (ino 257)
| |---- b/ (ino 259)
| | |---- c/ (ino 260)
| | |---- f2 (ino 261)
| |
| |---- f2l1 (ino 261)
|
|---- d/ (ino 262)
|---- f1l1_2 (ino 258)
|---- f2l2 (ino 261)
|---- f1_2 (ino 258)
Send snapshot
. (ino 256)
|---- a/ (ino 257)
| |---- f2l1/ (ino 263)
| |---- b2/ (ino 259)
| |---- c/ (ino 260)
| | |---- d3 (ino 262)
| | |---- f1l1_2 (ino 258)
| | |---- f2l2_2 (ino 261)
| | |---- f1_2 (ino 258)
| |
| |---- f2 (ino 261)
| |---- f1l2 (ino 258)
|
|---- d (ino 261)
When computing the incremental send stream the following steps happen:
1) When processing inode 261, a rename operation is issued that renames
inode 262, which currently as a path of "d", to an orphan name of
"o262-7-0". This is done because in the send snapshot, inode 261 has
of its hard links with a path of "d" as well.
2) Two link operations are issued that create the new hard links for
inode 261, whose names are "d" and "f2l2_2", at paths "/" and
"o262-7-0/" respectively.
3) Still while processing inode 261, unlink operations are issued to
remove the old hard links of inode 261, with names "f2l1" and "f2l2",
at paths "a/" and "d/". However path "d/" does not correspond anymore
to the directory inode 262 but corresponds instead to a hard link of
inode 261 (link command issued in the previous step). This makes the
receiver fail with a ENOTDIR error when attempting the unlink
operation.
The problem happens because before sending the unlink operation, we failed
to detect that inode 262 was one of ancestors for inode 261 in the parent
snapshot, and therefore we didn't recompute the path for inode 262 before
issuing the unlink operation for the link named "f2l2" of inode 262. The
detection failed because the function "is_ancestor()" only follows the
first hard link it finds for an inode instead of all of its hard links
(as it was originally created for being used with directories only, for
which only one hard link exists). So fix this by making "is_ancestor()"
follow all hard links of the input inode.
A test case for fstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>