When xfstest generic/035, we found the target file was deleted
if the rename return -EACESS.
In cifs_rename2, we unlink the positive target dentry if rename
failed with EACESS or EEXIST, even if the target dentry is positived
before rename. Then the existing file was deleted.
We should just delete the target file which created during the
rename.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
The flag from the primary tcon needs to be copied into the volume info
so that cifs_get_tcon will try to enable extensions on the per-user
tcon. At that point, since posix extensions must have already been
enabled on the superblock, don't try to needlessly adjust the mount
flags.
Fixes: ce558b0e17 ("smb3: Add posix create context for smb3.11 posix mounts")
Fixes: b326614ea2 ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Without this:
- persistent handles will only be enabled for per-user tcons if the
server advertises the 'Continuous Availabity' capability
- resilient handles would never be enabled for per-user tcons
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Ensure multiuser SMB3 mounts use encryption for all users' tcons if the
mount options are configured to require encryption. Without this, only
the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user
tcons would only be encrypted if the server was configured to require
encryption.
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Pull exfat fixes from Namjae Jeon:
- Zero out unused characters of FileName field to avoid a complaint
from some fsck tool.
- Fix memory leak on error paths.
- Fix unnecessary VOL_DIRTY set when calling rmdir on non-empty
directory.
- Call sync_filesystem() for read-only remount (Fix generic/452 test in
xfstests)
- Add own fsync() to flush dirty metadata.
* tag 'exfat-for-5.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: flush dirty metadata in fsync
exfat: move setting VOL_DIRTY over exfat_remove_entries()
exfat: call sync_filesystem for read-only remount
exfat: add missing brelse() calls on error paths
exfat: Set the unused characters of FileName field to the value 0000h
Since 5.7, we've been using task_work to trigger async running of
requests in the context of the original task. This generally works
great, but there's a case where if the task is currently blocked
in the kernel waiting on a condition to become true, it won't process
task_work. Even though the task is woken, it just checks whatever
condition it's waiting on, and goes back to sleep if it's still false.
This is a problem if that very condition only becomes true when that
task_work is run. An example of that is the task registering an eventfd
with io_uring, and it's now blocked waiting on an eventfd read. That
read could depend on a completion event, and that completion event
won't get trigged until task_work has been run.
Use the TWA_SIGNAL notification for task_work, so that we ensure that
the task always runs the work when queued.
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Log flush operations (gfs2_log_flush()) can target a specific transaction.
But if the function encounters errors (e.g. io errors) and withdraws,
the transaction was only freed it if was queued to one of the ail lists.
If the withdraw occurred before the transaction was queued to the ail1
list, function ail_drain never freed it. The result was:
BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()
This patch makes log_flush() add the targeted transaction to the ail1
list so that function ail_drain() will find and free it properly.
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error).
Commit b66648ad6d caused it to return NULL instead of ERR_PTR(-ESTALE) in
some cases. Fix that.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: b66648ad6d ("gfs2: Move inode generation number check into gfs2_inode_lookup")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
I don't understand this code well, but I'm seeing a warning about a
still-referenced inode on unmount, and every other similar filesystem
does a dput() here.
Fixes: e8a79fb14f ("nfsd: add nfsd/clients directory")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't drop the reference on the nfsdfs filesystem with
mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called
until the nfsd module's unloaded, and we can't unload the module as long
as there's a reference on nfsdfs. So this prevents module unloading.
Fixes: 2c830dd720 ("nfsd: persist nfsd filesystem across mounts")
Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit e9c15badbb ("fs: Do not check if there is a
fsnotify watcher on pseudo inodes"). The commit intended to eliminate
fsnotify-related overhead for pseudo inodes but it is broken in
concept. inotify can receive events of pipe files under /proc/X/fd and
chromium relies on close and open events for sandboxing. Maxim Levitsky
reported the following
Chromium starts as a white rectangle, shows few white rectangles that
resemble its notifications and then crashes.
The stdout output from chromium:
[mlevitsk@starship ~]$chromium-freeworld
mesa: for the --simplifycfg-sink-common option: may only occur zero or one times!
mesa: for the --global-isel-abort option: may only occur zero or one times!
[3379:3379:0628/135151.440930:ERROR:browser_switcher_service.cc(238)] XXX Init()
../../sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc:**CRASHING**:seccomp-bpf failure in syscall 0072
Received signal 11 SEGV_MAPERR 0000004a9048
Crashes are not universal but even if chromium does not crash, it certainly
does not work properly. While filtering just modify and access might be
safe, the benefit is not worth the risk hence the revert.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: e9c15badbb ("fs: Do not check if there is a fsnotify watcher on pseudo inodes")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
generic_file_fsync() exfat used could not guarantee the consistency of
a file because it has flushed not dirty metadata but only dirty data pages
for a file.
Instead of that, use exfat_file_fsync() for files and directories so that
it guarantees to commit both the metadata and data pages for a file.
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
We need to commit dirty metadata and pages to disk
before remounting exfat as read-only.
This fixes a failure in xfstests generic/452
generic/452 does the following:
cp something <exfat>/
mount -o remount,ro <exfat>
the <exfat>/something is corrupted. because while
exfat is remounted as read-only, exfat doesn't
have a chance to commit metadata and
vfs invalidates page caches in a block device.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
If the second exfat_get_dentry() call fails then we need to release
"old_bh" before returning. There is a similar bug in exfat_move_file().
Fixes: 5f2aa07507 ("exfat: add inode operations")
Reported-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Some fsck tool complain that padding part of the FileName field
is not set to the value 0000h. So let's maintain filesystem cleaner,
as exfat's spec. recommendation.
Signed-off-by: Hyeongseok.Kim <Hyeongseok@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Pull EFI fixes from Ingo Molnar:
- Fix build regression on v4.8 and older
- Robustness fix for TPM log parsing code
- kobject refcount fix for the ESRT parsing code
- Two efivarfs fixes to make it behave more like an ordinary file
system
- Style fixup for zero length arrays
- Fix a regression in path separator handling in the initrd loader
- Fix a missing prototype warning
- Add some kerneldoc headers for newly introduced stub routines
- Allow support for SSDT overrides via EFI variables to be disabled
- Report CPU mode and MMU state upon entry for 32-bit ARM
- Use the correct stack pointer alignment when entering from mixed mode
* tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: arm: Print CPU boot mode and MMU state at boot
efi/libstub: arm: Omit arch specific config table matching array on arm64
efi/x86: Setup stack correctly for efi_pe_entry
efi: Make it possible to disable efivar_ssdt entirely
efi/libstub: Descriptions for stub helper functions
efi/libstub: Fix path separator regression
efi/libstub: Fix missing-prototype warning for skip_spaces()
efi: Replace zero-length array and use struct_size() helper
efivarfs: Don't return -EINTR when rate-limiting reads
efivarfs: Update inode modification time for successful writes
efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
efi/tpm: Verify event log header before parsing
efi/x86: Fix build with gcc 4
The cell name stored in the afs_cell struct is a 64-char + NUL buffer -
when it needs to be able to handle up to AFS_MAXCELLNAME (256 chars) + NUL.
Fix this by changing the array to a pointer and allocating the string.
Found using Coverity.
Fixes: 989782dcdc ("afs: Overhaul cell database management")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull cifs fixes from Steve French:
"Six cifs/smb3 fixes, three of them for stable.
Fixes xfstests 451, 313 and 316"
* tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: misc: Use array_size() in if-statement controlling expression
cifs: update ctime and mtime during truncate
cifs/smb3: Fix data inconsistent when punch hole
cifs/smb3: Fix data inconsistent when zero file range
cifs: Fix double add page to memcg when cifs_readpages
cifs: Fix cached_fid refcnt leak in open_shroot
Pull NFS client bugfixes from Anna Schumaker:
"Stable Fixes:
- xprtrdma: Fix handling of RDMA_ERROR replies
- sunrpc: Fix rollback in rpc_gssd_dummy_populate()
- pNFS/flexfiles: Fix list corruption if the mirror count changes
- NFSv4: Fix CLOSE not waiting for direct IO completion
- SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()
Other Fixes:
- xprtrdma: Fix a use-after-free with r_xprt->rx_ep
- Fix other xprtrdma races during disconnect
- NFS: Fix memory leak of export_path"
* tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()
NFSv4 fix CLOSE not waiting for direct IO compeletion
pNFS/flexfiles: Fix list corruption if the mirror count changes
nfs: Fix memory leak of export_path
sunrpc: fixed rollback in rpc_gssd_dummy_populate()
xprtrdma: Fix handling of RDMA_ERROR replies
xprtrdma: Clean up disconnect
xprtrdma: Clean up synopsis of rpcrdma_flush_disconnect()
xprtrdma: Use re_connect_status safely in rpcrdma_xprt_connect()
xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed
Pull io_uring fixes from Jens Axboe:
"Three small fixes:
- Close a corner case for polled IO resubmission (Pavel)
- Toss commands when exiting (Pavel)
- Fix SQPOLL conditional reschedule on perpetually busy submit
(Xuan)"
* tag 'io_uring-5.8-2020-06-26' of git://git.kernel.dk/linux-block:
io_uring: fix current->mm NULL dereference on exit
io_uring: fix hanging iopoll in case of -EAGAIN
io_uring: fix io_sq_thread no schedule when busy
Figuring out the root case for the REMOVE/CLOSE race and
suggesting the solution was done by Neil Brown.
Currently what happens is that direct IO calls hold a reference
on the open context which is decremented as an asynchronous task
in the nfs_direct_complete(). Before reference is decremented,
control is returned to the application which is free to close the
file. When close is being processed, it decrements its reference
on the open_context but since directIO still holds one, it doesn't
sent a close on the wire. It returns control to the application
which is free to do other operations. For instance, it can delete a
file. Direct IO is finally releasing its reference and triggering
an asynchronous close. Which races with the REMOVE. On the server,
REMOVE can be processed before the CLOSE, failing the REMOVE with
EACCES as the file is still opened.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Suggested-by: Neil Brown <neilb@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the mirror count changes in the new layout we pick up inside
ff_layout_pg_init_write(), then we can end up adding the
request to the wrong mirror and corrupting the mirror->pg_list.
Fixes: d600ad1f2b ("NFS41: pop some layoutget errors to application")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The try_location function is called within a loop by nfs_follow_referral.
try_location calls nfs4_pathname_string to created the export_path.
nfs4_pathname_string allocates the memory. export_path is stored in the
nfs_fs_context/fs_context structure similarly as hostname and source.
But whereas the ctx hostname and source are freed before assignment,
export_path is not. So if there are multiple loops, the new export_path
will overwrite the old without the old being freed.
So call kfree for export_path.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull fsnotify fixlet from Jan Kara:
"A performance improvement to reduce impact of fsnotify for inodes
where it isn't used"
* tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs: Do not check if there is a fsnotify watcher on pseudo inodes
io_do_iopoll() won't do anything with a request unless
req->iopoll_completed is set. So io_complete_rw_iopoll() has to set
it, otherwise io_do_iopoll() will poll a file again and again even
though the request of interest was completed long time ago.
Also, remove -EAGAIN check from io_issue_sqe() as it races with
the changed lines. The request will take the long way and be
resubmitted from io_iopoll*().
io_kiocb's result and iopoll_completed")
Fixes: bbde017a32 ("io_uring: add memory barrier to synchronize
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull erofs fix from Gao Xiang:
"Fix a regression which uses potential uninitialized high 32-bit value
unexpectedly recently observed with specific compiler options"
* tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
Hongyu reported "id != index" in z_erofs_onlinepage_fixup() with
specific aarch64 environment easily, which wasn't shown before.
After digging into that, I found that high 32 bits of page->private
was set to 0xaaaaaaaa rather than 0 (due to z_erofs_onlinepage_init
behavior with specific compiler options). Actually we only use low
32 bits to keep the page information since page->private is only 4
bytes on most 32-bit platforms. However z_erofs_onlinepage_fixup()
uses the upper 32 bits by mistake.
Let's fix it now.
Reported-and-tested-by: Hongyu Jin <hongyu.jin@unisoc.com>
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200618234349.22553-1-hsiangkao@aol.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Use array_size() instead of the open-coded version in the controlling
expression of the if statement.
Also, while there, use the preferred form for passing a size of a struct.
The alternative form where struct name is spelled out hurts readability
and introduces an opportunity for a bug when the pointer variable type is
changed but the corresponding sizeof that is passed as argument is not.
This issue was found with the help of Coccinelle and, audited and fixed
manually.
Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
As the man description of the truncate, if the size changed,
then the st_ctime and st_mtime fields should be updated. But
in cifs, we doesn't do it.
It lead the xfstests generic/313 failed.
So, add the ATTR_MTIME|ATTR_CTIME flags on attrs when change
the file size
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When punch hole success, we also can read old data from file:
# strace -e trace=pread64,fallocate xfs_io -f -c "pread 20 40" \
-c "fpunch 20 40" -c"pread 20 40" file
pread64(3, " version 5.8.0-rc1+"..., 40, 20) = 40
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 20, 40) = 0
pread64(3, " version 5.8.0-rc1+"..., 40, 20) = 40
CIFS implements the fallocate(FALLOCATE_FL_PUNCH_HOLE) with send SMB
ioctl(FSCTL_SET_ZERO_DATA) to server. It just set the range of the
remote file to zero, but local page caches not updated, then the
local page caches inconsistent with server.
Also can be found by xfstests generic/316.
So, we need to remove the page caches before send the SMB
ioctl(FSCTL_SET_ZERO_DATA) to server.
Fixes: 31742c5a33 ("enable fallocate punch hole ("fallocate -p") for SMB3")
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org # v3.17
Signed-off-by: Steve French <stfrench@microsoft.com>
CIFS implements the fallocate(FALLOC_FL_ZERO_RANGE) with send SMB
ioctl(FSCTL_SET_ZERO_DATA) to server. It just set the range of the
remote file to zero, but local page cache not update, then the data
inconsistent with server, which leads the xfstest generic/008 failed.
So we need to remove the local page caches before send SMB
ioctl(FSCTL_SET_ZERO_DATA) to server. After next read, it will
re-cache it.
Fixes: 30175628bf ("[SMB3] Enable fallocate -z support for SMB3 mounts")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: stable@vger.kernel.org # v3.17
Signed-off-by: Steve French <stfrench@microsoft.com>
When the user consumes and generates sqe at a fast rate,
io_sqring_entries can always get sqe, and ret will not be equal to -EBUSY,
so that io_sq_thread will never call cond_resched or schedule, and then
we will get the following system error prompt:
rcu: INFO: rcu_sched self-detected stall on CPU
or
watchdog: BUG: soft lockup-CPU#23 stuck for 112s! [io_uring-sq:1863]
This patch checks whether need to call cond_resched() by checking
the need_resched() function every cycle.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull btrfs fixes from David Sterba:
"A number of fixes, located in two areas, one performance fix and one
fixup for better integration with another patchset.
- bug fixes in nowait aio:
- fix snapshot creation hang after nowait-aio was used
- fix failure to write to prealloc extent past EOF
- don't block when extent range is locked
- block group fixes:
- relocation failure when scrub runs in parallel
- refcount fix when removing fails
- fix race between removal and creation
- space accounting fixes
- reinstante fast path check for log tree at unlink time, fixes
performance drop up to 30% in REAIM
- kzfree/kfree fixup to ease treewide patchset renaming kzfree"
* tag 'for-5.8-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: use kfree() in btrfs_ioctl_get_subvol_info()
btrfs: fix RWF_NOWAIT writes blocking on extent locks and waiting for IO
btrfs: fix RWF_NOWAIT write not failling when we need to cow
btrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof
btrfs: fix hang on snapshot creation after RWF_NOWAIT write
btrfs: check if a log root exists before locking the log_mutex on unlink
btrfs: fix bytes_may_use underflow when running balance and scrub in parallel
btrfs: fix data block group relocation failure due to concurrent scrub
btrfs: fix race between block group removal and block group creation
btrfs: fix a block group ref counter leak after failure to remove block group
xlog_wait() on the CIL context can reference a freed context if the
waiter doesn't get scheduled before the CIL context is freed. This
can happen when a task is on the hard throttle and the CIL push
aborts due to a shutdown. This was detected by generic/019:
thread 1 thread 2
__xfs_trans_commit
xfs_log_commit_cil
<CIL size over hard throttle limit>
xlog_wait
schedule
xlog_cil_push_work
wake_up_all
<shutdown aborts commit>
xlog_cil_committed
kmem_free
remove_wait_queue
spin_lock_irqsave --> UAF
Fix it by moving the wait queue to the CIL rather than keeping it in
in the CIL context that gets freed on push completion. Because the
wait queue is now independent of the CIL context and we might have
multiple contexts in flight at once, only wake the waiters on the
push throttle when the context we are pushing is over the hard
throttle size threshold.
Fixes: 0e7ab7efe7 ("xfs: Throttle commits on delayed background CIL push")
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
open_shroot() invokes kref_get(), which increases the refcount of the
"tcon->crfid" object. When open_shroot() returns not zero, it means the
open operation failed and close_shroot() will not be called to decrement
the refcount of the "tcon->crfid".
The reference counting issue happens in one normal path of
open_shroot(). When the cached root have been opened successfully in a
concurrent process, the function increases the refcount and jump to
"oshr_free" to return. However the current return value "rc" may not
equal to 0, thus the increased refcount will not be balanced outside the
function, causing a refcnt leak.
Fix this issue by setting the value of "rc" to 0 before jumping to
"oshr_free" label.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Pull tracing fixes from Steven Rostedt:
- Have recordmcount work with > 64K sections (to support LTO)
- kprobe RCU fixes
- Correct a kprobe critical section with missing mutex
- Remove redundant arch_disarm_kprobe() call
- Fix lockup when kretprobe triggers within kprobe_flush_task()
- Fix memory leak in fetch_op_data operations
- Fix sleep in atomic in ftrace trace array sample code
- Free up memory on failure in sample trace array code
- Fix incorrect reporting of function_graph fields in format file
- Fix quote within quote parsing in bootconfig
- Fix return value of bootconfig tool
- Add testcases for bootconfig tool
- Fix maybe uninitialized warning in ftrace pid file code
- Remove unused variable in tracing_iter_reset()
- Fix some typos
* tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix maybe-uninitialized compiler warning
tools/bootconfig: Add testcase for show-command and quotes test
tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
tools/bootconfig: Fix to use correct quotes for value
proc/bootconfig: Fix to use correct quotes for value
tracing: Remove unused event variable in tracing_iter_reset
tracing/probe: Fix memleak in fetch_op_data operations
trace: Fix typo in allocate_ftrace_ops()'s comment
tracing: Make ftrace packed events have align of 1
sample-trace-array: Remove trace_array 'sample-instance'
sample-trace-array: Fix sleeping function called from invalid context
kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
kprobes: Remove redundant arch_disarm_kprobe() call
kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
kprobes: Use non RCU traversal APIs on kprobe_tables if possible
kprobes: Suppress the suspicious RCU warning on kprobes
recordmcount: support >64k sections
The fileserver probe timer, net->fs_probe_timer, isn't cancelled when
the kafs module is being removed and so the count it holds on
net->servers_outstanding doesn't get dropped..
This causes rmmod to wait forever. The hung process shows a stack like:
afs_purge_servers+0x1b5/0x23c [kafs]
afs_net_exit+0x44/0x6e [kafs]
ops_exit_list+0x72/0x93
unregister_pernet_operations+0x14c/0x1ba
unregister_pernet_subsys+0x1d/0x2a
afs_exit+0x29/0x6f [kafs]
__do_sys_delete_module.isra.0+0x1a2/0x24b
do_syscall_64+0x51/0x95
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by:
(1) Attempting to cancel the probe timer and, if successful, drop the
count that the timer was holding.
(2) Make the timer function just drop the count and not schedule the
prober if the afs portion of net namespace is being destroyed.
Also, whilst we're at it, make the following changes:
(3) Initialise net->servers_outstanding to 1 and decrement it before
waiting on it so that it doesn't generate wake up events by being
decremented to 0 until we're cleaning up.
(4) Switch the atomic_dec() on ->servers_outstanding for ->fs_timer in
afs_purge_servers() to use the helper function for that.
Fixes: f6cbb368bc ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix afs_do_lookup()'s fallback case for when FS.InlineBulkStatus isn't
supported by the server.
In the fallback, it calls FS.FetchStatus for the specific vnode it's
meant to be looking up. Commit b6489a49f7 broke this by renaming one
of the two identically-named afs_fetch_status_operation descriptors to
something else so that one of them could be made non-static. The site
that used the renamed one, however, wasn't renamed and didn't produce
any warning because the other was declared in a header.
Fix this by making afs_do_lookup() use the renamed variant.
Note that there are two variants of the success method because one is
called from ->lookup() where we may or may not have an inode, but can't
call iget until after we've talked to the server - whereas the other is
called from within iget where we have an inode, but it may or may not be
initialised.
The latter variant expects there to be an inode, but because it's being
called from there former case, there might not be - resulting in an oops
like the following:
BUG: kernel NULL pointer dereference, address: 00000000000000b0
...
RIP: 0010:afs_fetch_status_success+0x27/0x7e
...
Call Trace:
afs_wait_for_operation+0xda/0x234
afs_do_lookup+0x2fe/0x3c1
afs_lookup+0x3c5/0x4bd
__lookup_slow+0xcd/0x10f
walk_component+0xa2/0x10c
path_lookupat.isra.0+0x80/0x110
filename_lookup+0x81/0x104
vfs_statx+0x76/0x109
__do_sys_newlstat+0x39/0x6b
do_syscall_64+0x4c/0x78
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: b6489a49f7 ("afs: Fix silly rename")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>