android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Yafang Shao	0d5a57140b	xfs: remove useless definitions in xfs_linux.h Remove current_pid(), current_test_flags() and current_clear_flags_nested(), because they are useless. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:58 -07:00
Dave Chinner	cd647d5651	xfs: use MMAPLOCK around filemap_map_pages() The page faultround path ->map_pages is implemented in XFS via filemap_map_pages(). This function checks that pages found in page cache lookups have not raced with truncate based invalidation by checking page->mapping is correct and page->index is within EOF. However, we've known for a long time that this is not sufficient to protect against races with invalidations done by operations that do not change EOF. e.g. hole punching and other fallocate() based direct extent manipulations. The way we protect against these races is we wrap the page fault operations in a XFS_MMAPLOCK_SHARED lock so they serialise against fallocate and truncate before calling into the filemap function that processes the fault. Do the same for XFS's ->map_pages implementation to close this potential data corruption issue. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:58 -07:00
Darrick J. Wong	e2aaee9cd3	xfs: move helpers that lock and unlock two inodes against userspace IO Move the double-inode locking helpers to xfs_inode.c since they're not specific to reflink. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	10b4bd6c9c	xfs: refactor locking and unlocking two inodes against userspace IO Refactor the two functions that we use to lock and unlock two inodes to block userspace from initiating IO against a file, whether via system calls or mmap activity. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	451d34ee07	xfs: fix xfs_reflink_remap_prep calling conventions Fix the return value of xfs_reflink_remap_prep so that its return value conventions match the rest of xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	168eae803c	xfs: reflink can skip remap existing mappings If the source and destination map are identical, we can skip the remap step to save some time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	94b941fd7a	xfs: only reserve quota blocks if we're mapping into a hole When logging quota block count updates during a reflink operation, we only log the /delta/ of the block count changes to the dquot. Since we now know ahead of time the extent type of both dmap and smap (and that they have the same length), we know that we only need to reserve quota blocks for dmap's blockcount if we're mapping it into a hole. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	aa5d0ba0b5	xfs: only reserve quota blocks for bmbt changes if we're changing the data fork Now that we've reworked xfs_reflink_remap_extent to remap only one extent per transaction, we actually know if the extent being removed is an allocated mapping. This means that we now know ahead of time if we're going to be touching the data fork. Since we only need blocks for a bmbt split if we're going to update the data fork, we only need to get quota reservation if we know we're going to touch the data fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	00fd1d56dd	xfs: redesign the reflink remap loop to fix blkres depletion crash The existing reflink remapping loop has some structural problems that need addressing: The biggest problem is that we create one transaction for each extent in the source file without accounting for the number of mappings there are for the same range in the destination file. In other words, we don't know the number of remap operations that will be necessary and we therefore cannot guess the block reservation required. On highly fragmented filesystems (e.g. ones with active dedupe) we guess wrong, run out of block reservation, and fail. The second problem is that we don't actually use the bmap intents to their full potential -- instead of calling bunmapi directly and having to deal with its backwards operation, we could call the deferred ops xfs_bmap_unmap_extent and xfs_refcount_decrease_extent instead. This makes the frontend loop much simpler. Solve all of these problems by refactoring the remapping loops so that we only perform one remapping operation per transaction, and each operation only tries to remap a single extent from source to dest. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reported-by: Edwin Török <edwin@etorok.net> Tested-by: Edwin Török <edwin@etorok.net>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	877f58f536	xfs: rename xfs_bmap_is_real_extent to is_written_extent The name of this predicate is a little misleading -- it decides if the extent mapping is allocated and written. Change the name to be more direct, as we're going to add a new predicate in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:57 -07:00
Darrick J. Wong	83895227ab	xfs: fix reflink quota reservation accounting error Quota reservations are supposed to account for the blocks that might be allocated due to a bmap btree split. Reflink doesn't do this, so fix this to make the quota accounting more accurate before we start rearranging things. Fixes: `862bb360ef` ("xfs: reflink extents from one file to another") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-07-06 10:46:56 -07:00
Darrick J. Wong	eb0efe5063	xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork The data fork scrubber calls filemap_write_and_wait to flush dirty pages and delalloc reservations out to disk prior to checking the data fork's extent mappings. Unfortunately, this means that scrub can consume the EIO/ENOSPC errors that would otherwise have stayed around in the address space until (we hope) the writer application calls fsync to persist data and collect errors. The end result is that programs that wrote to a file might never see the error code and proceed as if nothing were wrong. xfs_scrub is not in a position to notify file writers about the writeback failure, and it's only here to check metadata, not file contents. Therefore, if writeback fails, we should stuff the error code back into the address space so that an fsync by the writer application can pick that up. Fixes: `99d9d8d05d` ("xfs: scrub inode block mappings") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2020-07-06 10:46:56 -07:00
Brian Foster	f74681ba20	xfs: preserve rmapbt swapext block reservation from freed blocks The rmapbt extent swap algorithm remaps individual extents between the source inode and the target to trigger reverse mapping metadata updates. If either inode straddles a format or other bmap allocation boundary, the individual unmap and map cycles can trigger repeated bmap block allocations and frees as the extent count bounces back and forth across the boundary. While net block usage is bound across the swap operation, this behavior can prematurely exhaust the transaction block reservation because it continuously drains as the transaction rolls. Each allocation accounts against the reservation and each free returns to global free space on transaction roll. The previous workaround to this problem attempted to detect this boundary condition and provide surplus block reservation to acommodate it. This is insufficient because more remaps can occur than implied by the extent counts; if start offset boundaries are not aligned between the two inodes, for example. To address this problem more generically and dynamically, add a transaction accounting mode that returns freed blocks to the transaction reservation instead of the superblock counters on transaction roll and use it when the rmapbt based algorithm is active. This allows the chain of remap transactions to preserve the block reservation based own its own frees and prevent premature exhaustion regardless of the remap pattern. Note that this is only safe for superblocks with lazy sb accounting, but the latter is required for v5 supers and the rmap feature depends on v5. Fixes: `b3fed43482` ("xfs: account format bouncing into rmapbt swapext tx reservation") Root-caused-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:56 -07:00
Keyur Patel	06734e3c95	xfs: Couple of typo fixes in comments ./xfs/libxfs/xfs_inode_buf.c:56: unnecssary ==> unnecessary ./xfs/libxfs/xfs_inode_buf.c:59: behavour ==> behaviour ./xfs/libxfs/xfs_inode_buf.c:206: unitialized ==> uninitialized Signed-off-by: Keyur Patel <iamkeyur96@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-07-06 10:46:56 -07:00
Pavel Begunkov	3fcee5a6d5	io_uring: briefly loose locks while reaping events It's not nice to hold @uring_lock for too long io_iopoll_reap_events(). For instance, the lock is needed to publish requests to @poll_list, and that locks out tasks doing that for no good reason. Loose it occasionally. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-06 09:06:20 -06:00
Pavel Begunkov	eba0a4dd2a	io_uring: fix stopping iopoll'ing too early Nobody adjusts *nr_events (number of completed requests) before calling io_iopoll_getevents(), so the passed @min shouldn't be adjusted as well. Othewise it can return less than initially asked @min without hitting need_resched(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-06 09:06:20 -06:00
Pavel Begunkov	3aadc23e60	io_uring: don't delay iopoll'ed req completion ->iopoll() may have completed current request, but instead of reaping it, io_do_iopoll() just continues with the next request in the list. As a result it can leave just polled and completed request in the list up until next syscall. Even outer loop in io_iopoll_getevents() doesn't help the situation. E.g. poll_list: req0 -> req1 If req0->iopoll() completed both requests, and @min<=1, then @req0 will be left behind. Check whether a req was completed after ->iopoll(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-06 09:06:20 -06:00
Pavel Begunkov	8b3656af2a	io_uring: fix lost cqe->flags Don't forget to fill cqe->flags properly in io_submit_flush_completions() Fixes: `a1d7c393c4` ("io_uring: enable READ/WRITE to use deferred completions") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-05 15:07:50 -06:00
Pavel Begunkov	652532ad45	io_uring: keep queue_sqe()'s fail path separately A preparation path, extracts error path into a separate block. It looks saner then calling req_set_fail_links() after io_put_req_find_next(), even though it have been working well. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-05 15:07:37 -06:00
Pavel Begunkov	6df1db6b54	io_uring: fix mis-refcounting linked timeouts io_prep_linked_timeout() sets REQ_F_LINK_TIMEOUT altering refcounting of the following linked request. After that someone should call io_queue_linked_timeout(), otherwise a submission reference of the linked timeout won't be ever dropped. That's what happens in io_steal_work() if io-wq decides to postpone linked request with io_wqe_enqueue(). io_queue_linked_timeout() can also be potentially called twice without synchronisation during re-submission, e.g. io_rw_resubmit(). There are the rules, whoever did io_prep_linked_timeout() must also call io_queue_linked_timeout(). To not do it twice, io_prep_linked_timeout() will return non NULL only for the first call. That's controlled by REQ_F_LINK_TIMEOUT flag. Also kill REQ_F_QUEUE_TIMEOUT. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-05 15:07:35 -06:00
Jens Axboe	c2c4c83c58	io_uring: use new io_req_task_work_add() helper throughout Since we now have that in the 5.9 branch, convert the existing users of task_work_add() to use this new helper. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-05 15:07:31 -06:00
Jens Axboe	4c6e277c4c	io_uring: abstract out task work running Provide a helper to run task_work instead of checking and running manually in a bunch of different spots. While doing so, also move the task run state setting where we run the task work. Then we can move it out of the callback helpers. This also helps ensure we only do this once per task_work list run, not per task_work item. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-05 15:05:22 -06:00
Jens Axboe	58c6a581de	Merge branch 'io_uring-5.8' into for-5.9/io_uring Pull in task_work changes from the 5.8 series, as we'll need to apply the same kind of changes to other parts in the 5.9 branch. * io_uring-5.8: io_uring: fix regression with always ignoring signals in io_cqring_wait() io_uring: use signal based task_work running task_work: teach task_work_add() to do signal_wake_up()	2020-07-05 15:04:17 -06:00
Alexander A. Klimov	cba22b1c59	Replace HTTP links with HTTPS ones: CIFS Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w\|/)`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Link: https://lore.kernel.org/r/20200627103125.71828-1-grandmaster@al2klimov.de Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-07-05 14:23:38 -06:00
Linus Torvalds	9fbe565cb7	Merge tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block Pull io_uring fix from Jens Axboe: "Andres reported a regression with the fix that was merged earlier this week, where his setup of using signals to interrupt io_uring CQ waits no longer worked correctly. Fix this, and also limit our use of TWA_SIGNAL to the case where we need it, and continue using TWA_RESUME for task_work as before. Since the original is marked for 5.7 stable, let's flush this one out early" * tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block: io_uring: fix regression with always ignoring signals in io_cqring_wait()	2020-07-05 10:41:33 -07:00
Jens Axboe	b7db41c9e0	io_uring: fix regression with always ignoring signals in io_cqring_wait() When switching to TWA_SIGNAL for task_work notifications, we also made any signal based condition in io_cqring_wait() return -ERESTARTSYS. This breaks applications that rely on using signals to abort someone waiting for events. Check if we have a signal pending because of queued task_work, and repeat the signal check once we've run the task_work. This provides a reliable way of telling the two apart. Additionally, only use TWA_SIGNAL if we are using an eventfd. If not, we don't have the dependency situation described in the original commit, and we can get by with just using TWA_RESUME like we previously did. Fixes: `ce593a6c48` ("io_uring: use signal based task_work running") Cc: stable@vger.kernel.org # v5.7 Reported-by: Andres Freund <andres@anarazel.de> Tested-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-04 13:44:45 -06:00
Eric W. Biederman	25cf336de5	exec: Remove do_execve_file Now that the last callser has been removed remove this code from exec. For anyone thinking of resurrecing do_execve_file please note that the code was buggy in several fundamental ways. - It did not ensure the file it was passed was read-only and that deny_write_access had been called on it. Which subtlely breaks invaniants in exec. - The caller of do_execve_file was expected to hold and put a reference to the file, but an extra reference for use by exec was not taken so that when exec put it's reference to the file an underflow occured on the file reference count. - The point of the interface was so that a pathname did not need to exist. Which breaks pathname based LSMs. Tetsuo Handa originally reported these issues[1]. While it was clear that deny_write_access was missing the fundamental incompatibility with the passed in O_RDWR filehandle was not immediately recognized. All of these issues were fixed by modifying the usermode driver code to have a path, so it did not need this hack. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org Link: https://lkml.kernel.org/r/20200702164140.4468-10-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2020-07-04 09:35:43 -05:00
Linus Torvalds	8b082a41da	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysctl fix from Al Viro: "Another regression fix for sysctl changes this cycle..." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Call sysctl_head_finish on error	2020-07-03 23:20:14 -07:00
Linus Torvalds	b8e516b367	Merge tag '5.8-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Eight cifs/smb3 fixes, most when specifying the multiuser mount flag. Five of the fixes are for stable" * tag '5.8-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: prevent truncation from long to int in wait_for_free_credits cifs: Fix the target file was deleted when rename failed. SMB3: Honor 'posix' flag for multiuser mounts SMB3: Honor 'handletimeout' flag for multiuser mounts SMB3: Honor lease disabling for multiuser mounts SMB3: Honor persistent/resilient handle flags for multiuser mounts SMB3: Honor 'seal' flag for multiuser mounts cifs: Display local UID details for SMB sessions in DebugData	2020-07-03 23:03:45 -07:00
Linus Torvalds	0c7d7d1fad	Merge tag 'xfs-5.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fix from Darrick Wong: "Fix a use-after-free bug when the fs shuts down" * tag 'xfs-5.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix use-after-free on CIL context on shutdown	2020-07-03 14:46:46 -07:00
Linus Torvalds	bf2d63694e	Merge tag 'gfs2-v5.8-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Various gfs2 fixes" * tag 'gfs2-v5.8-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: The freeze glock should never be frozen gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE gfs2: read-only mounts should grab the sd_freeze_gl glock gfs2: freeze should work on read-only mounts gfs2: eliminate GIF_ORDERED in favor of list_empty gfs2: Don't sleep during glock hash walk gfs2: fix trans slab error when withdraw occurs inside log_flush gfs2: Don't return NULL from gfs2_inode_lookup	2020-07-03 12:01:04 -07:00
Matthew Wilcox (Oracle)	d4d80e6992	Call sysctl_head_finish on error This error path returned directly instead of calling sysctl_head_finish(). Fixes: `ef9d965bc8` ("sysctl: reject gigantic reads/write to sysctl files") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-07-03 14:10:46 -04:00
Bob Peterson	c860f8ffbe	gfs2: The freeze glock should never be frozen Before this patch, some gfs2 code locked the freeze glock with LM_FLAG_NOEXP (Do not freeze) flag, and some did not. We never want to freeze the freeze glock, so this patch makes it consistently use LM_FLAG_NOEXP always. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2020-07-03 12:05:35 +02:00
Bob Peterson	623ba664b7	gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE Before this patch, the freeze code in gfs2 specified GL_NOCACHE in several places. That's wrong because we always want to know the state of whether the file system is frozen. There was also a problem with freeze/thaw transitioning the glock from frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX to processes that request it in SH mode, unless GL_EXACT is specified. Therefore, the freeze/thaw code, which tried to reacquire the glock in SH mode would get the glock in EX mode, and miss the transition from EX to SH. That made it think the thaw had completed normally, but since the glock was still cached in EX, other nodes could not freeze again. This patch removes the GL_NOCACHE flag to allow the freeze glock to be cached. It also adds the GL_EXACT flag so the glock is fully transitioned from EX to SH, thereby allowing future freeze operations. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2020-07-03 12:05:35 +02:00
Bob Peterson	b780cc615b	gfs2: read-only mounts should grab the sd_freeze_gl glock Before this patch, only read-write mounts would grab the freeze glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze glock was never initialized. That meant requests to freeze, which request the glock in EX, were granted without any state transition. That meant you could mount a gfs2 file system, which is currently frozen on a different cluster node, in read-only mode. This patch makes read-only mounts lock the freeze glock in SH mode, which will block for file systems that are frozen on another node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2020-07-03 12:05:35 +02:00
Bob Peterson	541656d3a5	gfs2: freeze should work on read-only mounts Before this patch, function freeze_go_sync, called when promoting the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag. That's only set for read-write mounts. Read-only mounts don't use a journal, so the bit is never set, so the freeze never happened. This patch removes the check for SDF_JOURNAL_LIVE for freeze requests but still checks it when deciding whether to flush a journal. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2020-07-03 12:05:35 +02:00
Bob Peterson	7542486b89	gfs2: eliminate GIF_ORDERED in favor of list_empty In several places, we used the GIF_ORDERED inode flag to determine if an inode was on the ordered writes list. However, since we always held the sd_ordered_lock spin_lock during the manipulation, we can just as easily check list_empty(&ip->i_ordered) instead. This allows us to keep more than one ordered writes list to make journal writing improvements. This patch eliminates GIF_ORDERED in favor of checking list_empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2020-07-03 12:05:34 +02:00
Linus Torvalds	083176c86f	Merge tag 'nfsd-5.8-1' of git://linux-nfs.org/~bfields/linux Pull nfsd fixes from Bruce Fields: "Fixes for a umask bug on exported filesystems lacking ACL support, a leak and a module unloading bug in the /proc/fs/nfsd/clients/ code, and a compile warning" * tag 'nfsd-5.8-1' of git://linux-nfs.org/~bfields/linux: SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE nfsd: fix nfsdfs inode reference count leak nfsd4: fix nfsdfs reference count loop nfsd: apply umask on fs without ACL support	2020-07-02 20:35:33 -07:00
Linus Torvalds	c93493b7cd	Merge tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "One fix in here, for a regression in 5.7 where a task is waiting in the kernel for a condition, but that condition won't become true until task_work is run. And the task_work can't be run exactly because the task is waiting in the kernel, so we'll never make any progress. One example of that is registering an eventfd and queueing io_uring work, and then the task goes and waits in eventfd read with the expectation that it'll get woken (and read an event) when the io_uring request completes. The io_uring request is finished through task_work, which won't get run while the task is looping in eventfd read" * tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-block: io_uring: use signal based task_work running task_work: teach task_work_add() to do signal_wake_up()	2020-07-02 14:56:22 -07:00
Josef Bacik	0465337c55	btrfs: reset tree root pointer after error in init_tree_roots Eric reported an issue where mounting -o recovery with a fuzzed fs resulted in a kernel panic. This is because we tried to free the tree node, except it was an error from the read. Fix this by properly resetting the tree_root->node == NULL in this case. The panic was the following BTRFS warning (device loop0): failed to read tree root BUG: kernel NULL pointer dereference, address: 000000000000001f RIP: 0010:free_extent_buffer+0xe/0x90 [btrfs] Call Trace: free_root_extent_buffers.part.0+0x11/0x30 [btrfs] free_root_pointers+0x1a/0xa2 [btrfs] open_ctree+0x1776/0x18a5 [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] ? selinux_fs_context_parse_param+0x37/0x80 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 fc_mount+0xe/0x30 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x147/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? legacy_get_tree+0x27/0x40 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 do_mount+0x735/0xa40 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Nik says: this is problematic only if we fail on the last iteration of the loop as this results in init_tree_roots returning err value with tree_root->node = -ERR. Subsequently the caller does: fail_tree_roots which calls free_root_pointers on the bogus value. Reported-by: Eric Sandeen <sandeen@redhat.com> Fixes: `b8522a1e5f` ("btrfs: Factor out tree roots initialization during mount") CC: stable@vger.kernel.org # 5.5+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add details how the pointer gets dereferenced ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-07-02 10:27:12 +02:00
Filipe Manana	6d548b9e5d	btrfs: fix reclaim_size counter leak after stealing from global reserve Commit `7f9fe61440` ("btrfs: improve global reserve stealing logic"), added in the 5.8 merge window, introduced another leak for the space_info's reclaim_size counter. This is very often triggered by the test cases generic/269 and generic/416 from fstests, producing a stack trace like the following during unmount: [37079.155499] ------------[ cut here ]------------ [37079.156844] WARNING: CPU: 2 PID: 2000423 at fs/btrfs/block-group.c:3422 btrfs_free_block_groups+0x2eb/0x300 [btrfs] [37079.158090] Modules linked in: dm_snapshot btrfs dm_thin_pool (...) [37079.164440] CPU: 2 PID: 2000423 Comm: umount Tainted: G W 5.7.0-rc7-btrfs-next-62 #1 [37079.165422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), (...) [37079.167384] RIP: 0010:btrfs_free_block_groups+0x2eb/0x300 [btrfs] [37079.168375] Code: bd 58 ff ff ff 00 4c 8d (...) [37079.170199] RSP: 0018:ffffaa53875c7de0 EFLAGS: 00010206 [37079.171120] RAX: ffff98099e701cf8 RBX: ffff98099e2d4000 RCX: 0000000000000000 [37079.172057] RDX: 0000000000000001 RSI: ffffffffc0acc5b1 RDI: 00000000ffffffff [37079.173002] RBP: ffff98099e701cf8 R08: 0000000000000000 R09: 0000000000000000 [37079.173886] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98099e701c00 [37079.174730] R13: ffff98099e2d5100 R14: dead000000000122 R15: dead000000000100 [37079.175578] FS: 00007f4d7d0a5840(0000) GS:ffff9809ec600000(0000) knlGS:0000000000000000 [37079.176434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [37079.177289] CR2: 0000559224dcc000 CR3: 000000012207a004 CR4: 00000000003606e0 [37079.178152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37079.178935] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [37079.179675] Call Trace: [37079.180419] close_ctree+0x291/0x2d1 [btrfs] [37079.181162] generic_shutdown_super+0x6c/0x100 [37079.181898] kill_anon_super+0x14/0x30 [37079.182641] btrfs_kill_super+0x12/0x20 [btrfs] [37079.183371] deactivate_locked_super+0x31/0x70 [37079.184012] cleanup_mnt+0x100/0x160 [37079.184650] task_work_run+0x68/0xb0 [37079.185284] exit_to_usermode_loop+0xf9/0x100 [37079.185920] do_syscall_64+0x20d/0x260 [37079.186556] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [37079.187197] RIP: 0033:0x7f4d7d2d9357 [37079.187836] Code: eb 0b 00 f7 d8 64 89 01 48 (...) [37079.189180] RSP: 002b:00007ffee4e0d368 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [37079.189845] RAX: 0000000000000000 RBX: 00007f4d7d3fb224 RCX: 00007f4d7d2d9357 [37079.190515] RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000559224dc5c90 [37079.191173] RBP: 0000559224dc1970 R08: 0000000000000000 R09: 00007ffee4e0c0e0 [37079.191815] R10: 0000559224dc7b00 R11: 0000000000000246 R12: 0000000000000000 [37079.192451] R13: 0000559224dc5c90 R14: 0000559224dc1a80 R15: 0000559224dc1ba0 [37079.193096] irq event stamp: 0 [37079.193729] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [37079.194379] hardirqs last disabled at (0): [<ffffffff97ab8935>] copy_process+0x755/0x1ea0 [37079.195033] softirqs last enabled at (0): [<ffffffff97ab8935>] copy_process+0x755/0x1ea0 [37079.195700] softirqs last disabled at (0): [<0000000000000000>] 0x0 [37079.196318] ---[ end trace b32710d864dea887 ]--- In the past commit `d611add48b` ("btrfs: fix reclaim counter leak of space_info objects") fixed similar cases. That commit however has a date more recent (April 7 2020) then the commit mentioned before (March 13 2020), however it was merged in kernel 5.7 while the older commit, which introduces a new leak, was merged only in the 5.8 merge window. So the leak sneaked in unnoticed. Fix this by making steal_from_global_rsv() remove the ticket using the helper remove_ticket(), which decrements the reclaim_size counter of the space_info object. Fixes: `7f9fe61440` ("btrfs: improve global reserve stealing logic") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-07-02 10:18:34 +02:00
Boris Burkov	6bf9cd2eed	btrfs: fix fatal extent_buffer readahead vs releasepage race Under somewhat convoluted conditions, it is possible to attempt to release an extent_buffer that is under io, which triggers a BUG_ON in btrfs_release_extent_buffer_pages. This relies on a few different factors. First, extent_buffer reads done as readahead for searching use WAIT_NONE, so they free the local extent buffer reference while the io is outstanding. However, they should still be protected by TREE_REF. However, if the system is doing signficant reclaim, and simultaneously heavily accessing the extent_buffers, it is possible for releasepage to race with two concurrent readahead attempts in a way that leaves TREE_REF unset when the readahead extent buffer is released. Essentially, if two tasks race to allocate a new extent_buffer, but the winner who attempts the first io is rebuffed by a page being locked (likely by the reclaim itself) then the loser will still go ahead with issuing the readahead. The loser's call to find_extent_buffer must also race with the reclaim task reading the extent_buffer's refcount as 1 in a way that allows the reclaim to re-clear the TREE_REF checked by find_extent_buffer. The following represents an example execution demonstrating the race: CPU0 CPU1 CPU2 reada_for_search reada_for_search readahead_tree_block readahead_tree_block find_create_tree_block find_create_tree_block alloc_extent_buffer alloc_extent_buffer find_extent_buffer // not found allocates eb lock pages associate pages to eb insert eb into radix tree set TREE_REF, refs == 2 unlock pages read_extent_buffer_pages // WAIT_NONE not uptodate (brand new eb) lock_page if !trylock_page goto unlock_exit // not an error free_extent_buffer release_extent_buffer atomic_dec_and_test refs to 1 find_extent_buffer // found try_release_extent_buffer take refs_lock reads refs == 1; no io atomic_inc_not_zero refs to 2 mark_buffer_accessed check_buffer_tree_ref // not STALE, won't take refs_lock refs == 2; TREE_REF set // no action read_extent_buffer_pages // WAIT_NONE clear TREE_REF release_extent_buffer atomic_dec_and_test refs to 1 unlock_page still not uptodate (CPU1 read failed on trylock_page) locks pages set io_pages > 0 submit io return free_extent_buffer release_extent_buffer dec refs to 0 delete from radix tree btrfs_release_extent_buffer_pages BUG_ON(io_pages > 0)!!! We observe this at a very low rate in production and were also able to reproduce it in a test environment by introducing some spurious delays and by introducing probabilistic trylock_page failures. To fix it, we apply check_tree_ref at a point where it could not possibly be unset by a competing task: after io_pages has been incremented. All the codepaths that clear TREE_REF check for io, so they would not be able to clear it after this point until the io is done. Stack trace, for reference: [1417839.424739] ------------[ cut here ]------------ [1417839.435328] kernel BUG at fs/btrfs/extent_io.c:4841! [1417839.447024] invalid opcode: 0000 [#1] SMP [1417839.502972] RIP: 0010:btrfs_release_extent_buffer_pages+0x20/0x1f0 [1417839.517008] Code: ed e9 ... [1417839.558895] RSP: 0018:ffffc90020bcf798 EFLAGS: 00010202 [1417839.570816] RAX: 0000000000000002 RBX: ffff888102d6def0 RCX: 0000000000000028 [1417839.586962] RDX: 0000000000000002 RSI: ffff8887f0296482 RDI: ffff888102d6def0 [1417839.603108] RBP: ffff88885664a000 R08: 0000000000000046 R09: 0000000000000238 [1417839.619255] R10: 0000000000000028 R11: ffff88885664af68 R12: 0000000000000000 [1417839.635402] R13: 0000000000000000 R14: ffff88875f573ad0 R15: ffff888797aafd90 [1417839.651549] FS: 00007f5a844fa700(0000) GS:ffff88885f680000(0000) knlGS:0000000000000000 [1417839.669810] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1417839.682887] CR2: 00007f7884541fe0 CR3: 000000049f609002 CR4: 00000000003606e0 [1417839.699037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1417839.715187] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1417839.731320] Call Trace: [1417839.737103] release_extent_buffer+0x39/0x90 [1417839.746913] read_block_for_search.isra.38+0x2a3/0x370 [1417839.758645] btrfs_search_slot+0x260/0x9b0 [1417839.768054] btrfs_lookup_file_extent+0x4a/0x70 [1417839.778427] btrfs_get_extent+0x15f/0x830 [1417839.787665] ? submit_extent_page+0xc4/0x1c0 [1417839.797474] ? __do_readpage+0x299/0x7a0 [1417839.806515] __do_readpage+0x33b/0x7a0 [1417839.815171] ? btrfs_releasepage+0x70/0x70 [1417839.824597] extent_readpages+0x28f/0x400 [1417839.833836] read_pages+0x6a/0x1c0 [1417839.841729] ? startup_64+0x2/0x30 [1417839.849624] __do_page_cache_readahead+0x13c/0x1a0 [1417839.860590] filemap_fault+0x6c7/0x990 [1417839.869252] ? xas_load+0x8/0x80 [1417839.876756] ? xas_find+0x150/0x190 [1417839.884839] ? filemap_map_pages+0x295/0x3b0 [1417839.894652] __do_fault+0x32/0x110 [1417839.902540] __handle_mm_fault+0xacd/0x1000 [1417839.912156] handle_mm_fault+0xaa/0x1c0 [1417839.921004] __do_page_fault+0x242/0x4b0 [1417839.930044] ? page_fault+0x8/0x30 [1417839.937933] page_fault+0x1e/0x30 [1417839.945631] RIP: 0033:0x33c4bae [1417839.952927] Code: Bad RIP value. [1417839.960411] RSP: 002b:00007f5a844f7350 EFLAGS: 00010206 [1417839.972331] RAX: 000000000000006e RBX: 1614b3ff6a50398a RCX: 0000000000000000 [1417839.988477] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [1417840.004626] RBP: 00007f5a844f7420 R08: 000000000000006e R09: 00007f5a94aeccb8 [1417840.020784] R10: 00007f5a844f7350 R11: 0000000000000000 R12: 00007f5a94aecc79 [1417840.036932] R13: 00007f5a94aecc78 R14: 00007f5a94aecc90 R15: 00007f5a94aecc40 CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2020-07-02 10:18:33 +02:00
Marcos Paulo de Souza	c730ae0c6b	btrfs: convert comments to fallthrough annotations Convert fall through comments to the pseudo-keyword which is now the preferred way. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-07-02 10:18:30 +02:00
Ronnie Sahlberg	19e888678b	cifs: prevent truncation from long to int in wait_for_free_credits The wait_event_... defines evaluate to long so we should not assign it an int as this may truncate the value. Reported-by: Marshall Midden <marshallmidden@gmail.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-07-01 20:01:26 -05:00
Zhang Xiaoxu	9ffad9263b	cifs: Fix the target file was deleted when rename failed. When xfstest generic/035, we found the target file was deleted if the rename return -EACESS. In cifs_rename2, we unlink the positive target dentry if rename failed with EACESS or EEXIST, even if the target dentry is positived before rename. Then the existing file was deleted. We should just delete the target file which created during the rename. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:41:56 -05:00
Paul Aurich	5391b8e1b7	SMB3: Honor 'posix' flag for multiuser mounts The flag from the primary tcon needs to be copied into the volume info so that cifs_get_tcon will try to enable extensions on the per-user tcon. At that point, since posix extensions must have already been enabled on the superblock, don't try to needlessly adjust the mount flags. Fixes: `ce558b0e17` ("smb3: Add posix create context for smb3.11 posix mounts") Fixes: `b326614ea2` ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions") Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:41:36 -05:00
Paul Aurich	6b356f6cf9	SMB3: Honor 'handletimeout' flag for multiuser mounts Fixes: `ca567eb2b3` ("SMB3: Allow persistent handle timeout to be configurable on mount") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:40:33 -05:00
Paul Aurich	ad35f169db	SMB3: Honor lease disabling for multiuser mounts Fixes: `3e7a02d478` ("smb3: allow disabling requesting leases") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:40:17 -05:00
Paul Aurich	00dfbc2f9c	SMB3: Honor persistent/resilient handle flags for multiuser mounts Without this: - persistent handles will only be enabled for per-user tcons if the server advertises the 'Continuous Availabity' capability - resilient handles would never be enabled for per-user tcons Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:40:06 -05:00
Paul Aurich	cc15461c73	SMB3: Honor 'seal' flag for multiuser mounts Ensure multiuser SMB3 mounts use encryption for all users' tcons if the mount options are configured to require encryption. Without this, only the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user tcons would only be encrypted if the server was configured to require encryption. Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>	2020-07-01 19:38:46 -05:00

... 17 18 19 20 21 ...

66232 Commits