android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Xiaoguang Wang	6b668c9b7f	io_uring: don't submit sqes when ctx->refs is dying When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait for sq thread to idle by busy loop: while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait)) cond_resched(); Above loop isn't very CPU friendly, it may introduce a short cpu burst on the current cpu. If ctx->refs is dying, we forbid sq_thread from submitting any further SQEs. Instead they just get discarded when we exit. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-20 08:41:26 -06:00
Xiaoguang Wang	d4ae271dfa	io_uring: reset -EBUSY error when io sq thread is waken up In io_sq_thread(), currently if we get an -EBUSY error and go to sleep, we will won't clear it again, which will result in io_sq_thread() will never have a chance to submit sqes again. Below test program test.c can reveal this bug: int main(int argc, char argv[]) { struct io_uring ring; int i, fd, ret; struct io_uring_sqe sqe; struct io_uring_cqe cqe; struct iovec iovecs; void *buf; struct io_uring_params p; if (argc < 2) { printf("%s: file\n", argv[0]); return 1; } memset(&p, 0, sizeof(p)); p.flags = IORING_SETUP_SQPOLL; ret = io_uring_queue_init_params(4, &ring, &p); if (ret < 0) { fprintf(stderr, "queue_init: %s\n", strerror(-ret)); return 1; } fd = open(argv[1], O_RDONLY \| O_DIRECT); if (fd < 0) { perror("open"); return 1; } iovecs = calloc(10, sizeof(struct iovec)); for (i = 0; i < 10; i++) { if (posix_memalign(&buf, 4096, 4096)) return 1; iovecs[i].iov_base = buf; iovecs[i].iov_len = 4096; } ret = io_uring_register_files(&ring, &fd, 1); if (ret < 0) { fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret); return ret; } for (i = 0; i < 10; i++) { sqe = io_uring_get_sqe(&ring); if (!sqe) break; io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0); sqe->flags \|= IOSQE_FIXED_FILE; ret = io_uring_submit(&ring); sleep(1); printf("submit %d\n", i); } for (i = 0; i < 10; i++) { io_uring_wait_cqe(&ring, &cqe); printf("receive: %d\n", i); if (cqe->res != 4096) { fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res); ret = 1; } io_uring_cqe_seen(&ring, cqe); } close(fd); io_uring_queue_exit(&ring); return 0; } sudo ./test testfile above command will hang on the tenth request, to fix this bug, when io sq_thread is waken up, we reset the variable 'ret' to be zero. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-20 07:26:47 -06:00
Jens Axboe	b532576ed3	io_uring: don't add non-IO requests to iopoll pending list We normally disable any commands that aren't specifically poll commands for a ring that is setup for polling, but we do allow buffer provide and remove commands to support buffer selection for polled IO. Once a request is issued, we add it to the poll list to poll for completion. But we should not do that for non-IO commands, as those request complete inline immediately and aren't pollable. If we do, we can leave requests on the iopoll list after they are freed. Fixes: `ddf0322db7` ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-19 21:20:27 -06:00
Linus Torvalds	115a54162a	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Stable fodder fix: copy_fdtable() would get screwed on 64bit boxen with sysctl_nr_open raised to 512M or higher, which became possible since 2.6.25. Nobody sane would set the things up that way, but..." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix multiplication overflow in copy_fdtable()	2020-05-19 16:33:26 -07:00
Al Viro	4e89b72104	fix multiplication overflow in copy_fdtable() cpy and set really should be size_t; we won't get an overflow on that, since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *), so nr that would've managed to overflow size_t on that multiplication won't get anywhere near copy_fdtable() - we'll fail with EMFILE before that. Cc: stable@kernel.org # v2.6.25+ Fixes: `9cfe015aa4` (get rid of NR_OPEN and introduce a sysctl_nr_open) Reported-by: Thiago Macieira <thiago.macieira@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-19 18:29:36 -04:00
Bijan Mottahedeh	4f4eeba87c	io_uring: don't use kiocb.private to store buf_index kiocb.private is used in iomap_dio_rw() so store buf_index separately. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Move 'buf_index' to a hole in io_kiocb. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-19 16:19:49 -06:00
Christoph Hellwig	959f758451	ext4: fix fiemap size checks for bitmap files Add an extra validation of the len parameter, as for ext4 some files might have smaller file size limits than others. This also means the redundant size check in ext4_ioctl_get_es_cache can go away, as all size checking is done in the shared fiemap handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2020-05-19 15:03:37 -04:00
Ritesh Harjani	9f44eda195	ext4: fix EXT4_MAX_LOGICAL_BLOCK macro ext4 supports max number of logical blocks in a file to be 0xffffffff. (This is since ext4_extent's ee_block is __le32). This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting from 0 logical offset). This patch fixes this. The issue was seen when ext4 moved to iomap_fiemap API and when overlayfs was mounted on top of ext4. Since overlayfs was missing filemap_check_ranges(), so it could pass a arbitrary huge length which lead to overflow of map.m_len logic. This patch fixes that. Fixes: `d3b6f23f71` ("ext4: move ext4_fiemap to use iomap framework") Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2020-05-19 15:03:37 -04:00
Christoph Hellwig	ef8385128d	xfs: cleanup xfs_idestroy_fork Move freeing the dynamically allocated attr and COW fork, as well as zeroing the pointers where actually needed into the callers, and just pass the xfs_ifork structure to xfs_idestroy_fork. Also simplify the kmem_free calls by not checking for NULL first. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:59 -07:00
Christoph Hellwig	f7e67b20ec	xfs: move the fork format fields into struct xfs_ifork Both the data and attr fork have a format that is stored in the legacy idinode. Move it into the xfs_ifork structure instead, where it uses up padding. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	daf83964a3	xfs: move the per-fork nextents fields into struct xfs_ifork There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	b2c20045b6	xfs: remove xfs_ifree_local_data xfs_ifree only need to free inline data in the data fork, as we've already taken care of the attr fork before (and in fact freed the fork structure). Just open code the freeing of the inline data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	09c38edd54	xfs: remove the XFS_DFORK_Q macro Just checking di_forkoff directly is a little easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Darrick J. Wong	5fd68bdb5a	xfs: clean up xchk_bmap_check_rmaps usage of XFS_IFORK_Q XFS_IFORK_Q is supposed to be a predicate, not a function returning a value. Its usage is in xchk_bmap_check_rmaps is incorrect, but that function only cares about whether or not the "size" of the data is zero or not. Convert that logic to use a proper boolean, and teach the caller to skip the call entirely if the end result would be that we'd do nothing anyway. This avoids a crash later in this series. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [hch: generalized the NULL ifor check] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	4b516ff4e7	xfs: remove the NULL fork handling in xfs_bmapi_read Now that we fully verify the inode forks before they are added to the inode cache, the crash reported in https://bugzilla.kernel.org/show_bug.cgi?id=204031 can't happen anymore, as we'll never let an inode that has inconsistent nextents counts vs the presence of an in-core attr fork leak into the inactivate code path. So remove the work around to try to handle the case, and just return an error and warn if the fork is not present. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	1a1c57b282	xfs: remove the special COW fork handling in xfs_bmapi_read We don't call xfs_bmapi_read for the COW fork anymore, so remove the special casing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	0f45a1b20c	xfs: improve local fork verification Call the data/attr local fork verifiers as soon as we are ready for them. This keeps them close to the code setting up the forks, and avoids a few branches later on. Also open code xfs_inode_verify_forks in the only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:58 -07:00
Christoph Hellwig	7c7ba21863	xfs: refactor xfs_inode_verify_forks The split between xfs_inode_verify_forks and the two helpers implementing the actual functionality is a little strange. Reshuffle it so that xfs_inode_verify_forks verifies if the data and attr forks are actually in local format and only call the low-level helpers if that is the case. Handle the actual error reporting in the low-level handlers to streamline the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	1934c8bd81	xfs: remove xfs_ifork_ops xfs_ifork_ops add up to two indirect calls per inode read and flush, despite just having a single instance in the kernel. In xfsprogs phase6 in xfs_repair overrides the verify_dir method to deal with inodes that do not have a valid parent, but that can be fixed pretty easily by ensuring they always have a valid looking parent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	bb8a66af4f	xfs: remove xfs_iread There is not much point in the xfs_iread function, as it has a single caller and not a whole lot of code. Move it into the only caller, and trim down the overdocumentation to just documenting the important "why" instead of a lot of redundant "what". Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	7f02901235	xfs: don't reset i_delayed_blks in xfs_iread i_delayed_blks is set to 0 in xfs_inode_alloc and can't have anything assigned to it until the inode is visible to the VFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	2d6051d496	xfs: call xfs_dinode_verify from xfs_inode_from_disk Keep the code dealing with the dinode together, and also ensure we verify the dinode in the owner change log recovery case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	0bce8173fd	xfs: handle unallocated inodes in xfs_inode_from_disk Handle inodes with a 0 di_mode in xfs_inode_from_disk, instead of partially duplicating inode reading in xfs_iread. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	9229d18e80	xfs: split xfs_iformat_fork xfs_iformat_fork is a weird catchall. Split it into one helper for the data fork and one for the attr fork, and then call both helper as well as the COW fork initialization from xfs_inode_from_disk. Order the COW fork initialization after the attr fork initialization given that it can't fail to simplify the error handling. Note that the newly split helpers are moved down the file in xfs_inode_fork.c to avoid the need for forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	cb7d585944	xfs: call xfs_iformat_fork from xfs_inode_from_disk We always need to fill out the fork structures when reading the inode, so call xfs_iformat_fork from the tail of xfs_inode_from_disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Christoph Hellwig	b90c2a9c8b	xfs: xfs_bmapi_read doesn't take a fork id as the last argument The last argument to xfs_bmapi_raad contains XFS_BMAPI_* flags, not the fork. Given that XFS_DATA_FORK evaluates to 0 no real harm is done, but let's fix this anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:57 -07:00
Kaixu Xia	14506f7a91	xfs: fix the warning message in xfs_validate_sb_common() Fix this error message to complain about project and group quota flag bits instead of "PUOTA" and "QUOTA". Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-19 09:40:56 -07:00
Darrick J. Wong	765d3c393c	xfs: don't allow SWAPEXT if we'd screw up quota accounting Since the old SWAPEXT ioctl doesn't know how to adjust quota ids, bail out of the ids don't match and quotas are enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-05-19 09:40:56 -07:00
Darrick J. Wong	78bba5c812	xfs: use ordered buffers to initialize dquot buffers during quotacheck While QAing the new xfs_repair quotacheck code, I uncovered a quota corruption bug resulting from a bad interaction between dquot buffer initialization and quotacheck. The bug can be reproduced with the following sequence: # mkfs.xfs -f /dev/sdf # mount /dev/sdf /opt -o usrquota # su nobody -s /bin/bash -c 'touch /opt/barf' # sync # xfs_quota -x -c 'report -ahi' /opt User quota on /opt (/dev/sdf) Inodes User ID Used Soft Hard Warn/Grace ---------- --------------------------------- root 3 0 0 00 [------] nobody 1 0 0 00 [------] # xfs_io -x -c 'shutdown' /opt # umount /opt # mount /dev/sdf /opt -o usrquota # touch /opt/man2 # xfs_quota -x -c 'report -ahi' /opt User quota on /opt (/dev/sdf) Inodes User ID Used Soft Hard Warn/Grace ---------- --------------------------------- root 1 0 0 00 [------] nobody 1 0 0 00 [------] # umount /opt Notice how the initial quotacheck set the root dquot icount to 3 (rootino, rbmino, rsumino), but after shutdown -> remount -> recovery, xfs_quota reports that the root dquot has only 1 icount. We haven't deleted anything from the filesystem, which means that quota is now under-counting. This behavior is not limited to icount or the root dquot, but this is the shortest reproducer. I traced the cause of this discrepancy to the way that we handle ondisk dquot updates during quotacheck vs. regular fs activity. Normally, when we allocate a disk block for a dquot, we log the buffer as a regular (dquot) buffer. Subsequent updates to the dquots backed by that block are done via separate dquot log item updates, which means that they depend on the logged buffer update being written to disk before the dquot items. Because individual dquots have their own LSN fields, that initial dquot buffer must always be recovered. However, the story changes for quotacheck, which can cause dquot block allocations but persists the final dquot counter values via a delwri list. Because recovery doesn't gate dquot buffer replay on an LSN, this means that the initial dquot buffer can be replayed over the (newer) contents that were delwritten at the end of quotacheck. In effect, this re-initializes the dquot counters after they've been updated. If the log does not contain any other dquot items to recover, the obsolete dquot contents will not be corrected by log recovery. Because quotacheck uses a transaction to log the setting of the CHKD flags in the superblock, we skip quotacheck during the second mount call, which allows the incorrect icount to remain. Fix this by changing the ondisk dquot initialization function to use ordered buffers to write out fresh dquot blocks if it detects that we're running quotacheck. If the system goes down before quotacheck can complete, the CHKD flags will not be set in the superblock and the next mount will run quotacheck again, which can fix uninitialized dquot buffers. This requires amending the defer code to maintaine ordered buffer state across defer rolls for the sake of the dquot allocation code. For regular operations we preserve the current behavior since the dquot items require properly initialized ondisk dquot records. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-05-19 09:40:56 -07:00
Eric Biggers	e3b1078bed	fscrypt: add support for IV_INO_LBLK_32 policies The eMMC inline crypto standard will only specify 32 DUN bits (a.k.a. IV bits), unlike UFS's 64. IV_INO_LBLK_64 is therefore not applicable, but an encryption format which uses one key per policy and permits the moving of encrypted file contents (as f2fs's garbage collector requires) is still desirable. To support such hardware, add a new encryption format IV_INO_LBLK_32 that makes the best use of the 32 bits: the IV is set to 'SipHash-2-4(inode_number) + file_logical_block_number mod 2^32', where the SipHash key is derived from the fscrypt master key. We hash only the inode number and not also the block number, because we need to maintain contiguity of DUNs to merge bios. Unlike with IV_INO_LBLK_64, with this format IV reuse is possible; this is unavoidable given the size of the DUN. This means this format should only be used where the requirements of the first paragraph apply. However, the hash spreads out the IVs in the whole usable range, and the use of a keyed hash makes it difficult for an attacker to determine which files use which IVs. Besides the above differences, this flag works like IV_INO_LBLK_64 in that on ext4 it is only allowed if the stable_inodes feature has been enabled to prevent inode numbers and the filesystem UUID from changing. Link: https://lore.kernel.org/r/20200515204141.251098-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Paul Crowley <paulcrowley@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-05-19 09:34:18 -07:00
Brian Foster	f28cef9e4d	xfs: don't fail verifier on empty attr3 leaf block The attr fork can transition from shortform to leaf format while empty if the first xattr doesn't fit in shortform. While this empty leaf block state is intended to be transient, it is technically not due to the transactional implementation of the xattr set operation. We historically have a couple of bandaids to work around this problem. The first is to hold the buffer after the format conversion to prevent premature writeback of the empty leaf buffer and the second is to bypass the xattr count check in the verifier during recovery. The latter assumes that the xattr set is also in the log and will be recovered into the buffer soon after the empty leaf buffer is reconstructed. This is not guaranteed, however. If the filesystem crashes after the format conversion but before the xattr set that induced it, only the format conversion may exist in the log. When recovered, this creates a latent corrupted state on the inode as any subsequent attempts to read the buffer fail due to verifier failure. This includes further attempts to set xattrs on the inode or attempts to destroy the attr fork, which prevents the inode from ever being removed from the unlinked list. To avoid this condition, accept that an empty attr leaf block is a valid state and remove the count check from the verifier. This means that on rare occasions an attr fork might exist in an unexpected state, but is otherwise consistent and functional. Note that we retain the logic to avoid racing with metadata writeback to reduce the window where this can occur. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-05-19 09:05:24 -07:00
Eric Biggers	0ca2ddb0cd	fscrypt: make test_dummy_encryption use v2 by default Since v1 encryption policies are deprecated, make test_dummy_encryption test v2 policies by default. Note that this causes ext4/023 and ext4/028 to start failing due to known bugs in those tests (see previous commit). Link: https://lore.kernel.org/r/20200512233251.118314-5-ebiggers@kernel.org Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-05-18 20:21:48 -07:00
Eric Biggers	ed318a6cc0	fscrypt: support test_dummy_encryption=v2 v1 encryption policies are deprecated in favor of v2, and some new features (e.g. encryption+casefolding) are only being added for v2. Therefore, the "test_dummy_encryption" mount option (which is used for encryption I/O testing with xfstests) needs to support v2 policies. To do this, extend its syntax to be "test_dummy_encryption=v1" or "test_dummy_encryption=v2". The existing "test_dummy_encryption" (no argument) also continues to be accepted, to specify the default setting -- currently v1, but the next patch changes it to v2. To cleanly support both v1 and v2 while also making it easy to support specifying other encryption settings in the future (say, accepting "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a pointer to the dummy fscrypt_context rather than using mount flags. To avoid concurrency issues, don't allow test_dummy_encryption to be set or changed during a remount. (The former restriction is new, but xfstests doesn't run into it, so no one should notice.) Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4, there are two regressions, both of which are test bugs: ext4/023 and ext4/028 fail because they set an xattr and expect it to be stored inline, but the increase in size of the fscrypt_context from 24 to 40 bytes causes this xattr to be spilled into an external block. Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-05-18 20:21:48 -07:00
Linus Torvalds	45088963ca	Merge tag 'for-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - Fix potential memory leak in exfat_find - Set exfat's splice_write to iter_file_splice_write to fix a splice failure on direct-opened files * tag 'for-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix possible memory leak in exfat_find() exfat: use iter_file_splice_write	2020-05-18 10:33:13 -07:00
David Howells	9d1be4f4dc	afs: Don't unlock fetched data pages until the op completes successfully Don't call req->page_done() on each page as we finish filling it with the data coming from the network. Whilst this might speed up the application a bit, it's a problem if there's a network failure and the operation has to be reissued. If this happens, an oops occurs because afs_readpages_page_done() clears the pointer to each page it unlocks and when a retry happens, the pointers to the pages it wants to fill are now NULL (and the pages have been unlocked anyway). Instead, wait till the operation completes successfully and only then release all the pages after clearing any terminal gap (the server can give us less data than we requested as we're allowed to ask for more than is available). KASAN produces a bug like the following, and even without KASAN, it can oops and panic. BUG: KASAN: wild-memory-access in _copy_to_iter+0x323/0x5f4 Write of size 1404 at addr 0005088000000000 by task md5sum/5235 CPU: 0 PID: 5235 Comm: md5sum Not tainted 5.7.0-rc3-fscache+ #250 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: memcpy+0x39/0x58 _copy_to_iter+0x323/0x5f4 __skb_datagram_iter+0x89/0x2a6 skb_copy_datagram_iter+0x129/0x135 rxrpc_recvmsg_data.isra.0+0x615/0xd42 rxrpc_kernel_recv_data+0x1e9/0x3ae afs_extract_data+0x139/0x33a yfs_deliver_fs_fetch_data64+0x47a/0x91b afs_deliver_to_call+0x304/0x709 afs_wait_for_call_to_complete+0x1cc/0x4ad yfs_fs_fetch_data+0x279/0x288 afs_fetch_data+0x1e1/0x38d afs_readpages+0x593/0x72e read_pages+0xf5/0x21e __do_page_cache_readahead+0x128/0x23f ondemand_readahead+0x36e/0x37f generic_file_buffered_read+0x234/0x680 new_sync_read+0x109/0x17e vfs_read+0xe6/0x138 ksys_read+0xd8/0x14d do_syscall_64+0x6e/0x8a entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fixes: `196ee9cd2d` ("afs: Make afs_fs_fetch_data() take a list of pages") Fixes: `30062bd13e` ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-05-18 10:29:17 -07:00
Jens Axboe	e3aabf9554	io_uring: cancel work if task_work_add() fails We currently move it to the io_wqe_manager for execution, but we cannot safely do so as we may lack some of the state to execute it out of context. As we cancel work anyway when the ring/task exits, just mark this request as canceled and io_async_task_func() will do the right thing. Fixes: `aa96bf8a9e` ("io_uring: use io-wq manager as backup task if task is exiting") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-18 11:14:22 -06:00
Wei Yongjun	94182167ec	exfat: fix possible memory leak in exfat_find() 'es' is malloced from exfat_get_dentry_set() in exfat_find() and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Fixes: `5f2aa07507` ("exfat: add inode operations") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2020-05-18 11:51:44 +09:00
Eric Sandeen	0357794830	exfat: use iter_file_splice_write Doing copy_file_range() on exfat with a file opened for direct IO leads to an -EFAULT: # xfs_io -f -d -c "truncate 32768" \ -c "copy_range -d 16384 -l 16384 -f 0" /mnt/test/junk copy_range: Bad address and the reason seems to be that we go through: default_file_splice_write splice_from_pipe __splice_from_pipe write_pipe_buf __kernel_write new_sync_write generic_file_write_iter generic_file_direct_write exfat_direct_IO do_blockdev_direct_IO iov_iter_get_pages and land in iterate_all_kinds(), which does "return -EFAULT" for our kvec iter. Setting exfat's splice_write to iter_file_splice_write fixes this and lets fsx (which originally detected the problem) run to success from the xfstests harness. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2020-05-18 11:51:40 +09:00
Jens Axboe	310672552f	io_uring: async task poll trigger cleanup If the request is still hashed in io_async_task_func(), then it cannot have been canceled and it's pointless to check. So save that check. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 17:43:31 -06:00
Eric Biggers	3c3c32f85b	ubifs: fix wrong use of crypto_shash_descsize() crypto_shash_descsize() returns the size of the shash_desc context needed to compute the hash, not the size of the hash itself. crypto_shash_digestsize() would be correct, or alternatively using c->hash_len and c->hmac_desc_len which already store the correct values. But actually it's simpler to just use stack arrays, so do that instead. Fixes: `49525e5eec` ("ubifs: Add helper functions for authentication support") Fixes: `da8ef65f95` ("ubifs: Authenticate replayed journal") Cc: <stable@vger.kernel.org> # v4.20+ Cc: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>	2020-05-17 23:38:21 +02:00
Jens Axboe	948a774945	io_uring: remove dead check in io_splice() We checked for 'force_nonblock' higher up, so it's definitely false at this point. Kill the check, it's a remnant of when we tried to do inline splice without always punting to async context. Fixes: `2fb3e82284` ("io_uring: punt splice async because of inode mutex") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:21:38 -06:00
Pavel Begunkov	f2a8d5c7a2	io_uring: add tee(2) support Add IORING_OP_TEE implementing tee(2) support. Almost identical to splice bits, but without offsets. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:10:07 -06:00
Pavel Begunkov	9dafdfc2f0	splice: export do_tee() export do_tee() for use in io_uring Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:10:07 -06:00
Pavel Begunkov	c11368a57b	io_uring: don't repeat valid flag list req->flags stores all sqe->flags. After checking that sqe->flags are valid set if IOSQE* flags, no need to double check it, just forward them all. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:10:07 -06:00
Pavel Begunkov	9f13c35b33	io_uring: rename io_file_put() io_file_put() deals with flushing state's file refs, adding "state" to its name makes it a bit clearer. Also, avoid double check of state->file in __io_file_get() in some cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:10:07 -06:00
Pavel Begunkov	0cdaf760f4	io_uring: remove req->needs_fixed_files A submission is "async" IIF it's done by SQPOLL thread. Instead of passing @async flag into io_submit_sqes(), deduce it from ctx->flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:10:07 -06:00
Jens Axboe	3bfa5bcb26	io_uring: cleanup io_poll_remove_one() logic We only need apoll in the one section, do the juggling with the work restoration there. This removes a special case further down as well. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 14:10:01 -06:00
Linus Torvalds	b48397cb75	Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve fix from Eric Biederman: "While working on my exec cleanups I found a bug in exec that I introduced by accident a couple of years ago. I apparently missed the fact that bprm->file can change. Now I have a very personal motive to clean up exec and make it more approachable. The change is just moving woud_dump to where it acts on the final bprm->file not the initial bprm->file. I have been careful and tested and verify this fix works" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: exec: Move would_dump into flush_old_exec	2020-05-17 12:23:37 -07:00
Eric W. Biederman	f87d1c9559	exec: Move would_dump into flush_old_exec I goofed when I added mm->user_ns support to would_dump. I missed the fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and binfmt_script bprm->file is reassigned. Which made the move of would_dump from setup_new_exec to __do_execve_file before exec_binprm incorrect as it can result in would_dump running on the script instead of the interpreter of the script. The net result is that the code stopped making unreadable interpreters undumpable. Which allows them to be ptraced and written to disk without special permissions. Oops. The move was necessary because the call in set_new_exec was after bprm->mm was no longer valid. To correct this mistake move the misplaced would_dump from __do_execve_file into flos_old_exec, before exec_mmap is called. I tested and confirmed that without this fix I can attach with gdb to a script with an unreadable interpreter, and with this fix I can not. Cc: stable@vger.kernel.org Fixes: `f84df2a6f2` ("exec: Ensure mm->user_ns contains the execed files") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2020-05-17 10:48:24 -05:00
Pavel Begunkov	bd2ab18a1d	io_uring: fix FORCE_ASYNC req preparation As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs, so they can be prepared properly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-17 09:22:09 -06:00

... 5 6 7 8 9 ...

64610 Commits