android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Nikolay Borisov	4660c49f9b	btrfs: Remove redundant memory barrier in dev stats As per atomic_t.txt documentation : - RMW operations that have a return value are fully ordered; atomic_xchg is one such operation so it already includes everything it needs w.r.t memory ordering and add a comment to be more explicit about that. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:10 +01:00
Nikolay Borisov	9deae96892	btrfs: Fix memory barriers usage with device stats counters Commit `addc3fa74e` ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared") reworked the way device stats changes are tracked. A new atomic dev_stats_ccnt counter was introduced which is incremented every time any of the device stats counters are changed. This serves as a flag whether there are any pending stats changes. However, this patch only partially implemented the correct memory barriers necessary: - It only ordered the stores to the counters but not the reads e.g. btrfs_run_dev_stats - It completely omitted any comments documenting the intended design and how the memory barriers pair with each-other This patch provides the necessary comments as well as adds a missing smp_rmb in btrfs_run_dev_stats. Furthermore since dev_stats_cnt is only a snapshot at best there was no point in reading the counter twice - once in btrfs_dev_stats_dirty and then again when assigning stats_cnt. Just collapse both reads into 1. Fixes: `addc3fa74e` ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:10 +01:00
Anand Jain	1cb34c8ecd	btrfs: clean up btrfs_dev_stat_inc usage btrfs_end_bio() is using btrfs_dev_stat_inc() and then btrfs_dev_stat_print_on_error() separately instead use btrfs_dev_stat_inc_and_print() directly. As of now there isn't any bio in btrfs which is - a non-empty write and also the REQ_PREFLUSH flag is set. So in actual the condition if (bio->bi_opf & REQ_PREFLUSH) is never true in btrfs_end_bio(), and so there won't be any redundant error log by using btrfs_dev_stat_inc_and_print() separately one for write and another for flush. This consolidation will help to add the device critical error handles in the function btrfs_dev_stat_inc_and_print() and which can be renamed as needed. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:10 +01:00
Liu Bo	9f5316c17b	Btrfs: free btrfs_device in place It's pointless to defer it to a kthread helper as we're not under a special context. For reference, commit `1f78160ce1` ("Btrfs: using rcu lock in the reader side of devices list") introduced RCU freeing for device structures. Originally the blkdev_put was called from free_device and rcu_barrier had to be called. This is no longer required, bdev and our device structures are now freed separately. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhance changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:10 +01:00
Liu Bo	1805f2ca3f	Btrfs: remove redundant btrfs_balance_delayed_items In functions like btrfs_create(), we run both btrfs_balance_delayed_items() and btrfs_btree_balance_dirty() after the operation, but btrfs_btree_balance_dirty() is surely going to run btrfs_balance_delayed_items(). This keeps only btrfs_btree_balance_dirty(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:10 +01:00
Eric Biggers	49686cbbb3	NFS: reject request for id_legacy key without auxdata nfs_idmap_legacy_upcall() is supposed to be called with 'aux' pointing to a 'struct idmap', via the call to request_key_with_auxdata() in nfs_idmap_request_key(). However it can also be reached via the request_key() system call in which case 'aux' will be NULL, causing a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall(), assuming that the key description is valid enough to get that far. Fix this by making nfs_idmap_legacy_upcall() negate the key if no auxdata is provided. As usual, this bug was found by syzkaller. A simple reproducer using the command-line keyctl program is: keyctl request2 id_legacy uid:0 '' @s Fixes: `57e62324e4` ("NFS: Store the legacy idmapper result in the keyring") Reported-by: syzbot+5dfdbcf7b3eb5912abbb@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com>	2018-01-22 10:05:11 -05:00
Andreas Gruenbacher	0ff5916ad4	gfs2: Get rid of gfs2_log_header_in Get rid of gfs2_log_header_in by integrating it into get_log_header. Clean up the crc32 computations and use the same functions for encoding and decoding to make things less confusing. Eliminate lh_hash from gfs2_log_header_host which is completely useless. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-22 07:06:15 -07:00
Miklos Szeredi	cf5eebae2c	seq_file: fix incomplete reset on read from zero offset When resetting iterator on a zero offset we need to discard any data already in the buffer (count), and private state of the iterator (version). For example this bug results in first line being repeated in /proc/mounts if doing a zero size read before a non-zero size read. Reported-by: Rich Felker <dalias@libc.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `e522751d60` ("seq_file: reset iterator to first record for zero offset") Cc: <stable@vger.kernel.org> # v4.10 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-01-20 02:31:15 -05:00
David S. Miller	8565d26bcb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The BPF verifier conflict was some minor contextual issue. The TUN conflict was less trivial. Cong Wang fixed a memory leak of tfile->tx_array in 'net'. This is an skb_array. But meanwhile in net-next tun changed tfile->tx_arry into tfile->tx_ring which is a ptr_ring. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-19 22:59:33 -05:00
Dan Williams	569d0365f5	dax: require 'struct page' by default for filesystem dax If a dax buffer from a device that does not map pages is passed to read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If gdb attempts to examine the contents of a dax buffer from a device that does not map pages it triggers SIGBUS. If fork(2) is called on a process with a dax mapping from a device that does not map pages it triggers SIGBUS. 'struct page' is required otherwise several kernel code paths break in surprising ways. Disable filesystem-dax on devices that do not map pages. In addition to needing pfn_to_page() to be valid we also require devmap pages. We need this to detect dax pages in the get_user_pages_fast() path and so that we can stop managing the VM_MIXEDMAP flag. For DAX drivers that have not supported get_user_pages() to date we allow them to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration option which requires ->direct_access() to return pfn_t_special() pfns. This leaves DAX support in brd disabled and scheduled for removal. Note that when the initial dax support was being merged a few years back there was concern that struct page was unsuitable for use with next generation persistent memory devices. The theoretical concern was that struct page access, being such a hotly used data structure in the kernel, would lead to media wear out. While that was a reasonable conservative starting position it has not held true in practice. We have long since committed to using devm_memremap_pages() to support higher order kernel functionality that needs get_user_pages() and pfn_to_page(). Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-01-19 16:50:53 -08:00
Dan Williams	b4b5798cea	ext2: auto disable dax instead of failing mount Bring the ext2 filesystem in line with xfs that only warns and continues when the "-o dax" option is specified to mount and the backing device does not support dax. This is in preparation for removing dax support from devices that do not enable get_user_pages() operations on dax mappings. In other words 'gup' support is required and configurations that were using so called 'page-less' dax will be converted back to using the page cache. Removing the broken 'page-less' dax support is a pre-requisite for removing the "EXPERIMENTAL" warning when mounting a filesystem in dax mode. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-01-19 16:50:53 -08:00
Dan Williams	24f3478d66	ext4: auto disable dax instead of failing mount Bring the ext4 filesystem in line with xfs that only warns and continues when the "-o dax" option is specified to mount and the backing device does not support dax. This is in preparation for removing dax support from devices that do not enable get_user_pages() operations on dax mappings. In other words 'gup' support is required and configurations that were using so called 'page-less' dax will be converted back to using the page cache. Removing the broken 'page-less' dax support is a pre-requisite for removing the "EXPERIMENTAL" warning when mounting a filesystem in dax mode. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-01-19 16:50:53 -08:00
Alexey Dobriyan	8bb2ee192e	proc: fix coredump vs read /proc//stat race do_task_stat() accesses IP and SP of a task without bumping reference count of a stack (which became an entity with independent lifetime at some point). Steps to reproduce: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/time.h> #include <sys/resource.h> #include <unistd.h> #include <sys/wait.h> int main(void) { setrlimit(RLIMIT_CORE, &(struct rlimit){}); while (1) { char buf[64]; char buf2[4096]; pid_t pid; int fd; pid = fork(); if (pid == 0) { (volatile int *)0 = 0; } snprintf(buf, sizeof(buf), "/proc/%u/stat", pid); fd = open(buf, O_RDONLY); read(fd, buf2, sizeof(buf2)); close(fd); waitpid(pid, NULL, 0); } return 0; } BUG: unable to handle kernel paging request at 0000000000003fd8 IP: do_task_stat+0x8b4/0xaf0 PGD 800000003d73e067 P4D 800000003d73e067 PUD 3d558067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 RIP: 0010:do_task_stat+0x8b4/0xaf0 Call Trace: proc_single_show+0x43/0x70 seq_read+0xe6/0x3b0 __vfs_read+0x1e/0x120 vfs_read+0x84/0x110 SyS_read+0x3d/0xa0 entry_SYSCALL_64_fastpath+0x13/0x6c RIP: 0033:0x7f4d7928cba0 RSP: 002b:00007ffddb245158 EFLAGS: 00000246 Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef <48> 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24 RIP: do_task_stat+0x8b4/0xaf0 RSP: ffffc90000607cc8 CR2: 0000000000003fd8 John Ogness said: for my tests I added an else case to verify that the race is hit and correctly mitigated. Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: "Kohli, Gaurav" <gkohli@codeaurora.org> Tested-by: John Ogness <john.ogness@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-01-19 10:09:41 -08:00
Ivan Vecera	ba87977a49	kernfs: fix regression in kernfs_fop_write caused by wrong type Commit `b7ce40cff0` ("kernfs: cache atomic_write_len in kernfs_open_file") changes type of local variable 'len' from ssize_t to size_t. This change caused that the ppos value is updated also when the previous write callback failed. Mentioned snippet: ... len = ops->write(...); <- return value can be negative ... if (len > 0) <- true here in this case ppos += len; ... Fixes: `b7ce40cff0` ("kernfs: cache atomic_write_len in kernfs_open_file") Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-01-19 12:19:13 -05:00
Amir Goldstein	a5a927a7c8	ovl: take mnt_want_write() for removing impure xattr The optimization in ovl_cache_get_impure() that tries to remove an unneeded "impure" xattr needs to take mnt_want_write() on upper fs. Fixes: `4edb83bb10` ("ovl: constant d_ino for non-merge dirs") Cc: <stable@vger.kernel.org> #v4.14 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-01-19 17:43:24 +01:00
Amir Goldstein	2ba9d57e65	ovl: take mnt_want_write() for work/index dir setup There are several write operations on upper fs not covered by mnt_want_write(): - test set/remove OPAQUE xattr - test create O_TMPFILE - set ORIGIN xattr in ovl_verify_origin() - cleanup of index entries in ovl_indexdir_cleanup() Some of these go way back, but this patch only applies over the v4.14 re-factoring of ovl_fill_super(). Cc: <stable@vger.kernel.org> #v4.14 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-01-19 17:43:24 +01:00
Amir Goldstein	f81678173c	ovl: fix another overlay: warning prefix Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-01-19 17:43:24 +01:00
Amir Goldstein	6d0a8a90a5	ovl: take lower dir inode mutex outside upper sb_writers lock The functions ovl_lower_positive() and ovl_check_empty_dir() both take inode mutex on the real lower dir under ovl_want_write() which takes the upper_mnt sb_writers lock. While this is not a clear locking order or layering violation, it creates an undesired lock dependency between two unrelated layers for no good reason. This lock dependency materializes to a false(?) positive lockdep warning when calling rmdir() on a nested overlayfs, where both nested and underlying overlayfs both use the same fs type as upper layer. rmdir() on the nested overlayfs creates the lock chain: sb_writers of upper_mnt (e.g. tmpfs) in ovl_do_remove() ovl_i_mutex_dir_key[] of lower overlay dir in ovl_lower_positive() rmdir() on the underlying overlayfs creates the lock chain in reverse order: ovl_i_mutex_dir_key[] of lower overlay dir in vfs_rmdir() sb_writers of nested upper_mnt (e.g. tmpfs) in ovl_do_remove() To rid of the unneeded locking dependency, move both ovl_lower_positive() and ovl_check_empty_dir() to before ovl_want_write() in rmdir() and rename() implementation. This change spreads the pieces of ovl_check_empty_and_clear() directly inside the rmdir()/rename() implementations so the helper is no longer needed and removed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-01-19 17:43:23 +01:00
Amir Goldstein	d796e77f1d	ovl: fix failure to fsync lower dir As a writable mount, it is not expected for overlayfs to return EINVAL/EROFS for fsync, even if dir/file is not changed. This commit fixes the case of fsync of directory, which is easier to address, because overlayfs already implements fsync file operation for directories. The problem reported by Raphael is that new PostgreSQL 10.0 with a database in overlayfs where lower layer in squashfs fails to start. The failure is due to fsync error, when PostgreSQL does fsync on all existing db directories on startup and a specific directory exists lower layer with no changes. Reported-by: Raphael Hertzog <raphael@ouaza.com> Cc: <stable@vger.kernel.org> # v3.18 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Raphaël Hertzog <hertzog@debian.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-01-19 13:54:33 +01:00
Amir Goldstein	31747eda41	ovl: hash directory inodes for fsnotify fsnotify pins a watched directory inode in cache, but if directory dentry is released, new lookup will allocate a new dentry and a new inode. Directory events will be notified on the new inode, while fsnotify listener is watching the old pinned inode. Hash all directory inodes to reuse the pinned inode on lookup. Pure upper dirs are hashes by real upper inode, merge and lower dirs are hashed by real lower inode. The reference to lower inode was being held by the lower dentry object in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry may drop lower inode refcount to zero. Add a refcount on behalf of the overlay inode to prevent that. As a by-product, hashing directory inodes also detects multiple redirected dirs to the same lower dir and uncovered redirected dir target on and returns -ESTALE on lookup. The reported issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com>	2018-01-19 13:54:33 +01:00
Daeho Jeong	9ac1e2d88d	f2fs: prevent newly created inode from being dirtied incorrectly Now, we invoke f2fs_mark_inode_dirty_sync() to make an inode dirty in advance of creating a new node page for the inode. By this, some inodes whose node page is not created yet can be linked into the global dirty list. If the checkpoint is executed at this moment, the inode will be written back by writeback_single_inode() and finally update_inode_page() will fail to detach the inode from the global dirty list because the inode doesn't have a node page. The problem is that the inode's state in VFS layer will become clean after execution of writeback_single_inode() and it's still linked in the global dirty list of f2fs and this will cause a kernel panic. So, we will prevent the newly created inode from being dirtied during the FI_NEW_INODE flag of the inode is set. We will make it dirty right after the flag is cleared. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Tested-by: Hobin Woo <hobin.woo@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-18 22:09:12 -08:00
Chao Yu	442a9dbd57	f2fs: support FIEMAP_FLAG_XATTR This patch enables ->fiemap to handle FIEMAP_FLAG_XATTR flag for xattr mapping info lookup purpose. It makes f2fs passing generic/425 test in fstest. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-18 22:09:11 -08:00
Chao Yu	f1b43d4cd5	f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock This patch fix to cover f2fs_inline_data_fiemap with inode_lock in order to make that interface avoiding race with mapping change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-18 22:09:10 -08:00
Yunlei He	7dff55d27e	f2fs: check node page again in write end io Check node page again in write end io in case of data corruption during inflght IO. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-18 22:09:09 -08:00
Chao Yu	25a912e51a	f2fs: fix to caclulate required free section correctly When calculating required free section during file defragmenting, we should skip holes in file, otherwise we will probably fail to defrag sparse file with large size. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-18 22:09:08 -08:00
Daeho Jeong	f1d2564a7c	f2fs: handle newly created page when revoking inmem pages When committing inmem pages is successful, we revoke already committed blocks in __revoke_inmem_pages() and finally replace the committed ones with the old blocks using f2fs_replace_block(). However, if the committed block was newly created one, the address of the old block is NEW_ADDR and __f2fs_replace_block() cannot handle NEW_ADDR as new_blkaddr properly and a kernel panic occurrs. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Tested-by: Shu Tan <shu.tan@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-01-18 22:09:07 -08:00
Andreas Gruenbacher	88b65ce5fd	gfs2: Minor gfs2_page_add_databufs cleanup The to parameter of gfs2_page_add_databufs is passed inconsistently: once as from + len, once as from + len - 1. Just pass len instead. In addition, once we're past the end, we can immediately break out of the loop. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-18 14:18:55 -07:00
Andreas Gruenbacher	235628c5c7	gfs2: Add gfs2_max_stuffed_size Add a small inline function for computing the maximum size of a stuffed inode instead of open coding that in several places throughout the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-18 14:18:53 -07:00
Andreas Gruenbacher	9db115a0e3	gfs2: Typo fixes Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-18 14:18:49 -07:00
Bob Peterson	786ebd9f68	Merge branch 'punch-hole' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git	2018-01-18 14:17:13 -07:00
Andreas Gruenbacher	4e56a6411f	gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE) Implement the top-level bits of punching a hole into a file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-18 21:15:58 +01:00
Andreas Gruenbacher	10d2cf94c2	gfs2: Turn trunc_dealloc into punch_hole Add an upper bound to the range of blocks to deallocate blocks to function trunc_dealloc so that this function can be used for truncating a file as well as for punching a hole into a file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-18 21:15:57 +01:00
Andreas Gruenbacher	5cf26b1e88	gfs2: Generalize truncate code Pull the code for computing the range of metapointers to iterate out of gfs2_metapath_ra (for readahead), sweep_bh_for_rgrps (for deallocating metapointers within a block), and trunc_dealloc (for walking the metadata tree). In sweep_bh_for_rgrps, move the code for looking up the resource group descriptor of the current resource group out of the inner loop. The metatype check moves to trunc_dealloc. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2018-01-18 21:15:37 +01:00
Jan Chochol	cbebc6ef4f	nfs: Do not convert nfs_idmap_cache_timeout to jiffies Since commit `57e62324e4` ("NFS: Store the legacy idmapper result in the keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds. Unfortunately sysctl interface was not updated accordingly. As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some value will incorrectly multiply this value by HZ. Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value divided by HZ. Fixes: `57e62324e4` ("NFS: Store the legacy idmapper result in the keyring") Signed-off-by: Jan Chochol <jan@chochol.info> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-18 15:10:47 -05:00
Chuck Lever	06e1902456	nfs: Use proper enum definitions for nfs_show_stable Commit `8224b2734a` ("NFS: Add static NFS I/O tracepoints") had a hack to work around some odd behavior observed with __print_symbolic. I couldn't ever get it to display NFS_FILE_SYNC when using TRACE_DEFINE_ENUM macros to set up the enum values. I tracked down the actual bug that forced me to add the workaround. That issue will be addressed soon, so replace the hack with a proper implementation. Fixes: `8224b2734a` ("NFS: Add static NFS I/O tracepoints") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-18 15:01:22 -05:00
Tigran Mkrtchyan	7ff4cff637	nfs41: do not return ENOMEM on LAYOUTUNAVAILABLE A pNFS server may return LAYOUTUNAVAILABLE error on LAYOUTGET for files which don't have any layout. In this situation pnfs_update_layout currently returns NULL. As this NULL is converted into ENOMEM, IO requests fails instead of falling back to MDS. Do not return ENOMEM on LAYOUTUNAVAILABLE and let client retry through MDS. Fixes `8d40b0f148`. I will suggest to backport this fix to affected stable branches. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> [trondmy: Use IS_ERR_OR_NULL()] Fixes: `8d40b0f148` ("NFS filelayout:call GETDEVICEINFO after...") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-18 12:51:31 -05:00
Darrick J. Wong	75d4a13b1f	xfs: fix non-debug build compiler warnings Fix compiler warning on non-debug build Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:47 -08:00
Darrick J. Wong	4bb73d0147	xfs: check sb_agblocks and sb_agblklog when validating superblock Currently, we don't check sb_agblocks or sb_agblklog when we validate the superblock, which means that we can fuzz garbage values into those values and the mount succeeds. This leads to all sorts of UBSAN warnings in xfs/350 since we can then coerce other parts of xfs into shifting by ridiculously large values. Once we've validated agblocks, make sure the agcount makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-01-17 21:00:47 -08:00
Darrick J. Wong	be78ff0e72	xfs: recheck reflink / dirty page status before freeing CoW reservations Eryu Guan reported seeing occasional hangs when running generic/269 with a new fsstress that supports clonerange/deduperange. The cause of this hang is an infinite loop when we convert the CoW fork extents from unwritten to real just prior to writing the pages out; the infinite loop happens because there's nothing in the CoW fork to convert, and so it spins forever. The fundamental issue here is that when we go to perform these CoW fork conversions, we're supposed to have an extent waiting for us, but the low space CoW reaper has snuck in and blown them away! There are four conditions that can dissuade the reaper from touching our file -- no reflink iflag; dirty page cache; writeback in progress; or directio in progress. We check the four conditions prior to taking the locks, but we neglect to recheck them once we have the locks, which is how we end up whacking the writeback that's in progress. Therefore, refactor the four checks into a helper function and call it once again once we have the locks to make sure we really want to reap the inode. While we're at it, add an ASSERT for this weird condition so that we'll fail noisily if we ever screw this up again. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-01-17 21:00:47 -08:00
Darrick J. Wong	a5f460b168	xfs: check that br_blockcount doesn't overflow xfs_bmbt_irec.br_blockcount is declared as xfs_filblks_t, which is an unsigned 64-bit integer. Though the bmbt helpers will never set a value larger than 2^21 (since the underlying on-disk extent record has a length field that is only 21 bits wide), we should be a little defensive about checking that a bmbt record doesn't exceed what we're expecting or overflow into the next AG. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:47 -08:00
Darrick J. Wong	55e45429ce	xfs: btree format ifork loader should check for zero numrecs A btree format inode fork with zero records makes no sense, so reject it if we see it, or else we can miscalculate memory allocations. Found by zeroes fuzzing {a,u3}.bmbt.numrecs in xfs/{374,378,412} with KASAN. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	79a69bf8dc	xfs: attr leaf verifier needs to check for obviously bad count In the attribute leaf verifier, we can check for obviously bad values of firstused and count so that later attempts at lasthash don't run off the end of the memory buffer. Found by ones fuzzing hdr.count in xfs/400 with KASAN. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	ce92d29ddf	xfs: directory scrubber must walk through data block to offset In xfs_scrub_dir_rec, we must walk through the directory block entries to arrive at the offset given by the hash structure. If we blindly trust the hash address, we can end up midway into a directory entry and stray outside the block. Found by lastbit fuzzing lents[3].address in xfs/390 with KASAN enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	638a717489	xfs: don't iunlock unlocked inodes Don't iunlock an unlocked inode, which can happen if the parent pointer scrubber bails out with sc->ip unlocked while trying to grab the parent directory inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	cf1b0b8b1a	xfs: scrub in-core metadata Whenever we load a buffer, explicitly re-call the structure verifier to ensure that memory isn't corrupting things. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	561f648ab2	xfs: cross-reference the block mappings when possible Use an inode's block mappings to cross-reference inode block counters. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	46d9bfb5e7	xfs: cross-reference the realtime bitmap While we're scrubbing various btrees, cross-reference the records with the other metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	f6d5fc21fd	xfs: cross-reference refcount btree during scrub During metadata btree scrub, we should cross-reference with the reference counts. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:46 -08:00
Darrick J. Wong	dbde19da96	xfs: cross-reference the rmapbt data with the refcountbt Cross reference the refcount data with the rmap data to check that the number of rmaps for a given block match the refcount of that block, and that CoW blocks (which are owned entirely by the refcountbt) are tracked as well. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:45 -08:00
Darrick J. Wong	d852657ccf	xfs: cross-reference reverse-mapping btree When scrubbing various btrees, we should cross-reference the records with the reverse mapping btree and ensure that traversing the btree finds the same number of blocks that the rmapbt thinks are owned by that btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-17 21:00:45 -08:00

... 86 87 88 89 90 ...

56429 Commits