android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Fabian Frederick	75f271380d	udf: use __packed instead of __attribute__ ((packed)) defined in linux/compiler-gcc.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>	2017-01-10 11:29:11 +01:00
Gu Zheng	497de07d89	tmpfs: clear S_ISGID when setting posix ACLs The CVE-2016-7097 commit `073931017b` ("posix_acl: Clear SGID bit when setting file permissions") missed the corresponding tmpfs change. It can be tested with xfstests generic/375, which failed to clear the setgid bit in the following test case on tmpfs: touch $testfile; chown 100:100 $testfile; chmod 2755 $testfile; _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile Signed-off-by: Gu Zheng <guzheng1@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-10 01:29:48 -05:00
Al Viro	4675ac39b5	namei.c: split unlazy_walk() In all but one case, the last two arguments are NULL and 0 resp.; almost everyone just wants to switch nameidata to non-RCU mode. The only exception is lookup_fast(), where we have a child dentry we want to legitimize as well. Split these two cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-09 22:29:15 -05:00
Al Viro	a89f833737	namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-09 22:25:28 -05:00
Zhou Chengming	93362fa47f	sysctl: Drop reference added by grab_header in proc_sys_readdir Fixes CVE-2016-9191: proc_sys_readdir doesn't drop the reference added by grab_header when returning from the !dir_emit_dots path. This can cause any path that calls unregister_sysctl_table to wait forever. The calltrace of CVE-2016-9191: [ 5535.960522] Call Trace: [ 5535.963265] [<ffffffff817cdaaf>] schedule+0x3f/0xa0 [ 5535.968817] [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0 [ 5535.975346] [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130 [ 5535.982256] [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130 [ 5535.988972] [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80 [ 5535.994804] [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0 [ 5536.001227] [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0 [ 5536.007648] [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0 [ 5536.014654] [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0 [ 5536.021657] [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40 [ 5536.029344] [<ffffffff810d7704>] partition_sched_domains+0x44/0x450 [ 5536.036447] [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0 [ 5536.043844] [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0 [ 5536.051336] [<ffffffff8116789d>] update_flag+0x11d/0x210 [ 5536.057373] [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450 [ 5536.064186] [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60 [ 5536.070899] [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10 [ 5536.077420] [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450 [ 5536.084234] [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220 [ 5536.091049] [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60 [ 5536.097571] [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220 [ 5536.104207] [<ffffffff810bc83f>] process_one_work+0x1df/0x710 [ 5536.110736] [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710 [ 5536.117461] [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0 [ 5536.123697] [<ffffffff810bcd70>] ? process_one_work+0x710/0x710 [ 5536.130426] [<ffffffff810c3f7e>] kthread+0xfe/0x120 [ 5536.135991] [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40 [ 5536.142041] [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230 One cgroup maintainer mentioned that "cgroup is trying to offline a cpuset css, which takes place under cgroup_mutex. The offlining ends up trying to drain active usages of a sysctl table which apprently is not happening." The real reason is that proc_sys_readdir doesn't drop the reference added by grab_header when returning from the !dir_emit_dots path, so this cpuset offline path waits here forever. See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13 Fixes: `f0c3b5093a` ("[readdir] convert procfs") Cc: stable@vger.kernel.org Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: Yang Shukui <yangshukui@huawei.com> Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2017-01-10 13:34:57 +13:00
Eric W. Biederman	75422726b0	libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount Add MS_KERNMOUNT to the flags that are passed. Use sget_userns and force &init_user_ns instead of calling sget so that, even if called from a weird context, the internal filesystem will be considered to be in the initial user namespace. Luis Ressel reported that the failure to pass MS_KERNMOUNT into mount_pseudo broke his in-development graphics driver that uses the generic drm infrastructure. I am not certain the driver was bug-free in its usage of that infrastructure, but since mount_pseudo_xattr can never be triggered by userspace, it is clearer, less error prone, and less problematic for the code to be explicit. Reported-by: Luis Ressel <aranea@aixah.de> Tested-by: Luis Ressel <aranea@aixah.de> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-01-10 13:34:55 +13:00
Eric W. Biederman	3895dbf898	mnt: Protect the mountpoint hashtable with mount_lock Protecting the mountpoint hashtable with namespace_sem was sufficient until a call to umount_mnt was added to mntput_no_expire, at which point it became possible for multiple calls of put_mountpoint on the same hash chain to happen at the same time. Krister Johansen <kjlx@templeofstupid.com> reported: > This can cause a panic when simultaneous callers of put_mountpoint > attempt to free the same mountpoint. This occurs because some callers > hold the mount_hash_lock, while others hold the namespace lock. Some > even hold both. > > In this submitter's case, the panic manifested itself as a GP fault in > put_mountpoint() when it called hlist_del() and attempted to dereference > a m_hash.pprev that had been poisioned by another thread. Al Viro observed that the simple fix is to switch from using the namespace_sem to the mount_lock to protect the mountpoint hash table. I have taken Al's suggested patch, moved put_mountpoint in pivot_root (instead of taking mount_lock an additional time), and replaced new_mountpoint with get_mountpoint, a function that does the hash table lookup and addition under the mount_lock. The introduction of get_mountpoint ensures that only the mount_lock is needed to manipulate the mountpoint hashtable. d_set_mounted is modified to only set DCACHE_MOUNTED if it is not already set. This allows get_mountpoint to use the setting of DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry happens exactly once. Cc: stable@vger.kernel.org Fixes: `ce07d891a0` ("mnt: Honor MNT_LOCKED when detaching mounts") Reported-by: Krister Johansen <kjlx@templeofstupid.com> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-01-10 13:34:43 +13:00
Christoph Hellwig	84a4620cfe	xfs: don't print warnings when xfs_log_force fails There are only two reasons for xfs_log_force / xfs_log_force_lsn to fail: one is an I/O error, for which xlog_bdstrat already logs a warning, and the second is an already shutdown log due to previous I/O errors. In the latter case we'll already have a previous indication for the actual error, but the large stream of misleading warnings from xfs_log_force will probably scroll it out of the message buffer. Simply removing the warnings thus makes the XFS log reporting significantly better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-09 13:45:01 -08:00
Christoph Hellwig	12ef830198	xfs: don't rely on ->total in xfs_alloc_space_available ->total is a bit of an odd parameter passed down to the low-level allocator all the way from the high-level callers. It's supposed to contain the maximum number of blocks to be allocated for the whole transaction [1]. But in xfs_iomap_write_allocate we only convert existing delayed allocations and thus only have a minimal block reservation for the current transaction, so xfs_alloc_space_available can't use it for the allocation decisions. Use the maximum of args->total and the calculated block requirement to make a decision. We probably should get rid of args->total eventually and instead apply ->minleft more broadly, but that will require some extensive changes all over. [1] which creates lots of confusion as most callers don't decrement it once doing a first allocation. But that's for a separate series. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-09 13:45:01 -08:00
Christoph Hellwig	54fee133ad	xfs: adjust allocation length in xfs_alloc_space_available We must decide in xfs_alloc_fix_freelist whether an allocation from a given AG is possible based on the available space, and should not fail the allocation past that point on a healthy file system. But currently we have two additional places that second-guess xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the maxlen parameter to remove the reservation before doing the allocation (but ignores the various minimum freespace requirements), and xfs_alloc_fix_minleft tries to fix up the allocated length after we've found an extent, but ignores the reservations and also doesn't take the AGFL into account (and thus fails allocations for not matching minlen in some cases). Remove all these later fixups and just correct the maxlen argument inside xfs_alloc_fix_freelist once we have the AGF buffer locked. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-09 13:37:44 -08:00
Christoph Hellwig	255c516278	xfs: fix bogus minleft manipulations We can't just set minleft to 0 when we're low on space - that's exactly what we need minleft for: to protect space in the AG for btree block allocations when we are low on free space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-09 13:36:36 -08:00
Christoph Hellwig	5149fd327f	xfs: bump up reserved blocks in xfs_alloc_set_aside Setting aside 4 blocks globally for bmbt splits isn't all that useful, as different threads can allocate space in parallel. Bump it to 4 blocks per AG to allow each thread that is currently doing an allocation to dip into it separately. Without that, we may not have enough reserved blocks if enough parallel transactions in an almost out-of-space file system all run into bmap btree splits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-09 13:35:00 -08:00
David S. Miller	aaa9c1071d	Merge tag 'rxrpc-rewrite-20170109' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== afs: Refcount afs_call struct These patches provide some tracepoints for AFS and fix a potential leak by adding refcounting to the afs_call struct. The patches are: (1) Add some tracepoints for logging incoming calls and monitoring notifications from AF_RXRPC and data reception. (2) Get rid of afs_wait_mode as it didn't turn out to be as useful as initially expected. It can be brought back later if needed. This clears some stuff out that I don't then need to fix up in (4). (3) Allow listen(..., 0) to be used to disable listening. This makes shutting down the AFS cache manager server in the kernel much easier and the accounting simpler as we can then be sure that (a) all preallocated afs_call structs are released and (b) no new incoming calls are going to be started. For the moment, listening cannot be reenabled. (4) Add refcounting to the afs_call struct to fix a potential multiple release detected by static checking and add a tracepoint to follow the lifecycle of afs_call objects. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-09 15:47:52 -05:00
David S. Miller	bb1d303444	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-01-09 15:39:11 -05:00
Stephen Smalley	b21507e272	proc,security: move restriction on writing /proc/pid/attr nodes to proc Processes can only alter their own security attributes via /proc/pid/attr nodes. This is presently enforced by each individual security module and is also imposed by the Linux credentials implementation, which only allows a task to alter its own credentials. Move the check enforcing this restriction from the individual security modules to proc_pid_attr_write() before calling the security hook, and drop the unnecessary task argument to the security hook since it can only ever be the current task. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2017-01-09 10:07:31 -05:00
David Howells	341f741f04	afs: Refcount the afs_call struct A static checker warning occurs in the AFS filesystem: fs/afs/cmservice.c:155 SRXAFSCB_CallBack() error: dereferencing freed memory 'call' due to the reply being sent before we access the server it points to. The act of sending the reply causes the call to be freed if an error occurs (but not if it doesn't). On top of this, the lifetime handling of afs_call structs is fragile because they get passed around through workqueues without any sort of refcounting. Deal with the issues by: (1) Fix the maybe/maybe not nature of the reply sending functions with regards to whether they release the call struct. (2) Refcount the afs_call struct and sort out places that need to get/put references. (3) Pass a ref through the work queue and release (or pass on) that ref in the work function. Care has to be taken because a work queue may already own a ref to the call. (4) Do the cleaning up in the put function only. (5) Simplify module cleanup by always incrementing afs_outstanding_calls whenever a call is allocated. (6) Set the backlog to 0 with kernel_listen() at the beginning of the process of closing the socket to prevent new incoming calls from occurring and to remove the contribution of preallocated calls from afs_outstanding_calls before we wait on it. A tracepoint is also added to monitor the afs_call refcount and lifetime. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Fixes: `08e0e7c82e`: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."	2017-01-09 11:10:02 +00:00
David Howells	56ff9c8377	afs: Kill afs_wait_mode The afs_wait_mode struct isn't really necessary. Client calls only use one of a choice of two (synchronous or the asynchronous) and incoming calls don't use the wait at all. Replace with a boolean parameter. Signed-off-by: David Howells <dhowells@redhat.com>	2017-01-09 11:10:02 +00:00
Liu Bo	92a1bf76a8	Btrfs: add 'inode' for extent map tracepoint 'inode' is an important field for btrfs_get_extent, so let's trace it. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-09 11:27:02 +01:00
David Sterba	ac0c7cf8be	btrfs: fix crash when tracepoint arguments are freed by wq callbacks Enabling btrfs tracepoints leads to instant crash, as reported. The wq callbacks could free the memory and the tracepoints started to dereference the members to get to fs_info. The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2 removed the tracepoints but we could preserve them by passing only the required data in a safe way. Fixes: `bc074524e1` ("btrfs: prefix fsid to all trace events") CC: stable@vger.kernel.org # 4.8+ Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-09 11:24:50 +01:00
David Howells	8e8d7f13b6	afs: Add some tracepoints Add three tracepoints to the AFS filesystem: (1) The afs_recv_data tracepoint logs data segments that are extracted from the data received from the peer through afs_extract_data(). (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data coming in to an asynchronous call. (3) The afs_cb_call tracepoint logs incoming calls that have had their operation ID extracted and mapped into a supported cache manager service call. To make (3) work, the name strings in the afs_call_type struct objects have to be annotated with __tracepoint_string. This is done with the CM_NAME() macro. Further, the AFS call state enum needs a name so that it can be used to declare parameter types. Signed-off-by: David Howells <dhowells@redhat.com>	2017-01-09 09:18:13 +00:00
Al Viro	209a7fb210	lookup_fast(): clean up the logics around the fallback to non-rcu mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-09 01:35:39 -05:00
Al Viro	ad1633a151	namei: fold unlazy_link() into its sole caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-08 22:35:31 -05:00
Roman Pen	03e916fa8b	ext4: do not pollute the extents cache while shifting extents Inside the ext4_ext_shift_extents() function, ext4_find_extent() is called without the EXT4_EX_NOCACHE flag, which should prevent cache population. This leads to outdated offsets in the extents tree and wrong blocks afterwards. The patch fixes the problem by providing the EXT4_EX_NOCACHE flag for each ext4_find_extent() call inside the ext4_ext_shift_extents function. Fixes: `331573febb` Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org	2017-01-08 21:00:35 -05:00
Roman Pen	2a9b8cba62	ext4: Include forgotten start block on fallocate insert range While doing 'insert range', the start block should also be shifted right. The bug can be easily reproduced by the following test: ptr = malloc(4096); assert(ptr); fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600); assert(fd >= 0); rc = fallocate(fd, 0, 0, 8192); assert(rc == 0); for (i = 0; i < 2048; i++) *((unsigned short *)ptr + i) = 0xbeef; rc = pwrite(fd, ptr, 4096, 0); assert(rc == 4096); rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); for (block = 2; block < 1000; block++) { rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096); assert(rc == 0); for (i = 0; i < 2048; i++) *((unsigned short *)ptr + i) = block; rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); } Because the start block is not included in the range, the hole appears at the wrong offset (just after the desired offset) and the following pwrite() overwrites an already existing block, leaving the hole untouched. A simple way to verify the wrong behaviour is to check for zeroed blocks after the test: $ hexdump ./ext4.file | grep '0000 0000' The root cause of the bug is a wrong range (start, stop], where start should be inclusive, i.e. [start, stop]. This patch fixes the problem by including start in the range. But so as not to break left shift (range collapse), stop points to the beginning of a block, not to the end. The other non-obvious change is a validity check on the iterator in the main loop. Because the iterator is unsigned, the following corner case must be considered with care: inserting a block at offset 0, where the stop variable wraps around and never becomes less than start, which is 0. To handle this special case, the iterator is set to NULL to indicate that the end of the loop has been reached. Fixes: `331573febb` Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org	2017-01-08 20:59:35 -05:00
Theodore Ts'o	56735be053	Merge branch 'fscrypt' into d	2017-01-08 20:57:35 -05:00
Eric Biggers	a5d431eff2	fscrypt: make fscrypt_operations.key_prefix a string There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 01:03:41 -05:00
Theodore Ts'o	173b8439e1	ext4: don't allow encrypted operations without keys While we allow deletes without the key, the following should not be permitted: # cd /vdc/encrypted-dir-without-key # ls -l total 4 -rw-r--r-- 1 root root 0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB -rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD # mv uRJ5vJh9gE7vcomYMqTAyD 6,LKNRJsp209FbXoSvJWzB Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 00:58:23 -05:00
Bob Peterson	b63f5e8482	GFS2: Wake up io waiters whenever a flush is done Before this patch, if a process called function gfs2_log_reserve to reserve some journal blocks, but not enough journal blocks were free, it would call io_schedule. However, the log flush daemon woke up the waiters only if a gfs2_ail_flush was no longer required. This resulted in situations where processes would wait forever because the number of blocks required was so high that it pushed the journal into a perpetual state of flush being required. This patch changes the logd daemon so that it wakes up io waiters every time the log is actually flushed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-01-06 22:14:28 -05:00
David Howells	91b467e0a3	afs: Make afs_readpages() fetch data in bulk Make afs_readpages() use afs_vnode_fetch_data()'s new ability to take a list of pages and do a bulk fetch. Signed-off-by: David Howells <dhowells@redhat.com>	2017-01-06 16:54:41 +00:00
David Howells	196ee9cd2d	afs: Make afs_fs_fetch_data() take a list of pages Make afs_fs_fetch_data() take a list of pages for bulk data transfer. This will allow afs_readpages() to be made more efficient. Signed-off-by: David Howells <dhowells@redhat.com>	2017-01-06 16:54:41 +00:00
Linus Torvalds	6989606a72	Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "Two small fixes relating to audit's use of fsnotify. The first patch plugs a leak and the second fixes some lock shenanigans. The patches are small and I banged on this for an afternoon with our testsuite and didn't see anything odd" * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit: audit: Fix sleep in atomic fsnotify: Remove fsnotify_duplicate_mark()	2017-01-05 23:06:06 -08:00
Bob Peterson	f07b352021	GFS2: Made logd daemon take into account log demand Before this patch, the logd daemon only tried to flush things when the log blocks pinned exceeded a certain threshold. But when we're deleting very large files, it may require a huge number of journal blocks, and that, in turn, may exceed the threshold. This patch takes that into account. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-01-05 16:01:45 -05:00
Bob Peterson	2fcf5cc3be	GFS2: Limit number of transaction blocks requested for truncates This patch limits the number of transaction blocks requested during file truncates. If we have very large multi-terabyte files, and want to delete or truncate them, they might span so many resource groups that we overflow the journal blocks, and cause an assert failure. By limiting the number of blocks in the transaction, we prevent this overflow and give other running processes time to do transactions. The limiting factor I chose is sd_log_thresh2 which is currently set to 4/5ths of the journal. This same ratio is used in function gfs2_ail_flush_reqd to determine when a log flush is required. If we make the maximum value less than this, we can get into an infinite hang whereby the log stops moving because the number of used blocks is less than the threshold and the iterative loop needs more, but since we're under the threshold, the log daemon never starts any IO on the log. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-01-05 14:47:36 -05:00
Jan Kara	ad4d05329d	udf: Make stat on symlink report symlink length as st_size UDF encodes symlinks in a more complex fashion and thus i_size of a symlink does not match the length of a string returned by readlink(2). This confuses some applications (see bug 191241) and may be considered a violation of POSIX. Fix the problem by reading the link into page cache in response to stat(2) call and report the length of the decoded path. Signed-off-by: Jan Kara <jack@suse.cz>	2017-01-05 07:52:57 +01:00
Linus Torvalds	e02003b515	Merge tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - fixes for crashes and double-cleanup errors - XFS maintainership handover - fix to prevent absurdly large block reservations - fix broken sysfs getter/setters * tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix max_retries _show and _store functions xfs: update MAINTAINERS xfs: fix crash and data corruption due to removal of busy COW extents xfs: use the actual AG length when reserving blocks xfs: fix double-cleanup when CUI recovery fails	2017-01-04 18:33:35 -08:00
Linus Torvalds	62f8c40592	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A set of fixes for the current series, one fixing a regression with block size < page cache size in the alias series from Jan. Outside of that, two small cleanups for wbt from Bart, a nvme pull request from Christoph, and a few small fixes of documentation updates" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix up io_poll documentation block: Avoid that sparse complains about context imbalance in __wbt_wait() block: Make wbt_wait() definition consistent with declaration clean_bdev_aliases: Prevent cleaning blocks that are not in block range genhd: remove dead and duplicated scsi code block: add back plugging in __blkdev_direct_IO nvmet/fcloop: remove some logically dead code performing redundant ret checks nvmet: fix KATO offset in Set Features nvme/fc: simplify error handling of nvme_fc_create_hw_io_queues nvme/fc: correct some printk information nvme/scsi: Remove START STOP emulation nvme/pci: Delete misleading queue-wrap comment nvme/pci: Fix whitespace problem nvme: simplify stripe quirk nvme: update maintainers information	2017-01-04 09:03:37 -08:00
Carlos Maiolino	ff97f2399e	xfs: fix max_retries _show and _store functions max_retries _show and _store functions should test against cfg->max_retries, not cfg->retry_timeout Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-03 20:34:17 -08:00
Christoph Hellwig	a1b7a4dea6	xfs: fix crash and data corruption due to removal of busy COW extents There is a race window between write_cache_pages calling clear_page_dirty_for_io and XFS calling set_page_writeback, in which the mapping for an inode is tagged neither as dirty, nor as writeback. If the COW shrinker hits in exactly that window we'll remove the delayed COW extents and writepages trying to write it back, which in release kernels will manifest as corruption of the bmap btree, and in debug kernels will trip the ASSERT about now calling xfs_bmapi_write with the COWFORK flag for holes. A complex customer load manages to hit this window fairly reliably, probably by always having COW writeback in flight while the cow shrinker runs. This patch adds another check for having the I_DIRTY_PAGES flag set, which is still set during this race window. While this fixes the problem I'm still not overly happy about the way the COW shrinker works as it still seems a bit fragile. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-03 18:39:33 -08:00
Darrick J. Wong	20e73b000b	xfs: use the actual AG length when reserving blocks We need to use the actual AG length when making per-AG reservations, since we could otherwise end up reserving more blocks out of the last AG than there are actual blocks. Complained-about-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-01-03 18:39:33 -08:00
Darrick J. Wong	7a21272b08	xfs: fix double-cleanup when CUI recovery fails Dan Carpenter reported a double-free of rcur if _defer_finish fails while we're recovering CUI items. Fix the error recovery to prevent this. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-03 18:39:32 -08:00
Liu Bo	c2931667c8	Btrfs: adjust outstanding_extents counter properly when dio write is split Currently btrfs dio does not handle split dio writes well: if a dio write is split into several segments due to a lack of contiguous space, a large dio write like 'dd bs=1G count=1' can end up with an incorrect outstanding_extents counter, and endio complains loudly with an assertion. This fixes the problem by compensating the outstanding_extents counter in the inode if a large dio write gets split. Reported-by: Anand Jain <anand.jain@oracle.com> Tested-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-03 17:29:50 +01:00
Liu Bo	781feef7e6	Btrfs: fix lockdep warning about log_mutex While checking INODE_REF/INODE_EXTREF for a corner case, we may acquire a different inode's log_mutex while holding the current inode's log_mutex, and lockdep has complained about this with a possible deadlock warning. Fix this by using mutex_lock_nested() when processing the other inode's log_mutex. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-03 15:19:28 +01:00
Liu Bo	e321f8a801	Btrfs: use down_read_nested to make lockdep silent If @block_group is not @used_bg, it'll try to get @used_bg's lock without dropping @block_group's lock, and lockdep has thrown a scary deadlock warning about it. Fix it by using down_read_nested. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-03 15:19:17 +01:00
Jeff Mahoney	d028099643	btrfs: fix locking when we put back a delayed ref that's too new In __btrfs_run_delayed_refs, when we put back a delayed ref that's too new, we have already dropped the lock on locked_ref when we set ->processing = 0. This patch keeps the lock to cover that assignment. Fixes: `d7df2c796d` (Btrfs: attach delayed ref updates to delayed ref heads) Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-03 15:14:21 +01:00
Jeff Mahoney	aa7c8da35d	btrfs: fix error handling when run_delayed_extent_op fails In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op fails sets locked_ref->processing = 0 but doesn't re-increment delayed_refs->num_heads_ready. As a result, we end up triggering the WARN_ON in btrfs_select_ref_head. Fixes: `d7df2c796d` (Btrfs: attach delayed ref updates to delayed ref heads) Reported-by: Jon Nelson <jnelson-suse@jamponi.net> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-01-03 15:14:08 +01:00
Steve Kenton	a17f0cb5b9	fs/udf: make #ifdef UDF_PREALLOCATE unconditional Signed-off-by: Steve Kenton <skenton@ou.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2017-01-03 10:51:45 +01:00
Deepa Dinamani	88b50ce3ab	fs: udf: Replace CURRENT_TIME with current_time() CURRENT_TIME is not y2038 safe. CURRENT_TIME macro is also not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Logical Volume Integrity format is described to have the same timestamp format for "Recording Date and time" as the other [a,c,m]timestamps. The function udf_time_to_disk_format() does this conversion. Hence the timestamp is passed directly to the function and not truncated. This is as per Arnd's suggestion on the thread. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>	2017-01-03 10:51:26 +01:00
Linus Torvalds	c8b4ec8351	Merge tag 'fscrypt-for-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt fixes from Ted Ts'o: "Two fscrypt bug fixes, one of which was unmasked by an update to the crypto tree during the merge window" * tag 'fscrypt-for-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: fix renaming and linking special files fscrypt: fix the test_dummy_encryption mount option	2017-01-02 18:32:59 -08:00
Theodore Ts'o	5bbdcbbb39	fscrypt: make test_dummy_encryption require a keyring key Currently, the test_dummy_encryption ext4 mount option, which exists only to test encrypted I/O paths with xfstests, overrides all per-inode encryption keys with a fixed key. This change minimizes test_dummy_encryption-specific code path changes by supplying a fake context for directories which are not encrypted for use when creating new directories, files, or symlinks. This allows us to properly exercise the keyring lookup, derivation, and context inheritance code paths. Before mounting a file system using test_dummy_encryption, userspace must execute the following shell commands: mode='\x00\x00\x00\x00' raw="$(printf ""\\\\x%02x"" $(seq 0 63))" if lscpu | grep "Byte Order" | grep -q Little ; then size='\x40\x00\x00\x00' else size='\x00\x00\x00\x40' fi key="${mode}${raw}${size}" keyctl new_session echo -n -e "${key}" | keyctl padd logon fscrypt:4242424242424242 @s Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-02 15:39:46 -05:00
Chandan Rajendra	6c006a9d94	clean_bdev_aliases: Prevent cleaning blocks that are not in block range The first block to be cleaned may start at a non-zero page offset. In such a scenario clean_bdev_aliases() will end up cleaning blocks that do not fall in the range of blocks to be cleaned. This commit fixes the issue by skipping blocks that do not fall in valid block range. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-02 09:35:14 -07:00
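The reference leak fixed in `93362fa47f` (sysctl: proc_sys_readdir) comes from an early return that skips the matching put. Below is a minimal C sketch of the single-exit pattern that avoids it. It does not reproduce the real procfs code: `struct table_header`, `grab_header()`, `put_header()`, `emit_dots()` and `readdir_sketch()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sysctl table header with a use count. */
struct table_header {
    int used;
};

static void grab_header(struct table_header *h) { h->used++; }
static void put_header(struct table_header *h) { h->used--; }

/* Hypothetical stand-in for dir_emit_dots(): returns false when the
 * caller's buffer is full and iteration must stop early. */
static bool emit_dots(bool buffer_full) { return !buffer_full; }

/* Every exit path funnels through "out", so the reference taken by
 * grab_header() is always dropped, including on the early-stop path. */
static int readdir_sketch(struct table_header *h, bool buffer_full)
{
    int ret = 0;

    grab_header(h);
    if (!emit_dots(buffer_full))
        goto out;       /* the leaking version returned here directly */
    /* ... iterate over the directory entries ... */
    ret = 1;
out:
    put_header(h);
    return ret;
}
```

Calling `readdir_sketch()` on either path leaves `used` at 0. Returning directly from the early-stop branch, as the pre-fix code did, would leave it at 1, and anything waiting for the count to drain would block forever.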

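The unsigned-iterator corner case described in `2a9b8cba62` (ext4 insert range) can be illustrated on a plain array. This is a simplified sketch, not the ext4 extent code. The function `shift_right()` is hypothetical, and it uses a test-before-decrement loop instead of the NULL-iterator sentinel that the commit describes.

```c
#include <assert.h>

/* Shift the elements at indices [start, count) one slot to the right,
 * walking from the end toward start so that nothing is overwritten
 * before it has been copied. start itself is included in the shift.
 * arr must have room for count + 1 elements. */
static void shift_right(unsigned int *arr, unsigned int count,
                        unsigned int start)
{
    unsigned int i = count;

    /* A loop written as "for (i = count; i >= start; i--)" never ends
     * when start == 0: for an unsigned i, i-- at 0 wraps to UINT_MAX.
     * Testing i > start before decrementing avoids the wrap. */
    while (i > start) {
        arr[i] = arr[i - 1];
        i--;
    }
}
```

With `start == 0`, the loop copies index 0 into index 1 and then stops cleanly instead of wrapping around.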

48409 Commits
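The shell commands in `5bbdcbbb39` (fscrypt test_dummy_encryption) build a 72-byte key payload by hand and use `lscpu` to choose the byte order of the size field. The C sketch below builds the same payload. It assumes the payload mirrors the kernel's `struct fscrypt_key` layout (a 32-bit mode, 64 raw key bytes, and a 32-bit size). The names `struct sketch_fscrypt_key` and `build_test_key()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the layout of the kernel's struct fscrypt_key;
 * this is an illustrative copy, not taken from the commit. */
#define SKETCH_MAX_KEY_SIZE 64

struct sketch_fscrypt_key {
    uint32_t mode;
    uint8_t raw[SKETCH_MAX_KEY_SIZE];
    uint32_t size;
};

/* Fill in the payload the shell snippet pipes into keyctl: mode 0,
 * raw bytes 0x00..0x3f, and size 64 in host byte order. */
static void build_test_key(struct sketch_fscrypt_key *key)
{
    memset(key, 0, sizeof(*key));
    key->mode = 0;
    for (int i = 0; i < SKETCH_MAX_KEY_SIZE; i++)
        key->raw[i] = (uint8_t)i;
    key->size = SKETCH_MAX_KEY_SIZE;
}
```

Because `size` is stored through a `uint32_t` field, it lands in host byte order automatically. The shell version emulates this with its `if lscpu ... Little` branch.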