Commit Graph

49262 Commits

Author SHA1 Message Date
Darrick J. Wong
b46dc03381 ext2: reject inodes with negative size
Don't load an inode with a negative size; this causes integer overflow
problems in the VFS.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-12-07 13:03:31 +01:00
Dave Chinner
a444d72e60 Merge branch 'xfs-4.10-misc-fixes-3' into for-next 2016-12-07 17:42:30 +11:00
Lucas Stach
6031e73a5b xfs: use rhashtable to track buffer cache
On filesystems with a lot of metadata and in metadata intensive workloads
xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of
the CPU time is spent on CPU cache misses while traversing the rbtree.

As the buffer cache does not need any kind of ordering, but fast lookups
a hashtable is the natural data structure to use. The rhashtable
infrastructure provides a self-scaling hashtable implementation and
allows lookups to proceed while the table is going through a resize
operation.

This reduces the CPU-time spent for the lookups to 1/3 even for small
filesystems with a relatively small number of cached buffers, with
possibly much larger gains on higher loaded filesystems.

[dchinner: reduce minimum hash size to an acceptable size for large
	   filesystems with many AGs with no active use.]
[dchinner: remove stale rbtree asserts.]
[dchinner: use xfs_buf_map for compare function argument.]
[dchinner: make functions static.]
[dchinner: remove redundant comments.]

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-07 17:36:36 +11:00
Miklos Szeredi
c01638f5d9 fuse: fix clearing suid, sgid for chown()
Basically, the pjdfstests set the ownership of a file to 06555, and then
chowns it (as root) to a new uid/gid. Prior to commit a09f99edde ("fuse:
fix killing s[ug]id in setattr"), fuse would send down a setattr with both
the uid/gid change and a new mode.  Now, it just sends down the uid/gid
change.

Technically this is NOTABUG, since POSIX doesn't _require_ that we clear
these bits for a privileged process, but Linux (wisely) has done that and I
think we don't want to change that behavior here.

This is caused by the use of should_remove_suid(), which will always return
0 when the process has CAP_FSETID.

In fact we really don't need to be calling should_remove_suid() at all,
since we've already been indicated that we should remove the suid, we just
don't want to use a (very) stale mode for that.

This patch should fix the above as well as simplify the logic.

Reported-by: Jeff Layton <jlayton@redhat.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: a09f99edde ("fuse: fix killing s[ug]id in setattr")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-12-06 16:18:45 +01:00
David Sterba
34441361c4 btrfs: opencode chunk locking, remove helpers
The helpers are trivial and we don't use them consistently.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney
3a45bb207e btrfs: remove root parameter from transaction commit/end routines
Now we only use the root parameter to print the root objectid in
a tracepoint.  We can use the root parameter from the transaction
handle for that.  It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney
bf89d38feb btrfs: split btrfs_wait_marked_extents into normal and tree log functions
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's it's a log root or not.

This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root.  The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines.  This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney
2ff7e61e0d btrfs: take an fs_info directly when the root is not used otherwise
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer.  Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
afdb571890 btrfs: simplify btrfs_wait_cache_io prototype
With the exception of the one case where btrfs_wait_cache_io is called
without a block group, it's called with the same arguments.  The root
argument is only used in the special case, so let's factor out the core
and simplify the call in the normal case to require a trans, block group,
and path.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
71ff6437c2 btrfs: convert extent-tree tracepoints to use fs_info
The extent-tree tracepoints all operate on the extent root, regardless of
which root is passed in.  Let's just use the extent root objectid instead.
If it turns out that nobody is depending on the format of this tracepoint,
we can drop the root printing entirely.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
ccdf9b305a btrfs: root->fs_info cleanup, access fs_info->delayed_root directly
This results in btrfs_assert_delayed_root_empty and
btrfs_destroy_delayed_inode taking an fs_info instead of a root.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
0b246afa62 btrfs: root->fs_info cleanup, add fs_info convenience variables
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
6202df6921 btrfs: root->fs_info cleanup, update_block_group{,flags}
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
3796d33535 btrfs: root->fs_info cleanup, lock/unlock_chunks
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
27965b6c2c btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
da17066c40 btrfs: pull node/sector/stripe sizes out of root and into fs_info
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
f15376df0d btrfs: root->fs_info cleanup, io_ctl_init
The io_ctl->root member was only being used to access root->fs_info.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
fb456252d3 btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
c28f158e5e btrfs: struct reada_control.root -> reada_control.fs_info
The root is never used.  We substitute extent_root in for the
reada_find_extent call, since it's only ever used to obtain the node
size.  This call site will be changed to use fs_info in a later patch.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
de14379225 btrfs: struct btrfsic_state->root should be an fs_info
The root member is never used except for obtaining an fs_info pointer.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
2b2e27eb92 btrfs: alloc_reserved_file_extent trace point should use extent_root
Even though a separate root is passed in, we're still operating on the
extent root.  Let's use that for the trace point.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
5112febbc7 btrfs: btrfs_init_new_device should use fs_info->dev_root
btrfs_init_new_device only uses the root passed in via the ioctl to
start the transaction.  Nothing else that happens is related to whatever
root the user used to initiate the ioctl.  We can drop the root requirement
and just use fs_info->dev_root instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
6bccf3ab1e btrfs: call functions that always use the same root with fs_info instead
There are many functions that are always called with the same root
argument.  Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
5b4aacefb8 btrfs: call functions that overwrite their root parameter with fs_info
There are 11 functions that accept a root parameter and immediately
overwrite it.  We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Trond Myklebust
362fb578a5 pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid
Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the
layout stateid, so that processes and RPC tasks that are waiting on
the layout return can continue.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-05 22:52:01 -05:00
Al Viro
8f64fb1cce namei: fold should_follow_link() with the step into not-followed link
All callers are followed by the same boilerplate - "if it has returned
0, update nd->path/inode/seq - we are not following a symlink here".
Pull it into the function itself, renaming it into step_into().
Rename WALK_GET to WALK_FOLLOW, while we are at it - more descriptive
name.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:11:58 -05:00
Al Viro
31d66bcd3f namei: pass both WALK_GET and WALK_MORE to should_follow_link()
... and pull put_link() logics into it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:11:58 -05:00
Al Viro
1c4ff1a87e namei: invert WALK_PUT logics
... turning the condition for put_link() in walk_component() into
"WALK_MORE not passed and depth is non-zero".  Again, makes for
simpler arguments.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:11:57 -05:00
Al Viro
7f49b47109 namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
Simplifies the arguments both for it and for walk_component()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:11:57 -05:00
Al Viro
ba8f46135a namei: saner calling conventions for mountpoint_last()
leave the result in nd->path, have caller do follow_mount() and
copy it to the final destination.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:11:57 -05:00
Al Viro
c1d4dd2767 namei.c: get rid of user_path_parent()
direct use of filename_parentat() is just as readable

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:11:57 -05:00
Al Viro
f0bb5aaf2c vfs: misc struct path constification
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:03:49 -05:00
Al Viro
ca71cf71ee namespace.c: constify struct path passed to a bunch of primitives
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:03:12 -05:00
Al Viro
8c54ca9c68 quota: constify struct path in quota_on
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:03:06 -05:00
Al Viro
a4141d7cf8 constify alloc_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:01:16 -05:00
Al Viro
92872094a1 constify btrfs_mksubvol()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:01:16 -05:00
Al Viro
5b5577e4eb autofs: constify find_autofs_mount() callback
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:01:16 -05:00
Al Viro
71215a75ce constify get_dcookie() and friends
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:01:16 -05:00
Al Viro
12c7f9dc0f constify fsnotify_parent()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 18:58:32 -05:00
Al Viro
e637835ecc fsnotify(): constify 'data'
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 18:58:31 -05:00
Al Viro
3cd5eca8d7 fsnotify: constify 'data' passed to ->handle_event()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 18:58:31 -05:00
Mickaël Salaün
640eb7e7b5 fs: Constify path_is_under()'s arguments
The function path_is_under() doesn't modify the paths pointed by its
arguments but only browse them. Constifying this pointers make a cleaner
interface to be used by (future) code which may only have access to
const struct path pointers (e.g. LSM hooks).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 18:55:47 -05:00
Pavel Shilovsky
96a988ffeb CIFS: Fix a possible double locking of mutex during reconnect
With the current code it is possible to lock a mutex twice when
a subsequent reconnects are triggered. On the 1st reconnect we
reconnect sessions and tcons and then persistent file handles.
If the 2nd reconnect happens during the reconnecting of persistent
file handles then the following sequence of calls is observed:

cifs_reopen_file -> SMB2_open -> small_smb2_init -> smb2_reconnect
-> cifs_reopen_persistent_file_handles -> cifs_reopen_file (again!).

So, we are trying to acquire the same cfile->fh_mutex twice which
is wrong. Fix this by moving reconnecting of persistent handles to
the delayed work (smb2_reconnect_server) and submitting this work
every time we reconnect tcon in SMB2 commands handling codepath.

This can also lead to corruption of a temporary file list in
cifs_reopen_persistent_file_handles() because we can recursively
call this function twice.

Cc: Stable <stable@vger.kernel.org> # v4.9+
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-12-05 12:52:01 -08:00
Pavel Shilovsky
53e0e11efe CIFS: Fix a possible memory corruption during reconnect
We can not unlock/lock cifs_tcp_ses_lock while walking through ses
and tcon lists because it can corrupt list iterator pointers and
a tcon structure can be released if we don't hold an extra reference.
Fix it by moving a reconnect process to a separate delayed work
and acquiring a reference to every tcon that needs to be reconnected.
Also do not send an echo request on newly established connections.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-12-05 12:08:33 -08:00
Jaegeuk Kim
f455c8a5f0 f2fs: call sync_fs when f2fs is idle
The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 11:44:07 -08:00
Jaegeuk Kim
204706c7ac Revert "f2fs: use percpu_counter for # of dirty pages in inode"
This reverts commit 1beba1b3a9.

The perpcu_counter doesn't provide atomicity in single core and consume more
DRAM. That incurs fs_mark test failure due to ENOMEM.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 11:43:59 -08:00
Al Viro
cbbd26b8b1 [iov_iter] new primitives - copy_from_iter_full() and friends
copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users.  *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case.  Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 14:33:36 -05:00
Pavel Shilovsky
e3d240e9d5 CIFS: Fix a possible memory corruption in push locks
If maxBuf is not 0 but less than a size of SMB2 lock structure
we can end up with a memory corruption.

Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-12-05 11:08:55 -08:00
Pavel Shilovsky
4772c79599 CIFS: Fix missing nls unload in smb2_reconnect()
Cc: Stable <stable@vger.kernel.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-12-05 11:08:40 -08:00
Dave Chinner
cae028df53 xfs: optimise CRC updates
Nick Piggin reported that the CRC overhead in an fsync heavy
workload was higher than expected on a Power8 machine. Part of this
was to do with the fact that the power8 CRC implementation is not
efficient for CRC lengths of less than 512 bytes, and so the way we
split the CRCs over the CRC field means a lot of the CRCs are
reduced to being less than than optimal size.

To optimise this, change the CRC update mechanism to zero the CRC
field first, and then compute the CRC in one pass over the buffer
and write the result back into the buffer. We can do this safely
because anything writing a CRC has exclusive access to the buffer
the CRC is being calculated over.

We leave the CRC verify code the same - it still splits the CRC
calculation - because we do not want read-only operations modifying
the underlying buffer. This is because read-only operations may not
have an exclusive access to the buffer guaranteed, and so temporary
modifications could leak out to to other processes accessing the
buffer concurrently.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-12-05 14:40:32 +11:00