Commit Graph

48450 Commits

Author SHA1 Message Date
Al Viro
dc4067f671 orangefs: don't bother with splitting iovecs
copy_page_{to,from}_iter() advances it just fine *and* it has no
problem with partially consumed segments.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13 11:23:02 -05:00
Al Viro
3c2fcfcb68 orangefs: make wait_for_direct_io() take iov_iter
incidentally, insane or compromised server returning *more* than
requested on read should not oops the kernel - initialize the
iov_iter for read according to the iovec we've got.  That's why
pvfs_bufmap_copy_to_iovec() needed a separate size argument - we
shouldn't abuse iov_iter_count(iter) for passing that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13 11:21:37 -05:00
Al Viro
a5c126a522 orangefs: make precopy_buffers() take iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13 11:18:50 -05:00
Al Viro
5f0e3c953f orangefs: make postcopy_buffers() take iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13 11:11:55 -05:00
Al Viro
34204fde4c pvfs_bufmap_copy_from_iovec(): don't rely upon size being equal to iov_iter_count(iter)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13 10:57:53 -05:00
Al Viro
5c278228bb orangefs: explicitly pass the size to pvfs_bufmap_copy_to_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13 10:25:01 -05:00
Dan Williams
152d7bd80d dax: fix __dax_pmd_fault crash
Since 4.3 introduced devm_memremap_pages() the pfns handled by DAX may
optionally have a struct page backing.  When a mapped pfn reaches
vmf_insert_pfn_pmd() it fails with a crash signature like the following:

 kernel BUG at mm/huge_memory.c:905!
 [..]
 Call Trace:
  [<ffffffff812a73ba>] __dax_pmd_fault+0x2ea/0x5b0
  [<ffffffffa01a4182>] xfs_filemap_pmd_fault+0x92/0x150 [xfs]
  [<ffffffff811fbe02>] handle_mm_fault+0x312/0x1b50

Fix this by falling back to 4K mappings in the pfn_valid() case.  Longer
term, vmf_insert_pfn_pmd() needs to grow support for architectures that
can provide a 'pmd_special' capability.

Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-12 18:33:54 -08:00
Linus Torvalds
5e2078b289 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull misc block fixes from Jens Axboe:
 "Stuff that got collected after the merge window opened.  This
  contains:

   - NVMe:
        - Fix for non-striped transfer size setting for NVMe from
          Sathyavathi.
        - (Some) support for the weird Apple nvme controller in the
          macbooks. From Stephan Günther.

   - The error value leak for dax from Al.

   - A few minor blk-mq tweaks from me.

   - Add the new linux-block@vger.kernel.org mailing list to the
     MAINTAINERS file.

   - Discard fix for brd, from Jan.

   - A kerneldoc warning for block core from Randy.

   - An older fix from Vivek, converting a WARN_ON() to a rate limited
     printk when a device is hot removed with dirty inodes"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: don't hardcode blk_qc_t -> tag mask
  dax_io(): don't let non-error value escape via retval instead of EFAULT
  block: fix blk-core.c kernel-doc warning
  fs/block_dev.c: Remove WARN_ON() when inode writeback fails
  NVMe: add support for Apple NVMe controller
  NVMe: use split lo_hi_{read,write}q
  blk-mq: mark __blk_mq_complete_request() static
  MAINTAINERS: add reference to new linux-block list
  NVMe: Increase the max transfer size when mdts is 0
  brd: Refuse improperly aligned discard requests
2015-11-12 15:54:30 -08:00
Linus Torvalds
5d50ac70fe Merge tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs updates from Dave Chinner:
 "There is nothing really major here - the only significant addition is
  the per-mount operation statistics infrastructure.  Otherwises there's
  various ACL, xattr, DAX, AIO and logging fixes, and a smattering of
  small cleanups and fixes elsewhere.

  Summary:

   - per-mount operational statistics in sysfs
   - fixes for concurrent aio append write submission
   - various logging fixes
   - detection of zeroed logs and invalid log sequence numbers on v5 filesystems
   - memory allocation failure message improvements
   - a bunch of xattr/ACL fixes
   - fdatasync optimisation
   - miscellaneous other fixes and cleanups"

* tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits)
  xfs: give all workqueues rescuer threads
  xfs: fix log recovery op header validation assert
  xfs: Fix error path in xfs_get_acl
  xfs: optimise away log forces on timestamp updates for fdatasync
  xfs: don't leak uuid table on rmmod
  xfs: invalidate cached acl if set via ioctl
  xfs: Plug memory leak in xfs_attrmulti_attr_set
  xfs: Validate the length of on-disk ACLs
  xfs: invalidate cached acl if set directly via xattr
  xfs: xfs_filemap_pmd_fault treats read faults as write faults
  xfs: add ->pfn_mkwrite support for DAX
  xfs: DAX does not use IO completion callbacks
  xfs: Don't use unwritten extents for DAX
  xfs: introduce BMAPI_ZERO for allocating zeroed extents
  xfs: fix inode size update overflow in xfs_map_direct()
  xfs: clear PF_NOFREEZE for xfsaild kthread
  xfs: fix an error code in xfs_fs_fill_super()
  xfs: stats are no longer dependent on CONFIG_PROC_FS
  xfs: simplify /proc teardown & error handling
  xfs: per-filesystem stats counter implementation
  ...
2015-11-11 20:18:48 -08:00
Linus Torvalds
31c1febd7a Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Apologies for coming a little late in the merge window.  Fortunately
  this is another fairly quiet one:

  Mainly smaller bugfixes and cleanup.  We're still finding some bugs
  from the breakup of the big NFSv4 state lock in 3.17 -- thanks
  especially to Andrew Elble and Jeff Layton for tracking down some of
  the remaining races"

* tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
  svcrpc: document lack of some memory barriers
  nfsd: fix race with open / open upgrade stateids
  nfsd: eliminate sending duplicate and repeated delegations
  nfsd: remove recurring workqueue job to clean DRC
  SUNRPC: drop stale comment in svc_setup_socket()
  nfsd: ensure that seqid morphing operations are atomic wrt to copies
  nfsd: serialize layout stateid morphing operations
  nfsd: improve client_has_state to check for unused openowners
  nfsd: fix clid_inuse on mount with security change
  sunrpc/cache: make cache flushing more reliable.
  nfsd: move include of state.h from trace.c to trace.h
  sunrpc: avoid warning in gss_key_timeout
  lockd: get rid of reference-counted NSM RPC clients
  SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
  lockd: create NSM handles per net namespace
  nfsd: switch unsigned char flags in svc_fh to bools
  nfsd: move svc_fh->fh_maxsize to just after fh_handle
  nfsd: drop null test before destroy functions
  nfsd: serialize state seqid morphing operations
2015-11-11 20:11:28 -08:00
Linus Torvalds
842cf0b952 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - misc stable fixes

 - trivial kernel-doc and comment fixups

 - remove never-used block_page_mkwrite() wrapper function, and rename
   the function that is _actually_ used to not have double underscores.

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: 9p: cache.h: Add #define of include guard
  vfs: remove stale comment in inode_operations
  vfs: remove unused wrapper block_page_mkwrite()
  binfmt_elf: Correct `arch_check_elf's description
  fs: fix writeback.c kernel-doc warnings
  fs: fix inode.c kernel-doc warning
  fs/pipe.c: return error code rather than 0 in pipe_write()
  fs/pipe.c: preserve alloc_file() error code
  binfmt_elf: Don't clobber passed executable's file header
  FS-Cache: Handle a write to the page immediately beyond the EOF marker
  cachefiles: perform test on s_blocksize when opening cache file.
  FS-Cache: Don't override netfs's primary_index if registering failed
  FS-Cache: Increase reference of parent after registering, netfs success
  debugfs: fix refcount imbalance in start_creating
2015-11-11 09:45:24 -08:00
Al Viro
cadfbb6ec2 dax_io(): don't let non-error value escape via retval instead of EFAULT
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-11 09:36:57 -07:00
Vivek Goyal
dbd3ca5075 fs/block_dev.c: Remove WARN_ON() when inode writeback fails
If a block device is hot removed and later last reference to device
is put, we try to writeback the dirty inode. But device is gone and
that writeback fails.

Currently we do a WARN_ON() which does not seem to be the right thing.
Convert it to a ratelimited kernel warning.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
[jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-11 09:36:57 -07:00
Tzvetelin Katchov
7c7afc440c fs: 9p: cache.h: Add #define of include guard
The include file was intended to have an include guard, but the #define
part is missing.

Signed-off-by: Tzvetelin Katchov <katchov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:19:50 -05:00
Ross Zwisler
5c50002963 vfs: remove unused wrapper block_page_mkwrite()
The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:

commit 24da4fab5a ("vfs: Create __block_page_mkwrite() helper passing
	error values back")

This wrapper, the current "block_page_mkwrite()", is currently unused.
__block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.

Remove the unused wrapper, rename __block_page_mkwrite() back to
block_page_mkwrite() and update the comment above block_page_mkwrite().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:19:33 -05:00
Maciej W. Rozycki
54d15714f7 binfmt_elf: Correct `arch_check_elf's description
Correct `arch_check_elf's description, mistakenly copied and pasted from
`arch_elf_pt_proc'.

Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:19:21 -05:00
Randy Dunlap
88a578d823 fs: fix writeback.c kernel-doc warnings
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro
to after the function's opening brace. Also #undef this macro at the
end of the function.

..//fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
..//fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:18:27 -05:00
Randy Dunlap
034ae4bac9 fs: fix inode.c kernel-doc warning
Fix kernel-doc warning in fs/inode.c:

..//fs/inode.c:1606: warning: No description found for parameter 'inode'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:18:27 -05:00
Eric Biggers
6ae0806993 fs/pipe.c: return error code rather than 0 in pipe_write()
pipe_write() would return 0 if it failed to merge the beginning of the
data to write with the last, partially filled pipe buffer.  It should
return an error code instead.  Userspace programs could be confused by
write() returning 0 when called with a nonzero 'count'.

The EFAULT error case was a regression from f0d1bec9d5 ("new helper:
copy_page_from_iter()"), while the ops->confirm() error case was a much
older bug.

Test program:

	#include <assert.h>
	#include <errno.h>
	#include <unistd.h>

	int main(void)
	{
		int fd[2];
		char data[1] = {0};

		assert(0 == pipe(fd));
		assert(1 == write(fd[1], data, 1));

		/* prior to this patch, write() returned 0 here  */
		assert(-1 == write(fd[1], NULL, 1));
		assert(errno == EFAULT);
	}

Cc: stable@vger.kernel.org # at least v3.15+
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:18:26 -05:00
Eric Biggers
e9bb1f9b12 fs/pipe.c: preserve alloc_file() error code
If sys_pipe() was unable to allocate a 'struct file', it always failed
with ENFILE, which means "The number of simultaneously open files in the
system would exceed a system-imposed limit." However, alloc_file()
actually returns an ERR_PTR value and might fail with other error codes.
Currently, in addition to ENFILE, it can fail with ENOMEM, potentially
when there are few open files in the system.  Update sys_pipe() to
preserve this error code.

In a prior submission of a similar patch (1) some concern was raised
about introducing a new error code for sys_pipe().  However, for most
system calls, programs cannot assume that new error codes will never be
introduced.  In addition, ENOMEM was, in fact, already a possible error
code for sys_pipe(), in the case where the file descriptor table could
not be expanded due to insufficient memory.

	(1) http://comments.gmane.org/gmane.linux.kernel/1357942

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:18:23 -05:00
Maciej W. Rozycki
b582ef5c53 binfmt_elf: Don't clobber passed executable's file header
Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header.  Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.

This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.

With loadable module support supported and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'.  With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.

Cc: stable@vger.kernel.org # all the way back
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:18:07 -05:00
David Howells
102f4d900c FS-Cache: Handle a write to the page immediately beyond the EOF marker
Handle a write being requested to the page immediately beyond the EOF
marker on a cache object.  Currently this gets an assertion failure in
CacheFiles because the EOF marker is used there to encode information about
a partial page at the EOF - which could lead to an unknown blank spot in
the file if we extend the file over it.

The problem is actually in fscache where we check the index of the page
being written against store_limit.  store_limit is set to the number of
pages that we're allowed to store by fscache_set_store_limit() - which
means it's one more than the index of the last page we're allowed to store.
The problem is that we permit writing to a page with an index _equal_ to
the store limit - when we should reject that case.

Whilst we're at it, change the triggered assertion in CacheFiles to just
return -ENOBUFS instead.

The assertion failure looks something like this:

CacheFiles: Assertion failed
1000 < 7b1 is false
------------[ cut here ]------------
kernel BUG at fs/cachefiles/rdwr.c:962!
...
RIP: 0010:[<ffffffffa02c9e83>]  [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]

Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:11:02 -05:00
NeilBrown
95201a4060 cachefiles: perform test on s_blocksize when opening cache file.
cachefiles requires that s_blocksize in the cache is not greater than
PAGE_SIZE, and performs the check every time a block is accessed.

Move the test to the place where the file is "opened", where other
file-validity tests are performed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:08:17 -05:00
Kinglong Mee
b130ed5998 FS-Cache: Don't override netfs's primary_index if registering failed
Only override netfs->primary_index when registering success.

Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:07:51 -05:00
Kinglong Mee
86108c2e34 FS-Cache: Increase reference of parent after registering, netfs success
If netfs exist, fscache should not increase the reference of parent's
usage and n_children, otherwise, never be decreased.

v2: thanks David's suggest,
 move increasing reference of parent if success
 use kmem_cache_free() freeing primary_index directly

v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"

Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:06:53 -05:00
Daniel Borkmann
0ee9608c89 debugfs: fix refcount imbalance in start_creating
In debugfs' start_creating(), we pin the file system to safely access
its root. When we failed to create a file, we unpin the file system via
failed_creating() to release the mount count and eventually the reference
of the vfsmount.

However, when we run into an error during lookup_one_len() when still
in start_creating(), we only release the parent's mutex but not so the
reference on the mount. Looks like it was done in the past, but after
splitting portions of __create_file() into start_creating() and
end_creating() via 190afd81e4 ("debugfs: split the beginning and the
end of __create_file() off"), this seemed missed. Noticed during code
review.

Fixes: 190afd81e4 ("debugfs: split the beginning and the end of __create_file() off")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:04:44 -05:00
Zhao Lei
d5f2e33b92 btrfs: Use fs_info directly in btrfs_delete_unused_bgs
No need to use root->fs_info in btrfs_delete_unused_bgs(),
use fs_info directly instead.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:24 -08:00
Zhao Lei
2c9fe83552 btrfs: Fix lost-data-profile caused by balance bg
Reproduce:
 (In integration-4.3 branch)

 TEST_DEV=(/dev/vdg /dev/vdh)
 TEST_DIR=/mnt/tmp

 umount "$TEST_DEV" >/dev/null
 mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"

 mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
 btrfs balance start -dusage=0 $TEST_DIR
 btrfs filesystem usage $TEST_DIR

 dd if=/dev/zero of="$TEST_DIR"/file count=100
 btrfs filesystem usage $TEST_DIR

Result:
 We can see "no data chunk" in first "btrfs filesystem usage":
 # btrfs filesystem usage $TEST_DIR
 Overall:
    ...
 Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/vdg        8.00MiB
 Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
    /dev/vdg      122.88MiB
    /dev/vdh      122.88MiB
 System,single: Size:4.00MiB, Used:0.00B
    /dev/vdg        4.00MiB
 System,RAID1: Size:8.00MiB, Used:16.00KiB
    /dev/vdg        8.00MiB
    /dev/vdh        8.00MiB
 Unallocated:
    /dev/vdg        1.06GiB
    /dev/vdh        1.07GiB

 And "data chunks changed from raid1 to single" in second
 "btrfs filesystem usage":
 # btrfs filesystem usage $TEST_DIR
 Overall:
    ...
 Data,single: Size:256.00MiB, Used:0.00B
    /dev/vdh      256.00MiB
 Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/vdg        8.00MiB
 Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
    /dev/vdg      122.88MiB
    /dev/vdh      122.88MiB
 System,single: Size:4.00MiB, Used:0.00B
    /dev/vdg        4.00MiB
 System,RAID1: Size:8.00MiB, Used:16.00KiB
    /dev/vdg        8.00MiB
    /dev/vdh        8.00MiB
 Unallocated:
    /dev/vdg        1.06GiB
    /dev/vdh      841.92MiB

Reason:
 btrfs balance delete last data chunk in case of no data in
 the filesystem, then we can see "no data chunk" by "fi usage"
 command.

 And when we do write operation to fs, the only available data
 profile is 0x0, result is all new chunks are allocated single type.

Fix:
 Allocate a data chunk explicitly to ensure we don't lose the
 raid profile for data.

Test:
 Test by above script, and confirmed the logic by debug output.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:20 -08:00
Zhao Lei
aefbe9a633 btrfs: Fix lost-data-profile caused by auto removing bg
Reproduce:
 (In integration-4.3 branch)

 TEST_DEV=(/dev/vdg /dev/vdh)
 TEST_DIR=/mnt/tmp

 umount "$TEST_DEV" >/dev/null
 mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"

 mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
 umount "$TEST_DEV"

 mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
 btrfs filesystem usage $TEST_DIR

We can see the data chunk changed from raid1 to single:
 # btrfs filesystem usage $TEST_DIR
 Data,single: Size:8.00MiB, Used:0.00B
    /dev/vdg        8.00MiB
 #

Reason:
 When a empty filesystem mount with -o nospace_cache, the last
 data blockgroup will be auto-removed in umount.

 Then if we mount it again, there is no data chunk in the
 filesystem, so the only available data profile is 0x0, result
 is all new chunks are created as single type.

Fix:
 Don't auto-delete last blockgroup for a raid type.

Test:
 Test by above script, and confirmed the logic by debug output.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:16 -08:00
Zhao Lei
3b5753ec23 btrfs: Remove len argument from scrub_find_csum
It is useless.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:13 -08:00
Zhao Lei
affe4a5ae1 btrfs: Reduce unnecessary arguments in scrub_recheck_block
We don't need pass so many arguments for recheck sblock now,
this patch cleans them.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:10 -08:00
Zhao Lei
ba7cf9882b btrfs: Use scrub_checksum_data and scrub_checksum_tree_block for scrub_recheck_block_checksum
We can use existing scrub_checksum_data() and scrub_checksum_tree_block()
for scrub_recheck_block_checksum(), instead of write duplicated code.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:06 -08:00
Zhao Lei
772d233f5d btrfs: Reset sblock->xxx_error stats before calling scrub_recheck_block_checksum
We should reset sblock->xxx_error stats before calling
scrub_recheck_block_checksum().

Current code run correctly because all sblock are allocated by
k[cz]alloc(), and the error stats are not got changed.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:03 -08:00
Zhao Lei
4734b7ed79 btrfs: scrub: setup all fields for sblock_to_check
scrub_setup_recheck_block() isn't setup all necessary fields for
sblock_to_check because history reason.

So current code need more arguments in severial functions,
and more local variables, just to passing these lacked values to
necessary place.

This patch setup above fields to sblock_to_check in
scrub_setup_recheck_block(), for:
1: more cleanup for function arg, local variable
2: to make sblock_to_check complete, then we can use sblock_to_check
   without concern about some uninitialized member.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:27:00 -08:00
Zhao Lei
9799d2c32b btrfs: scrub: set error stats when tree block spanning stripes
It is better to show error stats to user when we found tree block
spanning stripes.

On a btrfs created by old version of btrfs-convert:
Before patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 8b342d35-2904-41ab-b3cb-2f929709cf47
          scrub started at Tue Aug 25 21:19:09 2015 and finished after 00:00:00
          total bytes scrubbed: 53.54MiB with 0 errors
  # dmesg
  ...
  [  128.711434] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [  128.712744] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  ...

After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for ff7f844b-7a4e-4b1a-88a9-8252ab25be1b
          scrub started at Tue Aug 25 21:42:29 2015 and finished after 00:00:00
          total bytes scrubbed: 53.60MiB with 2 errors
          error details:
          corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
  ERROR: There are uncorrectable errors.
  # dmesg
  ...omit...
  #

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10 19:26:57 -08:00
Linus Torvalds
3419b45039 Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block
Pull block IO poll support from Jens Axboe:
 "Various groups have been doing experimentation around IO polling for
  (really) fast devices.  The code has been reviewed and has been
  sitting on the side for a few releases, but this is now good enough
  for coordinated benchmarking and further experimentation.

  Currently O_DIRECT sync read/write are supported.  A framework is in
  the works that allows scalable stats tracking so we can auto-tune
  this.  And we'll add libaio support as well soon.  Fow now, it's an
  opt-in feature for test purposes"

* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
  direct-io: be sure to assign dio->bio_bdev for both paths
  directio: add block polling support
  NVMe: add blk polling support
  block: add block polling support
  blk-mq: return tag/queue combo in the make_request_fn handlers
  block: change ->make_request_fn() and users to return a queue cookie
2015-11-10 17:23:49 -08:00
Linus Torvalds
01504f5e9e Merge tag 'upstream-4.4-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS updates from Richard Weinberger:

 - access time support for UBIFS by Dongsheng Yang

 - random cleanups and bug fixes all over the place

* tag 'upstream-4.4-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: introduce UBIFS_ATIME_SUPPORT to ubifs
  ubifs: make ubifs_[get|set]xattr atomic
  UBIFS: Delete unnecessary checks before the function call "iput"
  UBI: Remove in vain semicolon
  UBI: Fastmap: Fix PEB array type
  UBIFS: Fix possible memory leak in ubifs_readdir()
  fs/ubifs: remove unnecessary new_valid_dev check
  ubi: fastmap: Implement produce_free_peb()
  UBIFS: print verbose message when rescanning a corrupted node
  UBIFS: call dbg_is_power_cut() instead of reading c->dbg->pc_happened
  UBI: drop null test before destroy functions
  UBI: Update comments to reflect UBI_METAONLY flag
  UBI: Fix debug message
  UBI: Fix typo in comment
  UBI: Fastmap: Simplify expression
  UBIFS: fix a typo in comment of ubifs_budget_req
  UBIFS: use kmemdup rather than duplicating its implementation
2015-11-10 16:35:06 -08:00
Abhi Das
acc546fd61 gfs2: Automatically set GFS2_DIF_SYSTEM flag on system files
When new files and directories are created inside a parent directory
we automatically inherit the GFS2_DIF_SYSTEM flag (if set) and assign
it to the new file/dirs.

All new system files/dirs created in the metafs by, say gfs2_jadd,
will have this flag set because they will have parent directories in
the metafs whose GFS2_DIF_SYSTEM flag has already been set (most likely
by a previous mkfs.gfs2)

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-10 15:11:06 -06:00
Linus Torvalds
264015f8a8 Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
 "Outside of the new ACPI-NFIT hot-add support this pull request is more
  notable for what it does not contain, than what it does.  There were a
  handful of development topics this cycle, dax get_user_pages, dax
  fsync, and raw block dax, that need more more iteration and will wait
  for 4.5.

  The patches to make devm and the pmem driver NUMA aware have been in
  -next for several weeks.  The hot-add support has not, but is
  contained to the NFIT driver and is passing unit tests.  The coredump
  support is straightforward and was looked over by Jeff.  All of it has
  received a 0day build success notification across 107 configs.

  Summary:

   - Add support for the ACPI 6.0 NFIT hot add mechanism to process
     updates of the NFIT at runtime.

   - Teach the coredump implementation how to filter out DAX mappings.

   - Introduce NUMA hints for allocations made by the pmem driver, and
     as a side effect all devm allocations now hint their NUMA node by
     default"

* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  coredump: add DAX filtering for FDPIC ELF coredumps
  coredump: add DAX filtering for ELF coredumps
  acpi: nfit: Add support for hot-add
  nfit: in acpi_nfit_init, break on a 0-length table
  pmem, memremap: convert to numa aware allocations
  devm_memremap_pages: use numa_mem_id
  devm: make allocations numa aware by default
  devm_memremap: convert to return ERR_PTR
  devm_memunmap: use devres_release()
  pmem: kill memremap_pmem()
  x86, mm: quiet arch_add_memory()
2015-11-10 12:07:22 -08:00
Linus Torvalds
d55fc37856 Merge branch 'i2c/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - New drivers: UniPhier (with and without FIFO)

 - some drivers got some bigger rework: ismt, designware, img-scb (rcar
   had to be reverted because issues were showing up just lately)

 - ACPI: reworked the device scanning and added support for muxes

... and quite a lot of driver bugfixes and cleanups this time.  All
files touched outside of the i2c realm have proper acks.

* 'i2c/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (70 commits)
  i2c: rcar: Revert the latest refactoring series
  i2c: pnx: remove superfluous assignment
  MAINTAINERS: i2c: drop i2c-pnx maintainer
  MAINTAINERS: i2c: mark also subdirectories as maintained
  i2c: cadence: enable driver for ARM64
  i2c: i801: Document Intel DNV and Broxton
  i2c: at91: manage unexpected RXRDY flag when starting a transfer
  i2c: pnx: Use setup_timer instead of open coding it
  i2c: add ACPI support for I2C mux ports
  acpi: add acpi_preset_companion() stub
  i2c: pxa: Add support for pxa910/988 & new configuration features
  i2c: au1550: Convert to devm_kzalloc and devm_ioremap_resource
  i2c-dev: Fix I2C_SLAVE ioctl comment
  i2c-dev: Fix typo in ioctl name reference
  i2c: sirf: tune the divider to make i2c bus freq more accurate
  i2c: imx: Use -ENXIO as error in the NACK case
  i2c: i801: Add support for Intel Broxton
  i2c: i801: Add support for Intel DNV
  i2c: mediatek: add i2c resume support
  i2c: imx: implement bus recovery
  ...
2015-11-10 11:58:25 -08:00
Jens Axboe
c1c534609f direct-io: be sure to assign dio->bio_bdev for both paths
btrfs sets ->submit_io(), and we failed to set the block dev for
that path. That resulted in a potential NULL dereference when
we later wait for IO in dio_await_one().

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-10 10:14:38 -07:00
Stephen Hemminger
257f871993 ovl: move super block magic number to magic.h
The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2015-11-10 17:13:35 +01:00
Vito Caputo
e4ad29fa0d ovl: use a minimal buffer in ovl_copy_xattr
Rather than always allocating the high-order XATTR_SIZE_MAX buffer
which is costly and prone to failure, only allocate what is needed and
realloc if necessary.

Fixes https://github.com/coreos/bugs/issues/489

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
2015-11-10 17:08:42 +01:00
Miklos Szeredi
97daf8b97a ovl: allow zero size xattr
When ovl_copy_xattr() encountered a zero size xattr no more xattrs were
copied and the function returned success.  This is clearly not the desired
behavior.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
2015-11-10 17:08:41 +01:00
Andrew Elble
7fc0564e3a nfsd: fix race with open / open upgrade stateids
We observed multiple open stateids on the server for files that
seemingly should have been closed.

nfsd4_process_open2() tests for the existence of a preexisting
stateid. If one is not found, the locks are dropped and a new
one is created. The problem is that init_open_stateid(), which
is also responsible for hashing the newly initialized stateid,
doesn't check to see if another open has raced in and created
a matching stateid. This fix is to enable init_open_stateid() to
return the matching stateid and have nfsd4_process_open2()
swap to that stateid and switch to the open upgrade path.
In testing this patch, coverage to the newly created
path indicates that the race was indeed happening.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:29:45 -05:00
Andrew Elble
34ed9872e7 nfsd: eliminate sending duplicate and repeated delegations
We've observed the nfsd server in a state where there are
multiple delegations on the same nfs4_file for the same client.
The nfs client does attempt to DELEGRETURN these when they are presented to
it - but apparently under some (unknown) circumstances the client does not
manage to return all of them. This leads to the eventual
attempt to CB_RECALL more than one delegation with the same nfs
filehandle to the same client. The first recall will succeed, but the
next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
having delegations on cl_revoked that the client has no way to FREE
or DELEGRETURN, with resulting inability to recover. The state manager
on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
and the state manager on the client will be looping unable to satisfy
the server.

List discussion also reports a race between OPEN and DELEGRETURN that
will be avoided by only sending the delegation once to the
client. This is also logically in accordance with RFC5561 9.1.1 and 10.2.

So, let's:

1.) Not hand out duplicate delegations.
2.) Only send them to the client once.

RFC 5561:

9.1.1:
"Delegations and layouts, on the other hand, are not associated with a
specific owner but are associated with the client as a whole
(identified by a client ID)."

10.2:
"...the stateid for a delegation is associated with a client ID and may be
used on behalf of all the open-owners for the given client.  A
delegation is made to the client as a whole and not to any specific
process or thread of control within it."

Reported-by: Eric Meddaugh <etmsys@rit.edu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:29:44 -05:00
Jeff Layton
3e80dbcda7 nfsd: remove recurring workqueue job to clean DRC
We have a shrinker, we clean out the cache when nfsd is shut down, and
prune the chains on each request. A recurring workqueue job seems like
unnecessary overhead. Just remove it.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:25:51 -05:00
Ravishankar N
0b5da8db14 fuse: add support for SEEK_HOLE and SEEK_DATA in lseek
A useful performance improvement for accessing virtual machine images
via FUSE mount.

See https://bugzilla.redhat.com/show_bug.cgi?id=1220173 for a use-case
for glusterFS.

Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2015-11-10 10:32:37 +01:00
Roman Gushchin
3ca8138f01 fuse: break infinite loop in fuse_fill_write_pages()
I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b7041 ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -> goto again without skipping zero-length segment.

Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org>
2015-11-10 10:32:37 +01:00
Miklos Szeredi
2c5816b4be cuse: fix memory leak
The problem is that fuse_dev_alloc() acquires an extra reference to cc.fc,
and the original ref count is never dropped.

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: cc080e9e9b ("fuse: introduce per-instance fuse_dev structure")
Cc: <stable@vger.kernel.org> # v4.2+
2015-11-10 10:32:36 +01:00