Commit Graph

41956 Commits

Author SHA1 Message Date
Dave Chinner
a448f8f1b7 Merge branch 'fallocate-insert-range' into for-next 2015-03-25 15:12:53 +11:00
Dave Chinner
2b93681f59 Merge branch 'xfs-misc-fixes-for-4.1-2' into for-next
Conflicts:
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/xfs_inode.c
2015-03-25 15:12:30 +11:00
Namjae Jeon
a904b1ca57 xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.

1) Make sure that both offset and len are block size aligned.
2) Update the i_size of inode by len bytes.
3) Compute the file's logical block number against offset. If the computed
   block number is not the starting block of the extent, split the extent
   such that the block number is the starting block of the extent.
4) Shift all the extents which are lying bewteen [offset, last allocated extent]
   towards right by len bytes. This step will make a hole of len bytes
   at offset.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 15:08:56 +11:00
Namjae Jeon
dd46c78778 fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
FALLOC_FL_INSERT_RANGE command is the opposite command of
FALLOC_FL_COLLAPSE_RANGE that is needed for someone who wants to add
some data in the middle of file.

FALLOC_FL_INSERT_RANGE will create space for writing new data within
a file after shifting extents to right as given length. This command
also has same limitations as FALLOC_FL_COLLAPSE_RANGE in that
operations need to be filesystem block boundary aligned and cannot
cross the current EOF.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 15:07:05 +11:00
Joe Perches
5e9383f97e xfs: Fix incorrect positive ENOMEM return
added a positive error return value.

This value filters up through the return layers and should be
negative as the other return values are in the same function.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 15:00:24 +11:00
Byoungyoung Lee
20dafeefac xfs: xfs_mru_cache_insert() should use GFP_NOFS
xfs_mru_cache_insert() can be called from within transaction context
during block allocation like so:

write(2)
  ....
    xfs_get_blocks
      xfs_iomap_write_direct
        start transaction
        xfs_bmapi_write
          xfs_bmapi_allocate
            xfs_bmap_btalloc
              xfs_bmap_btalloc_filestreams
                xfs_filestream_new_ag
                  xfs_filestream_pick_ag
                    xfs_mru_cache_insert
                      radix_tree_preload(GFP_KERNEL)

In this case, GFP_KERNEL is incorrect and can potentially lead to
deadlocks in memory reclaim. It should use GFP_NOFS allocations to
avoid lock recursion problems.

[dchinner: rewrote commit message]

Signed-off-by: Byoungyoung Lee <blee@gatech.edu>
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:57:53 +11:00
Scott Wood
65dd297ac2 xfs: %pF is only for function pointers
Use %pS for actual addresses, otherwise you'll get bad output
on arches like ppc64 where %pF expects a function descriptor.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:56:21 +11:00
Fabian Frederick
29916df08d xfs: fix shadow warning in xfs_da3_root_split()
Use icnodehdr for struct xfs_da3_icnode_hdr instead of nodehdr
(already declared above).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:55:25 +11:00
Fabian Frederick
86aaf02e57 xfs: use bool instead of int in xfs_rename()
new_parent and src_is_directory are only used in 0/1 context.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:54:53 +11:00
Eric Sandeen
b26384dc52 xfs: fix NULL pointer dereference in xfs_filestream_lookup_ag()
If xfs_filestream_get_parent() fails, we have a null pip,
goto out, and attempt to IRELE(NULL).  This causes a null
pointer dereference and BUG().

Fix this by directly returning NULLAGNUMBER in this case.

Reported-by: Adrien Nader <adrien@notk.org>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:54:25 +11:00
Dave Chinner
d64588ca28 xfs: remove xfs_bmap_sanity_check()
This code is redundant now that we have verifiers that sanity check
the buffers as they are read from disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:53:48 +11:00
Dave Chinner
d41bb03444 Merge branch 'xfs-rename-whiteout' into for-next
Conflicts:
	fs/xfs/xfs_inode.c
2015-03-25 14:29:13 +11:00
Dave Chinner
7dcf5c3e45 xfs: add RENAME_WHITEOUT support
Whiteouts are used by overlayfs -  it has a crazy convention that a
whiteout is a character device inode with a major:minor of 0:0.
Because it's not documented anywhere, here's an example of what
RENAME_WHITEOUT does on ext4:

# echo foo > /mnt/scratch/foo
# echo bar > /mnt/scratch/bar
# ls -l /mnt/scratch
total 24
-rw-r--r-- 1 root root     4 Feb 11 20:22 bar
-rw-r--r-- 1 root root     4 Feb 11 20:22 foo
drwx------ 2 root root 16384 Feb 11 20:18 lost+found
# src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar
# ls -l /mnt/scratch
total 20
-rw-r--r-- 1 root root     4 Feb 11 20:22 bar
c--------- 1 root root  0, 0 Feb 11 20:23 foo
drwx------ 2 root root 16384 Feb 11 20:18 lost+found
# cat /mnt/scratch/bar
foo
#

In XFS rename terms, the operation that has been done is that source
(foo) has been moved to the target (bar), which is like a nomal
rename operation, but rather than the source being removed, it have
been replaced with a whiteout.

We can't allocate whiteout inodes within the rename transaction due
to allocation being a multi-commit transaction: rename needs to
be a single, atomic commit. Hence we have several options here, form
most efficient to least efficient:

    - use DT_WHT in the target dirent and do no whiteout inode
      allocation.  The main issue with this approach is that we need
      hooks in lookup to create a virtual chardev inode to present
      to userspace and in places where we might need to modify the
      dirent e.g. unlink.  Overlayfs also needs to be taught about
      DT_WHT. Most invasive change, lowest overhead.

    - create a special whiteout inode in the root directory (e.g. a
      ".wino" dirent) and then hardlink every new whiteout to it.
      This means we only need to create a single whiteout inode, and
      rename simply creates a hardlink to it. We can use DT_WHT for
      these, though using DT_CHR means we won't have to modify
      overlayfs, nor anything in userspace. Downside is we have to
      look up the whiteout inode on every operation and create it if
      it doesn't exist.

    - copy ext4: create a special whiteout chardev inode for every
      whiteout.  This is more complex than the above options because
      of the lack of atomicity between inode creation and the rename
      operation, requiring us to create a tmpfile inode and then
      linking it into the directory structure during the rename. At
      least with a tmpfile inode crashes between the create and
      rename doesn't leave unreferenced inodes or directory
      pollution around.

By far the simplest thing to do in the short term is to copy ext4.
While it is the most inefficient way of supporting whiteouts, but as
an initial implementation we can simply reuse existing functions and
add a small amount of extra code the the rename operation.

When we get full whiteout support in the VFS (via the dentry cache)
we can then look to supporting DT_WHT method outlined as the first
method of supporting whiteouts. But until then, we'll stick with
what overlayfs expects us to be: dumb and stupid.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
2015-03-25 14:08:08 +11:00
Dave Chinner
eeacd3217b xfs: make xfs_cross_rename() complete fully
Now that xfs_finish_rename() exists, there is no reason for
xfs_cross_rename() to return to xfs_rename() to finish off the
rename transaction. Drive the completion code into
xfs_cross_rename() and handle all errors there so as to simplify
the xfs_rename() code.

Further, push the rename exchange target_ip check to early in the
rename code so as to make the error handling easy and obviously
correct.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:08:07 +11:00
Dave Chinner
310606b0c7 xfs: factor out xfs_finish_rename()
Rather than use a jump label for the final transaction commit in
the rename, factor it into a simple helper function and call it
appropriately. This slightly reduces the spaghetti nature of
xfs_rename.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:06:07 +11:00
Dave Chinner
445883e813 xfs: cleanup xfs_rename error handling
The jump labels are ambiguous and unclear and some of the error
paths are used inconsistently. Rules for error jumps are:

- use out_trans_cancel for unmodified transaction context
- use out_bmap_cancel on ENOSPC errors
- use out_trans_abort when transaction is likely to be dirty.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:05:43 +11:00
Dave Chinner
95afcf5c7b xfs: clean up inode locking for RENAME_WHITEOUT
When doing RENAME_WHITEOUT, we now have to lock 5 inodes into the
rename transaction. This means we need to update
xfs_sort_for_rename() and xfs_lock_inodes() to handle up to 5
inodes. Because of the vagaries of rename, this means we could have
anywhere between 3 and 5 inodes locked into the transaction....

While xfs_lock_inodes() does not need anything other than an assert
telling us we are passing more inodes that we ever thought we should
see, it could do with a logic rework to remove all the indenting.
This is not a functional change - it just makes the code a lot
easier to read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:03:32 +11:00
Al Viro
6e8a1f8741 switch path_init() to struct filename
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:16 -04:00
Al Viro
668696dcbb switch path_mountpoint() to struct filename
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:15 -04:00
Al Viro
5eb6b495c6 switch path_lookupat() to struct filename
all callers were passing it ->name of some struct filename

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:15 -04:00
Al Viro
94b5d2621a getname_flags(): clean up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:14 -04:00
Hari Bathini
ae011d2e48 pstore: Add pstore type id for PPC64 opal nvram partition
This patch adds a new PPC64 partition type to be used for opal
specific nvram partition. A new partition type is needed as none
of the existing type matches this partition type.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:06:10 +11:00
Linus Torvalds
4541c22605 Merge tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are two bugfixes for things reported.  One regression in kernfs,
  and another issue fixed in the LZ4 code that was fixed in the
  "upstream" codebase that solves a reported kernel crash

  Both have been in linux-next for a while"

* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  LZ4 : fix the data abort issue
  kernfs: handle poll correctly on 'direct_read' files.
2015-03-22 12:07:47 -07:00
Dominique Martinet
5c4086b8de fs/9p: Initialize status in v9fs_file_do_lock.
If p9_client_lock_dotl returns an error, status is possibly never filled
but will be used in the following switch.
Initializing it to P9_LOCK_ERROR makes sur we will return an error and
cleanup (and not hit the default case).

Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-03-21 19:30:31 -07:00
Linus Torvalds
521d474631 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Most of these are fixing extent reservation accounting, or corners
  with tree writeback during commit.

  Josef's set does add a test, which isn't strictly a fix, but it'll
  keep us from making this same mistake again"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix outstanding_extents accounting in DIO
  Btrfs: add sanity test for outstanding_extents accounting
  Btrfs: just free dummy extent buffers
  Btrfs: account merges/splits properly
  Btrfs: prepare block group cache before writing
  Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list)
  Btrfs: account for the correct number of extents for delalloc reservations
  Btrfs: fix merge delalloc logic
  Btrfs: fix comp_oper to get right order
  Btrfs: catch transaction abortion after waiting for it
  btrfs: fix sizeof format specifier in btrfs_check_super_valid()
2015-03-21 10:53:37 -07:00
Linus Torvalds
0d122f7430 Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd bufix from Bruce Fields:
 "This is a fix for a crash easily triggered by 4.1 activity to a server
  built with CONFIG_NFSD_PNFS.

  There are some more bugfixes queued up that I intend to pass along
  next week, but this is the most critical"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
2015-03-21 10:41:15 -07:00
Taesoo Kim
2bd50fb3d4 cifs: potential memory leaks when parsing mnt opts
For example, when mount opt is redundently specified
(e.g., "user=A,user=B,user=C"), kernel kept allocating new key/val
with kstrdup() and overwrite previous ptr (to be freed).

Althouhg mount.cifs in userspace performs a bit of sanitization
(e.g., forcing one user option), current implementation is not
robust. Other options such as iocharset and domainanme are similarly
vulnerable.

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-03-21 12:01:50 -05:00
David Disseldorp
e1e9bda22d cifs: fix use-after-free bug in find_writable_file
Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:

Thread 1                                        Thread 2
========                                        ========

inv_file = NULL
refind = 0
spin_lock(&cifs_file_list_lock)

// invalidHandle found on openFileList

inv_file = open_file
// inv_file->count currently 1

cifsFileInfo_get(inv_file)
// inv_file->count = 2

spin_unlock(&cifs_file_list_lock);

cifs_reopen_file()                            cifs_close()
// fails (rc != 0)                            ->cifsFileInfo_put()
                                       spin_lock(&cifs_file_list_lock)
                                       // inv_file->count = 1
                                       spin_unlock(&cifs_file_list_lock)

spin_lock(&cifs_file_list_lock);
list_move_tail(&inv_file->flist,
      &cifs_inode->openFileList);
spin_unlock(&cifs_file_list_lock);

cifsFileInfo_put(inv_file);
->spin_lock(&cifs_file_list_lock)

  // inv_file->count = 0
  list_del(&cifs_file->flist);
  // cleanup!!
  kfree(cifs_file);

  spin_unlock(&cifs_file_list_lock);

spin_lock(&cifs_file_list_lock);
++refind;
// refind = 1
goto refind_writable;

At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@samba.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-03-21 10:56:27 -05:00
Sachin Prabhu
2477bc58d4 cifs: smb2_clone_range() - exit on unhandled error
While attempting to clone a file on a samba server, we receive a
STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
isn't handled in smb2_clone_range(). We end up looping in the while loop
making same call to the samba server over and over again.

The proposed fix is to exit and return the error value when encountered
with an unhandled error.

Cc: <stable@vger.kernel.org>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-03-21 10:56:22 -05:00
Kinglong Mee
a1420384e3 NFSD: Put exports after nfsd4_layout_verify fail
Fix commit 9cf514ccfa (nfsd: implement pNFS operations).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-20 16:15:42 -04:00
Kinglong Mee
a68465c9cb NFSD: Error out when register_shrinker() fail
If register_shrinker() failed, nfsd will cause a NULL pointer access as,

[ 9250.875465] nfsd: last server has exited, flushing export cache
[ 9251.427270] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 9251.427393] IP: [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
[ 9251.427579] PGD 13e4d067 PUD 13e4c067 PMD 0
[ 9251.427633] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 9251.427706] Modules linked in: ip6t_rpfilter ip6t_REJECT bnep bluetooth xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs xfs microcode ppdev serio_raw pcspkr xor libcrc32c raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core nfsd(OE-) auth_rpcgss nfs_acl lockd sunrpc(E) ata_generic pata_acpi
[ 9251.428240] CPU: 0 PID: 1557 Comm: rmmod Tainted: G           OE 3.16.0-rc2+ #22
[ 9251.428366] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 9251.428496] task: ffff880000849540 ti: ffff8800136f4000 task.ti: ffff8800136f4000
[ 9251.428593] RIP: 0010:[<ffffffff8136fc29>]  [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
[ 9251.428696] RSP: 0018:ffff8800136f7ea0  EFLAGS: 00010207
[ 9251.428751] RAX: 0000000000000000 RBX: ffffffffa0116d48 RCX: dead000000200200
[ 9251.428814] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0116d48
[ 9251.428876] RBP: ffff8800136f7ea0 R08: ffff8800136f4000 R09: 0000000000000001
[ 9251.428939] R10: 8080808080808080 R11: 0000000000000000 R12: ffffffffa011a5a0
[ 9251.429002] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000018ac090
[ 9251.429064] FS:  00007fb9acef0740(0000) GS:ffff88003fa00000(0000) knlGS:0000000000000000
[ 9251.429164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9251.429221] CR2: 0000000000000000 CR3: 0000000031a17000 CR4: 00000000001407f0
[ 9251.429306] Stack:
[ 9251.429410]  ffff8800136f7eb8 ffffffff8136fcdd ffffffffa0116d20 ffff8800136f7ed0
[ 9251.429511]  ffffffff8118a0f2 0000000000000000 ffff8800136f7ee0 ffffffffa00eb765
[ 9251.429610]  ffff8800136f7ef0 ffffffffa010e93c ffff8800136f7f78 ffffffff81104ac2
[ 9251.429709] Call Trace:
[ 9251.429755]  [<ffffffff8136fcdd>] list_del+0xd/0x30
[ 9251.429896]  [<ffffffff8118a0f2>] unregister_shrinker+0x22/0x40
[ 9251.430037]  [<ffffffffa00eb765>] nfsd_reply_cache_shutdown+0x15/0x90 [nfsd]
[ 9251.430106]  [<ffffffffa010e93c>] exit_nfsd+0x9/0x6cd [nfsd]
[ 9251.430192]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
[ 9251.430280]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
[ 9251.430395]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b
[ 9251.430457] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[ 9251.430691] RIP  [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
[ 9251.430755]  RSP <ffff8800136f7ea0>
[ 9251.430805] CR2: 0000000000000000
[ 9251.431033] ---[ end trace 080f3050d082b4ea ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-20 12:44:00 -04:00
Kinglong Mee
db59c0ef08 NFSD: Take care the return value from nfsd4_decode_stateid
Return status after nfsd4_decode_stateid failed.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-20 12:43:59 -04:00
Kinglong Mee
6f8f28ec5f NFSD: Check layout type when returning client layouts
According to RFC5661:
" When lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used
   to identify the file system and all layouts matching the client ID,
   the fsid of the file system, lora_layout_type, and lora_iomode are
   returned.  When lr_returntype is LAYOUTRETURN4_ALL, all layouts
   matching the client ID, lora_layout_type, and lora_iomode are
   returned and the current filehandle is not used. "

When returning client layouts, always check layout type.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-20 12:43:59 -04:00
Kinglong Mee
715a03d284 NFSD: restore trace event lost in mismerge
31ef83dc05 "nfsd: add trace events" had a typo that dropped a trace
event and replaced it by an incorrect recursive call to
nfsd4_cb_layout_fail.  133d558216 "Subject: nfsd: don't recursively
call nfsd4_cb_layout_fail" fixed the crash, this restores the
tracepoint.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-20 12:43:06 -04:00
Kirill A. Shutemov
b642f7269b 9p: do not crash on unknown lock status code
Current 9p implementation will crash whole system if sees unknown lock
status code. It's trivial target for DOS: 9p server can produce such
code easily.

Let's fallback more gracefully: warning in dmesg + -ENOLCK.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-03-20 07:34:41 -07:00
Kirill A. Shutemov
ad80492df5 9p: fix error handling in v9fs_file_do_lock
p9_client_lock_dotl() doesn't set status if p9_client_rpc() fails.
It can lead to 'default:' case in switch below and kernel crashes.

Let's bypass the switch if p9_client_lock_dotl() fails.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-03-20 07:34:41 -07:00
Fabian Frederick
e8782e2fbc 9p: kerneldoc warning fixes
options argument was removed from v9fs_session_info in commit 4b53e4b500
("9p: remove unnecessary v9fses->options which duplicates the mount string")

iov and nr_segs were removed from v9fs_direct_IO
in commit d8d3d94b80
("pass iov_iter to ->direct_IO()")

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: v9fs-developer@lists.sourceforge.net
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-03-20 07:34:41 -07:00
Linus Torvalds
1e744c938d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "This fixes bugs in zero-copy splice to the fuse device"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: explicitly set /dev/fuse file's private_data
  fuse: set stolen page uptodate
  fuse: notify: don't move pages
2015-03-19 16:36:24 -07:00
Linus Torvalds
e409ac3550 Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This fixes minor issues with the multi-layer update in v4.0"

* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: upper fs should not be R/O
  ovl: check lowerdir amount for non-upper mount
  ovl: print error message for invalid mount options
2015-03-19 16:27:36 -07:00
Christoph Hellwig
133d558216 Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS
layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
leading to stack overflows.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes:  c5c707f9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4layouts.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 3c1bfa1..1028a06 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)

 	rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));

-	nfsd4_cb_layout_fail(ls);
-
 	printk(KERN_WARNING
 		"nfsd: client %s failed to respond to layout recall. "
 		"  Fencing..\n", addr_str);
--
1.9.1
2015-03-19 15:49:27 -04:00
Tom Van Braeckel
94e4fe2cab fuse: explicitly set /dev/fuse file's private_data
The misc subsystem (which is used for /dev/fuse) initializes private_data to
point to the misc device when a driver has registered a custom open file
operation, and initializes it to NULL when a custom open file operation has
*not* been provided.

This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc device
structure. And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.

So to simplify things in the misc subsystem, a patch [1] has been proposed to
*always* set the private_data to point to the misc device, instead of only
doing this when a custom open file operation has been registered.

But before this patch can be applied we need to modify drivers that make the
assumption that a misc device file's private_data is initialized to NULL
because they didn't register a custom open file operation, so they don't rely
on this assumption anymore. FUSE uses private_data to store the fuse_conn and
errors out if this is not initialized to NULL at mount time.

Hence, we now set a file's private_data to NULL explicitly, to be independent
of whatever value the misc subsystem initializes it to by default.

[1] https://lkml.org/lkml/2014/12/4/939

Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-03-19 15:29:22 +01:00
Li Xi
847aac644e vfs: Add general support to enforce project quota limits
This patch adds support for a new quota type PRJQUOTA for project quota
enforcement. Also a new method get_projid() is added into dquot_operations
structure.

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-03-18 21:55:08 +01:00
Abhi Das
d9be0cda77 gfs2: allow fallocate to max out quotas/fs efficiently
We can quickly get an estimate of how many blocks are available
for allocation restricted by quota and fs size respectively, using
the ap->allowed field in the gfs2_alloc_parms structure.
gfs2_quota_check() and gfs2_inplace_reserve() provide these values.

Once we have the total number of blocks available to us, we can
compute how many bytes of data can be written using those blocks
instead of guessing inefficiently.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-03-18 12:48:02 -05:00
Abhi Das
25435e5ed6 gfs2: allow quota_check and inplace_reserve to return available blocks
struct gfs2_alloc_parms is passed to gfs2_quota_check() and
gfs2_inplace_reserve() with ap->target containing the number of
blocks being requested for allocation in the current operation.

We add a new field to struct gfs2_alloc_parms called 'allowed'.
gfs2_quota_check() and gfs2_inplace_reserve() return the max
blocks allowed by quota and the max blocks allowed by the chosen
rgrp respectively in 'allowed'.

A new field 'min_target', when non-zero, tells gfs2_quota_check()
and gfs2_inplace_reserve() to not return -EDQUOT/-ENOSPC when
there are atleast 'min_target' blocks allowable/available. The
assumption is that the caller is ok with just 'min_target' blocks
and will likely proceed with allocating them.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-03-18 12:47:10 -05:00
Abhi Das
b8fbf471ed gfs2: perform quota checks against allocation parameters
Use struct gfs2_alloc_parms as an argument to gfs2_quota_check()
and gfs2_quota_lock_check() to check for quota violations while
accounting for the new blocks requested by the current operation
in ap->target.

Previously, the number of new blocks requested during an operation
were not accounted for during quota_check and would allow these
operations to exceed quota. This was not very apparent since most
operations allocated only 1 block at a time and quotas would get
violated in the next operation. i.e. quota excess would only be by
1 block or so. With fallocate, (where we allocate a bunch of blocks
at once) the quota excess is non-trivial and is addressed by this
patch.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-03-18 12:46:54 -05:00
Bob Peterson
f1ea6f4ec0 GFS2: Move gfs2_file_splice_write outside of #ifdef
This patch moves function gfs2_file_splice_write so it's not
conditionally compiled.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-03-18 12:43:09 -05:00
Bob Peterson
f42a69fadc GFS2: Allocate reservation during splice_write
This patch adds a GFS2-specific function for splice_write which
first calls function gfs2_rs_alloc to make sure a reservation
structure has been allocated before attempting to reserve blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-03-18 12:42:22 -05:00
Andreas Gruenbacher
932e468a37 GFS2: gfs2_set_acl(): Cache "no acl" as well
When removing a default acl or setting an access acl that is entirely
represented in the file mode, we end up with acl == NULL in gfs2_set_acl(). In
that case, bring gfs2 in line with other file systems and cache the NULL acl
with set_cached_acl() instead of invalidating the cache with
forget_cached_acl().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-03-18 12:41:57 -05:00
hujianyang
71cbad7e69 ovl: upper fs should not be R/O
After importing multi-lower layer support, users could mount a r/o
partition as the left most lowerdir instead of using it as upperdir.
And a r/o upperdir may cause an error like

	overlayfs: failed to create directory ./workdir/work

during mount.

This patch check the *s_flags* of upper fs and return an error if
it is a r/o partition. The checking of *upper_mnt->mnt_sb->s_flags*
can be removed now.

This patch also remove

	/* FIXME: workdir is not needed for a R/O mount */

from ovl_fill_super() because:

1) for upper fs r/o case
Setting a r/o partition as upper is prevented, no need to care about
workdir in this case.

2) for "mount overlay -o ro" with a r/w upper fs case
Users could remount overlayfs to r/w in this case, so workdir should
not be omitted.

Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-03-18 10:29:48 +01:00
hujianyang
6be4506e34 ovl: check lowerdir amount for non-upper mount
Recently multi-lower layer mount support allow upperdir and workdir
to be omitted, then cause overlayfs can be mount with only one
lowerdir directory. This action make no sense and have potential risk.

This patch check the total number of lower directories to prevent
mounting overlayfs with only one directory.

Also, an error message is added to indicate lower directories exceed
OVL_MAX_STACK limit.

Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-03-18 10:29:47 +01:00