Commit Graph

24897 Commits

Author SHA1 Message Date
Josef Bacik
9e4871070b Btrfs: skip looking for delalloc if we don't have ->fill_delalloc
We always look for delalloc bytes in our io_tree so we can fill in delalloc.
This is fine in most cases, but if we're writing out the btree_inode this is
just a superfluous tree search on the io_tree, and if we have a lot of metadata
dirty this could be an expensive check.  So instead check to see if our io_tree
has a ->fill_delalloc op, and if not don't even bother doing the lookup.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19 15:12:30 -04:00
Josef Bacik
fb25e9141a Btrfs: use bytes_may_use for all ENOSPC reservations
We have been using bytes_reserved for metadata reservations, which is wrong
since we use that to keep track of outstanding reservations from the allocator.
This resulted in us doing a lot of silly things to make sure we don't allocate a
bunch of metadata chunks since we never had a real view of how much space was
actually in use by metadata.

This passes Arne's enospc test and xfstests as well as my own enospc tests.
Hopefully this will get us moving in the right direction.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19 15:12:30 -04:00
Josef Bacik
830c4adbd0 Btrfs: fix how we mount subvol=<whatever>
We've only been able to mount with subvol=<whatever> where whatever was a subvol
within whatever root we had as the default.  This allows us to mount -o
subvol=path/to/subvol/you/want relative from the normal fs_tree root.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19 15:12:29 -04:00
Josef Bacik
ba5b8958da Btrfs: use d_obtain_alias when mounting subvol/subvolid
Currently what we do is just wrong.  We either

1) Alloc a new "root" dentry with sb->s_root as it's parent which is just wrong
as we could walk into this subvol later on via another path and hilarity could
ensue.  Also we don't check the return value of d_splice_alias which isn't good
either.

or

2) Do a d_find_alias() which we could have lost our dentry from cache at this
point and found nothing.

So use d_obtain_alias().  In the case that we already have the inode/dentry in
cache we will get the correct dentry.  If not we will get a disconnected dentry
tree so if we walk into it later on everything will be connected up properly.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19 15:12:29 -04:00
Josef Bacik
0cbbdf7c9c Btrfs: kill reserved_bytes in inode
reserved_bytes is not used for anything in the inode, remove it.

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19 15:12:28 -04:00
Josef Bacik
f1bdcc0a82 Btrfs: move stuff around in btrfs_inode to get better packing
Moving things around to give us better packing in the btrfs_inode.  This reduces
the size of our inode by 8 bytes.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19 15:12:28 -04:00
J. Bruce Fields
8b289b2c23 nfsd4: implement new 4.1 open reclaim types
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19 11:52:12 -04:00
J. Bruce Fields
a8d86cd75b nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
0c12eaffdf "nfsd: don't break lease on
CLAIM_DELEGATE_CUR" was a temporary workaround for a problem fixed
properly in the vfs layer by 778fc546f7
"locks: fix tracking of inprogress lease breaks", so we can revert that
change (but keeping some minor cleanup from that commit).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19 11:42:03 -04:00
Andres Salomon
123005f3cc jffs2: add compr=lzo and compr=zlib options
..to allow forcing of either compression scheme.  This will override
compiled-in defaults.  jffs2_compress is reworked a bit, as the lzo/zlib
override shares lots of code w/ the PRIORITY mode.

v2: update show_options accordingly.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-10-19 17:22:21 +03:00
Andres Salomon
92abc475d8 jffs2: implement mount option parsing and compression overriding
Currently jffs2 has compile-time constants (and .config options)
controlling whether or not the various compression/decompression
drivers are built in and enabled.  This is fine for embedded
systems, but it clashes with distribution kernels.  Distro kernels
tend to turn on everything; this causes OpenFirmware to fall
over, as it understands ZLIB-compressed inodes.  Booting a kernel
that has LZO compression enabled, writing to the boot partition,
and then rebooting causes OFW to fail to read the kernel from
the filesystem.  This is because LZO compression has priority
when writing new data to jffs2, if LZO is enabled.

This patch adds mount option parsing, and a single supported
option ("compr=none").  This adds the flexibility of being
able to specify which compressor overrides on a per-superblock
basis.  For now, we can simply disable compression;
additional flexibility coming soon.

v2: kill some printks, and implement show_options as suggested
by Artem Bityutskiy.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-10-19 17:22:20 +03:00
Tejun Heo
523e1d399c block: make gendisk hold a reference to its queue
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:31:07 +02:00
Trond Myklebust
08ef7bd3bc NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
Both LOOKUP and OPEN operations may return NFS4ERR_BADNAME if we send a
an invalid name as a filename argument. As far as the application is
concerned, it just has to know that the file doesn't exist, and so
ENOENT would be the appropriate reply. We should only return EINVAL
if the filename is being used to _create_ a new object on the
remote filesystem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 16:13:51 -07:00
Trond Myklebust
0c2e53f11a NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
...and also remove the associated nfs_v4_clientops entry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 16:13:51 -07:00
Christoph Hellwig
9e4c109ac8 xfs: add AIL pushing tracepoints
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-18 15:12:04 -05:00
Alex Elder
2900b33999 xfs: put in missed fix for merge problem
I intended to do this as part of fixing part of the conflict with
the merge with Linus' tree, but evidently it didn't get included in
the commit.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-10-18 20:00:14 +00:00
Trond Myklebust
a9a4a87a59 NFS: Use the inode->i_version to cache NFSv4 change attribute information
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:14:34 -07:00
H Hartley Sweeten
45402c38ee nfs/super.c: local functions should be static
commit ae50c0b5 "pnfs: client stats" added additional information to
the output of /proc/self/mountstats. The new functions introduced are
only used in this file and should be marked static.

If CONFIG_NFS_V4_1 is not defined, empty stub functions are used.  If
CONFIG_NFS_V4 is not defined these stub functions are not used at all.
Adding static for the functions results in compile warnings:

fs/nfs/super.c:743: warning: 'show_sessions' defined but not used
fs/nfs/super.c:756: warning: 'show_pnfs' defined but not used

Fix this by adding a #ifdef CONFIG_NFS_V4 guard around the two
show_ functions.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:15 -07:00
Peng Tao
7542274519 pnfsblock: fix writeback deadlock
We should check if the sector is already initialized before
trying to grab the page from page cache. Otherwise when two
pages of the same block are written back by two threads each
calling from writepage_locked, it can cause deadlock like bellow.

 [ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds.
 [ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [ 1080.972812] kswapd0         D ffff88000c4926c0     0    25      2 0x00000000
 [ 1080.972816]  ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7
 [ 1080.972821]  ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440
 [ 1080.972824]  ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8
 [ 1080.972828] Call Trace:
 [ 1080.972860]  [<ffffffff81013ba7>] ? read_tsc+0x9/0x19
 [ 1080.972877]  [<ffffffff810e0b23>] ? lock_page+0x2b/0x2b
 [ 1080.972899]  [<ffffffff81475a1d>] io_schedule+0x63/0x7e
 [ 1080.972902]  [<ffffffff810e0b31>] sleep_on_page+0xe/0x12
 [ 1080.972905]  [<ffffffff81475fe8>] __wait_on_bit_lock+0x46/0x8f
 [ 1080.972916]  [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
 [ 1080.972919]  [<ffffffff810e0af6>] __lock_page+0x66/0x68
 [ 1080.972928]  [<ffffffff81072705>] ? autoremove_wake_function+0x3d/0x3d
 [ 1080.972932]  [<ffffffff810e0b1f>] lock_page+0x27/0x2b
 [ 1080.972934]  [<ffffffff810e0bcf>] find_lock_page+0x34/0x57
 [ 1080.972937]  [<ffffffff810e1738>] find_or_create_page+0x34/0x8a
 [ 1080.972947]  [<ffffffffa034245b>] bl_write_pagelist+0x205/0x6da [blocklayoutdriver]
 [ 1080.972951]  [<ffffffffa034145d>] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver]
 [ 1080.972995]  [<ffffffffa02e27b9>] ? nfs_write_rpcsetup+0x118/0x123 [nfs]
 [ 1080.973033]  [<ffffffffa030246b>] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs]
 [ 1080.973089]  [<ffffffffa02deaae>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [ 1080.973098]  [<ffffffffa02df035>] nfs_pageio_complete+0x16/0x2d [nfs]
 [ 1080.973108]  [<ffffffffa02e2d8f>] nfs_writepage_locked+0xa0/0xbf [nfs]
 [ 1080.973119]  [<ffffffffa02e36a1>] nfs_writepage+0x16/0x2b [nfs]
 [ 1080.973122]  [<ffffffff810e8762>] ? clear_page_dirty_for_io+0x87/0x9a
 [ 1080.973133]  [<ffffffff810efc5b>] shrink_page_list+0x39b/0x6c8
 [ 1080.973139]  [<ffffffff810f03bb>] shrink_inactive_list+0x22c/0x39e
 [ 1080.973144]  [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
 [ 1080.973148]  [<ffffffff810f0c33>] shrink_zone+0x445/0x588
 [ 1080.973152]  [<ffffffff810f1a11>] balance_pgdat+0x2c2/0x56b
 [ 1080.973170]  [<ffffffff81254208>] ? __bitmap_weight+0x34/0x80
 [ 1080.973175]  [<ffffffff810f1f78>] kswapd+0x2be/0x2fa
 [ 1080.973179]  [<ffffffff810726c8>] ? __init_waitqueue_head+0x4b/0x4b
 [ 1080.973183]  [<ffffffff810f1cba>] ? balance_pgdat+0x56b/0x56b
 [ 1080.973187]  [<ffffffff81071f69>] kthread+0xa8/0xb0
 [ 1080.973200]  [<ffffffff814806b4>] kernel_thread_helper+0x4/0x10
 [ 1080.973205]  [<ffffffff81071ec1>] ? __init_kthread_worker+0x5a/0x5a
 [ 1080.973210]  [<ffffffff814806b0>] ? gs_change+0x13/0x13
 [ 1080.973213] no locks held by kswapd0/25.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:15 -07:00
Peng Tao
e6d05a757c pnfsblock: fix NULL pointer dereference
bl_add_page_to_bio returns error pointer. bio should be reset to
NULL in failure cases as the out path always calls bl_submit_bio.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:14 -07:00
Peng Tao
9b7eecdcfe pnfs: recoalesce when ld read pagelist fails
For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to
mds.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:14 -07:00
Peng Tao
8ce160c5ef pnfs: recoalesce when ld write pagelist fails
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to
mds.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:13 -07:00
Peng Tao
1b0ae06877 pnfs: make _set_lo_fail generic
file layout and block layout both use it to set mark layout io failure
bit. So make it generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:13 -07:00
Peng Tao
760383f1ee pnfsblock: add missing rpc_put_mount and path_put
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:12 -07:00
Peng Tao
c1225158a8 SUNRPC/NFS: make rpc pipe upcall generic
The same function is used by idmap, gss and blocklayout code. Make it
generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:12 -07:00
Jim Rees
fdc17abbc4 pnfsblock: fix size of upcall message
Make the status field explicitly 32 bits.  "...it's unlikely that the kernel
and userspace would differ on the size of an int here, but it might be a
good idea to go ahead and make that explicitly 32 bits in case we end up
dealing with more exotic arches at some point in the future."

Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:11 -07:00
Jim Rees
516f2e24fa pnfsblock: fix return code confusion
Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and
nfs4_blk_decode_device.

Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling
nfs4_blk_get_deviceinfo.

Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:11 -07:00
Jeff Layton
2da9565235 nfs: don't try to migrate pages with active requests
nfs_find_and_lock_request will take a reference to the nfs_page and
will then put it if the req is already locked. It's possible though
that the reference will be the last one. That put then can kick off
a whole series of reference puts:

nfs_page
   nfs_open_context
      dentry
          inode

If the inode ends up being deleted, then the VFS will call
truncate_inode_pages. That function will try to take the page lock, but
it was already locked when migrate_page was called. The code
deadlocks.

Fix this by simply refusing the migration request if PagePrivate is
already set, indicating that the page is already associated with an
active read or write request.

We've had a customer test a backported version of this patch and
the preliminary results seem good.

Cc: stable@kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:11 -07:00
Mi Jinlong
b9dd3abbbc nfs: fix bug about IPv6 address scope checking
The result from ipv6_addr_scope() always not be a single SCOPE,
so we can't use equal to compare the result with IPV6_ADDR_SCOPE_LINKLOCAL
at nfs_sockaddr_match_ipaddr6.

This patch fixs the problem, and lets checking address before scope_id.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:10 -07:00
Jeff Layton
3236c3e1ad nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pages
commit 420e3646 allowed the kernel to reduce the number of unnecessary
commit calls by skipping the commit when there are a large number of
outstanding pages.

However, the current test in nfs_commit_unstable_pages does not handle
the edge condition properly. When ncommit == 0, then that means that the
kernel doesn't need to do anything more for the inode. The current test
though in the WB_SYNC_NONE case will return true, and the inode will end
up being marked dirty. Once that happens the inode will never be clean
until there's a WB_SYNC_ALL flush.

Fix this by immediately returning from nfs_commit_unstable_pages when
ncommit == 0.

Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
has a backported version of 420e3646. The inode cache there was growing
very large. The inode cache was unable to be shrunk since the inodes
were all marked dirty. Calling sync() would essentially "fix" the
problem -- the WB_SYNC_ALL flush would result in the inodes all being
marked clean.

What I'm not clear on is how big a problem this is in mainline kernels
as the writeback code there is very different. Either way, it seems
incorrect to re-mark the inode dirty in this case.

Reported-by: Mike McLean <mikem@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org [2.6.34+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:10 -07:00
Trond Myklebust
59b7c05fff Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when syncing"
This reverts commit b80c3cb628.

The reverted commit was rendered obsolete by a VFS fix: commit
5547e8aac6 (writeback: Update dirty flags in
two steps). We now no longer need to worry about writeback_single_inode()
missing our marking the inode for COMMIT in 'do_writepages()' call.

Reverting this patch, fixes a performance regression in which the inode
would continuously get queued to the dirty list, causing the writeback
code to unnecessarily try to send a COMMIT.

Signed-off-by: Trond Myklebust <Trond.Myklebust>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org [2.6.35+]
2011-10-18 09:08:09 -07:00
Gerlando Falauto
ad4778fb40 CIFS: fix automount for DFS shares
Automounting directories are now invalidated by .d_revalidate()
so to be d_instantiate()d again with the right DCACHE_NEED_AUTOMOUNT
flag

Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-18 10:49:38 -05:00
Dan Carpenter
7748dd6eab CIFS: cleanup min_t() cast in cifs_read()
Smatch complains that the cast to "int" in min_t() changes very large
values of current_read_size into negative values and so min_t()
could return the wrong value.  I removed the const as well, as that
doesn't do anything here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-18 10:33:44 -05:00
H Hartley Sweeten
ee90d57e20 ext4: quiet sparse noise about plain integer as NULL pointer
The third parameter to ext4_free_blocks is a struct buffer_head *.  This
parameter should be NULL not 0.

This quiets the sparse noise:

warning: Using plain integer as NULL pointer

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 11:01:51 -04:00
H Hartley Sweeten
e6705f7c25 ext4: add __user decoration to calls of copy_{from,to}_user()
This quiets the sparse noise:

warning: incorrect type in argument 2 (different address spaces)
   expected void const [noderef] <asn:1>*from
   got struct fstrim_range *<noident>
warning: incorrect type in argument 1 (different address spaces)
   expected void [noderef] <asn:1>*to
   got struct fstrim_range *<noident>

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 10:59:51 -04:00
H Hartley Sweeten
e0cbee3e14 ext4: functions should not be declared extern
The function declarations in ext4.h are already marked extern, so it's
not necessary to do so in the .c files.

This quiets the sparse noise:

warning: function 'ext4_flush_completed_IO' with external linkage has definition
warning: function 'ext4_init_inode_table' with external linkage has definition

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 10:57:51 -04:00
Shaohua Li
1bce63d1a2 ext4: add block plug for .writepages
Add block plug for ext4 .writepages. Though ext4 .writepages
already handles request merge very well, block plug is still
helpful to reduce block lock contention.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 10:55:51 -04:00
Darrick J. Wong
f6f96fdb8c ext4: Fix comparison endianness problem in MMP initialization
As part of startup, the MMP initialization code does this:

mmp->mmp_seq = seq = cpu_to_le32(mmp_new_seq());

Next, mmp->mmp_seq is written out to disk, a delay happens, and then
the MMP block is read back in and the sequence value is tested:

if (seq != le32_to_cpu(mmp->mmp_seq)) {
	/* fail the mount */

On a LE system such as x86, the *le32* functions do nothing and this
works.  Unfortunately, on a BE system such as ppc64, this comparison
becomes:

if (cpu_to_le32(new_seq) != le32_to_cpu(cpu_to_le32(new_seq)) {
	/* fail the mount */

Except for a few palindromic sequence numbers, this test always causes
the mount to fail, which makes MMP filesystems generally unmountable
on ppc64.  The attached patch fixes this situation.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 10:53:51 -04:00
Nikitas Angelinas
bdfc230f33 ext4: MMP: fix error message rate-limiting logic in kmmpd
Current logic would print an error message only once, and then
'failed_writes' would stay at 1.  Rework the loop to increment
'failed_writes' and print the error message every
s_mmp_update_interval * 60 seconds, as intended according to the
comment.

Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Signed-off-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Andreas Dilger <adilger@dilger.ca>
2011-10-18 10:51:51 -04:00
Nikitas Angelinas
215fc6af73 ext4: MMP: kmmpd should use nodename from init_uts_ns.name, not sysname
sysname holds "Linux" by default, i.e. what appears when doing a "uname
-s"; nodename should be used to print the machine's hostname, i.e. what
is returned when doing a "uname -n" or "hostname", and what
gethostname(2)/sethostname(2) manipulate, in order to notify the
administrator of the node which is contending to mount the filesystem.

Acked-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Signed-off-by: Andrew Perepechko <andrew_perepechko@xyratex.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18 10:49:51 -04:00
J. Bruce Fields
856121b2e8 nfsd4: warn on open failure after create
If we create the object and then return failure to the client, we're
left with an unexpected file in the filesystem.

I'm trying to eliminate such cases but not 100% sure I have so an
assertion might be helpful for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:50:08 -04:00
J. Bruce Fields
4cdc951b86 nfsd4: preallocate open stateid in process_open1()
As with the nfs4_file, we'd prefer to find out about any failure before
creating a new file rather than after.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:50:07 -04:00
J. Bruce Fields
996e09385c nfsd4: do idr preallocation with stateid allocation
Move idr preallocation out of stateid initialization, into stateid
allocation, so that we no longer have to handle any errors from the
former.

This is a little subtle due to the way the idr code manages these
preallocated items--document that in comments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:50:07 -04:00
J. Bruce Fields
32513b40ef nfsd4: preallocate nfs4_file in process_open1()
Creating a new file is an irrevocable step--once it's visible in the
filesystem, other processes may have seen it and done something with it,
and unlinking it wouldn't simply undo the effects of the create.

Therefore, in the case where OPEN creates a new file, we shouldn't do
the create until we know that the rest of the OPEN processing will
succeed.

For example, we should preallocate a struct file in case we need it
until waiting to allocate it till process_open2(), which is already too
late.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:50:00 -04:00
J. Bruce Fields
d29b20cd58 nfsd4: clean up open owners on OPEN failure
If process_open1() creates a new open owner, but the open later fails,
the current code will leave the open owner around.  It won't be on the
close_lru list, and the client isn't expected to send a CLOSE, so it
will hang around as long as the client does.

Similarly, if process_open1() removes an existing open owner from the
close lru, anticipating that an open owner that previously had no
associated stateid's now will, but the open subsequently fails, then
we'll again be left with the same leak.

Fix both problems.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:33:57 -04:00
J. Bruce Fields
bcf130f9df nfsd4: simplify process_open1 logic
No change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:33:51 -04:00
J. Bruce Fields
3557e43b8f nfsd4: make is_open_owner boolean
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:09:37 -04:00
J. Bruce Fields
a50d2ad172 nfsd4: centralize renew_client() calls
There doesn't seem to be any harm to renewing the client a bit earlier,
when it is looked up.  That saves us from having to sprinkle
renew_client calls over quite so many places.

Also remove a redundant comment and do a little cleanup.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:09:37 -04:00
Alex Elder
9508534c5f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Resolved conflicts:
  fs/xfs/xfs_trans_priv.h:
    - deleted struct xfs_ail field xa_flags
    - kept field xa_log_flush in struct xfs_ail
  fs/xfs/xfs_trans_ail.c:
    - in xfsaild_push(), in XFS_ITEM_PUSHBUF case, replaced
      "flush_log = 1" with "ailp->xa_log_flush++"

Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-17 15:42:02 -05:00
Tao Ma
f472e02669 ext4: avoid stamping on other memories in ext4_ext_insert_index()
Add a sanity check to make sure ix hasn't gone beyond the valid bounds
of the extent block.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-17 10:13:46 -04:00
Shirish Pargaonkar
a5ff376966 cifs: Call id to SID mapping functions to change owner/group (try #4 repost)
Now build security descriptor to change either owner or group at the
server.  Initially security descriptor was built to change only
(D)ACL, that functionality has been extended.

When either an Owner or a Group of a file object at the server is changed,
rest of security descriptor remains same (DACL etc.).

To set security descriptor, it is necessary to open that file
with permission bits of either WRITE_DAC if DACL is being modified or
WRITE_OWNER (Take Ownership) if Owner or Group is being changed.

It is the server that decides whether a set security descriptor with
either owner or group change succeeds or not.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-17 09:11:11 -05:00