Commit Graph

25699 Commits

Author SHA1 Message Date
Bob Peterson
c688b8b334 GFS2: Add non-try locks back to get_local_rgrp
This upstream patch had what I believe is an unintended consequence:

http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-3.0-nmw.git;a=commitdiff;h=beca42486749c1538a5ed58fe9dcc9f26d428c93

The patch changed function get_local_rgrp such that it ONLY
used TRY locks for RGRP searches. Prior to that patch, the code
used TRY locks during the first loop, and if that was unsuccessful,
it used normal blocking locks on subsequent searches. This patch
changes it back to the old way.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15 15:24:22 +00:00
Liu Bo
f1ebcc74d5 Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush
The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.

But there are two cases to lead to tree corruptions:

1) multi-thread snapshots can commit serveral snapshots in a transaction,
   and this may change the src root when processing the following pending
   snapshots, which lead to the former snapshots corruptions;

2) the free inode cache was changing the roots when it root the cache,
   which lead to corruptions.

This fixes things by making sure we force COW the block after we create a
snapshot during commiting a transaction, then any changes to the roots
will result in COW, and we get all the fs roots and snapshot roots to be
consistent.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-15 09:53:28 -05:00
Jiri Kosina
2290c0d06d Merge branch 'master' into for-next
Sync with Linus tree to have 157550ff ("mtd: add GPMI-NAND driver
in the config and Makefile") as I have patch depending on that one.
2011-11-13 20:55:53 +01:00
Linus Torvalds
c1f4246716 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: rename the option to nospace_cache
  Btrfs: handle bio_add_page failure gracefully in scrub
  Btrfs: fix deadlock caused by the race between relocation
  Btrfs: only map pages if we know we need them when reading the space cache
  Btrfs: fix orphan backref nodes
  Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
  Btrfs: fix unreleased path in btrfs_orphan_cleanup()
  Btrfs: fix no reserved space for writing out inode cache
  Btrfs: fix nocow when deleting the item
  Btrfs: tweak the delayed inode reservations again
  Btrfs: rework error handling in btrfs_mount()
  Btrfs: close devices on all error paths in open_ctree()
  Btrfs: avoid null dereference and leaks when bailing from open_ctree()
  Btrfs: fix subvol_name leak on error in btrfs_mount()
  Btrfs: fix memory leak in btrfs_parse_early_options()
  Btrfs: fix our reservations for updating an inode when completing io
  Btrfs: fix oops on NULL trans handle in btrfs_truncate
  btrfs: fix double-free 'tree_root' in 'btrfs_mount()'
2011-11-11 23:47:06 -02:00
Linus Torvalds
53e3ccfd15 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix force shutdown handling in xfs_end_io
  xfs: constify xfs_item_ops
  xfs: Fix possible memory corruption in xfs_readlink
2011-11-11 23:37:17 -02:00
Sage Weil
774ac21da7 ceph: initialize root dentry
Set up d_fsdata on the root dentry.  This fixes a NULL pointer dereference
in ceph_d_prune on umount.  It also means we can eventually strip out all
of the conditional checks on d_fsdata because it is now set unconditionally
(prior to setting up the d_ops).

Fix the ceph_d_prune debug print while we're here.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-11 09:50:17 -08:00
David Sterba
8965593e41 btrfs: rename the option to nospace_cache
Rename no_space_cache option to nospace_cache to be more consistent with
the rest, where the simple prefix 'no' is used to negate an option.

The option has been introduced during the -rc1 cycle and there are has not been
widely used, so it's safe.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-11 10:14:57 -05:00
Arne Jansen
69f4cb526b Btrfs: handle bio_add_page failure gracefully in scrub
Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately
dm based targets accept only one page per bio, thus making scrub always
fails. This patch just submits the current bio when an error is encountered
and starts a new one.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-11 08:17:10 -05:00
Miao Xie
62f30c5462 Btrfs: fix deadlock caused by the race between relocation
We can not do flushable reservation for the relocation when we create snapshot,
because it may make the transaction commit task and the flush task wait for
each other and the deadlock happens.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:05 -05:00
Josef Bacik
2f120c05e6 Btrfs: only map pages if we know we need them when reading the space cache
People have been running into a warning when loading space cache because the
page is already mapped when trying to read in a bitmap.  The way we read in
entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry
maps the entries if it needs to, and if it hits the end of the page it simply
unmaps the page.  That way we can unconditionally unmap the io_ctl before
reading in the bitmap and we should stop hitting these warnings.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:05 -05:00
Miao Xie
76b9e23d25 Btrfs: fix orphan backref nodes
If the root node of a fs/file tree is in the block group that is
being relocated, but the others are not in the other block groups.
when we create a snapshot for this tree between the relocation tree
creation ends and ->create_reloc_tree is set to 0, Btrfs will create
some backref nodes that are the lowest nodes of the backrefs cache.
But we forget to add them into ->leaves list of the backref cache
and deal with them, and at last, they will triggered BUG_ON().

  kernel BUG at fs/btrfs/relocation.c:239!

This patch fixes it by adding them into ->leaves list of backref cache.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:05 -05:00
Miao Xie
61b520a9d0 Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
btrfs_block_rsv_add{, _noflush}() have similar code, so abstract that code.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:05 -05:00
Miao Xie
3254c87618 Btrfs: fix unreleased path in btrfs_orphan_cleanup()
When we did stress test for the space relocation, the deadlock happened.
By debugging, We found it was caused by the carelessness that we forgot
to unlock the read lock of the extent buffers in btrfs_orphan_cleanup()
before we end the transaction handle, so the transaction commit task waited
the task, which called btrfs_orphan_cleanup(), to unlock the extent buffer,
but that task waited the commit task to end the transaction commit, and
the deadlock happened. Fix it.

Signed-ff-by: Miao Xie <miaox@cn.fujitsu.com>

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:05 -05:00
Miao Xie
ba38eb4de3 Btrfs: fix no reserved space for writing out inode cache
I-node cache forgets to reserve the space when writing out it. And when
we do some stress test, such as synctest, it will trigger WARN_ON() in
use_block_rsv().

WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
...
Call Trace:
 [<ffffffff8104df86>] warn_slowpath_common+0x80/0x98
 [<ffffffff8104dfb3>] warn_slowpath_null+0x15/0x17
 [<ffffffffa0369c60>] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
 [<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
 [<ffffffffa035c040>] __btrfs_cow_block+0x118/0x3b5 [btrfs]
 [<ffffffffa035c7ba>] btrfs_cow_block+0x103/0x14e [btrfs]
 [<ffffffffa035e4c4>] btrfs_search_slot+0x249/0x6a4 [btrfs]
 [<ffffffffa036d086>] btrfs_lookup_inode+0x2a/0x8a [btrfs]
 [<ffffffffa03788b7>] btrfs_update_inode+0xaa/0x141 [btrfs]
 [<ffffffffa036d7ec>] btrfs_save_ino_cache+0xea/0x202 [btrfs]
 [<ffffffffa03a761e>] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
 [<ffffffffa0373867>] commit_fs_roots+0xaa/0x158 [btrfs]
 [<ffffffffa03746a6>] btrfs_commit_transaction+0x405/0x731 [btrfs]
 [<ffffffff810690df>] ? wake_up_bit+0x25/0x25
 [<ffffffffa039d652>] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
 [<ffffffffa0381c5f>] btrfs_sync_file+0x16a/0x198 [btrfs]
 [<ffffffff81122806>] ? mntput+0x21/0x23
 [<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
 [<ffffffff8112d170>] vfs_fsync+0x17/0x19
 [<ffffffff8112d316>] do_fsync+0x29/0x3e
 [<ffffffff8112d348>] sys_fsync+0xb/0xf
 [<ffffffff81468352>] system_call_fastpath+0x16/0x1b

Sometimes it causes BUG_ON() in the reservation code of the delayed inode
is triggered.

So we must reserve enough space for inode cache.

Note: If we can not reserve the enough space for inode cache, we will
give up writing out it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:04 -05:00
Miao Xie
924cd8fbe4 Btrfs: fix nocow when deleting the item
btrfs_previous_item() just search the b+ tree, do not COW the nodes or leaves,
if we modify the result of it, the meta-data will be broken. fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:45:04 -05:00
Chris Mason
f7d572188b Merge branch 'mount-fixes' of git://github.com/idryomov/btrfs-unstable into integration 2011-11-10 20:42:53 -05:00
Chris Mason
2115133f8b Btrfs: tweak the delayed inode reservations again
Josef sent along an incremental to the inode reservation
code to make sure we try and fall back to directly updating
the inode item if things go horribly wrong.

This reworks that patch slightly, adding a fallback function
that will always try to update the inode item directly without
going through the delayed_inode code.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10 20:39:08 -05:00
Trond Myklebust
62e4a76987 NFS: Revert pnfs ugliness from the generic NFS read code path
pNFS-specific code belongs in the pnfs layer. It should not be
hijacking generic NFS read or write code paths.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-10 14:50:26 -05:00
Linus Torvalds
5e442a493f Revert "proc: fix races against execve() of /proc/PID/fd**"
This reverts commit aa6afca5bc.

It escalates of some of the google-chrome SELinux problems with ptrace
("Check failed: pid_ > 0.  Did not find zygote process"), and Andrew
says that it is also causing mystery lockdep reports.

Reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Requested-by: James Morris <jmorris@namei.org>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-09 18:16:00 -05:00
Ilya Dryomov
04d21a244f Btrfs: rework error handling in btrfs_mount()
Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer
dereference on error paths, also we would leave all devices busy and
leak fs_info with all sub-structures on error when trying to mount an
already mounted fs to a different directory.

Fix this by doing all allocations before trying to open any of the
devices, adjust error path for mount-already-mounted-fs case.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-11-09 22:53:39 +02:00
Ilya Dryomov
586e46e281 Btrfs: close devices on all error paths in open_ctree()
Fix a bug introduced by 7e662854 where we would leave devices busy on
certain error paths in open_ctree().  fs_info is guaranteed to be
non-NULL now so it's safe to dereference it on all error paths.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-11-09 22:53:38 +02:00
Ilya Dryomov
4d34b27895 Btrfs: avoid null dereference and leaks when bailing from open_ctree()
Fix bugs introduced by 6c41761f.  Firstly, after failing to allocate any
of the tree roots (first 'goto fail' in open_ctree()) we would
dereference a NULL fs_info pointer in free_fs_info().  Secondly, after
failures from init_srcu_struct(), setup_bdi() and new_inode() we would
leak all earlier allocated roots: fs_info fields haven't been
initialized yet so free_fs_info() is rendered useless.

Fix this by initializing fs_info pointer and fs_info fields before any
allocations happen.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-11-09 22:53:38 +02:00
Ilya Dryomov
f23c8af8ca Btrfs: fix subvol_name leak on error in btrfs_mount()
btrfs_parse_early_options() can fail due to error while scanning devices
(-o device= option), but still strdup() subvol_name string:

mount -o subvol=SUBV,device=BAD_DEVICE <dev> <mnt>

So free subvol_name string on error.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-11-09 22:53:38 +02:00
Ilya Dryomov
a90e8b6fb8 Btrfs: fix memory leak in btrfs_parse_early_options()
Don't leak subvol_name string in case multiple subvol= options are
given.  "The lastest option is effective" behavior (consistent with
subvolid= and subvolrootid= options) is preserved.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-11-09 22:53:38 +02:00
Steven Whitehouse
79c4c379c8 GFS2: f_ra is always valid in dir readahead function
As a result, we don't need to test it each time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2011-11-09 13:46:06 +00:00
Steven Whitehouse
114b80ce2c GFS2: Fix very unlikley memory leak in ACL xattr code
This was spotted by automated code analysis. In case reading
an ACL xattr failed (only likely to happen if there is an I/O
error for example, and even then only with unstuffed xattrs,
so pretty difficult to trigger) a small amount of memory could
potentially be leaked.

This patch adds a kfree to the error path, and also removes a
test which is no longer required (gfs2_ea_get_copy always
returns either a negative error, or a length)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-09 12:54:43 +00:00
Eryu Guan
63894ab9f6 ext3: call ext3_mark_recovery_complete() when recovery is really needed
Call ext3_mark_recovery_complete() in ext3_fill_super() only if
needs_recovery is non-zero.

Besides that, print out "recovery complete" message after calling
ext3_mark_recovery_complete().

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-09 12:23:17 +01:00
Josef Bacik
7fd2ae21a4 Btrfs: fix our reservations for updating an inode when completing io
People have been reporting ENOSPC crashes in finish_ordered_io.  This is because
we try to steal from the delalloc block rsv to satisfy a reservation to update
the inode.  The problem with this is we don't explicitly save space for updating
the inode when doing delalloc.  This is kind of a problem and we've gotten away
with this because way back when we just stole from the delalloc reserve without
any questions, and this worked out fine because generally speaking the leaf had
been modified either by the mtime update when we did the original write or
because we just updated the leaf when we inserted the file extent item, only on
rare occasions had the leaf not actually been modified, and that was still ok
because we'd just use a block or two out of the over-reservation that is
delalloc.

Then came the delayed inode stuff.  This is amazing, except it wants a full
reservation for updating the inode since it may do it at some point down the
road after we've written the blocks and we have to recow everything again.  This
worked out because the delayed inode stuff just stole from the global reserve,
that is until recently when I changed that because it caused other problems.

So here we are, we're doing everything right and being screwed for it.  So take
an extra reservation for the inode at delalloc reservation time and carry it
through the life of the delalloc reservation.  If we need it we can steal it in
the delayed inode stuff.  If we have already stolen it try and do a normal
metadata reservation.  If that fails try to steal from the delalloc reservation.
If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
solve this and in the meantime we'll steal from the global reserve.

With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
any problems.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-08 15:47:34 -05:00
Chris Mason
917c16b2b6 Btrfs: fix oops on NULL trans handle in btrfs_truncate
If we fail to reserve space in the transaction during truncate, we can
error out with a NULL trans handle.  The cleanup code needs an extra
check to make sure we aren't trying to use the bad handle.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-08 14:49:59 -05:00
Christoph Hellwig
810627d9a6 xfs: fix force shutdown handling in xfs_end_io
Ensure ioend->io_error gets propagated back to e.g. AIO completions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-11-08 10:48:23 -06:00
Christoph Hellwig
272e42b215 xfs: constify xfs_item_ops
The log item ops aren't nessecarily the biggest exploit vector, but marking
them const is easy enough.  Also remove the unused xfs_item_ops_t typedef
while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-11-08 10:48:23 -06:00
Carlos Maiolino
b52a360b2a xfs: Fix possible memory corruption in xfs_readlink
Fixes a possible memory corruption when the link is larger than
MAXPATHLEN and XFS_DEBUG is not enabled. This also remove the
S_ISLNK assert, since the inode mode is checked previously in
xfs_readlink_by_handle() and via VFS.

Updated to address concerns raised by Ben Hutchings about the loose
attention paid to 32- vs 64-bit values, and the lack of handling a
potentially negative pathlen value:
 - Changed type of "pathlen" to be xfs_fsize_t, to match that of
   ip->i_d.di_size
 - Added checking for a negative pathlen to the too-long pathlen
   test, and generalized the message that gets reported in that case
   to reflect the change
As a result, if a negative pathlen were encountered, this function
would return EFSCORRUPTED (and would fail an assertion for a debug
build)--just as would a too-long pathlen.

Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-11-08 10:48:23 -06:00
J. Bruce Fields
16bfdaafa2 nfsd4: share open and lock owner hash tables
Now that they're used in the same way, it's a little simpler to put open
and lock owners in the same hash table, and I can't see a reason not to.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-08 11:28:45 -05:00
Steven Whitehouse
87654896ca GFS2: More automated code analysis fixes
A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-08 14:04:20 +00:00
Bob Peterson
dfe4d34b39 GFS2: Add readahead to sequential directory traversal
This patch adds read-ahead capability to GFS2's
directory hash table management.  It greatly improves
performance for some directory operations.  For example:
In one of my file systems that has 1000 directories, each
of which has 1000 files, time to execute a recursive
ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced
from 2m2.814s on a stock kernel to 0m45.938s.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-08 09:52:12 +00:00
Steven Whitehouse
20ed0535d3 GFS2: Fix up REQ flags
Christoph has split up REQ_PRIO from REQ_META. That means that
we can drop REQ_PRIO from places where is it not needed. I'm
not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO
makes any kind of sense, anyway.

In addition, I've added REQ_META to one place in the code where
it was missing. REQ_PRIO has been left for read/writes triggered
by glock acquisition and writeback only. We can adjust it again
if required, but these are the most important points from a
performance perspective.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
2011-11-08 09:51:53 +00:00
J. Bruce Fields
06f1f864d4 nfsd4: hash lockowners to simplify RELEASE_LOCKOWNER
Hash lockowners on just the owner string rather than on (owner, inode).
This makes the owner-string lookup needed for RELEASE_LOCKOWNER simpler
(currently it's doing at a linear search through the entire hash
table!).  That may come at the expense of making (owner, inode) lookups
more expensive if a client reuses the same lockowner across multiple
files.  We might add a separate lookup for that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:48 -05:00
Bryan Schumaker
c7e8472cf8 NFSD: Remove unnecessary whitespace
The close parenthesis was hard to find with it spaced so far over.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[bfields@redhat.com: get all these lines under 80 chars while we're here]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:48 -05:00
Bryan Schumaker
7208339607 NFSD: Call nfsd4_init_slabs() from init_nfsd()
init_nfsd() was calling free_slabs() during cleanup code, but the call
to init_slabs() was hidden in nfsd4_state_init().  This could be
confusing to people unfamiliar with the code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:47 -05:00
Bryan Schumaker
65178db42a NFSD: Added fault injection
Fault injection on the NFS server makes it easier to test the client's
state manager and recovery threads.  Simulating errors on the server is
easier than finding the right conditions that cause them naturally.

This patch uses debugfs to add a simple framework for fault injection to
the server.  This framework is a config option, and can be enabled
through CONFIG_NFSD_FAULT_INJECTION.  Assuming you have debugfs mounted
to /sys/debug, a set of files will be created in /sys/debug/nfsd/.
Writing to any of these files will cause the corresponding action and
write a log entry to dmesg.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:47 -05:00
J. Bruce Fields
64a284d07c nfsd4: maintain one seqid stream per (lockowner, file)
Instead of creating a new lockowner and stateid for every
open_to_lockowner call, reuse the existing lockowner if it exists.

Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:47 -05:00
J. Bruce Fields
684e563858 nfsd4: cleanup lock clientid handling in sessions case
I'd rather the "ignore clientid in sessions case" rule be enforced in
just one place.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:47 -05:00
J. Bruce Fields
b93d87c198 nfsd4: fix lockowner matching
Lockowners are looked up by file as well as by owner, but we were
forgetting to do a comparison on the file.  This could cause an
incorrect result from lockt.

(Note looking up the inode from the lockowner is pretty awkward here.
The data structures need fixing.)

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:47 -05:00
Al Viro
a3fbbde70a VFS: we need to set LOOKUP_JUMPED on mountpoint crossing
Mountpoint crossing is similar to following procfs symlinks - we do
not get ->d_revalidate() called for dentry we have arrived at, with
unpleasant consequences for NFS4.

Simple way to reproduce the problem in mainline:

    cat >/tmp/a.c <<'EOF'
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdio.h>
    main()
    {
            struct flock fl = {.l_type = F_RDLCK, .l_whence = SEEK_SET, .l_len = 1};
            if (fcntl(0, F_SETLK, &fl))
                    perror("setlk");
    }
    EOF
    cc /tmp/a.c -o /tmp/test

then on nfs4:

    mount --bind file1 file2
    /tmp/test < file1		# ok
    /tmp/test < file2		# spews "setlk: No locks available"...

What happens is the missing call of ->d_revalidate() after mountpoint
crossing and that's where NFS4 would issue OPEN request to server.

The fix is simple - treat mountpoint crossing the same way we deal with
following procfs-style symlinks.  I.e.  set LOOKUP_JUMPED...

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 14:58:06 -08:00
Trond Myklebust
a6f498a891 NFS: Fix a regression in the referral code
Fix a regression that was introduced by commit
0c2e53f11a (NFS: Remove the unused
"lookupfh()" version of nfs4_proc_lookup()).

In the case where the lookup gets an NFS4ERR_MOVED, we want to return
the result of nfs4_get_referral(). Instead, that value is getting
clobbered by the call to nfs4_handle_exception()...

Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-07 16:20:00 -05:00
slyich@gmail.com
45ea6095c8 btrfs: fix double-free 'tree_root' in 'btrfs_mount()'
On error path 'tree_root' is treed in 'free_fs_info()'.
No need to free it explicitely. Noticed by SLUB in debug mode:

Complete reproducer under usermode linux (discovered on real
machine):

    bdev=/dev/ubda
    btr_root=/btr
    /mkfs.btrfs $bdev
    mount $bdev $btr_root
    mkdir $btr_root/subvols/
    cd $btr_root/subvols/
    /btrfs su cr foo
    /btrfs su cr bar
    mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar
    umount $btr_root/subvols/bar

which gives

device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda
=============================================================================
BUG kmalloc-2048: Object already free
-----------------------------------------------------------------------------

INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277
INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277
INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081
INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968
...
Call Trace:
70b31948:  [<6008c522>] print_trailer+0xe2/0x130
70b31978:  [<6008c5aa>] object_err+0x3a/0x50
70b319a8:  [<6008e242>] free_debug_processing+0x142/0x2a0
70b319e0:  [<600ebf6f>] btrfs_mount+0x55f/0x7f0
70b319f8:  [<6008e5c1>] __slab_free+0x221/0x2d0

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Arne Jansen <sensille@gmx.net>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-07 16:08:01 -05:00
Al Viro
50e696308c vfs: d_invalidate() should leave mountpoints alone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 10:54:10 -08:00
Linus Torvalds
ff4d7fa8c3 Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Cleanup byte-range locking code style
  CIFS: Simplify setlk error handling for mandatory locking
2011-11-07 09:56:22 -08:00
Linus Torvalds
e0d65113a7 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (226 commits)
  mtd: tests: annotate as DANGEROUS in Kconfig
  mtd: tests: don't use mtd0 as a default
  mtd: clean up usage of MTD_DOCPROBE_ADDRESS
  jffs2: add compr=lzo and compr=zlib options
  jffs2: implement mount option parsing and compression overriding
  mtd: nand: initialize ops.mode
  mtd: provide an alias for the redboot module name
  mtd: m25p80: don't probe device which has status of 'disabled'
  mtd: nand_h1900 never worked
  mtd: Add DiskOnChip G3 support
  mtd: m25p80: add EON flash EN25Q32B into spi flash id table
  mtd: mark block device queue as non-rotational
  mtd: r852: make r852_pm_ops static
  mtd: m25p80: add support for at25df321a spi data flash
  mtd: mxc_nand: preset_v1_v2: unlock all NAND flash blocks
  mtd: nand: switch `check_pattern()' to standard `memcmp()'
  mtd: nand: invalidate cache on unaligned reads
  mtd: nand: do not scan bad blocks with NAND_BBT_NO_OOB set
  mtd: nand: wait to set BBT version
  mtd: nand: scrub BBT on ECC errors
  ...

Fix up trivial conflicts:
 - arch/arm/mach-at91/board-usb-a9260.c
	Merged into board-usb-a926x.c
 - drivers/mtd/maps/lantiq-flash.c
	add_mtd_partitions -> mtd_device_register vs changed to use
	mtd_device_parse_register.
2011-11-07 09:11:16 -08:00
Linus Torvalds
cf5e15fbd7 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix the dark space calculation
  UBIFS: introduce a helper to dump scanning info
2011-11-07 08:52:19 -08:00