Commit Graph

48450 Commits

Author SHA1 Message Date
Trond Myklebust
b20135d0b2 NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
If the layout segment is invalid, then we should not be adding more
write requests to the commit list. Instead, those writes should be
replayed after requesting a new layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31 15:55:35 -05:00
Jaegeuk Kim
1f6fa26199 f2fs: remove f2fs_bug_on in terms of max_depth
There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 11:42:46 -08:00
Trond Myklebust
af7cf05793 NFS: Allow multiple commit requests in flight per file
Allow synchronous RPC calls to wait for pending RPC calls to finish,
but also allow asynchronous ones to just fire off another commit.

With this patch, the xfstests generic/074 test completes in 226s
instead of 242s

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31 13:53:48 -05:00
Trond Myklebust
dc602dd706 NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
The flexfiles layout in particular, seems to want to poke around in the
O_DIRECT flags when retransmitting.
This patch sets up an interface to allow it to call back into O_DIRECT
to handle retransmission correctly. It also fixes a potential bug whereby
we could change the behaviour of O_DIRECT if an error is already pending.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31 13:50:42 -05:00
Filipe Manana
9269d12b2d Btrfs: fix number of transaction units required to create symlink
We weren't accounting for the insertion of an inline extent item for the
symlink inode nor that we need to update the parent inode item (through
the call to btrfs_add_nondir()). So fix this by including two more
transaction units.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-31 18:18:40 +00:00
Filipe Manana
d50866d00f Btrfs: don't leave dangling dentry if symlink creation failed
When we are creating a symlink we might fail with an error after we
created its inode and added the corresponding directory indexes to its
parent inode. In this case we end up never removing the directory indexes
because the inode eviction handler, called for our symlink inode on the
final iput(), only removes items associated with the symlink inode and
not with the parent inode.

Example:

  $ mkfs.btrfs -f /dev/sdi
  $ mount /dev/sdi /mnt
  $ touch /mnt/foo
  $ ln -s /mnt/foo /mnt/bar
  ln: failed to create symbolic link ‘bar’: Cannot allocate memory
  $ umount /mnt
  $ btrfsck /dev/sdi
  Checking filesystem on /dev/sdi
  UUID: d5acb5ba-31bd-42da-b456-89dca2e716e1
  checking extents
  checking free space cache
  checking fs roots
  root 5 inode 258 errors 2001, no inode item, link count wrong
	unresolved ref dir 256 index 3 namelen 3 name bar filetype 7 errors 4, no inode ref
  found 131073 bytes used err is 1
  total csum bytes: 0
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124305
  file data blocks allocated: 262144
   referenced 262144
  btrfs-progs v4.2.3

So fix this by adding the directory index entries as the very last
step of symlink creation.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-31 18:10:56 +00:00
Filipe Manana
a879719b8c Btrfs: send, don't BUG_ON() when an empty symlink is found
When a symlink is successfully created it always has an inline extent
containing the source path. However if an error happens when creating
the symlink, we can leave in the subvolume's tree a symlink inode without
any such inline extent item - this happens if after btrfs_symlink() calls
btrfs_end_transaction() and before it calls the inode eviction handler
(through the final iput() call), the transaction gets committed and a
crash happens before the eviction handler gets called, or if a snapshot
of the subvolume is made before the eviction handler gets called. Sadly
we can't just avoid this by making btrfs_symlink() call
btrfs_end_transaction() after it calls the eviction handler, because the
later can commit the current transaction before it removes any items from
the subvolume tree (if it encounters ENOSPC errors while reserving space
for removing all the items).

So make send fail more gracefully, with an -EIO error, and print a
message to dmesg/syslog informing that there's an empty symlink inode,
so that the user can delete the empty symlink or do something else
about it.

Reported-by: Stephen R. van den Berg <srb@cuci.nl>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-31 18:08:20 +00:00
Jaegeuk Kim
732d56489f f2fs: fix f2fs_ioc_abort_volatile_write
There are two rules to handle aborting volatile or atomic writes.

1. drop atomic writes
 - we don't need to keep any stale db data.

2. write journal data
 - we should keep the journal data with fsync for db recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:53:25 -08:00
Chao Yu
4e0d836d5f f2fs: fix to skip recovering dot dentries in a readonly fs
If filesystem is readonly, leave user message info instead of recovering
inline dot inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:51:28 -08:00
Jaegeuk Kim
ed3d12561a f2fs: load largest extent all the time
Otherwise, we can get mismatched largest extent information.

One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:20 -08:00
Jaegeuk Kim
819d9153d4 f2fs: use i_size_read to get i_size
We need to use i_size_read() to get inode->i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:19 -08:00
Jaegeuk Kim
8dc0d6a11e f2fs: early check broken symlink length in the encrypted case
If link is broken, its len is zero, and we don't need to move forward.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:19 -08:00
Chao Yu
e96248bb45 f2fs: clean up f2fs_ioc_write_checkpoint
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:18 -08:00
Yunlei He
179448bfe4 f2fs: add a max block check for get_data_block_bmap
This patch adds a max block check for get_data_block_bmap.

Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:17 -08:00
Fan Li
9a950d52b7 f2fs: fix bugs and simplify codes of f2fs_fiemap
fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
   fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
   that ends at isize, it will also return the extent beyond isize,
   which is outside the range.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:16 -08:00
Chao Yu
6d5a1495ee f2fs: let user being aware of IO error
Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.

This sould be avoided, so this patch reports such kind of error to user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:15 -08:00
Chao Yu
d53841740f f2fs: add missing f2fs_balance_fs in __recover_dot_dentries
__recover_do_dentries will try to grab free space in storage, so fix to
add missing f2fs_balance_fs here.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:14 -08:00
Jaegeuk Kim
06d6f22639 f2fs: declare static function
The __f2fs_commit_super is static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:14 -08:00
Jaegeuk Kim
b4d07a3e1a f2fs: avoid f2fs_lock_op in f2fs_write_begin
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:13 -08:00
Jaegeuk Kim
4aa69d5667 f2fs: return early when trying to read null nid
If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:12 -08:00
Jaegeuk Kim
2aadac085c f2fs: introduce prepare_write_begin to clean up
This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:11 -08:00
Chao Yu
fba48a8b14 f2fs: don't convert inline inode when inline_data option is disable
If inline_data option is disable, when truncating an inline inode with
size which is not exceed maxinum inline size, we should not convert
inline inode to regular one to avoid the overhead of synchronizing
conversion.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:10 -08:00
Chao Yu
c34f42e2cb f2fs: report error of do_checkpoint
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Jaegeuk Kim
2a34076070 f2fs: call f2fs_balance_fs only when node was changed
If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Chao Yu
3104af35eb f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks
Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:08 -08:00
Jaegeuk Kim
93bae099ea f2fs: record node block allocation in dnode_of_data
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:07 -08:00
Jaegeuk Kim
00623e6bcf f2fs: avoid unnecessary f2fs_gc for dir operations
The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:06 -08:00
Jaegeuk Kim
b9d777b85f f2fs: check inline_data flag at converting time
We can check inode's inline_data flag  when calling to convert it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:05 -08:00
Jaegeuk Kim
74fd8d9927 f2fs: speed up shrinking extent tree entries
If there is no candidates for shrinking slab entries, we don't need to traverse
any trees at all.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:13:00 -08:00
Mike Marshall
f987f4c28a Orangefs: don't trigger copy_attributes_to_inode from d_revalidate.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-12-30 13:04:28 -05:00
Al Viro
fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
Filipe Manana
2bc0bb5fe7 Btrfs: fix race between free space endio workers and space cache writeout
While running a stress test I ran into the following trace/transaction
abort:

[471626.672243] ------------[ cut here ]------------
[471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
[471626.675492] BTRFS: Transaction aborted (error -2)
[471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix
[471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[471626.691901]  0000000000000000 ffff880016037cf0 ffffffff812566f4 ffff880016037d38
[471626.695009]  ffff880016037d28 ffffffff8104d0a6 ffffffffa040c84e 00000000fffffffe
[471626.697490]  ffff88011fe855f8 ffff88000c484cb0 ffff88000d195000 ffff880016037d90
[471626.699201] Call Trace:
[471626.699804]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[471626.701049]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[471626.702542]  [<ffffffffa040c84e>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.704326]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
[471626.705636]  [<ffffffffa0403717>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs]
[471626.707048]  [<ffffffffa040c84e>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.708616]  [<ffffffffa048a50a>] commit_cowonly_roots+0x1d7/0x25a [btrfs]
[471626.709950]  [<ffffffffa041e34a>] btrfs_commit_transaction+0x4c4/0x991 [btrfs]
[471626.711286]  [<ffffffff81081c61>] ? signal_pending_state+0x31/0x31
[471626.712611]  [<ffffffffa03f6df4>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[471626.715610]  [<ffffffff811962a2>] ? SyS_tee+0x226/0x226
[471626.716718]  [<ffffffff811962c2>] sync_fs_one_sb+0x20/0x22
[471626.717672]  [<ffffffff8116fc01>] iterate_supers+0x75/0xc2
[471626.718800]  [<ffffffff8119669a>] sys_sync+0x52/0x80
[471626.719990]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[471626.721835] ---[ end trace baf57f43d76693f4 ]---
[471626.722954] BTRFS: error (device sdc) in btrfs_write_dirty_block_groups:3740: errno=-2 No such entry

This is a very rare situation and it happened due to a race between a free
space endio worker and writing the space caches for dirty block groups at
a transaction's commit critical section. The steps leading to this are:

1) A task calls btrfs_commit_transaction() and starts the writeout of the
   space caches for all currently dirty block groups (i.e. it calls
   btrfs_start_dirty_block_groups());

2) The previous step starts writeback for space caches;

3) When the writeback finishes it queues jobs for free space endio work
   queue (fs_info->endio_freespace_worker) that execute
   btrfs_finish_ordered_io();

4) The task committing the transaction sets the transaction's state
   to TRANS_STATE_COMMIT_DOING and shortly after calls
   btrfs_write_dirty_block_groups();

5) A free space endio job joins the transaction, through
   btrfs_join_transaction_nolock(), and updates a free space inode item
   in the root tree through btrfs_update_inode_fallback();

6) Updating the free space inode item resulted in COWing one or more
   nodes/leaves of the root tree, and that resulted in creating a new
   metadata block group, which gets added to the transaction's list
   of dirty block groups (this is a very rare case);

7) The free space endio job has not released yet its transaction handle
   at this point, so the new metadata block group was not yet fully
   created (didn't go through btrfs_create_pending_block_groups() yet);

8) The transaction commit task sees the new metadata block group in
   the transaction's list of dirty block groups and processes it.
   When it attempts to update the block group's block group item in
   the extent tree, through write_one_cache_group(), it isn't able
   to find it and aborts the transaction with error -ENOENT - this
   is because the free space endio job hasn't yet released its
   transaction handle (which calls btrfs_create_pending_block_groups())
   and therefore the block group item was not yet added to the extent
   tree.

Fix this waiting for free space endio jobs if we fail to find a block
group item in the extent tree and then retry once updating the block
group item.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-30 16:08:13 +00:00
Trond Myklebust
86fb449b07 pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out
copy+paste has played cruel tricks on a nested loop.

Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-30 10:57:01 -05:00
Chris Mason
511711af91 btrfs: don't run delayed references while we are creating the free space tree
This is a short term solution to make sure btrfs_run_delayed_refs()
doesn't change the extent tree while we are scanning it to create the
free space tree.

Longer term we need to synchronize scanning the block groups one by one,
similar to what happens during a balance.

Signed-off-by: Chris Mason <clm@fb.com>
2015-12-30 07:52:35 -08:00
Chris Mason
b4570aa994 btrfs: fix compiling with CONFIG_BTRFS_DEBUG enabled.
Merging in the free space tree deleted a variable needed when
CONFIG_BTRFS_DEBUG=y

Signed-off-by: Chris Mason <clm@fb.com>
2015-12-30 07:37:26 -08:00
Trond Myklebust
ade14a7df7 NFS: Fix attribute cache revalidation
If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.

Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-30 10:18:26 -05:00
xuejiufei
cc28d6d80f ocfs2/dlm: clear migration_pending when migration target goes down
We have found a BUG on res->migration_pending when migrating lock
resources.  The situation is as follows.

dlm_mark_lockres_migration
  res->migration_pending = 1;
  __dlm_lockres_reserve_ast
  dlm_lockres_release_ast returns with res->migration_pending remains
      because other threads reserve asts
  wait dlm_migration_can_proceed returns 1
  >>>>>>> o2hb found that target goes down and remove target
          from domain_map
  dlm_migration_can_proceed returns 1
  dlm_mark_lockres_migrating returns -ESHOTDOWN with
      res->migration_pending still remains.

When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
with res->migration_pending.  So clear migration_pending when target is
down.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29 17:45:49 -08:00
Junxiao Bi
b5a8bc338e ocfs2: fix flock panic issue
Commit 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
move flock/posix lock indentify code to locks_lock_inode_wait(), but
missed to set fl_flags to FL_FLOCK which caused the following kernel
panic on 4.4.0_rc5.

  kernel BUG at fs/locks.c:1895!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
  CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G           O    4.4.0-rc5-next-20151217 #1
  Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
  task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
  RIP: locks_lock_inode_wait+0x2e/0x160
  Call Trace:
    ocfs2_do_flock+0x91/0x160 [ocfs2]
    ocfs2_flock+0x76/0xd0 [ocfs2]
    SyS_flock+0x10f/0x1a0
    entry_SYSCALL_64_fastpath+0x12/0x71
  Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
  RIP  locks_lock_inode_wait+0x2e/0x160
   RSP <ffff880028b5bce8>
  ---[ end trace dfca74ec9b5b274c ]---

Fixes: 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29 17:45:49 -08:00
Joseph Qi
5c9ee4cbf2 ocfs2: fix BUG when calculate new backup super
When resizing, it firstly extends the last gd.  Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.

But it currently doesn't consider the situation that the backup super is
already done.  And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:

    BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);

So check whether the backup super is done and then do the updates.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29 17:45:49 -08:00
Al Viro
cd3417c8fc kill free_page_put_link()
all callers are better off with kfree_put_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-29 16:03:53 -05:00
Trond Myklebust
5c5fc09a11 NFS: Ensure we revalidate attributes before using execute_ok()
Donald Buczek reports that NFS clients can also report incorrect
results for access() due to lack of revalidation of attributes
before calling execute_ok().
Looking closely, it seems chdir() is afflicted with the same problem.

Fix is to ensure we call nfs_revalidate_inode_rcu() or
nfs_revalidate_inode() as appropriate before deciding to trust
execute_ok().

Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Link: http://lkml.kernel.org/r/1451331530-3748-1-git-send-email-buczek@molgen.mpg.de
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 19:37:05 -05:00
Trond Myklebust
58baac0ac7 Merge branch 'flexfiles'
* flexfiles:
  pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early
  pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write
  pNFS/flexfiles: Fix a statistics gathering imbalance
  pNFS/flexfiles: Don't mark the entire layout as failed, when returning it
  pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET
  pnfs/flexfiles: count io stat in rpc_count_stats callback
  pnfs/flexfiles: do not mark delay-like status as DS failure
  NFS41: map NFS4ERR_LAYOUTUNAVAILABLE to ENODATA
  nfs: only remove page from mapping if launder_page fails
  nfs: handle request add failure properly
  nfs: centralize pgio error cleanup
  nfs: clean up rest of reqs when failing to add one
  NFS41: pop some layoutget errors to application
  pNFS/flexfiles: Support server-supplied layoutstats sampling period
2015-12-28 14:49:41 -05:00
Trond Myklebust
e07db907eb NFSv4: List stateid information in the callback tracepoints
The stateid is extremely valuable when debugging.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:05 -05:00
Trond Myklebust
e0d9243048 NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
If the client is promising to return the layout ASAP, then there is no
need to return DELAY and have the server retry. Instead default to the
normal procedure described in RFC5661.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:05 -05:00
Trond Myklebust
41c9127d6d NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
The RFC requires us to check if the server is recalling a stateid that we
haven't yet received. If so, tell it to wait.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:04 -05:00
Trond Myklebust
fc7ff36747 pNFS: If we have to delay the layout callback, mark the layout for return
If the client needs to delay the layout callback, then speed up the recall
process by marking the remaining layout segments to be actively returned
by the client.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:04 -05:00
Trond Myklebust
0654cc726f NFSv4.1/pNFS: Add a helper to mark the layout as returned
This ensures that we don't reuse the stateid if a layout return or
implied layout return means that we've returned all layout segments

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:04 -05:00
Trond Myklebust
ab7d763e47 pNFS: Ensure nfs4_layoutget_prepare returns the correct error
If we're unable to perform the layoutget due to an invalid open stateid
or a bulk recall, ensure that we return the error so that the caller
can decide on an appropriate action.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:03 -05:00
Trond Myklebust
4d0ac22109 pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early
Currently, we will only record the layoutstats correctly if the
RPC call successfully obtains a slot. If we exit before that
happens, then we may find ourselves starting the busy timer through
the call in ff_layout_(read|write)_prepare_layoutstats, but never stopping it.

The same thing happens if we're doing DA-DS.

The fix is to ensure that we catch these cases in the rpc_release()
callback.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:41 -05:00
Trond Myklebust
37e9ed22b1 pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:41 -05:00