Commit Graph

48450 Commits

Author SHA1 Message Date
Chao Yu
9cd81ce3c2 f2fs: disallow switch extent_cache option dynamically
Swith extent_cache option dynamically when remount may casue consistency
issue between extent cache and dnode page. Fix in this patch to avoid
that condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
46c9e1413f f2fs: use correct flag in f2fs_map_blocks()
We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc88 ("f2fs: fix
incorrect mapping for bmap"), but forget to use this flag in the right
place, fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
f9811703fe f2fs: fix to handle io error in ->direct_IO
Here is a oops reported as following message when testing generic/019 of
xfstest:

 ------------[ cut here ]------------
 kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_def
 CPU: 2 PID: 25441 Comm: fio Tainted: G           O    4.3.0-rc1+ #6
 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
 RIP: 0010:[<ffffffffa0784981>]  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
 RSP: 0018:ffff8803fd61f918  EFLAGS: 00010246
 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
 RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
 FS:  00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
 Stack:
  000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
  0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
  ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
 Call Trace:
  [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
  [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
  [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
  [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
  [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
  [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
  [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
  [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
  [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
  [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
  [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
  [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
  [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
  [<ffffffff8120b913>] do_io_submit+0x373/0x630
  [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
  [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
  [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
 Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
 RIP  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
  RSP <ffff8803fd61f918>
 ---[ end trace 2e577d7f711ddb86 ]---

The reason is that: in the test of generic/019, we will trigger a manmade
IO error in block layer through debugfs, after that, prefree segment will
no longer be freed, because we always skip doing gc or checkpoint when
there occurs an IO error.

Meanwhile fio with aio engine generated a large number of direct IOs,
which continue allocating spaces in free segment until we run out of them,
eventually, results in panic in new_curseg as no more free segment was
found.

So, this patch changes to return EIO in direct_IO for this condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
ea58711e88 f2fs: do in batches truncation in truncate_hole
truncate_data_blocks_range can do in batches truncation which makes all
changes in dnode page content, dnode page status, extent cache, block
count updating together.

But previously, truncate_hole() always truncates one block in dnode page
at a time by invoking truncate_data_blocks_range(,1), which make thing
slow.

This patch changes truncate_hole() to do in batches truncation for all
target blocks in one direct node inside truncate_data_blocks_range, which
can make our punch hole operation in ->fallocate more efficent.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Fan Li
4d1fa815f2 f2fs: optimize code of f2fs_update_extent_tree_range
Fix 2 potential problems:
1. when largest extent needs to be invalidated, it will be reset in
   __drop_largest_extent, which makes __is_extent_same after always
   return false, and largest extent unchanged. Now we update it properly.

2. when extent is split and the latter part remains in tree, next_en
   should be the latter part instead of next extent of original extent.
   It will cause merge failure if there is in-place update, although
   there is not, I think this fix will still makes codes less ambiguous.

This patch also simplifies codes of invalidating extents, and optimizes the
procedues that split extent into two.
There are a few modifications after last patch:
1. prev_en now is updated properly.
2. more codes and branches are simplified.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Fan Li
41a099de3a f2fs: drop largest extent by range
now we update extent by range, fofs may not be on the largest
extent if the new extent overlaps with it. so add a new function
to drop largest extent properly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
a7230d16d5 f2fs: check end_io for metapages before making next checkpoint blocks
This patch avoids to produce new checkpoint blocks before the previous meta
pages were written completely.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
569cf1876a f2fs crypto: allocate buffer for decrypting filename
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.

kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
 (kunmap_high+0xb0/0xb8) from [<c0114534>] (__kunmap_atomic+0xa0/0xa4)
 (__kunmap_atomic+0xa0/0xa4) from [<c035f028>] (blkcipher_walk_done+0x128/0x1ec)
 (blkcipher_walk_done+0x128/0x1ec) from [<c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
 (crypto_cbc_decrypt+0xc0/0x170) from [<c0367148>] (crypto_cts_decrypt+0xc0/0x114)
 (crypto_cts_decrypt+0xc0/0x114) from [<c035ea98>] (async_decrypt+0x40/0x48)
 (async_decrypt+0x40/0x48) from [<c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
 (f2fs_fname_disk_to_usr+0x124/0x304) from [<c03056fc>] (f2fs_fill_dentries+0xac/0x188)
 (f2fs_fill_dentries+0xac/0x188) from [<c03059c8>] (f2fs_readdir+0x1f0/0x300)
 (f2fs_readdir+0x1f0/0x300) from [<c0218054>] (vfs_readdir+0x90/0xb4)
 (vfs_readdir+0x90/0xb4) from [<c0218418>] (SyS_getdents64+0x64/0xcc)
 (SyS_getdents64+0x64/0xcc) from [<c0105ba0>] (ret_fast_syscall+0x0/0x30)

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Chao Yu
973163fc0c f2fs: reorganize f2fs_map_blocks
In this patch, we try to reorganize f2fs_map_blocks to make block mapping
flow more clear by using following structure:

/* check status of mapping */

if (unmapped) {
	/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */

	if (create) {
		/* write path, handle dio write case here */
		alloc_and_map;
	} else {
		/*
		 * handle read cases from all call paths:
		 *     1. generic read;
		 *     2. dio read;
		 *     3. fiemap;
		 *     4. bmap
		 */
	}
}

/* map buffer_header */

Besides, this patch handles the missing case correctly for dio write:
When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
not allocate blocks correctly for preallocated blocks, but returning with
an unmapped buffer head, which will result in failure of dio write.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
514053e454 f2fs: declare f2fs_update_extent_tree_range as static
This function should be static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
9edcdabf36 f2fs: fix overflow of size calculation
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:

pgoff_t index =  0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000

So we should cast index with 64-bits type to avoid this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
100136acfb f2fs: fix incorrect searching position when shrinking extent cache
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.

In step 1, if we haven't shrunk enough number of objects, we will try
step 2, but before that we didn't update the searching position which
may point to last inode index in global extent tree, result in failing
to shrink objects by traversing the all inodes' extent tree.

In this patch, we reset searching position to beginning of global extent
tree for fixing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
c998012b0b f2fs: verify file type early in f2fs_fallocate
This patch changes to verify file type early in f2fs_fallocate for
cleanup, meanwhile this also fixes to add missing verification for
expand_inode_data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Jaegeuk Kim
c5cd29d21c f2fs: no need to lock for update_inode_page all the time
As comment says, we don't need to call f2fs_lock_op in write_inode to prevent
from producing dirty node pages all the time.
That happens only when there is not enough free sections and we can avoid that
by calling balance_fs in prior to that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Jaegeuk Kim
25b93346a6 f2fs: cover number of dirty node pages under node_write lock
This number is referenced by checkpoint under node_write lock.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Nicholas Krause
538e17e7e9 f2fs: fix incorrect return statement in the function f2fs_ioc_release_volatile_write
This fixes the incorrect return statement at the end of the function
f2fs_ioc_release_volatile_write's body for returning zero as this is
incorrect due to the function call before this return statement to
the function punch_hole being able to fail and we should return this
function's return fail directly in order to signal to callers of the
function f2fs_ioc_release_volatile if a failure arises with this call
to punch_hole fails.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Chao Yu
744288c721 f2fs: trace in batches extent info update
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace in batches extent info updates.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Christoph Hellwig
8c3ad9cb73 nfsd/blocklayout: accept any minlength
Recent Linux clients have started to send GETLAYOUT requests with
minlength less than blocksize.

Servers aren't really allowed to impose this kind of restriction on
layouts; see RFC 5661 section 18.43.3 for details.

This has been observed to cause indefinite hangs on fsx runs on some
clients.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-09 16:11:40 -04:00
Jens Axboe
fd48ca3849 Merge tag 'v4.3-rc4' into for-4.4/core
Linux 4.3-rc4

Pulling in v4.3-rc4 to avoid conflicts with NVMe fixes that have gone
in since for-4.4/core was based.
2015-10-09 10:08:39 -06:00
Trond Myklebust
037fc9808a NFSv4: Unify synchronous and asynchronous error handling
They now only differ in the way we handle waiting, so let's unify.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 10:45:53 -04:00
Trond Myklebust
4816fdadab NFSv4: Don't use synchronous delegation recall in exception handling
The code needs to be able to work from inside an asynchronous context.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 10:45:53 -04:00
Trond Myklebust
516285ebe0 NFSv4: nfs4_async_handle_error should take a non-const nfs_server
For symmetry with the synchronous handler, and so that we can potentially
handle errors such as NFS4ERR_BADNAME.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 10:45:52 -04:00
Trond Myklebust
2598ed3445 NFSv4: Update the delay statistics counter for synchronous delays
Currently, we only do so for asynchronous delays.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 10:45:51 -04:00
Trond Myklebust
b3c2aa0745 NFSv4: Refactor NFSv4 error handling
Prepare for unification of the synchronous and asynchronous error
handling.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08 10:45:51 -04:00
David Sterba
f14d104dbd btrfs: switch more printks to our helpers
Convert the simple cases, not all functions provide a way to reach the
fs_info. Also skipped debugging messages (print-tree, integrity
checker and pr_debug) and messages that are printed from possibly
unfinished mount.

Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 13:08:03 +02:00
David Sterba
9464732266 btrfs: switch message printers to ratelimited variants
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 13:04:06 +02:00
David Sterba
1dd6d7ca9d btrfs: introduce ratelimited variants of message printing functions
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 11:07:56 +02:00
David Sterba
b14af3b46f btrfs: switch message printers to ratelimited _in_rcu variants
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 11:07:55 +02:00
David Sterba
24aa6b41d4 btrfs: introduce ratelimited _in_rcu variants of message printing functions
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 11:07:55 +02:00
David Sterba
ecaeb14b91 btrfs: switch message printers to _in_rcu variants
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 11:07:55 +02:00
David Sterba
08a84e25a8 btrfs: introduce _in_rcu variants of message printing functions
Due to the missing variants there are messages that lack the information
printed by btrfs_info etc helpers.

Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-08 11:07:55 +02:00
Emilio López
7f5028cf61 sysfs: Support is_visible() on binary attributes
According to the sysfs header file:

    "The returned value will replace static permissions defined in
     struct attribute or struct bin_attribute."

but this isn't the case, as is_visible is only called on struct attribute
only. This patch introduces a new is_bin_visible() function to implement
the same functionality for binary attributes, and updates documentation
accordingly.

Note that to keep functionality and code similar to that of normal
attributes, the mode is now checked as well to ensure it contains only
read/write permissions or SYSFS_PREALLOC.

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Emilio López <emilio.lopez@collabora.co.uk>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-10-07 15:05:31 -07:00
Linus Torvalds
a0eeb8dd34 Merge tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Bugfixes:
   - Fix a use-after-free bug in the RPC/RDMA client
   - Fix a write performance regression
   - Fix up page writeback accounting
   - Don't try to reclaim unused state owners
   - Fix a NFSv4 nograce recovery hang
   - reset states to use open_stateid when returning delegation
     voluntarily
   - Fix a tracepoint NULL-pointer dereference"

* tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a tracepoint NULL-pointer dereference
  nfs4: reset states to use open_stateid when returning delegation voluntarily
  NFSv4: Fix a nograce recovery hang
  NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH
  NFSv4: Don't try to reclaim unused state owners
  NFS: Fix a write performance regression
  NFS: Fix up page writeback accounting
  xprtrdma: disconnect and flush cqs before freeing buffers
2015-10-07 08:54:22 +01:00
Anna Schumaker
39d0d3bdf7 NFS: Fix a tracepoint NULL-pointer dereference
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file
enabled produces a NULL-pointer dereference when calculating fileid and
filehandle of the opened file.  Fix this by checking if state is NULL
before trying to use the inode pointer.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-06 18:56:25 -04:00
NeilBrown
7d35199e15 BTRFS: support NFSv2 export
The "fh_len" passed to ->fh_to_* is not guaranteed to be that same as
that returned by encode_fh - it may be larger.

With NFSv2, the filehandle is fixed length, so it may appear longer
than expected and be zero-padded.

So we must test that fh_len is at least some value, not exactly equal
to it.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Sterba <dsterba@suse.cz>
2015-10-06 06:55:23 -07:00
chandan
e5fffbac4a Btrfs: open_ctree: Fix possible memory leak
After reading one of chunk or tree root tree's root node from disk, if the
root node does not have EXTENT_BUFFER_UPTODATE flag set, we fail to release
the memory used by the root node. Fix this.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
2015-10-06 06:55:22 -07:00
Linus Torvalds
3c68319b28 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Two fixes for problems pointed out by automated tools.

  Thanks PaX/grsecurity team and Dan Carpenter (and the Smatch tool)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] Update cifs version number
  [SMB3] Do not fall back to SMBWriteX in set_file_size error cases
  [SMB3] Missing null tcon check
2015-10-06 14:30:21 +01:00
Filipe Manana
d9a0540a79 Btrfs: fix deadlock when finalizing block group creation
Josef ran into a deadlock while a transaction handle was finalizing the
creation of its block groups, which produced the following trace:

  [260445.593112] fio             D ffff88022a9df468     0  8924   4518 0x00000084
  [260445.593119]  ffff88022a9df468 ffffffff81c134c0 ffff880429693c00 ffff88022a9df488
  [260445.593126]  ffff88022a9e0000 ffff8803490d7b00 ffff8803490d7b18 ffff88022a9df4b0
  [260445.593132]  ffff8803490d7af8 ffff88022a9df488 ffffffff8175a437 ffff8803490d7b00
  [260445.593137] Call Trace:
  [260445.593145]  [<ffffffff8175a437>] schedule+0x37/0x80
  [260445.593189]  [<ffffffffa0850f37>] btrfs_tree_lock+0xa7/0x1f0 [btrfs]
  [260445.593197]  [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0
  [260445.593225]  [<ffffffffa07eac44>] btrfs_lock_root_node+0x34/0x50 [btrfs]
  [260445.593253]  [<ffffffffa07eff6b>] btrfs_search_slot+0x88b/0xa00 [btrfs]
  [260445.593295]  [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs]
  [260445.593324]  [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
  [260445.593351]  [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
  [260445.593394]  [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs]
  [260445.593427]  [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs]
  [260445.593459]  [<ffffffffa0800964>] do_chunk_alloc+0x2a4/0x2e0 [btrfs]
  [260445.593491]  [<ffffffffa0803815>] find_free_extent+0xa55/0xd90 [btrfs]
  [260445.593524]  [<ffffffffa0803c22>] btrfs_reserve_extent+0xd2/0x220 [btrfs]
  [260445.593532]  [<ffffffff8119fe5d>] ? account_page_dirtied+0xdd/0x170
  [260445.593564]  [<ffffffffa0803e78>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs]
  [260445.593597]  [<ffffffffa080c9de>] ? btree_set_page_dirty+0xe/0x10 [btrfs]
  [260445.593626]  [<ffffffffa07eb5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs]
  [260445.593654]  [<ffffffffa07ebbff>] btrfs_cow_block+0x11f/0x1c0 [btrfs]
  [260445.593682]  [<ffffffffa07ef8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs]
  [260445.593724]  [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs]
  [260445.593752]  [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
  [260445.593830]  [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
  [260445.593905]  [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs]
  [260445.593946]  [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs]
  [260445.593990]  [<ffffffffa0815798>] btrfs_commit_transaction+0xa8/0xb40 [btrfs]
  [260445.594042]  [<ffffffffa085abcd>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
  [260445.594089]  [<ffffffffa082bc84>] btrfs_sync_file+0x294/0x350 [btrfs]
  [260445.594115]  [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0
  [260445.594133]  [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180
  [260445.594149]  [<ffffffff8123e35d>] do_fsync+0x3d/0x70
  [260445.594169]  [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110
  [260445.594187]  [<ffffffff8123e600>] SyS_fsync+0x10/0x20
  [260445.594204]  [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71

This happened because the same transaction handle created a large number
of block groups and while finalizing their creation (inserting new items
and updating existing items in the chunk and device trees) a new metadata
extent had to be allocated and no free space was found in the current
metadata block groups, which made find_free_extent() attempt to allocate
a new block group via do_chunk_alloc(). However at do_chunk_alloc() we
ended up allocating a new system chunk too and exceeded the threshold
of 2Mb of reserved chunk bytes, which makes do_chunk_alloc() enter the
final part of block group creation again (at
btrfs_create_pending_block_groups()) and attempt to lock again the root
of the chunk tree when it's already write locked by the same task.

Similarly we can deadlock on extent tree nodes/leafs if while we are
running delayed references we end up creating a new metadata block group
in order to allocate a new node/leaf for the extent tree (as part of
a CoW operation or growing the tree), as btrfs_create_pending_block_groups
inserts items into the extent tree as well. In this case we get the
following trace:

  [14242.773581] fio             D ffff880428ca3418     0  3615   3100 0x00000084
  [14242.773588]  ffff880428ca3418 ffff88042d66b000 ffff88042a03c800 ffff880428ca3438
  [14242.773594]  ffff880428ca4000 ffff8803e4b20190 ffff8803e4b201a8 ffff880428ca3460
  [14242.773600]  ffff8803e4b20188 ffff880428ca3438 ffffffff8175a437 ffff8803e4b20190
  [14242.773606] Call Trace:
  [14242.773613]  [<ffffffff8175a437>] schedule+0x37/0x80
  [14242.773656]  [<ffffffffa057ff07>] btrfs_tree_lock+0xa7/0x1f0 [btrfs]
  [14242.773664]  [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0
  [14242.773692]  [<ffffffffa0519c44>] btrfs_lock_root_node+0x34/0x50 [btrfs]
  [14242.773720]  [<ffffffffa051ef6b>] btrfs_search_slot+0x88b/0xa00 [btrfs]
  [14242.773750]  [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
  [14242.773758]  [<ffffffff811ef4a2>] ? kmem_cache_alloc+0x1d2/0x200
  [14242.773786]  [<ffffffffa0520ad1>] btrfs_insert_item+0x71/0xf0 [btrfs]
  [14242.773818]  [<ffffffffa052f292>] btrfs_create_pending_block_groups+0x102/0x200 [btrfs]
  [14242.773850]  [<ffffffffa052f96e>] do_chunk_alloc+0x2ae/0x2f0 [btrfs]
  [14242.773934]  [<ffffffffa0532825>] find_free_extent+0xa55/0xd90 [btrfs]
  [14242.773998]  [<ffffffffa0532c22>] btrfs_reserve_extent+0xc2/0x1d0 [btrfs]
  [14242.774041]  [<ffffffffa0532e38>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs]
  [14242.774078]  [<ffffffffa051a5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs]
  [14242.774118]  [<ffffffffa051abff>] btrfs_cow_block+0x11f/0x1c0 [btrfs]
  [14242.774155]  [<ffffffffa051e8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs]
  [14242.774194]  [<ffffffffa0528021>] ? __btrfs_free_extent.isra.70+0x2e1/0xcb0 [btrfs]
  [14242.774235]  [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
  [14242.774274]  [<ffffffffa051994a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
  [14242.774318]  [<ffffffffa052c433>] __btrfs_run_delayed_refs+0xbb3/0x1020 [btrfs]
  [14242.774358]  [<ffffffffa052f404>] btrfs_run_delayed_refs.part.78+0x74/0x280 [btrfs]
  [14242.774391]  [<ffffffffa052f627>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
  [14242.774432]  [<ffffffffa05be236>] commit_cowonly_roots+0x8d/0x2bd [btrfs]
  [14242.774474]  [<ffffffffa059d07f>] ? __btrfs_run_delayed_items+0x1cf/0x210 [btrfs]
  [14242.774516]  [<ffffffffa05adac3>] ? btrfs_qgroup_account_extents+0x83/0x130 [btrfs]
  [14242.774558]  [<ffffffffa0544c40>] btrfs_commit_transaction+0x590/0xb40 [btrfs]
  [14242.774599]  [<ffffffffa0589b9d>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
  [14242.774642]  [<ffffffffa055ac54>] btrfs_sync_file+0x294/0x350 [btrfs]
  [14242.774650]  [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0
  [14242.774657]  [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180
  [14242.774663]  [<ffffffff8123e35d>] do_fsync+0x3d/0x70
  [14242.774669]  [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110
  [14242.774675]  [<ffffffff8123e600>] SyS_fsync+0x10/0x20
  [14242.774681]  [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71

Fix this by never recursing into the finalization phase of block group
creation and making sure we never trigger the finalization of block group
creation while running delayed references.

Reported-by: Josef Bacik <jbacik@fb.com>
Fixes: 00d80e342c ("Btrfs: fix quick exhaustion of the system array in the superblock")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-10-05 16:56:38 -07:00
Filipe Manana
808f80b467 Btrfs: update fix for read corruption of compressed and shared extents
My previous fix in commit 005efedf2c ("Btrfs: fix read corruption of
compressed and shared extents") was effective only if the compressed
extents cover a file range with a length that is not a multiple of 16
pages. That's because the detection of when we reached a different range
of the file that shares the same compressed extent as the previously
processed range was done at extent_io.c:__do_contiguous_readpages(),
which covers subranges with a length up to 16 pages, because
extent_readpages() groups the pages in clusters no larger than 16 pages.
So fix this by tracking the start of the previously processed file
range's extent map at extent_readpages().

The following test case for fstests reproduces the issue:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner

  rm -f $seqres.full

  test_clone_and_read_compressed_extent()
  {
      local mount_opts=$1

      _scratch_mkfs >>$seqres.full 2>&1
      _scratch_mount $mount_opts

      # Create our test file with a single extent of 64Kb that is going to
      # be compressed no matter which compression algo is used (zlib/lzo).
      $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
          $SCRATCH_MNT/foo | _filter_xfs_io

      # Now clone the compressed extent into an adjacent file offset.
      $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
          $SCRATCH_MNT/foo $SCRATCH_MNT/foo

      echo "File digest before unmount:"
      md5sum $SCRATCH_MNT/foo | _filter_scratch

      # Remount the fs or clear the page cache to trigger the bug in
      # btrfs. Because the extent has an uncompressed length that is a
      # multiple of 16 pages, all the pages belonging to the second range
      # of the file (64K to 128K), which points to the same extent as the
      # first range (0K to 64K), had their contents full of zeroes instead
      # of the byte 0xaa. This was a bug exclusively in the read path of
      # compressed extents, the correct data was stored on disk, btrfs
      # just failed to fill in the pages correctly.
      _scratch_remount

      echo "File digest after remount:"
      # Must match the digest we got before.
      md5sum $SCRATCH_MNT/foo | _filter_scratch
  }

  echo -e "\nTesting with zlib compression..."
  test_clone_and_read_compressed_extent "-o compress=zlib"

  _scratch_unmount

  echo -e "\nTesting with lzo compression..."
  test_clone_and_read_compressed_extent "-o compress=lzo"

  status=0
  exit

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Timofey Titovets <nefelim4ag@gmail.com>
2015-10-05 16:56:27 -07:00
Filipe Manana
b786f16ac3 Btrfs: send, fix corner case for reference overwrite detection
When the inode given to did_overwrite_ref() matches the current progress
and has a reference that collides with the reference of other inode that
has the same number as the current progress, we were always telling our
caller that the inode's reference was overwritten, which is incorrect
because the other inode might be a new inode (different generation number)
in which case we must return false from did_overwrite_ref() so that its
callers don't use an orphanized path for the inode (as it will never be
orphanized, instead it will be unlinked and the new inode created later).

The following test case for fstests reproduces the issue:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -fr $send_files_dir
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _need_to_be_root

  send_files_dir=$TEST_DIR/btrfs-test-$seq

  rm -f $seqres.full
  rm -fr $send_files_dir
  mkdir $send_files_dir

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test file with a single extent of 64K.
  mkdir -p $SCRATCH_MNT/foo
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" $SCRATCH_MNT/foo/bar \
      | _filter_xfs_io

  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
      $SCRATCH_MNT/mysnap1
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
      $SCRATCH_MNT/mysnap2

  echo "File digest before being replaced:"
  md5sum $SCRATCH_MNT/mysnap1/foo/bar | _filter_scratch

  # Remove the file and then create a new one in the same location with
  # the same name but with different content. This new file ends up
  # getting the same inode number as the previous one, because that inode
  # number was the highest inode number used by the snapshot's root and
  # therefore when attempting to find the a new inode number for the new
  # file, we end up reusing the same inode number. This happens because
  # currently btrfs uses the highest inode number summed by 1 for the
  # first inode created once a snapshot's root is loaded (done at
  # fs/btrfs/inode-map.c:btrfs_find_free_objectid in the linux kernel
  # tree).
  # Having these two different files in the snapshots with the same inode
  # number (but different generation numbers) caused the btrfs send code
  # to emit an incorrect path for the file when issuing an unlink
  # operation because it failed to realize they were different files.
  rm -f $SCRATCH_MNT/mysnap2/foo/bar
  $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 96K" \
      $SCRATCH_MNT/mysnap2/foo/bar | _filter_xfs_io

  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/mysnap2 \
      $SCRATCH_MNT/mysnap2_ro

  _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
  _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 \
      $SCRATCH_MNT/mysnap2_ro -f $send_files_dir/2.snap

  echo "File digest in the original filesystem after being replaced:"
  md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch

  # Now recreate the filesystem by receiving both send streams and verify
  # we get the same file contents that the original filesystem had.
  _scratch_unmount
  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/1.snap
  _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/2.snap

  echo "File digest in the new filesystem:"
  # Must match the digest from the new file.
  md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch

  status=0
  exit

Reported-by: Martin Raiber <martin@urbackup.org>
Fixes: 8b191a6849 ("Btrfs: incremental send, check if orphanized dir inode needs delayed rename")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-10-05 16:56:27 -07:00
Mike Marshall
548049495c Orangefs: fix some checkpatch.pl complaints that had creeped in.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-10-05 13:44:24 -04:00
Wei Fang
eb042ec359 jffs2: fix a memleak in read_direntry()
Need to free the memory allocated for 'fd' if failed to read all
of the remainder name.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2015-10-04 22:30:50 +01:00
NeilBrown
65da3484d9 sysfs: correctly handle short reads on PREALLOC attrs.
attributes declared with __ATTR_PREALLOC use sysfs_kf_read()
which ignores the 'count' arg.
So a 1-byte read request can return more bytes than that.

This is seen with the 'dash' shell when 'read' is used on
some 'md' sysfs attributes.

So only return the 'min' of count and the attribute length.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04 19:42:22 +01:00
Ulf Magnusson
398dc4ad52 debugfs: document that debugfs_remove*() accepts NULL and error values
According to commit a59d6293e5 ("debugfs: change parameter check in
debugfs_remove() functions"), this is meant to make cleanup easier for
callers. In that case it ought to be documented.

Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04 11:36:07 +01:00
Viresh Kumar
621a5f7ad9 debugfs: Pass bool pointer to debugfs_create_bool()
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
when all it needs is a boolean pointer.

It would be better to update this API to make it accept 'bool *'
instead, as that will make it more consistent and often more convenient.
Over that bool takes just a byte.

That required updates to all user sites as well, in the same commit
updating the API. regmap core was also using
debugfs_{read|write}_file_bool(), directly and variable types were
updated for that to be bool as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04 11:36:07 +01:00
Steve French
616a5399b8 [CIFS] Update cifs version number
Update modinfo cifs.ko version number to 2.08

Signed-off-by: Steve French <steve.french@primarydata.com>
2015-10-03 16:54:17 -05:00
shengyong
be186cc4de UBIFS: print verbose message when rescanning a corrupted node
This is a trivial fix of showing verbose message when leb-recovery detects
a corrupted node, which is not the last one in the LEB. Rescan expects to
show more detail of the corrupted node.

Reviewed-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-10-03 20:42:01 +02:00
shengyong
8f6983abe3 UBIFS: call dbg_is_power_cut() instead of reading c->dbg->pc_happened
Call dbg_is_power_cut() to emulate power cut instead of reading
c->dbg->pc_happened. Otherwise, the function becomes dead code.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-10-03 20:40:21 +02:00
Dongsheng Yang
7d25b361a2 UBIFS: fix a typo in comment of ubifs_budget_req
s/now/how

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-10-03 20:03:25 +02:00
Andrzej Hajda
bbc8a0044f UBIFS: use kmemdup rather than duplicating its implementation
The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-10-03 20:03:14 +02:00