Commit Graph

47524 Commits

Author SHA1 Message Date
Pavel Shilovsky
6cc3b24235 CIFS: Fix SMB2+ interim response processing for read requests
For interim responses we only need to parse a header and update
a number credits. Now it is done for all SMB2+ command except
SMB2_READ which is wrong. Fix this by adding such processing.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-29 00:21:36 -06:00
Justin Maggard
deb7deff2f cifs: fix out-of-bounds access in lease parsing
When opening a file, SMB2_open() attempts to parse the lease state from the
SMB2 CREATE Response.  However, the parsing code was not careful to ensure
that the create contexts are not empty or invalid, which can lead to out-
of-bounds memory access.  This can be seen easily by trying
to read a file from a OSX 10.11 SMB3 server.  Here is sample crash output:

BUG: unable to handle kernel paging request at ffff8800a1a77cc6
IP: [<ffffffff8828a734>] SMB2_open+0x804/0x960
PGD 8f77067 PUD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 3 PID: 2876 Comm: cp Not tainted 4.5.0-rc3.x86_64.1+ #14
Hardware name: NETGEAR ReadyNAS 314          /ReadyNAS 314          , BIOS 4.6.5 10/11/2012
task: ffff880073cdc080 ti: ffff88005b31c000 task.ti: ffff88005b31c000
RIP: 0010:[<ffffffff8828a734>]  [<ffffffff8828a734>] SMB2_open+0x804/0x960
RSP: 0018:ffff88005b31fa08  EFLAGS: 00010282
RAX: 0000000000000015 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88007eb8c8b0
RBP: ffff88005b31fad8 R08: 666666203d206363 R09: 6131613030383866
R10: 3030383866666666 R11: 00000000000002b0 R12: ffff8800660fd800
R13: ffff8800a1a77cc2 R14: 00000000424d53fe R15: ffff88005f5a28c0
FS:  00007f7c8a2897c0(0000) GS:ffff88007eb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800a1a77cc6 CR3: 000000005b281000 CR4: 00000000000006e0
Stack:
 ffff88005b31fa70 ffffffff88278789 00000000000001d3 ffff88005f5a2a80
 ffffffff00000003 ffff88005d029d00 ffff88006fde05a0 0000000000000000
 ffff88005b31fc78 ffff88006fde0780 ffff88005b31fb2f 0000000100000fe0
Call Trace:
 [<ffffffff88278789>] ? cifsConvertToUTF16+0x159/0x2d0
 [<ffffffff8828cf68>] smb2_open_file+0x98/0x210
 [<ffffffff8811e80c>] ? __kmalloc+0x1c/0xe0
 [<ffffffff882685f4>] cifs_open+0x2a4/0x720
 [<ffffffff88122cef>] do_dentry_open+0x1ff/0x310
 [<ffffffff88268350>] ? cifsFileInfo_get+0x30/0x30
 [<ffffffff88123d92>] vfs_open+0x52/0x60
 [<ffffffff88131dd0>] path_openat+0x170/0xf70
 [<ffffffff88097d48>] ? remove_wait_queue+0x48/0x50
 [<ffffffff88133a29>] do_filp_open+0x79/0xd0
 [<ffffffff8813f2ca>] ? __alloc_fd+0x3a/0x170
 [<ffffffff881240c4>] do_sys_open+0x114/0x1e0
 [<ffffffff881241a9>] SyS_open+0x19/0x20
 [<ffffffff8896e257>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 4d 8d 6c 07 04 31 c0 4c 89 ee e8 47 6f e5 ff 31 c9 41 89 ce 44 89 f1 48 c7 c7 28 b1 bd 88 31 c0 49 01 cd 4c 89 ee e8 2b 6f e5 ff <45> 0f b7 75 04 48 c7 c7 31 b1 bd 88 31 c0 4d 01 ee 4c 89 f6 e8
RIP  [<ffffffff8828a734>] SMB2_open+0x804/0x960
 RSP <ffff88005b31fa08>
CR2: ffff8800a1a77cc6
---[ end trace d9f69ba64feee469 ]---

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2016-02-29 00:21:31 -06:00
Jan Kara
74c66bcb7e ext4: Fix data exposure after failed AIO DIO
When AIO DIO fails e.g. due to IO error, we must not convert unwritten
extents as that will expose uninitialized data. Handle this case
by clearing unwritten flag from io_end in case of error and thus
preventing extent conversion.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-29 08:36:38 +11:00
Linus Torvalds
12b9fa6a97 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_last(): ELOOP failure exit should be done after leaving RCU mode
  should_follow_link(): validate ->d_seq after having decided to follow
  namei: ->d_inode of a pinned dentry is stable only for positives
  do_last(): don't let a bogus return value from ->open() et.al. to confuse us
  fs: return -EOPNOTSUPP if clone is not supported
  hpfs: don't truncate the file when delete fails
2016-02-27 17:10:32 -08:00
Al Viro
5129fa482b do_last(): ELOOP failure exit should be done after leaving RCU mode
... or we risk seeing a bogus value of d_is_symlink() there.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:37:37 -05:00
Al Viro
a7f775428b should_follow_link(): validate ->d_seq after having decided to follow
... otherwise d_is_symlink() above might have nothing to do with
the inode value we've got.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:31:01 -05:00
Al Viro
d4565649b6 namei: ->d_inode of a pinned dentry is stable only for positives
both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier.  Usually ends up oopsing soon
after that...

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:23:16 -05:00
Al Viro
c80567c82a do_last(): don't let a bogus return value from ->open() et.al. to confuse us
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:17:33 -05:00
Christoph Hellwig
0fcbf996d8 fs: return -EOPNOTSUPP if clone is not supported
-EBADF is a rather confusing error if an operations is not supported,
and nfsd gets rather upset about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:15:51 -05:00
Mikulas Patocka
b6853f78e7 hpfs: don't truncate the file when delete fails
The delete opration can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.

If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit
7dd29d8d86 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").

This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@vger.kernel.org	# 2.6.39+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:15:51 -05:00
Linus Torvalds
691429e13d Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dax: move writeback calls into the filesystems
  dax: give DAX clearing code correct bdev
  ext4: online defrag not supported with DAX
  ext2, ext4: only set S_DAX for regular inodes
  block: disable block device DAX by default
  ocfs2: unlock inode if deleting inode from orphan fails
  mm: ASLR: use get_random_long()
  drivers: char: random: add get_random_long()
  mm: numa: quickly fail allocations for NUMA balancing on full nodes
  mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27 12:46:16 -08:00
Linus Torvalds
1c271479b5 Merge tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext2/4 DAX fix from Ted Ts'o:
 "This fixes a file system corruption bug with DAX"

* tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
2016-02-27 12:40:49 -08:00
Ross Zwisler
1e9d180ba3 ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry.  For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect.  The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.

Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().

Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-27 14:01:16 -05:00
Ross Zwisler
7f6d5b529b dax: move writeback calls into the filesystems
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device.  This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
20a90f5899 dax: give DAX clearing code correct bdev
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases.  This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.

Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block.  This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
73f34a5e2c ext4: online defrag not supported with DAX
Online defrag operations for ext4 are hard coded to use the page cache.
See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()

When combined with DAX I/O, which circumvents the page cache, this can
result in data corruption.  This was observed with xfstests ext4/307 and
ext4/308.

Fix this by only allowing online defrag for non-DAX files.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
0a6cf9137d ext2, ext4: only set S_DAX for regular inodes
When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes.  Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().

With the current code, though, this isn't always true.  For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries.  For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.

Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4.  This allows us to have strict DAX vs non-DAX paths in the
writeback code.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Dan Williams
03cdadb040 block: disable block device DAX by default
The recent *sync enabling discovered that we are inserting into the
block_device pagecache counter to the expectations of the dirty data
tracking for dax mappings.  This can lead to data corruption.

We want to support DAX for block devices eventually, but it requires
wider changes to properly manage the pagecache.

   dump_stack+0x85/0xc2
   dax_writeback_mapping_range+0x60/0xe0
   blkdev_writepages+0x3f/0x50
   do_writepages+0x21/0x30
   __filemap_fdatawrite_range+0xc6/0x100
   filemap_write_and_wait+0x4a/0xa0
   set_blocksize+0x70/0xd0
   sb_set_blocksize+0x1d/0x50
   ext4_fill_super+0x75b/0x3360
   mount_bdev+0x180/0x1b0
   ext4_mount+0x15/0x20
   mount_fs+0x38/0x170

Mark the support broken so its disabled by default, but otherwise still
available for testing.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Guozhonghua
a4a8481ff6 ocfs2: unlock inode if deleting inode from orphan fails
When doing append direct io cleanup, if deleting inode fails, it goes
out without unlocking inode, which will cause the inode deadlock.

This issue was introduced by commit cf1776a9e8 ("ocfs2: fix a tiny
race when truncate dio orohaned entry").

Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Daniel Cashman
5ef11c35ce mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Chao Yu
19c7377b56 f2fs: fix to avoid deadlock when merging inline data
When testing with fsstress, kworker and user threads were both blocked:

INFO: task kworker/u16:1:16580 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D ffff8803f2595390     0 16580      2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-251:0)
 ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440
 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440
 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000
Call Trace:
 [<ffffffff816fe9d9>] schedule+0x29/0x70
 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9
 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs]
 [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs]
 [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs]
 [<ffffffff811473b3>] do_writepages+0x23/0x40
 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250
 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470
 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0
 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0
 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220
 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0
 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0
 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0
 [<ffffffff810750ce>] kthread+0xde/0xf0
 [<ffffffff817093f8>] ret_from_fork+0x58/0x90

fsstress thread stack:
 [<ffffffff81139f0e>] sleep_on_page+0xe/0x20
 [<ffffffff81139ef7>] __lock_page+0x67/0x70
 [<ffffffff8113b100>] find_lock_page+0x50/0x80
 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0
 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs]
 [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs]
 [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs]
 [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs]
 [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40
 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20
 [<ffffffff811d3291>] do_fsync+0x41/0x70
 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20
 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b
 [<ffffffffffffffff>] 0xffffffffffffffff

The reason of this issue is:
CPU0:					CPU1:
 - f2fs_write_data_pages
					 - f2fs_sync_fs
					  - write_checkpoint
					   - block_operations
					    - f2fs_lock_all
					     - down_write(sbi->cp_rwsem)
  - lock_page(page)
  - f2fs_write_data_page
					    - sync_node_pages
					     - flush_inline_data
					      - pagecache_get_page(page, GFP_LOCK)
   - f2fs_lock_op
    - down_read(sbi->cp_rwsem)

This patch alters to use trylock_page in flush_inline_data to fix this ABBA
deadlock issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:03 -08:00
Chao Yu
406657dd18 f2fs: introduce f2fs_flush_merged_bios for cleanup
Add a new helper f2fs_flush_merged_bios to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:02 -08:00
Chao Yu
f28b3434af f2fs: introduce f2fs_update_data_blkaddr for cleanup
Add a new help f2fs_update_data_blkaddr to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:01 -08:00
Chao Yu
4356e48e64 f2fs crypto: fix incorrect positioning for GCing encrypted data page
For now, flow of GCing an encrypted data page:
1) try to grab meta page in meta inode's mapping with index of old block
address of that data page
2) load data of ciphertext into meta page
3) allocate new block address
4) write the meta page into new block address
5) update block address pointer in direct node page.

Other reader/writer will use f2fs_wait_on_encrypted_page_writeback to
check and wait on GCed encrypted data cached in meta page writebacked
in order to avoid inconsistence among data page cache, meta page cache
and data on-disk when updating.

However, we will use new block address updated in step 5) as an index to
lookup meta page in inner bio buffer. That would be wrong, and we will
never find the GCing meta page, since we use the old block address as
index of that page in step 1).

This patch fixes the issue by adjust the order of step 1) and step 3),
and in step 1) grab page with index generated in step 3).

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:51:58 -08:00
Mike Marshall
9f08cfe944 Orangefs: update orangefs.txt
Al Viro has cleaned up the way ops are processed and waited for,
now orangefs.txt has an overview of how it works. Several recent
related commits have added to the comments in the code as well.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 14:39:08 -05:00
Mike Marshall
ca9f518ead Orangefs: code sanitation.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 10:21:12 -05:00
Arnd Bergmann
401898eed7 orangefs: remove unused 'diff' function
orangefs contains a helper function to calculate the difference
between two timeval structures. We are trying to remove all
instances of timespec from the kernel, and this one is not
used at all, so let's remove it now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 10:18:43 -05:00
Arnd Bergmann
be81ce48b2 orangefs: avoid time conversion function
The new orangefs code uses a helper function to read a time field to
its private structures from struct iattr. This will conflict with the
move to 64-bit timestamps in the kernel and is generally not necessary.

This replaces the conversion with a simple cast to time64_t that shows
what is going on. As the orangefs-internal representation already uses
64-bit timestamps, there should be no ambiguity to negative values,
and the cast ensures that we treat them as times before 1970 on both
32-bit and 64-bit architectures, rather than times after 2038. This
patch keeps that behavior.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 10:18:39 -05:00
David Sterba
f5bc27c71a Merge branch 'dev/control-ioctl' into for-chris-4.6 2016-02-26 15:38:34 +01:00
David Sterba
fa695b01bc Merge branch 'misc-4.6' into for-chris-4.6
# Conflicts:
#	fs/btrfs/file.c
2016-02-26 15:38:34 +01:00
David Sterba
f004fae0cf Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
David Sterba
675d276b32 Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 2016-02-26 15:38:32 +01:00
David Sterba
e9ddd77a31 Merge branch 'foreign/josef/space-updates' into for-chris-4.6 2016-02-26 15:38:31 +01:00
David Sterba
ff7db6e05a Merge branch 'foreign/zhaolei/reada' into for-chris-4.6 2016-02-26 15:38:30 +01:00
David Sterba
23c1a966f2 Merge branch 'foreign/qu/norecovery-v7' into for-chris-4.6 2016-02-26 15:38:30 +01:00
David Sterba
67d605fec1 Merge branch 'dev/rename-keys' into for-chris-4.6 2016-02-26 15:38:29 +01:00
David Sterba
e22b3d1fbe Merge branch 'dev/gfp-flags' into for-chris-4.6 2016-02-26 15:38:28 +01:00
David Sterba
5f1b5664d9 Merge branch 'chandan/prep-subpage-blocksize' into for-chris-4.6
# Conflicts:
#	fs/btrfs/file.c
2016-02-26 15:38:28 +01:00
Deepa Dinamani
b1f1a29d8f configfs: Replace CURRENT_TIME by current_fs_time()
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-02-26 10:42:35 +01:00
Chao Yu
80dd9c0e9d f2fs: fix incorrect upper bound when iterating inode mapping tree
1. Inode mapping tree can index page in range of [0, ULONG_MAX], however,
in some places, f2fs only search or iterate page in ragne of [0, LONG_MAX],
result in miss hitting in page cache.

2. filemap_fdatawait_range accepts range parameters in unit of bytes, so
the max range it covers should be [0, LLONG_MAX], if we use [0, LONG_MAX]
as range for waiting on writeback, big number of pages will not be covered.

This patch corrects above two issues.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-25 17:27:03 -08:00
David Woodhouse
be629c62a6 Fix directory hardlinks from deleted directories
When a directory is deleted, we don't take too much care about killing off
all the dirents that belong to it — on the basis that on remount, the scan
will conclude that the directory is dead anyway.

This doesn't work though, when the deleted directory contained a child
directory which was moved *out*. In the early stages of the fs build
we can then end up with an apparent hard link, with the child directory
appearing both in its true location, and as a child of the original
directory which are this stage of the mount process we don't *yet* know
is defunct.

To resolve this, take out the early special-casing of the "directories
shall not have hard links" rule in jffs2_build_inode_pass1(), and let the
normal nlink processing happen for directories as well as other inodes.

Then later in the build process we can set ic->pino_nlink to the parent
inode#, as is required for directories during normal operaton, instead
of the nlink. And complain only *then* about hard links which are still
in evidence even after killing off all the unreachable paths.

Reported-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-25 11:11:28 +00:00
David Woodhouse
49e91e7079 jffs2: Fix page lock / f->sem deadlock
With this fix, all code paths should now be obtaining the page lock before
f->sem.

Reported-by: Szabó Tamás <sztomi89@gmail.com>
Tested-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-25 11:11:26 +00:00
Thomas Betker
157078f64b Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
This reverts commit 5ffd3412ae
("jffs2: Fix lock acquisition order bug in jffs2_write_begin").

The commit modified jffs2_write_begin() to remove a deadlock with
jffs2_garbage_collect_live(), but this introduced new deadlocks found
by multiple users. page_lock() actually has to be called before
mutex_lock(&c->alloc_sem) or mutex_lock(&f->sem) because
jffs2_write_end() and jffs2_readpage() are called with the page locked,
and they acquire c->alloc_sem and f->sem, resp.

In other words, the lock order in jffs2_write_begin() was correct, and
it is the jffs2_garbage_collect_live() path that has to be changed.

Revert the commit to get rid of the new deadlocks, and to clear the way
for a better fix of the original deadlock.

Reported-by: Deng Chao <deng.chao1@zte.com.cn>
Reported-by: Ming Liu <liu.ming50@gmail.com>
Reported-by: wangzaiwei <wangzaiwei@top-vision.cn>
Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-25 11:11:25 +00:00
Martin Brandenburg
69a23de2f3 orangefs: clean up fill_default_sys_attrs
Size and type are read-only and not in the mask. The times were left
unset despite being in the mask.

We zero-fill the times since the server will fill them in and we will
get the correct time when we fill the inode with getattr.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:51 -05:00
Martin Brandenburg
6ceaf7818f orangefs: we never lookup with sym_follow set
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:51 -05:00
Martin Brandenburg
9c2bcf288e orangefs: remove vestigial async io code
I have verified that there is nothing in the userspace daemon version we
are implementing this protocol against that ever looks at this field.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Martin Brandenburg
47b4948fdb orangefs: use ORANGEFS_NAME_LEN everywhere; remove ORANGEFS_NAME_MAX
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Martin Brandenburg
ee70fca0bc orangefs: don't d_drop in d_revalidate since the caller will
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Martin Brandenburg
ee3b8d377c orangefs: free readdir buffer index before the dir_emit loop
We only need it while the service operation is actually in progress
since it is only used to co-ordinate the client-core's memory use. The
kernel allocates its own space.

Also clean up some comments which mislead the reader into thinking
the readdir buffers are shared memory.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Linus Torvalds
aa263c43fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes - xattr one from this cycle, the rest - stable fodder"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/pnode.c: treat zero mnt_group_id-s as unequal
  affs_do_readpage_ofs(): just use kmap_atomic() around memcpy()
  xattr handlers: plug a lock leak in simple_xattr_list
  fs: allow no_seek_end_llseek to actually seek
2016-02-24 14:00:26 -08:00