Commit Graph

48450 Commits

Author SHA1 Message Date
Jaegeuk Kim
1d353eb7e4 f2fs: fix ERR_PTR returned by bio
This is to fix wrong error pointer handling flow reported by Dan.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 15:21:19 -07:00
Jann Horn
3e0a396546 xfs: fix type confusion in xfs_ioc_swapext
Without this check, the following XFS_I invocations would return bad
pointers when used on non-XFS inodes (perhaps pointers into preceding
allocator chunks).

This could be used by an attacker to trick xfs_swap_extents into
performing locking operations on attacker-chosen structures in kernel
memory, potentially leading to code execution in the kernel.  (I have
not investigated how likely this is to be usable for an attack in
practice.)

Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-16 06:30:06 +09:00
Daniel Bristot de Oliveira
752d596b39 tracing: Use __get_str() when manipulating strings
Use __get_str(str) rather than __get_dynamic_array(str) when
deadling with strings.

It is just a code cleanup, no changes on tracepoint ABI.

Link: http://lkml.kernel.org/r/ea260df91817411cca2a1f3db2abd88860094788.1467407618.git.bristot@redhat.com

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-nfs@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-15 15:52:20 -04:00
Kinglong Mee
c77efc1e78 nfs/blocklayout: Check max uuids and devices before decoding
Avoid nfs return uuids/devices larger than maximum.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Kinglong Mee
ecc2b88c4a nfs/blocklayout: Make sure calculate signature length aligned
Avoid a bad nfs server return an unaligned length of signature.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Christoph Hellwig
11487ddbdb nfs/blocklayout: support RH/Fedora dm-mpath device nodes
Instead of reusing the wwn-* names for multipath devices nodes RHEL and
Fedora introduce new dm-mpath-uuid-* nodes with a slightly different
naming scheme.  Try these names first to ensure we always get a
multipath-capable device if it exists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig
d702d41ed4 nfs/blocklayout: refactor open-by-wwn
The current code works with the standard udev/systemd names, but we'll have
to add another method in the next patch.  Refactor it into a separate helper
to make room for the new variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig
0173ca0544 nfs/blocklayout: use proper fmode for opening block devices
This was fixed for the original block layout code a while ago, but also
needs to be fixed for the SCSI layout path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Jeff Layton
8a4c392688 nfsd: allow nfsd to advertise multiple layout types
If the underlying filesystem supports multiple layout types, then there
is little reason not to advertise that fact to clients and let them
choose what type to use.

Turn the ex_layout_type field into a bitfield. For each supported
layout type, we set a bit in that field. When the client requests a
layout, ensure that the bit for that layout type is set. When the
client requests attributes, send back a list of supported types.

Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-15 15:31:32 -04:00
Chuck Lever
885848186f nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.

During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.

Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.

Found by inspection, we don't currently have a reproducer.

Fixes: 2c41beb0e5 ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-15 15:31:31 -04:00
Kinglong Mee
dd51db1886 nfsd/blocklayout: Make sure calculate signature/designator length aligned
These values are all multiples of 4 already, so there's no change in
behavior from this patch.  But perhaps this will prevent mistakes in the
future.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-15 15:31:30 -04:00
Benjamin Coddington
15d66ac209 xfs: abstract block export operations from nfsd layouts
Instead of creeping pnfs layout configuration into filesystems, move the
definition of block-based export operations under a more abstract
configuration.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-15 15:31:29 -04:00
H.J. Lu
3ebfd81f7f x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
Don't use the same syscall numbers for 2 different syscalls:

 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
 534	x32	preadv2			compat_sys_preadv2
 535	x32	pwritev2		compat_sys_pwritev2

Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:30:26 +02:00
Vegard Nossum
7bc9491645 ext4: verify extent header depth
Although the extent tree depth of 5 should enough be for the worst
case of 2*32 extents of length 1, the extent tree code does not
currently to merge nodes which are less than half-full with a sibling
node, or to shrink the tree depth if possible.  So it's possible, at
least in theory, for the tree depth to be greater than 5.  However,
even in the worst case, a tree depth of 32 is highly unlikely, and if
the file system is maliciously corrupted, an insanely large eh_depth
can cause memory allocation failures that will trigger kernel warnings
(here, eh_depth = 65280):

    JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
    CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
    Stack:
     604a8947 625badd8 0002fd09 00000000
     60078643 00000000 62623910 601bf9bc
     62623970 6002fc84 626239b0 900000125
    Call Trace:
     [<6001c2dc>] show_stack+0xdc/0x1a0
     [<601bf9bc>] dump_stack+0x2a/0x2e
     [<6002fc84>] __warn+0x114/0x140
     [<6002fdff>] warn_slowpath_null+0x1f/0x30
     [<60165829>] start_this_handle+0x569/0x580
     [<60165d4e>] jbd2__journal_start+0x11e/0x220
     [<60146690>] __ext4_journal_start_sb+0x60/0xa0
     [<60120a81>] ext4_truncate+0x131/0x3a0
     [<60123677>] ext4_setattr+0x757/0x840
     [<600d5d0f>] notify_change+0x16f/0x2a0
     [<600b2b16>] do_truncate+0x76/0xc0
     [<600c3e56>] path_openat+0x806/0x1300
     [<600c55c9>] do_filp_open+0x89/0xf0
     [<600b4074>] do_sys_open+0x134/0x1e0
     [<600b4140>] SyS_open+0x20/0x30
     [<6001ea68>] handle_syscall+0x88/0x90
     [<600295fd>] userspace+0x3fd/0x500
     [<6001ac55>] fork_handler+0x85/0x90

    ---[ end trace 08b0b88b6387a244 ]---

[ Commit message modified and the extent tree depath check changed
from 5 to 32 -- tytso ]

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-15 00:22:07 -04:00
Vegard Nossum
c65d5c6c81 ext4: short-cut orphan cleanup on error
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.

    EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
    EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
    Aborting journal on device loop0-8.
    EXT4-fs (loop0): Remounting filesystem read-only
    EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
    EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
    EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
    EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
    EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
    [...]
    EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
    [...]
    EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
    [...]

See-also: c9eb13a910 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2016-07-14 23:21:35 -04:00
Vegard Nossum
554a5ccc4e ext4: fix reference counting bug on block allocation error
If we hit this error when mounted with errors=continue or
errors=remount-ro:

    EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata

then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.

Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.

[ Fixed up so we don't return EAGAIN to userspace. --tytso ]

Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
2016-07-14 23:02:47 -04:00
Trond Myklebust
8b7d9d09b2 NFSv4: Revert "Truncating file opens should also sync O_DIRECT writes"
We're not holding any locks, so both nfs_wb_all() and inode_dio_wait()
are unenforcible and have livelock potential. Just limit ourselves to
flushing out the data.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-14 12:42:40 -04:00
Fengguang Wu
077e2642fb chardev: add missing line break in pr_warn
To fix super long dmesg error lines like

  CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation rangeCHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation rangeswapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

After fix, it should look like

  CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation range
  CHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation range
  swapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

Reported-by: Philip Li <philip.li@intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-14 16:21:53 +09:00
Christophe JAILLET
d28c442f5b nfsd: Fix some indent inconsistancy
Silent a few smatch warnings about indentation

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:41 -04:00
Oleg Drokin
93f580a9a2 nfsd: Correct a comment for NFSD_MAY_ defines location
Those are now defined in fs/nfsd/vfs.h

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:40 -04:00
Tom Haynes
9b9960a0ca nfsd: Add a super simple flex file server
Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
is also the ds (NFSv3). I.e., the metadata and the data file are
the exact same file.

This will allow testing of the flex file client.

Simply add the "pnfs" export option to your export
in /etc/exports and mount from a client that supports
flex files.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:40:48 -04:00
Tom Haynes
d7c920d134 nfsd: flex file device id encoding will need the server address
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:40:47 -04:00
Andrew Elble
ed94164398 nfsd: implement machine credential support for some operations
This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:32:47 -04:00
Andrew Elble
dedeb13f9e nfsd: allow mach_creds_match to be used more broadly
Rename mach_creds_match() to nfsd4_mach_creds_match() and un-staticify

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:32:47 -04:00
Dan Williams
7a9eb20666 pmem: kill __pmem address space
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 19:25:38 -07:00
Dan Williams
14df6a4e7e fs/dax: remove wmb_pmem()
Flushing posted-write queues is now deferred to REQ_FLUSH context, or
otherwise handled by an ADR event at the platform level.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 15:13:48 -07:00
Sachin Prabhu
8d9535b6ef cifs: Check for existing directory when opening file with O_CREAT
When opening a file with O_CREAT flag, check to see if the file opened
is an existing directory.

This prevents the directory from being opened which subsequently causes
a crash when the close function for directories cifs_closedir() is called
which frees up the file->private_data memory while the file is still
listed on the open file list for the tcon.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
2016-07-12 16:09:38 -05:00
Bob Peterson
44f52122a2 GFS2: Check rs_free with rd_rsspin protection
For the last process to close a file opened for write, function
gfs2_rsqa_delete was deleting the file's inode's block reservation
out of the rgrp reservations tree. Then it was checking to make sure
rs_free was 0, but it was performing the check outside the protection
of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed
to prevent a race between the process freeing the reservation and
another who is allocating a new set of blocks inside the same rgrp
for the same inode, thus changing its value.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-07-12 11:48:22 -05:00
Linus Torvalds
08d27eb206 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  posix_acl: de-union a_refcount and a_rcu
  nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
  Use the right predicate in ->atomic_open() instances
2016-07-12 16:49:01 +09:00
Sachin Prabhu
5b23c97d7e Add MF-Symlinks support for SMB 2.0
We should be able to use the same helper functions used for SMB 2.1 and
later versions.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-07-11 22:20:54 -05:00
Chuck Lever
a4e187d83d NFS: Don't drop CB requests with invalid principals
Before commit 778be232a2 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.

Fixes: 778be232a2 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11 15:50:43 -04:00
Jaegeuk Kim
a7550b30ab ext4 crypto: migrate into vfs's crypto engine
This patch removes the most parts of internal crypto codes.
And then, it modifies and adds some ext4-specific crypt codes to use the generic
facility.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-10 14:01:03 -04:00
Tal Shorer
3dc3afadeb configfs: don't set buffer_needs_fill to zero if show() returns error
A confgifs attribute's show() callback is called once the first time
the user attempts to read from it. If it returns an error, that
error is returned to the user. However, the open file's
buffer_needs_fill is still set to zero and consecutive read() calls
will find an empty buffer that doesn't need filling and return 0 to
the user. This could give the user the wrong impression that the
attribute was read successfully.

Fix this by not setting buffer_needs_fill if show() returns an error,
making consecutive read() calls call show() again and either get an
error again or get data.

Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-07-10 21:02:18 +09:00
Mauro Carvalho Chehab
cbb5c8355a Merge branch 'topic/cec' into patchwork
* topic/cec:
  [media] DocBook/media: add CEC documentation
  [media] s5p_cec: get rid of an unused var
  [media] move s5p-cec to staging
  [media] vivid: add CEC emulation
  [media] cec: s5p-cec: Add s5p-cec driver
  [media] cec: adv7511: add cec support
  [media] cec: adv7842: add cec support
  [media] cec: adv7604: add cec support
  [media] cec: add compat32 ioctl support
  [media] cec/TODO: add TODO file so we know why this is still in staging
  [media] cec: add HDMI CEC framework (api)
  [media] cec: add HDMI CEC framework (adapter)
  [media] cec: add HDMI CEC framework (core)
  [media] cec-funcs.h: static inlines to pack/unpack CEC messages
  [media] cec.h: add cec header
  [media] cec-edid: add module for EDID CEC helper functions
  [media] cec.txt: add CEC framework documentation
  [media] rc: Add HDMI CEC protocol handling
2016-07-08 18:16:10 -03:00
Jaegeuk Kim
b56ab837a0 f2fs: avoid mark_inode_dirty
Let's check inode's dirtiness before calling mark_inode_dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:34:09 -07:00
Jaegeuk Kim
a2ee0a3003 f2fs: move i_size_write in f2fs_write_end
We don't need to do i_size_write under page lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:35 -07:00
Chao Yu
c24a0fd655 f2fs: fix to avoid redundant discard during fstrim
With below test steps, f2fs will issue redundant discard when doing fstrim,
the reason is that we issue discards for both prefree segments and
consecutive freed region user wants to trim, part regions they covered are
overlapped, here, we change to do not to issue any discards for prefree
segments in trimmed range.

1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs
2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/
3. dd if=/dev/zero  of=/mnt/f2fs/a bs=2M count=1
4. dd if=/dev/zero  of=/mnt/f2fs/b bs=1M count=1
5. sync
6. rm /mnt/f2fs/a /mnt/f2fs/b
7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/

Before:
<...>-5428  [001] ...1  9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200
<...>-5428  [001] ...1  9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

After:
<...>-6764  [000] ...1  9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:34 -07:00
Yunlei He
c7b41e1613 f2fs: avoid mismatching block range for discard
This patch skip discard block range smaller than trim_minlen,
and can not be merged by neighbour

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:33 -07:00
Chao Yu
3e6d0b4d9c f2fs: fix incorrect f_bfree calculation in ->statfs
As manual described, f_bfree indicates total free blocks in fs, in f2fs, it
includes two parts: visible free blocks and over-provision blocks. This
patch corrrects the calculation.

fsblkcnt_t   f_bfree;   /* free blocks in fs */

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:32 -07:00
Jaegeuk Kim
ec795418c4 f2fs: use percpu_rw_semaphore
This patch replaces rw_semaphore with percpu_rw_semaphore for:
sbi->cp_rwsem
nm_i->nat_tree_lock

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:31 -07:00
Jaegeuk Kim
3bdad3c7ee f2fs: skip to check the block address of node page
If the node page is up-to-date, it should be alive.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:31 -07:00
Jaegeuk Kim
2555a2d558 f2fs: shrink critical region in spin_lock
This patch shrinks the critical region in spin_lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:30 -07:00
Jaegeuk Kim
237c0790e5 f2fs: call SetPageUptodate if needed
SetPageUptodate() issues memory barrier, resulting in performance degrdation.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:29 -07:00
Jaegeuk Kim
fe76b796fc f2fs: introduce f2fs_set_page_dirty_nobuffer
This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:28 -07:00
Tiezhu Yang
a0995af695 f2fs: remove unnecessary goto statement
When base_addr is NULL, there is no need to call kzfree,
it should return -ENOMEM directly. Additionally, it is
better to initialize variable 'error' with 0.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:27 -07:00
Chao Yu
64058be9c8 f2fs: add nodiscard mount option
This patch adds 'nodiscard' mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:26 -07:00
Chao Yu
72e1c797b5 f2fs: fix to redirty page if fail to gc data page
If we fail to move data page during foreground GC, we should give another
chance to writeback that page which was set dirty previously by writer.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:26 -07:00
Chao Yu
1563ac75e7 f2fs: fix to detect truncation prior rather than EIO during read
In procedure of synchonized read, after sending out the read request, reader
will try to lock the page for waiting device to finish the read jobs and
unlock the page, but meanwhile, truncater will race with reader, so after
reader get lock of the page, it should check page's mapping to detect
whether someone has truncated the page in advance, then reader has the
chance to do the retry if truncation was done, otherwise read can be failed
due to previous condition check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:25 -07:00
Chao Yu
78682f7944 f2fs: fix to avoid reading out encrypted data in page cache
For encrypted inode, if user overwrites data of the inode, f2fs will read
encrypted data into page cache, and then do the decryption.

However reader can race with overwriter, and it will see encrypted data
which has not been decrypted by overwriter yet. Fix it by moving decrypting
work to background and keep page non-uptodated until data is decrypted.

Thread A				Thread B
- f2fs_file_write_iter
 - __generic_file_write_iter
  - generic_perform_write
   - f2fs_write_begin
    - f2fs_submit_page_bio
					- generic_file_read_iter
					 - do_generic_file_read
					  - lock_page_killable
					  - unlock_page
					  - copy_page_to_iter
					  hit the encrypted data in updated page
    - lock_page
    - fscrypt_decrypt_page

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:24 -07:00
Linus Torvalds
b987c759d2 Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
 "Provide a more concise fix for CVE-2016-1583:
   - Additionally fixes linux-stable regressions caused by the
     cherry-picking of the original fix

  Some very minor changes that have queued up:
   - Fix typos in code comments
   - Remove unnecessary check for NULL before destroying kmem_cache"

* tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: don't allow mmap when the lower fs doesn't support it
  Revert "ecryptfs: forbid opening files without mmap handler"
  ecryptfs: fix spelling mistakes
  eCryptfs: fix typos in comment
  ecryptfs: drop null test before destroy functions
2016-07-08 09:48:28 -07:00