Commit Graph

49262 Commits

Author SHA1 Message Date
Anna Schumaker
67911c8f18 NFS: Add nfs_commit_file()
Copy will use this to set up a commit request for a generic range.  I
don't want to allocate a new pagecache entry for the file, so I needed
to change parts of the commit path to handle requests with a null
wb_page.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:55 -04:00
Olga Kornievskaia
c2985d001d Fixing oops in callback path
Commit 80f9642724 ("NFSv4.x: Enforce the ca_maxreponsesize_cached
on the back channel") causes an oops when it receives a callback with
cachethis=yes.

[  109.667378] BUG: unable to handle kernel NULL pointer dereference at 00000000000002c8
[  109.669476] IP: [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
[  109.671216] PGD 0
[  109.671736] Oops: 0000 [#1] SMP
[  109.705427] CPU: 1 PID: 3579 Comm: nfsv4.1-svc Not tainted 4.5.0-rc1+ #1
[  109.706987] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[  109.709468] task: ffff8800b4408000 ti: ffff88008448c000 task.ti: ffff88008448c000
[  109.711207] RIP: 0010:[<ffffffffa08a3e68>]  [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
[  109.713521] RSP: 0018:ffff88008448fca0  EFLAGS: 00010286
[  109.714762] RAX: ffff880081ee202c RBX: ffff8800b7b5b600 RCX: 0000000000000001
[  109.716427] RDX: 0000000000000008 RSI: 0000000000000008 RDI: 0000000000000000
[  109.718091] RBP: ffff88008448fda8 R08: 0000000000000000 R09: 000000000b000000
[  109.719757] R10: ffff880137786000 R11: ffff8800b7b5b600 R12: 0000000001000000
[  109.721415] R13: 0000000000000002 R14: 0000000053270000 R15: 000000000000000b
[  109.723061] FS:  0000000000000000(0000) GS:ffff880139640000(0000) knlGS:0000000000000000
[  109.724931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.726278] CR2: 00000000000002c8 CR3: 0000000034d50000 CR4: 00000000001406e0
[  109.727972] Stack:
[  109.728465]  ffff880081ee202c ffff880081ee201c 000000008448fcc0 ffff8800baccb800
[  109.730349]  ffff8800baccc800 ffffffffa08d0380 0000000000000000 0000000000000000
[  109.732211]  ffff8800b7b5b600 0000000000000001 ffffffff81d073c0 ffff880081ee3090
[  109.734056] Call Trace:
[  109.734657]  [<ffffffffa03795d4>] svc_process_common+0x5c4/0x6c0 [sunrpc]
[  109.736267]  [<ffffffffa0379a4c>] bc_svc_process+0x1fc/0x360 [sunrpc]
[  109.737775]  [<ffffffffa08a2c2c>] nfs41_callback_svc+0x10c/0x1d0 [nfsv4]
[  109.739335]  [<ffffffff810cb380>] ? prepare_to_wait_event+0xf0/0xf0
[  109.740799]  [<ffffffffa08a2b20>] ? nfs4_callback_svc+0x50/0x50 [nfsv4]
[  109.742349]  [<ffffffff810a6998>] kthread+0xd8/0xf0
[  109.743495]  [<ffffffff810a68c0>] ? kthread_park+0x60/0x60
[  109.744776]  [<ffffffff816abc4f>] ret_from_fork+0x3f/0x70
[  109.746037]  [<ffffffff810a68c0>] ? kthread_park+0x60/0x60
[  109.747324] Code: cc 45 31 f6 48 8b 85 00 ff ff ff 44 89 30 48 8b 85 f8 fe ff ff 44 89 20 48 8b 9d 38 ff ff ff 48 8b bd 30 ff ff ff 48 85 db 74 4c <4c> 8b af c8 02 00 00 4d 8d a5 08 02 00 00 49 81 c5 98 02 00 00
[  109.754361] RIP  [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
[  109.756123]  RSP <ffff88008448fca0>
[  109.756951] CR2: 00000000000002c8
[  109.757738] ---[ end trace 2b8555511ab5dfb4 ]---
[  109.758819] Kernel panic - not syncing: Fatal exception
[  109.760126] Kernel Offset: disabled
[  118.938934] ---[ end Kernel panic - not syncing: Fatal exception

It doesn't unlock the table nor does it set the cps->clp pointer which
is later needed by nfs4_cb_free_slot().

Fixes: 80f9642724 ("NFSv4.x: Enforce the ca_maxresponsesize_cached ...")
CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:45:00 -04:00
Steve French
897fba1172 remove directory incorrectly tries to set delete on close on non-empty directories
Wrong return code was being returned on SMB3 rmdir of
non-empty directory.

For SMB3 (unlike for cifs), we attempt to delete a directory by
set of delete on close flag on the open. Windows clients set
this flag via a set info (SET_FILE_DISPOSITION to set this flag)
which properly checks if the directory is empty.

With this patch on smb3 mounts we correctly return
 "DIRECTORY NOT EMPTY"
on attempts to remove a non-empty directory.

Signed-off-by: Steve French <steve.french@primarydata.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
2016-05-17 14:09:44 -05:00
Steve French
5a4f7e8e7f Update cifs.ko version to 2.09
Signed-off-by: Steven French <steve.french@primarydata.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
1a967d6c9b fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
Only server which map unknown users to guest will allow
access using a non-null NTLMv2_Response.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
777f69b8d2 fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
fa8f3a354b fs/cifs: correctly to anonymous authentication for the LANMAN authentication
Only server which map unknown users to guest will allow
access using a non-null LMChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
cfda35d982 fs/cifs: correctly to anonymous authentication via NTLMSSP
See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:

   ...
   Set NullSession to FALSE
   If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
      AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
      (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
       OR
       AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
       -- Special case: client requested anonymous authentication
       Set NullSession to TRUE
   ...

Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:33 -05:00
Sachin Prabhu
11e31647c9 cifs: remove any preceding delimiter from prefix_path
We currently do not check if any delimiter exists before the prefix
path in cifs_compose_mount_options(). Consequently when building the
devname using cifs_build_devname() we can end up with multiple
delimiters separating the UNC and the prefix path.

An issue was reported by the customer mounting a folder within a DFS
share from a Netapp server which uses McAfee antivirus. We have
narrowed down the cause to the use of double backslashes in the file
name used to open the file. This was determined to be caused because of
additional delimiters as a result of the bug.

In addition to changes in cifs_build_devname(), we also fix
cifs_parse_devname() to ignore any preceding delimiter for the prefix
path.

The problem was originally reported on RHEL 6 in RHEL bz 1252721. This
is the upstream version of the fix. The fix was confirmed by looking at
the packet capture of a DFS mount.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:33 -05:00
Goldwyn Rodrigues
1f1735cb75 cifs: Use file_dentry()
CIFS may be used as lower layer of overlayfs and accessing f_path.dentry can
lead to a crash.

Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:33 -05:00
Linus Torvalds
7f427d3a60 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by a "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:
 "The xattr work starts with some acl fixes, then switches ->getxattr to
  passing inode and dentry separately.  This is the point where the
  things start to get tricky - that got merged into the very beginning
  of the -rc3-based #work.lookups, to allow untangling the
  security_d_instantiate() mess.  The xattr work itself proceeds to
  switch a lot of filesystems to generic_...xattr(); no complications
  there.

  After that initial xattr work, the series then does the following:

   - untangle security_d_instantiate()

   - convert a bunch of open-coded lookup_one_len_unlocked() to calls of
     that thing; one such place (in overlayfs) actually yields a trivial
     conflict with overlayfs fixes later in the cycle - overlayfs ended
     up switching to a variant of lookup_one_len_unlocked() sans the
     permission checks.  I would've dropped that commit (it gets
     overridden on merge from #ovl-fixes in #for-next; proper resolution
     is to use the variant in mainline fs/overlayfs/super.c), but I
     didn't want to rebase the damn thing - it was fairly late in the
     cycle...

   - some filesystems had managed to depend on lookup/lookup exclusion
     for *fs-internal* data structures in a way that would break if we
     relaxed the VFS exclusion.  Fixing hadn't been hard, fortunately.

   - core of that series - parallel lookup machinery, replacing
     ->i_mutex with rwsem, making lookup_slow() take it only shared.  At
     that point lookups happen in parallel; lookups on the same name
     wait for the in-progress one to be done with that dentry.

     Surprisingly little code, at that - almost all of it is in
     fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
     making it use the new primitive and actually switching to locking
     shared.

   - parallel readdir stuff - first of all, we provide the exclusion on
     per-struct file basis, same as we do for read() vs lseek() for
     regular files.  That takes care of most of the needed exclusion in
     readdir/readdir; however, these guys are trickier than lookups, so
     I went for switching them one-by-one.  To do that, a new method
     '->iterate_shared()' is added and filesystems are switched to it
     as they are either confirmed to be OK with shared lock on directory
     or fixed to be OK with that.  I hope to kill the original method
     come next cycle (almost all in-tree filesystems are switched
     already), but it's still not quite finished.

   - several filesystems get switched to parallel readdir.  The
     interesting part here is dealing with dcache preseeding by readdir;
     that needs minor adjustment to be safe with directory locked only
     shared.

     Most of the filesystems doing that got switched to in those
     commits.  Important exception: NFS.  Turns out that NFS folks, with
     their, er, insistence on VFS getting the fuck out of the way of the
     Smart Filesystem Code That Knows How And What To Lock(tm) have
     grown the locking of their own.  They had their own homegrown
     rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
     is the reader there).  Of course, with VFS getting the fuck out of
     the way, as requested, the actual smarts of the smart filesystem
     code etc. had become exposed...

   - do_last/lookup_open/atomic_open cleanups.  As the result, open()
     without O_CREAT locks the directory only shared.  Including the
     ->atomic_open() case.  Backmerge from #for-linus in the middle of
     that - atomic_open() fix got brought in.

   - then comes NFS switch to saner (VFS-based ;-) locking, killing the
     homegrown "lookup and readdir are writers" kinda-sorta rwsem.  All
     exclusion for sillyunlink/lookup is done by the parallel lookups
     mechanism.  Exclusion between sillyunlink and rmdir is a real rwsem
     now - rmdir being the writer.

     Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
     now.

   - the rest of the series consists of switching a lot of filesystems
     to parallel readdir; in a lot of cases ->llseek() gets simplified
     as well.  One backmerge in there (again, #for-linus - rockridge
     fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
  ext4: switch to ->iterate_shared()
  hfs: switch to ->iterate_shared()
  hfsplus: switch to ->iterate_shared()
  hostfs: switch to ->iterate_shared()
  hpfs: switch to ->iterate_shared()
  hpfs: handle allocation failures in hpfs_add_pos()
  gfs2: switch to ->iterate_shared()
  f2fs: switch to ->iterate_shared()
  afs: switch to ->iterate_shared()
  befs: switch to ->iterate_shared()
  befs: constify stuff a bit
  isofs: switch to ->iterate_shared()
  get_acorn_filename(): deobfuscate a bit
  btrfs: switch to ->iterate_shared()
  logfs: no need to lock directory in lseek
  switch ecryptfs to ->iterate_shared
  9p: switch to ->iterate_shared()
  fat: switch to ->iterate_shared()
  romfs, squashfs: switch to ->iterate_shared()
  more trivial ->iterate_shared conversions
  ...
2016-05-17 11:01:31 -07:00
Linus Torvalds
9a07a79684 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:

   - Crypto self tests can now be disabled at boot/run time.
   - Add async support to algif_aead.

  Algorithms:

   - A large number of fixes to MPI from Nicolai Stange.
   - Performance improvement for HMAC DRBG.

  Drivers:

   - Use generic crypto engine in omap-des.
   - Merge ppc4xx-rng and crypto4xx drivers.
   - Fix lockups in sun4i-ss driver by disabling IRQs.
   - Add DMA engine support to ccp.
   - Reenable talitos hash algorithms.
   - Add support for Hisilicon SoC RNG.
   - Add basic crypto driver for the MXC SCC.

  Others:

   - Do not allocate crypto hash tfm in NORECLAIM context in ecryptfs"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: qat - change the adf_ctl_stop_devices to void
  crypto: caam - fix caam_jr_alloc() ret code
  crypto: vmx - comply with ABIs that specify vrsave as reserved.
  crypto: testmgr - Add a flag allowing the self-tests to be disabled at runtime.
  crypto: ccp - constify ccp_actions structure
  crypto: marvell/cesa - Use dma_pool_zalloc
  crypto: qat - make adf_vf_isr.c dependant on IOV config
  crypto: qat - Fix typo in comments
  lib: asn1_decoder - add MODULE_LICENSE("GPL")
  crypto: omap-sham - Use dma_request_chan() for requesting DMA channel
  crypto: omap-des - Use dma_request_chan() for requesting DMA channel
  crypto: omap-aes - Use dma_request_chan() for requesting DMA channel
  crypto: omap-des - Integrate with the crypto engine framework
  crypto: s5p-sss - fix incorrect usage of scatterlists api
  crypto: s5p-sss - Fix missed interrupts when working with 8 kB blocks
  crypto: s5p-sss - Use common BIT macro
  crypto: mxc-scc - fix unwinding in mxc_scc_crypto_register()
  crypto: mxc-scc - signedness bugs in mxc_scc_ablkcipher_req_init()
  crypto: talitos - fix ahash algorithms registration
  crypto: ccp - Ensure all dependencies are specified
  ...
2016-05-17 09:33:39 -07:00
Dan Williams
8b3db9798c dax: fallback from pmd to pte on error
In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fallback rather than fail immediately upon
encountering an error.  The thought being that reducing the span of the
dax request may avoid the error region.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:13 -06:00
Toshi Kani
a8078b1fc6 block: Update blkdev_dax_capable() for consistency
blkdev_dax_capable() is similar to bdev_dax_supported(), but needs
to remain as a separate interface for checking dax capability of
a raw block device.

Rename and relocate blkdev_dax_capable() to keep them maintained
consistently, and call bdev_direct_access() for the dax capability
check.

There is no change in the behavior.

Link: https://lkml.org/lkml/2016/5/9/950
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:13 -06:00
Toshi Kani
1e937cddd1 xfs: Add alignment check for DAX mount
When a partition is not aligned by 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata update.

Call bdev_dax_supported() to perform proper precondition checks
which includes this partition alignment check.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:12 -06:00
Toshi Kani
284854be2b ext2: Add alignment check for DAX mount
When a partition is not aligned by 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata update.

Call bdev_dax_supported() to perform proper precondition checks
which includes this partition alignment check.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:12 -06:00
Toshi Kani
87eefeb4e8 ext4: Add alignment check for DAX mount
When a partition is not aligned by 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata update.

Call bdev_dax_supported() to perform proper precondition checks
which includes this partition alignment check.

Reported-by: Micah Parrish <micah.parrish@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:11 -06:00
Toshi Kani
2d96afc8f7 block: Add bdev_dax_supported() for dax mount checks
DAX imposes additional requirements to a device.  Add
bdev_dax_supported() which performs all the precondition checks
necessary for filesystem to mount the device with dax option.

Also add a new check to verify if a partition is aligned by 4KB.
When a partition is unaligned, any dax read/write access fails,
except for metadata update.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:11 -06:00
Toshi Kani
2af3a8159c block: Add vfs_msg() interface
In preparation of moving DAX capability checks to the block layer
from filesystem code, add a VFS message interface that aligns with
filesystem's message format.

For instance, a vfs_msg() message followed by XFS messages in case
of a dax mount error may look like:

  VFS (pmem0p1): error: unaligned partition for dax
  XFS (pmem0p1): DAX unsupported by block device. Turning off DAX.
  XFS (pmem0p1): Mounting V5 Filesystem
   :

vfs_msg() is largely based on ext4_msg().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:10 -06:00
Jan Kara
7795bec89e dax: Remove redundant inode size checks
Callers of dax fault handlers must make sure these calls cannot race
with truncate. Thus it is enough to check inode size when entering the
function and we don't have to recheck it again later in the handler.
Note that inode size itself can be decreased while the fault handler
runs but filesystem locking prevents against any radix tree or block
mapping information changes resulting from the truncate and that is what
we really care about.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:10 -06:00
Jan Kara
c3d98e39d5 dax: Remove pointless writeback from dax_do_io()
dax_do_io() is calling filemap_write_and_wait() if DIO_LOCKING flags is
set. Presumably this was copied over from direct IO code. However DAX
inodes have no pagecache pages to write so the call is pointless. Remove
it.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:09 -06:00
Jan Kara
069c77bc9e dax: Remove zeroing from dax_io()
All the filesystems are now zeroing blocks themselves for DAX IO to avoid
races between dax_io() and dax_fault(). Remove the zeroing code from
dax_io() and add warning to catch the case when somebody unexpectedly
returns new or unwritten buffer.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:09 -06:00
Jan Kara
2b10945c53 dax: Remove dead zeroing code from fault handlers
Now that all filesystems zero out blocks allocated for a fault handler,
we can just remove the zeroing from the handler itself. Also add checks
that no filesystem returns to us unwritten or new buffer.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:08 -06:00
Jan Kara
86b0624e42 ext2: Avoid DAX zeroing to corrupt data
Currently ext2 zeroes any data blocks allocated for DAX inode however it
still returns them as BH_New. Thus DAX code zeroes them again in
dax_insert_mapping() which can possibly overwrite the data that has been
already stored to those blocks by a racing dax_io(). Avoid marking
pre-zeroed buffers as new.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:44:07 -06:00
Jan Kara
9b6cd5f76d ext2: Fix block zeroing in ext2_get_blocks() for DAX
When zeroing allocated blocks for DAX, we accidentally zeroed only the
first allocated block instead of all of them. So far this problem is
hidden by the fact that page faults always need only a single block and
DAX write code zeroes blocks again. But the zeroing in DAX code is racy
and needs to be removed so fix the zeroing in ext2 to zero all allocated
blocks.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-17 00:41:24 -06:00
Al Viro
0e0162bb8c Merge branch 'ovl-fixes' into for-linus
Backmerge to resolve a conflict in ovl_lookup_real();
"ovl_lookup_real(): use lookup_one_len_unlocked()" instead,
but it was too late in the cycle to rebase.
2016-05-17 02:17:59 -04:00
Jan Kara
02fbd13975 dax: Remove complete_unwritten argument
Fault handlers currently take complete_unwritten argument to convert
unwritten extents after PTEs are updated. However no filesystem uses
this anymore as the code is racy. Remove the unused argument.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-16 18:11:51 -06:00
NeilBrown
e4b2749158 DAX: move RADIX_DAX_ definitions to dax.c
These don't belong in radix-tree.c any more than PAGECACHE_TAG_* do.
Let's try to maintain the idea that radix-tree simply implements an
abstract data type.

Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2016-05-16 18:11:51 -06:00
Jaegeuk Kim
10aa97c379 f2fs: manipulate dirty file inodes when DATA_FLUSH is set
It needs to maintain dirty file inodes only if DATA_FLUSH is set.
Otherwise, let's avoid its overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-16 15:32:03 -07:00
Sheng Yong
087968974f f2fs: add fault injection to sysfs
This patch introduces a new struct f2fs_fault_info and a global f2fs_fault
to save fault injection status. Fault injection entries are created in
/sys/fs/f2fs/fault_injection/ during initializing f2fs module.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-16 15:32:02 -07:00
Yunlei He
b951a4ec16 f2fs: no need inc dirty pages under inode lock
No need inc dirty pages under inode lock

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-16 15:32:01 -07:00
Chao Yu
8975bdf482 f2fs: fix incorrect error path handling in f2fs_move_rehashed_dirents
Fix two bugs in error path of f2fs_move_rehashed_dirents:
 - release dir's inode page if fail to call kmalloc
 - recover i_current_depth if fail to converting

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-16 15:32:01 -07:00
Chao Yu
e4103849ba f2fs: fix i_current_depth during inline dentry conversion
With below steps, we will see that dentry page becoming unaccessable later.
This is because we forget updating i_current_depth in inode during inline
dentry conversion, after that, once we failed at somewhere, it will leave
i_current_depth as 0 in non-inline directory. Then, during ->lookup, the
current_depth value makes all dentry pages in first level invisible. Fix
it.

1) mount f2fs with inline_dentry option
2) mkdir dir
3) touch 180 files named [0-179] in dir
4) touch 180 in dir (fail after inline dir conversion)
5) ll dir

ls: cannot access /mnt/f2fs/dir/0: No such file or directory
ls: cannot access /mnt/f2fs/dir/1: No such file or directory
ls: cannot access /mnt/f2fs/dir/2: No such file or directory
ls: cannot access /mnt/f2fs/dir/3: No such file or directory
ls: cannot access /mnt/f2fs/dir/4: No such file or directory

drwxr-xr-x 2 root root 4096  may 13 21:47 ./
drwxr-xr-x 3 root root 4096  may 13 21:46 ../
-????????? ? ?    ?       ?             ? 0
-????????? ? ?    ?       ?             ? 1
-????????? ? ?    ?       ?             ? 10
-????????? ? ?    ?       ?             ? 100
-????????? ? ?    ?       ?             ? 101
-????????? ? ?    ?       ?             ? 102

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-16 15:32:00 -07:00
Sheng Yong
99e3e858a4 f2fs: correct return value type of f2fs_fill_super
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-16 15:31:59 -07:00
Linus Torvalds
49817c3343 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Drop the unused EFI_SYSTEM_TABLES efi.flags bit and ensure the
     ARM/arm64 EFI System Table mapping is read-only (Ard Biesheuvel)

   - Add a comment to explain that one of the code paths in the x86/pat
     code is only executed for EFI boot (Matt Fleming)

   - Improve Secure Boot status checks on arm64 and handle unexpected
     errors (Linn Crosetto)

   - Remove the global EFI memory map variable 'memmap' as the same
     information is already available in efi::memmap (Matt Fleming)

   - Add EFI Memory Attribute table support for ARM/arm64 (Ard
     Biesheuvel)

   - Add EFI GOP framebuffer support for ARM/arm64 (Ard Biesheuvel)

   - Add EFI Bootloader Control driver for storing reboot(2) data in EFI
     variables for consumption by bootloaders (Jeremy Compostella)

   - Add Core EFI capsule support (Matt Fleming)

   - Add EFI capsule char driver (Kweh, Hock Leong)

   - Unify EFI memory map code for ARM and arm64 (Ard Biesheuvel)

   - Add generic EFI support for detecting when firmware corrupts CPU
     status register bits (like IRQ flags) when performing EFI runtime
     service calls (Mark Rutland)

  ... and other misc cleanups"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  efivarfs: Make efivarfs_file_ioctl() static
  efi: Merge boolean flag arguments
  efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
  efibc: Fix excessive stack footprint warning
  efi/capsule: Make efi_capsule_pending() lockless
  efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver
  efi/runtime-wrappers: Remove ARCH_EFI_IRQ_FLAGS_MASK #ifdef
  x86/efi: Enable runtime call flag checking
  arm/efi: Enable runtime call flag checking
  arm64/efi: Enable runtime call flag checking
  efi/runtime-wrappers: Detect firmware IRQ flag corruption
  efi/runtime-wrappers: Remove redundant #ifdefs
  x86/efi: Move to generic {__,}efi_call_virt()
  arm/efi: Move to generic {__,}efi_call_virt()
  arm64/efi: Move to generic {__,}efi_call_virt()
  efi/runtime-wrappers: Add {__,}efi_call_virt() templates
  efi/arm-init: Reserve rather than unmap the memory map for ARM as well
  efi: Add misc char driver interface to update EFI firmware
  x86/efi: Force EFI reboot to process pending capsules
  efi: Add 'capsule' update support
  ...
2016-05-16 13:06:27 -07:00
George Spelvin
0fed3ac866 namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS
The hash mixing between adding the next 64 bits of name
was just a bit weak.

Replaced with a still very fast but slightly more effective
mixing function.

Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-16 11:35:08 -07:00
David Sterba
680834ca0a Merge branch 'foreign/jeffm/uapi' into for-chris-4.7-20160516
# Conflicts:
#	include/uapi/linux/btrfs.h
2016-05-16 15:46:29 +02:00
David Sterba
36fac9e9ff Merge branch 'foreign/anand/dev-del-by-id-ext' into for-chris-4.7-20160516 2016-05-16 15:46:26 +02:00
David Sterba
5ef64a3e75 Merge branch 'cleanups-4.7' into for-chris-4.7-20160516 2016-05-16 15:46:24 +02:00
Scott Talbert
4673272f43 btrfs: fix memory leak during RAID 5/6 device replacement
A 'struct bio' is allocated in scrub_missing_raid56_pages(), but it was never
freed anywhere.

Signed-off-by: Scott Talbert <scott.talbert@hgst.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-16 10:17:58 +02:00
David S. Miller
909b27f706 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next'
because we no longer have a per-netns conntrack hash.

The ip_gre.c conflict as well as the iwlwifi ones were cases of
overlapping changes.

Conflicts:
	drivers/net/wireless/intel/iwlwifi/mvm/tx.c
	net/ipv4/ip_gre.c
	net/netfilter/nf_conntrack_core.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-15 13:32:48 -04:00
Linus Torvalds
6ba5b85fd4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Overlayfs fixes from Miklos, assorted fixes from me.

  Stable fodder of varying severity, all sat in -next for a while"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: ignore permissions on underlying lookup
  vfs: add lookup_hash() helper
  vfs: rename: check backing inode being equal
  vfs: add vfs_select_inode() helper
  get_rock_ridge_filename(): handle malformed NM entries
  ecryptfs: fix handling of directory opening
  atomic_open(): fix the handling of create_error
  fix the copy vs. map logics in blk_rq_map_user_iov()
  do_splice_to(): cap the size before passing to ->splice_read()
2016-05-14 11:59:43 -07:00
Linus Torvalds
1410b74e40 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "During v4.6-rc1 cgroup namespace support was merged.  There is an
  issue where it's impossible to tell whether a given cgroup mount point
  is bind mounted or namespaced.  Serge has been working on the issue
  but it took longer than expected to resolve, so the late pull request.

  Given that it's a completely new feature and the patches don't touch
  anything else, the risk seems acceptable.  However, if this is too
  late, an alternative is plugging new cgroup ns creation for v4.6 and
  retrying for v4.7"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix compile warning
  kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
  cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
  kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13 16:26:46 -07:00
Chuck Lever
6625d09137 svcrdma: Do not add XDR padding to xdr_buf page vector
An xdr_buf has a head, a vector of pages, and a tail. Each
RPC request is presented to the NFS server contained in an
xdr_buf.

The RDMA transport would like to supply the NFS server with only
the NFS WRITE payload bytes in the page vector. In some common
cases, that would allow the NFS server to swap those pages right
into the target file's page cache.

Have the transport's RDMA Read logic put XDR pad bytes in the tail
iovec, and not in the pages that hold the data payload.

The NFSv3 WRITE XDR decoder is finicky about the lengths involved,
so make sure it is looking in the correct places when computing
the total length of the incoming NFS WRITE request.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-05-13 15:53:05 -04:00
Jeff Layton
14b7f4a1ed nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
Move the existing static function to an inline helper, and call it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-05-13 15:34:47 -04:00
Daniel Wagner
e55d531244 crash_dump: Add vmcore_elf32_check_arch
parse_crash_elf{32|64}_headers will check the headers via the
elf_check_arch respectively vmcore_elf64_check_arch macro.

The MIPS architecture implements those two macros differently.
In order to make the differentiation more explicit, let's introduce
an vmcore_elf32_check_arch to allow the archs to overwrite it.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Suggested-by: Maciej W. Rozycki <macro@imgtec.com>
Reviewed-by: Maciej W. Rozycki <macro@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12535/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-05-13 14:01:59 +02:00
Jan Kara
12735f8819 ext4: pre-zero allocated blocks for DAX IO
Currently ext4 treats DAX IO the same way as direct IO. I.e., it
allocates unwritten extents before IO is done and converts unwritten
extents afterwards. However this way DAX IO can race with page fault to
the same area:

ext4_ext_direct_IO()				dax_fault()
  dax_io()
    get_block() - allocates unwritten extent
    copy_from_iter_pmem()
						  get_block() - converts
						    unwritten block to
						    written and zeroes it
						    out
  ext4_convert_unwritten_extents()

So data written with DAX IO gets lost. Similarly dax_new_buf() called
from dax_io() can overwrite data that has been already written to the
block via mmap.

Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
use them for DAX mmap. The downside of this solution is that every
allocating write writes each block twice (once zeros, once data). Fixing
the race with locking is possible as well however we would need to
lock-out faults for the whole range written to by DAX IO. And that is
not easy to do without locking-out faults for the whole file which seems
too aggressive.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-13 00:51:15 -04:00
Jan Kara
914f82a32d ext4: refactor direct IO code
Currently ext4 direct IO handling is split between ext4_ext_direct_IO()
and ext4_ind_direct_IO(). However the extent based function calls into
the indirect based one for some cases and for example it is not able to
handle file extending. Previously it was not also properly handling
retries in case of ENOSPC errors. With DAX things would get even more
contrieved so just refactor the direct IO code and instead of indirect /
extent split do the split to read vs writes.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-13 00:44:16 -04:00
Jan Kara
dbc427ce40 ext4: fix race in transient ENOSPC detection
When there are blocks to free in the running transaction, block
allocator can return ENOSPC although the filesystem has some blocks to
free. We use ext4_should_retry_alloc() to force commit of the current
transaction and return whether anything was committed so that it makes
sense to retry the allocation. However the transaction may get committed
after block allocation fails but before we call
ext4_should_retry_alloc(). So ext4_should_retry_alloc() returns false
because there is nothing to commit and we wrongly return ENOSPC.

Fix the race by unconditionally returning 1 from ext4_should_retry_alloc()
when we tried to commit a transaction. This should not add any
unnecessary retries since we had a transaction running a while ago when
trying to allocate blocks and we want to retry the allocation once that
transaction has committed anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-13 00:42:40 -04:00
Jan Kara
7cb476f834 ext4: handle transient ENOSPC properly for DAX
ext4_dax_get_blocks() was accidentally omitted fixing get blocks
handlers to properly handle transient ENOSPC errors. Fix it now to use
ext4_get_blocks_trans() helper which takes care of these errors.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-13 00:38:16 -04:00