Commit Graph

47524 Commits

Author SHA1 Message Date
David Sterba
3e4c5efbb3 btrfs: add free space tree to the cow-only list
Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-25 16:48:07 +01:00
David Sterba
6b20e0ad2e btrfs: add free space tree to lockdep classes
Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-25 16:48:06 +01:00
Trond Myklebust
810d82e683 NFSv4.x: Allow multiple callbacks in flight
Hook the callback channel into the same session management machinery
as we use for the forward channel.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-25 09:36:21 -05:00
Trond Myklebust
5f83d86cf5 NFSv4.x: Fix wraparound issues when validing the callback sequence id
We need to make sure that we don't allow args->csa_sequenceid == 0.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-24 17:12:49 -05:00
Trond Myklebust
80f9642724 NFSv4.x: Enforce the ca_maxresponsesize_cached on the back channel
We have no duplicate reply cache, so we always set the back channel
ca_maxresponsesize_cached to zero when negotiating the session.
That means we should always error out as soon as we see the server
set args->csa_cachethis.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-24 17:12:48 -05:00
Trond Myklebust
f74a834a0e NFSv4.x: CB_SEQUENCE should return NFS4ERR_DELAY if still executing
See RFC5661 Section 2.10.6.2: if retrying a request, and the old one is
still in progress, we must return NFS4ERR_DELAY as the reply to sequence.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-24 17:12:48 -05:00
Trond Myklebust
f4f58ed19b NFSv4.x: Remove hard coded slotids in callback channel
Instead, use the values encoded in the slot table itself.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-24 17:12:47 -05:00
Linus Torvalds
e2464688b5 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the main pull request for MIPS for 4.5 plus some 4.4 fixes.

  The executive summary:

   - ATH79 platform improvments, use DT bindings for the ATH79 USB PHY.
   - Avoid useless rebuilds for zboot.
   - jz4780: Add NEMC, BCH and NAND device tree nodes
   - Initial support for the MicroChip's DT platform.  As all the device
     drivers are missing this is still of limited use.
   - Some Loongson3 cleanups.
   - The unavoidable whitespace polishing.
   - Reduce clock skew when synchronizing the CPU cycle counters on CPU
     startup.
   - Add MIPS R6 fixes.
   - Lots of cleanups across arch/mips as fallout from KVM.
   - Lots of minor fixes and changes for IEEE 754-2008 support to the
     FPU emulator / fp-assist software.
   - Minor Ralink, BCM47xx and bcm963xx platform support improvments.
   - Support SMP on BCM63168"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (84 commits)
  MIPS: zboot: Add support for serial debug using the PROM
  MIPS: zboot: Avoid useless rebuilds
  MIPS: BMIPS: Enable ARCH_WANT_OPTIONAL_GPIOLIB
  MIPS: bcm63xx: nvram: Remove unused bcm63xx_nvram_get_psi_size() function
  MIPS: bcm963xx: Update bcm_tag field image_sequence
  MIPS: bcm963xx: Move extended flash address to bcm_tag header file
  MIPS: bcm963xx: Move Broadcom BCM963xx image tag data structure
  MIPS: bcm63xx: nvram: Use nvram structure definition from header file
  MIPS: bcm963xx: Add Broadcom BCM963xx board nvram data structure
  MAINTAINERS: Add KVM for MIPS entry
  MIPS: KVM: Add missing newline to kvm_err()
  MIPS: Move KVM specific opcodes into asm/inst.h
  MIPS: KVM: Use cacheops.h definitions
  MIPS: Break down cacheops.h definitions
  MIPS: Use EXCCODE_ constants with set_except_vector()
  MIPS: Update trap codes
  MIPS: Move Cause.ExcCode trap codes to mipsregs.h
  MIPS: KVM: Make kvm_mips_{init,exit}() static
  MIPS: KVM: Refactor added offsetof()s
  MIPS: KVM: Convert EXPORT_SYMBOL to _GPL
  ...
2016-01-24 12:50:56 -08:00
Linus Torvalds
00e3f5cc30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "The two main changes are aio support in CephFS, and a series that
  fixes several issues in the authentication key timeout/renewal code.

  On top of that are a variety of cleanups and minor bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: remove outdated comment
  libceph: kill off ceph_x_ticket_handler::validity
  libceph: invalidate AUTH in addition to a service ticket
  libceph: fix authorizer invalidation, take 2
  libceph: clear messenger auth_retry flag if we fault
  libceph: fix ceph_msg_revoke()
  libceph: use list_for_each_entry_safe
  ceph: use i_size_{read,write} to get/set i_size
  ceph: re-send AIO write request when getting -EOLDSNAP error
  ceph: Asynchronous IO support
  ceph: Avoid to propagate the invalid page point
  ceph: fix double page_unlock() in page_mkwrite()
  rbd: delete an unnecessary check before rbd_dev_destroy()
  libceph: use list_next_entry instead of list_entry_next
  ceph: ceph_frag_contains_value can be boolean
  ceph: remove unused functions in ceph_frag.h
2016-01-24 12:34:13 -08:00
Linus Torvalds
772950ed21 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull SMB3 fixes from Steve French:
 "A collection of CIFS/SMB3 fixes.

  It includes a couple bug fixes, a few for improved debugging of
  cifs.ko and some improvements to the way cifs does key generation.

  I do have some additional bug fixes I expect in the next week or two
  (to address a problem found by xfstest, and some fixes for SMB3.11
  dialect, and a couple patches that just came in yesterday that I am
  reviewing)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs_dbg() outputs an uninitialized buffer in cifs_readdir()
  cifs: fix race between call_async() and reconnect()
  Prepare for encryption support (first part). Add decryption and encryption key generation. Thanks to Metze for helping with this.
  cifs: Allow using O_DIRECT with cache=loose
  cifs: Make echo interval tunable
  cifs: Check uniqueid for SMB2+ and return -ESTALE if necessary
  Print IP address of unresponsive server
  cifs: Ratelimit kernel log messages
2016-01-24 12:31:12 -08:00
Linus Torvalds
cc673757e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull final vfs updates from Al Viro:

 - The ->i_mutex wrappers (with small prereq in lustre)

 - a fix for too early freeing of symlink bodies on shmem (they need to
   be RCU-delayed) (-stable fodder)

 - followup to dedupe stuff merged this cycle

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: abort dedupe loop if fatal signals are pending
  make sure that freeing shmem fast symlinks is RCU-delayed
  wrappers for ->i_mutex access
  lustre: remove unused declaration
2016-01-23 12:24:56 -08:00
Al Viro
115b93a859 orangefs: clean up op_alloc()
fold orangefs_op_initialize() in there, don't bother locking something
nobody else could've seen yet, use kmem_cache_zalloc() instead of
explicit memset()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:20:11 -05:00
Al Viro
b0bc3a7b62 orangefs: move handle_io_error() to file.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:20:11 -05:00
Al Viro
2a9e5c2260 orangefs: don't reinvent completion.h...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:20:11 -05:00
Al Viro
4f55e39732 if ORANGEFS_VFS_OP_FILE_IO request had been given up, don't bother waiting
... we are not going to get woken up anyway, so it's just going to time out
and whine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:20:11 -05:00
Al Viro
727cbfea62 orangefs: get rid of MSECS_TO_JIFFIES
All timeouts are in _seconds_, so all calls are of form
MSECS_TO_JIFFIES(n * 1000), which is a convoluted way to
spell n * HZ.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:20:11 -05:00
Al Viro
eab9b38939 orangefs_clean_up_interrupted_operation: call with op->lock held
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:15:09 -05:00
Al Viro
70c6ea26ff orangefs: reduce nesting in wait_for_matching_downcall()
reorder if branches...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:15:09 -05:00
Al Viro
e1056a9cc3 orangefs: remove cargo-culting spin_lock_irqsave() in service_operation()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 15:15:09 -05:00
Linus Torvalds
fa7d9a1d28 Merge tag 'nfs-for-4.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes and cleanups from Trond Myklebust:
 "Bugfixes:
   - pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
   - pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN

  Cleanups:
   - NFS: Simplify nfs_request_add_commit_list() arguments"

* tag 'nfs-for-4.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
  NFS: Simplify nfs_request_add_commit_list() arguments
  pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN
2016-01-23 11:47:13 -08:00
Al Viro
ed42fe0593 orangefs: hopefully saner op refcounting and locking
* create with refcount 1
* make op_release() decrement and free if zero (i.e. old put_op()
  has become that).
* mark when submitter has given up waiting; from that point nobody
  else can move between the lists, change state, etc.
* have daemon read/write_iter grab a reference when picking op
  and *always* give it up in the end
* don't put into hash until we know it's been successfully passed to
  daemon

* move op->lock _lower_ than htab_in_progress_lock (and make sure
  to take it in purge_inprogress_ops())

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 13:03:12 -05:00
Al Viro
fee25ce125 orangefs: make sure that reopening pvfs2-req won't overlap with the end of close
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:55:24 -05:00
Al Viro
96acf9d65e orangefs: nothing should remain in request list and in hash
... otherwise some thread is running in .text that is about to
be freed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:55:24 -05:00
Al Viro
60831949cc orangefs: move wakeups into set_op_state_{serviced,purged}()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:55:24 -05:00
Al Viro
ade3d78104 orangefs: make wait_for_...downcall() static
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:48:02 -05:00
Al Viro
831d094979 orangefs: move wakeups into set_op_state_{serviced,purged}()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
b7ae37b09e orangefs: make wait_for_...downcall() static
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
e07db0a2c2 make orangefs_clean_up_interrupted_operation() static
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
1264ddfdb7 orangefs: kill orangefs_inode_s ->list
no users...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
fc916da52d orangefs: get rid of <censored> macros
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
90e54e36c9 orangefs: ->poll() doesn't need spinlock
not just for list_empty()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
8016387ce7 orangefs: kill ioctl32 rudiments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
83595db052 orangefs: ->poll() is only called between successful ->open() and ->release()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
fb6d2526e9 orangefs: generic_file_open() is pointless for character devices
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Al Viro
3e1dd9aa82 orangefs: use DEFINE_MUTEX (and mutex_init() had been too late)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-23 12:42:43 -05:00
Darrick J. Wong
e62e560fc8 vfs: abort dedupe loop if fatal signals are pending
If the program running dedupe receives a fatal signal during the
dedupe loop, we should bail out to avoid tying up the system.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 20:29:55 -05:00
Tetsuo Handa
1d5cfdb076 tree wide: use kvfree() than conditional kfree()/vfree()
There are many locations that do

  if (memory_was_allocated_by_vmalloc)
    vfree(ptr);
  else
    kfree(ptr);

but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr().  Unless callers have special reasons, we can
replace this branch with kvfree().  Please check and reply if you found
problems.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
eab95db69d dax: never rely on bh.b_dev being set by get_block()
Previously in DAX we assumed that calls to get_block() would set
bh.b_bdev, and we would then use that value even in error cases for
debugging.  This caused a NULL pointer dereference in __dax_dbg() which
was fixed by a previous commit, but that commit only changed the one
place where we were hitting an error.

Instead, update dax.c so that we always initialize bh.b_bdev as best we
can based on the information that DAX has.  get_block() may or may not
update to a new value, but this at least lets us get something helpful
from bh.b_bdev for error messages and not have to worry about whether it
was set by get_block() or not.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
5eb88dca9c xfs: call dax_pfn_mkwrite() for DAX fsync/msync
To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can track when user pages are
dirtied.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
d5be7a03b0 ext4: call dax_pfn_mkwrite() for DAX fsync/msync
To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can track when user pages are
dirtied.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
80b4adcafc ext2: call dax_pfn_mkwrite() for DAX fsync/msync
To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can track when user pages are
dirtied.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
9973c98ecf dax: add support for fsync/sync
To properly handle fsync/msync in an efficient way DAX needs to track
dirty pages so it is able to flush them durably to media on demand.

The tracking of dirty pages is done via the radix tree in struct
address_space.  This radix tree is already used by the page writeback
infrastructure for tracking dirty pages associated with an open file,
and it already has support for exceptional (non struct page*) entries.
We build upon these features to add exceptional entries to the radix
tree for DAX dirty PMD or PTE pages at fault time.

[dan.j.williams@intel.com: fix dax_pmd_dbg build warning]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
f9fe48bece dax: support dirty DAX entries in radix tree
Add support for tracking dirty DAX entries in the struct address_space
radix tree.  This tree is already used for dirty page writeback, and it
already supports the use of exceptional (non struct page*) entries.

In order to properly track dirty DAX pages we will insert new
exceptional entries into the radix tree that represent dirty DAX PTE or
PMD pages.  These exceptional entries will also contain the writeback
addresses for the PTE or PMD faults that we can use at fsync/msync time.

There are currently two types of exceptional entries (shmem and shadow)
that can be placed into the radix tree, and this adds a third.  We rely
on the fact that only one type of exceptional entry can be found in a
given radix tree based on its usage.  This happens for free with DAX vs
shmem but we explicitly prevent shadow entries from being added to radix
trees for DAX mappings.

The only shadow entries that would be generated for DAX radix trees
would be to track zero page mappings that were created for holes.  These
pages would receive minimal benefit from having shadow entries, and the
choice to have only one type of exceptional entry in a given radix tree
makes the logic simpler both in clear_exceptional_entry() and in the
rest of DAX.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
de14b9cb5e dax: fix conversion of holes to PMDs
When we get a DAX PMD fault for a write it is possible that there could
be some number of 4k zero pages already present for the same range that
were inserted to service reads from a hole.  These 4k zero pages need to
be unmapped from the VMAs and removed from the struct address_space
radix tree before the real DAX PMD entry can be inserted.

For PTE faults this same use case also exists and is handled by a
combination of unmap_mapping_range() to unmap the VMAs and
delete_from_page_cache() to remove the page from the address_space radix
tree.

For PMD faults we do have a call to unmap_mapping_range() (protected by
a buffer_new() check), but nothing clears out the radix tree entry.  The
buffer_new() check is also incorrect as the current ext4 and XFS
filesystem code will never return a buffer_head with BH_New set, even
when allocating new blocks over a hole.  Instead the filesystem will
zero the blocks manually and return a buffer_head with only BH_Mapped
set.

Fix this situation by removing the buffer_new() check and adding a call
to truncate_inode_pages_range() to clear out the radix tree entries
before we insert the DAX PMD.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Ross Zwisler
d4bbe7068b dax: fix NULL pointer dereference in __dax_dbg()
In __dax_pmd_fault() we currently assume that get_block() will always
set bh.b_bdev and we unconditionally dereference it in __dax_dbg().

This assumption isn't always true - when called for reads of holes
ext4_dax_mmap_get_block() returns a buffer head where bh->b_bdev is
never set.  I hit this BUG while testing the DAX PMD fault path.

Instead, initialize bh.b_bdev before passing bh into get_block().  It is
possible that the filesystem's get_block() will update bh.b_bdev, and
this is fine - we just want to initialize bh.b_bdev to something
reasonable so that the calls to __dax_dbg() work and print something
useful.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Linus Torvalds
2101ae4289 Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
 "These are mostly fixes that we've been testing, but also we grabbed
  and tested a few small cleanups that had been on the list for a while.

  Zhao Lei's patchset also fixes some early ENOSPC buglets"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (21 commits)
  btrfs: raid56: Use raid_write_end_io for scrub
  btrfs: Remove unnecessary ClearPageUptodate for raid56
  btrfs: use rbio->nr_pages to reduce calculation
  btrfs: Use unified stripe_page's index calculation
  btrfs: Fix calculation of rbio->dbitmap's size calculation
  btrfs: Fix no_space in write and rm loop
  btrfs: merge functions for wait snapshot creation
  btrfs: delete unused argument in btrfs_copy_from_user
  btrfs: Use direct way to determine raid56 write/recover mode
  btrfs: Small cleanup for get index_srcdev loop
  btrfs: Enhance chunk validation check
  btrfs: Enhance super validation check
  Btrfs: fix deadlock running delayed iputs at transaction commit time
  Btrfs: fix typo in log message when starting a balance
  btrfs: remove duplicate const specifier
  btrfs: initialize the seq counter in struct btrfs_device
  Btrfs: clean up an error code in btrfs_init_space_info()
  btrfs: fix iterator with update error in backref.c
  Btrfs: fix output of compression message in btrfs_parse_options()
  Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots
  ...
2016-01-22 11:49:21 -08:00
Linus Torvalds
391f2a16b7 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Some locking and page fault bug fixes from Jan Kara, some ext4
  encryption fixes from me, and Li Xi's Project Quota commits"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs: clean up the flags definition in uapi/linux/fs.h
  ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
  ext4: add project quota support
  ext4: adds project ID support
  ext4 crypto: simplify interfaces to directory entry insert functions
  ext4 crypto: add missing locking for keyring_key access
  ext4: use pre-zeroed blocks for DAX page faults
  ext4: implement allocation of pre-zeroed blocks
  ext4: provide ext4_issue_zeroout()
  ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag
  ext4: document lock ordering
  ext4: fix races of writeback with punch hole and zero range
  ext4: fix races between buffered IO and collapse / insert range
  ext4: move unlocked dio protection from ext4_alloc_file_blocks()
  ext4: fix races between page faults and hole punching
2016-01-22 11:23:35 -08:00
Linus Torvalds
d5ffdf8b4a Merge tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull more xfs updates from Dave Chinner:
 "This is the second update for XFS that I mentioned in the original
  pull request last week.

  It contains a revert for a suspend regression in 4.4 and a fix for a
  long standing log recovery issue that has been further exposed by all
  the log recovery changes made in the original 4.5 merge.

  There is one more thing in this pull request - one that I forgot to
  merge into the origin.  That is, pulling the XFS_IOC_FS[GS]ETXATTR
  ioctl up to the VFS level so that other filesystems can also use it
  for modifying project quota IDs

  Summary:

   - promotion of XFS_IOC_FS[GS]ETXATTR ioctl to the vfs level so that
     it can be shared with other filesystems.  The ext4 project quota
     functionality is the first target for this.  The commits in this
     series have not been updated with review or final SOB tags because
     the branch they were originally published in was needed by ext4.
     Those tags are:

        Reviewed-by: Theodore Ts'o <tytso@mit.edu>
        Signed-off-by: Dave Chinner <david@fromrobit.com>

   - Revert a change that is causing suspend failures.

   - Fix a use-after-free that can occur on log mount failures.  Been
     around forever, but now exposed by other changes to log recovery
     made in the first 4.5 merge"

* tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: log mount failures don't wait for buffers to be released
  Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
  xfs: introduce per-inode DAX enablement
  xfs: use FS_XFLAG definitions directly
  fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion
2016-01-22 10:54:13 -08:00
Linus Torvalds
eadee0ce6f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Embarrassing braino fix + pipe page accounting + fixing an eyesore in
  find_filesystem() (checking that s1 is equal to prefix of s2 of given
  length can be done in many ways, but "compare strlen(s1) with length
  and then do strncmp()" is not a good one...)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [regression] fix braino in fs/dlm/user.c
  pipe: limit the per-user amount of pages allocated in pipes
  find_filesystem(): simplify comparison
2016-01-22 10:24:03 -08:00