Commit Graph

37611 Commits

Author SHA1 Message Date
Brian Foster
2b64ee5cdc xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper
Refactor xfs_difree() in preparation for the finobt. xfs_difree()
performs the validity checks against the ag and reads the agi
header. The work of physically updating the inode allocation btree
is pushed down into the new xfs_difree_inobt() helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:53 +10:00
Brian Foster
6dd8638e4e xfs: use and update the finobt on inode allocation
Replace xfs_dialloc_ag() with an implementation that looks for a
record in the finobt. The finobt only tracks records with at least
one free inode. This eliminates the need for the intra-ag scan in
the original algorithm. Once the inode is allocated, update the
finobt appropriately (possibly removing the record) as well as the
inobt.

Move the original xfs_dialloc_ag() algorithm to
xfs_dialloc_ag_inobt() and fall back as such if finobt support is
not enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:53 +10:00
Brian Foster
0aa0a756ec xfs: insert newly allocated inode chunks into the finobt
A newly allocated inode chunk, by definition, has at least one
free inode, so a record is always inserted into the finobt.

Create the xfs_inobt_insert() helper from existing code to insert
a record in an inobt based on the provided BTNUM. Update
xfs_ialloc_ag_alloc() to invoke the helper for the existing
XFS_BTNUM_INO tree and XFS_BTNUM_FINO tree, if enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:53 +10:00
Brian Foster
9d43b180af xfs: update inode allocation/free transaction reservations for finobt
Create the xfs_calc_finobt_res() helper to calculate the finobt log
reservation for inode allocation and free. Update
XFS_IALLOC_SPACE_RES() to reserve blocks for the additional finobt
insertion on inode allocation. Create XFS_IFREE_SPACE_RES() to
reserve blocks for the potential finobt record insertion on inode
free (i.e., if an inode chunk was previously fully allocated).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:52 +10:00
Brian Foster
aafc3c2465 xfs: support the XFS_BTNUM_FINOBT free inode btree type
Define the AGI fields for the finobt root/level and add magic
numbers. Update the btree code to add support for the new
XFS_BTNUM_FINOBT inode btree.

The finobt root block is reserved immediately following the inobt
root block in the AG. Update XFS_PREALLOC_BLOCKS() to determine the
starting AG data block based on whether finobt support is enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:52 +10:00
Brian Foster
8e2c84df20 xfs: reserve v5 superblock read-only compat. feature bit for finobt
Reserve a v5 read-only compatibility feature bit for the finobt and
create the xfs_sb_version_hasfinobt() helper to determine whether
an fs has the feature enabled.

The finobt does not change existing on-disk structures, but must
remain consistent with the ialloc btree. Modifications from older
kernels would violate that constrant. Therefore, we restrict older
kernels to read-only mounts of finobt-enabled filesystems.

Note that this does not yet enable the ability to rw mount a finobt
fs (by setting the feature bit in the XFS_SB_FEAT_RO_COMPAT_ALL
mask).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:52 +10:00
Brian Foster
57bd3dbe40 xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers
The introduction of the free inode btree (finobt) requires that
xfs_ialloc_btree.c handle multiple trees. Refactor xfs_ialloc_btree.c
so the caller specifies the btree type on cursor initialization to
prepare for addition of the finobt.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-24 16:00:50 +10:00
Jeff Layton
cff2fce58b locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead
File-private locks have been re-christened as "open file description"
locks.  Finish the symbol name cleanup in the internal implementation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-23 16:17:03 -04:00
Christoph Hellwig
b94acd4786 xfs: add filestream allocator tracepoints
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23 07:11:52 +10:00
Christoph Hellwig
3b8d90766a xfs: remove xfs_filestream_associate
There is no good reason to create a filestream when a directory entry
is created.  Delay it until the first allocation happens to simply
the code and reduce the amount of mru cache lookups we do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23 07:11:52 +10:00
Christoph Hellwig
1919adda07 xfs: don't create a slab cache for filestream items
We only have very few of these around, and allocation isn't that
much of a hot path.  Remove the slab cache to simplify the code,
and to not waste any resources for the usual case of not having
any inodes that use the filestream allocator.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23 07:11:51 +10:00
Christoph Hellwig
2cd2ef6a30 xfs: rewrite the filestream allocator using the dentry cache
In Linux we will always be able to find a parent inode for file that are
undergoing I/O.  Use this to simply the file stream allocator by only
keeping track of parent inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-04-23 07:11:51 +10:00
Christoph Hellwig
f37211c336 xfs: remove XFS_IFILESTREAM
We never test the flag except in xfs_inode_is_filestream, but that
function already tests the on-disk flag or filesystem wide flags,
and is used to decide if we want to set XFS_IFILESTREAM in the
first place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23 07:11:51 +10:00
Christoph Hellwig
22328d712d xfs: embedd mru_elem into parent structure
There is no need to do a separate allocation for each mru element, just
embedd the structure into the parent one in the user.  Besides saving
a memory allocation and the infrastructure required for it this also
simplifies the API.

While we do major surgery on xfs_mru_cache.c also de-typedef it and
make struct mru_cache private to the implementation file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23 07:11:51 +10:00
Christoph Hellwig
ce695c6551 xfs: handle duplicate entries in xfs_mru_cache_insert
The radix tree code can detect and reject duplicate keys at insert
time.  Make xfs_mru_cache_insert handle this case so that future
changes to the filestream allocator can take advantage of this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-23 07:11:50 +10:00
Christoph Hellwig
c977eb1065 xfs: split xfs_bmap_btalloc_nullfb
Split xfs_bmap_btalloc_nullfb into one function for filestream allocations
and one for everything else that share a few helpers.  This dramatically
simplifies the control flow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-04-23 07:11:41 +10:00
Fabian Frederick
7410b3c6c5 fs/bio.c: remove nr_segs (unused function parameter)
nr_segs is no longer used in bio_alloc_map_data since c8db444820
("block: Don't save/copy bvec array anymore")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22 15:09:07 -06:00
Fabian Frederick
a6c39cb4f7 fs/bio: remove bs paramater in biovec_create_pool
bs is no longer used in biovec_create_pool since 9f060e2231 ("block:
Convert integrity to bvec_alloc_bs()")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22 15:09:05 -06:00
Fabian Frederick
d52a8f9ead fs/aio.c: Remove ctx parameter in kiocb_cancel
ctx is no longer used in kiocb_cancel since

57282d8fd7 ("aio: Kill ki_users")

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-04-22 12:27:24 -04:00
Jeff Layton
0d3f7a2dd2 locks: rename file-private locks to "open file description locks"
File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.

...and I can't even disagree. The names and command macros do suck.

We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".

The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.

This patch makes the following changes that I think are necessary before
v3.15 ships:

1) rename the command macros to their new names. These end up in the uapi
   headers and so are part of the external-facing API. It turns out that
   glibc doesn't actually use the fcntl.h uapi header, but it's hard to
   be sure that something else won't. Changing it now is safest.

2) make the the /proc/locks output display these as type "OFDLCK"

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-22 08:23:58 -04:00
Dmitry Monakhov
236f5ecb4a ext4: remove obsoleted check
BH can not be NULL at this point, ext4_read_dirblock() always return
non null value, and we already have done all necessery checks.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-04-21 14:38:14 -04:00
Theodore Ts'o
202ee5df38 ext4: add a new spinlock i_raw_lock to protect the ext4's raw inode
To avoid potential data races, use a spinlock which protects the raw
(on-disk) inode.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21 14:37:55 -04:00
Theodore Ts'o
f5ccfe1ddb ext4: fix locking for O_APPEND writes
Al Viro pointed out that locking for O_APPEND writes was problematic,
since the location of the write isn't known until after we take the
i_mutex, which impacts the ext4_unaligned_aio() and s_bitmap_maxbytes
check.

For O_APPEND always assume that the write is unaligned so call
ext4_unwritten_wait().  And to solve the second problem, take the
i_mutex earlier before we start the s_bitmap_maxbytes check.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-21 14:37:52 -04:00
Theodore Ts'o
7ed07ba8c3 ext4: factor out common code in ext4_file_write()
This shouldn't change any logic flow; just delete duplicated code.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21 14:36:30 -04:00
Theodore Ts'o
8ad2850f44 ext4: move ext4_file_dio_write() into ext4_file_write()
This commit doesn't actually change anything; it just moves code
around in preparation for some code simplification work.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21 14:26:57 -04:00
Theodore Ts'o
7608e61044 ext4: inline generic_file_aio_write() into ext4_file_write()
Copy generic_file_aio_write() into ext4_file_write().  This is part of
a patch series which allows us to simplify ext4_file_write() and
ext4_file_dio_write(), by calling __generic_file_aio_write() directly.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2014-04-21 14:26:28 -04:00
Randy Dunlap
1051a902fe fs: fix new kernel-doc warnings in fs/bio.c
Fix new kernel-doc warnings in fs/bio.c:

Warning(fs/bio.c:316): No description found for parameter 'bio'
Warning(fs/bio.c:316): No description found for parameter 'parent'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-21 10:39:00 -06:00
Lukas Czerner
556615dcbf ext4: rename uninitialized extents to unwritten
Currently in ext4 there is quite a mess when it comes to naming
unwritten extents. Sometimes we call it uninitialized and sometimes we
refer to it as unwritten.

The right name for the extent which has been allocated but does not
contain any written data is _unwritten_. Other file systems are
using this name consistently, even the buffer head state refers to it as
unwritten. We need to fix this confusion in ext4.

This commit changes every reference to an uninitialized extent (meaning
allocated but unwritten) to unwritten extent. This includes comments,
function names and variable names. It even covers abbreviation of the
word uninitialized (such as uninit) and some misspellings.

This commit does not change any of the code paths at all. This has been
confirmed by comparing md5sums of the assembly code of each object file
after all the function names were stripped from it.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-20 23:45:47 -04:00
Lukas Czerner
090f32ee4e ext4: get rid of EXT4_MAP_UNINIT flag
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the
cases where we're using dioread_nolock and we're writing into either
unallocated, or unwritten extent, because we need to make sure that
any DIO write into that inode will wait for the extent conversion.

However EXT4_MAP_UNINIT is not only entirely misleading name but also
unnecessary because we can check for EXT4_MAP_UNWRITTEN in the
dioread_nolock case instead.

This commit removes EXT4_MAP_UNINIT flag.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-20 23:44:47 -04:00
Linus Torvalds
9ac0367501 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
 "These are regression and bug fixes for ext4.

  We had a number of new features in ext4 during this merge window
  (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so
  there were many more regression and bug fixes this time around.  It
  didn't help that xfstests hadn't been fully updated to fully stress
  test COLLAPSE_RANGE until after -rc1"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits)
  ext4: disable COLLAPSE_RANGE for bigalloc
  ext4: fix COLLAPSE_RANGE failure with 1KB block size
  ext4: use EINVAL if not a regular file in ext4_collapse_range()
  ext4: enforce we are operating on a regular file in ext4_zero_range()
  ext4: fix extent merging in ext4_ext_shift_path_extents()
  ext4: discard preallocations after removing space
  ext4: no need to truncate pagecache twice in collapse range
  ext4: fix removing status extents in ext4_collapse_range()
  ext4: use filemap_write_and_wait_range() correctly in collapse range
  ext4: use truncate_pagecache() in collapse range
  ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE
  ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled
  ext4: always check ext4_ext_find_extent result
  ext4: fix error handling in ext4_ext_shift_extents
  ext4: silence sparse check warning for function ext4_trim_extent
  ext4: COLLAPSE_RANGE only works on extent-based files
  ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches
  ext4: use i_size_read in ext4_unaligned_aio()
  fs: disallow all fallocate operation on active swapfile
  fs: move falloc collapse range check into the filesystem methods
  ...
2014-04-20 20:43:47 -07:00
Namjae Jeon
0a04b24853 ext4: disable COLLAPSE_RANGE for bigalloc
Once COLLAPSE RANGE is be disable for ext4 with bigalloc feature till finding
root-cause of problem. It will be enable with fixing that regression of
xfstest(generic 075 and 091) again.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-19 16:38:21 -04:00
Namjae Jeon
a8680e0d5e ext4: fix COLLAPSE_RANGE failure with 1KB block size
When formatting with 1KB or 2KB(not aligned with PAGE SIZE) block
size, xfstests generic/075 and 091 are failing. The offset supplied to
function truncate_pagecache_range is block size aligned. In this
function start offset is re-aligned to PAGE_SIZE by rounding_up to the
next page boundary.  Due to this rounding up, old data remains in the
page cache when blocksize is less than page size and start offset is
not aligned with page size.  In case of collapse range, we need to
align start offset to page size boundary by doing a round down
operation instead of round up.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-19 16:37:31 -04:00
Eric Dumazet
404ca80eb5 coredump: fix va_list corruption
A va_list needs to be copied in case it needs to be used twice.

Thanks to Hugh for debugging this issue, leading to various panics.

Tested:

  lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern

'produce_core' is simply : main() { *(int *)0 = 1;}

  lpq84:~# ./produce_core
  Segmentation fault (core dumped)
  lpq84:~# dmesg | tail -1
  [  614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed

Notice the last argument was replaced by a NULL (we were lucky enough to
not crash, but do not try this on your production machine !)

After fix :

  lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
  lpq83:~# ./produce_core
  Segmentation fault
  lpq83:~# dmesg | tail -1
  [  740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed

Fixes: 5fe9d8ca21 ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Hugh Dickins <hughd@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-19 13:23:31 -07:00
Al Viro
22213318af fix races between __d_instantiate() and checks of dentry flags
in non-lazy walk we need to be careful about dentry switching from
negative to positive - both ->d_flags and ->d_inode are updated,
and in some places we might see only one store.  The cases where
dentry has been obtained by dcache lookup with ->i_mutex held on
parent are safe - ->d_lock and ->i_mutex provide all the barriers
we need.  However, there are several places where we run into
trouble:
	* do_last() fetches ->d_inode, then checks ->d_flags and
assumes that inode won't be NULL unless d_is_negative() is true.
Race with e.g. creat() - we might have fetched the old value of
->d_inode (still NULL) and new value of ->d_flags (already not
DCACHE_MISS_TYPE).  Lin Ming has observed and reported the resulting
oops.
	* a bunch of places checks ->d_inode for being non-NULL,
then checks ->d_flags for "is it a symlink".  Race with symlink(2)
in case if our CPU sees ->d_inode update first - we see non-NULL
there, but ->d_flags still contains DCACHE_MISS_TYPE instead of
DCACHE_SYMLINK_TYPE.  Result: false negative on "should we follow
link here?", with subsequent unpleasantness.

Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one
Reported-and-tested-by: Lin Ming <minggr@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-19 12:30:58 -04:00
Linus Torvalds
6e66d5dab5 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "A set of 5 small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cif: fix dead code
  cifs: fix error handling cifs_user_readv
  fs: cifs: remove unused variable.
  Return correct error on query of xattr on file with empty xattrs
  cifs: Wait for writebacks to complete before attempting write.
2014-04-18 17:52:39 -07:00
Linus Torvalds
60fbf2bda1 Merge tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are some driver core fixes for 3.15-rc2.  Also in here are some
  documentation updates, as well as an API removal that had to wait for
  after -rc1 due to the cleanups coming into you from multiple developer
  trees (this one and the PPC tree.)

  All have been in linux next successfully"

* tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers/base/dd.c incorrect pr_debug() parameters
  Documentation: Update stable address in Chinese and Japanese translations
  topology: Fix compilation warning when not in SMP
  Chinese: add translation of io_ordering.txt
  stable_kernel_rules: spelling/word usage
  sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
  kernfs: protect lazy kernfs_iattrs allocation with mutex
  fs: Don't return 0 from get_anon_bdev
2014-04-18 16:59:52 -07:00
Theodore Ts'o
86f1ca3889 ext4: use EINVAL if not a regular file in ext4_collapse_range()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18 11:52:11 -04:00
jon ernst
6c5e73d3a2 ext4: enforce we are operating on a regular file in ext4_zero_range()
Signed-off-by: Jon Ernst <jonernst07@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18 11:50:35 -04:00
Lukas Czerner
6dd834effc ext4: fix extent merging in ext4_ext_shift_path_extents()
There is a bug in ext4_ext_shift_path_extents() where if we actually
manage to merge a extent we would skip shifting the next extent. This
will result in in one extent in the extent tree not being properly
shifted.

This is causing failure in various xfstests tests using fsx or fsstress
with collapse range support. It will also cause file system corruption
which looks something like:

 e2fsck 1.42.9 (4-Feb-2014)
 Pass 1: Checking inodes, blocks, and sizes
 Inode 20 has out of order extents
        (invalid logical block 3, physical block 492938, len 2)
 Clear? yes
 ...

when running e2fsck.

It's also very easily reproducible just by running fsx without any
parameters. I can usually hit the problem within a minute.

Fix it by increasing ex_start only if we're not merging the extent.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2014-04-18 10:55:24 -04:00
Lukas Czerner
ef24f6c234 ext4: discard preallocations after removing space
Currently in ext4_collapse_range() and ext4_punch_hole() we're
discarding preallocation twice. Once before we attempt to do any changes
and second time after we're done with the changes.

While the second call to ext4_discard_preallocations() in
ext4_punch_hole() case is not needed, we need to discard preallocation
right after ext4_ext_remove_space() in collapse range case because in
the case we had to restart a transaction in the middle of removing space
we might have new preallocations created.

Remove unneeded ext4_discard_preallocations() ext4_punch_hole() and move
it to the better place in ext4_collapse_range()

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18 10:50:23 -04:00
Lukas Czerner
9337d5d31a ext4: no need to truncate pagecache twice in collapse range
We're already calling truncate_pagecache() before we attempt to do any
actual job so there is not need to truncate pagecache once more using
truncate_setsize() after we're finished.

Remove truncate_setsize() and replace it just with i_size_write() note
that we're holding appropriate locks.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18 10:48:25 -04:00
Lukas Czerner
2c1d23289b ext4: fix removing status extents in ext4_collapse_range()
Currently in ext4_collapse_range() when calling ext4_es_remove_extent() to
remove status extents we're passing (EXT_MAX_BLOCKS - punch_start - 1)
in order to remove all extents from start of the collapse range to the
end of the file. However this is wrong because we might miss the
possible extent covering the last block of the file.

Fix it by removing the -1.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2014-04-18 10:43:21 -04:00
Lukas Czerner
1a66c7c3be ext4: use filemap_write_and_wait_range() correctly in collapse range
Currently we're passing -1 as lend argumnet for
filemap_write_and_wait_range() which is wrong since lend is signed type
so it would cause some confusion and we might not write_and_wait for the
entire range we're expecting to write.

Fix it by using LLONG_MAX instead.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18 10:41:52 -04:00
Lukas Czerner
694c793fc1 ext4: use truncate_pagecache() in collapse range
We should be using truncate_pagecache() instead of
truncate_pagecache_range() in the collapse range because we're
truncating page cache from offset to the end of file.
truncate_pagecache() also get rid of the private COWed pages from the
range because we're going to shift the end of the file.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-18 10:21:15 -04:00
J. Bruce Fields
fc208d026b Revert "nfsd4: fix nfs4err_resource in 4.1 case"
Since we're still limiting attributes to a page, the result here is that
a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE
instead of NFS4ERR_RESOURCE.

Both error returns are wrong, and the real bug here is the arbitrary
limit on getattr results, fixed by as-yet out-of-tree patches.  But at a
minimum we can make life easier for clients by sticking to one broken
behavior in released kernels instead of two....

Trond says:

	one immediate consequence of this patch will be that NFSv4.1
	clients will now report EIO instead of EREMOTEIO if they hit the
	problem. That may make debugging a little less obvious.

	Another consequence will be that if we ever do try to add client
	side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal
	with the “handle existing buggy server” syndrome.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:46:45 +02:00
Jeff Layton
3758cf7e14 nfsd: set timeparms.to_maxval in setup_callback_client
...otherwise the logic in the timeout handling doesn't work correctly.

Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:34:31 +02:00
Jeff Layton
4991a628a7 locks: allow __break_lease to sleep even when break_time is 0
A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.

Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This causes __break_lease to spin in a tight loop and
causes soft lockups.

Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.

Cc: <stable@vger.kernel.org>
Cc: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Terry Barnaby <terry1@beam.ltd.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:34:30 +02:00
Peter Zijlstra
4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Dongsheng Yang
8698a745d8 sched, treewide: Replace hardcoded nice values with MIN_NICE/MAX_NICE
Replace various -20/+19 hardcoded nice values with MIN_NICE/MAX_NICE.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ff13819fd09b7a5dba5ab5ae797f2e7019bdfa17.1394532288.git.yangds.fnst@cn.fujitsu.com
Cc: devel@driverdev.osuosl.org
Cc: devicetree@vger.kernel.org
Cc: fcoe-devel@open-fcoe.org
Cc: linux390@de.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: nbd-general@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: openipmi-developer@lists.sourceforge.net
Cc: qla2xxx-upstream@qlogic.com
Cc: linux-arch@vger.kernel.org
[ Consolidated the patches, twiddled the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:07:24 +02:00
Abhi Das
991deec819 GFS2: quotas not being refreshed in gfs2_adjust_quota
Old values of user quota limits were being used and
could allow users to exceed their allotted quotas.
This patch refreshes the limits to the latest values
so that quotas are enforced correctly.

Resolves: rhbz#1077463
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-04-17 09:59:40 +01:00