Commit Graph

37611 Commits

Author SHA1 Message Date
Jeff Layton
6aa23d76a7 nfs: check if gssd is running before attempting to use krb5i auth in SETCLIENTID call
Currently, the client will attempt to use krb5i in the SETCLIENTID call
even if rpc.gssd isn't running. When that fails, it'll then fall back to
RPC_AUTH_UNIX. This introduced a delay when mounting if rpc.gssd isn't
running, and causes warning messages to pop up in the ring buffer.

Check to see if rpc.gssd is running before even attempting to use krb5i
auth, and just silently skip trying to do so if it isn't. In the event
that the admin is actually trying to mount with krb5*, it will still
fail at a later stage of the mount attempt.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-06 13:06:32 -05:00
Trond Myklebust
c7848f69ec NFSv4: OPEN must handle the NFS4ERR_IO return code correctly
decode_op_hdr() cannot distinguish between an XDR decoding error and
the perfectly valid errorcode NFS4ERR_IO. This is normally not a
problem, but for the particular case of OPEN, we need to be able
to increment the NFSv4 open sequence id when the server returns
a valid response.

Reported-by: J Bruce Fields <bfields@fieldses.org>
Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
2013-12-06 13:06:30 -05:00
Linus Torvalds
c537aba00e Merge git://git.kvack.org/~bcrl/aio-next
Pull aio fix from Benjamin LaHaise:
 "AIO fix from Gu Zheng that fixes a GPF that Dave Jones uncovered with
  trinity"

* git://git.kvack.org/~bcrl/aio-next:
  aio: clean up aio ring in the fail path
2013-12-06 08:32:59 -08:00
Gu Zheng
d1b9432712 aio: clean up aio ring in the fail path
Clean up the aio ring file in the fail path of aio_setup_ring
and ioctx_alloc. And maybe it can fix the GPF issue reported by
Dave Jones:
https://lkml.org/lkml/2013/11/25/898

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-12-06 10:22:55 -05:00
Linus Torvalds
002acf1fc1 Merge tag 'pm-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:

 - cpufreq regression fix from Bjørn Mork restoring the pre-3.12
   behavior of the framework during system suspend/hibernation to avoid
   garbage sysfs files from being left behind in case of a suspend error

 - PNP regression fix to restore the correct states of devices after
   resume from hibernation broken in 3.12.  From Dmitry Torokhov.

 - cpuidle fix to prevent cpuidle device unregistration from crashing
   due to a NULL pointer dereference if cpuidle has been disabled from
   the kernel command line.  From Konrad Rzeszutek Wilk.

 - intel_idle fix for the C6 state definition on Intel Avoton/Rangeley
   processors from Arne Bockholdt.

 - Power capping framework fix to make the energy_uj sysfs attribute
   work in accordance with the documentation.  From Srinivas Pandruvada.

 - epoll fix to make it ignore the EPOLLWAKEUP flag if the kernel has
   been compiled with CONFIG_PM_SLEEP unset (in which case that flag
   should not have any effect).  From Amit Pundir.

 - cpufreq fix to prevent governor sysfs files from being lost over
   system suspend/resume in some (arguably unusual) situations.  From
   Viresh Kumar.

* tag 'pm-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PowerCap: Fix mode for energy counter
  PNP: fix restoring devices after hibernation
  cpuidle: Check for dev before deregistering it.
  epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled
  cpufreq: fix garbage kobjects on errors during suspend/resume
  cpufreq: suspend governors on system suspend/hibernate
  intel_idle: Fixed C6 state on Avoton/Rangeley processors
2013-12-05 18:26:40 -08:00
Linus Torvalds
5ee540613d Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A small collection of fixes for the current series. It contains:

   - A fix for a use-after-free of a request in blk-mq.  From Ming Lei

   - A fix for a blk-mq bug that could attempt to dereference a NULL rq
     if allocation failed

   - Two xen-blkfront small fixes

   - Cleanup of submit_bio_wait() type uses in the kernel, unifying
     that.  From Kent

   - A fix for 32-bit blkg_rwstat reading.  I apologize for this one
     looking mangled in the shortlog, it's entirely my fault for missing
     an empty line between the description and body of the text"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: fix use-after-free of request
  blk-mq: fix dereference of rq->mq_ctx if allocation fails
  block: xen-blkfront: Fix possible NULL ptr dereference
  xen-blkfront: Silence pfn maybe-uninitialized warning
  block: submit_bio_wait() conversions
  Update of blkg_stat and blkg_rwstat may happen in bh context
2013-12-05 15:33:27 -08:00
Mark Tinguely
ef701600fd xfs: fix memory leak in xfs_dir2_node_removename
Fix the leak of kernel memory in xfs_dir2_node_removename()
when xfs_dir2_leafn_remove() returns an error code.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:51:19 -06:00
Mark Tinguely
2a84108fe2 xfs: free the list of recovery items on error
Recovery builds a list of items on the transaction's
r_itemq head. Normally these items are committed and freed.
But in the event of a recovery error, these allocations
are leaked.

If the error occurs during item reordering, then reconstruct
the r_itemq list before deleting the list to avoid leaking
the entries that were on one of the temporary lists.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:50:47 -06:00
Christoph Hellwig
dff6efc326 fs: fix iversion handling
Currently notify_change directly updates i_version for size updates,
which not only is counter to how all other fields are updated through
struct iattr, but also breaks XFS, which need inode updates to happen
under its own lock, and synchronized to the structure that gets written
to the log.

Remove the update in the common code, and it to btrfs and ext4,
XFS already does a proper updaste internally and currently gets a
double update with the existing code.

IMHO this is 3.13 and -stable material and should go in through the XFS
tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:36:21 -06:00
Dave Chinner
b7d961b35b xfs: growfs overruns AGFL buffer on V4 filesystems
This loop in xfs_growfs_data_private() is incorrect for V4
superblocks filesystems:

		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);

For V4 filesystems, we don't have a agfl header structure, and so
XFS_AGFL_SIZE() returns an entire sector's worth of entries, which
we then index from an offset into the sector. Hence: buffer overrun.

This problem was introduced in 3.10 by commit 77c95bba ("xfs: add
CRC checks to the AGFL") which changed the AGFL structure but failed
to update the growfs code to handle the different structures.

Fix it by using the correct offset into the buffer for both V4 and
V5 filesystems.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:19:51 -06:00
Linus Torvalds
29be6345bb Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 - Stable fix for a NFSv4.1 delegation and state recovery deadlock
 - Stable fix for a loop on irrecoverable errors when returning
   delegations
 - Fix a 3-way deadlock between layoutreturn, open, and state recovery
 - Update the MAINTAINERS file with contact information for Trond
   Myklebust
 - Close needs to handle NFS4ERR_ADMIN_REVOKED
 - Enabling v4.2 should not recompile nfsd and lockd
 - Fix a couple of compile warnings

* tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix do_div() warning by instead using sector_div()
  MAINTAINERS: Update contact information for Trond Myklebust
  NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
  SUNRPC: do not fail gss proc NULL calls with EACCES
  NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
  NFSv4: Update list of irrecoverable errors on DELEGRETURN
  NFSv4 wait on recovery for async session errors
  NFS: Fix a warning in nfs_setsecurity
  NFS: Enabling v4.2 should not recompile nfsd and lockd
2013-12-05 13:05:48 -08:00
Jie Liu
f9fd013561 xfs: don't perform discard if the given range length is less than block size
For discard operation, we should return EINVAL if the given range length
is less than a block size, otherwise it will go through the file system
to discard data blocks as the end range might be evaluated to -1, e.g,
# fstrim -v -o 0 -l 100 /xfs7
/xfs7: 9811378176 bytes were trimmed

This issue can be triggered via xfstests/generic/288.

Also, it seems to get the request queue pointer via bdev_get_queue()
instead of the hard code pointer dereference is not a bad thing.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 15:42:52 -06:00
Christoph Hellwig
10f73d27c8 xfs: fix the comment explaining xfs_trans_dqlockedjoin
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 14:26:57 -06:00
Dan Carpenter
071c529eb6 xfs: underflow bug in xfs_attrlist_by_handle()
If we allocate less than sizeof(struct attrlist) then we end up
corrupting memory or doing a ZERO_PTR_SIZE dereference.

This can only be triggered with CAP_SYS_ADMIN.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 14:23:46 -06:00
Christoph Hellwig
f230077845 xfs: remove unused FI_ flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com.>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 14:11:05 -06:00
Eric Sandeen
3fefdeee92 xfs: simplify xfs_setsize_buftarg callchain; remove unused arg
The "verbose" argument to xfs_setsize_buftarg_flags() has been
unused since:

ffe37436 xfs: stop using the page cache to back the buffer cache

Remove it, and fold the function into xfs_setsize_buftarg()
now that there's no need for different types of callers.

Fix inconsistent comment spacing while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 13:53:34 -06:00
Helge Deller
3873d064b8 nfs: fix do_div() warning by instead using sector_div()
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
shown below.  Fix this warning by instead using sector_div() which is provided
by the kernel.h header file.

fs/nfs/blocklayout/extents.c: In function ‘normalize’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-04 12:57:37 -05:00
Trond Myklebust
f22e5edd22 NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
Andy Adamson reports:

The state manager is recovering expired state and recovery OPENs are being
processed. If kswapd is pruning inodes at the same time, a deadlock can occur
when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
resultant layoutreturn gets an error that the state mangager is to handle,
causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.

At the same time an open is waiting for the inode deletion to complete in
__wait_on_freeing_inode.

If the open is either the open called by the state manager, or an open from
the same open owner that is holding the NFSv4 sequence id which causes the
OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
then the state is deadlocked with kswapd.

The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
We already know that layouts are dropped on all server reboots, and that
it has to be coded to deal with the "forgetful client model" that doesn't
send layoutreturns.

Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
2013-12-04 12:32:19 -05:00
Linus Torvalds
278717909d Merge tag 'squashfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next
Pull squashfs bugfix from Phillip Lougher:
 "Just a single bug fix to the new "directly decompress into the page
  cache" code"

* tag 'squashfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
  Squashfs: fix failure to unlock pages on decompress error
2013-12-04 08:54:00 -08:00
Tejun Heo
2322392b02 kernfs: implement "trusted.*" xattr support
kernfs inherited "security.*" xattr support from sysfs.  This patch
extends xattr support to "trusted.*" using simple_xattr_*().  As
trusted xattrs are restricted to CAP_SYS_ADMIN, simple_xattr_*() which
uses kernel memory for storage shouldn't be problematic.

Note that the existing "security.*" support doesn't implement
get/remove/list and the this patch only implements those ops for
"trusted.*".  We probably want to extend those ops to include support
for "security.*".

This patch will allow using kernfs from cgroup which requires
"trusted.*" xattr support.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 07:34:45 -08:00
Tejun Heo
9a8049affd kernfs: update sysfs_init_inode_attrs()
sysfs_init_inode_attrs() is a bit clumsy to use requiring the caller
to check whether @sd->s_iattr is already set or not.  Rename it to
sysfs_inode_attrs(), update it to check whether @sd->s_iattr is
already initialized before trying to initialize it and return
@sd->s_iattr.  This simplifies the callers.

While at it,

* Rename struct sysfs_inode_attrs pointer variables to "attrs".  As
  kernfs no longer deals with "struct attribute", this isn't confusing
  and makes it easier to distinguish from struct iattr pointers.

* A new field will be added to sysfs_inode_attrs.  Reindent in
  preparation.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 07:34:45 -08:00
Jan Kara
301d4c9a28 jbd: Revise KERN_EMERG error messages
Some of KERN_EMERG printk messages do not really deserve this log level
and the one in log_wait_commit() is even rather useless (the journal has
been previously aborted and *that* is where we should have been
complaining). So make some messages just KERN_ERR and remove the useless
message.

Signed-off-by: Jan Kara <jack@suse.cz>
2013-12-04 12:27:46 +01:00
Jan Kara
df4e7ac0bb ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
ext2_quota_write() doesn't properly setup bh it passes to
ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in
ext2_get_blocks() (or we could actually ask for mapping arbitrary number
of blocks depending on whatever value was on stack).

Fix ext2_quota_write() to properly fill in number of blocks to map.

CC: stable@vger.kernel.org # >= 2.6.12
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-12-04 12:26:51 +01:00
Eryu Guan
5946d08937 ext4: check for overlapping extents in ext4_valid_extent_entries()
A corrupted ext4 may have out of order leaf extents, i.e.

extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
             ^^^^ overlap with previous extent

Reading such extent could hit BUG_ON() in ext4_es_cache_extent().

	BUG_ON(end < lblk);

The problem is that __read_extent_tree_block() tries to cache holes as
well but assumes 'lblk' is greater than 'prev' and passes underflowed
length to ext4_es_cache_extent(). Fix it by checking for overlapping
extents in ext4_valid_extent_entries().

I hit this when fuzz testing ext4, and am able to reproduce it by
modifying the on-disk extent by hand.

Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
make sure the value is not overflow.

Ran xfstests on patched ext4 and no regression.

Cc: Lukáš Czerner <lczerner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-12-03 21:22:21 -05:00
Christoph Lameter
ca6673b02e block: Replace __this_cpu_ptr with raw_cpu_ptr
__this_cpu_ptr is being phased out.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-03 19:19:41 -07:00
Junho Ryu
4e8d213980 ext4: fix use-after-free in ext4_mb_new_blocks
ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
While ext4_mb_use_preallocated checks pa->pa_deleted first and then
increments pa->count later, ext4_mb_put_pa decrements pa->pa_count
before holding pa->pa_lock and then sets pa->pa_deleted.

* Free sequence
ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
ext4_mb_put_pa (2):		lock pa->pa_lock
ext4_mb_put_pa (3):			check pa->pa_deleted
ext4_mb_put_pa (4):			set pa->pa_deleted=1
ext4_mb_put_pa (5):		unlock pa->pa_lock
ext4_mb_put_pa (6):		remove pa from a list
ext4_mb_pa_callback:		free pa

* Use sequence
ext4_mb_use_preallocated (1):	iterate over preallocation
ext4_mb_use_preallocated (2):	lock pa->pa_lock
ext4_mb_use_preallocated (3):		check pa->pa_deleted
ext4_mb_use_preallocated (4):		increase pa->pa_count
ext4_mb_use_preallocated (5):	unlock pa->pa_lock
ext4_mb_release_context:	access pa

* Use-after-free sequence
[initial status]		<pa->pa_deleted = 0, pa_count = 1>
ext4_mb_use_preallocated (1):	iterate over preallocation
ext4_mb_use_preallocated (2):	lock pa->pa_lock
ext4_mb_use_preallocated (3):		check pa->pa_deleted
ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
[pa_count decremented]		<pa->pa_deleted = 0, pa_count = 0>
ext4_mb_use_preallocated (4):		increase pa->pa_count
[pa_count incremented]		<pa->pa_deleted = 0, pa_count = 1>
ext4_mb_use_preallocated (5):	unlock pa->pa_lock
ext4_mb_put_pa (2):		lock pa->pa_lock
ext4_mb_put_pa (3):			check pa->pa_deleted
ext4_mb_put_pa (4):			set pa->pa_deleted=1
[race condition!]		<pa->pa_deleted = 1, pa_count = 1>
ext4_mb_put_pa (5):		unlock pa->pa_lock
ext4_mb_put_pa (6):		remove pa from a list
ext4_mb_pa_callback:		free pa
ext4_mb_release_context:	access pa

AddressSanitizer has detected use-after-free in ext4_mb_new_blocks
Bug report: http://goo.gl/rG1On3

Signed-off-by: Junho Ryu <jayr@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-12-03 18:10:28 -05:00
Kent Overstreet
bc1e79acc1 block: fixup for generic bio chaining
btrfs bits got lost in the rebase

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-03 14:30:00 -07:00
Amit Pundir
95f19f658c epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled
Drop EPOLLWAKEUP from epoll events mask if CONFIG_PM_SLEEP is disabled.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-03 15:35:52 +01:00
Linus Torvalds
b0d8d22921 vfs: fix subtle use-after-free of pipe_inode_info
The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:

        spin_lock(&inode->i_lock);
        if (!--pipe->files) {
                inode->i_pipe = NULL;
                kill = 1;
        }
        spin_unlock(&inode->i_lock);
        __pipe_unlock(pipe);
        if (kill)
                free_pipe_info(pipe);

where the final freeing is done last.

HOWEVER.  The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".

This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.

Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.

Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early.  And since there were two users of this
pattern, create a helper function for it.

Introduced commit ba5bb14733 ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").

Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Ian Applegate <ia@cloudflare.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org   # v3.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-02 09:44:51 -08:00
Theodore Ts'o
ae1495b12d ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails
While it's true that errors can only happen if there is a bug in
jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
the kernel or remount the file system read-only in order to avoid
further data loss.  The ext4_journal_abort_handle() function doesn't
do any of this, and while it's likely that this call (since it doesn't
adjust refcounts) will likely result in the file system eventually
deadlocking since the current transaction will never be able to close,
it's much cleaner to call let ext4's error handling system deal with
this situation.

There's a separate bug here which is that if certain jbd2 errors
errors occur and file system is mounted errors=continue, the file
system will probably eventually end grind to a halt as described
above.  But things have been this way in a long time, and usually when
we have these sorts of errors it's pretty much a disaster --- and
that's why the jbd2 layer aggressively retries memory allocations,
which is the most likely cause of these jbd2 errors.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
2013-12-02 09:31:36 -05:00
Tejun Heo
bfc5c17337 sysfs, kernfs: remove cross inclusions of internal headers
fs/kernfs/kernfs-internal.h needed to include fs/sysfs/sysfs.h because
part of kernfs core implementation was living in sysfs.

fs/sysfs/sysfs.h needed to include fs/kernfs/kernfs-internal.h because
include/linux/kernfs.h didn't expose enough interface.

The separation is complete and neither is true anymore.  Remove the
cross inclusion and make sysfs a proper user of kernfs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:54:50 -08:00
Tejun Heo
ac9bba0310 sysfs, kernfs: implement kernfs_ns_enabled()
fs/sysfs/symlink.c::sysfs_delete_link() tests @sd->s_flags for
SYSFS_FLAG_NS.  Let's add kernfs_ns_enabled() so that sysfs doesn't
have to test sysfs_dirent flag directly.  This makes things tidier for
kernfs proper too.

This is purely cosmetic.

v2: To avoid possible NULL deref, use noop dummy implementation which
    always returns false when !CONFIG_SYSFS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:41:28 -08:00
Tejun Heo
cf9e5a73aa sysfs, kernfs: make sysfs_dirent definition public
sysfs_dirent includes some information which should be available to
kernfs users - the type, flags, name and parent pointer.  This patch
moves sysfs_dirent definition from kernfs/kernfs-internal.h to
include/linux/kernfs.h so that kernfs users can access them.

The type part of flags is exported as enum kernfs_node_type, the flags
kernfs_node_flag, sysfs_type() and kernfs_enable_ns() are moved to
include/linux/kernfs.h and the former is updated to return the enum
type.  sysfs_dirent->s_parent and ->s_name are marked explicitly as
public.

This patch doesn't introduce any functional changes.

v2: Flags exported too and kernfs_enable_ns() definition moved.

v3: While moving kernfs_enable_ns() to include/linux/kernfs.h, v1 and
    v2 put the definition outside CONFIG_SYSFS replacing the dummy
    implementation with the actual implementation too.  Unfortunately,
    this can lead to oops when !CONFIG_SYSFS because
    kernfs_enable_ns() may be called on a NULL @sd and now tries to
    dereference @sd instead of not doing anything.  This issue was
    reported by Yuanhan Liu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:21:01 -08:00
Tejun Heo
fa736a951e sysfs, kernfs: move mount core code to fs/kernfs/mount.c
Move core mount code to fs/kernfs/mount.c.  The respective
declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:16:08 -08:00
Tejun Heo
4b93dc9b1c sysfs, kernfs: prepare mount path for kernfs
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly.  This patch
rearranges mount path so that the kernfs and sysfs parts are separate.

* As sysfs_super_info won't be visible outside kernfs proper,
  kernfs_super_ns() is added to allow kernfs users to access a
  super_block's namespace tag.

* Generic mount operation is separated out into kernfs_mount_ns().
  sysfs_mount() now just performs sysfs-specific permission check,
  acquires namespace tag, and invokes kernfs_mount_ns().

* Generic superblock release is separated out into kernfs_kill_sb()
  which can be used directly as file_system_type->kill_sb().  As sysfs
  needs to put the namespace tag, sysfs_kill_sb() wraps
  kernfs_kill_sb() with ns tag put.

* sysfs_dir_cachep init and sysfs_inode_init() are separated out into
  kernfs_init().  kernfs_init() uses only small amount of memory and
  trying to handle and propagate kernfs_init() failure doesn't make
  much sense.  Use SLAB_PANIC for sysfs_dir_cachep and make
  sysfs_inode_init() panic on failure.

  After this change, kernfs_init() should be called before
  sysfs_init(), fs/namespace.c::mnt_init() modified accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:16:08 -08:00
Tejun Heo
df394fb56c sysfs, kernfs: make super_blocks bind to different kernfs_roots
kernfs is being updated to allow multiple sysfs_dirent hierarchies so
that it can also be used by other users.  Currently, sysfs
super_blocks are always attached to one kernfs_root - sysfs_root - and
distinguished only by their namespace tags.

This patch adds sysfs_super_info->root and update
sysfs_fill/test_super() so that super_blocks are identified by the
combination of both the associated kernfs_root and namespace tag.
This allows mounting different kernfs hierarchies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:10:48 -08:00
Tejun Heo
bc755553df sysfs, kernfs: make inode number ida per kernfs_root
kernfs is being updated to allow multiple sysfs_dirent hierarchies so
that it can also be used by other users.  Currently, inode number is
allocated using a global ida, sysfs_ino_ida; however, inos for
different hierarchies should be handled separately.

This patch makes ino allocation per kernfs_root.  sysfs_ino_ida is
replaced by kernfs_root->ino_ida and sysfs_new_dirent() is updated to
take @root and allocate ino from it.  ida_simple_get/remove() are used
instead of sysfs_ino_lock and sysfs_alloc/free_ino().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:10:48 -08:00
Tejun Heo
ba7443bc65 sysfs, kernfs: implement kernfs_create/destroy_root()
There currently is single kernfs hierarchy in the whole system which
is used for sysfs.  kernfs needs to support multiple hierarchies to
allow other users.  This patch introduces struct kernfs_root which
serves as the root of each kernfs hierarchy and implements
kernfs_create/destroy_root().

* Each kernfs_root is associated with a root sd (sysfs_dentry).  The
  root is freed when the root sd is released and kernfs_destory_root()
  simply invokes kernfs_remove() on the root sd.  sysfs_remove_one()
  is updated to handle release of the root sd.  Note that ps_iattr
  update in sysfs_remove_one() is trivially updated for readability.

* Root sd's are now dynamically allocated using sysfs_new_dirent().
  Update sysfs_alloc_ino() so that it gives out ino from 1 so that the
  root sd still gets ino 1.

* While kernfs currently only points to the root sd, it'll soon grow
  fields which are specific to each hierarchy.  As determining a given
  sd's root will be necessary, sd->s_dir.root is added.  This backlink
  fits better as a separate field in sd; however, sd->s_dir is inside
  union with space to spare, so use it to save space and provide
  kernfs_root() accessor to determine the root sd.

* As hierarchies may be destroyed now, each mount needs to hold onto
  the hierarchy it's attached to.  Update sysfs_fill_super() and
  sysfs_kill_sb() so that they get and put the kernfs_root
  respectively.

* sysfs_root is replaced with kernfs_root which is dynamically created
  by invoking kernfs_create_root() from sysfs_init().

This patch doesn't introduce any visible behavior changes.

v2: kernfs_create_root() forgot to set @sd->priv.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:10:48 -08:00
Tejun Heo
061447a496 sysfs, kernfs: introduce sysfs_root_sd
Currently, it's assumed that there's a single kernfs hierarchy in the
system anchored at sysfs_root which is defined as a global struct.  To
allow other users of kernfs, this will be made dynamic.  Introduce a
new global variable sysfs_root_sd which points to &sysfs_root and
convert all &sysfs_root users.

This patch doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:09:27 -08:00
Tejun Heo
9e30cc9595 sysfs, kernfs: no need to kern_mount() sysfs from sysfs_init()
It has been very long since sysfs depended on vfs to keep track of
internal states and whether sysfs is mounted or not doesn't make any
difference to sysfs's internal operation.

In addition to init and filesystem type registration, sysfs_init()
invokes kern_mount() to create in-kernel mount of sysfs.  This
internal mounting doesn't server any purpose anymore.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:09:27 -08:00
Tejun Heo
51a35e9fd0 sysfs, kernfs: make sysfs_super_info->ns const
Add const qualifier to sysfs_super_info->ns so that it's consistent
with other namespace tag usages in sysfs.  Because kobject doesn't use
const qualifier for namespace tags, this ends up requiring an explicit
cast to drop const qualifier in free_sysfs_super_info().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:09:27 -08:00
Tejun Heo
ccc532dc12 sysfs, kernfs: drop unused params from sysfs_fill_super()
sysfs_fill_super() takes three params - @sb, @data and @silent - but
uses only @sb.  Drop the latter two.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
2072f1afdd sysfs, kernfs: move symlink core code to fs/kernfs/symlink.c
Move core symlink code to fs/kernfs/symlink.c.  fs/sysfs/symlink.c now
only contains sysfs wrappers around kernfs interfaces.  The respective
declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
414985ae23 sysfs, kernfs: move file core code to fs/kernfs/file.c
Move core file code to fs/kernfs/file.c.  fs/sysfs/file.c now contains
sysfs kernfs_ops callbacks, sysfs wrappers around kernfs interfaces,
and sysfs_schedule_callback().  The respective declarations in
fs/sysfs/sysfs.h are moved to fs/kernfs/kernfs-internal.h.

This is pure relocation.

v2: Refreshed on top of the v2 of "sysfs, kernfs: prepare read path
    for kernfs".

v3: Refreshed on top of the v3 of "sysfs, kernfs: prepare read path
    for kernfs".

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
fd7b9f7b97 sysfs, kernfs: move dir core code to fs/kernfs/dir.c
Move core dir code to fs/kernfs/dir.c.  fs/sysfs/dir.c now only
contains sysfs_warn_dup() and sysfs wrappers around kernfs interfaces.
The respective declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.

This is pure relocation.

v2: sysfs_symlink_target_lock was mistakenly relocated to kernfs.  It
    should remain with sysfs.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 18:08:39 -08:00
Tejun Heo
ffed24e228 sysfs, kernfs: move inode code to fs/kernfs/inode.c
There's nothing sysfs-specific in fs/sysfs/inode.c.  Move everything
in it to fs/kernfs/inode.c.  The respective declarations in
fs/sysfs/sysfs.h are moved to fs/kernfs/kernfs-internal.h.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:55:10 -08:00
Tejun Heo
ae6621b071 sysfs, kernfs: move internal decls to fs/kernfs/kernfs-internal.h
Move data structure, constant and basic accessor declarations from
fs/sysfs/sysfs.h to fs/kernfs/kernfs-internal.h.  The two files
currently include each other.  Once kernfs / sysfs separation is
complete, the cross inclusions will be removed.  Inclusion protectors
are added to fs/sysfs/sysfs.h to allow cross-inclusion.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:55:10 -08:00
Tejun Heo
ccf73cf336 sysfs, kernfs: introduce kernfs[_find_and]_get() and kernfs_put()
Introduce kernfs interface for finding, getting and putting
sysfs_dirents.

* sysfs_find_dirent() is renamed to kernfs_find_ns() and lockdep
  assertion for sysfs_mutex is added.

* sysfs_get_dirent_ns() is renamed to kernfs_find_and_get().

* Macro inline dancing around __sysfs_get/put() are removed and
  kernfs_get/put() are made proper functions implemented in
  fs/sysfs/dir.c.

While the conversions are mostly equivalent, there's one difference -
kernfs_get() doesn't return the input param as its return value.  This
change is intentional.  While passing through the input increases
writability in some areas, it is unnecessary and has been shown to
cause confusion regarding how the last ref is handled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:55:10 -08:00
Tejun Heo
517e64f578 sysfs, kernfs: revamp sysfs_dirent active_ref lockdep annotation
Currently, sysfs_dirent active_ref lockdep annotation uses
attribute->[s]key as the lockdep key, which forces
kernfs_create_file_ns() to assume that sysfs_dirent->priv is pointing
to a struct attribute which may not be true for non-sysfs users.  This
patch restructures the lockdep annotation such that

* kernfs_ops contains lockdep_key which is used by default for files
  created kernfs_create_file_ns().

* kernfs_create_file_ns_key() is introduced which takes an extra @key
  argument.  The created file will use the specified key for
  active_ref lockdep annotation.  If NULL is specified, lockdep for
  the file is disabled.

* sysfs_add_file_mode_ns() is updated to use
  kernfs_create_file_ns_key() with the appropriate key from the
  attribute or NULL if ignore_lockdep is set.

This makes the lockdep annotation properly contained in kernfs while
allowing sysfs to cleanly keep its current behavior.  This patch
doesn't introduce any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:15 -08:00
Tejun Heo
2b25a62901 sysfs, kernfs: reorganize SYSFS_* constants
We want to add one more SYSFS_FLAG_* but we can't use the next higher
bit, 0x10000, as the flag field is 16bits wide.  The flags are
currently arranged weirdly - 8 bits are set aside for the type flags
when there are only three three used, the first flag starts at 0x1000
instead of 0x0100 and flag literals have 5 digits (20 bits) when only
4 digits can be used.

Rearrange them so that type bits are only the lowest four, flags start
at 0x0010 and similar flags are grouped.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:48:14 -08:00