Commit 6596528e39 ("hfsplus: ensure bio requests are not smaller than
the hardware sectors") changed the pointers used for volume header
allocations but failed to free the correct pointers in the error path
path of hfsplus_fill_super() and hfsplus_read_wrapper.
The second hunk came from a separate patch by Pavel Ivanov.
Reported-by: Pavel Ivanov <paivanof@gmail.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to get the victim pinned by dentry_unhash() prior to commit
64252c75a2 ("vfs: remove dget() from dentry_unhash()") and ->rmdir()
and ->rename() instances relied on that; most of them don't care, but
ones that used d_delete() themselves do. As the result, we are getting
rmdir() oopses on NFS now.
Just grab the reference before locking the victim and drop it explicitly
after unlocking, same as vfs_rename_other() does.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org (3.0.x)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a window in which the ioend that we call inode_dio_wake on
in xfs_end_io_direct_write is already free. Fix this by storing
the inode pointer in a local variable.
This is a fix for the regression introduced in 3.1-rc by
"fs: move inode_dio_done to the end_io handler".
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
For IPv6 local address, lockd can not callback to client for
missing scope id when binding address at inet6_bind:
324 if (addr_type & IPV6_ADDR_LINKLOCAL) {
325 if (addr_len >= sizeof(struct sockaddr_in6) &&
326 addr->sin6_scope_id) {
327 /* Override any existing binding, if another one
328 * is supplied by user.
329 */
330 sk->sk_bound_dev_if = addr->sin6_scope_id;
331 }
332
333 /* Binding to link-local address requires an interface */
334 if (!sk->sk_bound_dev_if) {
335 err = -EINVAL;
336 goto out_unlock;
337 }
Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
besides address.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: since this is server-side, use nfsd4_ prefix instead of nfs4_ prefix. ]
[ cel: implement S_ISVTX filter in bfields-normal form ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current code is sort of hackish in that it assumes a referral is always
matched to an export. When we add support for junctions that may not be the
case.
We can replace nfsd4_path() with a function that encodes the components
directly from the dentries. Since nfsd4_path is currently the only user of
the 'ex_pathname' field in struct svc_export, this has the added benefit
of allowing us to get rid of that.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
First, we shouldn't care here about the structure of the opaque part of
the stateid. Second, this hash is really dumb. (I'm not sure the
replacement is much better, though--to look at it another patch.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We want delegations to share more with open/lock stateid's, so first
we'll pull out some of the common stuff we want to share.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move most of this into helper functions. Also move the non-CONFIRM case
into caller, providing a helper function for that purpose.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Do not allow multiple mounts on same mountpoint when using -o noac
When you normally attempt to mount a share twice on the same mountpoint,
a check in do_add_mount causes it to return an error
# mount localhost:/nfsv3 /mnt
# mount localhost:/nfsv3 /mnt
mount.nfs: /mnt is already mounted or busy
However when using the option 'noac', the user is able to mount the same
share on the same mountpoint multiple times. This happens because a
share mounted with the noac option is automatically assigned the 'sync'
flag MS_SYNCHRONOUS in nfs_initialise_sb(). This flag is set after the
check for already existing superblocks is done in sget(). The check for
the mount flags in nfs_compare_mount_options() does not take into
account the 'sync' flag applied later on in the code path. This means
that when using 'noac', a new superblock structure is assigned for every
new mount of the same share and multiple shares on the same mountpoint
are allowed.
ie.
# mount -onoac localhost:/nfsv3 /mnt
can be run multiple times.
The patch checks for noac and assigns the sync flag before sget() is
called to obtain an already existing superblock structure.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Sricharan R <r.sricharan@ti.com>
* 'for-linus' of git://github.com/chrismason/linux:
Btrfs: add dummy extent if dst offset excceeds file end in
Btrfs: calc file extent num_bytes correctly in file clone
btrfs: xattr: fix attribute removal
Btrfs: fix wrong nbytes information of the inode
Btrfs: fix the file extent gap when doing direct IO
Btrfs: fix unclosed transaction handle in btrfs_cont_expand
Btrfs: fix misuse of trans block rsv
Btrfs: reset to appropriate block rsv after orphan operations
Btrfs: skip locking if searching the commit root in csum lookup
btrfs: fix warning in iput for bad-inode
Btrfs: fix an oops when deleting snapshots
You can see there's no file extent with range [0, 4096]. Check this by
btrfsck:
# btrfsck /dev/sda7
root 5 inode 258 errors 100
...
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
An attribute is not removed by 'setfattr -x attr file' and remains
visible in attr list. This makes xfstests/062 pass again.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If we write some data into the data hole of the file(no preallocation for this
hole), Btrfs will allocate some disk space, and update nbytes of the inode, but
the other element--disk_i_size needn't be updated. At this condition, we must
update inode metadata though disk_i_size is not changed(btrfs_ordered_update_i_size()
return 1).
# mkfs.btrfs /dev/sdb1
# mount /dev/sdb1 /mnt
# touch /mnt/a
# truncate -s 856002 /mnt/a
# dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc
# umount /mnt
# btrfsck /dev/sdb1
root 5 inode 257 errors 400
found 32768 bytes used err is 1
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we write some data to the place that is beyond the end of the file
in direct I/O mode, a data hole will be created. And Btrfs should insert
a file extent item that point to this hole into the fs tree. But unfortunately
Btrfs forgets doing it.
The following is a simple way to reproduce it:
# mkfs.btrfs /dev/sdc2
# mount /dev/sdc2 /test4
# touch /test4/a
# dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
# umount /test4
# btrfsck /dev/sdc2
root 5 inode 257 errors 100
Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The function - btrfs_cont_expand() forgot to close the transaction handle before
it jump out the while loop. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
At the beginning of create_pending_snapshot, trans->block_rsv is set
to pending->block_rsv and is used for snapshot things, however, when
it is done, we do not recover it as will.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
While truncating free space cache, we forget to change trans->block_rsv
back to the original one, but leave it with the orphan_block_rsv, and
then with option inode_cache enable, it leads to countless warnings of
btrfs_alloc_free_block and btrfs_orphan_commit_root:
WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
...
WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It's not enough to just search the commit root, since we could be cow'ing the
very block we need to search through, which would mean that its locked and we'll
still deadlock. So use path->skip_locking as well. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
iput() shouldn't be called for inodes in I_NEW state.
We need to mark inode as constructed first.
WARNING: at fs/inode.c:1309 iput+0x20b/0x210()
Call Trace:
[<ffffffff8103e7ba>] warn_slowpath_common+0x7a/0xb0
[<ffffffff8103e805>] warn_slowpath_null+0x15/0x20
[<ffffffff810eaf0b>] iput+0x20b/0x210
[<ffffffff811b96fb>] btrfs_iget+0x1eb/0x4a0
[<ffffffff811c3ad6>] btrfs_run_defrag_inodes+0x136/0x210
[<ffffffff811ad55f>] cleaner_kthread+0x17f/0x1a0
[<ffffffff81035b7d>] ? sub_preempt_count+0x9d/0xd0
[<ffffffff811ad3e0>] ? transaction_kthread+0x280/0x280
[<ffffffff8105af86>] kthread+0x96/0xa0
[<ffffffff814336d4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105aef0>] ? kthread_worker_fn+0x190/0x190
[<ffffffff814336d0>] ? gs_change+0xb/0xb
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
Tested-by: David Sterba <dsterba@suse.cz>
CC: Josef Bacik <josef@redhat.com>
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We can reproduce this oops via the following steps:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs
$ for ((i=0; i<3; i++)); do btrfs sub snap /mnt/btrfs /mnt/btrfs/s_$i; done
$ rm -fr /mnt/btrfs/*
$ rm -fr /mnt/btrfs/*
then we'll get
------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:2264!
[...]
Call Trace:
[<ffffffffa05578c7>] btrfs_rmdir+0xf7/0x1b0 [btrfs]
[<ffffffff81150b95>] vfs_rmdir+0xa5/0xf0
[<ffffffff81153cc3>] do_rmdir+0x123/0x140
[<ffffffff81145ac7>] ? fput+0x197/0x260
[<ffffffff810aecff>] ? audit_syscall_entry+0x1bf/0x1f0
[<ffffffff81153d0d>] sys_unlinkat+0x2d/0x40
[<ffffffff8147896b>] system_call_fastpath+0x16/0x1b
RIP [<ffffffffa054f7b9>] btrfs_orphan_add+0x179/0x1a0 [btrfs]
When it comes to btrfs_lookup_dentry, we may set a snapshot's inode->i_ino
to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
while the snapshot's location.objectid remains unchanged.
However, btrfs_ino() does not take this into account, and returns a wrong ino,
and causes the oops.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
These modes are not necessarily for OOB only. Particularly, MTD_OOB_RAW
affected operations on in-band page data as well. To clarify these
options and to emphasize that their effect is applied per-operation, we
change the primary prefix to MTD_OPS_.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Use a helper to test if a mutex is held instead of a hack with
mutex_trylock().
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>
kfree() deals gracefully with NULL pointers, so it's pointless to test for
one prior to calling it.
This removes such a test from jffs2_scan_medium().
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>
* 'for-linus' of git://neil.brown.name/md:
md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
md/raid1,10: Remove use-after-free bug in make_request.
md/raid10: unify handling of write completion.
Avoid dereferencing a 'request_queue' after last close.
On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed. The free
is done in a separate thread so it might happen a short time later.
__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.
Since commit f758eeabeb
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock. This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.
So move the called to bdev_inode_switch_bdi before the call to
->release.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Currently, there exists a race between delayed allocated writes and
the writeback when bigalloc feature is in use. The race was because we
wanted to determine what blocks in a cluster are under delayed
allocation and we were using buffer_delayed(bh) check for it. But, the
writeback codepath clears this bit without any synchronization which
resulted in a race and an ext4 warning similar to:
EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
reserved data blocks
The race existed in two places.
(1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
writeback code path.
(2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
buffer_delayed(bh) is set.
To fix (1), this patch introduces a new buffer_head state bit -
BH_Da_Mapped. This bit is set under the protection of
EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
allocated blocks during the writeout time. We can now reliably check
for this bit inside ext4_find_delalloc_range() to determine whether
the reservation for the blocks have already been claimed or not.
To fix (2), it was necessary to set buffer_delay(bh) under the
protection of i_data_sem. So, I extracted the very beginning of
ext4_map_blocks into a new function - ext4_da_map_blocks() - and
performed the required setting of bh_delay bit and the quota
reservation under the protection of i_data_sem. These two fixes makes
the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
thus removing the race.
Tested: I was able to reproduce the problem by running 'dd' and
'fsync' in parallel. Also, xfstests sometimes used to reproduce this
race. After the fix both my test and xfstests were successful and no
race (warning message) was observed.
Google-Bug-Id: 4997027
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
ext4/inode.c.
Tested: Built and ran the kernel and verified that these tracepoints work.
Also ran xfstests.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Rename the function so it is more clear what is going on. Also rename
the various variables so it's clearer what's happening.
Also fix a missing blocks to cluster conversion when reading the
number of reserved blocks for root.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This function really claims a number of free clusters, not blocks, so
rename it so it's clearer what's going on.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This function really returns the number of clusters after initializing
an uninitalized block bitmap has been initialized.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This function really counts the free clusters reported in the block
group descriptors, so rename it to reduce confusion.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>