This would fix the following failure during GC:
nilfs_cpfile_delete_checkpoints: cannot delete block
NILFS: GC failed during preparation: cannot delete checkpoints: err=-2
The problem was caused by a break in state consistency between page
cache and btree; the above block was removed from the btree but the
page buffering the block was remaining in the page cache in dirty
state.
This resolves the inconsistency by ensuring to clear dirty state of
the page buffering the deleted block.
Reported-by: David Arendt <admin@prnet.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Put generic_show_options read access to s_options under rcu_read_lock,
split save_mount_options() into "we are setting it the first time"
(uses in foo_fill_super()) and "we are relacing and freeing the old one",
synchronize_rcu() before kfree() in the latter.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Code Quality According To Mingo(tm) has been vastly improved,
no code has been damaged^Wchanged^Wdamaged.
[commit message rewritten -- AV]
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
romfs_dev_read() may return -EIO, but ret is unsigned, so the errorpath
isn't taken.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix ordering of LRU when moving referenced dentries to the head of the list
(they should go to the head of the list in the same order as they were found
from the tail, rather than reverse order).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ocfs2 was hand-calling vfs_follow_link(), but there's no point to that.
Let's use page_follow_link_light() and nd_set_link().
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Does equivalent of up_write(&s->s_umount); deactivate_super(s);
However, it does not does not unlock it until it's all over.
As the result, it's safe to use to dispose of new superblock on ->get_sb()
failure exits - nobody will see the sucker until it's all over.
Equivalent using up_write/deactivate_super is safe for that purpose
if superblock is either safe to use or has NULL ->s_root when we unlock.
Normally filesystems take the required precautions, but
a) we do have bugs in that area in some of them.
b) up_write/deactivate_super sequence is extremely common,
so the helper makes sense anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With Al Viro's patch to move privroot lookup to fs mount, there's no need
to have special code to hide the privroot in reiserfs_lookup.
I've also cleaned up the privroot hiding in reiserfs_readdir_dentry and
removed the last user of reiserfs_xattrs().
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The security.* xattrs are ignored for xattr files, so don't create them.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The xattr_root caching was broken from my previous patch set. It wouldn't
cause corruption, but could cause decreased performance due to allocating
a larger chunk of the journal (~ 27 blocks) than it would actually use.
This patch loads the xattr root dentry at xattr initialization and creates
it on-demand. Since we're using the cached dentry, there's no point
in keeping lookup_or_create_dir around, so that's removed.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... even if it's a negative dentry. That way we can set ->d_op on
root before anyone could race with us. Simplify d_compare(), while
we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2.6.30-rc3 introduced some sanity checks in the VFS code to avoid NFS
bugs by ensuring that lookup_one_len is always called under i_mutex.
This patch expands the i_mutex locking to enclose lookup_one_len. This was
always required, but not not enforced in the reiserfs code since it
does locking around the xattr interactions with the xattr_sem.
This is obvious enough, and it survived an overnight 50 thread ACL test.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Depending on the ordering of events as we go around the
glock shrinker loop, it is possible to drop the ref count
of a glock incorrectly. It doesn't happen very often. This
patch corrects the got_ref variable, fixing the problem.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes the following circular locking dependency problem:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.30-rc3 #5
-------------------------------------------------------
segctord/3895 is trying to acquire lock:
(&nilfs->ns_writer_mutex){+.+...}, at: [<d0d02172>]
nilfs_mdt_get_block+0x89/0x20f [nilfs2]
but task is already holding lock:
(&bmap->b_sem){++++..}, at: [<d0d02d99>]
nilfs_bmap_propagate+0x14/0x2e [nilfs2]
which lock already depends on the new lock.
The bugfix is done by replacing call sites of nilfs_get_writer() which
are never called from read-only context with direct dereferencing of
pointer to a writable FS-instance.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Some function calls in nilfs_prepare_segment_for_recovery() may fail
because they can create blocks on meta data files without configuring
a writable FS-instance. Concretely, nilfs_mdt_create_block() routine
of meta data files will fail in that case.
This fixes the problem by temporarily attaching a writable FS-instace
during the function is called.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Regreesion from commit ef8f7fc, which rearranged the code in
xfs_swap_extents() leading to double unlock of xfs inode ilock.
That resulted in xfs_fsr deadlocking itself on platforms, which
don't handle double unlock of rw_semaphore nicely. It caused the
count go negative, which represents the write holder, without
really having one. ia64 is one of the platforms where deadlock
was easily reproduced and the fix was tested.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (32 commits)
[CIFS] Fix double list addition in cifs posix open code
[CIFS] Allow raw ntlmssp code to be enabled with sec=ntlmssp
[CIFS] Fix SMB uid in NTLMSSP authenticate request
[CIFS] NTLMSSP reenabled after move from connect.c to sess.c
[CIFS] Remove sparse warning
[CIFS] remove checkpatch warning
[CIFS] Fix final user of old string conversion code
[CIFS] remove cifs_strfromUCS_le
[CIFS] NTLMSSP support moving into new file, old dead code removed
[CIFS] Fix endian conversion of vcnum field
[CIFS] Remove trailing whitespace
[CIFS] Remove sparse endian warnings
[CIFS] Add remaining ntlmssp flags and standardize field names
[CIFS] Fix build warning
cifs: fix length handling in cifs_get_name_from_search_buf
[CIFS] Remove unneeded QuerySymlink call and fix mapping for unmapped status
[CIFS] rename cifs_strndup to cifs_strndup_from_ucs
Added loop check when mounting DFS tree.
Enable dfs submounts to handle remote referrals.
[CIFS] Remove older session setup implementation
...
Remove adding open file entry twice to lists in the file
Do not fill file info twice in case of posix opens and creates
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When a lockspace was joined multiple times, the global dlm
use count was incremented when it should not have been. This
caused the global dlm threads to not be stopped when all
lockspaces were eventually be removed.
Signed-off-by: David Teigland <teigland@redhat.com>
There is what we believe to be a false positive reported by lockdep.
inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
The plan is to fix this via lockdep annotation, but that is proving to be
quite involved.
The patch flips the allocation over to GFP_NFS to shut the warning up, for
the 2.6.30 release.
Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm
to switch it back to GFP_KERNEL so we don't forget.
=================================
[ INFO: inconsistent lock state ]
2.6.30-rc2-next-20090417 #203
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
{RECLAIM_FS-ON-W} state was registered at:
[<ffffffff81079188>] mark_held_locks+0x68/0x90
[<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
[<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
[<ffffffff81130652>] kernel_event+0xe2/0x190
[<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
[<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
[<ffffffff8110444d>] vfs_create+0xcd/0x140
[<ffffffff8110825d>] do_filp_open+0x88d/0xa20
[<ffffffff810f6b68>] do_sys_open+0x98/0x140
[<ffffffff810f6c50>] sys_open+0x20/0x30
[<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 690455
hardirqs last enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
softirqs last enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50
other info that might help us debug this:
2 locks held by kswapd0/380:
#0: (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
#1: (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0
stack backtrace:
Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
Call Trace:
[<ffffffff810789ef>] print_usage_bug+0x19f/0x200
[<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
[<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
[<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
[<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
[<ffffffff810f478c>] ? slob_free+0x10c/0x370
[<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
[<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
[<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
[<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
[<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
[<ffffffff81012fe9>] ? sched_clock+0x9/0x10
[<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
[<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
[<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
[<ffffffff8110cb23>] d_kill+0x33/0x60
[<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
[<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
[<ffffffff810d0cc5>] shrink_slab+0x125/0x180
[<ffffffff810d1540>] kswapd+0x560/0x7a0
[<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
[<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
[<ffffffff8106555b>] kthread+0x5b/0xa0
[<ffffffff8100d40a>] child_rip+0xa/0x20
[<ffffffff8100cdd0>] ? restore_args+0x0/0x30
[<ffffffff81065500>] ? kthread+0x0/0xa0
[<ffffffff8100d400>] ? child_rip+0x0/0x20
[eparis@redhat.com: fix audit too]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If lockd is signalled soon enough after restart then locks_start_grace()
will try to re-add an entry to a list and trigger a lock corruption
warning.
Thanks to Wang Chen for the problem report and diagnosis.
WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
...
list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
...
Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3
Call Trace:
[<c042d5b5>] warn_slowpath+0x71/0xa0
[<c0422a96>] ? update_curr+0x11d/0x125
[<c044b12d>] ? trace_hardirqs_on_caller+0x18/0x150
[<c044b270>] ? trace_hardirqs_on+0xb/0xd
[<c051c61a>] ? _raw_spin_lock+0x53/0xfa
[<c051c89f>] __list_add+0x27/0x5c
[<ef8f6daa>] locks_start_grace+0x22/0x30 [lockd]
[<ef8f34da>] set_grace_period+0x39/0x53 [lockd]
[<c06b8921>] ? lock_kernel+0x1c/0x28
[<ef8f3558>] lockd+0x64/0x164 [lockd]
[<c044b12d>] ? trace_hardirqs_on_caller+0x18/0x150
[<c04227b0>] ? complete+0x34/0x3e
[<ef8f34f4>] ? lockd+0x0/0x164 [lockd]
[<ef8f34f4>] ? lockd+0x0/0x164 [lockd]
[<c043dd42>] kthread+0x45/0x6b
[<c043dcfd>] ? kthread+0x0/0x6b
[<c0403c23>] kernel_thread_helper+0x7/0x10
Reported-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: stable@kernel.org
After 2f9092e102 "Fix i_mutex vs. readdir
handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
an entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() will return a negative
dentry. Check for this case to avoid a NULL dereference.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Cc: stable@kernel.org
When UBIFS runs out of space it spends a lot of time trying to
find more space before returning ENOSPC. As there is no point
repeating that unless something has changed, UBIFS has an
optimization to record that the file system is 100% full and not
try to find space. That flag was not being reset when a pending
deletion was finally done.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Reviewed-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
On mount, "sec=ntlmssp" can now be specified to allow
"rawntlmssp" security to be enabled during
CIFS session establishment/authentication (ntlmssp used to
require specifying krb5 which was counterintuitive).
Signed-off-by: Steve French <sfrench@us.ibm.com>
We were not setting the SMB uid in NTLMSSP authenticate
request which could lead to INVALID_PARAMETER error
on 2nd session setup.
Signed-off-by: Steve French <sfrench@us.ibm.com>
In the mainline ocfs2 code, the interface for masklog is in files under
/sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h
reference the old /proc interface. They are out of date.
This patch modifies the comments in cluster/masklog.h, which also provides
a bash script example on how to change the log mask bits.
Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Currently the kernel defines XATTR_LIST_MAX as 65536
in include/linux/limits.h. This is the largest buffer that is used for
listing xattrs.
But with ocfs2 xattr tree, we actually have no limit for the number. If
filesystem has more names than can fit in the buffer, the kernel
logs will be pollluted with something like this when listing:
(27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34
(27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34
So don't print "ERROR" message as this is not an ocfs2 error.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
only allow processes that can ptrace() a given process to see information
that might be used to bypass address space layout randomization (ASLR).
These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
as the non-symbolic output from /proc/pid/wchan.
ASLR can be bypassed by sampling eip as shown by the proof-of-concept
code at http://code.google.com/p/fuzzyaslr/ As part of a presentation
(http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
also noted as possibly usable information leaks as well. The
start_stack address also leaks potentially useful information.
Cc: Stable Team <stable@kernel.org>
Signed-off-by: Jake Edge <jake@lwn.net>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NTLMSSP code was removed from fs/cifs/connect.c and merged
(75% smaller, cleaner) into fs/cifs/sess.c
As with the old code it requires that cifs be built with
CONFIG_CIFS_EXPERIMENTAL, the /proc/fs/cifs/Experimental flag
must be set to 2, and mount must turn on extended security
(e.g. with sec=krb5).
Although NTLMSSP encapsulated in SPNEGO is not enabled yet,
"raw" ntlmssp is common and useful in some cases since it
offers more complete security negotiation, and is the
default way of negotiating security for many Windows systems.
SPNEGO encapsulated NTLMSSP will be able to reuse the same
code.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Eliminate 56 sparse warnings like this one:
fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The session and slots are allocated all in one piece.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We have sb_bgl_lock() and ext4_group_info.bb_state
bit spinlock to protech group information. The later is only
used within mballoc code. Consolidate them to use sb_bgl_lock().
This makes the mballoc.c code much simpler and also avoid
confusion with two locks protecting same info.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix getbmap vs mmap deadlock
xfs: a couple getbmap cleanups
xfs: add more checks to superblock validation
xfs_file_last_byte() needs to acquire ilock
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Change repository in MAINTAINERS.
ocfs2: Fix a missing credit when deleting from indexed directories.
ocfs2/trivial: Remove unused variable in ocfs2_rename.
ocfs2: Add missing iput() during error handling in ocfs2_dentry_attach_lock()
ocfs2: Fix some printk() warnings.
ocfs2: Fix 2 warning during ocfs2 make.
ocfs2: Reserve 1 more cluster in expanding_inline_dir for indexed dir.
If the file's blocks have not yet been allocated because of delayed
allocation, the length of the extent returned by fiemap is incorrect.
This commit fixes this bug.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>