Commit Graph

25912 Commits

Author SHA1 Message Date
Cong Wang
7b9c0976ac nilfs2: remove the second argument of k[un]map_atomic()
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:24 +08:00
Cong Wang
2b86ce2db3 nfs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:24 +08:00
Cong Wang
27a6d5c742 minix: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:24 +08:00
Cong Wang
50bc9b65b6 logfs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:24 +08:00
Cong Wang
303a8f2afc jbd2: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:23 +08:00
Cong Wang
8fb53c46d9 jbd: remove the second argument of k[un]map_atomic()
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:23 +08:00
Cong Wang
d93492855f gfs2: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:23 +08:00
Cong Wang
2408f6ef6b fuse: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:22 +08:00
Cong Wang
d4a23aee23 ext2: remove the second argument of k[un]map_atomic()
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:22 +08:00
Cong Wang
bf7014b67f exofs: remove the second argument of k[un]map_atomic()
Ack-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:22 +08:00
Cong Wang
da4aa36d01 afs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:22 +08:00
Cong Wang
7ac687d9e0 btrfs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:21 +08:00
Cong Wang
e8e3c3d66f fs: remove the second argument of k[un]map_atomic()
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:21 +08:00
Laura Vasilescu
f1f996b66c kcore: fix spelling in read_kcore() comment
Signed-off-by: Laura Vasilescu <laura@rosedu.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-03-20 12:24:10 +01:00
Bob Peterson
220cca2a4f GFS2: Change truncate page allocation to be GFP_NOFS
This patch changes the page allocation in gfs2_block_truncate_page
and two others to GFP_NOFS to avoid deadlock in low-memory conditions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-03-20 11:05:00 +00:00
Linus Torvalds
b0e37d7ac6 Merge branch 'dcache-word-accesses'
* branch 'dcache-word-accesses':
  vfs: use 'unsigned long' accesses for dcache name comparison and hashing

This does the name hashing and lookup using word-sized accesses when
that is efficient, namely on x86 (although any little-endian machine
with good unaligned accesses would do).

It does very much depend on little-endian logic, but it's a very hot
couple of functions under some real loads, and this patch improves the
performance of __d_lookup_rcu() and link_path_walk() by up to about 30%.
Giving a 10% improvement on some very pathname-heavy benchmarks.

Because we do make unaligned accesses past the filename, the
optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we
effectively depend on the fact that on x86 we don't really ever have the
last page of usable RAM followed immediately by any IO memory (due to
ACPI tables, BIOS buffer areas etc).

Some of the bit operations we do are a bit "subtle".  It's commented,
but you do need to really think about the code.  Or just consider it
black magic.

Thanks to people on G+ for some of the optimized bit tricks.
2012-03-19 16:37:28 -07:00
Linus Torvalds
6d7d1a0dc7 vfs: get rid of batshit-insane pointless dentry hash calculations
For some odd historical reason, the final mixing round for the dentry
cache hash table lookup had an insane "xor with big constant" logic.  In
two places.

The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
fairly random-looking number that is designed to be *multiplied* with so
that the bits get spread out over a whole long-word.

But xor'ing with it is insane.  It doesn't really even change the hash -
it really only shifts the hash around in the hash table.  To make
matters worse, the insane big constant is different on 32-bit and 64-bit
builds, even though the name hash bits we use are always 32-bit (and the
bits from the pointer we mix in effectively are too).

It's all total voodoo programming, in other words.

Now, some testing and analysis of the hash chains shows that the rest of
the hash function seems to be fairly good.  It does pick the right bits
of the parent dentry pointer, for example, and while it's generally a
bad idea to use an xor to mix down the upper bits (because if there is a
repeating pattern, the xor can cause "destructive interference"), it
seems to not have been a disaster.

For example, replacing the hash with the normal "hash_long()" code (that
uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
the hash worse.  The hand-picked hash knew which bits of the pointer had
the highest entropy, and hash_long() ends up mixing bits less optimally
at least in some trivial tests.

So the hash function overall seems fine, it just has that really odd
"shift result around by a constant xor".

So get rid of the silly xor, and replace the down-mixing of the bits
with an add instead of an xor that tends to not have the same kind of
destructive interference issues.  Some stats on the resulting hash
chains shows that they look statistically identical before and after,
but the code is simpler and no longer makes you go "WTF?".

Also, the incoming hash really is just "unsigned int", not a long, and
there's no real point to worry about the high 26 bits of the dentry
pointer for the 64-bit case, because they are all going to be identical
anyway.

So also change the hashing to be done in the more natural 'unsigned int'
that is the real size of the actual hashed data anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-19 16:19:53 -07:00
Konrad Rzeszutek Wilk
16c0cfa425 Merge branch 'stable/cleancache.v13' into linux-next
* stable/cleancache.v13:
  mm: cleancache: Use __read_mostly as appropiate.
  mm: cleancache: report statistics via debugfs instead of sysfs.
  mm: zcache/tmem/cleancache: s/flush/invalidate/
  mm: cleancache: s/flush/invalidate/
2012-03-19 12:12:19 -04:00
David S. Miller
4da0bd7365 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-18 23:29:41 -04:00
Jason Baron
93dc6107a7 Don't limit non-nested epoll paths
Commit 28d82dc1c4 ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).

The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.

This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-18 12:25:04 -07:00
Linus Torvalds
cb1ecf25a8 Merge branch 'akpm' (more patches from Andrew)
Merge some more email patches from Andrew Morton:
 "A couple of nilfs fixes"

* emailed from Andrew Morton <akpm@linux-foundation.org>:
  nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
  nilfs2: clamp ns_r_segments_percentage to [1, 99]
2012-03-16 17:14:55 -07:00
Ryusuke Konishi
d7178c79d9 nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

 BUG: unable to handle kernel NULL pointer dereference at 00000048
 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 ...
 Call Trace:
  [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
  [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
  [<c0226636>] mount_fs+0x36/0x180
  [<c023d961>] vfs_kern_mount+0x51/0xa0
  [<c023ddae>] do_kern_mount+0x3e/0xe0
  [<c023f189>] do_mount+0x169/0x700
  [<c023fa9b>] sys_mount+0x6b/0xa0
  [<c04abd1f>] sysenter_do_call+0x12/0x28
 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
 CR2: 0000000000000048

This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.

Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Cc: <stable@vger.kernel.org>	[2.6.30+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16 17:14:44 -07:00
Haogang Chen
3d777a6406 nilfs2: clamp ns_r_segments_percentage to [1, 99]
ns_r_segments_percentage is read from the disk.  Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation.  This patch reports error when mounting such
bogus volumes.

Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16 17:14:44 -07:00
Anton Blanchard
c017386352 afs: Remote abort can cause BUG in rxrpc code
When writing files to afs I sometimes hit a BUG:

kernel BUG at fs/afs/rxrpc.c:179!

With a backtrace of:

	afs_free_call
	afs_make_call
	afs_fs_store_data
	afs_vnode_store_data
	afs_write_back_from_locked_page
	afs_writepages_region
	afs_writepages

The cause is:

	ASSERT(skb_queue_empty(&call->rx_queue));

Looking at a tcpdump of the session the abort happens because we
are exceeding our disk quota:

	rx abort fs reply store-data error diskquota exceeded (32)

So the abort error is valid. We hit the BUG because we haven't
freed all the resources for the call.

By freeing any skbs in call->rx_queue before calling afs_free_call
we avoid hitting leaking memory and avoid hitting the BUG.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16 17:01:41 -07:00
Anton Blanchard
2c724fb927 afs: Read of file returns EBADMSG
A read of a large file on an afs mount failed:

# cat junk.file > /dev/null
cat: junk.file: Bad message

Looking at the trace, call->offset wrapped since it is only an
unsigned short. In afs_extract_data:

        _enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
...

        if (call->offset < count) {
                if (last) {
                        _leave(" = -EBADMSG [%d < %zu]", call->offset, count);
                        return -EBADMSG;
                }

Which matches the trace:

[cat   ] ==> afs_extract_data({65132},{524},1,,65536)
[cat   ] <== afs_extract_data() = -EBADMSG [0 < 65536]

call->offset went from 65132 to 0. Fix this by making call->offset an
unsigned int.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-16 17:01:41 -07:00
Rafael J. Wysocki
cf3bbcaf09 Merge branch 'pm-sleep'
* pm-sleep:
  PM / Sleep: JBD and JBD2 missing set_freezable()
2012-03-16 21:49:32 +01:00
Linus Torvalds
f1cbd03f5e Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Been sitting on this for a while, but lets get this out the door.
  This fixes various important bugs for 3.3 final, along with a few more
  trivial ones.  Please pull!"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix ioc leak in put_io_context
  block, sx8: fix pointer math issue getting fw version
  Block: use a freezable workqueue for disk-event polling
  drivers/block/DAC960: fix -Wuninitialized warning
  drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning
  block: fix __blkdev_get and add_disk race condition
  block: Fix setting bio flags in drivers (sd_dif/floppy)
  block: Fix NULL pointer dereference in sd_revalidate_disk
  block: exit_io_context() should call elevator_exit_icq_fn()
  block: simplify ioc_release_fn()
  block: replace icq->changed with icq->flags
2012-03-14 17:16:45 -07:00
Linus Torvalds
8e8bb96d24 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Do not kmalloc under the flocks spinlock
  cifs: possible memory leak in xattr.
2012-03-13 17:03:53 -07:00
Nigel Cunningham
35c80422af PM / Sleep: JBD and JBD2 missing set_freezable()
With the latest and greatest changes to the freezer, I started seeing
panics that were caused by jbd2 running post-process freezing and
hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced
this back to a lack of set_freezable calls in both jbd and jbd2. Since
they're clearly meant to be frozen (there are tests for freezing()), I
submit the following patch to add the missing calls.

Signed-off-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-03-13 22:36:44 +01:00
Ingo Molnar
47258cf3c4 Merge tag 'v3.3-rc7' into sched/core
Merge reason: merge back final fixes, prepare for the merge window.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13 16:26:52 +01:00
Ingo Molnar
35239e23c6 Merge branch 'perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:11 +01:00
Al Viro
310fa7a367 restore smp_mb() in unlock_new_inode()
wait_on_inode() doesn't have ->i_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-10 17:07:28 -05:00
Miklos Szeredi
7f6c7e62fc vfs: fix return value from do_last()
complete_walk() returns either ECHILD or ESTALE.  do_last() turns this into
ECHILD unconditionally.  If not in RCU mode, this error will reach userspace
which is complete nonsense.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-10 17:05:30 -05:00
Miklos Szeredi
097b180ca0 vfs: fix double put after complete_walk()
complete_walk() already puts nd->path, no need to do it again at cleanup time.

This would result in Oopses if triggered, apparently the codepath is not too
well exercised.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-10 17:05:30 -05:00
Jan Kara
f6940fe909 udf: Fix deadlock in udf_release_file()
udf_release_file() can be called from munmap() path with mmap_sem held.  Thus
we cannot take i_mutex there because that ranks above mmap_sem. Luckily,
i_mutex is not needed in udf_release_file() anymore since protection by
i_data_sem is enough to protect from races with write and truncate.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-10 16:05:38 -05:00
Tyler Hicks
978d6d8c45 vfs: Correctly set the dir i_mutex lockdep class
9a7aa12f39 introduced additional logic around setting the i_mutex
lockdep class for directory inodes. The idea was that some filesystems
may want their own special lockdep class for different directory
inodes and calling unlock_new_inode() should not clobber one of
those special classes.

I believe that the added conditional, around the *negated* return value
of lockdep_match_class(), caused directory inodes to be placed in the
wrong lockdep class.

inode_init_always() sets the i_mutex lockdep class with i_mutex_key for
all inodes. If the filesystem did not change the class during inode
initialization, then the conditional mentioned above was false and the
directory inode was incorrectly left in the non-directory lockdep class.
If the filesystem did set a special lockdep class, then the conditional
mentioned above was true and that class was clobbered with
i_mutex_dir_key.

This patch removes the negation from the conditional so that the i_mutex
lockdep class is properly set for directory inodes. Special classes are
preserved and directory inodes with unmodified classes are set with
i_mutex_dir_key.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-10 16:05:38 -05:00
Al Viro
c7b2855505 aio: fix the "too late munmap()" race
Current code has put_ioctx() called asynchronously from aio_fput_routine();
that's done *after* we have killed the request that used to pin ioctx,
so there's nothing to stop io_destroy() waiting in wait_for_all_aios()
from progressing.  As the result, we can end up with async call of
put_ioctx() being the last one and possibly happening during exit_mmap()
or elf_core_dump(), neither of which expects stray munmap() being done
to them...

We do need to prevent _freeing_ ioctx until aio_fput_routine() is done
with that, but that's all we care about - neither io_destroy() nor
exit_aio() will progress past wait_for_all_aios() until aio_fput_routine()
does really_put_req(), so the ioctx teardown won't be done until then
and we don't care about the contents of ioctx past that point.

Since actual freeing of these suckers is RCU-delayed, we don't need to
bump ioctx refcount when request goes into list for async removal.
All we need is rcu_read_lock held just over the ->ctx_lock-protected
area in aio_fput_routine().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-09 18:59:59 -08:00
Al Viro
86b62a2cb4 aio: fix io_setup/io_destroy race
Have ioctx_alloc() return an extra reference, so that caller would drop it
on success and not bother with re-grabbing it on failure exit.  The current
code is obviously broken - io_destroy() from another thread that managed
to guess the address io_setup() would've returned would free ioctx right
under us; gets especially interesting if aio_context_t * we pass to
io_setup() points to PROT_READ mapping, so put_user() fails and we end
up doing io_destroy() on kioctx another thread has just got freed...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-09 18:59:59 -08:00
Linus Torvalds
86e0600833 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "I have two additional and btrfs fixes in my for-linus branch.  One is
  a casting error that leads to memory corruption on i386 during scrub,
  and the other fixes a corner case in the backref walking code (also
  triggered by scrub)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix casting error in scrub reada code
  btrfs: fix locking issues in find_parent_nodes()
2012-03-09 18:09:18 -08:00
David S. Miller
b2d3298e09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-09 14:34:20 -08:00
Greg Kroah-Hartman
263a5c8e16 Merge 3.3-rc6 into driver-core-next
This was done to resolve a conflict in the drivers/base/cpu.c file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-09 12:35:53 -08:00
Benjamin Marzinski
58a7d5fb8e GFS2: call gfs2_write_alloc_required for each chunk
gfs2_fallocate was calling gfs2_write_alloc_required() once at the start of
the function. This caused problems since gfs2_write_alloc_required used a
long unsigned int for the len, but gfs2_fallocate could allocate a much
larger amount.  This patch will move the call into the loop where the
chunks are actually allocated and zeroed out. This will keep the allocation
size under the limit, and also allow gfs2_fallocate to quickly skip over
sections of the file that are already completely allocated.

fallcate_chunk was also not correctly setting the file size.  It was using the
len veriable to find the last block written to, but by the time it was setting
the size, the len variable had already been decremented to 0.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-03-09 15:29:10 +00:00
Steven Whitehouse
34cc1781c2 GFS2: Clean up log flush header writing
We already send both a pre and post flush to the block device
when writing a journal header. There is no need to wait for
the previous I/O specifically when we do this, unless we've
turned "barriers" off.

As a side effect, this also cleans up the code path for flushing
the journal and makes it more readable.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-03-09 14:07:06 +00:00
Linus Torvalds
bfcfaa77bd vfs: use 'unsigned long' accesses for dcache name comparison and hashing
Ok, this is hacky, and only works on little-endian machines with goo
unaligned handling.  And even then only with CONFIG_DEBUG_PAGEALLOC
disabled, since it can access up to 7 bytes after the pathname.

But it runs like a bat out of hell.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-08 18:08:44 -08:00
Benjamin Poirier
2f2d76cc3e dlm: Do not allocate a fd for peeloff
avoids allocating a fd that a) propagates to every kernel thread and
usermodehelper b) is not properly released.

References: http://article.gmane.org/gmane.linux.network.drbd/22529
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 13:52:09 -08:00
Greg Kroah-Hartman
54d20f006c Revert "sysfs: Kill nlink counting."
This reverts commit 524b6c5b39.

It has shown to break userspace tools, which is not acceptable.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 13:03:10 -08:00
David Teigland
7210cb7a72 dlm: fix slow rsb search in dir recovery
The function used to find an rsb during directory
recovery was searching the single linear list of
rsb's.  This wasted a lot of time compared to
using the standard hash table to find the rsb.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-03-08 14:46:30 -06:00
Steven Whitehouse
75ca61c101 GFS2: Remove a __GFP_NOFAIL allocation
In order to ensure that we've got enough buffer heads for flushing
the journal, the orignal code used __GFP_NOFAIL when performing
this allocation. Here we dispense with that in favour of using a
mempool. This should improve efficiency in low memory conditions
since flushing the journal is a good way to get memory back, we
don't want to be spinning, waiting on memory allocations. The
buffers which are allocated via this mempool are fairly short lived,
so that we'll recycle them pretty quickly.

Although there are other memory allocations which occur during the
journal flush process, this is the one which can potentially require
the most memory, so the most important one to fix.

The amount of memory reserved is a fixed amount, and we should not need
to scale it when there are a greater number of filesystems in use.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-03-08 12:10:23 +00:00
Fengguang Wu
c097b2ca51 writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-03-07 16:08:46 +01:00
Steven Whitehouse
35e478f422 GFS2: Flush pending glock work when evicting an inode
This ensures that we will not try to access the inode thats
being flushed via the glock after it has been freed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-03-07 10:43:02 +00:00