From: Dave Chinner <dgc@sgi.com>
[hch: split out from bigger patch and minor adaptions]
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32192a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
From: Dave Chinner <dgc@sgi.com>
[hch: split out from bigger patch and minor adaptions]
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32191a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
From: Dave Chinner <dgc@sgi.com>
Because this is the first major generic btree routine this patch includes
some infrastrucure, first a few routines to deal with a btree block that
can be either in short or long form, second xfs_btree_read_buf_block,
which is the new central routine to read a btree block given a cursor, and
third the new xfs_btree_ptr_addr routine to calculate the address for a
given btree pointer record.
[hch: split out from bigger patch and minor adaptions]
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32190a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Add new helpers in xfs_btree.c to find the record, key and block pointer
entries inside a btree block. To implement this genericly the
->get_maxrecs methods and two new xfs_btree_ops entries for the key and
record sizes are used. Also add a big comment describing how the
addressing inside a btree block works.
Note that these helpers are unused until users are introduced in the next
patches and this patch will thus cause some harmless compiler warnings.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32189a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Factor xfs_btree_maxrecs into a per-btree operation.
The get_maxrecs method is based on a patch from Dave Chinner.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32188a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Make the existing bmap btree tracing generic so that it applies to all
btree types.
Some fragments lifted from a patch by Dave Chinner.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32187a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
From: Dave Chinner <dgc@sgi.com>
Introduce statistics coverage of all the btrees and cover all the btree
operations, not just some.
Invaluable for determining test code coverage of all the btree
operations....
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32184a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Move the various btree validation helpers around in xfs_btree.c so that
they are close to each other and in common #ifdef DEBUG sections.
Also add a new xfs_btree_check_ptr helper to check a btree ptr that can be
either long or short form.
Split out from a bigger patch from Dave Chinner with various small changes
applied by me.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32183a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
From: Dave Chinner <dgc@sgi.com>
Refactor xfs_btree_readahead to make it more readable:
(a) remove the inline xfs_btree_readahead wrapper and move all checks out
of line into the main routine.
(b) factor out helpers for short/long form btrees
(c) move check for root in inodes from the callers into
xfs_btree_readahead
[hch: split out from a big patch and minor cleanups]
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32182a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Add a flag to the xfs btree cursor when using long (64bit) block pointers
instead of checking btnum == XFS_BTNUM_BMAP.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32181a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
The bmap btree is rooted in the inode and not in a disk block. Make the
support for this feature more generic by adding a btree flag to for this
feature instead of relying on the XFS_BTNUM_BMAP btnum check.
Also clean up xfs_btree_get_block where this new flag is used.
Based upon a patch from Dave Chinner.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32180a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Add generic union types for btree pointers, keys and records. The generic
btree pointer contains either a 32 and 64bit big endian scalar for short
and long form btrees, and the key and record contain the relevant type for
each possible btree.
Split out from a bigger patch from Dave Chinner and simplified a little
further.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32178a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
xfs_btree_init_cursor contains close to little shared code for the
different btrees and will get even more non-common code in the future.
Split it up into one routine per btree type.
Because xfs_btree_dup_cursor needs to call the init routine for a generic
btree cursor add a new btree operation vector that contains a dup_cursor
method that initializes a new cursor based on an existing one.
The btree operations vector is based on an idea and code from Dave Chinner
and will grow more entries later during this series.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32176a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
This type is only embedded in struct xfs_btree_block and never used
directly. By moving the fields directly into struct xfs_btree_block a lot
of the macros for struct xfs_btree_sblock and struct xfs_btree_lblock can
be used for struct xfs_btree_block too now which helps greatly with some
of the migrations during implementing the generic btree code.
SGI-PV: 985583
SGI-Modid: xfs-linux-melb:xfs-kern:32174a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Lock debugging reported the ilock was being destroyed without being
unlocked. We don't need to lock the inode until we are going to insert it
into the radix tree.
SGI-PV: 987246
SGI-Modid: xfs-linux-melb:xfs-kern:32159a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Destroying the quota stuff on unmount can access the log - ie
XFS_QM_DONE() ends up in xfs_dqunlock() which calls
xfs_trans_unlocked_item() and then xfs_log_move_tail(). By this time the
log has already been destroyed. Just move the cleanup of the quota code
earlier in xfs_unmountfs() before the call to xfs_log_unmount(). Moving
XFS_QM_DONE() up near XFS_QM_DQPURGEALL() seems like a good spot.
SGI-PV: 987086
SGI-Modid: xfs-linux-melb:xfs-kern:32148a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Peter Leckie <pleckie@sgi.com>
To avoid having to initialise some fields of the XFS inode on every
allocation, we can use the slab init-once feature to initialise them. All
we have to guarantee is that when we free the inode, all it's entries are
in the initial state. Add asserts where possible to ensure debug kernels
check this initial state before freeing and after allocation.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31925a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
gcc warns when using the # modifier with the %p format specifier,
so we can't use this to omit the colons when needed, introduces
%pi6 instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds transaction IDs to root tree pointers.
Transaction IDs in tree pointers are compared with the
generation numbers in block headers when reading root
blocks of trees. This can detect some types of IO errors.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This patch removes the giant fs_info->alloc_mutex and replaces it with a bunch
of little locks.
There is now a pinned_mutex, which is used when messing with the pinned_extents
extent io tree, and the extent_ins_mutex which is used with the pending_del and
extent_ins extent io trees.
The locking for the extent tree stuff was inspired by a patch that Yan Zheng
wrote to fix a race condition, I cleaned it up some and changed the locking
around a little bit, but the idea remains the same. Basically instead of
holding the extent_ins_mutex throughout the processing of an extent on the
extent_ins or pending_del trees, we just hold it while we're searching and when
we clear the bits on those trees, and lock the extent for the duration of the
operations on the extent.
Also to keep from getting hung up waiting to lock an extent, I've added a
try_lock_extent so if we cannot lock the extent, move on to the next one in the
tree and we'll come back to that one. I have tested this heavily and it does
not appear to break anything. This has to be applied on top of my
find_free_extent redo patch.
I tested this patch on top of Yan's space reblancing code and it worked fine.
The only thing that has changed since the last version is I pulled out all my
debugging stuff, apparently I forgot to run guilt refresh before I sent the
last patch out. Thank you,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
So there is an odd case where we can possibly return -ENOSPC when there is in
fact space to be had. It only happens with Metadata writes, and happens _very_
infrequently. What has to happen is we have to allocate have allocated out of
the first logical byte on the disk, which would set last_alloc to
first_logical_byte(root, 0), so search_start == orig_search_start. We then
need to allocate for normal metadata, so BTRFS_BLOCK_GROUP_METADATA |
BTRFS_BLOCK_GROUP_DUP. We will do a block lookup for the given search_start,
block_group_bits() won't match and we'll go to choose another block group.
However because search_start matches orig_search_start we go to see if we can
allocate a chunk.
If we are in the situation that we cannot allocate a chunk, we fail and ENOSPC.
This is kind of a big flaw of the way find_free_extent works, as it along with
find_free_space loop through _all_ of the block groups, not just the ones that
we want to allocate out of. This patch completely kills find_free_space and
rolls it into find_free_extent. I've introduced a sort of state machine into
this, which will make it easier to get cache miss information out of the
allocator, and will work well with my locking changes.
The basic flow is this: We have the variable loop which is 0, meaning we are
in the hint phase. We lookup the block group for the hint, and lookup the
space_info for what we want to allocate out of. If the block group we were
pointed at by the hint either isn't of the correct type, or just doesn't have
the space we need, we set head to space_info->block_groups, so we start at the
beginning of the block groups for this particular space info, and loop through.
This is also where we add the empty_cluster to total_needed. At this point
loop is set to 1 and we just loop through all of the block groups for this
particular space_info looking for the space we need, just as find_free_space
would have done, except we only hit the block groups we want and not _all_ of
the block groups. If we come full circle we see if we can allocate a chunk.
If we cannot of course we exit with -ENOSPC and we are good. If not we start
over at space_info->block_groups and loop through again, with loop == 2. If we
come full circle and haven't found what we need then we exit with -ENOSPC.
I've been running this for a couple of days now and it seems stable, and I
haven't yet hit a -ENOSPC when there was plenty of space left.
Also I've made a groups_sem to handle the group list for the space_info. This
is part of my locking changes, but is relatively safe and seems better than
holding the space_info spinlock over that entire search time. Thanks,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
This patch improves the space balancing code to keep more sharing
of tree blocks. The only case that breaks sharing of tree blocks is
data extents get fragmented during balancing. The main changes in
this patch are:
Add a 'drop sub-tree' function. This solves the problem in old code
that BTRFS_HEADER_FLAG_WRITTEN check breaks sharing of tree block.
Remove relocation mapping tree. Relocation mappings are stored in
struct btrfs_ref_path and updated dynamically during walking up/down
the reference path. This reduces CPU usage and simplifies code.
This patch also fixes a bug. Root items for reloc trees should be
updated in btrfs_free_reloc_root.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This is a large change for adding compression on reading and writing,
both for inline and regular extents. It does some fairly large
surgery to the writeback paths.
Compression is off by default and enabled by mount -o compress. Even
when the -o compress mount option is not used, it is possible to read
compressed extents off the disk.
If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later.
* While finding delalloc extents, the pages are locked before being sent down
to the delalloc handler. This allows the delalloc handler to do complex things
such as cleaning the pages, marking them writeback and starting IO on their
behalf.
* Inline extents are inserted at delalloc time now. This allows us to compress
the data before inserting the inline extent, and it allows us to insert
an inline extent that spans multiple pages.
* All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
are changed to record both an in-memory size and an on disk size, as well
as a flag for compression.
From a disk format point of view, the extent pointers in the file are changed
to record the on disk size of a given extent and some encoding flags.
Space in the disk format is allocated for compression encoding, as well
as encryption and a generic 'other' field. Neither the encryption or the
'other' field are currently used.
In order to limit the amount of data read for a single random read in the
file, the size of a compressed extent is limited to 128k. This is a
software only limit, the disk format supports u64 sized compressed extents.
In order to limit the ram consumed while processing extents, the uncompressed
size of a compressed extent is limited to 256k. This is a software only limit
and will be subject to tuning later.
Checksumming is still done on compressed extents, and it is done on the
uncompressed version of the data. This way additional encodings can be
layered on without having to figure out which encoding to checksum.
Compression happens at delalloc time, which is basically singled threaded because
it is usually done by a single pdflush thread. This makes it tricky to
spread the compression load across all the cpus on the box. We'll have to
look at parallel pdflush walks of dirty inodes at a later time.
Decompression is hooked into readpages and it does spread across CPUs nicely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
CIFS in some heavy stress conditions cifs could get EAGAIN
repeatedly in smb_send2 which led to repeated retries and eventually
failure of large writes which could lead to data corruption.
There are three changes that were suggested by various network
developers:
1) convert cifs from non-blocking to blocking tcp sendmsg
(we left in the retry on failure)
2) change cifs to not set sendbuf and rcvbuf size for the socket
(let tcp autotune the buffer sizes since that works much better
in the TCP stack now)
3) if we have a partial frame sent in smb_send2, mark the tcp
session as invalid (close the socket and reconnect) so we do
not corrupt the remaining part of the SMB with the beginning
of the next SMB.
This does not appear to hurt performance measurably and has
been run in various scenarios, but it definately removes
a corruption that we were seeing in some high stress
test cases.
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The iscsi_ibft.c changes are almost certainly a bugfix as the
pointer 'ip' is a u8 *, so they never print the last 8 bytes
of the IPv6 address, and the eight bytes they do print have
a zero byte with them in each 16-bit word.
Other than that, this should cause no difference in functionality.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The most important property we need from nfs_attr_generation_counter is
monotonicity, which is not guaranteed by the current system of smp memory
barriers. We should convert it to an atomic_long_t, and drop the memory
barriers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The original ext3 hash algorithms assumed that variables of type char
were signed, as God and K&R intended. Unfortunately, this assumption
is not true on some architectures. Userspace support for marking
filesystems with non-native signed/unsigned chars was added two years
ago, but the kernel-side support was never added (until now).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The original ext3 hash algorithms assumed that variables of type char
were signed, as God and K&R intended. Unfortunately, this assumption
is not true on some architectures. Userspace support for marking
filesystems with non-native signed/unsigned chars was added two years
ago, but the kernel-side support was never added (until now).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org
fs/ext4/balloc.c:607: warning: format '%lld' expects type 'long long int', but argument 2 has type 's64'
fs/ext4/inode.c:1822: warning: format '%lld' expects type 'long long int', but argument 2 has type 's64'
fs/ext4/inode.c:1824: warning: format '%lld' expects type 'long long int', but argument 2 has type 's64'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As reported by Eric Paris, the capable() check in ext4_has_free_blocks()
sometimes causes SELinux denials.
We can rearrange the logic so that we only try to use the root-reserved
blocks when necessary, and even then we can move the capable() test
to last, to avoid the check most of the time.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming pointed out that ext4_claim_free_blocks & ext4_has_free_blocks
are largely cut & pasted; they can be collapsed/merged as follows.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The transaction can potentially get dropped if there are no buffers
that need to be written. Make sure we call the commit callback before
potentially deciding to drop the transaction. Also avoid
dereferencing the commit_transaction pointer in the marker for the
same reason.
This patch fixes the bug reported by Eric Paris at:
http://bugzilla.kernel.org/show_bug.cgi?id=11838
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Eric Paris <eparis@redhat.com>
Vegard Nossum reported a bug which accesses freed memory (found via
kmemcheck). When journal has been aborted, ext4_put_super() calls
ext4_abort() after freeing the journal_t object, and then ext4_abort()
accesses it. This patch fix it.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Vegard Nossum reported a bug which accesses freed memory (found via
kmemcheck). When journal has been aborted, ext3_put_super() calls
ext3_abort() after freeing the journal_t object, and then ext3_abort()
accesses it. This patch fix it.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Turned out some VMware userspace does pread(2) on /proc/uptime, but
seqfiles currently don't allow pread() resulting in -ESPIPE.
Seqfiles in theory can do pread(), but this can be a long story,
so revert to ->read_proc until then.
http://bugzilla.kernel.org/show_bug.cgi?id=11856
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
In commit f337b9c583 ("epoll: drop
unnecessary test") Thomas found that there is an unnecessary (always
true) test in ep_send_events(). The callback never inserts into
->rdllink while the send loop is performed, and also does the
~EP_PRIVATE_BITS test. Given we're holding the mutex during this time,
the conditions tested inside the loop are always true.
HOWEVER.
The test "!ep_is_linked(&epi->rdllink)" wasn't there because we insert
into ->rdllink, but because the send-events loop might terminate before
the whole list is scanned (-EFAULT).
In such cases, when the loop terminates early, and when a (leftover)
file received an event while we're performing the lockless loop, we need
such test to avoid to double insert the epoll items. The list_splice()
done a few steps below, will correctly re-insert the ones that were left
on "txlist".
This should fix the kenrel.org bugzilla entry 11831.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some userland apps seem to pass in a "0" for the seconds, and several
seconds worth of usecs to select(). The old kernels accepted this just
fine, so the new kernels must too.
However, due to the upscaling of the microseconds to nanoseconds we had
some cases where we got math overflow, and depending on the GCC version
(due to inlining decisions) that actually resulted in an -EINVAL return.
This patch fixes this by adding the excess microseconds to the seconds
field.
Also with thanks to Marcin Slusarz for spotting some implementation bugs
in the diagnostics patches.
Reported-by: Carlos R. Mafra <crmafra2@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice. This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fix a regression caused by commit 6a897cf4, "ext3: fix ext3_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice. This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.
This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ All users removed in "switch all filesystems over to d_obtain_alias",
aka commit 440037287c ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This one was due to a merge error: we added a use of nd.path in commit
2d7c820e56 ("ext3: add checks for errors
from jbd"), and concurrently we got rid of 'nd' and used a naked 'path'
in commit 8264613def ("[PATCH] switch
quota_on-related stuff to kern_path()").
That all merged cleanly, but it didn't actually _work_. This should fix
it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
hrtimers: add missing docbook comments to struct hrtimer
hrtimers: simplify hrtimer_peek_ahead_timers()
hrtimers: fix docbook comments
DECLARE_PER_CPU needs linux/percpu.h
hrtimers: fix typo
rangetimers: fix the bug reported by Ingo for real
rangetimer: fix BUG_ON reported by Ingo
rangetimer: fix x86 build failure for the !HRTIMERS case
select: fix alpha OSF wrapper
select: fix alpha OSF wrapper
hrtimer: peek at the timer queue just before going idle
hrtimer: make the futex() system call use the per process slack value
hrtimer: make the nanosleep() syscall use the per process slack
hrtimer: fix signed/unsigned bug in slack estimator
hrtimer: show the timer ranges in /proc/timer_list
hrtimer: incorporate feedback from Peter Zijlstra
hrtimer: add a hrtimer_start_range() function
hrtimer: another build fix
hrtimer: fix build bug found by Ingo
hrtimer: make select() and poll() use the hrtimer range feature
...