The reada code from scrub was casting down a u64 to
an unsigned long so it could insert it into a radix tree.
What it really wanted to do was cast down the result of a shift, instead
of casting down the u64. The bug resulted in trying to insert our
reada struct into the wrong place, which caused soft lockups and other
problems.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
- We might unlock head->mutex while it was not locked
- We might leave the function without unlocking delayed_refs->lock
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The code in link_path_walk() that finds out the length and the hash of
the next path component is some of the hottest code in the kernel. And
I have a version of it that does things at the full width of the CPU
wordsize at a time, but that means that we *really* want to split it up
into a separate helper function.
So this re-organizes the code a bit and splits the hashing part into a
helper function called "hash_name()". It returns the length of the
pathname component, while at the same time computing and writing the
hash to the appropriate location.
The code generation is slightly changed by this patch, but generally for
the better - and the added abstraction actually makes the code easier to
read too. And the new interface is well suited for replacing just the
"hash_name()" function with alternative implementations.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. and also use it in lookup_one_len() rather than open-coding it.
There aren't any performance-critical users, so inlining it is silly.
But it wouldn't matter if it wasn't for the fact that the word-at-a-time
dentry name patches want to conditionally replace the function, and
uninlining it sets the stage for that.
So again, this is a preparatory patch that doesn't change any semantics,
and only prepares for a much cleaner and testable word-at-a-time dentry
name accessor patch.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.
They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Get rid of
encode_compound: tag=
when XDR debugging is enabled. The current Linux client never sets
compound tags.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The fh_expire_type file attribute is a filesystem wide attribute that
consists of flags that indicate what characteristics file handles
on this FSID have.
Our client doesn't support volatile file handles. It should find
out early (say, at mount time) whether the server is going to play
shenanighans with file handles during a migration.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The Linux NFS client must distinguish between referral events (which
it currently supports) and migration events (which it does not yet
support).
In both types of events, an fs_locations array is returned. But upper
layers, not the XDR layer, should make the distinction between a
referral and a migration. There really isn't a way for an XDR decoder
function to distinguish the two, in general.
Slightly adjust the FATTR flags returned by decode_fs_locations()
to set NFS_ATTR_FATTR_V4_LOCATIONS only if a non-empty locations
array was returned from the server. Then have logic in nfs4proc.c
distinguish whether the locations array is for a referral or
something else.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: pass just the clientid4 to encode_renew(). This enables it
to be used by callers who might not have an full nfs_client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For debugging, introduce a simplistic function to print NFS file
handles on the system console. The main function is hooked into the
dprintk debugging facility, but you can directly call the helper,
_nfs_display_fhandle(), if you want to print a handle unconditionally.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For NFSv4 mounts, the clientaddr= mount option has always been
required. Now we have rpc_localaddr() in the kernel, which was
modeled after the same logic in the mount.nfs command that constructs
the clientaddr= mount option. If user space doesn't provide a
clientaddr= mount option, the kernel can now construct its own.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A migration event will replace the rpc_xprt used by an rpc_clnt. To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().
Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: fix lockdep splats and layering violations ]
[ cel: forward ported to 3.4 ]
[ cel: remove rpc_max_reqs(), add rpc_net_ns() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
CLOSE is new with NFSv4. Sometimes it's important to know the timing
of this operation compared to things like lease renewal.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I noticed recently that decode_attr_fs_locations() is not generating
very pretty debugging output. The pathname components each appear on
a separate line of output, though that does not appear to be the
intended display behavior. The preferred way to generate continued
lines of output on the console is to use pr_cont().
Note that incoming pathname4 components contain a string that is not
necessarily NUL-terminated. I did actually see some trailing garbage
on the console. In addition to correcting the line continuation
problem, add a string precision format specifier to ensure that each
component string is displayed properly, and that vsnprintf() does
not Oops.
Someone pointed out that allowing incoming network data to possibly
generate a console line of unbounded length may not be such a good
idea. Since this output will rarely be enabled, and there is a hard
upper bound (NFS4_PATHNAME_MAXCOMPONENTS) in our implementation, this
is probably not a major concern.
It might be useful to additionally sanity-check the length of each
incoming component, however. RFC 3530bis15 does not suggest a maximum
number of UTF-8 characters per component for either the pathname4 or
component4 types. However, we could invent one that is appropriate
for our implementation.
Another possibility is to scrap all of this and print these pathnames
in upper layers after a reasonable amount of sanity checking in the
XDR layer. This would give us an opportunity to allocate a full
buffer so that the whole pathname would be output via a single
dprintk.
Introduced by commit 7aaa0b3b: "NFSv4: convert fs-locations-components
to conform to RFC3530," (June 9, 2006).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eliminate a number of implicit type casts in comparisons, and these
compiler warnings:
fs/nfs/dir.c: In function ‘nfs_readdir_clear_array’:
fs/nfs/dir.c:264:16: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
fs/nfs/dir.c: In function ‘nfs_readdir_search_for_cookie’:
fs/nfs/dir.c:352:16: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
fs/nfs/dir.c: In function ‘nfs_do_filldir’:
fs/nfs/dir.c:769:38: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
fs/nfs/dir.c:780:9: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.
Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 'minorversion' mount option is now deprecated, so we need to display
the minor version number in the 'vers=' format.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow the user to mount an NFSv4.0 or NFSv4.1 partition using a
standard syntax of '-overs=4.0', or '-overs=4.1' rather than the
more cumbersome '-overs=4,minorversion=1'.
See also the earlier patch by Dros Adamson, which added the
Linux-specific syntax '-ov4.0', '-ov4.1'.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There's no point to have two bits that are set in parallel; so use the
MS_I_VERSION flag that is needed by the VFS anyway, and that way we
free up a bit in sbi->s_mount_opts.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The overflow might happen when passing blocknr into
ext3_get_group_no_and_offset(), because it expects type ext3_fsblk_t
which might be smaller than uint64_t. This will most likely happen when
calling FITRIM with the default argument len = ULLONG_MAX.
Fix this by using "end" variable instead of "start+len" as it is easier
to get right and specifically check that the end is not beyond the end
of the file system, so we are sure that the result of
get_group_no_and_offset() will not overflow. Otherwise truncate it to
the size of the file system.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.
However it ends up calling driver's revalidate_disk without open
and could cause oops.
In the case of SCSI:
process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052
This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Reported-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
People complained about removing both of these features, so per
Linus's dictate, we won't be able to remove them. Sigh...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Send the nfs implementation id in EXCHANGE_ID requests unless the module
parameter nfs.send_implementation_id is 0.
This adds a CONFIG variable for the nii_domain that defaults to "kernel.org".
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch removes the old hashmap-based caching and instead uses a
"request key actor" to place an upcall to the legacy idmapper rather
than going through /sbin/request-key. This will only be used as a
fallback if /etc/request-key.conf isn't configured to use nfsidmap.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS4CLNT_LAYOUTRECALL bit is a long-term impediment to scalability. It
basically stops all other recalls by a given server once any layout recall
is requested.
If the recall is for a different file, then we don't care.
If the recall applies to the same file, then we're in one of two situations:
Either we are in the case of a replay of an existing request, in which case
the session is supposed to deal with matters, or we are dealing with a
completely different request, in which case we should just try to process
it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS4CLNT_LAYOUTRECALL tests in pnfs_layout_process and
pnfs_update_layout are redundant.
In the case of a bulk layout recall, we're always testing for
the NFS_LAYOUT_BULK_RECALL flay anyway.
In the case of a file or segment recall, the call to
pnfs_set_layout_stateid() updates the layout_header 'barrier'
sequence id, which triggers the test in pnfs_layoutgets_blocked()
and is less race-prone than NFS4CLNT_LAYOUTRECALL anyway.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch fixes an error path in function gfs2_rindex_update
that leaves the rindex mutex held.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
udf_release_file() can be called from munmap() path with mmap_sem held. Thus
we cannot take i_mutex there because that ranks above mmap_sem. Luckily,
i_mutex is not needed in udf_release_file() anymore since protection by
i_data_sem is enough to protect from races with write and truncate.
CC: stable@vger.kernel.org (2.6.38 & later)
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
ECMA 1.67 requires setting logicalBlocksRecorded to zero if the file
has no extents. This should be checked in udf_update_inode().
udf_fill_inode() will then take care of itself.
Signed-off-by: Steven P. Nickel <snickel@focusinfo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Commit 36350462 removed unused quota support from UDF. As an unfortunate
sideeffect it also removed updates of i_blocks so all files had i_block == 0.
Fix the problem by returning updates of file space back to UDF allocation and
freeing functions.
Signed-off-by: Jan Kara <jack@suse.cz>
dqptr_sem can be called from slab reclaim. tty layer uses GFP_KERNEL mask for
allocation so it can end up calling slab reclaim. Given quota code can call
into tty layer to print warning this creates possibility for lock inversion
between tty->atomic_write_lock and dqptr_sem.
Using direct printing of warnings from quota layer is obsolete but since it's
easy enough to change quota code to not hold any locks when printing warnings,
let's just do it. It seems like a good thing to do even when we use netlink
layer to transmit warnings to userspace.
Reported-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Jan Kara <jack@suse.cz>
In accordance with ECMA 1.67 Part 4, 14.9.15, the checkpoint field
should be initialized to 1 at creation. (Zero is *not* a valid value.)
Signed-off-by: Steven P. Nickel <snickel@focusinfo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently ext3 updates ctime in ext3_splice_branch() which is called whenever
we allocate one block. But it is wasteful because ext3 doesn't support
nanosecond timestamp. This leads to a performance loss.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
dquot_free_block() is called in the end of ext3_new_blocks() and updates
information of the inode structure. However, this update is not necessary
if the number of blocks we requested is equal to the number of
allocated blocks.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Per call site OOM messages are unnecessary.
k.alloc and v.alloc failures use dump_stack().
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Now that XFS takes quota reservations into account there is no need to flush
anything before reporting quotas - in addition to beeing fully transactional
all quota information is also 100% coherent with the rest of the filesystem
now.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Report all quota usage including the currently pending reservations. This
avoids the need to flush delalloc space before gathering quota information,
and matches quota enforcement, which already takes the reservations into
account.
This fixes xfstests 270.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
The is no good reason to have these two separate, and for the next change
we would need the full struct xfs_dquot in xfs_qm_export_dquot, so better
just fold the code now instead of changing it spuriously.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
It is useless and confusing and may make people believe they may just
change it, which is not true, because this will also change the on-flash
format.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
This patch changes the 'i_nlink' counter handling in 'ubifs_unlink()',
'ubifs_rmdir()' and 'ubifs_rename()'. In these function 'i_nlink' may become 0,
and if 'ubifs_jnl_update()' failed, we would use 'inc_nlink()' to restore
the previous 'i_nlink' value, which is incorrect from the VFS point of view and
would cause a 'WARN_ON()' (see 'inc_nlink() implementation).
This patches saves the previous 'i_nlink' value in a local variable and uses it
at the error path instead of calling 'inc_nlink()'. We do this only for the
inodes where 'i_nlink' may potentially become zero.
This change has been requested by Al Viro <viro@ZenIV.linux.org.uk>.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Most of the time we use the dumping function to dump something in case
of error. We use 'KERN_DEBUG' printk level, and the drawback is that users
do not see them in the console, while they see the other error messages
in the console. The result is that they send bug reports which does not
contain a lot of useful information. This patch changes the printk level
of the dump functions to 'KERN_ERR' to correct the situation.
I documented it in the MTD web site that people have to send the 'dmesg' output
when submitting bug reposts - it did not help.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>