Reimplement function hole_size based on a generic function for walking
the metadata tree and rename hole_size to gfs2_hole_size. While
previously, multiple invocations of hole_size were sometimes needed to
walk across the entire hole, the new implementation always returns the
entire hole at once (provided that the caller is interested in the total
size).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Function gfs2_free_extlen calculates the length of an extent of
free blocks that may be reserved. The end pointer was calculated as
end = start + bh->b_size but b_size is incorrect because the
bitmap usually stops prior to the end of the buffer data on
the last bitmap.
What this means is that when you do a write, you can reserve a
chunk of blocks that runs off the end of the last bitmap. For
example, I've got a file system where there is only one bitmap
for each rgrp, so ri_length==1. I saw cases in which iozone
tried to do a big write, grabbed a large block reservation,
chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188.
So 5464153 + 8188 = 5472341 which is the end of the rgrp.
When it grabbed a reservation it got back: 5470936, length 7229.
But 5470936 + 7229 = 5478165. So the reservation starts inside
the rgrp but runs 5824 blocks past the end of the bitmap.
This patch fixes the calculation so it won't exceed the last
bitmap. It also adds a BUG_ON to guard against overflows in the
future.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch function gfs2_write_begin, upon discovering an
error, called gfs2_trim_blocks while the rgrp glock was still held.
That's because gfs2_inplace_release is not called until later.
This patch reorganizes the logic a bit so gfs2_inplace_release
is called to release the lock prior to the call to gfs2_trim_blocks,
thus preventing the glock recursion.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
The heads of tha AGI unlinked list are only scanned on debug
kernels when the verifier runs. Change that to always scan the heads
and validate that the inode numbers are valid.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull vfs fixes from Al Viro.
- fix io_destroy()/aio_complete() race
- the vfs_open() change to get rid of open_check_o_direct() boilerplate
was nice, but buggy. Al has a patch avoiding a revert, but that's
definitely not a last-day fodder, so for now revert it is...
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Revert "fs: fold open_check_o_direct into do_dentry_open"
fix io_destroy()/aio_complete() race
This reverts commit cab64df194.
Having vfs_open() in some cases drop the reference to
struct file combined with
error = vfs_open(path, f, cred);
if (error) {
put_filp(f);
return ERR_PTR(error);
}
return f;
is flat-out wrong. It used to be
error = vfs_open(path, f, cred);
if (!error) {
/* from now on we need fput() to dispose of f */
error = open_check_o_direct(f);
if (error) {
fput(f);
f = ERR_PTR(error);
}
} else {
put_filp(f);
f = ERR_PTR(error);
}
and sure, having that open_check_o_direct() boilerplate gotten rid of is
nice, but not that way...
Worse, another call chain (via finish_open()) is FUBAR now wrt
FILE_OPENED handling - in that case we get error returned, with file
already hit by fput() *AND* FILE_OPENED not set. Guess what happens in
path_openat(), when it hits
if (!(opened & FILE_OPENED)) {
BUG_ON(!error);
put_filp(file);
}
The root cause of all that crap is that the callers of do_dentry_open()
have no way to tell which way did it fail; while that could be fixed up
(by passing something like int *opened to do_dentry_open() and have it
marked if we'd called ->open()), it's probably much too late in the
cycle to do so right now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
It does not return an error, so we don't need to check the return value
for IS_ERR(). Indeed, it is a bug to do so; with a sufficiently large
PFN, a legitimate DAX entry may be mistaken for an error return.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add a function to allocate wdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages that
point to the data buffer to write to.
wdata is reponsible for free those pages after it's done.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
With offset defined in rdata, transport functions need to look at this
offset when reading data into the correct places in pages.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Add a function to allocate rdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages
that point to the data buffer.
rdata is still reponsible for free those pages after it's done.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull rdma fixes from Jason Gunthorpe:
"Just three small last minute regressions that were found in the last
week. The Broadcom fix is a bit big for rc7, but since it is fixing
driver crash regressions that were merged via netdev into rc1, I am
sending it.
- bnxt netdev changes merged this cycle caused the bnxt RDMA driver
to crash under certain situations
- Arnd found (several, unfortunately) kconfig problems with the
patches adding INFINIBAND_ADDR_TRANS. Reverting this last part,
will fix it more fully outside -rc.
- Subtle change in error code for a uapi function caused breakage in
userspace. This was bug was subtly introduced cycle"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/core: Fix error code for invalid GID entry
IB: Revert "remove redundant INFINIBAND kconfig dependencies"
RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
We only call into this function through the iomap iterators, so we already
know the buffer is unwritten. In addition to that we always require the
uptodate flag that is ORed with the result anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This function is only used by the iomap code, depends on being called
from it, and will soon stop poking into buffer head internals.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Switch to the iomap based bmap implementation to get rid of one of the
last users of xfs_get_blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This adds a simple iomap-based implementation of the legacy ->bmap
interface. Note that we can't easily add checks for rt or reflink
files, so these will have to remain in the callers. This interface
just needs to die..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Factor the repeated calculation of the on-disk sector for a given logical
block into a littler helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We don't need any merging logic, and this also replaces a BUG_ON with a
WARN_ON_ONCE inside __bio_add_page for the impossible overflow condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Just define a range of fs specific flags and use that in gfs2 instead of
exposing this internal flag globally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address. So instead of having a flag for it
it should be an entirely separate iomap range type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
get_bufmap_init is used in the out-of-tree module, but was left in
the upstream version as an oversight. Tip-of-the-hat to sparse and Al Viro.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
As long as a symlink inode remains in-core, the destination (and
therefore size) will not be re-fetched from the server, as it cannot
change. The original implementation of the attribute cache assumed that
setting the expiry time in the past was sufficient to cause a re-fetch
of all attributes on the next getattr. That does not work in this case.
The bug manifested itself as follows. When the command sequence
touch foo; ln -s foo bar; ls -l bar
is run, the output was
lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo
However, after a re-mount, ls -l bar produces
lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo
After this commit, even before a re-mount, the output is
lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo
Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 71680c18c8 ("orangefs: Cache getattr results.")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
The struct orangefs_file_vm_ops is local to the source and does not need
to be in global scope, so make it static.
Cleans up sparse warning:
fs/orangefs/file.c:547:35: warning: symbol 'orangefs_file_vm_ops' was not
declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Now the superblock block size is PAGE_SIZE. The inode block size is
PAGE_SIZE for directories and symlinks, but is the server-reported
block size for regular files.
The block size in the OrangeFS private inode is now deleted. Stat
now reports PAGE_SIZE for directories and symlinks and the
server-reported block size for regular files.
The user-space visible change is that the block size for directores
and symlinks and the superblock is now PAGE_SIZE rather than the size of
the client-core shared memory buffers, which was typically four
megabytes.
Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: hubcap@omnibond.com
Cc: walt@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
generic/475 fired an assert failure just after the filesystem was
shut down:
XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_refcount.c, line: 182
.....
Call Trace:
xfs_refcount_insert+0x151/0x190
xfs_refcount_adjust_extents.constprop.11+0x9c/0x470
xfs_refcount_adjust.constprop.10+0xb0/0x270
xfs_refcount_finish_one+0x25a/0x420
xfs_trans_log_finish_refcount_update+0x2a/0x40
xfs_refcount_update_finish_item+0x35/0xa0
xfs_defer_finish+0x15e/0x4d0
xfs_reflink_remap_extent+0x1bc/0x610
xfs_reflink_remap_blocks+0x6e/0x280
xfs_reflink_remap_range+0x311/0x530
vfs_clone_file_range+0x119/0x200
....
If xfs_btree_insert() returns an error, the corruption check fires
instead of passing the error back the caller. The corruption check
should be after we've checked for an error, not before, thereby
avoiding assert failures if the filesystem shuts down during a
refcount btree record insert.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All the realtime allocation functions deal with space on the rtdev in
units of realtime extents. However, struct xfs_rtalloc_rec confusingly
uses the word 'block' in the name, even though they're really extents.
Fix the naming problem and fix all the unit handling problems in the two
existing users.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Strengthen the rtalloc range query checks to make sure that the keys do
not run off the end of the realtime device inappropriately. Note that
the query range functions require units of rt extents, not blocks,
despite the type name.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
The xfs_rtbuf_get function should check the block mapping it gets back
from bmapi_read. If there are no mappings or the mapping isn't a real
extent, we should return -EFSCORRUPTED rather than trying to read a
garbage value. We also require realtime bitmap blocks to be real,
written allocations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
xfs_rtword_t is used for bit manipulations in the realtime bitmap file.
Since we're performing bit shifts with this type, we don't want sign
extension and we don't want to be left shifting negative quantities
because that's undefined behavior.
This also shuts up these UBSAN warnings:
UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_rtbitmap.c:833:48
signed integer overflow:
-2147483648 - 1 cannot be represented in type 'int'
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Since header_preamble_size is 0 for SMB2+ we can remove it in those
code paths that are only invoked from SMB2.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
struct smb2_hdr is now just a wrapper for smb2_sync_hdr.
We can thus get rid of smb2_hdr completely and access the sync header directly.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb2_hdr is just a wrapper around smb2_sync_hdr at this stage
and smb2_hdr is going away.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The two structures smb2_oplock_breaq_req/rsp are now basically identical.
Replace this with a single definition of a smb2_oplock_break structure.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Separate out all the 4 byte rfc1002 headers so that they are no longer
part of the SMB2 header structures to prepare for future work to add
compounding support.
Update the smb3 transform header processing that we no longer have
a rfc1002 header at the start of this structure.
Update smb2_readv_callback to accommodate that the first iovector in the
response is no the smb2 header and no longer a rfc1002 header.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The server detects reconnect by the (non-zero) value in PreviousSessionId
of SMB2/SMB3 SessionSetup request, but this behavior regressed due
to commit 166cea4dc3
("SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup")
CC: Stable <stable@vger.kernel.org>
CC: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Pull xfs fix from Darrick Wong:
"Clear out i_mapping error state when we're reinitializing inodes.
This last minute fix prevents writeback error state from persisting
past the end of the in-core inode lifecycle and causing EIO errors to
be reported to userspace when no error has occurred.
This fix for the behavioral regression has been soaking in for-next
for a while, but various fs developers persuaded me to try to get it
upstream for 4.17 because the patch that broke things was introduced
in 4.17-rc4"
* tag 'xfs-4.17-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: clear writeback errors in inode_init_always
If the server recalls the layout that was just handed out, we risk hitting
a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we
release the sequence slot after processing the LAYOUTGET operation that
was sent as part of the OPEN compound.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the layoutget on open call failed, we can't really commit the inode,
so don't bother calling it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>