Commit 402cb8dda9 ("fscache: Attach the index key and aux data to
the cookie") added the index_key and index_key_len parameters to
fscache_acquire_cookie(), and updated the callers in the NFS client.
One of the callers was inside nfs_fscache_get_super_cookie()
and was changed to use the full struct nfs_fscache_key as the
index_key. However, a couple members of this structure contain
pointers and thus will change each time the same NFS share is
remounted. Since index_key is used for fscache_cookie->key_hash
and this subsequently is used to compare cookies, the effectiveness
of fscache with NFS is reduced to the point at which a umount
occurs. Any subsequent remount of the same share will cause a
unique NFS super_block index_key and key_hash to be generated for
the same data, rendering any prior fscache data unable to be
found. A simple reproducer demonstrates the problem.
1. Mount share with 'fsc', create a file, drop page cache
systemctl start cachefilesd
mount -o vers=3,fsc 127.0.0.1:/export /mnt
dd if=/dev/zero of=/mnt/file1.bin bs=4096 count=1
echo 3 > /proc/sys/vm/drop_caches
2. Read file into page cache and fscache, then unmount
dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1
umount /mnt
3. Remount and re-read which should come from fscache
mount -o vers=3,fsc 127.0.0.1:/export /mnt
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1
4. Check for READ ops in mountstats - there should be none
grep READ: /proc/self/mountstats
Looking at the history and the removed function, nfs_super_get_key(),
we should only use nfs_fscache_key.key plus any uniquifier, for
the fscache index_key.
Fixes: 402cb8dda9 ("fscache: Attach the index key and aux data to the cookie")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
This reverts commit df5db5f9ee.
This patch fixes a regression: patch df5db5f9ee allowed function
run_queue() to bypass its call to do_xmote() if revokes were queued for
the glock. That's wrong because its call to do_xmote() is what is
responsible for calling the go_sync() glops functions to sync both
the ail list and any revokes queued for it. By bypassing the call,
gfs2 could get into a stand-off where the glock could not be demoted
until its revokes are written back, but the revokes would not be
written back because do_xmote() was never called.
It "sort of" works, however, because there are other mechanisms like
the log flush daemon (logd) that can sync the ail items and revokes,
if it deems it necessary. The problem is: without file system pressure,
it might never deem it necessary.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, if the go_sync operation returned an error during
the do_xmote process (such as unable to sync metadata to the journal)
the code did goto out. That kept the glock locked, so it could not be
given away, which correctly avoids file system corruption. However,
it never set the withdraw bit or requeueing the glock work. So it would
hang forever, unable to ever demote the glock.
This patch changes to goto to a new label, skip_inval, so that errors
from go_sync are treated the same way as errors from go_inval:
The delayed withdraw bit is set and the work is requeued. That way,
the logd should eventually figure out there's a problem and withdraw
properly there.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull ceph fixes from Ilya Dryomov:
"Fixes for an endianness handling bug that prevented mounts on
big-endian arches, a spammy log message and a couple error paths.
Also included a MAINTAINERS update"
* tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client:
ceph: demote quotarealm lookup warning to a debug message
MAINTAINERS: remove myself as ceph co-maintainer
ceph: fix double unlock in handle_cap_export()
ceph: fix special error code in ceph_try_get_caps()
ceph: fix endianness bug when handling MDS session feature bits
This patch rearranges gfs2_add_revoke so that the extra glock
reference is added earlier on in the function to avoid races in which
the glock is freed before the new reference is taken.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, function gfs2_quota_unlock checked if quotas are
turned off, and if so, it branched to label out, which called
gfs2_quota_unhold. With the new system of gfs2_qa_get and put, we
no longer want to call gfs2_quota_unhold or we won't balance our
gets and puts.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, function gfs2_quota_lock checked if it was called
from a privileged user, and if so, it bypassed the quota check:
superuser can operate outside the quotas.
That's the wrong place for the check because the lock/unlock functions
are separate from the lock_check function, and you can do lock and
unlock without actually checking the quotas.
This patch moves the check to gfs2_quota_lock_check.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch removes a check from gfs2_quota_check for whether quotas
are enabled by the superblock. There is a test just prior for the
GIF_QD_LOCKED bit in the inode, and that can only be set by functions
that already check that quotas are enabled in the superblock.
Therefore, the check is redundant.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, gfs2_quota_change() would BUG_ON if the
qa_ref counter was not a positive number. This patch changes it to
be a withdraw instead. That way we can debug things more easily.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch fixes a couple of places in which gfs2_qa_get and gfs2_qa_put are
not balanced: we now keep references around whenever a file is open for writing
(see gfs2_open_common and gfs2_release), so we need to put all references we
grab in function gfs2_create_inode. This was broken in the successful case and
on one error path.
This also means that we don't have a reference to put in gfs2_evict_inode.
In addition, gfs2_qa_put was called for the wrong inode in gfs2_link.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull driver core fixes from Greg KH:
"Here are a number of small driver core fixes for 5.7-rc5 to resolve a
bunch of reported issues with the current tree.
Biggest here are the reverts and patches from John Stultz to resolve a
bunch of deferred probe regressions we have been seeing in 5.7-rc
right now.
Along with those are some other smaller fixes:
- coredump crash fix
- devlink fix for when permissive mode was enabled
- amba and platform device dma_parms fixes
- component error silenced for when deferred probe happens
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
regulator: Revert "Use driver_deferred_probe_timeout for regulator_init_complete_work"
driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings
driver core: Revert default driver_deferred_probe_timeout value to 0
component: Silence bind error on -EPROBE_DEFER
driver core: Fix handling of fw_devlink=permissive
coredump: fix crash when umh is disabled
amba: Initialize dma_parms for amba devices
driver core: platform: Initialize dma_parms for platform devices
The random buffer write failure errortag patch introduced a local
mount pointer variable for the test macro, but the macro is compiled
out on !DEBUG kernels. This results in an unused variable warning.
Access the mount structure through the buffer pointer and remove the
local mount pointer to address the warning.
Fixes: 7376d74547 ("xfs: random buffer write failure errortag")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the helpers that handle incore buffer cancellation records to
xfs_buf_item_recover.c since they're not directly related to the main
log recovery machinery. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The only purpose of XFS_LI_RECOVERED is to prevent log recovery from
trying to replay recovered intents more than once. Therefore, we can
move the bit setting up to the ->iop_recover caller.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Now that we've made the recovered item tests all the same, we can hoist
the test and the ail locking code to the ->iop_recover caller and call
the recovery function directly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Rename XFS_{EFI,BUI,RUI,CUI}_RECOVERED to XFS_LI_RECOVERED so that we
track recovery status in the log item, then get rid of the now unused
flags fields in each of those log item types.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
During recovery, every intent that we recover from the log has to be
added to the AIL. Replace the open-coded addition with a helper.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished. We add a new
->iop_match method to decide if an intent item matches a supplied ID.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Now that we've finished converting all types of log intent items to
provide an ->iop_recover function, we can convert the "is this an intent
item?" predicate to look for a non-null iop_recover pointer.
Move the predicate closer to the functions that use it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Quotaoff doesn't actually do anything, so take advantage of the
commit_pass2 pointer being optional and get rid of the switch
statement clause.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the bmap update intent and intent-done pass2 commit code into the
per-item source code files and use dispatch functions to call them. We
do these one at a time because there's a lot of code to move. No
functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the refcount update intent and intent-done pass2 commit code into
the per-item source code files and use dispatch functions to call them.
We do these one at a time because there's a lot of code to move. No
functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the rmap update intent and intent-done pass2 commit code into the
per-item source code files and use dispatch functions to call them. We
do these one at a time because there's a lot of code to move. No
functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the extent free intent and intent-done pass2 commit code into the
per-item source code files and use dispatch functions to call them. We
do these one at a time because there's a lot of code to move. No
functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the log icreate item pass2 commit code into the per-item source code
files and use the dispatch function to call it. We do these one at a
time because there's a lot of code to move. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the log dquot item pass2 commit code into the per-item source code
files and use the dispatch function to call it. We do these one at a
time because there's a lot of code to move. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the log inode item pass2 commit code into the per-item source code
files and use the dispatch function to call it. We do these one at a
time because there's a lot of code to move. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the log buffer item pass2 commit code into the per-item source code
files and use the dispatch function to call it. We do these one at a
time because there's a lot of code to move. No functional changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the pass1 commit code into the per-item source code files and use
the dispatch function to call them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the pass2 readhead code into the per-item source code files and use
the dispatch function to call them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a generic dispatch structure to delegate recovery of different
log item types into various code modules. This will enable us to move
code specific to a particular log item type out of xfs_log_recover.c and
into the log item source.
The first operation we virtualize is the log item sorting.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Merge misc fixes from Andrew Morton:
"14 fixes and one selftest to verify the ipc fixes herein"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: limit boost_watermark on small zones
ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
epoll: atomically remove wait entry on wake up
kselftests: introduce new epoll60 testcase for catching lost wakeups
percpu: make pcpu_alloc() aware of current gfp context
mm/slub: fix incorrect interpretation of s->offset
scripts/gdb: repair rb_first() and rb_last()
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
scripts/decodecode: fix trapping instruction formatting
kernel/kcov.c: fix typos in kcov_remote_start documentation
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
mm, memcg: fix error return value of mem_cgroup_css_alloc()
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
to support bmap() on compressed inode: if queried block locates in
non-compressed cluster, return its physical block address.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Map normal/compressed cluster of compressed inode correctly, and give
the right fiemap flag FIEMAP_EXTENT_ENCODED on mapped compressed extent.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch corrects the SPDX License Identifier style in
header files related to F2FS File System support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used).
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It turns out that when extending an existing bio, gfs2_find_jhead fails to
check if the block number is consecutive, which leads to incorrect reads for
fragmented journals.
In addition, limit the maximum bio size to an arbitrary value of 2 megabytes:
since commit 07173c3ec2 ("block: enable multipage bvecs"), if we just keep
adding pages until bio_add_page fails, bios will grow much larger than useful,
which pins more memory than necessary with barely any additional performance
gains.
Fixes: f4686c26ec ("gfs2: read journal in large chunks")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Make sure we don't walk past the end of the metadata in gfs2_walk_metadata: the
inode holds fewer pointers than indirect blocks.
Slightly clean up gfs2_iomap_get.
Fixes: a27a0c9b6a ("gfs2: gfs2_walk_metadata fix")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
When the gfs2_logd daemon withdrew, the withdraw sequence called
into make_fs_ro() to make the file system read-only. That caused the
journal descriptors to be freed. However, those journal descriptors
were used by gfs2_logd's call to gfs2_ail_flush_reqd(). This caused
a use-after free and NULL pointer dereference.
This patch changes function gfs2_logd() so that it stops all logd
work until the thread is told to stop. Once a withdraw is done,
it only does an interruptible sleep.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, when the logd daemon was forced to withdraw, it
would try to request its journal be recovered by another cluster node.
However, in single-user cases with lock_nolock, there are no other
nodes to recover the journal. Function signal_our_withdraw() was
recognizing the lock_nolock situation, but not until after it had
evicted its journal inode. Since the journal descriptor that points
to the inode was never removed from the master list, when the unmount
occurred, it did another iput on the evicted inode, which resulted in
a BUG_ON(inode->i_state & I_CLEAR).
This patch moves the check for this situation earlier in function
signal_our_withdraw(), which avoids the extra iput, so the unmount
may happen normally.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>