A recent optimization had left private uninitialized.
Fixes: 2bc4ca9bb6 ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates. However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops. Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.
Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.
If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.
We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.
For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc. in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.
To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.
Fixes: 40214d128e ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull btrfs fixes from David Sterba:
- regression fix: transaction commit can run away due to delayed ref
waiting heuristic, this is not necessary now because of the proper
reservation mechanism introduced in 5.0
- regression fix: potential crash due to use-before-check of an ERR_PTR
return value
- fix for transaction abort during transaction commit that needs to
properly clean up pending block groups
- fix deadlock during b-tree node/leaf splitting, when this happens on
some of the fundamental trees, we must prevent new tree block
allocation to re-enter indirectly via the block group flushing path
- potential memory leak after errors during mount
* tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: On error always free subvol_name in btrfs_mount
btrfs: clean up pending block groups when transaction commit aborts
btrfs: fix potential oops in device_list_add
btrfs: don't end the transaction for delayed refs in throttle
Btrfs: fix deadlock when allocating tree block during leaf/node split
Merge misc fixes from Andrew Morton:
"24 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
autofs: fix error return in autofs_fill_super()
autofs: drop dentry reference only when it is never used
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
psi: clarify the Kconfig text for the default-disable option
mm, memory_hotplug: __offline_pages fix wrong locking
mm: hwpoison: use do_send_sig_info() instead of force_sig()
kasan: mark file common so ftrace doesn't trace it
init/Kconfig: fix grammar by moving a closing parenthesis
lib/test_kmod.c: potential double free in error handling
mm, oom: fix use-after-free in oom_kill_process
mm/hotplug: invalid PFNs from pfn_to_online_page()
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
psi: fix aggregation idle shut-off
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
oom, oom_reaper: do not enqueue same task twice
mm: migrate: make buffer_migrate_page_norefs() actually succeed
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
x86_64: increase stack size for KASAN_EXTRA
...
Pull smb3 fixes from Steve French:
"SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a
particularly important one for queryxattr (see xfstests 70 and 117)"
* tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
CIFS: fix use-after-free of the lease keys
CIFS: Do not consider -ENODATA as stat failure for reads
CIFS: Do not count -ENODATA as failure for query directory
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Fix possible oops and memory leaks in async IO
cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
cifs: fix computation for MAX_SMB2_HDR_SIZE
Pull iomap fixes from Darrick Wong:
"A couple of iomap fixes to eliminate some memory corruption and hang
problems that were reported:
- fix page migration when using iomap for pagecache management
- fix a use-after-free bug in the directio code"
* tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fix a use after free in iomap_dio_rw
iomap: get/put the page in iomap_page_create/release()
On ppc64le, When a string with PAGE_SIZE - 1 (i.e. 64k-1) length is
passed as a "filesystem type" argument to the mount(2) syscall,
copy_mount_string() ends up allocating 64k (the PAGE_SIZE on ppc64le)
worth of space for holding the string in kernel's address space.
Later, in set_precision() (invoked by get_fs_type() ->
__request_module() -> vsnprintf()), we end up assigning
strlen(fs-type-string) i.e. 65535 as the
value to 'struct printf_spec'->precision member. This field has a width
of 16 bits and it is a signed data type. Hence an invalid value ends
up getting assigned. This causes the "WARN_ONCE(spec->precision != prec,
"precision %d too large", prec)" statement inside set_precision() to be
executed.
This commit fixes the bug by limiting the length of the string passed by
copy_mount_string() to strndup_user() to PATH_MAX.
Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.ibm.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit ad211f3e94.
As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case. The
original point of the commit was avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().
The real fix to this problem was discussed here:
https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
The proposed patch was to fix a syzbot complaint, but the problem can
also demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discused in the e-mail thread, but none have
landed in the kernel as of this writing. Anyway, commit
ad211f3e94 is absolutely the wrong way to suppress the lockdep, so
revert it.
Fixes: ad211f3e94 ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported: Jan Kara <jack@suse.cz>
This reverts commit 2d29f6b96d.
It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone. Let's revert this
for now to have more time for a proper fix.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull NFS client fixes from Anna Schumaker:
"This addresses two bugs, one in the error code handling of
nfs_page_async_flush() and one to fix a potential NULL pointer
dereference in nfs_parse_devname().
Stable bugfix:
- Fix up return value on fatal errors in nfs_page_async_flush()
Other bugfix:
- Fix NULL pointer dereference of dev_name"
* tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix up return value on fatal errors in nfs_page_async_flush()
nfs: Fix NULL pointer dereference of dev_name
The request buffers are freed right before copying the pointers.
Use the func args instead which are identical and still valid.
Simple reproducer (requires KASAN enabled) on a cifs mount:
echo foo > foo ; tail -f foo & rm foo
Cc: <stable@vger.kernel.org> # 4.20
Fixes: 179e44d49c ("smb3: add tracepoint for sending lease break responses to server")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.
As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.
This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file. The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.
The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.
Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension. They were not replacements of pre-existing fields.
So no sane applications that read the value of /proc/sys/fs/dentry-state
will do dummy thing if the last 2 fields of the sysctl parameter are not
zero. IOW, it will be safe to use one of the dummy array entry for
negative dentry count.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
Fixes: 4e717f5c10 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol. Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.
Fixes: 312c89fbca ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.
Fixes: d1a6300282 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.
Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Allocation of a page array for non-cached IO was separated from
allocation of rdata and wdata structures and this introduced memory
leaks and a possible null pointer dereference. This patch fixes
these problems.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
minus the various headers and blobs that will be part of the reply.
or else we might trigger a session reconnect.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
When an error happens, debugfs should return an error pointer value, not
NULL. This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).
So, to make everyone happy, always return errors, this makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.
Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a NULL pointer dereference of dev_name in nfs_parse_devname()
The oops looks something like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
...
Call Trace:
? ida_alloc_range+0x34b/0x3d0
? nfs_clone_super+0x80/0x80 [nfs]
? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
mount_fs+0x52/0x170
? __init_waitqueue_head+0x3b/0x50
vfs_kern_mount+0x6b/0x170
do_mount+0x216/0xdc0
ksys_mount+0x83/0xd0
__x64_sys_mount+0x25/0x30
do_syscall_64+0x65/0x220
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fix this by adding a NULL check on dev_name
Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Previously callers to btrfs_end_transaction_throttle() would commit the
transaction if there wasn't enough delayed refs space. This happens in
relocation, and if the fs is relatively empty we'll run out of delayed
refs space basically immediately, so we'll just be stuck in this loop of
committing the transaction over and over again.
This code existed because we didn't have a good feedback mechanism for
running delayed refs, but with the delayed refs rsv we do now. Delete
this throttling code and let the btrfs_start_transaction() in relocation
deal with putting pressure on the delayed refs infrastructure. With
this patch we no longer take 5 minutes to balance a metadata only fs.
Qu has submitted a fstest to catch slow balance or excessive transaction
commits. Steps to reproduce:
* create subvolume
* create many (eg. 16000) inlined files, of size 2KiB
* iteratively snapshot and touch several files to trigger metadata
updates
* start balance -m
Reported-by: Qu Wenruo <wqu@suse.com>
Fixes: 64403612b7 ("btrfs: rework btrfs_check_space_for_delayed_refs")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add tags and steps to reproduce ]
Signed-off-by: David Sterba <dsterba@suse.com>
When splitting a leaf or node from one of the trees that are modified when
flushing pending block groups (extent, chunk, device and free space trees),
we need to allocate a new tree block, which in turn can result in the need
to allocate a new block group. After allocating the new block group we may
need to flush new block groups that were previously allocated during the
course of the current transaction, which is what may cause a deadlock due
to attempts to write lock twice the same leaf or node, as when splitting
a leaf or node we are holding a write lock on it and its parent node.
The same type of deadlock can also happen when increasing the tree's
height, since we are holding a lock on the existing root while allocating
the tree block to use as the new root node.
An example trace when the deadlock happens during the leaf split path is:
[27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G W 4.19.16 #1
[27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
[27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
(...)
[27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
[27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
[27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
[27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
[27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
[27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
[27175.303601] FS: 0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
[27175.304468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
[27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[27175.307940] Call Trace:
[27175.308802] btrfs_search_slot+0x779/0x9a0 [btrfs]
[27175.309669] ? update_space_info+0xba/0xe0 [btrfs]
[27175.310534] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[27175.311397] btrfs_insert_item+0x60/0xd0 [btrfs]
[27175.312253] btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
[27175.313116] do_chunk_alloc+0x25f/0x300 [btrfs]
[27175.313984] find_free_extent+0x706/0x10d0 [btrfs]
[27175.314855] btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
[27175.315707] btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
[27175.316548] split_leaf+0x130/0x610 [btrfs]
[27175.317390] btrfs_search_slot+0x94d/0x9a0 [btrfs]
[27175.318235] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[27175.319087] alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
[27175.319938] __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
[27175.320792] btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
[27175.321643] delayed_ref_async_start+0x81/0x90 [btrfs]
[27175.322491] normal_work_helper+0xd0/0x320 [btrfs]
[27175.323328] ? move_linked_works+0x6e/0xa0
[27175.324160] process_one_work+0x191/0x370
[27175.324976] worker_thread+0x4f/0x3b0
[27175.325763] kthread+0xf8/0x130
[27175.326531] ? rescuer_thread+0x320/0x320
[27175.327284] ? kthread_create_worker_on_cpu+0x50/0x50
[27175.328027] ret_from_fork+0x35/0x40
[27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---
Fix this by preventing the flushing of new blocks groups when splitting a
leaf/node and when inserting a new root node for one of the trees modified
by the flushing operation, similar to what is done when COWing a node/leaf
from on of these trees.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383
Reported-by: Eli V <eliventer@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a local wait_for_completion variable to avoid an access to the
potentially freed dio struture after dropping the last reference count.
Also use the chance to document the completion behavior to make the
refcounting clear to the reader of the code.
Fixes: ff6a9292e6 ("iomap: implement direct I/O")
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
migrate_page_move_mapping() expects pages with private data set to have
a page_count elevated by 1. This is what used to happen for xfs through
the buffer_heads code before the switch to iomap in commit 82cb14175e
("xfs: add support for sub-pagesize writeback without buffer_heads").
Not having the count elevated causes move_pages() to fail on memory
mapped files coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it
in iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is out
of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though.
Fixes: 82cb14175e ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
[hch: actually get/put the page iomap_migrate_page() to make it work
properly]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull smb3 fixes from Steve French:
"A set of small smb3 fixes, some fixing various crediting issues
discovered during xfstest runs, five for stable"
* tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData
smb3: add credits we receive from oplock/break PDUs
CIFS: Fix mounts if the client is low on credits
CIFS: Do not assume one credit for async responses
CIFS: Fix credit calculations in compound mid callback
CIFS: Fix credit calculation for encrypted reads with errors
CIFS: Fix credits calculations for reads with errors
CIFS: Do not reconnect TCP session in add_credits()
smb3: Cleanup license mess
CIFS: Fix possible hang during async MTU reads and writes
cifs: fix memory leak of an allocated cifs_ntsd structure
Pull block fixes from Jens Axboe:
"A collection of fixes for this release. This contains:
- Silence sparse rightfully complaining about non-static wbt
functions (Bart)
- Fixes for the zoned comments/ioctl documentation (Damien)
- direct-io fix that's been lingering for a while (Ernesto)
- cgroup writeback fix (Tejun)
- Set of NVMe patches for nvme-rdma/tcp (Sagi, Hannes, Raju)
- Block recursion tracking fix (Ming)
- Fix debugfs command flag naming for a few flags (Jianchao)"
* tag 'for-linus-20190125' of git://git.kernel.dk/linux-block:
block: Fix comment typo
uapi: fix ioctl documentation
blk-wbt: Declare local functions static
blk-mq: fix the cmd_flag_name array
nvme-multipath: drop optimization for static ANA group IDs
nvmet-rdma: fix null dereference under heavy load
nvme-rdma: rework queue maps handling
nvme-tcp: fix timeout handler
nvme-rdma: fix timeout handler
writeback: synchronize sync(2) against cgroup writeback membership switches
block: cover another queue enter recursion via BIO_QUEUE_ENTERED
direct-io: allow direct writes to empty inodes
debugfs_rename() needs to check that the dentries passed into it really
are valid, as sometimes they are not (i.e. if the return value of
another debugfs call is passed into this one.) So fix this up by
properly checking if the two parent directories are errors (they are
allowed to be NULL), and if the dentry to rename is not NULL or an
error.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CRYPTO_TFM_REQ_WEAK_KEY confuses newcomers to the crypto API because it
sounds like it is requesting a weak key. Actually, it is requesting
that weak keys be forbidden (for algorithms that have the notion of
"weak keys"; currently only DES and XTS do).
Also it is only one letter away from CRYPTO_TFM_RES_WEAK_KEY, with which
it can be easily confused. (This in fact happened in the UX500 driver,
though just in some debugging messages.)
Therefore, make the intent clear by renaming it to
CRYPTO_TFM_REQ_FORBID_WEAK_KEYS.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the server doesn't grant us at least 3 credits during the mount
we won't be able to complete it because query path info operation
requires 3 credits. Use the cached file handle if possible to allow
the mount to succeed.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If we don't receive a response we can't assume that the server
granted one credit. Assume zero credits in such cases.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The current code doesn't do proper accounting for credits
in SMB1 case: it adds one credit per response only if we get
a complete response while it needs to return it unconditionally.
Fix this and also include malformed responses for SMB2+ into
accounting for credits because such responses have Credit
Granted field, thus nothing prevents to get a proper credit
value from them.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently we mark MID as malformed if we get an error from server
in a read response. This leads to not properly processing credits
in the readv callback. Fix this by marking such a response as
normal received response and process it appropriately.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:
mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().
Fix this by avoiding to call cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>