Commit Graph

64081 Commits

Author SHA1 Message Date
Jens Axboe
08a1d26eb8 io_uring: ensure openat sets O_LARGEFILE if needed
OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the
OPENAT opcode. Dmitry reports that his test case that compares openat()
and IORING_OP_OPENAT sees failures on large files:

*** sync openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write succeeded

*** sync openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write succeeded

*** io_uring openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write failed: File too large

*** io_uring openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write failed: File too large

Ensure we set O_LARGEFILE, if force_o_largefile() is true.

Cc: stable@vger.kernel.org # v5.6
Fixes: 15b71abe7b ("io_uring: add support for IORING_OP_OPENAT")
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-08 09:20:54 -06:00
Mike Marshall
0e393a9a8f orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush
Christoph Hellwig noticed that we were doing some unnecessary
work in orangefs_flush:

  orangefs_flush just writes out data on every close(2) call.  There is
  no need to change anything about the dirty state, especially as
  orangefs doesn't treat I_DIRTY_TIMES special in any way.  The code
  seems to come from partially open coding vfs_fsync.

He sent in a patch with the above commit message and also a
patch that was a reversion of another Orangefs patch I had
sent upstream a while ago. I had to fix his reversion patch
so that it would compile which caused his "don't mess with
I_DIRTY_TIMES" patch to fail to apply. So here I have just
remade his patch and applied it after the fixed reversion patch.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2020-04-08 09:39:11 -04:00
Mike Marshall
ec95f1dedc orangefs: get rid of knob code...
Christoph Hellwig sent in a reversion of "orangefs: remember count
when reading." because:

  ->read_iter calls can race with each other and one or
  more ->flush calls. Remove the the scheme to store the read
  count in the file private data as is is completely racy and
  can cause use after free or double free conditions

Christoph's reversion caused Orangefs not to work or to compile. I
added a patch that fixed that, but intel's kbuild test robot pointed
out that sending Christoph's patch followed by my patch upstream, it
would break bisection because of the failure to compile. So I have
combined the reversion plus my patch... here's the commit message
that was in my patch:

  Logically, optimal Orangefs "pages" are 4 megabytes. Reading
  large Orangefs files 4096 bytes at a time is like trying to
  kick a dead whale down the beach. Before Christoph's "Revert
  orangefs: remember count when reading." I tried to give users
  a knob whereby they could, for example, use "count" in
  read(2) or bs with dd(1) to get whatever they considered an
  appropriate amount of bytes at a time from Orangefs and fill
  as many page cache pages as they could at once.

  Without the racy code that Christoph reverted Orangefs won't
  even compile, much less work. So this replaces the logic that
  used the private file data that Christoph reverted with
  a static number of bytes to read from Orangefs.

  I ran tests like the following to determine what a
  reasonable static number of bytes might be:

  dd if=/pvfsmnt/asdf of=/dev/null count=128 bs=4194304
  dd if=/pvfsmnt/asdf of=/dev/null count=256 bs=2097152
  dd if=/pvfsmnt/asdf of=/dev/null count=512 bs=1048576
                            .
                            .
                            .
  dd if=/pvfsmnt/asdf of=/dev/null count=4194304 bs=128

  Reads seem faster using the static number, so my "knob code"
  wasn't just racy, it wasn't even a good idea...

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: kbuild test robot <lkp@intel.com>
2020-04-08 09:38:51 -04:00
Linus Torvalds
63bef48fd6 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a lot more of MM, quite a bit more yet to come: (memcg, pagemap,
   vmalloc, pagealloc, migration, thp, ksm, madvise, virtio,
   userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups)

 - various other subsystems (procfs, misc, MAINTAINERS, bitops, lib,
   checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig,
   ubsan, fault-injection, ipc)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (158 commits)
  ipc/shm.c: make compat_ksys_shmctl() static
  ipc/mqueue.c: fix a brace coding style issue
  lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
  ubsan: include bug type in report header
  kasan: unset panic_on_warn before calling panic()
  ubsan: check panic_on_warn
  drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
  ubsan: split "bounds" checker from other options
  ubsan: add trap instrumentation option
  init/Kconfig: clean up ANON_INODES and old IO schedulers options
  kernel/gcov/fs.c: replace zero-length array with flexible-array member
  gcov: gcc_3_4: replace zero-length array with flexible-array member
  gcov: gcc_4_7: replace zero-length array with flexible-array member
  kernel/kmod.c: fix a typo "assuems" -> "assumes"
  reiserfs: clean up several indentation issues
  kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
  samples/hw_breakpoint: drop use of kallsyms_lookup_name()
  samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
  fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
  fs/binfmt_elf.c: allocate less for static executable
  ...
2020-04-07 14:11:54 -07:00
Linus Torvalds
04de788e61 Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - Fix a page leak in nfs_destroy_unlinked_subrequests()

   - Fix use-after-free issues in nfs_pageio_add_request()

   - Fix new mount code constant_table array definitions

   - finish_automount() requires us to hold 2 refs to the mount record

  Features:
   - Improve the accuracy of telldir/seekdir by using 64-bit cookies
     when possible.

   - Allow one RDMA active connection and several zombie connections to
     prevent blocking if the remote server is unresponsive.

   - Limit the size of the NFS access cache by default

   - Reduce the number of references to credentials that are taken by
     NFS

   - pNFS files and flexfiles drivers now support per-layout segment
     COMMIT lists.

   - Enable partial-file layout segments in the pNFS/flexfiles driver.

   - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type

   - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
     the DS using the layouterror mechanism.

  Bugfixes and cleanups:
   - SUNRPC: Fix krb5p regressions

   - Don't specify NFS version in "UDP not supported" error

   - nfsroot: set tcp as the default transport protocol

   - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

   - alloc_nfs_open_context() must use the file cred when available

   - Fix locking when dereferencing the delegation cred

   - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails

   - Various clean ups of the NFS O_DIRECT commit code

   - Clean up RDMA connect/disconnect

   - Replace zero-length arrays with C99-style flexible arrays"

* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
  NFS: Clean up process of marking inode stale.
  SUNRPC: Don't start a timer on an already queued rpc task
  NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
  NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
  NFS: Beware when dereferencing the delegation cred
  NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
  NFS: finish_automount() requires us to hold 2 refs to the mount record
  NFS: Fix a few constant_table array definitions
  NFS: Try to join page groups before an O_DIRECT retransmission
  NFS: Refactor nfs_lock_and_join_requests()
  NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
  NFS: Clean up nfs_lock_and_join_requests()
  NFS: Remove the redundant function nfs_pgio_has_mirroring()
  NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
  NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
  NFS: Fix use-after-free issues in nfs_pageio_add_request()
  NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
  NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
  NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
  pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
  ...
2020-04-07 13:51:39 -07:00
Linus Torvalds
f40f31cadc Merge tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've mainly focused on fixing bugs and addressing
  issues in recently introduced compression support.

  Enhancement:
   - add zstd support, and set LZ4 by default
   - add ioctl() to show # of compressed blocks
   - show mount time in debugfs
   - replace rwsem with spinlock
   - avoid lock contention in DIO reads

  Some major bug fixes wrt compression:
   - compressed block count
   - memory access and leak
   - remove obsolete fields
   - flag controls

  Other bug fixes and clean ups:
   - fix overflow when handling .flags in inode_info
   - fix SPO issue during resize FS flow
   - fix compression with fsverity enabled
   - potential deadlock when writing compressed pages
   - show missing mount options"

* tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
  f2fs: keep inline_data when compression conversion
  f2fs: fix to disable compression on directory
  f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
  f2fs: switch discard_policy.timeout to bool type
  f2fs: fix to verify tpage before releasing in f2fs_free_dic()
  f2fs: show compression in statx
  f2fs: clean up dic->tpages assignment
  f2fs: compress: support zstd compress algorithm
  f2fs: compress: add .{init,destroy}_decompress_ctx callback
  f2fs: compress: fix to call missing destroy_compress_ctx()
  f2fs: change default compression algorithm
  f2fs: clean up {cic,dic}.ref handling
  f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
  f2fs: xattr.h: Make stub helpers inline
  f2fs: fix to avoid double unlock
  f2fs: fix potential .flags overflow on 32bit architecture
  f2fs: fix NULL pointer dereference in f2fs_verity_work()
  f2fs: fix to clear PG_error if fsverity failed
  f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
  f2fs: don't trigger data flush in foreground operation
  ...
2020-04-07 13:48:26 -07:00
Linus Torvalds
763dede1b2 Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:

 - Fix for memory leaks around UBIFS orphan handling

 - Fix for memory leaks around UBI fastmap

 - Remove zero-length array from ubi-media.h

 - Fix for TNC lookup in UBIFS orphan code

* tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: ubi-media.h: Replace zero-length array with flexible-array member
  ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len
  ubi: fastmap: Only produce the initial anchor PEB when fastmap is used
  ubi: fastmap: Free unused fastmap anchor peb during detach
  ubifs: ubifs_add_orphan: Fix a memory leak bug
  ubifs: ubifs_jnl_write_inode: Fix a memory leak bug
  ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()
2020-04-07 12:40:56 -07:00
Linus Torvalds
762a9f2f01 Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - New mode for time travel, external via virtio

 - Fixes for ubd to make sure no requests can get lost

 - Fixes for vector networking

 - Allow CONFIG_STATIC_LINK only when possible

 - Minor cleanups and fixes

* tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Remove some unnecessary NULL checks in vector_user.c
  um: vector: Avoid NULL ptr deference if transport is unset
  um: Make CONFIG_STATIC_LINK actually static
  um: Implement cpu_relax() as ndelay(1) for time-travel
  um: Implement ndelay/udelay in time-travel mode
  um: Implement time-travel=ext
  um: virtio: Implement VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS
  um: time-travel: Rewrite as an event scheduler
  um: Move timer-internal.h to non-shared
  hostfs: Use kasprintf() instead of fixed buffer formatting
  um: falloc.h needs to be directly included for older libc
  um: ubd: Retry buffer read on any kind of error
  um: ubd: Prevent buffer overrun on command completion
  um: Fix overlapping ELF segments when statically linked
  um: Delete never executed timer
  um: Don't overwrite ethtool driver version
  um: Fix len of file in create_pid_file
  um: Don't use console_drivers directly
  um: Cleanup CONFIG_IOSCHED_CFQ
2020-04-07 12:36:09 -07:00
Steve French
2bcb4fd6ba smb3: smbdirect support can be configured by default
smbdirect support (SMB3 over RDMA) should be enabled by
default in many configurations.

It is not experimental and is stable enough and has enough
performance benefits to recommend that it be configured by
default.  Change the  "If unsure N" to "If unsure Y" in
the description of the configuration parameter.

Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 13:39:00 -05:00
Colin Ian King
5404e7e0ac reiserfs: clean up several indentation issues
There are several places where code is indented incorrectly. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200325135018.113431-1-colin.king@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Alexey Dobriyan
aa0d1564b1 fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
Static executables don't need to free NULL pointer.

It doesn't matter really because static executable is not common scenario
but do it anyway out of pedantry.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200219185330.GA4933@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Alexey Dobriyan
0693ffebcf fs/binfmt_elf.c: allocate less for static executable
PT_INTERP ELF header can be spared if executable is static.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200219185012.GB4871@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Alexey Dobriyan
c69bcc932e fs/binfmt_elf.c: delete "loc" variable
"loc" variable became just a wrapper for PT_INTERP ELF header after main
ELF header was moved to "bprm->buf".  Delete it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200219184847.GA4871@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Jason Baron
efcdd350d1 fs/epoll: make nesting accounting safe for -rt kernel
Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
ep_poll_safewake() can take several non-raw spinlocks after disabling
interrupts.  Since a spinlock can block in the -rt kernel, we can't take a
spinlock after disabling interrupts.  So let's re-work how we determine
the nesting level such that it plays nicely with the -rt kernel.

Let's introduce a 'nests' field in struct eventpoll that records the
current nesting level during ep_poll_callback().  Then, if we nest again
we can find the previous struct eventpoll that we were called from and
increase our count by 1.  The 'nests' field is protected by
ep->poll_wait.lock.

I've also moved the visited field to reduce the size of struct eventpoll
from 184 bytes to 176 bytes on x86_64 for !CONFIG_DEBUG_LOCK_ALLOC, which
is typical for a production config.

Reported-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1582739816-13167-1-git-send-email-jbaron@akamai.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Matthew Wilcox (Oracle)
fad955009c proc: inline m_next_vma into m_next
It's clearer to just put this inline.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200317193201.9924-5-adobriyan@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Matthew Wilcox (Oracle)
b829a0f0f2 seq_file: remove m->version
The process maps file was the only user of version (introduced back in
2005).  Now that it uses ppos instead, we can remove it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200317193201.9924-4-adobriyan@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Matthew Wilcox (Oracle)
4781f2c3ab proc: use ppos instead of m->version
The ppos is a private cursor, just like m->version.  Use the canonical
cursor, not a special one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200317193201.9924-3-adobriyan@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Matthew Wilcox (Oracle)
c2e88d22e8 proc: remove m_cache_vma
Instead of setting m->version in the show method, set it in m_next(),
where it should be.  Also remove the fallback code for failing to find a
vma, or version being zero.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200317193201.9924-2-adobriyan@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Matthew Wilcox (Oracle)
d07ded611e proc: inline vma_stop into m_stop
Instead of calling vma_stop() from m_start() and m_next(), do its work
in m_stop().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200317193201.9924-1-adobriyan@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Alexey Dobriyan
5c5ab9714c proc: speed up /proc/*/statm
top(1) reads all /proc/*/statm files but kernel threads will always have
zeros.  Print those zeroes directly without going through
seq_put_decimal_ull().

Speed up reading /proc/2/statm (which is kthreadd) is like 3%.

My system has more kernel threads than normal processes after booting KDE.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200307154435.GA2788@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Alexey Dobriyan
d919b33daf proc: faster open/read/close with "permanent" files
Now that "struct proc_ops" exist we can start putting there stuff which
could not fly with VFS "struct file_operations"...

Most of fs/proc/inode.c file is dedicated to make open/read/.../close
reliable in the event of disappearing /proc entries which usually happens
if module is getting removed.  Files like /proc/cpuinfo which never
disappear simply do not need such protection.

Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
"permanent" files.

Enable "permanent" flag for

	/proc/cpuinfo
	/proc/kmsg
	/proc/modules
	/proc/slabinfo
	/proc/stat
	/proc/sysvipc/*
	/proc/swaps

More will come once I figure out foolproof way to prevent out module
authors from marking their stuff "permanent" for performance reasons
when it is not.

This should help with scalability: benchmark is "read /proc/cpuinfo R times
by N threads scattered over the system".

	N	R	t, s (before)	t, s (after)
	-----------------------------------------------------
	64	4096	1.582458	1.530502	-3.2%
	256	4096	6.371926	6.125168	-3.9%
	1024	4096	25.64888	24.47528	-4.6%

Benchmark source:

#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
int N;
const char *filename;
int R;

int xxx = 0;

int glue(int n)
{
	cpu_set_t m;
	CPU_ZERO(&m);
	CPU_SET(n, &m);
	return sched_setaffinity(0, sizeof(cpu_set_t), &m);
}

void f(int n)
{
	glue(n % NR_CPUS);

	while (*(volatile int *)&xxx == 0) {
	}

	for (int i = 0; i < R; i++) {
		int fd = open(filename, O_RDONLY);
		char buf[4096];
		ssize_t rv = read(fd, buf, sizeof(buf));
		asm volatile ("" :: "g" (rv));
		close(fd);
	}
}

int main(int argc, char *argv[])
{
	if (argc < 4) {
		std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R
";
		return 1;
	}

	N = atoi(argv[1]);
	filename = argv[2];
	R = atoi(argv[3]);

	for (int i = 0; i < NR_CPUS; i++) {
		if (glue(i) == 0)
			break;
	}

	std::vector<std::thread> T;
	T.reserve(N);
	for (int i = 0; i < N; i++) {
		T.emplace_back(f, i);
	}

	auto t0 = std::chrono::system_clock::now();
	{
		*(volatile int *)&xxx = 1;
		for (auto& t: T) {
			t.join();
		}
	}
	auto t1 = std::chrono::system_clock::now();
	std::chrono::duration<double> dt = t1 - t0;
	std::cout << dt.count() << '
';

	return 0;
}

P.S.:
Explicit randomization marker is added because adding non-function pointer
will silently disable structure layout randomization.

[akpm@linux-foundation.org: coding style fixes]
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Jules Irenge
904f394e2e fs/proc/inode.c: annotate close_pdeo() for sparse
Fix sparse locking imbalance warning:

	warning: context imbalance in close_pdeo() - unexpected unlock

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200227201538.GA30462@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Peter Xu
14819305e0 userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally
Only declare _UFFDIO_WRITEPROTECT if the user specified
UFFDIO_REGISTER_MODE_WP and if all the checks passed.  Then when the user
registers regions with shmem/hugetlbfs we won't expose the new ioctl to
them.  Even with complete anonymous memory range, we'll only expose the
new WP ioctl bit if the register mode has MODE_WP.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:40 -07:00
Peter Xu
23080e2783 userfaultfd: wp: don't wake up when doing write protect
It does not make sense to try to wake up any waiting thread when we're
write-protecting a memory region.  Only wake up when resolving a write
protected page fault.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
63b2d4174c userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
Introduce the new uffd-wp APIs for userspace.

Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking
using the new UFFDIO_REGISTER_MODE_WP flag.  Note that this flag can
co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the
userspace program can not only resolve missing page faults, and at the
same time tracking page data changes along the way.

Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level
write protection tracking.  Note that we will need to register the memory
region with UFFDIO_REGISTER_MODE_WP before that.

[peterx@redhat.com: write up the commit message]
[peterx@redhat.com: remove useless block, write commit message, check against
 VM_MAYWRITE rather than VM_WRITE when register]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
72981e0e7b userfaultfd: wp: add UFFDIO_COPY_MODE_WP
This allows UFFDIO_COPY to map pages write-protected.

[peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
 around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
 commit messages]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Anshuman Khandual
03911132aa mm/vma: replace all remaining open encodings with is_vm_hugetlb_page()
This replaces all remaining open encodings with is_vm_hugetlb_page().

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Long Li
044b541c11 cifs: smbd: Do not schedule work to send immediate packet on every receive
Immediate packets should only be sent to peer when there are new
receive credits made available. New credits show up on freeing
receive buffer, not on receiving data.

Fix this by avoid unnenecessary work schedules.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 12:41:16 -05:00
Long Li
f1b7b862bf cifs: smbd: Properly process errors on ib_post_send
When processing errors from ib_post_send(), the transport state needs to be
rolled back to the condition before the error.

Refactor the old code to make it easy to roll back on IB errors, and fix this.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 12:41:16 -05:00
Long Li
eda1c54f14 cifs: Allocate crypto structures on the fly for calculating signatures of incoming packets
CIFS uses pre-allocated crypto structures to calculate signatures for both
incoming and outgoing packets. In this way it doesn't need to allocate crypto
structures for every packet, but it requires a lock to prevent concurrent
access to crypto structures.

Remove the lock by allocating crypto structures on the fly for
incoming packets. At the same time, we can still use pre-allocated crypto
structures for outgoing packets, as they are already protected by transport
lock srv_mutex.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 12:41:16 -05:00
Long Li
d4e5160d1a cifs: smbd: Update receive credits before sending and deal with credits roll back on failure before sending
Recevie credits should be updated before sending the packet, not
before a work is scheduled. Also, the value needs roll back if
something fails and cannot send.

Signed-off-by: Long Li <longli@microsoft.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 12:41:16 -05:00
Long Li
3ffbe78aff cifs: smbd: Check send queue size before posting a send
Sometimes the remote peer may return more send credits than the send queue
depth. If all the send credits are used to post senasd, we may overflow the
send queue.

Fix this by checking the send queue size before posting a send.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 12:41:16 -05:00
Long Li
072a14ec63 cifs: smbd: Merge code to track pending packets
As an optimization, SMBD tries to track two types of packets: packets with
payload and without payload. There is no obvious benefit or performance gain
to separately track two types of packets.

Just treat them as pending packets and merge the tracking code.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-07 12:41:16 -05:00
Aurelien Aptel
e79b0332ae cifs: ignore cached share root handle closing errors
Fix tcon use-after-free and NULL ptr deref.

Customer system crashes with the following kernel log:

[462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14       => a QUERY DIR
[462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347107] CIFS VFS: Close unmatched open
[462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
...
    [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
 #6 [...] smb2_cancelled_close_fid at ... [cifs]
 #7 [...] process_one_work at ...
 #8 [...] worker_thread at ...
 #9 [...] kthread at ...

The most likely explanation we have is:

* When we put the last reference of a tcon (refcount=0), we close the
  cached share root handle.
* If closing a handle is interrupted, SMB2_close() will
  queue a SMB2_close() in a work thread.
* The queued object keeps a tcon ref so we bump the tcon
  refcount, jumping from 0 to 1.
* We reach the end of cifs_put_tcon(), we free the tcon object despite
  it now having a refcount of 1.
* The queued work now runs, but the tcon, ses & server was freed in
  the meantime resulting in a crash.

THREAD 1
========
cifs_put_tcon                 => tcon refcount reach 0
  SMB2_tdis
   close_shroot_lease
    close_shroot_lease_locked => if cached root has lease && refcount = 0
     smb2_close_cached_fid    => if cached root valid
      SMB2_close              => retry close in a thread if interrupted
       smb2_handle_cancelled_close
        __smb2_handle_cancelled_close    => !! tcon refcount bump 0 => 1 !!
         INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
         queue_work(cifsiod_wq, &cancelled->work) => queue work
 tconInfoFree(tcon);    ==> freed!
 cifs_put_smb_ses(ses); ==> freed!

THREAD 2 (workqueue)
========
smb2_cancelled_close_fid
  SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
  cifs_put_tcon(cancelled->tcon);      => tcon refcount reach 0 second time
  *CRASH*

Fixes: d919131935 ("CIFS: Close cached root handle only if it has a lease")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-04-07 12:40:40 -05:00
Xiaoguang Wang
f7fe934686 io_uring: initialize fixed_file_data lock
syzbot reports below warning:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:913 [inline]
 register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
 __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223
 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
 io_sqe_files_register fs/io_uring.c:6599 [inline]
 __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001
 __do_sys_io_uring_register fs/io_uring.c:8081 [inline]
 __se_sys_io_uring_register fs/io_uring.c:8063 [inline]
 __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x440289
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289
RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10
R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000

Initialize struct fixed_file_data's lock to fix this issue.

Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com
Fixes: 0558955373 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-07 09:45:51 -06:00
Colin Ian King
211fea18a7 io_uring: remove redundant variable pointer nxt and io_wq_assign_next call
An earlier commit "io_uring: remove @nxt from handlers" removed the
setting of pointer nxt and now it is always null, hence the non-null
check and call to io_wq_assign_next is redundant and can be removed.

Addresses-Coverity: ("'Constant' variable guard")
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-07 09:45:33 -06:00
Trond Myklebust
93ce4af774 NFS: Clean up process of marking inode stale.
Instead of the various open coded calls to set the NFS_INO_STALE bit
and call nfs_zap_caches(), consolidate them into a single function
nfs_set_inode_stale().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-06 13:56:33 -04:00
Linus Torvalds
b6ff10700d Merge tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
 "This implements the fanotify FAN_DIR_MODIFY event.

  This event reports the name in a directory under which a change
  happened and together with the directory filehandle and fstatat()
  allows reliable and efficient implementation of directory
  synchronization"

* tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: Fix the checks in fanotify_fsid_equal
  fanotify: report name info for FAN_DIR_MODIFY event
  fanotify: record name info for FAN_DIR_MODIFY event
  fanotify: Drop fanotify_event_has_fid()
  fanotify: prepare to report both parent and child fid's
  fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
  fanotify: divorce fanotify_path_event and fanotify_fid_event
  fanotify: Store fanotify handles differently
  fanotify: Simplify create_fd()
  fanotify: fix merging marks masks with FAN_ONDIR
  fanotify: merge duplicate events on parent and child
  fsnotify: replace inode pointer with an object id
  fsnotify: simplify arguments passing to fsnotify_parent()
  fsnotify: use helpers to access data by data_type
  fsnotify: funnel all dirent events through fsnotify_name()
  fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  fsnotify: tidy up FS_ and FAN_ constants
2020-04-06 08:58:42 -07:00
Linus Torvalds
74e934ba0d Merge tag 'for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2/udf updates from Jan Kara:
 "Cleanups and fixes for ext2 and one cleanup for udf"

* tag 'for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: fix empty body warnings when -Wextra is used
  ext2: fix debug reference to ext2_xattr_cache
  udf: udf_sb.h: Replace zero-length array with flexible-array member
  ext2: xattr.h: Replace zero-length array with flexible-array member
  ext2: Silence lockdep warning about reclaim under xattr_sem
2020-04-06 08:55:04 -07:00
Linus Torvalds
e14679b62d Merge tag '9p-for-5.7' of git://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
 "Not much new, but a few patches for this cycle:

   - Fix read with O_NONBLOCK to allow incomplete read and return
     immediately

   - Rest is just cleanup (indent, unused field in struct, extra
     semicolon)"

* tag '9p-for-5.7' of git://github.com/martinetd/linux:
  net/9p: remove unused p9_req_t aux field
  9p: read only once on O_NONBLOCK
  9pnet: allow making incomplete read requests
  9p: Remove unneeded semicolon
  9p: Fix Kconfig indentation
2020-04-06 08:46:59 -07:00
Christoph Hellwig
5833112df7 xfs: reflink should force the log out if mounted with wsync
Reflink should force the log out to disk if the filesystem was mounted
with wsync, the same as most other operations in xfs.

[Note: XFS_MOUNT_WSYNC is set when the admin mounts the filesystem
with either the 'wsync' or 'sync' mount options, which effectively means
that we're classifying reflink/dedupe as IO operations and making them
synchronous when required.]

Fixes: 3fc9f5e409 ("xfs: remove xfs_reflink_remap_range")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[darrick: add more to the changelog]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-06 08:44:39 -07:00
Christoph Hellwig
54fbdd1035 xfs: factor out a new xfs_log_force_inode helper
Create a new helper to force the log up to the last LSN touching an
inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-06 08:44:35 -07:00
Linus Torvalds
77a73eecd4 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pathwalk fix from Al Viro:
 "Dumb braino in legitimize_path()..."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a braino in legitimize_path()
2020-04-06 08:38:52 -07:00
Al Viro
5bd73286d5 fix a braino in legitimize_path()
brown paperbag time... wrong order of arguments ended up confusing
the values to check dentry and mount_lock seqcounts against.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 2aa3847085 ("non-RCU analogue of the previous commit")
Tested-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-06 10:38:59 -04:00
Pavel Begunkov
48bdd849e9 io_uring: fix ctx refcounting in io_submit_sqes()
If io_get_req() fails, it drops a ref. Then, awhile keeping @submitted
unmodified, io_submit_sqes() breaks the loop and puts @nr - @submitted
refs. For each submitted req a ref is dropped in io_put_req() and
friends. So, for @nr taken refs there will be
(@nr - @submitted + @submitted + 1) dropped.

Remove ctx refcounting from io_get_req(), that at the same time makes
it clearer.

Fixes: 2b85edfc0c ("io_uring: batch getting pcpu references")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-05 16:23:48 -06:00
Linus Torvalds
70fbdfef4b sysfs: remove redundant __compat_only_sysfs_link_entry_to_kobj fn
Commit 9255782f70 ("sysfs: Wrap __compat_only_sysfs_link_entry_to_kobj
function to change the symlink name") made this function a wrapper
around a new non-underscored function, which is a bit odd.  The normal
naming convention is the other way around: the underscored function is
the wrappee, and the non-underscored function is the wrapper.

There's only one single user (well, two call-sites in that user) of the
more limited double underscore version of this function, so just remove
the oddly named wrapper entirely and just add the extra NULL argument to
the user.

I considered just doing that in the merge, but that tends to make
history really hard to read.

Link: https://lore.kernel.org/lkml/CAHk-=wgkkmNV5tMzQDmPAQuNJBuMcry--Jb+h8H1o4RA3kF7QQ@mail.gmail.com/
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-05 11:34:35 -07:00
Linus Torvalds
d38c07afc3 Merge tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
 "Slightly late as I had to rebase mid-week to insert a bug fix:

   - A large series from Nick for 64-bit to further rework our exception
     vectors, and rewrite portions of the syscall entry/exit and
     interrupt return in C. The result is much easier to follow code
     that is also faster in general.

   - Cleanup of our ptrace code to split various parts out that had
     become badly intertwined with #ifdefs over the years.

   - Changes to our NUMA setup under the PowerVM hypervisor which should
     hopefully avoid non-sensical topologies which can lead to warnings
     from the workqueue code and other problems.

   - MAINTAINERS updates to remove some of our old orphan entries and
     update the status of others.

   - Quite a few other small changes and fixes all over the map.

  Thanks to: Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew
  Donnellan, Aneesh Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen
  Zhou, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Clement
  Courbet, Daniel Axtens, David Gibson, Douglas Miller, Fabiano Rosas,
  Fangrui Song, Ganesh Goudar, Gautham R. Shenoy, Greg Kroah-Hartman,
  Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie Halip, Jan Kara,
  Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger, Laurentiu Tudor,
  Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh Salgaonkar,
  Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira, Michael
  Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan
  Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick
  Desaulniers, Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat,
  Rasmus Villemoes, Ravi Bangoria, Roman Bolshakov, Sam Bobroff,
  Sandipan Das, Santosh S, Sedat Dilek, Segher Boessenkool, Shilpasri G
  Bhat, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Tyrel
  Datwyler, Vaibhav Jain, YueHaibing"

* tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc: Make setjmp/longjmp signature standard
  powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type
  powerpc: Suppress .eh_frame generation
  powerpc: Drop -fno-dwarf2-cfi-asm
  powerpc/32: drop unused ISA_DMA_THRESHOLD
  powerpc/powernv: Add documentation for the opal sensor_groups sysfs interfaces
  selftests/powerpc: Fix try-run when source tree is not writable
  powerpc/vmlinux.lds: Explicitly retain .gnu.hash
  powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c
  powerpc/ptrace: create ppc_gethwdinfo()
  powerpc/ptrace: create ptrace_get_debugreg()
  powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
  powerpc/ptrace: move register viewing functions out of ptrace.c
  powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.
  powerpc/ptrace: split out SPE related functions.
  powerpc/ptrace: split out ALTIVEC related functions.
  powerpc/ptrace: split out VSX related functions.
  powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET
  powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64
  powerpc/ptrace: remove unused header includes
  ...
2020-04-05 11:12:59 -07:00
Linus Torvalds
9c94b39560 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:

 - Replace ext4's bmap and iopoll implementations to use iomap.

 - Clean up extent tree handling.

 - Other cleanups and miscellaneous bug fixes

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits)
  ext4: save all error info in save_error_info() and drop ext4_set_errno()
  ext4: fix incorrect group count in ext4_fill_super error message
  ext4: fix incorrect inodes per group in error message
  ext4: don't set dioread_nolock by default for blocksize < pagesize
  ext4: disable dioread_nolock whenever delayed allocation is disabled
  ext4: do not commit super on read-only bdev
  ext4: avoid ENOSPC when avoiding to reuse recently deleted inodes
  ext4: unregister sysfs path before destroying jbd2 journal
  ext4: check for non-zero journal inum in ext4_calculate_overhead
  ext4: remove map_from_cluster from ext4_ext_map_blocks
  ext4: clean up ext4_ext_insert_extent() call in ext4_ext_map_blocks()
  ext4: mark block bitmap corrupted when found instead of BUGON
  ext4: use flexible-array member for xattr structs
  ext4: use flexible-array member in struct fname
  Documentation: correct the description of FIEMAP_EXTENT_LAST
  ext4: move ext4_fiemap to use iomap framework
  ext4: make ext4_ind_map_blocks work with fiemap
  ext4: move ext4 bmap to use iomap infrastructure
  ext4: optimize ext4_ext_precache for 0 depth
  ext4: add IOMAP_F_MERGED for non-extent based mapping
  ...
2020-04-05 10:54:03 -07:00
Linus Torvalds
83eb69f3b8 Merge branch 'work.exfat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull exfat filesystem from Al Viro:
 "Shiny new fs/exfat replacement for drivers/staging/exfat"

* 'work.exfat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  exfat: update file system parameter handling
  staging: exfat: make staging/exfat and fs/exfat mutually exclusive
  MAINTAINERS: add exfat filesystem
  exfat: add Kconfig and Makefile
  exfat: add nls operations
  exfat: add misc operations
  exfat: add exfat cache
  exfat: add bitmap operations
  exfat: add fat entry operations
  exfat: add file operations
  exfat: add directory operations
  exfat: add inode operations
  exfat: add super block operations
  exfat: add in-memory and on-disk structures and headers
2020-04-04 11:46:09 -07:00
Linus Torvalds
b3d8e42282 Merge tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6
Pull nfsd updates from Chuck Lever:

 - Fix EXCHANGE_ID response when NFSD runs in a container

 - A battery of new static trace points

 - Socket transports now use bio_vec to send Replies

 - NFS/RDMA now supports filesystems with no .splice_read method

 - Favor memcpy() over DMA mapping for small RPC/RDMA Replies

 - Add pre-requisites for supporting multiple Write chunks

 - Numerous minor fixes and clean-ups

[ Chuck is filling in for Bruce this time while he and his family settle
  into a new house ]

* tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6: (39 commits)
  svcrdma: Fix leak of transport addresses
  SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
  SUNRPC/cache: don't allow invalid entries to be flushed
  nfsd: fsnotify on rmdir under nfsd/clients/
  nfsd4: kill warnings on testing stateids with mismatched clientids
  nfsd: remove read permission bit for ctl sysctl
  NFSD: Fix NFS server build errors
  sunrpc: Add tracing for cache events
  SUNRPC/cache: Allow garbage collection of invalid cache entries
  nfsd: export upcalls must not return ESTALE when mountd is down
  nfsd: Add tracepoints for update of the expkey and export cache entries
  nfsd: Add tracepoints for exp_find_key() and exp_get_by_name()
  nfsd: Add tracing to nfsd_set_fh_dentry()
  nfsd: Don't add locks to closed or closing open stateids
  SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends
  SUNRPC: Refactor xs_sendpages()
  svcrdma: Avoid DMA mapping small RPC Replies
  svcrdma: Fix double sync of transport header buffer
  svcrdma: Refactor chunk list encoders
  SUNRPC: Add encoders for list item discriminators
  ...
2020-04-04 11:13:51 -07:00