Commit Graph

67197 Commits

Author SHA1 Message Date
Pavel Begunkov
327d6d968b io_uring: alloc ->io in io_req_defer_prep()
Every call to io_req_defer_prep() is prepended with allocating ->io,
just do that in the function. And while we're at it, mark error paths
with unlikey and replace "if (ret < 0)" with "if (ret)".

There is only one change in the observable behaviour, that's instead of
killing the head request right away on error, it postpones it until the
link is assembled, that looks more preferable.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 13:00:44 -06:00
Pavel Begunkov
1c2da9e883 io_uring: remove empty cleanup of OP_OPEN* reqs
A switch in __io_clean_op() doesn't have default, it's pointless to list
opcodes that doesn't do any cleanup. Remove IORING_OP_OPEN* from there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 13:00:44 -06:00
Pavel Begunkov
dca9cf8b87 io_uring: inline io_req_work_grab_env()
The only caller of io_req_work_grab_env() is io_prep_async_work(), and
they are both initialising req->work. Inline grab_env(), it's easier
to keep this way, moreover there already were bugs with misplacing
io_req_init_async().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 13:00:40 -06:00
Pavel Begunkov
0f7e466b39 io_uring: place cflags into completion data
req->cflags is used only for defer-completion path, just use completion
data to store it. With the 4 bytes from the ->sequence patch and
compacting io_kiocb, this frees 8 bytes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:45 -06:00
Pavel Begunkov
9cf7c104de io_uring: remove sequence from io_kiocb
req->sequence is used only for deferred (i.e. DRAIN) requests, but
initialised for every request. Remove req->sequence from io_kiocb
together with its initialisation in io_init_req().

Replace it with a new field in struct io_defer_entry, that will be
calculated only when needed in io_req_defer(), which is a slow path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:45 -06:00
Pavel Begunkov
27dc8338e5 io_uring: use non-intrusive list for defer
The only left user of req->list is DRAIN, hence instead of keeping a
separate per request list for it, do that with old fashion non-intrusive
lists allocated on demand. That's a really slow path, so that's OK.

This removes req->list and so sheds 16 bytes from io_kiocb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:45 -06:00
Pavel Begunkov
7d6ddea6be io_uring: remove init for unused list
poll*() doesn't use req->list, don't init it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:45 -06:00
Pavel Begunkov
135fcde849 io_uring: add req->timeout.list
Instead of using shared req->list, hang timeouts up on their own list
entry. struct io_timeout have enough extra space for it, but if that
will be a problem ->inflight_entry can reused for that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:45 -06:00
Pavel Begunkov
40d8ddd4fa io_uring: use completion list for CQ overflow
As with the completion path, also use compl.list for overflowed
requests. If cleaned up properly, nobody needs per-op data there
anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
d21ffe7eca io_uring: use inflight_entry list for iopoll'ing
req->inflight_entry is used to track requests that grabbed files_struct.
Let's share it with iopoll list, because the only iopoll'ed ops are
reads and writes, which don't need a file table.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
540e32a085 io_uring: rename ctx->poll into ctx->iopoll
It supports both polling and I/O polling. Rename ctx->poll to clearly
show that it's only in I/O poll case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
3ca405ebfc io_uring: share completion list w/ per-op space
Calling io_req_complete(req) means that the request is done, and there
is nothing left but to clean it up. That also means that per-op data
after that should not be used, so we're free to reuse it in completion
path, e.g. to store overflow_list as done in this patch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
252917c30f io_uring: follow **iovec idiom in io_import_iovec
As for import_iovec(), return !=NULL iovec from io_import_iovec() only
when it should be freed. That includes returning NULL when iovec is
already in req->io, because it should be deallocated by other means,
e.g. inside op handler. After io_setup_async_rw() local iovec to ->io,
just mark it NULL, to follow the idea in io_{read,write} as well.

That's easier to follow, and especially useful if we want to reuse
per-op space for completion data.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: only call kfree() on non-NULL pointer]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
c3e330a493 io_uring: add a helper for async rw iovec prep
Preparing reads/writes for async is a bit tricky. Extract a helper to
not repeat it twice.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
b64e3444d4 io_uring: simplify io_req_map_rw()
Don't deref req->io->rw every time, but put it in a local variable. This
looks prettier, generates less instructions, and doesn't break alias
analysis.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
e73751225b io_uring: replace rw->task_work with rq->task_work
io_kiocb::task_work was de-unionised, and is not planned to be shared
back, because it's too useful and commonly used. Hence, instead of
keeping a separate task_work in struct io_async_rw just reuse
req->task_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
2ae523ed07 io_uring: extract io_sendmsg_copy_hdr()
Don't repeat send msg initialisation code, it's error prone.
Extract and use a helper function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
1400e69705 io_uring: use more specific type in rcv/snd msg cp
send/recv msghdr initialisation works with struct io_async_msghdr, but
pulls the whole struct io_async_ctx for no reason. That complicates it
with composite accessing, e.g. io->msg.

Use and pass the most specific type, which is struct io_async_msghdr.
It is the larget field in union io_async_ctx and doesn't save stack
space, but looks clearer.
The most of the changes are replacing "io->msg." with "iomsg->"

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Pavel Begunkov
270a594070 io_uring: rename sr->msg into umsg
Every second field in send/recv is called msg, make it a bit more
understandable by renaming ->msg, which is a user provided ptr,
to ->umsg.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Dmitry Vyukov
b36200f543 io_uring: fix sq array offset calculation
rings_size() sets sq_offset to the total size of the rings (the returned
value which is used for memory allocation). This is wrong: sq array should
be located within the rings, not after them. Set sq_offset to where it
should be.

Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hristo Venev <hristo@venev.name>
Cc: io-uring@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:55:44 -06:00
Jens Axboe
760618f7a8 Merge branch 'io_uring-5.8' into for-5.9/io_uring
Merge in io_uring-5.8 fixes, as changes/cleanups to how we do locked
mem accounting require a fixup, and only one of the spots are noticed
by git as the other merges cleanly. The flags fix from io_uring-5.8
also causes a merge conflict, the leak fix for recvmsg, the double poll
fix, and the link failure locking fix.

* io_uring-5.8:
  io_uring: fix lockup in io_fail_links()
  io_uring: fix ->work corruption with poll_add
  io_uring: missed req_init_async() for IOSQE_ASYNC
  io_uring: always allow drain/link/hardlink/async sqe flags
  io_uring: ensure double poll additions work with both request types
  io_uring: fix recvmsg memory leak with buffer selection
  io_uring: fix not initialised work->flags
  io_uring: fix missing msg_name assignment
  io_uring: account user memory freed when exit has been queued
  io_uring: fix memleak in io_sqe_files_register()
  io_uring: fix memleak in __io_sqe_files_update()
  io_uring: export cq overflow status to userspace

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:53:31 -06:00
Pavel Begunkov
4ae6dbd683 io_uring: fix lockup in io_fail_links()
io_fail_links() doesn't consider REQ_F_COMP_LOCKED leading to nested
spin_lock(completion_lock) and lockup.

[  197.680409] rcu: INFO: rcu_preempt detected expedited stalls on
	CPUs/tasks: { 6-... } 18239 jiffies s: 1421 root: 0x40/.
[  197.680411] rcu: blocking rcu_node structures:
[  197.680412] Task dump for CPU 6:
[  197.680413] link-timeout    R  running task        0  1669
	1 0x8000008a
[  197.680414] Call Trace:
[  197.680420]  ? io_req_find_next+0xa0/0x200
[  197.680422]  ? io_put_req_find_next+0x2a/0x50
[  197.680423]  ? io_poll_task_func+0xcf/0x140
[  197.680425]  ? task_work_run+0x67/0xa0
[  197.680426]  ? do_exit+0x35d/0xb70
[  197.680429]  ? syscall_trace_enter+0x187/0x2c0
[  197.680430]  ? do_group_exit+0x43/0xa0
[  197.680448]  ? __x64_sys_exit_group+0x18/0x20
[  197.680450]  ? do_syscall_64+0x52/0xa0
[  197.680452]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:51:33 -06:00
Pavel Begunkov
d5e16d8e23 io_uring: fix ->work corruption with poll_add
req->work might be already initialised by the time it gets into
__io_arm_poll_handler(), which will corrupt it by using fields that are
in an union with req->work. Luckily, the only side effect is missing
put_creds(). Clean req->work before going there.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-24 12:51:33 -06:00
Li Guifu
99c787cfd2 f2fs: fix use-after-free issue
During umount, f2fs_put_super() unregisters procfs entries after
f2fs_destroy_segment_manager(), it may cause use-after-free
issue when umount races with procfs accessing, fix it by relocating
f2fs_unregister_sysfs().

[Chao Yu: change commit title/message a bit]

Signed-off-by: Li Guifu <bluce.liguifu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-23 20:22:42 -07:00
Jia Yang
68e79baf41 f2fs: Change the type of f2fs_flush_inline_data() to void
The return value of f2fs_flush_inline_data() is not used,
so delete it.

Signed-off-by: Jia Yang <jiayang5@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-23 20:22:37 -07:00
Steve French
0e6705182d Revert "cifs: Fix the target file was deleted when rename failed."
This reverts commit 9ffad9263b.

Upon additional testing with older servers, it was found that
the original commit introduced a regression when using the old SMB1
dialect and rsyncing over an existing file.

The patch will need to be respun to address this, likely including
a larger refactoring of the SMB1 and SMB3 rename code paths to make
it less confusing and also to address some additional rename error
cases that SMB3 may be able to workaround.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Patrick Fernie <patrick.fernie@gmail.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
2020-07-23 15:44:11 -05:00
Pavel Begunkov
3e863ea3bb io_uring: missed req_init_async() for IOSQE_ASYNC
IOSQE_ASYNC branch of io_queue_sqe() is another place where an
unitialised req->work can be accessed (i.e. prior io_req_init_async()).
Nothing really bad though, it just looses IO_WQ_WORK_CONCURRENT flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-23 11:20:55 -06:00
Peter Enderborg
a24c6f7bc9 debugfs: Add access restriction option
Since debugfs include sensitive information it need to be treated
carefully. But it also has many very useful debug functions for userspace.
With this option we can have same configuration for system with
need of debugfs and a way to turn it off. This gives a extra protection
for exposure on systems where user-space services with system
access are attacked.

It is controlled by a configurable default value that can be override
with a kernel command line parameter. (debugfs=)

It can be on or off, but also internally on but not seen from user-space.
This no-mount mode do not register a debugfs as filesystem, but client can
register their parts in the internal structures. This data can be readed
with a debugger or saved with a crashkernel. When it is off clients
get EPERM error when accessing the functions for registering their
components.

Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
Link: https://lore.kernel.org/r/20200716071511.26864-3-peter.enderborg@sony.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-23 17:10:25 +02:00
J. Bruce Fields
9affa43581 nfsd4: fix NULL dereference in nfsd/clients display code
We hold the cl_lock here, and that's enough to keep stateid's from going
away, but it's not enough to prevent the files they point to from going
away.  Take fi_lock and a reference and check for NULL, as we do in
other code.

Reported-by: NeilBrown <neilb@suse.de>
Fixes: 78599c42ae ("nfsd4: add file to display list of client's opens")
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-07-22 16:47:14 -04:00
Eric Biggers
f3db0bed45 fs-verity: use smp_load_acquire() for ->i_verity_info
Normally smp_store_release() or cmpxchg_release() is paired with
smp_load_acquire().  Sometimes smp_load_acquire() can be replaced with
the more lightweight READ_ONCE().  However, for this to be safe, all the
published memory must only be accessed in a way that involves the
pointer itself.  This may not be the case if allocating the object also
involves initializing a static or global variable, for example.

fsverity_info::tree_params.hash_alg->tfm is a crypto_ahash object that's
internal to and is allocated by the crypto subsystem.  So by using
READ_ONCE() for ->i_verity_info, we're relying on internal
implementation details of the crypto subsystem.

Remove this fragile assumption by using smp_load_acquire() instead.

Also fix the cmpxchg logic to correctly execute an ACQUIRE barrier when
losing the cmpxchg race, since cmpxchg doesn't guarantee a memory
barrier on failure.

(Note: I haven't seen any real-world problems here.  This change is just
fixing the code to be guaranteed correct and less fragile.)

Fixes: fd2d1acfca ("fs-verity: add the hook for file ->open()")
Link: https://lore.kernel.org/r/20200721225920.114347-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:41 -07:00
Eric Biggers
ab673b9874 fscrypt: use smp_load_acquire() for ->i_crypt_info
Normally smp_store_release() or cmpxchg_release() is paired with
smp_load_acquire().  Sometimes smp_load_acquire() can be replaced with
the more lightweight READ_ONCE().  However, for this to be safe, all the
published memory must only be accessed in a way that involves the
pointer itself.  This may not be the case if allocating the object also
involves initializing a static or global variable, for example.

fscrypt_info includes various sub-objects which are internal to and are
allocated by other kernel subsystems such as keyrings and crypto.  So by
using READ_ONCE() for ->i_crypt_info, we're relying on internal
implementation details of these other kernel subsystems.

Remove this fragile assumption by using smp_load_acquire() instead.

(Note: I haven't seen any real-world problems here.  This change is just
fixing the code to be guaranteed correct and less fragile.)

Fixes: e37a784d8b ("fscrypt: use READ_ONCE() to access ->i_crypt_info")
Link: https://lore.kernel.org/r/20200721225920.114347-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:13 -07:00
Eric Biggers
777afe4e68 fscrypt: use smp_load_acquire() for ->s_master_keys
Normally smp_store_release() or cmpxchg_release() is paired with
smp_load_acquire().  Sometimes smp_load_acquire() can be replaced with
the more lightweight READ_ONCE().  However, for this to be safe, all the
published memory must only be accessed in a way that involves the
pointer itself.  This may not be the case if allocating the object also
involves initializing a static or global variable, for example.

super_block::s_master_keys is a keyring, which is internal to and is
allocated by the keyrings subsystem.  By using READ_ONCE() for it, we're
relying on internal implementation details of the keyrings subsystem.

Remove this fragile assumption by using smp_load_acquire() instead.

(Note: I haven't seen any real-world problems here.  This change is just
fixing the code to be guaranteed correct and less fragile.)

Fixes: 22d94f493b ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl")
Link: https://lore.kernel.org/r/20200721225920.114347-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:13 -07:00
Eric Biggers
97c6327f71 fscrypt: use smp_load_acquire() for fscrypt_prepared_key
Normally smp_store_release() or cmpxchg_release() is paired with
smp_load_acquire().  Sometimes smp_load_acquire() can be replaced with
the more lightweight READ_ONCE().  However, for this to be safe, all the
published memory must only be accessed in a way that involves the
pointer itself.  This may not be the case if allocating the object also
involves initializing a static or global variable, for example.

fscrypt_prepared_key includes a pointer to a crypto_skcipher object,
which is internal to and is allocated by the crypto subsystem.  By using
READ_ONCE() for it, we're relying on internal implementation details of
the crypto subsystem.

Remove this fragile assumption by using smp_load_acquire() instead.

(Note: I haven't seen any real-world problems here.  This change is just
fixing the code to be guaranteed correct and less fragile.)

Fixes: 5fee36095c ("fscrypt: add inline encryption support")
Cc: Satya Tangirala <satyat@google.com>
Link: https://lore.kernel.org/r/20200721225920.114347-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:13 -07:00
Eric Biggers
bd0d97b719 fscrypt: switch fscrypt_do_sha256() to use the SHA-256 library
fscrypt_do_sha256() is only used for hashing encrypted filenames to
create no-key tokens, which isn't performance-critical.  Therefore a C
implementation of SHA-256 is sufficient.

Also, the logic to create no-key tokens is always potentially needed.
This differs from fscrypt's other dependencies on crypto API algorithms,
which are conditionally needed depending on what encryption policies
userspace is using.  Therefore, for fscrypt there isn't much benefit to
allowing SHA-256 to be a loadable module.

So, make fscrypt_do_sha256() use the SHA-256 library instead of the
crypto_shash API.  This is much simpler, since it avoids having to
implement one-time-init (which is hard to do correctly, and in fact was
implemented incorrectly) and handle failures to allocate the
crypto_shash object.

Fixes: edc440e3d2 ("fscrypt: improve format of no-key names")
Cc: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20200721225920.114347-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 16:02:13 -07:00
Boris Burkov
48cfa61b58 btrfs: fix mount failure caused by race with umount
It is possible to cause a btrfs mount to fail by racing it with a slow
umount. The crux of the sequence is generic_shutdown_super not yet
calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
If that occurs, btrfs_open_devices will decide the opened counter is
non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
0. From here, mount will call sget which will result in grab_super
trying to take the super block umount semaphore. That semaphore will be
held by the slow umount, so mount will block. Before up-ing the
semaphore, umount will delete the super block, resulting in mount's sget
reliably allocating a new one, which causes the mount path to dutifully
fill it out, and increment total_rw_bytes a second time, which causes
the mount to fail, as we see double the expected bytes.

Here is the sequence laid out in greater detail:

CPU0                                                    CPU1
down_write sb->s_umount
btrfs_kill_super
  kill_anon_super(sb)
    generic_shutdown_super(sb);
      shrink_dcache_for_umount(sb);
      sync_filesystem(sb);
      evict_inodes(sb); // SLOW

                                              btrfs_mount_root
                                                btrfs_scan_one_device
                                                fs_devices = device->fs_devices
                                                fs_info->fs_devices = fs_devices
                                                // fs_devices-opened makes this a no-op
                                                btrfs_open_devices(fs_devices, mode, fs_type)
                                                s = sget(fs_type, test, set, flags, fs_info);
                                                  find sb in s_instances
                                                  grab_super(sb);
                                                    down_write(&s->s_umount); // blocks

      sop->put_super(sb)
        // sb->fs_devices->opened == 2; no-op
      spin_lock(&sb_lock);
      hlist_del_init(&sb->s_instances);
      spin_unlock(&sb_lock);
      up_write(&sb->s_umount);
                                                    return 0;
                                                  retry lookup
                                                  don't find sb in s_instances (deleted by CPU0)
                                                  s = alloc_super
                                                  return s;
                                                btrfs_fill_super(s, fs_devices, data)
                                                  open_ctree // fs_devices total_rw_bytes improperly set!
                                                    btrfs_read_chunk_tree
                                                      read_one_dev // increment total_rw_bytes again!!
                                                      super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
before the calls to read_one_dev, while holding the sb umount semaphore
and the uuid mutex.

To reproduce, it is sufficient to dirty a decent number of inodes, then
quickly umount and mount.

  for i in $(seq 0 500)
  do
    dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
  done
  umount /mnt/foo&
  mount /mnt/foo

does the trick for me.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21 22:08:54 +02:00
Robbie Ko
5909ca110b btrfs: fix page leaks after failure to lock page for delalloc
When locking pages for delalloc, we check if it's dirty and mapping still
matches. If it does not match, we need to return -EAGAIN and release all
pages. Only the current page was put though, iterate over all the
remaining pages too.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21 22:08:53 +02:00
Qu Wenruo
fa91e4aa17 btrfs: qgroup: fix data leak caused by race between writeback and truncate
[BUG]
When running tests like generic/013 on test device with btrfs quota
enabled, it can normally lead to data leak, detected at unmount time:

  BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
  ------------[ cut here ]------------
  WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
  RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
  Call Trace:
   btrfs_put_super+0x15/0x17 [btrfs]
   generic_shutdown_super+0x72/0x110
   kill_anon_super+0x18/0x30
   btrfs_kill_super+0x17/0x30 [btrfs]
   deactivate_locked_super+0x3b/0xa0
   deactivate_super+0x40/0x50
   cleanup_mnt+0x135/0x190
   __cleanup_mnt+0x12/0x20
   task_work_run+0x64/0xb0
   __prepare_exit_to_usermode+0x1bc/0x1c0
   __syscall_return_slowpath+0x47/0x230
   do_syscall_64+0x64/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace caf08beafeca2392 ]---
  BTRFS error (device dm-3): qgroup reserved space leaked

[CAUSE]
In the offending case, the offending operations are:
2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0

The following sequence of events could happen after the writev():
	CPU1 (writeback)		|		CPU2 (truncate)
-----------------------------------------------------------------
btrfs_writepages()			|
|- extent_write_cache_pages()		|
   |- Got page for 1003520		|
   |  1003520 is Dirty, no writeback	|
   |  So (!clear_page_dirty_for_io())   |
   |  gets called for it		|
   |- Now page 1003520 is Clean.	|
   |					| btrfs_setattr()
   |					| |- btrfs_setsize()
   |					|    |- truncate_setsize()
   |					|       New i_size is 18388
   |- __extent_writepage()		|
   |  |- page_offset() > i_size		|
      |- btrfs_invalidatepage()		|
	 |- Page is clean, so no qgroup |
	    callback executed

This means, the qgroup reserved data space is not properly released in
btrfs_invalidatepage() as the page is Clean.

[FIX]
Instead of checking the dirty bit of a page, call
btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().

As qgroup rsv are completely bound to the QGROUP_RESERVED bit of
io_tree, not bound to page status, thus we won't cause double freeing
anyway.

Fixes: 0b34c261e2 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21 22:08:32 +02:00
Filipe Manana
580c079b57 btrfs: fix double free on ulist after backref resolution failure
At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots
argument to point to it. However if later we fail due to an error returned
by find_parent_nodes(), we free that ulist but leave a dangling pointer in
the **roots argument. Upon receiving the error, a caller of this function
can attempt to free the same ulist again, resulting in an invalid memory
access.

One such scenario is during qgroup accounting:

btrfs_qgroup_account_extents()

 --> calls btrfs_find_all_roots() passes &new_roots (a stack allocated
     pointer) to btrfs_find_all_roots()

   --> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe()
       passing &new_roots to it

     --> allocates ulist and assigns its address to **roots (which
         points to new_roots from btrfs_qgroup_account_extents())

     --> find_parent_nodes() returns an error, so we free the ulist
         and leave **roots pointing to it after returning

 --> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned
     an error and jumps to the label 'cleanup', which just tries to
     free again the same ulist

Stack trace example:

 ------------[ cut here ]------------
 BTRFS: tree first key check failed
 WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs]
 Modules linked in: dm_snapshot dm_thin_pool (...)
 CPU: 1 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs]
 Code: 28 5b 5d (...)
 RSP: 0018:ffffb89b473779a0 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff90397759bf08 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
 RBP: ffff9039a419c000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffffb89b43301000 R12: 000000000000005e
 R13: ffffb89b47377a2e R14: ffffb89b473779af R15: 0000000000000000
 FS:  00007fc47e1e1000(0000) GS:ffff9039ac200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc47e1df000 CR3: 00000003d9e4e001 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  read_block_for_search+0xf6/0x350 [btrfs]
  btrfs_next_old_leaf+0x242/0x650 [btrfs]
  resolve_indirect_refs+0x7cf/0x9e0 [btrfs]
  find_parent_nodes+0x4ea/0x12c0 [btrfs]
  btrfs_find_all_roots_safe+0xbf/0x130 [btrfs]
  btrfs_qgroup_account_extents+0x9d/0x390 [btrfs]
  btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
  btrfs_sync_file+0x3d4/0x4d0 [btrfs]
  do_fsync+0x38/0x70
  __x64_sys_fdatasync+0x13/0x20
  do_syscall_64+0x5c/0xe0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fc47e2d72e3
 Code: Bad RIP value.
 RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 hardirqs last disabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
 softirqs last  enabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
 softirqs last disabled at (0): [<0000000000000000>] 0x0
 ---[ end trace 8639237550317b48 ]---
 BTRFS error (device sdc): tree first key mismatch detected, bytenr=62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024)
 general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
 CPU: 2 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:ulist_release+0x14/0x60 [btrfs]
 Code: c7 07 00 (...)
 RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
 FS:  00007fc47e1e1000(0000) GS:ffff9039ac600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f8c1c0a51c8 CR3: 00000003d9e4e004 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ulist_free+0x13/0x20 [btrfs]
  btrfs_qgroup_account_extents+0xf3/0x390 [btrfs]
  btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
  btrfs_sync_file+0x3d4/0x4d0 [btrfs]
  do_fsync+0x38/0x70
  __x64_sys_fdatasync+0x13/0x20
  do_syscall_64+0x5c/0xe0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fc47e2d72e3
 Code: Bad RIP value.
 RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
 Modules linked in: dm_snapshot dm_thin_pool (...)
 ---[ end trace 8639237550317b49 ]---
 RIP: 0010:ulist_release+0x14/0x60 [btrfs]
 Code: c7 07 00 (...)
 RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
 FS:  00007fc47e1e1000(0000) GS:ffff9039ad200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f6a776f7d40 CR3: 00000003d9e4e002 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after
it frees the ulist.

Fixes: 8da6d5815c ("Btrfs: added btrfs_find_all_roots()")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21 21:59:15 +02:00
Daeho Jeong
9af846486d f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctl
Added a new ioctl to send discard commands or/and zero out
to selected data area of a regular file for security reason.

The way of handling range.len of F2FS_IOC_SEC_TRIM_FILE:
1. Added -1 value support for range.len to secure trim the whole blocks
   starting from range.start regardless of i_size.
2. If the end of the range passes over the end of file, it means until
   the end of file (i_size).
3. ignored the case of that range.len is zero to prevent the function
   from making end_addr zero and triggering different behaviour of
   the function.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-21 12:58:11 -07:00
Jaegeuk Kim
b0f3b87fb3 f2fs: should avoid inode eviction in synchronous path
https://bugzilla.kernel.org/show_bug.cgi?id=208565

PID: 257    TASK: ecdd0000  CPU: 0   COMMAND: "init"
  #0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
  #1 [<c0b423c8>] (schedule) from [<c0b459d4>]
  #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
  #3 [<c0b44fa0>] (down_read) from [<c044233c>]
  #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
  #5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
  #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
  #7 [<c030be18>] (evict) from [<c030a558>]
  #8 [<c030a558>] (iput) from [<c047c600>]
  #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
 #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
 #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
 #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
 #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
 #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
 #15 [<c0324294>] (do_fsync) from [<c0324014>]
 #16 [<c0324014>] (sys_fsync) from [<c0108bc0>]

This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where
iput() requires f2fs_lock_op() again resulting in livelock.

Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-21 12:55:54 -07:00
Eric Biggers
f000223c98 fscrypt: restrict IV_INO_LBLK_* to AES-256-XTS
IV_INO_LBLK_* exist only because of hardware limitations, and currently
the only known use case for them involves AES-256-XTS.  Therefore, for
now only allow them in combination with AES-256-XTS.  This way we don't
have to worry about them being combined with other encryption modes.

(To be clear, combining IV_INO_LBLK_* with other encryption modes
*should* work just fine.  It's just not being tested, so we can't be
100% sure it works.  So with no known use case, it's best to disallow it
for now, just like we don't allow other weird combinations like
AES-256-XTS contents encryption with Adiantum filenames encryption.)

This can be relaxed later if a use case for other combinations arises.

Fixes: b103fb7653 ("fscrypt: add support for IV_INO_LBLK_64 policies")
Fixes: e3b1078bed ("fscrypt: add support for IV_INO_LBLK_32 policies")
Link: https://lore.kernel.org/r/20200721181012.39308-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21 11:12:57 -07:00
Eric W. Biederman
be619f7f06 exec: Implement kernel_execve
To allow the kernel not to play games with set_fs to call exec
implement kernel_execve.  The function kernel_execve takes pointers
into kernel memory and copies the values pointed to onto the new
userspace stack.

The calls with arguments from kernel space of do_execve are replaced
with calls to kernel_execve.

The calls do_execve and do_execveat are made static as there are now
no callers outside of exec.

The comments that mention do_execve are updated to refer to
kernel_execve or execve depending on the circumstances.  In addition
to correcting the comments, this makes it easy to grep for do_execve
and verify it is not used.

Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.de
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87wo365ikj.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21 08:24:52 -05:00
Eric W. Biederman
d8b9cd549e exec: Factor bprm_stack_limits out of prepare_arg_pages
In preparation for implementiong kernel_execve (which will take kernel
pointers not userspace pointers) factor out bprm_stack_limits out of
prepare_arg_pages.  This separates the counting which depends upon the
getting data from userspace from the calculations of the stack limits
which is usable in kernel_execve.

The remove prepare_args_pages and compute bprm->argc and bprm->envc
directly in do_execveat_common, before bprm_stack_limits is called.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/87365u6x60.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21 08:24:52 -05:00
Eric W. Biederman
0c9cdff054 exec: Factor bprm_execve out of do_execve_common
Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Factor bprm_execve
out of do_execve_common to separate out the copying of arguments
to the newe stack, and the rest of exec.

In separating bprm_execve from do_execve_common the copying
of the arguments onto the new stack happens earlier.

As the copying of the arguments does not depend any security hooks,
files, the file table, current->in_execve, current->fs->in_exec,
bprm->unsafe, or creds this is safe.

Likewise the security hook security_creds_for_exec does not depend upon
preventing the argument copying from happening.

In addition to making it possible to implement kernel_execve that
performs the copying differently, this separation of bprm_execve from
do_execve_common makes for a nice separation of responsibilities making
the exec code easier to navigate.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/878sfm6x6x.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21 08:24:52 -05:00
Eric W. Biederman
f18ac551e5 exec: Move bprm_mm_init into alloc_bprm
Currently it is necessary for the usermode helper code and the code that
launches init to use set_fs so that pages coming from the kernel look like
they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument copying
from userspace needs to happen earlier.  Move the allocation and
initialization of bprm->mm into alloc_bprm so that the bprm->mm is
available early to store the new user stack into.  This is a prerequisite
for copying argv and envp into the new user stack early before ther rest of
exec.

To keep the things consistent the cleanup of bprm->mm is moved into
free_bprm.  So that bprm->mm will be cleaned up whenever bprm->mm is
allocated and free_bprm are called.

Moving bprm_mm_init earlier is safe as it does not depend on any files,
current->in_execve, current->fs->in_exec, bprm->unsafe, or the if the file
table is shared. (AKA bprm_mm_init does not depend on any of the code that
happens between alloc_bprm and where it was previously called.)

This moves bprm->mm cleanup after current->fs->in_exec is set to 0.  This
is safe because current->fs->in_exec is only used to preventy taking an
additional reference on the fs_struct.

This moves bprm->mm cleanup after current->in_execve is set to 0.  This is
safe because current->in_execve is only used by the lsms (apparmor and
tomoyou) and always for LSM specific functions, never for anything to do
with the mm.

This adds bprm->mm cleanup into the successful return path.  This is safe
because being on the successful return path implies that begin_new_exec
succeeded and set brpm->mm to NULL.  As bprm->mm is NULL bprm cleanup I am
moving into free_bprm will do nothing.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/87eepe6x7p.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21 08:24:52 -05:00
Eric W. Biederman
60d9ad1d1d exec: Move initialization of bprm->filename into alloc_bprm
Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Move the computation
of bprm->filename and possible allocation of a name in the case
of execveat into alloc_bprm to make that possible.

The exectuable name, the arguments, and the environment are
copied into the new usermode stack which is stored in bprm
until exec passes the point of no return.

As the executable name is copied first onto the usermode stack
it needs to be known.  As there are no dependencies to computing
the executable name, compute it early in alloc_bprm.

As an implementation detail if the filename needs to be generated
because it embeds a file descriptor store that filename in a new field
bprm->fdpath, and free it in free_bprm.  Previously this was done in
an independent variable pathbuf.  I have renamed pathbuf fdpath
because fdpath is more suggestive of what kind of path is in the
variable.  I moved fdpath into struct linux_binprm because it is
tightly tied to the other variables in struct linux_binprm, and as
such is needed to allow the call alloc_binprm to move.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/87k0z66x8f.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21 08:24:52 -05:00
Eric W. Biederman
0a8f36eb48 exec: Factor out alloc_bprm
Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Move the allocation
of the bprm into it's own function (alloc_bprm) and move the call of
alloc_bprm before unshare_files so that bprm can ultimately be
allocated, the arguments can be placed on the new stack, and then the
bprm can be passed into the core of exec.

Neither the allocation of struct binprm nor the unsharing depend upon each
other so swapping the order in which they are called is trivially safe.

To keep things consistent the order of cleanup at the end of
do_execve_common swapped to match the order of initialization.

Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87pn8y6x9a.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21 08:24:44 -05:00
Ilya Ponetayev
db415f7aae exfat: fix name_hash computation on big endian systems
On-disk format for name_hash field is LE, so it must be explicitly
transformed on BE system for proper result.

Fixes: 370e812b3e ("exfat: add nls operations")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Chen Minqiang <ptpt52@gmail.com>
Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-21 10:44:19 +09:00
Hyeongseok Kim
41e3928f8c exfat: fix wrong size update of stream entry by typo
The stream.size field is updated to the value of create timestamp
of the file entry. Fix this to use correct stream entry pointer.

Fixes: 29bbb14bfc ("exfat: fix incorrect update of stream entry in __exfat_truncate()")
Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-21 10:44:15 +09:00
Namjae Jeon
d2fa0c337d exfat: fix wrong hint_stat initialization in exfat_find_dir_entry()
We found the wrong hint_stat initialization in exfat_find_dir_entry().
It should be initialized when cluster is EXFAT_EOF_CLUSTER.

Fixes: ca06197382 ("exfat: add directory operations")
Cc: stable@vger.kernel.org # v5.7
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-21 10:44:10 +09:00