57206 Commits

Author SHA1 Message Date
Colin Ian King
561405f031 jbd2: clean up indentation issue, replace spaces with tab
There is a statement that is indented with spaces, replace it with
a tab.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04 00:20:10 -05:00
Colin Ian King
a92abd738d ext4: clean up indentation issues, remove extraneous tabs
There are several lines that are indented too far, clean these
up by removing the tabs.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04 00:16:44 -05:00
Maurizio Lombardi
132d00becb ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
In case of error, ext4_try_to_write_inline_data() should unlock
and release the page it holds.

Fixes: f19d5870cb ("ext4: add normal write support for inline data")
Cc: stable@kernel.org # 3.8
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04 00:06:53 -05:00
Pan Bian
61157b24e6 ext4: fix possible use after free in ext4_quota_enable
The function frees qf_inode via iput but then passes qf_inode to
lockdep_set_quota_inode on the failure path. This may result in a
use-after-free bug. The patch frees qf_inode only when it is never used.

Fixes: daf647d2dd ("ext4: add lockdep annotations for i_data_sem")
Cc: stable@kernel.org # 4.6
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03 23:28:02 -05:00
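The ordering problem fixed in 61157b24e6 above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the ext4 code: `struct obj`, `obj_get_new()`, `obj_put()`, and `enable_fixed()` are hypothetical stand-ins for the quota inode, its allocation, iput(), and the failure path of ext4_quota_enable(). It shows only the fixed ordering, in which the last use of the object comes before the reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an inode. */
struct obj {
	int refcount;
	int lock_class;	/* stand-in for the lockdep annotation */
};

struct obj *obj_get_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcount = 1;
	return o;
}

/* Drop one reference; the object is freed when the count reaches zero. */
void obj_put(struct obj *o)
{
	assert(o != NULL && o->refcount > 0);
	if (--o->refcount == 0)
		free(o);
}

/*
 * Fixed ordering: on failure, reset the annotation first and only then
 * drop the reference. Calling obj_put() before touching o->lock_class
 * would read freed memory once the count reaches zero.
 */
int enable_fixed(struct obj *o, int simulated_err)
{
	o->lock_class = 1;
	if (simulated_err)
		o->lock_class = 0;	/* last use of o */
	obj_put(o);			/* o may be freed here */
	return simulated_err;
}
```

The same rule applies to the hfs, hfsplus, and ocfs2 fixes further down this log: any read through the pointer has to happen before the final put.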
Jan Kara
96f1e09745 jbd2: avoid long hold times of j_state_lock while committing a transaction
We can hold j_state_lock for writing at the beginning of
jbd2_journal_commit_transaction() for a rather long time (reportedly for
30 ms) due to cleaning revoke bits of all revoked buffers under it. The
handling of revoke tables as well as cleaning of t_reserved_list, and
checkpoint lists does not need j_state_lock for anything. It is only
needed to prevent new handles from joining the transaction. Generally
T_LOCKED transaction state prevents new handles from joining the
transaction - except for reserved handles which have to be allowed to join
while we wait for other handles to complete.

To prevent reserved handles from joining the transaction while cleaning
up lists, add new transaction state T_SWITCH and watch for it when
starting reserved handles. With this we can just drop the lock for
operations that don't need it.

Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com>
Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03 23:16:07 -05:00
Kees Cook
8665569e97 pstore/ram: Avoid NULL deref in ftrace merging failure path
Given corruption in the ftrace records, it might be possible to allocate
tmp_prz without assigning prz to it, but still marking it as needing to
be freed, which would cause at least a NULL dereference.

smatch warnings:
fs/pstore/ram.c:340 ramoops_pstore_read() error: we previously assumed 'prz' could be null (see line 255)

https://lists.01.org/pipermail/kbuild-all/2018-December/055528.html

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2fbea82bbb ("pstore: Merge per-CPU ftrace records into one")
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 17:11:02 -08:00
Kees Cook
ea84b580b9 pstore: Convert buf_lock to semaphore
Instead of running with interrupts disabled, use a semaphore. This should
make it easier for backends that may need to sleep (e.g. EFI) when
performing a write:

|BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
|in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum
|Preemption disabled at:
|[<ffffffff99d60512>] pstore_dump+0x72/0x330
|CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G      D           4.20.0-rc3 #45
|Call Trace:
| dump_stack+0x4f/0x6a
| ___might_sleep.cold.91+0xd3/0xe4
| __might_sleep+0x50/0x90
| wait_for_completion+0x32/0x130
| virt_efi_query_variable_info+0x14e/0x160
| efi_query_variable_store+0x51/0x1a0
| efivar_entry_set_safe+0xa3/0x1b0
| efi_pstore_write+0x109/0x140
| pstore_dump+0x11c/0x330
| kmsg_dump+0xa4/0xd0
| oops_exit+0x22/0x30
...

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: 21b3ddd39f ("efi: Don't use spinlocks for efi vars")
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 17:11:02 -08:00
Thomas Meyer
69596433bc pstore: Fix bool initialization/comparison
Bool initializations should use true and false. Bool tests don't need
comparisons.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
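The style rule described in 69596433bc above can be sketched in a few lines of plain C. This is an illustrative example only: `struct record`, `record_init()`, and `record_suffix()` are hypothetical names, not pstore code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record; only the flag matters for this sketch. */
struct record {
	bool compressed;
};

void record_init(struct record *r)
{
	assert(r != NULL);
	/* Initialize with the bool literal rather than 0 or 1. */
	r->compressed = false;
}

const char *record_suffix(const struct record *r)
{
	assert(r != NULL);
	/* Test the flag directly instead of writing "== true". */
	return r->compressed ? ".enc.z" : "";
}
```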
Joel Fernandes (Google)
30696378f6 pstore/ram: Do not treat empty buffers as valid
The ramoops backend currently calls persistent_ram_save_old() even
if a buffer is empty. While this appears to work, it does not seem
like the right thing to do and could lead to future bugs, so let's avoid
that. It also prevents misleading prints in the logs which claim the
buffer is valid.

I got something like:

	found existing buffer, size 0, start 0

When I was expecting:

	no valid data in buffer (sig = ...)

This bails out early (and reports with pr_debug()), since it's an
acceptable state.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Joel Fernandes (Google)
b05c950698 pstore/ram: Simplify ramoops_get_next_prz() arguments
(1) remove type argument from ramoops_get_next_prz()

Since we store the type of the prz when we initialize it, we no longer
need to pass it again in ramoops_get_next_prz() since we can just use
that to set up the pstore record. So let's remove it from the argument list.

(2) remove max argument from ramoops_get_next_prz()

Looking at the code flow, the 'max' checks are already being done on
the prz passed to ramoops_get_next_prz(). Let's remove it to simplify
this function and reduce its arguments.

(3) further reduce ramoops_get_next_prz() arguments by passing record

Both the id and type fields of a pstore_record are set by
ramoops_get_next_prz(). So we can just pass a pointer to the pstore_record
instead of passing individual elements. This results in cleaner, more
readable code and fewer lines.

In addition, let's also remove the 'update' argument since we can detect
that. Changes are squashed into a single patch to reduce fixup conflicts.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Joel Fernandes (Google)
f0f23e5469 pstore: Map PSTORE_TYPE_* to strings
In later patches we will need to map types to names, so create a
constant table for that which can also be used in different parts of
old and new code. This saves the type in the PRZ which will be useful
in later patches.

Instead of having an explicit PSTORE_TYPE_UNKNOWN, just use ..._MAX.

This includes removing the now redundant filename templates which can use
a single format string. Also, there's no reason to limit the "is it still
compressed?" test to only PSTORE_TYPE_DMESG when building the pstorefs
filename. Records are zero-initialized, so a backend would need to have
explicitly set compressed=1.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Kees Cook
0eed84ffb0 pstore: Improve and update some comments and status output
This improves and updates some comments:
 - dump handler comment out of sync from calling convention
 - fix kern-doc typo

and improves status output:
 - reminder that only kernel crash dumps are compressed
 - do not be silent about ECC infrastructure failures

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Kees Cook
c208f7d4b0 pstore/ram: Add kern-doc for struct persistent_ram_zone
The struct persistent_ram_zone wasn't well documented. This adds kern-doc
for it.

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Kees Cook
dc80b1ea4c pstore/ram: Report backend assignments with finer granularity
In order to more easily perform automated regression testing, this
adds pr_debug() calls to report each prz allocation which can then be
verified against persistent storage. Specifically, seeing the dividing
line between header, data, and any ECC bytes. (And the general assignment
output is updated to remove the bogus ECC blocksize which isn't actually
recorded outside the prz instance.)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Kees Cook
9ee85b8bd3 pstore/ram: Standardize module name in ramoops
With both ram.c and ram_core.c built into ramoops.ko, it doesn't make
sense to have differing pr_fmt prefixes. This fixes ram_core.c to use
the module name (as ram.c already does). Additionally improves region
reservation error to include the region name.

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Peng Wang
7684bd334d pstore: Avoid duplicate call of persistent_ram_zap()
When initializing a prz, if invalid data is found (no PERSISTENT_RAM_SIG),
the function call path looks like this:

ramoops_init_prz ->
    persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
    persistent_ram_zap

As we can see, persistent_ram_zap() is called twice.
We can avoid this by adding an option to persistent_ram_new(), and
only call persistent_ram_zap() when it is needed.

Signed-off-by: Peng Wang <wangpeng15@xiaomi.com>
[kees: minor tweak to exit path and commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Kees Cook
b77fa617a2 pstore: Remove needless lock during console writes
Since the console writer does not use the preallocated crash dump buffer
any more, there is no reason to perform locking around it.

Fixes: 70ad35db33 ("pstore: Convert console write to use ->write_buf")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-12-03 16:52:35 -08:00
Kees Cook
bdabc8e71c pstore: Do not use crash buffer for decompression
The pre-allocated compression buffer used for crash dumping was also
being used for decompression. This isn't technically safe, since it's
possible the kernel may attempt a crashdump while pstore is populating the
pstore filesystem (and performing decompression). Instead, just allocate
a separate buffer for decompression. Correctness is preferred over
performance here.

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 16:52:35 -08:00
Kees Cook
971f66d8a7 Merge branch 'for-linus/pstore' into for-next/pstore
2018-12-03 16:52:02 -08:00
David Teigland
3595c55932 dlm: fix invalid cluster name warning
The warning added in commit 3b0e761ba8
  "dlm: print log message when cluster name is not set"

did not account for the fact that lockspaces created
from userland do not supply a cluster name, so bogus
warnings are printed every time a userland lockspace
is created.

Signed-off-by: David Teigland <teigland@redhat.com>
2018-12-03 15:30:24 -06:00
Jani Nikula
9ee4685c9a sysfs: constify sysfs create/remove files harder
Let the passed in array be const (and thus placed in rodata) instead of
a mutable array of const pointers.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181004143750.30880-1-jani.nikula@intel.com
2018-12-03 18:18:19 +02:00
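The difference between "a mutable array of const pointers" and a fully const array, as described in 9ee4685c9a above, can be sketched in userspace C. Everything here is hypothetical: `struct attr` is a cut-down stand-in for a sysfs attribute, and `count_attrs()` merely plays the role of a function that walks a NULL-terminated attribute array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute type; the real sysfs struct has more fields. */
struct attr {
	const char *name;
};

static const struct attr attr_a = { .name = "a" };
static const struct attr attr_b = { .name = "b" };

/*
 * Both the pointed-to attributes and the array itself are const, so the
 * compiler is free to place the array in read-only data.
 */
static const struct attr * const demo_attrs[] = {
	&attr_a,
	&attr_b,
	NULL,
};

/*
 * Taking "const struct attr * const *" lets callers pass a const array.
 * With a "const struct attr **" parameter, passing demo_attrs would
 * discard a qualifier.
 */
size_t count_attrs(const struct attr * const *ptr)
{
	size_t n = 0;

	assert(ptr != NULL);
	while (ptr[n])
		n++;
	return n;
}

size_t count_demo_attrs(void)
{
	return count_attrs(demo_attrs);
}
```

A caller that still has a mutable array of const pointers can pass it unchanged, since adding const at that level is an implicit conversion in C.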
Thomas Meyer
3456880ff3 dlm: NULL check before some freeing functions is not needed
NULL check before some freeing functions is not needed.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-12-03 10:02:01 -06:00
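The cleanup in 3456880ff3 above relies on freeing functions that accept NULL. A userspace sketch of the same idea follows, using the standard free(), which is defined to do nothing when passed NULL (kfree() in the kernel behaves the same way). `struct bufs` and `bufs_release()` are hypothetical names, not dlm code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container with two optional heap buffers. */
struct bufs {
	char *a;
	char *b;
};

void bufs_release(struct bufs *p)
{
	assert(p != NULL);
	/*
	 * No "if (p->a)" guard is needed: free(NULL) is a no-op, so the
	 * check would only add noise.
	 */
	free(p->a);
	free(p->b);
	p->a = NULL;
	p->b = NULL;
}
```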
Miklos Szeredi
d233c7dd16 fuse: fix revalidation of attributes for permission check
fuse_invalidate_attr() now sets fi->inval_mask instead of fi->i_time, hence
we need to check the inval mask in fuse_permission() as well.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 2f1e81965f ("fuse: allow fine grained attr cache invaldation")
2018-12-03 10:14:43 +01:00
Miklos Szeredi
a9c2d1e82f fuse: fix fsync on directory
Commit ab2257e994 ("fuse: reduce size of struct fuse_inode") moved parts
of fields related to writeback on regular file and to directory caching
into a union.  However fuse_fsync_common() called from fuse_dir_fsync()
touches some writeback related fields, resulting in a crash.

Move writeback related parts from fuse_fsync_common() to fuse_fsync().

Reported-by: Brett Girton <btgirton@gmail.com>
Tested-by: Brett Girton <btgirton@gmail.com>
Fixes: ab2257e994 ("fuse: reduce size of struct fuse_inode")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-03 10:14:43 +01:00
Greg Kroah-Hartman
7782b57ccc Merge 4.20-rc5 into driver-core-next
We need the fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-03 07:54:31 +01:00
Dave Kleikamp
ad3cba223a nfs: don't dirty kernel pages read by direct-io
When we use direct_IO with an NFS backing store, we can trigger a
WARNING in __set_page_dirty(), as below, since we're dirtying the page
unnecessarily in nfs_direct_read_completion().

To fix, replicate the logic in commit 53cbf3b157 ("fs: direct-io:
don't dirtying pages for ITER_BVEC/ITER_KVEC direct read").

Other filesystems that implement direct_IO handle this; most use
blockdev_direct_IO(). ceph and cifs have similar logic.

mount 127.0.0.1:/export /nfs
dd if=/dev/zero of=/nfs/image bs=1M count=200
losetup --direct-io=on -f /nfs/image
mkfs.btrfs /dev/loop0
mount -t btrfs /dev/loop0 /mnt/

kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0
kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_
kernel:  snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li
kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G            E     4.20.0-rc1.master.20181111.ol7.x86_64 #1
kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
kernel: Workqueue: nfsiod rpc_async_release [sunrpc]
kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0
kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8
kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046
kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0
kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0
kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001
kernel: FS:  0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  __set_page_dirty_buffers+0xb6/0x110
kernel:  set_page_dirty+0x52/0xb0
kernel:  nfs_direct_read_completion+0xc4/0x120 [nfs]
kernel:  nfs_pgio_release+0x10/0x20 [nfs]
kernel:  rpc_free_task+0x30/0x70 [sunrpc]
kernel:  rpc_async_release+0x12/0x20 [sunrpc]
kernel:  process_one_work+0x174/0x390
kernel:  worker_thread+0x4f/0x3e0
kernel:  kthread+0x102/0x140
kernel:  ? drain_workqueue+0x130/0x130
kernel:  ? kthread_stop+0x110/0x110
kernel:  ret_from_fork+0x35/0x40
kernel: ---[ end trace 01341980905412c9 ]---

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

[forward-ported to v4.20]
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02 09:43:56 -05:00
Tigran Mkrtchyan
320f35b7bf flexfiles: enforce per-mirror stateid only for v4 DSes
Since commit bb21ce0ad2 we always enforce per-mirror stateid.
However, this makes sense only for v4+ servers.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02 09:43:56 -05:00
Daniel Santos
a788c52727 jffs2: Fix use of uninitialized delayed_work, lockdep breakage
jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER
is defined then a write buffer is available and has been initialized.
However, this is not the case when the mtd device has no
out-of-band buffer:

int jffs2_nand_flash_setup(struct jffs2_sb_info *c)
{
        if (!c->mtd->oobsize)
                return 0;
...

The resulting call to cancel_delayed_work_sync passing an uninitialized
(but zeroed) delayed_work struct forces lockdep to become disabled.

[   90.050639] overlayfs: upper fs does not support tmpfile.
[   90.652264] INFO: trying to register non-static key.
[   90.662171] the code is fine but needs lockdep annotation.
[   90.673090] turning off the locking correctness validator.
[   90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0
[   90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7
[   90.713349]         80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff
[   90.730020]         00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420
[   90.746690]         6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90
[   90.763362]         80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000
[   90.780033]         ...
[   90.784902] Call Trace:
[   90.789793] [<8000f0d8>] show_stack+0xb8/0x148
[   90.798659] [<8005a000>] register_lock_class+0x270/0x55c
[   90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c
[   90.818964] [<8005e314>] lock_acquire+0x194/0x1dc
[   90.828345] [<8003f27c>] flush_work+0x200/0x24c
[   90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210
[   90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54
[   90.857173] [<80125cf4>] iterate_supers+0xf4/0x120
[   90.866729] [<80158fc4>] sys_sync+0x44/0x9c
[   90.875067] [<80014424>] syscall_common+0x34/0x58

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-12-02 09:20:34 +01:00
Paul E. McKenney
eaaf055f27 Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD
bug.2018.11.12a:  Get rid of BUG_ON() and friends
consolidate.2018.12.01a:  Continued RCU flavor-consolidation cleanup
doc.2018.11.12a:  Documentation updates
fixes.2018.11.12a:  Miscellaneous fixes
initrd.2018.11.08b:  Automate creation of rcutorture initrd
sil.2018.11.12a:  Remove more spin_unlock_wait() calls
2018-12-01 12:43:16 -08:00
Linus Torvalds
880584176e Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:

 - Single range elevator discard merge fix, that caused crashes (Ming)

 - Fix for a regression in O_DIRECT, where we could potentially lose the
   error value (Maximilian Heyne)

 - NVMe pull request from Christoph, with little fixes all over the map
   for NVMe.

* tag 'for-linus-20181201' of git://git.kernel.dk/linux-block:
  block: fix single range discard merge
  nvme-rdma: fix double freeing of async event data
  nvme: flush namespace scanning work just before removing namespaces
  nvme: warn when finding multi-port subsystems without multipathing enabled
  fs: fix lost error code in dio_complete
  nvme-pci: fix surprise removal
  nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
  nvme: Free ctrl device name on init failure
2018-12-01 11:36:32 -08:00
Linus Torvalds
d8f190ee83 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "31 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  userfaultfd: shmem: add i_size checks
  userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
  userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
  ...
2018-11-30 18:45:49 -08:00
Linus Torvalds
fd3b3e0ec5 Merge tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache and cachefiles fixes from David Howells:
 "Misc fixes:

   - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a
     race between a cache object lookup failing and someone attempting
     to reenable that object, thereby triggering an update of the
     object's attributes.

   - Fix an assertion failure at fs/fscache/operation.c:449 caused by a
     split atomic subtract and atomic read that allows a race to happen.

   - Fix a leak of backing pages when simultaneously reading the same
     page from the same object from two or more threads.

   - Fix a hang due to a race between a cache object being discarded and
     the corresponding cookie being reenabled.

  There are also some minor cleanups:

   - Cast an enum value to a different enum type to prevent clang from
     generating a warning. This shouldn't cause any sort of change in
     the emitted code.

   - Use ktime_get_real_seconds() instead of get_seconds(). This is just
     used to uniquify a filename for an object to be placed in the
     graveyard. Objects placed there are deleted by cachefilesd in
     userspace immediately thereafter.

   - Remove an initialised, but otherwise unused variable. This should
     have been entirely optimised away anyway"

* tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache, cachefiles: remove redundant variable 'cache'
  cachefiles: avoid deprecated get_seconds()
  cachefiles: Explicitly cast enumerated type in put_object
  fscache: fix race between enablement and dropping of object
  cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active
  fscache: Fix race in fscache_op_complete() due to split atomic_sub & read
  cachefiles: Fix an assertion failure when trying to update a failed object
2018-11-30 18:32:33 -08:00
Pan Bian
164f7e5867 ocfs2: fix potential use after free
ocfs2_get_dentry() calls iput(inode) to drop the reference count of
inode, and if the reference count hits 0, inode is freed.  However, in
this function, it then reads inode->i_generation, which may result in a
use after free bug.  Move the put operation later.

Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com
Fixes: 781f200cb7a ("ocfs2: Remove masklog ML_EXPORT.")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Andrea Arcangeli
29ec90660d userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration.  This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.

The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.

Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a34210 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Pan Bian
c7d7d620dc hfsplus: do not free node before using
hfs_bmap_free() frees node via hfs_bnode_put(node).  However it then
reads node->this when dumping error message on an error path, which may
result in a use-after-free bug.  This patch frees node only when it is
never used.

Link: http://lkml.kernel.org/r/1543053441-66942-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Pan Bian
ce96a407ad hfs: do not free node before using
hfs_bmap_free() frees the node via hfs_bnode_put(node).  However, it
then reads node->this when dumping error message on an error path, which
may result in a use-after-free bug.  This patch frees the node only when
it is never again used.

Link: http://lkml.kernel.org/r/1542963889-128825-1-git-send-email-bianpan2016@163.com
Fixes: a1185ffa2fc ("HFS rewrite")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Larry Chen
e21e57445a ocfs2: fix deadlock caused by ocfs2_defrag_extent()
ocfs2_defrag_extent may fall into deadlock.

ocfs2_ioctl_move_extents
    ocfs2_ioctl_move_extents
      ocfs2_move_extents
        ocfs2_defrag_extent
          ocfs2_lock_allocators_move_extents

            ocfs2_reserve_clusters
              inode_lock GLOBAL_BITMAP_SYSTEM_INODE

          __ocfs2_flush_truncate_log
              inode_lock GLOBAL_BITMAP_SYSTEM_INODE

As the backtrace above shows, ocfs2_reserve_clusters() will call inode_lock
against the global bitmap if the local allocator does not have sufficient
clusters. Once the global bitmap can meet the demand,
ocfs2_reserve_clusters() will return success with the global bitmap locked.

After ocfs2_reserve_clusters(), if the truncate log is full,
__ocfs2_flush_truncate_log() will definitely fall into deadlock because
it needs to inode_lock the global bitmap, which has already been locked.

To fix this bug, we could remove from
ocfs2_lock_allocators_move_extents() the code which intends to lock
global allocator, and put the removed code after
__ocfs2_flush_truncate_log().

ocfs2_lock_allocators_move_extents() is called from two places: one is
here, and the other does not need the data allocator context, so this
patch does not affect that caller.

Link: http://lkml.kernel.org/r/20181101071422.14470-1-lchen@suse.com
Signed-off-by: Larry Chen <lchen@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:13 -08:00
Linus Torvalds
5f1ca5c619 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes all over the place.

  The iov_iter one is this cycle regression (splice from UDP triggering
  WARN_ON()), the rest is older"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Use d_instantiate() rather than d_add() and don't d_drop()
  afs: Fix missing net error handling
  afs: Fix validation/callback interaction
  iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed ones
  exportfs: do not read dentry after free
  exportfs: fix 'passing zero to ERR_PTR()' warning
  aio: fix failure to put the file pointer
  sysv: return 'err' instead of 0 in __sysv_write_inode
2018-11-30 10:47:50 -08:00
Linus Torvalds
e9eaf72e73 Merge tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
 "Fix corrupted compression due to unlucky size choice with ECC"

* tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Correctly calculate usable PRZ bytes
2018-11-30 09:03:15 -08:00
NeilBrown
5946c4319e fs/locks: allow a lock request to block other requests.
Currently, a lock can block pending requests, but all pending
requests are equal.  If lots of pending requests are
mutually exclusive, this means they will all be woken up
and all but one will fail.  This can hurt performance.

So we will allow pending requests to block other requests.
Only the first request will be woken, and it will wake the others.

This patch doesn't implement this fully, but prepares the way.

- It acknowledges that a request might be blocking other requests,
  and when the request is converted to a lock, those blocked
  requests are moved across.
- When a request is requeued or discarded, all blocked requests are
  woken.
- When deadlock-detection looks for the lock which blocks a
  given request, we follow the chain of ->fl_blocker all
  the way to the top.

Tested-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
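The last point in 5946c4319e above, following the chain of ->fl_blocker all the way to the top, can be sketched as a simple pointer walk. This is a simplified userspace illustration, not the fs/locks.c code: `struct lock_req` and `top_blocker()` are hypothetical, and the real struct file_lock carries far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request; the real struct file_lock is larger. */
struct lock_req {
	struct lock_req *blocker;	/* request or lock this one waits on */
	int owner;
};

/*
 * Follow the blocker pointers until reaching an entry that waits on
 * nothing. With requests allowed to block other requests, the entry a
 * request waits on directly may itself be waiting, so only the top of
 * the chain is the lock that is actually held.
 */
struct lock_req *top_blocker(struct lock_req *req)
{
	assert(req != NULL);
	while (req->blocker)
		req = req->blocker;
	return req;
}
```

For example, if request C waits behind request B, and B waits on held lock A, `top_blocker()` called on C returns A.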
NeilBrown
d6367d6241 fs/locks: use properly initialized file_lock when unlocking.
Both locks_remove_posix() and locks_remove_flock() use a
struct file_lock without calling locks_init_lock() on it.
This means the various list_heads are not initialized, which
will become a problem with a later patch.

So change them both to initialize properly.  For flock locks,
this involves using flock_make_lock(), and changing it to
allow a file_lock to be passed in, so memory allocation isn't
always needed.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
NeilBrown
4316c3c685 ocfs2: properly initialize file_lock used for unlock.
Rather than assuming all-zeros is sufficient, use the available API to
initialize the file_lock structure used for unlock.  VFS-level changes
will soon make it important that the list_heads in file_lock are
always properly initialized.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
NeilBrown
4d62d3f70b gfs2: properly initialize file_lock used for unlock.
Rather than assuming all-zeros is sufficient, use the available API to
initialize the file_lock structure used for unlock.  VFS-level changes
will soon make it important that the list_heads in file_lock are
always properly initialized.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
NeilBrown
7b587e1a5a NFS: use locks_copy_lock() to copy locks.
Using memcpy() to copy lock requests leaves the various
list_heads in an inconsistent state.
As we will soon attach a list of conflicting requests to
another pending request, we need these lists to be consistent.
So change NFS to use locks_init_lock/locks_copy_lock instead
of memcpy.
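
A rough user-space illustration of why a raw memcpy() is a problem for an embedded list head. The types and helpers here (list_node, request_model, request_init(), request_copy()) are hypothetical stand-ins; the real locks_init_lock() and locks_copy_lock() do more than this sketch.

```c
#include <string.h>

/* Hypothetical minimal doubly-linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* An empty list head points at itself. */
int list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Hypothetical lock request: a payload plus an embedded list head. */
struct request_model {
	int start, end;
	struct list_node blocked;
};

void request_init(struct request_model *r)
{
	r->start = 0;
	r->end = 0;
	list_init(&r->blocked);
}

/* Copy only the payload; dst keeps its own, already initialized list head. */
void request_copy(struct request_model *dst, const struct request_model *src)
{
	dst->start = src->start;
	dst->end = src->end;
}
```

After memcpy(), the copy's list head still points at the source's head, so it is neither empty nor linked into any list; initializing first and then copying field by field leaves it consistent.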

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
NeilBrown
ad6bbd8b18 fs/locks: split out __locks_wake_up_blocks().
This functionality will be useful in future patches, so
split it out from locks_wake_up_blocks().

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
NeilBrown
ada5c1da86 fs/locks: rename some lists and pointers.
struct file_lock contains an 'fl_next' pointer which
is used to point to the lock that this request is blocked
waiting for.  So rename it to fl_blocker.

The fl_blocked list_head in an active lock is the head of a list of
blocked requests.  In a request it is a node in that list.
These are two distinct uses, so replace with two list_heads
with different names.
fl_blocked_requests is the head of a list of blocked requests;
fl_blocked_member is a node in that list.

The two different list_heads are never used at the same time, but that
will change in a future patch.

Note that a tracepoint is changed to report fl_blocker instead
of fl_next.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
Colin Ian King
31ffa56383 fscache, cachefiles: remove redundant variable 'cache'
Variable 'cache' is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
warning: variable 'cache' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30 16:00:58 +00:00
Arnd Bergmann
34e06fe4d0 cachefiles: avoid deprecated get_seconds()
get_seconds() returns an unsigned long that can overflow on some architectures
and is deprecated because of that. In cachefs, we cast that number to
a 32-bit integer, which will overflow in year 2106 on all architectures.

As confirmed by David Howells, the overflow probably isn't harmful
in the end, since the timestamps are only used to make the file names
unique, but they don't strictly have to be in monotonically increasing
order since the files only exist in order to be deleted as quickly
as possible.

Moving to ktime_get_real_seconds() avoids the deprecated interface.
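
The year 2106 figure can be checked with a small user-space calculation. This is only an arithmetic sketch: truncate_seconds() and wrap_year() are hypothetical helpers, and the year length is approximated by the mean Gregorian year.

```c
#include <stdint.h>

/* Model of storing a 64-bit seconds count in an unsigned 32-bit field. */
uint32_t truncate_seconds(int64_t secs)
{
	return (uint32_t)secs;	/* value modulo 2^32 */
}

/* Calendar year in which an unsigned 32-bit count of seconds since 1970 wraps. */
int wrap_year(void)
{
	const int64_t secs_per_year = 31556952;	/* 365.2425 days * 86400 s */
	const int64_t wrap = (int64_t)1 << 32;	/* 4294967296 s */

	return 1970 + (int)(wrap / secs_per_year);
}
```

The wrap point is 2^32 seconds after the epoch, roughly 136 years, which lands in 2106 regardless of the width of unsigned long.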

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30 16:00:58 +00:00
Nathan Chancellor
b7e768b7e3 cachefiles: Explicitly cast enumerated type in put_object
Clang warns when one enumerated type is implicitly converted to another.

fs/cachefiles/namei.c:247:50: warning: implicit conversion from
enumeration type 'enum cachefiles_obj_ref_trace' to different
enumeration type 'enum fscache_obj_ref_trace' [-Wenum-conversion]
        cache->cache.ops->put_object(&xobject->fscache,
cachefiles_obj_put_wait_retry);

Silence this warning by explicitly casting to fscache_obj_ref_trace,
which is also done in put_object.
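
A stand-alone sketch of the pattern, using hypothetical enum and function names rather than the real cachefiles and fscache definitions. It assumes the two enumerations use matching numeric values, which is what makes the cast safe.

```c
/* Hypothetical backend-specific trace reasons. */
enum backend_trace_model {
	BACKEND_TRACE_GET,
	BACKEND_TRACE_PUT_WAIT_RETRY,
};

/* Hypothetical generic trace reasons, numbered to match the above. */
enum generic_trace_model {
	GENERIC_TRACE_GET,
	GENERIC_TRACE_PUT_WAIT_RETRY,
};

/* Generic operation that only accepts the generic enum type. */
int put_object_model(enum generic_trace_model why)
{
	return (int)why;
}

int backend_put(enum backend_trace_model why)
{
	/*
	 * Passing 'why' directly would be an implicit conversion between
	 * two enumerated types, which clang reports under
	 * -Wenum-conversion. The explicit cast states the intent.
	 */
	return put_object_model((enum generic_trace_model)why);
}
```

The cast does not change the value passed; it only makes the conversion between the two enumerated types explicit.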

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30 16:00:58 +00:00
NeilBrown
c5a94f434c fscache: fix race between enablement and dropping of object
It was observed that a process blocked indefinitely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normally prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page() was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared.

There is some uncertainty in this analysis, but it seems to fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

The customer who reported the hang also reports that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30 15:57:31 +00:00