Commit Graph

57206 Commits

Author SHA1 Message Date
Jeff Moyer
a538e3ff9d aio: fix spectre gadget in lookup_ioctx
Matthew pointed out that the ioctx_table is susceptible to spectre v1,
because the index can be controlled by an attacker.  The below patch
should mitigate the attack for all of the aio system calls.

Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11 11:45:50 -07:00
Luis Henriques
6f9718fe41 ceph: make 'nocopyfrom' a default mount option
Since we found a problem with the 'copy-from' operation after objects have
been truncated, offloading object copies to OSDs should be discouraged
until the issue is fixed.

Thus, this patch adds the 'nocopyfrom' mount option to the default mount
options which effectily means that remote copies won't be done in
copy_file_range unless they are explicitly enabled at mount time.

[ Adjust ceph_show_options() accordingly. ]

Link: https://tracker.ceph.com/issues/37378
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-11 18:22:17 +01:00
Abhi Das
2a5f14f279 gfs2: read journal in large chunks to locate the head
Use bio(s) to read in the journal sequentially in large chunks and
locate the head of the journal.

This version addresses the issues Christoph pointed out w.r.t error handling
and using deprecated API.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
2018-12-11 17:50:36 +01:00
Abhi Das
40e0e61e36 gfs2: add a helper function to get_log_header that can be used elsewhere
Move and re-order the error checks and hash/crc computations into another
function __get_log_header() so it can be used in scenarios where buffer_heads
are not being used for the log header.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11 17:50:36 +01:00
Abhi Das
5b84609532 gfs2: changes to gfs2_log_XXX_bio
Change gfs2_log_XXX_bio family of functions so they can be used
with different bios, not just sdp->sd_log_bio.

This patch also contains some clean up suggested by Andreas.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-12-11 17:50:36 +01:00
Abhi Das
98583b3e87 gfs2: add more timing info to journal recovery process
Tells you how many milliseconds map_journal_extents and find_jhead
take.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11 17:50:36 +01:00
Andreas Gruenbacher
0ebbe4f974 gfs2: Fix the gfs2_invalidatepage description
The comment incorrectly states that the function always returns 0.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11 17:50:35 +01:00
Andreas Gruenbacher
977767a7e1 gfs2: Clean up gfs2_is_{ordered,writeback}
The gfs2_is_ordered and gfs2_is_writeback checks are weird in that they
implicitly check for !gfs2_is_jdata.  This makes understanding how to
use those functions correctly a challenge.  Clean this up by making
gfs2_is_ordered and gfs2_is_writeback take a super block instead of an
inode and by removing the implicit !gfs2_is_jdata checks.  Update the
callers accordingly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11 17:50:35 +01:00
Nikolay Borisov
ac9498d686 fanotify: Use inode_is_open_for_write
Use the aptly named function rather than opencoding i_writecount check.
No functional changes.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-12-11 10:55:45 +01:00
Chanho Min
4addd2640f exec: make prepare_bprm_creds static
prepare_bprm_creds is not used outside exec.c, so there's no reason for it
to have external linkage.

Signed-off-by: Chanho Min <chanho.min@lge.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-10 04:11:06 -05:00
Takeshi Misawa
d72f70da60 fuse: Fix memory leak in fuse_dev_free()
When ntfs is unmounted, the following leak is
reported by kmemleak.

kmemleak report:

unreferenced object 0xffff880052bf4400 (size 4096):
  comm "mount.ntfs", pid 16530, jiffies 4294861127 (age 3215.836s)
  hex dump (first 32 bytes):
    00 44 bf 52 00 88 ff ff 00 44 bf 52 00 88 ff ff  .D.R.....D.R....
    10 44 bf 52 00 88 ff ff 10 44 bf 52 00 88 ff ff  .D.R.....D.R....
  backtrace:
    [<00000000bf4a2f8d>] fuse_fill_super+0xb22/0x1da0 [fuse]
    [<000000004dde0f0c>] mount_bdev+0x263/0x320
    [<0000000025aebc66>] mount_fs+0x82/0x2bf
    [<0000000042c5a6be>] vfs_kern_mount.part.33+0xbf/0x480
    [<00000000ed10cd5b>] do_mount+0x3de/0x2ad0
    [<00000000d59ff068>] ksys_mount+0xba/0xd0
    [<000000001bda1bcc>] __x64_sys_mount+0xba/0x150
    [<00000000ebe26304>] do_syscall_64+0x151/0x490
    [<00000000d25f2b42>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000002e0abd2c>] 0xffffffffffffffff

fuse_dev_alloc() allocate fud->pq.processing.
But this hash table is not freed.

Fix this by freeing fud->pq.processing.

Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: be2ff42c5d ("fuse: Use hash table to link processing request")
2018-12-10 09:57:54 +01:00
Chengguang Xu
0a1e8258a4 ext4: compare old and new mode before setting update_mode flag
If new mode is the same as old mode we don't have to reset
inode mode in the rest of the code, so compare old and new
mode before setting update_mode flag.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-10 00:22:38 -05:00
Jens Axboe
96f774106e Merge tag 'v4.20-rc6' into for-4.21/block
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
two corruption fixes. We're going to be overhauling the direct dispatch
path, and we need to do that on top of the changes we made for that
in mainline.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-09 17:45:40 -07:00
Linus Torvalds
bc4caf186f Merge tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Three small fixes: a fix for smb3 direct i/o, a fix for CIFS DFS for
  stable and a minor cifs Kconfig fix"

* tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Avoid returning EBUSY to upper layer VFS
  cifs: Fix separator when building path from dentry
  cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
2018-12-09 10:15:13 -08:00
Linus Torvalds
fa82dcbf2a Merge tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
 "The last of the known regression fixes and fallout from the Xarray
  conversion of the filesystem-dax implementation.

  On the path to debugging why the dax memory-failure injection test
  started failing after the Xarray conversion a couple more fixes for
  the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced.
  Those plus the bug that started the hunt are now addressed. These
  patches have appeared in a -next release with no issues reported.

  Note the touches to mm/memory-failure.c are just the conversion to the
  new function signature for dax_lock_page().

  Summary:

   - Fix the Xarray conversion of fsdax to properly handle
     dax_lock_mapping_entry() in the presense of pmd entries

   - Fix inode destruction racing a new lock request"

* tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix unlock mismatch with updated API
  dax: Don't access a freed inode
  dax: Check page->mapping isn't NULL
2018-12-09 09:54:04 -08:00
Linus Torvalds
f896adc42d Merge tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
 "Here are hopefully the last set of fixes for 4.20.

  There's a fix for a longstanding statfs reporting problem with project
  quotas, a correction for page cache invalidation behaviors when
  fallocating near EOF, and a fix for a broken metadata verifier return
  code.

  Finally, the most important fix is to the pipe splicing code (aka the
  generic copy_file_range fallback) to avoid pointless short directio
  reads by only asking the filesystem for as much data as there are
  available pages in the pipe buffer. Our previous fix (simulated short
  directio reads because the number of pages didn't match the length of
  the read requested) caused subtle problems on overlayfs, so that part
  is reverted.

  Anyhow, this series passes fstests -g all on xfs and overlay+xfs, and
  has passed 17 billion fsx operations problem-free since I started
  testing

  Summary:

   - Fix broken project quota inode counts

   - Fix incorrect PAGE_MASK/PAGE_SIZE usage

   - Fix incorrect return value in btree verifier

   - Fix WARN_ON remap flags false positive

   - Fix splice read overflows"

* tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: partially revert 4721a60109 (simulated directio short read on EFAULT)
  splice: don't read more than available pipe space
  vfs: allow some remap flags to be passed to vfs_clone_file_range
  xfs: fix inverted return from xfs_btree_sblock_verify_crc
  xfs: fix PAGE_MASK usage in xfs_free_file_space
  fs/xfs: fix f_ffree value for statfs when project quota is set
2018-12-08 11:25:02 -08:00
Dennis Zhou
fd42df305f blkcg: associate writeback bios with a blkg
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg(). In
this patch, wbc_init_bio() now requires a bio to have a device
associated with it.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
NeilBrown
7bbd1fc0e9 fs/locks: remove unnecessary white space.
- spaces before tabs,
 - spaces at the end of lines,
 - multiple blank lines,
 - blank lines before EXPORT_SYMBOL,
can all go.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-12-07 06:51:00 -05:00
NeilBrown
cb03f94ffb fs/locks: merge posix_unblock_lock() and locks_delete_block()
posix_unblock_lock() is not specific to posix locks, and behaves
nearly identically to locks_delete_block() - the former returning a
status while the later doesn't.

So discard posix_unblock_lock() and use locks_delete_block() instead,
after giving that function an appropriate return value.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-12-07 06:50:56 -05:00
NeilBrown
fd7732e033 fs/locks: create a tree of dependent requests.
When we find an existing lock which conflicts with a request,
and the request wants to wait, we currently add the request
to a list.  When the lock is removed, the whole list is woken.
This can cause the thundering-herd problem.
To reduce the problem, we make use of the (new) fact that
a pending request can itself have a list of blocked requests.
When we find a conflict, we look through the existing blocked requests.
If any one of them blocks the new request, the new request is attached
below that request, otherwise it is added to the list of blocked
requests, which are now known to be mutually non-conflicting.

This way, when the lock is released, only a set of non-conflicting
locks will be woken, the rest can stay asleep.
If the lock request cannot be granted and the request needs to be
requeued, all the other requests it blocks will then be woken

To make this more concrete:

  If you have a many-core machine, and have many threads all wanting to
  briefly lock a give file (udev is known to do this), you can get quite
  poor performance.

  When one thread releases a lock, it wakes up all other threads that
  are waiting (classic thundering-herd) - one will get the lock and the
  others go to sleep.
  When you have few cores, this is not very noticeable: by the time the
  4th or 5th thread gets enough CPU time to try to claim the lock, the
  earlier threads have claimed it, done what was needed, and released.
  So with few cores, many of the threads don't end up contending.
  With 50+ cores, lost of threads can get the CPU at the same time,
  and the contention can easily be measured.

  This patchset creates a tree of pending lock requests in which siblings
  don't conflict and each lock request does conflict with its parent.
  When a lock is released, only requests which don't conflict with each
  other a woken.

  Testing shows that lock-acquisitions-per-second is now fairly stable
  even as the number of contending process goes to 1000.  Without this
  patch, locks-per-second drops off steeply after a few 10s of
  processes.

  There is a small cost to this extra complexity.
  At 20 processes running a particular test on 72 cores, the lock
  acquisitions per second drops from 1.8 million to 1.4 million with
  this patch.  For 100 processes, this patch still provides 1.4 million
  while without this patch there are about 700,000.

Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-12-07 06:49:24 -05:00
NeilBrown
c0e1590897 fs/locks: change all *_conflict() functions to return bool.
posix_locks_conflict() and flock_locks_conflict() both return int.
leases_conflict() returns bool.

This inconsistency will cause problems for the next patch if not
fixed.

So change posix_locks_conflict() and flock_locks_conflict() to return
bool.
Also change the locks_conflict() helper.

And convert some
   return (foo);
to
   return foo;

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-12-07 06:49:24 -05:00
NeilBrown
16306a61d3 fs/locks: always delete_block after waiting.
Now that requests can block other requests, we
need to be careful to always clean up those blocked
requests.
Any time that we wait for a request, we might have
other requests attached, and when we stop waiting,
we must clean them up.
If the lock was granted, the requests might have been
moved to the new lock, though when merged with a
pre-exiting lock, this might not happen.
In all cases we don't want blocked locks to remain
attached, so we remove them to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Tested-by: syzbot+a4a3d526b4157113ec6a@syzkaller.appspotmail.com
Tested-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-12-07 06:49:17 -05:00
Long Li
6ac79291fb CIFS: Avoid returning EBUSY to upper layer VFS
EBUSY is not handled by VFS, and will be passed to user-mode. This is not
correct as we need to wait for more credits.

This patch also fixes a bug where rsize or wsize is used uninitialized when
the call to server->ops->wait_mtu_credits() fails.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-12-07 00:59:23 -06:00
Linus Torvalds
7f80c7325b Merge tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "This is mainly fallout from the updates to the SUNRPC code that is
  being triggered from less common combinations of NFS mount options.

  Highlights include:

  Stable fixes:
   - Fix a page leak when using RPCSEC_GSS/krb5p to encrypt data.

  Bugfixes:
   - Fix a regression that causes the RPC receive code to hang
   - Fix call_connect_status() so that it handles tasks that got
     transmitted while queued waiting for the socket lock.
   - Fix a memory leak in call_encode()
   - Fix several other connect races.
   - Fix receive code error handling.
   - Use the discard iterator rather than MSG_TRUNC for compatibility
     with AF_UNIX/AF_LOCAL sockets.
   - nfs: don't dirty kernel pages read by direct-io
   - pnfs/Flexfiles fix to enforce per-mirror stateid only for NFSv4
     data servers"

* tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Don't force a redundant disconnection in xs_read_stream()
  SUNRPC: Fix up socket polling
  SUNRPC: Use the discard iterator rather than MSG_TRUNC
  SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request()
  SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag
  SUNRPC: Fix RPC receive hangs
  SUNRPC: Fix a potential race in xprt_connect()
  SUNRPC: Fix a memory leak in call_encode()
  SUNRPC: Fix leak of krb5p encode pages
  SUNRPC: call_connect_status() must handle tasks that got transmitted
  nfs: don't dirty kernel pages read by direct-io
  flexfiles: enforce per-mirror stateid only for v4 DSes
2018-12-06 18:57:04 -08:00
Deepa Dinamani
7a35397f8c io_pgetevents: use __kernel_timespec
struct timespec is not y2038 safe.
struct __kernel_timespec is the new y2038 safe structure for all
syscalls that are using struct timespec.
Update io_pgetevents interfaces to use struct __kernel_timespec.

sigset_t also has different representations on 32 bit and 64 bit
architectures. Hence, we need to support the following different
syscalls:

New y2038 safe syscalls:
(Controlled by CONFIG_64BIT_TIME for 32 bit ABIs)

Native 64 bit(unchanged) and native 32 bit : sys_io_pgetevents
Compat : compat_sys_io_pgetevents_time64

Older y2038 unsafe syscalls:
(Controlled by CONFIG_32BIT_COMPAT_TIME for 32 bit ABIs)

Native 32 bit : sys_io_pgetevents_time32
Compat : compat_sys_io_pgetevents

Note that io_getevents syscalls do not have a y2038 safe solution.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06 17:23:31 +01:00
Deepa Dinamani
e024707bcc pselect6: use __kernel_timespec
struct timespec is not y2038 safe.
struct __kernel_timespec is the new y2038 safe structure for all
syscalls that are using struct timespec.
Update pselect interfaces to use struct __kernel_timespec.

sigset_t also has different representations on 32 bit and 64 bit
architectures. Hence, we need to support the following different
syscalls:

New y2038 safe syscalls:
(Controlled by CONFIG_64BIT_TIME for 32 bit ABIs)

Native 64 bit(unchanged) and native 32 bit : sys_pselect6
Compat : compat_sys_pselect6_time64

Older y2038 unsafe syscalls:
(Controlled by CONFIG_32BIT_COMPAT_TIME for 32 bit ABIs)

Native 32 bit : pselect6_time32
Compat : compat_sys_pselect6

Note that all other versions of select syscalls will not have
y2038 safe versions.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06 17:23:18 +01:00
Deepa Dinamani
8bd27a3004 ppoll: use __kernel_timespec
struct timespec is not y2038 safe.
struct __kernel_timespec is the new y2038 safe structure for all
syscalls that are using struct timespec.
Update ppoll interfaces to use struct __kernel_timespec.

sigset_t also has different representations on 32 bit and 64 bit
architectures. Hence, we need to support the following different
syscalls:

New y2038 safe syscalls:
(Controlled by CONFIG_64BIT_TIME for 32 bit ABIs)

Native 64 bit(unchanged) and native 32 bit : sys_ppoll
Compat : compat_sys_ppoll_time64

Older y2038 unsafe syscalls:
(Controlled by CONFIG_32BIT_COMPAT_TIME for 32 bit ABIs)

Native 32 bit : ppoll_time32
Compat : compat_sys_ppoll

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06 17:23:05 +01:00
Deepa Dinamani
854a6ed568 signal: Add restore_user_sigmask()
Refactor the logic to restore the sigmask before the syscall
returns into an api.
This is useful for versions of syscalls that pass in the
sigmask and expect the current->sigmask to be changed during
the execution and restored after the execution of the syscall.

With the advent of new y2038 syscalls in the subsequent patches,
we add two more new versions of the syscalls (for pselect, ppoll
and io_pgetevents) in addition to the existing native and compat
versions. Adding such an api reduces the logic that would need to
be replicated otherwise.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06 17:22:53 +01:00
Deepa Dinamani
ded653ccbe signal: Add set_user_sigmask()
Refactor reading sigset from userspace and updating sigmask
into an api.

This is useful for versions of syscalls that pass in the
sigmask and expect the current->sigmask to be changed during,
and restored after, the execution of the syscall.

With the advent of new y2038 syscalls in the subsequent patches,
we add two more new versions of the syscalls (for pselect, ppoll,
and io_pgetevents) in addition to the existing native and compat
versions. Adding such an api reduces the logic that would need to
be replicated otherwise.

Note that the calls to sigprocmask() ignored the return value
from the api as the function only returns an error on an invalid
first argument that is hardcoded at these call sites.
The updated logic uses set_current_blocked() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06 17:22:38 +01:00
Paulo Alcantara
c988de29ca cifs: Fix separator when building path from dentry
Make sure to use the CIFS_DIR_SEP(cifs_sb) as path separator for
prefixpath too. Fixes a bug with smb1 UNIX extensions.

Fixes: a6b5058faf ("fs/cifs: make share unaccessible at root level mountable")
Signed-off-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2018-12-06 02:20:17 -06:00
Steve French
6e785302da cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
Missing a dependency.  Shouldn't show cifs posix extensions
in Kconfig if CONFIG_CIFS_ALLOW_INSECURE_DIALECTS (ie SMB1
protocol) is disabled.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-12-06 02:20:14 -06:00
Dave Airlie
467e8a516d Merge tag 'drm-intel-next-2018-12-04' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Final drm/i915 changes for v4.21:
- ICL DSI video mode enabling (Madhav, Vandita, Jani, Imre)
- eDP sink count fix (José)
- PSR fixes (José)
- DRM DP helper and i915 DSC enabling (Manasi, Gaurav, Anusha)
- DP FEC enabling (Anusha)
- SKL+ watermark/ddb programming improvements (Ville)
- Pixel format fixes (Ville)
- Selftest updates (Chris, Tvrtko)
- GT and engine workaround improvements (Tvrtko)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87va496uoe.fsf@intel.com
2018-12-06 09:17:51 +10:00
Linus Torvalds
d089709045 Merge tag 'for-4.20-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
 "A patch in 4.19 introduced a sanity check that was too strict and a
  filesystem cannot be mounted.

  This happens for filesystems with more than 10 devices and has been
  reported by a few users so we need the fix to propagate to stable"

* tag 'for-4.20-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
2018-12-05 09:58:17 -08:00
Kees Cook
5b03a472b4 fanotify: Make sure to check event_len when copying
As a precaution, make sure we check event_len when copying to userspace.
Based on old feedback: https://lkml.kernel.org/r/542D9FE5.3010009@gmx.de

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-12-05 12:47:22 +01:00
Matthew Wilcox
27359fd6e5 dax: Fix unlock mismatch with updated API
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry it is possible that the pfn
performed to do the lookup is misaligned to the value retrieved in the
Xarray.

Change the api to return the unlock cookie from dax_lock_page() and pass
it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
assuming that the page was PMD-aligned if the entry was a PMD entry with
signatures like:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d22130 ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-04 21:32:00 -08:00
zhengbin
255fbca651 nfsd: Return EPERM, not EACCES, in some SETATTR cases
As the man(2) page for utime/utimes states, EPERM is returned when the
second parameter of utime or utimes is not NULL, the caller's effective UID
does not match the owner of the file, and the caller is not privileged.

However, in a NFS directory mounted from knfsd, it will return EACCES
(from nfsd_setattr-> fh_verify->nfsd_permission).  This patch fixes
that.

Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-04 20:48:07 -05:00
Darrick J. Wong
8f67b5adc0 iomap: partially revert 4721a60109 (simulated directio short read on EFAULT)
In commit 4721a60109, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read.  This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.

In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.  This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permit total success or total failure of a directio
operation.  The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid avoid reading more data
than there is space in the pipe.

Therefore, it's no longer necessary to simulate the short directio, so
remove the hack from iomap.

Fixes: 4721a60109 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Ranted-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04 09:40:02 -08:00
Darrick J. Wong
1761444557 splice: don't read more than available pipe space
In commit 4721a60109, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read.  This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.

In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.

The brokenness is compounded by splice_direct_to_actor immediately
bailing on do_splice_to returning <= 0 without ever calling ->actor
(which empties out the pipe), so if userspace calls back we'll EFAULT
again on the full pipe, and nothing ever gets copied.

Therefore, teach splice_direct_to_actor to clamp its requests to the
amount of free space in the pipe and remove the simulated short read
from the iomap directio code.

Fixes: 4721a60109 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Ranted-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04 08:50:49 -08:00
Darrick J. Wong
6744557b53 vfs: allow some remap flags to be passed to vfs_clone_file_range
In overlayfs, ovl_remap_file_range calls vfs_clone_file_range on the
lower filesystem's inode, passing through whatever remap flags it got
from its caller.  Since vfs_copy_file_range first tries a filesystem's
remap function with REMAP_FILE_CAN_SHORTEN, this can get passed through
to the second vfs_copy_file_range call, and this isn't an issue.
Change the WARN_ON to look only for the DEDUP flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-12-04 08:50:49 -08:00
Eric Sandeen
7d048df4e9 xfs: fix inverted return from xfs_btree_sblock_verify_crc
xfs_btree_sblock_verify_crc is a bool so should not be returning
a failaddr_t; worse, if xfs_log_check_lsn fails it returns
__this_address which looks like a boolean true (i.e. success)
to the caller.

(interestingly xfs_btree_lblock_verify_crc doesn't have the issue)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04 08:50:49 -08:00
Darrick J. Wong
a579121f94 xfs: fix PAGE_MASK usage in xfs_free_file_space
In commit e53c4b598, I *tried* to teach xfs to force writeback when we
fzero/fpunch right up to EOF so that if EOF is in the middle of a page,
the post-EOF part of the page gets zeroed before we return to userspace.
Unfortunately, I missed the part where PAGE_MASK is ~(PAGE_SIZE - 1),
which means that we totally fail to zero if we're fpunching and EOF is
within the first page.  Worse yet, the same PAGE_MASK thinko plagues the
filemap_write_and_wait_range call, so we'd initiate writeback of the
entire file, which (mostly) masked the thinko.

Drop the tricky PAGE_MASK and replace it with correct usage of PAGE_SIZE
and the proper rounding macros.

Fixes: e53c4b598 ("xfs: ensure post-EOF zeroing happens after zeroing part of a file")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-12-04 08:50:49 -08:00
Christoph Hellwig
154989e45f aio: clear IOCB_HIPRI
No one is going to poll for aio (yet), so we must clear the HIPRI
flag, as we would otherwise send it down the poll queues, where no
one will be polling for completions.

Signed-off-by: Christoph Hellwig <hch@lst.de>

IOCB_HIPRI, not RWF_HIPRI.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04 09:39:06 -07:00
Jens Axboe
89d04ec349 Merge tag 'v4.20-rc5' into for-4.21/block
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and
also getting the merge fix that went into mainline that users are
hitting testing for-4.21/block and/or for-next.

* tag 'v4.20-rc5': (664 commits)
  Linux 4.20-rc5
  PCI: Fix incorrect value returned from pcie_get_speed_cap()
  MAINTAINERS: Update linux-mips mailing list address
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  ...
2018-12-04 09:38:05 -07:00
Rafael J. Wysocki
a72173ecfc Revert "exec: make de_thread() freezable"
Revert commit c22397888f "exec: make de_thread() freezable" as
requested by Ingo Molnar:

"So there's a new regression in v4.20-rc4, my desktop produces this
lockdep splat:

[ 1772.588771] WARNING: pkexec/4633 still has locks held!
[ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted
[ 1772.588775] ------------------------------------
[ 1772.588776] 1 lock held by pkexec/4633:
[ 1772.588778]  #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70
[ 1772.588786] stack backtrace:
[ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1
[ 1772.588792] Call Trace:
[ 1772.588800]  dump_stack+0x85/0xcb
[ 1772.588803]  flush_old_exec+0x116/0x890
[ 1772.588807]  ? load_elf_phdrs+0x72/0xb0
[ 1772.588809]  load_elf_binary+0x291/0x1620
[ 1772.588815]  ? sched_clock+0x5/0x10
[ 1772.588817]  ? search_binary_handler+0x6d/0x240
[ 1772.588820]  search_binary_handler+0x80/0x240
[ 1772.588823]  load_script+0x201/0x220
[ 1772.588825]  search_binary_handler+0x80/0x240
[ 1772.588828]  __do_execve_file.isra.32+0x7d2/0xa60
[ 1772.588832]  ? strncpy_from_user+0x40/0x180
[ 1772.588835]  __x64_sys_execve+0x34/0x40
[ 1772.588838]  do_syscall_64+0x60/0x1c0

The warning gets triggered by an ancient lockdep check in the freezer:

(gdb) list *0xffffffff812ece06
0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57).
52	 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
53	 * If try_to_freeze causes a lockdep warning it means the caller may deadlock
54	 */
55	static inline bool try_to_freeze_unsafe(void)
56	{
57		might_sleep();
58		if (likely(!freezing(current)))
59			return false;
60		return __refrigerator(false);
61	}

I reviewed the ->cred_guard_mutex code, and the mutex is held across all
of exec() - and we always did this.

But there's this recent -rc4 commit:

> Chanho Min (1):
>       exec: make de_thread() freezable

  c22397888f: exec: make de_thread() freezable

I believe this commit is bogus, you cannot call try_to_freeze() from
de_thread(), because it's holding the ->cred_guard_mutex."

Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-04 16:04:20 +01:00
Qu Wenruo
10950929e9 btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
[BUG]
A completely valid btrfs will refuse to mount, with error message like:
  BTRFS critical (device sdb2): corrupt leaf: root=2 block=239681536 slot=172 \
    bg_start=12018974720 bg_len=10888413184, invalid block group size, \
    have 10888413184 expect (0, 10737418240]

This has been reported several times as the 4.19 kernel is now being
used. The filesystem refuses to mount, but is otherwise ok and booting
4.18 is a workaround.

Btrfs check returns no error, and all kernels used on this fs is later
than 2011, which should all have the 10G size limit commit.

[CAUSE]
For a 12 devices btrfs, we could allocate a chunk larger than 10G due to
stripe stripe bump up.

__btrfs_alloc_chunk()
|- max_stripe_size = 1G
|- max_chunk_size = 10G
|- data_stripe = 11
|- if (1G * 11 > 10G) {
       stripe_size = 976128930;
       stripe_size = round_up(976128930, SZ_16M) = 989855744

However the final stripe_size (989855744) * 11 = 10888413184, which is
still larger than 10G.

[FIX]
For the comprehensive check, we need to do the full check at chunk read
time, and rely on bg <-> chunk mapping to do the check.

We could just skip the length check for now.

Fixes: fce466eab7 ("btrfs: tree-checker: Verify block_group_item")
Cc: stable@vger.kernel.org # v4.19+
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-04 15:05:30 +01:00
Miklos Szeredi
ec7ba118b9 Revert "ovl: relax permission checking on underlying layers"
This reverts commit 007ea44892.

The commit broke some selinux-testsuite cases, and it looks like there's no
straightforward fix keeping the direction of this patch, so revert for now.

The original patch was trying to fix the consistency of permission checks, and
not an observed bug.  So reverting should be safe.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-04 11:31:30 +01:00
Ingo Molnar
4bbfd7467c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

- Replace calls of RCU-bh and RCU-sched update-side functions
  to their vanilla RCU counterparts.  This series is a step
  towards complete removal of the RCU-bh and RCU-sched update-side
  functions.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- Documentation updates, including a number of flavor-consolidation
  updates from Joel Fernandes.

- Miscellaneous fixes.

- Automate generation of the initrd filesystem used for
  rcutorture testing.

- Convert spin_is_locked() assertions to instead use lockdep.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- SRCU updates, especially including a fix from Dennis Krein
  for a bag-on-head-class bug.

- RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04 07:52:30 +01:00
ruippan (潘睿)
e647e29196 ext4: fix EXT4_IOC_GROUP_ADD ioctl
Commit e2b911c535 ("ext4: clean up feature test macros with
predicate functions") broke the EXT4_IOC_GROUP_ADD ioctl.  This was
not noticed since only very old versions of resize2fs (before
e2fsprogs 1.42) use this ioctl.  However, using a new kernel with an
enterprise Linux userspace will cause attempts to use online resize to
fail with "No reserved GDT blocks".

Fixes: e2b911c535 ("ext4: clean up feature test macros with predicate...")
Cc: stable@kernel.org # v4.4
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: ruippan (潘睿) <ruippan@tencent.com>
2018-12-04 01:04:12 -05:00
Eric Sandeen
361d24d406 ext4: hard fail dax mount on unsupported devices
As dax inches closer to production use, an administrator should not
be surprised by silently disabling the feature they asked for.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04 00:46:39 -05:00
Chengguang Xu
50c15df69e ext4: remove redundant condition check
ext4_xattr_destroy_cache() can handle NULL pointer correctly,
so there is no need to check NULL pointer before calling
ext4_xattr_destroy_cache().

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04 00:24:42 -05:00