Commit Graph

60722 Commits

Author SHA1 Message Date
Deepa Dinamani
8833293d0a fs: omfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: me@bobcopeland.com
Cc: linux-karma-devel@lists.sourceforge.net
2019-08-30 08:11:25 -07:00
Deepa Dinamani
cdd62b5b07 fs: hpfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also change the local_to_gmt() to use time64_t instead
of time32_t.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: mikulas@artax.karlin.mff.cuni.cz
2019-08-30 08:11:25 -07:00
Deepa Dinamani
028ca4db0a fs: ceph: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

According to the disscussion in
https://patchwork.kernel.org/patch/8308691/ we agreed to use
unsigned 32 bit timestamps on ceph.
Update the limits accordingly.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: zyan@redhat.com
Cc: sage@redhat.com
Cc: idryomov@gmail.com
Cc: ceph-devel@vger.kernel.org
2019-08-30 07:27:19 -07:00
Deepa Dinamani
452c277941 fs: sysv: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: hch@infradead.org
2019-08-30 07:27:18 -07:00
Deepa Dinamani
487b25bc4b fs: affs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also fix timestamp calculation to avoid overflow
while converting from days to seconds.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: dsterba@suse.com
2019-08-30 07:27:18 -07:00
Deepa Dinamani
c0da64f6bb fs: fat: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Some FAT variants indicate that the years after 2099 are not supported.
Since commit 7decd1cb03 ("fat: Fix and cleanup timestamp conversion")
we support the full range of years that can be represented, up to 2107.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: hirofumi@mail.parknet.co.jp
2019-08-30 07:27:18 -07:00
Deepa Dinamani
cb7a69e605 fs: cifs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also fixed cnvrtDosUnixTm calculations to avoid int overflow
while computing maximum date.

References:

http://cifs.com/

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: sfrench@samba.org
Cc: linux-cifs@vger.kernel.org
2019-08-30 07:27:18 -07:00
Deepa Dinamani
1fcb79c1b2 fs: nfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

The time formats for various verious is detailed in the
RFCs as below:

https://tools.ietf.org/html/rfc7862(time metadata)
https://tools.ietf.org/html/rfc7530:

nfstime4

   struct nfstime4 {
           int64_t         seconds;
           uint32_t        nseconds;
   };

https://tools.ietf.org/html/rfc1094

          struct timeval {
              unsigned int seconds;
              unsigned int useconds;
          };

https://tools.ietf.org/html/rfc1813

struct nfstime3 {
         uint32   seconds;
         uint32   nseconds;
      };

Use the limits as per the RFC.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: trond.myklebust@hammerspace.com
Cc: anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
2019-08-30 07:27:18 -07:00
Deepa Dinamani
4881c4971d ext4: Initialize timestamps limits
ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.

The timestamp limits are calculated according to the
encoding table in
a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):

* extra  msb of                         adjust for signed
* epoch  32-bit                         32-bit tv_sec to
* bits   time    decoded 64-bit tv_sec  64-bit tv_sec      valid time range
* 0 0    1    -0x80000000..-0x00000001  0x000000000   1901-12-13..1969-12-31
* 0 0    0    0x000000000..0x07fffffff  0x000000000   1970-01-01..2038-01-19
* 0 1    1    0x080000000..0x0ffffffff  0x100000000   2038-01-19..2106-02-07
* 0 1    0    0x100000000..0x17fffffff  0x100000000   2106-02-07..2174-02-25
* 1 0    1    0x180000000..0x1ffffffff  0x200000000   2174-02-25..2242-03-16
* 1 0    0    0x200000000..0x27fffffff  0x200000000   2242-03-16..2310-04-04
* 1 1    1    0x280000000..0x2ffffffff  0x300000000   2310-04-04..2378-04-22
* 1 1    0    0x300000000..0x37fffffff  0x300000000   2378-04-22..2446-05-10

Note that the time limits are not correct for deletion times.

Added a warn when an inode cannot be extended to incorporate an
extended timestamp.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: tytso@mit.edu
Cc: adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org
2019-08-30 07:27:18 -07:00
Deepa Dinamani
d5c6e2d518 9p: Fill min and max timestamps in sb
struct p9_wstat and struct p9_stat_dotl indicate that the
wire transport uses u32 and u64 fields for timestamps.
Fill in the appropriate limits to avoid inconsistencies in
the vfs cached inode times when timestamps are outside the
permitted range.

Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX.
This is because that is the upper bound supported by vfs.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: ericvh@gmail.com
Cc: lucho@ionkov.net
Cc: asmadeus@codewreck.org
Cc: v9fs-developer@lists.sourceforge.net
2019-08-30 07:27:18 -07:00
Deepa Dinamani
22b139691f fs: Fill in max and min timestamps in superblock
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Even though some filesystems are read-only, fill in the
timestamps to reflect the on-disk representation.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-By: Tigran Aivazian <aivazian.tigran@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: aivazian.tigran@gmail.com
Cc: al@alarsen.net
Cc: coda@cs.cmu.edu
Cc: darrick.wong@oracle.com
Cc: dushistov@mail.ru
Cc: dwmw2@infradead.org
Cc: hch@infradead.org
Cc: jack@suse.com
Cc: jaharkes@cs.cmu.edu
Cc: luisbg@kernel.org
Cc: nico@fluxnic.net
Cc: phillip@squashfs.org.uk
Cc: richard@nod.at
Cc: salah.triki@gmail.com
Cc: shaggy@kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: reiserfs-devel@vger.kernel.org
2019-08-30 07:27:17 -07:00
Deepa Dinamani
42e729b9dd utimes: Clamp the timestamps before update
POSIX is ambiguous on the behavior of timestamps for
futimens, utimensat and utimes. Whether to return an
error or silently clamp a timestamp beyond the range
supported by the underlying filesystems is not clear.

POSIX.1 section for futimens, utimensat and utimes says:
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html)

The file's relevant timestamp shall be set to the greatest
value supported by the file system that is not greater
than the specified time.

If the tv_nsec field of a timespec structure has the special
value UTIME_NOW, the file's relevant timestamp shall be set
to the greatest value supported by the file system that is
not greater than the current time.

[EINVAL]
    A new file timestamp would be a value whose tv_sec
    component is not a value supported by the file system.

The patch chooses to clamp the timestamps according to the
filesystem timestamp ranges and does not return an error.
This is in line with the behavior of utime syscall also
since the POSIX page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html)
for utime does not mention returning an error or clamping like above.

Same for utimes http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30 07:27:17 -07:00
Deepa Dinamani
f8b92ba67c mount: Add mount warning for impending timestamp expiry
The warning reuses the uptime max of 30 years used by
settimeofday().

Note that the warning is only emitted for writable filesystem mounts
through the mount syscall. Automounts do not have the same warning.

Print out the warning in human readable format using the struct tm.
After discussion with Arnd Bergmann, we chose to print only the year number.
The raw s_time_max is also displayed, and the user can easily decode
it e.g. "date -u -d @$((0x7fffffff))". We did not want to consolidate
struct rtc_tm and struct tm just to print the date using a format specifier
as part of this series.
Given that the rtc_tm is not compiled on all architectures, this is not a
trivial patch. This can be added in the future.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30 07:27:17 -07:00
Deepa Dinamani
3818c1907a timestamp_truncate: Replace users of timespec64_trunc
Update the inode timestamp updates to use timestamp_truncate()
instead of timespec64_trunc().

The change was mostly generated by the following coccinelle
script.

virtual context
virtual patch

@r1 depends on patch forall@
struct inode *inode;
identifier i_xtime =~ "^i_[acm]time$";
expression e;
@@

inode->i_xtime =
- timespec64_trunc(
+ timestamp_truncate(
...,
- e);
+ inode);

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: adrian.hunter@intel.com
Cc: dedekind1@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: hch@lst.de
Cc: jaegeuk@kernel.org
Cc: jlbec@evilplan.org
Cc: richard@nod.at
Cc: tj@kernel.org
Cc: yuchao0@huawei.com
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: linux-mtd@lists.infradead.org
2019-08-30 07:27:17 -07:00
Deepa Dinamani
50e17c000c vfs: Add timestamp_truncate() api
timespec_trunc() function is used to truncate a
filesystem timestamp to the right granularity.
But, the function does not clamp tv_sec part of the
timestamps according to the filesystem timestamp limits.

The replacement api: timestamp_truncate() also alters the
signature of the function to accommodate filesystem
timestamp clamping according to flesystem limits.

Note that the tv_nsec part is set to 0 if tv_sec is not within
the range supported for the filesystem.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30 07:27:17 -07:00
Deepa Dinamani
188d20bcd1 vfs: Add file timestamp range support
Add fields to the superblock to track the min and max
timestamps supported by filesystems.

Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.

Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30 07:27:17 -07:00
Tejun Heo
3a8e9ac89e writeback: add tracepoints for cgroup foreign writebacks
cgroup foreign inode handling has quite a bit of heuristics and
internal states which sometimes makes it difficult to understand
what's going on.  Add tracepoints to improve visibility.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-30 07:42:49 -06:00
Gao Xiang
097a802ae1 erofs: reduntant assignment in __erofs_get_meta_page()
As Joe Perches suggested [1],
 		err = bio_add_page(bio, page, PAGE_SIZE, 0);
-		if (unlikely(err != PAGE_SIZE)) {
+		if (err != PAGE_SIZE) {
 			err = -EFAULT;
 			goto err_out;
 		}

The initial assignment to err is odd as it's not
actually an error value -E<FOO> but a int size
from a unsigned int len.

Here the return is either 0 or PAGE_SIZE.

This would be more legible to me as:

		if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) {
			err = -EFAULT;
			goto err_out;
		}

[1] https://lore.kernel.org/r/74c4784319b40deabfbaea92468f7e3ef44f1c96.camel@perches.com/
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190829171741.225219-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-30 09:02:10 +02:00
Gao Xiang
8d8a09b093 erofs: remove all likely/unlikely annotations
As Dan Carpenter suggested [1], I have to remove
all erofs likely/unlikely annotations.

[1] https://lore.kernel.org/linux-fsdevel/20190829154346.GK23584@kadam/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190829163827.203274-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-30 09:02:02 +02:00
Darrick J. Wong
e7ee96dfb8 xfs: remove all *_ITER_ABORT values
Use -ECANCELED to signal "stop iterating" instead of these magical
*_ITER_ABORT values, since it's duplicative.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-29 21:22:41 -07:00
Linus Torvalds
2653810049 Merge tag '5.3-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "A few small SMB3 fixes, and a larger one to fix various older string
  handling functions"

* tag '5.3-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  cifs: replace various strncpy with strscpy and similar
  cifs: Use kzfree() to zero out the password
  cifs: set domainName when a domain-key is used in multiuser
2019-08-29 17:51:23 -07:00
J. Bruce Fields
2b86e3aaf9 nfsd: eliminate an unnecessary acl size limit
We're unnecessarily limiting the size of an ACL to less than what most
filesystems will support.  Some users do hit the limit and it's
confusing and unnecessary.

It still seems prudent to impose some limit on the number of ACEs the
client gives us before passing it straight to kmalloc().  So, let's just
limit it to the maximum number that would be possible given the amount
of data left in the argument buffer.

That will still leave one limit beyond whatever the filesystem imposes:
the client and server negotiate a limit on the size of a request, which
we have to respect.

But we're no longer imposing any additional arbitrary limit.

struct nfs4_ace is 20 bytes on my system and the maximum call size we'll
negotiate is about a megabyte, so in practice this is limiting the
allocation here to about a megabyte.

Reported-by: "de Vandiere, Louis" <louis.devandiere@atos.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-28 21:13:45 -04:00
Eric Sandeen
7f313eda8f xfs: log proper length of btree block in scrub/repair
xfs_trans_log_buf() takes a final argument of the last byte to
log in the buffer; b_length is in basic blocks, so this isn't
the correct last byte.  Fix it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-28 08:31:02 -07:00
Darrick J. Wong
ffb5696f75 xfs: reinitialize rm_flags when unpacking an offset into an rmap irec
In xfs_rmap_irec_offset_unpack, we should always clear the contents of
rm_flags before we begin unpacking the encoded (ondisk) offset into the
incore rm_offset and incore rm_flags fields.  Remove the open-coded
field zeroing as this encourages api misuse.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:02 -07:00
Darrick J. Wong
3e08f42ae7 xfs: remove unnecessary int returns from deferred bmap functions
Remove the return value from the functions that schedule deferred bmap
operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:02 -07:00
Darrick J. Wong
74b4c5d4a9 xfs: remove unnecessary int returns from deferred refcount functions
Remove the return value from the functions that schedule deferred
refcount operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:02 -07:00
Darrick J. Wong
bc46ac6471 xfs: remove unnecessary int returns from deferred rmap functions
Remove the return value from the functions that schedule deferred rmap
operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:01 -07:00
Darrick J. Wong
2ca09177ab xfs: remove unnecessary parameter from xfs_iext_inc_seq
This function doesn't use the @state parameter, so get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:01 -07:00
Darrick J. Wong
b521c89027 xfs: fix sign handling problem in xfs_bmbt_diff_two_keys
In xfs_bmbt_diff_two_keys, we perform a signed int64_t subtraction with
two unsigned 64-bit quantities.  If the second quantity is actually the
"maximum" key (all ones) as used in _query_all, the subtraction
effectively becomes addition of two positive numbers and the function
returns incorrect results.  Fix this with explicit comparisons of the
unsigned values.  Nobody needs this now, but the online repair patches
will need this to work properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:01 -07:00
Darrick J. Wong
7380e8fec1 xfs: don't return _QUERY_ABORT from xfs_rmap_has_other_keys
The xfs_rmap_has_other_keys helper aborts the iteration as soon as it
has an answer.  Don't let this abort leak out to callers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28 08:31:01 -07:00
Darrick J. Wong
c94613feef xfs: fix maxicount division by zero error
In xfs_ialloc_setup_geometry, it's possible for a malicious/corrupt fs
image to set an unreasonably large value for sb_inopblog which will
cause ialloc_blks to be zero.  If sb_imax_pct is also set, this results
in a division by zero error in the second do_div call.  Therefore, force
maxicount to zero if ialloc_blks is zero.

Note that the kernel metadata verifiers will catch the garbage inopblog
value and abort the fs mount long before it tries to set up the inode
geometry; this is needed to avoid a crash in xfs_db while setting up the
xfs_mount structure.

Found by fuzzing sb_inopblog to 122 in xfs/350.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2019-08-28 08:31:01 -07:00
zhangyi (F)
9ba55543fc ext4: fix integer overflow when calculating commit interval
If user specify a large enough value of "commit=" option, it may trigger
signed integer overflow which may lead to sbi->s_commit_interval becomes
a large or small value, zero in particular.

UBSAN: Undefined behaviour in ../fs/ext4/super.c:1592:31
signed integer overflow:
536870912 * 1000 cannot be represented in type 'int'
[...]
Call trace:
[...]
[<ffffff9008a2d120>] ubsan_epilogue+0x34/0x9c lib/ubsan.c:166
[<ffffff9008a2d8b8>] handle_overflow+0x228/0x280 lib/ubsan.c:197
[<ffffff9008a2d95c>] __ubsan_handle_mul_overflow+0x4c/0x68 lib/ubsan.c:218
[<ffffff90086d070c>] handle_mount_opt fs/ext4/super.c:1592 [inline]
[<ffffff90086d070c>] parse_options+0x1724/0x1a40 fs/ext4/super.c:1773
[<ffffff90086d51c4>] ext4_remount+0x2ec/0x14a0 fs/ext4/super.c:4834
[...]

Although it is not a big deal, still silence the UBSAN by limit the
input value.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-08-28 11:25:01 -04:00
Yang Guo
520f897a35 ext4: use percpu_counters for extent_status cache hits/misses
@es_stats_cache_hits and @es_stats_cache_misses are accessed frequently in
ext4_es_lookup_extent function, it would influence the ext4 read/write
performance in NUMA system. Let's optimize it using percpu_counter,
it is profitable for the performance.

The test command is as below:
fio -name=randwrite -numjobs=8 -filename=/mnt/test1 -rw=randwrite
-ioengine=libaio -direct=1 -iodepth=64 -sync=0 -norandommap
-group_reporting -runtime=120 -time_based -bs=4k -size=5G

And the result is better 10% than the initial implement:
without the patch,IOPS=197k, BW=770MiB/s (808MB/s)(90.3GiB/120002msec)
with the patch,  IOPS=218k, BW=852MiB/s (894MB/s)(99.9GiB/120002msec)

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
2019-08-28 11:19:23 -04:00
zhangyi (F)
7727ae5297 ext4: fix potential use after free after remounting with noblock_validity
Remount process will release system zone which was allocated before if
"noblock_validity" is specified. If we mount an ext4 file system to two
mountpoints with default mount options, and then remount one of them
with "noblock_validity", it may trigger a use after free problem when
someone accessing the other one.

 # mount /dev/sda foo
 # mount /dev/sda bar

User access mountpoint "foo"   |   Remount mountpoint "bar"
                               |
ext4_map_blocks()              |   ext4_remount()
check_block_validity()         |   ext4_setup_system_zone()
ext4_data_block_valid()        |   ext4_release_system_zone()
                               |   free system_blks rb nodes
access system_blks rb nodes    |
trigger use after free         |

This problem can also be reproduced by one mountpint, At the same time,
add_system_zone() can get called during remount as well so there can be
racing ext4_data_block_valid() reading the rbtree at the same time.

This patch add RCU to protect system zone from releasing or building
when doing a remount which inverse current "noblock_validity" mount
option. It assign the rbtree after the whole tree was complete and
do actual freeing after rcu grace period, avoid any intermediate state.

Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-08-28 11:13:24 -04:00
Steve French
36e337744c cifs: update internal module number
To 2.22

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27 17:29:56 -05:00
Ronnie Sahlberg
340625e618 cifs: replace various strncpy with strscpy and similar
Using strscpy is cleaner, and avoids some problems with
handling maximum length strings.  Linus noticed the
original problem and Aurelien pointed out some additional
problems. Fortunately most of this is SMB1 code (and
in particular the ASCII string handling older, which
is less common).

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27 17:25:12 -05:00
Dan Carpenter
478228e57f cifs: Use kzfree() to zero out the password
It's safer to zero out the password so that it can never be disclosed.

Fixes: 0c219f5799c7 ("cifs: set domainName when a domain-key is used in multiuser")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27 16:44:27 -05:00
Ronnie Sahlberg
f2aee329a6 cifs: set domainName when a domain-key is used in multiuser
RHBZ: 1710429

When we use a domain-key to authenticate using multiuser we must also set
the domainnmame for the new volume as it will be used and passed to the server
in the NTLMSSP Domain-name.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27 16:44:24 -05:00
Linus Torvalds
9e8312f5e1 Merge tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Fix a page lock leak in nfs_pageio_resend()

   - Ensure O_DIRECT reports an error if the bytes read/written is 0

   - Don't handle errors if the bind/connect succeeded

   - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was
     invalidat ed"

  Bugfixes:

   - Don't refresh attributes with mounted-on-file information

   - Fix return values for nfs4_file_open() and nfs_finish_open()

   - Fix pnfs layoutstats reporting of I/O errors

   - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort
     for soft I/O errors when the user specifies a hard mount.

   - Various fixes to the error handling in sunrpc

   - Don't report writepage()/writepages() errors twice"

* tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: remove set but not used variable 'mapping'
  NFSv2: Fix write regression
  NFSv2: Fix eof handling
  NFS: Fix writepage(s) error handling to not report errors twice
  NFS: Fix spurious EIO read errors
  pNFS/flexfiles: Don't time out requests on hard mounts
  SUNRPC: Handle connection breakages correctly in call_status()
  Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
  SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
  pNFS/flexfiles: Turn off soft RPC calls
  SUNRPC: Don't handle errors if the bind/connect succeeded
  NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()
  NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
  NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
  NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
  NFSv4: Fix return value in nfs_finish_open()
  NFSv4: Fix return values for nfs4_file_open()
  NFS: Don't refresh attributes with mounted-on-file information
2019-08-27 13:22:57 -07:00
Hristo Venev
75b28affdd io_uring: allocate the two rings together
Both the sq and the cq rings have sizes just over a power of two, and
the sq ring is significantly smaller. By bundling them in a single
alllocation, we get the sq ring for free.

This also means that IORING_OFF_SQ_RING and IORING_OFF_CQ_RING now mean
the same thing. If we indicate this to userspace, we can save a mmap
call.

Signed-off-by: Hristo Venev <hristo@venev.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:42:02 -06:00
John Hubbard
27c4d3a325 fs/io_uring.c: convert put_page() to put_user_page*()
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca
("mm: introduce put_user_page*(), placeholder versions").

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 10:41:41 -06:00
Tejun Heo
d62241c7a4 writeback, memcg: Implement cgroup_writeback_by_id()
Implement cgroup_writeback_by_id() which initiates cgroup writeback
from bdi and memcg IDs.  This will be used by memcg foreign inode
flushing.

v2: Use wb_get_lookup() instead of wb_get_create() to avoid creating
    spurious wbs.

v3: Interpret 0 @nr as 1.25 * nr_dirty to implement best-effort
    flushing while avoding possible livelocks.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 09:22:38 -06:00
Tejun Heo
5b9cce4c7e writeback: Generalize and expose wb_completion
wb_completion is used to track writeback completions.  We want to use
it from memcg side for foreign inode flushes.  This patch updates it
to remember the target waitq instead of assuming bdi->wb_waitq and
expose it outside of fs-writeback.c.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-27 09:22:38 -06:00
YueHaibing
99300a8526 NFS: remove set but not used variable 'mapping'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/nfs/write.c: In function nfs_page_async_flush:
fs/nfs/write.c:609:24: warning: variable mapping set but not used [-Wunused-but-set-variable]

It is not use since commit aefb623c422e ("NFS: Fix
writepage(s) error handling to not report errors twice")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-27 10:24:56 -04:00
Trond Myklebust
d33d4beb52 NFSv2: Fix write regression
Ensure we update the write result count on success, since the
RPC call itself does not do so.

Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
2019-08-27 10:24:56 -04:00
Trond Myklebust
71affe9be4 NFSv2: Fix eof handling
If we received a reply from the server with a zero length read and
no error, then that implies we are at eof.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-27 10:24:56 -04:00
Steven J. Magnani
c3367a1b47 udf: augment UDF permissions on new inodes
Windows presents files created within Linux as read-only, even when
permissions in Linux indicate the file should be writable.

UDF defines a slightly different set of basic file permissions than Linux.
Specifically, UDF has "delete" and "change attribute" permissions for each
access class (user/group/other). Linux has no equivalents for these.

When the Linux UDF driver creates a file (or directory), no UDF delete or
change attribute permissions are granted. The lack of delete permission
appears to cause Windows to mark an item read-only when its permissions
otherwise indicate that it should be read-write.

Fix this by having UDF delete permissions track Linux write permissions.
Also grant UDF change attribute permission to the owner when creating a
new inode.

Reported by: Ty Young
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Link: https://lore.kernel.org/r/20190827121359.9954-1-steve@digidescorp.com
Signed-off-by: Jan Kara <jack@suse.cz>
2019-08-27 15:38:46 +02:00
Darrick J. Wong
519e5869d5 xfs: bmap scrub should only scrub records once
The inode block mapping scrub function does more work for btree format
extent maps than is absolutely necessary -- first it will walk the bmbt
and check all the entries, and then it will load the incore tree and
check every entry in that tree, possibly for a second time.

Simplify the code and decrease check runtime by separating the two
responsibilities.  The bmbt walk will make sure the incore extent
mappings are loaded, check the shape of the bmap btree (via xchk_btree)
and check that every bmbt record has a corresponding incore extent map;
and the incore extent map walk takes all the responsibility for checking
the mapping records and cross referencing them with other AG metadata.

This enables us to clean up some messy parameter handling and reduce
redundant code.  Rename a few functions to make the split of
responsibilities clearer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-08-26 17:43:15 -07:00
zhengbin
71912e08e0 xfs: remove excess function parameter description in 'xfs_btree_sblock_v5hdr_verify'
Fixes gcc warning:

fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify'
fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify'

Fixes: c5ab131ba0 ("libxfs: refactor short btree block verification")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26 17:43:15 -07:00
Dave Chinner
f8f9ee4794 xfs: add kmem_alloc_io()
Memory we use to submit for IO needs strict alignment to the
underlying driver contraints. Worst case, this is 512 bytes. Given
that all allocations for IO are always a power of 2 multiple of 512
bytes, the kernel heap provides natural alignment for objects of
these sizes and that suffices.

Until, of course, memory debugging of some kind is turned on (e.g.
red zones, poisoning, KASAN) and then the alignment of the heap
objects is thrown out the window. Then we get weird IO errors and
data corruption problems because drivers don't validate alignment
and do the wrong thing when passed unaligned memory buffers in bios.

TO fix this, introduce kmem_alloc_io(), which will guaranteeat least
512 byte alignment of buffers for IO, even if memory debugging
options are turned on. It is assumed that the minimum allocation
size will be 512 bytes, and that sizes will be power of 2 mulitples
of 512 bytes.

Use this everywhere we allocate buffers for IO.

This no longer fails with log recovery errors when KASAN is enabled
due to the brd driver not handling unaligned memory buffers:

# mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26 17:43:15 -07:00