Commit Graph

56429 Commits

Author SHA1 Message Date
Darrick J. Wong
2e6f27561b xfs: cross-reference inode btrees during scrub
Cross-reference the inode btrees with the other metadata when we
scrub the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:45 -08:00
Darrick J. Wong
e1134b12fd xfs: cross-reference bnobt records with cntbt
Scrub should make sure that each bnobt record has a corresponding
cntbt record.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:45 -08:00
Darrick J. Wong
52dc4b44af xfs: cross-reference with the bnobt
When we're scrubbing various btrees, cross-reference the records with
the bnobt to ensure that we don't also think the space is free.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:45 -08:00
Darrick J. Wong
166d76410d xfs: introduce scrubber cross-referencing stubs
Create some stubs that will be used to cross-reference metadata records.
The actual cross-referencing will be filled in by subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:45 -08:00
Darrick J. Wong
858333dcf0 xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree
When scanning a metadata btree block, cross-reference the block location
with the free space btree and the reverse mapping btree to ensure that
the rmapbt knows about the block and the bnobt does not.  Add a
mechanism to defer checks when we happen to be scanning the bnobt/rmapbt
itself because it's less efficient to repeatedly clone and destroy the
cursor.

This patch provides the framework to make btree block owner checks
happen; the actual meat will be added in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:45 -08:00
Darrick J. Wong
9a7e269566 xfs: fix a few erroneous process_error calls in the scrubbers
There are a few places where we make a libxfs api call on behalf of some
object other than the one we're scrubbing but inadvertently call the
regular process_error function.  When this happens we mark the object
corrupt even though it was corruption in /some other/ object that
actually produced the -EFSCORRUPTED code.  The correct output flag for
these situations is SCRUB_OFLAG_XFAIL, not SCRUB_OFLAG_CORRUPT, so fix
this now that we also have a helper to set these.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:45 -08:00
Darrick J. Wong
64b12563b2 xfs: set up scrub cross-referencing helpers
Create some helper functions that we'll use later to deal with problems
we might encounter while cross referencing metadata with other metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:44 -08:00
Darrick J. Wong
49db55eca5 xfs: add scrub cross-referencing helpers for the refcount btrees
Add a couple of functions to the refcount btrees that will be used
to cross-reference metadata against the refcountbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:44 -08:00
Darrick J. Wong
ed7c52d4bf xfs: add scrub cross-referencing helpers for the rmap btrees
Add a couple of functions to the rmap btrees that will be used
to cross-reference metadata against the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:44 -08:00
Darrick J. Wong
2e001266b6 xfs: add scrub cross-referencing helpers for the inode btrees
Add a couple of functions to the inode btrees that will be used
to cross-reference metadata against the inobt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:44 -08:00
Darrick J. Wong
ce1d802e6a xfs: add scrub cross-referencing helpers for the free space btrees
Add a couple of functions to the free space btrees that will be used
to cross-reference metadata against the bnobt/cntbt, and a generic
btree function that provides the real implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17 21:00:44 -08:00
Rock Lee
b3e7383937 ubifs: remove error message in ubifs_xattr_get
There is a situation that other modules, like overlayfs, try to get
xattr value with a small buffer, if they get -ERANGE, they will try
again with the proper buffer size. No need to report an error.

Signed-off-by: Rock Lee <rli@sierrawireless.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-18 00:18:49 +01:00
Eric Biggers
252153ba51 ubifs: switch to fscrypt_prepare_setattr()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17 21:48:05 +01:00
Eric Biggers
a0b3ccd963 ubifs: switch to fscrypt_prepare_lookup()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17 21:48:04 +01:00
Eric Biggers
0c1ad5242d ubifs: switch to fscrypt_prepare_rename()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17 21:48:04 +01:00
Eric Biggers
5653878c8c ubifs: switch to fscrypt_prepare_link()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17 21:48:03 +01:00
Eric Biggers
7e35c4dac3 ubifs: switch to fscrypt_file_open()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17 21:48:03 +01:00
Geert Uytterhoeven
c877154d30 ubifs: Fix uninitialized variable in search_dh_cookie()
fs/ubifs/tnc.c: In function ‘search_dh_cookie’:
fs/ubifs/tnc.c:1893: warning: ‘err’ is used uninitialized in this function

Indeed, err is always used uninitialized.

According to an original review comment from Hyunchul, acknowledged by
Richard, err should be initialized to -ENOENT to avoid the first call to
tnc_next().  But we can achieve the same by reordering the code.

Fixes: 781f675e2d ("ubifs: Fix unlink code wrt. double hash lookups")
Reported-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17 19:28:53 +01:00
Andreas Gruenbacher
bdba0d5ec1 Turn gfs2_block_truncate_page into gfs2_block_zero_range
Turn gfs2_block_truncate_page into a function that zeroes a range within
a block rather than only the end of a block.  This will be used for
cleaning the end of the first partial block and the start of the last
partial block when punching a hole in a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:53 -07:00
Andreas Gruenbacher
cb7f0903ef gfs2: Improve non-recursive delete algorithm
In rare cases, the current non-recursive delete algorithm doesn't
deallocate empty intermediary indirect blocks.  This should have very
little practical effect, but deallocating all blocks correctly should
still be preferable as it is cleaner and easier to validate.

The fix consists of using the first block to deallocate to compute the
start marker of the truncate point instead of the last block that needs
to be kept.  With that change, computing which indirect blocks are still
needed becomes relatively easy.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:52 -07:00
Andreas Gruenbacher
c3ce5aa9b0 gfs2: Fix metadata read-ahead during truncate
The metadata read-ahead algorithm broke when switching from recursive to
non-recursive delete: the current algorithm reads ahead blocks at height
N - 1 while deallocating the blocks at hight N.  However, deallocating
the blocks at height N requires a complete walk of the metadata tree,
not only down to height N - 1.  Consequently, all blocks below height
N - 1 will be accessed without read-ahead.

Fix this by issuing read-aheads as early as possible, after each
metapath lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:50 -07:00
Andreas Gruenbacher
e8b43fe0c1 gfs2: Clean up {lookup,fillup}_metapath
Split out the entire lookup loop from lookup_metapath and
fillup_metapath.  Make both functions return the actual height in
mp->mp_aheight, and return 0 on success.  Handle lookup errors properly
in trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:48 -07:00
Andreas Gruenbacher
e7fdf00406 gfs2: Remove minor gfs2_journaled_truncate inefficiencies
First, this function truncates the file in chunks.  When the original
file size isn't block aligned, each chunk that is truncated will remain
be misaligned.  This is inefficient.

Second, this function doesn't recognize where holes are, so it loops
through them.  For each chunk of a hole, it creates a new transaction.
At least avoid creating another transactions whe the current one is
still empty.  (An better fix would be to skip large holes, of course.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:47 -07:00
Andreas Gruenbacher
8b5860a35c gfs2: truncate: Remove unnecessary oldsize parameters
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:45 -07:00
Andreas Gruenbacher
80990f404d gfs2: Clean up trunc_start error path
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:42 -07:00
Andreas Gruenbacher
da5eb9cdda gfs2: Remove pointless BUG_ON
The current transaction is being dereferenced before asserting that is
not NULL; that isn't going to help.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:35 -07:00
Steven Whitehouse
90bcab998d gfs2: Add gfs2_blk2rgrpd comment and fix incorrect use
Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching.  Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:34:24 -07:00
Jaegeuk Kim
7c2e59632b f2fs: add resgid and resuid to reserve root blocks
This patch adds mount options to reserve some blocks via resgid=%u,resuid=%u.
It only activates with reserve_root=%u.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:02 -08:00
Yufen Yu
578c647879 f2fs: implement cgroup writeback support
Cgroup writeback requires explicit support from the filesystem.
f2fs's data and node writeback IOs go through __write_data_page,
which sets fio for submiting IOs. So, we add io_wbc for fio,
associate bios with blkcg by invoking wbc_init_bio() and
account IOs issuing by wbc_account_io().
In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB.

Meta writeback IOs is left alone by this patch and will always be
attributed to the root cgroup.

The results show that f2fs can throttle writeback nicely for
data writing and file creating.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:01 -08:00
Chao Yu
bffa8d3b00 f2fs: remove unused pend_list_tag
In commit 78997b569f ("f2fs: split discard policy"), we have get rid
of using pend_list_tag field in struct discard_cmd_control, but forgot
to remove it, now do it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:00 -08:00
Chao Yu
49c60c67d2 f2fs: avoid high cpu usage in discard thread
We take very long time to finish generic/476, this is because we will
check consistence of all discard entries in global rb tree while
traversing all different granularity pending lists, even when the list
is empty, in order to avoid that unneeded overhead, we have to skip
the check when coming up an empty list.

generic/476 time consumption:
					cost
Before patch & w/o consistence check	57s
Before patch & w/ consistence check	1426s
After patch				78s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:00 -08:00
Wei Yongjun
94b1e10e74 f2fs: make local functions static
Fixes the following sparse warnings:

fs/f2fs/segment.c:887:6: warning:
 symbol '__check_sit_bitmap' was not declared. Should it be static?
fs/f2fs/segment.c:1327:6: warning:
 symbol 'f2fs_wait_discard_bio' was not declared. Should it be static?
fs/f2fs/super.c:1661:5: warning:
 symbol 'f2fs_get_projid' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:59 -08:00
Jaegeuk Kim
7e65be49ed f2fs: add reserved blocks for root user
This patch allows root to reserve some blocks via mount option.

"-o reserve_root=N" means N x 4KB-sized blocks for root only.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:58 -08:00
Yunlong Song
2c1905042c f2fs: check segment type in __f2fs_replace_block
In some case, the node blocks has wrong blkaddr whose segment type is
NODE, e.g., recover inode has missing xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a
result, when recovery process write checkpoint and sync nodes, the
next_blkoff of curseg is used in the segment bit map, then it will
cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:57 -08:00
Yunlei He
1eca05aa9d f2fs: update inode info to inode page for new file
After checkpoint,
 1. creat a new file A ,(with dirty inode && dirty inode page && xattr info)
 2. backgroud wb write back file A inode page (without update from inode cache)
 3. fsync file A, write back inode page of file A with inode cache info
 4. sudden power off before new checkpoint

In this case, recovery process will try to recover a zero inode
page. Inline xattr flag of file A will be miss and xattr info
will be taken as blkaddr index.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:56 -08:00
Jaegeuk Kim
f66c027ead f2fs: show precise # of blocks that user/root can use
Let's show precise # of blocks that user/root can use through bavail and bfree
respectively.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:55 -08:00
Brian Foster
c468562879 xfs: cancel tx on xfs_defer_finish() error during xattr set/remove
Chris Dunlop reports a problem where an xattr operation fails,
reports the following error to syslog and hangs during unmount:

 ================================================
 [ BUG: lock held when returning to user space! ]
 ...
 ------------------------------------------------
 <PID> is leaving the kernel with locks still held!
 1 lock held by <PID>:
  #0:  (sb_internal){......}, at: [<ffffffffa07692a3>] xfs_trans_alloc+0xe3/0x130 [xfs]

The failure/shutdown occurs during deferred ops processing which
leads to an error return from xfs_defer_finish() via
xfs_attr_leaf_addname(). While the root cause of the failure is
unknown corruption, the cause of the subsequent BUG above and
unmount hang is failure to cancel the transaction before returning
to userspace.

The transaction is not cancelled because the out_defer_cancel error
handling paths in the xfs_attr_[leaf|node]_[add|remove]name()
functions clear args.trans without releasing the transaction. The
callers therefore lose the reference to the transaction and fail to
cancel it.

Since xfs_attr_[set|remove]() always cancel args.trans when != NULL
and xfs_defer_finish()->...->xfs_trans_roll() should always return
with a valid transaction, update the leaf/node xattr functions to
not reset args.trans in the error path responsible for cancelling
deferred ops.

Reported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-16 14:53:28 -08:00
J. Bruce Fields
1b8d97b0a8 NFS: commit direct writes even if they fail partially
If some of the WRITE calls making up an O_DIRECT write syscall fail,
we neglect to commit, even if some of the WRITEs succeed.

We also depend on the commit code to free the reference count on the
nfs_page taken in the "if (request_commit)" case at the end of
nfs_direct_write_completion().  The problem was originally noticed
because ENOSPC's encountered partway through a write would result in a
closed file being sillyrenamed when it should have been unlinked.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-16 10:13:23 -05:00
Arnd Bergmann
f96adf1ea0 nfs: remove unused label in nfs_encode_fh()
The only reference to the label got removed, so we now get
a harmless compiler warning:

fs/nfs/export.c: In function 'nfs_encode_fh':
fs/nfs/export.c:58:1: error: label 'out' defined but not used [-Werror=unused-label]

Fixes: aaa1500894 ("nfs: remove dead code from nfs_encode_fh()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-16 10:12:49 -05:00
David Windsor
de04644904 cifs: Define usercopy region in cifs_request slab cache
CIFS request buffers, stored in the cifs_request slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/cifs/cifsfs.c:
        cifs_init_request_bufs():
            ...
            cifs_req_poolp = mempool_create_slab_pool(cifs_min_rcv,
                                                      cifs_req_cachep);

    fs/cifs/misc.c:
        cifs_buf_get():
            ...
            ret_buf = mempool_alloc(cifs_req_poolp, GFP_NOFS);
            ...
            return ret_buf;

In support of usercopy hardening, this patch defines a region in the
cifs_request slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:57 -08:00
David Windsor
e9a0561b7c vxfs: Define usercopy region in vxfs_inode slab cache
vxfs symlink pathnames, stored in struct vxfs_inode_info field
vii_immed.vi_immed and therefore contained in the vxfs_inode slab cache,
need to be copied to/from userspace.

cache object allocation:
    fs/freevxfs/vxfs_super.c:
        vxfs_alloc_inode(...):
            ...
            vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
            ...
            return &vi->vfs_inode;

    fs/freevxfs/vxfs_inode.c:
        cxfs_iget(...):
            ...
            inode->i_link = vip->vii_immed.vi_immed;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
vxfs_inode slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:57 -08:00
David Windsor
df5f3cfc52 ufs: Define usercopy region in ufs_inode_cache slab cache
The ufs symlink pathnames, stored in struct ufs_inode_info.i_u1.i_symlink
and therefore contained in the ufs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/ufs/super.c:
        ufs_alloc_inode(...):
            ...
            ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ufs/ufs.h:
        UFS_I(struct inode *inode):
            return container_of(inode, struct ufs_inode_info, vfs_inode);

    fs/ufs/namei.c:
        ufs_symlink(...):
            ...
            inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ufs_inode_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:56 -08:00
David Windsor
6b330623e5 orangefs: Define usercopy region in orangefs_inode_cache slab cache
orangefs symlink pathnames, stored in struct orangefs_inode_s.link_target
and therefore contained in the orangefs_inode_cache, need to be copied
to/from userspace.

cache object allocation:
    fs/orangefs/super.c:
        orangefs_alloc_inode(...):
            ...
            orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, ...);
            ...
            return &orangefs_inode->vfs_inode;

    fs/orangefs/orangefs-utils.c:
        exofs_symlink(...):
            ...
            inode->i_link = orangefs_inode->link_target;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
orangefs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:55 -08:00
David Windsor
2b06a9e336 exofs: Define usercopy region in exofs_inode_cache slab cache
The exofs short symlink names, stored in struct exofs_i_info.i_data and
therefore contained in the exofs_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
    fs/exofs/super.c:
        exofs_alloc_inode(...):
            ...
            oi = kmem_cache_alloc(exofs_inode_cachep, GFP_KERNEL);
            ...
            return &oi->vfs_inode;

    fs/exofs/namei.c:
        exofs_symlink(...):
            ...
            inode->i_link = (char *)oi->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
exofs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:55 -08:00
David Windsor
0fc256d3ad befs: Define usercopy region in befs_inode_cache slab cache
befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
and therefore contained in the befs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/befs/linuxvfs.c:
        befs_alloc_inode(...):
            ...
            bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
            ...
            return &bi->vfs_inode;

        befs_iget(...):
            ...
            strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
                    BEFS_SYMLINK_LEN);
            ...
            inode->i_link = befs_ino->i_data.symlink;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
befs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Luis de Bethencourt <luisbg@kernel.org>
Cc: Salah Triki <salah.triki@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis de Bethencourt <luisbg@kernel.org>
2018-01-15 12:07:54 -08:00
David Windsor
8d2704d382 jfs: Define usercopy region in jfs_ip slab cache
The jfs symlink pathnames, stored in struct jfs_inode_info.i_inline and
therefore contained in the jfs_ip slab cache, need to be copied to/from
userspace.

cache object allocation:
    fs/jfs/super.c:
        jfs_alloc_inode(...):
            ...
            jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
            ...
            return &jfs_inode->vfs_inode;

    fs/jfs/jfs_incore.h:
        JFS_IP(struct inode *inode):
            return container_of(inode, struct jfs_inode_info, vfs_inode);

    fs/jfs/inode.c:
        jfs_iget(...):
            ...
            inode->i_link = JFS_IP(inode)->i_inline;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
jfs_ip slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2018-01-15 12:07:53 -08:00
David Windsor
85212d4e04 ext2: Define usercopy region in ext2_inode_cache slab cache
The ext2 symlink pathnames, stored in struct ext2_inode_info.i_data and
therefore contained in the ext2_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
    fs/ext2/super.c:
        ext2_alloc_inode(...):
            struct ext2_inode_info *ei;
            ...
            ei = kmem_cache_alloc(ext2_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ext2/ext2.h:
        EXT2_I(struct inode *inode):
            return container_of(inode, struct ext2_inode_info, vfs_inode);

    fs/ext2/namei.c:
        ext2_symlink(...):
            ...
            inode->i_link = (char *)&EXT2_I(inode)->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined into vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext2_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Kara <jack@suse.cz>
2018-01-15 12:07:53 -08:00
David Windsor
f8dd7c7086 ext4: Define usercopy region in ext4_inode_cache slab cache
The ext4 symlink pathnames, stored in struct ext4_inode_info.i_data
and therefore contained in the ext4_inode_cache slab cache, need
to be copied to/from userspace.

cache object allocation:
    fs/ext4/super.c:
        ext4_alloc_inode(...):
            struct ext4_inode_info *ei;
            ...
            ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    include/trace/events/ext4.h:
            #define EXT4_I(inode) \
                (container_of(inode, struct ext4_inode_info, vfs_inode))

    fs/ext4/namei.c:
        ext4_symlink(...):
            ...
            inode->i_link = (char *)&EXT4_I(inode)->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len)

        (inlined into vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext4_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:52 -08:00
David Windsor
6391af6f58 vfs: Copy struct mount.mnt_id to userspace using put_user()
The mnt_id field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:51 -08:00
David Windsor
6a9b88204c vfs: Define usercopy region in names_cache slab caches
VFS pathnames are stored in the names_cache slab cache, either inline
or across an entire allocation entry (when approaching PATH_MAX). These
are copied to/from userspace, so they must be entirely whitelisted.

cache object allocation:
    include/linux/fs.h:
        #define __getname()    kmem_cache_alloc(names_cachep, GFP_KERNEL)

example usage trace:
    strncpy_from_user+0x4d/0x170
    getname_flags+0x6f/0x1f0
    user_path_at_empty+0x23/0x40
    do_mount+0x69/0xda0
    SyS_mount+0x83/0xd0

    fs/namei.c:
        getname_flags(...):
            ...
            result = __getname();
            ...
            kname = (char *)result->iname;
            result->name = kname;
            len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
            ...
            if (unlikely(len == EMBEDDED_NAME_MAX)) {
                const size_t size = offsetof(struct filename, iname[1]);
                kname = (char *)result;

                result = kzalloc(size, GFP_KERNEL);
                ...
                result->name = kname;
                len = strncpy_from_user(kname, filename, PATH_MAX);

In support of usercopy hardening, this patch defines the entire cache
object in the names_cache slab cache as whitelisted, since it may entirely
hold name strings to be copied to/from userspace.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, add usage trace]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:50 -08:00