Just for clarity. This part is inside the header, so it makes sense to
group it with the rest of the stuff in the header.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Dirty snapshot data needs to be flushed unconditionally. If they
were created before truncation, writeback should use old truncate
size/seq.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
When iov_iter type is ITER_PIPE, copy_page_to_iter() increases
the page's reference and add the page to a pipe_buffer. It also
set the pipe_buffer's ops to page_cache_pipe_buf_ops. The comfirm
callback in page_cache_pipe_buf_ops expects the page is from page
cache and uptodate, otherwise it return error.
For ceph_sync_read() case, pages are not from page cache. So we
can't call copy_page_to_iter() when iov_iter type is ITER_PIPE.
The fix is using iov_iter_get_pages_alloc() to allocate pages
for the pipe. (the code is similar to default_file_splice_read)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
For readahead/fadvise cases, caller of ceph_readpages does not
hold buffer capability. Pages can be added to page cache while
there is no buffer capability. This can cause data integrity
issue.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
try_get_cap_refs can be used as a condition in a wait_event* calls.
This is all fine until it has to call __ceph_do_pending_vmtruncate,
which in turn acquires the i_truncate_mutex. This leads to a situation
in which a task's state is !TASK_RUNNING and at the same time it's
trying to acquire a sleeping primitive. In essence a nested sleeping
primitives are being used. This causes the following warning:
WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110
ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6
Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
Call Trace:
[<ffffffff812f4409>] dump_stack+0x67/0x9e
[<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0
[<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
[<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
[<ffffffff8107767f>] __might_sleep+0x9f/0xb0
[<ffffffff81612d30>] mutex_lock+0x20/0x40
[<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
[<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph]
[<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph]
[<ffffffff81094370>] ? wait_woken+0xb0/0xb0
[<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph]
[<ffffffff81613f22>] ? schedule_timeout+0x202/0x260
[<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200
[<ffffffff811b46ce>] ? iput+0x9e/0x230
[<ffffffff81077632>] ? __might_sleep+0x52/0xb0
[<ffffffff81156147>] ? __might_fault+0x37/0x40
[<ffffffff8119e123>] ? cp_new_stat+0x153/0x170
[<ffffffff81198cfa>] __vfs_write+0xaa/0xe0
[<ffffffff81199369>] vfs_write+0xa9/0x190
[<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70
[<ffffffff8119a056>] SyS_write+0x46/0xa0
This happens since wait_event_interruptible can interfere with the
mutex locking code, since they both fiddle with the task state.
Fix the issue by using the newly-added nested blocking infrastructure
in 61ada528de ("sched/wait: Provide infrastructure to deal with
nested blocking")
Link: https://lwn.net/Articles/628628/
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply. Nothing sensible can be passed from the
messenger layer anyway.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Starting with version 5 the following properties change:
- UBIFS_FLG_DOUBLE_HASH is mandatory
- UBIFS_FLG_ENCRYPTION is optional but depdens on UBIFS_FLG_DOUBLE_HASH
- Filesystems with unknown super block flags will be rejected, this
allows us in future to add new features without raising the UBIFS
write version.
Signed-off-by: Richard Weinberger <richard@nod.at>
This feature flag indicates that all directory entry nodes have a 32bit
cookie set and therefore UBIFS is allowed to perform lookups by hash.
Signed-off-by: Richard Weinberger <richard@nod.at>
UBIFS stores a 32bit hash of every file, for traditional lookups by name
this scheme is fine since UBIFS can first try to find the file by the
hash of the filename and upon collisions it can walk through all entries
with the same hash and do a string compare.
When filesnames are encrypted fscrypto will ask the filesystem for a
unique cookie, based on this cookie the filesystem has to be able to
locate the target file again. With 32bit hashes this is impossible
because the chance for collisions is very high. Do deal with that we
store a 32bit cookie directly in the UBIFS directory entry node such
that we get a 64bit cookie (32bit from filename hash and the dent
cookie). For a lookup by hash UBIFS finds the entry by the first 32bit
and then compares the dent cookie. If it does not match, it has to do a
linear search of the whole directory and compares all dent cookies until
the correct entry is found.
Signed-off-by: Richard Weinberger <richard@nod.at>
As of now all filenames known by UBIFS are strings with a NUL
terminator. With encrypted filenames a filename can be any binary
string and the r5 function cannot search for the NUL terminator.
UBIFS always knows how long a filename is, therefore we can change
the hash function to iterate over the filename length to work
correctly with binary strings.
Signed-off-by: Richard Weinberger <richard@nod.at>
When data of a data node is compressed and encrypted
we need to store the size of the compressed data because
before encryption we may have to add padding bytes.
For the new field we consume the last two padding bytes
in struct ubifs_data_node. Two bytes are fine because
the data length is at most 4096.
Signed-off-by: Richard Weinberger <richard@nod.at>
When we're creating a new inode in UBIFS the inode is not
yet exposed and fscrypto calls ubifs_xattr_set() without
holding the inode mutex. This is okay but ubifs_xattr_set()
has to know about this.
Signed-off-by: Richard Weinberger <richard@nod.at>
When a file is moved or linked into another directory
its current crypto policy has to be compatible with the
target policy.
Signed-off-by: Richard Weinberger <richard@nod.at>
We need ->open() for files to load the crypto key.
If the no key is present and the file is encrypted,
refuse to open.
Signed-off-by: Richard Weinberger <richard@nod.at>
We need the ->open() hook to load the crypto context
which is needed for all crypto operations within that
directory.
Signed-off-by: Richard Weinberger <richard@nod.at>
fscrypto will need this function too. Also get struct ubifs_info
from the provided inode. Not all callers will have a reference to
struct ubifs_info.
Signed-off-by: Richard Weinberger <richard@nod.at>
'ubifs_fast_find_freeable()' can not return an error pointer, so this test
can be removed.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Richard Weinberger <richard@nod.at>
Right now wbuf timer has hardcoded timeouts and there is no place for
manual adjustments. Some projects / cases many need that though. Few
file systems allow doing that by respecting dirty_writeback_interval
that can be set using sysctl (dirty_writeback_centisecs).
Lowering dirty_writeback_interval could be some way of dealing with user
space apps lacking proper fsyncs. This is definitely *not* a perfect
solution but we don't have ideal (user space) world. There were already
advanced discussions on this matter, mostly when ext4 was introduced and
it wasn't behaving as ext3. Anyway, the final decision was to add some
hacks to the ext4, as trying to fix whole user space or adding new API
was pointless.
We can't (and shouldn't?) just follow ext4. We can't e.g. sync on close
as this would cause too many commits and flash wearing. On the other
hand we still should allow some trade-off between -o sync and default
wbuf timeout. Respecting dirty_writeback_interval should allow some sane
cutomizations if used warily.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Values of these fields are set during init and never modified. They are
used (read) in a single function only. There isn't really any reason to
keep them in a struct. It only makes struct just a bit bigger without
any visible gain.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The "perf_counter_reset" case has already been handled above.
Moreover "ORANGEFS_PARAM_REQUEST_OP_READAHEAD_COUNT_SIZE" is not a really
consistent.
It is likely that this (dead) code is a cut and paste left over.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
allocates string 'new' is not free'd on the exit path when
cdm_element_count <= 0. Fix this by kfree'ing it.
Fixes CoverityScan CID#1375923 "Resource Leak"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This is exposing an existing deadlock between fsync and AIO. Until we
have the deadlock fixed, I'm pulling this one out.
This reverts commit a23eaa875f.
Signed-off-by: Chris Mason <clm@fb.com>
... to better explain its purpose after introducing in-place encryption
without bounce buffer.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fscrypt users can now indicated if fscrypt_encrypt_page() should
use a bounce page, we can delay the bounce page pool initialization util
it is really needed. That is until fscrypt_operations has no
FS_CFLG_OWN_PAGES flag set.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which,
when set, indicates that the fs uses pages under its own control as
opposed to writeback pages which require locking and a bounce buffer for
encryption.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In case of in-place encryption fscrypt_ctx was allocated but never
released. Since we don't need it for in-place encryption, we skip
allocating it.
Fixes: 1c7dcf69ee ("fscrypt: Add in-place encryption mode")
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Actually use the fs-provided index instead of always using page->index
which is only set for page-cache pages.
Fixes: 9c4bb8a3a9 ("fscrypt: Let fs select encryption index/tweak")
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The fscrypt_initalize() function isn't used outside fs/crypto, so
there's no point making it be an exported symbol.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
To avoid namespace collisions, rename get_crypt_info() to
fscrypt_get_crypt_info(). The function is only used inside the
fs/crypto directory, so declare it in the new header file,
fscrypt_private.h.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Multiple bugs were recently fixed in the "set encryption policy" ioctl.
To make it clear that fscrypt_process_policy() and fscrypt_get_policy()
implement ioctls and therefore their implementations must take standard
security and correctness precautions, rename them to
fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy(). Make the
latter take in a struct file * to make it consistent with the former.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>