Pull percpu updates from Tejun Heo:
"These are the percpu changes for the v4.13-rc1 merge window. There are
a couple of visibility-related changes - tracepoints and allocator stats
through debugfs, along with __ro_after_init markings and a cosmetic
rename in percpu_counter.
Please note that the simple O(#elements_in_the_chunk) area allocator
used by the percpu allocator is again showing scalability issues,
primarily with bpf allocating and freeing large numbers of counters.
Dennis is working on the replacement allocator and the percpu
allocator will be seeing increased churn in the coming cycles"
* 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: fix static checker warnings in pcpu_destroy_chunk
percpu: fix early calls for spinlock in pcpu_stats
percpu: resolve err may not be initialized in pcpu_alloc
percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch
percpu: add tracepoint support for percpu memory
percpu: expose statistics about percpu memory via debugfs
percpu: migrate percpu data structures to internal header
percpu: add missing lockdep_assert_held to func pcpu_free_area
mark most percpu globals as __ro_after_init
Just check and advance the errseq_t in the file before returning, and
use an errseq_t based check for writeback errors.
Other internal callers of filemap_* functions are left as-is.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Just check and advance the data errseq_t in struct file before
returning from fsync on normal files. Internal filemap_*
callers are left as-is.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Add a call to filemap_report_wb_err at the end of ext4_sync_file. This
will ensure that we check and advance the errseq_t in the file, which
allows us to track and report errors on all open fds when they occur.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Many simple, block-based filesystems use generic_file_fsync as their
fsync operation. Some others (ext* and fat) also call this function
to handle syncing out data.
Switch this code over to use errseq_t based error reporting so that
all of these filesystems get reliable error reporting via fsync,
fdatasync and msync.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
This is a very minimal conversion to errseq_t based error tracking
for raw block device access. Just have it use the standard
file_write_and_wait_range call.
Note that there are internal callers that call sync_blockdev
and the like that are not affected by this. They'll continue
to use the AS_EIO/AS_ENOSPC flags for error reporting like
they always have for now.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Jan Kara's description for this patch is much better than mine, so I'm
quoting it verbatim here:
DAX currently doesn't set errors in the mapping when cache flushing
fails in dax_writeback_mapping_range(). Since this function can get
called only from fsync(2) or sync(2), this is actually as good as it can
currently get since we correctly propagate the error up from
dax_writeback_mapping_range() to filemap_fdatawrite()
However, in the future better writeback error handling will enable us to
properly report these errors on fsync(2) even if there are multiple file
descriptors open against the file or if sync(2) gets called before
fsync(2). So convert DAX to using standard error reporting through the
mapping.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Most filesystems currently use mapping_set_error and
filemap_check_errors for setting and reporting/clearing writeback errors
at the mapping level. filemap_check_errors is indirectly called from
most of the filemap_fdatawait_* functions and from
filemap_write_and_wait*. These functions are called from all sorts of
contexts to wait on writeback to finish -- e.g. mostly in fsync, but
also in truncate calls, getattr, etc.
The non-fsync callers are problematic. We should be reporting writeback
errors during fsync, but many places spread over the tree clear out
errors before they can be properly reported, or report errors at
nonsensical times.
If I get -EIO on a stat() call, there is no reason for me to assume that
it is because some previous writeback failed. The fact that it also
clears out the error such that a subsequent fsync returns 0 is a bug,
and a nasty one since that's potentially silent data corruption.
This patch adds a small bit of new infrastructure for setting and
reporting errors during address_space writeback. While the above was my
original impetus for adding this, I think it's also the case that
current fsync semantics are just problematic for userland. Most
applications that call fsync do so to ensure that the data they wrote
has hit the backing store.
In the case where there are multiple writers to the file at the same
time, this is really hard to determine. The first one to call fsync will
see any stored error, and the rest get back 0. The processes with open
fds may not be associated with one another in any way. They could even
be in different containers, so ensuring coordination between all fsync
callers is not really an option.
One way to remedy this would be to track what file descriptor was used
to dirty the file, but that's rather cumbersome and would likely be
slow. However, there is a simpler way to improve the semantics here
without incurring too much overhead.
This set adds an errseq_t to struct address_space, and a corresponding
one is added to struct file. Writeback errors are recorded in the
mapping's errseq_t, and the one in struct file is used as the "since"
value.
This changes the semantics of the Linux fsync implementation such that
applications can now use it to determine whether there were any
writeback errors since fsync(fd) was last called (or since the file was
opened in the case of fsync having never been called).
Note that those writeback errors may have occurred when writing data
that was dirtied via an entirely different fd, but that's the case now
with the current mapping_set_error/filemap_check_errors infrastructure.
This will at least prevent you from getting a false report of success.
The new behavior is still consistent with the POSIX spec, and is more
reliable for application developers. This patch just adds some basic
infrastructure for doing this, and ensures that the f_wb_err "cursor"
is properly set when a file is opened. Later patches will change the
existing code to use this new infrastructure for reporting errors at
fsync time.
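
The "since" cursor described above can be sketched in userspace as
follows. This is only a toy model under simplifying assumptions: the
real errseq_t packs the error and a counter into a single value and
updates it atomically, and every toy_* name below is hypothetical
rather than kernel API.

```c
#include <errno.h>

/* Toy model only: NOT the kernel's errseq_t encoding. */
struct toy_errseq {
	unsigned int seq;	/* bumped each time an error is recorded */
	int err;		/* most recent error, as a negative errno */
};

/* Stands in for struct address_space. */
struct toy_mapping {
	struct toy_errseq wb_err;
};

/* Stands in for struct file. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int since;	/* cursor: last sequence this file has seen */
};

/* Writeback failed: record the error in the mapping. */
static void toy_set_wb_err(struct toy_mapping *m, int err)
{
	m->wb_err.err = err;
	m->wb_err.seq++;
}

/* open(): sample the current sequence so older errors are not reported. */
static void toy_open(struct toy_file *f, struct toy_mapping *m)
{
	f->mapping = m;
	f->since = m->wb_err.seq;
}

/*
 * fsync(): if an error was recorded since this file's cursor, report it
 * and advance the cursor; otherwise return 0.
 */
static int toy_fsync_check(struct toy_file *f)
{
	struct toy_errseq cur = f->mapping->wb_err;

	if (cur.seq == f->since)
		return 0;
	f->since = cur.seq;
	return cur.err;
}
```

In this model every file that was open when the error was recorded
sees it exactly once, regardless of which one calls fsync first.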
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Resetting this flag is almost certainly racy, and will be problematic
with some coming changes.
Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
Have jbd2 call it instead of filemap_fdatawait and don't attempt to
re-set the error flag if it fails.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
I noticed on xfs that I could still sometimes get back an error on fsync
on a fd that was opened after the error condition had been cleared.
The problem is that the buffer code sets the write_io_error flag and
then later checks that flag to set the error in the mapping. That flag
persists for quite a while however. If the file is later opened with
O_TRUNC, the buffers will then be invalidated and the mapping's error
set such that a subsequent fsync will return error. I think this is
incorrect, as there was no writeback between the open and fsync.
Add a new mark_buffer_write_io_error operation that sets the flag and
the error in the mapping at the same time. Replace all calls to
set_buffer_write_io_error with mark_buffer_write_io_error, and remove
the places that check this flag in order to set the error in the
mapping.
This sets the error in the mapping earlier, at the time that it's first
detected.
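
A rough userspace sketch of the idea, assuming a much simplified
buffer and mapping (the toy_* types and the wb_err field are
hypothetical stand-ins, not the kernel structures): the flag and the
mapping error are set together, so nothing needs to consult the flag
later in order to set the error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the mapping that owns the buffer's page. */
struct toy_addr_space {
	int wb_err;		/* 0, or the recorded negative errno */
};

/* Hypothetical stand-in for a buffer head. */
struct toy_buffer {
	int write_io_error;		/* the per-buffer flag */
	struct toy_addr_space *mapping;	/* may be NULL */
};

/* Set the buffer flag and the mapping error at the same time. */
static void toy_mark_buffer_write_io_error(struct toy_buffer *bh)
{
	bh->write_io_error = 1;
	if (bh->mapping != NULL)
		bh->mapping->wb_err = -EIO;
}
```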
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
ext2 currently does a test+clear of the AS_EIO flag, which is
problematic for some coming changes.
What we really need to do instead is call filemap_check_errors
in __generic_file_fsync after syncing out the buffers. That
will be sufficient for this case, and help other callers detect
these errors properly as well.
With that, we don't need to twiddle it in ext2.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Implement the show_options superblock op for ramfs as part of a bid to get
rid of s_options and generic_show_options() to make it easier to implement
a context-based mount where the mount options can be passed individually
over a file descriptor.
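
As a rough userspace illustration of what a per-filesystem
show_options does, the sketch below formats options from the
filesystem's own parsed state instead of replaying a saved option
string. It assumes a single "mode" option with a default of 0755 and
uses snprintf in place of the kernel's seq_file helpers; all toy_*
names are hypothetical.

```c
#include <stdio.h>

#define TOY_DEFAULT_MODE 0755	/* assumed default, for illustration */

/* Hypothetical parsed mount state kept by the filesystem. */
struct toy_fs_info {
	unsigned int mode;
};

/*
 * Emit ",mode=<octal>" only when the mode differs from the default.
 * Returns the number of characters produced (0 if nothing is shown).
 */
static int toy_show_options(char *buf, size_t len,
			    const struct toy_fs_info *fsi)
{
	if (fsi->mode == TOY_DEFAULT_MODE) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, ",mode=%o", fsi->mode);
}
```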
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Implement the show_options superblock op for pstore as part of a bid to get
rid of s_options and generic_show_options() to make it easier to implement
a context-based mount where the mount options can be passed individually
over a file descriptor.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kees Cook <keescook@chromium.org>
cc: Anton Vorontsov <anton@enomsg.org>
cc: Colin Cross <ccross@android.com>
cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Implement the show_options superblock op for omfs as part of a bid to get
rid of s_options and generic_show_options() to make it easier to implement
a context-based mount where the mount options can be passed individually
over a file descriptor.
Note that the uid and gid should possibly be displayed relative to the
viewer's user namespace.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Bob Copeland <me@bobcopeland.com>
cc: linux-karma-devel@lists.sourceforge.net
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Implement the show_options superblock op for hugetlbfs as part of a bid to
get rid of s_options and generic_show_options() to make it easier to
implement a context-based mount where the mount options can be passed
individually over a file descriptor.
Note that the uid and gid should possibly be displayed relative to the
viewer's user namespace.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
btrfs, debugfs, reiserfs and tracefs call save_mount_options() and reiserfs
calls replace_mount_options(), but they then implement their own
->show_options() methods and don't touch s_options, rendering the saved
options unnecessary. I'm trying to eliminate s_options to make it easier
to implement a context-based mount where the mount options can be passed
individually over a file descriptor.
Remove the calls to save/replace_mount_options() in these cases.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chris Mason <clm@fb.com>
cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: linux-btrfs@vger.kernel.org
cc: reiserfs-devel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The ea_inode feature allows creating extended attributes that are up to
64k in size. Update __ext4_new_inode() to pick increased credit limits.
To avoid overallocating journal credits, update
__ext4_xattr_set_credits() to make a distinction between xattr create
vs update. This helps __ext4_new_inode() because all attributes are
known to be new, so we can save credits that are normally needed to
delete old values.
Also, have fscrypt specify its maximum context size so that we don't
end up allocating credits for 64k size.
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Extended attribute inodes are internal to ext4. Adding encryption/security
related attributes on them would mean dealing with nested calls into ea code.
Since they have no direct exposure to user mode, just avoid creating ea
entries for them.
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When a CIFS filesystem is mounted with the forcemand option and the
following command is run on it, lockdep warns about a circular locking
dependency between CifsInodeInfo::lock_sem and the inode lock.
while echo foo > hello; do :; done & while touch -c hello; do :; done
cifs_writev() takes the locks in the wrong order, but note that we can't
simply flip the order around because it releases the inode lock before the
call to generic_write_sync() while it holds the lock_sem across that
call.
But, AFAICS, there is no need to hold the CifsInodeInfo::lock_sem across
the generic_write_sync() call either, so we can release both the locks
before generic_write_sync(), and change the order.
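
The ordering argument can be modelled with a tiny rank checker in
userspace. This is only a sketch: it assumes the intended order is
inode lock first, then lock_sem (matching the setattr path in the
trace below), and the toy_* names are hypothetical. It flags any
acquisition made while a higher-ranked lock is already held.

```c
#include <stdbool.h>

/* Lower value = must be taken first. */
enum toy_lock { TOY_INODE_LOCK = 0, TOY_LOCK_SEM = 1, TOY_NLOCKS };

struct toy_task {
	bool held[TOY_NLOCKS];
	bool order_violation;
};

static void toy_acquire(struct toy_task *t, enum toy_lock l)
{
	/* Taking a lock while a higher-ranked one is held inverts the order. */
	for (int i = (int)l + 1; i < TOY_NLOCKS; i++)
		if (t->held[i])
			t->order_violation = true;
	t->held[l] = true;
}

static void toy_release(struct toy_task *t, enum toy_lock l)
{
	t->held[l] = false;
}

/* Old write path: lock_sem first, and still held across the sync. */
static void toy_writev_old(struct toy_task *t)
{
	toy_acquire(t, TOY_LOCK_SEM);
	toy_acquire(t, TOY_INODE_LOCK);
	/* ... write ... */
	toy_release(t, TOY_INODE_LOCK);
	/* the sync would run here with lock_sem held */
	toy_release(t, TOY_LOCK_SEM);
}

/* New write path: inode lock, then lock_sem; both dropped before the sync. */
static bool toy_writev_fixed(struct toy_task *t)
{
	toy_acquire(t, TOY_INODE_LOCK);
	toy_acquire(t, TOY_LOCK_SEM);
	/* ... write ... */
	toy_release(t, TOY_LOCK_SEM);
	toy_release(t, TOY_INODE_LOCK);
	/* the sync would run here; report whether it runs lock-free */
	return !t->held[TOY_INODE_LOCK] && !t->held[TOY_LOCK_SEM];
}
```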
======================================================
WARNING: possible circular locking dependency detected
4.12.0-rc7+ #9 Not tainted
------------------------------------------------------
touch/487 is trying to acquire lock:
(&cifsi->lock_sem){++++..}, at: cifsFileInfo_put+0x88f/0x16a0
but task is already holding lock:
(&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sb->s_type->i_mutex_key#11){+.+.+.}:
__lock_acquire+0x1f74/0x38f0
lock_acquire+0x1cc/0x600
down_write+0x74/0x110
cifs_strict_writev+0x3cb/0x8c0
__vfs_write+0x4c1/0x930
vfs_write+0x14c/0x2d0
SyS_write+0xf7/0x240
entry_SYSCALL_64_fastpath+0x1f/0xbe
-> #0 (&cifsi->lock_sem){++++..}:
check_prevs_add+0xfa0/0x1d10
__lock_acquire+0x1f74/0x38f0
lock_acquire+0x1cc/0x600
down_write+0x74/0x110
cifsFileInfo_put+0x88f/0x16a0
cifs_setattr+0x992/0x1680
notify_change+0x61a/0xa80
utimes_common+0x3d4/0x870
do_utimes+0x1c1/0x220
SyS_utimensat+0x84/0x1a0
entry_SYSCALL_64_fastpath+0x1f/0xbe
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#11);
                               lock(&cifsi->lock_sem);
                               lock(&sb->s_type->i_mutex_key#11);
  lock(&cifsi->lock_sem);
*** DEADLOCK ***
2 locks held by touch/487:
#0: (sb_writers#10){.+.+.+}, at: mnt_want_write+0x41/0xb0
#1: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870
stack backtrace:
CPU: 0 PID: 487 Comm: touch Not tainted 4.12.0-rc7+ #9
Call Trace:
dump_stack+0xdb/0x185
print_circular_bug+0x45b/0x790
__lock_acquire+0x1f74/0x38f0
lock_acquire+0x1cc/0x600
down_write+0x74/0x110
cifsFileInfo_put+0x88f/0x16a0
cifs_setattr+0x992/0x1680
notify_change+0x61a/0xa80
utimes_common+0x3d4/0x870
do_utimes+0x1c1/0x220
SyS_utimensat+0x84/0x1a0
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: 19dfc1f5f2 ("cifs: fix the race in cifs_writev()")
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Currently oparms.create_options is uninitialized and the code is logically
or'ing in CREATE_OPEN_BACKUP_INTENT onto a garbage value of
oparms.create_options from the stack. Fix this by just setting the value
rather than or'ing in the setting.
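
A minimal sketch of the pattern, assuming a cut-down parameter struct
and an illustrative flag value (the toy_* names and the constant are
hypothetical): assigning the field makes the result independent of
whatever the stack slot held before.

```c
#include <stdint.h>

/* Flag value assumed for illustration only. */
#define TOY_CREATE_OPEN_BACKUP_INTENT 0x00004000u

/* Hypothetical cut-down version of the open parameters. */
struct toy_open_parms {
	uint32_t create_options;
	/* other fields omitted */
};

static void toy_set_backup_intent(struct toy_open_parms *p, int backup)
{
	/*
	 * Assign rather than OR: *p may be an uninitialized stack object,
	 * so "p->create_options |= flag" would mix in garbage bits.
	 */
	if (backup)
		p->create_options = TOY_CREATE_OPEN_BACKUP_INTENT;
	else
		p->create_options = 0;
}
```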
Detected by CoverityScan, CID#1447220 ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
In cifs_call_async, server may respond as soon as I/O is submitted. Because
mid entry is freed on the return path, it should not be modified after I/O
is submitted.
cifs_save_when_sent modifies the sent timestamp in mid entry, and should not
be called after I/O. Call it before I/O.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Hi,
the attached patch adds more missing mappings for the 0x01-0x1f range.
Please review; if you're fine with it, consider it also for stable.
Björn
From a97720c26db2ee77d4e798e3d383fcb6a348bd29 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Jacke?= <bjacke@samba.org>
Date: Wed, 31 May 2017 22:48:41 +0200
Subject: [PATCH] cifs: add SFM mapping for 0x01-0x1F
0x1-0x1F has to be mapped to 0xF001-0xF01F
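
The arithmetic of that mapping can be sketched as below. The helper
name is hypothetical and the sketch covers only this range; any other
character is passed through unchanged here.

```c
#include <stdint.h>

/*
 * Map a control character in 0x01-0x1F to 0xF001-0xF01F.
 * Anything outside that range is returned unchanged in this sketch.
 */
static uint16_t toy_sfm_map_ctrl(uint16_t c)
{
	if (c >= 0x01 && c <= 0x1F)
		return (uint16_t)(0xF000 + c);
	return c;
}
```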
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Some functions are only referenced under an #ifdef, causing a harmless
warning:
fs/cifs/smb2ops.c:1374:1: error: 'get_smb2_acl' defined but not used [-Werror=unused-function]
We could mark them __maybe_unused or add another #ifdef; I picked
the second approach here.
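
A small sketch of the #ifdef approach, using a made-up config symbol
and made-up function names: the helper is compiled only when its sole
caller is, so the compiler never sees a defined-but-unused static
function.

```c
/* Hypothetical config switch; drop the define to build without the feature. */
#define TOY_CONFIG_ACL 1

#ifdef TOY_CONFIG_ACL
/* Only built when the feature that references it is built. */
static int toy_get_acl_len(void)
{
	return 16;	/* arbitrary illustrative value */
}
#endif

static int toy_query_acl_len(void)
{
#ifdef TOY_CONFIG_ACL
	return toy_get_acl_len();
#else
	return -1;	/* feature compiled out */
#endif
}
```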
Fixes: b3fdda4d1e1b ("cifs: Use smb 2 - 3 and cifsacl mount options getacl functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Add definition and declaration of function to get cifs acls when
mounting with smb version 2 onwards to 3.
Extend/Alter query info function to allocate and return
security descriptors within the response.
Not yet handling the error case when the size of security descriptors
in response to query exceeds SMB2_MAX_BUFFER_SIZE.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Add new config option that dumps AES keys to the console when they are
generated. This is obviously for debugging purposes only, and should not
be enabled otherwise.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull mnt namespace updates from Eric Biederman:
"A big break-through came during this development cycle as a way was
found to maintain the existing umount -l semantics while allowing for
optimizations that improve the performance. That is represented by the
first change in this series moving the reparenting of mounts into
their own pass. This has allowed addressing the horrific performance
of umount -l on a carefully crafted tree of mounts with locks held
(0.06s vs 60s in my testing). What allowed this was not changing where
umounts propagate to while propagating umounts.
The next change fixes the case where the order of the mounts whose
umounts are being propagated visits a tree where the mounts are stacked
upon each other in another order. This is weird but not hard to
implement.
The final change takes advantage of the unchanging mount propagation
tree to skip parts of the mount propagation tree that have already been
visited, yielding a very nice speedup in the worst case.
There remains one outstanding question about the semantics of umount -l
that I am still discussing with Ram Pai. In practice that area of the
semantics was changed by 1064f874ab ("mnt: Tuck mounts under others
instead of creating shadow/side mounts.") and no regressions have been
reported. Still I intend to finish talking that out with him to ensure
there is nothing that a more intense use of mount propagation in the
future will cause to become significant"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mnt: Make propagate_umount less slow for overlapping mount propagation trees
mnt: In propgate_umount handle visiting mounts in any order
mnt: In umount propagation reparent in a separate pass
Pull GFS2 updates from Bob Peterson:
"We've got eight GFS2 patches for this merge window:
- Andreas Gruenbacher has four patches related to cleaning up the
GFS2 inode evict process. This is about half of his patches
designed to fix a long-standing GFS2 hang related to the inode
shrinker: Shrinker calls gfs2 evict, evict calls DLM, DLM requires
memory and blocks on the shrinker.
These four patches have been well tested. His second set of patches
are still being tested, so I plan to hold them until the next merge
window, after we have more weeks of testing. The first patch
eliminates the flush_delayed_work, which can block.
- Andreas's second patch protects setting of gl_object for rgrps with
a spin_lock to prevent proven races.
- His third patch introduces a centralized mechanism for queueing
glock work with better reference counting, to prevent more races.
- His fourth patch retains a reference to inode glocks when an error
occurs while creating an inode. This keeps the subsequent evict
from needing to reacquire the glock, which might call into DLM and
block in low memory conditions.
- Arvind Yadav has a patch to add const to attribute_group
structures.
- I have a patch to detect directory entry inconsistencies and
withdraw the file system if any are found. Better that than silent
corruption.
- I have a patch to remove a vestigial variable from glock
structures, saving some slab space.
- I have another patch to remove a vestigial variable from the GFS2
in-core superblock structure"
* tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: constify attribute_group structures.
gfs2: gfs2_create_inode: Keep glock across iput
gfs2: Clean up glock work enqueuing
gfs2: Protect gl->gl_object by spin lock
gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
GFS2: Eliminate vestigial sd_log_flush_wrapped
GFS2: Remove gl_list from glock structure
GFS2: Withdraw when directory entry inconsistencies are detected
Pull btrfs updates from David Sterba:
"The core updates improve error handling (mostly related to bios), with
the usual incremental work on the GFP_NOFS (mis)use removal,
refactoring or cleanups. Except the two top patches, all have been in
for-next for an extensive amount of time.
User visible changes:
- statx support
- quota override tunable
- improved compression thresholds
- obsoleted mount option alloc_start
Core updates:
- bio-related updates:
- faster bio cloning
- no allocation failures
- preallocated flush bios
- more kvzalloc use, memalloc_nofs protections, GFP_NOFS updates
- prep work for btree_inode removal
- dir-item validation
- qgroup fixes and updates
- cleanups:
- removed unused struct members, unused code, refactoring
- argument refactoring (fs_info/root, caller -> callee sink)
- SEARCH_TREE ioctl docs"
* 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits)
btrfs: Remove false alert when fiemap range is smaller than on-disk extent
btrfs: Don't clear SGID when inheriting ACLs
btrfs: fix integer overflow in calc_reclaim_items_nr
btrfs: scrub: fix target device intialization while setting up scrub context
btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges
btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled
btrfs: qgroup: Return actually freed bytes for qgroup release or free data
btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
btrfs: qgroup: Add quick exit for non-fs extents
Btrfs: rework delayed ref total_bytes_pinned accounting
Btrfs: return old and new total ref mods when adding delayed refs
Btrfs: always account pinned bytes when dropping a tree block ref
Btrfs: update total_bytes_pinned when pinning down extents
Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
btrfs: fix validation of XATTR_ITEM dir items
btrfs: Verify dir_item in iterate_object_props
btrfs: Check name_len before in btrfs_del_root_ref
btrfs: Check name_len before reading btrfs_get_name
...
Pull timer-related user access updates from Al Viro:
"Continuation of timers-related stuff (there had been more, but my
parts of that series are already merged via timers/core). This is more
of y2038 work by Deepa Dinamani, partially disrupted by the
unification of native and compat timers-related syscalls"
* 'timers-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
posix_clocks: Use get_itimerspec64() and put_itimerspec64()
timerfd: Use get_itimerspec64() and put_itimerspec64()
nanosleep: Use get_timespec64() and put_timespec64()
posix-timers: Use get_timespec64() and put_timespec64()
posix-stubs: Conditionally include COMPAT_SYS_NI defines
time: introduce {get,put}_itimerspec64
time: add get_timespec64 and put_timespec64
Since only an open file can be mmap'ed, and we only allow open()ing an
encrypted file when its key is available, there is no need to check for
the key again before permitting each mmap().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently, filesystems allow truncate(2) on an encrypted file without
the encryption key. However, it's impossible to correctly handle the
case where the size being truncated to is not a multiple of the
filesystem block size, because that would require decrypting the final
block, zeroing the part beyond i_size, then encrypting the block.
As other modifications to encrypted file contents are prohibited without
the key, just prohibit truncate(2) as well, making it fail with ENOKEY.
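
To make the problematic case concrete, the check below shows which
target sizes would leave a partially used final block. It is only an
illustration with a hypothetical helper name; the change itself does
not distinguish the cases and rejects every keyless truncate.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if truncating to new_size would cut through a block, i.e. the
 * final block would need to be decrypted, zeroed past new_size and
 * re-encrypted. block_size is the filesystem block size in bytes.
 */
static bool toy_truncate_splits_block(uint64_t new_size, uint32_t block_size)
{
	return block_size != 0 && (new_size % block_size) != 0;
}
```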
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull read/write updates from Al Viro:
"Christoph's fs/read_write.c series - consolidation and cleanups"
* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: remove nfsd_vfs_read
nfsd: use vfs_iter_read/write
fs: implement vfs_iter_write using do_iter_write
fs: implement vfs_iter_read using do_iter_read
fs: move more code into do_iter_read/do_iter_write
fs: remove __do_readv_writev
fs: remove do_compat_readv_writev
fs: remove do_readv_writev
Pull misc user access cleanups from Al Viro:
"The first pile is assorted getting rid of cargo-culted access_ok(),
cargo-culted set_fs() and field-by-field copyouts.
The same description applies to a lot of stuff in other branches -
this is just the stuff that didn't fit into a more specific topical
branch"
* 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Switch flock copyin/copyout primitives to copy_{from,to}_user()
fs/fcntl: return -ESRCH in f_setown when pid/pgid can't be found
fs/fcntl: f_setown, avoid undefined behaviour
fs/fcntl: f_setown, allow returning error
lpfc debugfs: get rid of pointless access_ok()
adb: get rid of pointless access_ok()
isdn: get rid of pointless access_ok()
compat statfs: switch to copy_to_user()
fs/locks: don't mess with the address limit in compat_fcntl64
nfsd_readlink(): switch to vfs_get_link()
drbd: ->sendpage() never needed set_fs()
fs/locks: pass kernel struct flock to fcntl_getlk/setlk
fs: locks: Fix some troubles at kernel-doc comments
Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:
1) Several optimizations for UDP processing under high load from
Paolo Abeni.
2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.
3) Support multiple filter chains per qdisc, from Jiri Pirko.
4) Move to 1ms TCP timestamp clock, from Eric Dumazet.
5) Add batch dequeueing to vhost_net, from Jason Wang.
6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.
7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.
8) Add devlink support to nfp driver, from Simon Horman.
9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.
10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.
11) Support XDP on qed device VFs, from Yuval Mintz.
12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.
13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.
14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.
15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.
16) Support kernel based TLS, from Dave Watson and others.
17) Remove completely DST garbage collection, from Wei Wang.
18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.
19) Add XDP support to Intel i40e driver, from Björn Töpel.
20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.
21) IPSEC offloading support in mlx5, from Ilan Tayari.
22) Add HW PTP support to macb driver, from Rafal Ozieblo.
23) Networking refcount_t conversions, from Elena Reshetova.
24) Add sock_ops support to BPF, from Lawrence Brakmo. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...
The patch below updated xfs_dq_get_next_id() to use the XFS iext
lookup helpers to locate the next quota id rather than to seek for
data in the quota file. The updated code fails to correctly handle
the case where the quota inode might have contiguous chunks part of
the same extent. In this case, the start block offset is calculated
based on the next expected id but the extent lookup returns the same
start offset as for the previous chunk. This causes the returned id
to go backwards and livelocks the quota iteration. This problem is
reproduced intermittently by generic/232.
To handle this case, check whether the startoff from the extent
lookup is behind the startoff calculated from the next quota id. If
so, bump up got.br_startoff to the specific file offset that is
expected to hold the next dquot chunk.
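
The adjustment can be sketched as follows, assuming a cut-down extent
record (the toy_* names are hypothetical; only br_startoff comes from
the description above): the looked-up start offset is never allowed to
fall behind the offset computed from the next quota id.

```c
#include <stdint.h>

/* Hypothetical cut-down extent record. */
struct toy_extent {
	uint64_t br_startoff;	/* file offset where the extent starts */
	uint64_t br_blockcount;	/* length of the extent in blocks */
};

/*
 * start is the file offset expected to hold the next dquot chunk.
 * If the lookup returned an extent that begins before it, bump the
 * start offset forward so the iteration cannot go backwards.
 */
static void toy_bump_startoff(struct toy_extent *got, uint64_t start)
{
	if (got->br_startoff < start)
		got->br_startoff = start;
}
```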
Fixes: bda250dbaf ("xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull pstore updates from Kees Cook:
"Various fixes and tweaks for the pstore subsystem.
Highlights:
- use memdup_user() instead of open-coded copies (Geliang Tang)
- fix record memory leak during initialization (Douglas Anderson)
- avoid confused compressed record warning (Ankit Kumar)
- prepopulate record timestamp and remove redundant logic from
backends"
* tag 'pstore-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
powerpc/nvram: use memdup_user
pstore: use memdup_user
pstore: Fix format string to use %u for record id
pstore: Populate pstore record->time field
pstore: Create common record initializer
efi-pstore: Refactor erase routine
pstore: Avoid potential infinite loop
pstore: Fix leaked pstore_record in pstore_get_backend_records()
pstore: Don't warn if data is uncompressed and type is not PSTORE_TYPE_DMESG
Pull security layer updates from James Morris:
- a major update for AppArmor. From JJ:
* several bug fixes and cleanups
* the patch to add symlink support to securityfs that was floated
on the list earlier and the apparmorfs changes that make use of
securityfs symlinks
* it introduces the domain labeling base code that Ubuntu has been
carrying for several years, with several cleanups applied. And it
converts the current mediation over to using the domain labeling
base, which brings domain stacking support with it. This finally
will bring the base upstream code in line with Ubuntu and provide
a base to upstream the new feature work that Ubuntu carries.
* This does _not_ contain any of the newer apparmor mediation
features/controls (mount, signals, network, keys, ...) that
Ubuntu is currently carrying, all of which will be RFC'd on top
of this.
- Notable also is the Infiniband work in SELinux, and the new file:map
permission. From Paul:
"While we're down to 21 patches for v4.13 (it was 31 for v4.12),
the diffstat jumps up tremendously with over 2k of line changes.
Almost all of these changes are the SELinux/IB work done by
Daniel Jurgens; some other noteworthy changes include a NFS v4.2
labeling fix, a new file:map permission, and reporting of policy
capabilities on policy load"
There's also now genfscon labeling support for tracefs, which was
lost in v4.1 with the separation from debugfs.
- Smack incorporates a safer socket check in file_receive, and adds a
cap_capable call in privilege check.
- TPM as usual has a bunch of fixes and enhancements.
- Multiple calls to security_add_hooks() can now be made for the same
LSM, to allow LSMs to have hook declarations across multiple files.
- IMA now supports different "ima_appraise=" modes (eg. log, fix) from
the boot command line.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (126 commits)
apparmor: put back designators in struct initialisers
seccomp: Switch from atomic_t to recount_t
seccomp: Adjust selftests to avoid double-join
seccomp: Clean up core dump logic
IMA: update IMA policy documentation to include pcr= option
ima: Log the same audit cause whenever a file has no signature
ima: Simplify policy_func_show.
integrity: Small code improvements
ima: fix get_binary_runtime_size()
ima: use ima_parse_buf() to parse template data
ima: use ima_parse_buf() to parse measurements headers
ima: introduce ima_parse_buf()
ima: Add cgroups2 to the defaults list
ima: use memdup_user_nul
ima: fix up #endif comments
IMA: Correct Kconfig dependencies for hash selection
ima: define is_ima_appraise_enabled()
ima: define Kconfig IMA_APPRAISE_BOOTPARAM option
ima: define a set of appraisal rules requiring file signatures
ima: extend the "ima_policy" boot command line to support multiple policies
...
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.
File size before:
   text    data     bss     dec     hex filename
   5259    1344       8    6611    19d3 fs/gfs2/sys.o
File size after adding 'const':
   text    data     bss     dec     hex filename
   5371    1216       8    6595    19c3 fs/gfs2/sys.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
On failure, keep the inode glock across the final iput of the new inode
so that gfs2_evict_inode doesn't have to re-acquire the glock. That
way, gfs2_evict_inode won't need to revalidate the block type.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>