Commit Graph

47524 Commits

Author SHA1 Message Date
Michal Hocko
f913da596a proc, oom: drop bogus sighand lock
Oleg has pointed out that can simplify both oom_adj_{read,write} and
oom_score_adj_{read,write} even further and drop the sighand lock.  The
main purpose of the lock was to protect p->signal from going away but this
will not happen since ea6d290ca3 ("signals: make task_struct->signal
immutable/refcountable").

The other role of the lock was to synchronize different writers,
especially those with CAP_SYS_RESOURCE.  Introduce a mutex for this
purpose.  Later patches will need this lock anyway.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1466426628-15074-3-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Michal Hocko
d49fbf766d proc, oom: drop bogus task_lock and mm check
Series "Handle oom bypass more gracefully", V5

The following 10 patches should put some order to very rare cases of mm
shared between processes and make the paths which bypass the oom killer
oom reapable and therefore much more reliable finally.  Even though mm
shared outside of thread group is rare (either vforked tasks for a short
period, use_mm by kernel threads or exotic thread model of
clone(CLONE_VM) without CLONE_SIGHAND) it is better to cover them.  Not
only it makes the current oom killer logic quite hard to follow and
reason about it can lead to weird corner cases.  E.g.  it is possible to
select an oom victim which shares the mm with unkillable process or
bypass the oom killer even when other processes sharing the mm are still
alive and other weird cases.

Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write.
This can be considered a bug fix with a low impact as nobody has noticed
for years.

Patch 2 drops sighand lock because it is not needed anymore as pointed
by Oleg.

Patch 3 is a clean up of oom_score_adj handling and a preparatory work
for later patches.

Patch 4 enforces oom_adj_score to be consistent between processes
sharing the mm to behave consistently with the regular thread groups.
This can be considered a user visible behavior change because one thread
group updating oom_score_adj will affect others which share the same mm
via clone(CLONE_VM).  I argue that this should be acceptable because we
already have the same behavior for threads in the same thread group and
sharing the mm without signal struct is just a different model of
threading.  This is probably the most controversial part of the series,
I would like to find some consensus here.  There were some suggestions
to hook some counter/oom_score_adj into the mm_struct but I feel that
this is not necessary right now and we can rely on proc handler +
oom_kill_process to DTRT.  I can be convinced otherwise but I strongly
think that whatever we do the userspace has to have a way to see the
current oom priority as consistently as possible.

Patch 5 makes sure that no vforked task is selected if it is sharing the
mm with oom unkillable task.

Patch 6 ensures that all user tasks sharing the mm are killed which in
turn makes sure that all oom victims are oom reapable.

Patch 7 guarantees that task_will_free_mem will always imply reapable
bypass of the oom killer.

Patch 8 is new in this version and it addresses an issue pointed out by
0-day OOM report where an oom victim was reaped several times.

Patch 9 puts an upper bound on how many times oom_reaper tries to reap a
task and hides it from the oom killer to move on when no progress can be
made.  This will give an upper bound to how long an oom_reapable task
can block the oom killer from selecting another victim if the oom_reaper
is not able to reap the victim.

Patch 10 tries to plug the (hopefully) last hole when we can still lock
up when the oom victim is shared with oom unkillable tasks (kthreads and
global init).  We just try to be best effort in that case and rather
fallback to kill something else than risk a lockup.

This patch (of 10):

Both oom_adj_write and oom_score_adj_write are using task_lock, check for
task->mm and fail if it is NULL.  This is not needed because the
oom_score_adj is per signal struct so we do not need mm at all.  The code
has been introduced by 3d5992d2ac ("oom: add per-mm oom disable count")
but we do not do per-mm oom disable since c9f01245b6 ("oom: remove
oom_disable_count").

The task->mm check is even not correct because the current thread might
have exited but the thread group might be still alive - e.g.  thread group
leader would lead that echo $VAL > /proc/pid/oom_score_adj would always
fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would
succeed.  This is unexpected at best.

Remove the lock along with the check to fix the unexpected behavior and
also because there is not real need for the lock in the first place.

Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Scott Bauer
10eec60ce7 vfs: ioctl: prevent double-fetch in dedupe ioctl
This prevents a double-fetch from user space that can lead to to an
undersized allocation and heap overflow.

Fixes: 54dbc15172 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 15:23:12 -07:00
Linus Torvalds
69c4289449 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  fat: fix error message for bogus number of directory entries
  fat: fix typo s/supeblock/superblock/
  ASoC: max9877: Remove unused function declaration
  dw2102: don't output spurious blank lines to the kernel log
  init: fix Kconfig text
  ARM: io: fix comment grammar
  ocfs: fix ocfs2_xattr_user_get() argument name
  scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()
2016-07-28 14:22:25 -07:00
Linus Torvalds
76d5b28bba Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota update from Jan Kara:
 "time64 support for quota"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: use time64_t internally
2016-07-28 13:53:23 -07:00
Linus Torvalds
6784725ab0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted cleanups and fixes.

  Probably the most interesting part long-term is ->d_init() - that will
  have a bunch of followups in (at least) ceph and lustre, but we'll
  need to sort the barrier-related rules before it can get used for
  really non-trivial stuff.

  Another fun thing is the merge of ->d_iput() callers (dentry_iput()
  and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
  except the one in __d_lookup_lru())"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  fs/dcache.c: avoid soft-lockup in dput()
  vfs: new d_init method
  vfs: Update lookup_dcache() comment
  bdev: get rid of ->bd_inodes
  Remove last traces of ->sync_page
  new helper: d_same_name()
  dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
  vfs: clean up documentation
  vfs: document ->d_real()
  vfs: merge .d_select_inode() into .d_real()
  unify dentry_iput() and dentry_unlink_inode()
  binfmt_misc: ->s_root is not going anywhere
  drop redundant ->owner initializations
  ufs: get rid of redundant checks
  orangefs: constify inode_operations
  missed comment updates from ->direct_IO() prototype change
  file_inode(f)->i_mapping is f->f_mapping
  trim fsnotify hooks a bit
  9p: new helper - v9fs_parent_fid()
  debugfs: ->d_parent is never NULL or negative
  ...
2016-07-28 12:59:05 -07:00
Linus Torvalds
554828ee0d Merge branch 'salted-string-hash'
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.

* merge branch 'salted-string-hash':
  fs/dcache.c: Save one 32-bit multiply in dcache lookup
  vfs: make the string hashes salt the hash
2016-07-28 12:26:31 -07:00
Benjamin Coddington
944171cbf4 pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode().  The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad44: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding").  We know that if a LAYOUTCOMMIT is
outstanding then attributes will need upating, so always set
NFS_INO_INVALID_ATTR.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-28 14:49:08 -04:00
Pavel Shilovsky
7893242e24 CIFS: Fix a possible invalid memory access in smb2_query_symlink()
During following a symbolic link we received err_buf from SMB2_open().
While the validity of SMB2 error response is checked previously
in smb2_check_message() a symbolic link payload is not checked at all.
Fix it by adding such checks.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-07-27 22:55:56 -05:00
Aurelien Aptel
a6b5058faf fs/cifs: make share unaccessible at root level mountable
if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:

- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
  the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry

fixes bso#8950

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-07-27 22:50:55 -05:00
Nicolas Pitre
002d2f01f1 m68k: enable binfmt_flat on systems with an MMU
Now that the generic changes are in place, this can be enabled on m68k
with the use of proper user space accessors in the flat_get_addr_from_rp()
and flat_put_addr_at_rp() handlers as rp actually holds a user space
address.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:29:13 +10:00
Nicolas Pitre
472f95f32d binfmt_flat: allow compressed flat binary format to work on MMU systems
Let's take the simple and obvious approach by decompressing the binary
into a kernel buffer and then copying it to user space.  Those who are
looking for top performance on an MMU system are unlikely to choose this
executable format anyway.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:29:12 +10:00
Nicolas Pitre
015feacf93 binfmt_flat: add MMU-specific support
Not much else to do at this point except for the different stack setups.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:29:12 +10:00
Nicolas Pitre
af521f92dc binfmt_flat: update libraries' data segment pointer with userspace accessors
This is needed on systems with a MMU.  This also gets rid of the
strangest C code I've seen lateli i.e. an integer indexed with a
pointer value within square brackets. That really looked backwards.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:29:11 +10:00
Nicolas Pitre
467aa1465a binfmt_flat: use clear_user() rather than memset() to clear .bss
This is needed on systems with a MMU.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:29:11 +10:00
Nicolas Pitre
1b2ce442ea binfmt_flat: use proper user space accessors with old relocs code
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:28:58 +10:00
Phil Turnbull
955818cd5b ceph: Correctly return NXIO errors from ceph_llseek
ceph_llseek does not correctly return NXIO errors because the 'out' path
always returns 'offset'.

Fixes: 06222e491e ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:45 +02:00
Nikolay Borisov
6b1a9a6c54 ceph: Mark the file cache as unreclaimable
Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
that it can satisfy its internal needs. Inspecting the code shows that
most of the caches are indeed reclaimable since they are directly
related to the generic inode/dentry shrinkers. However, one of the
cache used to satisfy struct file is not reclaimable since its
entries are freed only when the last reference to the file is
dropped. If a heavily loaded node opens a lot of files it can
introduce non-trivial discrepancies between memory shown as reclaimable
and what is actually reclaimed when drop_caches is used.

Fix this by removing the reclaimable flag for the file's cache.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:45 +02:00
Yan, Zheng
c8799fc467 ceph: optimize cap flush waiting
Add a 'wake' flag to ceph_cap_flush struct, which indicates if there
is someone waiting for it to finish. When getting flush ack message,
we check the 'wake' flag in corresponding ceph_cap_flush struct to
decide if we should wake up waiters. One corner case is that the
acked cap flush has 'wake' flags is set, but it is not the first one
on the flushing list. We do not wake up waiters in this case, set
'wake' flags of preceding ceph_cap_flush struct instead

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:45 +02:00
Yan, Zheng
ed9b430c9b ceph: cleanup ceph_flush_snaps()
This patch devide __ceph_flush_snaps() into two stags. In the first
stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them
to cap flush lists. __ceph_flush_snaps() keeps holding the
i_ceph_lock in this stagge. So inode's auth cap can not change. In
the second stage, __ceph_flush_snaps() send flushsnap cap messages.
i_ceph_lock is unlocked before sending each cap message. If auth cap
changes in the middle, __ceph_flush_snaps() just stops. This is OK
because kick_flushing_inode_caps() will re-send flushsnap cap messages
to inode's new auth MDS.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:44 +02:00
Yan, Zheng
7bc00fddb9 ceph: kick cap flushes before sending other cap message
If ceph_check_caps() wants to send cap message to a recovering MDS,
make sure it kicks cap flushes first.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:44 +02:00
Yan, Zheng
70220ac8c2 ceph: introduce an inode flag to indicates if snapflush is needed
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:43 +02:00
Yan, Zheng
13c2b57d81 ceph: avoid sending duplicated cap flush message
make ceph_kick_flushing_caps() ignore inodes whose cap flushes
have already been re-sent by ceph_early_kick_flushing_caps()

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:43 +02:00
Yan, Zheng
0e29438789 ceph: unify cap flush and snapcap flush
This patch includes following changes
- Assign flush tid to snapcap flush
- Remove session's s_cap_snaps_flushing list. Add inode to session's
  s_cap_flushing list instead. Inode is removed from the list when
  there is no pending snapcap flush or cap flush.
- make __kick_flushing_caps() re-send both snapcap flushes and cap
  flushes.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:42 +02:00
Yan, Zheng
e4500b5e35 ceph: use list instead of rbtree to track cap flushes
We don't have requirement of searching cap flush by TID. In most cases,
we just need to know TID of the oldest cap flush. List is ideal for this
usage.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:42 +02:00
Yan, Zheng
3609404f8c ceph: update types of some local varibles
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:42 +02:00
Yan, Zheng
3469ed0d14 ceph: include 'follows' of pending snapflush in cap reconnect message
This helps the recovering MDS to reconstruct the internal states that
tracking pending snapflush.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:41 +02:00
Yan, Zheng
121f22a19a ceph: update cap reconnect message to version 3
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:41 +02:00
Yan, Zheng
430afbadd6 ceph: mount non-default filesystem by name
To mount non-default filesytem, user currently needs to provide mds
namespace ID. This is inconvenience.

This patch makes user be able to mount filesystem by name. If user
wants to mount non-default filesystem. Client first subscribes to
fsmap.user. Subscribe to mdsmap.<ID> after getting ID of filesystem.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:40 +02:00
Jeff Layton
f49d1e058d ceph: handle LOOKUP_RCU in ceph_d_revalidate
We can now handle the snapshot cases under RCU, as well as the
non-snapshot case when we don't need to queue up a lease renewal
allow LOOKUP_RCU walks to proceed under those conditions.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:39 +02:00
Jeff Layton
14fb9c9efe ceph: allow dentry_lease_is_valid to work under RCU walk
Under rcuwalk, we need to take extra care when dereferencing d_parent.
We want to do that once and pass a pointer to dentry_lease_is_valid.

Also, we must ensure that that function can handle the case where we're
racing with d_release. Check whether "di" is NULL under the d_lock, and
just return 0 if so.

Finally, we still need to kick off a renewal job if the lease is getting
close to expiration. If that's the case, then just drop out of rcuwalk
mode since that could block.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:39 +02:00
Jeff Layton
5b484a5131 ceph: clear d_fsinfo pointer under d_lock
To check for a valid dentry lease, we need to get at the
ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that
is on its way to destruction. Since we need to take the d_lock in
dentry_lease_is_valid already, we can just ensure that we clear the
d_fsinfo pointer out under the same lock before destroying it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:39 +02:00
Jeff Layton
8aa152c778 ceph: remove ceph_mdsc_lease_release
Nothing calls it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 03:00:38 +02:00
Miklos Szeredi
9b16f03c47 ceph: don't use ->d_time
Pretty simple: just use ceph_dentry_info.time instead (which was already
there, unused).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-28 03:00:35 +02:00
Colin Ian King
679f0b825d ceph: fix spelling mistake: "resgister" -> "register"
trivial fix to spelling mistake in pr_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
2016-07-28 02:55:40 +02:00
Yan, Zheng
fce8515741 ceph: fix NULL dereference in ceph_queue_cap_snap()
old_snapc->seq is used in dout(...)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:40 +02:00
Yan, Zheng
9a5530c638 ceph: wait unsafe sync writes for evicting inode
Otherwise ceph_sync_write_unsafe() may access/modify freed inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:40 +02:00
Yan, Zheng
fc8c3892f3 ceph: fix use-after-free bug in ceph_direct_read_write()
ceph_aio_complete() can free the ceph_aio_request struct before
the code exits the while loop.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:39 +02:00
Yan, Zheng
774a6a118c ceph: reduce i_nr_by_mode array size
Track usage count for individual fmode bit. This can reduce the
array size by half.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:39 +02:00
Yan, Zheng
a22bd5ffae ceph: set user pages dirty after direct IO read
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:38 +02:00
Yan, Zheng
779fe0fb8e ceph: rados pool namespace support
This patch adds codes that decode pool namespace information in
cap message and request reply. Pool namespace is saved in i_layout,
it will be passed to libceph when doing read/write.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:38 +02:00
Yan, Zheng
30c156d995 libceph: rados pool namespace support
Add pool namesapce pointer to struct ceph_file_layout and struct
ceph_object_locator. Pool namespace is used by when mapping object
to PG, it's also used when composing OSD request.

The namespace pointer in struct ceph_file_layout is RCU protected.
So libceph can read namespace without taking lock.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28 02:55:37 +02:00
Yan, Zheng
7627151ea3 libceph: define new ceph_file_layout structure
Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28 02:55:36 +02:00
Ilya Dryomov
281dbe5db8 libceph: add an ONSTACK initializer for oids
An on-stack oid in ceph_ioctl_get_dataloc() is not initialized,
resulting in a WARN and a NULL pointer dereference later on.  We will
have more of these on-stack in the future, so fix it with a convenience
macro.

Fixes: d30291b985 ("libceph: variable-sized ceph_object_id")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28 02:55:35 +02:00
Linus Torvalds
468fc7ed55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

 2) Make DSA binding more sane, from Andrew Lunn.

 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface.  From Brenden Blanco and
    others.

 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

 8) Simplify netlink conntrack entry layout, from Florian Westphal.

 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
2016-07-27 12:03:20 -07:00
Linus Torvalds
ba4f67899f Merge tag 'dlm-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
 "This set includes two trivial changes, one to use kmemdup and another
  to control the log level of recovery messages"

* tag 'dlm-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: Use kmemdup instead of kmalloc and memcpy
  dlm: add log_info config option
2016-07-27 10:47:24 -07:00
Linus Torvalds
4fc29c1aa3 Merge tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "The major change in this version is mitigating cpu overheads on write
  paths by replacing redundant inode page updates with mark_inode_dirty
  calls.  And we tried to reduce lock contentions as well to improve
  filesystem scalability.  Other feature is setting F2FS automatically
  when detecting host-managed SMR.

  Enhancements:
   - ioctl to move a range of data between files
   - inject orphan inode errors
   - avoid flush commands congestion
   - support lazytime

  Bug fixes:
   - return proper results for some dentry operations
   - fix deadlock in add_link failure
   - disable extent_cache for fcollapse/finsert"

* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: clean up coding style and redundancy
  f2fs: get victim segment again after new cp
  f2fs: handle error case with f2fs_bug_on
  f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
  f2fs: support an ioctl to move a range of data blocks
  f2fs: fix to report error number of f2fs_find_entry
  f2fs: avoid memory allocation failure due to a long length
  f2fs: reset default idle interval value
  f2fs: use blk_plug in all the possible paths
  f2fs: fix to avoid data update racing between GC and DIO
  f2fs: add maximum prefree segments
  f2fs: disable extent_cache for fcollapse/finsert inodes
  f2fs: refactor __exchange_data_block for speed up
  f2fs: fix ERR_PTR returned by bio
  f2fs: avoid mark_inode_dirty
  f2fs: move i_size_write in f2fs_write_end
  f2fs: fix to avoid redundant discard during fstrim
  f2fs: avoid mismatching block range for discard
  f2fs: fix incorrect f_bfree calculation in ->statfs
  f2fs: use percpu_rw_semaphore
  ...
2016-07-27 10:36:31 -07:00
Linus Torvalds
0e6acf0204 Merge tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs updates from Dave Chinner:
 "The major addition is the new iomap based block mapping
  infrastructure.  We've been kicking this about locally for years, but
  there are other filesystems want to use it too (e.g. gfs2).  Now it
  is fully working, reviewed and ready for merge and be used by other
  filesystems.

  There are a lot of other fixes and cleanups in the tree, but those are
  XFS internal things and none are of the scale or visibility of the
  iomap changes.  See below for details.

  I am likely to send another pull request next week - we're just about
  ready to merge some new functionality (on disk block->owner reverse
  mapping infrastructure), but that's a huge chunk of code (74 files
  changed, 7283 insertions(+), 1114 deletions(-)) so I'm keeping that
  separate to all the "normal" pull request changes so they don't get
  lost in the noise.

  Summary of changes in this update:
   - generic iomap based IO path infrastructure
   - generic iomap based fiemap implementation
   - xfs iomap based Io path implementation
   - buffer error handling fixes
   - tracking of in flight buffer IO for unmount serialisation
   - direct IO and DAX io path separation and simplification
   - shortform directory format definition changes for wider platform
     compatibility
   - various buffer cache fixes
   - cleanups in preparation for rmap merge
   - error injection cleanups and fixes
   - log item format buffer memory allocation restructuring to prevent
     rare OOM reclaim deadlocks
   - sparse inode chunks are now fully supported"

* tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (53 commits)
  xfs: remove EXPERIMENTAL tag from sparse inode feature
  xfs: bufferhead chains are invalid after end_page_writeback
  xfs: allocate log vector buffers outside CIL context lock
  libxfs: directory node splitting does not have an extra block
  xfs: remove dax code from object file when disabled
  xfs: skip dirty pages in ->releasepage()
  xfs: remove __arch_pack
  xfs: kill xfs_dir2_inou_t
  xfs: kill xfs_dir2_sf_off_t
  xfs: split direct I/O and DAX path
  xfs: direct calls in the direct I/O path
  xfs: stop using generic_file_read_iter for direct I/O
  xfs: split xfs_file_read_iter into buffered and direct I/O helpers
  xfs: remove s_maxbytes enforcement in xfs_file_read_iter
  xfs: kill ioflags
  xfs: don't pass ioflags around in the ioctl path
  xfs: track and serialize in-flight async buffers against unmount
  xfs: exclude never-released buffers from buftarg I/O accounting
  xfs: don't reset b_retries to 0 on every failure
  xfs: remove extraneous buffer flag changes
  ...
2016-07-27 09:53:35 -07:00
Miklos Szeredi
1b91dbdd29 Merge branch 'd_real' into overlayfs-next 2016-07-27 11:36:03 +02:00
Linus Torvalds
0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00