cl_rpcclient starts as ERR_PTR(-EINVAL), and connections like that
are floating freely through the system. Most places check whether
pointer is valid before dereferencing it, but newly added code
in nfs_match_client does not.
Which causes crashes when more than one NFS mount point is present.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
We recently refactored the Orangefs debugfs code.
The refactor seemed to trigger dan.carpenter@oracle.com's
static tester to find a possible double-free in the code.
While designing the fix we saw a condition under which the
buffer being freed could also be overflowed.
We also realized how to rebuild the related debugfs file's
"contents" (a string) without deleting and re-creating the file.
This fix should eliminate the possible double-free, the
potential overflow and improve code readability.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Pull nfsd bugfixes from Bruce Fields:
"Fixes for some recent regressions including fallout from the vmalloc'd
stack change (after which we can no longer encrypt stuff on the
stack)"
* tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix general protection fault in release_lock_stateid()
svcrdma: backchannel cannot share a page for send and rcv buffers
sunrpc: fix some missing rq_rbuffer assignments
sunrpc: don't pass on-stack memory to sg_set_buf
nfsd: move blocked lock handling under a dedicated spinlock
Pull btrfs fixes from Chris Mason:
"Some fixes that Dave Sterba collected. We held off on these last week
because I was focused on the memory corruption testing"
* 'for-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix WARNING in btrfs_select_ref_head()
Btrfs: remove some no-op casts
btrfs: pass correct args to btrfs_async_run_delayed_refs()
btrfs: make file clone aware of fatal signals
btrfs: qgroup: Prevent qgroup->reserved from going subzero
Btrfs: kill BUG_ON in do_relocation
Pull overlayfs fixes from Miklos Szeredi:
"Fix two more POSIX ACL bugs introduced in 4.8 and add a missing fsync
during copy up to prevent possible data loss"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fsync after copy-up
ovl: fix get_acl() on tmpfs
ovl: update S_ISGID when setting posix ACLs
Add a helper function that clears buffer heads from a block device
aliasing passed bh. Use this helper function from filesystems instead of
the original unmap_underlying_metadata() to save some boiler plate code
and also have a better name for the functionalily since it is not
unmapping anything for a *long* time.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Use clean_bdev_aliases() instead of iterating through blocks one by one.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Use clean_bdev_aliases() instead of iterating through blocks one by one.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Use new provided function instead of an iteration through all allocated
blocks.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Provide function equivalent to unmap_underlying_metadata() for a range
of blocks. We somewhat optimize the function to use pagevec lookups
instead of looking up buffer heads one by one and use page lock to pin
buffer heads instead of mapping's private_lock to improve scalability.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add wbc_to_write_flags(), which returns the write modifier flags to use,
based on a struct writeback_control. No functional changes in this
patch, but it prepares us for factoring other wbc fields for write type.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
3c8e03166a "NFSv4: do exact check about attribute specified" fixed
some handling of unsupported-attribute errors, but it also delayed
checking for unwriteable attributes till after we decode them. This
could lead to odd behavior in the case a client attemps to set an
attribute we don't know about followed by one we try to parse. In that
case the parser for the known attribute will attempt to parse the
unknown attribute. It should fail in some safe way, but the error might
at least be incorrect (probably bad_xdr instead of inval). So, it's
better to do that check at the start.
As far as I know this doesn't cause any problems with current clients
but it might be a minor issue e.g. if we encounter a future client that
supports a new attribute that we currently don't.
Cc: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Minor cleanup, no change in behavior.
Provide helpers for some common attribute bitmap operations. Drop some
comments that just echo the code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, when the client continually returns NFS4ERR_DELAY on a
CB_LAYOUTRECALL, we'll give up trying to retransmit after two lease
periods, but leave the layout in place.
What we really need to do here is fence the client in this case. Have it
fall through to that code in that case instead of into the
NFS4ERR_NOMATCHING_LAYOUT case.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, we try to allocate the cache as a single, large chunk, which
can fail if no big chunks of memory are available. We _do_ try to size
it according to the amount of memory in the box, but if the server is
started well after boot time, then the allocation can fail due to memory
fragmentation.
Fall back to doing a vzalloc if the kcalloc fails, and switch the
shutdown code to do a kvfree to handle freeing correctly.
Reported-by: Olaf Hering <olaf@aepfle.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's only needed for the CONFIG_SWAP-only use of bio_end_io_t.
Because CONFIG_SWAP implies CONFIG_BLOCK this will allow to drop some
ifdefs in blk_types.h.
Instead we'll need to add a few explicit includes that were implicit
before, though.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Each socket operates in a network namespace where it has been created,
so if we want to dump and restore a socket, we have to know its network
namespace.
We have a socket_diag to get information about sockets, it doesn't
report sockets which are not bound or connected.
This patch introduces a new socket ioctl, which is called SIOCGSKNS
and used to get a file descriptor for a socket network namespace.
A task must have CAP_NET_ADMIN in a target network namespace to
use this ioctl.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the copied up file hits the disk before renaming to the final
destination. If this is not done then the copy-up may corrupt the data in
the file in case of a crash.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
tmpfs doesn't have ->get_acl() because it only uses cached acls.
This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer
of the overlay.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 39a25b2b37 ("ovl: define ->get_acl() for overlay inodes")
Cc: <stable@vger.kernel.org> # v4.8
This change fixes xfstest generic/375, which failed to clear the
setgid bit in the following test case on overlayfs:
touch $testfile
chown 100:100 $testfile
chmod 2755 $testfile
_runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Fixes: d837a49bd5 ("ovl: fix POSIX ACL setting")
Cc: <stable@vger.kernel.org> # v4.8
Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:
for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
--runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().
Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well. With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.
1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.
2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.
3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.
4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSCI driver, from Gavin
Shan.
12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Propvide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.
14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.
19) Revert a netns locking change that causes regressions, from Paul
Moore.
20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
...
Pull ubi/ubifs fixes from Richard Weinberger:
"This contains fixes for issues in both UBI and UBIFS:
- A regression wrt overlayfs, introduced in -rc2.
- An UBI issue, found by Dan Carpenter's static checker"
* tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs:
ubifs: Fix regression in ubifs_readdir()
ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
Pull driver core fixes from Greg KH:
"Here are two small driver core / kernfs fixes for 4.9-rc3.
One makes the Kconfig entry for DEBUG_TEST_DRIVER_REMOVE a bit more
explicit that this is a crazy thing to enable for a distro kernel
(thanks for trying Fedora!), the other resolves an issue with vim
opening kernfs files (sysfs, configfs, etc.)
Both have been in linux-next with no reported issues"
* tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Make Kconfig text for DEBUG_TEST_DRIVER_REMOVE stronger
kernfs: Add noop_fsync to supported kernfs_file_fops
Pull btrfs fixes from Chris Mason:
"My patch fixes the btrfs list_head abuse that we tracked down during
Dave Jones' memory corruption investigation. With both Jens and my
patches in place, I'm no longer able to trigger problems.
Filipe is fixing a difficult old bug between snapshots, balance and
send. Dave is cooking a few more for the next rc, but these are tested
and ready"
* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix races on root_log_ctx lists
btrfs: fix incremental send failure caused by balance
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields. This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.
In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.
Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits. Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Merge misc fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
cris/arch-v32: cryptocop: print a hex number after a 0x prefix
ipack: print a hex number after a 0x prefix
block: DAC960: print a hex number after a 0x prefix
fs: exofs: print a hex number after a 0x prefix
lib/genalloc.c: start search from start of chunk
mm: memcontrol: do not recurse in direct reclaim
CREDITS: update credit information for Martin Kepplinger
proc: fix NULL dereference when reading /proc/<pid>/auxv
mm: kmemleak: ensure that the task stack is not freed during scanning
lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
latent_entropy: raise CONFIG_FRAME_WARN by default
kconfig.h: remove config_enabled() macro
ipc: account for kmem usage on mqueue and msg
mm/slab: improve performance of gathering slabinfo stats
mm: page_alloc: use KERN_CONT where appropriate
mm/list_lru.c: avoid error-path NULL pointer deref
h8300: fix syscall restarting
kcov: properly check if we are in an interrupt
mm/slab: fix kmemcg cache creation delayed issue
Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL. Fix that by checking for NULL mm
and bailing out early. This is also the original behavior changed by
recent commit c531716785 ("proc: switch auxv to use of __mem_open()").
# cat /proc/2/auxv
Unable to handle kernel NULL pointer dereference at virtual address 000000a8
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
Hardware name: BCM2709
task: ea3b0b00 task.stack: e99b2000
PC is at auxv_read+0x24/0x4c
LR is at do_readv_writev+0x2fc/0x37c
Process cat (pid: 113, stack limit = 0xe99b2210)
Call chain:
auxv_read
do_readv_writev
vfs_readv
default_file_splice_read
splice_direct_to_actor
do_splice_direct
do_sendfile
SyS_sendfile64
ret_fast_syscall
Fixes: c531716785 ("proc: switch auxv to use of __mem_open()")
Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Janis Danisevskis <jdanis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.
In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've add __init annotations to clarifly things), so all can
actually be marked __ro_after_init.
This protects the data structure from accidental corruption.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of providing macros/inline functions to initialize
the families, make all users initialize them statically and
get rid of the macros.
This reduces the kernel code size by about 1.6k on x86-64
(with allyesconfig).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Static family IDs have never really been used, the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.
Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, which I also had to add code for so
it would reserve those IDs)
Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull oreangefs updates from Mike Marshall:
"A couple of orangefs cleanups sent in by other developers:
- use d_fsdata instead of d_time (Miklos Szeredi)
- use file_inode(file) instead of file->f_path.dentry->d_inode (Amir
Goldstein)"
* tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: don't use d_time
orangefs: user file_inode() where it is due
Pull xfs fixes from Dave Chinner:
"This update contains fixes for most of the outstanding regressions
introduced with the 4.9-rc1 XFS merge. There is also a fix for an
iomap bug, too.
This is a quite a bit larger than I'd prefer for a -rc3, but most of
the change comes from cleaning up the new reflink copy on write code;
it's much simpler and easier to understand now. These changes fixed
several bugs in the new code, and it wasn't clear that there was an
easier/simpler way to fix them. The rest of the fixes are the usual
size you'd expect at this stage.
I've left the commits to soak in linux-next for a some extra time
because of the size before asking you to pull, no new problems with
them have been reported so I think it's all OK.
Summary:
- iomap page offset masking fix for page faults
- add IOMAP_REPORT to distinguish between read and fiemap map
requests
- cleanups to new shared data extent code
- fix mount active status on failed log recovery
- fix broken dquots in a buffer calculation
- fix locking order issues and merge xfs_reflink_remap_range and
xfs_file_share_range
- rework unmapping of CoW extents and remove now unused functions
- clean state when CoW is done"
* tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits)
xfs: clear cowblocks tag when cow fork is emptied
xfs: fix up inode cowblocks tracking tracepoints
fs: Do to trim high file position bits in iomap_page_mkwrite_actor
xfs: remove xfs_bunmapi_cow
xfs: optimize xfs_reflink_end_cow
xfs: optimize xfs_reflink_cancel_cow_blocks
xfs: refactor xfs_bunmapi_cow
xfs: optimize writes to reflink files
xfs: don't bother looking at the refcount tree for reads
xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared
xfs: add xfs_trim_extent
iomap: add IOMAP_REPORT
xfs: merge xfs_reflink_remap_range and xfs_file_share_range
xfs: remove xfs_file_wait_for_io
xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
xfs: fix the same_inode check in xfs_file_share_range
xfs: remove the same fs check from xfs_file_share_range
libxfs: v3 inodes are only valid on crc-enabled filesystems
libxfs: clean up _calc_dquots_per_chunk
xfs: unset MS_ACTIVE if mount fails
...
btrfs_remove_all_log_ctxs takes a shortcut where it avoids walking the
list because it knows all of the waiters are patiently waiting for the
commit to finish.
But, there's a small race where btrfs_sync_log can remove itself from
the list if it finds a log commit is already done. Also, it uses
list_del_init() to remove itself from the list, but there's no way to
know if btrfs_remove_all_log_ctxs has already run, so we don't know for
sure if it is safe to call list_del_init().
This gets rid of all the shortcuts for btrfs_remove_all_log_ctxs(), and
just calls it with the proper locking.
This is part two of the corruption fixed by cbd60aa7cd. I should have
done this in the first place, but convinced myself the optimizations were
safe. A 12 hour run of dbench 2048 will eventually trigger a list debug
WARN_ON for the list_del_init() in btrfs_sync_log().
Fixes: d1433debe7
Reported-by: Dave Jones <davej@codemonkey.org.uk>
cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Chris Mason <clm@fb.com>
If you edit a kernfs backed file with vi(1), you see an ugly error
message when you write the file because vi tries to fsync(2) the
file after writing, which fails.
We have noop_fsync() for this, use it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>