Commit 5bc2afc2b5 "NFSv4: Honour the 'opened' parameter in the atomic_open()
filesystem method" have support the opened arguments now.
Also,
Commit 03da633aa7 "atomic_open: take care of EEXIST in no-open case with
O_CREAT|O_EXCL in fs/namei.c" have change vfs's logical.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Commit e38eb6506f "NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()"
have remove the using of mntfh from nfs_server_set_fsinfo().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Make each fuse device clone refer to a separate processing queue. The only
constraint on userspace code is that the request answer must be written to
the same device clone as it was read off.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Allow fuse device clones to refer to be distinguished. This patch just
adds the infrastructure by associating a separate "struct fuse_dev" with
each clone.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Allow an open fuse device to be "cloned". Userspace can create a clone by:
newfd = open("/dev/fuse", O_RDWR)
ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd);
At this point newfd will refer to the same fuse connection as oldfd.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
In fuse_abort_conn() when all requests are on private lists we no longer
need fc->lock protection.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
No longer need to call request_end() with the connection lock held. We
still protect the background counters and queue with fc->lock, so acquire
it if necessary.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Now that we atomically test having already done everything we no longer
need other protection.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
When the connection is aborted it is possible that request_end() will be
called twice. Use atomic test and set to do the actual ending only once.
test_and_set_bit() also provides the necessary barrier semantics so no
explicit smp_wmb() is necessary.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
When an unlocked request is aborted, it is moved from fpq->io to a private
list. Then, after unlocking fpq->lock, the private list is processed and
the requests are finished off.
To protect the private list, we need to mark the request with a flag, so if
in the meantime the request is unlocked the list is not corrupted.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Add a fpq->lock for protecting members of struct fuse_pqueue and FR_LOCKED
request flag.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
- locked list_add() + list_del_init() cancel out
- common handling of case when request is ended here in the read phase
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
This will allow checking ->connected just with the processing queue lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
This is just two fields: fc->io and fc->processing.
This patch just rearranges the fields, no functional change.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
wait_event_interruptible_exclusive_locked() will do everything
request_wait() does, so replace it.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Interrupt is only queued after the request has been sent to userspace.
This is either done in request_wait_answer() or fuse_dev_do_read()
depending on which state the request is in at the time of the interrupt.
If it's not yet sent, then queuing the interrupt is postponed until the
request is read. Otherwise (the request has already been read and is
waiting for an answer) the interrupt is queued immedidately.
We want to call queue_interrupt() without fc->lock protection, in which
case there can be a race between the two functions:
- neither of them queue the interrupt (thinking the other one has already
done it).
- both of them queue the interrupt
The first one is prevented by adding memory barriers, the second is
prevented by checking (under fiq->waitq.lock) if the interrupt has already
been queued.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Use fiq->waitq.lock for protecting members of struct fuse_iqueue and
FR_PENDING request flag, previously protected by fc->lock.
Following patches will remove fc->lock protection from these members.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
This will allow checking ->connected just with the input queue lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
The input queue contains normal requests (fc->pending), forgets
(fc->forget_*) and interrupts (fc->interrupts). There's also fc->waitq and
fc->fasync for waking up the readers of the fuse device when a request is
available.
The fc->reqctr is also moved to the input queue (assigned to the request
when the request is added to the input queue.
This patch just rearranges the fields, no functional change.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Use flags for representing the state in fuse_req. This is needed since
req->list will be protected by different locks in different states, hence
we'll want the state itself to be split into distinct bits, each protected
with the relevant lock in that state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Only hold fc->lock over sections of request_wait_answer() that actually
need it. If wait_event_interruptible() returns zero, it means that the
request finished. Need to add memory barriers, though, to make sure that
all relevant data in the request is synchronized.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Since it's a 64bit counter, it's never gonna wrap around. Remove code
dealing with that possibility.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Splice fc->pending and fc->processing lists into a common kill list while
holding fc->lock.
By the time we release fc->lock, pending and processing lists are empty and
the io list contains only locked requests.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Finer grained locking will mean there's no single lock to protect
modification of bitfileds in fuse_req.
So move to using bitops. Can use the non-atomic variants for those which
happen while the request definitely has only one reference.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
- don't end the request while req->locked is true
- make unlock_request() return an error if the connection was aborted
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
fuse_abort_conn() does all the work done by fuse_dev_release() and more.
"More" consists of:
end_io_requests(fc);
wake_up_all(&fc->waitq);
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
All of which should be no-op (WARN_ON's added).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
fc->conn_error is set once in FUSE_INIT reply and never cleared. Check it
in request allocation, there's no sense in doing all the preparation if
sending will surely fail.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Move accounting of fc->num_waiting to the point where the request actually
starts waiting. This is earlier than the current queue_request() for
background requests, since they might be waiting on the fc->bg_queue before
being queued on fc->pending.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Reset req->waiting in fuse_put_request(). This is needed for correct
accounting in fc->num_waiting for reserved requests.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
request_end() expects fc->num_background and fc->active_background to have
been incremented, which is not the case in fuse_request_send_nowait()
failure path. So instead just call the ->end() callback (which is actually
set by all callers).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.
[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b922 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Cc: <stable@vger.kernel.org> #v2.6.31+
__fget() does lockless fetch of pointer from the descriptor
table, attempts to grab a reference and treats "it was already
zero" as "it's already gone from the table, we just hadn't
seen the store, let's fail". Unfortunately, that breaks the
atomicity of dup2() - __fget() might see the old pointer,
notice that it's been already dropped and treat that as
"it's closed". What we should be getting is either the
old file or new one, depending whether we come before or after
dup2().
Dmitry had following test failing sometimes :
int fd;
void *Thread(void *x) {
char buf;
int n = read(fd, &buf, 1);
if (n != 1)
exit(printf("read failed: n=%d errno=%d\n", n, errno));
return 0;
}
int main()
{
fd = open("/dev/urandom", O_RDONLY);
int fd2 = open("/dev/urandom", O_RDONLY);
if (fd == -1 || fd2 == -1)
exit(printf("open failed\n"));
pthread_t th;
pthread_create(&th, 0, Thread, 0);
if (dup2(fd2, fd) == -1)
exit(printf("dup2 failed\n"));
pthread_join(th, 0);
if (close(fd) == -1)
exit(printf("close failed\n"));
if (close(fd2) == -1)
exit(printf("close failed\n"));
printf("DONE\n");
return 0;
}
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mateusz Guzik reported :
Currently obtaining a new file descriptor results in locking fdtable
twice - once in order to reserve a slot and second time to fill it.
Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.
Mateusz provided an RFC patch and a micro benchmark :
http://people.redhat.com/~mguzik/pipebench.c
A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.
We can use RCU instead of the spinlock.
__fd_install() must wait if a resize is in progress.
The resize must block new __fd_install() callers from starting,
and wait that ongoing install are finished (synchronize_sched())
resize should be attempted by a single thread to not waste resources.
rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.
It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Execution of get_anon_bdev concurrently and preemptive kernel all
could bring race condition, it isn't enough to check dev against
its upper limitation with equality operator only.
This patch fix it.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull CIFS/SMB3 updates from Steve French:
"Includes two bug fixes, as well as (minimal) support for the new
protocol dialect (SMB3.1.1), and support for two ioctls including
reflink (duplicate extents) file copy and set integrity"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Unset CIFS_MOUNT_POSIX_PATHS flag when following dfs mounts
Update negotiate protocol for SMB3.11 dialect
Add ioctl to set integrity
Add Get/Set Integrity Information structure definitions
Add reflink copy over SMB3.11 with new FSCTL_DUPLICATE_EXTENTS
Add SMB3.11 mount option synonym for new dialect
add struct FILE_STANDARD_INFO
Make dialect negotiation warning message easier to read
Add defines and structs for smb3.1 dialect
Allow parsing vers=3.11 on cifs mount
client MUST ignore EncryptionKeyLength if CAP_EXTENDED_SECURITY is set
currently, get_next_ino() is able to create inodes with inode number = 0.
This have a bad impact in the filesystems relying in this function to generate
inode numbers.
While there is no problem at all in having inodes with number 0, userspace tools
which handle file management tasks can have problems handling these files, like
for example, the impossiblity of users to delete these files, since glibc will
ignore them. So, I believe the best way is kernel to avoid creating them.
This problem has been raised previously, but the old thread didn't have any
other update for a year+, and I've seen too many users hitting the same issue
regarding the impossibility to delete files while using filesystems relying on
this function. So, I'm starting the thread again, with the same patch
that I believe is enough to address this problem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>