Commit Graph

48450 Commits

Author SHA1 Message Date
Carlos Maiolino
a5ea70d25d xfs: add configuration of error failure speed
On reception of an error, we can fail immediately, perform some
bound amount of retries or retry indefinitely. The current behaviour
we have is to retry forever.

However, we'd like the ability to choose how long the filesystem
should try after an error, it can either fail immediately, retry a
few times, or retry forever. This is implemented by using
max_retries sysfs attribute, to hold the amount of times we allow
the filesystem to retry after an error. Being -1 a special case
where the filesystem will retry indefinitely.

Add both a maximum retry count and a retry timeout so that we can
bound by time and/or physical IO attempts.

Finally, plumb these into xfs_buf_iodone error processing so that
the error behaviour follows the selected configuration.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 11:08:15 +10:00
Carlos Maiolino
ef6a50fbb1 xfs: introduce table-based init for error behaviors
Before we start expanding the number of error classes and errors we
can configure behaviour for, we need a simple and clear way to
define the default behaviour that we initialized each mount with.
Introduce a table based method for keeping the initial configuration
in, and apply that to the existing initialization code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 11:06:44 +10:00
Carlos Maiolino
df3093907c xfs: add configurable error support to metadata buffers
With the error configuration handle for async metadata write errors
in place, we can now add initial support to the IO error processing
in xfs_buf_iodone_error().

Add an infrastructure function to look up the configuration handle,
and rearrange the error handling to prepare the way for different
error handling conigurations to be used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 11:05:33 +10:00
Carlos Maiolino
ffd40ef697 xfs: introduce metadata IO error class
Now we have the basic infrastructure, add the first error class so
we can build up the infrastructure in a meaningful way. Add the
metadata async write IO error class and sysfs entry, and introduce a
default configuration that matches the existing "retry forever"
behavior for async write metadata buffers.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 11:01:00 +10:00
Carlos Maiolino
192852be8b xfs: configurable error behavior via sysfs
We need to be able to change the way XFS behaviours in error
conditions depending on the type of underlying storage. This is
necessary for handling non-traditional block devices with extended
error cases, such as thin provisioned devices that can return ENOSPC
as an IO error.

Introduce the basic sysfs infrastructure needed to define and
configure error behaviours. This is done to be generic enough to
extend to configuring behaviour in other error conditions, such as
ENOMEM, which also has different desired behaviours according to
machine configuration.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 10:58:51 +10:00
Brian Foster
9bdd9bd69b xfs: buffer ->bi_end_io function requires irq-safe lock
Reports have surfaced of a lockdep splat complaining about an
irq-safe -> irq-unsafe locking order in the xfs_buf_bio_end_io() bio
completion handler. This only occurs when I/O errors are present
because bp->b_lock is only acquired in this context to protect
setting an error on the buffer. The problem is that this lock can be
acquired with the (request_queue) q->queue_lock held. See
scsi_end_request() or ata_qc_schedule_eh(), for example.

Replace the locked test/set of b_io_error with a cmpxchg() call.
This eliminates the need for the lock and thus the lock ordering
problem goes away.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 10:56:41 +10:00
Linus Torvalds
16bf834805 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
  gitignore: fix wording
  mfd: ab8500-debugfs: fix "between" in printk
  memstick: trivial fix of spelling mistake on management
  cpupowerutils: bench: fix "average"
  treewide: Fix typos in printk
  IB/mlx4: printk fix
  pinctrl: sirf/atlas7: fix printk spelling
  serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
  w1: comment spelling s/minmum/minimum/
  Blackfin: comment spelling s/divsor/divisor/
  metag: Fix misspellings in comments.
  ia64: Fix misspellings in comments.
  hexagon: Fix misspellings in comments.
  tools/perf: Fix misspellings in comments.
  cris: Fix misspellings in comments.
  c6x: Fix misspellings in comments.
  blackfin: Fix misspelling of 'register' in comment.
  avr32: Fix misspelling of 'definitions' in comment.
  treewide: Fix typos in printk
  Doc: treewide : Fix typos in DocBook/filesystem.xml
  ...
2016-05-17 17:05:30 -07:00
Linus Torvalds
a7fd20d1c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support SPI based w5100 devices, from Akinobu Mita.

   2) Partial Segmentation Offload, from Alexander Duyck.

   3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

   4) Allow cls_flower stats offload, from Amir Vadai.

   5) Implement bpf blinding, from Daniel Borkmann.

   6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
      actually using FASYNC these atomics are superfluous.  From Eric
      Dumazet.

   7) Run TCP more preemptibly, also from Eric Dumazet.

   8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
      driver, from Gal Pressman.

   9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

  10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

  11) Support tunneling offloads in qed, from Manish Chopra.

  12) aRFS offloading in mlx5e, from Maor Gottlieb.

  13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
      Leitner.

  14) Add MSG_EOR support to TCP, this allows controlling packet
      coalescing on application record boundaries for more accurate
      socket timestamp sampling.  From Martin KaFai Lau.

  15) Fix alignment of 64-bit netlink attributes across the board, from
      Nicolas Dichtel.

  16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

  17) Several conversions of drivers to ethtool ksettings, from Philippe
      Reynes.

  18) Checksum neutral ILA in ipv6, from Tom Herbert.

  19) Factorize all of the various marvell dsa drivers into one, from
      Vivien Didelot

  20) Add VF support to qed driver, from Yuval Mintz"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
  Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
  Revert "phy dp83867: Make rgmii parameters optional"
  r8169: default to 64-bit DMA on recent PCIe chips
  phy dp83867: Make rgmii parameters optional
  phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
  bpf: arm64: remove callee-save registers use for tmp registers
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  switchdev: pass pointer to fib_info instead of copy
  net_sched: close another race condition in tcf_mirred_release()
  tipc: fix nametable publication field in nl compat
  drivers: net: Don't print unpopulated net_device name
  qed: add support for dcbx.
  ravb: Add missing free_irq() calls to ravb_close()
  qed: Remove a stray tab
  net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fec-mpc52xx: use phydev from struct net_device
  bpf, doc: fix typo on bpf_asm descriptions
  stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
  net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fs-enet: use phydev from struct net_device
  ...
2016-05-17 16:26:30 -07:00
Andreas Gruenbacher
e0d46f5c6e btrfs: Switch to generic xattr handlers
The btrfs_{set,remove}xattr inode operations check for a read-only root
(btrfs_root_readonly) before calling into generic_{set,remove}xattr.  If
this check is moved into __btrfs_setxattr, we can get rid of
btrfs_{set,remove}xattr.

This patch applies to mainline, I would like to keep it together with
the other xattr cleanups if possible, though.  Could you please review?

Thanks,
Andreas

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-17 19:17:09 -04:00
Andreas Gruenbacher
2b88fc21ca ubifs: Switch to generic xattr handlers
Ubifs internally uses special inodes for storing xattrs. Those inodes
had NULL {get,set,remove}xattr inode operations before this change, so
xattr operations on them would fail. The super block's s_xattr field
would also apply to those special inodes. However, the inodes are not
visible outside of ubifs, and so no xattr operations will ever be
carried out on them anyway.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-17 19:16:23 -04:00
Linus Torvalds
c2e7b20705 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs cleanups from Al Viro:
 "More cleanups from Christoph"

* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd: use RWF_SYNC
  fs: add RWF_DSYNC aand RWF_SYNC
  ceph: use generic_write_sync
  fs: simplify the generic_write_sync prototype
  fs: add IOCB_SYNC and IOCB_DSYNC
  direct-io: remove the offset argument to dio_complete
  direct-io: eliminate the offset argument to ->direct_IO
  xfs: eliminate the pos variable in xfs_file_dio_aio_write
  filemap: remove the pos argument to generic_file_direct_write
  filemap: remove pos variables in generic_file_read_iter
2016-05-17 15:05:23 -07:00
Chris Mason
c315ef8d9d Merge branch 'for-chris-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7
Signed-off-by: Chris Mason <clm@fb.com>
2016-05-17 14:43:19 -07:00
Linus Torvalds
c52b76185b Merge branch 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct path' constification update from Al Viro:
 "'struct path' is passed by reference to a bunch of Linux security
  methods; in theory, there's nothing to stop them from modifying the
  damn thing and LSM community being what it is, sooner or later some
  enterprising soul is going to decide that it's a good idea.

  Let's remove the temptation and constify all of those..."

* 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify ima_d_path()
  constify security_sb_pivotroot()
  constify security_path_chroot()
  constify security_path_{link,rename}
  apparmor: remove useless checks for NULL ->mnt
  constify security_path_{mkdir,mknod,symlink}
  constify security_path_{unlink,rmdir}
  apparmor: constify common_perm_...()
  apparmor: constify aa_path_link()
  apparmor: new helper - common_path_perm()
  constify chmod_common/security_path_chmod
  constify security_sb_mount()
  constify chown_common/security_path_chown
  tomoyo: constify assorted struct path *
  apparmor_path_truncate(): path->mnt is never NULL
  constify vfs_truncate()
  constify security_path_truncate()
  [apparmor] constify struct path * in a bunch of helpers
2016-05-17 14:41:03 -07:00
Linus Torvalds
681750c046 Merge branch 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull cifs xattr updates from Al Viro:
 "This is the remaining parts of the xattr work - the cifs bits"

* 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  cifs: Switch to generic xattr handlers
  cifs: Fix removexattr for os2.* xattrs
  cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT
  cifs: Fix xattr name checks
2016-05-17 14:35:45 -07:00
Linus Torvalds
820c687b70 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
 "A fix for UDF crash on corrupted media and one UDF header fixup"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Export superblock magic to userspace
  udf: Prevent stack overflow on corrupted filesystem mount
2016-05-17 14:25:02 -07:00
Linus Torvalds
dba1e98731 Merge tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggy
Pull jfs updates from Dave Kleikamp:
 "Some jfs logging cleanups from Joe Perches"

* tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggy:
  jfs: Coalesce some formats
  jfs: Remove unnecessary line continuations and terminating newlines
  jfs: Remove terminating newlines from jfs_info, jfs_warn, jfs_err uses
2016-05-17 14:15:18 -07:00
Kees Cook
cb6fd68fdd exec: clarify reasoning for euid/egid reset
This section of code initially looks redundant, but is required. This
improves the comment to explain more clearly why the reset is needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-17 13:56:53 -07:00
Jeff Layton
1b3c6d07e2 pnfs: make pnfs_layout_process more robust
It can return NULL if layoutgets are blocked currently. Fix it to return
-EAGAIN in that case, so we can properly handle it in pnfs_update_layout.

Also, clean up and simplify the error handling -- eliminate "status" and
just use "lseg".

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:13 -04:00
Jeff Layton
183d9e7b11 pnfs: rework LAYOUTGET retry handling
There are several problems in the way a stateid is selected for a
LAYOUTGET operation:

We pick a stateid to use in the RPC prepare op, but that makes
it difficult to serialize LAYOUTGETs that use the open stateid. That
serialization is done in pnfs_update_layout, which occurs well before
the rpc_prepare operation.

Between those two events, the i_lock is dropped and reacquired.
pnfs_update_layout can find that the list has lsegs in it and not do any
serialization, but then later pnfs_choose_layoutget_stateid ends up
choosing the open stateid.

This patch changes the client to select the stateid to use in the
LAYOUTGET earlier, when we're searching for a usable layout segment.
This way we can do it all while holding the i_lock the first time, and
ensure that we serialize any LAYOUTGET call that uses a non-layout
stateid.

This also means a rework of how LAYOUTGET replies are handled, as we
must now get the latest stateid if we want to retransmit in response
to a retryable error.

Most of those errors boil down to the fact that the layout state has
changed in some fashion. Thus, what we really want to do is to re-search
for a layout when it fails with a retryable error, so that we can avoid
reissuing the RPC at all if possible.

While the LAYOUTGET RPC is async, the initiating thread always waits for
it to complete, so it's effectively synchronous anyway. Currently, when
we need to retry a LAYOUTGET because of an error, we drive that retry
via the rpc state machine.

This means that once the call has been submitted, it runs until it
completes. So, we must move the error handling for this RPC out of the
rpc_call_done operation and into the caller.

In order to handle errors like NFS4ERR_DELAY properly, we must also
pass a pointer to the sliding timeout, which is now moved to the stack
in pnfs_update_layout.

The complicating errors are -NFS4ERR_RECALLCONFLICT and
-NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give
up and return NULL back to the caller. So, there is some special
handling for those errors to ensure that the layers driving the retries
can handle that appropriately.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:12 -04:00
Jeff Layton
83026d80a1 pnfs: lift retry logic from send_layoutget to pnfs_update_layout
If we get back something like NFS4ERR_OLD_STATEID, that will be
translated into -EAGAIN, and the do/while loop in send_layoutget
will drive the call again.

This is not quite what we want, I think. An error like that is a
sign that something has changed. That something could have been a
concurrent LAYOUTGET that would give us a usable lseg.

Lift the retry logic into pnfs_update_layout instead. That allows
us to redo the layout search, and may spare us from having to issue
an RPC.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:12 -04:00
Jeff Layton
d03ab29dbb pnfs: fix bad error handling in send_layoutget
Currently, the code will clear the fail bit if we get back a fatal
error. I don't think that's correct -- we want to clear that bit
if we do not get a fatal error.

Fixes: 0bcbf039f6 (nfs: handle request add failure properly)
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:11 -04:00
Jeff Layton
95e2b7e95d flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:11 -04:00
Jeff Layton
094069f1d9 flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything,
unless there are lsegs that are also being marked for return. At the
point where that happens this flag is also set, so these set_bit calls
don't do anything useful.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:11 -04:00
Jeff Layton
6d597e1750 pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
LAYOUTRETURN is "special" in that servers and clients are expected to
work with old stateids. When the client sends a LAYOUTRETURN with an old
stateid in it then the server is expected to only tear down layout
segments that were present when that seqid was current. Ensure that the
client handles its accounting accordingly.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:10 -04:00
Jeff Layton
3982a6a2d0 pnfs: keep track of the return sequence number in pnfs_layout_hdr
When we want to selectively do a LAYOUTRETURN, we need to specify a
stateid that represents most recent layout acquisition that is to be
returned.

When we mark a layout stateid to be returned, we update the return
sequence number in the layout header with that value, if it's newer
than the existing one. Then, when we go to do a LAYOUTRETURN on
layout header put, we overwrite the seqid in the stateid with the
saved one, and then zero it out.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:10 -04:00
Jeff Layton
6675528380 pnfs: record sequence in pnfs_layout_segment when it's created
In later patches, we're going to teach the client to be more selective
about how it returns layouts. This means keeping a record of what the
stateid's seqid was at the time that the server handed out a layout
segment.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:09 -04:00
Jeff Layton
ee26bdd680 pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
Otherwise, we'll end up returning layouts that we've just received if
the client issues a new LAYOUTGET prior to the LAYOUTRETURN.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:09 -04:00
Tom Haynes
446ca21953 pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
If we are initializing reads or writes and can not connect to a DS, then
check whether or not IO is allowed through the MDS. If it is allowed,
reset to the MDS. Else, fail the layout segment and force a retry
of a new layout segment.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:08 -04:00
Tom Haynes
3b13b4b311 pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
Whenever we check to see if we have the needed number of DSes for the
action, we may also have to check to see whether IO is allowed to go to
the MDS or not.

[jlayton: fix merge conflict due to lack of localio patches here]

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:08 -04:00
Trond Myklebust
75bf47ebf6 pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
This patch fixes a problem whereby the pNFS client falls back to doing
reads and writes through the metadata server even when the layout flag
FF_FLAGS_NO_IO_THRU_MDS is set.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:07 -04:00
Trond Myklebust
cca588d6c8 NFS: Reclaim writes via writepage are opportunistic
No need to make them a priority any more, or to make them succeed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:07 -04:00
Trond Myklebust
abf4e13cc1 NFSv4: Use the right stateid for delegations in setattr, read and write
When we're using a delegation to represent our open state, we should
ensure that we use the stateid that was used to create that delegation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:07 -04:00
Trond Myklebust
93b717fd81 NFSv4: Label stateids with the type
In order to more easily distinguish what kind of stateid we are dealing
with, introduce a type that can be used to label the stateid structure.

The label will be useful both for debugging, but also when dealing with
operations like SETATTR, READ and WRITE that can take several different
types of stateid as arguments.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:06 -04:00
Trond Myklebust
f538d0ba5b pNFS: Fix a leaked layoutstats flag
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:05 -04:00
Chuck Lever
6b26cc8c8e sunrpc: Advertise maximum backchannel payload size
RPC-over-RDMA transports have a limit on how large a backward
direction (backchannel) RPC message can be. Ensure that the NFSv4.x
CREATE_SESSION operation advertises this limit to servers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:57 -04:00
Shirley Ma
181342c5eb xprtrdma: Add rdma6 option to support NFS/RDMA IPv6
RFC 5666: The "rdma" netid is to be used when IPv4 addressing
is employed by the underlying transport, and "rdma6" for IPv6
addressing.

Add mount -o proto=rdma6 option to support NFS/RDMA IPv6 addressing.

Changes from v2:
 - Integrated comments from Chuck Level, Anna Schumaker, Trodt Myklebust
 - Add a little more to the patch description to describe NFS/RDMA
   IPv6 suggested by Chuck Level and Anna Schumaker
 - Removed duplicated rdma6 define
 - Remove Opt_xprt_rdma mountfamily since it doesn't support

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:56 -04:00
Tigran Mkrtchyan
a1d1c4f11a nfs4: client: do not send empty SETATTR after OPEN_CREATE
OPEN_CREATE with EXCLUSIVE4_1 sends initial file permission.
Ignoring  fact, that server have indicated that file mod is set, client
will send yet another SETATTR request, but, as mode is already set,
new SETATTR will be empty. This is not a problem, nevertheless
an extra roundtrip and slow open on high latency networks.

This change is aims to skip extra setattr after open  if there are
no attributes to be set.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:55 -04:00
Anna Schumaker
2e72448b07 NFS: Add COPY nfs operation
This adds the copy_range file_ops function pointer used by the
sys_copy_range() function call.  This patch only implements sync copies,
so if an async copy happens we decode the stateid and ignore it.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
2016-05-17 15:47:55 -04:00
Anna Schumaker
67911c8f18 NFS: Add nfs_commit_file()
Copy will use this to set up a commit request for a generic range.  I
don't want to allocate a new pagecache entry for the file, so I needed
to change parts of the commit path to handle requests with a null
wb_page.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:55 -04:00
Olga Kornievskaia
c2985d001d Fixing oops in callback path
Commit 80f9642724 ("NFSv4.x: Enforce the ca_maxreponsesize_cached
on the back channel") causes an oops when it receives a callback with
cachethis=yes.

[  109.667378] BUG: unable to handle kernel NULL pointer dereference at 00000000000002c8
[  109.669476] IP: [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
[  109.671216] PGD 0
[  109.671736] Oops: 0000 [#1] SMP
[  109.705427] CPU: 1 PID: 3579 Comm: nfsv4.1-svc Not tainted 4.5.0-rc1+ #1
[  109.706987] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[  109.709468] task: ffff8800b4408000 ti: ffff88008448c000 task.ti: ffff88008448c000
[  109.711207] RIP: 0010:[<ffffffffa08a3e68>]  [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
[  109.713521] RSP: 0018:ffff88008448fca0  EFLAGS: 00010286
[  109.714762] RAX: ffff880081ee202c RBX: ffff8800b7b5b600 RCX: 0000000000000001
[  109.716427] RDX: 0000000000000008 RSI: 0000000000000008 RDI: 0000000000000000
[  109.718091] RBP: ffff88008448fda8 R08: 0000000000000000 R09: 000000000b000000
[  109.719757] R10: ffff880137786000 R11: ffff8800b7b5b600 R12: 0000000001000000
[  109.721415] R13: 0000000000000002 R14: 0000000053270000 R15: 000000000000000b
[  109.723061] FS:  0000000000000000(0000) GS:ffff880139640000(0000) knlGS:0000000000000000
[  109.724931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.726278] CR2: 00000000000002c8 CR3: 0000000034d50000 CR4: 00000000001406e0
[  109.727972] Stack:
[  109.728465]  ffff880081ee202c ffff880081ee201c 000000008448fcc0 ffff8800baccb800
[  109.730349]  ffff8800baccc800 ffffffffa08d0380 0000000000000000 0000000000000000
[  109.732211]  ffff8800b7b5b600 0000000000000001 ffffffff81d073c0 ffff880081ee3090
[  109.734056] Call Trace:
[  109.734657]  [<ffffffffa03795d4>] svc_process_common+0x5c4/0x6c0 [sunrpc]
[  109.736267]  [<ffffffffa0379a4c>] bc_svc_process+0x1fc/0x360 [sunrpc]
[  109.737775]  [<ffffffffa08a2c2c>] nfs41_callback_svc+0x10c/0x1d0 [nfsv4]
[  109.739335]  [<ffffffff810cb380>] ? prepare_to_wait_event+0xf0/0xf0
[  109.740799]  [<ffffffffa08a2b20>] ? nfs4_callback_svc+0x50/0x50 [nfsv4]
[  109.742349]  [<ffffffff810a6998>] kthread+0xd8/0xf0
[  109.743495]  [<ffffffff810a68c0>] ? kthread_park+0x60/0x60
[  109.744776]  [<ffffffff816abc4f>] ret_from_fork+0x3f/0x70
[  109.746037]  [<ffffffff810a68c0>] ? kthread_park+0x60/0x60
[  109.747324] Code: cc 45 31 f6 48 8b 85 00 ff ff ff 44 89 30 48 8b 85 f8 fe ff ff 44 89 20 48 8b 9d 38 ff ff ff 48 8b bd 30 ff ff ff 48 85 db 74 4c <4c> 8b af c8 02 00 00 4d 8d a5 08 02 00 00 49 81 c5 98 02 00 00
[  109.754361] RIP  [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
[  109.756123]  RSP <ffff88008448fca0>
[  109.756951] CR2: 00000000000002c8
[  109.757738] ---[ end trace 2b8555511ab5dfb4 ]---
[  109.758819] Kernel panic - not syncing: Fatal exception
[  109.760126] Kernel Offset: disabled
[  118.938934] ---[ end Kernel panic - not syncing: Fatal exception

It doesn't unlock the table nor does it set the cps->clp pointer which
is later needed by nfs4_cb_free_slot().

Fixes: 80f9642724 ("NFSv4.x: Enforce the ca_maxresponsesize_cached ...")
CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:45:00 -04:00
Steve French
897fba1172 remove directory incorrectly tries to set delete on close on non-empty directories
Wrong return code was being returned on SMB3 rmdir of
non-empty directory.

For SMB3 (unlike for cifs), we attempt to delete a directory by
set of delete on close flag on the open. Windows clients set
this flag via a set info (SET_FILE_DISPOSITION to set this flag)
which properly checks if the directory is empty.

With this patch on smb3 mounts we correctly return
 "DIRECTORY NOT EMPTY"
on attempts to remove a non-empty directory.

Signed-off-by: Steve French <steve.french@primarydata.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
2016-05-17 14:09:44 -05:00
Steve French
5a4f7e8e7f Update cifs.ko version to 2.09
Signed-off-by: Steven French <steve.french@primarydata.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
1a967d6c9b fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
Only server which map unknown users to guest will allow
access using a non-null NTLMv2_Response.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
777f69b8d2 fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
fa8f3a354b fs/cifs: correctly to anonymous authentication for the LANMAN authentication
Only server which map unknown users to guest will allow
access using a non-null LMChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:34 -05:00
Stefan Metzmacher
cfda35d982 fs/cifs: correctly to anonymous authentication via NTLMSSP
See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:

   ...
   Set NullSession to FALSE
   If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
      AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
      (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
       OR
       AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
       -- Special case: client requested anonymous authentication
       Set NullSession to TRUE
   ...

Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:33 -05:00
Sachin Prabhu
11e31647c9 cifs: remove any preceding delimiter from prefix_path
We currently do not check if any delimiter exists before the prefix
path in cifs_compose_mount_options(). Consequently when building the
devname using cifs_build_devname() we can end up with multiple
delimiters separating the UNC and the prefix path.

An issue was reported by the customer mounting a folder within a DFS
share from a Netapp server which uses McAfee antivirus. We have
narrowed down the cause to the use of double backslashes in the file
name used to open the file. This was determined to be caused because of
additional delimiters as a result of the bug.

In addition to changes in cifs_build_devname(), we also fix
cifs_parse_devname() to ignore any preceding delimiter for the prefix
path.

The problem was originally reported on RHEL 6 in RHEL bz 1252721. This
is the upstream version of the fix. The fix was confirmed by looking at
the packet capture of a DFS mount.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:33 -05:00
Goldwyn Rodrigues
1f1735cb75 cifs: Use file_dentry()
CIFS may be used as lower layer of overlayfs and accessing f_path.dentry can
lead to a crash.

Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17 14:09:33 -05:00
Linus Torvalds
7f427d3a60 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by a "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:
 "The xattr work starts with some acl fixes, then switches ->getxattr to
  passing inode and dentry separately.  This is the point where the
  things start to get tricky - that got merged into the very beginning
  of the -rc3-based #work.lookups, to allow untangling the
  security_d_instantiate() mess.  The xattr work itself proceeds to
  switch a lot of filesystems to generic_...xattr(); no complications
  there.

  After that initial xattr work, the series then does the following:

   - untangle security_d_instantiate()

   - convert a bunch of open-coded lookup_one_len_unlocked() to calls of
     that thing; one such place (in overlayfs) actually yields a trivial
     conflict with overlayfs fixes later in the cycle - overlayfs ended
     up switching to a variant of lookup_one_len_unlocked() sans the
     permission checks.  I would've dropped that commit (it gets
     overridden on merge from #ovl-fixes in #for-next; proper resolution
     is to use the variant in mainline fs/overlayfs/super.c), but I
     didn't want to rebase the damn thing - it was fairly late in the
     cycle...

   - some filesystems had managed to depend on lookup/lookup exclusion
     for *fs-internal* data structures in a way that would break if we
     relaxed the VFS exclusion.  Fixing hadn't been hard, fortunately.

   - core of that series - parallel lookup machinery, replacing
     ->i_mutex with rwsem, making lookup_slow() take it only shared.  At
     that point lookups happen in parallel; lookups on the same name
     wait for the in-progress one to be done with that dentry.

     Surprisingly little code, at that - almost all of it is in
     fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
     making it use the new primitive and actually switching to locking
     shared.

   - parallel readdir stuff - first of all, we provide the exclusion on
     per-struct file basis, same as we do for read() vs lseek() for
     regular files.  That takes care of most of the needed exclusion in
     readdir/readdir; however, these guys are trickier than lookups, so
     I went for switching them one-by-one.  To do that, a new method
     '->iterate_shared()' is added and filesystems are switched to it
     as they are either confirmed to be OK with shared lock on directory
     or fixed to be OK with that.  I hope to kill the original method
     come next cycle (almost all in-tree filesystems are switched
     already), but it's still not quite finished.

   - several filesystems get switched to parallel readdir.  The
     interesting part here is dealing with dcache preseeding by readdir;
     that needs minor adjustment to be safe with directory locked only
     shared.

     Most of the filesystems doing that got switched to in those
     commits.  Important exception: NFS.  Turns out that NFS folks, with
     their, er, insistence on VFS getting the fuck out of the way of the
     Smart Filesystem Code That Knows How And What To Lock(tm) have
     grown the locking of their own.  They had their own homegrown
     rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
     is the reader there).  Of course, with VFS getting the fuck out of
     the way, as requested, the actual smarts of the smart filesystem
     code etc. had become exposed...

   - do_last/lookup_open/atomic_open cleanups.  As the result, open()
     without O_CREAT locks the directory only shared.  Including the
     ->atomic_open() case.  Backmerge from #for-linus in the middle of
     that - atomic_open() fix got brought in.

   - then comes NFS switch to saner (VFS-based ;-) locking, killing the
     homegrown "lookup and readdir are writers" kinda-sorta rwsem.  All
     exclusion for sillyunlink/lookup is done by the parallel lookups
     mechanism.  Exclusion between sillyunlink and rmdir is a real rwsem
     now - rmdir being the writer.

     Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
     now.

   - the rest of the series consists of switching a lot of filesystems
     to parallel readdir; in a lot of cases ->llseek() gets simplified
     as well.  One backmerge in there (again, #for-linus - rockridge
     fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
  ext4: switch to ->iterate_shared()
  hfs: switch to ->iterate_shared()
  hfsplus: switch to ->iterate_shared()
  hostfs: switch to ->iterate_shared()
  hpfs: switch to ->iterate_shared()
  hpfs: handle allocation failures in hpfs_add_pos()
  gfs2: switch to ->iterate_shared()
  f2fs: switch to ->iterate_shared()
  afs: switch to ->iterate_shared()
  befs: switch to ->iterate_shared()
  befs: constify stuff a bit
  isofs: switch to ->iterate_shared()
  get_acorn_filename(): deobfuscate a bit
  btrfs: switch to ->iterate_shared()
  logfs: no need to lock directory in lseek
  switch ecryptfs to ->iterate_shared
  9p: switch to ->iterate_shared()
  fat: switch to ->iterate_shared()
  romfs, squashfs: switch to ->iterate_shared()
  more trivial ->iterate_shared conversions
  ...
2016-05-17 11:01:31 -07:00
Linus Torvalds
9a07a79684 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:

   - Crypto self tests can now be disabled at boot/run time.
   - Add async support to algif_aead.

  Algorithms:

   - A large number of fixes to MPI from Nicolai Stange.
   - Performance improvement for HMAC DRBG.

  Drivers:

   - Use generic crypto engine in omap-des.
   - Merge ppc4xx-rng and crypto4xx drivers.
   - Fix lockups in sun4i-ss driver by disabling IRQs.
   - Add DMA engine support to ccp.
   - Reenable talitos hash algorithms.
   - Add support for Hisilicon SoC RNG.
   - Add basic crypto driver for the MXC SCC.

  Others:

   - Do not allocate crypto hash tfm in NORECLAIM context in ecryptfs"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: qat - change the adf_ctl_stop_devices to void
  crypto: caam - fix caam_jr_alloc() ret code
  crypto: vmx - comply with ABIs that specify vrsave as reserved.
  crypto: testmgr - Add a flag allowing the self-tests to be disabled at runtime.
  crypto: ccp - constify ccp_actions structure
  crypto: marvell/cesa - Use dma_pool_zalloc
  crypto: qat - make adf_vf_isr.c dependant on IOV config
  crypto: qat - Fix typo in comments
  lib: asn1_decoder - add MODULE_LICENSE("GPL")
  crypto: omap-sham - Use dma_request_chan() for requesting DMA channel
  crypto: omap-des - Use dma_request_chan() for requesting DMA channel
  crypto: omap-aes - Use dma_request_chan() for requesting DMA channel
  crypto: omap-des - Integrate with the crypto engine framework
  crypto: s5p-sss - fix incorrect usage of scatterlists api
  crypto: s5p-sss - Fix missed interrupts when working with 8 kB blocks
  crypto: s5p-sss - Use common BIT macro
  crypto: mxc-scc - fix unwinding in mxc_scc_crypto_register()
  crypto: mxc-scc - signedness bugs in mxc_scc_ablkcipher_req_init()
  crypto: talitos - fix ahash algorithms registration
  crypto: ccp - Ensure all dependencies are specified
  ...
2016-05-17 09:33:39 -07:00