Commit Graph

753602 Commits

Author SHA1 Message Date
Parav Pandit
414448d249 RDMA: Use ib_gid_attr during GID modification
Now that ib_gid_attr contains device, port and index, simplify the
provider APIs add_gid() and del_gid() to use device, port and index
fields from the ib_gid_attr attributes structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:34:16 -06:00
Parav Pandit
3e44e0ee08 IB/providers: Avoid null netdev check for RoCE
Now that IB core GID cache ensures that all RoCE entries have an
associated netdev remove null checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
14169e333e IB/providers: Avoid zero GID check for RoCE
Now that the IB core GID cache ensures that a zero GID doesn't exist in
the GID table remove zero GID checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
598ff6bae6 IB/core: Refactor GID modify code for RoCE
Code is refactored to prepare separate functions for RoCE which can do more
complex operations related to reference counting, while still
maintainining code readability. This includes
(a) Simplification to not perform netdevice checks and modifications
for IB link layer.
(b) Do not add RoCE GID entry which has NULL netdevice; instead return
an error.
(c) If GID addition fails at provider level add_gid(), do not add the
entry in the cache and keep the entry marked as INVALID.
(d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they
can be used even for modifying default GIDs. This avoid some code
duplication in modifying default GIDs.
(e) find_gid() routine refers to the data entry flags to qualify a GID
as valid or invalid GID rather than depending on attributes and zeroness
of the GID content.
(f) gid_table_reserve_default() sets the GID default attribute at
beginning while setting up the GID table. There is no need to use
default_gid flag in low level functions such as write_gid(), add_gid(),
del_gid(), as they never need to update the DEFAULT property of the GID
entry while during GID table update.

As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer
searchable as described below.

A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB
spec version 1.3 section 4.1.1, point (6) whose snippet is below.

"The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as
the Reserved GID. It shall never be assigned to any endport. It shall
not be used as a destination address or in a global routing header
(GRH)."

GID table cache now only stores valid GID entries. Before this patch,
Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using
ib_find_cached_gid_by_port() and other similar find routines.

Zero GID is no longer searchable as it shall not to be present in GRH or
path recored entry as described in IB spec version 1.3 section 4.1.1,
point (6), section 12.7.10 and section 12.7.20.

ib_cache_update() is simplified to check link layer once, use unified
locking scheme for all link layers, removed temporary gid table
allocation/free logic.

Additionally,
(a) Expand ib_gid_attr to store port and index so that GID query
routines can get port and index information from the attribute structure.
(b) Expand ib_gid_attr to store device as well so that in future code when
GID reference counting is done, device is used to reach back to the GID
table entry.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:50 -06:00
Parav Pandit
f35faa4ba9 IB/core: Simplify ib_query_gid to always refer to cache
Currently following inconsistencies exist.
1. ib_query_gid() returns GID from the software cache for a RoCE port
and returns GID from the HCA for an IB port.
This is incorrect because software GID cache is maintained regardless
of HCA port type.

2. GID is queries from the HCA via ib_query_gid and updated in the
software cache for IB link layer. Both of them might not be in sync.

ULPs such as SRP initiator, SRP target, IPoIB driver have historically
used ib_query_gid() API to query the GID. However CM used cached version
during CM processing, When software cache was introduced, this
inconsitency remained.

In order to simplify, improve readability and avoid link layer
specific above inconsistencies, this patch brings following changes.

1. ib_query_gid() always refers to the cache layer regardless of link
layer.

2. cache module who reads the GID entry from HCA and builds the cache,
directly invokes the HCA provider verb's query_gid() callback function.

3. ib_query_port() is being called in early stage where GID cache is not
yet build while reading port immutable property. Therefore it needs to
read the default GID from the HCA for IB link layer to publish the
subnet prefix.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:50 -06:00
Parav Pandit
0e1f9b9244 RDMA/providers: Simplify query_gid callback of RoCE providers
ib_query_gid() fetches the GID from the software cache maintained in
ib_core for RoCE ports.

Therefore, simplify the provider drivers for RoCE to treat query_gid()
callback as never called for RoCE, and only require non-RoCE devices to
implement it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:47 -06:00
Parav Pandit
72e1ff0fb7 RDMA/core: Update query_gid documentation for HCA drivers
query_gid() should return right GID value for iWarp and IB link layers.
It is a no-op for RoCE link layer.  Update the documentation to reflect
this.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:09:01 -06:00
Roland Dreier
8435168d50 RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
Check to make sure that ctx->cm_id->device is set before we use it.
Otherwise userspace can trigger a NULL dereference by doing
RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device.

Cc: <stable@vger.kernel.org>
Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:06:14 -06:00
Linus Torvalds
17dec0a949 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There was a lot of work this cycle fixing bugs that were discovered
  after the merge window and getting everything ready where we can
  reasonably support fully unprivileged fuse. The bug fixes you already
  have and much of the unprivileged fuse work is coming in via other
  trees.

  Still left for fully unprivileged fuse is figuring out how to cleanly
  handle .set_acl and .get_acl in the legacy case, and properly handling
  of evm xattrs on unprivileged mounts.

  Included in the tree is a cleanup from Alexely that replaced a linked
  list with a statically allocated fix sized array for the pid caches,
  which simplifies and speeds things up.

  Then there is are some cleanups and fixes for the ipc namespace. The
  motivation was that in reviewing other code it was discovered that
  access ipc objects from different pid namespaces recorded pids in such
  a way that when asked the wrong pids were returned. In the worst case
  there has been a measured 30% performance impact for sysvipc
  semaphores. Other test cases showed no measurable performance impact.
  Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
  performance both gave the nod that this is good enough.

  Casey Schaufler and James Morris have given their approval to the LSM
  side of the changes.

  I simplified the types and the code dealing with sysvipc to pass just
  kern_ipc_perm for all three types of ipc. Which reduced the header
  dependencies throughout the kernel and simplified the lsm code.

  Which let me work on the pid fixes without having to worry about
  trivial changes causing complete kernel recompiles"

* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ipc/shm: Fix pid freeing.
  ipc/shm: fix up for struct file no longer being available in shm.h
  ipc/smack: Tidy up from the change in type of the ipc security hooks
  ipc: Directly call the security hook in ipc_ops.associate
  ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
  ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
  ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
  ipc/util: Helpers for making the sysvipc operations pid namespace aware
  ipc: Move IPCMNI from include/ipc.h into ipc/util.h
  msg: Move struct msg_queue into ipc/msg.c
  shm: Move struct shmid_kernel into ipc/shm.c
  sem: Move struct sem and struct sem_array into ipc/sem.c
  msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
  shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
  sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
  pidns: simpler allocation of pid_* caches
2018-04-03 19:15:32 -07:00
Martin Brandenburg
dd0980226f orangefs: document package install and xfstests procedure
Unless one is working on the userspace code, there's no need to compile
OrangeFS.  The package works just fine.

(But leave the documentation for building from source since not everyone
uses distributions which include the package.)

Also document the process to run xfstests against OrangeFS.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:28 -04:00
Martin Brandenburg
209469d978 orangefs: remove unused code
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:28 -04:00
Martin Brandenburg
bdd6f08358 orangefs: make several *_operations structs static
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:27 -04:00
Martin Brandenburg
a5135eeab2 orangefs: implement vm_ops->fault
Must retrieve size before running filemap_fault so the kernel has
an up-to-date size.

This should have been caught by xfstests generic/246, but it was masked
by orangefs_new_inode, which set i_size to PAGE_SIZE.  When nothing
caused a getattr prior to a pagefault, i_size was still PAGE_SIZE.
Since xfstests only read 10 bytes, it did not catch this bug.

When orangefs_new_inode was modified to perform a getattr instead,
i_size was set to zero, as it was a newly created file.  Then
orangefs_file_write_iter did NOT set i_size.  Instead it invalidated the
attribute cache, which should have caused the next caller to retrieve
i_size.  But the fault handler did not know it was supposed to retrieve
i_size.  So during xfstests, i_size was still zero, and filemap_fault
returned VM_FAULT_SIGBUS.

Fixes xfstests generic/452.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:27 -04:00
Jaegeuk Kim
214c2461a8 f2fs: remain written times to update inode during fsync
This fixes xfstests/generic/392.

The failure was caused by different times between 1) one marked in the last
fsync(2) call and 2) the other given by roll-forward recovery after power-cut.
The reason was that we skipped updating inode block at 1), since its i_size
was recoverable along with 4KB-aligned data writes, which was fixed by:
  "f2fs: fix a wrong condition in f2fs_skip_inode_update"

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-03 18:52:47 -07:00
Nicholas Piggin
b9ee31e100 powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
When stop is executed with EC=ESL=0, it appears to execute like a
normal instruction (resuming from NIP when woken by interrupt). So all
the save/restore handling can be avoided completely. In particular NV
GPRs do not have to be saved, and MSR does not have to be switched
back to kernel MSR.

So move the test for EC=ESL=0 sleep states out to power9_idle_stop,
and return directly to the caller after stop in that case.

This improves performance for ping-pong benchmark with the stop0_lite
idle state by 2.54% for 2 threads in the same core, and 2.57% for
different cores. Performance increase with HV_POSSIBLE defined will be
improved further by avoiding the hwsync.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04 11:11:43 +10:00
Linus Torvalds
d92cd810e6 Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "rcu_work addition and a couple trivial changes"

* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove the comment about the old manager_arb mutex
  workqueue: fix the comments of nr_idle
  fs/aio: Use rcu_work instead of explicit rcu and work item
  cgroup: Use rcu_work instead of explicit rcu and work item
  RCU, workqueue: Implement rcu_work
2018-04-03 18:00:13 -07:00
Linus Torvalds
a23867f1d2 Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
 "Nothing too interesting.

  The biggest change is refcnting fix for ata_host - the bug is recent
  and can only be triggered on controller hotplug, so very few are
  hitting it.

  There also are a number of trivial license / error message changes and
  some hardware specific changes"

* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (23 commits)
  ahci: imx: add the imx8qm ahci sata support
  libata: ensure host is free'd on error exit paths
  ata: ahci-platform: add reset control support
  ahci: imx: fix the build warning
  ata: add Amiga Gayle PATA controller driver
  ahci: imx: add the imx6qp ahci sata support
  ata: change Tegra124 to Tegra
  ata: ahci_tegra: Add AHCI support for Tegra210
  ata: ahci_tegra: disable DIPM
  ata: ahci_tegra: disable devslp for Tegra124
  ata: ahci_tegra: initialize regulators from soc struct
  ata: ahci_tegra: Update initialization sequence
  dt-bindings: Tegra210: add binding documentation
  libata: add refcounting to ata_host
  pata_bk3710: clarify license version and use SPDX header
  pata_falcon: clarify license version and use SPDX header
  pata_it821x: Delete an error message for a failed memory allocation in it821x_firmware_command()
  pata_macio: Delete an error message for a failed memory allocation in two functions
  pata_mpc52xx: Delete an error message for a failed memory allocation in mpc52xx_ata_probe()
  sata_dwc_460ex: Delete an error message for a failed memory allocation in sata_dwc_port_start()
  ...
2018-04-03 17:42:25 -07:00
James Bottomley
2e1f44f6ad Merge branch 'fixes' into misc
Somewhat nasty merge due to conflicts between "33b28357dd00 scsi:
qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan" and "2b5b96473efc
scsi: qla2xxx: Fix FC-NVMe LUN discovery"

Merge is non-trivial and has been verified by Qlogic (Cavium)

Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
2018-04-03 17:38:39 -07:00
Linus Torvalds
ef1c4a6fa9 Merge tag 'media/v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:

 - new CEC pin injection code for testing purposes

 - DVB frontend cxd2099 promoted from staging

 - new platform driver for Sony cxd2880 DVB devices

 - new sensor drivers: mt9t112, ov2685, ov5695, ov772x, tda1997x,
   tw9910.c

 - removal of unused cx18 and ivtv alsa mixers

 - the reneseas-ceu driver doesn't depend on soc_camera anymore and
   moved from staging

 - removed the mantis_vp3028 driver, unused since 2009

 - s5p-mfc: add support for version 10 of the MSP

 - added a decoder for imon protocol

 - atomisp: lots of cleanups

 - imx074 and mt9t031: don't depend on soc_camera anymore, being
   promoted from staging

 - added helper functions to better support DVB I2C binding

 - lots of driver improvements and cleanups

* tag 'media/v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (438 commits)
  media: v4l2-ioctl: rename a temp var that stores _IOC_SIZE(cmd)
  media: fimc-capture: get rid of two warnings
  media: dvb-usb-v2: fix a missing dependency of I2C_MUX
  media: uvc: to the right check at uvc_ioctl_enum_framesizes()
  media: cec-core: fix a bug at cec_error_inj_write()
  media: tda9840: cleanup a warning
  media: tm6000:  avoid casting just to print pointer address
  media: em28xx-input: improve error handling code
  media: zr364xx: avoid casting just to print pointer address
  media: vivid-radio-rx: add a cast to avoid a warning
  media: saa7134-alsa: don't use casts to print a buffer address
  media: solo6x10: get rid of an address space warning
  media: zoran: don't cast pointers to print them
  media: ir-kbd-i2c: change the if logic to avoid a warning
  media: ir-kbd-i2c: improve error handling code
  media: saa7134-input: improve error handling
  media: s2255drv: fix a casting warning
  media: ivtvfb: Cleanup some warnings
  media: videobuf-dma-sg: Fix a weird cast
  soc_camera: fix a weird cast on printk
  ...
2018-04-03 17:16:59 -07:00
Linus Torvalds
147a89bc71 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig updates from Masahiro Yamada:

 - improve checkpatch for more precise Kconfig code checking

 - clarify effective selects by grouping reverse dependencies in help

 - do not write out '# CONFIG_FOO is not set' from invisible symbols

 - make oldconfig as silent as it should be

 - rename 'silentoldconfig' to 'syncconfig'

 - add unit-test framework and several test cases

 - warn unmet dependency of tristate symbols

 - make unmet dependency warnings readable, removing false positives

 - improve recursive include detection

 - use yylineno to simplify the line number tracking

 - misc cleanups

* tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
  kconfig: use yylineno option instead of manual lineno increments
  kconfig: detect recursive inclusion earlier
  kconfig: remove duplicated file name and lineno of recursive inclusion
  kconfig: do not include both curses.h and ncurses.h for nconfig
  kconfig: make unmet dependency warnings readable
  kconfig: warn unmet direct dependency of tristate symbols selected by y
  kconfig: tests: test if recursive inclusion is detected
  kconfig: tests: test if recursive dependencies are detected
  kconfig: tests: test randconfig for choice in choice
  kconfig: tests: test defconfig when two choices interact
  kconfig: tests: check visibility of tristate choice values in y choice
  kconfig: tests: check unneeded "is not set" with unmet dependency
  kconfig: tests: test if new symbols in choice are asked
  kconfig: tests: test automatic submenu creation
  kconfig: tests: add basic choice tests
  kconfig: tests: add framework for Kconfig unit testing
  kbuild: add PYTHON2 and PYTHON3 variables
  kconfig: remove redundant streamline_config.pl prerequisite
  kconfig: rename silentoldconfig to syncconfig
  kconfig: invoke oldconfig instead of silentoldconfig from local*config
  ...
2018-04-03 16:28:01 -07:00
Michael Ellerman
d0b791c029 powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
Commit 3d4fbffdd7 ("powerpc/64s/idle: POWER9 implement a separate
idle stop function for hotplug") that added power9_offline_stop() was
written before commit 7672691a08 ("powerpc/powernv: Provide a way to
force a core into SMT4 mode").

When merging the former I failed to notice that it caused us to skip
the force-SMT4 logic for offline CPUs. The result is that offlined
CPUs will not correctly participate in the force-SMT4 logic, which
presumably will result in badness (not tested).

Reconcile the two commits by making power9_offline_stop() a pre-cursor
to power9_idle_stop(), so that they share the force-SMT4 logic.

This is based on an original commit from Nick, all breakage is my own.

Fixes: 3d4fbffdd7 ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2018-04-04 09:09:35 +10:00
Linus Torvalds
3b24b83763 Merge tag 'kbuild-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - add a shell script to get Clang version

 - improve portability of build scripts

 - drop always-enabled CONFIG_THIN_ARCHIVE and remove unused code

 - rename built-in.o which is now thin archive to built-in.a

 - process clean/build targets one by one to get along with -j option

 - simplify ld-option

 - improve building with CONFIG_TRIM_UNUSED_KSYMS

 - define KBUILD_MODNAME even for objects shared among multiple modules

 - avoid linking multiple instances of same objects from composite
   objects

 - move <linux/compiler_types.h> to c_flags to include it only for C
   files

 - clean-up various Makefiles

* tag 'kbuild-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
  kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h>
  kbuild: clean up link rule of composite modules
  kbuild: clean up archive rule of built-in.a
  kbuild: remove partial section mismatch detection for built-in.a
  net: liquidio: clean up Makefile for simpler composite object handling
  lib: zstd: clean up Makefile for simpler composite object handling
  kbuild: link $(real-obj-y) instead of $(obj-y) into built-in.a
  kbuild: rename real-objs-y/m to real-obj-y/m
  kbuild: move modname and modname-multi close to modname_flags
  kbuild: simplify modname calculation
  kbuild: fix modname for composite modules
  kbuild: define KBUILD_MODNAME even if multiple modules share objects
  kbuild: remove unnecessary $(subst $(obj)/, , ...) in modname-multi
  kbuild: Use ls(1) instead of stat(1) to obtain file size
  kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS
  kbuild: move include/config/ksym/* to include/ksym/*
  kbuild: move CONFIG_TRIM_UNUSED_KSYMS code unneeded for external module
  kbuild: restore autoksyms.h touch to the top Makefile
  kbuild: move 'scripts' target below
  kbuild: remove wrong 'touch' in adjust_autoksyms.sh
  ...
2018-04-03 15:51:22 -07:00
Linus Torvalds
0734e00ef9 Merge branch 'parisc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
 "Lots of small enhancements and fixes in this patchset:

   - improved the x86-64 compatibility for PCI cards by returning -1UL
     for timed out MMIO transactions (instead of crashing)

   - fixed HPMC handler for PAT machines: size needs to be multiple of 16

   - prepare machine_power_off() to be able to turn rp3410 and c8000
     machines off via IMPI

   - added code to extract machine info for usage with qemu

   - some init sections fixes

   - lots of fixes for sparse-, ubsan- and uninitalized variables
     warnings"

* 'parisc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix out of array access in match_pci_device()
  parisc: Add code generator for Qemu/SeaBIOS machine info
  parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
  parisc: Fix HPMC handler by increasing size to multiple of 16 bytes
  parisc: Directly call machine_power_off() in power button driver
  parisc: machine_power_off() should call pm_power_off()
  parisc/Kconfig: SMP kernels boot on all machines
  parisc: Silence uninitialized variable warning in dbl_to_sgl_fcnvff()
  parisc: Move various functions and strings to init section
  parisc: Convert MAP_TYPE to cover 4 bits on parisc
  parisc: Force to various endian types for sparse
  parisc/gscps2: Fix sparse warnings
  parisc/led: Fix sparse warnings
  parisc/parport_gsc: Use NULL to avoid sparse warning
  parisc/stifb: Use fb_memset() to avoid sparse warning
2018-04-03 15:48:04 -07:00
Linus Torvalds
4608f06453 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:

 1) Add support for ADI (Application Data Integrity) found in more
    recent sparc64 cpus. Essentially this is keyed based access to
    virtual memory, and if the key encoded in the virual address is
    wrong you get a trap.

    The mm changes were reviewed by Andrew Morton and others.

    Work by Khalid Aziz.

 2) Validate DAX completion index range properly, from Rob Gardner.

 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  sparc64: Make atomic_xchg() an inline function rather than a macro.
  sparc64: Properly range check DAX completion index
  sparc: Make auxiliary vectors for ADI available on 32-bit as well
  sparc64: Oracle DAX driver depends on SPARC64
  sparc64: Update signal delivery to use new helper functions
  sparc64: Add support for ADI (Application Data Integrity)
  mm: Allow arch code to override copy_highpage()
  mm: Clear arch specific VM flags on protection change
  mm: Add address parameter to arch_validate_prot()
  sparc64: Add auxiliary vectors to report platform ADI properties
  sparc64: Add handler for "Memory Corruption Detected" trap
  sparc64: Add HV fault type handlers for ADI related faults
  sparc64: Add support for ADI register fields, ASIs and traps
  mm, swap: Add infrastructure for saving page metadata on swap
  signals, sparc: Add signal codes for ADI violations
2018-04-03 14:08:58 -07:00
Linus Torvalds
5bb053bef8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Support offloading wireless authentication to userspace via
    NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.

 2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
    Setup and cleanup of namespaces now all run asynchronously and thus
    performance is significantly increased.

 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
    Streiff.

 4) Support zerocopy on RDS sockets, from Sowmini Varadhan.

 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
    Borkmann.

 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
    Chevallier.

 7) Support grafting of child qdiscs in mlxsw driver, from Nogah
    Frankel.

 8) Add packet forwarding tests to selftests, from Ido Schimmel.

 9) Deal with sub-optimal GSO packets better in BBR congestion control,
    from Eric Dumazet.

10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.

11) Add path MTU tests to selftests, from Stefano Brivio.

12) Various bits of IPSEC offloading support for mlx5, from Aviad
    Yehezkel, Yossi Kuperman, and Saeed Mahameed.

13) Support RSS spreading on ntuple filters in SFC driver, from Edward
    Cree.

14) Lots of sockmap work from John Fastabend. Applications can use eBPF
    to filter sendmsg and sendpage operations.

15) In-kernel receive TLS support, from Dave Watson.

16) Add XDP support to ixgbevf, this is significant because it should
    allow optimized XDP usage in various cloud environments. From Tony
    Nguyen.

17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
    Venkataramanan et al.

18) IP fragmentation match offload support in nfp driver, from Pieter
    Jansen van Vuuren.

19) Support XDP redirect in i40e driver, from Björn Töpel.

20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
    tracepoints in their raw form, from Alexei Starovoitov.

21) Lots of striding RQ improvements to mlx5 driver with many
    performance improvements, from Tariq Toukan.

22) Use rhashtable for inet frag reassembly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
  net: mvneta: improve suspend/resume
  net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
  ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
  net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
  net: bgmac: Correctly annotate register space
  route: check sysctl_fib_multipath_use_neigh earlier than hash
  fix typo in command value in drivers/net/phy/mdio-bitbang.
  sky2: Increase D3 delay to sky2 stops working after suspend
  net/mlx5e: Set EQE based as default TX interrupt moderation mode
  ibmvnic: Disable irqs before exiting reset from closed state
  net: sched: do not emit messages while holding spinlock
  vlan: also check phy_driver ts_info for vlan's real device
  Bluetooth: Mark expected switch fall-throughs
  Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
  Bluetooth: btrsi: remove unused including <linux/version.h>
  Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
  sh_eth: kill useless check in __sh_eth_get_regs()
  sh_eth: add sh_eth_cpu_data::no_xdfar flag
  ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
  ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
  ...
2018-04-03 14:04:18 -07:00
Linus Torvalds
bb2407a721 Merge tag 'docs-4.17' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "There's been a fair amount of activity in Documentation/ this time
  around:

   - Lots of work aligning Documentation/ABI with reality, done by
     Aishwarya Pant.

   - The trace documentation has been converted to RST by Changbin Du

   - I thrashed up kernel-doc to deal with a parsing issue and to try to
     make the code more readable. It's still a 20+-year-old Perl hack,
     though.

   - Lots of other updates, typo fixes, and more"

* tag 'docs-4.17' of git://git.lwn.net/linux: (82 commits)
  Documentation/process: update FUSE project website
  docs: kernel-doc: fix parsing of arrays
  dmaengine: Fix spelling for parenthesis in dmatest documentation
  dmaengine: Make dmatest.rst indeed reST compatible
  dmaengine: Add note to dmatest documentation about supported channels
  Documentation: magic-numbers: Fix typo
  Documentation: admin-guide: add kvmconfig, xenconfig and tinyconfig commands
  Input: alps - Update documentation for trackstick v3 format
  Documentation: Mention why %p prints ptrval
  COPYING: use the new text with points to the license files
  COPYING: create a new file with points to the Kernel license files
  Input: trackpoint: document sysfs interface
  xfs: Change URL for the project in xfs.txt
  char/bsr: add sysfs interface documentation
  acpi: nfit: document sysfs interface
  block: rbd: update sysfs interface
  Documentation/sparse: fix typo
  Documentation/CodingStyle: Add an example for braces
  docs/vm: update 00-INDEX
  kernel-doc: Remove __sched markings
  ...
2018-04-03 13:35:51 -07:00
J. Bruce Fields
880a3a5325 nfsd: fix incorrect umasks
We're neglecting to clear the umask after it's set, which can cause a
later unrelated rpc to (incorrectly) use the same umask if it happens to
be processed by the same thread.

There's a more subtle problem here too:

An NFSv4 compound request is decoded all in one pass before any
operations are executed.

Currently we're setting current->fs->umask at the time we decode the
compound.  In theory a single compound could contain multiple creates
each setting a umask.  In that case we'd end up using whichever umask
was passed in the *last* operation as the umask for all the creates,
whether that was correct or not.

So, we should just be saving the umask at decode time and waiting to set
it until we actually process the corresponding operation.

In practice it's unlikely any client would do multiple creates in a
single compound.  And even if it did they'd likely be from the same
process (hence carry the same umask).  So this is a little academic, but
we should get it right anyway.

Fixes: 47057abde5 (nfsd: add support for the umask attribute)
Cc: stable@vger.kernel.org
Reported-by: Lucash Stach <l.stach@pengutronix.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 16:27:08 -04:00
Parav Pandit
ca486a3b33 IB/qedr: Remove GID add/del dummy routines
qedr driver's add_gid() and del_gid() callbacks are doing simple
checks which are already done by the ib core before invoking these
callback routines.

Therefore, code is simplified to skip implementing add_gid() and
del_gid() callback functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:46:29 -06:00
Shiraz Saleem
3e64f8d6f5 i40iw: Remove pre-production workaround for resource profile 1
Support for resource profile 1 is currenlty deprecated due to
a pre-production errata. Remove this workaround as its no longer
needed.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:40:39 -06:00
Jason Gunthorpe
41d902cb7c RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp
This structure is pushed down the ex and the non-ex path, so it needs to be
aligned to 8 bytes to go through ex without implicit padding.

Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so
take the approach of just copying the minimum length.

New user space will consistently provide 8 bytes in both cases.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:38:40 -06:00
Linus Torvalds
e40dc66220 Merge tag 'leds_for_4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
 "New LED class driver:
   - add driver for Mellanox regmap LEDs

  Improvement to ledtrig-disk:
   - extend disk trigger for reads and writes

  Improvements and fixes to existing LED class drivers:
   - add more product/board names for PC Engines APU2
   - fix wrong dmi_match on PC Engines APU LEDs
   - clarify chips supported by LM355x driver
   - fix Kconfig text for MLXCPLD, SYSCON, MC13783, NETXBIG
   - allow leds-mlxcpld compilation for 32 bit arch"

* tag 'leds_for_4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: Fix wrong dmi_match on PC Engines APU LEDs
  leds: Extends disk trigger for reads and writes
  leds: Add more product/board names for PC Engines APU2
  leds: add driver for support Mellanox regmap LEDs for BMC and x86 platform
  leds: fix Kconfig text for MLXCPLD, SYSCON, MC13783, NETXBIG
  leds: Clarify supported chips by LM355x driver
  leds: leds-mlxcpld: Allow compilation for 32 bit arch
2018-04-03 12:38:19 -07:00
Linus Torvalds
cc5ada7ca3 Merge tag 'for-linus-4.17' of git://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
 "Mostly small changes, as usual.

  This does add an IPMI BMC server-side driver, to allow a Linux system
  to act as an IPMI controller. That's the biggest change, but it is
  just a new driver that is fairly narrow in use.

  The other largish change is removing ACPI SPMI probe support, which
  should have never really been there in the beginning"

* tag 'for-linus-4.17' of git://github.com/cminyard/linux-ipmi:
  ipmi/parisc: Add IPMI chassis poweroff for certain HP PA-RISC and IA-64 servers
  ipmi_ssif: Fix kernel panic at msg_done_handler
  ipmi:pci: Blacklist a Realtek "IPMI" device
  ipmi: Remove ACPI SPMI probing from the system interface driver
  ipmi: Remove ACPI SPMI probing from the SSIF (I2C) driver
  ipmi: missing error code in try_smi_init()
  ipmi: use ARRAY_SIZE for poweroff_functions array sizing calculation
  ipmi: Consolidate cleanup code
  ipmi: Remove some unnecessary initializations
  ipmi: Fix some error cleanup issues
  ipmi: Add or fix SPDX-License-Identifier in all files
  ipmi: Re-use existing macros for built-in properties
  ipmi:pci: Make the PCI defines consistent with normal Linux ones
  ipmi: kcs_bmc: coding-style fixes and use new poll type
  char/ipmi: add documentation for sysfs interface
  ipmi: kcs_bmc: mark expected switch fall-through in kcs_bmc_handle_data
  ipmi: add an Aspeed KCS IPMI BMC driver
  ipmi: add a KCS IPMI BMC driver
2018-04-03 12:25:44 -07:00
Linus Torvalds
77624cd2a7 Merge tag 'pinctrl-v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control bulk updates from Linus Walleij:
 "New drivers:

   - Qualcomm SDM845: this is their new flagship SoC platform which
     seems to be targeted at premium mobile handsets.

   - Renesas R-Car M3-N SoC.

   - Renesas R8A77980 SoC.

   - NXP (ex Freescale) i.MX 6SLL SoC.

   - Mediatek MT2712 SoC.

   - Allwinner H6 SoC.

  Improvements:

   - Uniphier adds a few new functions and pins.

   - Renesas refactorings and additional pin definitions.

   - Improved pin groups for Axis Artpec6.

  Cleanup:

   - Drop the TZ1090 drivers. This platform is no longer maintained and
     is being deleted.

   - Drop ST-Ericsson U8540/U9540 support as this was never
     productified.

   - Overall minor fixes and janitorial"

* tag 'pinctrl-v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (82 commits)
  pinctrl: uniphier: add UART hardware flow control pin-mux settings
  pinctrl: sunxi: add support for the Allwinner H6 main pin controller
  pinctrl: sunxi: change irq_bank_base to irq_bank_map
  pinctrl: sunxi: introduce IRQ bank conversion function
  pinctrl: sunxi: refactor irq related register function to have desc
  pinctrl: msm8998: Remove owner assignment from platform_driver
  pinctrl: uniphier: divide I2S and S/PDIF audio out pin-mux group
  pinctrl: uniphier: add PXs2 Audio in/out pin-mux settings
  pinctrl/amd: poll InterruptEnable bits in enable_irq
  pinctrl: ocelot: fix gpio direction
  pinctrl: mtk: fix check warnings.
  pintcrl: mtk: support bias-disable of generic and special pins simultaneously
  pinctrl: add mt2712 pinctrl driver
  pinctrl: pinctrl-single: Fix pcs_request_gpio() when bits_per_mux != 0
  pinctrl: imx: Add pinctrl driver support for imx6sll
  dt-bindings: imx: update pinctrl doc for imx6sll
  pinctrl: intel: Implement intel_gpio_get_direction callback
  pinctrl: stm32: add 'depends on HAS_IOMEM' to fix unmet dependency
  pinctrl: mediatek: mtk-common: use true and false for boolean values
  pinctrl: sunxi: always look for apb block
  ...
2018-04-03 12:20:54 -07:00
Linus Torvalds
dc73d6a8d4 Merge tag 'mmc-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Export host capabilities through debugfs
   - Export card RCA register via sysfs
   - Improve card initializing sequence while enabling 4-bit bus
   - Export a function to enable/disable wakeup for card detect IRQ

  MMC host:
   - dw_mmc: Add support for new hi3798cv200 variant
   - dw_mmc: Remove support for some deprecated DT properties
   - mediatek: Add support for new variant used on MT7622 SoC
   - sdhci: Improve wakeup support for SDIO IRQs
   - sdhci: Improve wakeup support for card detect IRQs
   - sdhci-omap: Add tuning support
   - sdhci_omap: Add UHS-I mode support
   - sunxi: Prepare for runtime PM support via a few re-factorings
   - tmio: deprecate "toshiba,mmc-wrprotect-disable" DT property
   - tmio/renesas_sdhi: Consolidate code supporting write protect
   - tmio: Improve DMA vs PIO handling
   - tmio: Add support for IP-builtin card detection logic"

* tag 'mmc-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (55 commits)
  mmc: renesas_sdhi: replace EXT_ACC with HOST_MODE
  mmc: update sdio_claim_irq documentation
  mmc: Export host capabilities to debugfs.
  mmc: core: Disable HPI for certain Micron (Numonyx) eMMC cards
  mmc: block: fix updating ext_csd caches on ioctl call
  mmc: sunxi: Set our device drvdata earlier
  mmc: sunxi: Move the reset deassertion before enabling the clocks
  mmc: sunxi: Move resources management to separate functions
  mmc: dw_mmc: add support for hi3798cv200 specific extensions of dw-mshc
  dt-bindings: mmc: add bindings for hi3798cv200-dw-mshc
  mmc: core: Export card RCA register via sysfs
  mmc: renesas_sdhi: fix WP detection
  mmc: core: Use memdup_user() rather than duplicating its implementation
  mmc: dw_mmc-rockchip: correct property names in debug
  mmc: sd: Remove redundant err assignment from mmc_read_switch
  mmc: sdio: Check the return value of sdio_enable_4bit_bus
  mmc: core: Don't try UHS-I mode if 4-bit mode isn't supported
  arm64: dts: hi3660: remove 'num-slots' property for dwmmc
  ARM: dts: lpc18xx: remove 'num-slots' property for dwmmc
  arm64: dts: stratix10: remove 'num-slots' property for dwmmc
  ...
2018-04-03 12:17:25 -07:00
Changbin Du
51125a29a3 perf trace: Remove redundant ')'
There is a redundant ')' at the tail of each event. So remove it.

$ sudo perf trace --no-syscalls -e 'kmem:*' -a
   899.342 kmem:kfree:(vfs_writev+0xb9) call_site=ffffffff9c453979 ptr=(nil))
   899.344 kmem:kfree:(___sys_recvmsg+0x188) call_site=ffffffff9c9b8b88 ptr=(nil))

Signed-off-by: Changbin Du <changbin.du@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1520937601-24952-1-git-send-email-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 16:16:41 -03:00
Linus Torvalds
dabe51840e Merge tag 'hsi-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Pull HSI updates from Sebastian Reichel:

 - spelling/typo fixes

 - remove extra error printing for -ENOMEM

* tag 'hsi-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
  HSI: hsi_char: Delete an error message for a failed memory allocation in hsc_probe()
  HSI: ssi_protocol: fix spelling mistake: "trigerred" -> "triggered"
  HSI: ssi_protocol: Delete an error message for a failed memory allocation in ssi_protocol_probe()
  HSI: ssi_protocol: Fix a typo in two comment lines
2018-04-03 12:14:54 -07:00
Linus Torvalds
3ac684b881 Merge tag 'for-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:

 - Microsemi Ocelot reset support

 - Spreadtrum SC27xx reset support

 - generic gpio charger: lot's of cleanups

 - axp20x fuel gauge: add AXP813 support

 - misc fixes, including one devicetree change for the Nokia N900, that
   has been Acked-by Tony Lindgren

* tag 'for-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (27 commits)
  power: reset: at91-reset: Switch from the pr_*() to the dev_*() logging functions
  power: reset: at91-poweroff: Remove redundant dev_err call in at91_poweroff_probe()
  power: reset: at91-poweroff: Switch from the pr_*() to the dev_*() logging functions
  power: reset: make function sc27xx_poweroff_shutdown static
  power: supply: da9150-fg: remove VLA usage
  ARM: dts: omap3-n900: Add link between battery and charger
  power: supply: bq2415x: add DT referencing support
  power: supply: bq27xxx: support missing supplier device
  max17042: propagate of_node to power supply device
  power: supply: axp288_fuel_gauge: Fix full status reporting
  power: supply: axp288_fuel_gauge: Do not register FG on ECS EF20EA
  power: reset: gpio-poweroff: Support for timeout from device property
  dt-bindings: power: reset: gpio-poweroff: Add 'timeout-ms' property
  power: reset: Add Spreadtrum SC27xx PMIC power off support
  power: supply: axp20x_battery: add support for AXP813
  dt-bindings: power: supply: axp20x: add AXP813 battery DT binding
  power: supply: axp20x_battery: use data struct for variant specific code
  power: supply: gpio-charger: Remove pdata from gpio_charger
  power: supply: gpio-charger: Use GPIOF_ACTIVE_LOW for legacy setup
  power: supply: gpio-charger: Remove redundant dev_err call in probe function
  ...
2018-04-03 12:10:01 -07:00
Eric Biggers
f3aefb6a70 sunrpc: remove incorrect HMAC request initialization
make_checksum_hmac_md5() is allocating an HMAC transform and doing
crypto API calls in the following order:

    crypto_ahash_init()
    crypto_ahash_setkey()
    crypto_ahash_digest()

This is wrong because it makes no sense to init() the request before a
key has been set, given that the initial state depends on the key.  And
digest() is short for init() + update() + final(), so in this case
there's no need to explicitly call init() at all.

Before commit 9fa68f6200 ("crypto: hash - prevent using keyed hashes
without setting key") the extra init() had no real effect, at least for
the software HMAC implementation.  (There are also hardware drivers that
implement HMAC-MD5, and it's not immediately obvious how gracefully they
handle init() before setkey().)  But now the crypto API detects this
incorrect initialization and returns -ENOKEY.  This is breaking NFS
mounts in some cases.

Fix it by removing the incorrect call to crypto_ahash_init().

Reported-by: Michael Young <m.a.young@durham.ac.uk>
Fixes: 9fa68f6200 ("crypto: hash - prevent using keyed hashes without setting key")
Fixes: fffdaef2eb ("gss_krb5: Add support for rc4-hmac encryption")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:17 -04:00
Chuck Lever
38a7031559 NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
Move common code in NFSD's legacy SYMLINK decoders into a helper.
The immediate benefits include:

 - one fewer data copies on transports that support DDP
 - consistent error checking across all versions
 - reduction of code duplication
 - support for both legal forms of SYMLINK requests on RDMA
   transports for all versions of NFS (in particular, NFSv2, for
   completeness)

In the long term, this helper is an appropriate spot to perform a
per-transport call-out to fill the pathname argument using, say,
RDMA Reads.

Filling the pathname in the proc function also means that eventually
the incoming filehandle can be interpreted so that filesystem-
specific memory can be allocated as a sink for the pathname
argument, rather than using anonymous pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:16 -04:00
Chuck Lever
8154ef2776 NFSD: Clean up legacy NFS WRITE argument XDR decoders
Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).

In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).

The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.

Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.

I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.

Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:16 -04:00
Chuck Lever
fff4080b2f nfsd: Trace NFSv4 COMPOUND execution
This helps record the identity and timing of the ops in each NFSv4
COMPOUND, replacing dprintk calls that did much the same thing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever
87c5942e8f nfsd: Add I/O trace points in the NFSv4 read proc
NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.

Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.

Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever
d890be159a nfsd: Add I/O trace points in the NFSv4 write path
NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.

Move the trace points into the shared nfsd_vfs_write() helper.

After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever
f394b62b7b nfsd: Add "nfsd_" to trace point names
Follow naming convention used in client and in sunrpc layers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:14 -04:00
Chuck Lever
79e0b4e247 nfsd: Record request byte count, not count of vectors
Byte count is more helpful to know than vector count.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:14 -04:00
Chuck Lever
afa720a091 nfsd: Fix NFSD trace points
nfsd-1915  [003] 77915.780959: write_opened:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780960: write_io_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780964: write_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1

Byte swapping and knfsd_fh_hash() are not available in "trace-cmd
report", where the print format string is actually used. These
data transformations have to be done during the TP_fast_assign step.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:13 -04:00
Chuck Lever
55f5088c22 svc: Report xprt dequeue latency
Record the time between when a rqstp is enqueued on a transport
and when it is dequeued. This includes how long the rqstp waits on
the queue and how long it takes the kernel scheduler to wake a
nfsd thread to service it.

The svc_xprt_dequeue trace point is altered to include the number
of microseconds between xprt_enqueue and xprt_dequeue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:13 -04:00
Chuck Lever
aaba72cd4e sunrpc: Report per-RPC execution stats
Introduce a mechanism to report the server-side execution latency of
each RPC. The goal is to enable user space to filter the trace
record for latency outliers, build histograms, etc.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:12 -04:00
Chuck Lever
0b9547bf6b sunrpc: Re-purpose trace_svc_process
Currently, trace_svc_process has two call sites:

1. Just after a call to svc_send. svc_send already invokes
   trace_svc_send with the same arguments just before returning

2. Just before a call to svc_drop. svc_drop already invokes
   trace_svc_drop with the same arguments just after it is called

Therefore trace_svc_process does not provide any additional
information not already provided by these other trace points.

However, it would be useful to record the incoming RPC procedure.
So reuse trace_svc_process for this purpose.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:12 -04:00
Chuck Lever
ece200ddd5 sunrpc: Save remote presentation address in svc_xprt for trace events
TP_printk defines a format string that is passed to user space for
converting raw trace event records to something human-readable.

My user space's printf (Oracle Linux 7), however, does not have a
%pI format specifier. The result is that what is supposed to be an
IP address in the output of "trace-cmd report" is just a string that
says the field couldn't be displayed.

To fix this, adopt the same approach as the client: maintain a pre-
formated presentation address for occasions when %pI is not
available.

The location of the trace_svc_send trace point is adjusted so that
rqst->rq_xprt is not NULL when the trace event is recorded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:11 -04:00