current_fs_time() is used for inode timestamps.
Change the signature of the function to take inode pointer
instead of superblock as per Linus's suggestion.
Also, move the api under vfs as per the discussion on the
thread: https://lkml.org/lkml/2016/6/9/36 . As per Arnd's
suggestion on the thread, changing the function name.
current_fs_time() will be deleted after all the references
to it are replaced by current_time().
There was a bug reported by kbuild test bot with the change
as some of the calls to current_time() were made before the
super_block was initialized. Catch these accidental assignments
as timespec_trunc() does for wrong granularities. This allows
for the function to work right even in these circumstances.
But, adds a warning to make the user aware of the bug.
A coccinelle script was used to identify all the current
.alloc_inode super_block callbacks that updated inode timestamps.
proc filesystem was the only one that was modifying inode times
as part of this callback. The series includes a patch to fix that.
Note that timespec_trunc() will also be moved to fs/inode.c
in a separate patch when this will need to be revamped for
bounds checking purposes.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Save the position of the error reporting capability so it doesn't need to
be rediscovered during error handling.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Lukas Wunner <lukas@wunner.de>
Allow access to underlying channel IIO_CHAN_INFO_OFFSET from a consumer.
Signed-off-by: Matt Ranostay <matt@ranostay.consulting>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Some triggers can only be attached to the IIO device that corresponds to
the same physical device. Currently each driver that requires this
implements its own trigger validation function.
Introduce a new helper function called iio_trigger_validate_own_device()
that can be used to do this check. Having a common implementation avoids
code duplication and unnecessary boiler-plate code.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
In some cases (e.g. when the SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED sequence
flag is set) we may already know that the stateid was revoked and that the
only valid operation we can call is FREE_STATEID. In those cases, allow
the stateid to carry the information in the type field, so that we skip
the redundant call to TEST_STATEID.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The problem with previous code was it rounded values in wrong
place and produced wrong baud rate in some cases.
Signed-off-by: Alexey Starikovskiy <aystarik@gmail.com>
[nicolas.ferre@atmel.com: port to newer kernel and add commit log]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When performing DMA operations on a MCB device, the device needed
for using the DMA API is "mcb_device->bus_carrier".
This is rather lengthy, so a shortcut is introduced to struct mcb_device
in order to ensure the MCB device driver uses the correct device for DMA
operations.
Signed-off-by: Michael Moese <michael.moese@men.de>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is trivial to do:
- add flags argument to simple_rename()
- check if flags doesn't have any other than RENAME_NOREPLACE
- assign simple_rename() to .rename2 instead of .rename
Filesystems converted:
hugetlbfs, ramfs, bpf.
Debugfs uses simple_rename() to implement debugfs_rename(), which is for
debugfs instances to rename files internally, not for userspace filesystem
access. For this case pass zero flags to simple_rename().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Added one additional parameter to thermal_zone_device_update() to provide
caller with an optional capability to specify reason.
Currently this event is used by user space governor to trigger different
processing based on event code. Also it saves an additional call to read
temperature when the event is received.
The following events are cuurently defined:
- Unspecified event
- New temperature sample
- Trip point violated
- Trip point changed
- thermal device up and down
- thermal device power capability changed
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The .get_trend callback in struct thermal_zone_device_ops has
the prototype:
int (*get_trend) (struct thermal_zone_device *, int,
enum thermal_trend *);
whereas the .get_trend callback in struct thermal_zone_of_device_ops
has:
int (*get_trend)(void *, long *);
Streamline both prototypes and add the trip argument to the OF callback
aswell and use enum thermal_trend * instead of an integer pointer.
While the OF prototype may be the better one, this should be decided at
framework level and not on OF level.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: linux-pm@vger.kernel.org
Reviewed-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This adds support for hardware-tracked trip points to the device tree
thermal sensor framework.
The framework supports an arbitrary number of trip points. Whenever
the current temperature is updated, the trip points immediately
below and above the current temperature are found. A .set_trips
callback is then called with the temperatures. If there is no trip
point above or below the current temperature, the passed trip
temperature will be -INT_MAX or INT_MAX respectively. In this callback,
the driver should program the hardware such that it is notified
when either of these trip points are triggered. When a trip point
is triggered, the driver should call `thermal_zone_device_update'
for the respective thermal zone. This will cause the trip points
to be updated again.
If .set_trips is not implemented, the framework behaves as before.
This patch is based on an earlier version from Mikko Perttunen
<mikko.perttunen@kapsi.fi>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: linux-pm@vger.kernel.org
Reviewed-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Add apis for platform thermal drivers to query for slope and offset
attributes, which might be needed for temperature calculations.
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Bring in the upstream modifications so we can fixup the silent merge
conflict which is introduced by this merge.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The core uses it for polling. Give drivers a proper define handle this
case like for other response types.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
A host controller driver exposes its capability using caps flag
MMC_CAP_CMD_DURING_TFR. A driver with that capability can accept requests
that are marked mrq->cap_cmd_during_tfr = true. Then the driver informs the
upper layers when the command line is available for further commands by
calling mmc_command_done(). Because of that, the driver will not then
automatically send STOP commands, and it is the responsibility of the upper
layer to send a STOP command if it is required.
For requests submitted through the mmc_wait_for_req() interface, the caller
sets mrq->cap_cmd_during_tfr = true which causes mmc_wait_for_req() in fact
not to wait. The caller can then send commands that do not use the data
lines. Finally the caller can wait for the transfer to complete by calling
mmc_wait_for_req_done() which is now exported.
For requests submitted through the mmc_start_req() interface, the caller
again sets mrq->cap_cmd_during_tfr = true, but mmc_start_req() anyway does
not wait. The caller can then send commands that do not use the data
lines. Finally the caller can wait for the transfer to complete in the
normal way i.e. calling mmc_start_req() again.
Irrespective of how a cap_cmd_during_tfr request is started,
mmc_is_req_done() can be called if the upper layer needs to determine if
the request is done. However the appropriate waiting function (either
mmc_wait_for_req_done() or mmc_start_req()) must still be called.
The implementation consists primarily of a new completion
mrq->cmd_completion which notifies when the command line is available for
further commands. That completion is completed by mmc_command_done().
When there is an ongoing data transfer, calls to mmc_wait_for_req() will
automatically wait on that completion, so the caller does not have to do
anything special.
Note, in the case of errors, the driver may call mmc_request_done() without
calling mmc_command_done() because mmc_request_done() always calls
mmc_command_done().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Dwmmc host controller may in unknown state when entering kernel boot. One
example is when booting from eMMC, bootloader need initialize MMC host
controller into some state so it can read. In order to make sure MMC host
controller in a clean initial state, this reset support is added.
With this patch, a 'resets' property can be added into dw_mmc device
tree node. The hardware logic is: dwmmc host controller IP receives a reset
signal from a 'reset provider' (eg. power management unit). The 'resets'
property points to this reset signal. So, during dwmmc driver probe,
it can use this signal to reset itself.
Refer to [1] for more information.
[1] Documentation/devicetree/bindings/reset/reset.txt
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
Signed-off-by: Xinwei Kong <kong.kongxinwei@hisilicon.com>
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The SD Status register contains several important fields related to the
SD Card proprietary features.
Those fields may be used by user space applications for vendor specific
usage.
None of those fields are exported today by the driver to user space.
In this patch, we are reading the SD Status register and exporting
(using MMC_DEV_ATTR) the SD Status register to the user space.
Signed-off-by: Uri Yanai <uri.yanai@sandisk.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This patch updates the s3c24xx dma driver to be able to pass a
dma_slave_map array via the platform data. This is needed to
be able to use the new, simpler dmaengine API [1].
I used the virtual DMA channels as a parameter for the dma_filter
function. By doing that, I could reuse the existing filter function in
drivers/dma/s3c24xx-dma.c.
I have tested this on my mini2440 board with the audio driver.
According to my observations, dma_request_slave_channel in the
function dmaengine_pcm_new in the file
sound/soc/soc-generic-dmaengine-pcm.c now returns a valid DMA channel
whereas before no DMA channel was returned at that point.
Entries for DMACH_XD0, DMACH_XD1 and DMACH_TIMER are missing because I
don't realy know which driver to use for these.
[1]
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393635.html
Signed-off-by: Sam Van Den Berge <sam.van.den.berge@telenet.be>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
To get more coverage, enable COMPILE_TEST for this driver.
While at it, to fix build on other archs, select MMP_SRAM only for ARCH_MMP
and also fix the platform header
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Map/Unmap a device MMIO resource from a physical address. If no dma_map_ops
method is available the operation is a no-op.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
A MMIO mapped resource can not be represented by a struct page so a new
debug type is needed to handle this. This patch add such type and
functionality to add/remove entries and how to translate them to a
physical address.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
When building XFS with -Werror, it now fails with:
include/linux/pagemap.h: In function 'fault_in_multipages_readable':
include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
volatile char c;
^
This is a regression caused by commit e23d4159b1 ("fix
fault_in_multipages_...() on architectures with no-op access_ok()").
Fix it by re-adding the "(void)c" trick taht was previously used to make
the compiler think the variable is used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
net/netfilter/core.c
net/netfilter/nf_tables_netdev.c
Resolve two conflicts before pull request for David's net-next tree:
1) Between c73c248490 ("netfilter: nf_tables_netdev: remove redundant
ip_hdr assignment") from the net tree and commit ddc8b6027a
("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()").
2) Between e8bffe0cf9 ("net: Add _nf_(un)register_hooks symbols") and
Aaron Conole's patches to replace list_head with single linked list.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netfilter hook list never uses the prev pointer, and so can be trimmed to
be a simple singly-linked list.
In addition to having a more light weight structure for hook traversal,
struct net becomes 5568 bytes (down from 6400) and struct net_device becomes
2176 bytes (down from 2240).
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This makes things simpler because we can store the head of the list
in the nf_state structure without worrying about concurrent add/delete
of hook elements from the list.
A future commit will make use of this to implement a simpler
linked-list.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting vf vlan
protocol to 802.1ad.
VST 802.1ad mode in mlx4, is used for STAG strip/insertion by PF, while
the CTAG is set by the VF.
Read current vlan protocol as part of the vf configuration state.
Upon setting vf vlan protocol to 802.1ad, we use a mechanism of handshake
to verify that both the vf and the pf driver version support it.
The handshake uses the command QUERY_FUNC_CAP:
- The vf sets a pre-defined support bit in input modifier.
- A pf that supports the feature sends the request to the vf through a
pre-defined field in the output mailbox.
- In case vf does not support the feature, the pf will fail the control
command (in this case, IP link tool command to set the vf vlan
protocol to 802.1ad).
No change in VST 802.1Q mode.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.
For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.
Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
Set vf vlan protocol 802.1ad:
ip link set eth0 vf 1 vlan 100 proto 802.1ad
Set vf to VST (802.1Q) mode:
ip link set eth0 vf 1 vlan 100 proto 802.1Q
Or by omitting the new parameter
ip link set eth0 vf 1 vlan 100
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check device capability to support VF vlan protocol 802.1ad mode.
Add vport attribute vlan protocol.
Init vport vlan protocol by default to 802.1Q.
Add update QP support for VF vlan protocol 802.1ad.
Add func capability vlan_offload_disable to disable all
vlan HW acceleration on VF while the VF is set to VF vlan protocol
802.1ad mode.
No change in VF vlan protocol 802.1Q (VST) mode.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This define is needed by i2c_adapter_depth() to detect if we don't
exceed the maximum number of lock subclasses. Make it visible even
if lockdep is disabled.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Peter Rosin <peda@axentia.se>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
For crazy setups in which an i2c gpio expander is behind an i2c gpio
multiplexer controlled by a gpio provided a second expander using the
same device driver we need to explicitly tell lockdep how to handle
nested locking.
Export i2c_adapter_depth() as public API to be reused outside of i2c
core code.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Peter Rosin <peda@axentia.se>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The generic THREAD_INFO_IN_TASK definition of thread_info::flags is a
u32, matching x86 prior to the introduction of THREAD_INFO_IN_TASK.
However, common helpers like test_ti_thread_flag() implicitly assume
that thread_info::flags has at least the size and alignment of unsigned
long, and relying on padding and alignment provided by other elements of
task_struct is somewhat fragile. Additionally, some architectures use
more that 32 bits for thread_info::flags, and others may need to in
future.
With THREAD_INFO_IN_TASK, task struct follows thread_info with a long
field, and thus we no longer save any space as we did back in commit:
affa219b60 ("x86: change thread_info's flag field back to 32 bits")
Given all this, it makes more sense for the generic thread_info::flags
to be an unsigned long.
In fact given <linux/thread_info.h> contains/uses the helpers mentioned
above, BE arches *must* use unsigned long (or something of the same size)
today, or they wouldn't work.
Make it so.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1474651447-30447-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the watchdog framework centrializes the IOCTL interfaces of device
drivers now, SETPRETIMEOUT and GETPRETIMEOUT need to be added in the
common code.
Signed-off-by: Robin Gong <b38343@freescale.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
[vzapolskiy: added conditional pretimeout sysfs attribute visibility]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Starting from Intel Skylake the iTCO watchdog timer registers were moved to
reside in the same register space with SMBus host controller. Not all
needed registers are available though and we need to unhide P2SB (Primary
to Sideband) device briefly to be able to read status of required NO_REBOOT
bit. The i2c-i801.c SMBus driver used to handle this and creation of the
iTCO watchdog platform device.
Windows, on the other hand, does not use the iTCO watchdog hardware
directly even if it is available. Instead it relies on ACPI Watchdog Action
Table (WDAT) table to describe the watchdog hardware to the OS. This table
contains necessary information about the the hardware and also set of
actions which are executed by a driver as needed.
This patch implements a new watchdog driver that takes advantage of the
ACPI WDAT table. We split the functionality into two parts: first part
enumerates the WDAT table and if found, populates resources and creates
platform device for the actual driver. The second part is the driver
itself.
The reason for the split is that this way we can make the driver itself to
be a module and loaded automatically if the WDAT table is found. Otherwise
the module is not loaded.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The "num" is the number of clk_hw entries in the structure, so
"unsigned int" would be a better fit. (size_t looks like data
size we count by byte.)
Besides, struct clk_onecell_data already uses unsigned int for
"clk_num".
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Avoid that sparse complains about blkg_hint manipulations.
Fixes: a637120e49 ("blkcg: use radix tree to index blkgs from blkcg")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Support Remote Invalidation. A private message is exchanged with
the client upon RDMA transport connect that indicates whether
Send With Invalidation may be used by the server to send RPC
replies. The invalidate_rkey is arbitrarily chosen from among
rkeys present in the RPC-over-RDMA header's chunk lists.
Send With Invalidate improves performance only when clients can
recognize, while processing an RPC reply, that an rkey has already
been invalidated. That has been submitted as a separate change.
In the future, the RPC-over-RDMA protocol might support Remote
Invalidation properly. The protocol needs to enable signaling
between peers to indicate when Remote Invalidation can be used
for each individual RPC.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Introduce data structure used by both client and server to exchange
implementation details during RDMA/CM connection establishment.
This is an experimental out-of-band exchange between Linux
RPC-over-RDMA Version One implementations, replacing the deprecated
CCP (see RFC 5666bis). The purpose of this extension is to enable
prototyping of features that might be introduced in a subsequent
version of RPC-over-RDMA.
Suggested by Christoph Hellwig and Devesh Sharma.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.
However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each are DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.
This isn't a real problem until the server's iommu is enabled. Then
each RPC reply that has content in that iovec orphans a DMA mapping
that consists of real resources.
krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. Reboot is needed to clear the problem.
Fixes: 9d11b51ce7 ("svcrdma: Fix send_reply() scatter/gather set-up")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>