Commit Graph

104187 Commits

Author SHA1 Message Date
Omar Sandoval
c87228f16f amiflop: fold headers into C file
amifd.h and amifdreg.h are only used from amiflop.c, and they're pretty
small, so move the contents to amiflop.c and get rid of the .h files.
This is preparation for adding a struct blk_mq_tag_set to struct
amiga_floppy_struct.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-16 09:49:42 -06:00
Jens Axboe
891b7c5fbf mtd_blkdevs: convert to blk-mq
Straight forward conversion, using an internal list to enable the
driver to pull requests at will.

Dynamically allocate the tag set to avoid having to pull in the
block headers for blktrans.h, since various mtd drivers use
block conflicting names for defines and functions.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-16 08:09:58 -06:00
Jens Axboe
9316a9ed68 blk-mq: provide helper for setting up an SQ queue and tag set
This pattern is repeated throughout all the blk-mq conversions.
Provide a basic helper to get it done.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:05:18 -06:00
Konstantin Khlebnikov
4822e902f9 block: describe difference between flags IO_STAT and STATS
This adds reasonable comments, but they definitely needs better names.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-11 11:04:21 -06:00
Hans Holmberg
bf82fa2f58 lightnvm: pblk: fix mapping issue on failed writes
On 1.2-devices, the mapping-out of remaning sectors in the
failed-write's block can result in an infinite loop,
stalling the write pipeline, fix this.

Fixes: 6a3abf5bee ("lightnvm: pblk: rework write error recovery path")
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:07 -06:00
Javier González
7f985f9a69 lightnvm: move ppa transformations to core
Continuing the effort of moving 1.2 and 2.0 specific code to core, move
64_to_32 and 32_to_64 ppa helpers from pblk to core.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:07 -06:00
Hans Holmberg
d68a934404 lightnvm: introduce nvm_rq_to_ppa_list
There is a number of places in the lightnvm subsystem where the user
iterates over the ppa list. Before iterating, the user must know if it
is a single or multiple LBAs due to vector commands using either the
nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which
leads to open-coding the if/else statement.

Instead of having multiple if/else's, move it into a function that can
be called by its users.

A nice side effect of this cleanup is that this patch fixes up a
bunch of cases where we don't consider the single-ppa case in pblk.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:07 -06:00
Javier González
2cf99bbd10 lightnvm: pblk: add helpers for chunk addresses
Implement helpers to go from ppas to a chunk within a line and an
address within a chunk.

These helpers will be used on the patches adding trace support in pblk,
which will be sent in this window.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:06 -06:00
Matias Bjørling
aff3fb18f9 lightnvm: move bad block and chunk state logic to core
pblk implements two data paths for recovery line state. One for 1.2
and another for 2.0, instead of having pblk implement these, combine
them in the core to reduce complexity and make available to other
targets.

The new interface will adhere to the 2.0 chunk definition,
including managing open chunks with an active write pointer. To provide
this interface, a 1.2 device recovers the state of the chunks by
manually detecting if a chunk is either free/open/close/offline, and if
open, scanning the flash pages sequentially to find the next writeable
page. This process takes on average ~10 seconds on a device with 64 dies,
1024 blocks and 60us read access time. The process can be parallelized
but is left out for maintenance simplicity, as the 1.2 specification is
deprecated. For 2.0 devices, the logic is maintained internally in the
drive and retrieved through the 2.0 interface.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:06 -06:00
Matias Bjørling
656e33ca3d lightnvm: move device L2P detection to core
A 1.2 device is able to manage the logical to physical mapping
table internally or leave it to the host.

A target only supports one of those approaches, and therefore must
check on initialization. Move this check to core to avoid each target
implement the check.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:06 -06:00
Matias Bjørling
d7b6801673 lightnvm: combine 1.2 and 2.0 command flags
Add nvm_set_flags helper to enable core to appropriately
set the command flags for read/write/erase depending on which version
a drive supports.

The flags arguments can be distilled into the access hint,
scrambling, and program/erase suspend. Replace the access hint with
a "is_seq" parameter. The rest of the flags are dependent on the
command opcode, which is trivial to detect and set.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09 08:25:05 -06:00
James Smart
783f4a4408 nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O
When an io is rejected by nvmf_check_ready() due to validation of the
controller state, the nvmf_fail_nonready_command() will normally return
BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
dying or the I/O is marked for NVMe multipath, the I/O is failed so that
the controller can terminate or so that the io can be issued on a
different path.  Unfortunately, as this reject point is before the
transport has accepted the command, blk-mq ends up completing the I/O
and never calls nvme_complete_rq(), which is where multipath may preserve
or re-route the I/O. The end result is, the device user ends up seeing an
EIO error.

Example: single path connectivity, controller is under load, and a reset
is induced.  An I/O is received:

  a) while the reset state has been set but the queues have yet to be
     stopped; or
  b) after queues are started (at end of reset) but before the reconnect
     has completed.

The I/O finishes with an EIO status.

This patch makes the following changes:

  - Adds the HOST_PATH_ERROR pathing status from TP4028
  - Modifies the reject point such that it appears to queue successfully,
    but actually completes the io with the new pathing status and calls
    nvme_complete_rq().
  - nvme_complete_rq() recognizes the new status, avoids resetting the
    controller (likely was already done in order to get this new status),
    and calls the multipather to clear the current path that errored.
    This allows the next command (retry or new command) to select a new
    path if there is one.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 14:16:13 -07:00
Jens Axboe
c0aac682fa Merge tag 'v4.19-rc6' into for-4.20/block
Merge -rc6 in, for two reasons:

1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
   they aren't in the 4.20 branch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

* tag 'v4.19-rc6': (780 commits)
  Linux 4.19-rc6
  MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
  cpufreq: qcom-kryo: Fix section annotations
  perf/core: Add sanity check to deal with pinned event failure
  xen/blkfront: correct purging of persistent grants
  Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
  selftests/powerpc: Fix Makefiles for headers_install change
  blk-mq: I/O and timer unplugs are inverted in blktrace
  dax: Fix deadlock in dax_lock_mapping_entry()
  x86/boot: Fix kexec booting failure in the SEV bit detection code
  bcache: add separate workqueue for journal_write to avoid deadlock
  drm/amd/display: Fix Edid emulation for linux
  drm/amd/display: Fix Vega10 lightup on S3 resume
  drm/amdgpu: Fix vce work queue was not cancelled when suspend
  Revert "drm/panel: Add device_link from panel device to DRM device"
  xen/blkfront: When purging persistent grants, keep them in the buffer
  clocksource/drivers/timer-atmel-pit: Properly handle error cases
  block: fix deadline elevator drain for zoned block devices
  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
  drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-01 08:58:57 -06:00
Greg Kroah-Hartman
2f19e7a7e6 Merge tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Mark writes:
  "spi: Fixes for v4.19

   Quite a few fixes for the Renesas drivers in here, plus a fix for the
   Tegra driver and some documentation fixes for the recently added
   spi-mem code.  The Tegra fix is relatively large but fairly
   straightforward and mechanical, it runs on probe so it's been
   reasonably well covered in -next testing."

* tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
  spi: spi-mem: Add missing description for data.nbytes field
  spi: rspi: Fix interrupted DMA transfers
  spi: rspi: Fix invalid SPI use during system suspend
  spi: sh-msiof: Fix handling of write value for SISTR register
  spi: sh-msiof: Fix invalid SPI use during system suspend
  spi: gpio: Fix copy-and-paste error
  spi: tegra20-slink: explicitly enable/disable clock
2018-09-28 18:04:06 -07:00
Greg Kroah-Hartman
8f0566118e Merge tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Mark writes:
  "regulator: Fixes for 4.19

   A collection of fairly minor bug fixes here, a couple of driver
   specific ones plus two core fixes.  There's one fix for the new
   suspend state code which fixes some confusion with constant values
   that are supposed to indicate noop operation and another fixing a
   race condition with the creation of sysfs files on new regulators."

* tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fix crash caused by null driver data
  regulator: Fix 'do-nothing' value for regulators without suspend state
  regulator: da9063: fix DT probing with constraints
  regulator: bd71837: Disable voltage monitoring for LDO3/4
2018-09-28 18:02:25 -07:00
Greg Kroah-Hartman
f151f57bfd Merge tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm
Dave writes:
  "drm fixes for 4.19-rc6

   Looks like a pretty normal week for graphics,

   core: syncobj fix, panel link regression revert
   amd: suspend/resume fixes, EDID emulation fix
   mali-dp: NV12 writeback and vblank reset fixes
   etnaviv: DMA setup fix"

* tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Fix Edid emulation for linux
  drm/amd/display: Fix Vega10 lightup on S3 resume
  drm/amdgpu: Fix vce work queue was not cancelled when suspend
  Revert "drm/panel: Add device_link from panel device to DRM device"
  drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
  drm/malidp: Fix writeback in NV12
  drm: mali-dp: Call drm_crtc_vblank_reset on device init
  drm/etnaviv: add DMA configuration for etnaviv platform device
2018-09-28 18:55:17 +02:00
Hannes Reinecke
fef912bf86 block: genhd: add 'groups' argument to device_add_disk
Update device_add_disk() to take an 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().

Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-28 08:30:28 -06:00
Omar Sandoval
6c3b7af1c9 kyber: add tracepoints
When debugging Kyber, it's really useful to know what latencies we've
been having, how the domain depths have been adjusted, and if we've
actually been throttling. Add three tracepoints, kyber_latency,
kyber_adjust, and kyber_throttled, to record that.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-27 17:34:59 -06:00
Omar Sandoval
ed88660a53 block: move call of scheduler's ->completed_request() hook
Commit 4bc6339a58 ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-27 17:34:52 -06:00
Linus Walleij
d6a77ba0eb Revert "drm/panel: Add device_link from panel device to DRM device"
This reverts commit 0c08754b59.

commit 0c08754b59
("drm/panel: Add device_link from panel device to DRM device")
creates a circular dependency under these circumstances:

1. The panel depends on dsi-host because it is MIPI-DSI child
   device.
2. dsi-host depends on the drm parent device (connector->dev->dev)
   this should be allowed.
3. drm parent dev (connector->dev->dev) depends on the panel
   after this patch.

This makes the dependency circular and while it appears it
does not affect any in-tree drivers (they do not seem to have
dsi hosts depending on the same parent device) this does not
seem right.

As noted in a response from Andrzej Hajda, the intent is
likely to make the panel dependent on the DRM device
(connector->dev) not its parent. But we have no way of
doing that since the DRM device doesn't contain any
struct device on its own (arguably it should).

Revert this until a proper approach is figured out.

Cc: Jyri Sarha <jsarha@ti.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927124130.9102-1-linus.walleij@linaro.org
2018-09-27 11:00:42 -04:00
Bart Van Assche
18c9a6bbe0 percpu-refcount: Introduce percpu_ref_resurrect()
This function will be used in a later patch to switch the struct
request_queue q_usage_counter from killed back to live. In contrast
to percpu_ref_reinit(), this new function does not require that the
refcount is zero.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 15:11:29 -06:00
Bart Van Assche
cd84a62e00 block, scsi: Change the preempt-only flag into a counter
The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
  are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" into "pm-only" because the primary purpose of
this mode is power management. Since the power management core may
but does not have to resume a runtime suspended device before
performing system-wide suspend and since a later patch will set
"pm-only" mode as long as a block device is runtime suspended, make
it possible to set "pm-only" mode from more than one context. Since
with this change scsi_device_quiesce() is no longer idempotent, make
that function return early if it is called for a quiesced queue.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 15:11:28 -06:00
Bart Van Assche
bca6b067b0 block: Move power management code into a new source file
Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c. Move the corresponding declarations from
<linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
the declarations of the functions that are not used in that mode.
This patch not only reduces the number of #ifdefs in the block layer
core code but also reduces the size of header file <linux/blkdev.h>
and hence should help to reduce the build time of the Linux kernel
if CONFIG_PM is not defined.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 15:11:28 -06:00
Christoph Hellwig
20e3267601 xen: provide a prototype for xen_biovec_phys_mergeable in xen.h
Having multiple externs in arch headers is not a good way to provide
a common interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26 08:45:10 -06:00
Greg Kroah-Hartman
a38523185b erge tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Dan writes:
  "libnvdimm/dax for 4.19-rc6

  * (2) fixes for the dax error handling updates that were merged for
  v4.19-rc1. My mails to Al have been bouncing recently, so I do not have
  his ack but the uaccess change is of the trivial / obviously correct
  variety. The address_space_operations fixes a regression.

  * A filesystem-dax fix to correct the zero page lookup to be compatible
   with non-x86 (mips and s390) architectures."

* tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: Add missing address_space_operations
  uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()
  filesystem-dax: Fix use of zero page
2018-09-25 21:37:41 +02:00
Lubomir Rintel
8c0f9f5b30 Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name"
This changes UAPI, breaking iwd and libell:

  ell/key.c: In function 'kernel_dh_compute':
  ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'?
    struct keyctl_dh_params params = { .private = private,
                                        ^~~~~~~
                                        dh_private

This reverts commit 8a2336e549.

Fixes: 8a2336e549 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name")
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Randy Dunlap <rdunlap@infradead.org>
cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
cc: Stephan Mueller <smueller@chronox.de>
cc: James Morris <jmorris@namei.org>
cc: "Serge E. Hallyn" <serge@hallyn.com>
cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: <stable@vger.kernel.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-25 13:28:58 +02:00
Greg Kroah-Hartman
2dd68cc7fd Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net
Dave writes:
  "Networking fixes:

  1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
     Abreu.

   2) Fix memory corruption in NFC, from Suren Baghdasaryan.

   3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.

   4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.

   5) Fix TX done race in mvpp2, from Antoine Tenart.

   6) ipv6 metrics leak, from Wei Wang.

   7) Adjust firmware version requirements in mlxsw, from Petr Machata.

   8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.

   9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff
      Barnhill.

   10) Fix double free in devlink, from Dan Carpenter.

   11) Fix ethtool regression from UFO feature removal, from Maciej
       Żenczykowski.

   12) Fix drivers that have a ndo_poll_controller() that captures the
       cpu entirely on loaded hosts by trying to drain all rx and tx
       queues, from Eric Dumazet.

   13) Fix memory corruption with jumbo frames in aquantia driver, from
       Friedemann Gerold."

* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
  net: mvneta: fix the remaining Rx descriptor unmapping issues
  ip_tunnel: be careful when accessing the inner header
  mpls: allow routes on ip6gre devices
  net: aquantia: memory corruption on jumbo frames
  tun: remove ndo_poll_controller
  nfp: remove ndo_poll_controller
  bnxt: remove ndo_poll_controller
  bnx2x: remove ndo_poll_controller
  mlx5: remove ndo_poll_controller
  mlx4: remove ndo_poll_controller
  i40evf: remove ndo_poll_controller
  ice: remove ndo_poll_controller
  igb: remove ndo_poll_controller
  ixgb: remove ndo_poll_controller
  fm10k: remove ndo_poll_controller
  ixgbevf: remove ndo_poll_controller
  ixgbe: remove ndo_poll_controller
  bonding: use netpoll_poll_dev() helper
  netpoll: make ndo_poll_controller() optional
  rds: Fix build regression.
  ...
2018-09-25 11:19:49 +02:00
Christoph Hellwig
65969e5cb2 block: don't include bug.h from bio.h
No need to pull in the BUG() defintion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:34:02 -06:00
Christoph Hellwig
bceacbfa48 block: don't include io.h from bio.h
Now that we don't need an override for BIOVEC_PHYS_MERGEABLE there is
no need to drag this header in.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:34:00 -06:00
Christoph Hellwig
6e768461c2 block: remove bvec_to_phys
We only use it in biovec_phys_mergeable and a m68k paravirt driver,
so just opencode it there.  Also remove the pointless unsigned long cast
for the offset in the opencoded instances.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:59 -06:00
Christoph Hellwig
3dccdae54f block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable
These two checks should always be performed together, so merge them into
a single helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:57 -06:00
Christoph Hellwig
6a9f5f240a block: simplify BIOVEC_PHYS_MERGEABLE
Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.

Also rename the function to biovec_phys_mergeable as there is no need
to shout.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:54 -06:00
Christoph Hellwig
27ca1d4ed0 block: move req_gap_back_merge to blk.h
No need to expose these helpers outside the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:53 -06:00
Christoph Hellwig
e9907009cb block: move req_gap_{back,front}_merge to blk-merge.c
Keep it close to the actual users instead of exposing the function to all
drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:51 -06:00
Christoph Hellwig
43b729bfe9 block: move integrity_req_gap_{back,front}_merge to blk.h
No need to expose these to drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:50 -06:00
Eric Dumazet
ac3d9dd034 netpoll: make ndo_poll_controller() optional
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller().

NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.

This patch allows netpoll_poll_dev() to process NAPI
contexts even for drivers not providing ndo_poll_controller(),
allowing for following patches in NAPI drivers.

Also we export netpoll_poll_dev() so that it can be called
by bonding/team drivers in following patches.

Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23 21:55:24 -07:00
Greg Kroah-Hartman
d02771fb16 Merge tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Lee writes:
  "MFD fixes for v4.19
   - Fix Dialog DA9063 regulator constraints issue causing failure in
     probe
   - Fix OMAP Device Tree compatible strings to match DT"

* tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: omap-usb-host: Fix dts probe of children
  mfd: da9063: Fix DT probing with constraints
2018-09-23 17:19:27 +02:00
Greg Kroah-Hartman
a83f87c1d2 Merge tag 'for-linus-20180922' of git://git.kernel.dk/linux-block
Jens writes:
  "Just a single fix in this pull request, fixing a regression in
  /proc/diskstats caused by the unification of timestamps."

* tag 'for-linus-20180922' of git://git.kernel.dk/linux-block:
  block: use nanosecond resolution for iostat
2018-09-23 08:33:28 +02:00
Dennis Zhou (Facebook)
101246ec02 blkcg: rename blkg_try_get to blkg_tryget
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:19 -06:00
Dennis Zhou (Facebook)
b3b9f24f5f blkcg: change blkg reference counting to use percpu_ref
Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:18 -06:00
Dennis Zhou (Facebook)
e2b0989954 blkcg: cleanup and make blk_get_rl use blkg_lookup_create
blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Try get logic was added earlier, but
semantically the logic was not changed.

This patch makes better use of the bio already having a reference to the
blkg in the hot path. The cold path uses a better fallback of
blkg_lookup_create rather than just blkg_lookup and then falling back to
the q->root_rl. If lookup_create fails with anything but -ENODEV, it
falls back to q->root_rl.

A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:16 -06:00
Dennis Zhou (Facebook)
f0fcb3ec89 blkcg: remove additional reference to the css
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:15 -06:00
Dennis Zhou (Facebook)
c839e7a03f blkcg: remove bio->bi_css and instead use bio->bi_blkg
Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.

This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:13 -06:00
Dennis Zhou (Facebook)
bdc2491708 blkcg: associate writeback bios with a blkg
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg. In
this patch, the wbc_init_bio call is changed such that it must be called
after a queue has been associated with the bio.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:11 -06:00
Dennis Zhou (Facebook)
74b7c02a9b blkcg: associate a blkg for pages being evicted by swap
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackle in the next
patch.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:09 -06:00
Dennis Zhou (Facebook)
5bf9a1f3b4 blkcg: consolidate bio_issue_init to be a part of core
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:08 -06:00
Dennis Zhou (Facebook)
a7b39b4e96 blkcg: always associate a bio with a blkg
Previously, blkg's were only assigned as needed by blk-iolatency and
blk-throttle. bio->css was also always being associated while blkg was
being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. This tries to create the
blkg, but if it is not possible, falls back to using the root_blkg of
the request_queue. Therefore, a bio will always be associated with a
blkg. The duplicate association logic is removed from blk-throttle and
blk-iolatency.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:06 -06:00
Dennis Zhou (Facebook)
07b05bcc32 blkcg: convert blkg_lookup_create to find closest blkg
There are several scenarios where blkg_lookup_create can fail. Examples
include the blkcg dying, request_queue is dying, or simply being OOM. At
the end of the day, most handle this by simply falling back to the
q->root_blkg and calling it a day.

This patch implements the notion of closest blkg. During
blkg_lookup_create, if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest is introduced and used
during association so a bio is always attached to a blkg.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:05 -06:00
Dennis Zhou (Facebook)
49f4c2dc2b blkcg: update blkg_lookup_create to do locking
To know when to create a blkg, the general pattern is to do a
blkg_lookup and if that fails, lock and then do a lookup again and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create to do locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, they can use __blkg_lookup_create. This will be used in
upcoming patches.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:03 -06:00
Dennis Zhou (Facebook)
27e6fa996c blkcg: fix ref count issue with bio_blkcg using task_css
The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg. blk_get_rl is modified as well to get
a reference to the blkcg it may use and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call to
ensure it will not return NULL in blk-iolatency.

BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
to address this in this series. I've created a private version of the
function with notes not to use it describing the flaw. Hopefully soon,
that code can be cleaned up.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:02 -06:00