Commit Graph

94117 Commits

Author SHA1 Message Date
Damien Le Moal
b73c67c2cb dm kcopyd: add sequential write feature
When copyying blocks to host-managed zoned block devices, writes must be
sequential.  However, dm_kcopyd_copy() does not guarantee this as writes
are issued in the completion order of reads, and reads may complete out
of order despite being issued sequentially.

Fix this by introducing the DM_KCOPYD_WRITE_SEQ feature flag.  This can
be specified when calling dm_kcopyd_copy() and should be set
automatically if one of the destinations is a host-managed zoned block
device.  For a split job, the master job maintains the write position at
which writes must be issued.  This is checked with the pop() function
which is modified to not return any write I/O sub job that is not at the
correct write position.

When DM_KCOPYD_WRITE_SEQ is specified for a job, errors cannot be
ignored and the flag DM_KCOPYD_IGNORE_ERROR is ignored, even if
specified by the user.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:51 -04:00
Damien Le Moal
10999307c1 dm: introduce dm_remap_zone_report()
A target driver support zoned block devices and exposing it as such may
receive REQ_OP_ZONE_REPORT request for the user to determine the mapped
device zone configuration. To process properly such request, the target
driver may need to remap the zone descriptors provided in the report
reply. The helper function dm_remap_zone_report() does this generically
using only the target start offset and length and the start offset
within the target device.

dm_remap_zone_report() will remap the start sector of all zones
reported. If the report includes sequential zones, the write pointer
position of these zones will also be remapped.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:51 -04:00
Damien Le Moal
dd88d313be dm table: add zoned block devices validation
1) Introduce DM_TARGET_ZONED_HM feature flag:

The target drivers currently available will not operate correctly if a
table target maps onto a host-managed zoned block device.

To avoid problems, introduce the new feature flag DM_TARGET_ZONED_HM to
allow a target to explicitly state that it supports host-managed zoned
block devices.  This feature is checked for all targets in a table if
any of the table's block devices are host-managed.

Note that as host-aware zoned block devices are backward compatible with
regular block devices, they can be used by any of the current target
types.  This new feature is thus restricted to host-managed zoned block
devices.

2) Check device area zone alignment:

If a target maps to a zoned block device, check that the device area is
aligned on zone boundaries to avoid problems with REQ_OP_ZONE_RESET
operations (resetting a partially mapped sequential zone would not be
possible).  This also facilitates the processing of zone report with
REQ_OP_ZONE_REPORT bios.

3) Check block devices zone model compatibility

When setting the DM device's queue limits, several possibilities exists
for zoned block devices:
1) The DM target driver may want to expose a different zone model
(e.g. host-managed device emulation or regular block device on top of
host-managed zoned block devices)
2) Expose the underlying zone model of the devices as-is

To allow both cases, the underlying block device zone model must be set
in the target limits in dm_set_device_limits() and the compatibility of
all devices checked similarly to the logical block size alignment.  For
this last check, introduce validate_hardware_zoned_model() to check that
all targets of a table have the same zone model and that the zone size
of the target devices are equal.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer refactored Damien's original work to simplify the code]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:50 -04:00
Joe Perches
d2c3c8dcb5 dm: convert DM printk macros to pr_<level> macros
Using pr_<level> is the more common logging style.

Standardize style and use new macro DM_FMT.
Use no_printk in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:50 -04:00
Mikulas Patocka
fc1841e1c1 dm ioctl: add a new DM_DEV_ARM_POLL ioctl
This ioctl will record the current global event number in the structure
dm_file, so that next select or poll call will wait until new events
arrived since this ioctl.

The DM_DEV_ARM_POLL ioctl has the same effect as closing and reopening
the handle.

Using the DM_DEV_ARM_POLL ioctl is optional - if the userspace is OK
with closing and reopening the /dev/mapper/control handle after select
or poll, there is no need to re-arm via ioctl.

Usage:
1. open the /dev/mapper/control device
2. send the DM_DEV_ARM_POLL ioctl
3. scan the event numbers of all devices we are interested in and process
   them
4. call select, poll or epoll on the handle (it waits until some new event
   happens since the DM_DEV_ARM_POLL ioctl)
5. go to step 2

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:49 -04:00
Kuppuswamy Sathyanarayanan
57129044f5 mfd: intel_soc_pmic_bxtwc: Use chained IRQs for second level IRQ chips
Whishkey cove PMIC has support to mask/unmask interrupts at two levels.
At first level we can mask/unmask interrupt domains like TMU, GPIO, ADC,
CHGR, BCU THERMAL and PWRBTN and at second level, it provides facility
to mask/unmask individual interrupts belong each of this domain. For
example, in case of TMU, at first level we have TMU interrupt domain,
and at second level we have two interrupts, wake alarm, system alarm that
belong to the TMU interrupt domain.

Currently, in this driver all first level IRQs are registered as part of
IRQ chip(bxtwc_regmap_irq_chip). By default, after you register the IRQ
chip from your driver, all IRQs in that chip will masked and can only be
enabled if that IRQ is requested using request_irq() call. This is the
default Linux IRQ behavior model. And whenever a dependent device that
belongs to PMIC requests only the second level IRQ and not explicitly
unmask the first level IRQ, then in essence the second level IRQ will
still be disabled. For example, if TMU device driver request wake_alarm
IRQ and not explicitly unmask TMU level 1 IRQ then according to the default
Linux IRQ model,  wake_alarm IRQ will still be disabled. So the proper
solution to fix this issue is to use the chained IRQ chip concept. We
should chain all the second level chip IRQs to the corresponding first
level IRQ. To do this, we need to create separate IRQ chips for every
group of second level IRQs.

In case of TMU, when adding second level IRQ chip, instead of using PMIC
IRQ we should use the corresponding first level IRQ. So the following
code will change from

ret = regmap_add_irq_chip(pmic->regmap, pmic->irq, ...)

to,

virq = regmap_irq_get_virq(&pmic->irq_chip_data, BXTWC_TMU_LVL1_IRQ);

ret = regmap_add_irq_chip(pmic->regmap, virq, ...)

In case of Whiskey Cove Type-C driver, Since USBC IRQ is moved under
charger level2 IRQ chip. We should use charger IRQ chip(irq_chip_data_chgr)
to get the USBC virtual IRQ number.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Revieved-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-06-19 15:45:01 +01:00
Hugh Dickins
1be7107fbe mm: larger stack guard gap, between vmas
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-19 21:50:20 +08:00
Corentin LABBE
88d58ef891 crypto: engine - replace pr_xxx by dev_xxx
By adding a struct device *dev to struct engine, we could store the
device used at register time and so use all dev_xxx functions instead of
pr_xxx.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-19 14:19:54 +08:00
Olof Johansson
5a55029f46 Merge tag 'at91-ab-4.13-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux into next/soc
SoC for 4.13:

 - New suspend/resume mode for sama5d2
 - Initial support for armv7m based SoCs

* tag 'at91-ab-4.13-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  ARM: at91: remove atmel_nand_data
  ARM: at91: fix at91_suspend_entering_slow_clock link error
  ARM: at91: debug: add samv7x support
  ARM: at91: add armv7m SoC detection
  ARM: at91: handle CONFIG_PM for armv7m configurations
  ARM: at91: Add armv7m support
  ARM: at91: Document armv7m compatibles
  ARM: at91: Documentation: add armv7m families
  ARM: at91: pm: fallback to slowclock when backup mode fails
  ARM: at91: pm: allow selecting standby and suspend modes
  ARM: at91: pm: Add sama5d2 backup mode

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 22:53:20 -07:00
Olof Johansson
1161a0d50a Merge tag 'renesas-dt-bindings2-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/dt
Second Round of Renesas ARM Based SoC DT Bindings Updates for v4.13

* Document:
  - Add Renesas H3-based Salvator-XS board DT bindings
  - Add iW-RainboW-G20D-Qseven-RZG1M board
  - Add iW-RainboW-G20M-Qseven-RZG1M system on module
  - Update R-Car Gen3 ULCB board part numbers
* Add clock bit definitions for r7s72100 SoC

* tag 'renesas-dt-bindings2-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: Document Renesas H3-based Salvator-XS board DT bindings
  ARM: shmobile: Update R-Car Gen3 ULCB board part numbers
  ARM: shmobile: document iW-RainboW-G20D-Qseven-RZG1M board
  ARM: shmobile: document iW-RainboW-G20M-Qseven-RZG1M system on module
  ARM: dts: r7s72100: add clock bit definitions

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 22:50:49 -07:00
Olof Johansson
8c5c250670 Merge tag 'renesas-drivers-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers
Renesas ARM Based SoC Drivers Updates for v4.13

* Rework Kconfig and Makefile logic

* tag 'renesas-drivers-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  soc: renesas: Rework Kconfig and Makefile logic

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 22:45:08 -07:00
David S. Miller
4b153ca989 Merge tag 'mac80211-for-davem-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
Here's just the fix for that ancient bug:
 * remove wext calling ndo_do_ioctl, since nobody needs
   that now and it makes the type change easier
 * use struct iwreq instead of struct ifreq almost everywhere
   in wireless extensions code
 * copy only struct iwreq from userspace in dev_ioctl for the
   wireless extensions, since it's smaller than struct ifreq
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-19 00:03:51 -04:00
Olof Johansson
f39b24e0b4 Merge tag 'tegra-for-4.13-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into next/drivers
soc/tegra: Changes for v4.13-rc1

This contains an implementation of generic PM domains for Tegra186,
based on the BPMP powergate request.

* tag 'tegra-for-4.13-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  soc/tegra: flowctrl: Fix error handling
  soc/tegra: bpmp: Implement generic PM domains
  soc/tegra: bpmp: Update ABI header
  PM / Domains: Allow overriding the ->xlate() callback

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 21:01:02 -07:00
Olof Johansson
93c452f5d3 Merge tag 'scpi-updates-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into next/drivers
SCPI update for v4.13

Adds support to get DVFS transition latency and OPP for any device whose
DVFS are managed by SCPI. This avoids code duplication in both cpufreq
and devfreq SCPI drivers.

* tag 'scpi-updates-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  cpufreq: scpi: use new scpi_ops functions to remove duplicate code
  firmware: arm_scpi: add support to populate OPPs and get transition latency

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 20:55:07 -07:00
Olof Johansson
8aba614385 Merge tag 'arm-soc/for-4.13/devicetree-arm64' of http://github.com/Broadcom/stblinux into next/dt64
This pull request contains Broadcom ARM64-based SoCs Device Tree changes for
4.13. Please note the following from Eric:

I've based this summary on the bcm2835-dt-next tag, to clarify what's in this
patch series, but it does require being careful since it involves a cross-merge
between branches.

- Anup documents the Broadcom Stingray binding, common clocks, adds initial
  support for the Stingray DTSI and DTS files and adds support for the PL022,
  PL330 and SP805

- Sandeep adds the clock nodes to the Stingray Device Tree nodes

- Pramod adds support for the NAND, pinctrl, GPIO to the Stingray Device Tree nodes

- Oza adds I2C Device Tree nodes to the Stingray DTSes

- Srinath adds PWM and SDHCI Device Tree nodes for the Stingray SoC

- Ravijeta adds support for the USB Dual Role PHY on Northstar 2

- Gerd starts adding references to the sdhost and sdhci controllers, and then
  switches the sdcard to to use the SDHOST (faster than SDHCI)

- Stefan defines the BCM2837 thermal coefficients in order for the Raspberry Pi
  thermal driver to work correctly

* tag 'arm-soc/for-4.13/devicetree-arm64' of http://github.com/Broadcom/stblinux:
  arm64: dts: NS2: Add USB DRD PHY device tree node
  ARM64: dts: bcm2837: Define CPU thermal coefficients
  arm64: dts: Add PWM and SDHCI DT nodes for Stingray SOC
  arm64: dts: Add PL022, PL330 and SP805 DT nodes for Stingray
  arm64: dts: Add I2C DT nodes for Stingray SoC
  arm64: dts: Add GPIO DT nodes for Stingray SOC
  arm64: dts: Add pinctrl DT nodes for Stingray SOC
  arm64: dts: Add NAND DT nodes for Stingray SOC
  arm64: dts: Add clock DT nodes for Stingray SOC
  arm64: dts: Initial DTS files for Broadcom Stingray SOC
  dt-bindings: clk: Extend binding doc for Stingray SOC
  dt-bindings: bcm: Add Broadcom Stingray bindings document
  ARM: dts: bcm283x: switch from &sdhci to &sdhost
  arm64: dts: bcm2837: add &sdhci and &sdhost
  ARM: dts: bcm283x: Add CPU thermal zone with 1 trip point
  ARM: dts: Add devicetree for the Raspberry Pi 3, for arm32 (v6)

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 20:18:44 -07:00
Olof Johansson
7a699a85e1 Merge tag 'v4.12-next-soc' of https://github.com/mbgg/linux-mediatek into next/drivers
- enhance scpsys to support mt6797
- add mt6797 support to scpsys
- fix error path in pmic-wrapper
- fix possible NULL pointer dereference in pmic-wrapper

* tag 'v4.12-next-soc' of https://github.com/mbgg/linux-mediatek:
  soc: mediatek: PMIC wrap: Fix possible NULL derefrence.
  soc: mediatek: PMIC wrap: Fix error handling
  soc: mediatek: add MT6797 scpsys support
  soc: mediatek: add vdec item for scpsys
  soc: mediatek: avoid using fixed spm power status defines

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 19:35:48 -07:00
Olof Johansson
63a677bca2 Merge tag 'v4.13-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/dt
A bunch of changes including mali gpu nodes for rk3288 boards
following (and including) the new Mali Midgard binding; a lot of
improvements for the rk3228/rk3229 socs (tsadc, operating points,
usb, clock-rates, pinctrl, watchdog); finalizing the rk1108->rv1108
rename and adc buttons for the rk3288 firefly boards.

* tag 'v4.13-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: enable usb for rk3229 evb board
  ARM: dts: rockchip: add usb nodes on rk322x
  ARM: dts: rockchip: add adc button for Firefly
  ARM: dts: rockchip: enable ARM Mali GPU on rk3288-veyron
  ARM: dts: rockchip: enable ARM Mali GPU on rk3288-firefly
  ARM: dts: rockchip: enable ARM Mali GPU on rk3288-rock2-som
  ARM: dts: rockchip: add ARM Mali GPU node for rk3288
  dt-bindings: gpu: add bindings for the ARM Mali Midgard GPU
  ARM: dts: rockchip: set a sane frequence for tsadc on rk322x
  ARM: dts: rockchip: add operating-points-v2 for cpu on rk322x
  ARM: dts: rockchip: set default rates for core clocks on rk322x
  ARM: dts: rockchip: add second uart2 pinctrl on rk322x
  ARM: dts: rockchip: correct rk322x uart2 pinctrl
  ARM: dts: rockchip: add watchdog device node on rk322x
  clk: rockchip: add clock-ids for more rk3228 clocks
  clk: rockchip: add ids for camera on rk3399
  ARM: dts: rockchip: fix rk322x i2s1 pinctrl error
  ARM: dts: rockchip: rename RK1108-evb to RV1108-evb
  ARM: dts: rockchip: rename core dtsi from RK1108 to RV1108
  ARM: dts: rockchip: Setup usb vbus-supply on rk3288-rock2

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 19:07:25 -07:00
Olof Johansson
f44a12dcd9 Merge tag 'reset-for-4.13-2' of git://git.pengutronix.de/git/pza/linux into next/drivers
Reset controller changes for v4.13, part 2

- Add reset manager offsets for Stratix10
- Use kref for reset contol reference counting
- Add new TI SCI reset driver for TI Keystone SoCs

* tag 'reset-for-4.13-2' of git://git.pengutronix.de/git/pza/linux:
  reset: Add the TI SCI reset driver
  dt-bindings: reset: Add TI SCI reset binding
  reset: use kref for reference counting
  dt-bindings: reset: Add reset manager offsets for Stratix10

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 19:03:45 -07:00
Olof Johansson
8b4a35876d Merge tag 'reset-for-4.13' of git://git.pengutronix.de/git/pza/linux into next/drivers
Reset controller changes for v4.13

- Use devm_kcalloc to allocate the channels array in sti/reset-syscfg
- Rename the TI_SYSCON_RESET Kconfig option to RESET_TI_SYSCON for
  consistency
- Add new reset driver and DT bindings for Cortina Systems Gemini
  reset controller

* tag 'reset-for-4.13' of git://git.pengutronix.de/git/pza/linux:
  reset: Add a Gemini reset controller
  reset: add DT bindings header for Gemini reset controller
  reset: ti_syscon: Rename TI_SYSCON_RESET to RESET_TI_SYSCON
  reset: sti: Use devm_kcalloc() in syscfg_reset_controller_register()

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 19:03:39 -07:00
Johan Hovold
e33a3f84f8 NFC: nfcmrvl: allow gpio 0 for reset signalling
Allow gpio 0 to be used for reset signalling, and instead use negative
errnos to disable the reset functionality.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-06-18 23:58:00 +02:00
Ming Lei
1d9e9bc6b5 blk-mq: don't stop queue for quiescing
Queue can be started by other blk-mq APIs and can be used in
different cases, this limits uses of blk_mq_quiesce_queue()
if it is based on stopping queue, and make its usage very
difficult, especially users have to use the stop queue APIs
carefully for avoiding to break blk_mq_quiesce_queue().

We have applied the QUIESCED flag for draining and blocking
dispatch, so it isn't necessary to stop queue any more.

After stopping queue is removed, blk_mq_quiesce_queue() can
be used safely and easily, then users won't worry about queue
restarting during quiescing at all.

Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 14:24:48 -06:00
Ming Lei
f4560ffe8c blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue
It is required that no dispatch can happen any more once
blk_mq_quiesce_queue() returns, and we don't have such requirement
on APIs of stopping queue.

But blk_mq_quiesce_queue() still may not block/drain dispatch in the
the case of BLK_MQ_S_START_ON_RUN, so use the new introduced flag of
QUEUE_FLAG_QUIESCED and evaluate it inside RCU read-side critical
sections for fixing this issue.

Also blk_mq_quiesce_queue() is implemented via stopping queue, which
limits its uses, and easy to cause race, because any queue restart in
other paths may break blk_mq_quiesce_queue(). With the introduced
flag of QUEUE_FLAG_QUIESCED, we don't need to depend on stopping queue
for quiescing any more.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 14:24:27 -06:00
Ming Lei
e4e739131a blk-mq: introduce blk_mq_unquiesce_queue
blk_mq_start_stopped_hw_queues() is used implictly
as counterpart of blk_mq_quiesce_queue() for unquiescing queue,
so we introduce blk_mq_unquiesce_queue() and make it
as counterpart of blk_mq_quiesce_queue() explicitly.

This function is for improving the current quiescing mechanism
in the following patches.

Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 14:20:04 -06:00
Ming Lei
4f084b41a0 blk-mq: introduce blk_mq_quiesce_queue_nowait()
This patch introduces blk_mq_quiesce_queue_nowait() so
that we can workaround mpt3sas for quiescing its queue.

Once mpt3sas is fixed, we can remove this helper.

Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 14:20:04 -06:00
Ming Lei
97e0120990 blk-mq: move blk_mq_quiesce_queue() into include/linux/blk-mq.h
We usually put blk_mq_*() into include/linux/blk-mq.h, so
move this API into there.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 14:20:04 -06:00
NeilBrown
9b10f6a9c2 block: remove bio_clone() and all references.
bio_clone() is no longer used.
Only bio_clone_bioset() or bio_clone_fast().
This is for the best, as bio_clone() used fs_bio_set,
and filesystems are unlikely to want to use bio_clone().

So remove bio_clone() and all references.
This includes a fix to some incorrect documentation.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
NeilBrown
47e0fb461f blk: make the bioset rescue_workqueue optional.
This patch converts bioset_create() to not create a workqueue by
default, so alloctions will never trigger punt_bios_to_rescuer().  It
also introduces a new flag BIOSET_NEED_RESCUER which tells
bioset_create() to preserve the old behavior.

All callers of bioset_create() that are inside block device drivers,
are given the BIOSET_NEED_RESCUER flag.

biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios.  This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, and one allocated by
target_core_iblock.c.

biosets used by md/raid do not need rescuing as
their usage was recently audited and revised to never
risk deadlock.

It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
NeilBrown
011067b056 blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().

To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().

Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
NeilBrown
af67c31fba blk: remove bio_set arg from blk_queue_split()
blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.

Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.

This is inconsistent and unnecessary.  Remove the last arg and always use
q->bio_split inside blk_queue_split()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
Christoph Hellwig
5bbf4e5a8e blk-mq-sched: unify request prepare methods
This patch makes sure we always allocate requests in the core blk-mq
code and use a common prepare_request method to initialize them for
both mq I/O schedulers.  For Kyber and additional limit_depth method
is added that is called before allocating the request.

Also because none of the intializations can really fail the new method
does not return an error - instead the bfq finish method is hardened
to deal with the no-IOC case.

Last but not least this removes the abuse of RQF_QUEUE by the blk-mq
scheduling code as RQF_ELFPRIV is all that is needed now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 10:08:55 -06:00
Christoph Hellwig
7b9e936163 blk-mq-sched: unify request finished methods
No need to have two different callouts of bfq vs kyber.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 10:08:55 -06:00
Wei Wang
44ebe79149 net: add debug atomic_inc_not_zero() in dst_hold()
This patch is meant to add a debug warning on the situation where dst is
being held during its destroy phase. This could potentially cause double
free issue on the dst.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
1eb04e7c9e net: reorder all the dst flags
As some dst flags are removed, reorder the dst flags to fill in the
blanks.
Note: these flags are not exposed into user space. So it is safe to
reorder.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
a4c2fd7f78 net: remove DST_NOCACHE flag
DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and remove this flag
completely.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
b2a9c0ed75 net: remove DST_NOGC flag
Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe() because now all the dst are released based on refcnt
and behaves as DST_NOCACHE.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
5b7c9a8ff8 net: remove dst gc related code
This patch removes all dst gc related code and all the dst free
functions

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
52df157f17 xfrm: take refcnt of dst when creating struct xfrm_dst bundle
During the creation of xfrm_dst bundle, always take ref count when
allocating the dst. This way, xfrm_bundle_create() will form a linked
list of dst with dst->child pointing to a ref counted dst child. And
the returned dst pointer is also ref counted. This makes the link from
the flow cache to this dst now ref counted properly.
As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() and its related
function calls should be replaced with dst_release() or
dst_release_immediate().

The special handling logic for dst->child in dst_destroy() can be
replaced with a simple dst_release_immediate() call on the child to
release the whole list linked by dst->child pointer.
Previously used DST_NOHASH flag is not needed anymore as well. The
reason that DST_NOHASH is used in the existing code is mainly to prevent
the dst inserted in the fib tree to be wrongly destroyed during the
deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH
flag is marked in all the dst children except the one which is in the
fib tree.
However, with this patch series to remove dst gc logic and release dst
only based on ref count, it is safe to release all the children from a
xfrm_dst bundle as long as the dst children are all ref counted
properly which is already the case in the existing code.
So, this patch removes the use of DST_NOHASH flag.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
db916649b5 ipv6: get rid of icmp6 dst garbage collector
icmp6 dst route is currently ref counted during creation and will be
freed by user during its call of dst_release(). So no need of a garbage
collector for it.
Remove all icmp6 dst garbage collector related code.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
9df16efadd ipv4: call dst_hold_safe() properly
This patch checks all the calls to
dst_hold()/skb_dst_force()/dst_clone()/dst_use() to see if
dst_hold_safe() is needed to avoid double free issue if dst
gc is removed and dst_release() directly destroys dst when
dst->__refcnt drops to 0.

In tx path, TCP hold sk->sk_rx_dst ref count and also hold sock_lock().
UDP and other similar protocols always hold refcount for
skb->_skb_refdst. So both paths seem to be safe.

In rx path, as it is lockless and skb_dst_set_noref() is likely to be
used, dst_hold_safe() should always be used when trying to hold dst.

In the routing code, if dst is held during an rcu protected session, it
is necessary to call dst_hold_safe() as the current dst might be in its
rcu grace period.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
4a6ce2b6f2 net: introduce a new function dst_dev_put()
This function should be called when removing routes from fib tree after
the dst gc is no longer in use.
We first mark DST_OBSOLETE_DEAD on this dst to make sure next
dst_ops->check() fails and returns NULL.
Secondly, as we no longer keep the gc_list, we need to properly
release dst->dev right at the moment when the dst is removed from
the fib/fib6 tree.
It does the following:
1. change dst->input and output pointers to dst_discard/dst_dscard_out to
   discard all packets
2. replace dst->dev with loopback interface

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:53:59 -04:00
Wei Wang
5f56f409b5 net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt
The current mechanism of freeing dst is a bit complicated. dst has its
ref count and when user grabs the reference to the dst, the ref count is
properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing
code due to some historic reasons.

If the reference to dst is always taken properly, we should be able to
simplify the logic in dst_release() to destroy dst when dst->__refcnt
drops from 1 to 0. And this should be the only condition to determine
if we can call dst_destroy().
And as dst is always ref counted, there is no need for a dst garbage
list to hold the dst entries that already get removed by the routing
code but are still held by other users. And the task to periodically
check the list to free dst if ref count become 0 is also not needed
anymore.

This patch introduces a temporary flag DST_NOGC(no garbage collector).
If it is set in the dst, dst_release() will call dst_destroy() when
dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag
and do atomic_inc_not_zero() similar as DST_NOCACHE to avoid double free
issue.
This temporary flag is mainly used so that we can make the transition
component by component without breaking other parts.
This flag will be removed after all components are properly transitioned.

This patch also introduces a new function dst_release_immediate() which
destroys dst without waiting on the rcu when refcnt drops to 0. It will
be used in later patches.

Follow-up patches will correct all the places to properly take ref count
on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will
be used to release the dst instead of dst_free() and its related
functions.
And final clean-up patch will remove the DST_NOGC flag.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:53:59 -04:00
Stephen Boyd
4dea04c1f1 Merge tag 'meson-clk-for-4.13-2' of git://github.com/BayLibre/clk-meson into clk-next
Pull Amlogic clk driver updates from Jerome Brunet:

 * Expose more clock gate on meson8 (SAR ADC, RNG, USB, SDIO, ETH)
 * Add new compatible to the meson8 clock controller for meson8b
 * Add missing parents to gxbb clk81

* tag 'meson-clk-for-4.13-2' of git://github.com/BayLibre/clk-meson:
  clk: meson: gxbb: add all clk81 parents
  clk: meson: meson8b: add compatibles for Meson8 and Meson8m2
  clk: meson8b: export the ethernet gate clock
  clk: meson8b: export the USB clocks
  clk: meson8b: export the gate clock for the HW random number generator
  clk: meson8b: export the SDIO clock
  clk: meson8b: export the SAR ADC clocks
2017-06-16 15:01:46 -07:00
Stephen Boyd
ef748cb39d Merge branch 'for-4.13-ti-clkctrl' of https://github.com/t-kristo/linux-pm into clk-next
* 'for-4.13-ti-clkctrl' of https://github.com/t-kristo/linux-pm:
  clk: ti: omap4: add clkctrl clock data
  dt-bindings: clk: add omap4 clkctrl definitions
  clk: ti: add support for clkctrl clocks
  Documentation: dt-bindings: Add binding documentation for TI clkctrl clocks
2017-06-16 14:52:02 -07:00
Stephen Boyd
8a02fcf8a0 Merge tag 'sunxi-clk-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-next
Pull Allwinner clock patches from Maxime Ripard:

Some new clock units are supported, for the display clocks unsed in the
newer SoCs, and the A83T PRCM.

There is also a bunch of minor fixes for clocks that are not used by
anyone, and reworks needed by drivers that will land in 4.13.

* tag 'sunxi-clk-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: (21 commits)
  clk: sunxi-ng: Move all clock types to a library
  clk: sunxi-ng: a83t: Add support for A83T's PRCM
  dt-bindings: clock: sunxi-ccu: Add compatible string for A83T PRCM
  clk: sunxi-ng: select SUNXI_CCU_MULT for sun8i-a83t
  clk: sunxi-ng: a83t: Fix audio PLL divider offset
  clk: sunxi-ng: a83t: Fix PLL lock status register offset
  clk: sunxi-ng: Add driver for A83T CCU
  clk: sunxi-ng: Support multiple variable pre-dividers
  dt-bindings: clock: sunxi-ccu: Add compatible string for A83T CCU
  clk: sunxi-ng: de2: fix wrong pointer passed to PTR_ERR()
  clk: sunxi-ng: sun5i: Export video PLLs
  clk: sunxi-ng: mux: Re-adjust parent rate
  clk: sunxi-ng: mux: Change pre-divider application function prototype
  clk: sunxi-ng: mux: split out the pre-divider computation code
  clk: sunxi-ng: mux: Don't just rely on the parent for CLK_SET_RATE_PARENT
  clk: sunxi-ng: div: Switch to divider_round_rate
  clk: sunxi-ng: Pass the parent and a pointer to the clocks round rate
  clk: divider: Make divider_round_rate take the parent clock
  clk: sunxi-ng: explicitly include linux/spinlock.h
  clk: sunxi-ng: add support for DE2 CCU
  ...
2017-06-16 14:45:27 -07:00
Piotr Gregor
99b3c58f7b PCI: Test INTx masking during enumeration, not at run-time
The test for INTx masking via PCI_COMMAND_INTX_DISABLE performed in
pci_intx_mask_supported() should be done before the device can be used.
This is to avoid writing PCI_COMMAND while the driver owns the device, in
case that has any effect on MSI/MSI-X interrupts.

Move the content of pci_intx_mask_supported() to pci_intx_mask_broken() and
call it from pci_setup_device().

The test result can be queried at any time later using the same
pci_intx_mask_supported() interface as before (though with changed
implementation), so callers (uio, vfio) should be unaffected.

Signed-off-by: Piotr Gregor <piotrgregor@rsyncme.org>
[bhelgaas: changelog, remove quirk check, remove locking, move
dev->broken_intx_masking assignment to caller]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2017-06-16 16:12:37 -05:00
Dave Airlie
660e855813 amdgpu: use drm sync objects for shared semaphores (v6)
This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis
v6: lookup post deps earlier, and just replace fences
in post deps stage (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-06-16 16:58:32 -04:00
Eric Caruso
e862645952 mfd: cros_ec: add debugfs, console log file
If the EC supports the new CONSOLE_READ command type, then we
place a console_log file in debugfs for that EC device which allows
us to grab EC logs. The kernel will poll every 10 seconds for the
log and keep its own buffer, but userspace should grab this and
write it out to some logs which actually get rotated.

Signed-off-by: Eric Caruso <ejcaruso@chromium.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
[bleung: restored original version of this commit, with pointer size
 issue to be fixed in next commit]
Signed-off-by: Benson Leung <bleung@chromium.org>
2017-06-16 13:57:45 -07:00
Nicolas Boichat
0aa877c558 mfd: cros_ec: Add EC console read structures definitions
ec_params_console_read_v1 is used to capture EC logs from kernel,
and ec_params_get_cmd_versions_v1 is used to probe whether EC
supports that command.

Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Benson Leung <bleung@chromium.org>
2017-06-16 13:57:44 -07:00
Gwendal Grignou
68c35ea25b mfd: cros_ec: Add helper for event notifier.
Add cros_ec_get_event() entry point to retrieve event within functions
called by the notifier.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Benson Leung <bleung@chromium.org>
2017-06-16 13:57:44 -07:00
David S. Miller
273889e306 Merge tag 'mlx5-updates-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
Mellanox mlx5 updates and cleanups 2017-06-16

mlx5-updates-2017-06-16

This series provide some updates and cleanups for mlx5 core and netdevice
driver.

From Eli Cohen, add a missing event string.
From Or Gerlitz, some checkpatch cleanups.
From Moni, Disalbe HW level LAG when SRIOV is enabled.
From Tariq, A code reuse cleanup in aRFS flow.
From Itay Aveksis, Typo fix.
From Gal Pressman, ethtool statistics updates and "update stats" deferred work optimizations.
From Majd Dibbiny, Fast unload support on kernel shutdown.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 15:22:42 -04:00