CQE compressing reduces PCI overhead by coalescing and compressing
multiple CQEs into a single merged CQE. Successful compressing
improves message rate especially for small packet traffic.
CQE compressing is supported for all 64B CQE formats (with certain
limitations) generated by RQ/Responder or by SQ/Requestor.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The capabilities include:
- Max number of compressed and aggregated CQEs in a single session,
while zero means unsupported.
- For Responder, there are two formats of mini CQE: mini CQE with Rx
hash and mini CQE with checksum. The two formats are mutually exclusive.
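The two capability rules above can be sketched as a small check. This is a minimal illustration, not the driver's ABI: the struct, the enum, and the bit values are hypothetical names chosen for this example, and "mutually exclusive" is read here as "a consumer selects exactly one mini CQE format".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format bits; the real ABI may name or number these differently. */
enum sketch_mini_cqe_format {
	SKETCH_MINI_CQE_HASH = 1u << 0,	/* mini CQE with Rx hash */
	SKETCH_MINI_CQE_CSUM = 1u << 1,	/* mini CQE with checksum */
};

/* Hypothetical capability report, modelled on the description above. */
struct sketch_cqe_comp_caps {
	uint32_t max_num;		/* max CQEs merged per session; 0 = unsupported */
	uint32_t supported_format;	/* bitmask of enum sketch_mini_cqe_format */
};

/* Compression is available only when max_num is non-zero. */
static bool cqe_comp_supported(const struct sketch_cqe_comp_caps *caps)
{
	return caps->max_num != 0;
}

/* True if fmt names exactly one format and the device reports it. */
static bool cqe_comp_format_ok(const struct sketch_cqe_comp_caps *caps,
			       uint32_t fmt)
{
	bool single = fmt != 0 && (fmt & (fmt - 1)) == 0;

	return cqe_comp_supported(caps) && single &&
	       (caps->supported_format & fmt) == fmt;
}
```

A caller would first test cqe_comp_supported() and then pass a single format bit to cqe_comp_format_ok(); asking for both formats at once is rejected.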
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Whether the hardware supports multi packet WQE is exposed to user
space through query_device by uhw.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Multi packet WQE enables sending multiple fixed-size packets
using a single WQE. The exposed field reports whether the HW supports it.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull block layer updates from Jens Axboe:
"This is the main block pull request this series. Contrary to previous
release, I've kept the core and driver changes in the same branch. We
always ended up having dependencies between the two for obvious
reasons, so it makes more sense to keep them together. That said, I'll
probably try and keep more topical branches going forward, especially
for cycles that end up being as busy as this one.
The major parts of this pull request are:
- Improved support for O_DIRECT on block devices, with a small
private implementation instead of using the pig that is
fs/direct-io.c. From Christoph.
- Request completion tracking in a scalable fashion. This is utilized
by two components in this pull, the new hybrid polling and the
writeback queue throttling code.
- Improved support for polling with O_DIRECT, adding a hybrid mode
that combines pure polling with an initial sleep. From me.
- Support for automatic throttling of writeback queues on the block
side. This uses feedback from the device completion latencies to
scale the queue on the block side up or down. From me.
- Support for SMR drives in the block layer and for SD. From Hannes
and Shaun.
- Multi-connection support for nbd. From Josef.
- Cleanup of request and bio flags, so we have a clear split between
which are bio (or rq) private, and which ones are shared. From
Christoph.
- A set of patches from Bart, that improve how we handle queue
stopping and starting in blk-mq.
- Support for WRITE_ZEROES from Chaitanya.
- Lightnvm updates from Javier/Matias.
- Support for FC for the nvme-over-fabrics code. From James Smart.
- A bunch of fixes from a whole slew of people, too many to name
here"
* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
blk-stat: fix a few cases of missing batch flushing
blk-flush: run the queue when inserting blk-mq flush
elevator: make the rqhash helpers exported
blk-mq: abstract out blk_mq_dispatch_rq_list() helper
blk-mq: add blk_mq_start_stopped_hw_queue()
block: improve handling of the magic discard payload
blk-wbt: don't throttle discard or write zeroes
nbd: use dev_err_ratelimited in io path
nbd: reset the setup task for NBD_CLEAR_SOCK
nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
nvme-fabrics: Add target support for FC transport
nvme-fabrics: Add host support for FC transport
nvme-fabrics: Add FC transport LLDD api definitions
nvme-fabrics: Add FC transport FC-NVME definitions
nvme-fabrics: Add FC transport error codes to nvme.h
Add type 0x28 NVME type code to scsi fc headers
nvme-fabrics: patch target code in prep for FC transport support
nvme-fabrics: set sqe.command_id in core not transports
parser: add u64 number parser
nvme-rdma: align to generic ib_event logging helper
...
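The hybrid polling mode mentioned in the block layer summary above (pure polling preceded by an initial sleep) can be illustrated with a simulated clock. This is only a sketch of the idea, not the kernel implementation: the types, the function name, and the time model are hypothetical, and how the kernel actually picks the sleep length is not described in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one I/O that completes at a fixed simulated time. */
struct sim_io {
	uint64_t done_at_ns;
};

/* When the waiter noticed the completion and how many polls it spent. */
struct wait_result {
	uint64_t noticed_at_ns;
	unsigned int polls;
};

/*
 * Hybrid wait: sleep for sleep_ns first (the CPU is free during that time),
 * then check for completion every poll_step_ns until the I/O is done.
 * With sleep_ns == 0 this degenerates to pure polling.
 */
static struct wait_result hybrid_wait(const struct sim_io *io,
				      uint64_t start_ns, uint64_t sleep_ns,
				      uint64_t poll_step_ns)
{
	struct wait_result r = { .noticed_at_ns = start_ns + sleep_ns, .polls = 0 };

	assert(poll_step_ns > 0);	/* a zero step would never advance time */

	for (;;) {
		r.polls++;
		if (r.noticed_at_ns >= io->done_at_ns)
			return r;
		r.noticed_at_ns += poll_step_ns;
	}
}
```

In this model a well-chosen sleep cuts the number of polls without delaying the point at which completion is noticed, while a sleep longer than the I/O itself trades latency for idle CPU time.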
Pull drm updates from Dave Airlie:
"This is the main pull request for drm for 4.10 kernel.
New drivers:
- ZTE VOU display driver (zxdrm)
- Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson)
- MXSFB support (mxsfb)
Core:
- Format handling has been reworked
- Better atomic state debugging
- drm_mm leak debugging
- Atomic explicit fencing support
- fbdev helper ops
- Documentation updates
- MST fbcon fixes
Bridge:
- Silicon Image SiI8620 driver
Panel:
- Add support for new simple panels
i915:
- GVT Device model
- Better HDMI2.0 support on skylake
- More watermark fixes
- GPU idling rework for suspend/resume
- DP Audio workarounds
- Scheduler prep-work
- Opregion CADL handling
- GPU scheduler and priority boosting
amdgpu/radeon:
- Support for virtual devices
- New VM manager for non-contig VRAM buffers
- UVD powergating
- SI register header cleanup
- Cursor fixes
- Power management fixes
nouveau:
- Power management reworks for better voltage/clock changes
- Atomic modesetting support
- Displayport Multistream (MST) support.
- GP102/104 hang and cursor fixes
- GP106 support
hisilicon:
- hibmc support (BMC chip for aarch64 servers)
armada:
- add tracing support for overlay change
- refactor plane support
- de-midlayer the driver
omapdrm:
- Timing code cleanups
rcar-du:
- R8A7792/R8A7796 support
- Misc fixes.
sunxi:
- A31 SoC display engine support
imx-drm:
- YUV format support
- Cleanup plane atomic update
mali-dp:
- Misc fixes
dw-hdmi:
- Add support for HDMI i2c master controller
tegra:
- IOMMU support fixes
- Error handling fixes
tda998x:
- Fix connector registration
- Improved robustness
- Fix infoframe/audio compliance
virtio:
- fix busid issues
- allocate more vbufs
qxl:
- misc fixes and cleanups.
vc4:
- Fragment shader threading
- ETC1 support
- VEC (tv-out) support
msm:
- A5XX GPU support
- Lots of atomic changes
tilcdc:
- Misc fixes and cleanups.
etnaviv:
- Fix dma-buf export path
- DRAW_INSTANCED support
- fix driver on i.MX6SX
exynos:
- HDMI refactoring
fsl-dcu:
- fbdev changes"
* tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits)
drm/nouveau/kms/nv50: fix atomic regression on original G80
drm/nouveau/bl: Do not register interface if Apple GMUX detected
drm/nouveau/bl: Assign different names to interfaces
drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2
drm/nouveau/ltc: protect clearing of comptags with mutex
drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap
drm/nouveau/core: recognise GP106 chipset
drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas
drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode
drm/nouveau/gr/gf100-: properly ack all FECS error interrupts
drm/nouveau/fifo/gf100-: recover from host mmu faults
drm: Add fake controlD* symlinks for backwards compat
drm/vc4: Don't use drm_put_dev
drm/vc4: Document VEC DT binding
drm/vc4: Add support for the VEC (Video Encoder) IP
drm: Add TV connector states to drm_connector_state
drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum
drm/vc4: Fix ->clock_select setting for the VEC encoder
drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well
drm/amdgpu: use pin rather than pin_restricted in a few cases
...
Pull VFIO updates from Alex Williamson:
- VFIO updates for v4.10 primarily include a new Mediated Device
interface, which essentially allows software defined devices to be
exposed to users through VFIO. The host vendor driver providing this
virtual device polices, or mediates user access to the device.
These devices often incorporate portions of real devices, for
instance the primary initial users of this interface expose vGPUs
which allow the user to map mediated devices, or mdevs, to a portion
of a physical GPU. QEMU composes these mdevs into PCI representations
using the existing VFIO user API. This enables both Intel KVM-GT
support, which is also expected to arrive into Linux mainline during
the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O
devices (aka CCW devices) for s390 virtualization support. (Kirti
Wankhede, Neo Jia)
- Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin)
- Fixes to VFIO capability chain handling (Eric Auger)
- Error handling fixes for fallout from mdev (Christophe JAILLET)
- Notifiers to expose struct kvm to mdev vendor drivers (Jike Song)
- type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia)
* tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages
vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP.
vfio iommu type1: WARN_ON if notifier block is not unregistered
kvm: set/clear kvm to/from vfio_group when group add/delete
vfio: support notifier chain in vfio_group
vfio: vfio_register_notifier: classify iommu notifier
vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'
vfio: fix vfio_info_cap_add/shift
vfio/pci: Drop unnecessary pcibios_err_to_errno()
MAINTAINERS: Add entry VFIO based Mediated device drivers
docs: Sample driver to demonstrate how to use Mediated device framework.
docs: Sysfs ABI for mediated device framework
docs: Add Documentation for Mediated devices
vfio: Define device_api strings
vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
vfio: Introduce vfio_set_irqs_validate_and_prepare()
vfio_pci: Update vfio_pci to use vfio_info_add_capability()
vfio: Introduce common function to add capabilities
vfio iommu: Add blocking notifier to notify DMA_UNMAP
...
Pull pstore updates from Kees Cook:
"Improvements and fixes to pstore subsystem:
- add additional checks for bad platform data
- remove bounce buffer in console writer
- protect read/unlink race with a mutex
- correctly give up during dump locking failures
- increase ftrace bandwidth by splitting ftrace buffers per CPU"
* tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ramoops: add pdata NULL check to ramoops_probe
pstore: Convert console write to use ->write_buf
pstore: Protect unlink with read_mutex
pstore: Use global ftrace filters for function trace filtering
ftrace: Provide API to use global filtering for ftrace ops
pstore: Clarify context field przs as dprzs
pstore: improve error report for failed setup
pstore: Merge per-CPU ftrace records into one
pstore: Add ftrace timestamp counter
ramoops: Split ftrace buffer space into per-CPU zones
pstore: Make ramoops_init_przs generic for other prz arrays
pstore: Allow prz to control need for locking
pstore: Warn on PSTORE_TYPE_PMSG using deprecated function
pstore: Make spinlock per zone instead of global
pstore: Actually give up during locking failure
Pull EDAC updates from Borislav Petkov:
- add KNM support to sb_edac (Piotr Luc)
- add AMD Zen support to amd64_edac (Yazen Ghannam)
- misc small cleanups, improvements and fixes (Colin Ian King, Dave
Hansen, Pan Bian, Thor Thayer, Wei Yongjun, Yanjiang Jin, yours
truly)
* tag 'edac_for_4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (26 commits)
EDAC, amd64: Fix improper return value
EDAC, amd64: Improve amd64-specific printing macros
EDAC, amd64: Autoload amd64_edac_mod on Fam17h systems
EDAC, amd64: Define and register UMC error decode function
EDAC, amd64: Determine EDAC capabilities on Fam17h systems
EDAC, amd64: Determine EDAC MC capabilities on Fam17h
EDAC, amd64: Add Fam17h debug output
EDAC, amd64: Add Fam17h scrubber support
EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4
EDAC, amd64: Read MC registers on AMD Fam17h
EDAC, amd64: Reserve correct PCI devices on AMD Fam17h
EDAC, amd64: Add AMD Fam17h family type and ops
EDAC, amd64: Extend ecc_enabled() to Fam17h
EDAC, amd64: Don't force-enable ECC checking on newer systems
EDAC, amd64: Add Deferred Error type
EDAC, amd64: Rename __log_bus_error() to be more specific
EDAC, amd64: Change target of pci_name from F2 to F3
EDAC, mce_amd: Rename nb_bus_decoder to dram_ecc_decoder
EDAC: Add LRDDR4 DRAM type
EDAC, mpc85xx: Implement remove method for the platform driver
...
Pull thermal management updates from Zhang Rui:
- Thermal core code reorganization and cleanup. Two new files are
created for thermal sysfs I/F code and thermal helper functions
(Eduardo Valentin).
- Sanitize hotplug and locking for x86_pkg_temp driver (Thomas
Gleixner)
- Update MAINTAINER file for pwm-fan driver and Samsung thermal driver
(Lukasz Majewski)
- Fix module auto-load for max77620, tango and db8500 thermal driver
(Javier Martinez Canillas)
- Fix a bug that thermal hwmon sysfs I/F returns wrong critical trip
point temperature value (Krzysztof Kozlowski)
- Add Skylake PCH 100 series support for intel_pch_thermal driver
(OGAWA Hirofumi)
- Small fixes and cleanups for platform thermal drivers (Julia Lawall,
Luis Henriques, Leo Yan, Stephen Boyd, Shawn Lin, Javi Merino and
Lukasz Luba)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (76 commits)
MAINTAINERS: Samsung: Update maintainer for PWM FAN and SAMSUNG THERMAL
thermal/x86 pkg temp: Convert to hotplug state machine
thermal/x86_pkg_temp: Sanitize package management
thermal/x86_pkg_temp: Move work into package struct
thermal/x86_pkg_temp: Move work scheduled flag into package struct
thermal/x86_pkg_temp: Sanitize locking
thermal/x86_pkg_temp: Cleanup code some more
thermal/x86_pkg_temp: Cleanup namespace
thermal/x86_pkg_temp: Get rid of ref counting
thermal/x86_pkg_temp: Sanitize callback (de)initialization
thermal/x86_pkg_temp: Replace open coded cpu search
thermal/x86_pkg_temp: Remove redundant package search
thermal/x86_pkg_temp: Cleanup thermal interrupt handling
thermal: hwmon: Properly report critical temperature in sysfs
devfreq_cooling: pass a pointer to devfreq in the power model callbacks
devfreq_cooling: make the structs devfreq_cooling_xxx visible for all
dt-bindings: rockchip-thermal: fix the misleading description
thermal: rockchip: improve the warning log
thermal: db8500: Fix module autoload
thermal: tango: Fix module autoload
...
Pull clk updates from Stephen Boyd:
"This is a fairly quiet release. We don't have any patches to the core
framework. The only patch that can even be considered "core" adds
another clk_get() variant. The rest of the changes are in drivers for
various SoCs, and we have a few bits for ARM shmobile architecture
code (dts and mach) due to the dependency we're breaking between
shmobile architecture code and its clk driver. Those shmobile bits
have also been pulled into arm-soc tree. Here's the summary:
Core:
- Support for devm_get_clk_from_child() used with DT bindings that
have subnodes with the 'clocks' property
New Drivers:
- Allwinner A64 (sun50i)
- i.MX imx6ull
- Socionext's UniPhier SoC CPUs
- Mediatek MT2701 SoCs
- Rockchip rk1108 SoCs
- Qualcomm MSM8994/MSM8992 SoCs
- Qualcomm RPM Clocks
- Hisilicon Hi3516CV300 and Hi3798CV200 CRG
- Oxford Semiconductor OX820 and OX810SE SoCs
- Renesas RZ/G1M and RZ/G1E SoCs
- Renesas R-Car RST driver for mode pin states
Updates:
- Four Allwinner SoCs are migrated to the new style clk driver
- Rockchip rk3399,rk3066 PLL optimizations
- i.MX LVDS display glitch fixes and AV PLL precision improvements
- Qualcomm MSM8996 GPU GDSCs, hw controlled GDSCs, and Alpha PLL
support
- Explicit demodularization of always builtin drivers
- Freescale Qoriq ls1012a and ls1046a support
- Exynos 5433 parent typo fix and critical clock tagging
- Renesas r8a7743/r8a7745 CPG
- Renesas R-Car M3-W CSI2/VIN/SYS-DMAC/(H)SCIF/I2C/DRIF/gfx support
- stm32f4* LSI, LSE, RTC, and QSPI clocks
- pxa27x and pxa25x cpufreq as clks
- TI omap36xx sprz319 advisory 2.1 workaround
- Broadcom bcm2835 rate change propagation to PLLH_AUX from VEC"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (150 commits)
clk: bcm: Fix 'maybe-uninitialized' warning in bcm2835_clock_choose_div_and_prate()
clk: add devm_get_clk_from_child() API
clk: st: clk-flexgen: Unmap region obtained by of_iomap
clk: keystone: pll: Unmap region obtained by of_iomap
clk:mmp:clk-of-mmp2: Free memory and Unmap region obtained by kzalloc and of_iomap
clk:mmp:clk-of-pxa910: Free memory and Unmap region obtained by kzmalloc and of_iomap
clk: mmp: clk-of-pxa1928: Free memory obtained by kzalloc
clk: cdce925: Fix limit check
clk: bcm: Make COMMON_CLK_IPROC into a library
clk: qoriq: added ls1012a clock configuration
clk: ti: dra7: fix "failed to lookup clock node gmac_gmii_ref_clk_div" boot message
clk: bcm: Allow rate change propagation to PLLH_AUX on VEC clock
clk: bcm: Support rate change propagation on bcm2835 clocks
clk: bcm2835: Avoid overwriting the div info when disabling a pll_div clk
clk: ti: omap36xx: Work around sprz319 advisory 2.1
clk: clk-wm831x: fix a logic error
clk: uniphier: add cpufreq data for LD11, LD20 SoCs
clk: uniphier: add CPU-gear change (cpufreq) support
clk: qcom: Put venus core0/1 gdscs to hw control mode
clk: qcom: gdsc: Add support for gdscs with HW control
...
Pull rpmsg updates from Bjorn Andersson:
"Argument validation in public functions, function stubs for
COMPILE_TEST-ing clients, preparation for exposing rpmsg endpoints
to user space and minor Qualcomm SMD fixes"
* tag 'rpmsg-v4.10' of git://github.com/andersson/remoteproc:
dt-binding: soc: qcom: smd: Add label property
rpmsg: qcom_smd: Correct return value for O_NONBLOCK
rpmsg: Provide function stubs for API
rpmsg: Handle invalid parameters in public API
rpmsg: Support drivers without primary endpoint
rpmsg: Introduce a driver override mechanism
rpmsg: smd: Reduce restrictions when finding channel
Pull remoteproc updates from Bjorn Andersson:
- introduce remoteproc "subdevice" support, which allows remoteproc
driver to associate devices to the "running" state of the remoteproc,
allowing devices to be probed and removed as the remote processor is
booted, shut down or recovering from a crash.
- handling of virtio device resources was improved, vring memory is now
allocated as part of other memory allocation. This ensures that all
vrings for all virtio devices are allocated before we boot the remote
processor.
- the debugfs mechanism for starting and stopping remoteproc instances
was replaced with a sysfs interface, also providing a mechanism for
specifying firmware to use by the instance. This allows user space to
load and boot use case specific firmware on remote processors.
- new drivers for the ST Slimcore and Qualcomm Hexagon DSP as well as
removal of the unused StE modem loader.
- finally support for crash recovery in the Qualcomm Wireless subsystem
(used for WiFi/BT/FM on a number of platforms) and a number of bug
fixes and cleanups
* tag 'rproc-v4.10' of git://github.com/andersson/remoteproc: (49 commits)
remoteproc: qcom_adsp_pil: select qcom_scm
remoteproc: Drop wait in __rproc_boot()
remoteproc/ste: Delete unused driver
remoteproc: Remove "experimental" warning
remoteproc: qcom_adsp_pil: select qcom_scm
dt-binding: soc: qcom: smd: Add label property
remoteproc: qcom: mdt_loader: add include for sizes
remoteproc: Update last rproc_put users to rproc_free
remoteproc: qcom: adsp: Add missing MODULE_DEVICE_TABLE
remoteproc: wcnss-pil: add QCOM_SMD dependency
dmaengine: st_fdma: Revert: "Revert: Update st_fdma to 'depends on REMOTEPROC'"
remoteproc: Add support for xo clock
remoteproc: adsp-pil: fix recursive dependency
remoteproc: Introduce Qualcomm ADSP PIL
dt-binding: remoteproc: Introduce ADSP loader binding
remoteproc: qcom_wcnss: Fix circular module dependency
remoteproc: Merge table_ptr and cached_table pointers
remoteproc: Remove custom vdev handler list
remoteproc: Update max_notifyid as we allocate vrings
remoteproc: Decouple vdev resources and devices
...
Pull MMC updates from Ulf Hansson:
"It's been a busy period for mmc. Quite some changes in the mmc core,
two new mmc host drivers, some existing drivers being extended to
support new IP versions and lots of other updates.
MMC core:
- Delete eMMC packed command support
- Introduce mmc_abort_tuning() to enable eMMC tuning to fail
gracefully
- Introduce mmc_can_retune() to see if a host can be retuned
- Re-work and improve the sequence when sending a CMD6 for mmc
- Enable CMD13 polling when switching to HS and HS DDR mode for mmc
- Relax checking for CMD6 errors after switch to HS200
- Re-factoring the code dealing with the mmc block queue
- Recognize whether the eMMC card supports CMDQ
- Fix 4K native sector check
- Don't power off the card when starting the host
- Increase MMC_IOC_MAX_BYTES to support bigger firmware binaries
- Improve error handling and drop meaningless BUG_ON()s
- Lots of clean-ups and changes to improve the quality of the code
MMC host:
- sdhci: Fix tuning sequence and clean-up the related code
- sdhci: Add support for overriding broken SDHCI cap register bits
via DT
- sdhci-cadence: Add new driver for Cadence SD4HC SDHCI variant
- sdhci-msm: Update clock management
- sdhci-msm: Add support for eMMC HS400 mode
- sdhci-msm: Deploy runtime/system PM support
- sdhci-iproc: Extend driver support to newer IP versions
- sdhci-pci: Add support for Intel GLK
- sdhci-pci: Add support for Intel NI byt sdio
- sdhci-acpi: Add support for 80860F14 UID 2 SDIO bus
- sdhci: Lots of various small improvements and clean-ups
- tmio: Add support for tuning
- sh_mobile_sdhi: Add support for tuning
- sh_mobile_sdhi: Extend driver to support SDHI IP on R7S72100 SoC
- sh_mobile_sdhi: remove support for sh7372
- davinci: Use mmc_of_parse() to enable generic mmc DT bindings
- meson: Add new driver to support GX platforms
- dw_mmc: Deploy generic runtime/system PM support
- dw_mmc: Lots of various small improvements
As a part of the mmc changes this time, I have also pulled in an
immutable branch/tag (soc-device-match-tag1) hosted by Geert
Uytterhoeven, to share the implementation of the new
soc_device_match() interface. This is needed by these mmc related
changes:
- mmc: sdhci-of-esdhc: Get correct IP version for T4240-R1.0-R2.0
- soc: fsl: add GUTS driver for QorIQ platforms"
* tag 'mmc-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (136 commits)
mmc: sdhci-cadence: add Cadence SD4HC support
mmc: sdhci: export sdhci_execute_tuning()
mmc: sdhci: Tidy tuning loop
mmc: sdhci: Simplify tuning block size logic
mmc: sdhci: Factor out tuning helper functions
mmc: sdhci: Use mmc_abort_tuning()
mmc: mmc: Introduce mmc_abort_tuning()
mmc: sdhci: Always allow tuning to fall back to fixed sampling
mmc: sdhci: Fix tuning reset after exhausting the maximum number of loops
mmc: sdhci: Fix recovery from tuning timeout
Revert "mmc: sdhci: Reset cmd and data circuits after tuning failure"
mmc: mmc: Relax checking for switch errors after HS200 switch
mmc: sdhci-acpi: support 80860F14 UID 2 SDIO bus
mmc: sdhci-of-at91: remove bogus MMC_SDHCI_IO_ACCESSORS select
mmc: sdhci-pci: Use ACPI to get max frequency for Intel NI byt sdio
mmc: sdhci-pci: Add PCI ID for Intel NI byt sdio
mmc: sdhci-s3c: add spin_unlock_irq() before calling clk_round_rate
mmc: dw_mmc: display the clock message only one time when card is polling
mmc: dw_mmc: add the debug message for polling and non-removable
mmc: dw_mmc: check the "present" variable before checking flags
...
Pull regulator updates from Mark Brown:
"A quiet release for the regulator API, conference season must've been
slowing everyone down:
- a new interface allowing drivers to provide an interface for
reading a more detailed description of error conditions which
allows devices using these regulators to build
- ACPI support for the fixed voltage regulator.
- cleanups for the TI TWL drivers to reduce code duplication"
* tag 'regulator-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (24 commits)
regulator: tps65086: Fix 25mV ranges for BUCK regulators 1, 2, and 6
regulator: Fix regulator_get_error_flags() signature mismatch
regulator: core: add newline in debug message
regulator: tps65086: Fix 25mV ranges for BUCK regulators
regulator: core: Correct type of mode in regulator_mode_constrain
regulator: max77620: add documentation for MPOK property
regulator: max77620: add support to configure MPOK
regulator: twl6030: Remove unused fields from struct twlreg_info
regulator: twl: Remove unused fields from struct twlreg_info
regulator: twl: split twl6030 logic into its own file
regulator: twl: kill unused functions
regulator: twl: make driver DT only
regulator: twl-regulator: rework fixed regulator definition
regulator: max77620: remove unused variable
regulator: pwm: Add missing quotes to DT example
regulator: stw481x-vmmc: fix ages old enable error
regulator: gpio: properly check return value of of_get_named_gpio
regulator: lp873x: Add support for populating input supply
regulator: axp20x: Fix axp809 ldo_io registration error on cold boot
regulators: helpers: Fix handling of bypass_val_on in get_bypass_regmap
...
Pull LED updates from Jacek Anaszewski:
- userspace LED class driver - it can be useful for testing triggers
and can also be used to implement virtual LEDs
- LED class driver for NIC78bx device
- LED core fixes for preventing potential races while setting
brightness when software blinking is enabled
- improvements in LED documentation to mention semantics on changing
brightness while trigger is active
* tag 'leds_for_4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: pca955x: Add ACPI support
leds: netxbig: fix module autoload for OF registration
leds: pca963x: Add ACPI support
leds: leds-cobalt-raq: use builtin_platform_driver
led: core: Fix blink_brightness setting race
led: core: Use atomic bit-field for the blink-flags
leds: Add user LED driver for NIC78bx device
leds: verify vendor and change license in mlxcpld driver
leds: pca963x: enable low-power state
leds: pca9532: Use default trigger value from platform data
leds: pca963x: workaround group blink scaling issue
cleanup LED documentation and make it match reality
leds: lp3952: Export I2C module alias information for module autoload
leds: mc13783: Fix MC13892 keypad led access
ledtrig-cpu.c: fix english
leds/leds-lp5523.txt: make documentation match reality
tools/leds: Add uledmon program for monitoring userspace LEDs
leds: Use macro for max device node name size
leds: Introduce userspace LED class driver
mfd: qcom-pm8xxx: Clean up PM8XXX namespace
Pull pinctrl updates from Linus Walleij:
"Bulk pin control changes for the v4.10 kernel cycle:
No core changes this time. Mainly gradual improvement and
feature growth in the drivers.
New drivers:
- New driver for TI DA850/OMAP-L138/AM18XX pinconf
- The SX150x was moved over from the GPIO subsystem and reimagined as
a pin control driver with GPIO support in a joint effort by three
independent users of this hardware. The result was amazingly good!
- New subdriver for the Oxnas OX820
Improvements:
- The sunxi driver now supports the generic pin control bindings
rather than the sunxi-specific. Add debouncing support to the
driver.
- Simplifications in pinctrl-single adding a generic parser.
- Two downstream fixes and move the Raspberry Pi BCM2835 over to use
the generic GPIOLIB_IRQCHIP"
* tag 'pinctrl-v4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (92 commits)
pinctrl: sx150x: use new nested IRQ infrastructure
pinctrl: sx150x: handle missing 'advanced' reg in sx1504 and sx1505
pinctrl: sx150x: rename 'reg_advance' to 'reg_advanced'
pinctrl: sx150x: access the correct bits in the 4-bit regs of sx150[147]
pinctrl: mt8173: set GPIO16 to usb iddig mode
pinctrl: bcm2835: switch to GPIOLIB_IRQCHIP
pinctrl: New driver for TI DA850/OMAP-L138/AM18XX pinconf
devicetree: bindings: pinctrl: Add binding for ti,da850-pupd
Documentation: pinctrl: palmas: Add ti,palmas-powerhold-override property definition
pinctrl: intel: set default handler to be handle_bad_irq()
pinctrl: sx150x: add support for sx1501, sx1504, sx1505 and sx1507
pinctrl: sx150x: sort chips by part number
pinctrl: sx150x: use correct registers for reg_sense (sx1502 and sx1508)
pinctrl: imx: fix imx_pinctrl_desc initialization
pinctrl: sx150x: support setting multiple pins at once
pinctrl: sx150x: various spelling fixes and some white-space cleanup
pinctrl: mediatek: use builtin_platform_driver
pinctrl: stm32: use builtin_platform_driver
pinctrl: sunxi: Testing the wrong variable
pinctrl: nomadik: split up and comments MC0 pins
...
Pull GPIO updates from Linus Walleij:
"Bulk GPIO changes for the v4.10 kernel cycle:
Core changes:
- Simplify threaded interrupt handling: instead of passing numbed
parameters to gpiochip_irqchip_add_chained() we create a new call:
gpiochip_irqchip_add_nested() so the two types are clearly
semantically different. Also make sure that all nested chips call
gpiochip_set_nested_irqchip() which is necessary for IRQ resend to
work properly if it happens.
- Return error on seek operations for the chardev.
- Clamp values set as part of gpio[d]_direction_output() so that
anything != 0 will be sent down to the driver as "1" not the value
passed in.
- ACPI can now support naming of GPIO lines, hogs and holes in the
GPIO lists.
New drivers:
- The SX150x driver was deemed unfit for the GPIO subsystem and was
moved over to a combined GPIO+pinctrl driver in the pinctrl
subsystem.
New features:
- Various cleanups to various drivers"
* tag 'gpio-v4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (49 commits)
gpio: merrifield: Implement gpio_get_direction callback
gpio: merrifield: Add support for hardware debouncer
gpio: chardev: Return error for seek operations
gpio: arizona: Tidy up probe error path
gpio: arizona: Remove pointless set of platform drvdata
gpio: pl061: delete platform data handling
gpio: pl061: move platform data into driver
gpio: pl061: rename variable from chip to pl061
gpio: pl061: rename state container struct
gpio: pl061: use local state for parent IRQ storage
gpio: set explicit nesting on drivers
gpio: simplify adding threaded interrupts
gpio: vf610: use builtin_platform_driver
gpio: axp209: use correct register for GPIO input status
gpio: stmpe: fix interrupt handling bug
gpio: em: depnd on ARCH_SHMOBILE
gpio: zx: depend on ARCH_ZX
gpio: x86: update config dependencies for x86 specific hardware
gpio: mb86s7x: use builtin_platform_driver
gpio: etraxfs: use builtin_platform_driver
...
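The value clamping described in the GPIO core changes above (anything != 0 reaches the driver as 1) amounts to a boolean normalisation before the driver callback runs. A minimal sketch follows; the function names and the recording variable are hypothetical stand-ins for this example, not the gpiolib code.

```c
#include <assert.h>

/* Records what the hypothetical driver callback last received. */
static int last_value_seen_by_driver = -1;

/* Hypothetical stand-in for a driver's direction_output callback. */
static int sketch_driver_direction_output(unsigned int offset, int value)
{
	(void)offset;
	last_value_seen_by_driver = value;
	return 0;
}

/* Core-side wrapper: normalise any non-zero request to 1 before the driver sees it. */
static int sketch_direction_output(unsigned int offset, int value)
{
	return sketch_driver_direction_output(offset, !!value);
}
```

With this wrapper a caller passing a mask such as 0x80 or a negative number still results in the driver being asked to drive the line to 1, so drivers do not need to handle arbitrary integers.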
The comment on the name indirection suggested an issue, but that turned
out to be untrue. Digging through older kernel versions showed an issue
with ipw2x00, but that is no longer the case, so get rid of the name
indirection.
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Again move the information relevant for driver writers next to the
callbacks.
- Put the overview and userspace interface documentation into a DOC:
section within the code.
- Remove the text that mmap needs to be coherent - since the
DMA_BUF_IOCTL_SYNC landed that's no longer the case. But keep the text
that for pte zapping exporters need to adjust the address space.
- Add a FIXME that kmap and the new begin/end stuff used by the SYNC
ioctl don't really mix correctly. That's something I just realized
while doing this doc rework.
- Augment function and structure docs like usual.
Cc: linux-doc@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
[sumits: fix cosmetic issues]
Link: http://patchwork.freedesktop.org/patch/msgid/20161209185309.1682-5-daniel.vetter@ffwll.ch
- Put the initial overview for dma-buf into dma-buf.rst.
- Put all the comments about detailed semantics into the right
kernel-doc comment for functions or ops structure member.
- To allow that detail, switch the reworked kerneldoc to inline style
for dma_buf_ops.
- Tie everything together into a much more streamlined overview
comment, relying on the hyperlinks for all the details.
- Also sprinkle some links into the kerneldoc for dma_buf and
dma_buf_attachment to tie it all together.
Cc: linux-doc@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161209185309.1682-4-daniel.vetter@ffwll.ch
Backmerge the docs-next branch from Jon into drm-misc so that we can
apply the dma-buf documentation cleanup patches. Git found a conflict
where there was none because both drm-misc and docs had identical
patches to clean up file rename issues in the rst include directives.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Pull documentation update from Jonathan Corbet:
"These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion
continues. Highlights include:
- Further work on PDF output, which remains a bit of a pain but
should be more solid now.
- Five more DocBook template files converted to Sphinx. Only 27 to
go... Lots of plain-text files have also been converted and
integrated.
- Images in binary formats have been replaced with more
source-friendly versions.
- Various bits of organizational work, including the renaming of
various files discussed at the kernel summit.
- New documentation for the device_link mechanism.
... and, of course, lots of typo fixes and small updates"
* tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
dma-buf: Extract dma-buf.rst
Update Documentation/00-INDEX
docs: 00-INDEX: document directories/files with no docs
docs: 00-INDEX: remove non-existing entries
docs: 00-INDEX: add missing entries for documentation files/dirs
docs: 00-INDEX: consolidate process/ and admin-guide/ description
scripts: add a script to check if Documentation/00-INDEX is sane
Docs: change sh -> awk in REPORTING-BUGS
Documentation/core-api/device_link: Add initial documentation
core-api: remove an unexpected unident
ppc/idle: Add documentation for powersave=off
Doc: Correct typo, "Introdution" => "Introduction"
Documentation/atomic_ops.txt: convert to ReST markup
Documentation/local_ops.txt: convert to ReST markup
Documentation/assoc_array.txt: convert to ReST markup
docs-rst: parse-headers.pl: cleanup the documentation
docs-rst: fix media cleandocs target
docs-rst: media/Makefile: reorganize the rules
docs-rst: media: build SVG from graphviz files
docs-rst: replace bayer.png by a SVG image
...
Merge updates from Andrew Morton:
- various misc bits
- most of MM (quite a lot of MM material is awaiting the merge of
linux-next dependencies)
- kasan
- printk updates
- procfs updates
- MAINTAINERS
- /lib updates
- checkpatch updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits)
init: reduce rootwait polling interval time to 5ms
binfmt_elf: use vmalloc() for allocation of vma_filesz
checkpatch: don't emit unified-diff error for rename-only patches
checkpatch: don't check c99 types like uint8_t under tools
checkpatch: avoid multiple line dereferences
checkpatch: don't check .pl files, improve absolute path commit log test
scripts/checkpatch.pl: fix spelling
checkpatch: don't try to get maintained status when --no-tree is given
lib/ida: document locking requirements a bit better
lib/rbtree.c: fix typo in comment of ____rb_erase_color
lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
MAINTAINERS: add drm and drm/i915 irc channels
MAINTAINERS: add "C:" for URI for chat where developers hang out
MAINTAINERS: add drm and drm/i915 bug filing info
MAINTAINERS: add "B:" for URI where to file bugs
get_maintainer: look for arbitrary letter prefixes in sections
printk: add Kconfig option to set default console loglevel
printk/sound: handle more message headers
printk/btrfs: handle more message headers
printk/kdb: handle more message headers
...
Pull irq updates from Thomas Gleixner:
"The irq department provides:
- a major update to the auto affinity management code, which is used
by multi-queue devices
- move of the microblaze irq chip driver into the common driver code
so it can be shared between microblaze, powerpc and MIPS
- a series of updates to the ARM GICV3 interrupt controller
- the usual pile of fixes and small improvements all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
powerpc/virtex: Use generic xilinx irqchip driver
irqchip/xilinx: Try to fall back if xlnx,kind-of-intr not provided
irqchip/xilinx: Add support for parent intc
irqchip/xilinx: Rename get_irq to xintc_get_irq
irqchip/xilinx: Restructure and use jump label api
irqchip/xilinx: Clean up print messages
microblaze/irqchip: Move intc driver to irqchip
ARM: virt: Select ARM_GIC_V3_ITS
ARM: gic-v3-its: Add 32bit support to GICv3 ITS
irqchip/gic-v3-its: Specialise readq and writeq accesses
irqchip/gic-v3-its: Specialise flush_dcache operation
irqchip/gic-v3-its: Narrow down Entry Size when used as a divider
irqchip/gic-v3-its: Change unsigned types for AArch32 compatibility
irqchip/gic-v3: Use nops macro for Cavium ThunderX erratum 23154
irqchip/gic-v3: Convert arm64 GIC accessors to {read,write}_sysreg_s
genirq/msi: Drop artificial PCI dependency
irqchip/bcm7038-l1: Implement irq_cpu_offline() callback
genirq/affinity: Use default affinity mask for reserved vectors
genirq/affinity: Take reserved vectors into account when spreading irqs
PCI: Remove the irq_affinity mask from struct pci_dev
...
Pull timer updates from Thomas Gleixner:
"The time/timekeeping/timer folks deliver with this update:
- Fix a reintroduced signed/unsigned issue and clean up the whole
signed/unsigned mess in the timekeeping core so this won't happen
accidentally again.
- Add a new trace clock based on boot time
- Prevent injection of random sleep times when PM tracing abuses the
RTC for storage
- Make posix timers configurable for real tiny systems
- Add tracepoints for the alarm timer subsystem so timer based
suspend wakeups can be instrumented
- The usual pile of fixes and updates to core and drivers"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
timekeeping: Use mul_u64_u32_shr() instead of open coding it
timekeeping: Get rid of pointless typecasts
timekeeping: Make the conversion call chain consistently unsigned
timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
alarmtimer: Add tracepoints for alarm timers
trace: Update documentation for mono, mono_raw and boot clock
trace: Add an option for boot clock as trace clock
timekeeping: Add a fast and NMI safe boot clock
timekeeping/clocksource_cyc2ns: Document intended range limitation
timekeeping: Ignore the bogus sleep time if pm_trace is enabled
selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
arm64: dts: rockchip: Arch counter doesn't tick in system suspend
clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
posix-timers: Make them configurable
posix_cpu_timers: Move the add_device_randomness() call to a proper place
timer: Move sys_alarm from timer.c to itimer.c
ptp_clock: Allow for it to be optional
Kconfig: Regenerate *.c_shipped files after previous changes
...
Pull smp hotplug updates from Thomas Gleixner:
"This is the final round of converting the notifier mess to the state
machine. The removal of the notifiers and the related infrastructure
will happen around rc1, as there are conversions outstanding in other
trees.
The whole exercise removed about 2000 lines of code in total and in
the course of the conversion several dozen bugs got fixed. The new
mechanism allows testing almost every hotplug step standalone, so
usage sites can exercise all transitions extensively.
There is more room for improvement, like integrating all the
pointlessly different architecture mechanisms of synchronizing,
setting cpus online etc into the core code"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
tracing/rb: Init the CPU mask on allocation
soc/fsl/qbman: Convert to hotplug state machine
soc/fsl/qbman: Convert to hotplug state machine
zram: Convert to hotplug state machine
KVM/PPC/Book3S HV: Convert to hotplug state machine
arm64/cpuinfo: Convert to hotplug state machine
arm64/cpuinfo: Make hotplug notifier symmetric
mm/compaction: Convert to hotplug state machine
iommu/vt-d: Convert to hotplug state machine
mm/zswap: Convert pool to hotplug state machine
mm/zswap: Convert dst-mem to hotplug state machine
mm/zsmalloc: Convert to hotplug state machine
mm/vmstat: Convert to hotplug state machine
mm/vmstat: Avoid on each online CPU loops
mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
tracing/rb: Convert to hotplug state machine
oprofile/nmi timer: Convert to hotplug state machine
net/iucv: Use explicit clean up labels in iucv_init()
x86/pci/amd-bus: Convert to hotplug state machine
x86/oprofile/nmi: Convert to hotplug state machine
...
Add a configuration option to set the default console loglevel. This
is, as before, still possible to override at runtime through bootargs
(loglevel=<x>), sysrq and /proc/printk.
There are cases where adding additional arguments on the commandline is
impractical, and changing the default for the kernel when being built
makes more sense. Provide such a method here, for those who choose to
do so.
Also, while touching this code, clarify the difference between
MESSAGE_LOGLEVEL_DEFAULT and CONSOLE_LOGLEVEL_DEFAULT.
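The precedence described above (a build-time default that a loglevel=<x>
boot argument can still override) can be pictured with a small userspace
model. This is only a sketch: the function name, the parsing and the value 7
used as the built-in default are illustrative assumptions, not the kernel's
code, and the sysrq and /proc/printk paths are not modelled.

```python
def console_loglevel(cmdline: str, built_in_default: int = 7) -> int:
    """Return the console loglevel a boot would start with in this toy model.

    built_in_default stands in for the value chosen at build time
    (hypothetical parameter); cmdline is the kernel command line.
    """
    level = built_in_default
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        # Only a well-formed loglevel=<digits> argument overrides the default.
        if key == "loglevel" and sep and value.isdigit():
            level = int(value)  # in this model a later occurrence wins
    return level
```

With no loglevel= argument the build-time value is used, so a kernel built
with a quieter default stays quiet without touching the bootargs.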
Link: http://lkml.kernel.org/r/1479676829-30031-1-git-send-email-olof@lixom.net
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 4bcc595ccd ("printk: reinstate KERN_CONT for printing
continuation lines") allows defining more than one message header for a
single message. The motivation is that continuation lines might get mixed.
Therefore it makes sense to define the right log level for every piece of
a cont line.
The current btrfs_printk() macros do not support continuation lines at the
moment. But better be prepared for custom messages and avoid a
potential "lvl" buffer overflow.
This patch iterates over the entire message header. It is interested
only in the message level, like the original code.
This patch also introduces PRINTK_MAX_SINGLE_HEADER_LEN. Two bytes
are enough for the message level header at the moment. But it used to
be three, see the commit 04d2c8c83d ("printk: convert the format for
KERN_<LEVEL> to a 2 byte pattern").
Also I fixed the default ratelimit level. It looked very strange when it
was different from the default log level.
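As a rough illustration of "iterate over the entire message header, but only
care about the level", here is a userspace sketch. It assumes each header is
a 2-byte pattern made of a start-of-header byte followed by one character,
where '0'..'7' are levels and 'c' and 'd' are non-level markers. The names
below are hypothetical and this is not the btrfs code.

```python
KERN_SOH = "\x01"  # assumed start-of-header byte of a 2-byte header


def split_headers(fmt: str, default_level: str = "4"):
    """Consume every leading header; return (level, remaining text)."""
    level = default_level
    while len(fmt) >= 2 and fmt[0] == KERN_SOH and fmt[1] in "01234567cd":
        if fmt[1] in "01234567":
            level = fmt[1]  # remember the level, ignore other header kinds
        fmt = fmt[2:]       # skip the header regardless of its kind
    return level, fmt
```

Because each header is consumed in a loop, a message carrying both a level
and a continuation marker never has to fit into one fixed-size buffer.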
[pmladek@suse.com: Fix a check of the valid message level]
Link: http://lkml.kernel.org/r/20161111183236.GD2145@dhcp128.suse.cz
Link: http://lkml.kernel.org/r/1478695291-12169-4-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
anon_vma_prepare() is mostly a large "if (unlikely(...))" block, as the
expected common case is that an anon_vma already exists. We could turn
the condition around and return 0, but it also makes sense to do it
inline and avoid a call for the common case.
Bloat-o-meter naturally shows that inlining the check has some code size
costs:
add/remove: 1/1 grow/shrink: 4/0 up/down: 475/-373 (102)
function                                     old     new   delta
__anon_vma_prepare                             -     359    +359
handle_mm_fault                             2744    2796     +52
hugetlb_cow                                 1146    1170     +24
hugetlb_fault                               2123    2145     +22
wp_page_copy                                1469    1487     +18
anon_vma_prepare                             373       -    -373
Checking the asm however confirms that the hot paths now avoid a call,
which is moved away.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20161116074005.22768-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an arch specific callback in the generic THP page cache code that will
deposit and withdraw the preallocated page table. Archs like ppc64 use this
preallocated table to store the hash pte slot information.
Testing:
kernel build of the patch series on tmpfs mounted with option huge=always
The related thp stat:
thp_fault_alloc 72939
thp_fault_fallback 60547
thp_collapse_alloc 603
thp_collapse_alloc_failed 0
thp_file_alloc 253763
thp_file_mapped 4251
thp_split_page 51518
thp_split_page_failed 1
thp_deferred_split_page 73566
thp_split_pmd 665
thp_zero_page_alloc 3
thp_zero_page_alloc_failed 0
[akpm@linux-foundation.org: remove unneeded parentheses, per Kirill]
Link: http://lkml.kernel.org/r/20161113150025.17942-2-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we track the shadow entries in the page cache in the upper
bits of the radix_tree_node->count, behind the back of the radix tree
implementation. Because the radix tree code has no awareness of them,
we rely on random subtleties throughout the implementation (such as the
node->count != 1 check in the shrinking code, which is meant to exclude
multi-entry nodes but also happens to skip nodes with only one shadow
entry, as that's accounted in the upper bits). This is error prone and
has, in fact, caused the bug fixed in d3798ae8c6 ("mm: filemap: don't
plant shadow entries without radix tree node").
To remove these subtleties, this patch moves shadow entry tracking from
the upper bits of node->count to the existing counter for exceptional
entries. node->count goes back to being a simple counter of valid
entries in the tree node and can be shrunk to a single byte.
This vastly simplifies the page cache code. All accounting happens
natively inside the radix tree implementation, and maintaining the LRU
linkage of shadow nodes is consolidated into a single function in the
workingset code that is called for leaf nodes affected by a change in
the page cache tree.
This also removes the last user of the __radix_delete_node() return
value. Eliminate it.
Link: http://lkml.kernel.org/r/20161117193211.GE23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bug in khugepaged fixed earlier in this series shows that radix tree
slot replacement is fragile; and it will become more so when not only
NULL<->!NULL transitions need to be caught but transitions from and to
exceptional entries as well. We need checks.
Re-implement radix_tree_replace_slot() on top of the sanity-checked
__radix_tree_replace(). This requires existing callers to also pass the
radix tree root, but it'll warn us when somebody replaces slots with
contents that need proper accounting (transitions between NULL entries,
real entries, exceptional entries) and where a replacement through the
slot pointer would corrupt the radix tree node counts.
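The kind of check meant here can be sketched in a few lines. In this toy
model a replacement done through a bare slot is allowed only when it leaves
the accounting unchanged, and warns otherwise. Exceptional entries are
modelled as Python ints and real entries as strings, purely as an assumption
for the example. The function name is hypothetical.

```python
import warnings


def is_exceptional(entry) -> bool:
    # Hypothetical stand-in: ints play the role of exceptional entries.
    return isinstance(entry, int)


def replace_slot_checked(slots, index, new):
    """Replace slots[index]; warn if node counters would need updating."""
    old = slots[index]
    count_delta = (new is not None) - (old is not None)
    exceptional_delta = is_exceptional(new) - is_exceptional(old)
    if count_delta or exceptional_delta:
        # The caller should have used the accounting-aware replacement.
        warnings.warn("slot replacement changes node accounting")
    slots[index] = new
    return count_delta, exceptional_delta
```

Swapping one real entry for another is silent; clearing a slot or switching
between a real and an exceptional entry trips the warning.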
Link: http://lkml.kernel.org/r/20161117193021.GB23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The way the page cache is sneaking shadow entries of evicted pages into
the radix tree past the node entry accounting and tracking them manually
in the upper bits of node->count is fraught with problems.
These shadow entries are marked in the tree as exceptional entries,
which are a native concept to the radix tree. Maintain an explicit
counter of exceptional entries in the radix tree node. Subsequent
patches will switch shadow entry tracking over to that counter.
DAX and shmem are the other users of exceptional entries. Since slot
replacements that change the entry type from regular to exceptional must
now be accounted, introduce a __radix_tree_replace() function that does
replacement and accounting, and switch DAX and shmem over.
The increase in radix tree node size is temporary. A followup patch
switches the shadow tracking to this new scheme and we'll no longer need
the upper bits in node->count and shrink that back to one byte.
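A minimal model of the accounting described above, with one counter for all
entries and a second one for exceptional entries, both updated by a single
replace helper. The class and function names are made up for the
illustration and the node is a plain Python object, not the kernel structure.

```python
class Shadow:
    """Hypothetical marker type standing in for an exceptional entry."""


class Node:
    def __init__(self, size: int = 4):
        self.slots = [None] * size
        self.count = 0        # all non-empty entries
        self.exceptional = 0  # the subset that are exceptional entries


def replace(node: Node, index: int, new) -> None:
    """Replace a slot and keep both counters in step with the contents."""
    old = node.slots[index]
    node.count += (new is not None) - (old is not None)
    node.exceptional += isinstance(new, Shadow) - isinstance(old, Shadow)
    node.slots[index] = new
```

Turning a page into a shadow entry leaves count untouched and bumps
exceptional, so nothing has to be encoded in spare bits of count.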
Link: http://lkml.kernel.org/r/20161117192945.GA23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the 4-byte `capabilities' field next to other 4-byte things.
Shrinks sizeof(backing_dev_info) by 8 bytes on x86_64.
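The effect is the usual padding one: a 4-byte member placed between 8-byte
members drags alignment padding along with it. The sketch below reproduces
that with ctypes. The two layouts and their field names are invented for
illustration and are not the real backing_dev_info; the 8-byte saving
assumes a typical 64-bit ABI where 64-bit members are 8-byte aligned.

```python
import ctypes


class Before(ctypes.Structure):
    # 4-byte members interleaved with 8-byte ones: each needs 4 bytes of padding.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("capabilities", ctypes.c_uint32),
        ("b", ctypes.c_uint64),
        ("flags", ctypes.c_uint32),
    ]


class After(ctypes.Structure):
    # The two 4-byte members sit next to each other and share one 8-byte slot.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("b", ctypes.c_uint64),
        ("capabilities", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
    ]


saved = ctypes.sizeof(Before) - ctypes.sizeof(After)
```

On such an ABI `saved` comes out as 8, matching the size reduction quoted
for x86_64.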
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We ran into a funky issue, where someone doing 256K buffered reads saw
128K requests at the device level. Turns out it is read-ahead capping
the request size, since we use 128K as the default setting. This
doesn't make a lot of sense - if someone is issuing 256K reads, they
should see 256K reads, regardless of the read-ahead setting, if the
underlying device can support a 256K read in a single command.
This patch introduces a bdi hint, io_pages. This is the soft max IO
size for the lower level, I've hooked it up to the bdev settings here.
Read-ahead is modified to issue the maximum of the user request size,
and the read-ahead max size, but capped to the max request size on the
device side. The latter is done to avoid reading ahead too much, if the
application asks for a huge read. With this patch, the kernel behaves
like the application expects.
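One way to read the rule above, as a sketch in units of pages: the
read-ahead window stays the limit unless the request is larger than it and
the device can take more, in which case the limit grows up to the request
size but never past io_pages. The function name and the sample numbers are
assumptions for illustration only.

```python
def readahead_limit(req_pages: int, ra_pages: int, io_pages: int) -> int:
    """Return the largest read, in pages, this model would issue."""
    max_pages = ra_pages
    # Let a large request exceed the read-ahead window, capped by the
    # soft max IO size hinted by the device (io_pages).
    if req_pages > max_pages and io_pages > max_pages:
        max_pages = min(req_pages, io_pages)
    return max_pages
```

With 4K pages, a 256K request (64 pages) against a 128K window (32 pages)
and a device hint of 320 pages yields 64 pages, while a 4M request is held
to the 320 pages the device side allows.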
Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With commit e77b0852b5 ("mm/mmu_gather: track page size with mmu
gather and force flush if page size change") we added the ability to
force a tlb flush when the page size changes in a mmu_gather loop. We
did that by checking for a page size change every time we added a page
to mmu_gather for lazy flush/remove. We can improve that by moving the
page size change check early and not doing it every time we add a page.
This also helps us to do tlb flush when invalidating a range covering
dax mapping. Wrt dax mapping we don't have a backing struct page and
hence we don't call tlb_remove_page, which earlier forced the tlb flush
on page size change. Moving the page size change check earlier means we
will do the same even for dax mapping.
We also avoid doing this check on architectures other than powerpc.
In a later patch we will remove page size check from tlb_remove_page().
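The idea of checking once per range instead of once per page can be modelled
in userspace as follows. Everything here (class, method and function names,
the flush counter) is a hypothetical sketch and not the mmu_gather
implementation.

```python
class Gather:
    def __init__(self):
        self.page_size = 0  # 0 means no range has been processed yet
        self.pending = 0    # pages queued for lazy removal
        self.flushes = 0

    def flush(self):
        self.flushes += 1
        self.pending = 0

    def check_page_size_change(self, page_size: int):
        # Force a flush when the page size differs from the previous range.
        if self.page_size and self.page_size != page_size:
            self.flush()
        self.page_size = page_size

    def remove_page(self):
        self.pending += 1   # no page size check on this hot path


def zap_range(tlb: Gather, page_size: int, npages: int):
    tlb.check_page_size_change(page_size)  # once per range, up front
    for _ in range(npages):
        tlb.remove_page()
```

Because the check no longer depends on remove_page() being called, a range
with no backing pages (npages == 0, as in the dax case) still triggers the
flush when the page size changes.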
Link: http://lkml.kernel.org/r/20161026084839.27299-5-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We use __tlb_adjust_range to update the range covered by the mmu_gather struct.
We later use the 'start' and 'end' to do a mmu_notifier_invalidate_range
in tlb_flush_mmu_tlbonly(). Update the 'end' correctly in
__tlb_adjust_range so that we call mmu_notifier_invalidate_range with
the correct range values.
Wrt tlbflush, this should not have any impact, because a flush with
correct start address will flush tlb mapping for the range.
Also add comment w.r.t updating the range when we free pagetable pages.
For now we don't support a range based page table cache flush.
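A small sketch of the range bookkeeping, under the assumption that updating
'end' correctly means extending it by the size of the mapping being removed
rather than by a fixed page size. The class, the function and the initial
values are illustrative, not the kernel's.

```python
class Range:
    def __init__(self):
        self.start = 2**64 - 1  # empty range: start above any address
        self.end = 0


def adjust_range(r: Range, address: int, range_size: int) -> None:
    """Grow [start, end) so it covers address .. address + range_size."""
    r.start = min(r.start, address)
    r.end = max(r.end, address + range_size)
```

For a 2M mapping at 0x200000 the range ends at 0x400000, so a notifier that
is handed start and end sees the whole mapping and not only its first small
page.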
Link: http://lkml.kernel.org/r/20161026084839.27299-3-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>