Pull scheduler updates from Ingo Molnar:
"The main updates in this cycle were:
- Group balancing enhancements and cleanups (Brendan Jackman)
- Move CPU isolation related functionality into its separate
kernel/sched/isolation.c file, with related 'housekeeping_*()'
namespace and nomenclature et al. (Frederic Weisbecker)
- Improve the interactive/cpu-intense fairness calculation (Josef
Bacik)
- Improve the PELT code and related cleanups (Peter Zijlstra)
- Improve the logic of pick_next_task_fair() (Uladzislau Rezki)
- Improve the RT IPI based balancing logic (Steven Rostedt)
- Various micro-optimizations:
- better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)
- better idle loop (Cheng Jian)
- ... plus misc fixes, cleanups and updates"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
sched/sysctl: Fix attributes of some extern declarations
sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
sched/isolation: Add basic isolcpus flags
sched/isolation: Move isolcpus= handling to the housekeeping code
sched/isolation: Handle the nohz_full= parameter
sched/isolation: Introduce housekeeping flags
sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
sched/isolation: Use its own static key
sched/isolation: Make the housekeeping cpumask private
sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
sched/isolation: Move housekeeping related code to its own file
sched/idle: Micro-optimize the idle loop
sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
sched/rt: Simplify the IPI based RT balancing logic
block/ioprio: Use a helper to check for RT prio
sched/rt: Add a helper to test for a RT task
...
Pull perf updates from Ingo Molnar:
"The main changes in this cycle were:
Kernel:
- kprobes updates: use better W^X patterns for code modifications,
improve optprobes, remove jprobes. (Masami Hiramatsu, Kees Cook)
- core fixes: event timekeeping (enabled/running times statistics)
fixes, perf_event_read() locking fixes and cleanups, etc. (Peter
Zijlstra)
- Extend x86 Intel free-running PEBS support and support x86
user-register sampling in perf record and perf script. (Andi Kleen)
Tooling:
- Completely rework the way inline frames are handled. Instead of
querying for the inline nodes on-demand in the individual tools, we
now create proper callchain nodes for inlined frames. (Milian
Wolff)
- 'perf trace' updates (Arnaldo Carvalho de Melo)
- Implement a way to print formatted output to per-event files in
'perf script' to facilitate generating flamegraphs, eliminating the
need to write scripts to do that separation (yuzhoujian, Arnaldo
Carvalho de Melo)
- Update vendor events JSON metrics for Intel's Broadwell, Broadwell
Server, Haswell, Haswell Server, IvyBridge, IvyTown, JakeTown,
Sandy Bridge, Skylake, SkyLake Server - and Goldmont Plus V1 (Andi
Kleen, Kan Liang)
- Multithread the synthesizing of PERF_RECORD_ events for
pre-existing threads in 'perf top', speeding up that phase, greatly
improving the user experience in systems such as Intel's Knights
Mill (Kan Liang)
- Introduce the concept of weak groups in 'perf stat': try to set up
a group, but if it's not schedulable fall back to not using a group.
That gives us the best of both worlds: groups if they work, but
still a usable fallback if they don't. E.g: (Andi Kleen)
- perf sched timehist enhancements (David Ahern)
- ... various other enhancements, updates, cleanups and fixes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (139 commits)
kprobes: Don't spam the build log with deprecation warnings
arm/kprobes: Remove jprobe test case
arm/kprobes: Fix kretprobe test to check correct counter
perf srcline: Show correct function name for srcline of callchains
perf srcline: Fix memory leak in addr2inlines()
perf trace beauty kcmp: Beautify arguments
perf trace beauty: Implement pid_fd beautifier
tools include uapi: Grab a copy of linux/kcmp.h
perf callchain: Fix double mapping al->addr for children without self period
perf stat: Make --per-thread update shadow stats to show metrics
perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats
perf tools: Add perf_data_file__write function
perf tools: Add struct perf_data_file
perf tools: Rename struct perf_data_file to perf_data
perf script: Print information about per-event-dump files
perf trace beauty prctl: Generate 'option' string table from kernel headers
tools include uapi: Grab a copy of linux/prctl.h
perf script: Allow creating per-event dump files
perf evsel: Restore evsel->priv as a tool private area
perf script: Use event_format__fprintf()
...
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle are:
- Another attempt at enabling cross-release lockdep dependency
tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
with better performance and fewer false positives. (Byungchul Park)
- Introduce lockdep_assert_irqs_enabled()/disabled() and convert
open-coded equivalents to lockdep variants. (Frederic Weisbecker)
- Add down_read_killable() and use it in the VFS's iterate_dir()
method. (Kirill Tkhai)
- Convert remaining uses of ACCESS_ONCE() to
READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
driven. (Mark Rutland, Paul E. McKenney)
- Get rid of lockless_dereference(), by strengthening Alpha atomics,
strengthening READ_ONCE() with smp_read_barrier_depends() and thus
being able to convert users of lockless_dereference() to
READ_ONCE(). (Will Deacon)
- Various micro-optimizations:
- better PV qspinlocks (Waiman Long),
- better x86 barriers (Michael S. Tsirkin)
- better x86 refcounts (Kees Cook)
- ... plus other fixes and enhancements. (Borislav Petkov, Juergen
Gross, Miguel Bernal Marin)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
rcu: Use lockdep to assert IRQs are disabled/enabled
netpoll: Use lockdep to assert IRQs are disabled/enabled
timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
irq_work: Use lockdep to assert IRQs are disabled/enabled
irq/timings: Use lockdep to assert IRQs are disabled/enabled
perf/core: Use lockdep to assert IRQs are disabled/enabled
x86: Use lockdep to assert IRQs are disabled/enabled
smp/core: Use lockdep to assert IRQs are disabled/enabled
timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
timers/nohz: Use lockdep to assert IRQs are disabled/enabled
workqueue: Use lockdep to assert IRQs are disabled/enabled
irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
locking/rwlocks: Fix comments
x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
...
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle are:
- Documentation updates
- RCU CPU stall-warning updates
- Torture-test updates
- Miscellaneous fixes
Size wise the biggest updates are to documentation. Excluding
documentation most of the code increase comes from a single commit
which expands debugging"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
srcu: Add parameters to SRCU docbook comments
doc: Rewrite confusing statement about memory barriers
memory-barriers.txt: Fix typo in pairing example
rcu/segcblist: Include rcupdate.h
rcu: Add extended-quiescent-state testing advice
rcu: Suppress lockdep false-positive ->boost_mtx complaints
rcu: Do not include rtmutex_common.h unconditionally
torture: Provide TMPDIR environment variable to specify tmpdir
rcutorture: Dump writer stack if stalled
rcutorture: Add interrupt-disable capability to stall-warning tests
rcu: Suppress RCU CPU stall warnings while dumping trace
rcu: Turn off tracing before dumping trace
rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
sched,rcu: Make cond_resched() provide RCU quiescent state
sched: Make resched_cpu() unconditional
irq_work: Map irq_work_on_queue() to irq_work_on() in !SMP
rcu: Create call_rcu_tasks() kthread at boot time
rcu: Fix up pending cbs check in rcu_prepare_for_idle
memory-barriers: Rework multicopy-atomicity section
memory-barriers: Replace uses of "transitive"
...
Pull security subsystem integrity updates from James Morris:
"There is a mixture of bug fixes, code cleanup, preparatory code for
new functionality and new functionality.
Commit 26ddabfe96 ("evm: enable EVM when X509 certificate is
loaded") enabled EVM without loading a symmetric key, but was limited
to defining the x509 certificate pathname at build. Included in this
set of patches is the ability to enable EVM, without loading the EVM
symmetric key, from userspace. New is the ability to prevent the
loading of an EVM symmetric key."
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ima: Remove redundant conditional operator
ima: Fix bool initialization/comparison
ima: check signature enforcement against cmdline param instead of CONFIG
module: export module signature enforcement status
ima: fix hash algorithm initialization
EVM: Only complain about a missing HMAC key once
EVM: Allow userspace to signal an RSA key has been loaded
EVM: Include security.apparmor in EVM measurements
ima: call ima_file_free() prior to calling fasync
integrity: use kernel_read_file_from_path() to read x509 certs
ima: always measure and audit files in policy
ima: don't remove the securityfs policy file
vfs: fix mounting a filesystem with i_version
Pull MMC updates from Ulf Hansson:
"MMC core:
- Introduce host claiming by context to support blkmq
- Preparations for enabling CQE (eMMC CMDQ) requests
- Re-factorizations to prepare for blkmq support
- Re-factorizations to prepare for CQE support
- Fix signal voltage switch for SD cards without power cycle
- Convert RPMB to a character device
- Export eMMC revision via sysfs
- Support eMMC DT binding for fixed driver type
- Document mmc_regulator_get_supply() API
MMC host:
- omap_hsmmc: Updated regulator management for PBIAS
- sdhci-omap: Add new OMAP SDHCI driver
- meson-mx-sdio: New driver for the Amlogic Meson8 and Meson8b SoCs
- sdhci-pci: Add support for Intel CDF
- sdhci-acpi: Fix voltage switch for some Intel host controllers
- sdhci-msm: Enable delay circuit calibration clocks
- sdhci-msm: Manage power IRQ properly
- mediatek: Add support of mt2701/mt2712
- mediatek: Updates management of clocks and tunings
- mediatek: Upgrade eMMC HS400 support
- rtsx_pci: Update tuning for gen3 PCI-Express
- renesas_sdhi: Support R-Car Gen[123] fallback compatibility strings
- Catch all errors when getting regulators
- Various additional improvements and cleanups"
* tag 'mmc-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (91 commits)
sdhci-fujitsu: add support for setting the CMD_DAT_DELAY attribute
dt-bindings: sdhci-fujitsu: document cmd-dat-delay property
mmc: tmio: Replace msleep() of 20ms or less with usleep_range()
mmc: dw_mmc: Convert timers to use timer_setup()
mmc: dw_mmc: Cleanup the DTO timer like the CTO one
mmc: vub300: Use common code in __download_offload_pseudocode()
mmc: tmio: Use common error handling code in tmio_mmc_host_probe()
mmc: Convert timers to use timer_setup()
mmc: sdhci-acpi: Fix voltage switch for some Intel host controllers
mmc: sdhci-acpi: Let devices define their own private data
mmc: mediatek: perfer to use rise edge latching for cmd line
mmc: mediatek: improve eMMC hs400 mode read performance
mmc: mediatek: add latch-ck support
mmc: mediatek: add support of source_cg clock
mmc: mediatek: add stop_clk fix and enhance_rx support
mmc: mediatek: add busy_check support
mmc: mediatek: add async fifo and data tune support
mmc: mediatek: add pad_tune0 support
mmc: mediatek: make hs400_tune_response only for mt8173
arm64: dts: mt8173: remove "mediatek, mt8135-mmc" from mmc nodes
...
As reported by kernelci and other build bots, we now get a link
failure without CONFIG_KALLSYMS:
module.c:(.text+0xf2c): undefined reference to `kallsyms_show_value'
This adds a dummy helper with the same name that can be used
for compilation. It's not entirely clear to me what this
should return for !CONFIG_KALLSYMS, I picked an unconditional
'false', which leads to the module address being unavailable
to user space.
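The stub pattern described above can be sketched in plain C roughly as
follows. This is a userspace illustration under stated assumptions, not
the patch itself: CONFIG_KALLSYMS is left undefined so that the fallback
branch is the one compiled, and shown_module_address() is a hypothetical
caller invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* CONFIG_KALLSYMS is deliberately left undefined to model the failing config. */

#ifdef CONFIG_KALLSYMS
/* With kallsyms built in, only a declaration is needed at this point. */
int kallsyms_show_value(void);
#else
/* Dummy helper: with kallsyms compiled out, never expose addresses. */
static inline int kallsyms_show_value(void)
{
	return false;
}
#endif

/* Hypothetical caller: report the address only when exposure is allowed. */
static inline unsigned long shown_module_address(unsigned long addr)
{
	return kallsyms_show_value() ? addr : 0UL;
}
```

With the stub in place the caller links in both configurations, and in
the !CONFIG_KALLSYMS case the reported address is always zero.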
Link: https://kernelci.org/build/mainline/branch/master/kernel/v4.14-5-g516fb7f2e73d/
Fixes: 516fb7f2e7 ("/proc/module: use the same logic as /proc/kallsyms for address exposure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building kallsyms fails without CONFIG_PRINTK due to a missing
declaration:
kernel/kallsyms.c: In function 'kallsyms_show_value':
kernel/kallsyms.c:670:10: error: 'kptr_restrict' undeclared (first use in this function); did you mean 'keyring_restrict'?
This moves the declaration outside of the #ifdef guard, the definition
is already available without CONFIG_PRINTK.
Fixes: c0f3ea1589 ("stop using '%pK' for /proc/kallsyms pointer values")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ I clearly need to start doing "allnoconfig" builds too, or just have a
test branch for the 0day robot - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull regmap updates from Mark Brown:
"After several quiet kernel releases we've got a couple of new features
in regmap: support for using hwspinlocks as the lock for the internal
data structures and a helper for polling on regmap_fields. The Kconfig
dependencies on hwspinlocks were annoyingly difficult to squash
between things behaving surprisingly and randconfig. I could've
squashed those commits down but that might've caused hassle with other
trees trying to use the new support.
- support for using a hwspinlock to protect the regmap
- an iopoll style helper for regmap_field"
* tag 'regmap-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: Fix unused warning
regmap: Try to work around Kconfig exploding on HWSPINLOCK
regmap: Clean up hwspinlock on regmap exit
regmap: Also protect hwspinlock in error handling path
regmap: Add a config option for hwspinlock
regmap: Add hardware spinlock support
regmap: avoid -Wint-in-bool-context warning
regmap: add iopoll-like polling macro for regmap_field
regmap: constify regmap_bus structures
regmap: Avoid namespace collision within macro & tidy up
Pull spi updates from Mark Brown:
"This release is almost entirely driver changes, there's a couple of
fixes in the core but otherwise it's all drivers:
- fix for mixed dynamic and static bus number assignment.
- fixes for some leaks arising from confusing lifetime rules during
device unregistration and improved documentation to try to help
avoid this in the future.
- fixes to make the native chip select support for i.MX usable.
- slave mode support for i.MX.
- support for Coldfire MCF5441x DSPI, Renesas R8A7443/5 and
Spreadtrum ADI"
* tag 'spi-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (31 commits)
spi: imx: Don't require platform data chipselect array
spi: imx: Fix failure path leak on GPIO request error
spi: imx: GPIO based chip selects should not be required
spi: sh-msiof: remove redundant pointer dev
spi: s3c64xx: remove redundant pointer sci
spi: spi-fsl-dspi: enabling Coldfire mcf5441x dspi
spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
spi: orion: remove redundant assignment of status to zero
spi: sh-msiof: Fix DMA transfer size check
spi: imx: Fix failure path leak on GPIO request error
spi: spi-axi: fix potential use-after-free after deregistration
spi: document odd controller reference handling
spi: fix use-after-free at controller deregistration
spi: sprd: Fix the possible negative value of BIT()
spi: sprd-adi: fix platform_no_drv_owner.cocci warnings
spi: a3700: Change SPI mode before asserting chip-select
spi: tegra114: correct register name in definition
spi: spreadtrum adi: add hwspinlock dependency
spi: sh-msiof: Use of_device_get_match_data() helper
spi: rspi: Use of_device_get_match_data() helper
...
Pull regulator updates from Mark Brown:
"A very quiet release for regulator, there's some new device support in
existing drivers here and a few fixes but nothing in the core.
Summary:
- New device support for Allwinner AXP813, Dialog DA223/4/5 and
Qualcomm PMI8994"
* tag 'regulator-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps65218: remove unused tps_info structure
regulator: tps65218: Fix strobe assignment
regulator: qcom_spmi: Include offset when translating voltages
regulator: qcom_spmi: Add support for pmi8994
regulator: da9211: update for supporting da9223/4/5
ASoC: pfuze100: Remove leading zero from '@08' notation
regulator: axp20x: Simplify axp20x_is_polyphase_slave implementation
regulator: axp20x: Add support for AXP813 regulators
Pull hwmon updates from Guenter Roeck:
- drivers for MAX31785 and MAX6621
- support for AMD family 17h (Ryzen, Threadripper) temperature sensors
- various driver cleanups and minor improvements
* tag 'hwmon-for-linus-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (30 commits)
dt-bindings: pmbus: Add Maxim MAX31785 documentation
pmbus: Add driver for Maxim MAX31785 Intelligent Fan Controller
hwmon: (aspeed-pwm-tacho) Sort headers
hwmon: (xgene) Minor clean up of ifdef and acpi_match_table reference
hwmon: (max6621) Inverted if condition in max6621_read()
hwmon: (asc7621) remove redundant assignment to newval
hwmon: (xgene) Support hwmon v2
hwmon: (gpio-fan) Fix null pointer dereference at probe
hwmon: (gpio-fan) Convert to use GPIO descriptors
hwmon: (gpio-fan) Rename GPIO line state variables
hwmon: (gpio-fan) Get rid of the gpio alarm struct
hwmon: (gpio-fan) Get rid of platform data struct
hwmon: (gpio-fan) Mandate OF_GPIO and cut pdata path
hwmon: (gpio-fan) Send around device pointer
hwmon: (gpio-fan) Localize platform data
hwmon: (gpio-fan) Use local variable pointers
hwmon: (gpio-fan) Move DT bindings to the right place
Documentation: devicetree: add max6621 device
hwmon: (max6621) Add support for Maxim MAX6621 temperature sensor
hwmon: (w83793) make const array watchdog_minors static, reduces object code size
...
Pull documentation updates from Jonathan Corbet:
"A relatively calm cycle for the docs tree again.
- The old driver statement has been added to the kernel docs.
- We have a couple of new helper scripts. find-unused-docs.sh from
Sayli Karnic will point out kerneldoc comments that are not actually
used in the documentation. Jani Nikula's
documentation-file-ref-check finds references to non-existing files.
- A new ftrace document from Steve Rostedt.
- Vinod Koul converted the dmaengine docs to RST
Beyond that, it's mostly simple fixes.
This set reaches outside of Documentation/ a bit more than most. In
all cases, the changes are to comment docs, mostly from Randy, in
places where there didn't seem to be anybody better to take them"
* tag 'docs-4.15' of git://git.lwn.net/linux: (52 commits)
documentation: fb: update list of available compiled-in fonts
MAINTAINERS: update DMAengine documentation location
dmaengine: doc: ReSTize pxa_dma doc
dmaengine: doc: ReSTize dmatest doc
dmaengine: doc: ReSTize client API doc
dmaengine: doc: ReSTize provider doc
dmaengine: doc: Add ReST style dmaengine document
ftrace/docs: Add documentation on how to use ftrace from within the kernel
bug-hunting.rst: Fix an example and a typo in a Sphinx tag
scripts: Add a script to find unused documentation
samples: Convert timers to use timer_setup()
documentation: kernel-api: add more info on bitmap functions
Documentation: fix selftests related file refs
Documentation: fix ref to power basic-pm-debugging
Documentation: fix ref to trace stm content
Documentation: fix ref to coccinelle content
Documentation: fix ref to workqueue content
Documentation: fix ref to sphinx/kerneldoc.py
Documentation: fix locking rt-mutex doc refs
docs: dev-tools: correct Coccinelle version number
...
In some randconfig scenarios, including arm-gic-v4.h results
in a spurious warning about the $SUBJECT structure not being
defined. Adding a forward definition keeps it quiet.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
extra argument and make it 'unsigned int' throughout.
Also, consolidate a bunch of identical action functions into a default
function that can do the appropriate thing for the mode.
Also, change the argument name in the bit_wait*() function declarations to
reflect the fact that it's the mode and not the bit number.
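The shape of the change can be sketched in userspace C as below. All
names here (the MODEL_TASK_* values, model_default_wait(),
model_wait_step() and the model_signal_pending flag) are hypothetical
stand-ins chosen for illustration; they are not the kernel's own
definitions, and the sleep that a real action function performs is
omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's TASK_* bits; values are illustrative. */
#define MODEL_TASK_INTERRUPTIBLE	0x1u
#define MODEL_TASK_UNINTERRUPTIBLE	0x2u

/* Action functions receive the wait mode as an extra unsigned int argument. */
typedef int (*model_wait_action_t)(int *counter, unsigned int mode);

/* Stand-in for signal state; the kernel would consult the current task. */
static bool model_signal_pending;

/* One default action covers every mode instead of one copy per mode. */
static inline int model_default_wait(int *counter, unsigned int mode)
{
	(void)counter;
	/* A real action would sleep here before checking for signals. */
	if ((mode & MODEL_TASK_INTERRUPTIBLE) && model_signal_pending)
		return -EINTR;
	return 0;
}

/* Hypothetical single wait step: forwards the mode to the action. */
static inline int model_wait_step(int *counter, model_wait_action_t action,
				  unsigned int mode)
{
	assert(action != NULL);
	if (*counter == 0)
		return 0;
	return action(counter, mode);
}
```

Because the mode travels with the call, the same default action can bail
out on a pending signal for an interruptible wait and ignore it for an
uninterruptible one.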
[Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
should be done differently, though he's not immediately sure as to how]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
cc: Ingo Molnar <mingo@kernel.org>
ASoC: Updates for v4.15
The biggest thing this release has been the conversion of the AC97 bus
to the driver model, that's been a long time coming so thanks to Robert
Jarzmik for his dedication there. Due to there being some AC97 MFD
there's a few fairly large changes in input and the MFD layer, mainly to
the wm97xx driver.
There's also some drivers/drm changes to support the new AMD Stoney
platform, these are shared with the DRM subsystem and should be being
merged via both.
Within the subsystem the overwhelming bulk of the changes is in the
Intel drivers which continue to need lots of cleanups and fixes, this
release they've also gained support for their open source firmware.
There's also some large changes in the core as Morimoto-san continues to
mirror operations into the component level in preparation for conversion
of drivers to that.
- The AC97 bus has finally caught up with the driver model thanks to
some dedicated and persistent work from Robert Jarzmik.
- Continued work from Morimoto-san on moving us towards being able to
use components for everything.
- Lots of cleanups for the Intel platform code, including support for
their open source audio firmware.
- Support for scaling MCLK with sample rate in simple-card.
- Support for AMD Stoney platform.
The CPU hotplug notifiers are history. Remove the last reminders.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The (alleged) users of the module addresses are the same: kernel
profiling.
So just expose the same helper and format macros, and unify the logic.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm-devfreq:
PM / devfreq: Define the constant governor name
PM / devfreq: Remove unneeded conditional statement
PM / devfreq: Show the all available frequencies
PM / devfreq: Change return type of devfreq_set_freq_table()
PM / devfreq: Use the available min/max frequency
Revert "PM / devfreq: Add show_one macro to delete the duplicate code"
PM / devfreq: Set min/max_freq when adding the devfreq device
* pm-tools:
tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore
tools/power/cpupower: Add 64 bit library detection
MAINTAINERS: add maintainer for tools/power/cpupower
cpupower: Fix no-rounding MHz frequency output
* pm-core:
ACPI / PM: Take SMART_SUSPEND driver flag into account
PCI / PM: Take SMART_SUSPEND driver flag into account
PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks
PM / core: Add SMART_SUSPEND driver flag
PCI / PM: Use the NEVER_SKIP driver flag
PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
PM / core: Convert timers to use timer_setup()
PM / core: Fix kerneldoc comments of four functions
PM / core: Drop legacy class suspend/resume operations
* pm-sleep:
freezer: Fix typo in freezable_schedule_timeout() comment
PM / s2idle: Clear the events_check_enabled flag
PM / sleep: Remove pm_complete_with_resume_check()
PM: ARM: locomo: Drop suspend and resume bus type callbacks
PM: Use a more common logging style
PM: Document rules on using pm_runtime_resume() in system suspend callbacks
* pm-cpufreq-sched:
cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
* pm-opp:
PM / OPP: Add dev_pm_opp_{un}register_get_pstate_helper()
PM / OPP: Support updating performance state of device's power domain
PM / OPP: add missing of_node_put() for of_get_cpu_node()
PM / OPP: Rename dev_pm_opp_register_put_opp_helper()
PM / OPP: Add missing of_node_put(np)
PM / OPP: Move error message to debug level
PM / OPP: Use snprintf() to avoid kasprintf() and kfree()
PM / OPP: Move the OPP directory out of power/
* pm-cpufreq: (22 commits)
cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
cpufreq: arm_big_little: make function arguments and structure pointer const
cpufreq: pxa: convert to clock API
cpufreq: speedstep-lib: mark expected switch fall-through
cpufreq: ti-cpufreq: add missing of_node_put()
cpufreq: dt: Remove support for Exynos4212 SoCs
cpufreq: imx6q: Move speed grading check to cpufreq driver
cpufreq: ti-cpufreq: kfree opp_data when failure
cpufreq: SPEAr: pr_err() strings should end with newlines
cpufreq: powernow-k8: pr_err() strings should end with newlines
cpufreq: dt-platdev: drop socionext,uniphier-ld6b from whitelist
arm64: wire cpu-invariant accounting support up to the task scheduler
arm64: wire frequency-invariant accounting support up to the task scheduler
arm: wire cpu-invariant accounting support up to the task scheduler
arm: wire frequency-invariant accounting support up to the task scheduler
drivers base/arch_topology: allow inlining cpu-invariant accounting support
drivers base/arch_topology: provide frequency-invariant accounting support
cpufreq: dt: invoke frequency-invariance setter function
cpufreq: arm_big_little: invoke frequency-invariance setter function
...
* pm-domains:
PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
PM / domains: Rework governor code to be more consistent
PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
PM / Domains: Allow genpd users to specify default active wakeup behavior
PM / Domains: Add support to select performance-state of domains
PM / Domains: Rename genpd internals from pm_genpd_* to genpd_*
Add a function, similar to mod_timer(), that will start a timer if it isn't
running and will modify it if it is running and has an expiry time longer
than the new time. If the timer is running with an expiry time that's the
same or sooner, no change is made.
The function looks like:
	int timer_reduce(struct timer_list *timer, unsigned long expires);
This can be used by code such as networking code to make it easier to share
a timer for multiple timeouts. For instance, in upcoming AF_RXRPC code,
the rxrpc_call struct will maintain a number of timeouts:
	unsigned long	ack_at;
	unsigned long	resend_at;
	unsigned long	ping_at;
	unsigned long	expect_rx_by;
	unsigned long	expect_req_by;
	unsigned long	expect_term_by;
each of which is set independently of the others. With timer reduction
available, when the code needs to set one of the timeouts, it only needs to
look at that timeout and then call timer_reduce() to modify the timer,
starting it or bringing it forward if necessary. There is no need to refer
to the other timeouts to see which is earliest and no need to take any lock
other than, potentially, the timer lock inside timer_reduce().
Note that this does not protect against concurrent invocations of any of
the timer functions.
As an example, the expect_rx_by timeout above, which terminates a call if
we don't get a packet from the server within a certain time window, would
be set something like this:
	unsigned long now = jiffies;
	unsigned long expect_rx_by = now + packet_receive_timeout;
	WRITE_ONCE(call->expect_rx_by, expect_rx_by);
	timer_reduce(&call->timer, expect_rx_by);
The timer service code (which might, say, be in a work function) would then
check all the timeouts to see which, if any, had triggered, deal with
those:
	t = READ_ONCE(call->ack_at);
	if (time_after_eq(now, t)) {
		cmpxchg(&call->ack_at, t, now + MAX_JIFFY_OFFSET);
		set_bit(RXRPC_CALL_EV_ACK, &call->events);
	}
and then restart the timer if necessary by finding the soonest timeout that
hasn't yet passed and then calling timer_reduce().
The disadvantage of doing things this way rather than comparing the timers
each time and calling mod_timer() is that you *will* take timer events
unless you can finish what you're doing and delete the timer in time.
The advantage of doing things this way is that you don't need to use a lock
to work out when the next timer should be set, other than the timer's own
lock - which you might not have to take.
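The semantics described above can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions, not the kernel
implementation: struct model_timer and model_timer_reduce() are
hypothetical names, there is no locking or timer wheel, and the return
value (whether the timer was already pending) is an assumption of the
model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct timer_list; not the kernel type. */
struct model_timer {
	bool pending;
	unsigned long expires;
};

/* Wrap-safe "a is before b" comparison for jiffies-style counters. */
static inline bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Start the timer if it is idle; if it is running, only ever bring the
 * expiry forward. An equal or later expiry leaves the timer untouched.
 * Returns 1 if the timer was already pending, 0 if it was started here.
 */
static inline int model_timer_reduce(struct model_timer *timer,
				     unsigned long expires)
{
	assert(timer != NULL);
	if (!timer->pending) {
		timer->pending = true;
		timer->expires = expires;
		return 0;
	}
	if (model_time_before(expires, timer->expires))
		timer->expires = expires;
	return 1;
}
```

A caller that owns one of several timeouts only needs its own value:
reducing an idle timer to 100 starts it at 100, a later request for 150
changes nothing, and a request for 50 pulls the expiry in to 50.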
[ tglx: Fixed weird formatting and adapted it to pending changes ]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-afs@lists.infradead.org
Link: https://lkml.kernel.org/r/151023090769.23050.1801643667223880753.stgit@warthog.procyon.org.uk
ip6_frag_id was only used by UFO, which has been removed.
ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
in-tree callers.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the reordering distance measurement in packet units with a
sequence-based approach. Previously it tracked the number of "packets"
toward the forward ACK (i.e. highest sacked sequence) in a state
variable "fackets_out".
Precisely measuring reordering degree on packet distance has not much
benefit, as the degree constantly changes by factors like path, load,
and congestion window. It is also complicated and prone to arcane bugs.
This patch replaces it with a sequence-based approach that's much simpler.
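One way to picture a sequence-based measurement is the sketch below. It
is an illustration only and may differ from what the patch does:
seq_before() and reorder_degree() are hypothetical helpers, and rounding
the sequence distance up to whole segments is an assumption of this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe comparison: true if a precedes b in 32-bit sequence space. */
static inline int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Hypothetical helper: estimate the reordering degree, in segments, from
 * the sequence distance between the highest SACKed sequence and a lower
 * sequence that was SACKed afterwards. No per-packet counter is needed.
 */
static inline uint32_t reorder_degree(uint32_t highest_sacked,
				      uint32_t low_seq, uint32_t mss)
{
	assert(mss > 0);
	if (!seq_before(low_seq, highest_sacked))
		return 0;	/* nothing arrived out of order */
	return (highest_sacked - low_seq + mss - 1) / mss;
}
```

For example, with an MSS of 1000, a highest SACKed sequence of 10000 and
a late SACK for sequence 7000, the estimated degree is 3 segments.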
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.
Currently this includes:
- Router Solicitation (ICMPv6 type 133)
ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()
- Neighbour Solicitation (ICMPv6 type 135)
ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()
- Neighbour Advertisement (ICMPv6 type 136)
ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()
- Redirect (ICMPv6 type 137)
ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()
and if the kernel ever gets around to generating RAs,
it would presumably also include:
- Router Advertisement (ICMPv6 type 134)
(radvd daemon could pick up on the kernel setting and use it)
Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme. An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11
The expected primary use case is to properly prioritize ND over wifi.
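For illustration only, a small sketch of how a consumer might validate
the sysctl value and extract the DiffServ Code Point from it. The upper
six bits of the 8-bit Traffic Class carry the DSCP and the lower two
bits carry ECN. The function names are hypothetical and not part of
this patch; the 0..255 range matches what the sysctl accepts in the
testing below.

```python
def parse_ndisc_tclass(text):
    """Parse a value written to the sysctl; accept only 0..255."""
    value = int(text)
    if not 0 <= value <= 255:
        raise ValueError("Invalid argument")
    return value


def split_traffic_class(tclass):
    """Split the 8-bit Traffic Class into (DSCP, ECN)."""
    return tclass >> 2, tclass & 0x3


for raw in ("34", "220"):  # 220 is 0xDC
    dscp, ecn = split_traffic_class(parse_ndisc_tclass(raw))
    print(f"tclass={raw} dscp={dscp} ecn={ecn}")
```

A driver that prioritizes on DSCP would look only at the first element
of the pair.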
Testing:
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
0
jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
-bash: echo: write error: Invalid argument
jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
-bash: echo: write error: Invalid argument
jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
255
jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
34
jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
length 24, tgt is jzem22.pgc, Flags [solicited]
(based on original change written by Erik Kline, with minor changes)
v2: fix 'suspicious rcu_dereference_check() usage'
by explicitly grabbing the rcu_read_lock.
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Error injection is sloppy and very ad-hoc. BPF could fill this niche
perfectly with its kprobe functionality. We could make sure errors are
only triggered in specific call chains that we care about in very
specific situations. Accomplish this with the bpf_override_function
helper. This will modify the probed caller's return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function. This gives us a nice
clean way to implement systematic error injection for all of our code
paths.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we are inconsistent in when we decide to run the queue. Using
blk_mq_run_hw_queues() we check if the hctx has pending IO before
running it, but we don't do that from the individual queue run function,
blk_mq_run_hw_queue(). This potentially results in a lot of extra and
pointless queue runs on flush requests and (much worse) in tag
starvation situations. This is observable just looking at top output,
with lots of kworkers active. For the !async runs, it just adds to the
CPU overhead of blk-mq.
Move the has-pending check into the run function instead of having
callers do it.
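A toy model of the change, assuming a much simplified hardware queue
that only has a dispatch list (the real pending check covers more
state). The class and method names are made up for this sketch. The
point is that the check lives in the run function itself, so every
caller gets it.

```python
from collections import deque


class HwQueue:
    """Simplified stand-in for a hardware queue context."""

    def __init__(self):
        self.dispatch = deque()
        self.runs = 0              # how often the queue was actually run

    def has_pending(self):
        return bool(self.dispatch)

    def run(self):
        """Run the queue, but only if there is something to do."""
        if not self.has_pending():
            return False           # no pointless run, no worker kicked
        self.runs += 1
        while self.dispatch:
            self.dispatch.popleft()
        return True


def run_all(queues):
    """Callers no longer need to repeat the pending check."""
    for q in queues:
        q.run()


idle, busy = HwQueue(), HwQueue()
busy.dispatch.append("rq")
run_all([idle, busy])
print(idle.runs, busy.runs)
```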
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.
[ 60.268688] attempt to access beyond end of device
[ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[ 60.268693] buffer_io_error: 2 callbacks suppressed
[ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read
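The truncation arithmetic can be sketched as below. This is a minimal
model, not the kernel function: the names are hypothetical, and it
assumes the start sector is expressed relative to the device or
partition whose size is passed as the limit. The limit 67103506 is
taken from the log above; the start sector and the 4096-byte size are
made up for illustration.

```python
SECTOR_SHIFT = 9  # 512-byte sectors


def pick_limit(partno, part_sectors, disk_sectors):
    """Use the partition size when the I/O targets a partition."""
    return part_sectors[partno] if partno else disk_sectors


def truncated_bytes(start_sector, size_bytes, limit_sectors):
    """Return how many bytes must be cut off the end of the I/O."""
    if limit_sectors == 0 or start_sector >= limit_sectors:
        return 0                   # left to the normal EOD error path
    room = limit_sectors - start_sector
    if (size_bytes >> SECTOR_SHIFT) <= room:
        return 0                   # fits entirely
    return size_bytes - (room << SECTOR_SHIFT)


limit = pick_limit(7, {7: 67103506}, 10**9)
print(truncated_bytes(67103501, 4096, limit))   # checked against the partition
print(truncated_bytes(67103501, 4096, 10**9))   # checked against the whole disk
```

With the whole-device capacity as the limit nothing is truncated, which
is the situation in which the read runs past the end of the partition.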
Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Cc: stable@vger.kernel.org # v4.13
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Several block layer and NVMe core functions accept a combination
of BLK_MQ_REQ_* flags through the 'flags' argument but there is
no verification at compile time whether the right type of block
layer flags is passed. Make it possible for sparse to verify this.
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.
It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why SCSI
devices are quiesced while the hibernation image is being written or
restored. The SCSI core quiesces devices through scsi_device_quiesce()
and resumes them through scsi_device_resume(). In the SDEV_QUIESCE state
execution of non-preempt requests is deferred. This is realized by
returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced
SCSI devices. Prevent a full queue from blocking the submission of power
management requests by deferring allocation of non-preempt requests for
devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after each resume the
fio job was still running:
for ((i=0; i<10; i++)); do
  (
    cd /sys/block/md0/md &&
    while true; do
      [ "$(<sync_action)" = "idle" ] && echo check > sync_action
      sleep 1
    done
  ) &
  pids=($!)
  for d in /sys/class/block/sd*[a-z]; do
    bdev=${d#/sys/class/block/}
    hcil=$(readlink "$d/device")
    hcil=${hcil#../../../}
    echo 4 > "$d/queue/nr_requests"
    echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
    fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
      --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \
      --iodepth_batch=1 --thread --loops=$((2**31)) &
    pids+=($!)
  done
  sleep 1
  echo "$(date) Hibernating ..." >>hibernate-test-log.txt
  systemctl hibernate
  sleep 10
  kill "${pids[@]}"
  echo idle > /sys/block/md0/md/sync_action
  wait
  echo "$(date) Done." >>hibernate-test-log.txt
done
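The allocation rule described before the test script can be modelled
roughly as follows. This is only a sketch under simplifying
assumptions: the class, the preempt_only attribute and the boolean
return value (standing in for "the caller has to wait") are
hypothetical and do not mirror the block layer interfaces.

```python
class RequestQueue:
    """Toy request queue with a fixed number of request slots."""

    def __init__(self, depth):
        self.depth = depth
        self.in_flight = 0
        self.preempt_only = False   # set while the device is quiesced

    def try_alloc(self, preempt=False):
        """Return True if a request was allocated, False if the caller must wait."""
        if self.preempt_only and not preempt:
            return False            # deferred before taking a slot
        if self.in_flight >= self.depth:
            return False            # queue is full
        self.in_flight += 1
        return True

    def complete(self):
        self.in_flight -= 1


q = RequestQueue(depth=4)
q.preempt_only = True                        # device quiesced for hibernation
normal = [q.try_alloc() for _ in range(8)]   # regular I/O is held back
pm_ok = q.try_alloc(preempt=True)            # the PM request still gets a slot
print(normal, pm_ok)
```

Because regular I/O is turned away before it occupies a slot, the queue
cannot fill up with requests that would only be deferred later.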
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This flag will be used in the next patch to let the block layer
core know whether or not a SCSI request queue has been quiesced.
A quiesced SCSI queue only processes RQF_PREEMPT requests.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch attempts to make the case of hctx re-running on driver tag
failure more robust. Without this patch, it's pretty easy to trigger a
stall condition with shared tags. An example is using null_blk like
this:
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 hw_queue_depth=1
which sets up 4 devices, sharing the same tag set with a depth of 1.
Running a fio job such as:
[global]
bs=4k
rw=randread
norandommap
direct=1
ioengine=libaio
iodepth=4
[nullb0]
filename=/dev/nullb0
[nullb1]
filename=/dev/nullb1
[nullb2]
filename=/dev/nullb2
[nullb3]
filename=/dev/nullb3
will inevitably end with one or more threads being stuck waiting for a
scheduler tag. That IO is then stuck forever, until someone else
triggers a run of the queue.
Ensure that we always re-run the hardware queue if the driver tag we
were waiting for got freed before we added our leftover request entries
back on the dispatch list.
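The race can be reproduced in a deterministic toy model. This is a
sketch under heavy simplification, not the blk-mq code: the classes,
the rerun_fix switch and the before_requeue hook (which stands in for
another queue freeing the tag at the worst possible moment) are all
invented for this example.

```python
class SharedTags:
    """A tag pool shared by several hardware queues."""

    def __init__(self, depth):
        self.free = depth
        self.waiters = []

    def get(self):
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def put(self):
        self.free += 1
        waiters, self.waiters = self.waiters, []
        for hctx in waiters:
            hctx.waiting = False    # wait entry is removed on wake-up
            hctx.run()


class Hctx:
    def __init__(self, tags, rerun_fix=True):
        self.tags = tags
        self.rerun_fix = rerun_fix
        self.dispatch = []
        self.issued = []
        self.waiting = False

    def run(self, before_requeue=None):
        pending, self.dispatch = self.dispatch, []
        no_tag = False
        while pending:
            if not self.tags.get():
                if not self.waiting:
                    self.tags.waiters.append(self)
                    self.waiting = True
                no_tag = True
                break
            self.issued.append(pending.pop(0))
        if pending:
            if before_requeue:
                before_requeue()    # hook: the tag is freed right here
            self.dispatch = pending + self.dispatch
            # The wake-up already fired and found an empty dispatch list,
            # so nobody else will run this queue again.
            if self.rerun_fix and no_tag and not self.waiting:
                self.run()


def race(rerun_fix):
    tags = SharedTags(depth=1)
    tags.get()                      # another queue holds the only tag
    hctx = Hctx(tags, rerun_fix)
    hctx.dispatch.append("rq")
    hctx.run(before_requeue=tags.put)
    return hctx


print(race(rerun_fix=False).dispatch)   # request left behind
print(race(rerun_fix=True).issued)      # request issued by the re-run
```

Without the re-run the request sits on the dispatch list with no waiter
registered, which corresponds to the stall described above.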
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>