ac14ef0f9f201e16e61d3be5b493e10acad957e7
118 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
82658bfd88 |
Merge 5.10.44 into android12-5.10-lts
Changes in 5.10.44 proc: Track /proc/$pid/attr/ opener mm_struct ASoC: max98088: fix ni clock divider calculation ASoC: amd: fix for pcm_read() error spi: Fix spi device unregister flow spi: spi-zynq-qspi: Fix stack violation bug bpf: Forbid trampoline attach for functions with variable arguments net/nfc/rawsock.c: fix a permission check bug usb: cdns3: Fix runtime PM imbalance on error ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet vfio-ccw: Reset FSM state to IDLE inside FSM vfio-ccw: Serialize FSM IDLE state with I/O completion ASoC: sti-sas: add missing MODULE_DEVICE_TABLE spi: sprd: Add missing MODULE_DEVICE_TABLE usb: chipidea: udc: assign interrupt number to USB gadget structure isdn: mISDN: netjet: Fix crash in nj_probe: bonding: init notify_work earlier to avoid uninitialized use netlink: disable IRQs for netlink_lock_table() net: mdiobus: get rid of a BUG_ON() cgroup: disable controllers at parse time wq: handle VM suspension in stall detection net/qla3xxx: fix schedule while atomic in ql_sem_spinlock RDS tcp loopback connection can hang net:sfc: fix non-freed irq in legacy irq mode scsi: bnx2fc: Return failure if io_req is already in ABTS processing scsi: vmw_pvscsi: Set correct residual data length scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irq scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal net: macb: ensure the device is available before accessing GEMGXL control registers net: appletalk: cops: Fix data race in cops_probe1 net: dsa: microchip: enable phy errata workaround on 9567 nvme-fabrics: decode host pathing error for connect MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER dm verity: fix require_signatures module_param permissions bnx2x: Fix missing error code in bnx2x_iov_init_one() nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME nvmet: fix false keep-alive timeout when a controller is torn down powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers spi: Don't have controller clean up spi device before driver unbind spi: Cleanup on failure of initial setup i2c: mpc: Make use of i2c_recover_bus() i2c: mpc: implement erratum A-004447 workaround ALSA: seq: Fix race of snd_seq_timer_open() ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun() ALSA: hda/realtek: headphone and mic don't work on an Acer laptop ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly G2 ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360 1040 G8 ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8 ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8 spi: bcm2835: Fix out-of-bounds access with more than 4 slaves Revert "ACPI: sleep: Put the FACS table after using it" drm: Fix use-after-free read in drm_getunique() drm: Lock pointer access in drm_master_release() perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server KVM: X86: MMU: Use the correct inherited permissions to get shadow page kvm: avoid speculation-based attacks from out-of-range memslot accesses staging: rtl8723bs: Fix uninitialized variables async_xor: check src_offs is not NULL before updating it btrfs: return value from btrfs_mark_extent_written() in case of error btrfs: promote debugging asserts to full-fledged checks in validate_super cgroup1: don't allow '\n' in renaming ftrace: Do not blindly read the ip address in ftrace_bug() mmc: renesas_sdhi: abort tuning when timeout detected mmc: renesas_sdhi: Fix HS400 on R-Car M3-W+ USB: f_ncm: ncm_bitrate (speed) is unsigned usb: f_ncm: only first packet of aggregate needs to start timer usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms usb: dwc3-meson-g12a: fix usb2 PHY glue init when phy0 is disabled usb: dwc3: meson-g12a: Disable the regulator in the error handling path of the probe usb: dwc3: gadget: Bail from dwc3_gadget_exit() if dwc->gadget is NULL usb: dwc3: ep0: fix NULL pointer exception usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling usb: typec: wcove: Use LE to CPU conversion when accessing msg->header usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe() usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource() usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind USB: serial: ftdi_sio: add NovaTech OrionMX product ID USB: serial: omninet: add device id for Zyxel Omni 56K Plus USB: serial: quatech2: fix control-request directions USB: serial: cp210x: fix alternate function for CP2102N QFN20 usb: gadget: eem: fix wrong eem header operation usb: fix various gadgets null ptr deref on 10gbps cabling. usb: fix various gadget panics on 10gbps cabling usb: typec: tcpm: cancel vdm and state machine hrtimer when unregister tcpm port usb: typec: tcpm: cancel frs hrtimer when unregister tcpm port regulator: core: resolve supply for boot-on/always-on regulators regulator: max77620: Use device_set_of_node_from_dev() regulator: bd718x7: Fix the BUCK7 voltage setting on BD71837 regulator: fan53880: Fix missing n_voltages setting regulator: bd71828: Fix .n_voltages settings regulator: rtmv20: Fix .set_current_limit/.get_current_limit callbacks phy: usb: Fix misuse of IS_ENABLED usb: dwc3: gadget: Disable gadget IRQ during pullup disable usb: typec: mux: Fix copy-paste mistake in typec_mux_match drm/mcde: Fix off by 10^3 in calculation drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650 drm/msm/a6xx: update/fix CP_PROTECT initialization drm/msm/a6xx: avoid shadow NULL reference in failure path RDMA/ipoib: Fix warning caused by destroying non-initial netns RDMA/mlx4: Do not map the core_clock page to user space unless enabled ARM: cpuidle: Avoid orphan section warning vmlinux.lds.h: Avoid orphan section with !SMP tools/bootconfig: Fix error return code in apply_xbc() phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe() ASoC: core: Fix Null-point-dereference in fmt_single_name() ASoC: meson: gx-card: fix sound-dai dt schema phy: ti: Fix an error code in wiz_probe() gpio: wcd934x: Fix shift-out-of-bounds error perf: Fix data race between pin_count increment/decrement sched/fair: Keep load_avg and load_sum synced sched/fair: Make sure to update tg contrib for blocked load sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message IB/mlx5: Fix initializing CQ fragments buffer NFS: Fix a potential NULL dereference in nfs_get_client() NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode() perf session: Correct buffer copying when peeking events kvm: fix previous commit for 32-bit builds NFS: Fix use-after-free in nfs4_init_client() NFSv4: Fix second deadlock in nfs4_evict_inode() NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error. scsi: core: Fix error handling of scsi_host_alloc() scsi: core: Fix failure handling of scsi_add_host_with_dma() scsi: core: Put .shost_dev in failure path if host state changes to RUNNING scsi: core: Only put parent device if host state differs from SHOST_CREATED tracing: Correct the length check which causes memory corruption proc: only require mm_struct for writing Linux 5.10.44 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic64172b4e72ccb54d96000b3065dd8b33aa9fef5 |
||
![]() |
190a7f9089 |
sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
commit 68d7a190682aa4eb02db477328088ebad15acc83 upstream.
The util_est internal UTIL_AVG_UNCHANGED flag which is used to prevent
unnecessary util_est updates uses the LSB of util_est.enqueued. It is
exposed via _task_util_est() (and task_util_est()).
Commit
|
||
![]() |
e054456ced |
Merge 5.10.37 into android12-5.10
Changes in 5.10.37
Bluetooth: verify AMP hci_chan before amp_destroy
bluetooth: eliminate the potential race condition when removing the HCI controller
net/nfc: fix use-after-free llcp_sock_bind/connect
io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers
Revert "USB: cdc-acm: fix rounding error in TIOCSSERIAL"
usb: roles: Call try_module_get() from usb_role_switch_find_by_fwnode()
tty: moxa: fix TIOCSSERIAL jiffies conversions
tty: amiserial: fix TIOCSSERIAL permission check
USB: serial: usb_wwan: fix TIOCSSERIAL jiffies conversions
staging: greybus: uart: fix TIOCSSERIAL jiffies conversions
USB: serial: ti_usb_3410_5052: fix TIOCSSERIAL permission check
staging: fwserial: fix TIOCSSERIAL jiffies conversions
tty: moxa: fix TIOCSSERIAL permission check
staging: fwserial: fix TIOCSSERIAL permission check
drm: bridge: fix LONTIUM use of mipi_dsi_() functions
usb: typec: tcpm: Address incorrect values of tcpm psy for fixed supply
usb: typec: tcpm: Address incorrect values of tcpm psy for pps supply
usb: typec: tcpm: update power supply once partner accepts
usb: xhci-mtk: remove or operator for setting schedule parameters
usb: xhci-mtk: improve bandwidth scheduling with TT
ASoC: samsung: tm2_wm5110: check of of_parse return value
ASoC: Intel: kbl_da7219_max98927: Fix kabylake_ssp_fixup function
ASoC: tlv320aic32x4: Register clocks before registering component
ASoC: tlv320aic32x4: Increase maximum register in regmap
MIPS: pci-mt7620: fix PLL lock check
MIPS: pci-rt2880: fix slot 0 configuration
FDDI: defxx: Bail out gracefully with unassigned PCI resource for CSR
PCI: Allow VPD access for QLogic ISP2722
KVM: x86: Defer the MMU unload to the normal path on an global INVPCID
PCI: xgene: Fix cfg resource mapping
PCI: keystone: Let AM65 use the pci_ops defined in pcie-designware-host.c
PM / devfreq: Unlock mutex and free devfreq struct in error path
soc/tegra: regulators: Fix locking up when voltage-spread is out of range
iio: inv_mpu6050: Fully validate gyro and accel scale writes
iio:accel:adis16201: Fix wrong axis assignment that prevents loading
iio:adc:ad7476: Fix remove handling
sc16is7xx: Defer probe if device read fails
phy: cadence: Sierra: Fix PHY power_on sequence
misc: lis3lv02d: Fix false-positive WARN on various HP models
phy: ti: j721e-wiz: Invoke wiz_init() before of_platform_device_create()
misc: vmw_vmci: explicitly initialize vmci_notify_bm_set_msg struct
misc: vmw_vmci: explicitly initialize vmci_datagram payload
selinux: add proper NULL termination to the secclass_map permissions
x86, sched: Treat Intel SNC topology as default, COD as exception
async_xor: increase src_offs when dropping destination page
md/bitmap: wait for external bitmap writes to complete during tear down
md-cluster: fix use-after-free issue when removing rdev
md: split mddev_find
md: factor out a mddev_find_locked helper from mddev_find
md: md_open returns -EBUSY when entering racing area
md: Fix missing unused status line of /proc/mdstat
mt76: mt7615: use ieee80211_free_txskb() in mt7615_tx_token_put()
ipw2x00: potential buffer overflow in libipw_wx_set_encodeext()
cfg80211: scan: drop entry from hidden_list on overflow
rtw88: Fix array overrun in rtw_get_tx_power_params()
mt76: fix potential DMA mapping leak
FDDI: defxx: Make MMIO the configuration default except for EISA
drm/i915/gvt: Fix virtual display setup for BXT/APL
drm/i915/gvt: Fix vfio_edid issue for BXT/APL
drm/qxl: use ttm bo priorities
drm/panfrost: Clear MMU irqs before handling the fault
drm/panfrost: Don't try to map pages that are already mapped
drm/radeon: fix copy of uninitialized variable back to userspace
drm/dp_mst: Revise broadcast msg lct & lcr
drm/dp_mst: Set CLEAR_PAYLOAD_ID_TABLE as broadcast
drm: bridge/panel: Cleanup connector on bridge detach
drm/amd/display: Reject non-zero src_y and src_x for video planes
drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
ALSA: hda/realtek: Re-order ALC882 Acer quirk table entries
ALSA: hda/realtek: Re-order ALC882 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC882 Clevo quirk table entries
ALSA: hda/realtek: Re-order ALC269 HP quirk table entries
ALSA: hda/realtek: Re-order ALC269 Acer quirk table entries
ALSA: hda/realtek: Re-order ALC269 Dell quirk table entries
ALSA: hda/realtek: Re-order ALC269 ASUS quirk table entries
ALSA: hda/realtek: Re-order ALC269 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC269 Lenovo quirk table entries
ALSA: hda/realtek: Re-order remaining ALC269 quirk table entries
ALSA: hda/realtek: Re-order ALC662 quirk table entries
ALSA: hda/realtek: Remove redundant entry for ALC861 Haier/Uniwill devices
ALSA: hda/realtek: ALC285 Thinkpad jack pin quirk is unreachable
ALSA: hda/realtek: Fix speaker amp on HP Envy AiO 32
KVM: s390: VSIE: correctly handle MVPG when in VSIE
KVM: s390: split kvm_s390_logical_to_effective
KVM: s390: fix guarded storage control register handling
s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility
KVM: s390: VSIE: fix MVPG handling for prefixing and MSO
KVM: s390: split kvm_s390_real_to_abs
KVM: s390: extend kvm_s390_shadow_fault to return entry pointer
KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit
KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loads
KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created
KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported
KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bit
KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit
KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read
KVM: Destroy I/O bus devices on unregister failure _after_ sync'ing SRCU
KVM: Stop looking for coalesced MMIO zones if the bus is destroyed
KVM: arm64: Fully zero the vcpu state on reset
KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read
Revert "drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit"
Revert "i3c master: fix missing destroy_workqueue() on error in i3c_master_register"
ovl: fix missing revert_creds() on error path
Revert "drm/qxl: do not run release if qxl failed to init"
usb: gadget: pch_udc: Revert
|
||
![]() |
94f1bdf01b |
sched/debug: Fix cgroup_path[] serialization
[ Upstream commit ad789f84c9a145f8a18744c0387cec22ec51651e ]
The handling of sysrq key can be activated by echoing the key to
/proc/sysrq-trigger or via the magic key sequence typed into a terminal
that is connected to the system in some way (serial, USB or other mean).
In the former case, the handling is done in a user context. In the
latter case, it is likely to be in an interrupt context.
Currently in print_cpu() of kernel/sched/debug.c, sched_debug_lock is
taken with interrupt disabled for the whole duration of the calls to
print_*_stats() and print_rq() which could last for the quite some time
if the information dump happens on the serial console.
If the system has many cpus and the sched_debug_lock is somehow busy
(e.g. parallel sysrq-t), the system may hit a hard lockup panic
depending on the actually serial console implementation of the
system.
The purpose of sched_debug_lock is to serialize the use of the global
cgroup_path[] buffer in print_cpu(). The rests of the printk calls don't
need serialization from sched_debug_lock.
Calling printk() with interrupt disabled can still be problematic if
multiple instances are running. Allocating a stack buffer of PATH_MAX
bytes is not feasible because of the limited size of the kernel stack.
The solution implemented in this patch is to allow only one caller at a
time to use the full size group_path[], while other simultaneous callers
will have to use shorter stack buffers with the possibility of path
name truncation. A "..." suffix will be printed if truncation may have
happened. The cgroup path name is provided for informational purpose
only, so occasional path name truncation should not be a big problem.
Fixes:
|
||
![]() |
b92945e2eb |
ANDROID: Sched: Add export symbols for sched features
Export symbols needed to implement vendor scheduler value-adds to modify sched features. Bug: 177050087 Change-Id: Ibe14d2019403be68b7ceeee47425b2473ccb51fe Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org> |
||
![]() |
2dd515921a |
ANDROID: Sched: Export sched_feat_keys symbol needed by vendor modules
Export sched_feat_keys to check Sched feature is enabled or not from vendor modules. Bug: 173559623 Change-Id: Id03483149f39bfc3e3a18ea56736a84d824a53f7 Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org> |
||
![]() |
8d4d9c7b43 |
sched/debug: Fix memory corruption caused by multiple small reads of flags
Reading /proc/sys/kernel/sched_domain/cpu*/domain0/flags mutliple times
with small reads causes oopses with slub corruption issues because the kfree is
free'ing an offset from a previous allocation. Fix this by adding in a new
pointer 'buf' for the allocation and kfree and use the temporary pointer tmp
to handle memory copies of the buf offsets.
Fixes:
|
||
![]() |
848785df48 |
sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
The last sd_flag_debug shuffle inadvertently moved its definition within
an #ifdef CONFIG_SYSCTL region. While CONFIG_SYSCTL is indeed required to
produce the sched domain ctl interface (which uses sd_flag_debug to output
flag names), it isn't required to run any assertion on the sched_domain
hierarchy itself.
Move the definition of sd_flag_debug to a CONFIG_SCHED_DEBUG region of
topology.c.
Now at long last we have:
- sd_flag_debug declared in include/linux/sched/topology.h iff
CONFIG_SCHED_DEBUG=y
- sd_flag_debug defined in kernel/sched/topology.c, conditioned by:
- CONFIG_SCHED_DEBUG, with an explicit #ifdef block
- CONFIG_SMP, as a requirement to compile topology.c
With this change, all symbols pertaining to SD flag metadata (with the
exception of __SD_FLAG_CNT) are now defined exclusively within topology.c
Fixes:
|
||
![]() |
8fca9494d4 |
sched/topology: Move sd_flag_debug out of linux/sched/topology.h
Defining an array in a header imported all over the place clearly is a daft
idea, that still didn't stop me from doing it.
Leave a declaration of sd_flag_debug in topology.h and move its definition
to sched/debug.c.
Fixes:
|
||
![]() |
5b9f8ff7b3 |
sched/debug: Output SD flag names rather than their values
Decoding the output of /proc/sys/kernel/sched_domain/cpu*/domain*/flags has always been somewhat annoying, as one needs to go fetch the bit -> name mapping from the source code itself. This encoding can be saved in a script somewhere, but that isn't safe from flags being added, removed or even shuffled around. What matters for debugging purposes is to get *which* flags are set in a given domain, their associated value is pretty much meaningless. Make the sd flags debug file output flag names. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20200817113003.20802-7-valentin.schneider@arm.com |
||
![]() |
126c2092e5 |
sched: Add rq::ttwu_pending
In preparation of removing rq->wake_list, replace the !list_empty(rq->wake_list) with rq->ttwu_pending. This is not fully equivalent as this new variable is racy. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161908.070399698@infradead.org |
||
![]() |
9013196a46 | Merge branch 'sched/urgent' | ||
![]() |
ad32bb41fc |
sched/debug: Fix requested task uclamp values shown in procfs
The intention of commit |
||
![]() |
9818427c62 |
sched/debug: Make sd->flags sysctl read-only
Writing to the sysctl of a sched_domain->flags directly updates the value of the field, and goes nowhere near update_top_cache_domain(). This means that the cached domain pointers can end up containing stale data (e.g. the domain pointed to doesn't have the relevant flag set anymore). Explicit domain walks that check for flags will be affected by the write, but this won't be in sync with the cached pointers which will still point to the domains that were cached at the last sched_domain build. In other words, writing to this interface is playing a dangerous game. It could be made to trigger an update of the cached sched_domain pointers when written to, but this does not seem to be worth the trouble. Make it read-only. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200415210512.805-3-valentin.schneider@arm.com |
||
![]() |
f080d93e1d |
sched/debug: Fix trival print_task() format
Ensure leave one space between state and task name. w/o patch: runnable tasks: S task PID tree-key switches prio wait Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200414125721.195801-1-xiexiuqi@huawei.com |
||
![]() |
96e74ebf8d |
sched/debug: Add task uclamp values to SCHED_DEBUG procfs
Requested and effective uclamp values can be a bit tricky to decipher when playing with cgroup hierarchies. Add them to a task's procfs when SCHED_DEBUG is enabled. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-4-valentin.schneider@arm.com |
||
![]() |
9e3bf9469c |
sched/debug: Factor out printing formats into common macros
The printing macros in debug.c keep redefining the same output format. Collect each output format in a single definition, and reuse that definition in the other macros. While at it, add a layer of parentheses and replace printf's with the newly introduced macros. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-3-valentin.schneider@arm.com |
||
![]() |
c745a6212c |
sched/debug: Remove redundant macro define
Most printing macros for procfs are defined globally in debug.c, and they are re-defined (to the exact same thing) within proc_sched_show_task(). Get rid of the duplicate defines. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-2-valentin.schneider@arm.com |
||
![]() |
9f68395333 |
sched/pelt: Add a new runnable average signal
Now that runnable_load_avg has been removed, we can replace it by a new signal that will highlight the runnable pressure on a cfs_rq. This signal track the waiting time of tasks on rq and can help to better define the state of rqs. At now, only util_avg is used to define the state of a rq: A rq with more that around 80% of utilization and more than 1 tasks is considered as overloaded. But the util_avg signal of a rq can become temporaly low after that a task migrated onto another rq which can bias the classification of the rq. When tasks compete for the same rq, their runnable average signal will be higher than util_avg as it will include the waiting time and we can use this signal to better classify cfs_rqs. The new runnable_avg will track the runnable time of a task which simply adds the waiting time to the running time. The runnable _avg of cfs_rq will be the /Sum of se's runnable_avg and the runnable_avg of group entity will follow the one of the rq similarly to util_avg. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-9-mgorman@techsingularity.net |
||
![]() |
0dacee1bfa |
sched/pelt: Remove unused runnable load average
Now that runnable_load_avg is no more used, we can remove it to make space for a new signal. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-8-mgorman@techsingularity.net |
||
![]() |
02d4ac5885 |
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
Lengthy output of sysrq-t may take a lot of time on slow serial console with lots of processes and CPUs. So we need to reset NMI-watchdog to avoid spurious lockup messages, and we also reset softlockup watchdogs on all other CPUs since another CPU might be blocked waiting for us to process an IPI or stop_machine. Add to sysrq_sched_debug_show() as what we did in show_state_filter(). Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20191226085224.48942-1-liwei391@huawei.com |
||
![]() |
d2abae71eb |
Merge tag 'v5.2-rc6' into sched/core, to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
d2912cb15b |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
0e1fef63d9 |
sched/core: Remove sd->*_idx
The sched domain per rq load index files also disappear from the /proc/sys/kernel/sched_domain/cpuX/domainY directories. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-6-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
55627e3cd2 |
sched/core: Remove rq->cpu_load[]
The per rq load array values also disappear from the cpu#X sections in /proc/sched_debug. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-5-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
3d8d535544 |
sched/debug: Remove sd->*_idx range on sysctl
This reverts:
commit
|
||
![]() |
f2bedc4705 |
sched/fair: Remove rq->load
The CFS class is the only one maintaining and using the CPU wide load (rq->load(.weight)). The last use case of the CPU wide load in CFS's set_next_entity() can be replaced by using the load of the CFS class (rq->cfs.load(.weight)) instead. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190424084556.604-1-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
ad2e379def |
sched/debug: Fix spelling mistake "logaritmic" -> "logarithmic"
Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20181128152350.13622-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
1ca4fa3ab6 |
sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
register_sched_domain_sysctl() copies the cpu_possible_mask into sd_sysctl_cpus, but only if sd_sysctl_cpus hasn't already been allocated (ie, CONFIG_CPUMASK_OFFSTACK is set). However, when CONFIG_CPUMASK_OFFSTACK is not set, sd_sysctl_cpus is left uninitialized (all zeroes) and the kernel may fail to initialize sched_domain sysctl entries for all possible CPUs. This is visible to the user if the kernel is booted with maxcpus=n, or if ACPI tables have been modified to leave CPUs offline, and then checking for missing /proc/sys/kernel/sched_domain/cpu* entries. Fix this by separating the allocation and initialization, and adding a flag to initialize the possible CPU entries while system booting only. Tested-by: Syuuichirou Ishii <ishii.shuuichir@jp.fujitsu.com> Tested-by: Tarumizu, Kohei <tarumizu.kohei@jp.fujitsu.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190129151245.5073-1-msys.mizuma@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
e9666d10a5 |
jump_label: move 'asm goto' support test to Kconfig
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this: #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) # define HAVE_JUMP_LABEL #endif We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO. Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Sedat Dilek <sedat.dilek@gmail.com> |
||
![]() |
1da1843f9f |
sched/core: Create task_has_idle_policy() helper
We already have task_has_rt_policy() and task_has_dl_policy() helpers, create task_has_idle_policy() as well and update sched core to start using it. While at it, use task_has_dl_policy() at one more place. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/ce3915d5b490fc81af926a3b6bfb775e7188e005.1541416894.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
e73e81975f |
sched/debug: Fix potential deadlock when writing to sched_features
The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Not tainted ------------------------------------------------------ sh/3358 is trying to acquire lock: 000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30 but task is already holding lock: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&sb->s_type->i_mutex_key#3){+.+.}: lock_acquire+0xb8/0x148 down_write+0xac/0x140 start_creating+0x5c/0x168 debugfs_create_dir+0x18/0x220 opp_debug_register+0x8c/0x120 _add_opp_dev+0x104/0x1f8 dev_pm_opp_get_opp_table+0x174/0x340 _of_add_opp_table_v2+0x110/0x760 dev_pm_opp_of_add_table+0x5c/0x240 dev_pm_opp_of_cpumask_add_table+0x5c/0x100 cpufreq_init+0x160/0x430 cpufreq_online+0x1cc/0xe30 cpufreq_add_dev+0x78/0x198 subsys_interface_register+0x168/0x270 cpufreq_register_driver+0x1c8/0x278 dt_cpufreq_probe+0xdc/0x1b8 platform_drv_probe+0xb4/0x168 driver_probe_device+0x318/0x4b0 __device_attach_driver+0xfc/0x1f0 bus_for_each_drv+0xf8/0x180 __device_attach+0x164/0x200 device_initial_probe+0x10/0x18 bus_probe_device+0x110/0x178 device_add+0x6d8/0x908 platform_device_add+0x138/0x3d8 platform_device_register_full+0x1cc/0x1f8 cpufreq_dt_platdev_init+0x174/0x1bc do_one_initcall+0xb8/0x310 kernel_init_freeable+0x4b8/0x56c kernel_init+0x10/0x138 ret_from_fork+0x10/0x18 -> #2 (opp_table_lock){+.+.}: lock_acquire+0xb8/0x148 __mutex_lock+0x104/0xf50 mutex_lock_nested+0x1c/0x28 _of_add_opp_table_v2+0xb4/0x760 dev_pm_opp_of_add_table+0x5c/0x240 dev_pm_opp_of_cpumask_add_table+0x5c/0x100 cpufreq_init+0x160/0x430 cpufreq_online+0x1cc/0xe30 cpufreq_add_dev+0x78/0x198 subsys_interface_register+0x168/0x270 cpufreq_register_driver+0x1c8/0x278 dt_cpufreq_probe+0xdc/0x1b8 platform_drv_probe+0xb4/0x168 driver_probe_device+0x318/0x4b0 __device_attach_driver+0xfc/0x1f0 bus_for_each_drv+0xf8/0x180 __device_attach+0x164/0x200 device_initial_probe+0x10/0x18 bus_probe_device+0x110/0x178 device_add+0x6d8/0x908 platform_device_add+0x138/0x3d8 platform_device_register_full+0x1cc/0x1f8 cpufreq_dt_platdev_init+0x174/0x1bc do_one_initcall+0xb8/0x310 kernel_init_freeable+0x4b8/0x56c kernel_init+0x10/0x138 ret_from_fork+0x10/0x18 -> #1 (subsys mutex#6){+.+.}: lock_acquire+0xb8/0x148 __mutex_lock+0x104/0xf50 mutex_lock_nested+0x1c/0x28 subsys_interface_register+0xd8/0x270 cpufreq_register_driver+0x1c8/0x278 dt_cpufreq_probe+0xdc/0x1b8 platform_drv_probe+0xb4/0x168 driver_probe_device+0x318/0x4b0 __device_attach_driver+0xfc/0x1f0 bus_for_each_drv+0xf8/0x180 __device_attach+0x164/0x200 device_initial_probe+0x10/0x18 bus_probe_device+0x110/0x178 device_add+0x6d8/0x908 platform_device_add+0x138/0x3d8 platform_device_register_full+0x1cc/0x1f8 cpufreq_dt_platdev_init+0x174/0x1bc do_one_initcall+0xb8/0x310 kernel_init_freeable+0x4b8/0x56c kernel_init+0x10/0x138 ret_from_fork+0x10/0x18 -> #0 (cpu_hotplug_lock.rw_sem){++++}: __lock_acquire+0x203c/0x21d0 lock_acquire+0xb8/0x148 cpus_read_lock+0x58/0x1c8 static_key_enable+0x14/0x30 sched_feat_write+0x314/0x428 full_proxy_write+0xa0/0x138 __vfs_write+0xd8/0x388 vfs_write+0xdc/0x318 ksys_write+0xb4/0x138 sys_write+0xc/0x18 __sys_trace_return+0x0/0x4 other info that might help us debug this: Chain exists of: cpu_hotplug_lock.rw_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#3); lock(opp_table_lock); lock(&sb->s_type->i_mutex_key#3); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** 2 locks held by sh/3358: #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318 #1: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428 stack backtrace: CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT) Call trace: dump_backtrace+0x0/0x288 show_stack+0x14/0x20 dump_stack+0x13c/0x1ac print_circular_bug.isra.10+0x270/0x438 check_prev_add.constprop.16+0x4dc/0xb98 __lock_acquire+0x203c/0x21d0 lock_acquire+0xb8/0x148 cpus_read_lock+0x58/0x1c8 static_key_enable+0x14/0x30 sched_feat_write+0x314/0x428 full_proxy_write+0xa0/0x138 __vfs_write+0xd8/0x388 vfs_write+0xdc/0x318 ksys_write+0xb4/0x138 sys_write+0xc/0x18 __sys_trace_return+0x0/0x4 This is because when loading the cpufreq_dt module we first acquire cpu_hotplug_lock.rw_sem lock, then in cpufreq_init(), we are taking the &sb->s_type->i_mutex_key lock. But when writing to /sys/kernel/debug/sched_features, the cpu_hotplug_lock.rw_sem lock depends on the &sb->s_type->i_mutex_key lock. To fix this bug, reverse the lock acquisition order when writing to sched_features, this way cpu_hotplug_lock.rw_sem no longer depends on &sb->s_type->i_mutex_key. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Jiada Wang <jiada_wang@mentor.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eugeniu Rosca <erosca@de.adit-jv.com> Cc: George G. Davis <george_davis@mentor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
13e091b6dd |
Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner: "Early TSC based time stamping to allow better boot time analysis. This comes with a general cleanup of the TSC calibration code which grew warts and duct taping over the years and removes 250 lines of code. Initiated and mostly implemented by Pavel with help from various folks" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/kvmclock: Mark kvm_get_preset_lpj() as __init x86/tsc: Consolidate init code sched/clock: Disable interrupts when calling generic_sched_clock_init() timekeeping: Prevent false warning when persistent clock is not available sched/clock: Close a hole in sched_clock_init() x86/tsc: Make use of tsc_calibrate_cpu_early() x86/tsc: Split native_calibrate_cpu() into early and late parts sched/clock: Use static key for sched_clock_running sched/clock: Enable sched clock early sched/clock: Move sched clock initialization and merge with generic clock x86/tsc: Use TSC as sched clock early x86/tsc: Initialize cyc2ns when tsc frequency is determined x86/tsc: Calibrate tsc only once ARM/time: Remove read_boot_clock64() s390/time: Remove read_boot_clock64() timekeeping: Default boot time offset to local_clock() timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset() s390/time: Add read_persistent_wall_and_boot_offset() x86/xen/time: Output xen sched_clock time from 0 x86/xen/time: Initialize pv xen time in init_hypervisor_platform() ... |
||
![]() |
67d9f6c256 |
sched/debug: Reverse the order of printing faults
Fix the order in which the private and shared numa faults are getting printed. No functional changes. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25215.7 25375.3 0.63 1 72107 72617 0.70 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-7-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
46457ea464 |
sched/clock: Use static key for sched_clock_running
sched_clock_running may be read every time sched_clock_cpu() is called. Yet, this variable is updated only twice during boot, and never changes again, therefore it is better to make it a static key. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-25-pasha.tatashin@oracle.com |
||
![]() |
8f894bf47d |
sched/debug: Use match_string() helper instead of open-coded logic
match_string() returns the index of an array for a matching string, which can be used instead of the open coded variant. Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1527765086-19873-15-git-send-email-xieyisheng1@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
fddda2b7b5 |
proc: introduce proc_create_seq{,_data}
Variants of proc_create{,_data} that directly take a struct seq_operations argument and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
![]() |
46e0d28bdb |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - NUMA balancing improvements (Mel Gorman) - Further load tracking improvements (Patrick Bellasi) - Various NOHZ balancing cleanups and optimizations (Peter Zijlstra) - Improve blocked load handling, in particular we can now reduce and eventually stop periodic load updates on 'very idle' CPUs. (Vincent Guittot) - On isolated CPUs offload the final 1Hz scheduler tick as well, plus related cleanups and reorganization. (Frederic Weisbecker) - Core scheduler code cleanups (Ingo Molnar)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) sched/core: Update preempt_notifier_key to modern API sched/cpufreq: Rate limits for SCHED_DEADLINE sched/fair: Update util_est only on util_avg updates sched/cpufreq/schedutil: Use util_est for OPP selection sched/fair: Use util_est in LB and WU paths sched/fair: Add util_est on top of PELT sched/core: Remove TASK_ALL sched/completions: Use bool in try_wait_for_completion() sched/fair: Update blocked load when newly idle sched/fair: Move idle_balance() sched/nohz: Merge CONFIG_NO_HZ_COMMON blocks sched/fair: Move rebalance_domains() sched/nohz: Optimize nohz_idle_balance() sched/fair: Reduce the periodic update duration sched/nohz: Stop NOHZ stats when decayed sched/cpufreq: Provide migration hint sched/nohz: Clean up nohz enter/exit sched/fair: Update blocked load from NEWIDLE sched/fair: Add NOHZ stats balancing sched/fair: Restructure nohz_balance_kick() ... |
||
![]() |
e9ca267096 |
sched/debug: Adjust newlines for better alignment
Scheduler debug stats include newlines that display out of alignment when prefixed by timestamps. For example, the dmesg utility: % echo t > /proc/sysrq-trigger % dmesg ... [ 83.124251] runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ----------------------------------------------------------------------------------------------------------- At the same time, some syslog utilities (like rsyslog by default) don't like the additional newlines control characters, saving lines like this to /var/log/messages: Mar 16 16:02:29 localhost kernel: #012runnable tasks:#012 S task PID tree-key ... ^^^^ ^^^^ Clean these up by moving newline characters to their own SEQ_printf invocation. This leaves the /proc/sched_debug unchanged, but brings the entire output into alignment when prefixed: % echo t > /proc/sysrq-trigger % dmesg ... [ 62.410368] runnable tasks: [ 62.410368] S task PID tree-key switches prio wait-time sum-exec sum-sleep [ 62.410369] ----------------------------------------------------------------------------------------------------------- [ 62.410369] I kworker/u12:0 5 1932.215593 332 120 0.000000 3.621252 0.000000 0 0 / and no escaped control characters from rsyslog in /var/log/messages: Mar 16 16:15:06 localhost kernel: runnable tasks: Mar 16 16:15:06 localhost kernel: S task PID tree-key ... Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
a8c024cd9b |
sched/debug: Fix per-task line continuation for console output
When the SEQ_printf() macro prints to the console, it runs a simple printk() without KERN_CONT "continued" line printing. The result of this is oddly wrapped task info, for example: % echo t > /proc/sysrq-trigger % dmesg ... runnable tasks: ... [ 29.608611] I [ 29.608613] rcu_sched 8 3252.013846 4087 120 [ 29.608614] 0.000000 29.090111 0.000000 [ 29.608615] 0 0 [ 29.608616] / Modify SEQ_printf to use pr_cont() for expected one-line results: % echo t > /proc/sysrq-trigger % dmesg ... runnable tasks: ... [ 106.716329] S cpuhp/5 37 2006.315026 14 120 0.000000 0.496893 0.000000 0 0 / Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
7f65ea42eb |
sched/fair: Add util_est on top of PELT
The util_avg signal computed by PELT is too variable for some use-cases. For example, a big task waking up after a long sleep period will have its utilization almost completely decayed. This introduces some latency before schedutil will be able to pick the best frequency to run a task. The same issue can affect task placement. Indeed, since the task utilization is already decayed at wakeup, when the task is enqueued in a CPU, this can result in a CPU running a big task as being temporarily represented as being almost empty. This leads to a race condition where other tasks can be potentially allocated on a CPU which just started to run a big task which slept for a relatively long period. Moreover, the PELT utilization of a task can be updated every [ms], thus making it a continuously changing value for certain longer running tasks. This means that the instantaneous PELT utilization of a RUNNING task is not really meaningful to properly support scheduler decisions. For all these reasons, a more stable signal can do a better job of representing the expected/estimated utilization of a task/cfs_rq. Such a signal can be easily created on top of PELT by still using it as an estimator which produces values to be aggregated on meaningful events. This patch adds a simple implementation of util_est, a new signal built on top of PELT's util_avg where: util_est(task) = max(task::util_avg, f(task::util_avg@dequeue)) This allows to remember how big a task has been reported by PELT in its previous activations via f(task::util_avg@dequeue), which is the new _task_util_est(struct task_struct*) function added by this patch. If a task should change its behavior and it runs longer in a new activation, after a certain time its util_est will just track the original PELT signal (i.e. task::util_avg). The estimated utilization of cfs_rq is defined only for root ones. That's because the only sensible consumer of this signal are the scheduler and schedutil when looking for the overall CPU utilization due to FAIR tasks. For this reason, the estimated utilization of a root cfs_rq is simply defined as: util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued) where: cfs_rq::util_est::enqueued = sum(_task_util_est(task)) for each RUNNABLE task on that root cfs_rq It's worth noting that the estimated utilization is tracked only for objects of interests, specifically: - Tasks: to better support tasks placement decisions - root cfs_rqs: to better support both tasks placement decisions as well as frequencies selection Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
325ea10c08 |
sched/headers: Simplify and clean up header usage in the scheduler
Do the following cleanups and simplifications: - sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again. - order the <linux/sched/*.h> headers alphabetically - add all <linux/sched/*.h> headers to kernel/sched/sched.h - remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h. Finally, make all scheduler .c files use a single common header: #include "sched.h" ... which now contains a union of the relied upon headers. This makes the various .c files easier to read and easier to handle. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
97fb7a0a89 |
sched: Clean up and harmonize the coding style of the scheduler code base
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all these details: - fix speling in comments, - use curly braces for multi-line statements, - remove unnecessary parentheses from integer literals, - capitalize consistently, - remove stray newlines, - add comments where necessary, - remove invalid/unnecessary comments, - align structure definitions and other data types vertically, - add missing newlines for increased readability, - fix vertical tabulation where it's misaligned, - harmonize preprocessor conditional block labeling and vertical alignment, - remove line-breaks where they uglify the code, - add newline after local variable definitions, No change in functionality: md5: 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
1ea6c46a23 |
sched/fair: Propagate an effective runnable_load_avg
The load balancer uses runnable_load_avg as load indicator. For !cgroup this is: runnable_load_avg = \Sum se->avg.load_avg ; where se->on_rq That is, a direct sum of all runnable tasks on that runqueue. As opposed to load_avg, which is a sum of all tasks on the runqueue, which includes a blocked component. However, in the cgroup case, this comes apart since the group entities are always runnable, even if most of their constituent entities are blocked. Therefore introduce a runnable_weight which for task entities is the same as the regular weight, but for group entities is a fraction of the entity weight and represents the runnable part of the group runqueue. Then propagate this load through the PELT hierarchy to arrive at an effective runnable load avgerage -- which we should not confuse with the canonical runnable load average. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
0e2d2aaaae |
sched/fair: Rewrite PELT migration propagation
When an entity migrates in (or out) of a runqueue, we need to add (or remove) its contribution from the entire PELT hierarchy, because even non-runnable entities are included in the load average sums. In order to do this we have some propagation logic that updates the PELT tree, however the way it 'propagates' the runnable (or load) change is (more or less): tg->weight * grq->avg.load_avg ge->avg.load_avg = ------------------------------ tg->load_avg But that is the expression for ge->weight, and per the definition of load_avg: ge->avg.load_avg := ge->weight * ge->avg.runnable_avg That destroys the runnable_avg (by setting it to 1) we wanted to propagate. Instead directly propagate runnable_sum. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
2a2f5d4e44 |
sched/fair: Rewrite cfs_rq->removed_*avg
Since on wakeup migration we don't hold the rq->lock for the old CPU we cannot update its state. Instead we add the removed 'load' to an atomic variable and have the next update on that CPU collect and process it. Currently we have 2 atomic variables; which already have the issue that they can be read out-of-sync. Also, two atomic ops on a single cacheline is already more expensive than an uncontended lock. Since we want to add more, convert the thing over to an explicit cacheline with a lock in. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
65d5dc47fe |
sched/debug: Remove unused variable
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
ec846ecd63 |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar: "Three CPU hotplug related fixes and a debugging improvement" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Add debugfs knob for "sched_debug" sched/core: WARN() when migrating to an offline CPU sched/fair: Plug hole between hotplug and active_load_balance() sched/fair: Avoid newidle balance for !active CPUs |
||
![]() |
9469eb01db |
sched/debug: Add debugfs knob for "sched_debug"
I'm forever late for editing my kernel cmdline, add a runtime knob to disable the "sched_debug" thing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170907150614.142924283@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
bfb068892d |
sched/fair: replace cfs_rq->rb_leftmost
... with the generic rbtree flavor instead. No changes in semantics whatsoever. Link: http://lkml.kernel.org/r/20170719014603.19029-8-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |