Commit Graph

14 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
ee965fe12d Merge branch 'android12-5.10' into branch 'android12-5.10-lts'
Sync up with android12-5.10 for the following commits:

fb39cdb9ea ANDROID: export reclaim_pages
1f8f6d59a2 ANDROID: vendor_hook: Add hook to not be stuck on rmap lock in kswapd or direct_reclaim
91bfc78bc0 ANDROID: Update symbol list for mtk
02df0b2661 ANDROID: GKI: rockchip: Add symbols for crypto
efdf581d14 ANDROID: GKI: rockchip: Add symbol pci_disable_link_state
504ce2d3a6 ANDROID: GKI: rockchip: Add symbols for sound
a6b6bc98b7 ANDROID: GKI: rockchip: Add symbols for video
f3a311b456 BACKPORT: f2fs: do not set compression bit if kernel doesn't support
b0988144b0 UPSTREAM: exfat: improve performance of exfat_free_cluster when using dirsync mount
00d3b8c0cc ANDROID: GKI: rockchip: Add symbols for drm dp
936f1e35d1 UPSTREAM: arm64: perf: Support new DT compatibles
ed931dc8ff UPSTREAM: arm64: perf: Simplify registration boilerplate
bb6c018ab6 UPSTREAM: arm64: perf: Support Denver and Carmel PMUs
d306fd9d47 UPSTREAM: arm64: perf: add support for Cortex-A78
09f78c3f7e ANDROID: GKI: rockchip: Update symbol for devfreq
e7ed66854e ANDROID: GKI: rockchip: Update symbols for drm
a3e70ff5bf ANDROID: GKI: Update symbols to symbol list
a09241c6dd UPSTREAM: ASoC: hdmi-codec: make hdmi_codec_controls static
9eda09e511 UPSTREAM: ASoC: hdmi-codec: Add a prepare hook
4ad97b395f UPSTREAM: ASoC: hdmi-codec: Add iec958 controls
c0c2f6962d UPSTREAM: ASoC: hdmi-codec: Rework to support more controls
4c6eb3db8a UPSTREAM: ALSA: iec958: Split status creation and fill
580d2e7c78 UPSTREAM: ALSA: doc: Clarify IEC958 controls iface
8b4bb1bca0 UPSTREAM: ASoC: hdmi-codec: remove unused spk_mask member
5a2c4a5d1e UPSTREAM: ASoC: hdmi-codec: remove useless initialization
49e502f0c0 UPSTREAM: ASoC: codec: hdmi-codec: Support IEC958 encoded PCM format
9bf69acb92 UPSTREAM: ASoC: hdmi-codec: Fix return value in hdmi_codec_set_jack()
056409c7dc UPSTREAM: ASoC: hdmi-codec: Add RX support
5e75deab3a UPSTREAM: ASoC: hdmi-codec: Get ELD in before reporting plugged event
d6207c39cb ANDROID: GKI: rockchip: Add symbols for display driver
1c3ed9d481 BACKPORT: KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
843d3cb41b BACKPORT: io_uring: always grab file table for deferred statx
784cc16aed BACKPORT: Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
2b377175a3 ANDROID: add two func in mm/memcontrol.c
e56f8712cf ANDROID: vendor_hooks: protect multi-mapcount pages in kernel
3f775b9367 ANDROID: vendor_hooks: account page-mapcount
1d2287f56e FROMGIT: io_uring: Use original task for req identity in io_identity_cow()
e0c9da25b2 FROMLIST: binder: fix UAF of ref->proc caused by race condition
12f4322442 ANDROID: vendor_hooks: Guard cgroup struct with CONFIG_CGROUPS
6532784c78 ANDROID: vendor_hooks: add hooks for remove_vm_area.
c9a70dd592 ANDROID: GKI: allow mm vendor hooks header inclusion from header files
039080d064 ANDROID: Update symbol list of mediatek
9e8dedef1e ANDROID: sched: add vendor hook to PELT multiplier
573c7f061d ANDROID: Guard hooks with their CONFIG_ options
14f646cca5 ANDROID: fix kernelci issue for allnoconfig builds
4442801a43 ANDROID: sched: Introducing PELT multiplier
b2e5773ea4 FROMGIT: binder: fix redefinition of seq_file attributes
9c2a5eef8f Merge tag 'android12-5.10.117_r00' into 'android12-5.10'
5fa1e1affc ANDROID: GKI: pcie: Fix the broken dw_pcie structure
51b3e17071 UPSTREAM: PCI: dwc: Support multiple ATU memory regions
a8d7f6518e ANDROID: oplus: Update the ABI xml and symbol list
4536de1b70 ANDROID: vendor_hooks: add hooks in __alloc_pages_slowpath
d63c961c9d ANDROID: GKI: Update symbols to symbol list
41cbbe08f9 FROMGIT: arm64: fix oops in concurrently setting insn_emulation sysctls
c301d142e8 FROMGIT: usb: dwc3: core: Do not perform GCTL_CORE_SOFTRESET during bootup
8b19ed264b ANDROID: vendor_hooks:vendor hook for mmput
242b11e574 ANDROID: vendor_hooks:vendor hook for pidfd_open
0e1cb27700 ANDROID: vendor_hook: Add hook in shmem_writepage()
8ee37d0bcd BACKPORT: iommu/dma: Fix race condition during iova_domain initialization
321bf845e1 FROMGIT: usb: dwc3: core: Deprecate GCTL.CORESOFTRESET
c5eb0edfde FROMGIT: usb: dwc3: gadget: Prevent repeat pullup()
8de633b735 FROMGIT: Binder: add TF_UPDATE_TXN to replace outdated txn
e8fce59434 BACKPORT: FROMGIT: cgroup: Use separate src/dst nodes when preloading css_sets for migration
f26c566455 UPSTREAM: usb: gadget: f_uac2: allow changing interface name via configfs
98fa7f7dfd UPSTREAM: usb: gadget: f_uac1: allow changing interface name via configfs
29172165ca UPSTREAM: usb: gadget: f_uac1: Add suspend callback
ff5468c71e UPSTREAM: usb: gadget: f_uac2: Add suspend callback
31e6d620c1 UPSTREAM: usb: gadget: u_audio: Add suspend call
17643c1fdd UPSTREAM: usb: gadget: u_audio: Rate ctl notifies about current srate (0=stopped)
308955e3a6 UPSTREAM: usb: gadget: f_uac1: Support multiple sampling rates
ae03eadb42 UPSTREAM: usb: gadget: f_uac2: Support multiple sampling rates
bedc53fae4 UPSTREAM: usb: gadget:audio: Replace deprecated macro S_IRUGO
37e0d5eddb UPSTREAM: usb: gadget: u_audio: Add capture/playback srate getter
3251bb3250 UPSTREAM: usb: gadget: u_audio: Move dynamic srate from params to rtd
530916be97 UPSTREAM: usb: gadget: u_audio: Support multiple sampling rates
7f496d5a99 UPSTREAM: docs: ABI: fixed formatting in configfs-usb-gadget-uac2
2500cb53e6 UPSTREAM: usb: gadget: u_audio: Subdevice 0 for capture ctls
c386f34bd4 UPSTREAM: usb: gadget: u_audio: fix calculations for small bInterval
f74e3e2fe4 UPSTREAM: docs: ABI: fixed req_number desc in UAC1
02949bae5c UPSTREAM: docs: ABI: added missing num_requests param to UAC2
e1377ac38f UPSTREAM: usb:gadget: f_uac1: fixed sync playback
4b7c8905c5 UPSTREAM: usb: gadget: u_audio.c: Adding Playback Pitch ctl for sync playback
e29d2b5178 UPSTREAM: ABI: configfs-usb-gadget-uac2: fix a broken table
ec313ae88d UPSTREAM: ABI: configfs-usb-gadget-uac1: fix a broken table
bf46bbe087 UPSTREAM: usb: gadget: f_uac1: fixing inconsistent indenting
b9c4cbbf7a UPSTREAM: docs: usb: fix malformed table
a380b466e0 UPSTREAM: usb: gadget: f_uac1: add volume and mute support
e2c0816af2 BACKPORT: usb: gadget: f_uac2: add volume and mute support
8430eb0243 UPSTREAM: usb: gadget: u_audio: add bi-directional volume and mute support
257d21b184 UPSTREAM: usb: audio-v2: add ability to define feature unit descriptor
1002747429 ANDROID: mm: shmem: use reclaim_pages() to reclaim pages from a list
6719763187 UPSTREAM: usb: gadget: f_uac1: disable IN/OUT ep if unused

And add the new symbols now being tracked due to ABI additions from the
android12-5.10 branch:

Leaf changes summary: 85 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 69 Added functions
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 16 Added variables

69 Added functions:

  [A] 'function void __dev_kfree_skb_irq(sk_buff*, skb_free_reason)'
  [A] 'function int __page_mapcount(page*)'
  [A] 'function int __traceiter_android_vh_add_page_to_lrulist(void*, page*, bool, lru_list)'
  [A] 'function int __traceiter_android_vh_alloc_pages_slowpath_begin(void*, gfp_t, unsigned int, unsigned long int*)'
  [A] 'function int __traceiter_android_vh_alloc_pages_slowpath_end(void*, gfp_t, unsigned int, unsigned long int)'
  [A] 'function int __traceiter_android_vh_del_page_from_lrulist(void*, page*, bool, lru_list)'
  [A] 'function int __traceiter_android_vh_do_traversal_lruvec(void*, lruvec*)'
  [A] 'function int __traceiter_android_vh_mark_page_accessed(void*, page*)'
  [A] 'function int __traceiter_android_vh_mutex_unlock_slowpath_end(void*, mutex*, task_struct*)'
  [A] 'function int __traceiter_android_vh_page_should_be_protected(void*, page*, bool*)'
  [A] 'function int __traceiter_android_vh_rwsem_mark_wake_readers(void*, rw_semaphore*, rwsem_waiter*)'
  [A] 'function int __traceiter_android_vh_rwsem_set_owner(void*, rw_semaphore*)'
  [A] 'function int __traceiter_android_vh_rwsem_set_reader_owned(void*, rw_semaphore*)'
  [A] 'function int __traceiter_android_vh_rwsem_up_read_end(void*, rw_semaphore*)'
  [A] 'function int __traceiter_android_vh_rwsem_up_write_end(void*, rw_semaphore*)'
  [A] 'function int __traceiter_android_vh_sched_pelt_multiplier(void*, unsigned int, unsigned int, int*)'
  [A] 'function int __traceiter_android_vh_show_mapcount_pages(void*, void*)'
  [A] 'function int __traceiter_android_vh_update_page_mapcount(void*, page*, bool, bool, bool*, bool*)'
  [A] 'function int __v4l2_ctrl_handler_setup(v4l2_ctrl_handler*)'
  [A] 'function int crypto_ahash_final(ahash_request*)'
  [A] 'function crypto_akcipher* crypto_alloc_akcipher(const char*, u32, u32)'
  [A] 'function int crypto_register_akcipher(akcipher_alg*)'
  [A] 'function void crypto_unregister_akcipher(akcipher_alg*)'
  [A] 'function int des_expand_key(des_ctx*, const u8*, unsigned int)'
  [A] 'function void dev_pm_opp_unregister_set_opp_helper(opp_table*)'
  [A] 'function net_device* devm_alloc_etherdev_mqs(device*, int, unsigned int, unsigned int)'
  [A] 'function mii_bus* devm_mdiobus_alloc_size(device*, int)'
  [A] 'function int devm_of_mdiobus_register(device*, mii_bus*, device_node*)'
  [A] 'function int devm_register_netdev(device*, net_device*)'
  [A] 'function bool disable_hardirq(unsigned int)'
  [A] 'function void do_traversal_all_lruvec()'
  [A] 'function drm_connector_status drm_bridge_detect(drm_bridge*)'
  [A] 'function edid* drm_bridge_get_edid(drm_bridge*, drm_connector*)'
  [A] 'function int drm_bridge_get_modes(drm_bridge*, drm_connector*)'
  [A] 'function int drm_dp_get_phy_test_pattern(drm_dp_aux*, drm_dp_phy_test_params*)'
  [A] 'function int drm_dp_read_desc(drm_dp_aux*, drm_dp_desc*, bool)'
  [A] 'function int drm_dp_read_dpcd_caps(drm_dp_aux*, u8*)'
  [A] 'function int drm_dp_read_sink_count(drm_dp_aux*)'
  [A] 'function int drm_dp_set_phy_test_pattern(drm_dp_aux*, drm_dp_phy_test_params*, u8)'
  [A] 'function uint64_t drm_format_info_min_pitch(const drm_format_info*, int, unsigned int)'
  [A] 'function int drm_mm_reserve_node(drm_mm*, drm_mm_node*)'
  [A] 'function bool drm_probe_ddc(i2c_adapter*)'
  [A] 'function void drm_self_refresh_helper_cleanup(drm_crtc*)'
  [A] 'function int drm_self_refresh_helper_init(drm_crtc*)'
  [A] 'function int get_pelt_halflife()'
  [A] 'function ssize_t hdmi_avi_infoframe_pack_only(const hdmi_avi_infoframe*, void*, size_t)'
  [A] 'function ssize_t iio_read_const_attr(device*, device_attribute*, char*)'
  [A] 'function bool mipi_dsi_packet_format_is_short(u8)'
  [A] 'function platform_device* of_device_alloc(device_node*, const char*, device*)'
  [A] 'function lruvec* page_to_lruvec(page*, pg_data_t*)'
  [A] 'function int pci_disable_link_state(pci_dev*, int)'
  [A] 'function int regmap_test_bits(regmap*, unsigned int, unsigned int)'
  [A] 'function unsigned int regulator_get_linear_step(regulator*)'
  [A] 'function int regulator_suspend_enable(regulator_dev*, suspend_state_t)'
  [A] 'function int rsa_parse_priv_key(rsa_key*, void*, unsigned int)'
  [A] 'function int rsa_parse_pub_key(rsa_key*, void*, unsigned int)'
  [A] 'function int sg_nents(scatterlist*)'
  [A] 'function int snd_pcm_create_iec958_consumer_default(u8*, size_t)'
  [A] 'function int snd_pcm_fill_iec958_consumer(snd_pcm_runtime*, u8*, size_t)'
  [A] 'function int snd_pcm_fill_iec958_consumer_hw_params(snd_pcm_hw_params*, u8*, size_t)'
  [A] 'function int snd_soc_dapm_force_bias_level(snd_soc_dapm_context*, snd_soc_bias_level)'
  [A] 'function int snd_soc_jack_add_zones(snd_soc_jack*, int, snd_soc_jack_zone*)'
  [A] 'function int snd_soc_jack_get_type(snd_soc_jack*, int)'
  [A] 'function void tcpm_tcpc_reset(tcpm_port*)'
  [A] 'function int v4l2_enum_dv_timings_cap(v4l2_enum_dv_timings*, const v4l2_dv_timings_cap*, v4l2_check_dv_timings_fnc*, void*)'
  [A] 'function void v4l2_print_dv_timings(const char*, const char*, const v4l2_dv_timings*, bool)'
  [A] 'function int v4l2_src_change_event_subdev_subscribe(v4l2_subdev*, v4l2_fh*, v4l2_event_subscription*)'
  [A] 'function void v4l2_subdev_notify_event(v4l2_subdev*, const v4l2_event*)'
  [A] 'function bool v4l2_valid_dv_timings(const v4l2_dv_timings*, const v4l2_dv_timings_cap*, v4l2_check_dv_timings_fnc*, void*)'

16 Added variables:

  [A] 'tracepoint __tracepoint_android_vh_add_page_to_lrulist'
  [A] 'tracepoint __tracepoint_android_vh_alloc_pages_slowpath_begin'
  [A] 'tracepoint __tracepoint_android_vh_alloc_pages_slowpath_end'
  [A] 'tracepoint __tracepoint_android_vh_del_page_from_lrulist'
  [A] 'tracepoint __tracepoint_android_vh_do_traversal_lruvec'
  [A] 'tracepoint __tracepoint_android_vh_mark_page_accessed'
  [A] 'tracepoint __tracepoint_android_vh_mutex_unlock_slowpath_end'
  [A] 'tracepoint __tracepoint_android_vh_page_should_be_protected'
  [A] 'tracepoint __tracepoint_android_vh_rwsem_mark_wake_readers'
  [A] 'tracepoint __tracepoint_android_vh_rwsem_set_owner'
  [A] 'tracepoint __tracepoint_android_vh_rwsem_set_reader_owned'
  [A] 'tracepoint __tracepoint_android_vh_rwsem_up_read_end'
  [A] 'tracepoint __tracepoint_android_vh_rwsem_up_write_end'
  [A] 'tracepoint __tracepoint_android_vh_sched_pelt_multiplier'
  [A] 'tracepoint __tracepoint_android_vh_show_mapcount_pages'
  [A] 'tracepoint __tracepoint_android_vh_update_page_mapcount'
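
For readers unfamiliar with these symbols: each added __traceiter_android_vh_*
function and __tracepoint_android_vh_* variable pair corresponds to an ACK
vendor hook that an out-of-tree vendor module can attach a handler to. A rough
sketch of the usual registration pattern follows (illustrative only; the
header path and the handler body are assumptions, not part of this merge):

  #include <linux/module.h>
  #include <trace/hooks/mm.h>

  /* handler signature mirrors __traceiter_android_vh_mark_page_accessed */
  static void vh_mark_page_accessed(void *data, struct page *page)
  {
          /* vendor policy would run here, e.g. working-set accounting */
  }

  static int __init vh_demo_init(void)
  {
          /* attach to the hook; returns 0 on success */
          return register_trace_android_vh_mark_page_accessed(
                          vh_mark_page_accessed, NULL);
  }

  static void __exit vh_demo_exit(void)
  {
          unregister_trace_android_vh_mark_page_accessed(
                          vh_mark_page_accessed, NULL);
  }

  module_init(vh_demo_init);
  module_exit(vh_demo_exit);
  MODULE_LICENSE("GPL");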

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I47eefe85b949d3f358da95a9b6553660b9be0791
2022-08-16 14:34:54 +02:00
JianMin Liu
4442801a43 ANDROID: sched: Introducing PELT multiplier
The new sysctl sched_pelt_multiplier allows a user to set a clock
multiplier of x2 or x4 (x1 being the default). This clock multiplier
artificially speeds up PELT ramp up/down, much like a shorter half-life
would. Indeed, if we write PELT as a first-order filter:

  y(t) = G * (1 - exp(-t/tau))

then we can see that multiplying the time by a constant X is the same as
dividing the time constant tau by X:

  y(t) = G * (1 - exp(-(t*X)/tau))
  y(t) = G * (1 - exp(-t/(tau/X)))

Tau being half-life/ln(2), multiplying the PELT time is the same as
dividing the half-life:

  - x1: 32ms half-life
  - x2: 16ms half-life
  - x4: 8ms  half-life

Internally, a new clock is created: rq->clock_task_mult. It sits in the
clock hierarchy between rq->clock_task and rq->clock_pelt.
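
As a quick numeric check of the half-life relation above (a user-space
sketch, not kernel code; the constants are the documented PELT defaults):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
          const double G = 1024.0;          /* SCHED_CAPACITY_SCALE */
          const double half_life_ms = 32.0; /* default PELT half-life */
          const double tau = half_life_ms / log(2.0);

          for (int mult = 1; mult <= 4; mult *= 2) {
                  /* effective half-life under an xN multiplier, and the
                   * filter value reached after 32ms of wall time */
                  printf("x%d: half-life %.0fms, y(32ms) = %.0f\n",
                         mult, half_life_ms / mult,
                         G * (1.0 - exp(-(32.0 * mult) / tau)));
          }
          return 0;
  }

It prints 512, 768 and 960 for x1, x2 and x4: the same wall time ramps the
signal exactly as if the half-life had been divided.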

Bug: 177593580
Bug: 237219700
Change-Id: I67e6ca7994bebea22bf75732ee11d2b10e0d6b7e
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com>
2022-07-27 21:20:26 +00:00
Chengming Zhou
147a376c1a sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq
[ Upstream commit 64eaf50731ac0a8c76ce2fedd50ef6652aabc5ff ]

Since commit 2312729688 ("sched/fair: Update scale invariance of PELT")
changed the code to use rq_clock_pelt() instead of rq_clock_task(), we
should also use rq_clock_pelt() for the throttled_clock_task_time and
throttled_clock_task accounting, to get a correct cfs_rq_clock_pelt() for
a throttled cfs_rq. Also rename throttled_clock_task(_time) to use
clock_pelt rather than clock_task.
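
A simplified user-space analogue of the accounting idea (field and function
names are illustrative stand-ins for the cfs_rq fields; the kernel patch
itself renames throttled_clock_task(_time) and moves them to the PELT clock):

  #include <stdint.h>
  #include <stdio.h>

  struct cfs_rq_sim {
          uint64_t clock_pelt;           /* PELT clock of the rq */
          uint64_t throttled_clock_pelt; /* snapshot at throttle start */
          uint64_t throttled_time;       /* accumulated throttled time */
  };

  /* throttled time must be measured on the same clock it is later
   * subtracted from, otherwise the cfs_rq's PELT clock drifts */
  static uint64_t cfs_rq_clock_pelt(const struct cfs_rq_sim *rq)
  {
          return rq->clock_pelt - rq->throttled_time;
  }

  int main(void)
  {
          struct cfs_rq_sim rq = { 0 };

          rq.clock_pelt = 100;                     /* ran for 100 units */
          rq.throttled_clock_pelt = rq.clock_pelt; /* throttle starts */
          rq.clock_pelt = 150;                     /* 50 throttled units */
          rq.throttled_time += rq.clock_pelt - rq.throttled_clock_pelt;

          /* the PELT clock excludes the throttled span: prints 100 */
          printf("%llu\n", (unsigned long long)cfs_rq_clock_pelt(&rq));
          return 0;
  }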

Fixes: 2312729688 ("sched/fair: Update scale invariance of PELT")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20220408115309.81603-1-zhouchengming@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09 10:21:02 +02:00
Vincent Guittot
57b2f3632b sched/pelt: Relax the sync of util_sum with util_avg
[ Upstream commit 98b0d890220d45418cfbc5157b3382e6da5a12ab ]

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so, util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution that has not yet been
reflected in util_avg.

Instead of always setting util_sum to the lower bound of util_avg, which
can significantly lower the utilization of the root cfs_rq after
propagating the change down into the hierarchy, we revert the change of
util_sum and propagate the difference.

In addition, we also check that the cfs_rq's util_sum always stays above
the lower bound for a given util_avg, as it has been observed that a
sched_entity's util_sum is sometimes above the cfs_rq's.
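
As a rough numeric illustration of the rounding loss (plain user-space
arithmetic; the divider value is illustrative, close to LOAD_AVG_MAX):

  #include <stdio.h>

  int main(void)
  {
          const unsigned long divider = 47742; /* ~LOAD_AVG_MAX */
          unsigned long util_sum = 1000000;
          unsigned long util_avg = util_sum / divider; /* 20 */

          /* syncing util_sum to util_avg * divider rounds it down to
           * the lower bound of all sums mapping to that util_avg */
          unsigned long synced = util_avg * divider;

          printf("lost contribution: %lu\n", util_sum - synced); /* 45160 */
          return 0;
  }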

Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-01 17:25:45 +01:00
Dietmar Eggemann
190a7f9089 sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
commit 68d7a190682aa4eb02db477328088ebad15acc83 upstream.

The util_est internal UTIL_AVG_UNCHANGED flag, which is used to prevent
unnecessary util_est updates, uses the LSB of util_est.enqueued. It is
exposed via _task_util_est() (and task_util_est()).

Commit 92a801e5d5 ("sched/fair: Mask UTIL_AVG_UNCHANGED usages")
mentions that the LSB is lost for util_est resolution but
find_energy_efficient_cpu() checks if task_util_est() returns 0 to
return prev_cpu early.

_task_util_est() returns the max value of util_est.ewma and
util_est.enqueued or'ed w/ UTIL_AVG_UNCHANGED.
So task_util_est(), which returns the max of task_util() and
_task_util_est(), will never return 0 under the default
SCHED_FEAT(UTIL_EST, true).

To fix this use the MSB of util_est.enqueued instead and keep the flag
util_est internal, i.e. don't export it via _task_util_est().

The maximal possible util_avg value for a task is 1024 so the MSB of
'unsigned int util_est.enqueued' isn't used to store a util value.

As a caveat, the code behind the util_est_se trace point has to filter
out UTIL_AVG_UNCHANGED to see the real util_est.enqueued value, which
should be easy to do.

This also fixes an issue reported by Xuewen Yan, that util_est_update()
only used UTIL_AVG_UNCHANGED for the subtrahend of the equation:

  last_enqueued_diff = ue.enqueued - (task_util() | UTIL_AVG_UNCHANGED)
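
A minimal user-space sketch of the MSB scheme (0x80000000 matches the flag
value in the upstream fix; the rest is illustrative):

  #include <stdio.h>

  #define UTIL_AVG_UNCHANGED 0x80000000u /* MSB of a 32-bit enqueued */

  int main(void)
  {
          /* task util never exceeds 1024, so the MSB is free */
          unsigned int enqueued = 37 | UTIL_AVG_UNCHANGED;

          /* readers mask the flag out; the util value survives intact */
          printf("util = %u, unchanged = %d\n",
                 enqueued & ~UTIL_AVG_UNCHANGED,
                 !!(enqueued & UTIL_AVG_UNCHANGED));
          return 0;
  }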

Fixes: b89997aa88f0b ("sched/pelt: Fix task util_est update filtering")
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210602145808.1562603-1-dietmar.eggemann@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 12:01:46 +02:00
Vincent Guittot
87e867b426 sched/pelt: Cleanup PELT divider
Factorize in a single place the calculation of the divider used to
compute *_avg from *_sum values.
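
A user-space rendition of the factored-out helper (the constant and the
formula mirror the PELT code; this is a sketch, not the kernel source):

  #include <stdint.h>
  #include <stdio.h>

  #define LOAD_AVG_MAX 47742 /* PELT: maximum accumulated *_sum */

  struct sched_avg_sim {
          uint32_t period_contrib; /* elapsed part of the current window */
          uint64_t util_sum;
  };

  /* the one place computing the divider that turns a *_sum into a *_avg */
  static uint32_t get_pelt_divider(const struct sched_avg_sim *avg)
  {
          return LOAD_AVG_MAX - 1024 + avg->period_contrib;
  }

  int main(void)
  {
          struct sched_avg_sim avg = { .period_contrib = 512,
                                       .util_sum = 1000000 };

          printf("util_avg = %llu\n", (unsigned long long)
                 (avg.util_sum / get_pelt_divider(&avg)));
          return 0;
  }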

Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200612154703.23555-1-vincent.guittot@linaro.org
2020-06-15 14:10:06 +02:00
Thara Gopinath
765047932f sched/pelt: Add support to track thermal pressure
Building on the existing framework that tracks rt/dl utilization using
PELT signals, add a similar mechanism to track thermal pressure. The
difference from rt/dl utilization tracking is that, instead of tracking
the time a CPU spends running an RT/DL task through util_avg, the average
thermal pressure is tracked through load_avg. This is because the thermal
pressure signal is a time-weighted "delta" capacity, unlike util_avg,
which is binary. "Delta capacity" here means the delta between the actual
capacity of a CPU and its decreased capacity due to a thermal event.

In order to track average thermal pressure, a new sched_avg member,
avg_thermal, is introduced. The function update_thermal_load_avg() can be
called to do the periodic bookkeeping (accumulate, decay and average) of
the thermal pressure.
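
A numeric sketch of that bookkeeping (user-space math only; y is the PELT
decay factor for a 32-period half-life, and the capacity numbers are made
up):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
          const double y = exp(log(0.5) / 32.0); /* halves every 32 periods */
          const double max_cap = 1024.0, capped_cap = 768.0;
          double avg_thermal = 0.0;

          /* accumulate-and-decay the lost ("delta") capacity while a
           * thermal cap is in effect */
          for (int period = 0; period < 64; period++)
                  avg_thermal = avg_thermal * y
                              + (max_cap - capped_cap) * (1.0 - y);

          /* converges toward 256; prints ~192 after 64 periods */
          printf("avg thermal pressure ~= %.0f\n", avg_thermal);
          return 0;
  }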

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-2-thara.gopinath@linaro.org
2020-03-06 12:57:17 +01:00
Vincent Guittot
8ec59c0f5f sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is
unused since commit:

  765d0af19f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")

Remove it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: linux@armlinux.org.uk
Cc: quentin.perret@arm.com
Cc: rafael@kernel.org
Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24 19:23:39 +02:00
Vincent Guittot
2312729688 sched/fair: Update scale invariance of PELT
The current implementation of load tracking invariance scales the
contribution with the CPU's current frequency and (for utilization only)
uarch performance. One main result of this formula is that the figures
are capped by the current capacity of the CPU. Another is that load_avg
is not invariant, because it is not scaled with uarch.

The util_avg of a periodic task that runs r time slots every p time slots
varies in the range:

    U * (1-y^r)/(1-y^p) * y^i < Utilization < U * (1-y^r)/(1-y^p)

where U is the max util_avg value = SCHED_CAPACITY_SCALE

At a lower capacity, the range becomes:

    U * C * (1-y^r')/(1-y^p) * y^i' < Utilization <  U * C * (1-y^r')/(1-y^p)

where C reflects the compute capacity ratio between the current capacity
and the max capacity.

So C tries to compensate for changes in (1-y^r'), but it can't be accurate.

Instead of scaling the contribution value of the PELT algorithm, we should
scale the running time. The PELT signal aims to track the amount of
computation done by tasks and/or a rq, so it seems more correct to scale
the running time to reflect the effective amount of computation done since
the last update.

In order to be fully invariant, we need to apply the same amount of
running time and idle time whatever the current capacity. Because running
at lower capacity implies that the task will run longer, we have to ensure
that the same amount of idle time will be applied when the system becomes
idle and no idle time has been "stolen". But reaching the maximum
utilization value (SCHED_CAPACITY_SCALE) means that the task is seen as an
always-running task whatever the capacity of the CPU (even at max compute
capacity). In this case, we can discard this "stolen" idle time, which
becomes meaningless.
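
A toy user-space illustration of the time-scaling idea (all names and
numbers here are illustrative; the next paragraph describes the real
per-rq clock):

  #include <stdio.h>

  int main(void)
  {
          const unsigned long max_cap = 1024;
          unsigned long clock_task = 0, clock_pelt = 0;

          /* run 100us at half capacity: the scaled clock advances only
           * 50us, reflecting the effective computation done */
          unsigned long delta = 100, cap = 512;
          clock_task += delta;
          clock_pelt += delta * cap / max_cap;
          printf("running: task=%lu pelt=%lu\n", clock_task, clock_pelt);

          /* on idle, the scaled clock resyncs with the task clock so the
           * same total idle time is applied whatever the capacity */
          clock_pelt = clock_task;
          printf("idle:    task=%lu pelt=%lu\n", clock_task, clock_pelt);
          return 0;
  }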

In order to achieve this time scaling, a new clock_pelt is created per rq.
This clock's increase scales with the current capacity when something is
running on the rq, and it synchronizes with clock_task when the rq is
idle. With this mechanism, we ensure the same running and idle time
whatever the current capacity. This also enables us to simplify the PELT
algorithm by removing all references to uarch and frequency and applying
the same contribution to utilization and loads. Furthermore, the scaling
is done only once per clock update (update_rq_clock_task()) instead of
during each update of the rq's sched_entities and cfs/rt/dl_rq, as in the
current implementation. This is interesting when cgroups are involved, as
shown in the results below:

On a hikey (octa-core Arm64 platform), using the performance cpufreq
governor and only the shallowest c-state, to remove the variance generated
by those power features so that we only track the impact of the PELT
algorithm.

each test runs 16 times:

	./perf bench sched pipe
	(higher is better)
	kernel	tip/sched/core     + patch
	        ops/seconds        ops/seconds         diff
	cgroup
	root    59652(+/- 0.18%)   59876(+/- 0.24%)    +0.38%
	level1  55608(+/- 0.27%)   55923(+/- 0.24%)    +0.57%
	level2  52115(+/- 0.29%)   52564(+/- 0.22%)    +0.86%

	hackbench -l 1000
	(lower is better)
	kernel	tip/sched/core     + patch
	        duration(sec)      duration(sec)        diff
	cgroup
	root    4.453(+/- 2.37%)   4.383(+/- 2.88%)     -1.57%
	level1  4.859(+/- 8.50%)   4.830(+/- 7.07%)     -0.60%
	level2  5.063(+/- 9.83%)   4.928(+/- 9.66%)     -2.66%

With this new algorithm, the responsiveness of PELT is improved when the
CPU is not running at max capacity. I have put below some examples of the
duration needed to reach some typical load values, according to the
capacity of the CPU, with the current implementation and with this patch.
These values have been computed based on the geometric series and the
half-period value:

  Util (%)     max capacity  half capacity(mainline)  half capacity(w/ patch)
  972 (95%)    138ms         not reachable            276ms
  486 (47.5%)  30ms          138ms                     60ms
  256 (25%)    13ms           32ms                     26ms

On my hikey (octa-core Arm64 platform) with the schedutil governor, the
time to reach max OPP when starting from null utilization decreases from
223ms with the current scale invariance to 121ms with the new algorithm.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: patrick.bellasi@arm.com
Cc: pjt@google.com
Cc: pkondeti@codeaurora.org
Cc: quentin.perret@arm.com
Cc: rjw@rjwysocki.net
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Link: https://lkml.kernel.org/r/1548257214-13745-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 09:13:21 +01:00
Vincent Guittot
11d4afd4ff sched/pelt: Fix warning and clean up IRQ PELT config
Create a config option for enabling IRQ load tracking in the scheduler.
IRQ load tracking is useful only when IRQ or paravirtual time is
accounted, but that is only possible with SMP for now.

Also use __maybe_unused to remove the compilation warning in
update_rq_clock_task() that was introduced by:

  2e62c4743a ("sched/fair: Remove #ifdefs from scale_rt_capacity()")
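
For reference, a self-contained sketch of the __maybe_unused pattern (the
macro expansion matches the kernel's definition; the surrounding function
is made up):

  #include <stdio.h>

  #define __maybe_unused __attribute__((__unused__)) /* kernel definition */

  static unsigned long update_clock(unsigned long clock, unsigned long delta)
  {
          unsigned long __maybe_unused irq_delta = 0; /* only consumed when
                                                       * IRQ time accounting
                                                       * is compiled in */
  #ifdef HAVE_IRQ_TIME_ACCOUNTING
          irq_delta = delta / 10; /* pretend 10% was interrupt time */
          delta -= irq_delta;
  #endif
          return clock + delta;
  }

  int main(void)
  {
          printf("clock = %lu\n", update_clock(0, 100));
          return 0;
  }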

Suggested-by: Ingo Molnar <mingo@redhat.com>
Reported-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: dou_liyang@163.com
Fixes: 2e62c4743a ("sched/fair: Remove #ifdefs from scale_rt_capacity()")
Link: http://lkml.kernel.org/r/1537867062-27285-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 09:45:00 +02:00
Vincent Guittot
91c27493e7 sched/irq: Add IRQ utilization tracking
Interrupt and steal time are the only remaining activities tracked by
rt_avg. As for the sched classes, we can use PELT to track their average
utilization of the CPU. But unlike the sched classes, we don't track when
entering/leaving interrupt context; instead, we take into account the time
spent in interrupt context when we update the rqs' clock (rq_clock_task).
This also means that we have to decay the normal context time and account
for interrupt time during the update.

It's also important to note that because:

  rq_clock == rq_clock_task + interrupt time

and rq_clock_task is used by a sched class to compute its utilization, the
util_avg of a sched class only reflects the utilization of the time spent
in normal context, not of the whole time of the CPU. Tracking interrupt
utilization gives a more accurate picture of the CPU's utilization.

The CPU utilization is:

  avg_irq + (1 - avg_irq / max capacity) * /Sum avg_rq

Most of the time avg_irq is small and negligible, so the approximation
CPU utilization = /Sum avg_rq was enough.
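
Plugging illustrative numbers into that formula (plain user-space
arithmetic; the values are made up):

  #include <stdio.h>

  int main(void)
  {
          const double max_cap = 1024.0;
          double avg_irq = 64.0;     /* average time spent in IRQ context */
          double sum_avg_rq = 512.0; /* sum of the sched classes' util */

          /* rq_clock_task excludes IRQ time, so scale the class sum by
           * the fraction of time actually seen by normal context */
          double util = avg_irq + (1.0 - avg_irq / max_cap) * sum_avg_rq;

          printf("cpu utilization ~= %.0f\n", util); /* prints 544 */
          return 0;
  }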

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: claudio@evidence.eu.com
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: patrick.bellasi@arm.com
Cc: quentin.perret@arm.com
Cc: rjw@rjwysocki.net
Cc: valentin.schneider@arm.com
Cc: viresh.kumar@linaro.org
Link: http://lkml.kernel.org/r/1530200714-4504-7-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-15 23:51:21 +02:00
Vincent Guittot
3727e0e163 sched/dl: Add dl_rq utilization tracking
Similarly to what happens with RT tasks, CFS tasks can be preempted by DL
tasks, and CFS's utilization might then no longer describe the real
utilization level.

The current DL bandwidth reflects the requirements needed to meet
deadlines when tasks are enqueued, but not the current utilization of the
DL sched class. We track the DL class utilization to estimate the system
utilization.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: claudio@evidence.eu.com
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: patrick.bellasi@arm.com
Cc: quentin.perret@arm.com
Cc: rjw@rjwysocki.net
Cc: valentin.schneider@arm.com
Cc: viresh.kumar@linaro.org
Link: http://lkml.kernel.org/r/1530200714-4504-5-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-15 23:51:20 +02:00
Vincent Guittot
371bf42732 sched/rt: Add rt_rq utilization tracking
The schedutil governor relies on the cfs_rq's util_avg to choose the OPP
when CFS tasks are running. When the CPU is overloaded by CFS and RT
tasks, CFS tasks are preempted by RT tasks, and in this case util_avg
reflects the remaining capacity, not what CFS wants to use. In such a
case, schedutil can select a lower OPP even though the CPU is overloaded.
In order to have a more accurate view of the CPU's utilization, we track
the utilization of RT tasks. Only util_avg is correctly tracked; load_avg
and runnable_load_avg are not, as they are useless for rt_rq.

rt_rq uses rq_clock_task and cfs_rq uses cfs_rq_clock_task, but they are
the same at the root group level, so the PELT windows of the util_sum are
aligned.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: claudio@evidence.eu.com
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: patrick.bellasi@arm.com
Cc: quentin.perret@arm.com
Cc: rjw@rjwysocki.net
Cc: valentin.schneider@arm.com
Cc: viresh.kumar@linaro.org
Link: http://lkml.kernel.org/r/1530200714-4504-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-15 23:51:20 +02:00
Vincent Guittot
c079629862 sched/pelt: Move PELT related code in a dedicated file
We want to track rt_rq's utilization as a part of the estimation of the
whole rq's utilization. This is necessary because rt tasks can steal
utilization from cfs tasks and make them look lighter than they are.
As we want to use the same load tracking mechanism for both and prevent
useless dependencies between cfs and rt code, the PELT code is moved into
a dedicated file.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: claudio@evidence.eu.com
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: patrick.bellasi@arm.com
Cc: quentin.perret@arm.com
Cc: rjw@rjwysocki.net
Cc: valentin.schneider@arm.com
Cc: viresh.kumar@linaro.org
Link: http://lkml.kernel.org/r/1530200714-4504-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-15 23:51:20 +02:00