android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Cheng Jian	60588bfa22	sched/fair: Optimize select_idle_cpu select_idle_cpu() will scan the LLC domain for idle CPUs, it's always expensive. so the next commit : `1ad3aaf3fc` ("sched/core: Implement new approach to scale select_idle_cpu()") introduces a way to limit how many CPUs we scan. But it consume some CPUs out of 'nr' that are not allowed for the task and thus waste our attempts. The function always return nr_cpumask_bits, and we can't find a CPU which our task is allowed to run. Cpumask may be too big, similar to select_idle_core(), use per_cpu_ptr 'select_idle_mask' to prevent stack overflow. Fixes: `1ad3aaf3fc` ("sched/core: Implement new approach to scale select_idle_cpu()") Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20191213024530.28052-1-cj.chengjian@huawei.com	2019-12-17 13:32:51 +01:00
Frederic Weisbecker	7c2e8bbd87	sched: Spare resched IPI when prio changes on a single fair task The runqueue of a fair task being remotely reniced is going to get a resched IPI in order to reassess which task should be the current running on the CPU. However that evaluation is useless if the fair task is running alone, in which case we can spare that IPI, preventing nohz_full CPUs from being disturbed. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20191203160106.18806-2-frederic@kernel.org	2019-12-17 13:32:50 +01:00
Vincent Guittot	6cf82d559e	sched/cfs: fix spurious active migration The load balance can fail to find a suitable task during the periodic check because the imbalance is smaller than half of the load of the waiting tasks. This results in the increase of the number of failed load balance, which can end up to start an active migration. This active migration is useless because the current running task is not a better choice than the waiting ones. In fact, the current task was probably not running but waiting for the CPU during one of the previous attempts and it had already not been selected. When load balance fails too many times to migrate a task, we should relax the contraint on the maximum load of the tasks that can be migrated similarly to what is done with cache hotness. Before the rework, load balance used to set the imbalance to the average load_per_task in order to mitigate such situation. This increased the likelihood of migrating a task but also of selecting a larger task than needed while more appropriate ones were in the list. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1575036287-6052-1-git-send-email-vincent.guittot@linaro.org	2019-12-17 13:32:48 +01:00
Vincent Guittot	7ed735c331	sched/fair: Fix find_idlest_group() to handle CPU affinity Because of CPU affinity, the local group can be skipped which breaks the assumption that statistics are always collected for local group. With uninitialized local_sgs, the comparison is meaningless and the behavior unpredictable. This can even end up to use local pointer which is to NULL in this case. If the local group has been skipped because of CPU affinity, we return the idlest group. Fixes: `57abff067a` ("sched/fair: Rework find_idlest_group()") Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: John Stultz <john.stultz@linaro.org> Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: mingo@redhat.com Cc: mgorman@suse.de Cc: juri.lelli@redhat.com Cc: dietmar.eggemann@arm.com Cc: bsegall@google.com Cc: qais.yousef@arm.com Link: https://lkml.kernel.org/r/1575483700-22153-1-git-send-email-vincent.guittot@linaro.org	2019-12-17 13:32:48 +01:00
Quentin Perret	76e19fdf8e	ANDROID: sched/fair: Fix 5.5-rc1 merge Some android-specific EAS patches caused issues during the 5.5-rc1 merge that contains the load-balance rework. Most of the issues these patches were fixing should now covered by the rework, which means they are not needed any more. And for the others, they need to be reworked to fit with the new LB code. Fix the 5.5-rc1 merge by removing all the android-specific code from the load balancer. Effectively, this reverts `65b8ddef4e` ("ANDROID: sched: Prevent unnecessary active balance of single task in sched group"), `f351885fc7` ("ANDROID: sched/fair: Attempt to improve throughput for asym cap systems"), `a7455f8123` ("ANDROID: sched/fair: Don't balance misfits if it would overload local group") and `44ab1611a8` ("ANDROID: sched/fair: Also do misfit in overloaded groups"). Fixes: `d3a196a371` ("Merge 5.5-rc1 into android-mainline") Change-Id: I15c2e044d9f571be5eac0ced929690d9bf7dec65 Signed-off-by: Quentin Perret <qperret@google.com>	2019-12-09 17:01:44 +00:00
Greg Kroah-Hartman	d3a196a371	Merge 5.5-rc1 into android-mainline Linux 5.5-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237	2019-12-09 12:12:00 +01:00
Vincent Guittot	bef69dd878	sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util() update_cfs_rq_load_avg() calls cfs_rq_util_change() every time PELT decays, which might be inefficient when the cpufreq driver has rate limitation. When a task is attached on a CPU, we have this call path: update_load_avg() update_cfs_rq_load_avg() cfs_rq_util_change -- > trig frequency update attach_entity_load_avg() cfs_rq_util_change -- > trig frequency update The 1st frequency update will not take into account the utilization of the newly attached task and the 2nd one might be discarded because of rate limitation of the cpufreq driver. update_cfs_rq_load_avg() is only called by update_blocked_averages() and update_load_avg() so we can move the call to cfs_rq_util_change/cpufreq_update_util() into these two functions. It's also interesting to note that update_load_avg() already calls cfs_rq_util_change() directly for the !SMP case. This change will also ensure that cpufreq_update_util() is called even when there is no more CFS rq in the leaf_cfs_rq_list to update, but only IRQ, RT or DL PELT signals. [ mingo: Minor updates. ] Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@redhat.com Cc: linux-pm@vger.kernel.org Cc: mgorman@suse.de Cc: rostedt@goodmis.org Cc: sargun@sargun.me Cc: srinivas.pandruvada@linux.intel.com Cc: tj@kernel.org Cc: xiexiuqi@huawei.com Cc: xiezhipeng1@huawei.com Fixes: `039ae8bcf7` ("sched/fair: Fix O(nr_cgroups) in the load balancing path") Link: https://lkml.kernel.org/r/1574083279-799-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-18 14:42:26 +01:00
Ingo Molnar	b21feab0b8	Merge tag 'v5.4-rc8' into sched/core, to pick up fixes and dependencies Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-18 14:41:02 +01:00
Vincent Guittot	a9723389cc	sched/fair: Add comments for group_type and balancing at SD_NUMA level Add comments to describe each state of goup_type and to add some details about the load balance at NUMA level. [ Valentin Schneider: Updates to the comments. ] [ mingo: Other updates to the comments. ] Reported-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1573570243-1903-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-18 14:33:12 +01:00
Vincent Guittot	3318544b72	sched/fair: Fix rework of find_idlest_group() The task, for which the scheduler looks for the idlest group of CPUs, must be discounted from all statistics in order to get a fair comparison between groups. This includes utilization, load, nr_running and idle_cpus. Such unfairness can be easily highlighted with the unixbench execl 1 task. This test continuously call execve() and the scheduler looks for the idlest group/CPU on which it should place the task. Because the task runs on the local group/CPU, the latter seems already busy even if there is nothing else running on it. As a result, the scheduler will always select another group/CPU than the local one. This recovers most of the performance regression on my system from the recent load-balancer rewrite. [ mingo: Minor cleanups. ] Reported-by: kernel test robot <rong.a.chen@intel.com> Tested-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Fixes: `57abff067a` ("sched/fair: Rework find_idlest_group()") Link: https://lkml.kernel.org/r/1571762798-25900-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-18 14:11:56 +01:00
Greg Kroah-Hartman	ad5859c6ae	Merge 5.4-rc8 into android-mainline Linux 5.4-rc8 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I1f55e5d34dc78ddb064910ce1e1b7a7b5b39aaba	2019-11-18 08:31:11 +01:00
Vincent Guittot	b90f7c9d21	sched/pelt: Fix update of blocked PELT ordering update_cfs_rq_load_avg() can call cpufreq_update_util() to trigger an update of the frequency. Make sure that RT, DL and IRQ PELT signals have been updated before calling cpufreq. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: dsmythies@telus.net Cc: juri.lelli@redhat.com Cc: mgorman@suse.de Cc: rostedt@goodmis.org Fixes: `371bf42732` ("sched/rt: Add rt_rq utilization tracking") Fixes: `3727e0e163` ("sched/dl: Add dl_rq utilization tracking") Fixes: `91c27493e7` ("sched/irq: Add IRQ utilization tracking") Link: https://lkml.kernel.org/r/1572434309-32512-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-13 08:01:31 +01:00
Peter Zijlstra	a0e813f26e	sched/core: Further clarify sched_class::set_next_task() It turns out there really is something special to the first set_next_task() invocation. In specific the 'change' pattern really should not cause balance callbacks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Fixes: `f95d4eaee6` ("sched/{rt,deadline}: Fix set_next_task vs pick_next_task") Link: https://lkml.kernel.org/r/20191108131909.775434698@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-11 08:35:21 +01:00
Peter Zijlstra	2eeb01a28c	sched/fair: Use mul_u32_u32() While reading the code I encountered another site where we should be using mul_u32_u32() because GCC just won't take a hint. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.717931380@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-11 08:35:20 +01:00
Peter Zijlstra	98c2f700ed	sched/core: Simplify sched_class::pick_next_task() Now that the indirect class call never uses the last two arguments of pick_next_task(), remove them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.660595546@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-11 08:35:20 +01:00
Peter Zijlstra	5d7d605642	sched/core: Optimize pick_next_task() Ever since we moved the sched_class definitions into their own files, the constant expression {fair,idle}_sched_class.pick_next_task() is not in fact a compile time constant anymore and results in an indirect call (barring LTO). Fix that by exposing pick_next_task_{fair,idle}() directly, this gets rid of the indirect call (and RETPOLINE) on the fast path. Also remove the unlikely() from the idle case, it is in fact /the/ way we select idle -- and that is a very common thing to do. Performance for will-it-scale/sched_yield improves by 2% (as reported by 0-day). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.603037345@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-11 08:35:19 +01:00
Peter Zijlstra	7277a34c6b	sched/fair: Better document newidle_balance() Whilst chasing the pick_next_task() race, there was some confusion about the newidle_balance() return values. Document them. [ mingo: Minor edits. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.488364308@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-11 08:35:18 +01:00
Ingo Molnar	6d5a763c30	Merge tag 'v5.4-rc7' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-11 08:34:59 +01:00
Greg Kroah-Hartman	682d8bf784	Merge tag 'v5.4-rc7' into android-mainline Linux 5.4-rc7 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I505207a0a6f68ccc3519d7f190d8faf25d9d479a	2019-11-11 06:10:55 +01:00
Greg Kroah-Hartman	8c36c8518c	Revert "Revert "sched: Rework pick_next_task() slow-path"" This reverts commit `9acb7c07b6`. The root problem has now been fixed in 5.4-rc7, so revert this to prevent merge issues. Bug: 142182814 Cc: Quentin Perret <qperret@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-11-11 06:09:50 +01:00
Peter Zijlstra	6e2df0581f	sched: Fix pick_next_task() vs 'change' pattern race Commit `67692435c4` ("sched: Rework pick_next_task() slow-path") inadvertly introduced a race because it changed a previously unexplored dependency between dropping the rq->lock and sched_class::put_prev_task(). The comments about dropping rq->lock, in for example newidle_balance(), only mentions the task being current and ->on_cpu being set. But when we look at the 'change' pattern (in for example sched_setnuma()): queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED / running = task_current(rq, p); / rq->curr == p / if (queued) dequeue_task(...); if (running) put_prev_task(...); / change task properties */ if (queued) enqueue_task(...); if (running) set_next_task(...); It becomes obvious that if we do this after put_prev_task() has already been called on @p, things go sideways. This is exactly what the commit in question allows to happen when it does: prev->sched_class->put_prev_task(rq, prev, rf); if (!rq->nr_running) newidle_balance(rq, rf); The newidle_balance() call will drop rq->lock after we've called put_prev_task() and that allows the above 'change' pattern to interleave and mess up the state. Furthermore, it turns out we lost the RT-pull when we put the last DL task. Fix both problems by extracting the balancing from put_prev_task() and doing a multi-class balance() pass before put_prev_task(). Fixes: `67692435c4` ("sched: Rework pick_next_task() slow-path") Reported-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Quentin Perret <qperret@google.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com>	2019-11-08 22:34:14 +01:00
Quentin Perret	9acb7c07b6	Revert "sched: Rework pick_next_task() slow-path" This reverts commit `67692435c4` temporarily in order to avoid the following splat: [ 222.515957] BUG: kernel NULL pointer dereference, address: 0000000000000040 [ 222.518871] #PF: supervisor read access in kernel mode [ 222.518871] #PF: error_code(0x0000) - not-present page [ 222.518871] PGD 800000007cdf7067 P4D 800000007cdf7067 PUD 7c868067 PMD 0 [ 222.518871] Oops: 0000 [#1] PREEMPT SMP PTI [ 222.518871] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.0-rc4-mainline-g6a9b34166c05 #1 [ 222.518871] Hardware name: ChromiumOS crosvm, BIOS 0 [ 222.518871] RIP: 0010:set_next_entity+0xf/0xf0 [ 222.518871] Code: 42 0b e5 85 31 c0 e8 90 1a fc ff 0f 0b eb 9b 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 53 48 89 f3 49 89 fe <83> 7e 40 00 74 3d 4c 89 f7 48 89 de e8 80 f9 ff ff 4c 8d 7b 18 4d [ 222.518871] RSP: 0018:ffffb95040073de8 EFLAGS: 00010046 [ 222.518871] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff85ec0c30 [ 222.518871] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ec44c728500 [ 222.518871] RBP: ffffb95040073e00 R08: 0000000000000000 R09: 0000000000000000 [ 222.518871] R10: 0000000000000000 R11: ffffffff84ef0a30 R12: 0000000000000000 [ 222.518871] R13: 0000000000000000 R14: ffff9ec44c728500 R15: ffffb95040073e60 [ 222.518871] FS: 0000000000000000(0000) GS:ffff9ec44c700000(0000) knlGS:0000000000000000 [ 222.518871] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.518871] CR2: 0000000000000040 CR3: 000000007c87a002 CR4: 00000000003606e0 [ 222.518871] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 222.518871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 222.518871] Call Trace: [ 222.518871] pick_next_task_fair+0xe8/0x410 [ 222.518871] ? rcu_note_context_switch+0x3ef/0x520 [ 222.518871] ? finish_task_switch+0x10d/0x250 [ 222.518871] __schedule+0x1f8/0x6e0 [ 222.518871] schedule_idle+0x27/0x40 [ 222.518871] do_idle+0x21c/0x240 [ 222.518871] cpu_startup_entry+0x25/0x30 [ 222.518871] start_secondary+0x18d/0x1b0 [ 222.518871] secondary_startup_64+0xa4/0xb0 [ 222.518871] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock_diag vsock can_raw can snd_intel8x0 snd_ac97_codec ac97_bus virtio_input virtio_pci virtio_mmio virtio_crypto mmc_block mmc_core dummy_cpufreq rtc_test dummy_hcd can_dev vcan virtio_net net_failover failover virt_wifi virtio_blk virtio_gpu ttm virtio_rng virtio_ring virtio 8250_of crypto_engine [ 222.518871] CR2: 0000000000000040 [ 222.518871] ---[ end trace ae741809965b22f2 ]--- [ 222.518871] RIP: 0010:set_next_entity+0xf/0xf0 [ 222.518871] Code: 42 0b e5 85 31 c0 e8 90 1a fc ff 0f 0b eb 9b 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 53 48 89 f3 49 89 fe <83> 7e 40 00 74 3d 4c 89 f7 48 89 de e8 80 f9 ff ff 4c 8d 7b 18 4d [ 222.518871] RSP: 0018:ffffb95040073de8 EFLAGS: 00010046 [ 222.518871] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff85ec0c30 [ 222.518871] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ec44c728500 [ 222.518871] RBP: ffffb95040073e00 R08: 0000000000000000 R09: 0000000000000000 [ 222.518871] R10: 0000000000000000 R11: ffffffff84ef0a30 R12: 0000000000000000 [ 222.518871] R13: 0000000000000000 R14: ffff9ec44c728500 R15: ffffb95040073e60 [ 222.518871] FS: 0000000000000000(0000) GS:ffff9ec44c700000(0000) knlGS:0000000000000000 [ 222.518871] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.518871] CR2: 0000000000000040 CR3: 000000007c87a002 CR4: 00000000003606e0 [ 222.518871] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 222.518871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 222.518871] Kernel panic - not syncing: Fatal exception [ 222.518871] Kernel Offset: 0x3e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Bug: 142182814 Test: 20/20 avd/avd_boot_test_kernel_triggered passing Signed-off-by: Quentin Perret <qperret@google.com> Change-Id: I62bb09b4710fdc593cbf1f42bf4bbc066e401458	2019-10-29 17:29:04 +00:00
Patrick Bellasi	b8c9636140	sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases The estimated utilization for a task: util_est = max(util_avg, est.enqueue, est.ewma) is defined based on: - util_avg: the PELT defined utilization - est.enqueued: the util_avg at the end of the last activation - est.ewma: a exponential moving average on the est.enqueued samples According to this definition, when a task suddenly changes its bandwidth requirements from small to big, the EWMA will need to collect multiple samples before converging up to track the new big utilization. This slow convergence towards bigger utilization values is not aligned to the default scheduler behavior, which is to optimize for performance. Moreover, the est.ewma component fails to compensate for temporarely utilization drops which spans just few est.enqueued samples. To let util_est do a better job in the scenario depicted above, change its definition by making util_est directly follow upward motion and only decay the est.ewma on downward. Signed-off-by: Patrick Bellasi <patrick.bellasi@matbug.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <qperret@google.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191023205630.14469-1-patrick.bellasi@matbug.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-29 10:01:07 +01:00
Chris Redpath	06dc94f19d	ANDROID: sched: Honor sync flag for energy-aware wakeups Since we don't do energy-aware wakeups when we are overutilized, always honoring sync wakeups in this state does not prevent wake-wide mechanics overruling the flag as normal. Having the check for (cpu_rq(cpu)->nr_running == 1) in the test condition means that we only bypass the energy model for sync wakeups where the cpu is about to become idle. This matches the use of this flag in Binder which is the main place where sync wakeups come from in Android. This patch is based upon previous work to build EAS for android products. It replaces commit `79e3a4a27e` ("ANDROID: sched: Unconditionally honor sync flag for energy-aware wakeups") in what's considered to be 'EAS'. sync-hint code taken from wahoo https://android.googlesource.com/kernel/msm/+/4a5e890ec60d written by Juri Lelli <juri.lelli@arm.com> Change-Id: I4b3d79141fc8e53dc51cd63ac11096c2e3cb10f5 Signed-off-by: Chris Redpath <chris.redpath@arm.com> (cherry-picked from commit f1ec666a62de) [ Moved the feature to find_energy_efficient_cpu() and removed the sysctl knob ] Signed-off-by: Quentin Perret <quentin.perret@arm.com> [ Add 'cpu_rq(cpu)->nr_running == 1' to the condition and remove the word 'Unconditionally' from the patch name ] Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>	2019-10-22 09:04:43 +00:00
Vincent Guittot	57abff067a	sched/fair: Rework find_idlest_group() The slow wake up path computes per sched_group statisics to select the idlest group, which is quite similar to what load_balance() is doing for selecting busiest group. Rework find_idlest_group() to classify the sched_group and select the idlest one following the same steps as load_balance(). Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-12-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:55 +02:00
Vincent Guittot	fc1273f4ce	sched/fair: Optimize find_idlest_group() find_idlest_group() now reads CPU's load_avg in two different ways. Consolidate the function to read and use load_avg only once and simplify the algorithm to only look for the group with lowest load_avg. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-11-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:55 +02:00
Vincent Guittot	11f10e5420	sched/fair: Use load instead of runnable load in wakeup path Runnable load was originally introduced to take into account the case where blocked load biases the wake up path which may end to select an overloaded CPU with a large number of runnable tasks instead of an underutilized CPU with a huge blocked load. Tha wake up path now starts looking for idle CPUs before comparing runnable load and it's worth aligning the wake up path with the load_balance() logic. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-10-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:55 +02:00
Vincent Guittot	c63be7be59	sched/fair: Use utilization to select misfit task Utilization is used to detect a misfit task but the load is then used to select the task on the CPU which can lead to select a small task with high weight instead of the task that triggered the misfit migration. Check that task can't fit the CPU's capacity when selecting the misfit task instead of using the load. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/1571405198-27570-9-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:54 +02:00
Vincent Guittot	2ab4092fc8	sched/fair: Spread out tasks evenly when not overloaded When there is only one CPU per group, using the idle CPUs to evenly spread tasks doesn't make sense and nr_running is a better metrics. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-8-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:54 +02:00
Vincent Guittot	b0fb1eb4f0	sched/fair: Use load instead of runnable load in load_balance() 'runnable load' was originally introduced to take into account the case where blocked load biases the load balance decision which was selecting underutilized groups with huge blocked load whereas other groups were overloaded. The load is now only used when groups are overloaded. In this case, it's worth being conservative and taking into account the sleeping tasks that might wake up on the CPU. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-7-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:54 +02:00
Vincent Guittot	5e23e47443	sched/fair: Use rq->nr_running when balancing load CFS load_balance() only takes care of CFS tasks whereas CPUs can be used by other scheduling classes. Typically, a CFS task preempted by an RT or deadline task will not get a chance to be pulled by another CPU because load_balance() doesn't take into account tasks from other classes. Add sum of nr_running in the statistics and use it to detect such situations. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-6-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:54 +02:00
Vincent Guittot	0b0695f2b3	sched/fair: Rework load_balance() The load_balance() algorithm contains some heuristics which have become meaningless since the rework of the scheduler's metrics like the introduction of PELT. Furthermore, load is an ill-suited metric for solving certain task placement imbalance scenarios. For instance, in the presence of idle CPUs, we should simply try to get at least one task per CPU, whereas the current load-based algorithm can actually leave idle CPUs alone simply because the load is somewhat balanced. The current algorithm ends up creating virtual and meaningless values like the avg_load_per_task or tweaks the state of a group to make it overloaded whereas it's not, in order to try to migrate tasks. load_balance() should better qualify the imbalance of the group and clearly define what has to be moved to fix this imbalance. The type of sched_group has been extended to better reflect the type of imbalance. We now have: group_has_spare group_fully_busy group_misfit_task group_asym_packing group_imbalanced group_overloaded Based on the type of sched_group, load_balance now sets what it wants to move in order to fix the imbalance. It can be some load as before but also some utilization, a number of task or a type of task: migrate_task migrate_util migrate_load migrate_misfit This new load_balance() algorithm fixes several pending wrong tasks placement: - the 1 task per CPU case with asymmetric system - the case of cfs task preempted by other class - the case of tasks not evenly spread on groups with spare capacity Also the load balance decisions have been consolidated in the 3 functions below after removing the few bypasses and hacks of the current code: - update_sd_pick_busiest() select the busiest sched_group. - find_busiest_group() checks if there is an imbalance between local and busiest group. - calculate_imbalance() decides what have to be moved. Finally, the now unused field total_running of struct sd_lb_stats has been removed. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: riel@surriel.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-5-git-send-email-vincent.guittot@linaro.org [ Small readability and spelling updates. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:53 +02:00
Vincent Guittot	fcf0553db6	sched/fair: Remove meaningless imbalance calculation Clean up load_balance() and remove meaningless calculation and fields before adding a new algorithm. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-4-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:53 +02:00
Vincent Guittot	a349834703	sched/fair: Rename sg_lb_stats::sum_nr_running to sum_h_nr_running Rename sum_nr_running to sum_h_nr_running because it effectively tracks cfs->h_nr_running so we can use sum_nr_running to track rq->nr_running when needed. There are no functional changes. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Rik van Riel <riel@surriel.com> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: srikar@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/1571405198-27570-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:53 +02:00
Vincent Guittot	490ba971d8	sched/fair: Clean up asym packing Clean up asym packing to follow the default load balance behavior: - classify the group by creating a group_asym_packing field. - calculate the imbalance in calculate_imbalance() instead of bypassing it. We don't need to test twice same conditions anymore to detect asym packing and we consolidate the calculation of imbalance in calculate_imbalance(). There is no functional changes. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hdanton@sina.com Cc: parth@linux.ibm.com Cc: pauld@redhat.com Cc: quentin.perret@arm.com Cc: srikar@linux.vnet.ibm.com Cc: valentin.schneider@arm.com Link: https://lkml.kernel.org/r/1571405198-27570-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-21 09:40:53 +02:00
Greg Kroah-Hartman	630839ac24	Merge 5.4-rc3 into android-mainline Linux 5.4-rc3 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia87ba662738dd58ddb917e32c1fbd812861e7a46	2019-10-17 05:28:13 -07:00
Xuewei Zhang	4929a4e6fa	sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision The quota/period ratio is used to ensure a child task group won't get more bandwidth than the parent task group, and is calculated as: normalized_cfs_quota() = [(quota_us << 20) / period_us] If the quota/period ratio was changed during this scaling due to precision loss, it will cause inconsistency between parent and child task groups. See below example: A userspace container manager (kubelet) does three operations: 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us. 2) Create a few children cgroups. 3) Set quota to 1,000us and period to 10,000us on a child cgroup. These operations are expected to succeed. However, if the scaling of 147/128 happens before step 3, quota and period of the parent cgroup will be changed: new_quota: 1148437ns, 1148us new_period: 11484375ns, 11484us And when step 3 comes in, the ratio of the child cgroup will be 104857, which will be larger than the parent cgroup ratio (104821), and will fail. Scaling them by a factor of 2 will fix the problem. Tested-by: Phil Auld <pauld@redhat.com> Signed-off-by: Xuewei Zhang <xueweiz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Phil Auld <pauld@redhat.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Fixes: `2e8e192263` ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup") Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-09 12:38:02 +02:00
Quentin Perret	760b82c9b8	ANDROID: sched/fair: Bias EAS placement for latency Add to find_energy_efficient_cpu() a latency sensitive case which mimics what was done for prefer-idle in android-4.19 and before (see [1] for reference). This isn't strictly equivalent to the legacy algorithm but comes real close, and isn't very invasive. Overall, the idea is to select the biggest idle CPU we can find for latency-sensitive boosted tasks, and the smallest CPU where the can fit for latency-sensitive non-boosted tasks. The main differences with the legacy behaviour are the following: 1. the policy for 'prefer idle' when there isn't a single idle CPU in the system is simpler now. We just pick the CPU with the highest spare capacity; 2. the cstate awareness is implemented by minimizing the exit latency rather than the idle state index. This is how it is done in the slow path (find_idlest_group_cpu()), it doesn't require us to keep hooks into CPUIdle, and should actually be better because what we want is a CPU that can wake up quickly; 3. non-latency-sensitive tasks just use the standard mainline energy-aware wake-up path, which decides the placement using the Energy Model; 4. the 'boosted' and 'latency_sensitive' attributes of a task come from util_clamp (which now replaces schedtune). [1] `c27c56105d` Change-Id: Ia58516906e9cb5abe08385a8cd088097043d8703 Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:34 +01:00
Chris Redpath	44ab1611a8	ANDROID: sched/fair: Also do misfit in overloaded groups If we can classify the group as overloaded, that overrides any classification as misfit but we may still have misfit tasks present. Check the rq we're looking at to see if this is the case. Change-Id: Ida8eb66aa625e34de3fe2ee1b0dd8a78926273d8 Signed-off-by: Chris Redpath <chris.redpath@arm.com> [Removed stray reference to rq_has_misfit] Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:34 +01:00
Chris Redpath	a7455f8123	ANDROID: sched/fair: Don't balance misfits if it would overload local group When load balancing in a system with misfit tasks present, if we always pull a misfit task to the local group this can lead to pulling a running task from a smaller capacity CPUs to a bigger CPU which is busy. In this situation, the pulled task is likely not to get a chance to run before an idle balance on another small CPU pulls it back. This penalises the pulled task as it is stopped for a short amount of time and then likely relocated to a different CPU (since the original CPU just did a NEWLY_IDLE balance and reset the periodic interval). If we only do this unconditionally for NEWLY_IDLE balance, we can be sure that any tasks and load which are present on the local group are related to short-running tasks which we are happy to displace for a longer running task in a system with misfit tasks present. However, other balance types should only pull a task if we think that the local group is underutilized - checking the number of tasks gives us a conservative estimate here since if they were short tasks we would have been doing NEWLY_IDLE balances instead. Change-Id: I710add1ab1139482620b6addc8370ad194791beb Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:34 +01:00
Chris Redpath	f351885fc7	ANDROID: sched/fair: Attempt to improve throughput for asym cap systems In some systems the capacity and group weights line up to defeat all the small imbalance correction conditions in fix_small_imbalance, which can cause bad task placement. Add a new condition if the existing code can't see anything to fix: If we have asymmetric capacity, and there are more tasks than CPUs in the busiest group and there are less tasks than CPUs in the local group then we try to pull something. There could be transient small tasks which prevent this from working, but on the whole it is beneficial for those systems with inconvenient capacity/cluster size relationships. Change-Id: Icf81cde215c082a61f816534b7990ccb70aee409 Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:33 +01:00
Morten Rasmussen	65b8ddef4e	ANDROID: sched: Prevent unnecessary active balance of single task in sched group Scenarios with the busiest group having just one task and the local being idle on topologies with sched groups with different numbers of cpus manage to dodge all load-balance bailout conditions resulting the nr_balance_failed counter to be incremented. This eventually causes a pointless active migration of the task. This patch prevents this by not incrementing the counter when the busiest group only has one task. ASYM_PACKING migrations and migrations due to reduced capacity should still take place as these are explicitly captured by need_active_balance(). A better solution would be to not attempt the load-balance in the first place, but that requires significant changes to the order of bailout conditions and statistics gathering. Change-Id: I28f69c72febe0211decbe77b7bc3e48839d3d7b3 cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:33 +01:00
Dietmar Eggemann	24c4437797	ANDROID: sched: Update max cpu capacity in case of max frequency constraints Wakeup balancing uses cpu capacity awareness and needs to know the system-wide maximum cpu capacity. Patch "sched: Store system-wide maximum cpu capacity in root domain" finds the system-wide maximum cpu capacity during scheduler domain hierarchy setup. This is sufficient as long as maximum frequency invariance is not enabled. If it is enabled, the system-wide maximum cpu capacity can change between scheduler domain hierarchy setups due to frequency capping. The cpu capacity is changed in update_cpu_capacity() which is called in load balance on the lowest scheduler domain hierarchy level. To be able to know if a change in cpu capacity for a certain cpu also has an effect on the system-wide maximum cpu capacity it is normally necessary to iterate over all cpus. This would be way too costly. That's why this patch follows a different approach. The unsigned long max_cpu_capacity value in struct root_domain is replaced with a struct max_cpu_capacity, containing value (the max_cpu_capacity) and cpu (the cpu index of the cpu providing the maximum cpu_capacity). Changes to the system-wide maximum cpu capacity and the cpu index are made if: 1 System-wide maximum cpu capacity < cpu capacity 2 System-wide maximum cpu capacity > cpu capacity and cpu index == cpu There are no changes to the system-wide maximum cpu capacity in all other cases. Atomic read and write access to the pair (max_cpu_capacity.val, max_cpu_capacity.cpu) is enforced by max_cpu_capacity.lock. The access to max_cpu_capacity.val in task_fits_max() is still performed without taking the max_cpu_capacity.lock. The code to set max cpu capacity in build_sched_domains() has been removed because the whole functionality is now provided by update_cpu_capacity() instead. This approach can introduce errors temporarily, e.g. in case the cpu currently providing the max cpu capacity has its cpu capacity lowered due to frequency capping and calls update_cpu_capacity() before any cpu which might provide the max cpu now. Change-Id: I5063befab088fbf49e5d5e484ce0c6ee6165283a Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>* Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (- Fixed cherry-pick issues, and conflict with `a0fe2cf086` "sched/fair: Tune down misfit NOHZ kicks" which makes use of max_cpu_capacity - Squashed "sched/fair: remove printk while schedule is in progress" fix from Caesar Wang <wxt@rock-chips.com>) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:33 +01:00
Dietmar Eggemann	27ad2abc03	ANDROID: sched/fair: add arch scaling function for max frequency capping To be able to scale the cpu capacity by this factor introduce a call to the new arch scaling function arch_scale_max_freq_capacity() in update_cpu_capacity() and provide a default implementation which returns SCHED_CAPACITY_SCALE. Another subsystem (e.g. cpufreq) or architectural or platform specific code can overwrite this default implementation, exactly as for frequency and cpu invariance. It has to be enabled by the arch by defining arch_scale_max_freq_capacity to the actual implementation. Change-Id: I770a8b1f4f7340e9e314f71c64a765bf880f4b4d Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> ( Fixed conflict with scaling against the PELT-based scale_rt_capacity ) Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:33 +01:00
Chris Redpath	79e3a4a27e	ANDROID: sched: Unconditionally honor sync flag for energy-aware wakeups Since we don't do energy-aware wakeups when we are overutilized, always honoring sync wakeups in this state does not prevent wake-wide mechanics overruling the flag as normal. This patch is based upon previous work to build EAS for android products. sync-hint code taken from commit 4a5e890ec60d "sched/fair: add tunable to force selection at cpu granularity" written by Juri Lelli <juri.lelli@arm.com> Change-Id: I4b3d79141fc8e53dc51cd63ac11096c2e3cb10f5 Signed-off-by: Chris Redpath <chris.redpath@arm.com> (cherry-picked from commit f1ec666a62dec1083ed52fe1ddef093b84373aaf) [ Moved the feature to find_energy_efficient_cpu() and removed the sysctl knob ] Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:33 +01:00
Patrick Bellasi	b61876ed12	ANDROID: sched/fair: EAS: Add uclamp support to find_energy_efficient_cpu() Utilization clamping can be used to boost the utilization of small tasks or cap that of big tasks. Thus, one of its possible usages is to bias tasks placement to "promote" small tasks on higher capacity (less energy efficient) CPUs or "constraint" big tasks on small capacity (more energy efficient) CPUs. When the Energy Aware Scheduler (EAS) looks for the most energy efficient CPU to run a task on, it currently considers only the effective utiliation estimated for a task. Fix this by adding an additional check to skip CPUs which capacity is smaller then the task clamped utilization. Change-Id: I43fa6fa27e27c1eb5272c6a45d1a6a5b0faae1aa Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-10-01 10:36:32 +01:00
Quentin Perret	4892f51ad5	sched/fair: Avoid redundant EAS calculation The EAS wake-up path computes the system energy for several CPU candidates: the CPU with maximum spare capacity in each performance domain, and the prev_cpu. However, if prev_cpu also happens to be the CPU with maximum spare capacity in its performance domain, the energy calculation is still done twice, unnecessarily. Add a condition to filter out this corner case before doing the energy calculation. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Quentin Perret <qperret@qperret.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: qais.yousef@arm.com Cc: rjw@rjwysocki.net Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Fixes: `eb92692b25` ("sched/fair: Speed-up energy-aware wake-ups") Link: https://lkml.kernel.org/r/20190920094115.GA11503@qperret.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-09-25 17:42:32 +02:00
Qian Cai	763a9ec06c	sched/fair: Fix -Wunused-but-set-variable warnings Commit: `de53fd7aed` ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices") introduced a few compilation warnings: kernel/sched/fair.c: In function '__refill_cfs_bandwidth_runtime': kernel/sched/fair.c:4365:6: warning: variable 'now' set but not used [-Wunused-but-set-variable] kernel/sched/fair.c: In function 'start_cfs_bandwidth': kernel/sched/fair.c:4992:6: warning: variable 'overrun' set but not used [-Wunused-but-set-variable] Also, __refill_cfs_bandwidth_runtime() does no longer update the expiration time, so fix the comments accordingly. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Reviewed-by: Dave Chiluk <chiluk+linux@indeed.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pauld@redhat.com Fixes: `de53fd7aed` ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices") Link: https://lkml.kernel.org/r/1566326455-8038-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-09-25 17:42:31 +02:00
Eric W. Biederman	154abafc68	tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code Remove work arounds that were written before there was a grace period after tasks left the runqueue in finish_task_switch(). In particular now that there tasks exiting the runqueue exprience a RCU grace period none of the work performed by task_rcu_dereference() excpet the rcu_dereference() is necessary so replace task_rcu_dereference() with rcu_dereference(). Remove the code in rcuwait_wait_event() that checks to ensure the current task has not exited. It is no longer necessary as it is guaranteed that any running task will experience a RCU grace period after it leaves the run queueue. Remove the comment in rcuwait_wake_up() as it is no longer relevant. Ref: `8f95c90ceb` ("sched/wait, RCU: Introduce rcuwait machinery") Ref: `150593bf86` ("sched/api: Introduce task_rcu_dereference() and try_get_task_struct()") Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87lfurdpk9.fsf_-_@x220.int.ebiederm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-09-25 17:42:29 +02:00
Qian Cai	dac9f027b1	sched/fair: Remove unused cfs_rq_clock_task() function cfs_rq_clock_task() was first introduced and used in: `f1b17280ef` ("sched: Maintain runnable averages across throttled periods") Over time its use has been graduately removed by the following commits: `d31b1a66cb` ("sched/fair: Factorize PELT update") `2312729688` ("sched/fair: Update scale invariance of PELT") Today, there is no single user left, so it can be safely removed. Found via the -Wunused-function build warning. Signed-off-by: Qian Cai <cai@lca.pw> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/1568668775-2127-1-git-send-email-cai@lca.pw [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-09-17 09:55:02 +02:00

... 3 4 5 6 7 ...

1140 Commits