This patch adds a parameter to select_task_rq, sibling_count_hint,
allowing the caller, where it has this information, to inform the
sched_class of the number of tasks that are being woken up as part of
the same event.
The wake_q mechanism is one case where this information is available.
select_task_rq_fair can then use the information to detect that it
needs to widen the search space for task placement in order to avoid
overloading the last-level cache domain's CPUs.
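A minimal sketch of the widening decision follows. It assumes a hypothetical 4-CPU system whose LLC domains are pairs of consecutive CPUs. The names wakeup_search_span, NR_CPUS_MODEL and LLC_SIZE_MODEL are illustrative only. The threshold used here (more wakees than CPUs in the LLC) is an assumption of the sketch, not necessarily the exact condition used by select_task_rq_fair.

```c
#include <assert.h>

/* Hypothetical topology: 4 CPUs, LLC domains of 2 consecutive CPUs
 * (e.g. a big cluster {0,1} and a LITTLE cluster {2,3}). */
#define NR_CPUS_MODEL  4
#define LLC_SIZE_MODEL 2

/*
 * Compute the CPU range [*first, *last) a fair-class wakeup would search.
 * Normally this is prev_cpu's LLC domain; if the event wakes more tasks
 * than that domain has CPUs (assumed threshold), widen to every CPU.
 */
static void wakeup_search_span(int prev_cpu, int sibling_count_hint,
			       int *first, int *last)
{
	assert(prev_cpu >= 0 && prev_cpu < NR_CPUS_MODEL);

	if (sibling_count_hint > LLC_SIZE_MODEL) {
		*first = 0;
		*last = NR_CPUS_MODEL;
		return;
	}
	*first = (prev_cpu / LLC_SIZE_MODEL) * LLC_SIZE_MODEL;
	*last = *first + LLC_SIZE_MODEL;
}
```

For example, a lone wakeup (hint 1) with prev_cpu 0 searches only CPUs 0-1. Three siblings woken together instead search CPUs 0-3.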
* * *
The reason I am investigating this change is the following use case
on ARM big.LITTLE (asymmetric CPU capacity): one task per CPU, all of
which repeatedly do X amount of work and then call
pthread_barrier_wait (i.e. sleep until the last task finishes its X
and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU
finish faster, and then those CPUs pull over the tasks that are still
running:
v CPU v ->time->
-------------
0 (big) 11111 /333
-------------
1 (big) 22222 /444|
-------------
2 (LITTLE) 333333/
-------------
3 (LITTLE) 444444/
-------------
Now when task 4 hits the barrier (at |) and wakes the others up,
there are 4 tasks with prev_cpu=<big> and 0 tasks with
prev_cpu=<little>. want_affine therefore means that we'll only look
in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled
on the bigs until the next load balance, something like this:
v CPU v ->time->
------------------------
0 (big) 11111 /333 31313\33333
------------------------
1 (big) 22222 /444|424\4444444
------------------------
2 (LITTLE) 333333/ \222222
------------------------
3 (LITTLE) 444444/ \1111
------------------------
^^^
underutilization
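The scenario above can be replayed with a toy user-space model. This is a sketch under stated assumptions, not the kernel's placement code:

- Each wakee goes to the least-loaded CPU in its search span, scanning from prev_cpu, and the first minimum wins.
- The waker (task 4) stays on CPU 1.
- Tasks 3, 2 and 1 are woken in that order, with prev_cpu 0, 1 and 0 respectively.

All names are hypothetical. With the search limited to prev_cpu's LLC, the model reproduces the stacking in the diagram. With the search widened to all CPUs, every task gets its own CPU.

```c
#include <assert.h>

/* Hypothetical 4-CPU big.LITTLE model: LLC domains {0,1} (big), {2,3} (LITTLE). */
#define NR_CPUS_MODEL  4
#define LLC_SIZE_MODEL 2

/*
 * Crude stand-in for fair-class placement: scan [first, last) starting at
 * prev_cpu (wrapping around) and return the first CPU with the fewest
 * runnable tasks.
 */
static int place(const int nr_running[], int prev_cpu, int first, int last)
{
	int span = last - first;
	int best = prev_cpu;

	assert(prev_cpu >= first && prev_cpu < last);
	for (int i = 0; i < span; i++) {
		int cpu = first + (prev_cpu - first + i) % span;

		if (nr_running[cpu] < nr_running[best])
			best = cpu;
	}
	return best;
}

/*
 * Replay the barrier wakeup: task 4 runs on CPU 1 and wakes tasks 3, 2, 1.
 * widen == 0 limits the search to prev_cpu's LLC; otherwise all CPUs are
 * searched. placed[t] receives the CPU for task t (index 0 is unused).
 */
static void replay_barrier_wakeup(int widen, int placed[4])
{
	int nr_running[NR_CPUS_MODEL] = { 0, 1, 0, 0 };	/* task 4 on CPU 1 */
	const int order[3] = { 3, 2, 1 };
	const int prev_cpu[4] = { -1, 0, 1, 0 };	/* indexed by task */

	for (int i = 0; i < 3; i++) {
		int t = order[i];
		int first = 0, last = NR_CPUS_MODEL;

		if (!widen) {
			first = (prev_cpu[t] / LLC_SIZE_MODEL) * LLC_SIZE_MODEL;
			last = first + LLC_SIZE_MODEL;
		}
		placed[t] = place(nr_running, prev_cpu[t], first, last);
		nr_running[placed[t]]++;	/* load becomes visible immediately */
	}
}
```

Without widening, tasks 3 and 1 land on CPU 0 and task 2 joins task 4 on CPU 1, leaving the LITTLE CPUs idle. With widening, tasks 3, 2 and 1 go to CPUs 0, 2 and 3.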
So, I'm trying to get want_affine = 0 for these tasks.
I don't _think_ any incarnation of the wakee_flips mechanism can help
us here because which task is waker and which tasks are wakees
generally changes with each iteration.
However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the
nice property that we know exactly how many tasks are being woken, so
we can cheat.
It might be a disadvantage that we "widen" _every_ task that's woken in
an event, while select_idle_sibling would work fine for the first
sd_llc_size - 1 tasks.
IIUC, if wake_affine() behaves correctly this trick wouldn't be
necessary on SMP systems, so it might be best guarded by the presence
of SD_ASYM_CPUCAPACITY?
* * *
Final note:
In order to observe "perfect" behaviour for this use case, I also had
to disable the TTWU_QUEUE sched feature. Suppose during the wakeup
above we are working through the work queue and have placed tasks 3
and 2, and are about to place task 1:
v CPU v ->time->
--------------
0 (big) 11111 /333 3
--------------
1 (big) 22222 /444|4
--------------
2 (LITTLE) 333333/ 2
--------------
3 (LITTLE) 444444/ <- Task 1 should go here
--------------
If TTWU_QUEUE is enabled, we will not yet have enqueued task
2 (having instead sent a reschedule IPI) or attached its load to CPU
2, so we are likely to also place task 1 on CPU 2. Disabling
TTWU_QUEUE means that we enqueue task 2 before placing task 1,
solving this issue. TTWU_QUEUE exists to minimise rq lock
contention; I guess that contention is less of an issue on
big.LITTLE systems, since they have relatively few CPUs, so the
trade-off seems to make sense here.
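The TTWU_QUEUE effect in the example above can be modelled in the same toy style. The following assumptions are hypothetical and for illustration only:

- The search is already widened to all four CPUs.
- Placement picks the first least-loaded CPU, scanning from prev_cpu.
- Task 3's load is already visible on CPU 0.
- The only difference TTWU_QUEUE makes is whether task 2's load is visible on CPU 2 yet.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

/* Scan all CPUs starting at prev_cpu; return the first CPU with the
 * fewest *visible* runnable tasks. */
static int place_widened(const int visible[], int prev_cpu)
{
	int best = prev_cpu;

	assert(prev_cpu >= 0 && prev_cpu < NR_CPUS_MODEL);
	for (int i = 0; i < NR_CPUS_MODEL; i++) {
		int cpu = (prev_cpu + i) % NR_CPUS_MODEL;

		if (visible[cpu] < visible[best])
			best = cpu;
	}
	return best;
}

/*
 * Place task 1 (prev_cpu 0): task 4 runs on CPU 1, task 3 is enqueued on
 * CPU 0 and task 2 has been sent to CPU 2. With ttwu_queue set, CPU 2 has
 * only been asked (via IPI) to enqueue task 2, so its load is not visible.
 */
static int place_task1(int ttwu_queue)
{
	int visible[NR_CPUS_MODEL] = { 1, 1, 1, 0 };

	if (ttwu_queue)
		visible[2]--;	/* task 2 not yet enqueued on CPU 2 */
	return place_widened(visible, 0);
}
```

In this model task 1 goes to the idle CPU 3 when loads are attached immediately. With the deferred enqueue it lands on CPU 2 alongside task 2.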
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
( - Applied from https://patchwork.kernel.org/patch/9895261/
- Fixed trivial conflict in kernel/sched/core.c
- Fixed select_task_rq_idle, now in kernel/sched/idle.c
- Fixed trivial conflict in select_task_rq_fair )
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Change-Id: I3cfc4bf48c3d7feef969db4d22449f4fbb4f795d
[satyap@codeaurora.org: port to 5.4 and fix trivial merge conflicts]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_SCHED_WAKE_Q_H
#define _LINUX_SCHED_WAKE_Q_H

/*
 * Wake-queues are lists of tasks with a pending wakeup, whose
 * callers have already marked the task as woken internally,
 * and can thus carry on. A common use case is being able to
 * do the wakeups once the corresponding user lock has been
 * released.
 *
 * We hold a reference to each task in the list across the wakeup,
 * thus guaranteeing that the memory is still valid by the time
 * the actual wakeups are performed in wake_up_q().
 *
 * One per task suffices, because there's never a need for a task to be
 * in two wake queues simultaneously; it is forbidden to abandon a task
 * in a wake queue (a call to wake_up_q() _must_ follow), so if a task is
 * already in a wake queue, the wakeup will happen soon and the second
 * waker can just skip it.
 *
 * The DEFINE_WAKE_Q macro declares and initializes the list head.
 * wake_up_q() does NOT reinitialize the list; it's expected to be
 * called near the end of a function. Otherwise, the list can be
 * re-initialized for later re-use by wake_q_init().
 *
 * NOTE that this can cause spurious wakeups. schedule() callers
 * must ensure the call is done inside a loop, confirming that the
 * wakeup condition has in fact occurred.
 *
 * NOTE that there is no guarantee the wakeup will happen any later than the
 * wake_q_add() location. Therefore the task must be ready to be woken at the
 * location of the wake_q_add().
 */

#include <linux/sched.h>

struct wake_q_head {
	struct wake_q_node *first;
	struct wake_q_node **lastp;
#ifdef CONFIG_SCHED_WALT
	int count;
#endif
};

#define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)

#define DEFINE_WAKE_Q(name)				\
	struct wake_q_head name = { WAKE_Q_TAIL, &name.first }

static inline void wake_q_init(struct wake_q_head *head)
{
	head->first = WAKE_Q_TAIL;
	head->lastp = &head->first;
#ifdef CONFIG_SCHED_WALT
	head->count = 0;
#endif
}

static inline bool wake_q_empty(struct wake_q_head *head)
{
	return head->first == WAKE_Q_TAIL;
}

extern void wake_q_add(struct wake_q_head *head, struct task_struct *task);
extern void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task);
extern void wake_up_q(struct wake_q_head *head);

#endif /* _LINUX_SCHED_WAKE_Q_H */
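The following single-threaded user-space model of the wake_q list illustrates how a wake_q could supply sibling_count_hint. It is a sketch, not the kernel implementation. The real wake_q_add() uses cmpxchg() on the node and takes a task reference, and both are omitted here. The model also assumes that count is incremented once per newly queued task and passed unchanged to every wakeup. The *_model names and struct task_model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct wake_q_node { struct wake_q_node *next; };

struct task_model {
	int id;
	struct wake_q_node wake_q;	/* next == NULL: not on any wake queue */
	int hint_seen;			/* sibling_count_hint seen at wakeup */
};

struct wake_q_head_model {
	struct wake_q_node *first;
	struct wake_q_node **lastp;
	int count;			/* mirrors the CONFIG_SCHED_WALT field */
};

#define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)

static void wake_q_init_model(struct wake_q_head_model *head)
{
	head->first = WAKE_Q_TAIL;
	head->lastp = &head->first;
	head->count = 0;
}

/* Append a task unless it is already queued (its pending wakeup suffices). */
static void wake_q_add_model(struct wake_q_head_model *head, struct task_model *t)
{
	if (t->wake_q.next != NULL)
		return;
	t->wake_q.next = WAKE_Q_TAIL;	/* new last node points at the sentinel */
	*head->lastp = &t->wake_q;	/* link after the previous last node */
	head->lastp = &t->wake_q.next;
	head->count++;
}

/* Walk the list and "wake" each task, passing the count as the hint. */
static void wake_up_q_model(struct wake_q_head_model *head)
{
	struct wake_q_node *node = head->first;

	while (node != WAKE_Q_TAIL) {
		struct task_model *t = (struct task_model *)
			((char *)node - offsetof(struct task_model, wake_q));

		node = node->next;		/* read before clearing */
		t->wake_q.next = NULL;		/* task may be queued again */
		t->hint_seen = head->count;
	}
}
```

Because the second wake_q_add_model() call for an already-queued task is skipped, the hint counts distinct tasks. As with wake_up_q(), the model does not reinitialize the head afterwards, so wake_q_init_model() is needed before reuse.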