s390: improve wait logic of stop_machine

The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check in calls cpu_relax_yield in a tight loop until
the last missing CPU has acknowledged the state transition.

On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check in. With the
current definition of cpu_relax_yield a diagnose 0x44 is issued, which
tells the hypervisor to schedule *some* other CPU. That can be any
CPU, not necessarily one of the CPUs that need to run in order to
advance the state machine. This can lead to a pretty bad diagnose 0x44
storm until the last missing CPU finally checks in.

Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
directed yield: each CPU in the wait loop picks the next CPU in the
cpumask of stop_machine. Diagnose 0x9c is then used to tell the
hypervisor to run that next CPU instead of the current one. If only a
limited number of real CPUs backs the virtual CPUs, the real CPUs end
up being passed around in a round-robin fashion.

[heiko.carstens@de.ibm.com]:
    Use cpumask_next_wrap as suggested by Peter Zijlstra.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
commit 38f2c691a4 (parent 7928260539)
Author:    Martin Schwidefsky <schwidefsky@de.ibm.com>, 2019-05-17 12:50:42 +02:00
Committer: Heiko Carstens <heiko.carstens@de.ibm.com>
5 changed files with 25 additions and 13 deletions

arch/s390/kernel/processor.c

@@ -31,6 +31,7 @@ struct cpu_info {
 };
 
 static DEFINE_PER_CPU(struct cpu_info, cpu_info);
+static DEFINE_PER_CPU(int, cpu_relax_retry);
 static bool machine_has_cpu_mhz;
 
@@ -58,13 +59,19 @@ void s390_update_cpu_mhz(void)
 		on_each_cpu(update_cpu_mhz, NULL, 0);
 }
 
-void notrace cpu_relax_yield(void)
+void notrace cpu_relax_yield(const struct cpumask *cpumask)
 {
-	if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
-		diag_stat_inc(DIAG_STAT_X044);
-		asm volatile("diag 0,0,0x44");
+	int cpu, this_cpu;
+
+	this_cpu = smp_processor_id();
+	if (__this_cpu_inc_return(cpu_relax_retry) >= spin_retry) {
+		__this_cpu_write(cpu_relax_retry, 0);
+		cpu = cpumask_next_wrap(this_cpu, cpumask, this_cpu, false);
+		if (cpu >= nr_cpu_ids)
+			return;
+		if (arch_vcpu_is_preempted(cpu))
+			smp_yield_cpu(cpu);
 	}
 	barrier();
 }
 EXPORT_SYMBOL(cpu_relax_yield);
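
For context, the caller side in kernel/stop_machine.c (one of the five
changed files, not shown above) now hands the stop_machine cpumask to
cpu_relax_yield(). A rough sketch of the multi_cpu_stop() wait loop,
with names assumed from the upstream code rather than copied from this
excerpt:

	/* Wait loop of multi_cpu_stop(); illustration only, the msdata
	 * and ack_state() details are assumptions, not this commit's diff.
	 */
	do {
		/* Chill out and ensure we re-read multi_stop_state. */
		cpu_relax_yield(cpumask);
		newstate = READ_ONCE(msdata->state);
		if (newstate != curstate) {
			curstate = newstate;
			ack_state(msdata);	/* check in for this state */
		}
	} while (curstate != MULTI_STOP_EXIT);

Each CPU waiting for a state transition now yields directly to the next
CPU in the mask instead of issuing an undirected diagnose 0x44, which
avoids the storm described above.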