x86/smpboot: Initialize secondary CPU only if master CPU will wait for it

Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:

  https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.

To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1403266991-12233-1-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
Igor Mammedov
2014-06-20 14:23:11 +02:00
committed by Ingo Molnar
parent 9e82bf0141
commit ce4b1b1650
3 changed files with 52 additions and 81 deletions

View File

@@ -1266,6 +1266,19 @@ static void dbg_restore_debug_regs(void)
#define dbg_restore_debug_regs()
#endif /* ! CONFIG_KGDB */
static void wait_for_master_cpu(int cpu)
{
#ifdef CONFIG_SMP
/*
* wait for ACK from master CPU before continuing
* with AP initialization
*/
WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
while (!cpumask_test_cpu(cpu, cpu_callout_mask))
cpu_relax();
#endif
}
/*
* cpu_init() initializes state that is per-CPU. Some data is already
* initialized (naturally) in the bootstrap process, such as the GDT
@@ -1281,16 +1294,17 @@ void cpu_init(void)
struct task_struct *me;
struct tss_struct *t;
unsigned long v;
int cpu;
int cpu = stack_smp_processor_id();
int i;
wait_for_master_cpu(cpu);
/*
* Load microcode on this cpu if a valid microcode is available.
* This is early microcode loading procedure.
*/
load_ucode_ap();
cpu = stack_smp_processor_id();
t = &per_cpu(init_tss, cpu);
oist = &per_cpu(orig_ist, cpu);
@@ -1302,9 +1316,6 @@ void cpu_init(void)
me = current;
if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
panic("CPU#%d already initialized!\n", cpu);
pr_debug("Initializing CPU#%d\n", cpu);
clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
@@ -1381,13 +1392,9 @@ void cpu_init(void)
struct tss_struct *t = &per_cpu(init_tss, cpu);
struct thread_struct *thread = &curr->thread;
show_ucode_info_early();
wait_for_master_cpu(cpu);
if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
for (;;)
local_irq_enable();
}
show_ucode_info_early();
printk(KERN_INFO "Initializing CPU#%d\n", cpu);