debug lockups: Improve lockup detection
When debugging a recent lockup bug I found various deficiencies
in how our current lockup detection helpers work:

 - SysRq-L is not very efficient as it uses a workqueue, hence
   it cannot punch through hard lockups and cannot see through
   most soft lockups either.

 - The SysRq-L code depends on the NMI watchdog - which is off
   by default.

 - We don't print backtraces from the RCU code's built-in
   'RCU state machine is stuck' debug code. This debug
   code tends to be one of the first (and only) mechanisms
   that show that a lockup has occurred.

This patch changes the code so that we:

 - Trigger the NMI backtrace code from SysRq-L instead of using
   a workqueue (which cannot punch through hard lockups)

 - Trigger print-all-CPU-backtraces from the RCU lockup detection
   code

Also decouple the backtrace printing code from the NMI watchdog:

 - Don't use variable size cpumasks (they might not be initialized
   and they are a bit more fragile anyway)

 - Trigger an NMI immediately via an IPI, instead of waiting
   for the NMI tick to occur. This is a lot faster and can
   produce more relevant backtraces. It will also work if the
   NMI watchdog is disabled.

 - Don't print the 'dazed and confused' message when we print
   a backtrace from the NMI

 - Do a show_regs() plus a dump_stack() to get maximum info
   out of the dump. Worst-case we get two stacktraces - which
   is not a big deal. Sometimes, if register content is
   corrupted, the precise stack walker in show_regs() won't
   give us a full backtrace - in this case dump_stack() will
   do it.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
@@ -24,6 +24,7 @@
 #include <linux/sysrq.h>
 #include <linux/kbd_kern.h>
 #include <linux/proc_fs.h>
+#include <linux/nmi.h>
 #include <linux/quotaops.h>
 #include <linux/perf_counter.h>
 #include <linux/kernel.h>
@@ -222,12 +223,7 @@ static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
 
 static void sysrq_handle_showallcpus(int key, struct tty_struct *tty)
 {
-	struct pt_regs *regs = get_irq_regs();
-	if (regs) {
-		printk(KERN_INFO "CPU%d:\n", smp_processor_id());
-		show_regs(regs);
-	}
-	schedule_work(&sysrq_showallcpus);
+	trigger_all_cpu_backtrace();
 }
 
 static struct sysrq_key_op sysrq_showallcpus_op = {
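The handshake described in the commit message (a fixed-size mask of CPUs that still owe a backtrace, an immediate NMI sent to every CPU, and a handler that treats the NMI as a backtrace request only when its bit is set) can be sketched in user space roughly as follows. This is a single-threaded simulation, not the kernel code: every name prefixed with sim_ or SIM_ is hypothetical, the "IPI" is an ordinary function call, and a real implementation would have to wait for asynchronous handlers instead of reading the mask right away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fixed CPU count: a static bitmask needs no runtime setup. */
#define SIM_NR_CPUS 4

/* One bit per simulated CPU that still owes a backtrace. */
static atomic_uint sim_backtrace_mask;

/* Number of dumps produced so far (for demonstration only). */
static unsigned int sim_dumps_printed;

/*
 * Stand-in for the per-CPU NMI handler. Returns true when the NMI was a
 * backtrace request, so the caller can skip any "unknown NMI" diagnostics.
 */
static bool sim_nmi_handler(unsigned int cpu)
{
	unsigned int bit;

	assert(cpu < SIM_NR_CPUS);
	bit = 1u << cpu;

	if (!(atomic_load(&sim_backtrace_mask) & bit))
		return false;	/* nobody asked this CPU for a backtrace */

	/* The kernel would call show_regs() and dump_stack() here. */
	printf("NMI backtrace for cpu %u\n", cpu);
	sim_dumps_printed++;

	/* Acknowledge the request by clearing this CPU's bit. */
	atomic_fetch_and(&sim_backtrace_mask, ~bit);
	return true;
}

/* Stand-in for the NMI IPI: only CPUs in responsive_cpus run the handler. */
static void sim_send_nmi_ipi(unsigned int responsive_cpus)
{
	for (unsigned int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (responsive_cpus & (1u << cpu))
			sim_nmi_handler(cpu);
}

/*
 * Request a backtrace from every simulated CPU. Returns the mask of CPUs
 * that never answered and leaves the request mask empty afterwards.
 */
static unsigned int sim_trigger_all_cpu_backtrace(unsigned int responsive_cpus)
{
	atomic_store(&sim_backtrace_mask, (1u << SIM_NR_CPUS) - 1);
	sim_send_nmi_ipi(responsive_cpus);
	return atomic_exchange(&sim_backtrace_mask, 0);
}
```

Calling sim_trigger_all_cpu_backtrace(0xF) models the case where every CPU answers, while passing 0x5 models two CPUs that never take the NMI; their bits come back in the return value.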