[NMI watchdog is available for x86 and x86-64 architectures]

Is your system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debug
such lockups? If you answered yes to all of these, then this document
is definitely for you.

Many x86/x86-64 systems have a feature that enables us to generate
'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt, which gets
executed even if the system is otherwise locked up hard.) This can be
used to debug hard kernel lockups. By executing periodic NMI interrupts,
the kernel can monitor whether any CPU has locked up, and print out
debugging messages if so.

In order to use the NMI watchdog, you need to have APIC support in your
kernel. For SMP kernels, APIC support gets compiled in automatically. For
UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
features -> IO-APIC support on uniprocessors) in your kernel config.
CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
CONFIG_X86_UP_IOAPIC is for uniprocessor machines with an IO-APIC. [Note:
certain kernel debugging options, such as Kernel Stack Meter or Kernel
Tracer, may implicitly disable the NMI watchdog.]

For x86-64, the needed APIC is always compiled in.

Using the local APIC (nmi_watchdog=2) needs the first performance register,
so you can't use it for other purposes (such as high precision performance
profiling). However, at least oprofile and the perfctr driver disable the
local APIC NMI watchdog automatically.

To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter. E.g., the relevant lilo.conf entry:

        append="nmi_watchdog=1"

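To confirm which value the running kernel was actually booted with, the
kernel command line can be inspected. The following is only a minimal
sketch: the function name nmi_watchdog_mode is made up for illustration,
and treating the last occurrence of the parameter as the effective one is
an assumption, not something this document states.

```python
def nmi_watchdog_mode(cmdline):
    """Return the value of 'nmi_watchdog=' from a kernel command line.

    Returns None if the parameter is absent. If it appears more than
    once, the last occurrence is returned (an assumption of this sketch).
    """
    mode = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "nmi_watchdog" and sep:
            mode = value
    return mode
```

On a running Linux system the input would typically be the contents of
/proc/cmdline, e.g. nmi_watchdog_mode(open("/proc/cmdline").read()).
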
For SMP machines and UP machines with an IO-APIC, use nmi_watchdog=1.
For UP machines without an IO-APIC, use nmi_watchdog=2; this only works
for some processor types. If in doubt, boot with nmi_watchdog=1 and
check the NMI count in /proc/interrupts; if the count is zero, then
reboot with nmi_watchdog=2 and check the NMI count again. If it is still
zero, then report the problem; you probably have a processor that needs
to be added to the nmi code.

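The NMI count check described above can be scripted. This is a hedged
sketch: it assumes the usual /proc/interrupts layout in which a line
starting with "NMI:" is followed by one counter per CPU and, on some
kernels, a trailing description. The helper names nmi_counts and
watchdog_is_ticking are illustrative only.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts text.

    Returns an empty list if no 'NMI:' line is present.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the first non-numeric field (the description).
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def watchdog_is_ticking(interrupts_text):
    """True if at least one CPU has received an NMI."""
    return any(count > 0 for count in nmi_counts(interrupts_text))
```

If watchdog_is_ticking() returns False for the text read from
/proc/interrupts after booting with nmi_watchdog=1, that corresponds to
the "count is zero" case above.
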
A 'lockup' is the following scenario: if any CPU in the system does not
execute the periodic local timer interrupt for more than 5 seconds, then
the NMI handler generates an oops and kills the process. This
'controlled crash' (and the resulting kernel messages) can be used to
debug the lockup. Thus whenever the lockup happens, wait 5 seconds and
the oops will show up automatically. If the kernel produces no messages,
then the system has crashed so hard (e.g. hardware-wise) that either it
cannot even accept NMI interrupts, or the crash has made the kernel
unable to print messages.

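The detection rule above can be modelled in a few lines. This is a
simplified illustration, not the kernel's code: the class name
CpuWatchdogState, its fields, and the explicit timestamps passed to nmi()
are all assumptions made for the sketch. Only the 5 second threshold and
the "timer interrupt stopped advancing" condition come from the text.

```python
LOCKUP_TIMEOUT_SECONDS = 5  # threshold stated in this document


class CpuWatchdogState:
    """Toy per-CPU state: did the timer interrupt make progress?"""

    def __init__(self):
        self.timer_ticks = 0           # bumped by the local timer interrupt
        self.last_seen_ticks = 0       # tick count at the last observed progress
        self.last_progress_time = 0.0  # time of the last observed progress

    def timer_interrupt(self):
        """Model one local timer interrupt on this CPU."""
        self.timer_ticks += 1

    def nmi(self, now):
        """Model one watchdog NMI at time 'now' (seconds).

        Returns True if the timer has not advanced for more than
        LOCKUP_TIMEOUT_SECONDS, i.e. a lockup would be reported.
        """
        if self.timer_ticks != self.last_seen_ticks:
            self.last_seen_ticks = self.timer_ticks
            self.last_progress_time = now
            return False
        return now - self.last_progress_time > LOCKUP_TIMEOUT_SECONDS
```

The point of the model is that the check runs from the NMI, so it keeps
working even when the CPU no longer services ordinary interrupts.
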
Be aware that when using the local APIC, the frequency of the NMI
interrupts it generates depends on the system load. The local APIC NMI
watchdog, lacking a better source, uses the "cycles unhalted" event. As
you may guess, it doesn't tick when the CPU is in the halted state (which
happens when the system is idle), but if your system locks up on anything
but the "hlt" processor instruction, the watchdog will trigger very soon,
as the "cycles unhalted" event will happen every clock tick. If it locks
up on "hlt", then you are out of luck -- the event will not happen at all
and the watchdog won't trigger. This is a shortcoming of the local APIC
watchdog -- unfortunately there is no "clock ticks" event that would work
all the time. The I/O APIC watchdog is driven externally and has no such
shortcoming. But its NMI frequency is much higher, resulting in a more
significant hit to the overall system performance.

On x86, nmi_watchdog is disabled by default, so you have to enable it with
a boot time parameter.

It's possible to disable the NMI watchdog at run time by writing "0" to
/proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable
the NMI watchdog. Note that you still need to use the "nmi_watchdog="
parameter at boot time.

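The run-time switch can be driven from a script as well. A minimal
sketch follows; the helper names set_nmi_watchdog and get_nmi_watchdog
are hypothetical, and the path argument exists only so the functions can
be exercised against an ordinary file. Writing to the real file normally
requires root.

```python
NMI_WATCHDOG_SYSCTL = "/proc/sys/kernel/nmi_watchdog"


def set_nmi_watchdog(enabled, path=NMI_WATCHDOG_SYSCTL):
    """Write "1" (enable) or "0" (disable) to the nmi_watchdog file."""
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def get_nmi_watchdog(path=NMI_WATCHDOG_SYSCTL):
    """Return True if the file currently reads "1"."""
    with open(path) as f:
        return f.read().strip() == "1"
```

For example, set_nmi_watchdog(False) before a profiling run and
set_nmi_watchdog(True) afterwards.
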
NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
on x86 SMP boxes.

[ feel free to send bug reports, suggestions and patches to
  Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
  list at <linux-smp@vger.kernel.org> ]