Last reviewed: 06/02/2009

                     HP iLO2 NMI Watchdog Driver
              NMI sourcing for iLO2 based ProLiant Servers
                     Documentation and Driver by
              Thomas Mingarelli <thomas.mingarelli@hp.com>

 The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
 watchdog functionality and the added benefit of NMI sourcing. Both the
 watchdog functionality and the NMI sourcing capability need to be enabled
 by the user. Remember that the two modes are not dependent on one another.
 A user can have the NMI sourcing without the watchdog timer and vice versa.

 Watchdog functionality is enabled like any other common watchdog driver. That
 is, an application needs to be started that kicks off the watchdog timer. A
 basic application called watchdog-test.c exists in the
 Documentation/watchdog/src directory. Simply compile the C file and start it.
 If the system gets into a bad state and hangs, the HP ProLiant iLO 2 timer
 register will not be updated in a timely fashion and a hardware system reset
 (also known as an Automatic Server Recovery (ASR) event) will occur.

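 As a rough illustration of what such an application does, the sketch below
 opens a watchdog device and kicks it a fixed number of times. It is not the
 watchdog-test.c program itself. The function name pet_watchdog and its
 parameters are hypothetical; WDIOC_KEEPALIVE and the 'V' "magic close"
 character come from the generic watchdog API, and whether a given hpwdt
 build honors magic close is an assumption here.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of hpwdt or watchdog-test.c).
 * Opens the watchdog device at dev_path, which starts the timer, then
 * kicks it `kicks` times, sleeping interval_secs between kicks.
 * Returns 0 on success, -1 on any failure.
 */
int pet_watchdog(const char *dev_path, int kicks, unsigned int interval_secs)
{
    int fd = open(dev_path, O_WRONLY);
    if (fd < 0)
        return -1;      /* device missing, busy, or permission denied */

    int rc = 0;
    for (int i = 0; i < kicks; i++) {
        int dummy = 0;
        /* Generic watchdog API: reset the timer countdown. */
        if (ioctl(fd, WDIOC_KEEPALIVE, &dummy) < 0) {
            rc = -1;
            break;
        }
        sleep(interval_secs);
    }

    /*
     * Magic close: writing 'V' before close asks the driver to stop the
     * timer. Assumed to have no effect when nowayout is set.
     */
    if (write(fd, "V", 1) != 1)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

 A caller would typically pass "/dev/watchdog" and an interval comfortably
 shorter than the configured timer value. If the process stops kicking while
 the device is open, the timer is expected to expire and trigger an ASR.
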
 The hpwdt driver also has four (4) module parameters. They are the following:

 soft_margin - allows the user to set the watchdog timer value
 allow_kdump - allows the user to save off a kernel dump image after an NMI
 nowayout    - basic watchdog parameter that does not allow the timer to
               be stopped once it has been started, so an impending ASR
               cannot be escaped.
 priority    - determines whether the hpwdt driver is first or last on the
               die_notify list to handle NMIs. The default value
               for this module parameter is 0 or LAST. If the user wants to
               enable NMI sourcing then reload the hpwdt driver with
               priority=1 (and boot with nmi_watchdog=0).

 NOTE: More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.

 The priority parameter was introduced because other kernel software (like
 oprofile) relies on handling NMIs. Keeping hpwdt's priority at 0 (or LAST)
 allows users of NMIs for non-critical events to work as expected.

 The NMI sourcing capability is disabled by default due to the inability to
 distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
 Linux kernel. What this means is that the hpwdt NMI handler code is called
 each time the NMI signal fires. This could amount to several thousand
 NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and
 confused" message in the logs or if the system gets into a hung state, then
 the hpwdt driver can be reloaded with the "priority" module parameter set
 (priority=1) as follows:

 1. If the kernel has not been booted with nmi_watchdog turned off, then
    edit /boot/grub/menu.lst and place nmi_watchdog=0 at the end of the
    currently booting kernel line.
 2. Reboot the server.
 3. Once the system comes up, perform a rmmod hpwdt.
 4. insmod /lib/modules/`uname -r`/kernel/drivers/char/watchdog/hpwdt.ko priority=1

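 Step 1 depends on knowing whether the running kernel was booted with
 nmi_watchdog=0. One way to check is to look for that token in the kernel
 command line, which Linux exposes in /proc/cmdline. The sketch below is a
 minimal illustration; the function name cmdline_disables_nmi_watchdog is
 hypothetical, and it only tests for the presence of the exact token rather
 than reproducing the kernel's own parameter parsing.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the space-separated kernel command
 * line in `cmdline` contains the exact token "nmi_watchdog=0".
 * A trailing newline, as found in /proc/cmdline, is tolerated.
 */
bool cmdline_disables_nmi_watchdog(const char *cmdline)
{
    static const char token[] = "nmi_watchdog=0";
    const size_t len = sizeof(token) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, token)) != NULL) {
        /* The match must start and end on a token boundary. */
        bool starts_token = (p == cmdline) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ') ||
                          (p[len] == '\n');
        if (starts_token && ends_token)
            return true;
        p += len;       /* keep scanning past this partial match */
    }
    return false;
}
```

 A caller would read the contents of /proc/cmdline into a buffer and pass it
 to this function. If it returns false, the boot entry still needs the
 nmi_watchdog=0 edit described in step 1.
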
 Now, the hpwdt driver can successfully receive and source the NMI and provide
 a log message that details the reason for the NMI (as determined by the HP
 BIOS).

 Below is a list of NMIs the HP BIOS understands along with the associated
 code (reason):

        No source found                00h
        Uncorrectable Memory Error     01h
        ASR NMI                        1Bh
        PCI Parity Error               20h
        NMI Button Press               27h
        SB_BUS_NMI                     28h
        ILO Doorbell NMI               29h
        ILO IOP NMI                    2Ah
        ILO Watchdog NMI               2Bh
        Proc Throt NMI                 2Ch
        Front Side Bus NMI             2Dh
        PCI Express Error              2Fh
        DMA controller NMI             30h
        Hypertransport/CSI Error       31h

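 For tools that post-process logs, the list above can be expressed as a small
 lookup table. The sketch below is only an illustration of that mapping. The
 names nmi_reasons and hp_nmi_reason are hypothetical and are not taken from
 the hpwdt driver source; the codes and descriptions are copied from the list
 above.

```c
#include <stddef.h>

/* One entry per NMI source code listed in this document. */
struct nmi_reason {
    unsigned char code;
    const char *text;
};

static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/*
 * Hypothetical helper: returns the description for an NMI source code,
 * or NULL if the code is not in the table above.
 */
const char *hp_nmi_reason(unsigned char code)
{
    size_t n = sizeof(nmi_reasons) / sizeof(nmi_reasons[0]);

    for (size_t i = 0; i < n; i++) {
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].text;
    }
    return NULL;
}
```

 Codes that do not appear in the list, such as 2Eh, return NULL, so a caller
 can fall back to printing the raw code.
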
 -- Tom Mingarelli
    (thomas.mingarelli@hp.com)