Source at commit b13e7eb172b6f08e5fc22da162bdde5fcde201b5 created 11 years 11 months ago. By Maarten ter Huurne, fbcon: Add 6x10 font | |
---|---|
1 | Table of contents |
2 | ================= |
3 | |
4 | Last updated: 20 December 2005 |
5 | |
6 | Contents |
7 | ======== |
8 | |
9 | - Introduction |
10 | - Devices not appearing |
11 | - Finding patch that caused a bug |
12 | -- Finding using git-bisect |
13 | -- Finding it the old way |
14 | - Fixing the bug |
15 | |
16 | Introduction |
17 | ============ |
18 | |
19 | Always try the latest kernel from kernel.org and build from source. If you are |
20 | not confident in doing that please report the bug to your distribution vendor |
21 | instead of to a kernel developer. |
22 | |
23 | Finding bugs is not always easy. Have a go though. If you can't find it don't |
24 | give up. Report as much as you have found to the relevant maintainer. See |
25 | MAINTAINERS for who that is for the subsystem you have worked on. |
26 | |
27 | Before you submit a bug report read REPORTING-BUGS. |
28 | |
29 | Devices not appearing |
30 | ===================== |
31 | |
32 | Often this is caused by udev. Check that first before blaming it on the |
33 | kernel. |
34 | |
35 | Finding patch that caused a bug |
36 | =============================== |
37 | |
38 | |
39 | |
40 | Finding using git-bisect |
41 | ------------------------ |
42 | |
43 | Using the provided tools with git makes finding bugs easy provided the bug is |
44 | reproducible. |
45 | |
46 | Steps to do it: |
47 | - start using git for the kernel source |
48 | - read the man page for git-bisect |
49 | - have fun |
50 | |
51 | Finding it the old way |
52 | ---------------------- |
53 | |
54 | [Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)] |
55 | |
56 | This is how to track down a bug if you know nothing about kernel hacking. |
57 | It's a brute force approach but it works pretty well. |
58 | |
59 | You need: |
60 | |
61 | . A reproducible bug - it has to happen predictably (sorry) |
62 | . All the kernel tar files from a revision that worked to the |
63 | revision that doesn't |
64 | |
65 | You will then do: |
66 | |
67 | . Rebuild a revision that you believe works, install, and verify that. |
68 | . Do a binary search over the kernels to figure out which one |
69 | introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but |
70 | you know that 1.3.69 does. Pick a kernel in the middle and build |
71 | that, like 1.3.50. Build & test; if it works, pick the mid point |
72 | between .50 and .69, else the mid point between .28 and .50. |
73 | . You'll narrow it down to the kernel that introduced the bug. You |
74 | can probably do better than this but it gets tricky. |
75 | |
76 | . Narrow it down to a subdirectory |
77 | |
78 | - Copy the kernel that works into "test". Let's say that 1.3.62 works, |
79 | but 1.3.63 doesn't. So you diff -r those two kernels and come |
80 | up with a list of directories that changed. For each of those |
81 | directories: |
82 | |
83 | Copy the non-working directory next to the working directory |
84 | as "dir.63". |
85 | One directory at a time, try moving the working directory to |
86 | "dir.62" and the non-working directory "dir.63" into its place: |
87 | |
88 | mv dir dir.62 |
89 | mv dir.63 dir |
90 | find dir -name '*.[oa]' -print | xargs rm -f |
91 | |
92 | And then rebuild and retest. Assuming that all related |
93 | changes were contained in the sub directory, this should |
94 | isolate the change to a directory. |
95 | |
96 | Problems: changes in header files may have occurred; I've |
97 | found in my case that they were self explanatory - you may |
98 | or may not want to give up when that happens. |
99 | |
100 | . Narrow it down to a file |
101 | |
102 | - You can apply the same technique to each file in the directory, |
103 | hoping that the changes in that file are self contained. |
104 | |
105 | . Narrow it down to a routine |
106 | |
107 | - You can take the old file and the new file and manually create |
108 | a merged file that has |
109 | |
110 | #ifdef VER62 |
111 | routine() |
112 | { |
113 | ... |
114 | } |
115 | #else |
116 | routine() |
117 | { |
118 | ... |
119 | } |
120 | #endif |
121 | |
122 | And then walk through that file, one routine at a time and |
123 | prefix it with |
124 | |
125 | #define VER62 |
126 | /* both routines here */ |
127 | #undef VER62 |
128 | |
129 | Then recompile, retest, move the ifdefs until you find the one |
130 | that makes the difference. |
131 | |
132 | Finally, you take all the info that you have, kernel revisions, bug |
133 | description, the extent to which you have narrowed it down, and pass |
134 | that off to whomever you believe is the maintainer of that section. |
135 | A post to linux.dev.kernel isn't such a bad idea if you've done some |
136 | work to narrow it down. |
137 | |
138 | If you get it down to a routine, you'll probably get a fix in 24 hours. |
139 | |
140 | My apologies to Linus and the other kernel hackers for describing this |
141 | brute force approach, it's hardly what a kernel hacker would do. However, |
142 | it does work and it lets non-hackers help fix bugs. And it is cool |
143 | because Linux snapshots will let you do this - something that you can't |
144 | do with vendor supplied releases. |
145 | |
146 | Fixing the bug |
147 | ============== |
148 | |
149 | Nobody is going to tell you how to fix bugs. Seriously. You need to work it |
150 | out. But below are some hints on how to use the tools. |
151 | |
152 | To debug a kernel, use objdump and look for the hex offset from the crash |
153 | output to find the valid line of code/assembler. Without debug symbols, you |
154 | will see the assembler code for the routine shown, but if your kernel has |
155 | debug symbols the C code will also be available. (Debug symbols can be enabled |
156 | in the kernel hacking menu of the menu configuration.) For example: |
157 | |
158 | objdump -r -S -l --disassemble net/dccp/ipv4.o |
159 | |
160 | NB.: you need to be at the top level of the kernel tree for this to pick up |
161 | your C files. |
162 | |
163 | If you don't have access to the code you can also debug from the crash dump |
164 | itself, e.g. this crash dump output as shown by Dave Miller: |
165 | |
166 | > EIP is at ip_queue_xmit+0x14/0x4c0 |
167 | > ... |
168 | > Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00 |
169 | > 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08 |
170 | > <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85 |
171 | > |
172 | > Put the bytes into a "foo.s" file like this: |
173 | > |
174 | > .text |
175 | > .globl foo |
176 | > foo: |
177 | > .byte .... /* bytes from Code: part of OOPS dump */ |
178 | > |
179 | > Compile it with "gcc -c -o foo.o foo.s" then look at the output of |
180 | > "objdump --disassemble foo.o". |
181 | > |
182 | > Output: |
183 | > |
184 | > ip_queue_xmit: |
185 | > push %ebp |
186 | > push %edi |
187 | > push %esi |
188 | > push %ebx |
189 | > sub $0xbc, %esp |
190 | > mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb) |
191 | > mov 0x8(%ebp), %ebx ! %ebx = skb->sk |
192 | > mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt |
193 | |
194 | In addition, you can use GDB to figure out the exact file and line |
195 | number of the OOPS from the vmlinux file. If you have |
196 | CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the |
197 | OOPS: |
198 | |
199 | EIP: 0060:[<c021e50e>] Not tainted VLI |
200 | |
201 | And use GDB to translate that to human-readable form: |
202 | |
203 | gdb vmlinux |
204 | (gdb) l *0xc021e50e |
205 | |
206 | If you don't have CONFIG_DEBUG_INFO enabled, you use the function |
207 | offset from the OOPS: |
208 | |
209 | EIP is at vt_ioctl+0xda8/0x1482 |
210 | |
211 | And recompile the kernel with CONFIG_DEBUG_INFO enabled: |
212 | |
213 | make vmlinux |
214 | gdb vmlinux |
215 | (gdb) p vt_ioctl |
216 | (gdb) l *(0x<address of vt_ioctl> + 0xda8) |
217 | or, as one command |
218 | (gdb) l *(vt_ioctl + 0xda8) |
219 | |
220 | If you have a call trace, such as: |
221 | >Call Trace: |
222 | > [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5 |
223 | > [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e |
224 | > [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee |
225 | > ... |
226 | this shows the problem in the :jbd: module. You can load that module in gdb |
227 | and list the relevant code. |
228 | gdb fs/jbd/jbd.ko |
229 | (gdb) p log_wait_commit |
230 | (gdb) l *(0x<address> + 0xa3) |
231 | or |
232 | (gdb) l *(log_wait_commit + 0xa3) |
233 | |
234 | |
235 | Another very useful option of the Kernel Hacking section in menuconfig is |
236 | Debug memory allocations. This will help you see whether data has been |
237 | initialised before it is used, etc. To see the values that get assigned |
238 | with this, look at mm/slab.c and search for POISON_INUSE. When using this, an |
239 | Oops will often show the poisoned data instead of zero, which is the default. |
240 | |
241 | Once you have worked out a fix, please submit it upstream. After all, open |
242 | source is about sharing what you do, and don't you want to be recognised for |
243 | your genius? |
244 | |
245 | Please do read Documentation/SubmittingPatches though to help your code get |
246 | accepted. |
247 |
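
As a rough aid for the crash dump procedure described under "Fixing the bug", the Python sketch below turns the "Code:" bytes of an Oops into the contents of a "foo.s" file in the form shown there. The helper names parse_code_line and to_asm are hypothetical and not part of any kernel tool. The sketch assumes the bytes are two-digit hex values separated by whitespace, optionally quoted with "> " as in the mail above, and that the byte wrapped in <..> is the one the faulting EIP points at.

```python
import re

# A two-digit hex byte, optionally wrapped in <..> (the byte at the faulting EIP).
_BYTE = re.compile(r"^(?:<([0-9a-fA-F]{2})>|([0-9a-fA-F]{2}))$")


def parse_code_line(text):
    """Return (byte_values, fault_index) for the "Code:" part of an Oops.

    fault_index is the position of the <..> byte, or None if none is marked.
    """
    values = []
    fault_index = None
    for line in text.splitlines():
        # Strip a leading mail-quote marker ("> ") and the "Code:" label.
        line = re.sub(r"^\s*>\s*", "", line)
        line = line.replace("Code:", "")
        for token in line.split():
            match = _BYTE.match(token)
            if match is None:
                raise ValueError("not a hex byte: %r" % token)
            if match.group(1) is not None:
                fault_index = len(values)
            values.append(int(match.group(1) or match.group(2), 16))
    return values, fault_index


def to_asm(values, symbol="foo", per_line=8):
    """Format the bytes as assembler source with a global label."""
    lines = [".text", ".globl %s" % symbol, "%s:" % symbol]
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("\t.byte " + ", ".join("0x%02x" % v for v in chunk))
    return "\n".join(lines) + "\n"
```

Write the string returned by to_asm() to "foo.s", then build and disassemble it with the gcc and objdump commands quoted above. The fault index is the byte offset from the label, so the instruction objdump lists at that offset is the one that faulted.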