Suresh Kernel Debugging Techniques
Suresh Jayaraman
September 4, 2006
Agenda
Introduction
Kernel Oops
Hangs, Magic SysRq
Demos, Examples
Introduction
Classification of bugs
Kernel Oops
Kernel Panics
Lockups, a.k.a. hangs (soft/hard)
Unexpected behavior
Introduction
Steps involved in fixing bugs
Oops - defined
Interpreting Oops
Fault
EIP = function base address + instruction offset
Oops counter, No. of CPUs
EFLAGS
Registers (general-purpose, segment, and control registers)
Call trace return addresses
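The relation EIP = function base address + instruction offset can be checked by hand against any Oops or register dump. The following is a minimal Python sketch, assuming the `symbol+0xOFFSET/0xSIZE` notation that 2.6-era kernels print next to an address. The helper name `split_eip` is hypothetical; the sample values are the EIP and location shown in the SysRq + p output in this deck.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation printed next to an address.
LOCATION = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def split_eip(eip, location):
    """Return (symbol, base address, offset, function size) for an EIP.

    Hypothetical helper: `eip` is the address as an int, `location` is the
    text that follows "EIP is at" in the dump.
    """
    m = LOCATION.fullmatch(location.strip())
    if m is None:
        raise ValueError("unrecognised location: %r" % location)
    offset = int(m.group("offset"), 16)
    size = int(m.group("size"), 16)
    if offset >= size:
        raise ValueError("offset lies outside the function")
    # EIP = base + offset, so the function starts at EIP - offset.
    return m.group("symbol"), eip - offset, offset, size

symbol, base, offset, size = split_eip(0xc020a7a2, "read_chan+0x5/0x5b1")
print("%s starts at %#x, EIP is %d bytes in" % (symbol, base, offset))
```

The computed base address is the value to look for in System.map or to hand to a disassembler when locating the faulting instruction.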
Approach
Points to note
Don't trust a kernel that has Oopsed
Frame pointers give more reliable stack tracebacks
Always check syslog when the system behaves strangely
Linus Torvalds says:
"I'm afraid that I've seen too many people fix bugs
by looking at debugger output, and that almost
inevitably leads to fixing the symptoms rather than
the underlying problems."
Use the source, Luke
Lockups
System just freezes: no messages, no responses
Types
Hardware lockups
Mostly due to hardware problems
Hardware misuse by a poorly written driver
Spinning in a loop
Waiting on a lock
Deadlocks
Symptoms
Other keys
SysRq + p
SysRq : Show Regs (SysRq + p)
Pid: 2894, comm: X
EIP: 0060:[<c020a7a2>] CPU: 0
EIP is at read_chan+0x5/0x5b1
EFLAGS: 00003282 Not tainted (2.6.15-kdb-smp)
EAX: dd885000 EBX: dd88500c ECX: bf9416dc EDX: d94a4d80
ESI: dd885000 EDI: c020a79d EBP: dbdc5f44 DS: 007b ES: 007b
CR0: 80050033 CR2: b6bb4108 CR3: 1fd79000 CR4: 000006d0
[<c01023c5>] show_regs+0x10a/0x115
[<c021467e>] sysrq_handle_showregs+0xe/0x10
[<c02147f7>] __handle_sysrq+0x7a/0xf1
[<c0214881>] handle_sysrq+0x13/0x16
[<c0210040>] kbd_keycode+0x131/0x2f6
[<c021027e>] kbd_event+0x79/0xa7
[<c022b08e>] input_event+0x3d6/0x3f9
[<c022e3a3>] atkbd_report_key+0x5e/0x7e
[<c022e7d0>] atkbd_interrupt+0x40d/0x4dd
[<c02187c0>] serio_interrupt+0x35/0x6e
[<c02191d0>] i8042_interrupt+0x1d8/0x1ea
[<c0140727>] handle_IRQ_event+0x27/0x52
[<c01407df>] __do_IRQ+0x8d/0xe2
[<c01062b9>] do_IRQ+0x49/0x5a
[<c0104e5a>] common_interrupt+0x1a/0x20
[<c020648a>] tty_read+0x63/0xb3
[<c015bf3c>] vfs_read+0xac/0x15b
[<c015c262>] sys_read+0x3b/0x60
[<c0103d9b>] sysenter_past_esp+0x54/0x79
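A trace like the one above lists the innermost frame first, so it is often easier to read once reduced to function names and reversed. The following Python sketch assumes the `[<address>] symbol+0xOFFSET/0xSIZE` line format shown here; `parse_trace` is a hypothetical helper name, and the sample is the tail of the trace above.

```python
import re

# One frame of a 2.6-style call trace: "[<c015bf3c>] vfs_read+0xac/0x15b"
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_trace(text):
    """Return a list of (return address, symbol, offset) in printed order."""
    return [
        (int(m.group("addr"), 16), m.group("symbol"), int(m.group("offset"), 16))
        for m in FRAME.finditer(text)
    ]

SAMPLE = """\
 [<c0104e5a>] common_interrupt+0x1a/0x20
 [<c020648a>] tty_read+0x63/0xb3
 [<c015bf3c>] vfs_read+0xac/0x15b
 [<c015c262>] sys_read+0x3b/0x60
"""

frames = parse_trace(SAMPLE)
# Reverse the list to read the call path from outermost to innermost frame.
print(" -> ".join(symbol for _, symbol, _ in reversed(frames)))
```

Read this way, the frames from sys_read up to tty_read show what process X was doing, and everything from common_interrupt onward in the full trace is the keyboard interrupt that delivered the SysRq key.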
Example
Running Connectathon tests from multiple (180) clients
A client process hung
Process in D (uninterruptible) state in kernel mode
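One way to spot processes stuck in D state is to read the state field of each /proc/<pid>/stat file. The sketch below is an illustration only, not the procedure used in the original investigation; the function names are hypothetical, and the sample line is made up.

```python
import os

def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so locate the last ')' instead of splitting on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state

def uninterruptible_tasks(proc="/proc"):
    """Return [(pid, comm)] for every process currently in D state."""
    if not os.path.isdir(proc):
        return []  # not a Linux system, or /proc is not mounted
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            tasks.append((pid, comm))
    return tasks

# Made-up sample line, truncated after the state field's neighbours.
print(parse_stat("4321 (nfs test) D 1 4321 4321 0"))

if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

A process that stays in this list across repeated scans is a candidate for the wait-channel inspection described next.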
How to debug
Use Magic SysRq, or:
ps n -o pid,user,wchan -C <process> prints the wait channel (WCHAN) as a raw address
ps -o pid,user,wchan -C <process> translates that address (EIP) into the corresponding function name
If the address is not found in System.map, use /proc/kallsyms or
disassemble the module from its starting address
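Translating a raw address by hand amounts to finding the nearest symbol at or below it in a sorted symbol table. The following Python sketch assumes the `address type name [module]` line layout shared by System.map and /proc/kallsyms. The function names and the sample table are hypothetical, except that the read_chan base address follows from the EIP and offset in the SysRq + p output.

```python
import bisect

def load_symbols(lines):
    """Parse "address type name [module]" lines into a list sorted by address."""
    symbols = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/kallsyms appends "[module]" for symbols that live in a module.
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.append((int(fields[0], 16), fields[2], module))
    symbols.sort(key=lambda s: s[0])
    return symbols

def resolve(symbols, address):
    """Return (name, offset, module) for the nearest symbol at or below address."""
    addresses = [s[0] for s in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None  # address is below every known symbol
    base, name, module = symbols[i]
    return name, address - base, module

# Hypothetical table; example_wait and example_mod are invented names.
SAMPLE = """\
c020a79d t read_chan
c020ad4e t next_function
e0a51000 t example_wait [example_mod]
""".splitlines()

symbols = load_symbols(SAMPLE)
print(resolve(symbols, 0xc020a7a2))
print(resolve(symbols, 0xe0a51010))
```

A non-empty module field indicates that the address belongs to a loadable module, which is the case where System.map alone is not enough.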
Questions?