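Debugging Linux dump files with crash

When the Linux kernel crashes, kdump relies on the kexec mechanism to quickly boot a new instance of the kernel inside a region of memory that was reserved at system boot. Because the crashed kernel's memory is left untouched, its contents can be safely copied to storage. The rest of this post walks through the whole debugging process.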

curits@curits-virtual-machine:~/Desktop/crash-master$ sudo apt install linux-crashdump
[sudo] password for curits: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  crash kdump-tools kexec-tools libsnappy1v5 makedumpfile
The following NEW packages will be installed:
  crash kdump-tools kexec-tools libsnappy1v5 linux-crashdump makedumpfile
0 upgraded, 6 newly installed, 0 to remove and 99 not upgraded.
Need to get 3,066 kB of archives.
After this operation, 9,747 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y

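During installation, configuration dialogs pop up (the original screenshots are not reproduced here); just choose Yes in each.
If you accidentally chose No, don't worry. You can reconfigure with the following commands: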

sudo dpkg-reconfigure kexec-tools
sudo dpkg-reconfigure kdump-tools

Full installation log:

curits@curits-virtual-machine:~/Desktop/crash-master$ sudo apt install linux-crashdump
[sudo] password for curits: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  crash kdump-tools kexec-tools libsnappy1v5 makedumpfile
The following NEW packages will be installed:
  crash kdump-tools kexec-tools libsnappy1v5 linux-crashdump makedumpfile
0 upgraded, 6 newly installed, 0 to remove and 99 not upgraded.
Need to get 3,066 kB of archives.
After this operation, 9,747 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://cn.archive.ubuntu.com/ubuntu bionic/main amd64 libsnappy1v5 amd64 1.1.7-1 [16.0 kB]
Get:2 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 crash amd64 7.2.8-1ubuntu0.18.04.1 [2,780 kB]
Get:3 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 kexec-tools amd64 1:2.0.16-1ubuntu1.1 [79.8 kB]         
Get:4 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 makedumpfile amd64 1:1.6.5-1ubuntu1~18.04.5 [164 kB]    
Get:5 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 kdump-tools amd64 1:1.6.5-1ubuntu1~18.04.5 [23.7 kB]    
Get:6 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-crashdump amd64 4.15.0.112.100 [2,772 B]          
Fetched 2,467 kB in 2min 3s (20.1 kB/s)                                                                                     
Preconfiguring packages ...
Selecting previously unselected package libsnappy1v5:amd64.
(Reading database ... 174823 files and directories currently installed.)
Preparing to unpack .../0-libsnappy1v5_1.1.7-1_amd64.deb ...
Unpacking libsnappy1v5:amd64 (1.1.7-1) ...
Selecting previously unselected package crash.
Preparing to unpack .../1-crash_7.2.8-1ubuntu0.18.04.1_amd64.deb ...
Unpacking crash (7.2.8-1ubuntu0.18.04.1) ...
Selecting previously unselected package kexec-tools.
Preparing to unpack .../2-kexec-tools_1%3a2.0.16-1ubuntu1.1_amd64.deb ...
Unpacking kexec-tools (1:2.0.16-1ubuntu1.1) ...
Selecting previously unselected package makedumpfile.
Preparing to unpack .../3-makedumpfile_1%3a1.6.5-1ubuntu1~18.04.5_amd64.deb ...
Unpacking makedumpfile (1:1.6.5-1ubuntu1~18.04.5) ...
Selecting previously unselected package kdump-tools.
Preparing to unpack .../4-kdump-tools_1%3a1.6.5-1ubuntu1~18.04.5_amd64.deb ...
Unpacking kdump-tools (1:1.6.5-1ubuntu1~18.04.5) ...
Selecting previously unselected package linux-crashdump.
Preparing to unpack .../5-linux-crashdump_4.15.0.112.100_amd64.deb ...
Unpacking linux-crashdump (4.15.0.112.100) ...
Setting up kexec-tools (1:2.0.16-1ubuntu1.1) ...
Generating /etc/default/kexec...
Setting up makedumpfile (1:1.6.5-1ubuntu1~18.04.5) ...
Setting up libsnappy1v5:amd64 (1.1.7-1) ...
Setting up crash (7.2.8-1ubuntu0.18.04.1) ...
Setting up kdump-tools (1:1.6.5-1ubuntu1~18.04.5) ...

Creating config file /etc/default/kdump-tools with new version
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/kdump-tools.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.0-42-generic
Found initrd image: /boot/initrd.img-5.4.0-42-generic
Found linux image: /boot/vmlinuz-5.3.0-28-generic
Found initrd image: /boot/initrd.img-5.3.0-28-generic
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done
Created symlink /etc/systemd/system/multi-user.target.wants/kdump-tools.service → /lib/systemd/system/kdump-tools.service.
kdump-tools-dump.service is a disabled or a static unit, not starting it.
Setting up linux-crashdump (4.15.0.112.100) ...
Processing triggers for systemd (237-3ubuntu10.38) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for ureadahead (0.100.0-21) ...
Processing triggers for libc-bin (2.27-3ubuntu1.2) ...

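After installing linux-crashdump, check the system configuration with kdump-config show. The line "* no crashkernel= parameter in the kernel cmdline" means kdump is not active yet: the crashkernel= parameter has been added to the GRUB configuration, but the running kernel was booted without it. Reboot the system.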

curits@curits-virtual-machine:~/Desktop/crash-master$ kdump-config show
 * no crashkernel= parameter in the kernel cmdline
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr:
    /var/lib/kdump/vmlinuz
kdump initrd: 
   /var/lib/kdump/initrd.img
current state:    Not ready to kdump
kexec command:
  no kexec command recorded
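The warning above is about the command line of the running kernel, which can be read from /proc/cmdline. A minimal sketch of the same check, assuming a hypothetical helper name (has_crashkernel_param is not part of kdump-tools):

```shell
#!/bin/sh
# Hypothetical helper (not part of kdump-tools): succeed if the given kernel
# command line contains a crashkernel= parameter.
# The command line is passed as a string so the check is easy to try out.
has_crashkernel_param() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the running system:
#   has_crashkernel_param "$(cat /proc/cmdline)" && echo "crashkernel= is set"
```

Before the reboot this check is expected to fail, and after the reboot it should succeed, matching the two kdump-config show outputs in this post.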

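After the reboot, check the configuration again. "current state: ready to kdump" shows that the dump facility is now enabled.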

curits@curits-virtual-machine:~$ kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.4.0-42-generic
kdump initrd: 
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.4.0-42-generic
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.4.0-42-generic root=UUID=df1c4cb9-434f-4663-b120-c1038f19e97b ro quiet splash reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
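For scripts, the only line that matters in this output is "current state:". A small sketch that extracts it, assuming the output format shown above (kdump_state is a hypothetical name, not a kdump-tools command):

```shell
#!/bin/sh
# Hypothetical helper: read `kdump-config show` output on stdin and print
# the text that follows "current state:".
kdump_state() {
    sed -n 's/^current state:[[:space:]]*//p'
}

# Typical use:
#   kdump-config show | kdump_state
# The kernel also exposes /sys/kernel/kexec_crash_loaded, which reads 1
# when a crash kernel has been loaded.
```

Comparing the result against the string "ready to kdump" gives a simple pass/fail check before deliberately crashing a test machine.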

Once the system is configured, you can manually crash the kernel to produce a dump file.

Installing the tools on CentOS

yum install kexec-tools
# Run the following commands as root
echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions (handled as long as the kernel is not completely hung)
echo c > /proc/sysrq-trigger       # deliberately crash the OS
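Because the second command panics the machine at once, it can be worth wrapping it so that it cannot be run by accident. A minimal sketch, in which the function name and the confirmation string are both made up for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: refuses to crash the machine unless it is called
# with the literal argument "yes-crash-this-machine". Must be run as root.
trigger_test_crash() {
    if [ "$1" != "yes-crash-this-machine" ]; then
        echo "refusing: pass yes-crash-this-machine to confirm" >&2
        return 1
    fi
    sync                              # flush pending writes before the panic
    echo 1 > /proc/sys/kernel/sysrq   # same two steps as in the post
    echo c > /proc/sysrq-trigger
}
```

Calling trigger_test_crash with no argument only prints the refusal message, so sourcing the file or mistyping the call is harmless.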

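After these commands run, the system reboots immediately (doing this in a virtual machine is recommended).
Once the system is back up, the files dumped by kdump can be found under /var/crash: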

curits@curits-virtual-machine:/var/crash$ ls
202008261430  kdump_lock  kexec_cmd  linux-image-5.4.0-42-generic-202008261430.crash
curits@curits-virtual-machine:/var/crash$ cd 202008261430/
curits@curits-virtual-machine:/var/crash/202008261430$ ls
dmesg.202008261430  dump.202008261430
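When several crashes have accumulated, picking the newest dump by hand gets tedious. The sketch below assumes the directory layout shown above, where each dump lives in a timestamp-named directory (YYYYMMDDhhmm) so that a plain lexical sort is also chronological. latest_dump is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the path of the newest dump.* file under a
# kdump directory (default: /var/crash). Prints nothing if none exists.
latest_dump() {
    dir=${1:-/var/crash}
    for f in "$dir"/*/dump.*; do
        # If the glob matched nothing it stays literal, and -f rejects it.
        [ -f "$f" ] && printf '%s\n' "$f"
    done | sort | tail -n 1
}

# Typical use:
#   latest_dump            # newest dump under /var/crash
#   latest_dump /mnt/dumps # or under another directory
```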

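The resulting dump file is what crash analyzes. How to obtain a kernel with debug information was covered in an earlier post, so it is not repeated here.
Running "sudo apt install linux-crashdump" also installs the crash tool by default, so there is no need to download the source and build crash as described in that earlier post. Without further ado, let's start crash: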

curits@curits-virtual-machine:/var/crash/202008261430$ sudo crash /usr/lib/debug/boot/vmlinux-5.4.0-42-generic ./dump.202008261430 

crash 7.2.8
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [26MB]: patching 111021 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/boot/vmlinux-5.4.0-42-generic             
    DUMPFILE: ./dump.202008261430  [PARTIAL DUMP]
        CPUS: 2
        DATE: Wed Aug 26 14:30:31 2020
      UPTIME: 00:05:40
LOAD AVERAGE: 0.01, 0.32, 0.26
       TASKS: 503
    NODENAME: curits-virtual-machine
     RELEASE: 5.4.0-42-generic
     VERSION: #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020
     MACHINE: x86_64  (3407 Mhz)
      MEMORY: 4 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 2088
     COMMAND: "bash"
        TASK: ffff89ca05e1c740  [THREAD_INFO: ffff89ca05e1c740]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> 

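When crash starts, it prints a summary of basic system information, such as the kernel version, uptime, date, number of tasks, hostname, and the process that was running along with its PID.

Inside crash, the ps command lists the processes that existed when the system crashed.
In its output (screenshot not reproduced here), a leading ">" marks the task that was active on each CPU at the time of the crash; the task that panicked is one of them.
The log command prints the kernel message buffer (dmesg) saved in the dump. The oops/panic debugging information appears at the very end of the log, where the "Trigger a crash" message can be seen: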

[  462.826966] sysrq: Trigger a crash
[  462.826969] Kernel panic - not syncing: sysrq triggered crash
[  462.826971] CPU: 0 PID: 2088 Comm: bash Kdump: loaded Not tainted 5.4.0-42-generic #46~18.04.1-Ubuntu
[  462.826972] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[  462.826973] Call Trace:
[  462.826978]  dump_stack+0x6d/0x95
[  462.826981]  panic+0xfe/0x2e4
[  462.826984]  sysrq_handle_crash+0x15/0x20
[  462.826985]  __handle_sysrq+0x93/0x150
[  462.826986]  write_sysrq_trigger+0x2f/0x40
[  462.826988]  proc_reg_write+0x3e/0x60
[  462.826990]  __vfs_write+0x1b/0x40
[  462.826991]  vfs_write+0xb1/0x1a0
[  462.826992]  ksys_write+0xa7/0xe0
[  462.826993]  __x64_sys_write+0x1a/0x20
[  462.826995]  do_syscall_64+0x57/0x190
[  462.826996]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  462.826998] RIP: 0033:0x7fd221237264
[  462.827000] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 a1 06 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[  462.827001] RSP: 002b:00007fff18ab1638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  462.827002] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd221237264
[  462.827003] RDX: 0000000000000002 RSI: 00005568916ad200 RDI: 0000000000000001
[  462.827004] RBP: 00005568916ad200 R08: 000000000000000a R09: 0000000000000001
[  462.827004] R10: 000000000000000a R11: 0000000000000246 R12: 00007fd221513760
[  462.827005] R13: 0000000000000002 R14: 00007fd22150f2a0 R15: 00007fd22150e760
crash> 
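The same messages are also saved as plain text in the dmesg.* file next to the dump, so the call trace can be pulled out without starting crash. A rough sketch, assuming trace lines end in symbol+0xOFFSET/0xLENGTH as they do above (call_trace_symbols is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: read dmesg text on stdin and print the function names
# listed after "Call Trace:", one per line. It stops at the first line whose
# last field does not look like symbol+0xOFF/0xLEN (for example the RIP line).
call_trace_symbols() {
    awk '
        /Call Trace:/ { in_trace = 1; next }
        in_trace {
            if ($NF ~ /^[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+\/0x[0-9a-f]+$/) {
                sub(/\+0x.*/, "", $NF)   # drop the +offset/length suffix
                print $NF
            } else {
                in_trace = 0
            }
        }
    '
}

# Typical use:
#   call_trace_symbols < /var/crash/202008261430/dmesg.202008261430
```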

The bt command shows, by default, the backtrace of the task that caused the crash; you can also select other tasks and get their backtraces.

crash> bt
PID: 2088   TASK: ffff89ca05e1c740  CPU: 0   COMMAND: "bash"
 #0 [ffffa74181cdfc58] machine_kexec at ffffffff82a6f773
 #1 [ffffa74181cdfcb8] __crash_kexec at ffffffff82b573a2
 #2 [ffffa74181cdfd88] panic at ffffffff82aa0bd6
 #3 [ffffa74181cdfe10] sysrq_handle_crash at ffffffff8308e965
 #4 [ffffa74181cdfe20] __handle_sysrq at ffffffff8308edd3
 #5 [ffffa74181cdfe58] write_sysrq_trigger at ffffffff8308f2af
 #6 [ffffa74181cdfe70] proc_reg_write at ffffffff82d705ae
 #7 [ffffa74181cdfe90] __vfs_write at ffffffff82cd9d8b
 #8 [ffffa74181cdfea0] vfs_write at ffffffff82cdaa71
 #9 [ffffa74181cdfed8] ksys_write at ffffffff82cdd037
#10 [ffffa74181cdff20] __x64_sys_write at ffffffff82cdd08a
#11 [ffffa74181cdff30] do_syscall_64 at ffffffff82a04417
#12 [ffffa74181cdff50] entry_SYSCALL_64_after_hwframe at ffffffff8360008c
    RIP: 00007fd221237264  RSP: 00007fff18ab1638  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007fd221237264
    RDX: 0000000000000002  RSI: 00005568916ad200  RDI: 0000000000000001
    RBP: 00005568916ad200   R8: 000000000000000a   R9: 0000000000000001
    R10: 000000000000000a  R11: 0000000000000246  R12: 00007fd221513760
    R13: 0000000000000002  R14: 00007fd22150f2a0  R15: 00007fd22150e760
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> 
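The commands used so far can also be run non-interactively. The sketch below relies on crash's -i option (execute commands from a file) and -s option (suppress the startup banner); the two function names and the choice of commands are only illustrative, and the vmlinux path follows the Ubuntu layout used earlier in this post:

```shell
#!/bin/sh
# Hypothetical helper: write a crash command file that runs the commands
# used in this post and then leaves crash.
# $1: path of the command file to create
write_crash_script() {
    printf '%s\n' 'sys' 'ps' 'log' 'bt' 'exit' > "$1"
}

# Hypothetical wrapper around the invocation shown earlier.
# $1: kernel release of the crashed kernel, e.g. 5.4.0-42-generic
# $2: dump file
# $3: command file written by write_crash_script
run_crash_batch() {
    crash -s -i "$3" "/usr/lib/debug/boot/vmlinux-$1" "$2"
}

# Typical use (as root):
#   write_crash_script /tmp/crash.cmds
#   run_crash_batch 5.4.0-42-generic ./dump.202008261430 /tmp/crash.cmds > report.txt
```

The kernel release is passed explicitly rather than taken from uname -r, because the dump may come from a different kernel than the one currently running.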

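Of course, this is only a manually triggered crash. When analyzing a real dump, bt gives the address and offset of the faulting instruction, and the dis command can then disassemble that location to show the assembly and the kernel source, which helps pin down the cause of the fault.
Below is an example on CentOS. When IDA cannot decode some instructions, crash is the most direct and effective way to look at them.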

[curtis@localhost Desktop]$ sudo crash /usr/lib/debug/lib/modules/3.10.0-693.el7.x86_64/vmlinux 

crash 7.1.9-2.el7
Copyright (C) 2002-2016  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-693.el7.x86_64/vmlinux
    DUMPFILE: /dev/crash
        CPUS: 2
        DATE: Sun Aug 30 19:02:04 2020
      UPTIME: 00:05:21
LOAD AVERAGE: 0.32, 0.52, 0.27
       TASKS: 403
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-693.el7.x86_64
     VERSION: #1 SMP Tue Aug 22 21:09:27 UTC 2017
     MACHINE: x86_64  (3407 Mhz)
      MEMORY: 2 GB
         PID: 2813
     COMMAND: "crash"
        TASK: ffff88003ae2af70  [THREAD_INFO: ffff8800363f4000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)

crash> 

To analyze a particular function, you can first look it up in the kernel symbol table. On CentOS, System.map is located at:

/usr/src/kernels/3.10.0-693.el7.x86_64/System.map
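System.map is a plain text file with three columns (address, symbol type, symbol name), so a lookup is a one-line awk filter. A minimal sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the address and type of an exact symbol name
# from a System.map-style file (columns: address, type, name).
# $1: symbol name, $2: path to System.map
map_lookup() {
    awk -v sym="$1" '$3 == sym { print $1, $2 }' "$2"
}

# Typical use:
#   map_lookup tty_lock /usr/src/kernels/3.10.0-693.el7.x86_64/System.map
```

Matching on the whole third column avoids the false hits that a plain grep for "tty_lock" would return for longer names containing that string.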

Take the tty_lock function as an example:

crash> dis -s tty_lock
FILE: drivers/tty/tty_mutex.c
LINE: 24

  19    /*
  20     * Getting the big tty mutex.
  21     */
  22    
  23    void __lockfunc tty_lock(struct tty_struct *tty)
* 24    {
  25            if (tty->magic != TTY_MAGIC) {
  26                    pr_err("L Bad %p\n", tty);
  27                    WARN_ON(1);
  28                    return;
  29            }
  30            tty_kref_get(tty);
  31            mutex_lock(&tty->legacy_mutex);
  32    }

Here you can clearly see the function's source code, the source file it lives in, and the line number.

crash> dis tty_lock
0xffffffff816abf60 <tty_lock>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff816abf65 <tty_lock+5>:        push   %rbp
0xffffffff816abf66 <tty_lock+6>:        mov    %rsp,%rbp
0xffffffff816abf69 <tty_lock+9>:        push   %rbx
0xffffffff816abf6a <tty_lock+10>:       cmpl   $0x5401,(%rdi)
0xffffffff816abf70 <tty_lock+16>:       mov    %rdi,%rbx
0xffffffff816abf73 <tty_lock+19>:       jne    0xffffffff816abf9b <tty_lock+59>
0xffffffff816abf75 <tty_lock+21>:       test   %rdi,%rdi
0xffffffff816abf78 <tty_lock+24>:       je     0xffffffff816abf8c <tty_lock+44>
0xffffffff816abf7a <tty_lock+26>:       mov    $0x1,%eax
0xffffffff816abf7f <tty_lock+31>:       lock xadd %eax,0x4(%rdi)
0xffffffff816abf84 <tty_lock+36>:       add    $0x1,%eax
0xffffffff816abf87 <tty_lock+39>:       cmp    $0x1,%eax
0xffffffff816abf8a <tty_lock+42>:       jle    0xffffffff816abfbf <tty_lock+95>
0xffffffff816abf8c <tty_lock+44>:       lea    0x80(%rbx),%rdi
0xffffffff816abf93 <tty_lock+51>:       callq  0xffffffff816a7710 <mutex_lock>
0xffffffff816abf98 <tty_lock+56>:       pop    %rbx
0xffffffff816abf99 <tty_lock+57>:       pop    %rbp
0xffffffff816abf9a <tty_lock+58>:       retq   
0xffffffff816abf9b <tty_lock+59>:       mov    %rdi,%rsi
0xffffffff816abf9e <tty_lock+62>:       xor    %eax,%eax
0xffffffff816abfa0 <tty_lock+64>:       mov    $0xffffffff8194054a,%rdi
0xffffffff816abfa7 <tty_lock+71>:       callq  0xffffffff8169dd79 <printk>
0xffffffff816abfac <tty_lock+76>:       mov    $0x1b,%esi
0xffffffff816abfb1 <tty_lock+81>:       mov    $0xffffffff81940532,%rdi
0xffffffff816abfb8 <tty_lock+88>:       callq  0xffffffff81087af0 <warn_slowpath_null>
0xffffffff816abfbd <tty_lock+93>:       jmp    0xffffffff816abf98 <tty_lock+56>
0xffffffff816abfbf <tty_lock+95>:       cmpb   $0x0,0x4463ee(%rip)        # 0xffffffff81af23b4 <__warned.18470>
0xffffffff816abfc6 <tty_lock+102>:      jne    0xffffffff816abf8c <tty_lock+44>
0xffffffff816abfc8 <tty_lock+104>:      mov    $0x2f,%esi
0xffffffff816abfcd <tty_lock+109>:      mov    $0xffffffff81903a09,%rdi
0xffffffff816abfd4 <tty_lock+116>:      callq  0xffffffff81087af0 <warn_slowpath_null>
0xffffffff816abfd9 <tty_lock+121>:      movb   $0x1,0x4463d4(%rip)        # 0xffffffff81af23b4 <__warned.18470>
0xffffffff816abfe0 <tty_lock+128>:      jmp    0xffffffff816abf8c <tty_lock+44>
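One quick way to relate the listing to the source is to list the functions it calls. The sketch below assumes the dis output format shown above, where the third field is the mnemonic and the last field is the target in angle brackets (dis_call_targets is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: read `dis` output on stdin and print the distinct
# symbols targeted by callq instructions, sorted.
dis_call_targets() {
    awk '$3 == "callq" { gsub(/[<>]/, "", $NF); print $NF }' | sort -u
}

# Typical use, after saving the listing from inside crash with
#   crash> dis tty_lock > /tmp/tty_lock.dis
# run in a shell:
#   dis_call_targets < /tmp/tty_lock.dis
```

For the listing above this yields mutex_lock, printk and warn_slowpath_null, which appear to correspond to mutex_lock(), pr_err() and WARN_ON() in the source, while tty_kref_get() seems to have been inlined as the lock xadd sequence.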

Reading the source and the assembly side by side is quite satisfying.

Reposted from blog.csdn.net/qq_42931917/article/details/108238880