Linux Kernel Debugging (2) - Kdump

Understanding kdump

1.Introduction

Kdump is a standard Linux mechanism for dumping the machine's memory contents when the kernel crashes. Kdump is based on Kexec and uses two kernels: the system kernel and the dump-capture kernel. The system kernel is a normal kernel booted with kdump-specific flags that tell it to reserve an amount of physical memory into which the dump-capture kernel will be loaded. The dump-capture kernel must be loaded in advance, because once a crash happens the broken kernel can no longer be relied on to read anything from disk.

When the kernel crashes, the crash handler uses the Kexec mechanism to boot the dump-capture kernel. The memory of the system kernel is left untouched and is accessible from the dump-capture kernel exactly as it was at the moment of the crash. Once the dump-capture kernel has booted, the user can read the file /proc/vmcore to access the memory of the crashed system kernel. The dump can be saved to disk or copied over the network to another machine for further investigation.

In real production environments the system and dump-capture kernels are usually different: the system kernel needs many features and is compiled with many kernel options and drivers, while the dump-capture kernel aims to be minimal and to use as little memory as possible. For example, the dump-capture kernel can be compiled without network support if the memory dump is only stored to disk. In this article we simplify things and use the same kernel as both the system and the dump-capture kernel. That means the same kernel code is loaded twice: once as the normal system kernel, and once into the reserved memory area.

2.Build the system and dump-capture kernels

There are two possible methods of using Kdump.

- Build a separate custom dump-capture kernel for capturing the kernel core dump.
- Use the system kernel binary itself as the dump-capture kernel, so there is no need to build a separate dump-capture kernel. This is possible only on architectures that support a relocatable kernel. As of today, the i386, x86_64, ppc64, ia64 and arm architectures support a relocatable kernel.

Building a relocatable kernel is advantageous because one does not have to build a second kernel for capturing the dump. At the same time, one might still want to build a custom dump-capture kernel suited to one's needs.

2.1.Configuration settings required for the system and dump-capture kernels to enable kdump support

System kernel config options

Enable “kexec system call” in “Processor type and features.”
- CONFIG_KEXEC=y
Enable “sysfs file system support” in “Filesystem” -> “Pseudo
filesystems.” This is usually enabled by default.
- CONFIG_SYSFS=y
Enable “Compile the kernel with debug info” in “Kernel hacking.”
- CONFIG_DEBUG_INFO=y

Dump-capture kernel config options (Arch Independent)

Enable “kernel crash dumps” support under “Processor type and features”:
- CONFIG_CRASH_DUMP=y
Enable “/proc/vmcore support” under “Filesystems” -> “Pseudo filesystems”.
- CONFIG_PROC_VMCORE=y
  (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)

Dump-capture kernel config options (Arch Dependent, arm)

To use a relocatable kernel, enable “AUTO_ZRELADDR” support under “Boot options”:
- CONFIG_AUTO_ZRELADDR=y
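
The options listed above can be checked against an existing kernel config file before rebuilding. The following is a minimal sketch, not part of the kernel documentation: the function name `check_kdump_config` is made up for illustration, and it only recognizes options set to exactly `=y`.

```shell
#!/bin/sh
# Report which of the given options are set to "y" in a kernel config file.
# check_kdump_config is a hypothetical helper name used only in this article.
# Usage: check_kdump_config <config-file> <OPTION>...
check_kdump_config() {
    config_file=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    # Return non-zero if at least one option was not found.
    return "$missing"
}
```

For example, assuming the build tree keeps its configuration in `.config`, `check_kdump_config .config CONFIG_KEXEC CONFIG_SYSFS CONFIG_DEBUG_INFO` covers the system kernel, and `check_kdump_config .config CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE` covers the dump-capture kernel.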

3.Boot into System Kernel

Update the boot loader (such as grub, yaboot, or lilo) configuration files as necessary.
Boot the system kernel with the boot parameter “crashkernel=Y@X”, where Y specifies how much memory to reserve for the dump-capture kernel and X specifies the start of this reserved memory.
- For example, “crashkernel=64M@16M” tells the system kernel to reserve 64 MB of memory starting at physical address 0x01000000 (16 MB) for the dump-capture kernel.
- On arm, use “crashkernel=Y@X”. Note that the start address of the kernel will be aligned to 128 MiB (0x08000000), so if the start address is not aligned, any space below the alignment point may be overwritten by the dump-capture kernel, which means the vmcore may not be as precise as expected.
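
To make the arithmetic behind “crashkernel=Y@X” concrete, here is a small sketch that converts such a value into a byte count and a physical address range, and checks the 128 MiB alignment mentioned for arm. The helper names are invented for this example, only the K, M and G suffixes are handled, and only the plain Y@X form is supported.

```shell
#!/bin/sh
# Convert a size such as 64M, 512K or 1G to bytes (suffixes K, M, G only).
size_to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%[Kk]} * 1024 )) ;;
        *[Mm]) echo $(( ${1%[Mm]} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%[Gg]} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print the size, start and last address of a Y@X reservation.
describe_crashkernel() {
    size=$(size_to_bytes "${1%@*}")    # part before '@'
    start=$(size_to_bytes "${1#*@}")   # part after '@'
    printf 'size=%d start=0x%08x end=0x%08x\n' \
        "$size" "$start" $(( start + size - 1 ))
}

# Succeed if the given address is a multiple of 128 MiB (0x08000000).
is_128m_aligned() {
    [ $(( $(size_to_bytes "$1") % (128 * 1024 * 1024) )) -eq 0 ]
}
```

Running `describe_crashkernel 64M@16M` prints `size=67108864 start=0x01000000 end=0x04ffffff`, which matches the example above; `is_128m_aligned 16M` fails, showing that this start address is not on a 128 MiB boundary.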

4.Load the Dump-capture Kernel

After booting into the system kernel, the dump-capture kernel needs to be loaded. Depending on the architecture and the type of image (relocatable or not), one can choose to load the uncompressed vmlinux or the compressed bzImage/vmlinuz of the dump-capture kernel. The following is a summary.

For arm:
- Use zImage

If you are using a compressed zImage, use the following command to load the dump-capture kernel.

   kexec --type zImage -p <dump-capture-kernel-bzImage> \
   --initrd=<initrd-for-dump-capture-kernel> \
   --dtb=<dtb-for-dump-capture-kernel> \
   --append="root=<root-dev> <arch-specific-options>"

5.Kernel Panic

After successfully loading the dump-capture kernel as previously described, the system will reboot into the dump-capture kernel if a system crash is triggered. Trigger points are located in panic(), die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).

The following conditions will execute a crash trigger point:

- If a hard lockup is detected and “NMI watchdog” is configured, the system will boot into the dump-capture kernel ( die_nmi() ).
- If die() is called, and it happens to be a thread with pid 0 or 1, or die() is called inside interrupt context, or die() is called and panic_on_oops is set, the system will boot into the dump-capture kernel.

For testing purposes, you can trigger a crash by using “ALT-SysRq-c” or
"echo c > /proc/sysrq-trigger", or by writing a module that forces a panic.

From Documentation/sysctl/kernel.txt:

panic:

The value in this file represents the number of seconds the kernel
waits before rebooting on a panic. When you use the software watchdog,
the recommended setting is 60.

==============================================================

panic_on_io_nmi:

Controls the kernel's behavior when a CPU receives an NMI caused by
an IO error.

0: try to continue operation (default)

1: panic immediately. The IO error triggered an NMI. This indicates a
   serious system condition which could result in IO data corruption.
   Rather than continuing, panicking might be a better choice. Some
   servers issue this sort of NMI when the dump button is pushed,
   and you can use this option to take a crash dump.

==============================================================

panic_on_oops:

Controls the kernel's behaviour when an oops or BUG is encountered.

0: try to continue operation

1: panic immediately.  If the `panic' sysctl is also non-zero then the
   machine will be rebooted.

==============================================================

panic_on_stackoverflow:

Controls the kernel's behavior when detecting the overflows of
kernel, IRQ and exception stacks except a user stack.
This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.

0: try to continue operation.

1: panic immediately.

==============================================================

panic_on_unrecovered_nmi:

The default Linux behaviour on an NMI of either memory or unknown is
to continue operation. For many environments such as scientific
computing it is preferable that the box is taken out and the error
dealt with than an uncorrected parity/ECC error get propagated.

A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is off. That sysctl works like
the existing panic controls already in that directory.

==============================================================

panic_on_warn:

Calls panic() in the WARN() path when set to 1.  This is useful to avoid
a kernel rebuild when attempting to kdump at the location of a WARN().

0: only WARN(), default behaviour.

1: call panic() after printing out WARN() location.

==============================================================

panic_print:

Bitmask for printing system info when a panic happens. The user can choose
a combination of the following bits:

bit 0: print all tasks info
bit 1: print system memory info
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer

So for example to print tasks and memory info on panic, user can:
  echo 3 > /proc/sys/kernel/panic_print
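
The value written to panic_print is just the OR of the selected bits. As an illustration only, the sketch below builds that value from short labels; the labels (tasks, memory, timers, locks, ftrace) and the function name are made up here and map to bits 0 through 4 as listed above.

```shell
#!/bin/sh
# Build a panic_print bitmask from item labels.
# The labels are hypothetical shorthand for bits 0..4 in the list above.
panic_print_mask() {
    mask=0
    for name in "$@"; do
        case $name in
            tasks)  bit=0 ;;
            memory) bit=1 ;;
            timers) bit=2 ;;
            locks)  bit=3 ;;
            ftrace) bit=4 ;;
            *) echo "unknown item: $name" >&2; return 1 ;;
        esac
        mask=$(( mask | (1 << bit) ))
    done
    echo "$mask"
}
```

`panic_print_mask tasks memory` prints 3, the same value used in the example above, and selecting all five items gives 31.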

==============================================================

panic_on_rcu_stall:

When set to 1, calls panic() after RCU stall detection messages. This
is useful to define the root cause of RCU stalls using a vmcore.

0: do not panic() when RCU stall takes place, default behavior.

1: panic() after printing RCU stall messages.

6.Write Out the Dump File

After the dump-capture kernel is booted, write out the dump file with
the following command:

cp /proc/vmcore <dump-file>

7.Analysis

Before analyzing the dump image, you should reboot into a stable kernel. You can do limited analysis using GDB on the dump file copied out of /proc/vmcore. Use the debug vmlinux built with -g and run the following
command:

gdb vmlinux <dump-file>

Stack trace for the task on processor 0, register display, and memory
display work fine.

Note: GDB cannot analyze core files generated in ELF64 format for x86.
On systems with a maximum of 4GB of memory, you can generate
ELF32-format headers by passing the --elf32-core-headers option to kexec
when loading the dump kernel.
You can also use the Crash utility to analyze dump files in Kdump
format. Crash is available on Dave Anderson’s site at the following URL:
http://people.redhat.com/~anderson/

8.Code Analysis

setup_arch
->reserve_crashkernel();
->->parse_crashkernel();

9.Manually Trigger the Core Dump

9.1.You can manually trigger the core dump using the following commands:

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

9.2.To check whether the crash kernel is already loaded, run the following command:

$ cat /sys/kernel/kexec_crash_loaded
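Note: the following configuration is required for this node to exist:
#CONFIG_SMP=y // remove this line
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_CORE=y

Check the value of /sys/kernel/kexec_crash_loaded to determine whether the second kernel has been loaded: 1 means it is loaded; any other value means it is not. Then run echo c > /proc/sysrq-trigger to trigger the crash.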
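
The check on /sys/kernel/kexec_crash_loaded can be wrapped in a small guard so that a test crash is only triggered when a dump-capture kernel is actually in place. This is a sketch with made-up function names; the optional path argument exists only so the logic can be exercised against an ordinary file.

```shell
#!/bin/sh
# Succeed only if the node is readable and contains "1" (crash kernel loaded).
# The optional argument overrides the sysfs path, which is handy for testing.
crash_kernel_loaded() {
    node=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$node" ] && [ "$(cat "$node")" = "1" ]
}

# Print a one-line status instead of crashing the machine blindly.
report_crash_kernel() {
    if crash_kernel_loaded "$@"; then
        echo "dump-capture kernel loaded: a crash will boot into it"
    else
        echo "dump-capture kernel not loaded: a crash will not produce a vmcore"
    fi
}
```

A typical use would be `crash_kernel_loaded && echo c > /proc/sysrq-trigger`, run as root, so the forced panic is skipped when nothing has been loaded with kexec -p.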

9.3.Dump crashed kernel

Once booted into the dump-capture kernel, you can read the /proc/vmcore file. It is recommended to save the dump to a file and analyze it later.

cp /proc/vmcore /root/crash.dump

Optionally, you can copy the dump to another machine instead. Once the dump is saved, reboot the machine into the normal system kernel.

The crash dump can be quite big. makedumpfile can be used to create smaller dumps by ignoring some memory regions and using compression:

makedumpfile -c -d 31 /proc/vmcore /root/crash.dump (the file produced by this command is compressed and is not in ELF format, so it must be opened with crash)
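
The number given to -d is a bitmask of page types to leave out of the dump. The sketch below decodes such a value using the bit meanings from the makedumpfile manual page (1 zero pages, 2 non-private cache pages, 4 private cache pages, 8 user data pages, 16 free pages); the function name is invented for this example.

```shell
#!/bin/sh
# List the page types that a makedumpfile dump level (-d value) excludes.
describe_dump_level() {
    level=$1
    bit=1
    for label in "zero pages" \
                 "non-private cache pages" \
                 "private cache pages" \
                 "user data pages" \
                 "free pages"; do
        if [ $(( level & bit )) -ne 0 ]; then
            echo "$label"
        fi
        bit=$(( bit * 2 ))
    done
}
```

`describe_dump_level 31` lists all five page types, which is why -d 31 gives the smallest dump; a lower level such as 1 drops only zero pages and keeps everything else for analysis.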

9.4.Analyzing core dump

You can use either the gdb tool or a special gdb extension called crash. Run crash as

$ crash vmlinux path/crash.dump

where vmlinux is the previously saved kernel binary with debug symbols.

References:

Documentation/kdump/kdump.txt
Documentation/sysctl/kernel.txt
Debug.Hacks 深入调试的技术和工具.pdf
kernel/kexec_core.c
kernel/panic.c

Hacker_Albert
