ftrace, a Linux kernel debugging tool

1. First, let's get clear on what ftrace actually is

Its author: Steven Rostedt

Here is how the author of ftrace describes it:

ftrace is a tool for tracing the internal operations of the kernel. The tracing it offers falls into several broad categories:
I. Static tracepoints within the kernel (event tracing)
① scheduling
② interrupts
③ file systems
④ virtual guest connections with host
II. Dynamic kernel function tracing
① trace all functions within the kernel
② pick and choose what functions to trace
③ call graphs
④ stack usage
III. Latency tracers
① how long interrupts are disabled
② how long preemption is disabled
③ how long interrupts and/or preemption is disabled
IV. Wake up latency
① how long it takes a process to run after it is woken

Wikipedia defines it as follows:

ftrace (abbreviated from Function Tracer) is a tracing framework for the Linux kernel. Although its original name, Function Tracer, came from ftrace’s ability to record information related to various function calls performed while the kernel is running, ftrace’s tracing capabilities cover a much broader range of kernel’s internal operations.

With its various tracer plugins, ftrace can be targeted at different static tracepoints, such as scheduling events, interrupts, memory-mapped I/O, CPU power state transitions, and operations related to file systems and virtualization. Also, dynamic tracking of kernel function calls is available, optionally restrictable to a subset of functions by using globs, and with the possibility to generate call graphs and provide stack usage reports. At the same time, ftrace can be used to measure various latencies within the Linux kernel, such as for how long interrupts or preemption are disabled.

An ftrace-enabled Linux kernel is built by enabling the CONFIG_FUNCTION_TRACER kernel configuration option. The entire runtime interaction with ftrace is performed through readable and writable virtual files contained in a specifically mounted debugfs file system; as a result, ftrace requires no specialized userspace utilities to operate. However, there are additional userspace utilities that provide more advanced features for data recording, analysis and visualization; examples of such utilities are trace-cmd and KernelShark.

Internally, ftrace relies on the gcc’s profiling mechanism to prepend machine instructions to the compiled versions of all source-level kernel functions, which redirect the execution of functions to the ftrace’s trampolines and tracer plugins that perform the actual tracing. These “entry point” instructions created by gcc are altered by ftrace when the kernel is booted, and varied later at runtime by ftrace between NOPs and actual jumps to the tracing trampolines, depending on the tracing types and options configured at runtime.

ftrace is developed primarily by Steven Rostedt, and it was merged into the Linux kernel mainline in kernel version 2.6.27, which was released on October 9, 2008.

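2. Next, let's see how it is used

The kernels shipped with most Linux systems already have ftrace support enabled, so we start with how to use ftrace on a running Linux system.
ftrace exposes its interface to user space through debugfs. debugfs is mounted at /sys/kernel/debug, and the tracing directory beneath it holds the ftrace control files available to users.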
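The location of the control directory can vary: kernels since 4.1 also provide a dedicated tracefs, usually mounted at /sys/kernel/tracing, alongside the debugfs path used in this article. The following is a minimal sketch for locating whichever one is present; the function name find_tracing_dir is made up for illustration, and reading these directories normally requires root.

```shell
# Print the first candidate directory that looks like an ftrace control directory.
# With no arguments, try the tracefs mount point first, then the debugfs one.
find_tracing_dir() {
    if [ "$#" -eq 0 ]; then
        set -- /sys/kernel/tracing /sys/kernel/debug/tracing
    fi
    for dir in "$@"; do
        # current_tracer is one of the control files described in this article
        if [ -e "$dir/current_tracer" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Typical use (as root):
#   TRACING=$(find_tracing_dir) || echo "ftrace control directory not found"
# If debugfs is not mounted yet:
#   mount -t debugfs nodev /sys/kernel/debug
```

Passing explicit directories instead of relying on the defaults makes it possible to try the function against a scratch directory before touching the real one.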

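The README file in this directory briefly describes the files found there; have a look if you are interested. The most commonly used ones are described below.
current_tracer sets or shows the tracer currently in use. Writing a tracer name into this file with echo switches to that tracer. After boot its default value is nop, meaning no tracing is performed. After finishing a tracing session, write nop into this file to reset the tracer.
available_tracers lists the tracers compiled into the running kernel and can be read with cat. Any tracer name written to current_tracer must appear in this list.
trace is the interface for reading the collected trace data. Read it with cat or a similar command to see the recorded kernel activity, or save its contents to a file for later inspection.
tracing_on pauses and resumes tracing. When you observe an event of interest and want to stop recording, write 0 into this file; the most recent part of the trace buffer then relates to that event. Write 1 to resume tracing.
set_graph_function selects the functions whose call graphs are displayed. The output is laid out like C code, which makes the kernel's control flow easier to follow. It is used with the function_graph tracer. By default call graphs are generated for all functions; writing to this file narrows the output to the functions you care about.
buffer_size_kb sets the size of the trace buffer used by each CPU. The tracer writes what it records into this buffer, and every CPU's buffer is the same size. The buffer is a ring buffer, so when too much data is recorded, old entries are overwritten by new ones. Note that changing this value requires first setting current_tracer to nop (this restriction comes from older kernels; recent kernels may allow resizing at any time).
available_filter_functions lists the kernel functions that can currently be traced. Functions not listed in this file cannot be traced.
set_ftrace_filter and set_ftrace_notrace are available when the kernel is built with dynamic ftrace (the CONFIG_DYNAMIC_FTRACE option). The former explicitly specifies the functions to trace; the latter does the opposite and specifies functions that must not be traced. If a function name appears in both files, that function is not traced. Both files also accept simple wildcard expressions, so a single expression can select many functions at once.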
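Before writing a wildcard into set_ftrace_filter, it can help to preview which functions it would select. The sketch below does that by matching a glob against available_filter_functions. The helper name match_filter_functions is hypothetical, and shell glob matching is only an approximation of the kernel's own wildcard handling, which may differ between kernel versions.

```shell
# List functions from available_filter_functions whose names match a glob.
# $1: tracing directory, $2: glob such as 'sched_*'
match_filter_functions() {
    # Lines for module functions look like "name [module]"; keep only the name.
    while read -r name rest; do
        case "$name" in
            $2) printf '%s\n' "$name" ;;
        esac
    done < "$1/available_filter_functions"
}

# Typical use (as root, with TRACING pointing at the tracing directory):
#   match_filter_functions "$TRACING" 'sched_*' | head
#   echo 'sched_*' >  "$TRACING/set_ftrace_filter"   # trace only these functions
#   echo 'vfs_*'   >> "$TRACING/set_ftrace_filter"   # append more functions
#   echo           >  "$TRACING/set_ftrace_filter"   # clear the filter again
```

Note the difference between > and >> in the usage lines: a plain redirect replaces the current filter, while an appending redirect adds to it.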

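In the opening definition we quoted the author and Wikipedia to learn what ftrace does and what it can trace. Now let's try it out and see which tracers the current system provides.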

root@ubuntu:/sys/kernel/debug/tracing# cat available_tracers 
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop

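The nop tracer does not trace any kernel activity. Writing nop into current_tracer removes the tracer previously in use and clears the trace data collected so far, that is, it resets the trace file.
The function tracer traces the execution of kernel functions. The functions to trace can be specified explicitly through set_ftrace_filter.
The function_graph tracer displays function call graphs laid out like C source code, which is easier to read. The functions for which a call graph is generated can be specified explicitly through set_graph_function.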
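As a rough illustration of how set_graph_function and the function_graph tracer fit together, the following sketch wraps the two writes in one helper. The name graph_function is invented here, and the function name passed in must be one the kernel can trace, otherwise the first write fails.

```shell
# Restrict function_graph output to the call tree of one function.
# $1: tracing directory, $2: kernel function name
graph_function() {
    echo "$2" > "$1/set_graph_function" || return 1
    echo function_graph > "$1/current_tracer"
}

# Typical use (as root, with TRACING pointing at the tracing directory):
#   graph_function "$TRACING" vfs_read
#   head -40 "$TRACING/trace"
#   echo nop > "$TRACING/current_tracer"   # reset when done
```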

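ftrace also supports other tracers. For example, the sched_switch tracer follows process scheduling activity in the kernel; the irqsoff and preemptoff tracers trace code that disables interrupts and code that disables preemption respectively, recording the longest time each stays disabled; and the preemptirqsoff tracer can be seen as a combination of the two. Which of these are present depends on the kernel version and build configuration.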
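Because a given kernel may not include every tracer (irqsoff, for instance, is absent from the available_tracers output shown earlier), it is reasonable to check the list before switching. A minimal sketch, with the helper name set_tracer chosen only for illustration:

```shell
# Switch tracers only if the kernel lists the requested one.
# $1: tracing directory, $2: tracer name
set_tracer() {
    # available_tracers holds space-separated names on one line
    for t in $(cat "$1/available_tracers"); do
        if [ "$t" = "$2" ]; then
            printf '%s\n' "$2" > "$1/current_tracer"
            return 0
        fi
    done
    printf 'tracer not available: %s\n' "$2" >&2
    return 1
}

# Typical use (as root, with TRACING pointing at the tracing directory):
#   set_tracer "$TRACING" irqsoff || set_tracer "$TRACING" function
```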

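The ftrace framework can be extended with new tracers. See the documents under Documentation/trace in the kernel source tree and the source files under kernel/trace to learn what the other tracers do and how to add a new one.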

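Using the function tracer

The function tracer records calls to kernel functions. It can be used to debug or analyze bugs, and also to observe and understand how the Linux kernel executes. An example of using the function tracer follows.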

root@ubuntu:/sys/kernel/debug/tracing# pwd
/sys/kernel/debug/tracing
root@ubuntu:/sys/kernel/debug/tracing# echo function > current_tracer 
root@ubuntu:/sys/kernel/debug/tracing# echo 1 > tracing_on
# wait here for a while
root@ubuntu:/sys/kernel/debug/tracing# echo 0 > tracing_on

root@ubuntu:/sys/kernel/debug/tracing# cat trace | head -20
# tracer: function
#
# entries-in-buffer/entries-written: 205147/8183627   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           <...>-38381 [000] d... 10529.718505: __update_load_avg_cfs_rq <-update_blocked_averages
           <...>-38381 [000] d... 10529.718505: __accumulate_pelt_segments <-__update_load_avg_cfs_rq
           <...>-38381 [000] d... 10529.718506: __update_load_avg_cfs_rq <-update_blocked_averages
           <...>-38381 [000] d... 10529.718506: __accumulate_pelt_segments <-__update_load_avg_cfs_rq
           <...>-38381 [000] d... 10529.718507: __update_load_avg_cfs_rq <-update_blocked_averages
           <...>-38381 [000] d... 10529.718507: __accumulate_pelt_segments <-__update_load_avg_cfs_rq
           <...>-38381 [000] d... 10529.718508: __update_load_avg_cfs_rq <-update_blocked_averages
           <...>-38381 [000] d... 10529.718509: __accumulate_pelt_segments <-__update_load_avg_cfs_rq
           <...>-38381 [000] d... 10529.718509: __update_load_avg_cfs_rq <-update_blocked_averages
           <idle>-0     [000] d... 10818.707183: quiet_vmstat <-tick_nohz_idle_stop_tick
           <idle>-0     [000] d... 10818.707184: need_update <-quiet_vmstat
           <idle>-0     [000] d... 10818.707184: first_online_pgdat <-need_update
           <idle>-0     [000] d... 10818.707186: next_zone <-need_update
           <idle>-0     [000] d... 10818.707187: next_zone <-need_update
           <idle>-0     [000] d... 10818.707187: next_zone <-need_update
           <idle>-0     [000] d... 10818.707187: next_zone <-need_update
           <idle>-0     [000] d... 10818.707188: next_zone <-need_update
           <idle>-0     [000] d... 10818.707188: next_online_pgdat <-next_zone

root@ubuntu:/sys/kernel/debug/tracing#
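The manual steps in the session above (select a tracer, turn tracing on, wait, turn it off, read the trace) can be bundled into one helper. This is only a sketch under the assumption that it runs as root; the name capture_trace and its argument order are made up for this example.

```shell
# Record with the given tracer for a number of seconds, print the trace, then reset.
# $1: tracing directory, $2: tracer name, $3: seconds to record
capture_trace() {
    echo "$2" > "$1/current_tracer" || return 1
    echo 1 > "$1/tracing_on"
    sleep "$3"
    echo 0 > "$1/tracing_on"
    # Read the buffer before resetting, since switching back to nop clears it.
    cat "$1/trace"
    echo nop > "$1/current_tracer"
}

# Typical use (as root):
#   capture_trace /sys/kernel/debug/tracing function 2 > /tmp/function.trace
```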

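The format of the trace file is easy to read. First, the "tracer:" field gives the name of the tracer in use, here the function tracer. Next comes the layout of each record. The TASK field is the name of the task and the PID field is its process ID. The CPU# field is the number of the CPU on which the traced function ran; here we can see that the idle process runs on CPU 0 and has process ID 0. The TIMESTAMP field has the form "secs.usecs" and gives the time at which the function was executed. The FUNCTION column gives the traced function, with its caller marked by the "<-" symbol, so the calling relationship can be read directly from the output.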
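Since every record ends with the traced function followed by "<-" and its caller, the output is easy to post-process. The sketch below counts how often each function appears in function-tracer output read from standard input; count_traced_functions is a hypothetical name, and the field positions assume the record layout described above.

```shell
# Count how often each traced function appears in trace output read from stdin.
count_traced_functions() {
    awk '
        /^#/ || NF < 2 { next }            # skip header comments and blank lines
        $NF ~ /^<-/ { calls[$(NF-1)]++ }   # the field before the caller is the function
        END { for (f in calls) print calls[f], f }
    ' | sort -k1,1nr -k2
}

# Typical use (as root):
#   count_traced_functions < /sys/kernel/debug/tracing/trace | head
```

Each output line holds a count and a function name, with the most frequently recorded functions first.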

https://lwn.net/Articles/365835/

https://lwn.net/Articles/366796/

DragonaJin
