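1. Introduction
This document is a translation and summary of the kernel document "Event Tracing". Trace events are built on top of tracepoints. With a raw tracepoint you have to write your own probe function, usually in a module that must then be loaded. Trace events avoid this by providing the TRACE_EVENT macro, which generates probe functions in a uniform format. In return, the author of a trace event must specify the format in which the trace data is stored in the ring buffer and the format in which it is printed.
2. Using trace events
Trace events can be controlled in the following ways: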
2.1 Via the 'set_event' interface
# cat /sys/kernel/debug/tracing/available_events
List the available events
# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
Enable a specific event
# echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
Disable a specific event
# echo > /sys/kernel/debug/tracing/set_event
Disable all events
# echo '*:*' > /sys/kernel/debug/tracing/set_event
Enable all events in all subsystems
# echo 'irq:*' > /sys/kernel/debug/tracing/set_event
Enable all events in one subsystem
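The set_event commands above can be wrapped in small shell functions. This is only a sketch: the function names and the TRACEFS variable are illustrative and not part of the kernel interface, and the functions assume they run as root with the tracing directory at the path used in this document.

```shell
# Assumed variable: root of the tracing directory (override if mounted elsewhere).
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# event_on EVENT: enable EVENT, e.g. sched_wakeup or 'irq:*'.
# Appending (>>) leaves events that are already enabled untouched.
event_on() {
    printf '%s\n' "$1" >> "$TRACEFS/set_event"
}

# event_off EVENT: disable EVENT by writing it with a leading '!'.
event_off() {
    printf '!%s\n' "$1" >> "$TRACEFS/set_event"
}

# events_clear: disable every event by writing an empty line with '>'.
events_clear() {
    echo > "$TRACEFS/set_event"
}
```

For example, event_on 'irq:*' followed by event_off sched_wakeup enables the whole irq subsystem and then disables sched_wakeup.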
2.2 Via the 'enable' files
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
Enable one event in a subsystem
# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
Disable one event in a subsystem
# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
Enable all events in a subsystem
# echo 0 > /sys/kernel/debug/tracing/events/sched/enable
Disable all events in a subsystem
# echo 1 > /sys/kernel/debug/tracing/events/enable
Enable all events in all subsystems
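The three levels of enable files (single event, subsystem, everything) follow one path pattern, which a helper can make explicit. The following is a minimal sketch: toggle_event and TRACEFS are hypothetical names chosen for illustration.

```shell
# Assumed variable: root of the tracing directory.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# toggle_event VALUE [SUBSYSTEM [EVENT]]
# Writes 0 or 1 to the enable file at the chosen level of the events/ tree:
#   toggle_event 1 sched sched_wakeup   -> events/sched/sched_wakeup/enable
#   toggle_event 0 sched                -> events/sched/enable
#   toggle_event 1                      -> events/enable
toggle_event() {
    value=$1 subsystem=${2:-} event=${3:-}
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    path="$TRACEFS/events"
    [ -n "$subsystem" ] && path="$path/$subsystem"
    [ -n "$event" ] && path="$path/$event"
    echo "$value" > "$path/enable"
}
```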
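2.3 Boot option
trace_event=[event-list]
Enables the listed events during boot, so that early boot activity can be traced.
Note: the entries in event-list are separated by commas.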
3. Defining a trace event
See the example in the kernel source under samples/trace_events.
4. Event format
field:field-type field-name; offset:N; size:N;
For example, the format file of scsi_dispatch_cmd_start reads:
# cat /sys/kernel/debug/tracing/events/scsi/scsi_dispatch_cmd_start/format
name: scsi_dispatch_cmd_start
ID: 446
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned int host_no; offset:8; size:4; signed:0;
field:unsigned int channel; offset:12; size:4; signed:0;
field:unsigned int id; offset:16; size:4; signed:0;
field:unsigned int lun; offset:20; size:4; signed:0;
field:unsigned int opcode; offset:24; size:4; signed:0;
field:unsigned int cmd_len; offset:28; size:4; signed:0;
field:unsigned int data_sglen; offset:32; size:4; signed:0;
field:unsigned int prot_sglen; offset:36; size:4; signed:0;
field:unsigned char prot_op; offset:40; size:1; signed:0;
field:__data_loc unsigned char[] cmnd; offset:44; size:4; signed:0;
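The first four fields, prefixed with common_, are shared by every event and have a fixed layout; the remaining fields are specific to each event.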
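Since filters refer to fields by name, it can be handy to extract just the field names from a format file. The function below is a sketch using awk; list_fields is a hypothetical name, and the parsing assumes the "field:...; offset:N; size:N;" layout shown above.

```shell
# list_fields FORMAT_FILE: print "name offset size" for every field line.
list_fields() {
    awk '/^[ \t]*field:/ {
        # Split the line on ";" into: declaration, offset, size, signed.
        split($0, part, ";")
        # The field name is the last word of the declaration.
        m = split(part[1], decl, " ")
        name = decl[m]
        # Strip everything up to and including the "offset:" and "size:" labels.
        sub(/.*offset:/, "", part[2])
        sub(/.*size:/, "", part[3])
        print name, part[2], part[3]
    }' "$1"
}
```

Run against the format file above, list_fields prints one line per field, for example "common_pid 4 4". Fixed-size array fields keep their brackets in the printed name.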
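5. Event filtering
Events can be filtered with boolean expressions. As soon as an event is logged into the trace buffer, its fields are checked against the filter expression set for that event. Events that match appear in the trace output; events that do not match are discarded. By default an event has no filter and therefore matches everything.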
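5.1 Filter format
field-name relational-operator value
(1) Multiple predicates can be combined with the logical operators '&&' and '||' and grouped with parentheses.
(2) field-name is one of the field names listed in the event's format file.
(3) relational-operator is one of the following.
For numeric fields:
==, !=, <, <=, >, >=, &
For string fields (~ is a glob match that accepts wildcards such as *):
==, !=, ~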
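5.2 Setting a filter
# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
# echo "common_preempt_count > 4" > filter
Write the expression to the filter file in the event's directory.
Note: if the expression is invalid, the write fails, and reading the filter file back shows where the error is: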
# cd /sys/kernel/debug/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
-bash: echo: write error: Invalid argument
# cat filter
((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
^
parse_error: Field not found
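The failure case above can be handled in a script by checking the exit status of the write and printing the filter file when it fails. This is a sketch only; set_filter is a hypothetical helper name.

```shell
# set_filter EVENT_DIR EXPR: write EXPR to EVENT_DIR/filter.
# If the write is rejected, print the filter file, which then holds the
# expression annotated with the parse error, and return non-zero.
set_filter() {
    if ! echo "$2" > "$1/filter" 2>/dev/null; then
        echo "filter rejected:" >&2
        cat "$1/filter" >&2
        return 1
    fi
}
```

Example call: set_filter /sys/kernel/debug/tracing/events/signal/signal_generate 'sig >= 10 && sig < 15'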
5.3 Clearing a filter
Write 0 to the filter file.
5.4 Subsystem filters
# cd /sys/kernel/debug/tracing/events/sched
# echo 0 > filter
# cat sched_switch/filter
none
# cat sched_wakeup/filter
none
Clear the filters of all events in the sched subsystem
# cd /sys/kernel/debug/tracing/events/sched
# echo 'common_pid == 0' > filter
# cat sched_switch/filter
common_pid == 0
# cat sched_wakeup/filter
common_pid == 0
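Set a filter on all events in the sched subsystem
Note: if a subsystem filter references a field that is specific to some events, it only takes effect for the events that have that field.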
5.5 PID filter
# cd /sys/kernel/debug/tracing
# echo $$ > set_event_pid
# echo 1 > events/enable
Trace events only for specific processes (here the current shell, whose PID is $$)
# echo 123 244 1 >> set_event_pid
Append more PIDs to the list
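6. Event triggers
An event can have trigger commands attached to it. A trigger may carry a filter: the command is invoked only when the event occurs and passes that filter. A trigger without a filter fires on every occurrence of the event.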
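6.1 Expression syntax
The syntax is loosely based on that of set_ftrace_filter.
# echo 'command[:count] [if filter]' > trigger
Add a trigger
# echo '!command[:count] [if filter]' > trigger
Remove a trigger
Notes:
1. When removing a trigger, the [if filter] part is not checked, so it can be left out of the '!' command.
2. The filter syntax is the same as the event filter syntax described in section 5.
3. There is no explicit '>>' support for the trigger file: writing with '>' adds or removes a single trigger and already behaves like '>>'.
4. There is no way to remove all triggers at once; each one has to be removed with its own '!command'.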
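The add and remove forms above can be captured in two small functions. This is a minimal sketch with hypothetical names (add_trigger, remove_trigger); it only builds the command string and writes it to the trigger file of the given event directory.

```shell
# add_trigger EVENT_DIR COMMAND [FILTER]
# Writes "COMMAND" or "COMMAND if FILTER" to EVENT_DIR/trigger,
# e.g. add_trigger "$dir" stacktrace:5 'bytes_req >= 65536'
add_trigger() {
    if [ -n "${3:-}" ]; then
        printf '%s if %s\n' "$2" "$3" > "$1/trigger"
    else
        printf '%s\n' "$2" > "$1/trigger"
    fi
}

# remove_trigger EVENT_DIR COMMAND
# Writes "!COMMAND". Pass COMMAND as it was added (including any count);
# the filter can be omitted because it is not checked on removal.
remove_trigger() {
    printf '!%s\n' "$2" > "$1/trigger"
}
```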
6.2 Supported trigger commands
enable_event/disable_event
(1) Format:
enable_event:<system>:<event>[:count]
disable_event:<system>:<event>[:count]
(2) Examples:
# echo 'enable_event:kmem:kmalloc:1' > \
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
Enable the kmalloc event when sys_enter_read is hit; the :1 count limits this to the first hit
# echo 'disable_event:kmem:kmalloc' > \
/sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
Disable the kmalloc event whenever sys_exit_read is hit
# echo '!enable_event:kmem:kmalloc:1' > \
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# echo '!disable_event:kmem:kmalloc' > \
/sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
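Remove the triggers above
Note: a triggering event can carry any number of enable_event/disable_event triggers, but only one per target event. For example, sys_enter_read cannot have two versions of the kmem:kmalloc trigger, such as:
kmem:kmalloc and kmem:kmalloc:1, or 'kmem:kmalloc if bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256'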
stacktrace
Dumps a stack trace into the trace buffer each time the triggering event occurs.
(1) Format:
stacktrace[:count]
(2) Examples:
# echo 'stacktrace' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
Dump a stack trace every time the kmalloc event occurs
# echo 'stacktrace:5 if bytes_req >= 65536' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
Dump a stack trace for kmalloc events with bytes_req >= 65536, at most 5 times
# echo '!stacktrace' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# echo '!stacktrace:5 if bytes_req >= 65536' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
or
# echo '!stacktrace:5' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
Remove the triggers above
Note: there can be only one stacktrace trigger per triggering event.
snapshot
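Takes a snapshot of the trace buffer each time the triggering event occurs. This is typically useful when a hit on one event should preserve the trace of many other events.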
# echo 'snapshot if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
Take a snapshot every time a block request queue is unplugged with a depth > 1
# echo 'snapshot:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
Take a snapshot only the first time a block request queue is unplugged with a depth > 1
# echo '!snapshot if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
# echo '!snapshot:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
Remove the triggers above
Note: there can be only one snapshot trigger per triggering event.
traceon/traceoff
Turns tracing on or off when the triggering event occurs.
# echo 'traceoff:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
Turn tracing off the first time a block request queue is unplugged with a depth > 1
# echo 'traceoff if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
Always turn tracing off when nr_rq > 1
# echo '!traceoff:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
# echo '!traceoff if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
Remove the triggers above
Note: there can be only one traceon or traceoff trigger per triggering event.
hist
See Documentation/trace/histogram.txt for details and examples.