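ftrace (Part 2)

Following the previous post, ftrace (Part 1), which introduced the purpose of each file under ftrace's debug directory, this post covers several commonly used tracers that can be configured.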

Tracers

function

Traces every function call in the kernel.

function_graph

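Similar to the function tracer, except that function_graph presents the function call relationships in a form that is easier to read.

The output is laid out much like C source code.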
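
Switching between these tracers comes down to writing a name into current_tracer. A minimal sketch follows, assuming the tracing directory is mounted at /sys/kernel/debug/tracing; TRACING_DIR and select_tracer are illustrative names, not part of ftrace. It checks available_tracers first, because a tracer that was not compiled into the kernel cannot be selected.

```shell
#!/bin/sh
# Assumed mount point of the tracing directory; override if yours differs.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# select_tracer NAME: make NAME the current tracer if the kernel offers it.
# (select_tracer is a hypothetical helper written for this example.)
select_tracer() {
    tracer="$1"
    # available_tracers holds one space-separated line of tracer names.
    if ! tr ' ' '\n' < "$TRACING_DIR/available_tracers" | grep -qxF -- "$tracer"; then
        echo "tracer '$tracer' is not available in this kernel" >&2
        return 1
    fi
    echo "$tracer" > "$TRACING_DIR/current_tracer"
}
```

Run as root, `select_tracer function_graph` followed by `head "$TRACING_DIR/trace"` should show the nested call layout described above, provided the kernel was built with that tracer.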

irqsoff

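Traces the code that runs while interrupts are disabled, and saves the longest interrupts-disabled time in tracing_max_latency.

It is generally used to debug system latency. Enabling the latency-format option makes the trace output easier to read.

Example: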

 # echo 0 > options/function-trace
 # echo irqsoff > current_tracer
 # echo 1 > tracing_on
 # echo 0 > tracing_max_latency
 # ls -ltr
 [...]
 # echo 0 > tracing_on
 # cat trace
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: run_timer_softirq
#  => ended at:   run_timer_softirq
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  <idle>-0       0d.s2    0us+: _raw_spin_lock_irq <-run_timer_softirq
  <idle>-0       0dNs3   17us : _raw_spin_unlock_irq <-run_timer_softirq
  <idle>-0       0dNs3   17us+: trace_hardirqs_on <-run_timer_softirq
  <idle>-0       0dNs3   25us : <stack trace>
 => _raw_spin_unlock_irq
 => run_timer_softirq
 => __do_softirq
 => call_softirq
 => do_softirq
 => irq_exit
 => smp_apic_timer_interrupt
 => apic_timer_interrupt
 => rcu_idle_exit
 => cpu_idle
 => rest_init
 => start_kernel
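
When many such traces are collected, the header alone usually tells you whether a trace is worth reading. The sketch below pulls the latency value and the start and end functions out of a latency-format header like the one above. latency_summary is a hypothetical helper name, and the field positions are taken from the header layout shown in this post, so treat them as assumptions for other kernel versions.

```shell
#!/bin/sh
# latency_summary [FILE...]: summarize the header of a latency-format trace.
# Reads standard input when no file is given.
latency_summary() {
    awk '
        /^ *# latency:/        { latency = $3 }   # "# latency: 16 us, ..."
        /^ *#  => started at:/ { start = $5 }     # function where the section began
        /^ *#  => ended at:/   { stop = $5 }      # function where the section ended
        END {
            if (latency == "") exit 1             # no latency header found
            if (start != "")
                printf "%s us (%s -> %s)\n", latency, start, stop
            else
                printf "%s us\n", latency         # wakeup traces have no start/end lines
        }
    ' "$@"
}
```

For example, `latency_summary "$TRACING_DIR/trace"` can be called right after `echo 0 > tracing_on`, where TRACING_DIR stands for wherever the tracing directory is mounted.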

preemptoff

Similar to irqsoff, but traces the code that runs while preemption is disabled.

Example:



 # echo 0 > options/function-trace
 # echo preemptoff > current_tracer
 # echo 1 > tracing_on
 # echo 0 > tracing_max_latency
 # ls -ltr
 [...]
 # echo 0 > tracing_on
 # cat trace
# tracer: preemptoff
#
# preemptoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 46 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: sshd-1991 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: do_IRQ
#  => ended at:   do_IRQ
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
    sshd-1991    1d.h.    0us+: irq_enter <-do_IRQ
    sshd-1991    1d..1   46us : irq_exit <-do_IRQ
    sshd-1991    1d..1   47us+: trace_preempt_on <-do_IRQ
    sshd-1991    1d..1   52us : <stack trace>
 => sub_preempt_count
 => irq_exit
 => do_IRQ
 => ret_from_intr

preemptirqsoff

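Similar to the two tracers above: traces the code that runs while interrupts and/or preemption are disabled, and records the longest such period.

Example: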



 # echo 0 > options/function-trace
 # echo preemptirqsoff > current_tracer
 # echo 1 > tracing_on
 # echo 0 > tracing_max_latency
 # ls -ltr
 [...]
 # echo 0 > tracing_on
 # cat trace
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 100 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: ls-2230 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: ata_scsi_queuecmd
#  => ended at:   ata_scsi_queuecmd
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
      ls-2230    3d...    0us+: _raw_spin_lock_irqsave <-ata_scsi_queuecmd
      ls-2230    3...1  100us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
      ls-2230    3...1  101us+: trace_preempt_on <-ata_scsi_queuecmd
      ls-2230    3...1  111us : <stack trace>
 => sub_preempt_count
 => _raw_spin_unlock_irqrestore
 => ata_scsi_queuecmd
 => scsi_dispatch_cmd
 => scsi_request_fn
 => __blk_run_queue_uncond
 => __blk_run_queue
 => blk_queue_bio
 => generic_make_request
 => submit_bio
 => submit_bh
 => ext3_bread
 => ext3_dir_bread
 => htree_dirblock_to_tree

wakeup

Traces and records the maximum latency between the point where the highest-priority task is woken up and the point where it actually starts running.

Example:



 # echo 0 > options/function-trace
 # echo wakeup > current_tracer
 # echo 1 > tracing_on
 # echo 0 > tracing_max_latency
 # chrt -f 5 sleep 1
 # echo 0 > tracing_on
 # cat trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 15 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: kworker/3:1H-312 (uid:0 nice:-20 policy:0 rt_prio:0)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  <idle>-0       3dNs7    0us :      0:120:R   + [003]   312:100:R kworker/3:1H
  <idle>-0       3dNs7    1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
  <idle>-0       3d..3   15us : __schedule <-schedule
  <idle>-0       3d..3   15us :      0:120:R ==> [003]   312:100:R kworker/3:1H

wakeup_rt

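Traces and records the maximum latency between the point where a task is woken up and the point where it actually starts running, considering RT tasks only.

Example: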


  # echo 0 > options/function-trace
  # echo wakeup_rt > current_tracer
  # echo 1 > tracing_on
  # echo 0 > tracing_max_latency
  # chrt -f 5 sleep 1
  # echo 0 > tracing_on
  # cat trace
 # tracer: wakeup_rt
 #
 # wakeup_rt latency trace v1.1.5 on 3.8.0-test+
 # --------------------------------------------------------------------
 # latency: 5 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: sleep-2389 (uid:0 nice:0 policy:1 rt_prio:5)
 #    -----------------
 #
 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /     delay
 #  cmd     pid   ||||| time  |   caller
 #     \   /      |||||  \    |   /
   <idle>-0       3d.h4    0us :      0:120:R   + [003]  2389: 94:R sleep
   <idle>-0       3d.h4    1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
   <idle>-0       3d..3    5us : __schedule <-schedule
   <idle>-0       3d..3    5us :      0:120:R ==> [003]  2389: 94:R sleep

nop

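The "trace nothing" tracer. Writing nop to current_tracer removes whichever tracer is currently active.

For any of the latency tracers configured above, echo 1 > options/function-trace adds function trace output to the result. We usually leave it off, because function tracing itself adds latency and distorts the measurement.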
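
The command sequence shared by the latency tracer examples above can be wrapped in one helper, which makes it easy to run the same workload with function-trace off and on and compare the recorded maximum. This is only a sketch: run_latency_tracer and TRACING_DIR are names invented here, and the default path assumes the tracing directory is mounted at /sys/kernel/debug/tracing.

```shell
#!/bin/sh
# Assumed mount point of the tracing directory; override if yours differs.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# run_latency_tracer TRACER FUNCTION_TRACE COMMAND [ARGS...]
#   TRACER          irqsoff, preemptoff, preemptirqsoff, wakeup or wakeup_rt
#   FUNCTION_TRACE  0 to leave function tracing off, 1 to include it
#   COMMAND         workload to run while tracing is on
# Prints the value left in tracing_max_latency (microseconds).
run_latency_tracer() {
    tracer="$1"
    func_trace="$2"
    shift 2
    echo "$func_trace" > "$TRACING_DIR/options/function-trace"
    echo "$tracer" > "$TRACING_DIR/current_tracer"
    echo 1 > "$TRACING_DIR/tracing_on"
    echo 0 > "$TRACING_DIR/tracing_max_latency"   # reset the recorded maximum
    "$@"                                          # run the workload
    echo 0 > "$TRACING_DIR/tracing_on"
    cat "$TRACING_DIR/tracing_max_latency"
}
```

Run as root, `run_latency_tracer wakeup_rt 0 chrt -f 5 sleep 1` repeats the wakeup_rt example, and passing 1 instead of 0 shows how much the function tracer changes the reported number on your machine.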

Event trace

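Besides the tracers above, another important feature is event tracing, which ftrace has supported since 2.6.30. It is not selected through current_tracer; it is configured through the /sys/kernel/debug/tracing/events directory.

When debugging latency, the function tracer itself may add latency to the system. In that case we can disable function tracing and debug with event tracing instead, which reduces the latency introduced by tracing.

It is a reasonable compromise.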
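
Events do not have to be switched on all at once. Each subsystem directory under events/ has its own enable file, so the overhead can be limited to the areas under suspicion. The sketch below assumes that layout; set_events and TRACING_DIR are illustrative names, and which subsystem directories exist (sched, irq and timer are typical) depends on the running kernel.

```shell
#!/bin/sh
# Assumed mount point of the tracing directory; override if yours differs.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# set_events STATE SUBSYSTEM...
#   Writes STATE (1 = enable, 0 = disable) to events/<SUBSYSTEM>/enable
#   for each subsystem named, skipping any that this kernel does not have.
set_events() {
    state="$1"
    shift
    for subsys in "$@"; do
        enable_file="$TRACING_DIR/events/$subsys/enable"
        if [ -e "$enable_file" ]; then
            echo "$state" > "$enable_file"
        else
            echo "no such event subsystem: $subsys" >&2
        fi
    done
}
```

For instance, `set_events 1 sched irq timer` before starting a latency tracer, and `set_events 0 sched irq timer` afterwards, keeps the trace focused on scheduling, interrupt and timer activity.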

For example:

Suppose the system shows a latency problem and we want to debug it. The first idea is to use the wakeup_rt tracer. The trace log looks like this:


/sys/kernel/debug/tracing # echo wakeup_rt > current_tracer 
/sys/kernel/debug/tracing # echo 0 > options/function-trace 
/sys/kernel/debug/tracing # echo 1 > tracing_on 
/sys/kernel/debug/tracing # echo 0 > tracing_max_latency 
/sys/kernel/debug/tracing # chrt -f 5 sleep 1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 4.0.0
# --------------------------------------------------------------------
# latency: 271 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: watchdog/0-11 (uid:0 nice:0 policy:1 rt_prio:99)
#    -----------------
#
#                  _------=> CPU#            
#                 / _-----=> irqs-off        
#                | / _----=> need-resched    
#                || / _---=> hardirq/softirq 
#                ||| / _--=> preempt-depth   
#                |||| /     delay            
#  cmd     pid   ||||| time  |   caller      
#     \   /      |||||  \    |   /         
  <idle>-0       0dnh4    7us+:      0:120:R   + [000]    11:  0:R watchdog/0
  <idle>-0       0dnh4   34us!: ttwu_do_activate.constprop.98 <-try_to_wake_up
  <idle>-0       0d..3  248us+: __schedule <-schedule
  <idle>-0       0d..3  265us :      0:120:R ==> [000]    11:  0:R watchdog/0

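This does show the timing, but only for the span between the wakeup call and schedule. Because we turned off the function-trace option, no other function information is printed. If we enabled function-trace here, it would itself introduce a large latency, so that approach is not viable; yet without function information it is hard to pinpoint what causes the latency. This is where event tracing comes in.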


/sys/kernel/debug/tracing # echo wakeup_rt > current_tracer
/sys/kernel/debug/tracing # echo 0 > options/function-trace
/sys/kernel/debug/tracing # echo 1 > events/enable
/sys/kernel/debug/tracing # echo 1 > tracing_on
/sys/kernel/debug/tracing # echo 0 > tracing_max_latency
/sys/kernel/debug/tracing # chrt -f 5 sleep 1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 4.0.0
# --------------------------------------------------------------------
# latency: 772 us, #12/12, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99)
#    -----------------
#
#                  _------=> CPU#            
#                 / _-----=> irqs-off        
#                | / _----=> need-resched    
#                || / _---=> hardirq/softirq 
#                ||| / _--=> preempt-depth   
#                |||| /     delay            
#  cmd     pid   ||||| time  |   caller      
#     \   /      |||||  \    |   /         
  <idle>-0       1dnh4   11us+:      0:120:R   + [001]    12:  0:R watchdog/1
  <idle>-0       1dnh4   60us+: ttwu_do_activate.constprop.98 <-try_to_wake_up
  <idle>-0       1dnh4   90us+: sched_wakeup: comm=watchdog/1 pid=12 prio=0 success=1 target_cpu=001
  <idle>-0       1dnh2  156us+: hrtimer_expire_exit: hrtimer=ffff80007efd8068
  <idle>-0       1dnh3  186us+: hrtimer_start: hrtimer=ffff80007efd8068 function=watchdog_timer_fn expires=21160150000000 softexpires=21160150000000
  <idle>-0       1dnh2  280us!: irq_handler_exit: irq=3 ret=handled
  <idle>-0       1dn.3  493us+: hrtimer_cancel: hrtimer=ffff80007efd7f30
  <idle>-0       1dn.3  545us+: hrtimer_start: hrtimer=ffff80007efd7f30 function=tick_sched_timer expires=21156160000000 softexpires=21156160000000
  <idle>-0       1.n.2  609us+: rcu_utilization: Start context switch
  <idle>-0       1.n.2  673us+: rcu_utilization: End context switch
  <idle>-0       1d..3  748us+: __schedule <-schedule
  <idle>-0       1d..3  758us :      0:120:R ==> [001]    12:  0:R watchdog/1

Besides the wakeup_rt entries, the log now contains many event entries, which makes it much easier to locate the problem.
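
One quick way to read such a log is to look for the largest jump between consecutive timestamps. The sketch below does that with awk; largest_gap is a hypothetical helper, and it assumes the latency-format layout shown above, where the third whitespace-separated field is the timestamp (for example 280us!:) and the command name contains no spaces.

```shell
#!/bin/sh
# largest_gap [FILE...]: report the biggest time gap between two
# consecutive entries of a latency-format trace (stdin if no file given).
largest_gap() {
    awk '
        /^ *#/ { next }                    # skip header lines
        $3 ~ /^[0-9]+us/ {
            t = $3 + 0                     # "280us!:" converts to 280
            if (seen && t - prev > gap) {  # remember the widest gap so far
                gap = t - prev
                a = prev
                b = t
            }
            prev = t
            seen = 1
        }
        END {
            if (gap > 0)
                printf "%d us gap between %dus and %dus\n", gap, a, b
        }
    ' "$@"
}
```

Applied to the trace above, it would point at the 213 us stretch between the irq_handler_exit entry at 280us and the hrtimer_cancel entry at 493us, which is where the investigation would continue.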

Stack trace

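This is another ftrace feature. The kernel stack has a fixed size, and a kernel developer who does not keep this in mind can easily cause a stack overflow, which leads to a kernel panic.

The stack tracer makes stack usage easier to debug: it prints how much stack each function uses along the deepest call chain it has recorded.

The kernel's stack tracer is enabled with CONFIG_STACK_TRACER.

Usage example: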


# echo 1 > /proc/sys/kernel/stack_tracer_enabled

After running it for a few minutes:

# cat stack_max_size
 2928
# cat stack_trace
         Depth    Size   Location    (18 entries)
         -----    ----   --------
   0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
   1)     2704     160   find_busiest_group+0x31/0x1f1
   2)     2544     256   load_balance+0xd9/0x662
   3)     2288      80   idle_balance+0xbb/0x130
   4)     2208     128   __schedule+0x26e/0x5b9
   5)     2080      16   schedule+0x64/0x66
   6)     2064     128   schedule_timeout+0x34/0xe0
   7)     1936     112   wait_for_common+0x97/0xf1
   8)     1824      16   wait_for_completion+0x1d/0x1f
   9)     1808     128   flush_work+0xfe/0x119
  10)     1680      16   tty_flush_to_ldisc+0x1e/0x20
  11)     1664      48   input_available_p+0x1d/0x5c
  12)     1616      48   n_tty_poll+0x6d/0x134
  13)     1568      64   tty_poll+0x64/0x7f
  14)     1504     880   do_select+0x31e/0x511
  15)      624     400   core_sys_select+0x177/0x216
  16)      224      96   sys_select+0x91/0xb9
  17)      128     128   system_call_fastpath+0x16/0x1b
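
In a listing like this, the Size column is usually the most interesting one, because a single large frame is often the easiest thing to fix. As a rough sketch, the awk helper below picks out the entry with the largest Size. biggest_frame is a hypothetical name, and the column positions are assumed from the stack_trace layout shown above.

```shell
#!/bin/sh
# biggest_frame [FILE...]: print the stack_trace entry with the largest
# Size column (reads standard input when no file is given).
biggest_frame() {
    awk '
        # Data rows start with an index such as "14)"; header rows do not.
        $1 ~ /^[0-9]+\)$/ && $3 + 0 > max {
            max = $3 + 0      # Size column: bytes used by this frame
            where = $4        # Location column: function+offset/length
        }
        END {
            if (where != "")
                printf "%d bytes: %s\n", max, where
        }
    ' "$@"
}
```

For the listing above, the result is the do_select entry, whose 880-byte frame is by far the largest in the chain.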

(End)

Reference: kernel documentation, Documentation/trace/ftrace.txt


Reposted from blog.csdn.net/rikeyone/article/details/80109394