Linux kprobe

Understanding Linux kprobes

1.What are kprobes?

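kprobes is a lightweight kernel debugging facility and the foundation for higher-level kernel tracing tools such as perf and SystemTap. It is mainly used to debug and trace the kernel. In essence, it inserts a set of handlers at a chosen probe point (for example a particular line of a function, a function's entry or return address, or any specified kernel address). When the kernel reaches the probe point, the handlers can read the current execution context: the name of the function being executed, its arguments and return value, the registers, and even global data structures.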

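kprobes can insert probe points into a running kernel dynamically and run operations you define in advance. You specify a probe point and attach a handler function to it; when the kernel reaches that probe point, the handler runs, and execution then continues along the normal code path.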

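kprobes provides three kinds of probes:

kprobes: can be inserted at almost any instruction in the kernel
jprobes: can only be inserted at the entry of a kernel function (the jprobe API was removed from mainline kernels in v4.15, so this applies to older kernels only)
kretprobes: fire when the specified kernel function returns. kretprobes are built on top of the kprobes mechanism and are mainly used to probe return points (such as the return value of a kernel function or system call) and to measure how long a function takes to execute.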


Note:
uprobes work much like kprobes but are used to trace and debug user space. uprobes appear to have been implemented and refined mainly by the SystemTap project.

1.1.Background of kprobes

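When debugging the kernel or a module, developers often need to know whether certain functions are called, when they are called, whether they execute correctly, and what their arguments and return values are. The simple approach is to add log statements to the relevant functions, but that usually means rebuilding the kernel or module and rebooting the device, which is cumbersome and may even disturb the original execution flow.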

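With kprobes, you define your own callback functions and dynamically insert probe points into almost any function in the kernel or a module (some functions cannot be probed, such as the functions that implement kprobes itself; see section 1.2). When execution reaches the probed function, the callback is invoked and can collect the information you need, after which the kernel returns to its normal flow. Once enough information has been collected, the probe point can be removed just as dynamically. kprobes therefore has little impact on kernel execution and is convenient to use.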

1.2.Features and limitations of kprobes:

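1. kprobes allows multiple kprobes to be registered at the same probed location, but jprobes currently do not. It is also not allowed to use another jprobe callback or a kprobe post_handler callback as a probe point.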

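2. In general, any function in the kernel can be probed, including interrupt handlers. However, the functions that implement kprobes itself in kernel/kprobes.c and arch/*/kernel/kprobes.c cannot be probed, nor can do_page_fault and notifier_call_chain;
3. If the probe point is an inline function, kprobes may not be able to place a probe on every instance of that function. Because gcc may inline some functions automatically, the probe may not have the effect you expect;
4. A probe handler may modify the context in which the probed function runs, for example by changing kernel data structures or the pre-probe register values saved in struct pt_regs. kprobes can therefore be used to install bug fixes or to inject faults for testing;
5. kprobes avoids calling another probe's handler while a probe is already being handled. For example, if a probe is registered on printk() and its handler calls printk() again, the printk probe's handler is not triggered a second time; only the nmissed field of struct kprobe is incremented;
6. kprobes does not use mutexes or allocate memory dynamically except during registration and unregistration;
7. Probe handlers run with preemption disabled, and possibly with interrupts disabled as well, depending on the CPU architecture. In either case, a handler must never call a function that may give up the CPU (such as taking a semaphore or a mutex);
8. kretprobes are implemented by replacing the return address with the address of a predefined trampoline, so stack backtraces and calls to the gcc builtin __builtin_return_address() return the trampoline's address instead of the probed function's real return address;
9. If the number of times a function is called differs from the number of times it returns, registering a kretprobe on it may not produce the expected results. do_exit() is an example of such a function, whereas do_execve() and do_fork() are not affected;
10. If the CPU is running on a stack other than the current task's when a function is entered or exited, registering a kretprobe on that function may have unpredictable results. For this reason kprobes does not support kretprobes on __switch_to() on x86_64; registration returns -EINVAL.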
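
Item 5 (the nmissed counter) is easy to misread, so here is a minimal userspace sketch of the idea. It is not kernel code: struct model_kprobe, current_probe, model_probe_hit() and model_reentrant_handler() are hypothetical names made up for this illustration, and a single global pointer stands in for the kernel's per-CPU "probe currently being handled" state.

```c
#include <stddef.h>

/* Userspace model only: this type mimics, but is not, the kernel's struct kprobe. */
struct model_kprobe {
    const char *symbol_name;
    unsigned long nmissed;   /* hits skipped because a handler was already running */
    unsigned long handled;   /* hits whose handler actually ran */
    void (*pre_handler)(struct model_kprobe *p);
};

/* Stands in for the per-CPU "probe currently being handled" state. */
static struct model_kprobe *current_probe;

/* Called whenever execution reaches a probed location. */
void model_probe_hit(struct model_kprobe *p)
{
    if (current_probe != NULL) {
        /* A handler is already running: count the miss and skip the handler. */
        p->nmissed++;
        return;
    }
    current_probe = p;
    p->handled++;
    if (p->pre_handler != NULL)
        p->pre_handler(p);
    current_probe = NULL;
}

/* A handler that re-enters the probed function, like calling printk()
 * from the handler of a probe placed on printk(). */
void model_reentrant_handler(struct model_kprobe *p)
{
    model_probe_hit(p);
}
```

With model_reentrant_handler installed as the pre_handler, every outer hit runs the handler once and records one miss for the nested hit, instead of recursing without end.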

2.How kprobes work

The overall flow is as follows (the figure from the original post is not reproduced here):

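1. When a probe point is registered, kprobes first saves a copy of the probed instruction and then replaces the start of the original instruction with a breakpoint instruction. The breakpoint is architecture-specific: int3 on i386 and x86_64, an undefined instruction on arm. (x86_64 currently also supports Jump Optimization, which requires CONFIG_OPTPROBES and uses a jump instruction instead of the breakpoint);
2. When the CPU reaches the breakpoint at the probe point, a trap is raised. The trap path saves the current CPU registers and calls the corresponding trap handler, which sets the kprobe's call state and invokes the registered pre_handler, passing it the address of the registered struct kprobe and the saved registers;
3. kprobes then single-steps the copy of the probed instruction. How this is done depends on the architecture: arm executes it with an emulation function inside the exception path, while x86_64 sets the single-step flag and returns to the context that raised the exception;
4. After the single step completes, kprobes calls the registered post_handler;
5. Finally, execution resumes at the instruction following the probe point.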
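
The five steps above can be sketched as a small userspace model. This is only an illustration under simplifying assumptions: the "kernel text" is a plain byte array, "executing" an instruction just returns its opcode byte, and struct model_probe, model_register(), model_unregister() and model_execute() are hypothetical names, not kernel APIs. The only real-world constant is 0xCC, the x86 int3 opcode.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_BREAKPOINT 0xCC   /* opcode of int3 on x86 */

struct model_probe {
    uint8_t *addr;          /* probed location in the fake text buffer */
    uint8_t  saved_opcode;  /* original byte, saved at registration */
    int      pre_calls;     /* how often the pre_handler stage ran */
    int      post_calls;    /* how often the post_handler stage ran */
};

/* Step 1: save the original byte, then plant the breakpoint. */
void model_register(struct model_probe *p, uint8_t *addr)
{
    p->addr = addr;
    p->saved_opcode = *addr;
    p->pre_calls = 0;
    p->post_calls = 0;
    *addr = MODEL_BREAKPOINT;
}

/* Removing the probe restores the original byte. */
void model_unregister(struct model_probe *p)
{
    *p->addr = p->saved_opcode;
}

/* "Execute" the byte at pc and return the opcode that effectively ran. */
uint8_t model_execute(struct model_probe *p, const uint8_t *pc)
{
    if (pc == p->addr && *pc == MODEL_BREAKPOINT) {
        uint8_t executed;

        p->pre_calls++;               /* step 2: trap, then pre_handler */
        executed = p->saved_opcode;   /* step 3: run the saved copy */
        p->post_calls++;              /* step 4: post_handler */
        return executed;              /* step 5: resume after the probe */
    }
    return *pc;                       /* unprobed location: nothing special */
}
```

The point of the model is that the probed code still observes the original instruction's effect even though the byte in memory has been overwritten.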

2.KProbes Interface

2.1.struct kprobe

struct kprobe {
    /* node in the global kprobe hash table, keyed by the probed address */
    struct hlist_node hlist;
    /* list of kprobes for multi-handler support (probes sharing one probe point) */
    struct list_head list;
    /* count the number of times this probe was temporarily disarmed */
    unsigned long nmissed;
    /* location of the probe point */
    kprobe_opcode_t *addr;
    /* Allow user to indicate symbol name of the probe point */
    const char *symbol_name;
    /* Offset into the symbol; 0 means the function entry */
    unsigned int offset;
    /* Called before addr is executed. */
    kprobe_pre_handler_t pre_handler;
    /* Called after addr is executed, unless... */
    kprobe_post_handler_t post_handler;
    /* called on a memory fault in pre_handler, post_handler or while single-stepping */
    kprobe_fault_handler_t fault_handler;
    /* called when a breakpoint is hit while a kprobe is being handled; used by jprobes */
    kprobe_break_handler_t break_handler;
    /* saved original opcode at the probe point */
    kprobe_opcode_t opcode;
    /* copy of the original instruction, used for single-stepping; architecture-specific */
    struct arch_specific_insn ainsn;
    /* status flags */
    u32 flags;
};

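The fields have the following meanings:

struct hlist_node hlist: node in the global kprobe hash table, indexed by the address of the probe point;
struct list_head list: links the different kprobes registered on the same probe point;
kprobe_opcode_t *addr: address of the probe point;
const char *symbol_name: name of the probed function;
unsigned int offset: offset of the probe point inside the function, used to probe an instruction in the function body; 0 means the function entry;
kprobe_pre_handler_t pre_handler: callback invoked before the probed instruction executes;
kprobe_post_handler_t post_handler: callback invoked after the probed instruction executes;
kprobe_fault_handler_t fault_handler: callback invoked if a memory fault occurs while running pre_handler or post_handler, or while single-stepping the probed instruction;
kprobe_break_handler_t break_handler: callback invoked when a breakpoint is hit while a kprobe is being handled; used to implement jprobes;
kprobe_opcode_t opcode: saved original instruction at the probe point;
struct arch_specific_insn ainsn: copy of the original probed instruction, used for single-stepping; strongly architecture-specific (may contain an instruction emulation function);
u32 flags: status flags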

The related API functions are:

int register_kprobe(struct kprobe *kp);                  // register a kprobe with the kernel
void unregister_kprobe(struct kprobe *kp);               // remove a kprobe
int register_kprobes(struct kprobe **kps, int num);      // register an array of num kprobes
void unregister_kprobes(struct kprobe **kps, int num);   // remove an array of num kprobes
int disable_kprobe(struct kprobe *kp);                   // temporarily disable the given kprobe
int enable_kprobe(struct kprobe *kp);                    // re-enable the given kprobe

2.2.KProbes Manager

The KProbes Manager is responsible for registering and unregistering KProbes and JProbes. The description in this section follows the early implementation covered in the LWN article listed in the references, so details such as the locking scheme may differ in current kernels. The file kernel/kprobes.c implements the KProbes manager. Each probe is described by the struct kprobe structure and stored in a hash table hashed by the address at which the probe is placed. Access to this hash table is serialized by the spinlock kprobe_lock. This spinlock is locked before a new probe is registered, an existing probe is unregistered or when a probe is hit. This prevents these operations from executing simultaneously on an SMP machine. Whenever a probe is hit, the probe handler is called with interrupts disabled. Interrupts are disabled because handling a probe is a multiple step process which involves breakpoint handling and single-step execution of the probed instruction. There is no easy way to save the state between these operations, hence interrupts are kept disabled during probe handling.

The manager is composed of these functions which are followed by a simplified description of what they do. These functions are architecture independent. A side-by-side reading of the code in kernel/kprobes.c and these steps will clarify the whole implementation.

void lock_kprobes(void)
Locks KProbes and records the CPU on which it was locked
void unlock_kprobes(void)
Resets the recorded CPU and unlocks KProbes
struct kprobe *get_kprobe(void *addr)
Using the address of the probed instruction, returns the probe from hash table
int register_kprobe(struct kprobe *p)
This function registers a probe at a given address. Registration involves copying the instruction at the probe address into a probe-specific buffer. On x86 this buffer is 16 bytes (MAX_INSN_SIZE), large enough for the longest instruction, hence 16 bytes are copied from the given address. Then it replaces the instruction at the probed address with the breakpoint instruction.
void unregister_kprobe(struct kprobe *p)
This function unregisters a probe. It restores the original instruction at the address and removes the probe structure from the hash table.
int register_jprobe(struct jprobe *jp)
This function registers a JProbe at a function address. JProbes use the KProbes mechanism. In the KProbe pre_handler it stores its own handler setjmp_pre_handler and in the break_handler stores the address of longjmp_break_handler. Then it registers struct kprobe jp->kp by calling register_kprobe()
void unregister_jprobe(struct jprobe *jp)
Unregisters the struct kprobe used by this JProbe
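
The "hash table hashed by the address" used by get_kprobe() can be pictured with the following userspace sketch. It is an assumption-laden model, not the kernel's code: struct table_probe, probe_table, table_hash(), table_get(), table_register() and table_unregister() are hypothetical names, the hash function is an arbitrary choice, locking is omitted, and the model allows only one probe per address for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_HASH_BITS  6
#define MODEL_TABLE_SIZE (1u << MODEL_HASH_BITS)

/* Userspace stand-in for a registered probe. */
struct table_probe {
    void *addr;                 /* probed address, used as the hash key */
    struct table_probe *next;   /* next probe in the same bucket */
};

static struct table_probe *probe_table[MODEL_TABLE_SIZE];

/* Arbitrary illustrative hash: fold some address bits into a bucket index. */
static unsigned int table_hash(const void *addr)
{
    uintptr_t v = (uintptr_t)addr;

    return (unsigned int)((v >> 4) ^ (v >> 12)) & (MODEL_TABLE_SIZE - 1);
}

/* Model of get_kprobe(): look up the probe planted at addr, or NULL. */
struct table_probe *table_get(const void *addr)
{
    struct table_probe *p;

    for (p = probe_table[table_hash(addr)]; p != NULL; p = p->next)
        if (p->addr == addr)
            return p;
    return NULL;
}

/* Model of registration: insert into the bucket for p->addr. */
int table_register(struct table_probe *p)
{
    unsigned int h = table_hash(p->addr);

    if (table_get(p->addr) != NULL)
        return -1;              /* simplification: one probe per address */
    p->next = probe_table[h];
    probe_table[h] = p;
    return 0;
}

/* Model of unregistration: unlink p from its bucket if present. */
void table_unregister(struct table_probe *p)
{
    struct table_probe **pp = &probe_table[table_hash(p->addr)];

    while (*pp != NULL) {
        if (*pp == p) {
            *pp = p->next;
            p->next = NULL;
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Keying the table by address is what lets the breakpoint handler go from "a trap happened at this address" to "this is the struct kprobe whose handlers must run" with a short bucket walk.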

2.3.Kprobe config

CONFIG_KPROBES=y
CONFIG_KALLSYMS=y or CONFIG_KALLSYMS_ALL=y

2.4.debugfs Interface

/sys/kernel/debug/kprobes/list: lists the kprobes currently registered in the kernel
/sys/kernel/debug/kprobes/enabled: switch that turns kprobes on or off globally
/sys/kernel/debug/kprobes/blacklist: the kprobe blacklist (functions that cannot be probed)
/proc/sys/debug/kprobes-optimization: turns kprobes optimization on or off
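
As a rough illustration of how the list file can be consumed, here is a userspace sketch that parses one of its lines. It assumes the line layout described in the kernel's kprobes documentation (address, probe type k or r, symbol+offset, then optional status markers such as [DISABLED], [OPTIMIZED], [GONE] and [FTRACE]); struct kprobe_list_entry and parse_kprobe_list_line() are hypothetical names introduced for this example, and the exact layout may vary between kernel versions.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /sys/kernel/debug/kprobes/list (illustrative type). */
struct kprobe_list_entry {
    unsigned long long addr;   /* first column: probed address */
    char type;                 /* second column: 'k' = kprobe, 'r' = kretprobe */
    char location[128];        /* third column: symbol+offset */
    int disabled;              /* line carries [DISABLED] */
    int optimized;             /* line carries [OPTIMIZED] */
    int gone;                  /* line carries [GONE] */
    int ftrace;                /* line carries [FTRACE] */
};

/* Parse a single line; returns 0 on success, -1 if the line does not match. */
int parse_kprobe_list_line(const char *line, struct kprobe_list_entry *e)
{
    memset(e, 0, sizeof(*e));
    if (sscanf(line, "%llx %c %127s", &e->addr, &e->type, e->location) != 3)
        return -1;
    e->disabled  = strstr(line, "[DISABLED]")  != NULL;
    e->optimized = strstr(line, "[OPTIMIZED]") != NULL;
    e->gone      = strstr(line, "[GONE]")      != NULL;
    e->ftrace    = strstr(line, "[FTRACE]")    != NULL;
    return 0;
}
```

A caller would typically open the file with fopen(), read it line by line with fgets(), and pass each line to parse_kprobe_list_line(); reading the file normally requires root and a mounted debugfs.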

3.kprobe usage examples

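kprobes can be used in two ways:

The first is to write a kernel module that registers the probe points with the kernel; the handlers can be customized as needed, which makes this approach flexible.
The second is kprobes on ftrace, which combines kprobes with ftrace; that is, kprobes are used to enhance ftrace's tracing of function calls.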

3.1.Writing a probe module

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* Each probe needs its own struct kprobe object. */
static struct kprobe kp = {
    .symbol_name    = "blkdev_ioctl",
};

/* pre_handler: called just before the probed instruction executes. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    /* dump_stack() prints the current call stack to the kernel log. */
    dump_stack();
    return 0;
}

/* post_handler: called after the probed instruction has been executed. */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
                unsigned long flags)
{
    /* Intentionally empty in this example. */
}

/*
 * fault_handler: called if an instruction in the pre- or post-handler, or the
 * single-stepped probed instruction, raises an exception.
 * Note: this member exists only in older kernels; it appears to have been
 * removed from struct kprobe around v5.14, so drop it on newer kernels.
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
    printk(KERN_DEBUG "fault_handler: p->addr = 0x%p, trap #%d\n",
        p->addr, trapnr);
    /* Return 0 because the fault is not handled here. */
    return 0;
}

/* Module init: fill in the handlers and register the kprobe. */
static int __init kprobe_init(void)
{
    int ret;

    kp.pre_handler = handler_pre;
    kp.post_handler = handler_post;
    kp.fault_handler = handler_fault;

    ret = register_kprobe(&kp);
    if (ret < 0) {
        printk(KERN_DEBUG "register_kprobe failed, returned %d\n", ret);
        return ret;
    }
    printk(KERN_DEBUG "Planted kprobe at %p\n", kp.addr);
    return 0;
}

/* Module exit: remove the kprobe again. */
static void __exit kprobe_exit(void)
{
    unregister_kprobe(&kp);
    printk(KERN_DEBUG "kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init);
module_exit(kprobe_exit);
MODULE_LICENSE("GPL");

References

https://lwn.net/Articles/132196/
https://lishiwen4.github.io/linux-kernel/linu-kprobe
https://blog.crazytaxii.com/posts/an_introduction_2_kprobes/
https://www.kernel.org/doc/Documentation/kprobes.txt
https://www.kernel.org/doc/ols/2006/slides/kprobes.html
https://blog.arstercz.com/introduction_to_linux_dynamic_tracing/
https://documentation.suse.com/sles/12-SP4/html/SLES-all/cha-tuning-kprobes.html
https://www.cnblogs.com/arnoldlu/p/9752061.html
