Linux内核中的一些基本操作

本文档的Copyleft归yfydz所有，使用GPL发布，可以自由拷贝，转载，转载时请保持文档的完整性，严禁用于任何商业用途。
msn: [email protected]
来源：http://yfydz.cublog.cn
1. 前言
 
本文介绍linux内核中一些常用的数据结构和操作。
 
2. 双向链表(list)
 
linux内核中的双向链表通过结构 struct list_head来将各个节点连接起来，此结构会作为链表元素结构中的一个参数：
struct list_head {
 struct list_head *next, *prev;
};
 
链表头的初始化，注意，结构中的指针为NULL并不是初始化，而是指向自身才是初始化，如果只是按普通情况下的置为NULL，而不是指向自身，系统会崩溃，这是一个容易犯的错误：
 
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) \
 struct list_head name = LIST_HEAD_INIT(name)
#define INIT_LIST_HEAD(ptr) do { \
 (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)
 
最常用的链表操作：
插入到链表头:
void list_add(struct list_head *new, struct list_head *head);
 
插入到链表尾:
void list_add_tail(struct list_head *new, struct list_head *head);
 
删除链表节点:
void list_del(struct list_head *entry);
 
将节点移动到另一链表:
void list_move(struct list_head *list, struct list_head *head);
 
将节点移动到链表尾:
void list_move_tail(struct list_head *list,struct list_head *head);
 
判断链表是否为空，返回1为空，0非空
int list_empty(struct list_head *head);
 
把两个链表拼接起来：
void list_splice(struct list_head *list, struct list_head *head)；
 
取得节点指针：
#define list_entry(ptr, type, member) \
 ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))
 
遍历链表中每个节点：
#define list_for_each(pos, head) \
 for (pos = (head)->next, prefetch(pos->next); pos != (head); \
         pos = pos->next, prefetch(pos->next))
 
逆向循环链表中每个节点：
#define list_for_each_prev(pos, head) \
 for (pos = (head)->prev, prefetch(pos->prev); pos != (head); \
         pos = pos->prev, prefetch(pos->prev))
 
举例：
 
LISH_HEAD(mylist);
 
struct my_list{
 struct list_head list;
 int data;
};
 
static int ini_list(void)
{
 struct my_list *p;
 int i;
 for(i=0; i<100; i++){
  p=kmalloc(sizeof(struct my_list), GFP_KERNEL);
  list_add(&p->list, &mylist);
 }
}

在内存中形成如下结构的一个双向链表：
 
  +---------------------------------------------------------------+
  |                                                               |
  |  mylist         99            98                     0        |
  |  +----+    +---------+    +---------+           +---------+   |
  +->|next|--->|list.next|--->|list.next|--->...--->|list.next|---+
     |----|    |---------|    |---------|           |---------|
  +--|prev|<---|list.prev|<---|list.prev|<---...<---|list.prev|<--+
  |  +----+    |---------|    |---------|           |---------|   |
  |            |  data   |    |  data   |           |  data   |   |
  |            +---------+    +---------+           +---------+   |
  |                                                               |
  +---------------------------------------------------------------+
 
知道了链表头就能遍历整个链表，如果是用list_add()插入新节点的话，从链表头的next方向看是一个堆栈型。
 
从链表中删除节点很容易：
static void del_item(struct my_list *p)
{
 list_del(&p->list, &mylist);
 kfree(p);
}
 
最重要的宏是list_entry，这个宏的思路是根据链表元素结构中链表头结构list_head的地址推算出链表元素结构的实际地址：
 
#define list_entry(ptr, type, member) \
 ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))
 
ptr是链表元素结构(如struct my_list)中链表头结构list_head的地址
member是链表元素结构(如struct my_list)中链表头结构list_head参数的名称
type是链表元素结构类型(如struct my_list)
计算原理是根据链表头结构list_head的地址减去其在链表元素结构中的偏移位置而得到链表元素结构的地址。
 
例如：
static void print_list(void)
{
 struct list_head *cur;
 struct my_list *p;
 list_for_each(cur, &mylist){
  p=list_entry(cur, struct my_list, list);
  printk("data=%d\n", p->data);
 }
}
 
优点：
这样就可以用相同的数据处理方式来描述所有双向链表，不用再单独为各个链表编写各种编辑函数。
 
缺点：
1) 链表头中元素置为NULL不是初始化，与普通习惯不同；
2) 仍然需要单独编写各自的删除整个链表的函数，不能统一处理，因为不能保证所有链表元素结构中链表头结构list_head的偏移地址都是相同的，当然如果把链表头结构list_head都作为链表元素结构的第一个参数，就可以用统一的删除整个链表的函数。

3. HASH表
 
HASH表适用于不需要对整个空间元素进行排序，而是只需要能快速找到某个元素的场合，是一种以空间换时间的方法，本质也是线性表，但由一个大的线性表拆分为了多个小线性表，由于只需要查找小表，因此搜索速度就会线性查整个大表提高很多，理想情况下，有多少个小线性表，搜索速度就提高了多少倍，通常把小线性表的表头综合为一个数组，大小就是HASH表的数量。
 
HASH表速度的关键是HASH函数的设计，HASH函数根据每个元素中固定的参数进行计算，算出一个不大于HASH表数量的索引值，表示该元素需要放在该索引号对应的那个表中，对于固定的参数，计算结果始终是固定的，但对于不同的参数值，希望计算出来的结果能尽可能地平均到每个索引值，HASH函数计算得越平均，表示每个小表中元素的数量都会差不多，这样搜索性能将越好。HASH函数也要尽可能的简单，以减少计算时间，常用的算法是将参数累加求模，在include/linux/jhash.h中已经定义了一些HASH计算函数，可直接使用。
 
HASH表在路由cache表，状态连接表等处用得很多。
 
举例，连接跟踪中根据tuple值计算HASH：
// net/ipv4/netfilter/ip_conntrack_core.c
u_int32_t
hash_conntrack(const struct ip_conntrack_tuple *tuple)
{
#if 0
 dump_tuple(tuple);
#endif
 return (jhash_3words(tuple->src.ip,
                      (tuple->dst.ip ^ tuple->dst.protonum),
                      (tuple->src.u.all | (tuple->dst.u.all << 16)),
                      ip_conntrack_hash_rnd) % ip_conntrack_htable_size);
}
 
// include/linux/jhash.h
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
 a += JHASH_GOLDEN_RATIO;
 b += JHASH_GOLDEN_RATIO;
 c += initval;
 __jhash_mix(a, b, c);
 return c;
}
 
4. 定时器(timer)
 
linux内核定时器由以下结构描述：
 
/* include/linux/timer.h */
struct timer_list {
 struct list_head list;
 unsigned long expires;
 unsigned long data;
 void (*function)(unsigned long);
};
list：timer链表
expires：到期时间
function：到期函数，时间到期时调用的函数
data：传给到期函数的数据，实际应用中通常是一个指针转化而来，该指针指向一个结构

timer的操作：
 
增加timer，将timer挂接到系统的timer链表：
extern void add_timer(struct timer_list * timer);
 
删除timer，将timer从系统timer链表中拆除：
extern int del_timer(struct timer_list * timer);
(del_timer()函数可能会失败，这是因为该timer本来已经不在系统timer链表中了，也就是已经删除过了)
 
对于SMP系统，删除timer最好使用下面的函数来防止冲突：
extern int del_timer_sync(struct timer_list * timer);
 
修改timer，修改timer的到期时间：
int mod_timer(struct timer_list *timer, unsigned long expires);
 
通常用法：
struct timer_list通常作为数据结构中的一个参数，在初始化结构的时候初始化timer，表示到期时要进行的操作，实现定时动作，通常更多的是作为超时处理的，timer函数作为超时时的资源释放函数。注意：如果超时了运行超时函数，此时系统是处在时钟中断的bottom half里的，不能进行很复杂的操作，如果要完成一些复杂操作，如到期后的数据发送，不能直接在到期函数中处理，而是应该在到期函数中发个信号给特定内核线程转到top half进行处理。
 
为判断时间的先后，内核中定义了以下宏来判断：
#define time_after(a,b)  ((long)(b) - (long)(a) < 0)
#define time_before(a,b) time_after(b,a)
#define time_after_eq(a,b) ((long)(a) - (long)(b) >= 0)
#define time_before_eq(a,b) time_after_eq(b,a)
这里用到了一个技巧，由于linux中的时间是无符号数，这里先将其转换为有符号数后再判断，就能解决时间回绕问题，当然只是一次回绕，回绕两次当然是判断不出来的，具体可自己实验体会。
 
5. 内核线程(kernel_thread)
 
内核中新线程的建立可以用kernel_thread函数实现，该函数在kernel/fork.c中定义：
long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
fn：内核线程主函数；
arg：线程主函数的参数；
flags：建立线程的标志；
 
内核线程函数通常都调用daemonize()进行后台化作为一个独立的线程运行，然后设置线程的一些参数，如名称，信号处理等，这也不是必须的，然后就进入一个死循环，这是线程的主体部分，这个循环不能一直在运行，否则系统就死在这了，或者是某种事件驱动的，在事件到来前是睡眠的，事件到来后唤醒进行操作，操作完后继续睡眠；或者是定时睡眠，醒后操作完再睡眠；或者加入等待队列通过schedule()调度获得执行时间。总之是不能一直占着 CPU。
 
以下是内核线程的一个实例，取自kernel/context.c:
 
int start_context_thread(void)
{
 static struct completion startup __initdata = COMPLETION_INITIALIZER(startup);
 kernel_thread(context_thread, &startup, CLONE_FS | CLONE_FILES);
 wait_for_completion(&startup);
 return 0;
}
static int context_thread(void *startup)
{
 struct task_struct *curtask = current;
 DECLARE_WAITQUEUE(wait, curtask);
 struct k_sigaction sa;
 daemonize();
 strcpy(curtask->comm, "keventd");
 keventd_running = 1;
 keventd_task = curtask;
 spin_lock_irq(&curtask->sigmask_lock);
 siginitsetinv(&curtask->blocked, sigmask(SIGCHLD));
 recalc_sigpending(curtask);
 spin_unlock_irq(&curtask->sigmask_lock);
 complete((struct completion *)startup);
 /* Install a handler so SIGCLD is delivered */
 sa.sa.sa_handler = SIG_IGN;
 sa.sa.sa_flags = 0;
 siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
 do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 /*
  * If one of the functions on a task queue re-adds itself
  * to the task queue we call schedule() in state TASK_RUNNING
  */
 for (;;) {
  set_task_state(curtask, TASK_INTERRUPTIBLE);
  add_wait_queue(&context_task_wq, &wait);
  if (TQ_ACTIVE(tq_context))
   set_task_state(curtask, TASK_RUNNING);
  schedule();
  remove_wait_queue(&context_task_wq, &wait);
  run_task_queue(&tq_context);
  wake_up(&context_task_done);
  if (signal_pending(curtask)) {
   while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
    ;
   spin_lock_irq(&curtask->sigmask_lock);
   flush_signals(curtask);
   recalc_sigpending(curtask);
   spin_unlock_irq(&curtask->sigmask_lock);
  }
 }
}
 
6. 结构地址
 
在C中，结构地址和结构中第一个元素的地址是相同的，因此在linux内核中经常出现使用结构第一个元素的地址来表示结构地址的情况，在读代码时要注意这一点，这和list_entry宏的意思一样。
 
如：
struct my_struct{
 int a;
 int b;
}c;
if(&c == &c.a){  // always true
...
}
Linux内核中的一些基本操作

猜你喜欢