2019-2020-1 20199309 "Linux Kernel Principles and Analysis" Week 7 Assignment

I. How the Linux kernel creates a new process

1. Concept

  • The three main functions of the operating system kernel are process management, memory management, and the file system; process management is the core of them.
  • Linux process states differ from the process states described in operating-systems theory. For example, the ready state and the running state are both represented as TASK_RUNNING. (TASK_RUNNING means the process is runnable; whether it is actually running depends on whether it currently occupies the CPU.)
  • fork is called once but returns twice: in the parent it returns the pid of the newly created child, and in the child it returns 0.
  • After fork, the data segment, heap, and stack exist as two logical copies, one per process, while the code segment remains a single copy shared by both processes. Physically the pages are shared copy-on-write: the two processes only really split a page when the parent or the child writes to its data or stack.
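To see "called once, returns twice" and the separate data copies from user space, here is a minimal sketch assuming Linux with glibc. The names fork_demo and fork_counter are made up for this example. The child changes its copy of a global variable and reports the result through its exit status, while the parent's copy stays unchanged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() the parent and the child each see their own
   (copy-on-write) copy of this variable. */
static int fork_counter = 1;

/* Forks once. The child adds 41 to its copy of fork_counter and exits with
   the result as its exit status; the parent waits for it.
   Returns the child's exit status, or -1 on error.
   *parent_counter receives the parent's fork_counter after the child ran. */
int fork_demo(int *parent_counter)
{
    pid_t pid = fork();               /* called once ... */
    if (pid < 0)
        return -1;                    /* fork failed */
    if (pid == 0) {                   /* ... returns 0 in the child */
        fork_counter += 41;           /* writes only the child's copy */
        _exit(fork_counter);          /* exit status carries 42 */
    }
    /* ... and returns the child's pid in the parent */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *parent_counter = fork_counter;   /* still 1: the child's write is not visible here */
    return WEXITSTATUS(status);
}
```

Calling fork_demo(&pc) should return 42 (the child's value) and leave pc at 1 (the parent's value). The child uses _exit() rather than exit() so that stdio buffers inherited from the parent are not flushed a second time.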

2. Kernel code analysis

SYSCALL_DEFINE0(fork)
{
#ifdef CONFIG_MMU
    return do_fork(SIGCHLD, 0, 0, NULL, NULL);
#else
    return -EINVAL;
#endif
}
SYSCALL_DEFINE0(vfork)
{
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
            0, NULL, NULL);
}
#ifdef __ARCH_WANT_SYS_CLONE
#ifdef CONFIG_CLONE_BACKWARDS
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
         int __user *, parent_tidptr,
         int, tls_val,
         int __user *, child_tidptr)
#elif defined(CONFIG_CLONE_BACKWARDS2)
SYSCALL_DEFINE5(clone, unsigned long, newsp, unsigned long, clone_flags,
         int __user *, parent_tidptr,
         int __user *, child_tidptr,
         int, tls_val)
#elif defined(CONFIG_CLONE_BACKWARDS3)
SYSCALL_DEFINE6(clone, unsigned long, clone_flags, unsigned long, newsp,
        int, stack_size,
        int __user *, parent_tidptr,
        int __user *, child_tidptr,
        int, tls_val)
#else
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
         int __user *, parent_tidptr,
         int __user *, child_tidptr,
         int, tls_val)
#endif
{
    return do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr);
}
#endif

As the code above shows, the three system calls fork, vfork, and clone can all create a new process, and all of them do it through do_fork; they differ only in the arguments they pass.
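Since fork, vfork, and clone differ only in the flags passed to do_fork, the effect of a single flag can be observed from user space. The sketch below uses glibc's clone() wrapper (not the raw system call above, whose argument order varies by architecture, as the #ifdef branches show) to create a child with and without CLONE_VM. The names clone_demo, child_fn, and shared_value are hypothetical, and the 64 KB child stack is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>      /* clone(), CLONE_VM */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

static int shared_value = 0;

/* Runs in the child, on the stack passed to clone(). */
static int child_fn(void *arg)
{
    shared_value = *(int *)arg;   /* visible to the parent only with CLONE_VM */
    return 0;
}

/* Creates a child with clone(), passing `flags` plus SIGCHLD as the exit
   signal so that waitpid() can reap it. Returns the parent's view of
   shared_value after the child has exited, or -1 on error. */
int clone_demo(int flags, int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    shared_value = 0;
    /* The stack grows down on x86 and most other architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags | SIGCHLD, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return shared_value;
}
```

With flags 0 the child behaves like a fork() child: its write stays in its own copy, so the parent still sees 0. With CLONE_VM both share one address space and the parent sees the value the child wrote. SIGCHLD is OR-ed in for the same reason the kernel's fork passes it: it is the signal sent to the parent when the child exits, which lets a plain waitpid() reap the child.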

(1) do_fork

long do_fork(unsigned long clone_flags, unsigned long stack_start,
        unsigned long stack_size, int __user *parent_tidptr,
        int __user *child_tidptr)

First, the parameters of do_fork():

  • clone_flags: flags for the child process; through them the caller selects which of the parent's resources the child copies and which it shares.

  • stack_start: the address of the child's user-mode stack.

  • regs: a pointer to the pt_regs structure, which holds the register values pushed onto the kernel stack on system call entry. (This parameter exists in older kernels' do_fork(); the version shown above no longer takes it.)

  • stack_size: the size of the user-mode stack; it is usually unused and passed as 0.

  • parent_tidptr and child_tidptr: user-mode addresses, in the parent's and in the child's memory respectively, at which the child's pid (thread ID) is stored.

To make it easier to follow, here is the key code in simplified form:

    struct task_struct *p;    // pointer to the new process descriptor
    int trace = 0;
    long nr;                  // the child's pid
    ...
    p = copy_process(clone_flags, stack_start, stack_size,
                child_tidptr, NULL, trace);   // create the child's descriptor and the other data structures it needs to run

    if (!IS_ERR(p)) {                        // copy_process succeeded
        struct completion vfork;             // completion: lets one execution unit wait until another has finished something
        struct pid *pid;
        ...
        pid = get_task_pid(p, PIDTYPE_PID);   // get the child's struct pid
        nr = pid_vnr(pid);                    // get the numeric pid from struct pid
        ...
        // If clone_flags contains CLONE_VFORK, store the completion vfork in the
        // descriptor's vfork_done field; this only initializes the completion
        if (clone_flags & CLONE_VFORK) {
            p->vfork_done = &vfork;
            init_completion(&vfork);
            get_task_struct(p);
        }

        wake_up_new_task(p);        // put the child on the scheduler's run queue so it can get the CPU

        /* forking complete and child started to run, tell ptracer */
        ...
        // If clone_flags contains CLONE_VFORK, block the parent until the child
        // calls exec or exits; this is where the actual waiting happens
        if (clone_flags & CLONE_VFORK) {
            if (!wait_for_vfork_done(p, &vfork))
                ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
        }

        put_pid(pid);
    } else {
        nr = PTR_ERR(p);        // error handling: return the error code
    }
    return nr;               // the child's pid (which is why fork() returns the child's pid in the parent)
}

do_fork() mainly calls copy_process() to copy the parent's information, obtains the child's pid, calls wake_up_new_task() to put the child on the scheduler's run queue so that it can be given the CPU, and does some auxiliary work depending on clone_flags. copy_process() contains the main logic of process creation.
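The vfork path relies on a completion: the parent sleeps in wait_for_vfork_done() until the child signals vfork_done by calling exec or exiting. A rough user-space analogue built from a pthread mutex and condition variable is sketched below. struct completion_model and the *_model functions are hypothetical stand-ins, not the kernel's implementation, and a thread plays the role of the vfork child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion_model {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion_model(struct completion_model *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Called by the "child" when it execs or exits. */
static void complete_model(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Called by the "parent": blocks until complete_model() has run. */
static void wait_for_completion_model(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static int child_steps;                    /* work done by the "child" before it completes */

static void *child_thread(void *arg)
{
    struct completion_model *vfork_done = arg;
    child_steps = 3;                       /* the child runs first ... */
    complete_model(vfork_done);            /* ... then signals "exec or exit happened" */
    return NULL;
}

/* Returns the child's progress as seen by the parent right after waiting. */
int vfork_wait_demo(void)
{
    struct completion_model vfork_done;
    pthread_t child;

    child_steps = 0;
    init_completion_model(&vfork_done);
    if (pthread_create(&child, NULL, child_thread, &vfork_done) != 0)
        return -1;
    wait_for_completion_model(&vfork_done); /* parent blocks here, like wait_for_vfork_done() */
    int seen = child_steps;                 /* the write happened before complete_model() */
    pthread_join(child, NULL);
    pthread_mutex_destroy(&vfork_done.lock);
    pthread_cond_destroy(&vfork_done.cond);
    return seen;
}
```

vfork_wait_demo() should return 3: because the parent only continues after complete_model() has run under the same mutex, it always sees the child's work. The while loop around pthread_cond_wait() is needed because condition variables may wake up spuriously.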

(2) copy_process

static struct task_struct *copy_process(unsigned long clone_flags,
                    unsigned long stack_start,
                    unsigned long stack_size,
                    int __user *child_tidptr,
                    struct pid *pid,
                    int trace)
{
    int retval;
    struct task_struct *p;
    ...
    retval = security_task_create(clone_flags);  // security check
    ...
    p = dup_task_struct(current);   // duplicate the PCB: create the child's kernel stack and process descriptor
    ftrace_graph_init_task(p);
    ...

    retval = -EAGAIN;
    // check whether this user's process count has reached its limit
    if (atomic_read(&p->real_cred->user->processes) >=
            task_rlimit(p, RLIMIT_NPROC)) {
        // if so, fail unless the user has the required privileges (not necessarily root)
        if (p->real_cred->user != INIT_USER &&
            !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
            goto bad_fork_free;
    }
    ...
    // check whether the number of threads exceeds max_threads, which depends on the amount of memory
    if (nr_threads >= max_threads)
        goto bad_fork_cleanup_count;

    if (!try_module_get(task_thread_info(p)->exec_domain->module))
        goto bad_fork_cleanup_count;
    ...
    spin_lock_init(&p->alloc_lock);          // initialize the spinlock
    init_sigpending(&p->pending);            // initialize the pending signals
    posix_cpu_timers_init(p);                // initialize the CPU timers
    ...
    retval = sched_fork(clone_flags, p);  // initialize the scheduler data for the new process, set its state to TASK_RUNNING, and disable kernel preemption
    ...
    // copy all the process information (each copy_* call copies or shares one resource according to clone_flags)
    shm_init_task(p);
    retval = copy_semundo(clone_flags, p);
    ...
    retval = copy_files(clone_flags, p);
    ...
    retval = copy_fs(clone_flags, p);
    ...
    retval = copy_sighand(clone_flags, p);
    ...
    retval = copy_signal(clone_flags, p);
    ...
    retval = copy_mm(clone_flags, p);
    ...
    retval = copy_namespaces(clone_flags, p);
    ...
    retval = copy_io(clone_flags, p);
    ...
    retval = copy_thread(clone_flags, stack_start, stack_size, p);  // initialize the child's kernel stack
    ...
    // if the pid pointer passed in is not the address of the global init_struct_pid, allocate a new pid for the child
    if (pid != &init_struct_pid) {
        retval = -ENOMEM;
        pid = alloc_pid(p->nsproxy->pid_ns_for_children);
        if (!pid)
            goto bad_fork_cleanup_io;
    }

    ...
    p->pid = pid_nr(pid);    // get the numeric pid from struct pid
    // if clone_flags contains CLONE_THREAD, the child is in the same thread group as the parent
    if (clone_flags & CLONE_THREAD) {
        p->exit_signal = -1;
        p->group_leader = current->group_leader; // the child shares the parent's thread-group leader
        p->tgid = current->tgid;       // the child inherits the parent's tgid
    } else {
        if (clone_flags & CLONE_PARENT)
            p->exit_signal = current->group_leader->exit_signal;
        else
            p->exit_signal = (clone_flags & CSIGNAL);
        p->group_leader = p;          // the child is its own group leader
        p->tgid = p->pid;             // its tgid is its own pid
    }

    ...
    
    if (likely(p->pid)) {
        ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);

        init_task_pid(p, PIDTYPE_PID, pid);
        if (thread_group_leader(p)) {
            ...
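            // add the child to the hash lists of its process group and session
            attach_pid(p, PIDTYPE_PGID);
            attach_pid(p, PIDTYPE_SID);
            __this_cpu_inc(process_counts);
        } else {
            ...
        }
        attach_pid(p, PIDTYPE_PID);
        nr_threads++;     // increment the number of processes (tasks) in the system
    }
    ...
    return p;             // return the new child's process descriptor p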
    ...
}
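The CLONE_THREAD branch above means that threads in one group share a tgid while each keeps its own pid. From user space, getpid() returns the tgid and the gettid system call returns the per-task pid. glibc's pthread_create() passes CLONE_THREAD to clone, so the following sketch should show the same getpid() but different thread IDs in two threads. The struct ids type and the function names are illustrative; the code calls syscall(SYS_gettid) because the gettid() wrapper only exists in newer glibc releases.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;   /* getpid(): the thread group id (tgid) */
    pid_t tid;   /* gettid: the per-task pid (task_struct->pid) */
};

static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = current_tid();
    return NULL;
}

/* Records the ids of the calling thread in *main_ids and of a newly
   created thread in *thread_ids. Returns 0 on success, -1 on error. */
int thread_group_demo(struct ids *main_ids, struct ids *thread_ids)
{
    pthread_t t;
    record_ids(main_ids);
    if (pthread_create(&t, NULL, record_ids, thread_ids) != 0)
        return -1;
    pthread_join(t, NULL);
    return 0;
}
```

Both threads report the same pid (their common tgid) but different tids, matching p->tgid = current->tgid for the new thread while p->pid comes from its own freshly allocated struct pid.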

copy_process mainly calls dup_task_struct to copy the current task_struct, performs checks and initialization, sets the process state to TASK_RUNNING, copies all the process information, calls copy_thread to initialize the child's kernel stack, and sets the child's pid.
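Most of the copy_* calls in copy_process follow one pattern: if the relevant clone flag is set, the child shares the parent's structure and its reference count goes up; otherwise the child gets its own duplicate (for example CLONE_FILES for copy_files, CLONE_FS for copy_fs, CLONE_VM for copy_mm). The sketch below models that pattern on a made-up struct resource_model; it is an illustration, not kernel code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>     /* CLONE_FILES and the other CLONE_* flags */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-process kernel resource such as files_struct. */
struct resource_model {
    int refcount;      /* how many tasks point at this object */
    int data[4];       /* the actual contents */
};

/* Mimics the shape of copy_files()/copy_fs()/copy_mm(): share when
   share_flag is set in clone_flags, otherwise duplicate.
   Returns NULL if the duplicate cannot be allocated. */
struct resource_model *copy_resource_model(unsigned long clone_flags,
                                           unsigned long share_flag,
                                           struct resource_model *parent)
{
    if (clone_flags & share_flag) {
        parent->refcount++;          /* share: same object, one more user */
        return parent;
    }
    struct resource_model *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, parent->data, sizeof copy->data);  /* private duplicate */
    copy->refcount = 1;
    return copy;
}
```

Sharing is cheap (one counter increment) and makes later changes visible to both tasks; copying costs an allocation but isolates the child. That is the choice clone_flags lets the caller make for each resource.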

(3) dup_task_struct

static struct task_struct *dup_task_struct(struct task_struct *orig)
{
    struct task_struct *tsk;
    struct thread_info *ti;
    int node = tsk_fork_get_node(orig);
    int err;
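    tsk = alloc_task_struct_node(node);    // allocate the child's process descriptor
    ...
    ti = alloc_thread_info_node(tsk, node); // actually allocates two pages: one part holds thread_info, the rest is the kernel stack
    ...
    err = arch_dup_task_struct(tsk, orig);  // copy the parent's task_struct
    ...
    tsk->stack = ti;                  // store the base of this area in the new task's stack field

    setup_thread_stack(tsk, orig);    // initialize the child's thread_info: copy the parent's, then point its task field at the child's descriptor
    ...
    return tsk;               // return the new process descriptor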
    ...
}
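Because thread_info and the kernel stack share one THREAD_SIZE-aligned area, the kernel can find thread_info from any stack address by clearing the low bits of the address. The sketch below shows only that arithmetic, assuming the classic 32-bit x86 layout (THREAD_SIZE of 8 KB, thread_info at the lowest address, stack growing down from the top). Newer kernels locate thread_info differently, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: classic 32-bit x86 layout, THREAD_SIZE = 8 KB (two 4 KB pages). */
#define THREAD_SIZE_MODEL 8192u

/* Given any address inside the kernel stack area, recover the start of the
   area, which is where thread_info lives (the idea behind current_thread_info()). */
uintptr_t thread_info_from_sp(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE_MODEL - 1);
}

/* Initial stack pointer of an area: its top, since the stack grows downward. */
uintptr_t stack_top(uintptr_t area_base)
{
    return area_base + THREAD_SIZE_MODEL;
}
```

For an area starting at the 8 KB-aligned address 0xc1234000, any stack pointer between 0xc1234000 and 0xc1235fff maps back to 0xc1234000. This only works because alloc_thread_info_node() returns an area aligned to its own size.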

(4) copy_thread
dup_task_struct only creates the kernel stack for the child process; copy_thread is what actually fills it in.

int copy_thread(unsigned long clone_flags, unsigned long sp,
    unsigned long arg, struct task_struct *p)
{
    struct pt_regs *childregs = task_pt_regs(p);
    struct task_struct *tsk;
    int err;

    p->thread.sp = (unsigned long) childregs;
    p->thread.sp0 = (unsigned long) (childregs+1);
    memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));

    if (unlikely(p->flags & PF_KTHREAD)) {
        /* kernel thread */
        memset(childregs, 0, sizeof(struct pt_regs));
        p->thread.ip = (unsigned long) ret_from_kernel_thread;  // a kernel thread starts executing at ret_from_kernel_thread
        task_user_gs(p) = __KERNEL_STACK_CANARY;
        childregs->ds = __USER_DS;
        childregs->es = __USER_DS;
        childregs->fs = __KERNEL_PERCPU;
        childregs->bx = sp; /* function */
        childregs->bp = arg;
        childregs->orig_ax = -1;
        childregs->cs = __KERNEL_CS | get_kernel_rpl();
        childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
        p->thread.io_bitmap_ptr = NULL;
        return 0;
    }

    *childregs = *current_pt_regs();  // copy the kernel stack: the parent's registers that SAVE_ALL pushed on system call entry
    childregs->ax = 0;                // set the child's eax to 0, which is why fork returns 0 in the child
    ...
    p->thread.ip = (unsigned long) ret_from_fork;  // ip points at ret_from_fork: the child starts executing there
    task_user_gs(p) = get_user_gs(current_pt_regs());
    ...
    return err;
}
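copy_thread makes the child resume at ret_from_fork with the parent's saved user registers, except that the return-value register is forced to 0. The toy model below captures just that step. struct regs_model, struct thread_model, and RET_FROM_FORK_MODEL are hypothetical and far smaller than the real pt_regs and thread_struct.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for struct pt_regs. */
struct regs_model {
    long ax;   /* return-value register (eax on 32-bit x86) */
    long ip;   /* user-mode instruction pointer saved at syscall entry */
    long sp;   /* user-mode stack pointer saved at syscall entry */
};

/* Hypothetical stand-in for the child's saved kernel state. */
struct thread_model {
    struct regs_model regs;   /* plays the role of *childregs */
    long kernel_ip;           /* plays the role of p->thread.ip */
};

#define RET_FROM_FORK_MODEL 0x1000L   /* placeholder for the address of ret_from_fork */

/* Mirrors the user-process path of copy_thread(): copy the parent's saved
   registers, force the return value to 0, and resume at ret_from_fork. */
void copy_thread_model(const struct regs_model *parent_regs,
                       struct thread_model *child)
{
    child->regs = *parent_regs;           /* *childregs = *current_pt_regs(); */
    child->regs.ax = 0;                   /* fork() returns 0 in the child */
    child->kernel_ip = RET_FROM_FORK_MODEL;
}
```

When the child is first scheduled it starts in ret_from_fork and returns to user mode with these registers, so it continues at the same user instruction as the parent but sees 0 as fork's return value, while the parent receives the child's pid from do_fork.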

II. Textbook notes

1. Timers and Time Management

  • The kernel computes and manages time with the help of the hardware.
  • The system timer fires clock interrupts at a fixed, programmed frequency called the tick rate (HZ).
  • The interval between two consecutive clock interrupts is called a tick; it is the reciprocal of the tick rate (1/HZ seconds).
  • Wall time (the actual time of day) and system uptime (the time elapsed since boot) are both computed from the tick interval.
  • The global variable jiffies records the total number of ticks since the system booted. The kernel initializes it to 0 at boot, and the clock interrupt handler increments it on every tick. With HZ interrupts per second, jiffies increases by HZ every second, so the system uptime in seconds is jiffies / HZ.
  • Macros for comparing tick values (they stay correct even when jiffies wraps around):
time_after(unknown, known)      // true if unknown is after known
time_before(unknown, known)     // true if unknown is before known
time_after_eq(unknown, known)   // true if unknown is after or equal to known
time_before_eq(unknown, known)  // true if unknown is before or equal to known
  • The real-time clock (RTC) is the device used to store the system time persistently.
  • Timers are the foundation for managing the passage of time in the kernel. Using one only takes some initialization: set an expiry time and the function to run when it expires, then activate the timer. Timers do not repeat: after a timer expires and its function has run, it is removed automatically. Timers are represented by the following structure:
struct timer_list {
    struct list_head entry;           // entry in the list of timers
    unsigned long expires;            // expiry time, in jiffies
    struct tvec_base *base;           // internal field used by the timer code
    void (*function)(unsigned long);  // the timer handler
    ...
};
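The jiffies / HZ relationship and the wraparound-safe comparison can be checked in user space. The kernel's time_after() family in <linux/jiffies.h> subtracts the two values and looks at the sign of the result; the sketch below repeats that arithmetic without the kernel's typecheck(). HZ_MODEL = 100 is an assumed tick rate (real kernels are configured with values such as 100, 250, or 1000), and the *_model names are hypothetical. Converting an out-of-range unsigned long to long is implementation-defined in C; GCC and Clang wrap it modulo 2^N, which the kernel also relies on.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define HZ_MODEL 100UL   /* assumed tick rate for this example */

/* Same arithmetic as the kernel macros: subtract, then look at the sign. */
static bool time_after_model(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static bool time_before_model(unsigned long a, unsigned long b)
{
    return time_after_model(b, a);
}

static bool time_after_eq_model(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* System uptime in whole seconds from a tick count: jiffies / HZ. */
unsigned long uptime_seconds(unsigned long jiffies_now)
{
    return jiffies_now / HZ_MODEL;
}
```

For example, with HZ_MODEL = 100, 360000 ticks is 3600 seconds. If a counter sits 5 ticks below ULONG_MAX and 10 more ticks pass, it wraps around to 4: a plain > comparison then claims the later value is earlier, while time_after_model() still orders the two correctly.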

Timer handler function prototype:

void my_timer_function(unsigned long data);

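add_timer(&my_timer);            // activate the timer

mod_timer(&my_timer, jiffies + new_delay);        // change the timer's expiry time
                                                  // if the timer is not active, mod_timer activates it
                                                  // returns 0 if the timer was inactive when called, 1 otherwise

del_timer(&my_timer);            // deactivate the timer before it expires
                                 // works on both active and inactive timers
                                 // returns 0 if the timer was inactive when called, 1 otherwise
                                 // no need to call it for timers that have already expired: they are removed automatically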
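The timer rules above (a timer runs once and is then removed, mod_timer activates an inactive timer, and both mod_timer and del_timer report whether the timer was active) can be captured in a small simulation. This is a hypothetical user-space model that uses a fixed table instead of the kernel's timer data structures; all *_model names, count_handler, and MAX_TIMERS are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the timer API above, not kernel code. */
struct timer_model {
    unsigned long expires;             /* expiry time in model jiffies */
    void (*function)(unsigned long);   /* handler, same prototype as above */
    unsigned long data;                /* argument passed to the handler */
    bool pending;                      /* true while the timer is active */
};

#define MAX_TIMERS 8
static struct timer_model *active_timers[MAX_TIMERS];  /* the set of active timers */
static unsigned long jiffies_model;                      /* model tick counter */

static void add_timer_model(struct timer_model *t)
{
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (active_timers[i] == NULL) {
            active_timers[i] = t;
            t->pending = true;
            return;
        }
    }
    assert(!"timer table full");       /* limitation of this model only */
}

/* Deactivates t. Returns 1 if it was active, 0 otherwise. */
static int del_timer_model(struct timer_model *t)
{
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (active_timers[i] == t) {
            active_timers[i] = NULL;
            t->pending = false;
            return 1;
        }
    }
    return 0;
}

/* Sets a new expiry time and activates t if needed.
   Returns 1 if t was active, 0 otherwise. */
static int mod_timer_model(struct timer_model *t, unsigned long expires)
{
    int was_active = del_timer_model(t);
    t->expires = expires;
    add_timer_model(t);
    return was_active;
}

/* One clock interrupt: advance the tick count, then run each expired
   timer once and drop it from the table (timers are one-shot). */
static void tick_model(void)
{
    jiffies_model++;
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        struct timer_model *t = active_timers[i];
        if (t != NULL && (long)(jiffies_model - t->expires) >= 0) {  /* time_after_eq */
            active_timers[i] = NULL;   /* detach first, so the handler may re-add it */
            t->pending = false;
            t->function(t->data);
        }
    }
}

/* Example handler with the my_timer_function() prototype. */
static unsigned long fired_count;

static void count_handler(unsigned long data)
{
    fired_count += data;
}
```

A timer armed with mod_timer_model(&t, jiffies_model + 2) fires on the second tick, and further ticks do not run it again; calling del_timer_model() on it afterwards returns 0 because it already removed itself.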
  • Code that delays execution should not hold a lock or have interrupts disabled while it waits.
  • The simplest way to delay is busy waiting; it is only appropriate when the delay is an integer multiple of the tick, or when precision does not matter much.
  • Short delays need millisecond or microsecond precision, for example when waiting briefly for hardware to finish an operation. Because such a delay is shorter than a clock tick, it is achieved by running a loop for a calibrated number of iterations (udelay(), ndelay(), mdelay()) rather than by counting ticks.
  • schedule_timeout() puts the task that needs to be delayed to sleep until the specified time has elapsed, then lets it run again. Code that calls it must run in process context and must not hold a lock.
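set_current_state(state);        // set the task to interruptible (TASK_INTERRUPTIBLE) or uninterruptible (TASK_UNINTERRUPTIBLE) sleep
schedule_timeout(s*HZ);          // after s seconds the delayed task is woken and put back on the run queue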
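The difference between busy waiting and sleeping can be sketched in user space with POSIX clock_gettime() and nanosleep(): the first function spins on the clock much like a kernel loop on time_before(jiffies, timeout), the second gives up the CPU much as set_current_state() plus schedule_timeout() does. The function names are hypothetical, and this is an analogy, not how the kernel implements either delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy waiting: keep the CPU spinning until the delay has passed.
   Returns the elapsed time in nanoseconds. */
long long busy_wait_ns(long long delay_ns)
{
    long long start = now_ns();
    while (now_ns() - start < delay_ns)
        ;                                  /* burns CPU the whole time */
    return now_ns() - start;
}

/* Sleeping: give up the CPU until the delay has passed.
   Returns the elapsed time in nanoseconds. */
long long sleep_wait_ns(long long delay_ns)
{
    long long start = now_ns();
    struct timespec req = { .tv_sec = (time_t)(delay_ns / 1000000000LL),
                            .tv_nsec = (long)(delay_ns % 1000000000LL) };
    /* If a signal interrupts the sleep, nanosleep() stores the remaining
       time in req, so simply sleep again for the rest. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
    return now_ns() - start;
}
```

Both calls should take at least the requested delay, but only busy_wait_ns() keeps a CPU busy the whole time, which is why busy waiting is reserved for short or imprecise delays.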


Origin www.cnblogs.com/fungi/p/11784476.html