6.S081——Lab3——page table

0. Overview

This is the third lab of 6.S081 Fall 2021, covering page tables and virtual memory. We have already read the Xv6 kernel source code for virtual memory management in detail, so now it is time to dive into the lab. It is divided into the following three small tasks:

  • Speed up system calls (easy)
  • Print a page table (easy)
  • Detecting which pages have been accessed (hard)

Let's take a look one by one...

1. Speed up system calls (easy)

The first task is to speed up system calls. The main idea is to map an additional read-only page into the user page table when a process is created, so that certain system calls can read their data directly from that page and return. This reduces the number of crossings between user mode and kernel mode, and with it the overhead of those system calls.

Following the hints in the lab guide, we can complete this task in the steps below.

1. Allocate a physical page for the process in the allocproc function of kernel/proc.c, dedicated to storing the shared information.
We first read allocproc in kernel/proc.c to see what it does, and then, following its style, add the code that allocates the accelerated page. This page will later be mapped into the process page table by the proc_pagetable function.

// Look in the process table for an UNUSED proc.
// If found, initialize state required to run in the kernel,
// and return with p->lock held.
// If there are no free procs, or a memory allocation fails, return 0.
static struct proc*
allocproc(void)
{
  struct proc *p;

  // Walk the process table looking for a process in the UNUSED state
  for(p = proc; p < &proc[NPROC]; p++) {
    // Take the process lock first so the access is safe
    acquire(&p->lock);

    // If the process is UNUSED, jump to found
    if(p->state == UNUSED) {
      goto found;
    } else {
      // Otherwise release the lock and check the next process
      release(&p->lock);
    }
  }
  return 0;

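// The code below initializes the process
found:
  // Assign a PID to the process and mark it USED
  p->pid = allocpid();
  p->state = USED;

  // Allocate a trapframe page.
  // If that fails, free the process and release the lock
  if((p->trapframe = (struct trapframe *)kalloc()) == 0){
    freeproc(p);
    release(&p->lock);
    return 0;
  }

  // <Following the trapframe pattern, allocate a free physical page to hold the shared data>
  // <The process structure must therefore keep a pointer to this page, here p->usyscall>
  // The kernel uses a direct mapping, so the pointer returned by kalloc can be
  // used as the physical address when the page is mapped into the page table
  if((p->usyscall = (struct usyscall *)kalloc()) == 0){
    freeproc(p);
    release(&p->lock);
    return 0;
  }

  // <Store the pid in the struct, i.e. at the very start of this physical page>
  p->usyscall->pid = p->pid;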
  // An empty user page table.
  // Allocate a page-table page for the process and map the trapframe and trampoline pages into it
  // The trampoline is part of the kernel code, so it needs no extra allocation, only a mapping
  // <We will modify this function later so that the accelerated page is mapped as well>
  p->pagetable = proc_pagetable(p);

  // If that call fails, free the process and release the lock
  if(p->pagetable == 0){
    freeproc(p);
    release(&p->lock);
    return 0;
  }

  // Set up new context to start executing at forkret,
  // which returns to user space.
  // The trap return path is involved here; we will look at this behavior
  // in depth when we study that part of the source later
  memset(&p->context, 0, sizeof(p->context));
  p->context.ra = (uint64)forkret;
  p->context.sp = p->kstack + PGSIZE;

  // Return the new process
  return p;
}

Note that in the code above I enclosed every comment related to the lab in angle brackets. As those comments show, the first thing to modify is the definition of the process structure: we add a pointer to the usyscall structure, which in essence points to the starting physical address of the accelerated page. The definition becomes:

// Per-process state
struct proc {
  struct spinlock lock;

  // p->lock must be held when using these:
  enum procstate state;        // Process state
  void *chan;                  // If non-zero, sleeping on chan
  int killed;                  // If non-zero, have been killed
  int xstate;                  // Exit status to be returned to parent's wait
  int pid;                     // Process ID

  // wait_lock must be held when using this:
  struct proc *parent;         // Parent process

  // these are private to the process, so p->lock need not be held.
  uint64 kstack;               // Virtual address of kernel stack
  uint64 sz;                   // Size of process memory (bytes)
  pagetable_t pagetable;       // User page table
  struct trapframe *trapframe; // data page for trampoline.S
  struct usyscall *usyscall;   // <added: pointer to the accelerated page>
  struct context context;      // swtch() here to run process
  struct file *ofile[NOFILE];  // Open files
  struct inode *cwd;           // Current directory
  char name[16];               // Process name (debugging)
};

Then, in allocproc, we allocate a physical page for this pointer and store the pid in that page. That code is already shown in full above, so I won't repeat it here.
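For reference, here is where that page ends up in the user address space. The sketch below reproduces the layout constants and the usyscall struct as I understand them from the Fall 2021 kernel/memlayout.h, rewritten with standard C types so it compiles on an ordinary host. The helper usyscall_va is a hypothetical name of mine, not part of xv6.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE     4096ULL
// Sv39 has 39 bits of virtual address; xv6 uses only the lower half.
#define MAXVA      (1ULL << (9 + 9 + 9 + 12 - 1))
#define TRAMPOLINE (MAXVA - PGSIZE)       // highest page of the address space
#define TRAPFRAME  (TRAMPOLINE - PGSIZE)  // one page below the trampoline
#define USYSCALL   (TRAPFRAME - PGSIZE)   // one page below the trapframe

// Data the kernel shares read-only with user space.
struct usyscall {
  int pid;  // process ID, filled in by allocproc
};

// Hypothetical helper (not in xv6): virtual address of the shared page.
uint64_t
usyscall_va(void)
{
  assert(USYSCALL % PGSIZE == 0);  // a mapping must start on a page boundary
  return USYSCALL;
}
```

With these definitions the shared page sits at virtual address 0x3fffffd000, directly under the trapframe at 0x3fffffe000.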

2. Add code to the freeproc function that releases this memory, which is also needed for error handling.
Again, first read the implementation of freeproc, then add the code that frees the usyscall page:

// free a proc structure and the data hanging from it,
// including user pages.
// p->lock must be held.
static void
freeproc(struct proc *p)
{
  // Free the trapframe page. It is freed separately because it sits at the
  // top of the address space, apart from the contiguous region used below it
  if(p->trapframe)
    kfree((void*)p->trapframe);

  // Clear the trapframe pointer once the page is freed
  p->trapframe = 0;

  // <Following the trapframe code, free the p->usyscall page and clear the pointer>
  if(p->usyscall)
    kfree((void*)p->usyscall);
  p->usyscall = 0;

  // Free the page table
  // proc_freepagetable calls uvmunmap to remove the trampoline and trapframe mappings
  // and finally calls uvmfree to free the memory and the page table
  // <We will modify that function below>
  if(p->pagetable)
    proc_freepagetable(p->pagetable, p->sz);

  // Clear the page table and all other fields to mark the process as fully freed
  p->pagetable = 0;
  p->sz = 0;
  p->pid = 0;
  p->parent = 0;
  p->name[0] = 0;
  p->chan = 0;
  p->killed = 0;
  p->xstate = 0;
  p->state = UNUSED;
}

However, this only frees the physical memory of the usyscall page. The page's mapping must also be removed in the proc_freepagetable function before the release is complete; otherwise freewalk, which uvmfree calls, panics when it finds a leaf mapping still in place. So modify proc_freepagetable as follows:

// Free a process's page table, and free the
// physical memory it refers to.
void
proc_freepagetable(pagetable_t pagetable, uint64 sz)
{
  // Remove the TRAMPOLINE and TRAPFRAME mappings
  // They are handled separately because they lie outside the contiguous address space,
  // which uvmfree takes care of in one go
  uvmunmap(pagetable, TRAMPOLINE, 1, 0);
  uvmunmap(pagetable, TRAPFRAME, 1, 0);

  // <Remove the mapping of the USYSCALL page>
  uvmunmap(pagetable, USYSCALL, 1, 0);
  uvmfree(pagetable, sz);
}

3. Finally, map this page in the proc_pagetable function of kernel/proc.c, with its permissions set to read-only for user mode (PTE_U | PTE_R).

Let's read the implementation of proc_pagetable first, and then map this "accelerated page" into the page table. The code is as follows:

// Create a user page table for a given process,
// with no user memory, but with trampoline pages.
pagetable_t
proc_pagetable(struct proc *p)
{
  pagetable_t pagetable;

  // An empty page table.
  // uvmcreate returns an empty page table
  // (explained in detail in post (2) of the full-analysis series)
  pagetable = uvmcreate();
  if(pagetable == 0)
    return 0;

  // map the trampoline code (for system call return)
  // at the highest user virtual address.
  // only the supervisor uses it, on the way
  // to/from user space, so not PTE_U.
  // mappages is explained in post (1) of the series
  // uvmfree is explained in post (2) of the series
  // On failure, call uvmfree to tear down the mappings and reclaim the page table
  // Note that sz is 0 here: no physical memory needs to be freed at this step
  if(mappages(pagetable, TRAMPOLINE, PGSIZE,
              (uint64)trampoline, PTE_R | PTE_X) < 0){
    uvmfree(pagetable, 0);
    return 0;
  }

  // map the trapframe just below TRAMPOLINE, for trampoline.S.
  // uvmunmap is explained in post (2) of the series
  // On failure, first remove the TRAMPOLINE mapping, then call uvmfree
  // to reclaim the page table
  if(mappages(pagetable, TRAPFRAME, PGSIZE,
              (uint64)(p->trapframe), PTE_R | PTE_W) < 0){
    uvmunmap(pagetable, TRAMPOLINE, 1, 0);
    uvmfree(pagetable, 0);
    return 0;
  }

  // <Following the same pattern, map the accelerated page into the page table>
  // Use mappages to map the page
  // On failure, remove the trampoline and trapframe mappings,
  // free the page table memory and return a null pointer
  if(mappages(pagetable, USYSCALL, PGSIZE,
              (uint64)(p->usyscall), PTE_R | PTE_U) < 0){
    uvmunmap(pagetable, TRAMPOLINE, 1, 0);
    uvmunmap(pagetable, TRAPFRAME, 1, 0);
    uvmfree(pagetable, 0);
    return 0;
  }
  // <End of added code>

  return pagetable;
}

In fact there is not much to do in this task, but it really tests your understanding of page tables and the virtual memory mechanism. Let's go over it once more.

First, allocproc calls kalloc to allocate a physical page dedicated to information that can be read directly without entering the kernel. kalloc returns a pointer. Although it is a virtual address, the kernel page table maps it directly, so we can also use this address as a physical address. We then put the data we want to access quickly (the pid) into this page and keep a pointer to the page in the process structure. Note that up to this point the page merely exists: the process has not yet mapped it into its own address space, so the page cannot be reached through the process's page table.

Next, allocproc calls proc_pagetable, which creates the process's page table and maps the special pages (trampoline, trapframe and our accelerated page) into it. These pages must already be allocated, that is, they must already exist in physical memory; here we only establish the mappings. The trampoline is part of the kernel code and is always in memory. The trapframe and our accelerated page were allocated earlier in allocproc, so they exist too, and proc_pagetable maps them into the process page table one by one with mappages.

Once these steps are complete, a process in user mode can access the USYSCALL virtual address directly. The multi-level page table translates that address to the accelerated page we allocated, and the desired data can be read without entering the kernel.
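From the user side, the read boils down to a plain pointer dereference, which is what the lab's ugetpid() does with the fixed USYSCALL address. The host-side sketch below mimics that idea. An ordinary program cannot map a page at USYSCALL, so a page-sized static object stands in for the shared page, and the names shared_page, fake_kernel_publish and fast_getpid are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>

#define PGSIZE 4096

struct usyscall {
  int pid;
};

// One page-sized, page-aligned object: stand-in for the page from kalloc().
union page {
  struct usyscall u;
  unsigned char bytes[PGSIZE];
};

alignas(PGSIZE) union page shared_page;

// Hypothetical: what the kernel does once, when the process is created.
void
fake_kernel_publish(int pid)
{
  assert(pid > 0);  // xv6 PIDs start at 1
  shared_page.u.pid = pid;
}

// Hypothetical: what user code does on every call. No trap is involved,
// only a load from memory that is already mapped.
int
fast_getpid(void)
{
  const struct usyscall *u = &shared_page.u;
  return u->pid;
}
```

In real xv6 the write happens through the kernel's direct-mapped address and the read through the user mapping at USYSCALL; both refer to the same physical page, which is the point this model tries to capture.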

That is the principle behind the whole task. The subtler part is error handling, that is, which errors can occur along the way and how each is dealt with. Pay attention to the division of labor among allocproc, proc_pagetable, proc_freepagetable and freeproc, and do not confuse them:

  • allocproc is responsible for allocating physical memory
  • freeproc is responsible for releasing physical memory
  • proc_pagetable is responsible for allocating page table pages and establishing mappings
  • proc_freepagetable is responsible for removing mappings and reclaiming page table pages

It is worth noting that when a mapping fails inside proc_pagetable, the function returns a null pointer to allocproc, which then goes straight into freeproc. Because p->pagetable is null at that point, proc_freepagetable is not executed, so the cleanup that would normally belong to it has to be written into proc_pagetable as compensation. That is why the error handling in proc_pagetable looks so similar to the logic in proc_freepagetable.
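One way to picture this compensation logic is "undo, in reverse order, whatever already succeeded". The following is a small host-side model of that pattern, not xv6 code: mock_map, map_special_pages, the mapped[] array and the fail_at parameter are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { TRAMPOLINE_PG, TRAPFRAME_PG, USYSCALL_PG, NPAGES };

// mapped[i] is true while special page i is present in the mock page table.
bool mapped[NPAGES];

// Mock of mappages(): fails when asked to map page number fail_at.
int
mock_map(int page, int fail_at)
{
  if(page == fail_at)
    return -1;
  mapped[page] = true;
  return 0;
}

// Maps the three special pages in order. If one mapping fails, the pages
// mapped before it are unmapped again, newest first, so the caller never
// sees a half-built table. Pass fail_at = -1 for the no-failure case.
// Returns 0 on success and -1 on failure.
int
map_special_pages(int fail_at)
{
  assert(fail_at >= -1 && fail_at < NPAGES);
  for(int page = 0; page < NPAGES; page++){
    if(mock_map(page, fail_at) < 0){
      while(--page >= 0)
        mapped[page] = false;  // undo an earlier, successful mapping
      return -1;
    }
  }
  return 0;
}
```

proc_pagetable spells the same idea out by hand: each failure branch unmaps exactly the pages mapped before it and then frees the page table.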

Running the test program gives the correct result.

2. Print a page table (easy)

The second task is also simple: print the page table of a process in the specified format. According to the lab guide, this function should look much like freewalk, so the logic is recursive, and we will write it in that style. The trickiest part is probably controlling the print format, so I first defined a raw_vmprint function, which prints the number of indentation markers that matches the page table level. The code is as follows (kernel/vm.c):

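// Modeled on freewalk; recursively prints a page table
void
raw_vmprint(pagetable_t pagetable, int Layer)
{
  // Visit every entry of the page table
  for(int i = 0 ; i < 512 ; ++i){
    pte_t pte = pagetable[i];

    // The PTE points to a lower-level page table
    if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
      // Extract the physical address from the PTE and print the indentation
      // Stripping the flag bits is not enough: the result must also be shifted
      // left by 12 bits to make room for the page offset
      uint64 phaddr = (pte >> 10) << 12;
      // Use a separate counter so that Layer keeps its value
      // for the remaining entries and for the recursive call
      for(int k = 0 ; k < Layer ; ++k)
        printf(".. ");

      // Print this entry, then recurse into the child table one level deeper
      printf("..%d: pte %p pa %p\n", i, pte, phaddr);
      uint64 child = PTE2PA(pte);
      raw_vmprint((pagetable_t)child, Layer + 1);
    }

    // The PTE is a leaf:
    // extract the physical address and print the entry
    else if (pte & PTE_V){
      uint64 phaddr = (pte >> 10) << 12;
      printf(".. .. ..%d: pte %p pa %p\n", i, pte, phaddr);
    }
  }
}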
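The shift pair (pte >> 10) << 12 is easy to get wrong, so here is a tiny host-side check of it. It is only a sketch: pte_to_pa and pte_flags are hypothetical names, the types are standard C instead of xv6's, and the sample PTE value in the usage note is just an illustrative number.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V (1ULL << 0)

// In an Sv39 PTE the low 10 bits are flags and the physical page number
// starts at bit 10. Dropping the flags and shifting left by 12 turns the
// page number into the physical address of the page.
uint64_t
pte_to_pa(uint64_t pte)
{
  assert(pte & PTE_V);  // the address is only meaningful for valid entries
  return (pte >> 10) << 12;
}

// The 10 flag bits (V, R, W, X, U, G, A, D and two software bits).
uint64_t
pte_flags(uint64_t pte)
{
  return pte & 0x3FF;
}
```

For example, a PTE of 0x21fda801 has flags 0x1 (only V set, so it points to a lower-level table) and decodes to physical address 0x87f6a000.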

Then wrap this function in one more layer to form the final vmprint function:

void vmprint(pagetable_t pagetable)
{
  raw_vmprint(pagetable, 0);
}

Don't forget to add the prototype of vmprint to kernel/defs.h, otherwise other files cannot call it:

// kernel/defs.h:173
void            vmprint(pagetable_t);   // <added: prototype of vmprint>

Finally, don't forget to call the vmprint function before the exec function returns:

// kernel/exec.c:119-122
// Print the page table of the first process. The header line could also be
// printed inside vmprint, but I pulled it out to keep vmprint cleaner
  if(p->pid == 1){
    printf("page table %p\n", p->pagetable);
    vmprint(p->pagetable);
  }

That completes this small task, and it is very simple. In the printed output, the virtual addresses (that is, the PTE indices at each level) should match the lab guide, while the physical addresses may vary with the environment. The print test passes as well.
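The indices printed at each level are just slices of the virtual address: 9 bits per level, above a 12-bit page offset. As a quick sanity check, the sketch below extracts them the same way xv6's PX macro does. The function name px is my own, and the example address assumes USYSCALL sits one page below TRAPFRAME.

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT 12     // bits of offset within a page
#define PXMASK  0x1FF  // 9 index bits per level

// Index into the page table at the given level (2 = root, 0 = leaf level).
int
px(int level, uint64_t va)
{
  assert(level >= 0 && level <= 2);
  return (int)((va >> (PGSHIFT + 9 * level)) & PXMASK);
}
```

For the USYSCALL address 0x3fffffd000 this gives 255, 511 and 509 for levels 2, 1 and 0, so the accelerated page should appear under entry 255 of the root table, then 511, then 509, right next to the trapframe (510) and trampoline (511) entries.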

3. Detecting which pages have been accessed (hard)

The last task is to implement a system call that reports which pages have been accessed, which is actually quite simple. Following the lab guide, we first define PTE_A in the header file kernel/riscv.h. The RISC-V privileged manual shows that the A (accessed) flag is bit 6 of the PTE.
In addition, we set a maximum number of pages that can be queried in one call, MAXSCAN, which is also defined in kernel/riscv.h (the value 32 was chosen after reading the pgaccess_test function in user/pgtbltest.c).

#define PTE_V (1L << 0) // valid
#define PTE_R (1L << 1)
#define PTE_W (1L << 2)
#define PTE_X (1L << 3)
#define PTE_U (1L << 4) // 1 -> user can access
#define PTE_A (1L << 6) // <added: the accessed bit>
#define MAXSCAN 32      // <added: maximum number of pages per query>

Then we can start to implement the sys_pgaccess function. According to the instructions in the guide, the implementation of this function is relatively simple. My code implementation is as follows:

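#ifdef LAB_PGTBL
// Declare walk from kernel/vm.c so it can be called here
extern pte_t * walk(pagetable_t, uint64, int);

int
sys_pgaccess(void)
{
  // lab pgtbl: your code here.
  // Kernel-side bitmask that collects the result.
  // MAXSCAN is 32, so 32 bits are enough, and this matches the
  // unsigned int that pgaccess_test passes in
  uint BitMask = 0;

  // Variables that receive the arguments passed from user space
  uint64 StartVA;
  int NumberOfPages;
  uint64 BitMaskVA;

  // First read the number of pages to check
  if(argint(1, &NumberOfPages) < 0)
    return -1;

  // Reject a negative count or one above the per-call maximum
  if(NumberOfPages < 0 || NumberOfPages > MAXSCAN)
    return -1;

  // Read the start address and the user pointer that receives the bitmask
  if(argaddr(0, &StartVA) < 0)
    return -1;
  if(argaddr(2, &BitMaskVA) < 0)
    return -1;

  int i;
  pte_t* pte;

  // Starting at StartVA, check page by page whether PTE_A is set
  // If so, set the matching bit in BitMask and clear PTE_A
  for(i = 0 ; i < NumberOfPages ; StartVA += PGSIZE, ++i){
    // An unmapped page is a bad argument from the user, not a kernel bug,
    // so fail the call instead of panicking
    if((pte = walk(myproc()->pagetable, StartVA, 0)) == 0)
      return -1;
    if(*pte & PTE_A){
      BitMask |= 1U << i;   // set the matching bit; unsigned so i == 31 is safe
      *pte &= ~PTE_A;       // clear PTE_A
    }
  }

  // Finally copy the kernel-side BitMask out to user space
  if(copyout(myproc()->pagetable, BitMaskVA, (char*)&BitMask, sizeof(BitMask)) < 0)
    return -1;
  return 0;
}
#endif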

That is the implementation of the sys_pgaccess system call. The final step is to run the test and check its correctness.
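The core of the system call is the scan-and-clear loop, which can be tried out on a host without any kernel. In the sketch below, an array of fake PTEs replaces the page table walk; scan_accessed and its parameters are hypothetical names, and the flag values are the ones defined in kernel/riscv.h.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V   (1ULL << 0)
#define PTE_A   (1ULL << 6)
#define MAXSCAN 32

// Hypothetical host-side model of the loop in sys_pgaccess. ptes[] stands
// in for the leaf PTEs of npages consecutive pages. Bit i of *mask is set
// if page i was accessed, and the accessed bit of that page is cleared.
// Returns 0 on success, or -1 if npages is out of range.
int
scan_accessed(uint64_t *ptes, int npages, uint32_t *mask)
{
  assert(ptes != 0 && mask != 0);
  if(npages < 0 || npages > MAXSCAN)
    return -1;

  uint32_t result = 0;
  for(int i = 0; i < npages; i++){
    if((ptes[i] & PTE_V) && (ptes[i] & PTE_A)){
      result |= (uint32_t)1 << i;  // unsigned shift, well defined for i == 31
      ptes[i] &= ~PTE_A;           // so the next scan reports only new accesses
    }
  }
  *mask = result;
  return 0;
}
```

Because the accessed bits are cleared as they are read, a second scan with no accesses in between returns an empty mask. That is also why the lab guide warns that debugging with vmprint-style reads is fine, while touching the pages themselves changes the result.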

4. Conclusion

This completes the third lab of 6.S081 Fall 2021, on virtual memory and the page table mechanism. Overall the lab is not very difficult and can be finished quickly with the help of the lab guide. But the study of the kernel code is far from over. The next lab involves the most important part of the operating system, namely the implementation of the interrupt and trap mechanism. I can't wait to pick up the corresponding source code... haha

Origin blog.csdn.net/zzy980511/article/details/130132423