Page Tables
Glossary
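Address Space: the range of addresses that a process (or the kernel) can use.
MMU (Memory Management Unit): the hardware that translates virtual addresses to physical addresses.
SATP register (Supervisor Address Translation and Protection register): controls address translation and protection for virtual memory. It holds the physical page number of the root page-table page (together with the translation mode and an address-space identifier), which tells the hardware which page table to use.
PTE (page table entry): one entry in a page-table page.
PPN (physical page number): the number of a physical page, i.e. its physical address divided by the 4096-byte page size.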
Paging Hardware
Page tables are the most popular mechanism through which the operating system provides each process with its own private address space and memory.
A page table is a data structure maintained by the operating system that stores the mapping from virtual addresses to physical addresses.
Three-Level Page Table
A page table is stored in physical memory as a three-level tree.
The root of the tree is a 4096-byte page-table page that contains 512 PTEs (512 = 2^9, so a 9-bit index selects one of them), which contain the physical addresses for page-table pages in the next level of the tree. Each of those pages contains 512 PTEs for the final level in the tree.
An Sv39 virtual address has 39 significant bits: the low 12 bits are the offset within a 4096-byte page, and the upper 27 bits drive the translation. The paging hardware uses the top 9 bits of those 27 bits to select a PTE in the root page-table page, the middle 9 bits to select a PTE in a page-table page in the next level of the tree, and the bottom 9 bits to select the final PTE.
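To make the 9/9/9/12 split concrete, here is a minimal C sketch that extracts the three indices and the page offset from a virtual address. The PX/PXSHIFT macro shapes follow kernel/riscv.h, but split_va is a hypothetical helper written for illustration, and MAXVA here is the full Sv39 limit (xv6's own MAXVA is one bit smaller, to avoid having to sign-extend addresses with the high bit set).

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT 12                      // 4096-byte pages: 12 offset bits
#define PXMASK  0x1FF                   // 9 bits: 512 PTEs per page-table page
#define PXSHIFT(level) (PGSHIFT + 9 * (level))
#define PX(level, va) ((((uint64_t)(va)) >> PXSHIFT(level)) & PXMASK)
#define MAXVA (1ULL << 39)              // full Sv39 range (xv6 uses one bit less)

// Hypothetical helper: split va into its three 9-bit indices and 12-bit offset.
// idx[2] selects the PTE in the root page-table page, idx[0] the final PTE.
void split_va(uint64_t va, unsigned idx[3], unsigned *offset) {
  assert(va < MAXVA);
  for (int level = 2; level >= 0; level--)
    idx[level] = (unsigned)PX(level, va);
  *offset = (unsigned)(va & ((1u << PGSHIFT) - 1));
}
```

For example, 0x80000000 (xv6's KERNBASE) splits into root index 2, middle index 0, final index 0, and offset 0.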
If any of the three PTEs required to translate an address is not present, the paging hardware raises a page-fault exception, leaving it up to the kernel to handle the exception.
To avoid the cost of loading PTEs from physical memory (walking the three levels costs three extra memory accesses per translation), a RISC-V CPU caches page table entries in a Translation Look-aside Buffer (TLB).
PTE
Each PTE contains flag bits that tell the paging hardware how the associated virtual address is allowed to be used.
- PTE_V (valid) indicates whether the PTE is present: if it is not set, a reference to the page causes an exception (i.e. is not allowed).
- PTE_R (read) controls whether instructions are allowed to read from the page.
- PTE_W (write) controls whether instructions are allowed to write to the page.
- PTE_X (execute) controls whether the CPU may interpret the content of the page as instructions and execute them.
- PTE_U (user) controls whether instructions in user mode are allowed to access the page; if PTE_U is not set, the PTE can be used only in supervisor mode.

Figure 3.2 shows how it all works.
The flags and all other paging-hardware-related structures are defined in kernel/riscv.h.
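A sketch of how these flags sit in a 64-bit PTE, assuming the RISC-V Sv39 layout that kernel/riscv.h also follows: the flags occupy the low 10 bits and the physical page number starts at bit 10. The macros mirror the ones in riscv.h; make_leaf_pte and user_can_write are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

// Flag bits in the low bits of a PTE (same positions as kernel/riscv.h).
#define PTE_V (1L << 0) // valid
#define PTE_R (1L << 1) // readable
#define PTE_W (1L << 2) // writable
#define PTE_X (1L << 3) // executable
#define PTE_U (1L << 4) // accessible from user mode

// The PPN is stored from bit 10 upward; the 12-bit page offset is not stored.
#define PA2PTE(pa)     ((((uint64_t)(pa)) >> 12) << 10)
#define PTE2PA(pte)    (((pte) >> 10) << 12)
#define PTE_FLAGS(pte) ((pte) & 0x3FF)

// Hypothetical helper: build a valid leaf PTE for a page-aligned physical address.
pte_t make_leaf_pte(uint64_t pa, int perm) {
  assert(pa % 4096 == 0);
  return PA2PTE(pa) | (pte_t)perm | PTE_V;
}

// Hypothetical helper: would a user-mode store through this PTE be allowed?
int user_can_write(pte_t pte) {
  return (pte & PTE_V) && (pte & PTE_U) && (pte & PTE_W);
}
```

Note that a PTE without PTE_U refuses user access even when PTE_W is set, which is how kernel-only pages stay protected.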
SATP Register
To tell the hardware to use a page table, the kernel must write the physical address of the root page-table page into the satp register.
Each CPU has its own satp, and a CPU translates all addresses generated by subsequent instructions using the page table pointed to by its own satp. Because satp is per-CPU, different CPUs can run different processes, each with a private address space described by its own page table.
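In Sv39, satp does not hold the root's byte address as-is: its low 44 bits hold the root's physical page number and its top 4 bits hold the translation mode (8 selects Sv39). A minimal sketch modeled on xv6's MAKE_SATP macro; satp_for_root is a hypothetical helper, and actually writing the register requires a privileged csrw instruction that this sketch does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define SATP_SV39 (8ULL << 60)  // MODE field (bits 63:60) = 8 selects Sv39

// satp takes the PPN of the root page-table page, not its byte address.
#define MAKE_SATP(pagetable) (SATP_SV39 | (((uint64_t)(pagetable)) >> 12))

// Hypothetical helper: the value a kernel would write into satp.
uint64_t satp_for_root(uint64_t root_pa) {
  assert(root_pa % 4096 == 0);  // the root must be a whole, aligned page
  return MAKE_SATP(root_pa);
}
```

In xv6, kvminithart writes MAKE_SATP(kernel_pagetable) into satp and flushes stale TLB entries with sfence.vma.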
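Through its virtual address space the kernel can conveniently access all of physical memory (xv6's kernel page table maps physical memory up to PHYSTOP at identical virtual addresses), and by configuring page tables dynamically the kernel can manage these mappings flexibly.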
Code: creating an address space
Most of the xv6 code for manipulating address spaces and page tables resides in vm.c (kernel/vm.c:1).
The central data structure is pagetable_t, which is really a pointer to a RISC-V root page-table page; a pagetable_t may be either the kernel page table, or one of the per-process page tables.
The central functions are walk, which finds the PTE for a virtual address, and mappages, which installs PTEs for new mappings.
- Functions starting with kvm manipulate the kernel page table (kvm stands for kernel virtual memory).
- Functions starting with uvm manipulate a user page table (uvm stands for user virtual memory).
- Other functions are used for both.
copyout and copyin copy data to and from user virtual addresses provided as system call arguments; they are in vm.c because they need to explicitly translate those addresses in order to find the corresponding physical memory.
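Why copyout cannot simply call memmove once: consecutive user virtual pages may be backed by physical pages that are nowhere near each other, so the copy has to be split at page boundaries and each page translated separately. The sketch below follows the shape of xv6's copyout loop, but replaces walkaddr with a hypothetical flat lookup table (toy_walkaddr, user_pages) so that it runs in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define PGROUNDDOWN(a) ((a) & ~((uint64_t)PGSIZE - 1))
#define NPAGES 4   // hypothetical: a tiny user address space of 4 pages

// Hypothetical stand-in for walkaddr(): user virtual page number ->
// the "physical" page backing it (NULL means unmapped).
static char *user_pages[NPAGES];

static char *toy_walkaddr(uint64_t va) {
  uint64_t vpn = va / PGSIZE;
  return vpn < NPAGES ? user_pages[vpn] : NULL;
}

// Copy len bytes from kernel buffer src to user address dstva, one page
// at a time. Returns 0 on success, -1 if any page is unmapped.
int toy_copyout(uint64_t dstva, const char *src, uint64_t len) {
  while (len > 0) {
    uint64_t va0 = PGROUNDDOWN(dstva);
    char *pa0 = toy_walkaddr(va0);
    if (pa0 == NULL)
      return -1;
    uint64_t n = PGSIZE - (dstva - va0);  // bytes left on this page
    if (n > len)
      n = len;
    memmove(pa0 + (dstva - va0), src, n);
    len -= n;
    src += n;
    dstva = va0 + PGSIZE;                 // continue at the next page
  }
  return 0;
}
```

copyin is the mirror image: the same per-page loop, with memmove copying out of the translated page instead of into it.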
Early in the boot sequence, main calls kvminit (kernel/vm.c:54) to create the kernel’s page table using kvmmake (kernel/vm.c:20). This call occurs before xv6 has enabled paging on the RISC-V, so addresses refer directly to physical memory.
Kvmmake first allocates a page of physical memory to hold the root page-table page. Then it calls kvmmap to install the translations that the kernel needs. The translations include the kernel’s instructions and data, physical memory up to PHYSTOP, and memory ranges which are actually devices.
Proc_mapstacks (kernel/proc.c:33) allocates a kernel stack for each process. It calls kvmmap to map each stack at the virtual address generated by KSTACK, which leaves room for the invalid stack-guard pages.
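The KSTACK arithmetic can be sketched as follows. The macro definitions are modeled on kernel/memlayout.h: each process gets a two-page slot below the trampoline page, and only one page of that slot is mapped. kstack_guard is a hypothetical helper that names the unmapped page directly below a stack.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))  // xv6's MAXVA (one bit below Sv39)
#define TRAMPOLINE (MAXVA - PGSIZE)           // highest virtual page
// Two pages per process: the mapped stack page plus an unmapped guard page.
#define KSTACK(p) (TRAMPOLINE - ((uint64_t)(p) + 1) * 2 * PGSIZE)

// Hypothetical helper: start of the never-mapped page just below process p's
// kernel stack. Stacks grow down, so an overflow past KSTACK(p) lands here
// and triggers a page fault instead of silently corrupting memory.
uint64_t kstack_guard(int p) {
  assert(p >= 0);
  return KSTACK(p) - PGSIZE;
}
```

The guard page costs only virtual address space: no physical page is ever allocated for it.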
kvmmap (kernel/vm.c:127) calls mappages (kernel/vm.c:138), which installs mappings into a page table for a range of virtual addresses to a corresponding range of physical addresses. It does this separately for each virtual address in the range, at page intervals. For each virtual address to be mapped, mappages calls walk to find the address of the PTE for that address. It then initializes the PTE to hold the relevant physical page number, the desired permissions (PTE_W, PTE_X, and/or PTE_R), and PTE_V to mark the PTE as valid (kernel/vm.c:153).
walk (kernel/vm.c:81) mimics the RISC-V paging hardware as it looks up the PTE for a virtual address (see Figure 3.2). walk descends the 3-level page table 9 bits at a time. It uses each level’s 9 bits of virtual address to find the PTE of either the next-level page table or the final page (kernel/vm.c:87). If the PTE isn’t valid, then the required page hasn’t yet been allocated; if the alloc argument is set, walk allocates a new page-table page and puts its physical address in the PTE. It returns the address of the PTE in the lowest layer of the tree (kernel/vm.c:97), that is, the leaf PTE rather than one in the top-level page.
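A user-space sketch of walk's descent, modeled on kernel/vm.c. Assumptions: a hypothetical toy_kalloc stands in for kalloc and hands out zeroed, page-aligned host memory, host addresses play the role of physical addresses, and where xv6 would panic on an out-of-range address this version returns NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t pte_t;
typedef pte_t *pagetable_t;   // a page-table page: 512 PTEs

#define PGSIZE 4096
#define PGSHIFT 12
#define PXMASK 0x1FF
#define PXSHIFT(level) (PGSHIFT + 9 * (level))
#define PX(level, va) ((((uint64_t)(va)) >> PXSHIFT(level)) & PXMASK)
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))
#define PTE_V (1L << 0)
#define PA2PTE(pa) ((((uint64_t)(pa)) >> 12) << 10)
#define PTE2PA(pte) (((pte) >> 10) << 12)

// Hypothetical stand-in for kalloc(): a zeroed, page-aligned page.
static void *toy_kalloc(void) {
  void *p = aligned_alloc(PGSIZE, PGSIZE);
  if (p != NULL)
    memset(p, 0, PGSIZE);
  return p;
}

// Return the address of the level-0 (leaf) PTE for va, descending the tree
// 9 bits at a time. With alloc set, missing page-table pages are created.
pte_t *walk(pagetable_t pagetable, uint64_t va, int alloc) {
  if (va >= MAXVA)
    return NULL;
  for (int level = 2; level > 0; level--) {
    pte_t *pte = &pagetable[PX(level, va)];
    if (*pte & PTE_V) {
      pagetable = (pagetable_t)(uintptr_t)PTE2PA(*pte);  // follow the pointer
    } else {
      if (!alloc || (pagetable = toy_kalloc()) == NULL)
        return NULL;
      *pte = PA2PTE((uintptr_t)pagetable) | PTE_V;       // link the new page
    }
  }
  return &pagetable[PX(0, va)];
}
```

Two virtual addresses on neighboring pages share the same upper-level page-table pages, so their leaf PTEs end up next to each other in one level-0 page.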
Each RISC-V CPU caches page table entries in a Translation Look-aside Buffer (TLB), and when xv6 changes a page table, it must tell the CPU to invalidate corresponding cached TLB entries.
Physical memory allocation
The kernel must allocate and free physical memory at run-time for page tables, user memory, kernel stacks, and pipe buffers.
xv6 uses the physical memory between the end of the kernel and PHYSTOP for run-time allocation. It allocates and frees whole 4096-byte pages at a time. It keeps track of which pages are free by threading a linked list (the free list) through the pages themselves. Allocation consists of removing a page from the linked list; freeing consists of adding the freed page to the list.
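The free-list idea fits in a few lines of C. struct run and the junk-filling loosely follow xv6's kernel/kalloc.c, but the static arena (standing in for the memory between the end of the kernel and PHYSTOP) and the toy_ names are hypothetical, and the spinlock that protects the real list is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define NPAGES 8   // hypothetical: 8 pages of allocatable "physical memory"

// Each free page begins with one of these, so the list costs no extra memory.
struct run {
  struct run *next;
};

// Hypothetical stand-in for the memory between the kernel's end and PHYSTOP.
static _Alignas(PGSIZE) char arena[NPAGES * PGSIZE];
static struct run *freelist;

// Free one page: push it onto the head of the free list.
void toy_kfree(void *pa) {
  assert((uintptr_t)pa % PGSIZE == 0);
  memset(pa, 1, PGSIZE);   // junk-fill to catch dangling references
  struct run *r = pa;
  r->next = freelist;
  freelist = r;
}

// Allocate one page: pop the head of the free list (NULL if empty).
void *toy_kalloc(void) {
  struct run *r = freelist;
  if (r != NULL) {
    freelist = r->next;
    memset(r, 5, PGSIZE);  // junk-fill so callers cannot rely on old contents
  }
  return r;
}

// Put every page of the arena on the free list, as kinit/freerange do.
void toy_kinit(void) {
  for (size_t i = 0; i < NPAGES; i++)
    toy_kfree(arena + i * PGSIZE);
}
```

Because both operations touch only the head of the list, allocation and freeing are O(1), and the most recently freed page is the next one handed out.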
Process address space
Each process has a separate page table, and when xv6 switches between processes, it also changes page tables.
When a process asks xv6 for more user memory, xv6 first uses kalloc to allocate physical pages. It then adds PTEs to the process’s page table that point to the new physical pages. Xv6 sets the PTE_W, PTE_X, PTE_R, PTE_U, and PTE_V flags in these PTEs. Most processes do not use the entire user address space; xv6 leaves PTE_V clear in unused PTEs.
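Growing a process is mostly page-rounding arithmetic. This sketch, loosely modeled on the loop in xv6's uvmalloc, counts how many new pages a growth from oldsz to newsz bytes needs and builds the leaf PTE with the flags listed above. Both helpers are hypothetical, and some newer xv6 releases pass narrower permissions instead of always setting PTE_W and PTE_X.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096
#define PGROUNDUP(sz) (((sz) + PGSIZE - 1) & ~(uint64_t)(PGSIZE - 1))
#define PTE_V (1L << 0)
#define PTE_R (1L << 1)
#define PTE_W (1L << 2)
#define PTE_X (1L << 3)
#define PTE_U (1L << 4)
#define PA2PTE(pa) ((((uint64_t)(pa)) >> 12) << 10)

// Hypothetical helper: pages to kalloc and map when growing oldsz -> newsz.
// The page holding byte oldsz-1 is already mapped, so new pages start at
// PGROUNDUP(oldsz).
uint64_t pages_to_add(uint64_t oldsz, uint64_t newsz) {
  if (newsz <= oldsz)
    return 0;
  return (PGROUNDUP(newsz) - PGROUNDUP(oldsz)) / PGSIZE;
}

// Hypothetical helper: the user PTE for one freshly allocated page.
uint64_t user_pte(uint64_t pa) {
  assert(pa % PGSIZE == 0);
  return PA2PTE(pa) | PTE_R | PTE_W | PTE_X | PTE_U | PTE_V;
}
```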
We see here a few nice examples of the use of page tables.
- First, different processes’ page tables translate user addresses to different pages of physical memory, so that each process has private user memory.
- Second, each process sees its memory as having contiguous virtual addresses starting at zero, while the process’s physical memory can be non-contiguous.
- Third, the kernel maps a page with trampoline code at the top of the user address space, thus a single page of physical memory shows up in all address spaces. The trampoline code performs the switch from user mode into the kernel. (Note: the values it needs, such as the kernel page table, the kernel stack pointer (sp), and other pointers, are stored in the trapframe.)
The stack is a single page, and is shown with the initial contents as created by exec. Strings containing the command-line arguments, as well as an array of pointers to them, are at the very top of the stack. Just under that are values that allow a program to start at main as if the function main(argc, argv) had just been called.
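A sketch of how exec lays out the arguments on the new stack page. It follows the order described above (strings at the very top, then the pointer array, with the 16-byte alignment RISC-V requires for sp), but the page is modeled by a hypothetical struct toy_stack in ordinary memory. In xv6 on RISC-V, argc and the final sp (which points at the argv array) then reach main in registers a0 and a1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define MAXARG 32   // xv6's limit on the number of arguments

// Hypothetical model of the one-page user stack: stackbase is the user
// virtual address of the page's lowest byte, page[] holds its contents.
struct toy_stack {
  uint64_t stackbase;
  char page[PGSIZE];
};

// Copy n bytes to user address va inside the toy stack page.
static void put(struct toy_stack *s, uint64_t va, const void *src, size_t n) {
  memcpy(&s->page[va - s->stackbase], src, n);
}

// Push argv the way exec does. Returns argc and stores the final sp (the
// user address of the argv array) in *sp_out, or returns -1 if it won't fit.
int toy_push_args(struct toy_stack *s, char *const argv[], uint64_t *sp_out) {
  uint64_t sp = s->stackbase + PGSIZE;    // the stack grows down from the top
  uint64_t ustack[MAXARG + 1];
  int argc;
  for (argc = 0; argv[argc] != NULL; argc++) {
    if (argc >= MAXARG)
      return -1;
    size_t len = strlen(argv[argc]) + 1;  // include the terminating NUL
    sp -= len;
    sp -= sp % 16;                        // keep sp 16-byte aligned
    if (sp < s->stackbase)
      return -1;
    put(s, sp, argv[argc], len);
    ustack[argc] = sp;                    // user address of this string
  }
  ustack[argc] = 0;                       // argv[argc] is a null pointer
  sp -= (argc + 1) * sizeof(uint64_t);
  sp -= sp % 16;
  if (sp < s->stackbase)
    return -1;
  put(s, sp, ustack, (argc + 1) * sizeof(uint64_t));
  *sp_out = sp;
  return argc;
}
```

With stackbase 0x3000 and the arguments "echo" and "hi", the strings land at 0x3ff0 and 0x3fe0 and the returned sp is 0x3fc0.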
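Some of the I/O devices mapped into the kernel address space:
- PLIC (Platform-Level Interrupt Controller): the interrupt controller.
- CLINT (Core Local Interruptor): also part of the interrupt machinery. Because many devices can raise interrupts, an interrupt controller is needed to route those interrupts to the appropriate handlers.
- UART0 (Universal Asynchronous Receiver/Transmitter): handles interaction with the console and the display.
- VIRTIO disk: handles interaction with the disk.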
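The first thing to notice is that some pages sit very high in the virtual address space; the kernel stack, for example, is mapped near the top. The reason is that below it there is an unmapped guard page whose PTE does not have the Valid flag set. If the kernel stack overflows, it spills into the guard page, and because that PTE is not valid, a page fault is raised immediately. That outcome is better than the data corruption an out-of-bounds write would otherwise cause: the kernel panics at once, and you know the kernel stack has gone wrong. At the same time, we don't want to waste physical memory on guard pages, so a guard page is not backed by any physical memory; it only occupies a range of high virtual addresses.
In addition, each kernel stack is mapped twice: once at a high virtual address, and once more through the direct mapping of kernel data below PHYSTOP. In practice the kernel uses the high mapping, because the guard page makes it safer.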
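This is one of the many interesting things you can do with page tables. You can map two virtual addresses to the same physical address, and you can leave a virtual address unmapped. A mapping can be one-to-one, many-to-one (several virtual addresses sharing one physical page), or, across different page tables, one-to-many (the same virtual address naming different physical pages in different processes). XV6 uses tricks like this in at least one or two places; the kernel stack and its guard page are one example of an interesting page-table trick in XV6.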
Created: January 20, 2024