Implementing Copy-on-Write Fork
接下来我们需要完成 JOS 系统中用户进程创建子进程的 Fork()函数,其中包括:
- static void pgfault(struct UTrapframe *utf)
- static int duppage(envid_t envid, unsigned pn)
- envid_t fork(void)
下面对这些函数进行说明:
static void pgfault(struct UTrapframe *utf)
具体的 page fault 处理函数,根据情况分配新页或者进行其他工作。当出现页错误时, 主要是对标为可写的或者COW的页面分配新的页面,复制旧页的数据到新页并映射到旧页 的地址处。
static int duppage(envid_t envid, unsigned pn)
将父进程的页表空间映射到子进程中,即共享数据,并都标记为 COW,为了以后任一 进程写数据时产生页错误,为其分配新的一页。 envid_t fork(void) 这里的 fork 是创建新进程的总入口,使用 env_alloc()创建一个新的进程,然后扫描父 进程的整个地址空间将其映射到子进程相关的页表中,对于父进程,返回子进程的进程号, 对于子进程,返回 0。fork 的流程如下:
首先调用 set_pgfault_handler 函数对 pgfault 处理函数进行注册。
调用 sys_exofork 创建一个新的进程。
映射可写或者 COW 的页都为 COW 的页。
为子进程分配 exception stack。
为子进程设置用户级的页错误处理句柄。//此处为容易遗忘的地方
- 标记子进程为 runnable。 在完成以上函数之后,我们可以用user/forktree 来测试我们的 fork()函数,每一个子 进程又创建子进程,直到三层结束......
Exercise 12. Implement fork, duppage and pgfault in lib/fork.c.
Test your code with the forktree program. It should produce the following messages, with interspersed 'new env', 'free env', and 'exiting gracefully' messages. The messages may not appear in this order, and the environment IDs may be different.
1000: I am '' 1001: I am '0' 2000: I am '00' 2001: I am '000' 1002: I am '1' 3000: I am '11' 3001: I am '10' 4000: I am '100' 1003: I am '01' 5000: I am '010' 4001: I am '011' 2002: I am '110' 1004: I am '001' 1005: I am '111' 1006: I am '101'
static void
pgfault(struct UTrapframe *utf)
{
void *addr = (void *) utf->utf_fault_va;
uint32_t err = utf->utf_err;
int r;
if (!(
(err & FEC_WR) && (uvpd[PDX(addr)] & PTE_P) &&
(uvpt[PGNUM(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_COW)))
panic("not copy-on-write");
addr = ROUNDDOWN(addr, PGSIZE);
if (sys_page_alloc(0, PFTEMP, PTE_W|PTE_U|PTE_P) < 0)
panic("sys_page_alloc");
memcpy(PFTEMP, addr, PGSIZE);
if (sys_page_map(0, PFTEMP, 0, addr, PTE_W|PTE_U|PTE_P) < 0)
panic("sys_page_map");
if (sys_page_unmap(0, PFTEMP) < 0)
panic("sys_page_unmap");
return;
}
static int
duppage(envid_t envid, unsigned pn)
{
int r;
// LAB 4: Your code here.
void *addr = (void*) (pn*PGSIZE);
if ((uvpt[pn] & PTE_W) || (uvpt[pn] & PTE_COW)) {
if (sys_page_map(0, addr, envid, addr, PTE_COW|PTE_U|PTE_P) < 0)
panic("2");
if (sys_page_map(0, addr, 0, addr, PTE_COW|PTE_U|PTE_P) < 0)
panic("3");
} else sys_page_map(0, addr, envid, addr, PTE_U|PTE_P);
return 0;
panic("duppage not implemented");
}
envid_t
fork(void)
{
set_pgfault_handler(pgfault);
envid_t envid;
uint32_t addr;
envid = sys_exofork();
if (envid == 0) {
// panic("child");
thisenv = &envs[ENVX(sys_getenvid())];
return 0;
}
// cprintf("sys_exofork: %x\n", envid);
if (envid < 0)
panic("sys_exofork: %e", envid);
for (addr = 0; addr < USTACKTOP; addr += PGSIZE)
if ((uvpd[PDX(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_P)
&& (uvpt[PGNUM(addr)] & PTE_U)) {
duppage(envid, PGNUM(addr));
}
if (sys_page_alloc(envid, (void *)(UXSTACKTOP-PGSIZE), PTE_U|PTE_W|PTE_P) < 0)
panic("1");
extern void _pgfault_upcall();
sys_env_set_pgfault_upcall(envid, _pgfault_upcall);
if (sys_env_set_status(envid, ENV_RUNNABLE) < 0)
panic("sys_env_set_status");
return envid;
panic("fork not implemented");
}
Challenge! Implement a shared-memory fork() called sfork(). This version should have the parent and child share all their memory pages (so writes in one environment appear in the other) except for pages in the stack area, which should be treated in the usual copy-on-write manner. Modify user/forktree.c to use sfork() instead of regular fork(). Also, once you have finished implementing IPC in part C, use your sfork() to run user/pingpongs. You will have to find a new way to provide the functionality of the global thisenv pointer.
Challenge! Your implementation of fork makes a huge number of system calls. On the x86, switching into the kernel using interrupts has non-trivial cost. Augment the system call interface so that it is possible to send a batch of system calls at once. Then change fork to use this interface.
How much faster is your new fork?
You can answer this (roughly) by using analytical arguments to estimate how much of an improvement batching system calls will make to the performance of your fork: How expensive is an int 0x30 instruction? How many times do you execute int 0x30 in your fork? Is accessing the TSS stack switch also expensive? And so on...
Alternatively, you can boot your kernel on real hardware and really benchmark your code. See the RDTSC (read time-stamp counter) instruction, defined in the IA32 manual, which counts the number of clock cycles that have elapsed since the last processor reset. QEMU doesn't emulate this instruction faithfully (it can either count the number of virtual instructions executed or use the host TSC, neither of which reflects the number of cycles a real CPU would require).