Why does fork() flag each page in both processes as read-only?

I am reading a textbook that explains how fork() uses virtual memory:

When the fork function is called by the current process, the kernel creates various data structures for the new process and assigns it a unique PID. To create the virtual memory for the new process, it creates exact copies of the current process’s mm_struct, area structs, and page tables. It flags each page in both processes as read-only [emphasis added], and flags each area struct in both processes as private copy-on-write.

(Source: Computer Systems: A Programmer's Perspective, 3rd edition, Section 9.8.2, "The fork Function Revisited".)

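I do not understand why it needs to flag each page in both processes as read-only. If each page in the parent process is read-only, then the parent process will never be able to modify some uninitialised global variables (the .bss section). So how does the program run at all?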

If each page in the parent process is read-only then the parent process will never be able to modify some uninitialised global variables

That would only be true if the pages stayed read-only. They do not, as the next part of the sentence says:

and flags each area struct in both processes as private copy-on-write

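Every page starts out read-only so that the parent and the child can share a single copy. Only when one of the processes tries to modify such a page does the kernel make a writable copy (provided the page is actually supposed to be writable). After the copy, the writing process can make whatever changes it likes without affecting the other process's original (still read-only) page.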

This saves memory for pages that neither the parent nor the child ever actually changes.

From the user space point of view (that is, from the syscalls(2) used after the fork(2) in your application code...), the memory pages (managed by the MMU) are not all read-only. That abstraction is provided by the kernel.

And after a successful fork(2) you could call mprotect(2), mmap(2), munmap(2), sbrk(2) (perhaps used by malloc(3) or dlopen(3)...) and execve(2) to change the address space of your process.

Read Advanced Linux Programming and a good textbook on operating systems. See also LinuxAteMyRAM, of course.

From inside the Linux kernel, things are of course very different. Refer to the kernelnewbies and OSDev websites.

The Linux kernel, GNU libc or musl-libc, and most applications (e.g. GNU bash) in major Linux distributions such as Debian are all open source. You can download and study their source code.

Consider reading proc(5) and elf(5), and using pmap(1), objdump(1), readelf(1). Try cat /proc/$$/maps in a terminal.

c

process

virtual-memory

linux-kernel