Why does fork() flag each page in both processes as read-only?

I am reading a textbook passage on how fork() uses virtual memory:

When the fork function is called by the current process, the kernel creates various data structures for the new process and assigns it a unique PID. To create the virtual memory for the new process, it creates exact copies of the current process’s mm_struct, area structs, and page tables. It flags each page in both processes as read-only [emphasis added], and flags each area struct in both processes as private copy-on-write.

Source: Computer Systems: A Programmer's Perspective, 3rd edition, Section 9.8.2 - The fork Function Revisited.

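I don't understand why it needs to flag each page in both processes as read-only. If each page in the parent process is read-only, then the parent process will never be able to modify some uninitialised global variables (the .bss section). So how does the program run at all?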

If each page in the parent process is read-only then the parent process will never be able to modify some uninitialised global variables

That would be true only if the pages stayed read-only. But they don't, as the next part of the sentence says:

and flags each area struct in both processes as private copy-on-write

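Every page starts out read-only so that parent and child can share a single physical copy. Only when one of the processes tries to modify such a page does the kernel make a writable copy for it (provided the page belongs to an area that is actually meant to be writable). After the copy, the writing process can change its page however it likes without affecting the other process's original (still read-only) page.

This saves memory for all the pages that neither parent nor child ever actually changes.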
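
To see that the read-only flag does not stop either process from writing, here is a minimal sketch in C. It assumes a POSIX system; the function name `parent_value_after_child_write` and the variable `counter` are made up for illustration. The child writes to a .bss variable after fork(), and the parent then checks that its own copy is untouched and still writable.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Uninitialised global: it lives in .bss and starts out as 0. */
static int counter;

/*
 * Hypothetical helper: forks, lets the child write to `counter`, then
 * returns the value the parent sees after its own write.
 * Returns -1 on any error.
 */
int parent_value_after_child_write(void)
{
    counter = 1;                      /* ordinary write before fork() */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child: the page holding `counter` is shared with the parent.
         * With copy-on-write, this store is expected to trap into the
         * kernel, which gives the child a private writable copy and
         * then lets the store complete.
         */
        counter = 42;
        _exit(counter == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: still sees 1, and can still write to its own page. */
    counter += 1;
    return counter;                   /* 2, never 43 */
}
```

Calling this from `main` returns 2: the child's 42 never reaches the parent, and neither write fails, because the fault caused by the read-only flag is handled inside the kernel.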

From the user space point of view (that is, from the syscalls(2) used after the fork(2) in your application code...), the memory pages (managed by the MMU) are not all read-only. That abstraction is provided by the kernel.

And after a successful fork(2) you could call mprotect(2), mmap(2), munmap(2), sbrk(2) (perhaps used by malloc(3) or dlopen(3)...) and execve(2) to change the address space of your process.

Read Advanced Linux Programming and a good textbook on Operating Systems. See also, of course, LinuxAteMyRAM.

From inside the Linux kernel, things are of course very different. Refer to the kernelnewbies and OSDev websites.

The Linux kernel, GNU libc or musl-libc, and most applications (e.g. GNU bash) in major Linux distributions such as Debian are all open source. You can download and study their source code.

Consider reading proc(5) and elf(5), and using pmap(1), objdump(1), readelf(1). Try cat /proc/$$/maps in a terminal.
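
The same information can be read from inside a program. The sketch below assumes Linux with /proc mounted, and the function name `count_private_writable_mappings` is made up for illustration. It scans /proc/self/maps and counts the areas whose permission field marks them as writable and private (for example `rw-p`). That field describes the memory area as a whole, not the momentary state of individual page table entries, so such areas keep showing `w` even right after a fork.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: counts mappings of the calling process that are
 * writable and private according to /proc/self/maps.
 * Returns -1 if the file cannot be opened.
 */
int count_private_writable_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    char perms[5];
    int count = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line looks like: "start-end perms offset dev inode [path]". */
        if (sscanf(line, "%*s %4s", perms) != 1)
            continue;
        if (strlen(perms) != 4)
            continue;
        /* perms is e.g. "rw-p": [1] is the write bit, [3] is p(rivate) or s(hared). */
        if (perms[1] == 'w' && perms[3] == 'p')
            count++;
    }

    fclose(maps);
    return count;
}
```

On a typical Linux process the count is greater than zero, since at least the stack and the data/.bss areas are private and writable.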