How is the macOS stack initialized at the start of a process?

Curious about how macOS prepares its stack, I wrote an x86_64 assembly program that writes the top of the stack to standard output as soon as the process starts:

global start
start:                      ; entry point of the binary, called by the loader
    push    rsp             ; push the stack pointer onto the stack so that we'll see it too
    mov     rdi, 1          ; file to write to: file descriptor 1 (STDOUT)
    lea     rsi, [rsp]      ; source of the write: the stack
    mov     rdx, 64         ; number of bytes to write: 64 (8 x 64-bit integers)
    mov     rax, 0x02000004 ; macOS syscall number for write
    syscall
    mov     rsi, [rsp+16]   ; smoke test: print the string argv[0] points to
    mov     rdx, 16         ; we expect argv[0] ("./inspect_stack\0") to be 16 bytes long
    mov     rax, 0x02000004
    syscall
    mov     rsi, [rsp+32]   ; another smoke test: envp???
    mov     rdx, 11         ; number of bytes to write: 11
    mov     rax, 0x02000004
    syscall
    xor     edi, edi        ; exit status 0 (rdi still held 1 from the writes)
    mov     rax, 0x02000001 ; macOS syscall number for exit
    syscall

Running the program and inspecting its output:

nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack | xxd -e -g 8 -c 8

I saw something like this (the comments are my own):

00000000: 00007ff7bfeff6b0  ........  # this is the stack pointer we pushed
00000008: 0000000000000001  ........  # argc
00000010: 00007ff7bfeff880  ........  # argv; see the smoke test result
00000018: 0000000000000000  ........  # a null pointer???
00000020: 00007ff7bfeff890  ........  # are these part of envp?
00000028: 00007ff7bfeff89f  ........  # ...seems like an array of pointers stored inline?
00000030: 00007ff7bfeff8dc  ........  # ...and they seem to point at a continuous buffer
00000038: 00007ff7bfeff8ed  ........
00000040: 636570736e692f2e  ./inspec  # the result of the 1st smoke test. yes, argv[0]!
00000048: 006b636174735f74  t_stack.
00000050: 6573552f3d445750  PWD=/Use  # the result of the 2nd smoke test... seems like envp?
00000058:           2f7372  rs/

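So my understanding was that at program start, a 64-bit integer (argc) and two pointers (one to argv and one to envp) are stored on the stack. However, that doesn't seem to be true, or else for some reason the envp pointer is null. Yet what looks like an inline envp array seems to start right after the null. What is the actual layout of the stack at process start?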

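Inspecting further and adding more arguments, I noticed that my understanding (that the top of the stack holds two pointers, one to argv and one to envp) was wrong. Instead, argv and envp are stored inline, as arrays of pointers to the corresponding strings. Both arrays are null-terminated, so the null I saw was actually the terminator of argv. Adding more arguments makes this clearer:

nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack first second | xxd -e -g 8 -c 8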

00000000: 00007ff7bfeff698  ........
00000008: 0000000000000003  ........  # argc
00000010: 00007ff7bfeff878  x.......  # argv[0]
00000018: 00007ff7bfeff888  ........  # argv[1]
00000020: 00007ff7bfeff88e  ........  # argv[2]
00000028: 0000000000000000  ........  # argv end
00000030: 00007ff7bfeff895  ........  # envp[0]
00000038: 00007ff7bfeff8a4  ........  # envp[1] and so on
00000040: 636570736e692f2e  ./inspec
00000048: 006b636174735f74  t_stack.
00000050: 5000646e6f636573  second.P  # the second smoke test now sees argv[2]!
00000058:           3d4457  WD=       # it seems the envp strings are located right after the argv strings
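Both observations from the second run can be written down as small checks. This is a C sketch with hypothetical helper names (vec_len, envp_after_argv, env_strings_follow_args); it assumes the layout seen in the dumps, namely argv[] and envp[] stored inline and NULL-terminated, with the environment strings stored right after the argument strings:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of entries before the terminating NULL pointer. */
static size_t vec_len(char *const *vec) {
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: envp starts in the slot right after argv's NULL
   terminator, so it can be found by walking argv alone (argc not needed). */
static char **envp_after_argv(char **argv) {
    return argv + vec_len(argv) + 1;
}

/* Hypothetical helper: true if the first environment string begins right
   after the NUL byte that ends the last argument string, as observed in
   the dump. This describes the observed layout, not a documented guarantee. */
static bool env_strings_follow_args(char **argv, char **envp) {
    size_t argc = vec_len(argv);
    if (argc == 0 || envp[0] == NULL)
        return false;
    const char *last = argv[argc - 1];
    return last + strlen(last) + 1 == envp[0];
}
```

On a real process you would pass the argv that main receives. With a hand-built pointer array mirroring the second dump (three argument pointers, NULL, then the environment pointers), vec_len reports 3 and envp_after_argv lands on slot 4 of that array.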

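TL;DR: I thought the second and third 64-bit values on the stack were char **argv and char **envp. Instead, they are argv[0] and argv[1] (with a single argument, the third value is argv's null terminator). So at the entry point, before anything is pushed, the char **argv that C's main expects is the address rsp + 8, skipping the 8-byte argc (lea rsi, [rsp + 8]). To get char **envp, load argc with mov rax, [rsp] and take the address rsp + 8 + rax*8 + 8 (lea rdx, [rsp + rax*8 + 16]): 8 bytes to skip argc, then argc pointers, then another 8 bytes to skip argv's null terminator. In the test program above, the extra push rsp shifts all of these offsets by 8.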
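The same offset arithmetic can be sketched in C. This assumes x86_64 (every stack slot is 8 bytes) and that sp holds the value of rsp at the entry point; struct entry_args and decode_entry_stack are hypothetical names for illustration, not a system API:

```c
#include <stdint.h>

/* Hypothetical type: the three values C's main() expects, recovered from
   the initial stack. Assumes x86_64, where each slot is 8 bytes wide. */
struct entry_args {
    uint64_t argc;
    char **argv;
    char **envp;
};

/* sp is rsp at the entry point, before anything is pushed, so sp[0] is argc.
   Pointer arithmetic on uint64_t * moves in 8-byte slots:
     argv = rsp + 8                 (skip argc)
     envp = rsp + 8 + 8*argc + 8    (skip argc, the argv pointers, argv's NULL) */
static struct entry_args decode_entry_stack(uint64_t *sp) {
    struct entry_args a;
    a.argc = sp[0];                            /* mov rax, [rsp]              */
    a.argv = (char **)(sp + 1);                /* lea rsi, [rsp + 8]          */
    a.envp = (char **)(sp + 1 + a.argc + 1);   /* lea rdx, [rsp + rax*8 + 16] */
    return a;
}
```

For a simulated stack holding argc = 3, this places argv at slot 1 and envp at slot 5, which matches offsets 0x10 and 0x30 in the second dump, where the pushed rsp occupies offset 0x00 and argc sits at 0x08.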