How is the macOS stack initialized at the start of a process?
Curious about how macOS prepares its stack, I wrote an (x86_64) assembly program that prints the top of the stack to standard output immediately at process start:
global start
start:                      ; entry point of the binary, called by the loader
    push rsp                ; push the stack pointer to the stack so that we'll see that too
    mov rdi, 1              ; file to write to: file descriptor 1 (STDOUT)
    lea rsi, [rsp]          ; source of the write: stack
    mov rdx, 64             ; number of bytes to write: 64 (8 x 64-bit integers)
    mov rax, 0x02000004     ; MacOS syscall number for write
    syscall
    mov rsi, [rsp+16]       ; smoke test: argv contents
    mov rdx, 16             ; we expect argv[0] ("./inspect_stack\0") to be 16 bytes long
    mov rax, 0x02000004
    syscall
    mov rsi, [rsp+32]       ; another smoke test: envp???
    mov rdx, 11
    mov rax, 0x02000004
    syscall
    xor edi, edi            ; exit status 0 (rdi still held 1 from the writes)
    mov rax, 0x02000001     ; MacOS syscall number for exit
    syscall
Running this program and inspecting the output:
nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack | xxd -e -g 8 -c 8
I see something like this (with some comments of my own added):
00000000: 00007ff7bfeff6b0 ........ # this is the stack pointer we pushed
00000008: 0000000000000001 ........ # argc
00000010: 00007ff7bfeff880 ........ # argv; see the smoke test result
00000018: 0000000000000000 ........ # a null pointer???
00000020: 00007ff7bfeff890 ........ # are these part of envp?
00000028: 00007ff7bfeff89f ........ # ...seems like an array of pointers stored inline?
00000030: 00007ff7bfeff8dc ........ # ...and they seem to point at a continuous buffer
00000038: 00007ff7bfeff8ed ........
00000040: 636570736e692f2e ./inspec # the result of the 1st smoke test. yes, argv[0]!
00000048: 006b636174735f74 t_stack.
00000050: 6573552f3d445750 PWD=/Use # the result of the 2nd smoke test... seems like envp?
00000058: 2f7372 rs/
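A side note on reading these dumps: with -e, xxd prints each 8-byte group as a little-endian integer. The characters of a string therefore appear in reverse order in the hex column, while the ASCII column shows them in memory order. The following C sketch turns one printed value back into the bytes that are actually in memory. quad_to_bytes and quad_to_ascii are hypothetical helpers, and a little-endian x86_64 layout is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split one 64-bit value, as printed by `xxd -e -g 8`, into the 8 bytes
 * stored in memory. Little-endian: the least significant byte comes first. */
void quad_to_bytes(uint64_t value, unsigned char out[8])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(value >> (8 * i)); /* byte i = bits 8i..8i+7 */
}

/* Render those bytes like xxd's ASCII column: printable characters as-is,
 * everything else (including NUL terminators) as '.'. */
void quad_to_ascii(uint64_t value, char out[9])
{
    unsigned char bytes[8];
    quad_to_bytes(value, bytes);
    for (int i = 0; i < 8; i++)
        out[i] = (bytes[i] >= 0x20 && bytes[i] < 0x7f) ? (char)bytes[i] : '.';
    out[8] = '\0';
}
```

For example, 0x636570736e692f2e decodes to "./inspec" and 0x6573552f3d445750 to "PWD=/Use", which matches the ASCII column above. NUL bytes show up as '.' just as they do in xxd.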
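So, I had learned that at program start the stack holds a 64-bit integer (argc) followed by two pointers (to argv and to envp). That doesn't seem to be the case, or else the envp pointer is null for some reason. Yet an envp array stored inline appears to begin right after the null. What is the actual layout of the stack at process start?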
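After inspecting further and adding more arguments, I noticed that my understanding (two pointers to argv and envp at the top of the stack) was wrong. Instead, argv and envp are stored inline, as arrays of pointers to the associated strings. Both arrays are null-terminated, so the null I saw was actually the terminator of argv. Adding more arguments makes this clearer: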
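The layout described above can be modelled with a small C sketch. It treats the entry stack as an array of 8-byte words (uintptr_t, assuming a 64-bit target) and finds envp by scanning past argv's NULL terminator rather than trusting argc. envp_index_by_scan, count_until_null and lookup_env are hypothetical helper names, not a system API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Entry stack seen as 8-byte words, following the layout in the dumps:
 *   sp[0]             argc
 *   sp[1..argc]       argv[0..argc-1]  pointers stored inline
 *   sp[argc+1]        0                argv terminator
 *   sp[argc+2]...     envp[0], ...     pointers stored inline
 *   ...               0                envp terminator
 */

/* Index of envp[0], found by scanning past argv's NULL terminator
 * (argc itself is not consulted). */
size_t envp_index_by_scan(const uintptr_t *sp)
{
    assert(sp != NULL);
    size_t i = 1;          /* argv[0] is the word right after argc */
    while (sp[i] != 0)     /* skip the argv pointers */
        i++;
    return i + 1;          /* step over the terminator */
}

/* Number of non-zero words starting at sp[start], e.g. the envp count. */
size_t count_until_null(const uintptr_t *sp, size_t start)
{
    size_t n = 0;
    while (sp[start + n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: value of NAME from the inline envp array, or NULL. */
const char *lookup_env(const uintptr_t *sp, const char *name)
{
    size_t len = strlen(name);
    for (size_t i = envp_index_by_scan(sp); sp[i] != 0; i++) {
        const char *entry = (const char *)sp[i];   /* word -> string pointer */
        if (strncmp(entry, name, len) == 0 && entry[len] == '=')
            return entry + len + 1;
    }
    return NULL;
}
```

On a fake stack {3, argv[0], argv[1], argv[2], 0, envp[0], envp[1], 0}, envp_index_by_scan returns 5, which is one word past argv's terminator.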
nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack first second | xxd -e -g 8 -c 8
00000000: 00007ff7bfeff698 ........
00000008: 0000000000000003 ........ # argc
00000010: 00007ff7bfeff878 x....... # argv[0]
00000018: 00007ff7bfeff888 ........ # argv[1]
00000020: 00007ff7bfeff88e ........ # argv[2]
00000028: 0000000000000000 ........ # argv end
00000030: 00007ff7bfeff895 ........ # envp[0]
00000038: 00007ff7bfeff8a4 ........ # envp[1] and so on
00000040: 636570736e692f2e ./inspec
00000048: 006b636174735f74 t_stack.
00000050: 5000646e6f636573 second.P # the second smoke test now sees argv[2]!
00000058: 3d4457 WD= # seems that the envp strings are located right after the argv strings
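The pointer values in this dump also show how tightly the strings are packed. Each one starts exactly strlen(previous) + 1 bytes after the previous one: 0x878 + 16 = 0x888, 0x888 + 6 = 0x88e, and 0x88e + 7 = 0x895. Below is a small C check of that property. strings_are_packed is a hypothetical helper, and the pointer values are typed in by hand from the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if consecutive strings were copied back to back, i.e. each
 * pointer equals the previous pointer + strlen(previous string) + 1 (NUL).
 * ptrs[] holds raw pointer values (e.g. read from a dump); strs[] holds the
 * strings they are expected to point at. Only strs[0..n-2] are measured. */
int strings_are_packed(const uint64_t *ptrs, const char *const *strs, size_t n)
{
    assert(ptrs != NULL && strs != NULL);
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t expected = ptrs[i] + strlen(strs[i]) + 1;
        if (ptrs[i + 1] != expected)
            return 0;
    }
    return 1;
}
```

Call it with argv[0], argv[1], argv[2] and envp[0] from the dump, together with the strings "./inspect_stack", "first" and "second", and it returns 1. The last string is never measured, so the exact contents of envp[0] do not matter here.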
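TL;DR: I thought the second and third 64-bit values on the stack were char **argv and char **envp. Instead, they are argv[0] and argv[1]. So, to get the char **argv that C main expects, I can use the address rsp + 8 (skipping the 8 bytes of argc), e.g. lea rsi, [rsp + 8]. To get char **envp, I can do mov rax, [rsp] and then take the address rsp + 8 + rax*8 + 8 (8 bytes to skip argc, then argc pointers, and finally another 8 bytes to skip the null terminator). Here rsp means its value on entry to start, before my test program's push.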
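The offset arithmetic above can be sanity-checked in C. This sketch computes the byte offset of envp[0] from the entry rsp and also gives the equivalent pointer rule, envp = argv + argc + 1. The helper names are hypothetical and 8-byte words are assumed. The dumps start 8 bytes below the entry rsp because of the push, so the formula predicts envp[0] at dump offset 0x20 for argc = 1 and 0x30 for argc = 3. That is where it appears in both dumps. In assembly the same rule would be lea rdx, [rsp + rax*8 + 16].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of envp[0] from the entry rsp: 8 bytes for argc, argc * 8
 * bytes of argv pointers, 8 bytes for argv's NULL terminator.
 * (argv itself always starts at entry rsp + 8.) */
size_t envp_byte_offset(uint64_t argc)
{
    return 8 + (size_t)argc * 8 + 8;
}

/* Pointer form of the same rule: envp begins one slot past argv[argc]. */
char **envp_from_argv(int argc, char **argv)
{
    assert(argc >= 0 && argv != NULL);
    assert(argv[argc] == NULL);     /* argv must be NULL-terminated */
    return argv + argc + 1;
}
```

C startup code on macOS and Linux is generally expected to derive the envp of a three-argument main in this way. Treat that as an assumption, though, and confirm it on your system by comparing envp with argv + argc + 1 inside main.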