Why is mmap done during printf calls?
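Why does printf() perform a sys_mmap() and then copy the string's contents, in chunks (of 1024), into the new address space for sys_write()?
The strace of a simple static "hello" program looks like this: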
> gcc -o hello -static hello.c
> strace ./hello
execve("./hello", ["./hello"], [/* 71 vars */]) = 0
uname({sys="Linux", node="Kumar", ...}) = 0
brk(0) = 0x1ce8000
brk(0x1ce91c0) = 0x1ce91c0
arch_prctl(ARCH_SET_FS, 0x1ce8880) = 0
readlink("/proc/self/exe", "/home/admin/hello", 4096) = 18
brk(0x1d0a1c0) = 0x1d0a1c0
brk(0x1d0b000) = 0x1d0b000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 28), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7feda2130000
write(1, "Hello", 5Hello) = 5
exit_group(0) = ?
+++ exited with 0 +++
objdump of .rodata:
> objdump -s --start-address=0x4935a0 ./hello | head -5
./hello: file format elf64-x86-64
Contents of section .rodata:
4935a0 01000200 48656c6c 6f006c69 62632d73 ....Hello.libc-s
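If we hook the sys_write() system call at the kernel level, we see that the address passed to it lies in the mmap-ed region. Given that the string already exists in the .rodata section of the binary's first loadable segment, isn't this just a waste of new address space? Does it have to do with the lack of write permission, or something similar? If so, why doesn't the compiler put the string in the .data section (which is writable) in the first place?
Update:
The mmap-ed address is indeed what is passed to sys_write(). This can be verified more simply by making the string larger (say, ~1500 characters); GDB then confirms the address of the data being printed [note the second breakpoint]: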
(gdb) c
Continuing.
Hello World hhhhhhhhhhalhfafeuirafheuhrgiegieguehguergjkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqwwwwwwwwwwwwwwwwwwwwww pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuiiiiiiiiiiiiiiiiiiiiiiiiiiiiiwqiuwqiuwiquwiqhchasnvjnavjanvjdanvjdanvjdanjfanvjaddijuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuquweuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuunnnnnnnnnnnnnnnnnnnnnnnnnnnzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz,,,,,,,,,,,,,,,,,,,,,,
Breakpoint 1, _IO_new_file_write (f=0x6b8300 <_IO_2_1_stdout_>, data=0x7ffff7ffc000, n=706) at fileops.c:1257
1257 {
Have you tried using a debugger?
$ gdb /tmp/hello
...
(gdb) b __mmap
Breakpoint 1 at 0x4152e0
(gdb) r
Starting program: /tmp/hello
Breakpoint 1, 0x00000000004152e0 in mmap64 ()
(gdb) bt
#0 0x00000000004152e0 in mmap64 ()
#1 0x000000000045d73c in _IO_file_doallocate ()
#2 0x0000000000401fec in _IO_doallocbuf ()
#3 0x000000000042ca10 in _IO_new_file_overflow ()
#4 0x000000000042be9d in _IO_new_file_xsputn ()
#5 0x000000000040111d in puts ()
#6 0x00000000004004de in main () at hello.c:4
(gdb) c
Continuing.
Hello, w
[Inferior 1 (process 4294) exited with code 011]
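So it is allocating memory for the buffered I/O that a FILE* uses. Note that calling printf with just a constant string that ends in a newline (and has no conversion specifiers) results in a call to puts, because GCC is smart enough to make that substitution. puts(string) is essentially fputs(string, stdout) followed by a newline, where stdout is a FILE*.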
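A raw write, however, does not cause this behavior:
#include <unistd.h>

int main(void) {
    /* sizeof includes the terminating '\0'; subtract 1 to write only the 9 visible bytes. */
    write(1, "Hello, w\n", sizeof("Hello, w\n") - 1);
    return 0;
}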