Hello, world in assembly language with Linux system calls?
I know that int 0x80 makes an interrupt in Linux, but I don't understand how this code works. Does it return something? What does $ - msg stand for?
global _start
section .data
msg db "Hello, world!", 0x0a
len equ $ - msg
section .text
_start:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, len
int 0x80 ;What is this?
mov eax, 1
mov ebx, 0
int 0x80 ;and what is this?
How does $ work in NASM, exactly? explains how $ - msg gets NASM to calculate the string length for you as an assemble-time constant, instead of hard-coding it.
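For readers coming from C, a rough analogy (not what NASM does internally): sizeof on an array is evaluated at compile time, much as $ - msg is evaluated at assemble time. The names hello_msg, HELLO_LEN and hello_len are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the asm version calls these msg and len. */
static const char hello_msg[] = "Hello, world!\n";

/* sizeof also counts the terminating 0 byte that a C string literal adds,
 * so subtract 1 to match what len equ $ - msg computes for the asm string. */
enum { HELLO_LEN = sizeof hello_msg - 1 };

/* Checked by the compiler, not at run time. */
static_assert(HELLO_LEN == 14, "13 characters plus the newline");

/* Returns the constant; no strlen() call happens at run time. */
size_t hello_len(void)
{
    return HELLO_LEN;
}
```

As with the NASM constant, the length never needs to be stored in data memory; the compiler can fold it into the instructions that use it.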
I originally wrote the rest of this for SO Docs (topic ID: 1164, example ID: 19078), rewriting a basic less-well-commented example by @runner. This looks like a better place to put it than where I had previously moved it after the SO Docs experiment ended.
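Making a system call is done by putting arguments into registers, then running int 0x80 (32-bit mode) or syscall (64-bit mode). See also What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 and The Definitive Guide to Linux System Calls.
Think of int 0x80 as a way to "call" into the kernel, across the user/kernel privilege boundary. The kernel does its work according to the values that were in registers when int 0x80 executed, then eventually returns. The return value is in EAX.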
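When execution reaches the kernel's entry point, it looks at EAX and dispatches to the right system call based on the call number in EAX. Values from other registers are passed as function arguments to the kernel's handler for that system call. (For example, eax=4 / int 0x80 gets the kernel to call its sys_write kernel function, which implements the POSIX write system call.)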
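The dispatch step can be pictured with a toy model in C. This is only a sketch under stated assumptions: toy_table, toy_write, toy_exit and toy_dispatch are hypothetical names, and the real kernel's entry code and tables look different. The call numbers 1 and 4 are the 32-bit ones used in this answer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical handlers, not the kernel's real functions. */
typedef long (*toy_handler)(long a1, long a2, long a3);

/* Pretend the whole buffer was written and report its length. */
static long toy_write(long fd, long buf, long len) { (void)fd; (void)buf; return len; }

/* A real exit never returns; the toy just reports the 8-bit status. */
static long toy_exit(long status, long u2, long u3) { (void)u2; (void)u3; return status & 0xff; }

/* Indexed by call number: 1 = exit, 4 = write (32-bit numbering). */
static const toy_handler toy_table[] = { [1] = toy_exit, [4] = toy_write };
static_assert(sizeof toy_table / sizeof toy_table[0] == 5, "slots 0..4");

#define TOY_ENOSYS 38 /* Linux errno value for "function not implemented" */

/* eax selects the handler; ebx, ecx, edx carry the first three arguments. */
long toy_dispatch(long eax, long ebx, long ecx, long edx)
{
    size_t n = sizeof toy_table / sizeof toy_table[0];
    if (eax < 0 || (size_t)eax >= n || toy_table[eax] == NULL)
        return -TOY_ENOSYS; /* unknown number: a negative errno comes back */
    return toy_table[eax](ebx, ecx, edx);
}
```

Calling toy_dispatch(4, 1, 0, 14) plays the role of eax=4, ebx=1, edx=14 followed by int 0x80; the value it returns corresponds to what user-space would find in EAX afterwards.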
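See also the linked answer, which includes a look at the asm in the kernel entry point that is "called" by int 0x80. (It also applies to 32-bit user-space, not just 64-bit, where you shouldn't be using int 0x80.)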
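If you don't already know low-level Unix systems programming, you might want to just write functions in asm that take args and return a value (or update arrays via a pointer arg) and call them from C or C++ programs. Then you only have to worry about learning how to handle registers and memory, without also learning the POSIX system-call API and the ABI for using it. That also makes it very easy to compare your code with compiler output for a C implementation. Compilers usually do a good job of producing efficient code, though not a perfect one.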
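libc provides wrapper functions for system calls, so compiler-generated code would call write rather than invoking it directly with int 0x80 (or, if you care about performance, sysenter). (In x86-64 code, use syscall for the 64-bit ABI.) See also syscalls(2).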
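To make the wrapper idea concrete, here is a minimal C sketch that writes the same bytes two ways: through the write wrapper, and through the generic syscall(2) function with an explicit call number. The function names hello_via_wrapper and hello_via_syscall are invented for this example; the caller supplies the file descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

static const char hello[] = "Hello, world!\n";

/* Through the libc wrapper: what compiler-generated code normally calls. */
long hello_via_wrapper(int fd)
{
    assert(fd >= 0);
    return write(fd, hello, sizeof hello - 1);
}

/* Through syscall(2): the call number is passed explicitly and libc
 * picks the instruction and registers appropriate for the target. */
long hello_via_syscall(int fd)
{
    assert(fd >= 0);
    return syscall(SYS_write, fd, hello, sizeof hello - 1);
}
```

Either function can be called with 1 to print to stdout. Under strace both show up as the same write system call, which is the point: the wrapper is a thin layer over the kernel interface.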
System calls are documented in section 2 manual pages, like write(2). See the NOTES section for differences between the libc wrapper function and the underlying Linux system call. Note that the wrapper for sys_exit is _exit(2), not the exit(3) ISO C function that flushes stdio buffers and does other cleanup first. There's also an exit_group system call that ends all threads. exit(3) actually uses that, because there's no downside in a single-threaded process.
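This code makes 2 system calls: write(1, msg, len) and _exit(0).
I commented it heavily (to the point where it starts to obscure the actual code without color syntax highlighting). This is an attempt to point things out to total beginners, not how you should comment your code normally.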
section .text ; Executable code goes in the .text section
global _start ; The linker looks for this symbol to set the process entry point, so execution starts here
;;;a name followed by a colon defines a symbol. The global _start directive modifies it so it's a global symbol, not just one that we can CALL or JMP to from inside the asm.
;;; note that _start isn't really a "function". You can't return from it, and the kernel passes argc, argv, and env differently than main() would expect.
_start:
;;; write(1, msg, len);
; Start by moving the arguments into registers, where the kernel will look for them
mov edx,len ; 3rd arg goes in edx: buffer length
mov ecx,msg ; 2nd arg goes in ecx: pointer to the buffer
;Set output to stdout (goes to your terminal, or wherever you redirect or pipe)
mov ebx,1 ; 1st arg goes in ebx: Unix file descriptor. 1 = stdout, which is normally connected to the terminal.
mov eax,4 ; system call number (from SYS_write / __NR_write from unistd_32.h).
int 0x80 ; generate an interrupt, activating the kernel's system-call handling code. 64-bit code uses a different instruction, different registers, and different call numbers.
;; eax = return value, all other registers unchanged.
;;;Second, exit the process. There's nothing to return to, so we can't use a ret instruction (like we could if this was main() or any function with a caller)
;;; If we don't exit, execution continues into whatever bytes are next in the memory page,
;;; typically leading to a segmentation fault because the padding 00 00 decodes to add [eax],al.
;;; _exit(0);
xor ebx,ebx ; first arg = exit status = 0. (will be truncated to 8 bits). Zeroing registers is a special case on x86, and mov ebx,0 would be less efficient.
;; leaving out the zeroing of ebx would mean we exit(1), i.e. with an error status, since ebx still holds 1 from earlier.
mov eax,1 ; put __NR_exit into eax
int 0x80 ;Execute the Linux function
section .rodata ; Section for read-only constants
;; msg is a label, and in this context doesn't need to be msg:. It could be on a separate line.
;; db = Data Bytes: assemble some literal bytes into the output file.
msg db 'Hello, world!',0xa ; ASCII string constant plus a newline (0x0a, decimal 10)
;; No terminating zero byte is needed, because we're using write(), which takes a buffer + length instead of an implicit-length string.
;; To make this a C string that we could pass to puts or strlen, we'd need a terminating 0 byte. (e.g. "...", 0xa, 0)
len equ $ - msg ; Define an assemble-time constant (not stored by itself in the output file, but will appear as an immediate operand in insns that use it)
; Calculate len = string length. subtract the address of the start
; of the string from the current position ($)
;; equivalently, we could have put a str_end: label after the string and done len equ str_end - msg
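Note that we don't store the string length in data memory anywhere. It's an assemble-time constant, so it's more efficient to have it as an immediate operand than to load it. We could also have pushed the string data onto the stack with four push imm32 instructions (the 14 bytes round up to 16), but bloating the code size too much isn't a good thing.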
On Linux, you can save this file as Hello.asm and build a 32-bit executable from it with these commands:
nasm -felf32 Hello.asm # assemble as 32-bit code. Add -Worphan-labels -g -Fdwarf for debug symbols and warnings
gcc -static -nostdlib -m32 Hello.o -o Hello # link without CRT startup code or libc, making a static binary
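See this answer for more details on building assembly into 32-bit or 64-bit, static or dynamically linked Linux executables, for NASM/YASM syntax or GNU AT&T syntax with GNU as directives. (Key point: make sure to use -m32 or equivalent when building 32-bit code on a 64-bit host, or you will have confusing problems at run-time.)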
You can trace its execution with strace to see the system calls it makes:
$ strace ./Hello
execve("./Hello", ["./Hello"], [/* 72 vars */]) = 0
[ Process PID=4019 runs in 32 bit mode. ]
write(1, "Hello, world!\n", 14Hello, world!
) = 14
_exit(0) = ?
+++ exited with 0 +++
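Compare this with the trace for a dynamically linked process (like gcc makes from hello.c, or from running strace /bin/ls) to get an idea of just how much happens under the hood for dynamic linking and C library startup.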
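The trace on stderr and the regular output on stdout both go to the terminal here, so they are interleaved on the line with the write system call. You can redirect or trace to a file if you care. Notice how this lets us easily see the system-call return values without having to add code to print them, and is actually even easier than using a regular debugger (like gdb) to single-step and look at eax for this. See the bottom of the x86 tag wiki for gdb asm tips. (The rest of the tag wiki is full of links to good resources.)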
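The return values that strace prints come in two flavors: a non-negative result on success, or an error that the kernel reports as a negative errno value in eax. The libc wrapper converts the latter into -1 plus errno. A small C sketch, with the hypothetical helper name write_result_like_kernel, undoes that conversion so the result looks like the raw register value again:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Returns a byte count on success, or the negated errno value on failure,
 * which is the form the raw system call leaves in eax. The write() wrapper
 * itself only ever returns -1 on error and stores the code in errno. */
long write_result_like_kernel(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    ssize_t n = write(fd, buf, len);
    return n < 0 ? -(long)errno : (long)n;
}
```

For instance, passing -1 as the file descriptor yields -EBADF, matching the "-1 EBADF (Bad file descriptor)" style of line that strace shows for a failed call.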
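The x86-64 version of this program would be extremely similar, passing the same args to the same system calls, just in different registers and with syscall instead of int 0x80. See the bottom of the linked answer for a working example of writing a string and exiting in 64-bit code.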
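As a rough illustration of the 64-bit register assignment, here is a C sketch using GCC/Clang extended inline asm. The function name raw_write64 is made up, and the non-x86-64 branch is only a fallback so the example still builds elsewhere; it is not part of the technique being shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* write() via the syscall instruction on x86-64 Linux. */
long raw_write64(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* 64-bit ABI: call number in rax (SYS_write is 1 here, not 4),
     * arguments in rdi, rsi, rdx. The syscall instruction itself
     * overwrites rcx and r11, so they are listed as clobbers. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret; /* byte count, or a negative errno value */
#else
    /* Fallback for other targets: goes through libc, so errors are
     * reported as -1 with errno set instead of a negative errno. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write64(1, "Hello, world!\n", 14) prints the same line as the 32-bit asm program; compared with the int 0x80 version, only the instruction, the registers, and the call number differ.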
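Related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. It builds the smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.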