Understanding an x86 assembly program

Question

I compiled a C program that simply sums all the integers up to the input. For example, if the input is 4, the output is 4+3+2+1=10.

I'm having some trouble understanding the x86 assembly version of this program.

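All the comments are my own. Please point out what I got right or wrong, and how you would describe what each line does. Your comments will help me understand more deeply what the CPU actually does; right now I can't say I fully understand what is happening here. Anyway, here it is. Any comments are welcome.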

.LC0:
        .byte 0x25,0x64,0x0 ; 2 digits / integers that our program will output
main:
        pushl %ebp ; we save %ebp for later usage
        movl %esp,%ebp ; we set register %ebp to point to the stack frame
        subl $12,%esp ; subtracts 18 bytes from the stack pointer (esp). This allocates 18 bytes of space on the stack to be used for variables.
        movl $0,-12(%ebp)
        leal -4(%ebp),%eax ; subtracks -4 from the memory address of ebp and stores it at register eax
        pushl %eax ; we store register eax for later usage
        pushl $.LC0
        call __isoc99_scanf ; reads from io port / waiting for key input
        movl $1,-8(%ebp)
        leal 8(%esp),%esp ; adds +8 to stack pointer memory address
.L2:
        movl -4(%ebp),%edx
        cmpl -8(%ebp),%edx ; compares our input number with an incremented number
        jl .L3 ; if incremented number is equal or bigger than input number goto .L3
        movl -8(%ebp),%edx
        addl %edx,-12(%ebp)
        incl -8(%ebp)
        jmp .L2 ; loop / another addition to our input
.L3:
        pushl -12(%ebp)
        pushl $.LC0 ; we push the argument to print function
        call printf ; prints result on screen
        xorl %eax,%eax ; sets %eax to zero
        leave ; leave copies the frame pointer to the stack point and releases the stack space formerly used by a procedure for its local variables. leave pops the old frame pointer into (E)BP, thus restoring the caller's frame.
        ret ; returns to address located on the top of the stack

Answer 1

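Your analysis is mostly correct; there are a few small mistakes that I'll try to point out. I took the liberty of breaking up your comments so they can be read without scrolling.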

Note that if you intend to pass off compiler output as hand-written assembly, your teacher will most likely notice. Don't try to cheat like this.

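As an initial note: your function appears to have three variables. These are stored at -4(%ebp), -8(%ebp) and -12(%ebp). This is also why the C compiler emits code to decrement the stack pointer by 12: it allocates enough storage for these variables.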
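For reference, here is a hypothetical reconstruction of the C source the question describes. The names n, i and sum are guesses, and it is written as a function that reads from and writes to caller-supplied streams (rather than a main calling scanf and printf) so it can be exercised directly. The comments map each local to the stack slot identified above; where a compiler actually places locals is its own choice.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the question's program; names are guesses.
 * Returns the sum, or -1 if no integer could be read. */
int sum_program(FILE *in, FILE *out)
{
    int n;        /* -4(%ebp):  written by scanf("%d", &n)        */
    int i;        /* -8(%ebp):  loop counter                      */
    int sum = 0;  /* -12(%ebp): running total (movl $0,-12(%ebp)) */

    /* The assembly ignores scanf's return value; this check is an addition
     * so that n is never used uninitialized. */
    if (fscanf(in, "%d", &n) != 1)
        return -1;

    for (i = 1; i <= n; i++)   /* counter assumed to start at 1 */
        sum += i;

    fprintf(out, "%d", sum);   /* same "%d" format, no trailing newline */
    return sum;
}
```

With the input 4 it writes 10, matching the example in the question.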

    .byte 0x25,0x64,0x0 ; 2 digits / integers that our program will output

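This is the string "%d" passed to scanf: 0x25 is '%', 0x64 is 'd', and 0x0 is the terminating NUL byte. The same string is later passed to printf as well.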
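The bytes can be checked directly. This sketch copies the .LC0 bytes into a C array (the name LC0 is simply borrowed from the label) and uses it as a scanf-family format, which is how the compiled code uses it.

```c
#include <stdio.h>

/* The bytes of .LC0: 0x25 is '%', 0x64 is 'd', 0x0 is the terminating NUL */
const char LC0[] = {0x25, 0x64, 0x0};

/* Parse an int from a string using .LC0 as the format, like scanf(.LC0, &x) */
int parse_with_lc0(const char *text, int *x)
{
    return sscanf(text, LC0, x);
}
```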

    subl $12,%esp ; subtracts 18 bytes from the stack pointer (esp).
                  ; This allocates 18 bytes of space on the stack to
                  ; be used for variables.

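Note that the dollar sign merely denotes an immediate. Unlike in some other assemblers, in AT&T syntax it does not denote a hexadecimal number. As you already saw in the .byte directive, hexadecimal is written with the 0x prefix. So `subl $12,%esp` subtracts 12 bytes, not 18.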
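A quick check of the arithmetic, written as C constants: the immediate $12 is decimal twelve, $0x12 would be eighteen, and three 4-byte int locals need exactly twelve bytes. The constant names are made up for illustration.

```c
#include <stdint.h>

/* AT&T immediates: `$` only marks the operand as a constant. */
enum {
    IMM_12   = 12,    /* subl $12,%esp   -> subtract twelve         */
    IMM_0X12 = 0x12,  /* subl $0x12,%esp -> would subtract eighteen */
    /* three 32-bit locals at -4, -8 and -12(%ebp) */
    LOCALS_BYTES = 3 * (int)sizeof(int32_t)
};
```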

    leal -4(%ebp),%eax ; subtracks -4 from the memory address of ebp
                       ; and stores it at register eax

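This explanation is confusing. What happens is that the leal instruction takes a memory operand and stores the operand's effective address into the register operand. So in this case it computes the address of -4(%ebp) (i.e. the contents of ebp minus 4) and stores it in eax, so eax = ebp - 4. No memory is read. This is used to obtain the address of a piece of stack memory for scanf.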
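A toy model in C may help separate the two operations. It assumes a small byte array standing in for stack memory and a pointer ebp into it (both hypothetical): leal corresponds to pointer arithmetic only, while movl with a memory operand actually reads the bytes.

```c
#include <stdint.h>
#include <string.h>

/* leal -4(%ebp),%eax: compute the address ebp - 4; no memory is accessed */
unsigned char *lea_minus4(unsigned char *ebp)
{
    return ebp - 4;
}

/* movl -4(%ebp),%edx: load the 32-bit value stored at ebp - 4 */
int32_t mov_minus4(const unsigned char *ebp)
{
    int32_t value;
    memcpy(&value, ebp - 4, sizeof value);  /* memcpy avoids alignment issues */
    return value;
}
```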

    pushl %eax ; we store register eax for later usage
    pushl $.LC0

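This does not save eax for later use. Instead, eax is pushed onto the stack as an argument to the following scanf call. Likewise, the next instruction pushes the address of the string .LC0 onto the stack. Since arguments are pushed from right to left, this sets up a call like scanf("%d", &x), where x is the variable at -4(%ebp).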
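The argument order can be modeled with a toy downward-growing stack. The toy_stack type and its helpers are hypothetical; the point is only that pushl moves the stack pointer down before storing, so the argument pushed last (the format string) ends up on top, where the callee expects its first argument.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy stack that grows downward like the x86 stack.
 * sp indexes the current top; no bounds checking is done. */
typedef struct {
    uintptr_t slot[8];
    size_t sp;          /* starts at 8 (empty), decreases on push */
} toy_stack;

/* pushl: move the stack pointer down one slot, then store the value */
void toy_push(toy_stack *s, uintptr_t value)
{
    s->slot[--s->sp] = value;
}

/* k-th argument as laid out just before the call: 0 is the top of the stack */
uintptr_t toy_arg(const toy_stack *s, size_t k)
{
    return s->slot[s->sp + k];
}
```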

    call __isoc99_scanf ; reads from io port / waiting for key input

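This is just a call to the C99-standard variant of the scanf function. glibc has some logic that redirects calls to scanf to different functions depending on which revision of the C standard you select, which is why this oddly named symbol was chosen for -std=c99.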

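Note that scanf is a libc function. It does not do any port I/O; instead it may or may not ask the operating system for more input. Where that input comes from depends on what your process's standard input is attached to.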
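To see that scanf only parses whatever bytes its stream delivers, note that scanf(fmt, ...) behaves like fscanf(stdin, fmt, ...). The hypothetical helper below takes the stream as a parameter, so the same parsing works on a temporary file just as it would on a terminal or a pipe.

```c
#include <stdio.h>

/* Same parsing scanf("%d", out) does on stdin, but on any stream.
 * Returns 1 on success, 0 on a matching failure, EOF if input ran out. */
int read_int(FILE *in, int *out)
{
    return fscanf(in, "%d", out);
}
```

Running the original program with its standard input redirected from a file likewise makes scanf read from that file, with no keyboard involved.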

    leal 8(%esp),%esp ; adds +8 to stack pointer memory address

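Like any other general-purpose register, the stack pointer does not have a memory address. It does, however, have a value, which represents an address. So it is probably more accurate to say "increments the stack pointer by 8, popping the arguments off the stack".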
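Treating %esp as an ordinary 32-bit number makes the bookkeeping explicit. The starting value below is arbitrary and the helper names are made up: each pushl lowers esp by 4, so after the two pushes for scanf, adding 8 restores it. leal 8(%esp),%esp computes the same result as addl $8,%esp, except that lea leaves the flags untouched.

```c
#include <stdint.h>

/* %esp is a register holding an address; it has no address of its own. */
uint32_t after_pushl(uint32_t esp)  { return esp - 4; }  /* pushl: 4 bytes     */
uint32_t after_leal_8(uint32_t esp) { return esp + 8; }  /* leal 8(%esp),%esp */
```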

.L2:
    movl -4(%ebp),%edx
    cmpl -8(%ebp),%edx ; compares our input number with an incremented number
    jl .L3 ; if incremented number is equal or bigger
           ;than input number goto .L3
    movl -8(%ebp),%edx
    addl %edx,-12(%ebp)
    incl -8(%ebp)
    jmp .L2 ; loop / another addition to our input
.L3:

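Try to figure out what this loop does. An easy way is to write each instruction as pseudocode and then refactor until it looks reasonable. That said, if you got this from C code you wrote yourself, you probably already know what it does.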
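Following that suggestion, here is one possible transcription into C, first instruction by instruction with goto, then refactored. The function names are hypothetical, n stands for the value scanf stored at -4(%ebp), and the counter at -8(%ebp) is assumed to start at 1. Note that cmpl -8(%ebp),%edx followed by jl jumps when n < i, that is, once the counter has become strictly greater than the input, not when it is equal.

```c
/* Literal transcription of .L2/.L3; n is the value read by scanf */
int loop_goto(int n)
{
    int sum = 0;   /* movl $0,-12(%ebp) */
    int i = 1;     /* counter at -8(%ebp), assumed to start at 1 */
L2:
    if (n < i)     /* movl -4(%ebp),%edx ; cmpl -8(%ebp),%edx ; jl .L3 */
        goto L3;
    sum += i;      /* movl -8(%ebp),%edx ; addl %edx,-12(%ebp) */
    i++;           /* incl -8(%ebp) */
    goto L2;       /* jmp .L2 */
L3:
    return sum;
}

/* The same loop after refactoring: sum = 1 + 2 + ... + n */
int loop_for(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```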

    pushl -12(%ebp)
    pushl $.LC0 ; we push the argument to print function
    call printf ; prints result on screen

Once again, this does something like printf("%d", z), where z is the variable at -12(%ebp).
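Since the format is again .LC0, the program prints just the digits, with no newline. A small sketch using snprintf (the helper name is hypothetical) shows the exact characters printf would write for a sum of 10.

```c
#include <stdio.h>

/* Format sum exactly as printf("%d", sum) would, into buf.
 * Returns the number of characters written (excluding the NUL). */
int format_sum(char *buf, size_t size, int sum)
{
    return snprintf(buf, size, "%d", sum);
}
```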

    ret ; returns to address located on the top of the stack

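It might be more accurate to say "returns from the function". That's because leave has just torn down the stack frame, so the top of the stack holds the return address. By the calling convention, the value returned from your function is kept in eax. That's also why eax was cleared earlier: to return zero.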
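The zeroing idiom is easy to check: any value XORed with itself is 0, so xorl %eax,%eax sets eax to 0 regardless of its previous contents, with a shorter encoding (2 bytes) than movl $0,%eax (5 bytes). The hypothetical helper below only illustrates the arithmetic; the 0 left in eax is what the caller of main receives, just like return 0; in C.

```c
#include <stdint.h>

/* xorl %eax,%eax: x ^ x == 0 for every x, so eax becomes 0 */
uint32_t xorl_self(uint32_t eax)
{
    return eax ^ eax;
}
```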
