Understanding stack alignment enforcement

Question

Consider the following C code:

#include <stdint.h>

void func(void) {
   uint32_t var = 0;
   return;
}

The unoptimized (i.e. -O0) assembly code that GCC 4.7.2 generates for the code above is:

func:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl $0, -4(%ebp)
    nop
    leave
    ret

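According to the stack alignment requirements of the System V ABI, the stack must be aligned to 16 bytes before every call instruction (the stack boundary defaults to 16 bytes when it is not changed with the -mpreferred-stack-boundary option). Therefore, ESP modulo 16 must be zero just before a function call.

Given these stack alignment requirements, I assume the following representation of the stack's state just before the leave instruction executes is correct: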

Size (bytes)       Stack          ESP mod 16      Description
-----------------------------------------------------------------------------------

             |     . . .      |             
             ------------------........0          at func call
         4   | return address |
             ------------------.......12          at func entry
         4   |   saved EBP    |
     ---->   ------------------........8          EBP is pointing at this address
     |   4   |      var       |
     |       ------------------........4
 16  |       |                |
     |  12   |                |
     |       |                |
     ---->   ------------------........8          after allocating 16 bytes
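
The ESP mod 16 column above can be checked by replaying the prologue arithmetically. The following is only a sketch: the helper names esp_mod_after and esp_mod_in_func are hypothetical, and the model tracks ESP modulo 16 only, assuming ESP is a multiple of 16 right before call func.

```c
#include <stdint.h>

/* Hypothetical helper: ESP modulo `boundary` after the stack pointer moves
   down by `bytes` (the x86 stack grows toward lower addresses).
   Assumes esp_mod < boundary. */
uint32_t esp_mod_after(uint32_t esp_mod, uint32_t bytes, uint32_t boundary) {
    return (esp_mod + boundary - bytes % boundary) % boundary;
}

/* Replays the prologue of func() from the question and returns ESP mod 16
   after `subl $sub_bytes, %esp`. */
uint32_t esp_mod_in_func(uint32_t sub_bytes) {
    const uint32_t boundary = 16;               /* 2^4, the default stack boundary */
    uint32_t m = 0;                             /* assumed: aligned before call func */
    m = esp_mod_after(m, 4, boundary);          /* call pushes the return address */
    m = esp_mod_after(m, 4, boundary);          /* pushl %ebp */
    m = esp_mod_after(m, sub_bytes, boundary);  /* subl $sub_bytes, %esp */
    return m;
}
```

With these assumptions, esp_mod_in_func(16) evaluates to 8, matching the last row of the diagram, while esp_mod_in_func(8) evaluates to 0.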

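With this representation of the stack in mind, two points puzzle me:

1. var is obviously not aligned to 16 bytes on the stack. This seems to contradict what I have read in this answer to this question (emphasis mine):

-mpreferred-stack-boundary=n where the compiler tries to keep items on the stack aligned to 2^n.

In my case -mpreferred-stack-boundary was not provided, so according to this section of GCC's documentation it defaults to 4, i.e. a 2^4 = 16-byte boundary (and I do get the same result with -mpreferred-stack-boundary=4).
2. The purpose of allocating 16 bytes on the stack (i.e. the subl $16, %esp instruction) instead of just 8: after allocating 16 bytes, the stack is not aligned to 16 bytes, and no memory is saved either. Allocating only 8 bytes would leave the stack aligned to 16 bytes without wasting the additional 8 bytes.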

Answer 1

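This answer aims to develop further some of the comments written above.

First, building on Margaret Bloom's comment, consider the following modification of the func() function from the original post: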

#include <stdint.h>

void bar(void);    

void func(void) {
   uint32_t var = 0;
   bar(); // <--- function call
   return;
}

Unlike the original func() function, the redefined one contains a function call to bar().

This time the generated assembly code is:

func:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    movl $0, -12(%ebp)
    call bar
    nop
    leave
    ret

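Note that the instruction subl $24, %esp does align the stack to 16 bytes (the subl $16, %esp instruction in the original func() function did not).

Since the redefined func() now contains a function call (i.e. call bar), the stack must be aligned to 16 bytes before the call instruction executes. The previous func() did not call any function at all, so there was no need for its stack to be aligned to 16 bytes.

Clearly, at least 4 bytes must be allocated on the stack for the var variable. To align the stack to 16 bytes, 4 additional bytes need to be allocated, for 8 bytes in total.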
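
The arithmetic behind that minimum can be sketched in C. The function names below are hypothetical, and the model assumes ESP mod 16 is 8 right after pushl %ebp, as in the question's diagram; it says nothing about how GCC actually picks its frame size.

```c
#include <stdint.h>

/* Hypothetical helper: smallest number of bytes to subtract from ESP so that
   `locals` bytes fit and ESP ends up a multiple of `boundary`.
   `esp_mod` is ESP % boundary right after `pushl %ebp` (esp_mod < boundary). */
uint32_t min_aligned_frame(uint32_t locals, uint32_t esp_mod, uint32_t boundary) {
    /* ESP % boundary once only the locals have been allocated */
    uint32_t rest = (esp_mod + boundary - locals % boundary) % boundary;
    return locals + rest;  /* pad by `rest` bytes to reach the boundary */
}

/* Nonzero if subtracting `frame` bytes leaves ESP a multiple of `boundary`. */
int frame_keeps_alignment(uint32_t frame, uint32_t esp_mod, uint32_t boundary) {
    return (esp_mod + boundary - frame % boundary) % boundary == 0;
}
```

Under these assumptions, min_aligned_frame(4, 8, 16) is 8. A 24-byte frame also keeps the stack aligned, because it is 8 plus one full 16-byte step, whereas a 16-byte frame does not.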

One may ask why 24 bytes are allocated to align the stack when just 8 bytes would do. Well, paraphrasing part of Ped7g's comment, which also answers this question:

Also keep in mind the C compiler is not obliged to produce optimal code in any kind of metric, including stack space usage. While it will try hard (and from playing around with gcc 4.7.2 on godbolt it looks good, the junk space is result only of the alignment), there's no language-breaking problem if it would fail and allocate 16B more junk than truly needed (especially in unoptimized code).

Answer 2

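Looking at the machine code generated at -O0 is usually a futile exercise. The compiler emits whatever works, in the simplest possible way. This often leads to bizarre artifacts.

Stack alignment refers only to the alignment of the stack frame. It is not directly related to the alignment of objects on the stack. GCC allocates on-stack objects with the alignment they require. This is simpler if GCC knows that the stack frame already provides sufficient alignment; if it does not, GCC uses a frame pointer and performs explicit alignment.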
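
The distinction between frame alignment and object alignment can be observed from C itself. The sketch below assumes a C11 compiler such as GCC and uses hypothetical function names; it requests 16-byte alignment for one local with _Alignas and leaves another at its natural alignment, then reports each address modulo the relevant alignment.

```c
#include <stdint.h>

/* A local that explicitly requests 16-byte alignment. The compiler must honor
   the request regardless of where the frame itself starts. */
uintptr_t aligned_local_addr_mod16(void) {
    _Alignas(16) volatile uint32_t var = 0;  /* volatile keeps it on the stack */
    return (uintptr_t)&var % 16;
}

/* A plain uint32_t local only needs its natural alignment, so its address is
   a multiple of _Alignof(uint32_t) but not necessarily of 16. */
uintptr_t plain_local_addr_mod_natural(void) {
    volatile uint32_t var = 0;
    return (uintptr_t)&var % _Alignof(uint32_t);
}
```

Both functions are expected to return 0. The plain local in the question's func() sits at an address that is a multiple of 4 but not of 16, which is consistent with var needing only 4-byte alignment.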

gcc

assembly

x86

memory-alignment

abi