Why does scanf() appear to load an address lower than that of the buffer I am writing to?

Question

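I have written a C program with an intentional buffer overflow for a class assignment. In my program, main accepts a name from the user into a character array of length 50. The name is then passed, as a character array of length 50, to a function that prints the message "Hello, user!", where "user" is replaced with the name the user provided. I do not do any length checking in the scanf() call; it simply reads input until a newline character is encountered. As a result, I am able to overrun the buffer, overwrite the return address of main and cause a segmentation fault.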

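When I disassemble main in GDB, I can see that the address [ebp - 0x3a] is loaded and pushed onto the stack to be used as the argument for the scanf() function (see the disassembly below). I assumed that this is the start of the buffer, until I converted 0x3a to decimal and found out its value was 58. Why would an additional 8 bytes be assigned to the character buffer? And why, when I try to run this buffer overflow, do I only need 54 characters to overrun the buffer when the buffer seems to start 58 bytes away from ebp and 62 bytes away from the return address? Again, I calculated the distance to the return address by using ebp-0x3a.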

Code:

#include <stdio.h>
#include <string.h>
void printHello(char fname[]);
int main() {
 
    char name[50]; 
    printf("Please enter a name to print a hello message!"); 
    scanf("%[^\n]", name); 

    printHello(name); 
    return 0;
}
void printHello(char fname[50]){

    int strLen = strlen(fname);

    printf("Hello, ");
    for (int i = 0; i < strLen; i++) {
        printf("%c", fname[i]);
    }
    printf("!\n");
}

Disassembly of main:

Dump of assembler code for function main:
   0x080484fb <+0>: lea    ecx,[esp+0x4]
   0x080484ff <+4>: and    esp,0xfffffff0
   0x08048502 <+7>: push   DWORD PTR [ecx-0x4]
   0x08048505 <+10>:    push   ebp
   0x08048506 <+11>:    mov    ebp,esp
   0x08048508 <+13>:    push   ecx
   0x08048509 <+14>:    sub    esp,0x44
   0x0804850c <+17>:    sub    esp,0xc
   0x0804850f <+20>:    push   0x8048640
   0x08048514 <+25>:    call   0x8048390 <printf@plt>
   0x08048519 <+30>:    add    esp,0x10
   0x0804851c <+33>:    sub    esp,0x8
   0x0804851f <+36>:    lea    eax,[ebp-0x3a]
   0x08048522 <+39>:    push   eax
   0x08048523 <+40>:    push   0x804866e
   0x08048528 <+45>:    call   0x80483e0 <__isoc99_scanf@plt>
   0x0804852d <+50>:    add    esp,0x10
   0x08048530 <+53>:    sub    esp,0xc
   0x08048533 <+56>:    lea    eax,[ebp-0x3a]
   0x08048536 <+59>:    push   eax
   0x08048537 <+60>:    call   0x804854c <printHello>
   0x0804853c <+65>:    add    esp,0x10
   0x0804853f <+68>:    mov    eax,0x0
   0x08048544 <+73>:    mov    ecx,DWORD PTR [ebp-0x4]
   0x08048547 <+76>:    leave  
   0x08048548 <+77>:    lea    esp,[ecx-0x4]
   0x0804854b <+80>:    ret    
End of assembler dump.

Answer 1

I assumed that this is the start of the buffer, until I converted 0x3a to decimal and found out its value was 58.

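It is the start of the buffer, but why would you assume that it should be at a particular offset from ebp? There is no written rule saying that a function's stack frame must be exactly the size of its local variables. The compiler can do pretty much whatever it wants. In fact, it may end up using more space to save register values, to maintain alignment, or even just waste it if it wants to. This question has been asked countless times, and sadly there is no definitive answer; you might as well become a GCC developer to try to find out :')

Some existing questions with good answers, for reference: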

Why does GCC allocate more space than necessary on the stack, beyond what's needed for alignment?

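In addition to the above, you are compiling without any optimization, as I can tell from pointless instruction sequences such as add esp,0x10; sub esp,0x8. When optimizations are not enabled, GCC likes to move things to and from the stack a lot, and it does not pay much attention to managing stack space optimally.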

Why when I try to run this buffer overflow, do I only need 54 characters to overrun the buffer

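Technically speaking, you only need 50 characters of input to overflow the buffer, because scanf() automatically appends the terminator \0 as a 51st byte. However, that may not be enough to "break" anything.

To make this clearer, let's assume that esp is 0x1000 when main() is first called. If my math is correct, the stack layout at the time scanf() is called (right before the call executes) should look like this: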

esp -> 0x0fa0: 0x804866e // scanf() arg1
       0x0fa4: 0x0fbe    // scanf() arg2
       0x0fa8: ????
       0x0fac: ????
       0x0fb0: ????
       0x0fb4: ????
       0x0fb8: ????
       0x0fbc: ??AA <-- eax == 0x0fbe == ebp-0x3a
       0x0fc0: AAAA   
       0x0fc4: AAAA
       0x0fc8: AAAA
       0x0fcc: AAAA
       0x0fd0: AAAA
       0x0fd4: AAAA
       0x0fd8: AAAA
       0x0fdc: AAAA
       0x0fe0: AAAA
       0x0fe4: AAAA
       0x0fe8: AAAA
       0x0fec: AAAA
       0x0ff0: ????
       0x0ff4: 0x1004 // saved original esp+0x4, later used to restore esp
ebp -> 0x0ff8: <saved ebp>
       0x0ffc: ????   // copy of the return address (push DWORD PTR [ecx-0x4])
       0x1000: ????   // 0x1000 original esp at start of main(); holds main's return address
       0x1004: ????

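In the diagram above, the As represent your array, which starts at 0x0fbe.

You are most likely hitting the segmentation fault at exactly 54 characters (+1 terminator = 55) because that is the bare minimum needed to alter the saved esp+0x4 value (0x1004 in the example). The array starts at ebp-0x3a and that saved value sits at ebp-0x4, so 0x3a - 0x4 = 54 bytes lie in front of it, and the 55th byte written, the terminator, lands on its lowest byte. The altered value then causes trouble when it is later used to restore esp (mov ecx,DWORD PTR [ebp-0x4]; leave; lea esp,[ecx-0x4]), ending up with an invalid stack pointer.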


Tags: c, x86, assembly, buffer-overflow