How can gcc/clang assume a string constant's address is 32-bit?

Question

If I compile this program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("hello world!\n");
    return 0;
}

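for x86-64, the asm output uses movl $.LC0, %edi / call puts. (See full asm output / compile options on godbolt.)

My question is: how can GCC know that the string's address fits in a 32-bit immediate operand? Why doesn't it need to use movabs $.LC0, %rdi (i.e. a mov r64, imm64, rather than a zero- or sign-extended imm32)?

AFAIK, nothing says the loader has to load the data section at any particular address. If the string is stored at some address above 1ULL << 32, the upper bits will be ignored by the movl. I get similar behavior with clang, so I don't think this is unique to GCC.

The reason I care is that I want to create my own data segment that lives in memory at an arbitrary address of my choosing (potentially above 2^32).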

Answer 1

I can confirm that this happens with a 64-bit compilation:

gcc -O1 foo.c

Then with objdump -d a.out (notice also that printf("%s\n") can be optimized into puts!):

0000000000400536 <main>:
  400536:       48 83 ec 08             sub    $0x8,%rsp
  40053a:       bf d4 05 40 00          mov    $0x4005d4,%edi
  40053f:       e8 cc fe ff ff          callq  400410 <puts@plt>
  400544:       b8 00 00 00 00          mov    $0x0,%eax
  400549:       48 83 c4 08             add    $0x8,%rsp
  40054d:       c3                      retq   
  40054e:       66 90                   xchg   %ax,%ax

The reason is that GCC defaults to -mcmodel=small, where static data is linked in the bottom 2 GB of the address space.

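Notice that string constants do not go into the data segment; they live within the code segment instead, unless -fwritable-strings is used. Also, if you want to relocate the object code freely in memory, you would probably want to compile with -fpic to make the code RIP-relative instead of putting 64-bit addresses everywhere.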

Answer 2

From the GCC manual:

https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/i386-and-x86_002d64-Options.html

3.17.15 Intel 386 and AMD x86-64 Options

-mcmodel=small

Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=kernel

Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.

-mcmodel=medium

Generate code for the medium model: The program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or bss sections and can be located above 2GB. Programs can be statically or dynamically linked.

-mcmodel=large

Generate code for the large model: This model makes no assumptions about addresses and sizes of sections.

https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

3.18.1 AArch64 Options

-mcmodel=tiny

Generate code for the tiny code model. The program and its statically defined symbols must be within 1GB of each other. Pointers are 64 bits. Programs can be statically or dynamically linked. This model is not fully implemented and mostly treated as ‘small’.

-mcmodel=small

Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=large

Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Pointers are 64 bits. Programs can be statically linked only.
