Return value of get_pc_thunk not being used

Question

I have this program:

static int aux() {
    return 1;
}
int _start(){
    int a = aux();
    return a;
}

When I compile it with GCC using the flags -nostdlib -m32 -fpie and generate an ELF binary, I get the following assembly code:

00001000 <aux>:
    1000:   f3 0f 1e fb             endbr32 
    1004:   55                      push   %ebp
    1005:   89 e5                   mov    %esp,%ebp
    1007:   e8 2d 00 00 00          call   1039 <__x86.get_pc_thunk.ax>
    100c:   05 e8 2f 00 00          add    $0x2fe8,%eax
    1011:   b8 01 00 00 00          mov    $0x1,%eax
    1016:   5d                      pop    %ebp
    1017:   c3                      ret    

00001018 <_start>:
    1018:   f3 0f 1e fb             endbr32 
    101c:   55                      push   %ebp
    101d:   89 e5                   mov    %esp,%ebp
    101f:   83 ec 10                sub    $0x10,%esp
    1022:   e8 12 00 00 00          call   1039 <__x86.get_pc_thunk.ax>
    1027:   05 cd 2f 00 00          add    $0x2fcd,%eax
    102c:   e8 cf ff ff ff          call   1000 <aux>
    1031:   89 45 fc                mov    %eax,-0x4(%ebp)
    1034:   8b 45 fc                mov    -0x4(%ebp),%eax
    1037:   c9                      leave  
    1038:   c3                      ret    

00001039 <__x86.get_pc_thunk.ax>:
    1039:   8b 04 24                mov    (%esp),%eax
    103c:   c3                      ret

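I know that the get_pc_thunk function is used to implement position-independent code on x86, but I don't understand why it is being used in this case. My questions are: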

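The function returns the address of the next instruction in the eax register, and in both usages an add instruction then makes eax point to the GOT. Normally (at least when accessing a global variable), this eax register would be used immediately to access the variable through the table. In this case, however, eax is completely ignored. What is going on?
I also don't understand why get_pc_thunk appears in the code at all, since both call instructions use relative addresses. Since the addresses are relative, shouldn't they be position-independent out of the box?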

Thanks!

Answer 1

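You haven't enabled optimization, so GCC emits the function prologues without considering whether they are useful for the function in question.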
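As a quick check on what those unused prologues compute, the arithmetic from the disassembly can be written out. The thunk returns the address of the instruction after the call (0x100c in aux, 0x1027 in _start), and the add immediate is applied to it. The helper name got_base below is hypothetical and only illustrates the sum; it is not something GCC emits.

```c
#include <stdint.h>

/* Hypothetical helper: models "call get_pc_thunk; add $imm, %eax".
   return_address is what the thunk loads from (%esp), i.e. the address
   of the instruction following the call. */
uint32_t got_base(uint32_t return_address, uint32_t add_immediate) {
    return return_address + add_immediate;
}
```

With the values from the listing, got_base(0x100c, 0x2fe8) and got_base(0x1027, 0x2fcd) both come out as 0x3ff4, so both prologues set up the same base address even though neither function goes on to use it.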

To see the result of get_pc_thunk being used, access a global variable.
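A minimal sketch of that suggestion, assuming a hypothetical global named counter and a function named entry standing in for _start (neither appears in the original program). Compiled with -m32 -fpie, the read of counter should be addressed relative to the register set up by the thunk, though the exact instructions depend on the GCC version and optimization level.

```c
/* Hypothetical global, added only to give the GOT base register a use. */
int counter = 41;

int aux(void) {
    /* In 32-bit PIE code this load needs the GOT base, so the value
       produced by the get_pc_thunk call and the following add is
       actually consumed here. */
    return counter + 1;
}

/* Stands in for _start so the sketch also links with a normal libc. */
int entry(void) {
    return aux();
}
```

Inspecting the output of gcc -m32 -fpie -S on this file is one way to compare it with the listing in the question.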

To remove the useless calls to get_pc_thunk, enable optimization, for example by adding -O2 to the GCC command line.

Answer 2

If, however, I move the aux() function to another compilation unit, the get_pc_thunk function remains being called, even with -O2, and, again, its return value is being ignored.

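IIRC, the EBX=GOT pointer is assumed / required by the PLT itself. The call has to go through the PLT because, at the time this compilation unit is compiled, it is not known that the definition of aux will be statically linked. (https://godbolt.org/z/Yere9o shows the effect for a main that has only a prototype for aux(), not a definition it can inline.)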

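With the "hidden" ELF visibility attribute, we can make it go away, because the compiler then knows it doesn't need indirection through the PLT: the call rel32 will be resolved at static link time without needing a runtime relocation: https://godbolt.org/z/73dGKq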

__attribute__((visibility("hidden"))) int aux(void);
int _start(){
    int a = aux();
    return a;
}

gcc10.1 -O2 -m32 -fpie

_start:
        jmp     aux

IMO it makes sense to have the call in object files generated for compilation units that are calling external functions, but I don't understand why the linker (or the 'flow') is not removing them in the final binary.

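@felipeek: Good question. The linker doesn't know when it could relax call foo@plt to call foo, because that would also disable symbol interposition. Even if there is a definition of foo in this ELF shared object, a definition in an object loaded earlier can override it / take priority. I think this "problem" is due to PIE executables having evolved out of a hack: putting an entry point in a shared object, which the dynamic linker is then willing to run. i.e. at the ELF level, a PIE executable is the same as a .so, and -fpie and -fPIC look the same to the linker.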

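The linker can go the other way, though: when making a normal non-PIE executable (ELF type = EXEC), it can turn call foo into call foo@plt, but the PLT itself doesn't have to be PIE/PIC, so it doesn't need EBX=GOT.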

Are we saying that all calls to other compilation units will invoke a totally unnecessary call in the final binary when PIE is required?

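No, only in 32-bit PIE code where you fail to tell the compiler that it is an "internal" symbol using ELF "hidden" visibility. You can even have 2 names for the same symbol, one of them with hidden visibility, so you can make a function that libraries can resolve by name while still calling it from within the executable with a simple call rel32 instead of a clunky indirect call through the PLT.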
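The two-names idea can be sketched with GCC's alias attribute on an ELF target. The names aux_internal and entry are hypothetical, and this is only one way to arrange it, assuming both names live in the same compilation unit (which the alias attribute requires).

```c
/* Hidden name: callers inside the executable can bind to it directly,
   so the compiler has no reason to route the call through the PLT. */
__attribute__((visibility("hidden"))) int aux_internal(void) {
    return 1;
}

/* Second name for the same code with default visibility, so other
   modules can still resolve it by name. */
int aux(void) __attribute__((alias("aux_internal")));

/* Stands in for _start; it uses the hidden name for the internal call. */
int entry(void) {
    return aux_internal();
}
```

In a multi-file program, other compilation units would need a declaration of aux_internal that also carries the hidden visibility attribute for the compiler to skip the PLT there as well.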

This is one of the downsides of PIE. Even in 64-bit code, you get jmp aux@PLT without that attribute. (Or with -fno-plt, an indirect call using RIP-relative addressing of the GOT entry.)

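32-bit PIE really hurts performance, by about 15% on average (measured a while ago on the CPUs of the time, so it might be different now). The impact is much smaller on x86-64, where RIP-relative addressing is available, around a couple of percent. "32-bit absolute addresses no longer allowed in x86-64 Linux?" has some links with more details.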

