Why are imported functions called so indirectly in Linux?

Question

Consider a simple C program:

#include <stdio.h>

int main()
{
    puts("Hello");
    return 0;
}

Running it under GDB, with LD_BIND_NOW=1 set for simplicity, I observe the following:

$ gdb -q ./test -ex 'b main' -ex r
Reading symbols from ./test...done.
Breakpoint 1 at 0x8048420
Starting program: /tmp/test 

Breakpoint 1, 0x08048420 in main ()
(gdb) disas
Dump of assembler code for function main:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
=> 0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x10
   0x08048426 <+9>:     mov    DWORD PTR [esp],0x8048500
   0x0804842d <+16>:    call   0x80482c0 <puts@plt>
   0x08048432 <+21>:    mov    eax,0x0
   0x08048437 <+26>:    leave  
   0x08048438 <+27>:    ret    
End of assembler dump.
(gdb) si 4
0x080482c0 in puts@plt ()
(gdb) disas
Dump of assembler code for function puts@plt:
=> 0x080482c0 <+0>:     jmp    DWORD PTR ds:0x8049670
   0x080482c6 <+6>:     push   0x0
   0x080482cb <+11>:    jmp    0x80482b0
End of assembler dump.
(gdb) si
_IO_puts (str=0x8048500 "Hello") at ioputs.c:35
35      {
(gdb)

Apparently, even after the PLT entry has been bound to the function, the call still takes two steps:

call puts@plt
jmp [ds:puts_address]

Compare this with how it is done in Win32, where every call to an imported function, e.g. MessageBoxA, looks like

call [ds:MessageBoxA_address]

i.e. a single step.

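Even if we take lazy binding into account, it is still possible to have e.g. [puts_address] contain the call to _dl_runtime_resolve or whatever is needed on startup, so the one-step indirect call would still work.

So what is the reason for this complication? Is it some kind of branch prediction or branch target prediction optimization?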

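Edit in response to the answer (v2)

What I actually meant is that the call PLT; jump [GOT] indirection is redundant even in the context of lazy binding. Consider the following example (it relies on non-optimized compilation by gcc):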

#include <stdio.h>

int main()
{
    for(int i=0;i<3;++i)
    {
        puts("Hello");
        __asm__ __volatile__("nop");
    }
    return 0;
}

Running it in GDB (with LD_BIND_NOW unset):

$ gdb ./test -ex 'b main' -ex r -ex disas/r
Reading symbols from ./test...done.
Breakpoint 1 at 0x8048387
Starting program: /tmp/test 

Breakpoint 1, 0x08048387 in main ()
Dump of assembler code for function main:
   ...
   0x08048397 <+19>:    c7 04 24 80 84 04 08    mov    DWORD PTR [esp],0x8048480
   0x0804839e <+26>:    e8 11 ff ff ff  call   0x80482b4 <puts@plt>
   0x080483a3 <+31>:    90      nop
   0x080483a4 <+32>:    83 44 24 1c 01  add    DWORD PTR [esp+0x1c],0x1
   ...

Disassembling puts@plt shows the address of the GOT entry for puts:

(gdb) disas 'puts@plt'
Dump of assembler code for function puts@plt:
   0x080482b4 <+0>:     jmp    DWORD PTR ds:0x8049580
   0x080482ba <+6>:     push   0x10
   0x080482bf <+11>:    jmp    0x8048284
End of assembler dump.

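So the GOT entry is at 0x8049580. We can patch the code of main(), changing e8 11 ff ff ff 90 (at address 0x804839e) into an indirect call through the GOT entry, i.e. call [ds:0x8049580], which encodes as ff 15 80 95 04 08. The 8-byte write below also rewrites the first two bytes of the following add instruction (83 44) with their original values: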

(gdb) set *(uint64_t*)0x804839e=0x44830804958015ff
(gdb) disas/r
Dump of assembler code for function main:
   ...
   0x08048397 <+19>:    c7 04 24 80 84 04 08    mov    DWORD PTR [esp],0x8048480
   0x0804839e <+26>:    ff 15 80 95 04 08       call   DWORD PTR ds:0x8049580
   0x080483a4 <+32>:    83 44 24 1c 01  add    DWORD PTR [esp+0x1c],0x1
   ...

Continuing the program after this patch still gives:

(gdb) c
Continuing.
Hello
Hello
Hello
[Inferior 1 (process 14678) exited normally]

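That is, the first call performed the lazy binding, and the next two simply used the result of the fixup (trace it yourself if you don't believe it).

So the question remains: why doesn't GCC use this way of calling?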

Answer 1

Apparently, after binding the PLT entry to the function, we still do a two-step call:
call puts@plt
jmp [ds:puts_address]

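The compiler and linker cannot know that you will set LD_BIND_NOW=1 at runtime, so they cannot go back in time and rewrite the generated code to use call [puts_address] directly.

See also the recent -fno-plt patches on the gcc-patches mailing list.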

Win32

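Win32 does not allow lazy function resolution (at least not by default). In other words, its compiled and linked code only works as if LD_BIND_NOW=1 had been hard-coded at compile/link time. Some history here.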

it's still possible to have e.g. [puts_address] contain the call to _dl_runtime_resolve or whatever is needed on startup, so the one-step indirect call would still work.

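I think you are confused. [puts_address] does contain _dl_runtime_resolve at startup (well, not exactly. Gory details). Your question is "why can't the call go to [puts_address] directly, and why is puts@plt needed?".

The answer is that _dl_runtime_resolve needs to know which function it is resolving. It cannot deduce that from the arguments to puts. The whole point of puts@plt is to provide that information to _dl_runtime_resolve.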

Update:

Why can't call <puts@plt> be replaced with call *[puts@GOT].

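The answer is in the first -fno-plt patch I referenced:

"This comes with caveats. This cannot be generally done for all functions marked extern as it is impossible for the compiler to say if a function is "truly extern" (defined in a shared library). If a function is not truly extern (ends up defined in the final executable), then calling it indirectly is a performance penalty as it could have been a direct call."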

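You may then ask: why can't the linker (which knows whether puts is defined in the same binary or in a separate DSO) rewrite call *[puts@GOT] back into call <puts@plt>?

The answer is that these are different instructions (different opcodes), and linkers generally do not change instructions; they only change addresses within instructions (in response to relocation entries).

In theory the linker could do that, but nobody has bothered yet.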


linux

glibc

dynamic-linking

dlopen