Can't call C standard library function on 64-bit Linux from assembly (yasm) code

Question

I have a function foo written in assembly and built with yasm and GCC on 64-bit Linux (Ubuntu). It simply prints a message to stdout using puts(). Here is how it looks:

bits 64

extern puts
global foo

section .data

message:
  db 'foo() called', 0

section .text

foo:
  push rbp
  mov rbp, rsp
  lea rdi, [rel message]
  call puts
  pop rbp
  ret

It is called from a C program compiled with GCC:

extern void foo();

int main() {
    foo();
    return 0;
}

Build commands:

yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo

Here's the problem:

When run, the program prints an error message and immediately segfaults during the call to puts:

./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault

After disassembling with objdump, I found that the call goes to the wrong address:

0000000000000660 <foo>:
 660:   90                      nop
 661:   55                      push   %rbp
 662:   48 89 e5                mov    %rsp,%rbp
 665:   48 8d 3d a4 09 20 00    lea    0x2009a4(%rip),%rdi
 66c:   e8 00 00 00 00          callq  671 <foo+0x11>      <-- here
 671:   5d                      pop    %rbp
 672:   c3                      retq

(671 is the address of the next instruction, not the address of puts)

However, if I rewrite the same code in C, the call is made differently:

645:   e8 c6 fe ff ff          callq  510 <puts@plt>

That is, it references puts through the PLT.

Is it possible to tell yasm to generate similar code?

Answer 1

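The 0xe8 opcode is followed by a signed offset that is applied to the PC (which has already advanced to the next instruction by then) to compute the branch target. That is why objdump interprets the branch target as 0x671.

YASM is emitting zeros because it has most likely placed a relocation on that offset, which is how it asks the loader to fill in the correct offset for puts at load time. The loader hits an overflow when computing the relocation, which suggests that puts is farther from your call than a 32-bit signed offset can represent. So the loader cannot fix up this instruction, and you get a crash.

66c: e8 00 00 00 00 shows the unfilled address. If you look at the relocation table, you should see a relocation at 0x66d. It is not uncommon for an assembler to fill addresses/offsets that carry relocations with all zeros.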
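The target arithmetic described above can be sketched in C. This is only an illustration: the function name call_rel32_target is made up for this example, and the addresses come from the objdump listings in the question.

```c
#include <stdint.h>

/* Target of an x86-64 `call rel32` (opcode 0xE8).
 * insn_addr is the address of the 0xE8 byte, rel32 the signed displacement.
 * The instruction is 5 bytes long, so the displacement is applied to
 * insn_addr + 5, the address of the next instruction.
 * (call_rel32_target is a hypothetical helper name, not a library function.) */
uint64_t call_rel32_target(uint64_t insn_addr, int32_t rel32)
{
    uint64_t next_insn = insn_addr + 5;
    return next_insn + (uint64_t)(int64_t)rel32;  /* sign-extend, then add */
}
```

With the unfilled displacement of 0, the call at 0x66c resolves to 0x671, which is what objdump printed. The C-compiled call at 0x645 has the bytes c6 fe ff ff, i.e. a displacement of -0x13a, and resolves to 0x510 (puts@plt).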

This page indicates that YASM has a WRT directive that can control the use of .got, .plt, etc.

Per S9.2.5 of the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).

Answer 2

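TL:DR: 3 options:

1. Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you (transparently) when you write call puts.
2. call puts wrt ..plt like gcc -fPIE would do.
3. call [rel puts wrt ..got] like gcc -fno-plt would do.

The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.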

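Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).

I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts@plt. A puts PLT entry is still generated, but the call doesn't go there.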

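At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fix up the call rel32. But the symbol is more than +-2^31 away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address.)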
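A small C sketch of that overflow, assuming the fixup simply keeps the low 32 bits of the distance, as the paragraph above describes. The helper names are hypothetical, and the example addresses in the note below are made up to resemble a typical PIE and libc mapping; they are not taken from the question.

```c
#include <stdint.h>
#include <stdbool.h>

/* Can a 5-byte `call rel32` located at call_addr reach target?
 * (Hypothetical helper; assumes both addresses are below 2^63.) */
bool rel32_reaches(uint64_t call_addr, uint64_t target)
{
    int64_t delta = (int64_t)target - (int64_t)(call_addr + 5);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Where the call lands if only the low 32 bits of the distance are
 * stored in the instruction and later sign-extended by the CPU. */
uint64_t truncated_call_target(uint64_t call_addr, uint64_t target)
{
    uint64_t next = call_addr + 5;
    uint32_t low = (uint32_t)(target - next);          /* keep low 32 bits */
    int64_t disp = (low & 0x80000000u) ? (int64_t)low - 0x100000000LL
                                       : (int64_t)low; /* sign-extend */
    return next + (uint64_t)disp;
}
```

For a call at the made-up address 0x55555555466c and a puts at 0x7ffff7a649c0, rel32_reaches returns false, and truncated_call_target returns an address whose low 32 bits match those of puts while the upper bits do not.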

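Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.

When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts@plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function).

Anyway, this is why gcc emits call puts@plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.

See What does @plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)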

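BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:

extern void foo(); in C (before C23) means extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.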

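asm improvements

You can also put message in section .rodata (read-only data, linked as part of the text segment).

You don't need a stack frame, just something to align the stack by 16 before the call. A dummy push rax will do it.

Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.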

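If you want to make PIE executables (or shared libraries), you have two options

call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword.)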

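WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.

Normally you would use default rel at the top of your file so you can actually write call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.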

call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early binding, not lazy dynamic linking.)
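As a rough C analogy for that memory-indirect call (not what the assembler or linker actually emits): the variable got_slot_puts below is a hypothetical stand-in for one GOT slot. In a real PIE the dynamic linker writes the resolved address into the slot at load time; here ordinary C initialization plays that role.

```c
#include <stdio.h>

/* Hypothetical stand-in for a GOT slot: a pointer in writable memory that
 * holds the resolved address of puts. */
int (*got_slot_puts)(const char *) = puts;

/* Roughly what `call [rel puts wrt ..got]` does: load the function pointer
 * from memory, then call through it. No PLT stub is involved. */
int call_puts_via_slot(const char *msg)
{
    return got_slot_puts(msg);
}
```

Calling call_puts_via_slot("foo() called") prints the message through the pointer, the same way the asm version reaches puts through its GOT entry.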

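NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32 bits.

YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts@GOTPCREL(%rip), but NASM does not.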

; don't use BITS 64.  You *want* an error if you try to assemble this into a 32-bit .o

default rel          ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional

section .rodata            ; .rodata is best for constants, not .data
message:
  db 'foo() called', 0

section .text

extern puts                ; defined in libc, resolved at link / load time
global foo
foo:
    sub    rsp, 8                ; align the stack by 16

    ; PIE with PLT
    lea    rdi, [rel message]      ; needed for PIE
    call   puts WRT ..plt          ; call puts through its PLT entry
;or
    ; PIE with -fno-plt style code, skips the PLT indirection
    lea   rdi, [rel message]
    call  [rel  puts wrt ..got]
;or
    ; non-PIE
    mov    edi, message           ; more efficient, but only works in non-PIE / non-PIC
    call   puts                   ; linker will rewrite it into call puts@plt

    add   rsp,8                   ; restore the stack, undoing the sub
    ret

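In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: MacOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)

In a non-PIE executable, you might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.

(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)

If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6.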

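Linker "relaxing" of calls when the target is present at static-link time

GAS call *bar@GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)

This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden https://gcc.gnu.org/wiki/Visibility or attributes / pragmas).

Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that goes in the source file defining the function, not in the files accessing it.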

; first file: defines bar
global bar
bar:
    mov eax,231               ; exit_group
    syscall

; second file: calls bar two ways
extern bar
    call bar wrt ..plt
    call [rel bar wrt ..got]

The second file, assembled with nasm -felf64 and disassembled with objdump -drwc -Mintel to see the relocations:

0000000000000000 <.text>:
   0:   e8 00 00 00 00          call   0x5      1: R_X86_64_PLT32       bar-0x4
   5:   ff 15 00 00 00 00       call   QWORD PTR [rip+0x0]        # 0xb 7: R_X86_64_GOTPCREL    bar-0x4

After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar

0000000000401000 <_start>:
  401000:       e8 0b 00 00 00          call   401010 <bar>
  401005:       ff 15 ed 1f 00 00       call   QWORD PTR [rip+0x1fed]        # 402ff8 <.got>
  40100b:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

0000000000401010 <bar>:
  401010:       b8 e7 00 00 00          mov    eax,0xe7
  401015:       0f 05                   syscall

Note that the PLT form got relaxed to just a direct call bar, with the PLT eliminated. But the ff 15 call [rel mem] did not get relaxed to an e8 rel32.

With GAS:

_start:
        call    bar@plt
        call    *bar@GOTPCREL(%rip)

gcc -c foo.s && disas foo.o

0000000000000000 <_start>:
   0:   e8 00 00 00 00          call   5 <_start+0x5>   1: R_X86_64_PLT32       bar-0x4
   5:   ff 15 00 00 00 00       call   QWORD PTR [rip+0x0]        # b <_start+0xb>      7: R_X86_64_GOTPCRELX   bar-0x4

Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:

0000000000401000 <bar>:
  401000:       b8 e7 00 00 00          mov    eax,0xe7
  401005:       0f 05                   syscall 

0000000000401007 <_start>:
  401007:       e8 f4 ff ff ff          call   401000 <bar>
  40100c:       67 e8 ee ff ff ff       addr32 call 401000 <bar>

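Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte in the indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (Because it's too late to re-assemble and re-compute all relative branches within functions, and alignment and so on.)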
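To check the relaxed encodings by hand, here is a minimal C sketch that decodes the destination of a direct call in either form, e8 rel32 or 67 e8 rel32. The function names are hypothetical, and the decoder only understands these two byte patterns; it is not a general x86 decoder.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a little-endian 32-bit displacement at p and sign-extend it. */
static int64_t read_rel32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
}

/* Destination of a direct call located at addr. Accepts `e8 rel32`
 * (5 bytes) or `67 e8 rel32` (6 bytes, the padded form left behind by
 * relaxation). Returns 0 if the bytes match neither form.
 * (direct_call_target is a made-up name for this illustration.) */
uint64_t direct_call_target(uint64_t addr, const uint8_t *code, size_t len)
{
    size_t i = 0;
    if (len >= 6 && code[0] == 0x67)   /* address-size prefix used as padding */
        i = 1;
    if (len < i + 5 || code[i] != 0xe8)
        return 0;
    uint64_t next = addr + i + 5;      /* rel32 is relative to the next insn */
    return next + (uint64_t)read_rel32(code + i + 1);
}
```

Fed the bytes from the listing above, both e8 f4 ff ff ff at 0x401007 and 67 e8 ee ff ff ff at 0x40100c decode to 0x401000, the address of bar. The displacements differ only because the two instructions start at different addresses and have different lengths.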

That would happen for call *puts@GOTPCREL(%rip) if you statically linked libc with gcc -static.


