Exact control flow with the int assembler instruction in C, and the resulting segfault

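Consider this totally silly code:

int main() { __asm__("int $0x2"); }

This causes a segfault at runtime. 2 is the vector for NMI in Intel's IDT (section 6.3.1 of the manual).

I'm curious why a segfault occurs. What exactly is the control flow that ends in the segfault?

I'm also pasting section 6.3.3 of the manual here: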

6.3.3 Software-Generated Interrupts
The INT n instruction permits interrupts to be generated from within software by supplying an interrupt vector number as an operand. For example, the INT 35 instruction forces an implicit call to the interrupt handler for interrupt 35. Any of the interrupt vectors from 0 to 255 can be used as a parameter in this instruction. If the processor’s predefined NMI vector is used, however, the response of the processor will not be the same as it would be from an NMI interrupt generated in the normal manner. If vector number 2 (the NMI vector) is used in this instruction, the NMI interrupt handler is called, but the processor’s NMI-handling hardware is not activated. Interrupts generated in software with the INT n instruction cannot be masked by the IF flag in the EFLAGS register.

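The gate in the IDT contains a descriptor privilege level (DPL), which is the maximum (numerically largest) caller privilege level (CPL) that is permitted to invoke this entry. A real NMI, caused by an electrical signal on the CPU, is delivered with an artificial CPL of 0. That way, the kernel doesn't have to distinguish between real and fake signals.

System services that are invoked via int xx have a larger DPL, so that user code is permitted to reach them with the instruction. Depending on your kernel, int 3 (breakpoint), 4 (overflow), and 5 (bounds) may be opened up as well, to facilitate debugging, the "into" opcode, and the "bound" opcode respectively.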
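The privilege rule described above comes down to a single comparison. Here is a minimal sketch of it; the function name is hypothetical, and it models only the DPL check applied to a software INT n, not the rest of the gate validation the CPU performs. Privilege levels are numeric, with 0 the most privileged and 3 the level user code normally runs at.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only: returns true when a software
 * "int n" executed at privilege level `cpl` passes the DPL check of the
 * gate it targets. A failed check is reported as a #GP fault. */
bool software_int_permitted(unsigned cpl, unsigned gate_dpl)
{
    /* Numerically larger means less privileged, so the caller's CPL
     * must not exceed the gate's DPL. */
    return cpl <= gate_dpl;
}
```

With a gate whose DPL is 0 (as assumed here for the NMI entry) and user code at CPL 3, the function returns false, which is the case that faults. With a DPL 3 gate it returns true.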

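You have found a kernel bug. Your program is trying to perform a CPU operation that is forbidden to user-space programs (int 2), not an invalid memory access. Therefore, it ought to be sent a SIGILL (illegal instruction) signal, not a SIGSEGV.

The cause of this bug is probably that this particular forbidden operation is reported to the operating system as a "#GP fault" rather than a "#UD fault" (the terms used in the x86 architecture manual). #GP faults are also used to report invalid memory accesses, and whoever wrote the code that maps them to signals didn't bother to distinguish "actual invalid memory access" from "improper use of int reported with #GP". I observe this bug on both Linux and NetBSD, so it must be an easy mistake to make.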
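To make the mapping concrete, here is a deliberately simplified sketch of the kind of table described above. It is not code from any real kernel: the function name and enum are hypothetical, and only a handful of vectors are modeled. The vector numbers themselves are the ones the x86 architecture manual assigns.

```c
#include <signal.h>

/* x86 exception vector numbers, as assigned by the architecture manual. */
enum { VEC_DE = 0, VEC_BP = 3, VEC_UD = 6, VEC_GP = 13, VEC_PF = 14 };

/* Hypothetical, simplified vector-to-signal mapping for illustration.
 * Returns 0 for vectors this sketch does not model. */
int signal_for_vector(int vector)
{
    switch (vector) {
    case VEC_DE: return SIGFPE;   /* divide error */
    case VEC_BP: return SIGTRAP;  /* breakpoint */
    case VEC_UD: return SIGILL;   /* invalid opcode */
    case VEC_GP: return SIGSEGV;  /* every cause of #GP collapses to SIGSEGV */
    case VEC_PF: return SIGSEGV;  /* page fault */
    default:     return 0;
    }
}
```

Because the mapping is keyed on the vector alone, a forbidden int 2 and a bad memory access that both arrive as #GP come out as the same signal. Telling them apart would need a look at the fault's error code or the faulting instruction.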

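When you're debugging a problem involving signals, it is often helpful to establish a signal handler for the troublesome signal, using sigaction with SA_SIGINFO in the flags. When you set SA_SIGINFO, the handler receives two additional arguments that provide detailed information about the signal. You don't have to use these arguments in the signal handler; instead, what you do is run the program under a debugger, allow the signal to be delivered, and then inspect the details from within the debugger. Here is a modification of your program: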

#include <signal.h>
#include <unistd.h>
#include <ucontext.h>
void handler(int s, siginfo_t *si, void *uc)
{
    pause();
}
int main(void)
{
    struct sigaction sa;
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS,  &sa, 0);
    sigaction(SIGFPE,  &sa, 0);
    sigaction(SIGILL,  &sa, 0);
    sigaction(SIGSEGV, &sa, 0);
    sigaction(SIGSYS,  &sa, 0);
    sigaction(SIGTRAP, &sa, 0);
    asm("int $0x2");
}

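(The uc argument is a pointer to a ucontext_t, but that type is declared in <ucontext.h>, not <signal.h>, so the specification says you have to define the handler to take a third argument of type void *, and then cast it if you want to use it.)

I set up the handler for all of the signals that correspond to fatal, synchronous CPU exceptions, because why not. The pause is there just to make execution stop indefinitely inside the handler, so I can hit control-C to get into the debugger and the signal frame will be available.

This is what I get on Linux: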

(gdb) bt
#0  0x00007ffff7eb4af4 in __libc_pause ()
    at ../sysdeps/unix/sysv/linux/pause.c:29
#1  0x000055555555516d in handler (s=11, si=0x7fffffffd830, uc=0x7fffffffd700)
    at test.c:5
#2  <signal handler called>
#3  main () at test.c:14
(gdb) frame 1
#1  0x000055555555516d in handler (s=11, si=0x7fffffffd830, uc=0x7fffffffd700)
    at test.c:5
5       pause();
(gdb) p *si
$1 = {si_signo = 11, si_errno = 0, si_code = 128, __pad0 = 0, _sifields = {
    _pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {
      si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, 
        sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {
        sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, 
      si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x0, 
      si_addr_lsb = 0, _bounds = {_addr_bnd = {_lower = 0x0, _upper = 0x0}, 
        _pkey = 0}}, _sigpoll = {si_band = 0, si_fd = 0}, _sigsys = {
      _call_addr = 0x0, _syscall = 0, _arch = 0}}}
(gdb) p *(ucontext_t *)uc
$2 = {uc_flags = 7, uc_link = 0x0, uc_stack = {ss_sp = 0x0, ss_flags = 0, 
    ss_size = 0}, uc_mcontext = {gregs = {0, 0, 8, 582, 93824992235632, 
      140737488346656, 0, 0, 11, 140737488345936, 140737488346432, 0, 0, 0, 
      140737352200658, 140737488346272, 93824992235964, 66050, 
      12103423998558259, 18, 13, 0, 0}, fpregs = 0x7fffffffd8c0, 
    __reserved1 = {0, 1, 140737354129808, 140737488345320, 140737353799024, 
      140737354129808, 8455580781, 140737354130672}}, uc_sigmask = {__val = {
      0, 11, 128, 0 <repeats 13 times>}}, __fpregs_mem = {cwd = 0, swd = 0, 
    ftw = 0, fop = 0, rip = 140737488346656, rdp = 0, mxcsr = 895, 
    mxcr_mask = 0, _st = {{significand = {0, 0, 0, 0}, exponent = 0, 
        __glibc_reserved1 = {0, 0, 0}}, {significand = {8064, 0, 65535, 0}, 
        exponent = 0, __glibc_reserved1 = {0, 0, 0}}, {significand = {0, 0, 0, 
          0}, exponent = 0, __glibc_reserved1 = {0, 0, 0}}, {significand = {0, 
          0, 0, 0}, exponent = 0, __glibc_reserved1 = {0, 0, 0}}, {
        significand = {0, 0, 0, 0}, exponent = 0, __glibc_reserved1 = {0, 0, 
          0}}, {significand = {0, 0, 0, 0}, exponent = 0, __glibc_reserved1 = {
          0, 0, 0}}, {significand = {0, 0, 0, 0}, exponent = 0, 
        __glibc_reserved1 = {0, 0, 0}}, {significand = {0, 0, 0, 0}, 
        exponent = 0, __glibc_reserved1 = {0, 0, 0}}}, _xmm = {{element = {0, 
          0, 0, 0}} <repeats 16 times>}, __glibc_reserved1 = {
      0 <repeats 18 times>, 1179670611, 836, 7, 0, 832, 0}}, __ssp = {0, 0, 0, 
    3}}

The siginfo_t structure is basically useless; it has si_code == 128, which means "this signal was generated by the kernel but we're not going to tell you anything else about it," and all the other fields are zero. I consider this another kernel bug.
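For reference, here is a small sketch that turns the si_code of a SIGSEGV into a readable label. The function name is hypothetical. SEGV_MAPERR and SEGV_ACCERR are the POSIX codes for this signal; SI_KERNEL (0x80, that is, 128) is assumed to be available only on some systems, such as Linux with glibc, so it is guarded with #ifdef.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical helper: describe the si_code delivered with a SIGSEGV.
 * Only the codes discussed here are handled; everything else is "other". */
const char *describe_segv_code(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "SEGV_MAPERR: address not mapped to object";
    case SEGV_ACCERR: return "SEGV_ACCERR: invalid permissions for mapped object";
#ifdef SI_KERNEL
    case SI_KERNEL:   return "SI_KERNEL: sent by the kernel, no further detail";
#endif
    default:          return "other";
    }
}
```

Calling this from inside the handler and writing the result with write(2) would give the same information without a debugger, at the cost of the handler no longer being a bare pause().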

The ucontext_t structure is more useful; in particular

(gdb) p/x ((ucontext_t *)uc)->uc_mcontext.gregs[REG_RIP]
$3 = 0x5555555551bc

This is the address of the instruction that caused the signal. If I disassemble main...

(gdb) disas main
...
0x00005555555551b7: callq  0x555555555030 <sigaction@plt>
0x00005555555551bc: int    $0x2
0x00005555555551be: mov    $0x0,%eax
0x00005555555551c3: leaveq 
0x00005555555551c4: retq   

...I see that the instruction that caused the signal is indeed int $0x2.
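The gregs array in the Linux dump also contains the values 18 and 13 near its end. Assuming those slots are glibc's REG_ERR and REG_TRAPNO on x86-64 (an assumption, not something verified here), 13 would be the #GP vector and 18 the error code the CPU pushed with it. The sketch below decodes that error code using the selector error-code layout from the architecture manual; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 selector-style exception error code. */
struct selector_error {
    bool ext;        /* bit 0: caused by an event external to the program */
    bool idt;        /* bit 1: index refers to a gate in the IDT */
    bool ti;         /* bit 2: LDT (1) or GDT (0); meaningful only if idt is 0 */
    unsigned index;  /* bits 3..15: selector or vector index */
};

/* Hypothetical helper: split an error code into its fields. */
struct selector_error decode_selector_error(uint32_t err)
{
    struct selector_error d;
    d.ext   = (err & 1u) != 0;
    d.idt   = ((err >> 1) & 1u) != 0;
    d.ti    = ((err >> 2) & 1u) != 0;
    d.index = (err >> 3) & 0x1fffu;
    return d;
}
```

Under that assumption, decoding 18 (0x12) gives idt set and index 2. That would mean the fault was raised while the CPU was looking at IDT entry 2, which is consistent with a rejected int $0x2 rather than a bad memory access.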

On NetBSD, I get something slightly different:

(gdb) p *si
$1 = { si_pad = "[garbage]", _info = {
         _signo = 11, _code = 2, _errno = 0, _pad = 0, _reason = {
           _rt = {_pid = -146410395, _uid = 32639, _value = {sival_int = 4,
          sival_ptr = 0x4}}, _child = {_pid = -146410395, _uid = 32639,
        _status = 4, _utime = 0, _stime = 0}, _fault = {
        _addr = 0x7f7ff745f465 <__sigemptyset14>, _trap = 4, _trap2 = 0,
        _trap3 = 0}, _poll = {_band = 140187586131045, _fd = 4}}}}

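This siginfo_t has actually been filled in. si_code 2 for SIGSEGV is SEGV_ACCERR ("Invalid permissions for mapped object"), which is not nonsense. There isn't enough information in the headers or the manpages for me to understand what _trap = 4 means, or why _addr points to an address somewhere inside the C library, and I don't feel like source-diving the NetBSD kernel today. ;-)

Also, for reasons I don't feel like investigating today, gdb on NetBSD doesn't have access to the definition of ucontext_t (even though I explicitly included ucontext.h), so I had to dump it raw: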

(gdb) p *(ucontext_t *)uc
No symbol "ucontext_t" in current context.
(gdb) x/40xg uc
0x7f7fffffd7b0: 0x00000000000a000d      0x0000000000000000
0x7f7fffffd7c0: 0x0000000000000000      0x0000000000000000
0x7f7fffffd7d0: 0x0000000000000000      0x0000000000000000
0x7f7fffffd7e0: 0x0000000000000000      0x0000000000000005
0x7f7fffffd7f0: 0x00007f7fffffdb50      0x0000000000000000
0x7f7fffffd800: 0x00007f7ff7483a0a      0x0000000000000002
0x7f7fffffd810: 0x000000000000000d      0x00007f7ff749f340
0x7f7fffffd820: 0x0000000000000246      0x00007f7fffffdb90
0x7f7fffffd830: 0x00007f7ffffffdea      0x00007f7ff511a4c0
0x7f7fffffd840: 0x00007f7ffffffdea      0x00007f7fffffdb70
0x7f7fffffd850: 0x00007f7fffffffe0      0x0000000000000000
0x7f7fffffd860: 0x0000000000000000      0x0000000000000000
0x7f7fffffd870: 0x000000000000003f      0x00007f7ff748003f
0x7f7fffffd880: 0x0000000000000004      0x0000000000000012
0x7f7fffffd890: 0x0000000000400af5      0x000000000000e033  <---
0x7f7fffffd8a0: 0x0000000000010246      0x00007f7fffffdb50
0x7f7fffffd8b0: 0x000000000000e02b      0x00007f7ff7ffd0c0
0x7f7fffffd8c0: 0x000000000000037f      0x0000000000000000
0x7f7fffffd8d0: 0x0000000000000000      0x0000ffbf00001f80
0x7f7fffffd8e0: 0x0000000000000000      0x0000000000000000
(gdb) disas main
Dump of assembler code for function main:
   ...
   0x0000000000400af0 <+166>:   callq  0x400810 <__sigaction14@plt>
   0x0000000000400af5 <+171>:   int    $0x2
   0x0000000000400af7 <+173>:   leaveq 
   0x0000000000400af8 <+174>:   retq

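The only address in the memory region pointed to by uc that has any correspondence to the program text is 0x0000000000400af5, which is the address of the int instruction.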