How to get the return code of the syscall using SECCOMP_RET_DATA and PTRACE_GETEVENTMSG

I'm a bit confused about how to get a syscall's return value using ptrace + seccomp.

man 4 bpf says:

 FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a return instruction

man 2 ptrace says:

 PTRACE_O_TRACESECCOMP  
While this triggers a PTRACE_EVENT stop, it is
similar to a syscall-enter-stop, in that the tracee has not yet
entered the syscall that seccomp triggered on. The seccomp event
message data (from the SECCOMP_RET_DATA portion of the seccomp filter
rule) can be retrieved with PTRACE_GETEVENTMSG.

 PTRACE_GETEVENTMSG 
For PTRACE_EVENT_SECCOMP, this is the seccomp(2)
filter's SECCOMP_RET_DATA associated with the triggered rule.

man 2 seccomp says:

 SECCOMP_RET_TRACE
The tracer will be notified of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA
portion of the filter's return value will be available to the tracer via
PTRACE_GETEVENTMSG
 [...]
The seccomp check will not be run again after the tracer is notified.
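The quoted passages describe SECCOMP_RET_DATA as part of the filter's own return value. As a minimal sketch of what that means, the helpers below (hypothetical names, not part of any library) split a filter return value into its action and data parts using the masks from <linux/seccomp.h>. Under that reading, the data part is a 16-bit constant chosen when the filter is written, and a return value of SECCOMP_RET_TRACE | SECCOMP_RET_DATA simply carries the data 0xffff.

```c
#include <linux/seccomp.h>
#include <stdint.h>

/* Hypothetical helper: build a filter return value that requests a tracer
 * stop and carries a 16-bit tag chosen by the filter author. */
static inline uint32_t make_trace_ret(uint16_t tag)
{
    return SECCOMP_RET_TRACE | ((uint32_t)tag & SECCOMP_RET_DATA);
}

/* Hypothetical helper: the low 16 bits, i.e. the portion that the man page
 * says is reported through PTRACE_GETEVENTMSG. */
static inline uint32_t ret_data_of(uint32_t filter_ret)
{
    return filter_ret & SECCOMP_RET_DATA;
}

/* Hypothetical helper: the action portion (SECCOMP_RET_TRACE, ...). */
static inline uint32_t ret_action_of(uint32_t filter_ret)
{
    return filter_ret & SECCOMP_RET_ACTION;
}
```

For instance, BPF_STMT(BPF_RET | BPF_K, make_trace_ret(42)) would tag a rule with 42, and a tracer could use that tag to tell rules apart. Nothing in this value depends on what the syscall later returns.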

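As it turns out, a BPF program cannot perform any further actions after the BPF_RET statement. So when the tracee is stopped on SECCOMP_RET_TRACE, it is in a state similar to syscall-enter-stop and the syscall has not run yet, so there is no return code to report. I expected that after a subsequent PTRACE_SYSCALL the tracee would be in the syscall-exit-stop state, and the tracer would be able to get the syscall's result with PTRACE_GETEVENTMSG. But that doesn't work in my example: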

#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/unistd.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    pid_t pid;
    int status;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <prog> <arg1> ... <argN>\n", argv[0]);
        return 1;
    }

    if ((pid = fork()) == 0) {
        ptrace(PTRACE_TRACEME, 0, 0, 0);

        struct sock_filter filter[] = {
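            // Load the syscall number.
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof(struct seccomp_data, nr))),
            // open: jump to the trace rule; otherwise fall through to the openat check.
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 1, 0),
            // openat: trace; anything else: allow.
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),
            // SECCOMP_RET_DATA is the 0xffff mask, so the event message is 0xffff.
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE | SECCOMP_RET_DATA),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),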
        };
        struct sock_fprog prog = {
            .filter = filter,
            .len = (unsigned short) (sizeof(filter)/sizeof(filter[0])),
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
            return 2;
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == -1)
            return 3;

        kill(getpid(), SIGSTOP);
        return execvp(argv[1], argv + 1);
    } else {
        waitpid(pid, &status, 0);
        ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESECCOMP);
        ptrace(PTRACE_CONT, pid, 0, 0);

        int status = 0;
        unsigned long ret_data = 0;
        while(1) {
            while (1) {
                waitpid(pid, &status, 0);
                fprintf(stderr, "status = %08x\n", status);

                if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)))
                    break;

                if (WIFEXITED(status))
                    return 0;
                ptrace(PTRACE_CONT, pid, 0, 0);
            }
            // restart stopped tracee
            ptrace(PTRACE_SYSCALL, pid, 0, 0);
            // wait for SIGTRAP, when tracee will be in the syscall-exit-stop state
            waitpid(pid, &status, 0);

            ptrace(PTRACE_GETEVENTMSG, pid, 0, &ret_data);
            fprintf(stderr, "retdat = %lu\n", ret_data);

            ptrace(PTRACE_CONT, pid, 0, 0);
        }
        return 0;
    }
}
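The wait-status checks in the loop above can be factored into one helper. The sketch below is only an illustration: the names stop_kind and classify_status are hypothetical, and the STOP_SYSCALL case assumes the tracer also sets PTRACE_O_TRACESYSGOOD (the program above sets only PTRACE_O_TRACESECCOMP), which makes syscall stops report SIGTRAP | 0x80. It tests for termination before looking at the stop signal, since WSTOPSIG is only meaningful when WIFSTOPPED is true.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical classification of a waitpid() status for a traced child. */
enum stop_kind { STOP_GONE, STOP_SECCOMP, STOP_SYSCALL, STOP_OTHER };

static inline enum stop_kind classify_status(int status)
{
    /* Termination first: the stop-signal bits are meaningless here. */
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_GONE;
    if (!WIFSTOPPED(status))
        return STOP_OTHER;
    /* PTRACE_EVENT stop raised by a SECCOMP_RET_TRACE rule. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)))
        return STOP_SECCOMP;
    /* Syscall stop, assuming PTRACE_O_TRACESYSGOOD is set. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    return STOP_OTHER;
}
```

With such a helper the tracer would call PTRACE_GETEVENTMSG only on STOP_SECCOMP and read registers only on a STOP_SYSCALL that follows it.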

I was able to get the syscall's return code by inspecting the registers:

    // ptrace(PTRACE_GETEVENTMSG, pid, 0, &ret_data);
    struct user_regs_struct regs;  // needs #include <sys/user.h>
    ptrace(PTRACE_GETREGS, pid, 0, &regs);
    fprintf(stderr, "retdat = %llu\n", regs.rax);  // rax is unsigned long long on x86-64

But I'd like to know how to do it the way the documentation specifies.

How to get the return code of the syscall using SECCOMP_RET_DATA and PTRACE_GETEVENTMSG?

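The short answer is that you can't. The seccomp event is sent before the syscall is even entered. You can't see any result because no syscall has run yet. To get one, you have to resume the process with PTRACE_SYSCALL twice after receiving the seccomp event: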

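#include <csignal>
#include <iostream>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

// Resumes the tracee with PTRACE_SYSCALL until it has stopped twice with
// SIGTRAP; the second stop is treated as the syscall-exit-stop.
bool WaitForSyscallExit(const pid_t pid)
{
  bool entered = false;
  int  status  = 0;

  while (true)
  {
    ptrace(PTRACE_SYSCALL, pid, 0, 0);
    waitpid(pid, &status, 0);

    // Check for termination first: WSTOPSIG is only meaningful for a stopped child.
    if (WIFEXITED(status) || WIFSIGNALED(status))
    {
      std::cerr << "The child has unexpectedly exited." << std::endl;

      return false;
    }

    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
    {
      if (entered)
      {
        // If we had already entered before, then current SIGTRAP signal means exiting
        break;
      }
      entered = true;
    }
  }

  return true;
}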

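Because PTRACE_SYSCALL is used, the process stops twice (first on entering the syscall, then, for the last time, on exiting it). You can only get the result after the syscall has actually completed, that is, after the second stop. And yes, you can only do this by reading the registers manually, because the seccomp data is only available when handling the seccomp trace event itself. The structure doesn't contain anything related to the syscall result either, and the man pages never mention obtaining a result value. Note that ptrace(2) documents a change in stop ordering: since Linux 4.8 the PTRACE_EVENT_SECCOMP stop occurs after the syscall-entry-stop rather than before it, so on those kernels the first PTRACE_SYSCALL after the seccomp event already reaches the syscall-exit-stop.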
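Once the register has been read at the syscall-exit-stop, the raw value still has to be interpreted. The following is a minimal sketch, assuming the x86-64 Linux convention that the kernel reports failure as a value from -4095 to -1 (the negated errno) in rax. The struct and function names are hypothetical, not taken from any library.

```c
#include <errno.h>

/* Hypothetical decoded form of a raw syscall return register. */
struct syscall_result {
    int  failed;  /* 1 if the kernel reported an error */
    long value;   /* return value on success, -1 on failure */
    int  err;     /* errno value on failure, 0 otherwise */
};

/* Decode the raw rax value read with PTRACE_GETREGS at syscall-exit-stop.
 * Assumes x86-64 Linux: values in [-4095, -1] encode -errno. */
static inline struct syscall_result decode_syscall_result(unsigned long long raw)
{
    struct syscall_result r = {0, 0, 0};
    long v = (long)raw;

    if (v < 0 && v >= -4095) {
        r.failed = 1;
        r.value  = -1;
        r.err    = (int)-v;
    } else {
        r.value = v;
    }
    return r;
}
```

A tracer would call it as decode_syscall_result(regs.rax) after PTRACE_GETREGS. A failed open then shows up as err == ENOENT rather than as a very large unsigned number.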