What instructions does QEMU trace?

Question

I wrote the following code, which single-steps /bin/ls and counts its instructions:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <sys/syscall.h>

int main()
{
    pid_t child;
    child = fork(); //create child

    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        char* child_argv[] = {"/bin/ls", NULL};
        execv("/bin/ls", child_argv);
        perror("execv"); //only reached if execv failed
        _exit(1);
    }
    else {
        int status;
        long long ins_count = 0;
        while(1)
        {
            //stop tracing once the child has terminated (normally or by a signal)
            wait(&status);
            if(WIFEXITED(status) || WIFSIGNALED(status))
                break;

            ins_count++;
            ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        }

        printf("\n%lld Instructions executed.\n", ins_count);
    }

    return 0;
}

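Running this code tells me that about 500,000 instructions were executed. As far as I know, most of these should come from the dynamic linker. When I trace /bin/ls with QEMU using qemu-x86_64 -singlestep -D log -d in_asm /bin/ls, I get about 17,000 executed instructions. What do I need to adjust to start and stop counting at the same points as QEMU (i.e., to count the same instructions)?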

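I traced a "return null" program with QEMU and it produced 7840 instructions, while my code gave me 109025. So QEMU seems to trace more than just main, but fewer instructions than my code does.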

My goal is to compare these instructions later, which is why I want to walk through the same instructions as QEMU.

Answer 1

I modified your program so that it runs on a dedicated CPU core (e.g., core 7) by adding the following code before fork():

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
[...]
  cpu_set_t set;
  int rc;

  CPU_ZERO(&set);
  CPU_SET(7, &set);

  // Migrate the calling process to the target CPU
  rc = sched_setaffinity(0, sizeof(cpu_set_t), &set);
  if (0 != rc) {
    fprintf(stderr, "sched_setaffinity(): '%m' (%d)\n", errno);
    return -1;
  }

  // Dummy system call to trigger the migration. Actually, the online
  // manual says that the previous call will make the current process
  // migrate, but I saw in cpuid's source code that the author calls sleep(0)
  // to make sure that the migration will be done. In my opinion, it may
  // be safer to call sched_yield()
  rc = sched_yield();
  if (0 != rc) {
    fprintf(stderr, "sched_yield(): '%m' (%d)\n", errno);
    return -1;
  }

  // Create child
  child = fork();
[...]

My computer is running Ubuntu/Linux 5.4.0 on:

# Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
# Code name     : Ivy Bridge
# cpu family    : 6
# model     : 58
# microcode : 0x21
# Number of physical cores: 4
# Number of hardware threads: 8
# Base frequency: 3,50 GHz
# Turbo frequency: 3,90 GHz
# cpu MHz: 1604.615
# cache size    : 8192 KB
# cache_alignment: 64
# Address sizes: 36 bits physical, 48 bits virtual
#
# PMU version: 3
# Maximum number of fixed counters: 3
# Fixed counter bit width: 48
# Maximum number of programmable counters: 4
# Programmable counter bit width: 48

If I launch the modified program with ptrace() active, I get almost the same number as you:

$ test/progexec
[...]
548765 Instructions executed.

I wrote a tool that reads the Intel PMU counters. Fixed counter #0 is:

# INST_RETIRED.ANY
#
# Number of instructions that retire execution. For instructions that consist of multiple
# uops, this event counts the retirement of the last uop of the instruction. The counter
# continues counting during hardware interrupts, traps, and in-side interrupt handlers.
#

Reading the above counter on CPU core #7, where the program runs, gives the following results:

1871879 user + kernel space instructions executed (rings 0-3)
546874 user space instructions executed (ring 3)
1324451 kernel space instructions executed (ring 0)

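So, according to the numbers above, the program using ptrace(PTRACE_SINGLESTEP) counts the instructions the traced program executes while running in user space (Intel protection ring #3).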

N.B.: Linux uses ring 0 for kernel space and ring 3 for user space.

Answer 2

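QEMU's "in_asm" log is not a log of executed instructions. It logs each time an instruction is translated (i.e., when QEMU generates some host code corresponding to it). That translation is then cached, and if the guest loops and executes the same instruction again, QEMU simply reuses the same translation, so in_asm does not log it again. So "in_asm reports far fewer instructions" is expected.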

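Logging every executed instruction with -d is a bit tricky: you need to look at the 'cpu' and 'exec' traces, use the 'nochain' suboption of -d to disable a QEMU optimization that would otherwise cause some blocks not to be logged, use '-singlestep' to force one instruction per block, and work around a few corner cases where we print an execution trace and then do not actually execute the instruction. This is because -d is not intended as a way for users to introspect their program's behaviour. It is a debugging option, meant for debugging what QEMU and the guest program are doing together, so it prints information that requires some understanding of QEMU internals to interpret correctly.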

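You may find it simpler to write a QEMU "plugin" (https://qemu.readthedocs.io/en/latest/devel/tcg-plugins.html), an API designed to make it very simple to write tools such as "count executed instructions". If you're lucky, one of the example plugins may even be enough for your purposes.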
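As a starting point, a minimal instruction-counting plugin might look like the sketch below. It is written from my reading of the plugin API in qemu-plugin.h as documented at the link above, not copied from QEMU's own example plugins; the file name insncount.c and the build/run commands are assumptions that may need adjusting for your QEMU version. It attaches a callback to every instruction when its block is translated, so the counter increments on every execution, including re-executions of cached translations that in_asm never logs.

```c
/* insncount.c: hypothetical minimal TCG plugin that counts executed guest
 * instructions. Build it as a shared library against QEMU's qemu-plugin.h. */
#include <inttypes.h>
#include <stdio.h>
#include <qemu-plugin.h>

/* Required by QEMU to check plugin API compatibility. */
QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static uint64_t insn_count;

/* Called every time an instrumented guest instruction executes. */
static void vcpu_insn_exec(unsigned int vcpu_index, void *udata)
{
    (void)vcpu_index;
    (void)udata;
    __atomic_fetch_add(&insn_count, 1, __ATOMIC_RELAXED);
}

/* Called once per translation: attach the exec callback to each insn. */
static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    (void)id;
    size_t n = qemu_plugin_tb_n_insns(tb);
    for (size_t i = 0; i < n; i++) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
                                               QEMU_PLUGIN_CB_NO_REGS, NULL);
    }
}

/* Print the total when the guest exits. */
static void plugin_exit(qemu_plugin_id_t id, void *p)
{
    char buf[64];
    (void)id;
    (void)p;
    snprintf(buf, sizeof buf, "%" PRIu64 " instructions executed\n",
             insn_count);
    qemu_plugin_outs(buf);
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    (void)info;
    (void)argc;
    (void)argv;
    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
    return 0;
}
```

Something like gcc -shared -fPIC $(pkg-config --cflags glib-2.0) -I<qemu-src>/include/qemu insncount.c -o libinsncount.so, followed by qemu-x86_64 -plugin ./libinsncount.so -d plugin /bin/ls, should print the total; -d plugin sends qemu_plugin_outs() output to the log. The single global counter is updated atomically, which is fine for a single-threaded guest like /bin/ls; a per-vCPU counter would scale better for multi-threaded programs.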
