Why do kill dependency instructions consume reservation station slots?

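I always thought that instructions for killing dependencies, e.g. xor reg, reg, do not have to be executed and are ready for retirement as soon as the renamer moves them to the re-order buffer.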

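I just measured the number of micro-operations getting into the RS with the event uops_issued.any and was surprised by the number. All of the xor reg, reg instructions used to break dependencies are counted by that perf event.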

Why don't dependency-killing instructions go straight to the ROB, without uselessly disturbing the reservation station?

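They don't, but AFAIK there are no front-end counters for the unfused domain. If you have no branch mispredictions that cause uops to be discarded from the RS after issue / before execution, it doesn't matter where in the pipeline you count, so there is a workaround.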

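To count RS uops, use uops_executed.thread to count uops that were successfully(?) executed. I haven't checked whether replays of eagerly-scheduled uops count toward uops_executed on every dispatch attempt, or only toward uops_dispatched_port.port_[0..7].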
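As a rough illustration of the workaround, the sketch below builds a `perf stat` command line that counts both events in user space for the same run. The event names are the ones discussed here; the `:u` modifier and the `--` separator are standard perf syntax. The program name `./xor_loop` is purely hypothetical, and the helper `perf_stat_command` is an illustrative name, not part of any tool.

```python
import shlex

# Events discussed in the answer: fused-domain issue count and
# unfused-domain execute count (Intel Sandybridge-family PMU names).
EVENTS = ["uops_issued.any", "uops_executed.thread"]


def perf_stat_command(program_argv, events=EVENTS):
    """Return an argv list that runs `perf stat` on program_argv,
    counting each event in user space only (the :u modifier)."""
    event_list = ",".join(f"{name}:u" for name in events)
    return ["perf", "stat", "-e", event_list, "--"] + list(program_argv)


# "./xor_loop" is a hypothetical test binary, used only for illustration.
cmd = perf_stat_command(["./xor_loop"])
print(shlex.join(cmd))
```

Running the printed command on a loop with few branch mispredictions should let you compare the two counts directly. A large gap between them points at eliminated or micro-fused uops rather than at extra RS traffic.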

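For an example of using perf to distinguish eliminated from non-eliminated uops, and the front-end fused domain from the back-end unfused domain, see.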

I just measure the number of microoperations getting into the RS with the event uops_issued.any

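That event counts fused-domain uops issued into the ROB. It counts 1 for a micro-fused uop like add eax, [rdi], or mov al, [rsi] which merges into the low byte of RAX (even though those count as 2 uops_executed), and it also counts 1 for eliminated uops like mov reg,reg and xor same,same (0 uops_executed).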
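The per-instruction accounting above can be written down as a small table. This is only a sketch of the bookkeeping described in this answer for Sandybridge-family CPUs, not a simulator. The names `UopCost`, `COSTS` and `expected_counts` are made up for illustration, and the table assumes mov-elimination always succeeds.

```python
from collections import namedtuple

# issued   = fused-domain uops, as counted by uops_issued.any
# executed = unfused-domain uops, as counted by uops_executed.thread
UopCost = namedtuple("UopCost", ["issued", "executed"])

# Costs as stated in the text (assumption: elimination always succeeds).
COSTS = {
    "xor same,same": UopCost(issued=1, executed=0),  # zeroing idiom, eliminated
    "mov reg,reg": UopCost(issued=1, executed=0),    # eliminated register move
    "add eax,[rdi]": UopCost(issued=1, executed=2),  # micro-fused load + ALU
    "mov al,[rsi]": UopCost(issued=1, executed=2),   # load + merge into RAX
}


def expected_counts(instructions):
    """Sum the (issued, executed) uop counts for a list of instructions."""
    issued = sum(COSTS[insn].issued for insn in instructions)
    executed = sum(COSTS[insn].executed for insn in instructions)
    return issued, executed


loop_body = ["xor same,same", "add eax,[rdi]", "mov reg,reg"]
print(expected_counts(loop_body))  # (3, 2)
```

In this model the xor-zeroing instruction still contributes to uops_issued.any, which is the effect the question observed, while contributing nothing to uops_executed.thread.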

perf list does give a misleading description like this (on Skylake), so the confusion is understandable.

uops_issued.any
[Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)]

I always thought that instructions for killing dependencies, e.g. xor reg, reg, do not have to be executed and are ready for retirement as soon as the Renamer moves them to the Re-order Buffer.

Yes, that's my understanding too: they enter the ROB already marked as executed, and don't touch the RS.

Only Sandybridge-family does this (including Skylake / Ice Lake); other microarchitectures (like Zen, AFAIK) do need a back-end uop to actually write the zero.

AMD does do mov-elimination for vector moves (since Bulldozer) and for GP-integer moves (since Zen), so those are probably handled like Intel's xor-zeroing or mov.

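One guess at the Sandybridge mechanism is that xor-zeroing (of GP-integer or XMM/YMM registers) renames onto an internal zero register. http://blog.stuffedcow.net/2013/05/measuring-rob-capacity/ tested this and found that xor-zeroing instructions don't consume an extra PRF entry to write the destination register.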

assembly

x86-64

cpu-architecture

perf

intel-pmu