Understanding cmpxchg8b/cmpxchg16b operation

The SDM text for this instruction contains the following block:

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination.

I can't make sense of the last sentence (or possibly the whole paragraph).

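"The destination operand is written back ..." Back to what?
"...; otherwise, the source operand is written into the destination" What is the source operand? Is it ECX:EBX? As far as I understand, this CAS instruction takes only one operand (the memory destination).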

I'd appreciate it if someone could rephrase and/or explain the part about the unconditional write.

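Compare it with the wording for regular cmpxchg r/m32, r32 (which has an explicit rather than implicit source) and it should make more sense, especially if you look at the short descriptions in the table at the top of the manual entry. I've annotated them with dst, src, and implicit. Note that Intel syntax is generally op dst, src.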

cmpxchg r/m64, r64: Compare RAX (implicit) with r/m64 (dst). If equal, set ZF and load r64 (src) into r/m64 (dst). Else, clear ZF and load r/m64 (dst) into RAX (implicit).
cmpxchg16b m128: Compare RDX:RAX with m128 (dst). If equal, set ZF and load RCX:RBX into m128 (dst). Else, clear ZF and load m128 into RDX:RAX.

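Yes, that's right: Intel's manual uses "load" to describe a store to memory. (That is somewhat reasonable for cmpxchg, where the destination can be a register, but not at all for cmpxchg16b.)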

But anyway, it helps to keep in mind what these instructions do:

m64.compare_exchange_strong(expected=RAX, desired=r64);
m128.compare_exchange_strong(expected=RDX:RAX, desired=RCX:RBX);

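(That is in terms of C++ std::atomic. To actually be atomic, the instructions need a lock prefix; without it they are a non-atomic RMW. C++ will only compile to lock cmpxchg / lock cmpxchg16b; mainstream compilers never emit an unlocked cmpxchg.)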
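To make the compare_exchange_strong mapping above concrete, here is a minimal sketch of the 64-bit case. The helper names cas_u64 and cas_demo are made up for illustration. On x86-64 I would expect a mainstream compiler to turn the compare_exchange_strong call into lock cmpxchg, but that is an assumption about code generation, not something the C++ standard promises.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if `mem` held `expected` and now holds `desired`.
// On failure, `expected` is overwritten with the value found in `mem`, which
// mirrors "load r/m64 (dst) into RAX (implicit)" in the manual's description.
bool cas_u64(std::atomic<std::uint64_t>& mem,
             std::uint64_t& expected,
             std::uint64_t desired) {
    return mem.compare_exchange_strong(expected, desired);
}

// Exercises both outcomes and reports whether they behaved as described.
bool cas_demo() {
    std::atomic<std::uint64_t> mem{5};

    std::uint64_t expected = 5;
    bool first = cas_u64(mem, expected, 9);   // match: mem becomes 9

    std::uint64_t stale = 5;
    bool second = cas_u64(mem, stale, 7);     // mismatch: mem stays 9, stale becomes 9

    return first && !second && mem.load() == 9 && stale == 9;
}
```

The failure path is the part that corresponds to the "else" half of the manual's wording: the caller's expected value plays the role of RAX and is updated with the current memory contents.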

The destination operand is written back ... back to what?

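The old value of the destination (which was just loaded) is written back. This means cmpxchg16b always writes, and will, for example, always set the dirty flag for the page. (I don't know whether it actually dirties the cache line microarchitecturally on CAS failure. I assume so, but haven't checked.)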
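The unconditional write can be sketched as a plain, non-atomic C++ model of the instruction's architectural effect. This is only an illustration, loosely following the shape of the manual's Operation pseudocode as I read it; the types Mem128 and Regs and the function cmpxchg16b_model are hypothetical names, and the write counter exists only to make the store on the failure path visible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the m128 destination; counts stores issued by the model.
struct Mem128 {
    std::uint64_t lo;
    std::uint64_t hi;
    int writes;
    Mem128(std::uint64_t l, std::uint64_t h) : lo(l), hi(h), writes(0) {}
    void store(std::uint64_t l, std::uint64_t h) { lo = l; hi = h; ++writes; }
};

// Hypothetical register file holding only what the instruction touches.
struct Regs {
    std::uint64_t rax, rdx;  // expected value, RDX:RAX
    std::uint64_t rbx, rcx;  // desired value, RCX:RBX
    bool zf;
};

// Non-atomic model: the destination is stored to on BOTH paths.
void cmpxchg16b_model(Mem128& dest, Regs& r) {
    const std::uint64_t old_lo = dest.lo;  // read the destination once
    const std::uint64_t old_hi = dest.hi;
    if (r.rax == old_lo && r.rdx == old_hi) {
        r.zf = true;
        dest.store(r.rbx, r.rcx);      // success: write the "source" RCX:RBX
    } else {
        r.zf = false;
        r.rax = old_lo;                // failure: report the current value
        r.rdx = old_hi;
        dest.store(old_lo, old_hi);    // and write the old value back
    }
}
```

In this model a failed compare leaves the memory contents unchanged but still increments writes, which is the sense in which the destination "receives a write cycle without regard to the result of the comparison."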

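This was historically important for the lock prefix on older CPUs, which had an external LOCK# pin that lock cmpxchg would actually assert for the whole load+store pair. Modern CPUs just hold a cache lock on the affected cache line for an aligned locked CAS on cacheable memory. That is why the manual says "To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison."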

The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

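Intel copy/pasted this whole paragraph from the cmpxchg manual entry when writing the cmpxchg16b entry. It is less clear in the CX16 context, because that instruction has 2 implicit operands instead of an explicit source and a read-write RAX. The entry never defines the term "source operand".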

Earlier in the description, it does define the term "destination operand" for this instruction:

Compares the 64-bit value in EDX:EAX (or 128-bit value in RDX:RAX if operand size is 128 bits) with the operand (destination operand)

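"The operand" means the explicit operand. That is clearly what is meant, because it is the only thing that can be memory, so it has to be one of the things being compared. There are also other clues and reasons from how English works.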

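So "destination operand" does get a clear definition, but in an instruction with 3 operands in total, saying "source operand" without defining it is sloppy. Like I said, this is clearly the result of copy/paste by Intel's documentation writers.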

It's not a serious problem; we know the basic gist of the instruction, and the Operation section makes it 100% clear what actually happens.

assembly

x86-64

intel

instructions

compare-and-swap