MOVUPD vs. MOVDQU (x86/x64 assembly)

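What is the difference between these instructions? MOVDQU is an unaligned move of a double quadword, and MOVUPD is an unaligned move of two packed 64-bit (double-precision) floating-point values. As far as I can tell, both simply move 128 bits of unaligned data.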
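To make the question concrete, here is a minimal C sketch using SSE2 intrinsics. It assumes an x86/x64 target and a compiler that provides `<emmintrin.h>`. The helper names `copy16_int`, `copy16_pd`, and `copies_match` are made up for illustration. Note that `_mm_loadu_si128` and `_mm_loadu_pd` only usually map to MOVDQU and MOVUPD; a compiler is free to emit a different 128-bit move (MOVUPS, for example), so inspect the generated assembly if the exact instruction matters.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Copy 16 bytes through the integer-typed unaligned load/store
   (typically compiled to MOVDQU). */
static void copy16_int(void *dst, const void *src)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
}

/* Copy 16 bytes through the double-precision unaligned load/store
   (typically compiled to MOVUPD). */
static void copy16_pd(void *dst, const void *src)
{
    __m128d v = _mm_loadu_pd((const double *)src);
    _mm_storeu_pd((double *)dst, v);
}

/* Returns 1 if both copies are byte-for-byte identical to the source. */
static int copies_match(const unsigned char *src)
{
    unsigned char a[16], b[16];
    copy16_int(a, src);
    copy16_pd(b, src);
    return memcmp(a, b, 16) == 0 && memcmp(a, src, 16) == 0;
}
```

Both helpers move the same 128 bits unchanged, even when the bytes do not form valid doubles, so any difference between the two instructions would have to be in performance rather than in the data moved.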

MOVDQU is on page 948 and MOVUPD is on page 995 of the Intel x64 manual.

Agner Fog says:

The instructions MOVDQA, MOVDQU, MOVAPS, MOVUPS, MOVAPD and MOVUPD are all identical when used with [128 bit] register operands

He then goes on to say (he uses the aligned versions in his examples, but I assume the same applies to the unaligned variants):

On Intel Core 2 and earlier Intel processors, some floating point instructions are executed in the integer units. This includes XMM move instructions, Boolean, and some shuffle and pack instructions. These instructions have a bypass delay when mixed with instructions that use the floating point unit. On most other processors, the execution unit used is in accordance with the instruction name, e.g. MOVAPS XMM1,XMM2 uses the floating point unit, MOVDQA XMM1,XMM2 uses the integer unit.


Instructions that read or write memory use a separate unit. The bypass delay from the memory unit to the floating point unit may be longer than to the integer unit on some processors, but it doesn't depend on the type of the instruction. Thus, there is no difference in latency between MOVAPS XMM0,[MEM] and MOVDQA XMM0,[MEM] on current processors, but it cannot be ruled out that there will be a difference on future processors.


[Y]ou may use MOVAPS instead of MOVAPD or MOVDQA for moving data to or from memory or between registers. A bypass delay occurs in some processors when using MOVAPS for moving the result of an integer instruction to another register, but not when moving data to or from memory.