MOVUPD vs. MOVDQU (x86/x64 assembly)
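What is the difference between these instructions? MOVDQU is an unaligned move of a double quadword, and MOVUPD is an unaligned move of two packed double-precision (64-bit) floating-point values. As far as I can tell, both simply move 128 bits of unaligned data.
MOVDQU is on page 948 and MOVUPD is on page 995 of the Intel x64 manual.
Agner Fog says: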
The instructions MOVDQA, MOVDQU, MOVAPS, MOVUPS, MOVAPD and MOVUPD are all identical when used with [128 bit] register operands
He then goes on to say (his examples use the aligned versions, but I assume the same applies to the unaligned variants):
On Intel Core 2 and earlier Intel processors, some floating point instructions are executed in the integer units. This includes XMM move instructions, Boolean, and some shuffle and pack instructions. These instructions have a bypass delay when mixed with instructions that use the floating point unit. On most other processors, the execution unit used is in accordance with the instruction name, e.g. MOVAPS XMM1,XMM2 uses the floating point unit, MOVDQA XMM1,XMM2 uses the integer unit.

Instructions that read or write memory use a separate unit. The bypass delay from the memory unit to the floating point unit may be longer than to the integer unit on some processors, but it doesn't depend on the type of the instruction. Thus, there is no difference in latency between MOVAPS XMM0,[MEM] and MOVDQA XMM0,[MEM] on current processors, but it cannot be ruled out that there will be a difference on future processors.

[Y]ou may use MOVAPS instead of MOVAPD or MOVDQA for moving data to or from memory or between registers. A bypass delay occurs in some processors when using MOVAPS for moving the result of an integer instruction to another register, but not when moving data to or from memory.
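
To make the "both just move 128 bits of unaligned data" observation concrete, here is a minimal C sketch using SSE2 intrinsics. It assumes an x86/x64 target with SSE2 and a compiler that provides <emmintrin.h>. The function names unaligned_loads_match and demo are hypothetical and exist only for this illustration. The intrinsics nominally correspond to MOVDQU and MOVUPD, but a compiler is free to emit any equivalent move, so this only checks that the data moved is bit-identical; it says nothing about bypass delays.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x64 target */
#include <string.h>

/* Hypothetical helper: load the same 16 bytes once through the integer
 * intrinsic (nominally MOVDQU) and once through the packed-double
 * intrinsic (nominally MOVUPD), store both back, and compare the bytes.
 * Returns 1 if the two round trips produced identical bytes. */
int unaligned_loads_match(const unsigned char *p)
{
    __m128i as_int = _mm_loadu_si128((const __m128i *)p); /* integer view */
    __m128d as_pd  = _mm_loadu_pd((const double *)p);     /* double view  */

    unsigned char a[16], b[16];
    _mm_storeu_si128((__m128i *)a, as_int);
    _mm_storeu_pd((double *)b, as_pd);

    return memcmp(a, b, sizeof a) == 0;
}

/* Hypothetical driver: fill a buffer with an arbitrary byte pattern and
 * test at an odd offset so the address is not 16-byte aligned. */
int demo(void)
{
    unsigned char buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (unsigned char)(i * 37 + 11);
    return unaligned_loads_match(buf + 1);
}
```

Calling demo() should return 1, since plain SSE loads and stores copy bits without interpreting them, even when the bytes happen to encode NaN values. Any difference between the two instructions would therefore have to be in timing, as the quoted passages suggest, not in the result.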