What is the point of SSE2 instructions such as orpd?

The `orpd` instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as `por` ("bitwise logical OR")? If so, what's the point of having it?

Remember that SSE1 `orps` came first. (Well actually, MMX `por mm, mm/mem` came even before SSE1.)

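Having the same opcode with a new prefix be the SSE2 `orpd` instruction makes sense for the hardware decoder logic, I guess, just like `movapd` vs. `movaps`. Several instructions like this are redundant between the `ps` and `pd` versions, but some aren't: `addps` vs. `addpd`, or `unpcklps` vs. `unpcklpd`, which are different shuffles.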
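To make the redundant-vs-not distinction concrete, here is a minimal sketch in C, assuming an x86-64 compiler that provides the SSE2 intrinsics in `<emmintrin.h>`. The helper names `bits_of`, `or_variants_agree` and `add_variants_agree` are hypothetical. The first helper ORs the same 128 bits with `_mm_or_si128` (`por`), `_mm_or_ps` (`orps`) and `_mm_or_pd` (`orpd`) and checks that the results are bit-identical; the second shows that `_mm_add_ps` (`addps`) and `_mm_add_pd` (`addpd`) on the same bits are not. (The compiler is not obliged to emit exactly those instructions for each intrinsic.)

```c
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in SSE1's <xmmintrin.h> */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw 128 bits of a vector into a byte array for comparison. */
static void bits_of(__m128i v, uint8_t out[16]) {
    _mm_storeu_si128((__m128i *)out, v);
}

/* OR the same bits as integers (por), floats (orps) and doubles (orpd).
   Returns true when all three results are bit-for-bit identical. */
static bool or_variants_agree(const double a[2], const double b[2]) {
    __m128d va = _mm_loadu_pd(a), vb = _mm_loadu_pd(b);
    uint8_t r_por[16], r_orps[16], r_orpd[16];

    bits_of(_mm_or_si128(_mm_castpd_si128(va), _mm_castpd_si128(vb)), r_por);
    bits_of(_mm_castps_si128(_mm_or_ps(_mm_castpd_ps(va), _mm_castpd_ps(vb))), r_orps);
    bits_of(_mm_castpd_si128(_mm_or_pd(va, vb)), r_orpd);

    return memcmp(r_por, r_orps, 16) == 0 && memcmp(r_por, r_orpd, 16) == 0;
}

/* Add the same bits as four floats (addps) and as two doubles (addpd).
   Element width matters for arithmetic, so these generally differ. */
static bool add_variants_agree(const double a[2], const double b[2]) {
    __m128d va = _mm_loadu_pd(a), vb = _mm_loadu_pd(b);
    uint8_t r_addps[16], r_addpd[16];

    bits_of(_mm_castps_si128(_mm_add_ps(_mm_castpd_ps(va), _mm_castpd_ps(vb))), r_addps);
    bits_of(_mm_castpd_si128(_mm_add_pd(va, vb)), r_addpd);

    return memcmp(r_addps, r_addpd, 16) == 0;
}
```

With inputs `{1.0, 2.0}` and `{3.0, 4.0}`, the OR check should pass and the add check should fail: read as floats, the high half of the double `1.0` is `1.875`, so `addps` produces an unrelated bit pattern.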

SSE2 also introduced `66 0F EB /r` `por xmm, xmm/mem` at least partly for consistency with MMX `0F EB /r` `por mm, mm/mem`: the same opcode with a new mandatory prefix, just like `paddb mm, mm` vs. `paddb xmm, xmm`.
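As a sketch of that opcode reuse, the byte arrays below are register-to-register encodings (destination register 0, source register 1) for the MMX and SSE2 forms of `por` and `paddb`. The arrays and the checking helper `is_66_prefixed` are hypothetical illustrations, included only to make the "same opcode plus a mandatory `66` prefix" relationship explicit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ModRM 0xC1 = mod 11 (register operand), reg 000 (dest), r/m 001 (src). */
static const unsigned char POR_MMX[]   = {0x0F, 0xEB, 0xC1};       /* por   mm0, mm1   */
static const unsigned char POR_XMM[]   = {0x66, 0x0F, 0xEB, 0xC1}; /* por   xmm0, xmm1 */
static const unsigned char PADDB_MMX[] = {0x0F, 0xFC, 0xC1};       /* paddb mm0, mm1   */
static const unsigned char PADDB_XMM[] = {0x66, 0x0F, 0xFC, 0xC1}; /* paddb xmm0, xmm1 */

/* True if the SSE2 form is exactly the MMX form with a 66 prefix in front. */
static bool is_66_prefixed(const unsigned char *sse2, size_t sse2_len,
                           const unsigned char *mmx, size_t mmx_len) {
    return sse2_len == mmx_len + 1 && sse2[0] == 0x66 &&
           memcmp(sse2 + 1, mmx, mmx_len) == 0;
}
```

The decoder can therefore treat the `66` prefix as "same operation, XMM registers instead of MMX registers" for this whole family of instructions.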

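But it also allows for the possibility of different bypass-forwarding domains for vec-integer vs. FP. Different microarchitectures have behaved differently in how they actually decode and run those different instructions. Some run all the XMM `or` instructions the same way, creating extra latency for forwarding between the FP and SIMD-integer domains.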

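No CPU has ever had different forwarding domains for FP-float vs. FP-double, so yes, `movapd` and `orpd` are a useless waste of space in practice and you should never use them. Use the smaller `orps` encoding instead: it has no `66` prefix, so it is one byte shorter.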

(Or with a VEX encoding it doesn't matter; `vorps` and `vorpd` are the same size: 2-byte prefix + opcode + ModRM ...)
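A minimal sketch of the size difference, using register-only encodings so that no displacement bytes are involved. The array names and the `vex2_pp` helper are hypothetical; the point is that in a 2-byte VEX prefix the `pp` field absorbs the `66` prefix, so `ps` vs. `pd` no longer costs a byte.

```c
/* xmm0 |= xmm1, legacy SSE encoding (ModRM 0xC1: mod 11, reg xmm0, r/m xmm1). */
static const unsigned char ORPS_LEGACY[] = {0x0F, 0x56, 0xC1};       /* orps xmm0, xmm1 */
static const unsigned char ORPD_LEGACY[] = {0x66, 0x0F, 0x56, 0xC1}; /* orpd xmm0, xmm1 */

/* xmm0 = xmm1 | xmm2, 2-byte VEX prefix: C5, then bits R' vvvv' L pp
   (R and vvvv stored inverted; vvvv' = 1110 means xmm1).
   ModRM 0xC2: mod 11, reg xmm0, r/m xmm2. */
static const unsigned char VORPS[] = {0xC5, 0xF0, 0x56, 0xC2}; /* vorps xmm0, xmm1, xmm2 */
static const unsigned char VORPD[] = {0xC5, 0xF1, 0x56, 0xC2}; /* vorpd xmm0, xmm1, xmm2 */

/* Low two bits of the second VEX byte stand in for a legacy SIMD prefix:
   0 = none, 1 = 66, 2 = F3, 3 = F2. */
static unsigned vex2_pp(const unsigned char *insn) {
    return insn[1] & 0x3u;
}
```

Adding a REX prefix (for `xmm8`-`xmm15`) or a memory operand grows both legacy forms equally, so `orpd` stays one byte longer than `orps` without VEX.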

`por` vs. `orps`

For more about bypass latency when using `por` between FP math instructions like `addps`, or `orps` between SIMD-integer instructions like `paddb`, see:

Do I get a performance penalty when mixing SSE integer/float SIMD instructions
What's the difference between logical SSE intrinsics?
Difference between the AVX instructions vxorpd and vpxor
Does using mix of pxor and xorps affect performance?
Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?
Choosing SSE instruction execution domains in mixed contexts (pre-Skylake, the integer versions have better throughput)

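In case anyone was wondering, to answer the other interpretation of the title: bitwise booleans on FP values are mainly useful for setting, clearing, or toggling the sign bit, or for doing stuff with `cmpps`/`cmppd` masks, like blending.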
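A minimal sketch of those uses in C with the SSE1 intrinsics from `<xmmintrin.h>`, assuming an x86/x86-64 compiler; all helper names are hypothetical. `-0.0f` serves as a mask with only the sign bit set, `andnps`/`xorps`/`orps` clear, toggle and set that bit, and a `cmpltps` mask combined with `andps`/`andnps`/`orps` gives a branchless per-lane select.

```c
#include <xmmintrin.h> /* SSE1: and/andnot/or/xor/cmplt on packed floats */

/* -0.0f is 0x80000000: only the sign bit set in each lane. */
static __m128 sign_mask(void) { return _mm_set1_ps(-0.0f); }

/* _mm_andnot_ps(m, x) computes (~m) & x, so this clears the sign bit: |x|. */
static __m128 abs_ps(__m128 x)     { return _mm_andnot_ps(sign_mask(), x); }
/* Toggle the sign bit: -x. */
static __m128 neg_ps(__m128 x)     { return _mm_xor_ps(x, sign_mask()); }
/* Force the sign bit on: -|x|. */
static __m128 neg_abs_ps(__m128 x) { return _mm_or_ps(x, sign_mask()); }

/* Branchless select: lanes of a where mask is all-ones, lanes of b elsewhere. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Per-lane minimum built from a cmpltps mask plus the select above. */
static __m128 min_via_mask(__m128 a, __m128 b) {
    return select_ps(_mm_cmplt_ps(a, b), a, b);
}
```

`select_ps` is the classic pre-SSE4.1 blend idiom (SSE4.1 `blendvps` does it in one instruction); `min_via_mask` only demonstrates the idiom, since `minps` already exists.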
