C/C++ intrinsics for non-temporal loads of 32- and 64-bit values on x86_64?

Are there C/C++ intrinsics on x86_64 for non-temporal loads (i.e. loads that go straight from DRAM without caching) of 32- and 64-bit values?

My compiler is MSVC++ 2017, toolset v141. Intrinsics for other compilers are welcome too, as are references to the underlying assembly instructions.

Take a look here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=temporal

void _mm_stream_pi (__m64* mem_addr, __m64 a)
void _mm_stream_si32 (int* mem_addr, int a)

and a few others.

https://msdn.microsoft.com/en-us/library/hh977023.aspx

That is actually the VS2015 documentation, but the VS2017 documentation is (at least for me) strange and disorganized, and I couldn't find anything in it :).

At least as far as I know,

void _mm_prefetch (char const* p, int i)

is used for this (with i = _MM_HINT_NTA for the non-temporal hint).

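The loads are short enough, and the hint just tells the uP (microprocessor) not to evict other data from the cache, at no performance cost (so even a non-temporal load will be cached if there is room in the cache, but it won't evict any data).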
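To make the prefetch approach concrete, here is a minimal sketch, assuming the _MM_HINT_NTA hint is what you want. load_u32_nta and load_u64_nta are made-up helper names, not part of any library. The load itself is an ordinary GP load; only the prefetch carries the non-temporal hint, and how the hint is honoured depends on the microarchitecture.

```cpp
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_NTA
#include <cstdint>

// Hypothetical helper: hint that the line holding *p is non-temporal,
// then read the value with a normal 32-bit load.
inline std::uint32_t load_u32_nta(const std::uint32_t* p) {
    _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_NTA);  // prefetchnta
    return *p;  // ordinary load; the hint only affects cache placement
}

// Same idea for a 64-bit value.
inline std::uint64_t load_u64_nta(const std::uint64_t* p) {
    _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_NTA);  // prefetchnta
    return *p;
}
```

In real code you would probably issue the prefetch some iterations ahead of the load rather than right before it, otherwise it cannot hide any latency.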

At the time of writing (August 2017) there are no non-temporal loads into GP registers.


The only non-temporal instructions available are:

Integer domain

(v)movntdqa (load) despite the name, this instruction loads 128/256/512 bits, aligned on their natural boundary, into an xmm/ymm/zmm register respectively.
(v)movntdq (store) despite the name, this instruction stores an xmm/ymm/zmm register into a 128/256/512-bit memory location aligned on its natural boundary.
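For the 32/64-bit case in the question, one workaround is to load the whole aligned 16-byte block with _mm_stream_load_si128 (movntdqa) and extract the low element. This is only a sketch, assuming SSE4.1 is available and the value sits at the start of a 16-byte-aligned block. nt_load_low32, nt_load_low64 and the NT_TARGET_SSE41 macro are made-up names. Also, as far as I know, the non-temporal hint mainly matters for WC (write-combining) memory; on ordinary write-back memory it may behave like a normal load.

```cpp
#include <immintrin.h>  // _mm_stream_load_si128 (SSE4.1), _mm_cvtsi128_si32/si64 (SSE2)
#include <cassert>
#include <cstdint>

// GCC/Clang need SSE4.1 enabled for the function; MSVC does not.
#if defined(__GNUC__) || defined(__clang__)
#define NT_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define NT_TARGET_SSE41
#endif

// Hypothetical helper: movntdqa the 16-byte block, return its low 32 bits.
NT_TARGET_SSE41 inline int nt_load_low32(const void* block) {
    assert(reinterpret_cast<std::uintptr_t>(block) % 16 == 0);  // must be 16-byte aligned
    __m128i v = _mm_stream_load_si128(
        reinterpret_cast<__m128i*>(const_cast<void*>(block)));
    return _mm_cvtsi128_si32(v);  // movd: low dword of the xmm register
}

// Hypothetical helper: same, but return the low 64 bits (x86_64 only).
NT_TARGET_SSE41 inline long long nt_load_low64(const void* block) {
    assert(reinterpret_cast<std::uintptr_t>(block) % 16 == 0);  // must be 16-byte aligned
    __m128i v = _mm_stream_load_si128(
        reinterpret_cast<__m128i*>(const_cast<void*>(block)));
    return _mm_cvtsi128_si64(v);  // movq: low qword of the xmm register
}
```

The const_cast is there only because the parameter type of _mm_stream_load_si128 differs between compiler headers; the intrinsic does not write through the pointer.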

GP registers

movnti (store) stores a 32/64-bit GP register into a DWORD/QWORD in memory.
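On the store side, movnti is what _mm_stream_si32 and _mm_stream_si64 compile to. A small sketch of how they might be used; nt_store32 and nt_store64 are made-up wrapper names, and the trailing _mm_sfence is my assumption about what you would typically want, since non-temporal stores are weakly ordered.

```cpp
#include <emmintrin.h>  // _mm_stream_si32, _mm_stream_si64 (SSE2)
#include <xmmintrin.h>  // _mm_sfence

// Hypothetical wrapper: non-temporal 32-bit store (movnti dword).
inline void nt_store32(int* dst, int value) {
    _mm_stream_si32(dst, value);
    _mm_sfence();  // make the NT store globally visible before later stores
}

// Hypothetical wrapper: non-temporal 64-bit store (movnti qword, x86_64 only).
inline void nt_store64(long long* dst, long long value) {
    _mm_stream_si64(dst, value);
    _mm_sfence();
}
```

When streaming many values, a single _mm_sfence after the whole batch should be enough; fencing after every store as above just keeps the example simple.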

MMX registers

movntq (store) stores an MMX register into a QWORD in memory.

Floating-point domain

(v)movntpd/s (store) (legacy and VEX encoded) stores an xmm/ymm register into an aligned 128/256-bit memory location. Like movntdq but in the FP domain.

(v)movntpd/s (store) (EVEX encoded) stores an xmm/ymm/zmm register into an aligned 512-bit memory location, clearing the upper unused bits. Like movntdq but in the FP domain.
The Intel manuals are contradictory on this.

Masked moves

(v)maskmovdqu (store) stores the bytes of an xmm register according to the mask in another xmm register.

maskmovq (store) stores the bytes of an MMX register according to the mask in another MMX register (there is no VEX-encoded form).
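As an illustration of the masked form, maskmovdqu is exposed as _mm_maskmoveu_si128. The sketch below writes only the low four bytes of an xmm register. nt_store_low_dword is a made-up name, and I am assuming the destination has at least 16 accessible bytes, since whether masked-off bytes can still fault is implementation specific.

```cpp
#include <emmintrin.h>  // _mm_maskmoveu_si128, _mm_cvtsi32_si128 (SSE2)
#include <xmmintrin.h>  // _mm_sfence

// Hypothetical helper: non-temporally store the 4 bytes of `value` to dst[0..3],
// leaving dst[4..15] untouched.
inline void nt_store_low_dword(char* dst, int value) {
    __m128i data = _mm_cvtsi32_si128(value);  // value in bytes 0..3, zeros elsewhere
    __m128i mask = _mm_cvtsi32_si128(-1);     // high bit set in bytes 0..3 only
    _mm_maskmoveu_si128(data, mask, dst);     // maskmovdqu: writes bytes whose mask MSB is 1
    _mm_sfence();                             // order the NT store before later stores
}
```

For a plain 32-bit store movnti is the simpler choice; the masked form is more interesting when the byte pattern to write is irregular.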