C/C++ intrinsics for non-temporal loads of 32- and 64-bit values on x86_64?
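Are there C/C++ intrinsics for non-temporal loads (i.e. loads directly from DRAM, with no caching) of 32- and 64-bit values on x86_64?
My compiler is MSVC++ 2017, toolset v141. Intrinsics for other compilers are welcome too, as are references to the underlying assembly instructions.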
Have a look here:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=temporal
void _mm_stream_pi (__m64* mem_addr, __m64 a)
void _mm_stream_si32 (int* mem_addr, int a)
and some others,
and here:
https://msdn.microsoft.com/en-us/library/hh977023.aspx
That is actually the VS2015 documentation, but the VS2017 documentation (at least for me) is strange and cluttered, and I cannot find anything there :).
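At least as far as I know,
void _mm_prefetch (char const* p, int i)
is used for this.
Such loads are cheap enough: the hint only tells the processor not to evict other data from the cache, and it carries no performance penalty. So even a non-temporal load is cached if there is room in the cache, but it does not evict any data.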
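As a rough illustration of the prefetch hint described above, the sketch below walks a buffer that is read only once and issues `_mm_prefetch` with `_MM_HINT_NTA` a few cache lines ahead of the ordinary 32-bit loads. The helper name `sum_with_nta_hint`, the 64-byte line size, and the prefetch distance are assumptions made for the example, not tuned values, and whether the hint helps at all depends on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_NTA (SSE)

// Hypothetical helper: sums a buffer that is only read once.
inline std::uint64_t sum_with_nta_hint(const std::uint32_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    constexpr std::size_t kPerLine = 64 / sizeof(std::uint32_t);  // assumed 64-byte cache lines
    constexpr std::size_t kAhead = 4 * kPerLine;                   // assumed prefetch distance
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Roughly once per cache line, hint a line further ahead as non-temporal.
        // The prefetch is only a hint; the data is still read by the plain load below.
        if (i % kPerLine == 0 && i + kAhead < count) {
            _mm_prefetch(reinterpret_cast<const char*>(data + i + kAhead), _MM_HINT_NTA);
        }
        total += data[i];  // ordinary 32-bit load into a GP register
    }
    return total;
}
```

The result is the same with or without the prefetch; only the cache behaviour may differ, so any benefit has to be measured on the target machine.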
At the time of writing (August 2017) there are no non-temporal loads into the GP registers.
The only non-temporal instructions available are:
Integer domain
(v)movntdqa
(load) despite the name, this instruction moves 128/256/512 bits, aligned on their natural boundary, into xmm/ymm/zmm registers respectively.
(v)movntdq
(store) despite the name, this instruction moves xmm/ymm/zmm registers into a 128/256/512-bit memory location aligned on its natural boundary.
GP registers
movnti
(store) store a 32/64-bit GP register into a DWORD/QWORD in memory.
MMX registers
movntq
(store) store an MMX register into a QWORD in memory.
Floating point domain
(v)movntpd/s
(store) (legacy and VEX encoded) store an xmm/ymm register into an aligned 128/256-bit memory location. Like movntdq but in the FP domain.
(v)movntpd/s
(store) (EVEX encoded) store an xmm/ymm/zmm register into an aligned 512-bit memory location, clearing the upper unused bits. Like movntdq but in the FP domain.
Intel manuals are contradictory on this.
Masked movs
(v)maskmovdqu
(store) stores the bytes of an xmm register according to the mask in another xmm register.
maskmovq
(store) stores the bytes of an MMX register according to the mask in another MMX register.
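Since there is no non-temporal load into a GP register, one possible workaround is to load a whole aligned 16-byte block with movntdqa and then move its low 32 or 64 bits into a GP register, while the movnti stores have direct intrinsics. The following is a minimal sketch, assuming SSE4.1 and a 64-bit target; the helper names (`nt_store32`, `nt_load64_low`, and so on) and the `NT_SSE41` macro are made up for illustration. Note that Intel documents the non-temporal hint of movntdqa as applying to write-combining (WC) memory, so on ordinary write-back memory it may behave like a regular load.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_stream_si32, _mm_stream_si64, _mm_cvtsi128_si32/_si64
#include <smmintrin.h>  // SSE4.1: _mm_stream_load_si128

#if defined(__GNUC__) || defined(__clang__)
// Assumption: enable SSE4.1 per function so no -msse4.1 flag is required.
#define NT_SSE41 __attribute__((target("sse4.1")))
#else
#define NT_SSE41  // MSVC needs no per-function target
#endif

// movnti: non-temporal store of a 32-bit GP value.
static inline void nt_store32(int* dst, int value) { _mm_stream_si32(dst, value); }

// movnti, 64-bit form (x86_64 only).
static inline void nt_store64(long long* dst, long long value) { _mm_stream_si64(dst, value); }

// Non-temporal stores are weakly ordered; sfence makes them globally visible
// before later stores.
static inline void nt_fence() { _mm_sfence(); }

// movntdqa: loads the aligned 16-byte block at src16 and returns it.
// Assumption: src16 is 16-byte aligned and all 16 bytes are readable.
NT_SSE41 static inline __m128i nt_load_block(const void* src16) {
    assert(reinterpret_cast<std::uintptr_t>(src16) % 16 == 0);
    return _mm_stream_load_si128(static_cast<__m128i*>(const_cast<void*>(src16)));
}

// Low 32 bits of the block, moved into a GP register (movd).
NT_SSE41 static inline int nt_load32_low(const void* src16) {
    return _mm_cvtsi128_si32(nt_load_block(src16));
}

// Low 64 bits of the block, moved into a GP register (movq).
NT_SSE41 static inline long long nt_load64_low(const void* src16) {
    return _mm_cvtsi128_si64(nt_load_block(src16));
}
```

A caller would keep the data in 16-byte aligned storage (for example `alignas(16) long long buf[2]`), write with `nt_store64`, call `nt_fence()`, and read back with `nt_load64_low(buf)`. Reading a lane other than the lowest would need an extract such as `_mm_extract_epi32`, which is left out here.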