What are the differences between the compress and expand instructions in AVX-512?
I am studying the expand and compress operations in the Intel intrinsics guide. I am confused by these two concepts:
For __m128d _mm_mask_expand_pd (__m128d src, __mmask8 k, __m128d a) == vexpandpd:
Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
For __m128d _mm_mask_compress_pd (__m128d src, __mmask8 k, __m128d a) == vcompresspd:
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
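Is there a clearer description, or can anyone explain?
These instructions implement the APL operators \ (expand) and / (compress). Expand takes a bitmask α of m ≥ n bits, of which n are set, and an array ω of n numbers, and returns a vector of m numbers in which the numbers from ω are inserted at the positions indicated by α and the rest are set to zero. For example,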
0 1 1 0 1 0 \ 2 3 4
returns
0 2 3 0 4 0
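The _mm_mask_expand_pd intrinsic implements this operator for a fixed m = 2, since a __m128d holds two doubles (the 512-bit form, _mm512_mask_expand_pd, has m = 8). Positions whose mask bit is clear are filled from src rather than with zero; the _mm_maskz_expand_pd variant zeroes them instead.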
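To make the expand semantics concrete, here is a minimal scalar sketch in plain C. It is a model of the behavior, not the intrinsic itself: the function name expand_pd_model and the lane-count parameter m are hypothetical, and inactive lanes are taken from src as in the Intel description quoted in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked expand over m lanes.
 * Lanes are visited from low to high. A lane whose mask bit is set
 * receives the next unread element of a (a is consumed contiguously,
 * starting at a[0]); every other lane is copied from src. */
void expand_pd_model(double *dst, const double *src, uint8_t k,
                     const double *a, size_t m)
{
    assert(m <= 8);          /* an 8-bit mask covers at most 8 lanes */
    size_t next = 0;         /* index of the next unread element of a */
    for (size_t i = 0; i < m; i++) {
        if ((k >> i) & 1u)
            dst[i] = a[next++];
        else
            dst[i] = src[i];
    }
}
```

Lane i corresponds to bit i of the mask, so the APL mask 0 1 1 0 1 0 from the example above is the integer 0x16. Calling the model with that mask, a = {2, 3, 4}, m = 6 and an all-zero src reproduces 0 2 3 0 4 0.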
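The compress operation undoes the effect of the expand operation: it uses the bitmask α to select entries from ω and stores those entries contiguously. In the register form quoted above, they are packed into the low elements of dst and the remaining elements are passed through from src; the _mm_mask_compressstoreu_pd variant writes them to memory instead.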
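The compress direction can be sketched the same way. Again this is a plain C model under stated assumptions, not the intrinsic: compress_pd_model and the lane count m are hypothetical names, and the pass-through of the remaining elements from src follows the Intel description quoted in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked compress over m lanes.
 * Elements of a whose mask bit is set are packed, in order, into the
 * low lanes of dst. The lanes above the packed run are passed through
 * from src at the same positions. */
void compress_pd_model(double *dst, const double *src, uint8_t k,
                       const double *a, size_t m)
{
    assert(m <= 8);          /* an 8-bit mask covers at most 8 lanes */
    size_t next = 0;         /* next free low lane in dst */
    for (size_t i = 0; i < m; i++) {
        if ((k >> i) & 1u)
            dst[next++] = a[i];
    }
    for (size_t i = next; i < m; i++)
        dst[i] = src[i];     /* remaining lanes come from src */
}
```

With a = {0, 2, 3, 0, 4, 0}, mask 0x16 and m = 6, the three selected values 2 3 4 land in lanes 0 to 2, and lanes 3 to 5 keep whatever src holds there. This is the inverse of the expand example given earlier in the answer.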