What are the differences between the compress and expand instructions in AVX-512?

Question

I am studying the expand and compress operations in the Intel intrinsics guide. I am confused by these two concepts:

For __m128d _mm_mask_expand_pd (__m128d src, __mmask8 k, __m128d a) == vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).

For __m128d _mm_mask_compress_pd (__m128d src, __mmask8 k, __m128d a) == vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.

Is there a clearer description, or can anyone explain?

Answer 1

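These instructions implement the APL operators \ (expand) and / (compress). Expand takes a bitmask α of m ≥ n bits, of which n are set, and an array ω of n numbers, and returns a vector of m numbers in which the numbers from ω are inserted at the positions indicated by α and the remaining positions are set to zero. For example,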

0 1 1 0 1 0 \ 2 3 4

returns

0 2 3 0 4 0

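The _mm_mask_expand_pd intrinsic implements this operator for a fixed vector width. A __m128d holds two doubles, so m = 2 here (only the low two bits of the __mmask8 are used), while the 512-bit _mm512_mask_expand_pd has m = 8. In the mask form, positions whose mask bit is clear are copied from src instead of being set to zero; the maskz form zeroes them.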
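To make the expand semantics concrete, here is a minimal scalar sketch in C. It is a model of the behaviour described above, not the intrinsic itself: the function name expand_model, the explicit length parameter m, and the convention that a NULL src means "zero the inactive positions" are all assumptions made for illustration. Bit i of the mask corresponds to output position i.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of expand (hypothetical helper, not an Intel API).
 * mask: bit i set means output position i receives the next packed input.
 * m:    number of output positions (fixed by the vector width in hardware).
 * in:   packed inputs; must hold at least as many elements as mask has set bits.
 * src:  values for inactive positions, or NULL to write zeros there.
 * Returns the number of inputs consumed. */
size_t expand_model(unsigned mask, size_t m, const double *in,
                    const double *src, double *out)
{
    assert(m <= 8 * sizeof mask);          /* every position needs a mask bit */
    size_t next = 0;                       /* index of the next packed input */
    for (size_t i = 0; i < m; i++) {
        if ((mask >> i) & 1u)
            out[i] = in[next++];           /* active: take inputs in order */
        else
            out[i] = src ? src[i] : 0.0;   /* inactive: pass through or zero */
    }
    return next;
}
```

With the example above, the mask 0 1 1 0 1 0 read left to right is the integer 0x16 (bits 1, 2 and 4 set), and calling expand_model(0x16, 6, in, NULL, out) with in = {2, 3, 4} fills out with 0 2 3 0 4 0.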

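The compress operation undoes the effect of the expand operation: it uses the bitmask α to select entries from ω and stores those entries contiguously, either in the low elements of the destination register or, in the store variants, to memory.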
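The compress direction can be sketched the same way. Again this is a scalar model under assumptions, not the intrinsic: compress_model and its explicit length m are hypothetical names, and the function writes only the selected elements, which resembles the store-to-memory form rather than the register form that fills the remaining lanes from src.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of compress (hypothetical helper, not an Intel API).
 * mask: bit i set means input position i is selected.
 * m:    number of input positions.
 * out:  receives the selected elements packed together, in order;
 *       must have room for as many elements as mask has set bits.
 * Returns the number of elements written. */
size_t compress_model(unsigned mask, size_t m, const double *in, double *out)
{
    assert(m <= 8 * sizeof mask);          /* every position needs a mask bit */
    size_t next = 0;                       /* next free packed slot */
    for (size_t i = 0; i < m; i++) {
        if ((mask >> i) & 1u)
            out[next++] = in[i];           /* keep selected elements only */
    }
    return next;
}
```

Applied to the vector 0 2 3 0 4 0 with the same mask 0x16, compress_model writes 2 3 4 and returns 3, recovering the array that the expand example started from.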

x86

assembly

simd

avx512