What are the differences between the compress and expand instructions in AVX-512?

I am studying the expand and compress operations in the Intel intrinsics guide, and I am confused by these two concepts:

For __m128d _mm_mask_expand_pd (__m128d src, __mmask8 k, __m128d a) == vexpandpd:

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).

For __m128d _mm_mask_compress_pd (__m128d src, __mmask8 k, __m128d a) == vcompresspd:

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.

Is there a clearer description, or could someone explain the difference?

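These instructions implement the APL operators \ (expand) and / (compress). Expand takes a bitmask α of m ≥ n bits, of which n are set, and an array ω of n numbers, and returns a vector of m numbers in which the numbers from ω are inserted at the positions indicated by α and the remaining positions are set to zero. For example,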

0 1 1 0 1 0 \ 2 3 4

returns

0 2 3 0 4 0

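The _mm_mask_expand_pd intrinsic implements this operator for a fixed m. Note that m = 8 holds only for the 512-bit form, _mm512_mask_expand_pd; the 128-bit __m128d form shown in the question has m = 2, so only the low two bits of the __mmask8 are used. The mask variants also copy the unselected positions from src instead of setting them to zero; the maskz variants (for example _mm_maskz_expand_pd) zero them, as the APL operator does.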
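To make the expand semantics concrete, here is a minimal scalar sketch in plain C. It is not Intel's implementation; expand_model is a hypothetical helper name, and the code assumes that mask bit i corresponds to lane i and that dst does not overlap a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked expand over `lanes` doubles.
 * Hypothetical helper for illustration only, not an Intel API.
 * Assumption: bit i of k controls lane i (lane 0 is the lowest element). */
static void expand_model(double *dst, const double *src, uint8_t k,
                         const double *a, size_t lanes)
{
    assert(lanes <= 8);             /* an 8-bit mask covers at most 8 lanes */
    size_t next = 0;                /* index of the next unread element of a */
    for (size_t i = 0; i < lanes; i++) {
        if ((k >> i) & 1u)
            dst[i] = a[next++];     /* active lane: next contiguous element of a */
        else
            dst[i] = src[i];        /* inactive lane: passed through from src */
    }
}
```

With src filled with zeros, a = {2, 3, 4}, lanes = 6, and k = 0x16 (bits 1, 2 and 4 set, which is the APL mask 0 1 1 0 1 0 read from lane 0 upward), dst becomes {0, 2, 3, 0, 4, 0}, matching the APL example above. Passing a non-zero src models the mask variant, and an all-zero src models the maskz variant.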

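The compress operation undoes the effect of the expand operation: it uses the bitmask α to select entries from ω and stores those entries contiguously, either in the low elements of the destination register (_mm_mask_compress_pd) or in memory (_mm_mask_compressstoreu_pd).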
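The same idea in the other direction can be sketched as a scalar model in plain C. Again, compress_model is a hypothetical name rather than an Intel API, and the sketch assumes that mask bit i corresponds to lane i and that the lanes above the packed elements are taken from src, as the intrinsics guide describes for the mask variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked compress over `lanes` doubles.
 * Hypothetical helper for illustration only, not an Intel API.
 * Returns the number of active (selected) elements. */
static size_t compress_model(double *dst, const double *src, uint8_t k,
                             const double *a, size_t lanes)
{
    assert(lanes <= 8);             /* an 8-bit mask covers at most 8 lanes */
    size_t count = 0;               /* number of elements packed so far */
    for (size_t i = 0; i < lanes; i++) {
        if ((k >> i) & 1u)
            dst[count++] = a[i];    /* pack active lanes into the low end of dst */
    }
    for (size_t i = count; i < lanes; i++)
        dst[i] = src[i];            /* remaining upper lanes come from src */
    return count;
}
```

Feeding it the expanded vector a = {0, 2, 3, 0, 4, 0} with the same mask k = 0x16 and lanes = 6 packs 2, 3, 4 into the first three lanes and returns 3, which recovers the original ω; the last three lanes hold whatever src contained at those positions.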