Illegal instruction with _mm_cmpeq_epi8_mask

Question

I am trying to run code similar to the following:

#include <immintrin.h>
#include <iostream>   // needed for std::cout / std::endl below
void foo() {
    __m128i a = _mm_set_epi8 (0,0,6,5,4,3,2,1,8,7,6,5,4,3,2,1);
    __m128i b = _mm_set_epi8 (0,0,0,0,0,0,0,1,8,7,6,5,4,3,2,1);
    __mmask16 m = _mm_cmpeq_epi8_mask(a,b); // supposedly requires avx512vl and avx512bw
    std::cout<<m<<std::endl;
}
void bar() {
    int dataa[8] = {1,0,1,0,1,0,1,0};
    __m256i points = _mm256_lddqu_si256((__m256i *)&dataa[0]); // requires just mavx
    (void)points;
}

However, I keep running into the error Illegal instruction (core dumped).

I compile the code with:

g++ -std=c++11 -march=broadwell -mavx -mavx512vl -mavx512bw tests.cpp

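According to Intel's intrinsics documentation, these flags should be sufficient to run foo and bar. However, when either foo or bar is run, I get the same error message.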

However, if I remove foo and compile without -mavx512vl, bar runs without any problem.

I already checked that my CPU supports the -mno-avx512vl and -mno-avx512bw flags, so it should support -mavx512vl and -mavx512bw, right?

Which flags do I have to include to run both functions? Or am I missing something?

Answer 1

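I'm afraid your method of determining your CPU's capabilities is not very reliable. The fact that your gcc compiler supports AVX-512 does not mean that your CPU supports AVX-512.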

On the Linux command line, type more /proc/cpuinfo and check the flags section to see which instruction sets your CPU supports.
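The same check can be done from a program. The following is only a minimal sketch, assuming a Linux system where /proc/cpuinfo contains a line starting with "flags" and the kernel uses names such as avx512bw and avx512vl; the helper names flags_line_has and cpu_has_flag are made up for this illustration.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole word after the ':' in a
// "flags : ..." line taken from /proc/cpuinfo.
// (Hypothetical helper, not part of any library.)
bool flags_line_has(const std::string& flags_line, const std::string& flag) {
    const std::size_t colon = flags_line.find(':');
    if (colon == std::string::npos) return false;
    std::istringstream words(flags_line.substr(colon + 1));
    std::string word;
    while (words >> word) {
        if (word == flag) return true;  // whole-word match, so "avx512" != "avx512bw"
    }
    return false;
}

// Scans /proc/cpuinfo for the first line starting with "flags" and checks it.
// Returns false if the file cannot be opened or has no such line
// (for example on a non-Linux system).
bool cpu_has_flag(const std::string& flag) {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 5, "flags") == 0) {
            return flags_line_has(line, flag);
        }
    }
    return false;
}
```

Under these assumptions, _mm_cmpeq_epi8_mask is only safe to execute when both cpu_has_flag("avx512bw") and cpu_has_flag("avx512vl") return true.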

On Windows: 1. open Settings, 2. click "System", 3. click "About". This shows you the processor type. Google intel ark 'processor type', for example intel ark core i3 7100. Then follow the link to the processor page on the Intel website and check the Advanced Technologies -> Instruction Set Extensions entry.

There are multiple levels of AVX-512 support. AVX-512_BW and AVX-512_VL are standard on processors that support AVX-512, unless you are using a Knights Landing or Knights Mill processor. See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 or https://en.wikichip.org/wiki/x86/avx-512#Implementation.

Answer 2

Compile with gcc -march=native. If you get compile errors, your source is trying to use something your CPU doesn't support.

Regarding:

I already checked that my cpu supports the mno-avx512vl and mno-avx512bw flags so it should support mavx512vl and mavx512bw right?

That is the opposite of how GCC options work.

-mno-avx512vl disables -mavx512vl if any earlier option (such as -march=skylake-avx512, or -mavx512vl itself) had enabled it.

-march=broadwell does not enable AVX512 instructions, because Broadwell CPUs cannot run them natively. So -mno-avx512vl at the end of g++ -std=c++11 -march=broadwell -mavx ... has exactly zero effect.

Many options have long names starting with ‘-f’ or with ‘-W’—for example, -fmove-loop-invariants, -Wformat and so on. Most of these have both positive and negative forms; the negative form of -ffoo is -fno-foo. This manual documents only one of these two forms, whichever one is not the default.

from the GCC manual, intro part of section 3: Invoking GCC

(-m options follow the same convention as the -f and -W long options.)

This foo vs. no-foo style is not unique to GCC; it is very common.
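One way to see what a given set of -march / -m / -mno- options actually enabled is to look at the feature macros that GCC and Clang predefine (__AVX__, __AVX512BW__, __AVX512VL__). A rough sketch follows; the names kAvx, kAvx512bw, kAvx512vl and describe_enabled_isa are invented for this example.

```cpp
#include <string>

// Each macro is predefined by the compiler only if the corresponding
// extension is enabled after ALL command-line options have been processed.
#ifdef __AVX__
constexpr bool kAvx = true;
#else
constexpr bool kAvx = false;
#endif

#ifdef __AVX512BW__
constexpr bool kAvx512bw = true;
#else
constexpr bool kAvx512bw = false;
#endif

#ifdef __AVX512VL__
constexpr bool kAvx512vl = true;
#else
constexpr bool kAvx512vl = false;
#endif

// Reports what the COMPILER was told it may use. This says nothing about
// what the CPU running the binary can actually execute.
std::string describe_enabled_isa() {
    std::string out;
    out += "AVX=";
    out += kAvx ? "1" : "0";
    out += " AVX512BW=";
    out += kAvx512bw ? "1" : "0";
    out += " AVX512VL=";
    out += kAvx512vl ? "1" : "0";
    return out;
}
```

Built with -march=broadwell alone, this should report AVX=1 with both AVX512 entries at 0. Adding -mavx512vl should flip AVX512VL to 1, and appending -mno-avx512vl after that should flip it back to 0, since the later option wins.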

Faulting on _mm256_lddqu_si256 when compiled with -mavx512vl

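GCC is dumb and uses an EVEX encoding for the load (probably vmovdqu64) instead of the shorter VEX encoding. But you told it AVX512VL was available, so this is only an optimization problem, not a correctness problem.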

If you compile a function with only AVX enabled, it will of course only use AVX instructions.
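If the binary has to run on CPUs with and without AVX-512, one option is to enable the extension for a single function and pick the implementation at run time. The sketch below assumes GCC or Clang on x86-64 and uses the target function attribute together with __builtin_cpu_supports. The function names are made up, and the SSE2 fallback (_mm_cmpeq_epi8 plus _mm_movemask_epi8) is a substitute for the AVX-512 compare that yields the same 16-bit mask; it is not something the question used.

```cpp
#include <immintrin.h>
#include <cstdint>

// AVX-512BW + AVX-512VL are enabled for this one function only, independent
// of the -m flags on the command line. It must only be CALLED on a CPU that
// supports both extensions.
__attribute__((target("avx512bw,avx512vl")))
static std::uint16_t cmpeq_mask_avx512(__m128i a, __m128i b) {
    return static_cast<std::uint16_t>(_mm_cmpeq_epi8_mask(a, b));
}

// SSE2-only version (SSE2 is baseline on x86-64): compare bytes, then
// collect the top bit of each byte into a 16-bit mask.
static std::uint16_t cmpeq_mask_sse2(__m128i a, __m128i b) {
    return static_cast<std::uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)));
}

// Run-time dispatch: ask the CPU what it supports before executing any
// AVX-512 instruction.
std::uint16_t cmpeq_mask(__m128i a, __m128i b) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512bw") && __builtin_cpu_supports("avx512vl")) {
        return cmpeq_mask_avx512(a, b);
    }
    return cmpeq_mask_sse2(a, b);
}
```

For the two vectors a and b from the question, both paths return 0xC1FF (49663): the low nine bytes and the top two bytes are equal, and the five bytes in between differ.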

Answer 3

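For Intel's ISAs, the general rule is that a later architecture is a superset of the earlier ones. Since AVX512 is the newest one you mention, you don't need -mavx. Using -march=broadwell is useless, because you cannot optimize for a CPU that lacks the AVX512 ISA while also requiring AVX512.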

Your command line should look like this:

g++ -std=c++11 -march=skylake-avx512 tests.cpp

Also, the statement "my CPU supports those compiler flags" is strange. I guess you mean "the code I built with those flags runs on my CPU", but as already mentioned, the no- prefix means do NOT generate code for that ISA.

So your compiler flags are fine; your CPU simply does not support the required ISA.

Tags: gcc, instruction-set, compiler-flags, intrinsics, avx512