How to detect a Xeon Phi (Knights Landing)

Question

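Intel engineers write that we should use VZEROUPPER/VZEROALL to avoid the costly transition to non-VEX state on all processors, including future Xeon processors, but not on Xeon Phi: https://software.intel.com/pt-br/node/704023

People have also measured VZEROUPPER and VZEROALL and found them expensive on Knights Landing: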

36 clock cycles for both instructions in 64-bit mode (30 clock in 32-bit mode).

See the link above.

If I have just used ymm0 and ymm1, my code would look like this:

if [we are running on a Xeon Phi]
     vpxor       ymm0,ymm0,ymm0
     vpxor       ymm1,ymm1,ymm1
else
     vzeroall
endif

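How can I detect a Xeon Phi (Knights Landing and later Xeon Phi processors) in order to implement the code above?

Regarding VZEROUPPER/VZEROALL, the situation is currently as follows:

These instructions are not needed on Xeon Phi (Knights Landing) and are very expensive there: 36 clock cycles for both instructions in 64-bit mode (30 clock cycles in 32-bit mode).
These instructions are very cheap on Xeon and Core processors (Skylake/Kaby Lake), which need them to avoid the costly transition to non-VEX state, and Xeon processors will continue to need them for the foreseeable future.

Marketing materials claim that Xeon Phi (Knights Landing) is fully compatible with other Xeon processors.

Is there a reliable way to detect a Xeon Phi so that VZEROUPPER/VZEROALL can be avoided?

There is an article, "How to detect Knights Landing AVX-512 support (Intel® Xeon Phi™ processor)" by James R., updated February 22, 2016, but it focuses only on the specific new instructions available on Knights Landing, so the question of VEX transitions remains unclear.

It would be good to know whether Intel plans to implement a CPUID bit that indicates whether the transition to non-VEX state is costly. For example:

Bit set to 0: VEX state transitions are costly, but VZEROUPPER/VZEROALL are cheap and should be used to clear the state;
Bit set to 1: there is no transition penalty, and VZEROUPPER/VZEROALL are not needed.

The article on detecting Knights Landing mentioned above suggests checking the AVX-512F+CD+ER+PF bits introduced with Knights Landing.

So the code suggests checking all of these bits at once; if all of them are set, we are on Knights Landing: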

uint32_t avx2_bmi12_mask = (1 << 16) | // AVX-512F
                           (1 << 26) | // AVX-512PF
                           (1 << 27) | // AVX-512ER
                           (1 << 28);  // AVX-512CD
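
To make the mask above concrete, here is a minimal sketch of how it might be applied. It assumes a GCC or Clang toolchain that provides `__get_cpuid_count` in `<cpuid.h>`; the names `hasKnlFeatureBits` and `cpuReportsKnlFeatureBits` are hypothetical and not taken from the article. It looks only at the CPUID bits in EBX of leaf 7, sub-leaf 0, and does not check whether the operating system has enabled the AVX-512 register state.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header (assumed toolchain)
#endif

// Feature bits reported in EBX by CPUID leaf 7, sub-leaf 0.
constexpr std::uint32_t kAvx512F  = 1u << 16;
constexpr std::uint32_t kAvx512PF = 1u << 26;
constexpr std::uint32_t kAvx512ER = 1u << 27;
constexpr std::uint32_t kAvx512CD = 1u << 28;
constexpr std::uint32_t kKnlMask  = kAvx512F | kAvx512PF | kAvx512ER | kAvx512CD;

// Hypothetical helper: true only when every bit of the mask is set.
constexpr bool hasKnlFeatureBits(std::uint32_t leaf7Ebx) {
    return (leaf7Ebx & kKnlMask) == kKnlMask;
}

// Hypothetical helper: queries the running CPU. Returns false when
// leaf 7 is unavailable or when the build target is not recognised as x86.
inline bool cpuReportsKnlFeatureBits() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return hasKnlFeatureBits(ebx);
#else
    return false;
#endif
}
```

Calling `cpuReportsKnlFeatureBits()` once at startup and caching the result would keep the CPUID instruction out of hot code paths.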

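It would be good to know whether Intel plans to add all of these bits to ordinary (non-Phi) Xeon or Core processors in the near future, so that they too would support the AVX-512F+CD+ER+PF features introduced with Knights Landing.

If Xeon and Core processors come to support AVX-512F+CD+ER+PF, we will not be able to tell a Xeon from a Xeon Phi.

Please advise.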

Answer 1

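If you specifically want to check whether you are on KNL (rather than the more general "Does the CPU I am running on have feature X?"), you can do so by looking at the "Extended Model", "Family" and "Model" fields in %eax after calling cpuid with %eax == 1 and %ecx == 0. C++ code like that below will do the job.

However, as others have implicitly pointed out, this is a very specific test that will fail on, for example, future Knights cores, so you may be better off doing the check as suggested and testing for features that are not present in the AVX-512 Xeons, namely AVX512-ER and AVX512-PF. (Of course, such instructions may appear in future Xeons, so this is not guaranteed in the long term, but to quote Keynes: "In the long term we're all dead" :-))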

#include <cstdint>

class cpuidState
{
    uint32_t orig_eax;                      /* Values sent in to the cpuid instruction */
    uint32_t orig_ecx;

    uint32_t eax;                           /* Values received back from it. */
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;

    void cpuid()
    {
        __asm__ __volatile__("cpuid"
                             : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
    }

    void update (uint32_t eaxVal, uint32_t ecxVal)
    {
        orig_eax = eaxVal;
        orig_ecx = ecxVal;
        eax      = eaxVal;
        ecx      = ecxVal;
        cpuid();
    }

    void ensureCorrectLeaf(uint32_t eaxVal, uint32_t ecxVal)
    {
        if (orig_eax != eaxVal || orig_ecx != ecxVal)
            update (eaxVal, ecxVal);
    }

 public:
    cpuidState() : orig_eax (-1), orig_ecx(-1) { }

    // Include the Extended Model in the test. Without it we see some Xeons as KNL :-(
    bool onKNL()            { ensureCorrectLeaf(1,0); return (eax & 0x0f0ff0) == 0x50670; }    
};
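
The mask test in `onKNL()` can also be read as a comparison of decoded fields. The following is a sketch, not part of the original answer, that decodes the leaf 1 EAX value using the display family/model rule from the Intel SDM (the extended model is combined with the model only when the base family is 6 or 15). The function names are hypothetical, and the constants used in the usage note are illustrative inputs chosen to match the `0x50670` pattern above.

```cpp
#include <cstdint>

// Base family: bits 11:8 of CPUID leaf 1 EAX.
constexpr std::uint32_t baseFamily(std::uint32_t eax) {
    return (eax >> 8) & 0xFu;
}

// Display family: the extended family (bits 27:20) is added only
// when the base family is 0xF.
constexpr std::uint32_t displayFamily(std::uint32_t eax) {
    return baseFamily(eax) == 0xFu
               ? baseFamily(eax) + ((eax >> 20) & 0xFFu)
               : baseFamily(eax);
}

// Display model: the extended model (bits 19:16) supplies the high nibble
// only when the base family is 0x6 or 0xF; the model is bits 7:4.
constexpr std::uint32_t displayModel(std::uint32_t eax) {
    return (baseFamily(eax) == 0x6u || baseFamily(eax) == 0xFu)
               ? ((((eax >> 16) & 0xFu) << 4) | ((eax >> 4) & 0xFu))
               : ((eax >> 4) & 0xFu);
}

// Hypothetical helper: family 6, model 0x57 corresponds to the
// 0x50670 pattern tested by onKNL() above.
constexpr bool isKnightsLandingSignature(std::uint32_t eax) {
    return displayFamily(eax) == 0x6u && displayModel(eax) == 0x57u;
}
```

For example, `isKnightsLandingSignature(0x00050671)` is true, while `isKnightsLandingSignature(0x00040671)`, which differs only in the extended model field, is false. This is the case the comment on `onKNL()` warns about.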
