How can a literal 0 and 0 as a variable yield different behavior with the function __builtin_clz?

Question

There is only one case where __builtin_clz gives me the wrong answer. I'm curious what causes this behavior.

When I use the literal value 0, I always get 32, as expected. But 0 as a variable yields 31. Why does it matter how the value 0 is stored?

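I have taken an architecture class, but I don't understand the differences in the assembly. It looks like when given the literal value 0, the assembly always somehow hard-codes the correct answer of 32, even without optimizations. And the method for counting leading zeros is different when using -march=native.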

This post about emulating __builtin_clz with _BitScanReverse, and the line bsrl %eax, %eax, seem to imply that bit scan reverse doesn't work for 0.

+-------------------+-------------+--------------+
|      Compile      | literal.cpp | variable.cpp |
+-------------------+-------------+--------------+
| g++               |          32 |           31 |
| g++ -O            |          32 |           32 |
| g++ -march=native |          32 |           32 |
+-------------------+-------------+--------------+

literal.cpp

#include <iostream>

int main(){
    int i = 0;
    std::cout << __builtin_clz(0) << std::endl;
}

variable.cpp

#include <iostream>

int main(){
    int i = 0;
    std::cout << __builtin_clz(i) << std::endl;
}

diff of g++ -S [in name] -o [out name]

1c1
<       .file   "literal.cpp"
---
>       .file   "variable.cpp"
23c23,26
<       movl    $32, %esi
---
>       movl    -4(%rbp), %eax
>       bsrl    %eax, %eax
>       xorl    $31, %eax
>       movl    %eax, %esi

diff of g++ -march=native -S [in name] -o [out name]

1c1
<       .file   "literal.cpp"
---
>       .file   "variable.cpp"
23c23,25
<       movl    $32, %esi
---
>       movl    -4(%rbp), %eax
>       lzcntl  %eax, %eax
>       movl    %eax, %esi

diff of g++ -O -S [in name] -o [out name]

1c1
<       .file   "literal.cpp"
---
>       .file   "variable.cpp"

Answer 1

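When you compile with optimization disabled, the compiler doesn't do constant propagation across statements. That part is a duplicate of an earlier question (read my answer there), and/or see Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?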

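That's why a literal zero can behave differently from a variable whose value is 0. Only the variable, with optimization disabled, results in bsr + xor $31, %reg at run time.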

As documented in the GCC manual for __builtin_clz:

Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

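This allows clz / ctz to compile to 31-bsr or bsf instructions, respectively, on x86. 31-bsr is implemented with bsr + xor $31, %reg, thanks to the magic of 2's complement. (BSR produces the index of the highest set bit, not the leading-zero count.)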

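Note that it only says the *result* is undefined, not the behavior. It's not C++ UB (where the whole program could do absolutely anything); it's limited to that result, just like in x86 asm. But anyway, it seems that when the input is a compile-time constant 0, GCC produces the type width, like x86 lzcnt and like clz instructions on other ISAs. (This probably happens in target-independent GIMPLE tree optimizations, where constant propagation is done through operations including builtins.)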

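Intel documents bsf/bsr as: "If the content of the source operand is 0, the content of the destination operand is undefined." In real life, Intel hardware implements the same behavior AMD documents: leave the destination unmodified in that case.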

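But since Intel refuses to document it, compilers won't let you write code that takes advantage of it. GCC doesn't know or care about that behavior and provides no way to exploit it. Neither does MSVC, even though its intrinsic takes an output-pointer arg and so could easily have worked that way.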

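With -march=native, GCC can use BMI1 lzcnt directly, which is well-defined for every possible input bit pattern, including 0. It directly produces a leading-zero count instead of the index of the first set bit.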

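(That's why BSR/BSF don't make sense for input=0: there's no index for them to find. Fun fact: with eax=0, bsr %eax, %eax leaves eax=0 on real hardware, because the destination is left unmodified. In asm, the instructions also set ZF according to whether the input was zero, so you can detect when the output is "undefined" instead of doing a separate test+branch before the bsr. Or, on AMD and everything else in real life, rely on it having left the destination unmodified.)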

Further reading: bsf vs. lzcnt and false dependencies

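On Intel until Skylake, lzcnt / tzcnt have a false dependency on the output register, even though the result truly doesn't depend on it. IIRC, Coffee Lake also fixed the false dependency for popcnt. (All of these run on the same execution unit as BSR/BSF.)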

Why does breaking the "output dependency" of LZCNT matter?
