Is using C++20's std::popcount with vector optimization equivalent to the popcnt intrinsic?

Question

C++20 introduced many new functions, such as std::popcount. I currently get the same functionality using an Intel intrinsic.

I compiled both options; the results can be seen on Compiler Explorer.

The generated assembly looks the same, apart from the type checking used in the std template.

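For OS-agnostic code with the same level of optimization, is it correct to assume that using std::popcount with the appropriate compiler vector-optimization flags is better than using the intrinsic directly?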

Thanks.

Answer 1

Technically, no. (In practice, though, yes.) The C++ standard only specifies the behavior of popcount, not its implementation (see [bit.count]).

Implementers can do whatever they like to achieve that behavior, including using the popcnt intrinsic, but they could just as well write a while loop:

int set_bits = 0;
while(x)
{
   if (x & 1)
      ++set_bits;
   x >>= 1;
}
return set_bits;

This is the complete wording of [bit.count] in the standard:

template<class T>
constexpr int popcount(T x) noexcept;

Constraints: T is an unsigned integer type ([basic.fundamental]).
Returns: The number of 1 bits in the value of x.

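In reality? Compiler writers are very clever and will optimize this to use intrinsics wherever possible. For example, gcc's implementation appears to be fairly heavily optimized.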
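
As a rough illustration of what such an optimized implementation can look like, the sketch below forwards to a compiler builtin where one is available. This is not the library's actual source: popcount_sketch is a hypothetical name, and the use of __builtin_popcountll assumes GCC or Clang. Whether the builtin becomes a single popcnt instruction depends on the target flags (for example -mpopcnt or a suitable -march on x86).

```cpp
#include <cstdint>

// Hypothetical sketch, not the real standard library code.
constexpr int popcount_sketch(std::uint64_t x) noexcept
{
#if defined(__GNUC__) || defined(__clang__)
    // GCC/Clang builtin; the compiler chooses the instruction sequence
    // that suits the selected target.
    return __builtin_popcountll(x);
#else
    // Portable fallback: clear the lowest set bit until none remain.
    int set_bits = 0;
    while (x)
    {
        x &= x - 1;
        ++set_bits;
    }
    return set_bits;
#endif
}

static_assert(popcount_sketch(0) == 0);
static_assert(popcount_sketch(0xFFu) == 8);
```

Calling std::popcount leaves this per-platform choice to the library, so the calling code stays portable while a hand-written intrinsic ties it to one instruction set.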
