静态向量内部数据布局 - `union` 与 `std::aligned_storage_t` - 巨大的性能差异

Question

假设您必须实现一个 static_vector<T, N> class，这是一个 固定容量 容器，它完全存在于堆栈上并且从不分配，并且公开一个类似于 std::vector 的界面。 （提升提供 boost::static_vector。）

考虑到我们必须未初始化 存储以供 T 的最大 N 个实例使用，在设计内部数据时可以做出多种选择布局：

单人union:

union U { T _x; };
std::array<U, N> _data;

单身std::aligned_storage_t:

std::aligned_storage_t<sizeof(T) * N, alignof(T)> _data;

数组 std::aligned_storage_t:

using storage = std::aligned_storage_t<sizeof(T), alignof(T)>;
std::array<storage, N> _data;

无论选择如何，创建成员都需要使用 "placement new" 并且访问它们需要类似于 reinterpret_cast.

的内容

现在假设我们有两个非常小的实现 static_vector<T, N>:

with_union：使用"single-member union"方法实现；
with_storage：使用"single std::aligned_storage_t"方法实现。

让我们使用 g++ 和 clang++ 以及 -O3 执行以下基准测试。我用了 quick-bench.com for this task:

void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
void clobber()       { asm volatile("" : :        : "memory"); }

template <typename Vector>
void test()
{
    for(std::size_t j = 0; j < 10; ++j)
    {
        clobber();
        Vector v;
        for(int i = 0; i < 123456; ++i) v.emplace_back(i);
        escape(&v);
    }
}

（escape 和 clobber 摘自 Chandler Carruth 的 CppCon 2015 演讲："Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"）

结果 g++ 7.2 (live here):

结果 clang++ 5.0 (live here):

从结果中可以看出，g++ 似乎能够积极优化 （向量化） 使用 "single std::aligned_storage_t" 方法的实现，但不是使用 union.

的实现

我的问题是：

标准中是否有任何内容阻止使用 union 的实现被积极优化？ （即标准在使用 std::aligned_storage_t 时授予编译器更多自由 - 如果是，为什么？）
这纯粹是一个"quality of implementation"问题吗？

Answer 1

is right, this is the same issue as in . Fundamentally, gcc is missing an optimization opportunity in forgetting that std::array's data is aligned. John Zwinck helpfully reported bug 80561.

您可以通过对 with_union:

进行两项更改之一来在您的基准测试中验证这一点

将 _data 从 std::array<U, N> 更改为简单的 U[N]。性能变得相同

通过将emplace_back()的实现改为：

提醒gcc _data实际上是对齐的

template <typename... Ts> 
T& emplace_back(Ts&&... xs)
{
    U* data = static_cast<U*>(__builtin_assume_aligned(_data.data(), alignof(U)));
    T* ptr = &data[_size++]._x;
    return *(new (ptr) T{std::forward<Ts>(xs)...});
}

与你的基准测试的其余部分相比，这些变化中的任何一个都让我得到了 WithUnion 和 WithAlignedStorage 之间可比较的结果。

静态向量内部数据布局 - `union` 与 `std::aligned_storage_t` - 巨大的性能差异

Static vector internal data layout - `union` vs `std::aligned_storage_t` - huge performance difference

c++

optimization

unions

language-lawyer

c++17