避免错误共享 SPSC 队列索引

Question

让我们想象一个无锁并发 SPSC（单生产者/单消费者）队列。

生产者线程读取head、tail、cached_tail和写 head, cached_tail.
消费者线程读取head、tail、cached_head和写 tail, cached head.

请注意，cached_tail 仅由生产者线程访问，就像 cached_head 仅由消费者线程访问一样。它们可以被认为是私有线程局部变量，所以它们是不同步的，因此没有被定义为原子的。

队列的数据布局如下：

#include <atomic>
#include <cstddef>
#include <thread>

struct spsc_queue
{
    /// ...

    // Producer variables
    alignas(std::hardware_destructive_interference_size) std::atomic<size_t> head; // shared
    size_t cached_tail; // non-shared

    // Consumer variables
    alignas(std::hardware_destructive_interference_size) std::atomic<size_t> tail; // shared
    size_t cached_head; // non-shared

    std::byte padding[std::hardware_destructive_interference_size - sizeof(tail) - sizeof(cached_head)];
};

因为我想避免虚假共享，我将head和tail对齐到L1缓存行大小。

push/pop 操作的伪代码实现如下：

bool push(const void* elems, size_t n)
{
    size_t h = atomic_load(head, relaxed);

    if (num_remaining_storage(h, cached_tail) < n)
    {
        cached_tail = atomic_load(tail, acquire);

        if (num_remaining_storage(h, cached_tail) < n)
            return false;
    }

    // write from elems

    atomic_store(head, h + n, release);
    return true;
}

bool pop(void* elems, size_t n)
{
    size_t t = atomic_load(tail, relaxed);

    if (num_stored_elements(cached_head, t) < n)
    {
        cached_head = atomic_load(head, acquire);

        if (num_stored_elements(cached_head, t) < n)
            return false;
    }

    // read to elems

    atomic_store(tail, t + n, release);
    return true;
}

void wait_and_push(const void* elems, size_t n)
{
    size_t h = atomic_load(head, relaxed);

    while (num_remaining_storage(h, cached_tail) < n)
        cached_tail = atomic_load(tail, acquire);

    // write from elems

    atomic_store(head, h + n, release);
}

void wait_and_pop(void* elems, size_t n)
{
    size_t t = atomic_load(tail, relaxed);

    while (num_stored_elements(cached_head, t) < n)
        cached_head = atomic_load(head, acquire);

    // write to elems

    atomic_store(tail, t + n, release);
}

初始化时（此处未列出），所有索引都设置为 0。函数 num_remaining_storage 和 num_stored_elements 是 const 函数，根据传递的参数和不可变队列容量执行简单计算 - 它们不执行任何原子读取或写入。

现在的问题是：我是否需要对齐 cached_tail 和 cached_head 以完全避免错误共享任何索引，或者它没关系，因为它是。由于 cached_tail 是生产者私有的，而 cached_head 是消费者私有的，我认为 cached_tail 可以与 head （生产者缓存行）在同一缓存行中，就像 cached_head 在与 tail（消费者缓存行）相同的缓存行中，不会发生错误共享。

我是不是漏掉了什么？

Answer 1

感谢您提供伪代码 - 它仍然缺少一些细节，但我想我明白了基本的想法。你有一个索引可以环绕的有界 SPSC 队列，你使用 push 中的 cached_tail 变量来检查是否有空闲槽，这样你就可以避免从潜在的加载 tail缓存行无效（pop 反之亦然）。

我建议将 head 和 cached_tail 并排放置（即在同一缓存行上），并将 tail 和 cached_head 放在不同的缓存行上一。 push always 读取 both 变量 - head 和 cached_tail，因此关闭它们是有意义的一起。 cached_tail 仅在没有更多空闲插槽且我们必须重新加载时更新 tail。

您的代码在细节上有点单薄，但似乎还有一些优化空间：

bool push(const void* elems, size_t n)
{
    size_t h = atomic_load(head);

    if (num_remaining_storage(h, cached_tail) < n)
    {
        auto t = atomic_load(tail);
        if (t == cached_tail)
          return false;

       // we only have to update our cached_tail if the reloaded value
       // is different - and in this case it is guaranteed that there
       // is a free slot, so we don't have to perform a recheck.
        cached_tail = t;
    }

    // write from elems

    atomic_store(head, h + n);
    return true;
}

这样 cached_tail 只会在 head 也更新时才会更新，所以这是它们位于同一缓存行的另一个原因。当然，同样的优化也可以应用于 pop。

这正是我想看一些代码的原因，因为 访问模式 对于确定哪些变量应该共享缓存行而哪些不应该是至关重要的。

避免错误共享 SPSC 队列索引

Avoiding false sharing of SPSC queue indices

c++

concurrency

atomic

lock-free

false-sharing