单生产者单消费者实现中的内存屏障

Question

来自 Wikipedia 的以下实现：

volatile unsigned int produceCount = 0, consumeCount = 0;
TokenType buffer[BUFFER_SIZE];

void producer(void) {
    while (1) {
        while (produceCount - consumeCount == BUFFER_SIZE)
            sched_yield(); // buffer is full

        buffer[produceCount % BUFFER_SIZE] = produceToken();
        // a memory_barrier should go here, see the explanation above
        ++produceCount;
     }
}

void consumer(void) {
    while (1) {
        while (produceCount - consumeCount == 0)
            sched_yield(); // buffer is empty

        consumeToken(buffer[consumeCount % BUFFER_SIZE]);
        // a memory_barrier should go here, the explanation above still applies
        ++consumeCount;
     }
}

表示必须在访问缓冲区的行和更新 Count 变量的行之间使用 内存屏障 。

这样做是为了防止 CPU 将围栏上方的指令与其下方的指令重新排序。 Count 变量在用于索引缓冲区之前不应递增。

如果不使用fence，这种重新排序会不会违反代码的正确性？ CPU 在用于索引缓冲区之前不应执行 Count 的增量。 CPU 指令重新排序时是否不处理数据相关性？

谢谢

Answer 1

If a fence is not used, won't this kind of reordering violate the correctness of code? The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?

问得好。

在 c++ 中，除非使用某种形式的内存屏障（原子、互斥等），否则编译器假定代码是单线程的。在这种情况下，as-if 规则表示编译器可以发出它喜欢的任何代码，前提是总体可观察到的效果是 'as if' 您的代码是按顺序执行的。

如评论中所述，volatile 不一定会改变这一点，它只是一个实现定义的提示，即变量可能会在两次访问之间发生变化（这 不一样 被另一个线程修改）。

因此，如果您编写没有内存屏障的多线程代码，则无法保证一个线程中对变量的更改甚至会被另一个线程观察到，因为就编译器而言，其他线程不应该触摸同样的记忆，永远。

您实际观察到的是未定义的行为。

Answer 2

看来，你的问题是"can incrementing Count and assigment to buffer be reordered without changing code behavior?"。

考虑以下代码转换：

int count1 = produceCount++;
buffer[count1 % BUFFER_SIZE] = produceToken();

请注意，代码的行为与原始代码完全相同：一次从 volatile 变量读取，一次写入 volatile，读取发生在写入之前，程序状态相同。但是，其他线程将看到关于 produceCount 增量和 buffer 修改顺序的不同图片。

编译器和 CPU 都可以在没有内存栅栏的情况下进行转换，因此您需要强制这两个操作以正确的顺序进行。

Answer 3

If a fence is not used, won't this kind of reordering violate the correctness of code?

没有。你能构建任何可以分辨差异的可移植代码吗？

The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?

为什么不呢？所发生的成本的回报是什么？写入组合和推测性提取之类的东西是巨大的优化，禁用它们是行不通的。

如果您认为仅 volatile 就可以做到，那是不对的。 volatile 关键字在 C 或 C++ 中没有定义线程同步语义。它可能碰巧在某些平台上工作，也可能碰巧不在其他平台上工作。在 Java 中，volatile 确实定义了线程同步语义，但它们不包括提供对非易失性对象的访问顺序。

但是，内存屏障确实具有定义明确的线程同步语义。我们需要确保没有线程在看到该数据之前可以看到该数据可用。并且我们需要确保在线程完成该数据之前看不到将数据标记为能够被覆盖的线程。

单生产者单消费者实现中的内存屏障

Memory barrier in the implementation of single producer single consumer

concurrency

multithreading

operating-system

producer-consumer

memory-barriers