What is the difference between a memory barrier and a compiler-only fence?

Question

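As the title says, I am confused about the difference between a memory barrier and a compiler-only fence.

Are they the same? If not, what is the difference between them?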

Answer 1

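A memory barrier is implemented in hardware and prevents the CPU itself from reordering instructions.

A compiler-only fence, by contrast, only prevents the compiler's optimizer from reordering instructions; the CPU can still reorder them at run time.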
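To make the two kinds of fence concrete, here is a minimal sketch assuming GCC or Clang. The helper names (compiler_only_fence, hardware_memory_barrier, payload, ready) are made up for illustration; the empty asm statement with a "memory" clobber and the __sync_synchronize() builtin are the compiler-specific pieces.

```cpp
#include <cassert>

int payload = 0;
int ready = 0;

// Compiler-only fence (GCC/Clang extension): an empty asm statement with a
// "memory" clobber. It emits no machine instruction; it only stops the
// compiler from moving memory accesses across this point.
inline void compiler_only_fence() {
    asm volatile("" ::: "memory");
}

// Full memory barrier (GCC/Clang builtin). It constrains the compiler too,
// and additionally issues a hardware barrier for the CPU.
inline void hardware_memory_barrier() {
    __sync_synchronize();
}

void publish_with_compiler_fence() {
    payload = 42;
    compiler_only_fence();      // compiler keeps this order; the CPU may not
    ready = 1;
}

void publish_with_hardware_barrier() {
    payload = 42;
    hardware_memory_barrier();  // order is enforced for the CPU as well
    ready = 1;
}

// Within a single thread both versions behave identically; the difference
// only matters to an outside observer of memory.
void single_thread_check() {
    publish_with_compiler_fence();
    assert(payload == 42 && ready == 1);
    payload = 0;
    ready = 0;
    publish_with_hardware_barrier();
    assert(payload == 42 && ready == 1);
}
```

Comparing the generated assembly of the two publish functions (for example with g++ -O2 -S) should show a barrier instruction only in the second one.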

Answer 2

As a concrete example, consider the following code:

int x = 0, y = 0;

void foo() {
    x = 10;
    y = 20;
}

As it stands, with no barriers or fences of any kind, the compiler may reorder the two stores and emit (pseudo-)assembly such as

STORE [y], 20
STORE [x], 10

If you insert a compiler-only fence between x = 10; and y = 20;, the compiler cannot do this and must instead emit

STORE [x], 10
STORE [y], 20
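In standard C++, one way to write such a compiler-only fence is std::atomic_signal_fence. The sketch below assumes a mainstream compiler, where this call inhibits compiler reordering of the surrounding stores but emits no fence instruction for the CPU. The check_foo helper is added purely for illustration.

```cpp
#include <atomic>
#include <cassert>

int x = 0, y = 0;

void foo() {
    x = 10;
    // Compiler-only fence: the compiler may not move the stores across
    // this point, but no hardware fence instruction is generated.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    y = 20;
}

// Single-threaded sanity check: the fence does not change what foo() computes.
void check_foo() {
    foo();
    assert(x == 10 && y == 20);
}
```

Note that this only pins down the order of the two store instructions in the generated code; it says nothing yet about the order in which another core observes them.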

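But suppose there is another observer looking at the values of x and y in memory, such as a memory-mapped hardware device, or another thread that is running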

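void observe() {
    // Load y first, then x. If x were loaded first, "0, 20" could occur
    // even without any reordering (x read before foo(), y read after it).
    int y_seen = y;
    int x_seen = x;
    std::cout << x_seen << ", " << y_seen << std::endl;
}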

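(Assume for simplicity that the loads of x and y in observe() do not get reordered in any way, and that loads and stores of int happen to be atomic on this system.) Depending on when its loads take place relative to the stores in foo(), it could print 0, 0 or 10, 0 or 10, 20. It might appear that 0, 20 is impossible, but in general that is not actually so.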

Even though the instructions in foo store to x and y in that order, on some architectures without strict store ordering that does not guarantee that those stores will become visible to observe() in the same order. It could be that, due to out-of-order execution, the core executing foo() actually executed the store to y before the store to x. (Say, if the cache line containing y was already in L1 cache but the cache line for x was not, the CPU might as well go ahead and do the store to y rather than stalling for possibly hundreds of cycles while the cache line for x is loaded.) Or the stores could be held in a store buffer and then flushed out to L1 cache in the opposite order. Either way, it is possible that observe() prints 0, 20.

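To ensure the desired ordering, the CPU has to be told to provide it, often by executing an explicit memory barrier instruction between the two stores. This causes the CPU to wait until the store to x has been made visible (by loading the cache line, draining the store buffer, etc.) before making the store to y visible. So if you ask the compiler to put in a memory barrier, it will emit assembly like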

STORE [x], 10
BARRIER
STORE [y], 20

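In that case you can be assured that observe() will print 0, 0 or 10, 0 or 10, 20, but never 0, 20.

(Note that many simplifying assumptions are made here. If you tried to write this in actual C++, you would need to use std::atomic types, plus a similar barrier in observe() to ensure that its loads are not reordered.)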
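A rough sketch of what that might look like in real C++ follows. It is one possible arrangement, not the only one: it uses relaxed atomic accesses with a release fence in foo() and an acquire fence in observe(). Here observe() returns the pair it saw instead of printing, and forbidden_outcome_seen is a hypothetical helper added only to exercise the two functions from separate threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

std::atomic<int> x{0}, y{0};

void foo() {
    x.store(10, std::memory_order_relaxed);
    // Release fence: the store to x above cannot become visible after the
    // store to y below, for any thread that synchronizes via y.
    std::atomic_thread_fence(std::memory_order_release);
    y.store(20, std::memory_order_relaxed);
}

// Returns {x_seen, y_seen}. y is loaded first, so seeing y == 20 implies
// that the later load of x must see 10.
std::pair<int, int> observe() {
    int y_seen = y.load(std::memory_order_relaxed);
    // Acquire fence: the load of x below cannot be moved before the load of y.
    std::atomic_thread_fence(std::memory_order_acquire);
    int x_seen = x.load(std::memory_order_relaxed);
    return {x_seen, y_seen};
}

// Runs foo() and observe() concurrently `trials` times and reports whether
// the outcome "0, 20" was ever observed.
bool forbidden_outcome_seen(int trials) {
    for (int i = 0; i < trials; ++i) {
        x.store(0);
        y.store(0);
        std::pair<int, int> seen{0, 0};
        std::thread writer(foo);
        std::thread reader([&seen] { seen = observe(); });
        writer.join();
        reader.join();
        if (seen.first == 0 && seen.second == 20) return true;
    }
    return false;
}
```

With both fences in place, forbidden_outcome_seen should return false on any conforming implementation. Removing the fences makes the outcome 0, 20 permitted by the language, although a short test run may still fail to reproduce it, especially on strongly ordered hardware such as x86.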

