Is it possible that a store with memory_order_relaxed never reaches other threads?

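Suppose I have a thread A that writes to an atomic_int x = 0; using x.store(1, std::memory_order_relaxed);. Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed);? Is it possible that the value written to x stays entirely thread-local given the current definition of the C/C++ memory model that the standard gives?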

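The practical case I have at hand is a thread B that frequently reads an atomic_bool to check whether it has to quit; another thread, at some point, writes true to this bool and then calls join() on thread B. Clearly I do not mind calling join() before thread B can even see that the atomic_bool was set, nor do I mind if thread B has already seen the change and exited before I call join(). But I am wondering: using memory_order_relaxed on both sides, is it possible to call join() and block "forever" because the change is never propagated to thread B?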
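The scenario described above can be sketched as follows. This is a minimal illustration rather than the asker's actual code: the function name run_worker_until_stopped and the variable names are hypothetical. On mainstream implementations the loop exits promptly; whether the standard strictly requires that is exactly what the question asks.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: starts a worker that polls a stop flag with relaxed
// loads, requests a stop with a relaxed store, then joins the worker.
// Returns true if the worker left its loop after observing the flag.
bool run_worker_until_stopped() {
    std::atomic<bool> stop{false};
    bool saw_stop = false;

    std::thread worker([&] {
        // Thread B: check the flag frequently, relaxed on the reader side.
        while (!stop.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        saw_stop = true;
    });

    // The other thread: relaxed store, no further synchronization, then join().
    stop.store(true, std::memory_order_relaxed);
    worker.join();  // returns only after the worker has finished

    // Completion of the worker synchronizes with join() returning, so reading
    // the plain bool here is not a data race.
    return saw_stop;
}
```

If the store were never to become visible to the worker, the call to join() is where this function would hang.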

Edit

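I contacted Mark Batty (the brain behind mathematically verifying and subsequently fixing the C++ memory model requirements). Originally it was about something else (which turned out to be a known bug in cppmem and his thesis, so fortunately I did not make a complete fool of myself), and I took the opportunity to ask him about this too. His answer was: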

Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?
Mark: Theoretically, yes, but I don't think that has been observed.
Q: In other words, do relaxed stores make no sense whatsoever unless you combine them with some release operation (and acquire on the other thread), assuming you want another thread to see it?
Mark: Nearly all of the use cases for them do use release and acquire, yes.

I believe this is all the standard has to say on the matter:

[intro.multithread]/25 An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

This is what the standard says in 29.3.12:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

There is no guarantee that a store will become visible in another thread, no guaranteed timing, and no formal relationship with memory order.

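Of course, on every regular architecture a store will become visible, but on rare platforms that do not support cache coherency, it may never become visible to a load.
In that case, you would have to use an atomic read-modify-write operation to get the latest value in the modification order.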
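A read through a read-modify-write operation could look like the sketch below. The helper name load_latest is hypothetical, and adding zero with fetch_add is just one way to build such a read; the relevant rule is that an atomic read-modify-write always reads the last value in the modification order written before its own write.

```cpp
#include <atomic>

// Hypothetical helper: reads `x` through a read-modify-write operation
// instead of a plain load.
int load_latest(std::atomic<int>& x) {
    // Adding 0 leaves the stored value unchanged; fetch_add returns the
    // value that immediately preceded this operation's own write.
    return x.fetch_add(0, std::memory_order_relaxed);
}
```

The trade-off is that every such read is also a write, so polling this way causes more contention on the variable than polling with a plain load.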

In practice

Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed);?

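No time. It is a normal write: it goes into the store buffer and from there reaches the L1d cache in far less time than the blink of an eye. But that only holds once the assembly instruction has actually run.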

The compiler can reorder instructions, but no reasonable compiler would move an atomic operation across an arbitrarily long loop.

In theory

Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?

Mark: Theoretically, yes,

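You should have asked him what would happen if the "following release fence" were put back, or if an atomic store with release ordering were used instead.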

Why couldn't those be reordered and delayed for a very long time as well? (So long that in practice it would seem like an eternity.)
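For reference, the release variants mentioned here can be sketched as below; the function name publish_and_consume and the payload value are hypothetical. What release and acquire add is ordering: once the consumer observes the flag, it is also guaranteed to see the earlier write to the plain int. How soon the flag itself becomes visible is covered by the same "should" wording quoted from the standard.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: publishes a plain int through a release store and
// reads it once an acquire load has observed the flag.
// Returns the value the consumer saw.
int publish_and_consume() {
    int payload = 0;  // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread consumer([&] {
        // Spin until the flag is observed; acquire pairs with the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // happens after the producer's write to payload
    });

    payload = 42;
    ready.store(true, std::memory_order_release);
    // Fence variant of the same idea:
    //   std::atomic_thread_fence(std::memory_order_release);
    //   ready.store(true, std::memory_order_relaxed);

    consumer.join();
    return seen;
}
```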

Is it possible that the value written to x stays entirely thread-local given the current definition of the C/C++ memory model that the standard gives?

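If an imaginary, and especially perverse, implementation wanted to delay the visibility of atomic operations, why would it do so only for relaxed operations? It could just as well do it for all atomic operations.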

Or never run some threads at all.

Or run some threads so slowly that you would believe they are not running.
