Is this understanding of Java volatile and reordering correct for this code?

Question

According to these reordering rules

reorder Rules

if I have code like this:

volatile int a = 0;

boolean b = false;

foo1(){ a= 10; b = true;}

foo2(){if(b) {assert a==10;}}

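Let thread A run foo1 and thread B run foo2. Since a = 10 is a volatile store and b = true is a normal store, these two statements could be reordered, which means thread B may see b == true while a != 10. Is that correct?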

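Added:

Thanks for your answers!
I have just started learning Java multithreading and have been troubled a lot by the keyword volatile.

Many tutorials talk about the visibility of volatile fields, like "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I doubt that a completed write to a field could be invisible to other threads (or CPUs).

As I understand it, a completed write means you have successfully written the field back to the cache, and according to MESI, all other threads should have an Invalid cache line if they had cached this field. One exception (since I am not very familiar with the hardware, this is just a conjecture) is that the result might be written back to a register instead of the cache, and I do not know whether there is some protocol to keep consistency in this situation, or whether volatile prevents writing to a register in Java.

In some situations something that looks like "invisibility" does happen, for example: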

    A=0,B=0; 
    thread1{A=1; B=2;}  
    thread2{if(B==2) {A may be 0 here}}

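Suppose the compiler did not reorder it; what we see in thread2 is due to the store buffer, and I do not think a write sitting in the store buffer counts as a completed write. Because of the store buffer and invalidate queue strategy, the write to variable A looks invisible, but in fact the write operation has not finished when thread2 reads A. Even if we make field B volatile, while the write to field B sits in the store buffer with memory barriers, thread 2 can read B as 0 and finish. To me, volatile looks like it is not about the visibility of the field it is declared on, but more like an edge that makes sure all writes that happen before the volatile field write in ThreadA are visible to all operations after the volatile field read in another ThreadB (where the volatile read happens after the volatile field write in ThreadA has completed).

By the way, since I am not a native speaker, I have seen many tutorials in my mother language (and some English tutorials) say that volatile instructs JVM threads to read the value of a volatile variable from main memory and not cache it locally, and I do not think that is true. Am I right?

Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.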

Answer 1

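I'm pretty sure the assert can fire. I think a volatile load is only an acquire operation (https://preshing.com/20120913/acquire-and-release-semantics/) wrt. non-volatile variables, so nothing stops the earlier plain load of b from reordering with it.

Two volatile operations can't reorder with each other, but reordering with non-atomic operations is possible in one direction, and you picked the direction that has no guarantees.

(Caveat: I'm not a Java expert; it's possible but unlikely that volatile has some semantics that require a more expensive implementation.)

More concrete reasoning: if the assert can fire when the code is translated into asm for some specific architecture, then the Java memory model must allow it to fire.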

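Java volatile is (AFAIK) equivalent to C++ std::atomic with the default memory_order_seq_cst. Thus foo2 can be JIT-compiled for ARM64 with a plain load for b and an LDAR acquire load for a.

ldar can't reorder with later loads/stores, but can with earlier ones. (Except for stlr release stores; ARM64 was specifically designed to make C++ std::atomic<> with memory_order_seq_cst / Java volatile efficient with ldar and stlr: a seq_cst store does not have to flush the store buffer immediately, only when an LDAR is seen, so the design gives the minimal amount of ordering needed to still recover sequential consistency as specified by C++ (and, I assume, Java).)

On many other ISAs, sequential-consistency stores do need to wait for the store buffer to drain, so in practice they are ordered wrt. later non-atomic loads. And on many ISAs, an acquire or SC load is done with a normal load preceded by a barrier that blocks loads from crossing it in either direction, otherwise they wouldn't work. That's why having the volatile load of a compile to an acquire-load instruction that does only an acquire operation is key to understanding how this can happen in practice.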

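(In x86 asm, all loads are acquire loads and all stores are release stores. Not sequential-release, though; x86's memory model is program order + a store buffer with store forwarding, which allows StoreLoad reordering, so Java volatile stores need special asm.

So the assert can't fire on x86, except via compile/JIT-time reordering of the assignments. This is a good example of one reason why testing lock-free code is hard: a failing test can prove there is a problem, but testing on some hardware/software combination can't prove correctness.)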
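
To make the point about testing concrete, here is a minimal stress-harness sketch for the question's example. It is an illustration only, not part of the original answer: the names RacyPublishStress, State and run, and the trial count, are assumptions. It runs foo1 and foo2 on fresh threads and tallies what foo2 observed. On x86, and with methods that execute only once per trial, the "a != 10" bucket will most likely stay at zero, which says nothing about whether the code is correct.

```java
final class RacyPublishStress {
    // Same shape as the question: 'a' is volatile, 'b' is a plain field.
    static final class State {
        volatile int a = 0;
        boolean b = false;

        void foo1() { a = 10; b = true; }

        // 0: b was seen false; 1: b true and a == 10;
        // 2: b true but a != 10 (the outcome the assert guards against).
        int foo2() {
            if (b) {
                return (a == 10) ? 1 : 2;
            }
            return 0;
        }
    }

    // Runs 'trials' independent trials and returns the three bucket counts.
    static int[] run(int trials) {
        int[] counts = new int[3];
        for (int i = 0; i < trials; i++) {
            State s = new State();
            int[] result = new int[1];
            Thread writer = new Thread(s::foo1);
            Thread reader = new Thread(() -> result[0] = s.foo2());
            writer.start();
            reader.start();
            try {
                writer.join();
                reader.join(); // join() makes result[0] safely readable here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            counts[result[0]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = run(5_000);
        System.out.println("b not seen: " + c[0] + ", ok: " + c[1] + ", a != 10: " + c[2]);
    }
}
```

A non-zero third bucket would prove the race; a zero there proves nothing, which is exactly the asymmetry described above.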

Answer 2

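In addition to Peter Cordes's great answer: in terms of the JMM there is a data race on b, because b is a plain variable and so there is no happens-before edge between the write of b and the read of b. Only if such a happens-before edge existed would you have the guarantee that a thread whose load of b sees 1 also sees a=1 when it loads a.

Instead of making a volatile, you need to make b volatile.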

int a=0;
volatile int b=0;

thread1(){
    a=1;
    b=1;
}

thread2(){
  if(b==1) assert a==1;
}

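So if thread2 sees b=1, then the write of b=1 is ordered before this read in the happens-before order (volatile variable rule). And since a=1 and b=1 are ordered by happens-before (program order rule), and the read of b and the read of a are ordered by happens-before (program order rule again), the transitivity of the happens-before relation gives a happens-before edge between the write of a=1 and the read of a, which therefore needs to see the value 1.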
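
The corrected snippet can be turned into a small runnable sketch. This is one possible rendering, assuming a reader that spins on the volatile flag; the class name VolatileFlagPublish and the helper runOnce are hypothetical and not from the answer.

```java
final class VolatileFlagPublish {
    static int a = 0;           // plain field, published through b
    static volatile int b = 0;  // volatile flag

    static void thread1() {
        a = 1;  // plain write
        b = 1;  // volatile write, ordered after a = 1 by program order
    }

    // Spins until b == 1 is observed, then returns the value of a it reads.
    static int thread2() {
        while (b != 1) {
            // each iteration performs a fresh volatile read of b
        }
        return a; // happens-after a = 1 once b == 1 has been seen
    }

    // Runs both threads once and returns what thread2 read from a.
    static int runOnce() {
        a = 0;
        b = 0;
        int[] seen = new int[1];
        Thread t2 = new Thread(() -> seen[0] = thread2());
        Thread t1 = new Thread(VolatileFlagPublish::thread1);
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("thread2 read a = " + runOnce());
    }
}
```

Because the reader only reads a after it has seen b == 1, the happens-before chain described above applies on every run, so the value read is always 1.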

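You are referring to a possible implementation of the JMM using fences. Although that provides some insight into what happens under the hood, thinking in terms of fences is equally damaging, because fences are not a suitable mental model. See the following counterexample: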

https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#myth-barriers-are-sane

Answer 3

Yes, the assert can fail.

volatile int a = 0;

boolean b = false;

foo1(){ a= 10; b = true;}

foo2(){if(b) {assert a==10;}}

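The JMM guarantees that a write to a volatile field happens-before every subsequent read of that field. In your example, whatever thread A does before a = 10 happens-before whatever thread B does after reading a (while executing assert a == 10). Since b = true executes after a = 10 in thread A, it falls outside that guarantee: nothing orders the write to b before thread B's read of a. However, consider this: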

int a = 0;

volatile boolean b = false;

foo1(){ a= 10; b = true;}

foo2(){if(b) {assert a==10;}}

In this example, the situation is:

a = 10 ---> b = true---|
                       |
                       | (happens-before due to volatile's semantics)
                       |
                       |---> if(b) ---> assert a == 10

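Since this gives you a total order over these four actions whenever if(b) reads true, the assert is guaranteed to pass.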

Answer 4

Answering your addition.

Many tutorials talk about the visibility of volatile fields, like "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I doubt that a completed write to a field could be invisible to other threads (or CPUs).

The compiler might mess up the code.

For example:

boolean stop;

void run(){
  while(!stop)println();
}

After a first optimization (the read of stop is hoisted out of the loop):

void run(){
   boolean r1=stop;
   while(!r1)println();
}

After a second optimization:

void run(){
   boolean r1=stop;
   if(r1)return;
   while(true) println();
}

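So now it is obvious that this loop will never stop, because the new value of stop will effectively never be seen. Something similar can be done with stores, postponing them indefinitely.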
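
For contrast, here is a hedged sketch of the same loop with stop declared volatile, which forbids hoisting the read out of the loop. The class name StopFlagDemo, the helper workerStopsWithin, the empty loop body (standing in for println()) and the 50 ms warm-up are all assumptions made for illustration.

```java
final class StopFlagDemo {
    // volatile: every loop iteration must perform a real read of 'stop'
    static volatile boolean stop = false;

    // Starts a worker spinning on 'stop', requests a stop, and reports
    // whether the worker terminated within the given timeout.
    static boolean workerStopsWithin(long timeoutMillis) {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy loop; stands in for the println() of the example
            }
        });
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            Thread.sleep(50);   // arbitrary: give the loop time to get hot
            stop = true;        // volatile write
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsWithin(2_000));
    }
}
```

With a plain boolean the JIT is permitted, though not required, to apply the two transformations shown above, so the non-volatile variant may or may not hang on a given JVM.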

As I understand it, a completed write means you have successfully written the field back to the cache, and according to MESI, all other threads should have an Invalid cache line if they had cached this field.

Correct. This is normally called 'globally visible' or 'globally performed'.

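One exception (since I am not very familiar with the hardware, this is just a conjecture) is that the result might be written back to a register instead of the cache, and I do not know whether there is some protocol to keep consistency in this situation, or whether volatile prevents writing to a register in Java.

All modern processors are load/store architectures (even X86, once instructions are converted to uops), meaning that there are explicit load and store instructions that transfer data between registers and memory, while regular instructions like add/sub work only with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit its optimizations.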

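Suppose the compiler did not reorder it; what we see in thread2 is due to the store buffer, and I do not think a write sitting in the store buffer counts as a completed write. Because of the store buffer and invalidate queue strategy, the write to variable A looks invisible, but in fact the write operation has not finished when thread2 reads A.

On X86 the order of the stores in the store buffer is consistent with program order, and they commit to the cache in program order. But there are architectures where stores can commit from the store buffer to the cache out of order, e.g. due to:

write coalescing
allowing a store to commit to the cache as soon as its cache line is returned in the right state, regardless of whether an earlier store is still waiting
sharing the store buffer with a subset of the CPUs

Store buffers can be a source of reordering, but so can out-of-order and speculative execution.

Apart from stores, reordering loads can also lead to observing stores out of order. On X86 loads can't be reordered with other loads, but on ARM this is allowed. And of course the JIT can mess things up as well.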

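Even if we make field B volatile, while the write to field B sits in the store buffer with memory barriers, thread 2 can read B as 0 and finish.

It is important to realize that the JMM is based on sequential consistency; so even though it is a relaxed memory model (it separates plain loads and stores from synchronization actions like volatile load/store and lock/unlock), a program without data races will only produce sequentially consistent executions. Sequential consistency does not require the real-time order to be respected. So it is perfectly fine for a load/store to be skewed in time, as long as:

the memory order is a total order over all loads/stores
the memory order is consistent with the program order
a load sees the most recent write before it in the memory order.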
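
The three conditions can be made tangible with a tiny enumerator. The sketch below is a toy model written for illustration, not a JMM implementation; the class ScOutcomes and its method names are assumptions. It takes two threads, one doing a=1; b=1 and the other r1=b; r2=a, builds every total order that respects each thread's program order, runs it against one shared memory, and collects the resulting (r1,r2) pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ScOutcomes {
    // Thread 1: a=1; b=1.   Thread 2: r1=b; r2=a.
    static final int STORE_A = 0, STORE_B = 1, LOAD_B = 2, LOAD_A = 3;
    static final int[] T1 = {STORE_A, STORE_B};
    static final int[] T2 = {LOAD_B, LOAD_A};

    // Collects "(r1,r2)" for every interleaving that keeps program order.
    static Set<String> outcomes() {
        Set<String> result = new TreeSet<>();
        interleave(0, 0, new ArrayList<>(), result);
        return result;
    }

    // Extends the memory order with the next operation of either thread.
    private static void interleave(int i, int j, List<Integer> order, Set<String> out) {
        if (i == T1.length && j == T2.length) {
            out.add(execute(order));
            return;
        }
        if (i < T1.length) {
            order.add(T1[i]);
            interleave(i + 1, j, order, out);
            order.remove(order.size() - 1); // backtrack
        }
        if (j < T2.length) {
            order.add(T2[j]);
            interleave(i, j + 1, order, out);
            order.remove(order.size() - 1); // backtrack
        }
    }

    // Executes one total order: each load sees the most recent store before it.
    private static String execute(List<Integer> order) {
        int a = 0, b = 0, r1 = 0, r2 = 0;
        for (int op : order) {
            switch (op) {
                case STORE_A: a = 1; break;
                case STORE_B: b = 1; break;
                case LOAD_B:  r1 = b; break;
                case LOAD_A:  r2 = a; break;
                default: throw new IllegalStateException("unknown op " + op);
            }
        }
        return "(" + r1 + "," + r2 + ")";
    }

    public static void main(String[] args) {
        System.out.println(outcomes()); // prints [(0,0), (0,1), (1,1)]
    }
}
```

The pair (1,0) never appears: no total order consistent with both program orders lets thread 2 see b=1 and then a=0. Nothing in the model refers to wall-clock time, which is why skewed loads and stores are acceptable.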

To me, volatile looks like it is not about the visibility of the field it is declared on, but more like an edge that makes sure all writes that happen before the volatile field write in ThreadA are visible to all operations after the volatile field read in another ThreadB (where the volatile read happens after the volatile field write in ThreadA has completed).

You are on the right path.

Example:

int a=0;
volatile int b=0;

thread1(){
   1:a=1
   2:b=1
}

thread2(){
   3:r1=b
   4:r2=a
}

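In this case there is a happens-before edge between 1-2 (program order). If r1=1, then there is a happens-before edge between 2-3 (volatile variable rule) and a happens-before edge between 3-4 (program order).

Because the happens-before relation is transitive, there is a happens-before edge between 1-4. So r2 must be 1.

volatile takes care of the following:

Visibility: it needs to make sure the load/store doesn't get optimized out.
Atomicity: the load/store is atomic, so a load/store should never be seen partially.
Ordering: most importantly, it needs to make sure that the order between 1-2 and 3-4 is preserved.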
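
The atomicity item is easiest to see with a 64-bit field. The Java Language Specification (section 17.7) allows a write to a non-volatile long or double to be split into two 32-bit halves, while reads and writes of volatile long and double are always atomic. The sketch below, with the hypothetical names VolatileLongAtomicity and tornReads, only demonstrates the guaranteed case; on common 64-bit JVMs plain longs typically do not tear in practice either, so removing volatile is unlikely to show a difference.

```java
final class VolatileLongAtomicity {
    // volatile long: reads and writes are atomic per JLS 17.7
    static volatile long value = 0L;

    // A writer flips 'value' between 0 and -1 (all 64 bits set) while the
    // caller samples it. Returns how many samples matched neither pattern,
    // i.e. how many reads mixed halves of two different writes.
    static int tornReads(int samples) {
        value = 0L;
        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                value = 0L;
                value = -1L;
            }
        });
        writer.setDaemon(true);
        writer.start();

        int torn = 0;
        for (int i = 0; i < samples; i++) {
            long v = value; // single volatile read
            if (v != 0L && v != -1L) {
                torn++;
            }
        }

        writer.interrupt();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + tornReads(1_000_000));
    }
}
```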

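By the way, since I am not a native speaker, I have seen many tutorials in my mother language (and some English tutorials) say that volatile instructs JVM threads to read the value of a volatile variable from main memory and not cache it locally, and I do not think that is true.

You are completely right. This is a very common misconception. Caches are the source of truth, since they are always coherent. If every write needed to go to main memory, programs would become extremely slow. Memory is just a spill bucket for whatever doesn't fit in the cache, and it can be completely incoherent with the cache. Plain and volatile loads/stores alike go through the cache. It is possible to bypass the cache in special situations like MMIO or when using e.g. SIMD instructions, but that isn't relevant for these examples.

Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.

Most people here are not native speakers (I'm certainly not). Your English is good enough and you show a lot of promise.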


java

multithreading

volatile

memory-barriers

cpu-cache