Analyzing of x86 output generated by JIT in the context of volatile

Question

I am writing this post in connection with a previous post:

public class Main {
    private int x;
    private volatile int g;


    public void actor1(){
       x = 1;
       g = 1;
    }


    public void actor2(){
       put_on_screen_without_sync(g);
       put_on_screen_without_sync(x);
    }
}
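A minimal smoke test of the claim that 1, 0 cannot be printed. It uses plain threads rather than a proper litmus-test harness such as OpenJDK's jcstress, and it records the values actor2 reads instead of calling put_on_screen_without_sync. The class and method names are hypothetical. A zero count is consistent with the JMM guarantee but proves nothing by itself, because a loop like this rarely hits the interesting interleavings.

```java
import java.util.concurrent.CountDownLatch;

final class MessagePassingCheck {
    // Same shape as the question's Main: x is plain, g is volatile.
    static final class Shared {
        int x;
        volatile int g;
    }

    /** One race of actor1 against actor2; returns {g, x} as read by actor2. */
    static int[] raceOnce() {
        Shared s = new Shared();
        int[] seen = new int[2];
        CountDownLatch go = new CountDownLatch(1);
        Thread actor1 = new Thread(() -> {
            awaitUnchecked(go);
            s.x = 1; // plain store
            s.g = 1; // volatile store (release)
        });
        Thread actor2 = new Thread(() -> {
            awaitUnchecked(go);
            seen[0] = s.g; // volatile load (acquire)
            seen[1] = s.x; // plain load
        });
        actor1.start();
        actor2.start();
        go.countDown();
        joinUnchecked(actor1);
        joinUnchecked(actor2); // join() makes seen[] visible to this thread
        return seen;
    }

    /** Counts the outcome the JMM forbids: g == 1 but x == 0. */
    static int countForbidden(int runs) {
        int forbidden = 0;
        for (int i = 0; i < runs; i++) {
            int[] r = raceOnce();
            if (r[0] == 1 && r[1] == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    static void awaitUnchecked(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    static void joinUnchecked(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden (1, 0) outcomes: " + countForbidden(2_000));
    }
}
```

For anything beyond a quick sanity check, jcstress runs the two actors on many fresh objects with tight timing, which is far more likely to expose a real reordering.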

Now I am analyzing what the JIT generated for the code above. From the discussion in our previous post we know that the output 1, 0 is impossible, because:

A write to volatile v causes every action a preceding v to be visible (it will be flushed to memory) before v itself becomes visible.

   .................(I removed the unimportant parts of the method body).....

  0x00007f42307d9d5e: c7460c01000000      mov       dword ptr [rsi+0ch],1h
                                                ;*putfield x
                                                ; - package.Main::actor1@2 (line 14)

  0x00007f42307d9d65: bf01000000          mov       edi,1h
  0x00007f42307d9d6a: 897e10              mov       dword ptr [rsi+10h],edi
  0x00007f42307d9d6d: f083042400          lock add  dword ptr [rsp],0h
                                                ;*putfield g
                                                ; - package.Main::actor1@7 (line 15)

  0x00007f42307d9d72: 4883c430            add       rsp,30h
  0x00007f42307d9d76: 5d                  pop       rbp
  0x00007f42307d9d77: 850583535116        test      dword ptr [7f4246cef100h],eax
                                                ;   {poll_return}
  0x00007f42307d9d7d: c3                  ret

Do I understand correctly that it works only because x86 cannot do StoreStore reordering? If it could, an additional memory barrier would be needed, right?

Edit after @Eugene's excellent answer:

 int tmp = i; // volatile load
 // [LoadStore]
 // [LoadLoad]

Here I see what you mean, and it is clear: every action below (after) the volatile read (int tmp = i) won't be reordered above it.

 // [StoreLoad] -- this one
 int tmp = i; // volatile load
 // [LoadStore]
 // [LoadLoad]

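Here you put one more barrier. It ensures that no operation is reordered with int tmp = i. But why is that important? Here is where my doubt comes from: as far as I know, a volatile load guarantees: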

Every action after volatile load won't be reordered before volatile load is visible.

I see you wrote:

There needs to be a sequential consistency

But I don't understand why sequential consistency is needed.

Answer 1

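A couple of things. First, "will be flushed to memory" is pretty wrong. It is almost never flushed to main memory: usually the StoreBuffer is drained to L1, and it is up to the cache coherency protocol to sync the data between all caches. If it is easier for you to understand the concept in these terms, that's fine; just know that it is slightly different and faster.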

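Why [StoreLoad] is there at all is a very good question; maybe this will clear things up a bit. volatile is indeed all about fences, and here is an example of what barriers are inserted around some volatile operations. For example, we have a volatile load: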

  // i is some shared volatile field
  int tmp = i; // volatile load of "i"
  // [LoadLoad|LoadStore]

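Notice the two barriers here, LoadStore and LoadLoad. In plain English, this means that any Load or Store that comes after a volatile load/read cannot "move up" past the barrier; it cannot be reordered "above" that volatile load.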

And here is an example for a volatile store:

 // "i" is a shared volatile variable
 // [StoreStore|LoadStore]
 i = tmp; // volatile store

It means that no Load or Store before it can go "below" the volatile store itself.
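The two barrier groups above can be condensed into the "can reorder" table from Doug Lea's JSR-133 Cookbook for Compiler Writers. The sketch below encodes that table for loads and stores to different locations; monitor enter/exit, which the Cookbook groups with volatile load/store, are left out, and the enum and method names are made up for illustration. The table is a conservative implementation recipe, not the JMM's formal definition, which is stated in terms of happens-before. Note that the "volatile store, then volatile load" entry already relies on the StoreLoad fence that the answer gets to next.

```java
final class ReorderRules {
    enum Op { NORMAL_LOAD, NORMAL_STORE, VOLATILE_LOAD, VOLATILE_STORE }

    /**
     * True if {@code second}, which follows {@code first} in program order,
     * may be reordered before it (accesses to different locations).
     */
    static boolean canReorder(Op first, Op second) {
        switch (first) {
            case VOLATILE_LOAD:
                // [LoadLoad|LoadStore] after a volatile load: nothing below moves up.
                return false;
            case VOLATILE_STORE:
                // Later plain accesses may move up; a later volatile access may not.
                return second == Op.NORMAL_LOAD || second == Op.NORMAL_STORE;
            default:
                // first is a plain access: only [StoreStore|LoadStore] in front of
                // a later volatile store stops it from sinking below that store.
                return second != Op.VOLATILE_STORE;
        }
    }

    public static void main(String[] args) {
        for (Op first : Op.values()) {
            for (Op second : Op.values()) {
                System.out.println(first + " then " + second + ": "
                        + (canReorder(first, second) ? "may reorder" : "no"));
            }
        }
    }
}
```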

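This basically builds the happens-before relationship, with the volatile load being an acquiring load and the volatile store being a releasing store (this also has to do with how the CPU's store and load buffers are implemented, but that is pretty much out of the scope of the question).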

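If you think about it, this makes perfect sense given what we know about volatile in general: once a volatile store has been observed by a volatile load, everything prior to that volatile store will be observed as well, and this is exactly what the memory barriers provide. It now makes sense that when a volatile store happens, nothing above it can move past it, and once a volatile load happens, nothing below it can move above it; otherwise happens-before would be broken.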
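The acquire/release pairing can be spelled out with the static fence methods of java.lang.invoke.VarHandle (Java 9+): releaseFence orders earlier loads and stores before later stores ([StoreStore|LoadStore]), and acquireFence orders earlier loads before later loads and stores ([LoadLoad|LoadStore]). The sketch below applies them to plain fields shaped like the question's Main; the class name is hypothetical. It covers only the acquire/release half and not volatile's sequential consistency, and under the JMM fences are not a formal substitute for volatile, so read it as an illustration of where the barriers sit. On x86 both fences typically need no instruction at all; on weaker hardware such as ARM or POWER the release fence becomes a real instruction, which is the extra StoreStore barrier the question asks about.

```java
import java.lang.invoke.VarHandle;

final class ExplicitFences {
    int x;
    int g; // deliberately plain: the fences below stand in for "volatile"

    void actor1() {
        x = 1;
        VarHandle.releaseFence(); // [StoreStore|LoadStore]: x = 1 cannot sink below g = 1
        g = 1;
    }

    int[] actor2() {
        int seenG = g;
        VarHandle.acquireFence(); // [LoadLoad|LoadStore]: reading x cannot move above reading g
        int seenX = x;
        return new int[] {seenG, seenX};
    }

    public static void main(String[] args) {
        ExplicitFences f = new ExplicitFences();
        f.actor1();
        int[] r = f.actor2();
        System.out.println(r[0] + ", " + r[1]); // single thread: prints 1, 1
    }
}
```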

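But that's not all, there's more. There needs to be sequential consistency; that is why any sane implementation guarantees that volatiles themselves are not reordered with each other, and so two more fences are inserted: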

 // any store of some other volatile
 // can not be reordered with this volatile load
 // [StoreLoad] -- this one
 int tmp = i; // volatile load of a shared variable "i"
 // [LoadStore|LoadLoad]

And here is another one:

// [StoreStore|LoadStore]
i = tmp; // volatile store
// [StoreLoad] -- and this one

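Now, it turns out that on x86, 3 of the 4 memory barriers are free (no instruction is emitted for them), because x86 has a strong memory model. The only one that needs an actual instruction is StoreLoad. On other CPUs, such as POWER, lwsync is one of the instructions used (ARM uses dmb variants instead), but I don't know much about them.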
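Why only StoreLoad? A common simplified model of x86 (x86-TSO) gives each core a FIFO store buffer: a store goes into the buffer, a load returns the core's own newest buffered store to that address or else reads memory, and a fence waits until the core's buffer is empty. The sketch below enumerates every interleaving of the classic store-buffering test (thread 0: x = 1; r0 = y, thread 1: y = 1; r1 = x) under that model. It is a toy model, not a full specification of x86, and the fence stands for mfence or a lock-prefixed instruction such as the lock add in the listing. Because the buffer is FIFO, two stores (like x then g in actor1) can never become visible out of order; the only reordering left is a later load overtaking an earlier buffered store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TsoStoreBuffering {
    enum Kind { STORE, LOAD, FENCE }

    record Insn(Kind kind, String addr, int value) {}

    /** Machine state: program counters, registers, shared memory, store buffers. */
    static final class State {
        int[] pc = new int[2];
        int[] reg = new int[2];
        Map<String, Integer> mem = new HashMap<>(Map.of("x", 0, "y", 0));
        List<List<Insn>> buf = List.of(new ArrayList<>(), new ArrayList<>());

        State copy() {
            State s = new State();
            s.pc = pc.clone();
            s.reg = reg.clone();
            s.mem = new HashMap<>(mem);
            s.buf = List.of(new ArrayList<>(buf.get(0)), new ArrayList<>(buf.get(1)));
            return s;
        }
    }

    /** own = 1; [fence]; register = other. */
    static List<Insn> program(String own, String other, boolean fence) {
        List<Insn> p = new ArrayList<>();
        p.add(new Insn(Kind.STORE, own, 1));
        if (fence) {
            p.add(new Insn(Kind.FENCE, null, 0));
        }
        p.add(new Insn(Kind.LOAD, other, 0));
        return p;
    }

    /** All reachable final outcomes "r0,r1". */
    static Set<String> outcomes(boolean fence) {
        List<List<Insn>> progs = List.of(program("x", "y", fence), program("y", "x", fence));
        Set<String> results = new TreeSet<>();
        explore(progs, new State(), results);
        return results;
    }

    static void explore(List<List<Insn>> progs, State s, Set<String> results) {
        boolean moved = false;
        for (int t = 0; t < 2; t++) {
            // Step A: the oldest buffered store of thread t reaches memory.
            if (!s.buf.get(t).isEmpty()) {
                State n = s.copy();
                Insn st = n.buf.get(t).remove(0);
                n.mem.put(st.addr(), st.value());
                explore(progs, n, results);
                moved = true;
            }
            // Step B: thread t executes its next instruction.
            if (s.pc[t] < progs.get(t).size()) {
                Insn in = progs.get(t).get(s.pc[t]);
                if (in.kind() == Kind.FENCE && !s.buf.get(t).isEmpty()) {
                    continue; // a fence blocks until this thread's buffer is drained
                }
                State n = s.copy();
                n.pc[t]++;
                switch (in.kind()) {
                    case STORE -> n.buf.get(t).add(in);
                    case LOAD -> n.reg[t] = read(n, t, in.addr());
                    case FENCE -> { }
                }
                explore(progs, n, results);
                moved = true;
            }
        }
        if (!moved) {
            results.add(s.reg[0] + "," + s.reg[1]); // both threads done, buffers empty
        }
    }

    /** Store forwarding: newest own buffered store to addr wins, else memory. */
    static int read(State s, int t, String addr) {
        List<Insn> b = s.buf.get(t);
        for (int i = b.size() - 1; i >= 0; i--) {
            if (b.get(i).addr().equals(addr)) {
                return b.get(i).value();
            }
        }
        return s.mem.get(addr);
    }

    public static void main(String[] args) {
        System.out.println("no fence:   " + outcomes(false)); // includes 0,0
        System.out.println("with fence: " + outcomes(true));  // 0,0 is gone
    }
}
```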

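Usually mfence is a good choice for StoreLoad on x86, but the same guarantee is provided by lock add (AFAIK in a cheaper way), which is why you see it there. Basically, it is the StoreLoad barrier. And yes, your last sentence is right: for a weaker memory model the StoreStore barrier would be required. As a side note, that is what is used when you safely publish a reference via final fields inside a constructor: upon exiting the constructor, two fences are inserted, LoadStore and StoreStore.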
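Putting this together for the question's actor1: a toy lowering step that maps each barrier to an x86 instruction. It is purely illustrative and not HotSpot's actual code; the names are hypothetical, and [x] and [g] stand in for the real field addresses ([rsi+0ch] and [rsi+10h] in the listing). Only StoreLoad produces an instruction, which matches the single lock add that follows the store to g.

```java
import java.util.ArrayList;
import java.util.List;

final class X86VolatileLowering {
    enum Barrier { LOAD_LOAD, LOAD_STORE, STORE_STORE, STORE_LOAD }

    /** x86 instruction for a barrier, or null when TSO already provides it. */
    static String x86(Barrier b) {
        // mfence would also satisfy StoreLoad; the listing shows a locked add instead.
        return b == Barrier.STORE_LOAD ? "lock add dword ptr [rsp],0h" : null;
    }

    /** Appends the instructions (if any) needed for the given barriers. */
    static void emit(List<String> out, Barrier... barriers) {
        for (Barrier b : barriers) {
            String insn = x86(b);
            if (insn != null) {
                out.add(insn);
            }
        }
    }

    /** Pseudo-assembly for actor1: x = 1 (plain), then g = 1 (volatile). */
    static List<String> lowerActor1() {
        List<String> out = new ArrayList<>();
        out.add("mov dword ptr [x],1h");                    // x = 1
        emit(out, Barrier.STORE_STORE, Barrier.LOAD_STORE); // before the volatile store: free on x86
        out.add("mov dword ptr [g],1h");                    // g = 1
        emit(out, Barrier.STORE_LOAD);                      // after the volatile store: one instruction
        return out;
    }

    public static void main(String[] args) {
        lowerActor1().forEach(System.out::println);
    }
}
```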

Take all of this with a grain of salt: the JVM is free to ignore these barriers as long as it does not break any rules. Aleksey Shipilev has a great talk about this.

Edit

Suppose you have this situation:

 // [StoreStore|LoadStore]
 x = 4; // volatile store to a shared volatile field "x"

 y = 3; // plain store to a shared non-volatile field "y"

 int z = w; // volatile load of another shared volatile field "w"
 // [LoadLoad|LoadStore]

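Basically, no barrier here prevents the volatile store from being reordered with the volatile load (i.e. the volatile load being performed first), and that would obviously cause problems: sequential consistency would be violated.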
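The point of this Edit is easiest to see with two different volatile fields across two threads, the classic store-buffering (Dekker-style) test: each thread volatile-stores its own flag, then volatile-loads the other thread's flag. Because volatile accesses are sequentially consistent, the JMM forbids both threads reading 0; the [StoreLoad] after each volatile store is what enforces that on x86. The class and field names are hypothetical, and this is a plain-thread smoke test rather than a real litmus harness. If you drop volatile from a and b, (0, 0) becomes legal, although a loop like this may still rarely or never show it; jcstress is the proper tool for that.

```java
import java.util.concurrent.CountDownLatch;

final class StoreBufferingCheck {
    // Two different volatile fields, one "flag" per thread.
    static final class Flags {
        volatile int a;
        volatile int b;
    }

    /** T1: a = 1; r1 = b.  T2: b = 1; r2 = a.  Returns {r1, r2}. */
    static int[] raceOnce() {
        Flags f = new Flags();
        int[] r = new int[2];
        CountDownLatch go = new CountDownLatch(1);
        Thread t1 = new Thread(() -> {
            awaitUnchecked(go);
            f.a = 1;    // volatile store, followed by [StoreLoad]
            r[0] = f.b; // volatile load of the other field
        });
        Thread t2 = new Thread(() -> {
            awaitUnchecked(go);
            f.b = 1;
            r[1] = f.a;
        });
        t1.start();
        t2.start();
        go.countDown();
        joinUnchecked(t1);
        joinUnchecked(t2); // join() makes r[] visible to this thread
        return r;
    }

    /** Counts (0, 0), which sequentially consistent volatiles rule out. */
    static int countBothZero(int runs) {
        int bothZero = 0;
        for (int i = 0; i < runs; i++) {
            int[] r = raceOnce();
            if (r[0] == 0 && r[1] == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    static void awaitUnchecked(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    static void joinUnchecked(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("(0, 0) outcomes: " + countBothZero(2_000));
    }
}
```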

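By the way, if I'm not mistaken, you are slightly missing the point with "Every action after volatile load won't be reordered before volatile load is visible". What cannot be reordered is the volatile access itself; other operations are still free to be reordered among themselves. Let me give you an example: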

 int tmp = i; // volatile load of a shared variable "i"
 // [LoadStore|LoadLoad]

 x = 3; // plain store to a shared field
 y = 4; // plain store to a shared field

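The last two operations, x = 3 and y = 4, are completely free to be reordered: they cannot float above the volatile load, but they can be reordered among themselves. The following would be perfectly legal: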

 int tmp = i; // volatile load
 // [LoadStore|LoadLoad]

 // see how they have been inverted here...
 y = 4; // plain store
 x = 3; // plain store
