鉴于所有这些都在 ReentrantLock.lock()/unlock() 内执行，我可以用一个替换多个易失性读取吗

Question

我正在调查 implementation of ConcurrentReferenceHashMap in Spring Framework, particularly into restructure() method:

protected final class Segment extends ReentrantLock {

    private volatile Reference<K, V>[] references; // <-- !

    private void restructure(boolean allowResize, @Nullable Reference<K, V> ref) {
        boolean needsResize;
        lock();
        try {
            //...
            boolean resizing = false;
            int restructureSize = this.references.length; // <-- !
            //...

            Reference<K, V>[] restructured =
                    (resizing ? createReferenceArray(restructureSize) : this.references);// <-- !

            for (int i = 0; i < this.references.length; i++) { // <-- !
                ref = this.references[i]; // <-- !
                //...
            }

            if (resizing) {
                this.references = restructured;
                this.resizeThreshold = (int) (this.references.length * getLoadFactor());// <-- !
            }
            //...
        }
        finally {
            unlock();
        }
    }

正如您在这里看到的，我们对易失性 references 数组进行了多次读取和写入，所有这些都发生在 lock()/unlock() 同步块中。

JavaDoc of java.util.concurrent.locks.Lock，即其 Memory Synchronization 部分声称，

All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock, as described in Chapter 17 of The Java™ Language Specification:

A successful lock operation has the same memory synchronization effects as a successful Lock action.

A successful unlock operation has the same memory synchronization effects as a successful Unlock action.

Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects.

我的问题是：我可以重写代码以便从易失性字段读取一个本地变量（即在堆栈上同步）并使用它来避免重复易失性访问吗？假设

它不会破坏JMM吗

Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects.

Answer 1

My question is: can I rewrite the code in order to have one read from volatile field into a local var (i.e. synchronize on stack) and use it to avoid repeating volatile access? Won't it break JMM ...

你可以这样做，但你可能不应该这样做。

你可以这样做，因为所有对 references 的写入（在任何地方，不仅仅是在 restructure() 方法内）都发生在 lock()/unlock() 块内。
正如您所指出的，这些 lock()/unlock() 块 强制执行与内置监视器锁 .
提供的相同的内存同步语义内置监视器（即 synchronized）provide visibility and atomicity guarantees 比 volatile 保证更强大。

你可能不应该那样做，因为你问的是它是否会破坏 JMM。
您似乎不确定 JMM 的工作原理。
同时 ConcurrentReferenceHashMap 内部的 references 实际上用于具有数据竞争的代码：例如，这里是 restructure() 中写入 references[i] 和读取 references[i] 之间的数据竞争getReference():

    public Reference<K, V> getReference(@Nullable Object key, int hash, Restructure restructure) {
        ...
        Reference<K, V>[] references = this.references;
        ...
        Reference<K, V> head = references[index];
        ...
    }

    private void restructure(boolean allowResize, @Nullable Reference<K, V> ref) {
        ...
        lock();
        try {
            ...
            Reference<K, V>[] restructured =
                    (resizing ? createReferenceArray(restructureSize) : this.references);
            ...
            if (!resizing) {
                restructured[i] = null;
            }
            ...
            restructured[index] = this.referenceManager.createReference(
                                    entry, ref.getHash(), restructured[index]);
            ...
        }
        finally {
            unlock();
        }
    }

可以编写具有数据竞争且工作正常的代码，但您应该完全了解 JMM 和您的并发算法如何工作的所有细节。
否则很有可能引入一个或多个同步错误，这是最糟糕的：它们违反直觉（请参阅 this 的一些示例）并且无法进行单元测试（甚至无法可靠地复制）。

Answer 2

此优化的目的是什么？您是否制定了基准测试并使用分析器运行确定代码实际上是瓶颈？请查看 JMH 以编写微基准测试。

首先，你已经有了一把锁。如果出于任何原因争夺此锁，则上下文切换的开销比 volatile 变量的潜在开销高很多。

即使没有争用锁，也不意味着 volatile 变量很昂贵。例如。在 X86 上对不同变量的易失性写入和易失性读取需要昂贵的 [StoreLoad] 屏障，这将阻止 CPU 在存储缓冲区耗尽之前执行加载。需要这个 [StoreLoad] 来保持顺序一致性；否则易失性写入和易失性读取（不同地址）可能会被重新排序。

但是，如果您有多个连续的易失性写入，然后是易失性读取（不同的变量），那么在 X86 上，在最后一个易失性写入和易失性读取之间只需要一个 [StoreLoad]，因为存储不会重新排序.因此，从 CPU 内存栅栏的角度来看，前面的易失性写入是免费的。所以 volatile 写入可能非常便宜。

在 X86 上，易失性读取也很便宜。在 X86 上，每一个加载都是一个获取加载，获取加载足以实现顺序一致性。请记住，在现代 CPUs 上，缓存始终是一致的，并且如果缓存行在本地 CPU 上已经处于正确状态，那么易失性读取与从 CPU 记忆栅栏视角。因此，如果您先进行 1 次易失性读取，然后进行 9 次常规读取，那么与 10 次易失性读取相比，执行起来应该没有任何不同。

volatile 阻止的优化的主要来源是由 JIT 完成的优化。

有关更多信息，请参阅这篇精彩的 post： https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

[更新] 下面的循环我肯定会尝试看看：

for (int i = 0; i < this.references.length; i++) { // <-- !
                ref = this.references[i]; // <-- !
                //...
            }

我会将其转换为：

Reference<K, V>[] localReferences = this.references;
for (int i = 0; i < localReferences.length; i++) { // <-- !
                ref = localReferences[i]; // <-- !
                //...
            }

我想一些代码优化是可能的。例如。由于现代处理器的超标量能力，也许可以展开循环，并且可以并行完成对 localReferences 的多个分配。所以我肯定会创建一个微基准测试，看看发生了什么。 JMH 支持各种分析器，您也可以看到生成的程序集。

Answer 3

这是我对中附加问题的回答：

I don't understand the race here: writing into references within restructure() is guarded by lock()-unlock() block, so it happens-before read from references in getReference(), doesn't it? And vice-versa: reading from references in getReference() "sees" only the last (in terms of execution within lock-unlock) writing into references, doesn't it?

先写几个笔记：

... writing into references within restructure() is guarded by lock()-unlock() block, so it happens-before read from references in getReference() ...

事实并非如此。

lock()-unlock() 块仅向其他 lock()-unlock() 块提供 happens-before 和原子性保证（并且它们必须使用相同的 Lock 对象）。

getReference() 中的读取不在 lock()-unlock() 中，因此它可以与另一个在 restructure() 方法中写入内容的线程并行发生。
... reading from references in getReference() "sees" only the last (in terms of execution within lock-unlock) writing into reference ...

references（我这里说的是reference，它是Segment中的实例字段，不是getReference()中的同名局部变量）是volatile 字段，因此，对该字段的所有读取和写入都以全局顺序发生（synchronization order，每次执行一个，在不同的执行中可能不同），并且每次读取 references 总是看到最新的写入。

了解这一点很重要：
- 只有 references 字段（存储对 Reference<K, V>[] 对象的引用）是易变的，而不是 Reference<K, V>[] 对象本身
- 在某个索引 references[i] 处读取和写入数组元素不是易失性的，即使 references 是易失性的。
  在 JLS 术语中，这些是 different variables.

这是一个有数据竞争的执行（为清楚起见简化了代码）：

volatile references = [val0]; // Initially

Thread 1                  | Thread 2                | Thread 3               
----------------------------------------------------|------------------------
                          | restructure(...) {      |                        
                          |   lock();               |                        
                          |   references[0] = val1; |                        
                          |   unlock();             |                        
                          | }                       | restructure(...) {     
                          |                         |   lock();              
                          |                         |   references[0] = val2;
                          |                         |   unlock();            
                          |                         | }                      
getReference(...)         |                         |                        
  var r2 = references[0]; |                         |                        
}                         |                         |

鉴于：

references[0] = val2 从 JMM 的角度来看实际上是 2 个动作：
- references
- 对数组元素的非易失性写入 references[0]
var r2 = references[0]; 从 JMM 的角度来看也是 2 个动作：
- references
- references[0] 数组元素的非易失性读取

这是反映这一点的版本：

 // Initially
volatile v = [val0];

 Thread 1    | Thread 2      | Thread 3     
--------------------------------------------
             | lock();       |              
             | r3 = v;       |              
             | r3[0] = val1; |              
             | unlock();     |              
             |               | lock();      
             |               | r4 = v;      
             |               | r4[0] = val2;
             |               | unlock();    
 r1 = v;     |               |              
 r2 = r1[0]; |               |

让我们用JLS actions重写它：

o代表一个Reference<K, V>[]对象
v 是一个 volatile 变量，通过对 o 对象
Rv 是易失性读取
R和W为非易失性读写
value x in R(..):x or Rv(..):x 显示由 read

// Initial writes
W(o=[val0]), Wv(v=o)

 T1        | T2           | T3           
-----------------------------------------
           | Lock         |              
           | Rv(v):o      |              
           | W(o[0]=val1) |              
           | Unlock       |              
           |              | Lock         
           |              | Rv(v):o      
           |              | W(o[0]=val2) 
           |              | Unlock       
 Rv(v):o   |              |              
 R(o[0]):? |              |

节目顺序：

T1.Rv(v) -> T1.R(o[0])
T2.Lock -> T2.Rv(v) -> T2.W(o[0]=val1) -> T2.Unlock
T3.Lock -> T3.Rv(v) -> T3.W(o[0]=val2) -> T3.Unlock

同步顺序（特定于此执行）- 同步操作的全局顺序：

Initial writes -> T2.Lock -> T2.Rv(v) -> T2.Unlock -> T3.Lock -> T3.Rv(v) -> T3.Unlock -> T1.Rv(v)

Synchronized-with关系（它存在于同步顺序的一些动作对之间）：

Initial writes -> 1st action in every thread
T2.Unlock -> T3.Lock

以下是程序顺序和同步对象：

           W(o=[val0])                 
              ↓ (po)                   
            Wv(v=o)                    
   ┌——————————┼————————————┐           
T1 │       T2 │         T3 │           
   │          ↓ (sw)       │           
   │        Lock           │           
   │          ↓ (po)       │           
   │        Rv(v):o        │           
   │          ↓ (po)       │           
   │        W(o[0]=val1)   │           
   │          ↓ (po)       │           
   │        Unlock         ↓ (sw)      
   │          └—————————→ Lock         
   │                (sw)   ↓ (po)     
   │                      Rv(v):o      
   │                       ↓ (po)     
   │                      W(o[0]=val2) 
   │                       ↓ (po)     
   ↓ (sw)                 Unlock       
 Rv(v):o                               
   ↓ (po)                              
 R(o[0]):?

结合程序顺序和同步给我们 happens-before:

           W(o=[val0])                 
              ↓ (hb)                   
            Wv(v=o)                    
   ┌——————————┤                        
   │       T2 │                       
   │          ↓ (hb)                  
   │        Lock                      
   │          ↓ (hb)                  
   │        Rv(v):o                   
   │          ↓ (hb)                  
   │        W(o[0]=val1)              
   │          ↓ (hb)                  
   │        Unlock               
   │          └—————————————┐           
   │                     T3 │      
   │                        ↓ (hb)     
   │                      Lock         
   │                        ↓ (hb)     
   │                      Rv(v):o      
   │                        ↓ (hb)     
   │                      W(o[0]=val2) 
T1 │                        ↓ (hb)     
   ↓ (hb)                 Unlock       
 Rv(v):o                               
   ↓ (hb)                              
 R(o[0]):?

如您所见：

对 o[0] 的访问存在冲突：
- W(o=[val0]) 在初始写入
- R(o[0]) 在 T1
- W(o[0]=val1) 在 T2
- W(o[0]=val2) 在 T3
T1 中 o[0] 的读取未按顺序发生在 T2 和 T3 中写入 o[0]

根据 JLS 的定义，这是一场数据竞争：

When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race.

因此本次执行中T1中o[0]的读可以return:

在它之前发生的最新写入（即来自初始写入的 val0）
或任何与其发生之前无关的写入（即来自 T2 的 val1 或来自 T3 的 val2）

当然，这只是一种可能的执行方式，还有很多其他可能的执行方式。

鉴于所有这些都在 ReentrantLock.lock()/unlock() 内执行，我可以用一个替换多个易失性读取吗

Can I replace multiple volatile reads with one given all of them are executed within ReentrantLock.lock()/unlock()

java

concurrency

multithreading

volatile

java-memory-model