Setting an AtomicBoolean again

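I am using an AtomicBoolean to enforce volatile visibility between threads. One thread updates the value; another thread only reads it.

Say the current value is true. Now say the writer thread sets its value to true again: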

final AtomicBoolean b = new AtomicBoolean(); // shared between threads

b.set(true);
// ... some time later
b.set(true);

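After this 'dummy' set(true), is there a performance penalty when the reader thread calls get()? Does the reader thread have to re-read and re-cache the value?

If that is the case, the writer thread could have done: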

b.compareAndSet(false, true);

This way, the reader thread's cached value only has to be invalidated for a real change.
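For reference, a minimal sketch of what the compareAndSet(false, true) call reports when the flag is already true. The class name NoOpCasDemo and the helper trySetTrue are hypothetical and exist only for illustration. The sketch shows the logical outcome only; it says nothing about what happens at the cache level.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; the name is not from the original post.
class NoOpCasDemo {
    // Attempts the false -> true transition and reports whether it happened.
    static boolean trySetTrue(AtomicBoolean flag) {
        return flag.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        AtomicBoolean b = new AtomicBoolean();  // initial value is false
        System.out.println(trySetTrue(b));      // first call: the value changes to true
        System.out.println(trySetTrue(b));      // second call: expected value does not match
        System.out.println(b.get());            // the flag is still true
    }
}
```

The second call returns false and leaves the logical value untouched, which is the "no real change" case the question is asking about.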

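Both a write and a CAS "touch" the cache line, causing it to become dirty.

However, the cost is relatively small, in the region of 30 - 50 ns.

The cost of your code not being warmed up, because it hasn't yet run 10,000 times, is likely to be far greater.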
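As a rough illustration of the warm-up point, here is a naive timing sketch. The class SetTimingSketch and the method averageSetNanos are hypothetical names, and the iteration counts are arbitrary assumptions. It is single-threaded, so it only times the writer's uncontended set(true) and does not capture the cache miss a reader on another core would pay; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not a rigorous benchmark.
class SetTimingSketch {
    // Returns the average wall-clock time per set(true) call, in nanoseconds.
    static double averageSetNanos(AtomicBoolean flag, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            flag.set(true);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        AtomicBoolean b = new AtomicBoolean(true);
        // Warm-up pass so the JIT has a chance to compile the loop; result discarded.
        averageSetNanos(b, 20_000);
        // Measured pass.
        System.out.println(averageSetNanos(b, 1_000_000) + " ns per set(true)");
    }
}
```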

compareAndSet():

public final boolean compareAndSet(boolean expect, boolean update) {
    int e = expect ? 1 : 0;
    int u = update ? 1 : 0;
    return unsafe.compareAndSwapInt(this, valueOffset, e, u);
}

compareAndSwapInt() is already native:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

where Atomic::cmpxchg is generated somewhere at the beginning of JVM execution, as

  address generate_atomic_cmpxchg() {
    StubCodeMark mark(this, "StubRoutines", "atomic_cmpxchg");
    address start = __ pc();

    __ movl(rax, c_rarg2);
    if ( os::is_MP() ) __ lock();
    __ cmpxchgl(c_rarg0, Address(c_rarg1, 0));
    __ ret(0);

    return start;
  }

cmpxchgl() generates the x86 code (it also has a longer, legacy code path, so I am not copying that one here):

 InstructionMark im(this);
 prefix(adr, reg);
 emit_byte(0x0F);
 emit_byte(0xB1);
 emit_operand(reg, adr);

0F B1 is indeed a CMPXCHG operation. If you check the code above, if ( os::is_MP() ) __ lock(); emits a LOCK prefix on multiprocessor machines (let me just skip quoting lock(); it emits a single F0 byte), so practically everywhere.

As the CMPXCHG documentation says:

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

So on a multiprocessor x86 machine, a NOP-CAS also performs a write and therefore affects the cache line. (Emphasis mine.)
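Given that conclusion, one possible way for the writer to avoid the store entirely is to read before writing. The sketch below is an assumption on my part, not something from the post above: the class CheckThenSet and the method setIfDifferent are hypothetical names, and the check-then-act sequence is only safe because the question has a single writer thread. Whether it is measurably faster in a given application would need to be benchmarked.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper; assumes exactly one thread ever writes the flag.
class CheckThenSet {
    // Stores newValue only if it differs from the current value.
    // Returns true if a store was issued, false if it was skipped.
    static boolean setIfDifferent(AtomicBoolean flag, boolean newValue) {
        if (flag.get() == newValue) {
            return false; // volatile read only; no write is issued
        }
        flag.set(newValue);
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean b = new AtomicBoolean();
        System.out.println(setIfDifferent(b, true)); // a real change, so a store happens
        System.out.println(setIfDifferent(b, true)); // redundant update, so the store is skipped
    }
}
```

With more than one writer this would be a race, since another thread could change the flag between the get() and the set(); in that case compareAndSet remains the correct tool despite the write cycle.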