Setting an AtomicBoolean again
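I am using an AtomicBoolean to enforce volatile visibility between threads. One thread updates the value; another thread only reads it.
Say the current value is true. Now say the writer thread sets its value to true again: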
final AtomicBoolean b = new AtomicBoolean(); // shared between threads
b.set(true);
// ... some time later
b.set(true);
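After this 'dummy' set(true), is there a performance penalty when the reader thread calls get()? Does the reader thread have to re-read and cache the value?
If that is the case, the writer thread could have done: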
b.compareAndSet(false, true);
This way, the reader thread only has to invalidate its cached value for real changes.
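To make the two options concrete, here is a minimal sketch of what each call does from the writer's point of view. The class name RepeatedSetDemo and its run() helper are hypothetical, introduced only for illustration. Both calls leave the flag at true; compareAndSet additionally reports whether it changed anything. The sketch shows only the API semantics and says nothing about cache behaviour.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RepeatedSetDemo {
    // Returns {value after the repeated set, result of the CAS, final value}.
    static boolean[] run() {
        AtomicBoolean b = new AtomicBoolean();  // initial value is false
        b.set(true);                            // real change: false -> true
        b.set(true);                            // 'dummy' set: unconditional volatile write
        boolean afterSet = b.get();
        // Expects false but finds true, so the CAS fails and returns false.
        boolean changed = b.compareAndSet(false, true);
        return new boolean[] { afterSet, changed, b.get() };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("after set: " + r[0]
                + ", CAS changed value: " + r[1]
                + ", final: " + r[2]);
    }
}
```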
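Both a write and a CAS "touch" the cache line, which marks it dirty.
However, the cost is relatively small, on the order of 30 - 50 ns.
The cost of code that is not warmed up, because it has not yet run 10,000 times, is likely to be much higher.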
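A rough way to see the warm-up effect on your own machine is sketched below. This is an assumption-laden illustration, not a proper benchmark: the class SetCostSketch and the method nanosPerSet are hypothetical names, everything runs on a single thread (so it does not measure the cross-core cache line transfer that a reader would pay for), and the numbers will vary between runs and JVMs. A harness such as JMH is the better tool for real measurements.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SetCostSketch {
    // Average wall-clock nanoseconds per b.set(true) over the given number of iterations.
    static double nanosPerSet(AtomicBoolean b, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            b.set(true); // volatile store; the JIT may not eliminate it
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        AtomicBoolean b = new AtomicBoolean();
        double cold = nanosPerSet(b, 1_000);      // likely still interpreted
        nanosPerSet(b, 1_000_000);                // warm-up pass, result ignored
        double warm = nanosPerSet(b, 1_000_000);  // likely JIT-compiled by now
        System.out.printf("cold: %.1f ns/set, warm: %.1f ns/set%n", cold, warm);
    }
}
```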
public final boolean compareAndSet(boolean expect, boolean update) {
    int e = expect ? 1 : 0;
    int u = update ? 1 : 0;
    return unsafe.compareAndSwapInt(this, valueOffset, e, u);
}
compareAndSwapInt() is already native:
UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END
where Atomic::cmpxchg is generated somewhere at the beginning of JVM execution, as
address generate_atomic_cmpxchg() {
  StubCodeMark mark(this, "StubRoutines", "atomic_cmpxchg");
  address start = __ pc();
  __ movl(rax, c_rarg2);
  if ( os::is_MP() ) __ lock();
  __ cmpxchgl(c_rarg0, Address(c_rarg1, 0));
  __ ret(0);
  return start;
}
cmpxchgl() generates x86 code (it has a longer legacy code path too, so I am not copying that one here):
InstructionMark im(this);
prefix(adr, reg);
emit_byte(0x0F);
emit_byte(0xB1);
emit_operand(reg, adr);
0F B1 is really a CMPXCHG operation. If you check the code above, if ( os::is_MP() ) __ lock(); emits a LOCK prefix on multiprocessor machines (let me just skip quoting lock(); it emits a single F0 byte), so practically everywhere.
As the CMPXCHG documentation says:
This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)
So on multiprocessor x86 machines, even a no-op CAS performs a write and therefore affects the cache line. (Emphasis mine.)
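If a failed CAS still writes, one way a single writer could avoid the redundant write is to read the flag first and store only when the value would change. The following is a minimal sketch under that assumption, not something provided by the JDK: the class FlagWriter, the method setIfNeeded, and the writes counter are hypothetical names added for illustration. The check-then-set sequence is not atomic, so it is only safe because exactly one thread ever writes the flag, as in the question.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FlagWriter {
    private final AtomicBoolean flag = new AtomicBoolean();
    private int writes; // touched only by the single writer thread; for illustration

    // Stores the new value only if it differs from the current one.
    // Returns true if a store was actually performed.
    // Safe only with a single writer: get() followed by set() is not atomic.
    public boolean setIfNeeded(boolean newValue) {
        if (flag.get() == newValue) {
            return false; // volatile read only; no store is issued
        }
        flag.set(newValue);
        writes++;
        return true;
    }

    public boolean get() {
        return flag.get();
    }

    public int writes() {
        return writes;
    }
}
```

With this guard, the writer calling setIfNeeded(true) twice in a row performs one store, and the second call is a read only.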