compare-and-swap和blocking算法的性能比较

Question

我有一个 ConcurrentLinkedQueue 用作底层数据结构。在每次 put 调用中，我都会向列表中添加一个唯一的增量值。我有此方法的同步版本和比较和交换版本。当我只有很少的线程（例如 5 个）并且总共执行 1000 万次输入时，我发现同步版本的效果要好得多。当我有很多线程（例如 2000）并且总共执行相同数量的 puts 时，我发现 CAS 工作得更好。为什么 CAS 与线程较少的阻塞算法相比表现不佳？

// AtomicReference<Foo> latestValue that is initialized
    public void put(Double value) {
        Foo currentValue;
        while (true) {
            currentValue = latestValue.get();
            Foo newValue = new Foo(value);
            if (latestValue.compareAndSet(currentValue, newValue)) {
                historyList.add(newValue);
                return;
            }
        }
    }

统计

NON-BLOCKING
Threads 2000
Puts per thread 10000
Put time average    208493309

BLOCKING
Threads 2000
Puts per thread 10000
Put time average    2370823534


NON-BLOCKING
Threads 2
Puts per thread 10000000
Put time average    13117487385

BLOCKING
Threads 2
Puts per thread 10000000
Put time average    4201127857

Answer 1

TL;DR 因为在无竞争的情况下 JVM 会优化 synchronized 并用 CAS 锁替换它。

在您的 CAS 案例中，您有开销：即使您的 CAS 会失败，您仍在尝试进行一些计算。当然，与真正的互斥量获取相比，这没什么，当您使用 synchronized.

时通常会发生什么

但是 JVM 并不愚蠢，当它看到您当前获取的锁不满足时，它只是用 CAS 锁替换真正的互斥锁（甚至在偏向锁定的情况下用简单的存储）。因此，对于 synchronized 情况下的两个线程，您只测量一个 CAS，但在您自己的 CAS 实现的情况下，您还测量分配 Foo 实例、compareAndSet 和 get() 的时间。

对于 2000 个线程，JVM 不执行 CAS 优化，因此您的实现优于预期的互斥量获取。

compare-and-swap和blocking算法的性能比较

Performance comparison between compare-and-swap and blocking algorithm

java

concurrency

blocking

lock-free