Why could '-XX:+UseLWPSynchronization' negatively impact StampedLock on Windows?

I want to implement a simple rate limiter in Java to learn how to use JMH. I created a small GitHub project for it at https://github.com/William1104/rate-limiter

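Interestingly, when the '-XX:+UseLWPSynchronization' option is enabled, the throughput of some implementations (the ones using StampedLock) changes. The benchmark was run on a Windows machine, and I expected the flag to have no effect on non-Solaris systems. The test results, however, suggest otherwise. Can anyone help me understand what is actually going on?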
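One way to see how the running JVM treats the flag is to query it through the HotSpot diagnostic MXBean. This is only a minimal sketch: the class name `VmFlagProbe` and its `describe` helper are made up for illustration, and it assumes a HotSpot-based JVM. On a JVM that does not know the flag (or is not HotSpot), the lookup fails and the sketch reports that instead.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the rate-limiter project.
final class VmFlagProbe {

    // Returns a one-line description of how this JVM sees the given flag.
    static String describe(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return flagName + ": not recognized or not queryable on this JVM";
            }
            // getVMOption throws IllegalArgumentException for unknown flags.
            VMOption option = bean.getVMOption(flagName);
            return flagName + " = " + option.getValue()
                + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return flagName + ": not recognized or not queryable on this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("UseLWPSynchronization"));
        System.out.println(describe("UseNUMA"));
    }
}
```

Running it with and without '-XX:+UseLWPSynchronization' on the command line shows whether the JVM accepted the flag at all. It says nothing about whether the flag changes locking behaviour on a given platform.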

Here are the test results from my machine for reference:

With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA

Benchmark                         (rateLimiterType)                    Mode   Cnt      Score      Error  Units
RaterLimiterBenchmark.thread_1    StampLockLongArrayRateLimiter        thrpt   90  21487.385 ± 1082.163  ops/ms
RaterLimiterBenchmark.thread_1    StampLockInstantArrayRateLimiter     thrpt   90  13162.330 ± 1585.555  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedLongArrayRateLimiter     thrpt   90  15362.934 ±  227.704  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedInstantArrayRateLimiter  thrpt   90  17281.675 ± 2148.057  ops/ms
RaterLimiterBenchmark.thread_10   StampLockLongArrayRateLimiter        thrpt   90   6868.653 ±  146.372  ops/ms
RaterLimiterBenchmark.thread_10   StampLockInstantArrayRateLimiter     thrpt   90   8189.747 ±  335.517  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedLongArrayRateLimiter     thrpt   90   6643.004 ±  103.568  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedInstantArrayRateLimiter  thrpt   90   5252.975 ±  190.363  ops/ms
RaterLimiterBenchmark.thread_100  StampLockLongArrayRateLimiter        thrpt   90   7352.890 ± 2109.446  ops/ms
RaterLimiterBenchmark.thread_100  StampLockInstantArrayRateLimiter     thrpt   90   8675.814 ±  922.653  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedLongArrayRateLimiter     thrpt   90   6509.368 ±  157.212  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedInstantArrayRateLimiter  thrpt   90   5042.867 ±  192.971  ops/ms

With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA, -XX:+UseLWPSynchronization

Benchmark                         (rateLimiterType)                    Mode   Cnt      Score      Error  Units
RaterLimiterBenchmark.thread_1    StampLockLongArrayRateLimiter        thrpt   90  11383.198 ±  353.921  ops/ms
RaterLimiterBenchmark.thread_1    StampLockInstantArrayRateLimiter     thrpt   90  11666.918 ±  842.426  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedLongArrayRateLimiter     thrpt   90  15696.852 ±  371.078  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedInstantArrayRateLimiter  thrpt   90  15357.617 ±  650.846  ops/ms
RaterLimiterBenchmark.thread_10   StampLockLongArrayRateLimiter        thrpt   90   6937.050 ±  130.727  ops/ms
RaterLimiterBenchmark.thread_10   StampLockInstantArrayRateLimiter     thrpt   90   8268.909 ±  291.471  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedLongArrayRateLimiter     thrpt   90   9134.319 ± 1208.998  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedInstantArrayRateLimiter  thrpt   90   5294.341 ±  225.995  ops/ms
RaterLimiterBenchmark.thread_100  StampLockLongArrayRateLimiter        thrpt   90   8453.825 ± 1075.312  ops/ms
RaterLimiterBenchmark.thread_100  StampLockInstantArrayRateLimiter     thrpt   90  16297.921 ±  611.255  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedLongArrayRateLimiter     thrpt   90  12536.378 ±  974.951  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedInstantArrayRateLimiter  thrpt   90   9051.560 ± 1303.856  ops/ms

Here are the implementations of StampLockLongArrayRateLimiter and SynchronizedLongArrayRateLimiter:

package one.williamwong.ratelimiter;

import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.locks.StampedLock;

public class StampLockLongArrayRateLimiter implements IRateLimiter {

    private final long duration;
    private final long[] records;
    private final StampedLock lock;
    private int pointer;

    public StampLockLongArrayRateLimiter(int maxInvokes, Duration duration) {
        this.duration = duration.toNanos();
        this.records = new long[maxInvokes];
        this.lock = new StampedLock();
        this.pointer = 0;
    }

    @Override public void acquire() {
        final long stamp = lock.writeLock();
        try {
            final long now = System.nanoTime();
            if (records[pointer] != 0) {
                final long awayFromHead = now - records[pointer];
                if (awayFromHead < duration) {
                    handleExcessLimit(records.length, Duration.ofNanos(awayFromHead));
                }
            }
            records[pointer] = now;
            pointer = (pointer + 1) % records.length;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    @Override public void reset() {
        final long stamp = lock.writeLock();
        try {
            Arrays.fill(records, 0);
            this.pointer = 0;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

}

package one.williamwong.ratelimiter;

import java.time.Duration;
import java.util.Arrays;

public class SynchronizedLongArrayRateLimiter implements IRateLimiter {

    private final long duration;
    private final long[] records;
    private final Object lock;
    private int pointer;

    public SynchronizedLongArrayRateLimiter(int maxInvokes, Duration duration) {
        this.duration = duration.toNanos();
        this.records = new long[maxInvokes];
        this.lock = new Object();
        this.pointer = 0;
    }

    @Override
    public void acquire() {
        synchronized (lock) {
            final long now = System.nanoTime();
            if (records[pointer] != 0) {
                final long awayFromHead = now - records[pointer];
                if (awayFromHead < duration) {
                    handleExcessLimit(records.length, Duration.ofNanos(awayFromHead));
                }
            }
            records[pointer] = now;
            pointer = (pointer + 1) % records.length;
        }
    }

    @Override public void reset() {
        synchronized (lock) {
            Arrays.fill(records, 0);
            this.pointer = 0;
        }
    }

}
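Both classes depend on `IRateLimiter` and `handleExcessLimit`, which are not shown in the question. The following is a minimal, self-contained sketch of how they might fit together. Everything in it is an assumption made for illustration: the interface body, the choice to throw `IllegalStateException` when the limit is exceeded, and the `RingBufferRateLimiter` class with its injectable clock (used here only to make the check deterministic). The real definitions live in the linked repository and may differ.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.function.LongSupplier;

// Small demo driver; not part of the original project.
final class RateLimiterSketch {
    public static void main(String[] args) {
        final long[] fakeNanos = {1}; // start at 1 because 0 marks an empty slot
        IRateLimiter limiter =
            new RingBufferRateLimiter(2, Duration.ofNanos(100), () -> fakeNanos[0]);
        limiter.acquire();            // first call at t = 1
        fakeNanos[0] = 10;
        limiter.acquire();            // second call at t = 10
        fakeNanos[0] = 50;
        try {
            limiter.acquire();        // third call falls inside the 100 ns window
            System.out.println("third call accepted");
        } catch (IllegalStateException e) {
            System.out.println("third call rejected: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for the interface the question's classes implement.
interface IRateLimiter {
    void acquire();

    void reset();

    // Assumption: exceeding the limit is signalled by throwing.
    default void handleExcessLimit(int maxInvokes, Duration sinceOldest) {
        throw new IllegalStateException(
            "more than " + maxInvokes + " calls; oldest was only " + sinceOldest + " ago");
    }
}

// Same ring-buffer logic as the classes above, with the clock passed in.
final class RingBufferRateLimiter implements IRateLimiter {
    private final long duration;
    private final long[] records;
    private final LongSupplier clock;
    private int pointer;

    RingBufferRateLimiter(int maxInvokes, Duration duration, LongSupplier clock) {
        this.duration = duration.toNanos();
        this.records = new long[maxInvokes];
        this.clock = clock;
    }

    @Override public synchronized void acquire() {
        final long now = clock.getAsLong();
        if (records[pointer] != 0) {
            // The slot holds the timestamp of the call made maxInvokes calls ago.
            final long awayFromHead = now - records[pointer];
            if (awayFromHead < duration) {
                handleExcessLimit(records.length, Duration.ofNanos(awayFromHead));
            }
        }
        records[pointer] = now;
        pointer = (pointer + 1) % records.length;
    }

    @Override public synchronized void reset() {
        Arrays.fill(records, 0);
        pointer = 0;
    }
}
```

With this assumed `handleExcessLimit`, a rejected call leaves the ring buffer untouched because the exception is thrown before the slot is overwritten.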

Thanks for the comments. I re-ran my benchmark with different settings. When I run JMH again with 1 second per iteration, I get the following results:

With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA, -XX:-UseLWPSynchronization

Benchmark                         (rateLimiterType)                    Mode   Cnt      Score      Error  Units
RaterLimiterBenchmark.thread_1    StampLockLongArrayRateLimiter        thrpt   90  23573.282 ±  364.739  ops/ms
RaterLimiterBenchmark.thread_1    StampLockInstantArrayRateLimiter     thrpt   90  23062.260 ± 1035.395  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedLongArrayRateLimiter     thrpt   90  34667.411 ±  246.003  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedInstantArrayRateLimiter  thrpt   90  36426.369 ± 1248.360  ops/ms
RaterLimiterBenchmark.thread_10   StampLockLongArrayRateLimiter        thrpt   90  13592.158 ±   76.319  ops/ms
RaterLimiterBenchmark.thread_10   StampLockInstantArrayRateLimiter     thrpt   90  14564.306 ±  474.613  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedLongArrayRateLimiter     thrpt   90  13524.610 ±  155.850  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedInstantArrayRateLimiter  thrpt   90  13080.967 ±  309.736  ops/ms
RaterLimiterBenchmark.thread_100  StampLockLongArrayRateLimiter        thrpt   90  13224.529 ±  459.035  ops/ms
RaterLimiterBenchmark.thread_100  StampLockInstantArrayRateLimiter     thrpt   90  13890.278 ±  456.182  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedLongArrayRateLimiter     thrpt   90  12672.925 ±  314.118  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedInstantArrayRateLimiter  thrpt   90  12245.120 ±  296.395  ops/ms

With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA, -XX:+UseLWPSynchronization

Benchmark                         (rateLimiterType)                    Mode   Cnt      Score      Error  Units
RaterLimiterBenchmark.thread_1    StampLockLongArrayRateLimiter        thrpt   90  24842.514 ±  372.521  ops/ms
RaterLimiterBenchmark.thread_1    StampLockInstantArrayRateLimiter     thrpt   90  24327.864 ±  322.659  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedLongArrayRateLimiter     thrpt   90  34490.411 ±  330.288  ops/ms
RaterLimiterBenchmark.thread_1    SynchronizedInstantArrayRateLimiter  thrpt   90  38383.257 ±  654.269  ops/ms
RaterLimiterBenchmark.thread_10   StampLockLongArrayRateLimiter        thrpt   90  13536.284 ±   74.613  ops/ms
RaterLimiterBenchmark.thread_10   StampLockInstantArrayRateLimiter     thrpt   90  13702.022 ±  289.616  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedLongArrayRateLimiter     thrpt   90  12530.107 ±  243.471  ops/ms
RaterLimiterBenchmark.thread_10   SynchronizedInstantArrayRateLimiter  thrpt   90  10795.833 ±  158.400  ops/ms
RaterLimiterBenchmark.thread_100  StampLockLongArrayRateLimiter        thrpt   90  13204.275 ±  200.937  ops/ms
RaterLimiterBenchmark.thread_100  StampLockInstantArrayRateLimiter     thrpt   90  11606.823 ±  224.213  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedLongArrayRateLimiter     thrpt   90  11504.124 ±  107.543  ops/ms
RaterLimiterBenchmark.thread_100  SynchronizedInstantArrayRateLimiter  thrpt   90  10732.451 ±  118.753  ops/ms

With or without 'UseLWPSynchronization' enabled, I no longer observe a large performance difference. The problem I ran into was caused by an unstable JMH setup.
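To put numbers on that, here is a small sketch that compares the thread_1 scores of StampLockLongArrayRateLimiter from both runs. The class `ThroughputComparison` and its helpers are made up for this post. The overlap check is only a rough heuristic that treats the reported "score ± error" as a plain interval, not a proper statistical test.

```java
import java.util.Locale;

// Hypothetical helper for reading the JMH tables above; not part of the project.
final class ThroughputComparison {

    // Change of the score with the flag enabled, as a fraction of the baseline.
    static double relativeChange(double baseline, double withFlag) {
        return (withFlag - baseline) / baseline;
    }

    // True when the two "score ± error" intervals share at least one point.
    static boolean intervalsOverlap(double scoreA, double errA, double scoreB, double errB) {
        return Math.abs(scoreA - scoreB) <= errA + errB;
    }

    static void report(String label, double base, double baseErr, double flag, double flagErr) {
        System.out.printf(Locale.ROOT, "%s: %+.1f%% (intervals overlap: %b)%n",
            label, 100.0 * relativeChange(base, flag),
            intervalsOverlap(base, baseErr, flag, flagErr));
    }

    public static void main(String[] args) {
        // StampLockLongArrayRateLimiter, thread_1, scores copied from the tables above.
        report("original run", 21487.385, 1082.163, 11383.198, 353.921);
        report("1 s iterations", 23573.282, 364.739, 24842.514, 372.521);
    }
}
```

For this one benchmark the gap goes from roughly 47% lower with the flag in the original run to about 5% higher in the re-run, which fits the conclusion that the first set of numbers mostly reflected measurement noise rather than the flag.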