Why could '-XX:+UseLWPSynchronization' negatively impact StampedLock on Windows?
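I wanted to implement a simple rate limiter in Java to learn how to use JMH, so I created a small GitHub project at 'https://github.com/William1104/rate-limiter'.
Interestingly, when the '-XX:+UseLWPSynchronization' option is enabled, the throughput of some implementations (the ones using StampedLock) suffers. The benchmark was run on a Windows machine, and I expected the option to have no effect on non-Solaris systems. The test results, however, show otherwise. Could someone help me understand what is going on?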
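One way to see how the running JVM treats the option is to ask it directly. The following is only a sketch: it assumes a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`, and the class name `FlagCheck` and method `describeFlag` are made up for illustration. A JVM build that does not know the flag reports it as unrecognized instead of failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

final class FlagCheck {
    // Returns a one-line description of a VM flag as seen by the running JVM.
    static String describeFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return name + " cannot be queried (no HotSpot diagnostic bean available)";
        }
        try {
            VMOption option = bean.getVMOption(name);
            // The origin tells whether the value is the default or was set on the command line.
            return name + " = " + option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when the JVM has no option with that name.
            return name + " is not recognized by this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFlag("UseLWPSynchronization"));
    }
}
```

Running it once with and once without the option on the command line shows whether the value or its origin actually changes on the machine used for the benchmark.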
Here are the test results from my machine for reference:
With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA
Benchmark | (rateLimiterType) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|
RaterLimiterBenchmark.thread_1 | StampLockLongArrayRateLimiter | thrpt | 90 | 21487.385 | ± 1082.163 | ops/ms |
RaterLimiterBenchmark.thread_1 | StampLockInstantArrayRateLimiter | thrpt | 90 | 13162.330 | ± 1585.555 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 15362.934 | ± 227.704 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 17281.675 | ± 2148.057 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockLongArrayRateLimiter | thrpt | 90 | 6868.653 | ± 146.372 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockInstantArrayRateLimiter | thrpt | 90 | 8189.747 | ± 335.517 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 6643.004 | ± 103.568 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 5252.975 | ± 190.363 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockLongArrayRateLimiter | thrpt | 90 | 7352.890 | ± 2109.446 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockInstantArrayRateLimiter | thrpt | 90 | 8675.814 | ± 922.653 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 6509.368 | ± 157.212 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 5042.867 | ± 192.971 | ops/ms |
With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA, -XX:+UseLWPSynchronization
Benchmark | (rateLimiterType) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|
RaterLimiterBenchmark.thread_1 | StampLockLongArrayRateLimiter | thrpt | 90 | 11383.198 | ± 353.921 | ops/ms |
RaterLimiterBenchmark.thread_1 | StampLockInstantArrayRateLimiter | thrpt | 90 | 11666.918 | ± 842.426 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 15696.852 | ± 371.078 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 15357.617 | ± 650.846 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockLongArrayRateLimiter | thrpt | 90 | 6937.050 | ± 130.727 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockInstantArrayRateLimiter | thrpt | 90 | 8268.909 | ± 291.471 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 9134.319 | ± 1208.998 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 5294.341 | ± 225.995 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockLongArrayRateLimiter | thrpt | 90 | 8453.825 | ± 1075.312 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockInstantArrayRateLimiter | thrpt | 90 | 16297.921 | ± 611.255 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 12536.378 | ± 974.951 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 9051.560 | ± 1303.856 | ops/ms |
Here are the implementations of StampLockLongArrayRateLimiter and SynchronizedLongArrayRateLimiter:
    package one.williamwong.ratelimiter;

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.concurrent.locks.StampedLock;

    public class StampLockLongArrayRateLimiter implements IRateLimiter {

        private final long duration;   // window length in nanoseconds
        private final long[] records;  // ring buffer holding the last maxInvokes timestamps
        private final StampedLock lock;
        private int pointer;           // index of the oldest record, overwritten next

        public StampLockLongArrayRateLimiter(int maxInvokes, Duration duration) {
            this.duration = duration.toNanos();
            this.records = new long[maxInvokes];
            this.lock = new StampedLock();
            this.pointer = 0;
        }

        @Override
        public void acquire() {
            final long stamp = lock.writeLock();
            try {
                final long now = System.nanoTime();
                // A zero entry means the slot has not been used yet.
                if (records[pointer] != 0) {
                    final long awayFromHead = now - records[pointer];
                    // The oldest of the last maxInvokes calls is still inside the window.
                    if (awayFromHead < duration) {
                        // handleExcessLimit is not defined in this class (not shown here).
                        handleExcessLimit(records.length, Duration.ofNanos(awayFromHead));
                    }
                }
                records[pointer] = now;
                pointer = (pointer + 1) % records.length;
            } finally {
                lock.unlockWrite(stamp);
            }
        }

        @Override
        public void reset() {
            final long stamp = lock.writeLock();
            try {
                Arrays.fill(records, 0);
                this.pointer = 0;
            } finally {
                lock.unlockWrite(stamp);
            }
        }
    }

    package one.williamwong.ratelimiter;

    import java.time.Duration;
    import java.util.Arrays;

    public class SynchronizedLongArrayRateLimiter implements IRateLimiter {

        private final long duration;   // window length in nanoseconds
        private final long[] records;  // ring buffer holding the last maxInvokes timestamps
        private final Object lock;     // monitor guarding records and pointer
        private int pointer;           // index of the oldest record, overwritten next

        public SynchronizedLongArrayRateLimiter(int maxInvokes, Duration duration) {
            this.duration = duration.toNanos();
            this.records = new long[maxInvokes];
            this.lock = new Object();
            this.pointer = 0;
        }

        @Override
        public void acquire() {
            synchronized (lock) {
                final long now = System.nanoTime();
                // A zero entry means the slot has not been used yet.
                if (records[pointer] != 0) {
                    final long awayFromHead = now - records[pointer];
                    // The oldest of the last maxInvokes calls is still inside the window.
                    if (awayFromHead < duration) {
                        // handleExcessLimit is not defined in this class (not shown here).
                        handleExcessLimit(records.length, Duration.ofNanos(awayFromHead));
                    }
                }
                records[pointer] = now;
                pointer = (pointer + 1) % records.length;
            }
        }

        @Override
        public void reset() {
            synchronized (lock) {
                Arrays.fill(records, 0);
                this.pointer = 0;
            }
        }
    }
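
For anyone who wants to try the ring-buffer logic outside the project, here is a minimal self-contained sketch. It is not the project's code: `SketchRateLimiter` is a hypothetical stand-in for `IRateLimiter`, whose definition is not shown in this post, and the assumption that `handleExcessLimit` rejects the call by throwing is mine.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.locks.StampedLock;

// Hypothetical stand-in for the project's IRateLimiter (the real interface is not shown).
interface SketchRateLimiter {
    void acquire();

    void reset();

    // Assumed behavior: reject the call by throwing.
    default void handleExcessLimit(int maxInvokes, Duration sinceOldest) {
        throw new IllegalStateException("more than " + maxInvokes
                + " calls in the window; oldest call was " + sinceOldest + " ago");
    }
}

final class SketchStampedLockRateLimiter implements SketchRateLimiter {
    private final long durationNanos;
    private final long[] records;   // ring buffer of the last maxInvokes timestamps
    private final StampedLock lock = new StampedLock();
    private int pointer;            // index of the oldest record

    SketchStampedLockRateLimiter(int maxInvokes, Duration duration) {
        this.durationNanos = duration.toNanos();
        this.records = new long[maxInvokes];
    }

    @Override
    public void acquire() {
        final long stamp = lock.writeLock();
        try {
            final long now = System.nanoTime();
            if (records[pointer] != 0 && now - records[pointer] < durationNanos) {
                // Throws in this sketch, so the rejected call is not recorded.
                handleExcessLimit(records.length, Duration.ofNanos(now - records[pointer]));
            }
            records[pointer] = now;
            pointer = (pointer + 1) % records.length;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    @Override
    public void reset() {
        final long stamp = lock.writeLock();
        try {
            Arrays.fill(records, 0);
            pointer = 0;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}

final class RateLimiterDemo {
    public static void main(String[] args) {
        SketchRateLimiter limiter = new SketchStampedLockRateLimiter(2, Duration.ofSeconds(10));
        limiter.acquire();
        limiter.acquire();
        try {
            limiter.acquire();  // third call inside the 10 s window
            System.out.println("third call allowed");
        } catch (IllegalStateException e) {
            System.out.println("third call rejected");
        }
        limiter.reset();
        limiter.acquire();
        System.out.println("call after reset allowed");
    }
}
```

Because every call takes the exclusive write lock, the StampedLock here behaves like a plain mutex, which is why it is compared directly against the `synchronized` variant.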
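Thanks for your comments. I re-ran my benchmark with different settings. Running JMH again with 1 second per iteration, I got the following results:
With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA, -XX:-UseLWPSynchronization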
Benchmark | (rateLimiterType) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|
RaterLimiterBenchmark.thread_1 | StampLockLongArrayRateLimiter | thrpt | 90 | 23573.282 | ± 364.739 | ops/ms |
RaterLimiterBenchmark.thread_1 | StampLockInstantArrayRateLimiter | thrpt | 90 | 23062.260 | ± 1035.395 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 34667.411 | ± 246.003 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 36426.369 | ± 1248.360 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockLongArrayRateLimiter | thrpt | 90 | 13592.158 | ± 76.319 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockInstantArrayRateLimiter | thrpt | 90 | 14564.306 | ± 474.613 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 13524.610 | ± 155.850 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 13080.967 | ± 309.736 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockLongArrayRateLimiter | thrpt | 90 | 13224.529 | ± 459.035 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockInstantArrayRateLimiter | thrpt | 90 | 13890.278 | ± 456.182 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 12672.925 | ± 314.118 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 12245.120 | ± 296.395 | ops/ms |
With options: -server, -XX:+UnlockDiagnosticVMOptions, -XX:+UseNUMA, -XX:+UseLWPSynchronization
Benchmark | (rateLimiterType) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|
RaterLimiterBenchmark.thread_1 | StampLockLongArrayRateLimiter | thrpt | 90 | 24842.514 | ± 372.521 | ops/ms |
RaterLimiterBenchmark.thread_1 | StampLockInstantArrayRateLimiter | thrpt | 90 | 24327.864 | ± 322.659 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 34490.411 | ± 330.288 | ops/ms |
RaterLimiterBenchmark.thread_1 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 38383.257 | ± 654.269 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockLongArrayRateLimiter | thrpt | 90 | 13536.284 | ± 74.613 | ops/ms |
RaterLimiterBenchmark.thread_10 | StampLockInstantArrayRateLimiter | thrpt | 90 | 13702.022 | ± 289.616 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 12530.107 | ± 243.471 | ops/ms |
RaterLimiterBenchmark.thread_10 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 10795.833 | ± 158.400 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockLongArrayRateLimiter | thrpt | 90 | 13204.275 | ± 200.937 | ops/ms |
RaterLimiterBenchmark.thread_100 | StampLockInstantArrayRateLimiter | thrpt | 90 | 11606.823 | ± 224.213 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedLongArrayRateLimiter | thrpt | 90 | 11504.124 | ± 107.543 | ops/ms |
RaterLimiterBenchmark.thread_100 | SynchronizedInstantArrayRateLimiter | thrpt | 90 | 10732.451 | ± 118.753 | ops/ms |
With or without 'UseLWPSynchronization' enabled, I did not observe a large performance difference. The problem I ran into was caused by an unstable JMH setup.
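
To put numbers on that, the relative change for one benchmark can be computed from the scores reported in the tables. The sketch below does this for StampLockLongArrayRateLimiter with a single thread; the class name `ScoreComparison` and the helper `relativeChange` are illustrative only.

```java
final class ScoreComparison {
    // Relative change of candidate with respect to baseline; -0.5 means half the throughput.
    static double relativeChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Scores in ops/ms for RaterLimiterBenchmark.thread_1, StampLockLongArrayRateLimiter:
        // baseline is the run without +UseLWPSynchronization, candidate the run with it.
        double firstRun = relativeChange(21487.385, 11383.198);
        double secondRun = relativeChange(23573.282, 24842.514);

        System.out.printf("first run:  %+.1f%%%n", firstRun * 100);
        System.out.printf("second run: %+.1f%%%n", secondRun * 100);
    }
}
```

In the first run the score with the option is roughly 47% lower, while in the re-run with 1 second per iteration it is roughly 5% higher, which fits the conclusion that the original gap came from the measurement setup and not from the option.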