Benchmarking in a multi-threaded environment
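I am learning about multithreading and found that Object.hashCode slows down in a multi-threaded environment: computing the default hash code takes over twice as long when running 4 threads as when running 1 thread, for the same number of objects per thread.
But as I understand it, doing this in parallel should take a similar amount of time.
You can change the thread count. Each thread does the same amount of work, so you would expect running 4 threads on my quad-core machine to take roughly the same time as running a single thread.
I am seeing about 2.3 seconds with 4 threads but 0.9 seconds with 1 thread.
What is the gap in my understanding? Please help me understand this behavior.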
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {
    private static final int THREAD_COUNT = 4;
    private static final int ITERATIONS = 20000000;

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        System.err.println(System.currentTimeMillis() - start);
    }

    // Fixed pool of daemon threads, so the JVM can exit once main() returns.
    private final ExecutorService _sevice = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        // Each task allocates ITERATIONS objects and calls hashCode() on every one of them.
        Callable<Void> work = new java.util.concurrent.Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS; i++) {
                    Object object = new Object();
                    object.hashCode();
                }
                return null;
            }
        };
        // The same task is submitted THREAD_COUNT times.
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[THREAD_COUNT];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _sevice.invokeAll(Arrays.asList(allWork));
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}
With a thread count of 4, the output is
~2.3 seconds
With a thread count of 1, the output is
~0.9 seconds
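See Palamino's comment:
You are not measuring hashCode(); you are measuring the instantiation of 20 million objects when running single-threaded, and of 80 million objects when running 4 threads. Move the new Object() logic out of the for loop in the Callable, and then you will be measuring hashCode() – Palamino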
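To make the arithmetic in that comment concrete, here is a minimal sketch. The class WorkloadMath and its method names are hypothetical and exist only for illustration; the numbers are the ITERATIONS and THREAD_COUNT values from the question.

```java
// Hypothetical helper, not part of the question's code: it only does the arithmetic.
final class WorkloadMath {
    // Total loop iterations when every task runs the full loop, as in the question.
    static long totalWhenEachTaskRunsAll(int iterations, int threads) {
        return (long) iterations * threads;
    }

    // Iterations per task when a fixed total is divided among the tasks instead.
    static int perTaskWhenSplit(int iterations, int threads) {
        return iterations / threads;
    }

    public static void main(String[] args) {
        // 4 tasks x 20,000,000 iterations each: four times the single-threaded work.
        System.out.println(totalWhenEachTaskRunsAll(20_000_000, 4)); // prints 80000000
        // Same total as the single-threaded run, split four ways.
        System.out.println(perTaskWhenSplit(20_000_000, 4));         // prints 5000000
    }
}
```

So the 4-thread run in the question does four times the total work of the 1-thread run, which is why the two timings are not directly comparable.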
I created a simple JMH benchmark to test the various cases:
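import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
public class HashCodeBenchmark {
    // A single object shared by all benchmark threads; nothing is allocated in the measured code.
    private final Object object = new Object();

    @Benchmark
    @Threads(1)
    public void singleThread(Blackhole blackhole) {
        // The Blackhole consumes the result so the JIT cannot discard the call.
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(2)
    public void twoThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(4)
    public void fourThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(8)
    public void eightThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }
}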
The results are as follows:
Benchmark                       Mode  Cnt  Score   Error  Units
HashCodeBenchmark.eightThreads  avgt   10  5.710 ± 0.087  ns/op
HashCodeBenchmark.fourThreads   avgt   10  3.603 ± 0.169  ns/op
HashCodeBenchmark.singleThread  avgt   10  3.063 ± 0.011  ns/op
HashCodeBenchmark.twoThreads    avgt   10  3.067 ± 0.034  ns/op
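So we can see that, as long as the number of threads does not exceed the number of cores, the time per hashCode() call stays roughly the same.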
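If you want to check how a given thread count compares with what your own machine offers, a minimal sketch follows. The class CoreCount and the method fitsOnCores are hypothetical names; Runtime.availableProcessors() is the standard JDK call, and note that it reports the processors available to the JVM, which on many machines counts logical (hyper-threaded) processors rather than physical cores.

```java
// Hypothetical helper: compares a thread count with the processors the JVM reports.
final class CoreCount {
    // True when the requested threads do not outnumber the available processors.
    static boolean fitsOnCores(int threads) {
        return threads <= Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        // The thread counts used in the JMH benchmark.
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.println(threads + " thread(s) fit: " + fitsOnCores(threads));
        }
    }
}
```

The output depends on the machine, so none is shown here.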
PS: As @Tom Cools commented, your test measures allocation speed, not the speed of hashCode().
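I see two issues with the code:
- The size of the allWork[] array is equal to ITERATIONS.
- While iterating in the call() method, make sure that each thread gets only its share of the load: ITERATIONS/THREAD_COUNT.
Here is a modified version you can try: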
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {
    private static final int THREAD_COUNT = 1;
    private static final int ITERATIONS = 20000;

    // Allocated once, outside the loop, so the loop no longer measures allocation.
    private final Object object = new Object();

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        System.err.println(System.currentTimeMillis() - start);
    }

    // Fixed pool of daemon threads, so the JVM can exit once main() returns.
    private final ExecutorService _sevice = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        Callable<Void> work = new java.util.concurrent.Callable<Void>() {
            @Override
            public Void call() throws Exception {
                // Each task performs only its share of the calls.
                for (int i = 0; i < ITERATIONS / THREAD_COUNT; i++) {
                    object.hashCode();
                }
                return null;
            }
        };
        // The array is sized to ITERATIONS, so ITERATIONS tasks are queued on the pool.
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[ITERATIONS];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _sevice.invokeAll(Arrays.asList(allWork));
        System.out.println("Futures size : " + futures.size());
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}
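As a variation on the idea of giving each thread its share of the load, here is a minimal sketch that keeps the total number of hashCode() calls fixed and submits one task per thread. It is an illustration under stated assumptions, not the answer's code: the class SplitWorkSketch, the field SHARED and the method sumOfHashCodes are hypothetical names, any remainder of the division is simply dropped, and the hash codes are summed only so that the calls produce a result that is used. A hand-rolled timing like this is still much less reliable than JMH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a fixed total of hashCode() calls is divided among the tasks.
final class SplitWorkSketch {
    // One shared object, allocated once, so no allocation happens inside the loop.
    static final Object SHARED = new Object();

    // Splits totalCalls over `threads` tasks and returns the sum of all hash codes.
    static long sumOfHashCodes(int threads, int totalCalls) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final int perTask = totalCalls / threads; // remainder dropped for simplicity
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    long sum = 0;
                    for (int i = 0; i < perTask; i++) {
                        sum += SHARED.hashCode();
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> future : pool.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int totalCalls = 20_000_000;
        for (int threads : new int[] {1, 2, 4}) {
            long start = System.nanoTime();
            long sum = sumOfHashCodes(threads, totalCalls);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + millis + " ms (sum " + sum + ")");
        }
    }
}
```

Because every run performs the same total number of calls, the timings for different thread counts can be compared directly, which was not the case in the question's version.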