JMH - Why do I need Blackhole.consumeCPU()?

I am trying to understand why it is sensible to use Blackhole.consumeCPU().

I found some information about Blackhole.consumeCPU() via Google:

Sometimes when we run a benchmark across multiple threads we also want to burn some CPU cycles to simulate CPU busyness when running our code. This can't be a Thread.sleep as we really want to burn CPU. The Blackhole.consumeCPU(long) gives us the capability to do this.

My sample code:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class StringConcatAvgBenchmark {

StringBuilder stringBuilder1;
StringBuilder stringBuilder2;

StringBuffer stringBuffer1;
StringBuffer stringBuffer2;

String string1;
String string2;

/*
 * re-initialize the values before every iteration
 */
@Setup(Level.Iteration)
public void init() {
    stringBuilder1 = new StringBuilder("foo");
    stringBuilder2 = new StringBuilder("bar");

    stringBuffer1 = new StringBuffer("foo");
    stringBuffer2 = new StringBuffer("bar");

    string1 = new String("foo");
    string2 = new String("bar");

}

@Benchmark
@Warmup(iterations = 10)
@Measurement(iterations = 100)
@BenchmarkMode(Mode.AverageTime)
public StringBuilder stringBuilder() {
    // the operation is very thin, so burn some CPU as well
    Blackhole.consumeCPU(100);
    // return the value to avoid dead-code elimination
    return stringBuilder1.append(stringBuilder2);
}

@Benchmark
@Warmup(iterations = 10)
@Measurement(iterations = 100)
@BenchmarkMode(Mode.AverageTime)
public StringBuffer stringBuffer() {
    Blackhole.consumeCPU(100);      
    // to avoid dead code optimization returning the value
    return stringBuffer1.append(stringBuffer2);
}

@Benchmark
@Warmup(iterations = 10)
@Measurement(iterations = 100)
@BenchmarkMode(Mode.AverageTime)
public String stringPlus() {
    Blackhole.consumeCPU(100);      
    return string1 + string2;
}

@Benchmark
@Warmup(iterations = 10)
@Measurement(iterations = 100)
@BenchmarkMode(Mode.AverageTime)
public String stringConcat() {
    Blackhole.consumeCPU(100);      
    // to avoid dead code optimization returning the value
    return string1.concat(string2);
}

public static void main(String[] args) throws RunnerException {

    Options options = new OptionsBuilder()
            .include(StringConcatAvgBenchmark.class.getSimpleName())
            .threads(1).forks(1).shouldFailOnError(true).shouldDoGC(true)
            .jvmArgs("-server").build();
    new Runner(options).run();
}
}

Why are the results of this benchmark supposed to be better with Blackhole.consumeCPU(100)?

EDIT:

Output with Blackhole.consumeCPU(100):

Benchmark                      Mode  Cnt    Score    Error  Units
StringBenchmark.stringBuffer   avgt   10  398,843 ± 38,666  ns/op
StringBenchmark.stringBuilder  avgt   10  387,543 ± 40,087  ns/op
StringBenchmark.stringConcat   avgt   10  410,256 ± 33,194  ns/op
StringBenchmark.stringPlus     avgt   10  386,472 ± 21,704  ns/op

Output without Blackhole.consumeCPU(100):

Benchmark                      Mode  Cnt   Score    Error  Units
StringBenchmark.stringBuffer   avgt   10  51,225 ± 19,254  ns/op
StringBenchmark.stringBuilder  avgt   10  49,548 ±  4,126  ns/op
StringBenchmark.stringConcat   avgt   10  50,373 ±  1,408  ns/op
StringBenchmark.stringPlus     avgt   10  87,942 ±  1,701  ns/op

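My question is why the author of this code uses Blackhole.consumeCPU(100) here.

I think I know why now: the benchmarked operations are too fast.

With Blackhole.consumeCPU(100) you can measure each benchmark better and get more significant results.

Is that right?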

Adding an artificial delay generally does not improve a benchmark.
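The two result tables in the question (roughly 390 ns/op with the call, roughly 50 to 88 ns/op without it) illustrate this. The following is only a small arithmetic sketch over those posted scores. The class and method names are made up for illustration, and treating "score ± error" as a simple interval is a simplifying assumption, not how JMH itself compares results.

```java
// Arithmetic over the scores posted in the question (ns/op).
// DelayOverhead, ratio and overlaps are hypothetical names used for illustration.
class DelayOverhead {

    /** Ratio of the slower score to the faster one. */
    static double ratio(double a, double b) {
        return Math.max(a, b) / Math.min(a, b);
    }

    /** True when the intervals [score - error, score + error] overlap. */
    static boolean overlaps(double s1, double e1, double s2, double e2) {
        return Math.abs(s1 - s2) <= e1 + e2;
    }

    public static void main(String[] args) {
        // stringBuilder vs. stringPlus, without the artificial delay
        System.out.printf("ratio without delay: %.2f%n", ratio(49.548, 87.942));
        System.out.println("overlap without delay: "
                + overlaps(49.548, 4.126, 87.942, 1.701));

        // The same two benchmarks, with Blackhole.consumeCPU(100) added
        System.out.printf("ratio with delay: %.2f%n", ratio(387.543, 386.472));
        System.out.println("overlap with delay: "
                + overlaps(387.543, 40.087, 386.472, 21.704));

        // Rough size of the constant that was added to the stringBuilder score
        System.out.printf("added cost: %.1f ns/op%n", 387.543 - 49.548);
    }
}
```

Without the delay, stringPlus is clearly separated from stringBuilder. With the delay, a constant of about 340 ns is added to both scores, the error bars grow to tens of nanoseconds, and the difference of roughly 38 ns can no longer be distinguished from noise.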

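However, there are cases where the operation you are measuring contends for some resource, and you need a backoff that only consumes CPU and hopefully does nothing else. See, for example, the case in: http://shipilev.net/blog/2014/nanotrusting-nanotime/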
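To make the idea of a CPU-only backoff concrete, here is a minimal sketch in plain Java. It is not a benchmark and it does not use JMH: burnCpu is a hypothetical stand-in for Blackhole.consumeCPU(tokens), and the shared AtomicLong is an assumed example of a contended resource. In a real JMH benchmark you would call Blackhole.consumeCPU directly between the contended operations.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: several threads hit one shared counter and back off
// between hits by burning CPU instead of sleeping.
class BackoffSketch {

    // Hypothetical stand-in for Blackhole.consumeCPU(tokens):
    // pure arithmetic, no sleeping, no shared state.
    static long burnCpu(long tokens) {
        long x = System.nanoTime() | 1;
        for (long i = 0; i < tokens; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    /** Runs the workers and returns the final value of the shared counter. */
    static long run(int threads, int opsPerThread, long backoffTokens) {
        AtomicLong shared = new AtomicLong(); // the contended resource
        AtomicLong sink = new AtomicLong();   // keeps the burned work observable
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < opsPerThread; i++) {
                    shared.incrementAndGet();        // contended operation
                    local ^= burnCpu(backoffTokens); // CPU-only backoff
                }
                sink.addAndGet(local);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000, 100)); // prints 40000
    }
}
```

A larger backoffTokens value spaces out the accesses to the shared counter, so each thread spends proportionally less time contending. That is the kind of knob the linked article varies. The string concatenation methods in the question share nothing between threads, so there is nothing to back off from.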

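The benchmark in the original question is not such a case, so I would speculate that Blackhole.consumeCPU is used there without a good reason, or at least that the reason is not called out specifically in the comments. Don't do that.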