Random data with JMH Java microbenchmark testing floating point printing

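I'm writing a JMH microbenchmark for some floating-point printing code I wrote. I don't care much about exact performance yet, but I want to get the benchmark code right.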

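I want to loop over some randomly generated data, so I made some static data arrays and kept my looping mechanism (increment and mask) as simple as possible. Is this the right approach, or should I be telling JMH more about what's going on with some annotations I'm missing?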

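Also, is it possible to create display groups for the tests instead of just lexicographic ordering? I basically have two groups of tests (one group for each set of random data).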

The full source is at https://github.com/jnordwick/zerog-grisu

Here is the benchmark code:

package zerog.util.grisu;

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

/* 
 * Current JMH bench, similar on small numbers (no fast path code yet)
 * and 40% faster on completely random numbers.
 * 
 * Benchmark                         Mode  Cnt         Score         Error  Units
 * JmhBenchmark.test_lowp_doubleto  thrpt   20  11439027.798 ± 2677191.952  ops/s
 * JmhBenchmark.test_lowp_grisubuf  thrpt   20  11540289.271 ±  237842.768  ops/s
 * JmhBenchmark.test_lowp_grisustr  thrpt   20   5038077.637 ±  754272.267  ops/s
 * 
 * JmhBenchmark.test_rand_doubleto  thrpt   20   1841031.602 ±  219147.330  ops/s
 * JmhBenchmark.test_rand_grisubuf  thrpt   20   2609354.822 ±   57551.153  ops/s
 * JmhBenchmark.test_rand_grisustr  thrpt   20   2078684.828 ±  298474.218  ops/s
 * 
 * This doesn't account for any garbage costs either since the benchmarks
 * aren't generating enough to trigger GC, and Java internally uses per-thread
 * objects to avoid some allocations.
 * 
 * Don't call Grisu.doubleToString() except for testing. I think the extra
 * allocations and copying are killing it. I'll fix that.
 */

public class JmhBenchmark {

    static final int nmask = 1024*1024 - 1;
    static final double[] random_values = new double[nmask + 1];
    static final double[] lowp_values = new double[nmask + 1];

    static final byte[] buffer = new byte[30];
    static final byte[] bresults = new byte[30];

    static int i = 0;
    static final Grisu g = Grisu.fmt;

    static {

        Random r = new Random();
        int[] pows = new int[] { 1, 10, 100, 1000, 10000, 100000, 1000000 };

        for( int i = 0; i < random_values.length; ++i ) {
            random_values[i] = r.nextDouble();
        }

        for(int i = 0; i < lowp_values.length; ++i ) {
            // divide as double; int division would truncate e.g. 1234 / 100 to 12
            lowp_values[i] = (1 + r.nextInt( 10000 )) / (double) pows[r.nextInt( pows.length )];
        }
    }

    @Benchmark
    public String test_rand_doubleto() {
        String s = Double.toString( random_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public String test_lowp_doubleto() {
        String s = Double.toString( lowp_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public String test_rand_grisustr() {
        String s =  g.doubleToString( random_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public String test_lowp_grisustr() {
        String s =  g.doubleToString( lowp_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public byte[] test_rand_grisubuf() {
        g.doubleToBytes( bresults, 0, random_values[i] );
        i = (i + 1) & nmask;
        return bresults;
    }

    @Benchmark
    public byte[] test_lowp_grisubuf() {
        g.doubleToBytes( bresults, 0, lowp_values[i] );
        i = (i + 1) & nmask;
        return bresults;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + JmhBenchmark.class.getSimpleName() + ".*")
                .warmupIterations(20)
                .measurementIterations(20)
                .forks(1)
                .build();

        new Runner(opt).run();
    }
}

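Unfortunately, you are not measuring correctly. Even though you try to add some random control flow, the JVM has plenty of opportunity to optimize your code, because it is fairly predictable. For example: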

String s = Double.toString( random_values[i] );
i = (i + 1) & nmask;
return s;

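random_values is a fixed array in a static final field. Because the increment of i is so straightforward, in the worst case its value can be fully determined, so that s is simply set. i is dynamic, but it does not really escape, while nmask is again deterministic. The JVM could still optimize the code here, but I cannot tell you exactly how without looking at the assembly.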
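To make the predictability concrete: because nmask + 1 is a power of two, (i + 1) & nmask is the same as (i + 1) % (nmask + 1), so the index simply walks 0, 1, 2, ... and wraps to 0. A small stdlib-only check of that equivalence (the class and method names are illustrative, not from the question):

```java
public class MaskIndexDemo {

    // Same mask as in the question: 2^20 - 1
    static final int NMASK = 1024 * 1024 - 1;

    // Advance the index the way the benchmark does: increment, then mask
    static int next(int i) {
        return (i + 1) & NMASK;
    }

    public static void main(String[] args) {
        // The last valid index wraps back to 0
        System.out.println(next(NMASK)); // prints 0

        // For non-negative i, masking matches modulo by the power-of-two array length
        boolean allMatch = true;
        for (int i = 0; i < 3 * (NMASK + 1); i += 12345) {
            if (next(i) != (i + 1) % (NMASK + 1)) {
                allMatch = false;
            }
        }
        System.out.println(allMatch); // prints true
    }
}
```

The sequence is entirely deterministic, which is exactly the kind of information the JIT is free to exploit.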

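Instead, use non-final instance fields for your values, add the @State annotation to your class, and set up your test in a method annotated with @Setup. If you do this, JMH takes measures to properly escape your state, which prevents the JVM from optimizing based on deterministic values.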

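You can only prove a benchmark correct by analyzing its results. The benchmark code itself can only raise red flags that you then have to follow up on. I see these red flags in your code: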

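  1. Relying on static final fields to store state. The contents of these fields are routinely "inlined" into the computation, which partially invalidates your benchmark. JMH only saves you from constant folding of regular fields in @State objects.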

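  2. Using static initializers. Although this has no effect with the current JMH, the intended way is to initialize state in @Setup methods. In your case it also helps you get truly random data points, e.g. if you use @Setup(Level.Iteration) to reinitialize the values before the next test iteration starts.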
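A plain-Java sketch of what such a per-iteration setup method might do. So that it runs without JMH on the classpath, IterationData and its reinit() method are hypothetical stand-ins: in a real benchmark the class would carry @State(Scope.Benchmark) and reinit() would carry @Setup(Level.Iteration). The sketch also divides by (double) when building the low-precision values, because in the question's (1 + r.nextInt( 10000 )) / pows[...] both operands are int, so the result is always a whole number.

```java
import java.util.Random;

// Hypothetical plain-Java stand-in for a JMH @State class, so it runs without JMH.
// In a real benchmark the class would be annotated @State(Scope.Benchmark)
// and reinit() would be annotated @Setup(Level.Iteration).
public class IterationData {

    static final int SIZE = 1024;
    // Same divisors as the pows array in the question
    static final int[] POWS = { 1, 10, 100, 1000, 10000, 100000, 1000000 };

    // Non-final instance fields rather than static final arrays
    double[] randomValues = new double[SIZE];
    double[] lowpValues = new double[SIZE];

    private final Random random = new Random();

    // Refills both arrays with fresh random data
    void reinit() {
        for (int i = 0; i < SIZE; i++) {
            randomValues[i] = random.nextDouble();
            // Cast to double: plain int division would truncate 1234 / 100 to 12
            lowpValues[i] = (1 + random.nextInt(10000)) / (double) POWS[random.nextInt(POWS.length)];
        }
    }

    public static void main(String[] args) {
        IterationData data = new IterationData();
        data.reinit();
        System.out.println(data.randomValues[0] + " " + data.lowpValues[0]);
        data.reinit(); // the next "iteration" typically sees different values
        System.out.println(data.randomValues[0] + " " + data.lowpValues[0]);
    }
}
```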

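As for the general approach, this is one way to implement a safe loop: keep the loop counter outside the method. There is another, arguably safe, way: loop over the array inside the method, but pass each iteration's result to Blackhole.consume.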

I thought it would be helpful to show an implementation based on Aleksey's and Rafael's suggestions.

Key changes:

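  • The same set of random data is provided to all benchmarks. This is done by serializing the data set to a temporary file, passing its path to the setup() method via the @Param mechanism, and then deserializing the data into instance fields.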

  • Each benchmark method runs over the entire data set. We use the operationsPerInvocation feature to get accurate per-operation timings.

  • The results of all operations are consumed via the Blackhole mechanism.
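The first key change can be sketched with the standard library alone. The hypothetical DataSetRoundTrip class writes a double[] to a temp file with ObjectOutputStream and reads it back with ObjectInputStream, which is essentially what the setup() methods in the examples do with the path they receive through @Param. It is a simplified sketch, not the answer's exact code:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;

public class DataSetRoundTrip {

    // Write the data set once; double[] is Serializable, so no wrapper class is needed here
    static Path write(double[] values) throws IOException {
        Path file = Files.createTempFile("DataSetRoundTrip", ".ser");
        file.toFile().deleteOnExit();
        try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(file))) {
            oos.writeObject(values);
        }
        return file;
    }

    // What each benchmark's @Setup method would do with the path it receives via @Param
    static double[] read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(file))) {
            return (double[]) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] original = new Random().doubles(1024).toArray();
        Path file = write(original);
        double[] copy = read(file);
        System.out.println(Arrays.equals(original, copy)); // prints true
    }
}
```

Because the file is written once in main() before the Runner starts, every forked benchmark JVM reads identical data, whereas a static Random initializer would generate a different data set in each fork.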

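I created two examples: one based on the original question, which uses a Serializable data set class directly, and another that tests everyone's favorite non-serializable class, Optional.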
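Since Optional is not Serializable, the second example persists the raw List<Integer> (with nulls) and rebuilds the Optionals after reading it back. The conversion step on its own, as a minimal sketch (OptionalDataSet and toOptionals are illustrative names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class OptionalDataSet {

    // Wrap each deserialized value; null becomes Optional.empty()
    @SuppressWarnings("unchecked")
    static Optional<Integer>[] toOptionals(List<Integer> raw) {
        return raw.stream()
                .map(Optional::ofNullable)
                .toArray(Optional[]::new);
    }

    public static void main(String[] args) {
        Optional<Integer>[] opts = toOptionals(Arrays.asList(1, null, 3));
        System.out.println(Arrays.toString(opts)); // prints [Optional[1], Optional.empty, Optional[3]]
    }
}
```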

If Aleksey or Rafael (or anyone) has any suggestions, they would be much appreciated.

Serializable data set:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

/**
 * In this example each benchmark loops over the entire randomly generated data set.
 * The same data set is used for all benchmarks.
 * And we black hole the results.
 */
@SuppressWarnings("javadoc")
@State(Scope.Benchmark)
public class JmhBenchmark {

    static final int DATA_SET_SAMPLE_SIZE = 1024 * 1024;

    static final Random RANDOM = new Random();

    static final Grisu g = Grisu.fmt;

    double[] random_values;

    double[] lowp_values;

    byte[] bresults;

    @Param("dataSetFilename")
    String dataSetFilename;

    @Setup
    public void setup() throws FileNotFoundException, IOException, ClassNotFoundException {

        try (FileInputStream fis = new FileInputStream(new File(this.dataSetFilename));
                ObjectInputStream ois = new ObjectInputStream(fis)) {

            final DataSet dataSet = (DataSet) ois.readObject();

            this.random_values = dataSet.random_values;
            this.lowp_values = dataSet.lowp_values;
        }

        this.bresults = new byte[30];
    }

    @Benchmark
    public void test_rand_doubleto(final Blackhole bh) {

        for (double random_value : this.random_values) {

            bh.consume(Double.toString(random_value));
        }
    }

    @Benchmark
    public void test_lowp_doubleto(final Blackhole bh) {

        for (double lowp_value : this.lowp_values) {

            bh.consume(Double.toString(lowp_value));
        }
    }

    @Benchmark
    public void test_rand_grisustr(final Blackhole bh) {

        for (double random_value : this.random_values) {

            bh.consume(g.doubleToString(random_value));
        }
    }

    @Benchmark
    public void test_lowp_grisustr(final Blackhole bh) {

        for (double lowp_value : this.lowp_values) {

            bh.consume(g.doubleToString(lowp_value));
        }
    }

    @Benchmark
    public void test_rand_grisubuf(final Blackhole bh) {

        for (double random_value : this.random_values) {

            bh.consume(g.doubleToBytes(this.bresults, 0, random_value));
        }
    }

    @Benchmark
    public void test_lowp_grisubuf(final Blackhole bh) {

        for (double lowp_value : this.lowp_values) {

            bh.consume(g.doubleToBytes(this.bresults, 0, lowp_value));
        }
    }

    /**
     * Serializes an object containing random data. This data will be the same for all benchmarks.
     * We pass the file name via the "dataSetFilename" parameter.
     *
     * @param args the arguments
     */
    public static void main(final String[] args) {

        try {
            // clean up any old runs as data set files can be large
            deleteTmpDirs(JmhBenchmark.class.getSimpleName());

            // create a tempDir for the benchmark
            final Path tempDirPath = createTempDir(JmhBenchmark.class.getSimpleName());

            // create a data set file
            final Path dateSetFilePath = Files.createTempFile(tempDirPath,
                    JmhBenchmark.class.getSimpleName() + "DataSet", ".ser");
            final File dateSetFile = dateSetFilePath.toFile();
            dateSetFile.deleteOnExit();

            // create the data
            final DataSet dataset = new DataSet();

            try (FileOutputStream fos = new FileOutputStream(dateSetFile);
                    ObjectOutputStream oos = new ObjectOutputStream(fos)) {
                oos.writeObject(dataset);
                oos.flush();
                oos.close();
            }

            final Options opt = new OptionsBuilder().include(JmhBenchmark.class.getSimpleName())
                .param("dataSetFilename", dateSetFile.getAbsolutePath())
                .operationsPerInvocation(DATA_SET_SAMPLE_SIZE)
                .mode(org.openjdk.jmh.annotations.Mode.All)
                .timeUnit(TimeUnit.MICROSECONDS)
                .forks(1)
                .build();

            new Runner(opt).run();

        } catch (final Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
            throw new RuntimeException(e);
        }

    }

    static Path createTempDir(String prefix) throws IOException {
        final Path tempDirPath = Files.createTempDirectory(prefix);
        tempDirPath.toFile()
            .deleteOnExit();
        return tempDirPath;
    }

    static void deleteTmpDirs(final String prefix) throws IOException {

        for (Path dir : Files.newDirectoryStream(new File(System.getProperty("java.io.tmpdir")).toPath(),
                prefix + "*")) {
            for (Path toDelete : Files.walk(dir)
                .sorted(Comparator.reverseOrder())
                .toArray(Path[]::new)) {
                Files.delete(toDelete);
            }
        }
    }

    static final class DataSet implements Serializable {

        private static final long serialVersionUID = 2194487667134930491L;

        private static final int[] pows = new int[] { 1, 10, 100, 1000, 10000, 100000, 1000000 };

        final double[] random_values = new double[DATA_SET_SAMPLE_SIZE];

        final double[] lowp_values = new double[DATA_SET_SAMPLE_SIZE];

        DataSet() {

            for (int i = 0; i < DATA_SET_SAMPLE_SIZE; i++) {
                this.random_values[i] = RANDOM.nextDouble();
            }

            for (int i = 0; i < DATA_SET_SAMPLE_SIZE; i++) {
                // divide as double; int division would truncate e.g. 1234 / 100 to 12
                this.lowp_values[i] = (1 + RANDOM.nextInt(10000)) / (double) pows[RANDOM.nextInt(pows.length)];
            }
        }

    }
}

With a non-serializable test object (Optional):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@SuppressWarnings("javadoc")
@State(Scope.Benchmark)
public class NonSerializable {

    static final int DATA_SET_SAMPLE_SIZE = 20000;

    static final Random RANDOM = new Random();

    Optional<Integer>[] optionals;

    @Param("dataSetFilename")
    String dataSetFilename;

    @Setup
    public void setup() throws FileNotFoundException, IOException, ClassNotFoundException {

        try (FileInputStream fis = new FileInputStream(new File(this.dataSetFilename));
                ObjectInputStream ois = new ObjectInputStream(fis)) {

            @SuppressWarnings("unchecked")
            List<Integer> integers = (List<Integer>) ois.readObject();

            // Optional is not Serializable, so wrap the deserialized values here
            this.optionals = integers.stream()
                .map(Optional::ofNullable)
                .toArray(Optional[]::new);
        }

    }

    @Benchmark
    public void mapAndIfPresent(final Blackhole bh) {

        for (int i = 0; i < this.optionals.length; i++) {

            this.optionals[i].map(integer -> integer.toString())
                .ifPresent(bh::consume);
        }
    }

    @Benchmark
    public void explicitGet(final Blackhole bh) {

        for (int i = 0; i < this.optionals.length; i++) {

            final Optional<Integer> optional = this.optionals[i];

            if (optional.isPresent()) {
                bh.consume(optional.get()
                    .toString());
            }
        }
    }

    /**
     * Serializes a list of integers containing random data or null. This data will be the same for all benchmarks.
     * We pass the file name via the "dataSetFilename" parameter.
     *
     * @param args the arguments
     */
    public static void main(final String[] args) {

        try {
            // clean up any old runs as data set files can be large
            deleteTmpDirs(NonSerializable.class.getSimpleName());

            // create a tempDir for the benchmark
            final Path tempDirPath = createTempDir(NonSerializable.class.getSimpleName());

            // create a data set file
            final Path dateSetFilePath = Files.createTempFile(tempDirPath,
                    NonSerializable.class.getSimpleName() + "DataSet", ".ser");
            final File dateSetFile = dateSetFilePath.toFile();
            dateSetFile.deleteOnExit();

            final List<Integer> dataSet = IntStream.range(0, DATA_SET_SAMPLE_SIZE)
                .mapToObj(i -> RANDOM.nextBoolean() ? RANDOM.nextInt() : null)
                .collect(Collectors.toList());

            try (FileOutputStream fos = new FileOutputStream(dateSetFile);
                    ObjectOutputStream oos = new ObjectOutputStream(fos)) {
                oos.writeObject(dataSet);
                oos.flush();
                oos.close();
            }

            final Options opt = new OptionsBuilder().include(NonSerializable.class.getSimpleName())
                .param("dataSetFilename", dateSetFile.getAbsolutePath())
                .operationsPerInvocation(DATA_SET_SAMPLE_SIZE)
                .mode(org.openjdk.jmh.annotations.Mode.All)
                .timeUnit(TimeUnit.MICROSECONDS)
                .forks(1)
                .build();

            new Runner(opt).run();

        } catch (final Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
            throw new RuntimeException(e);
        }

    }

    static Path createTempDir(String prefix) throws IOException {
        final Path tempDirPath = Files.createTempDirectory(prefix);
        tempDirPath.toFile()
            .deleteOnExit();
        return tempDirPath;
    }

    static void deleteTmpDirs(final String prefix) throws IOException {

        for (Path dir : Files.newDirectoryStream(new File(System.getProperty("java.io.tmpdir")).toPath(),
                prefix + "*")) {
            for (Path toDelete : Files.walk(dir)
                .sorted(Comparator.reverseOrder())
                .toArray(Path[]::new)) {
                Files.delete(toDelete);
            }
        }
    }

}