Java microbenchmark to find the average from a list

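I have a file containing different strings (about 100,000, taken from production). I need to find the 99th and 99.9th percentiles of the execution time of a function that processes each string in this file.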

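I tried writing a benchmark with JMH. However, I could only get the required percentiles either for a batch function (which processes the whole file) or for the target function called with one specific string.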

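// The function whose per-string latency percentiles are needed
public String process1(String str) {
    ...process...
}

// Batch variant: processes the whole file in one benchmark operation
public void processBatch(List<String> strings) {
    for (String str : strings) {
        process1(str);
    }
}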

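I also tried passing the whole list of strings via @Param. That makes JMH run dozens of iterations for each string separately, but it does not produce the result I need.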

Is there anything in JMH that can help find these statistics? If not, what tool can be used instead?

Is this what you are looking for?

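import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 1, time = 5, timeUnit = TimeUnit.SECONDS)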
@Measurement(iterations = 1, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class MyBenchmark {

    ClassUnderBenchmark classUnderBenchmark = new ClassUnderBenchmark();

    @State(Scope.Benchmark)
    public static class MyTestState {

        int counter = 0;
        List<String> list = Arrays.asList("aaaaa", "bbbb", "ccc");
        String currentString;

        @Setup(Level.Invocation)
        public void init() throws IOException {
            this.currentString = list.get(counter++);
            if (counter == list.size()) { // wrap around at the end of the list
                counter = 0;
            }
        }
    }

    @Benchmark
    @Threads(1)
    @BenchmarkMode(Mode.SampleTime)
    public String test(MyBenchmark.MyTestState myTestState) {
        // Returning the result lets JMH consume it, so the call cannot be optimized away
        return classUnderBenchmark.toUpper(myTestState.currentString);
    }

    public static class ClassUnderBenchmark {

        Random r = new Random();

        public String toUpper(String name) {
            try {
                Thread.sleep(r.nextInt(100));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return name.toUpperCase();
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .jvmArgs("-XX:+UseG1GC", "-XX:MaxGCPauseMillis=50")
                .build();
        new Runner(opt).run();
    }
}
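
The hard-coded three-element list above stands in for the real input. To cycle through the strings from the question's file instead, the cursor logic can be pulled out into a small plain-Java class. This is only a sketch: `StringCursor` and `fromFile` are hypothetical names, and it assumes the file holds one string per line in UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JMH): hands out the input strings round-robin.
final class StringCursor {

    private final List<String> strings;
    private int index = 0;

    StringCursor(List<String> strings) {
        if (strings.isEmpty()) {
            throw new IllegalArgumentException("no input strings");
        }
        // Defensive copy so later changes to the caller's list do not affect the cursor
        this.strings = new ArrayList<>(strings);
    }

    // Assumption: one input string per line, UTF-8 encoded
    static StringCursor fromFile(Path file) throws IOException {
        return new StringCursor(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Returns the next string and wraps around after the last one
    String next() {
        String s = strings.get(index);
        index = (index + 1) % strings.size();
        return s;
    }
}
```

In the state class, `fromFile` would probably be called once from a `@Setup(Level.Trial)` method and `next()` from the `@Setup(Level.Invocation)` method, so that reading the file is not part of the measured time.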

See the Javadoc (org.openjdk.jmh.annotations.Mode):

/**
 * <p>Sample time: samples the time for each operation.</p>
 *
 * <p>Runs by continuously calling {@link Benchmark} methods,
 * and randomly samples the time needed for the call. This mode automatically adjusts the sampling
 * frequency, but may omit some pauses which missed the sampling measurement. This mode is time-based, and it will
 * run until the iteration time expires.</p>
 */
SampleTime("sample", "Sampling time"),

This test gives you the following output:

Result "test":

  N = 91
  mean =      0,056 ±(99.9%) 0,010 s/op

  Histogram, s/op:
    [0,000, 0,010) = 6 
    [0,010, 0,020) = 9 
    [0,020, 0,030) = 3 
    [0,030, 0,040) = 11 
    [0,040, 0,050) = 8 
    [0,050, 0,060) = 11 
    [0,060, 0,070) = 9 
    [0,070, 0,080) = 9 
    [0,080, 0,090) = 14 

  Percentiles, s/op:
      p(0,0000) =      0,003 s/op
     p(50,0000) =      0,059 s/op
     p(90,0000) =      0,092 s/op
     p(95,0000) =      0,095 s/op
     p(99,0000) =      0,100 s/op
     p(99,9000) =      0,100 s/op
     p(99,9900) =      0,100 s/op
     p(99,9990) =      0,100 s/op
     p(99,9999) =      0,100 s/op
    p(100,0000) =      0,100 s/op


Benchmark           Mode  Cnt  Score   Error  Units
MyBenchmark.test  sample   91  0,056 ± 0,010   s/op
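
Note that every percentile from p(99) upward reports the same 0,100 s/op as the maximum. With only N = 91 samples that is expected, and the sketch below shows why. It uses a nearest-rank definition of a percentile, which is an assumption for illustration (JMH's own computation may interpolate differently), and `NearestRank` is a hypothetical helper, not a JMH class.

```java
import java.util.Arrays;

// Hypothetical helper (not part of JMH): nearest-rank percentiles over recorded samples.
final class NearestRank {

    // 1-based rank of the sorted sample that represents percentile p (0 < p <= 100)
    static int rank(double p, int n) {
        if (n <= 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p <= 100");
        }
        // The small epsilon guards against floating-point noise just above an integer
        return Math.max(1, (int) Math.ceil(p / 100.0 * n - 1e-9));
    }

    // Value at percentile p; the input array is left unmodified
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[rank(p, sorted.length) - 1];
    }

    public static void main(String[] args) {
        int n = 91; // sample count reported by the run above
        for (double p : new double[] {50.0, 90.0, 95.0, 99.0, 99.9}) {
            System.out.println("p" + p + " -> rank " + rank(p, n) + " of " + n);
        }
    }
}
```

Under this definition both p99 and p99.9 map to rank 91 of 91, which is the slowest sample. For p99.9 to differ from the maximum you need on the order of a thousand samples or more, so the measurement time (or the number of iterations) has to be raised well beyond the 5 seconds used here.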