此基准比较 Java 8u40 和 8u31 以乘以 BigInteger 值是否正确?

Is this benchmark comparing Java 8u40 and 8u31 for multiplying BigInteger values correct?

我看到 JDK-8055494 : Add C2 x86 intrinsic for BigInteger::multiplyToLen() method 已修复 8u40 更新,但我没想到它会在 2-3 倍内加速 BigInteger 乘法。

以下是在两个 JVM 更新上通过不同公式计算阶乘的基准测试结果:

8u31

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      13.994 ±     0.175  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.202 ±     0.054  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      12.066 ±     8.011  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.253 ±     0.055  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      18.255 ±     2.656  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.286 ±     0.063  ops/ms

8u40

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      33.704 ±     0.445  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.428 ±     0.199  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      38.170 ±     0.433  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.557 ±     0.030  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      46.447 ±    11.582  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.586 ±     0.154  ops/ms

我是否应该期待这个 JDK 改进导致 运行 my code 更快,或者这是另一个随意的基准?

编辑

如何验证我的发现以证明这种加速是由于 C2 固有的?

测试函数代码:

@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
public class JavaFactorial {
    @Param({"10", "100", "1000", "10000"})
    public int n;

    private static ForkJoinPool pool = new ForkJoinPool();

    @Benchmark
    public BigInteger loop() {
        return n > 20 ? loop(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursion() {
        return n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursionPar() {
        return n > 20 ? recursePar(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger split() {
        return n > 180 ? split(n) : (n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n)));
    }

    private long fastLoop(final int n1, int n2) {
        long p = n1;
        while (n2 > n1) {
            p = p * n2;
            n2--;
        }
        return p;
    }

    private BigInteger loop(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1++;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        return recursion(nm + 1, n2).multiply(recursion(n1, nm));
    }

    private BigInteger recursePar(final int n1, final int n2) {
        if (n2 - n1 < 700) {
            return recursion(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        RecursiveTask<BigInteger> t = new RecursiveTask<BigInteger>() {
            protected BigInteger compute() {
                return recursePar(nm + 1, n2);
            }
        };
        if (ForkJoinTask.getPool() == pool) {
            t.fork();
        } else {
            pool.execute(t);
        }
        return recursePar(n1, nm).multiply(t.join());
    }

    private BigInteger loop2(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1 += 2;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion2(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop2(n1, n2);
        }
        final int nm = ((n1 + n2) >> 1) | 1;
        return recursion2(nm, n2).multiply(recursion2(n1, nm - 2));
    }

    private BigInteger split(int n) {
        int i = 31 - Integer.numberOfLeadingZeros(n), s = -n, o = 1;
        BigInteger p = BigInteger.ONE, r = BigInteger.ONE;
        while (i >= 0) {
            int h = n >> i;
            int o1 = (h - 1) | 1;
            if (o < o1) {
                p = p.multiply(recursion2(o + 2, o1));
                r = r.multiply(p);
            }
            o = o1;
            s += h;
            i--;
        }
        return r.shiftLeft(s);
    }
}

这个基准肯定表明任何大量使用 BigInteger 乘法的代码都会受益。

你问题的真正答案是为你自己的代码编写一个基准测试,运行 它在 8u318u40 下。 希望您的代码更快,因此对您的代码进行基准测试并对其进行测试。如果 JRE 代码或其他人的代码速度更快,则表明您的代码可能会受益,但您必须尝试看看。

在调查了可用的 JVM 选项列表后,我发现添加到 8u40 的选项可以完全切换 on/off 所需的内在。

以下是 -XX:-UseMultiplyToLenIntrinsic 的 8u40 结果与 8u31 的结果非常接近:

    [info] JavaFactorial.recursion              1000  thrpt    5      12.521 ±      3.352  ops/ms
    [info] JavaFactorial.recursion             10000  thrpt    5       0.217 ±      0.069  ops/ms
    [info] JavaFactorial.recursionPar           1000  thrpt    5      14.268 ±      8.319  ops/ms
    [info] JavaFactorial.recursionPar          10000  thrpt    5       0.286 ±      0.015  ops/ms
    [info] JavaFactorial.split                  1000  thrpt    5      18.768 ±      4.321  ops/ms
    [info] JavaFactorial.split                 10000  thrpt    5       0.255 ±      0.076  ops/ms