为什么用于斐波那契计算的简单 Scala tailrec 循环比 Java 循环快 3 倍?
Why simple Scala tailrec loop for fibonacci calculation is faster in 3x times than Java loop?
斯卡拉
代码:
@annotation.tailrec
private def fastLoop(n: Int, a: Long = 0, b: Long = 1): Long =
if (n > 1) fastLoop(n - 1, b, a + b) else b
字节码:
private long fastLoop(int, long, long);
Code:
0: iload_1
1: iconst_1
2: if_icmple 21
5: iload_1
6: iconst_1
7: isub
8: lload 4
10: lload_2
11: lload 4
13: ladd
14: lstore 4
16: lstore_2
17: istore_1
18: goto 0
21: lload 4
23: lreturn
结果是 53879289.462 ± 6289454.961 ops/s
:
https://travis-ci.org/plokhotnyuk/scala-vs-java/jobs/56117116#L2909
Java
代码:
private long fastLoop(int n, long a, long b) {
while (n > 1) {
long c = a + b;
a = b;
b = c;
n--;
}
return b;
}
字节码:
private long fastLoop(int, long, long);
Code:
0: iload_1
1: iconst_1
2: if_icmple 24
5: lload_2
6: lload 4
8: ladd
9: lstore 6
11: lload 4
13: lstore_2
14: lload 6
16: lstore 4
18: iinc 1, -1
21: goto 0
24: lload 4
26: lreturn
结果是 17444340.812 ± 9508030.117 ops/s
:
https://travis-ci.org/plokhotnyuk/scala-vs-java/jobs/56117116#L2881
是的,这取决于环境参数(JDK 版本,CPU 型号和 RAM 频率)和动态状态。但是,为什么在相同环境下,大多数相同的字节码可以在函数参数范围内产生稳定的 2x-3x 差异?
这里是 ops/s 不同函数参数值的数字列表,来自我的带有 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz(最大 3.50GHz)的笔记本,内存 12Gb DDR3-1333,Ubuntu 14.10,Oracle JDK 1.8.0_40-b25 64 位:
[info] Benchmark (n) Mode Cnt Score Error Units
[info] JavaFibonacci.loop 2 thrpt 5 171776163.027 ± 4620419.353 ops/s
[info] JavaFibonacci.loop 4 thrpt 5 144793748.362 ± 25506649.671 ops/s
[info] JavaFibonacci.loop 8 thrpt 5 67271848.598 ± 15133193.309 ops/s
[info] JavaFibonacci.loop 16 thrpt 5 54552795.336 ± 17398924.190 ops/s
[info] JavaFibonacci.loop 32 thrpt 5 41156886.101 ± 12905023.289 ops/s
[info] JavaFibonacci.loop 64 thrpt 5 24407771.671 ± 4614357.030 ops/s
[info] ScalaFibonacci.loop 2 thrpt 5 148926292.076 ± 23673126.125 ops/s
[info] ScalaFibonacci.loop 4 thrpt 5 139184195.527 ± 30616384.925 ops/s
[info] ScalaFibonacci.loop 8 thrpt 5 109050091.514 ± 23506756.224 ops/s
[info] ScalaFibonacci.loop 16 thrpt 5 81290743.288 ± 5214733.740 ops/s
[info] ScalaFibonacci.loop 32 thrpt 5 38937420.431 ± 8324732.107 ops/s
[info] ScalaFibonacci.loop 64 thrpt 5 22641295.988 ± 5961435.507 ops/s
附加问题是 "why values of ops/s are decreasing in non-linear way as above?"
是的,我错了,错过了被测试的方法不仅仅是 fastLoop
调用:
Scala
@Benchmark
def loop(): BigInt =
if (n > 92) loop(n - 91, 4660046610375530309L, 7540113804746346429L)
else fastLoop(n)
Java
@Benchmark
public BigInteger loop() {
return n > 92 ?
loop(n - 91, BigInteger.valueOf(4660046610375530309L), BigInteger.valueOf(7540113804746346429L)) :
BigInteger.valueOf(fastLoop(n, 0, 1));
}
正如 Aleksey 所指出的,很多时间花在了从 Long/long
到 BigInt/BigInteger
的转换上。
我编写了单独的基准测试,仅测试 fastLoop(n, 0, 1)
调用。以下是他们的结果:
[info] JavaFibonacci.fastLoop 2 thrpt 5 338071686.910 ± 66146042.535 ops/s
[info] JavaFibonacci.fastLoop 4 thrpt 5 231066635.073 ± 3702419.585 ops/s
[info] JavaFibonacci.fastLoop 8 thrpt 5 174832245.690 ± 36491363.939 ops/s
[info] JavaFibonacci.fastLoop 16 thrpt 5 95162799.968 ± 16151609.596 ops/s
[info] JavaFibonacci.fastLoop 32 thrpt 5 60197918.766 ± 10662747.434 ops/s
[info] JavaFibonacci.fastLoop 64 thrpt 5 29564087.602 ± 3610164.011 ops/s
[info] ScalaFibonacci.fastLoop 2 thrpt 5 336588218.560 ± 56762496.725 ops/s
[info] ScalaFibonacci.fastLoop 4 thrpt 5 224918874.670 ± 35499107.133 ops/s
[info] ScalaFibonacci.fastLoop 8 thrpt 5 121952667.394 ± 17314931.711 ops/s
[info] ScalaFibonacci.fastLoop 16 thrpt 5 96573968.960 ± 12757890.175 ops/s
[info] ScalaFibonacci.fastLoop 32 thrpt 5 59462408.940 ± 14924369.138 ops/s
[info] ScalaFibonacci.fastLoop 64 thrpt 5 28922994.377 ± 7209467.197 ops/s
我学到的教训:
Scala implicit 会消耗大量性能,同时又容易被忽视;
与 Java 的 BigInteger 相比,Scala 中 BigInt 值的兑现可以加快某些函数的速度。
斯卡拉
代码:
@annotation.tailrec
private def fastLoop(n: Int, a: Long = 0, b: Long = 1): Long =
if (n > 1) fastLoop(n - 1, b, a + b) else b
字节码:
private long fastLoop(int, long, long);
Code:
0: iload_1
1: iconst_1
2: if_icmple 21
5: iload_1
6: iconst_1
7: isub
8: lload 4
10: lload_2
11: lload 4
13: ladd
14: lstore 4
16: lstore_2
17: istore_1
18: goto 0
21: lload 4
23: lreturn
结果是 53879289.462 ± 6289454.961 ops/s
:
https://travis-ci.org/plokhotnyuk/scala-vs-java/jobs/56117116#L2909
Java
代码:
private long fastLoop(int n, long a, long b) {
while (n > 1) {
long c = a + b;
a = b;
b = c;
n--;
}
return b;
}
字节码:
private long fastLoop(int, long, long);
Code:
0: iload_1
1: iconst_1
2: if_icmple 24
5: lload_2
6: lload 4
8: ladd
9: lstore 6
11: lload 4
13: lstore_2
14: lload 6
16: lstore 4
18: iinc 1, -1
21: goto 0
24: lload 4
26: lreturn
结果是 17444340.812 ± 9508030.117 ops/s
:
https://travis-ci.org/plokhotnyuk/scala-vs-java/jobs/56117116#L2881
是的,这取决于环境参数(JDK 版本,CPU 型号和 RAM 频率)和动态状态。但是,为什么在相同环境下,大多数相同的字节码可以在函数参数范围内产生稳定的 2x-3x 差异?
这里是 ops/s 不同函数参数值的数字列表,来自我的带有 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz(最大 3.50GHz)的笔记本,内存 12Gb DDR3-1333,Ubuntu 14.10,Oracle JDK 1.8.0_40-b25 64 位:
[info] Benchmark (n) Mode Cnt Score Error Units
[info] JavaFibonacci.loop 2 thrpt 5 171776163.027 ± 4620419.353 ops/s
[info] JavaFibonacci.loop 4 thrpt 5 144793748.362 ± 25506649.671 ops/s
[info] JavaFibonacci.loop 8 thrpt 5 67271848.598 ± 15133193.309 ops/s
[info] JavaFibonacci.loop 16 thrpt 5 54552795.336 ± 17398924.190 ops/s
[info] JavaFibonacci.loop 32 thrpt 5 41156886.101 ± 12905023.289 ops/s
[info] JavaFibonacci.loop 64 thrpt 5 24407771.671 ± 4614357.030 ops/s
[info] ScalaFibonacci.loop 2 thrpt 5 148926292.076 ± 23673126.125 ops/s
[info] ScalaFibonacci.loop 4 thrpt 5 139184195.527 ± 30616384.925 ops/s
[info] ScalaFibonacci.loop 8 thrpt 5 109050091.514 ± 23506756.224 ops/s
[info] ScalaFibonacci.loop 16 thrpt 5 81290743.288 ± 5214733.740 ops/s
[info] ScalaFibonacci.loop 32 thrpt 5 38937420.431 ± 8324732.107 ops/s
[info] ScalaFibonacci.loop 64 thrpt 5 22641295.988 ± 5961435.507 ops/s
附加问题是 "why values of ops/s are decreasing in non-linear way as above?"
是的,我错了,错过了被测试的方法不仅仅是 fastLoop
调用:
Scala
@Benchmark
def loop(): BigInt =
if (n > 92) loop(n - 91, 4660046610375530309L, 7540113804746346429L)
else fastLoop(n)
Java
@Benchmark
public BigInteger loop() {
return n > 92 ?
loop(n - 91, BigInteger.valueOf(4660046610375530309L), BigInteger.valueOf(7540113804746346429L)) :
BigInteger.valueOf(fastLoop(n, 0, 1));
}
正如 Aleksey 所指出的,很多时间花在了从 Long/long
到 BigInt/BigInteger
的转换上。
我编写了单独的基准测试,仅测试 fastLoop(n, 0, 1)
调用。以下是他们的结果:
[info] JavaFibonacci.fastLoop 2 thrpt 5 338071686.910 ± 66146042.535 ops/s
[info] JavaFibonacci.fastLoop 4 thrpt 5 231066635.073 ± 3702419.585 ops/s
[info] JavaFibonacci.fastLoop 8 thrpt 5 174832245.690 ± 36491363.939 ops/s
[info] JavaFibonacci.fastLoop 16 thrpt 5 95162799.968 ± 16151609.596 ops/s
[info] JavaFibonacci.fastLoop 32 thrpt 5 60197918.766 ± 10662747.434 ops/s
[info] JavaFibonacci.fastLoop 64 thrpt 5 29564087.602 ± 3610164.011 ops/s
[info] ScalaFibonacci.fastLoop 2 thrpt 5 336588218.560 ± 56762496.725 ops/s
[info] ScalaFibonacci.fastLoop 4 thrpt 5 224918874.670 ± 35499107.133 ops/s
[info] ScalaFibonacci.fastLoop 8 thrpt 5 121952667.394 ± 17314931.711 ops/s
[info] ScalaFibonacci.fastLoop 16 thrpt 5 96573968.960 ± 12757890.175 ops/s
[info] ScalaFibonacci.fastLoop 32 thrpt 5 59462408.940 ± 14924369.138 ops/s
[info] ScalaFibonacci.fastLoop 64 thrpt 5 28922994.377 ± 7209467.197 ops/s
我学到的教训:
Scala implicit 会消耗大量性能,同时又容易被忽视;
与 Java 的 BigInteger 相比,Scala 中 BigInt 值的兑现可以加快某些函数的速度。