Groovy 的 trampoline() 使递归执行速度变慢 - 为什么？

Question

我正在尝试递归：

def fac
//fac = { int curr, res = 1G -> 1 >= curr ? res : fac( curr - 1, res * curr ) }
fac = { int curr, res = 1G -> 1 >= curr ? res : fac.trampoline( curr - 1, res * curr ) }
fac = fac.trampoline()

def rnd = new Random()

long s = System.currentTimeMillis()

100000.times{ fac rnd.nextInt( 40 ) }

println "done in ${System.currentTimeMillis() - s} ms / ${fac(40)}"

如果我这样使用它，我会得到这个：

done in 691 ms

如果我取消注释行 #2 和注释行 #3-4 以删除 trampoline() 和运行它，我得到的数字会明显减少：

done in 335 ms

因此，使用 trampoline 时，递归速度会慢 2 倍。

我错过了什么？

P.S.

如果我运行 Scala 2.12 中的相同示例：

def fac( curr:Int, acc:BigInt = 1 ):BigInt = if( 1 >= curr ) acc else fac( curr - 1, curr * acc )
val s = System.currentTimeMillis
for( ix <- 0 until 100000 ) fac( scala.util.Random.nextInt(40).toInt )

println( s"done in ${System.currentTimeMillis - s} ms" )

它执行得更快一点：

done in 178 ms

更新

将闭包重写为带有注解的方法：

@groovy.transform.TailRecursive
def fac( int curr, res = 1G ) { 1 >= curr ? res : fac( curr - 1, res * curr ) }
// the rest

给予

done in 164 ms

并且是超级coll。尽管如此，我还是想知道trampoline() :)

Answer 1

如文档中所述，Closure.trampoline() 可防止调用堆栈溢出。

Recursive algorithms are often restricted by a physical limit: the maximum stack height. For example, if you call a method that recursively calls itself too deep, you will eventually receive a WhosebugException.

An approach that helps in those situations is by using Closure and its trampoline capability.

Closures are wrapped in a TrampolineClosure. Upon calling, a trampolined Closure will call the original Closure waiting for its result. If the outcome of the call is another instance of a TrampolineClosure, created perhaps as a result to a call to the trampoline() method, the Closure will again be invoked. This repetitive invocation of returned trampolined Closures instances will continue until a value other than a trampolined Closure is returned. That value will become the final result of the trampoline. That way, calls are made serially, rather than filling the stack.

^{Source: http://groovy-lang.org/closures.html#_trampoline}

但是，使用蹦床是有代价的。让我们看一下 JVisualVM 示例。

Non-trampoline 用例

运行没有 trampoline() 的示例我们在 ~441 毫秒内得到结果

done in 441 ms / 815915283247897734345611269596115894272000000000

此执行分配了约 2,927,550 个对象并消耗了大约 100 MB 的内存。

CPU有点小事，除了花时间在main()和run()方法上，它在强制参数上花费了一些周期。

`trampoline()` 用例

蹦床的引入确实改变了很多。首先，与之前的尝试相比，它使执行时间几乎慢了两倍。

done in 856 ms / 815915283247897734345611269596115894272000000000

其次，它分配 ~5,931,470 (!!!) 对象并消耗~221 MB 内存。主要区别在于，在前一种情况下，所有执行都使用了一个 $_main_closure1，而在使用蹦床的情况下 - 每次调用 trampoline() 方法都会创建：

一个新的$_main_closure1对象
用 CurriedClosure<T>
然后用 TrampolineClosure<T>

仅此分配了超过 1,200,000 个对象。

如果说到CPU，它还有更多的事情要做。看看数字：

对 TrampolineClosure<T>.<init>() 的所有调用消耗 199 毫秒
使用蹦床引入了对 PojoeMetaMethodSite$PojoCachedMethodSietNoUnwrap.invoke() 的调用，这总共消耗了额外的 201 毫秒
对 CachedClass.initValue() 的所有调用总共额外消耗 98.8 毫秒
对 ClosureMetaClass$NormalMethodChooser.chooseMethod() 的所有调用总共额外消耗 100 毫秒

这正是为什么在您的案例中引入蹦床会使代码执行速度变慢的原因。

那为什么 `@TailRecursive` 做得更好呢？

简而言之 - @TailRecursive 注释用旧的 while-loop 替换了所有闭包和递归调用。 @TailRecursive 的阶乘函数在字节码级别看起来像这样：

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

package factorial;

import groovy.lang.GroovyObject;
import groovy.lang.MetaClass;
import java.math.BigInteger;
import org.codehaus.groovy.runtime.ScriptBytecodeAdapter;
import org.codehaus.groovy.runtime.dgmimpl.NumberNumberMultiply;
import org.codehaus.groovy.transform.tailrec.GotoRecurHereException;

public class Groovy implements GroovyObject {
    public Groovy() {
        MetaClass var1 = this.$getStaticMetaClass();
        this.metaClass = var1;
    }

    public static BigInteger factorial(int number, BigInteger acc) {
        BigInteger _acc_ = acc;
        int _number_ = number;

        try {
            while(true) {
                try {
                    while(_number_ != 1) {
                        int __number__ = _number_;
                        int var7 = _number_ - 1;
                        _number_ = var7;
                        Number var8 = NumberNumberMultiply.multiply(__number__, _acc_);
                        _acc_ = (BigInteger)ScriptBytecodeAdapter.castToType(var8, BigInteger.class);
                    }

                    BigInteger var4 = _acc_;
                    return var4;
                } catch (GotoRecurHereException var13) {
                    ;
                }
            }
        } finally {
            ;
        }
    }

    public static BigInteger factorial(int number) {
        return factorial(number, (BigInteger)ScriptBytecodeAdapter.castToType(1, BigInteger.class));
    }
}

我前段时间在我的博客上记录了这个用例。如果您想了解更多信息，可以阅读博客post：

https://e.printstacktrace.blog/tail-recursive-methods-in-groovy/

Groovy 的 trampoline() 使递归执行速度变慢 - 为什么？

Groovy's trampoline() makes recursive execution much slower - why?

recursion

groovy

trampolines

Non-trampoline 用例

`trampoline()` 用例

那为什么 `@TailRecursive` 做得更好呢？

Groovy 的 trampoline() 使递归执行速度变慢 - 为什么？

Groovy's trampoline() makes recursive execution much slower - why?

recursion

groovy

trampolines

Non-trampoline 用例

trampoline() 用例

那为什么 @TailRecursive 做得更好呢？

`trampoline()` 用例

那为什么 `@TailRecursive` 做得更好呢？