StackOverflowError 由 Clojure 上的 memoized Fibonacci 引起

StackOverflowError caused by memoized Fibonacci on Clojure

配置

clojure 1.10.3openjdk 17.0.1

下测试

问题

下面是memoized Fibonacci稍微修改后的版本,一般技巧参考 维基 memoization.

(def fib
  (memoize #(condp = %
              0 (bigdec 0)
              1 1
              (+ (fib (dec %)) (fib (- % 2))))))

(fib 225) ; line 7

我原以为FP中的上述memoized FibonacciClojure一样在精神上相当于命令式DP 例如在Python下面,

def fib(n):
    dp = [0, 1] + [0] * n
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

问题 1

在我的例子中,当斐波那契数提高到 225 时,为什么我实际上会收到以下错误?

Syntax error (WhosebugError) compiling at 7:1

我还在 core.memoize 上尝试了直接替换 memo,当斐波那契数增加到 110[= 时,我得到了同样的错误91=].

追踪

下面我添加了跟踪 naive Fibonaccimemoized Fibonacci

的递归前景
(ns fib.core)

(defn naive-fib [n]
  (condp = n
    0 (bigdec 0)
    1 1
    (+ (naive-fib (dec n)) (naive-fib (- n 2)))))

(def memo-fib
  (memoize #(condp = %
              0 (bigdec 0)
              1 1
              (+ (memo-fib (dec %)) (memo-fib (- % 2))))))

(in-ns 'user)

(require '[clojure.tools.trace :refer [trace-ns]])
(trace-ns 'fib.core)

(fib.core/naive-fib 5)
(println)
(fib.core/memo-fib 5)

naive Fibonacci 中的重叠子问题已被 memoized Fibonacci 清楚地消除。乍一看似乎没有什么可疑的原因导致 WhosebugErrormemoized Fibonacci 的堆栈帧深度与输入数 n[=91 严格线性相关=],宽度限制为1.

TRACE t427: (fib.core/naive-fib 5)
TRACE t428: | (fib.core/naive-fib 4)
TRACE t429: | | (fib.core/naive-fib 3)
TRACE t430: | | | (fib.core/naive-fib 2)
TRACE t431: | | | | (fib.core/naive-fib 1)
TRACE t431: | | | | => 1
TRACE t432: | | | | (fib.core/naive-fib 0)
TRACE t432: | | | | => 0M
TRACE t430: | | | => 1M
TRACE t433: | | | (fib.core/naive-fib 1)
TRACE t433: | | | => 1
TRACE t429: | | => 2M
TRACE t434: | | (fib.core/naive-fib 2)
TRACE t435: | | | (fib.core/naive-fib 1)
TRACE t435: | | | => 1
TRACE t436: | | | (fib.core/naive-fib 0)
TRACE t436: | | | => 0M
TRACE t434: | | => 1M
TRACE t428: | => 3M
TRACE t437: | (fib.core/naive-fib 3)
TRACE t438: | | (fib.core/naive-fib 2)
TRACE t439: | | | (fib.core/naive-fib 1)
TRACE t439: | | | => 1
TRACE t440: | | | (fib.core/naive-fib 0)
TRACE t440: | | | => 0M
TRACE t438: | | => 1M
TRACE t441: | | (fib.core/naive-fib 1)
TRACE t441: | | => 1
TRACE t437: | => 2M
TRACE t427: => 5M

TRACE t446: (fib.core/memo-fib 5)
TRACE t447: | (fib.core/memo-fib 4)
TRACE t448: | | (fib.core/memo-fib 3)
TRACE t449: | | | (fib.core/memo-fib 2)
TRACE t450: | | | | (fib.core/memo-fib 1)
TRACE t450: | | | | => 1
TRACE t451: | | | | (fib.core/memo-fib 0)
TRACE t451: | | | | => 0M
TRACE t449: | | | => 1M
TRACE t452: | | | (fib.core/memo-fib 1)
TRACE t452: | | | => 1
TRACE t448: | | => 2M
TRACE t453: | | (fib.core/memo-fib 2)
TRACE t453: | | => 1M
TRACE t447: | => 3M
TRACE t454: | (fib.core/memo-fib 3)
TRACE t454: | => 2M
TRACE t446: => 5M

问题二

为什么 Clojure 可以在 编译时 断言只有 225 堆栈的深度在我的例子中 memoized Fibonacci 的帧可能会爆炸整个 JVM 堆栈从而完全停止 运行 递归?从下面 memoize 的源代码中,我可以看到启动了一个空的 hashmap 来缓存参数和 returns memoized Fibonacci。上述 hashmap 是否导致断言 WhosebugError?为什么?

(defn memoize
  "Returns a memoized version of a referentially transparent function. The
  memoized version of the function keeps a cache of the mapping from arguments
  to results and, when calls with the same arguments are repeated often, has
  higher performance at the expense of higher memory use."
  {:added "1.0"
   :static true}
  [f]
  (let [mem (atom {})]
    (fn [& args]
      (if-let [e (find @mem args)]
        (val e)
        (let [ret (apply f args)]
          (swap! mem assoc args ret)
          ret)))))

其他 - (为了完整起见但与OP无关)

我们可以利用loop recur来实现类似TCO, or the laziness of iterate implementation.

的效果
(ns fib.core)

(defn tail-fib [n]
  (loop [n n
         x (bigdec 0)
         y 1]
    (condp = n
      0 0
      1 y
      (recur (dec n) y (+ x y)))))

(defn lazy-fib [n]
  (->> n
       (nth (iterate (fn [[x y]]
                       [y (+ x y)])
                     [(bigdec 0) 1]))
       first))

(in-ns 'user)

(require '[clojure.tools.trace :refer [trace-ns]])
(trace-ns 'fib.core)

(fib.core/tail-fib 2000)
(println)
(fib.core/lazy-fib 2000)

跟踪表明他们没有使任何递归调用更有效。

TRACE t471: (fib.core/tail-fib 2000)
TRACE t471: => 4224696333392304878706725602341482782579852840250681098010280137314308584370130707224123599639141511088446087538909603607640194711643596029271983312598737326253555802606991585915229492453904998722256795316982874482472992263901833716778060607011615497886719879858311468870876264597369086722884023654422295243347964480139515349562972087652656069529806499841977448720155612802665404554171717881930324025204312082516817125M

TRACE t476: (fib.core/lazy-fib 2000)
TRACE t476: => 4224696333392304878706725602341482782579852840250681098010280137314308584370130707224123599639141511088446087538909603607640194711643596029271983312598737326253555802606991585915229492453904998722256795316982874482472992263901833716778060607011615497886719879858311468870876264597369086722884023654422295243347964480139515349562972087652656069529806499841977448720155612802665404554171717881930324025204312082516817125M

Memoize returns 函数,所以你必须使用 def:

(def fib
  (memoize #(condp = %
              0 (bigdec 0)
              1 1
              (+ (fib (dec %)) (fib (- % 2))))))

(fib 225)
=> 47068900554068939361891195233676009091941690850M

当缓存为空时,记忆不会影响堆栈。您的 Clojure 代码不等同于 Python 代码,因为 Clojure 版本是递归的,而 Python 版本是迭代的。

不知道为什么要重新回复

(def fib
  (memoize #(condp = %
          0 (bigdec 0)
          1 1
          (+ (fib (dec %)) (fib (- % 2))))))

(fib 136) 给我溢出 (fib 135) 工作正常 然后当我再次执行 (fib 136) 时,我 git 没有错误。 从一个新的 repl 做, (地图 fib(范围 1 10000)) 工作正常

我仔细调查了这个问题,并希望最终弄明白。

首先,让我们先看看 Pythonmemoized Fibonacci 上的一个类似但方便的递归实现,然后我们在 [= 中开始研究它43=]Clojure.

def memo_fib(n):
    def fib(n):
        print(f"ALL: {n}")
        if n not in dp:
            print(f"NEW: {n}")
            if n == 0:
                dp[0] = 0
            elif n == 1:
                dp[1] = 1
            else:
                dp[n - 1], dp[n - 2] = fib(n - 1), fib(n - 2)
                dp[n] = dp[n - 1] + dp[n - 2]
        return dp[n]

    dp = {}
    return fib(n)


memo_fib(5)

我们添加了两个 print 子句来跟踪 memoized Fibonacci 上的递归执行。 ALL 表示实际进入派生调用堆栈的内容,即使它们已经被记忆,而 NEW 表示仅当它们尚未 memoize 时才会按比例显示。显然,总 ALL 步数大于 NEW 步数,这在下面的输出中也很明显,

ALL: 5
NEW: 5
ALL: 4
NEW: 4
ALL: 3
NEW: 3
ALL: 2
NEW: 2
ALL: 1
NEW: 1
ALL: 0
NEW: 0
ALL: 1
ALL: 2
ALL: 3

我们也将其与下面Python中的naive Fibonacci进行比较,

def naive_fib(n):
    print(f"NAIVE: {n}")
    if n == 0:
        return 0
    if n == 1:
        return 1
    return naive_fib(n - 1) + naive_fib(n - 2)


naive_fib(5)

我们可以在下面看到 memoized Fibonacci 确实减少了递归步骤,从而比 naive Fibonacci 更有效地改进了堆栈,这使@Eugene Pakhomov 发布的解释无效:

Memoization does not affect the stack when the cache is empty.

NAIVE: 5
NAIVE: 4
NAIVE: 3
NAIVE: 2
NAIVE: 1
NAIVE: 0
NAIVE: 1
NAIVE: 2
NAIVE: 1
NAIVE: 0
NAIVE: 3
NAIVE: 2
NAIVE: 1
NAIVE: 0
NAIVE: 1

回到 Clojure 中的 memoized Fibonacci,我们在 OP 中能够追踪到的只是 NEW 步骤。要查看 ALL 步骤,我们必须从里到外跟踪 memoize 函数本身。知道怎么做方便的请帮忙

现在是得出结论的时候了 memoize 可以帮助减少重复的递归调用,但它不能完全防止递归调用一次又一次地为一些重叠的子问题生成堆栈帧。 WhosebugError 如果这些子问题迅速增长,就会让您大吃一惊。