垃圾收集和字符串到字符串函数的记忆

Garbage collection and the memoization of a string-to-string function

以下练习来自p。 Lua(第 4 版)中 Ierusalimschy 的 Programming 的 234。 (注意:在本书前面,作者明确拒绝memoization这个词,坚持使用memorization 相反。在阅读下面的摘录时请记住这一点。)

Exercise 23.3: Imagine you have to implement a memorizing table for a function from strings to strings. Making the table weak will not do the removal of entries, because weak tables do not consider strings as collectable objects. How can you implement memorization in that case?

我难住了!

我的部分问题是我无法设计出一种方法来实现字符串的(垃圾)收集。

相比之下,对于 table,我可以为其配备终结器,它将在 table 即将被收集时报告。有没有办法确认给定的字符串(并且只有那个字符串)已被垃圾收集?


另一个困难是简单地弄清楚所需函数的规范。我能做的最好的事情就是弄清楚它不是什么。在本书的前面(第 225 页),作者给出了以下“记忆”功能的示例:

Imagine a generic server that takes requests in the form of strings with Lua code. Each time it gets a request, it runs load on the string, and then calls the resulting function. However, load is an expensive function, and some commands to the server may be quite frequent. Instead of calling load repeatedly each time it receives a common command like "closeconnection()", the server can memorize the results from load using an auxiliary table. Before calling load, the server checks in the table whether the given string already has a translation. If it cannot find a match then (and only then) the server calls load and stores the result into the table. We can pack this behavior in a new function:

[standard memo(r)ized implementation omitted; see variant using a weak-value table below]

The savings with this scheme can be huge. However, it may also cause unsuspected waste. ALthough some commands epeat over and over, many other commands happen only once. Gradually, the ["memorizing"] table results accumulates all commands the server has ever received plus their respective codes; after enough time, this behavior will exhaust the server's memory.

A weak table provides a simple solution to this problem. If the results table has weak values, each garbage-collection cycle will remove all translations not in use at that moment (which means virtually all of them)1:

local results = {}
setmetatable(results, {__mode = "v"})  -- make values weak
function mem_loadstring (s)
  local res = results[s]
  if res == nil then                   -- results not available?
    res = assert(load(s))              -- compute new results
    result[s] = res                    -- save for later reuse
  end
  return res
end

正如最初的问题陈述所指出的那样,当函数被 memo(r)ized returns 字符串时,该方案将不起作用,因为垃圾收集器不会将字符串视为“collectable".


当然,如果允许更改所需函数的接口,那么它 returns 一个单例 table 其唯一项是 real 而不是返回字符串 result string,那么问题变得几乎微不足道,但我很难相信作者有这么粗暴的“解决方案”2.

以防万一,我使用的是Lua 5.3.


1 顺便说一句,如果 memo(r)ization 的基本原理是避免不必要地更频繁地调用 load,则该方案作者提出的对我来说没有意义。在我看来,这个方案是基于这样的假设(实际上是一种启发式),即经常使用的翻译,因此会支付给 memo(r)ize,也是一个总是可以访问的(因此不能收集 table).我不明白为什么这必然,甚至可能是这种情况。

2 可以用 __tostring 的方法给这只猪涂口红,这样 table(由 memo(r)ized 函数返回的那个)在某些上下文中伪装成字符串;不过它还是一头猪

你的想法是正确的:将字符串包装成一个table(因为table是collectable)。

function memormoize (func_from_string_to_string)
   local cached = {}
   setmetatable(cached, {__mode = "v"}) 
   return 
      function(s)
         local c = cached[s] or {func_from_string_to_string(s)} 
         cached[s] = c                    
         return c[1]
      end
end

而且我在这个解决方案中没有看到猪:-)

one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.

弱 table 中将没有 "always reachable" 项。
但是最频繁的项目只会在每个 GC 周期重新计算一次。
理想的解决方案(从不收集经常使用的物品)将需要更复杂的实施。
例如,当项目 "inactivity timer" 达到某个阈值时,您可以将项目从普通缓存移至弱缓存。