如何对不可哈希列表使用 LRU 缓存？

Question

字符串中字符的编辑距离可以用 lru_cache:

计算

from functools import lru_cache
from itertools import zip_longest

@lru_cache(maxsize=4095)
def ld(s, t):
    """
    Levenshtein distance memoized implementation from Rosetta code:
    https://rosettacode.org/wiki/Levenshtein_distance#Python
    """
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld(s[1:], t[1:])
    l1 = ld(s, t[1:])      # Deletion.
    l2 = ld(s[1:], t)      # Insertion.
    l3 = ld(s[1:], t[1:])  # Substitution.
    return 1 + min(l1, l2, l3)

例如

>>> ld("hello how are the you ?", "how Halo how you are the ?")
13

但是，如果我想跟踪 2 个单词列表之间的距离 而不是字符串中的字符，则没有 lru_cache 它可以工作：

from functools import lru_cache
from itertools import zip_longest

#@lru_cache(maxsize=4095)
def ld(s, t):
    """
    Levenshtein distance memoized implementation from Rosetta code:
    https://rosettacode.org/wiki/Levenshtein_distance#Python
    """
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld(s[1:], t[1:])
    l1 = ld(s, t[1:])      # Deletion.
    l2 = ld(s[1:], t)      # Insertion.
    l3 = ld(s[1:], t[1:])  # Substitution.
    return 1 + min(l1, l2, l3)

用法：

 >>> ld("hello how are the you ?".split(), "how Halo how you are the ?".split())

但是，如果没有 LRU 缓存，它将无法处理更长的单词列表。

并且对于 LRU 缓存，它会抛出一个无法散列的类型错误：

>>> ld("hello how are the you ?".split(), "how Halo how you are the ?".split())

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1de95d00ebae> in <module>
----> 1 ld("hello how are the you ?".split(), "how Halo how you are the ?".split())

TypeError: unhashable type: 'list'

我的问题是：

是否可以使字符串比较列表可散列以便可以使用 lru_cache？
如果无法以某种方式散列单词列表，我应该如何有效地计算距离以找到两个单词列表之间的距离？只能通过动态规划？

Answer 1

由于您不修改列表，而只是切片，因此您可以传递可散列的元组。

s = "hello how are the you ?".split()
t = "how Halo how you are the ?".split()
ld(tuple(s), tuple(t))

否则，您可以通过使用带有额外 space 的循环来避免使用 lru_cached 函数，您可以在其中记住计算

如何对不可哈希列表使用 LRU 缓存？

How to use LRU cache for non-hashable lists?

python

edit-distance

dynamic-programming

levenshtein-distance