使用 lru_cache 和 hash 缓存对象实例

Question

我不明白 functools.lru_cache 如何处理对象实例。我假设 class 必须提供 __hash__ 方法。因此任何具有相同哈希值的实例都应该 hit 缓存。

这是我的测试：

from functools import lru_cache

class Query:    
    def __init__(self, id: str):
        self.id = id

    def __hash__(self):
        return hash(self.id)

@lru_cache()
def fetch_item(item):
    return 'data'

o1 = Query(33)
o2 = Query(33)
o3 = 33

assert hash(o1) == hash(o2) == hash(o3)

fetch_item(o1)  # <-- expecting miss
fetch_item(o1)  # <-- expecting hit
fetch_item(o2)  # <-- expecting hit BUT get a miss !
fetch_item(o3)  # <-- expecting hit BUT get a miss !
fetch_item(o3)  # <-- expecting hit

info = fetch_item.cache_info()
print(info)

assert info.hits == 4
assert info.misses == 1
assert info.currsize == 1

如何缓存具有相同散列的对象实例的调用？

Answer 1

即使 lru_cache() 期望它的参数是可散列的，但它不使用它们的实际散列值，因此你会得到那些遗漏。

函数_make_key使得使用 _HashedSeq to make sure all the items it has are hashable, but later on in _lru_cache_wrapper 它不使用哈希值。

（_HashedSeq如果只有一个参数并且是int或str类型则跳过）

class _HashedSeq(list):
    """ This class guarantees that hash() will be called no more than once
        per element.  This is important because the lru_cache() will hash
        the key multiple times on a cache miss.
    """

    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue

fetch_item(o1)  # Stores (o1,) in cache dictionary, but misses and stores (o1,)
fetch_item(o1)  # Finds (o1,) in cache dictionary
fetch_item(o2)  # Looks for (o2,) in cache dictionary, but misses and stores (o2,)
fetch_item(o3)  # Looks for (o3,) in cache dictionary, but misses and stores (33,)

不幸的是，没有提供自定义 make_key 函数的记录方式，因此，实现此目的的一种方法是通过猴子修补 _make_key 函数（在上下文管理器中）：

import functools
from contextlib import contextmanager


def make_key(*args, **kwargs):
    return hash(args[0][0])


def fetch_item(item):
    return 'data'

@contextmanager
def lru_cached_fetch_item():
    try:
        _make_key_og = functools._make_key
        functools._make_key = make_key
        yield functools.lru_cache()(fetch_item)
    finally:
        functools._make_key = _make_key_og


class Query:    
    def __init__(self, id: int):
        self.id = id

    def __hash__(self):
        return hash(self.id)


o1 = Query(33)
o2 = Query(33)
o3 = 33

assert hash(o1) == hash(o2) == hash(o3)

with lru_cached_fetch_item() as func:
    func(o1)  # <-- expecting miss
    func(o1)  # <-- expecting hit
    func(o2)  # <-- expecting hit BUT get a miss !
    func(o3)  # <-- expecting hit BUT get a miss !
    func(o3)  # <-- expecting hit

info = func.cache_info()
print(info) # CacheInfo(hits=4, misses=1, maxsize=128, currsize=1)
assert info.hits == 4
assert info.misses == 1
assert info.currsize == 1

Answer 2

简答：当o1已经在缓存中时，为了在o2上获得缓存命中，class可以定义一个__eq__()方法，比较Query个对象是否具有相等的值。

例如：

def __eq__(self, other):
    return isinstance(other, Query) and self.id == other.id

更新：一个额外的细节值得在总结中提及而不是埋没在细节中：这里描述的行为也适用于 functools.cache 中引入的包装器 Python 3.9，因为 @cache() 只是 @lru_cache(maxsize=None).

的快捷方式

长答案（包括o3）：

关于字典查找的确切机制，有一个很好的解释，因此我不会全部重新创建。可以这么说，由于 LRU 缓存存储为字典，class 对象需要比较相等才能被认为已经存在于缓存中，因为比较字典键的方式。

您可以在一个普通字典的快速示例中看到这一点，其中有两个版本的 class，其中一个使用 __eq__()，另一个不使用：

>>> o1 = Query_with_eq(33)
>>> o2 = Query_with_eq(33)
>>> {o1: 1, o2: 2}
{<__main__.Query_with_eq object at 0x6fffffea9430>: 2}

这导致字典中有一项，因为键是相等的，而：

>>> o1 = Query_without_eq(33)
>>> o2 = Query_without_eq(33)
>>> {o1: 1, o2: 2}
{<__main__.Query_without_eq object at 0x6fffffea9cd0>: 1, <__main__.Query_without_eq object at 0x6fffffea9c70>: 2}

结果为两项（不等键）。

为什么当 Query 对象存在时 int 不会导致缓存命中:

o3 是一个普通的 int 对象。虽然它的值确实比较等于 Query(33)，但假设 Query.__eq__() 正确比较类型，lru_cache 有绕过该比较的优化。

通常，lru_cache 为包装函数的参数创建字典键（作为 tuple）。可选地，如果缓存是使用 typed=True 参数创建的，它还会存储每个参数的类型，因此只有当它们也是相同类型时值才相等。

优化是如果wrapped function只有一个参数，类型为int或str，则直接将单个参数作为字典键，而不是转成一个元组。因此，(Query(33),) 和 33 不比较相等，即使实际上它们存储相同的值。（请注意，我并不是说 int 对象未被缓存，只是它们与非 int 类型的现有值不匹配。从您的示例中，您可以看到 fetch_item(o3) 在第二次调用时命中缓存。

如果参数的类型与 int 不同，您可以获得缓存命中。例如，33.0 将匹配，再次假定 Query.__eq__() 考虑了类型并且 returns True。为此，您可以执行以下操作：

def __eq__(self, other):
    if isinstance(other, Query):
        return self.id == other.id
    else:
        return self.id == other

使用 lru_cache 和 hash 缓存对象实例

Cache object instances with lru_cache and hash

python

caching

functools

使用 lru_cache 和 __hash__ 缓存对象实例

Cache object instances with lru_cache and __hash__

python

caching

functools

使用 lru_cache 和 hash 缓存对象实例

Cache object instances with lru_cache and hash