Saving calculation results for re-use, while managing memory consumption

Question

I am caching values that are slow to calculate but are usually needed several times. I have a dictionary that looks like this:

stored_values = {
    hash1: slow_to_calc_value1,
    hash2: slow_to_calc_value2,
    # And so on x5000
}

I use it like this, so that a value can be fetched quickly if it has already been calculated.

def calculate_value_for_item(item):
    item_hash = hash_item(item) # Hash the item, used as the dictionary key
    stored_value = stored_values.get(item_hash, None)
    if stored_value is not None:
        return stored_value
    calculated_value = do_heavy_math(item) # This is slow and I want to avoid it
    # Storing the result for re-use makes me run out of memory at some point
    stored_values[item_hash] = calculated_value
    return calculated_value

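However, if I try to store every value calculated over the whole run of the program, I run out of memory.

How can I efficiently manage the size of the lookup dictionary? It is a reasonable assumption that the values needed most recently are also the ones most likely to be needed in the future.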

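Notes

I have simplified the scenario a lot.
The stored values actually take up a lot of memory. The dictionary itself does not hold many entries, only a few thousand. I can definitely afford some parallel bookkeeping data structures if needed.
The ideal solution would let me keep the n most recently needed values and delete the rest. But any heuristic that comes close enough is good enough.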

Answer 1

Have you tried the @lru_cache decorator? It seems to do exactly what you are asking for.

from functools import lru_cache

store_this_many_values = 5

@lru_cache(maxsize=store_this_many_values)
def calculate_value_for_item(item):
    calculated_value = do_heavy_math(item)
    return calculated_value

@lru_cache also adds new functionality that may help you tune memory and/or performance, such as cache_info:

for i in [1,1,1,2]:
    calculate_value_for_item(i)
print(calculate_value_for_item.cache_info())

>>> CacheInfo(hits=2, misses=2, maxsize=5, currsize=2)
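Note that functools.lru_cache uses the function arguments as the cache key, so they must be hashable. If your items are not (which the separate hash_item step in the question might suggest), a hand-rolled version of the same idea is possible with collections.OrderedDict. The following is only a minimal sketch: the class name LRUStore, its key parameter, and the stand-in do_heavy_math are all made up for illustration and are not part of the question's code.

```python
from collections import OrderedDict

class LRUStore:
    """Hypothetical helper: keeps only the `maxsize` most recently used values."""

    def __init__(self, compute, key=hash, maxsize=128):
        self._compute = compute        # the slow function
        self._key = key                # maps an item to a hashable dictionary key
        self._maxsize = maxsize
        self._values = OrderedDict()   # oldest entry first, newest entry last

    def get(self, item):
        k = self._key(item)
        if k in self._values:
            self._values.move_to_end(k)        # mark as most recently used
            return self._values[k]
        value = self._compute(item)
        self._values[k] = value
        if len(self._values) > self._maxsize:
            self._values.popitem(last=False)   # evict the least recently used entry
        return value

# Stand-in for the real slow function; it records each call so the
# number of actual computations can be inspected.
calls = []
def do_heavy_math(item):
    calls.append(tuple(item))
    return sum(item)

# Assumption for this demo: items are lists, so tuple() serves as the key function.
store = LRUStore(do_heavy_math, key=tuple, maxsize=2)
results = [store.get(i) for i in ([1, 2], [3, 4], [1, 2], [5, 6], [3, 4])]
print(results)     # values returned for each lookup
print(len(calls))  # how many times do_heavy_math actually ran
```

In this demo the second lookup of [1, 2] is served from the store, while [3, 4] is evicted when [5, 6] arrives and has to be recomputed. Testing membership with `in` rather than comparing against None also means a legitimately computed None result would be cached.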


python

algorithm

data-structures
