How to properly memoize when using a ProcessPoolExecutor?

Question

I suspect that something like this:

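import concurrent.futures
import sys
from concurrent.futures import ProcessPoolExecutor

@memoize
def foo(arg):
    # Placeholder for the expensive computation on arg.
    return something_expensive

def main():
    with ProcessPoolExecutor(10) as pool:
        # Map each future back to the argument it was submitted with.
        futures = {pool.submit(foo, arg): arg for arg in args}
        for future in concurrent.futures.as_completed(futures):
            arg = futures[future]
            try:
                result = future.result()
            except Exception as e:
                sys.stderr.write("Failed to run foo() on {}\nGot {}\n".format(arg, e))
            else:
                print(result)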

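won't work (assuming @memoize is a typical dict-based cache), because I'm using a multiprocessing pool and the processes share very little state: each worker would fill its own private copy of the dict. At least it doesn't seem to work.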

What is the correct way to memoize in this scenario? Eventually I'd also like to pickle the cache to disk and load it on subsequent runs.

Answer 1

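You can use a Manager.dict from multiprocessing, which uses a Manager to proxy between processes and stores the data in a shared dict that can be pickled. I decided to use multithreading instead, because it's an IO-bound application, and threads sharing a memory space means I don't need all the manager stuff; I can just use a plain dict.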
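For anyone who does need separate processes, here is a minimal sketch of the Manager.dict route, not a definitive implementation. It assumes the cache is passed to the worker explicitly instead of living inside a decorator; `expensive`, `cached_foo`, `load_cache`, `save_cache` and the file name `foo_cache.pickle` are hypothetical names made up for illustration.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Manager
from pathlib import Path

CACHE_FILE = Path("foo_cache.pickle")  # hypothetical file name


def expensive(arg):
    """Stand-in for the real expensive computation (assumption)."""
    return arg * arg


def cached_foo(cache, arg):
    """Return expensive(arg), consulting the shared cache first.

    Two workers can miss on the same key at the same time and both
    compute it; that wastes some work but the stored value is the same.
    """
    try:
        return cache[arg]
    except KeyError:
        result = expensive(arg)
        cache[arg] = result
        return result


def load_cache(path):
    """Load a previously pickled cache, or start empty on the first run."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {}


def save_cache(cache, path):
    """Copy the (possibly proxied) cache into a plain dict and pickle it."""
    with path.open("wb") as fh:
        pickle.dump(dict(cache), fh)


def main(args, path=CACHE_FILE):
    with Manager() as manager:
        # The proxy is picklable, so it can be sent to the workers as an argument.
        cache = manager.dict(load_cache(path))
        with ProcessPoolExecutor(4) as pool:
            futures = {pool.submit(cached_foo, cache, arg): arg for arg in args}
            results = {futures[f]: f.result() for f in as_completed(futures)}
        # Persist while the manager process is still alive.
        save_cache(cache, path)
    return results


if __name__ == "__main__":
    print(main([1, 2, 3, 2, 1]))
```

Every cache lookup is a round trip to the manager process, so this only pays off when the computation is much more expensive than that call. With threads, the same `cached_foo` works unchanged with a plain dict in place of the proxy.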


caching

memoization

python-3.x

python-multiprocessing

process-pool