Python: parallelizing any/all statements

Question

I am running some Python programs, and I noticed that the bottleneck is in the line doing the following:

all(foo(s) for s in l)

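What I would like to know is: what is the best way to turn this into a parallel computation? foo(s) is a thread-safe method that inspects s and returns True/False according to some criteria. foo does not modify any data structure.

So the question is:

How to test in parallel if all elements of a list l have property foo, exiting as soon as one element of l does not satisfy foo?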

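EDIT: adding more context. I am not sure what kind of context you are looking for, but in my scenario s is a graph, and foo(s) computes some graph-theoretic invariant (for example the average distance, or something similar).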

Answer 1

Python ships with the multiprocessing module; there is an example implementing a classical reduce algorithm (which could be used to implement all). Generally, you might want to look at the Pool functionality:

The Pool class represents a pool of worker processes. It has methods which allows tasks to be offloaded to the worker processes in a few different ways.

Answer 2

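This depends somewhat on what foo(s) is doing. If it is I/O-bound, i.e. it spends its time waiting on blocking calls, then simply using threads will help. The easiest way is to create a thread pool and use pool.map: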

from multiprocessing.pool import ThreadPool
pool = ThreadPool(10)
all(pool.map(foo, l))

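However, if the function is CPU-bound, i.e. it uses a lot of processor power, threads will not help you, because CPython's global interpreter lock keeps threads from running Python bytecode on several cores at once. Instead, you need to use a multiprocessing pool: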

from multiprocessing import Pool
pool = Pool(4)
all(pool.map(foo, l))

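This uses separate processes instead of threads, which allows multiple CPU cores to be used. However, if your function foo is very fast, the overhead will wipe out any advantage of parallel processing, so you need to test to make sure you get the results you expect.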

See: https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

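EDIT: I was assuming you are using Python 2.7.x. If you are on Python 3, you have more advanced concurrency features available in concurrent.futures, including ThreadPoolExecutor and ProcessPoolExecutor.

I recommend using those for parallel processing, and the asyncio library for I/O-bound problems.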
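For Python 3, one possible sketch of the same early-exit idea with concurrent.futures is shown below. It is not the only way to do it. The names parallel_all_futures and is_positive are hypothetical, and is_positive again stands in for foo.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_all_futures(predicate, items, max_workers=4):
    """Early-exit all() built on concurrent.futures."""
    executor = ThreadPoolExecutor(max_workers=max_workers)
    futures = [executor.submit(predicate, item) for item in items]
    try:
        # as_completed yields each future as soon as it finishes.
        for future in as_completed(futures):
            if not future.result():
                return False
        return True
    finally:
        # cancel() only affects futures that have not started yet;
        # running calls are left to finish in the background.
        for future in futures:
            future.cancel()
        executor.shutdown(wait=False)


def is_positive(n):
    # Hypothetical stand-in for foo(s).
    return n > 0


print(parallel_all_futures(is_positive, [1, 2, 3]))
print(parallel_all_futures(is_positive, [1, -2, 3]))
```

ProcessPoolExecutor exposes the same submit interface, so it can be swapped in for a CPU-bound foo, subject to the usual requirement that the function and its arguments are picklable. An any() counterpart follows the same pattern, returning True on the first truthy result instead.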
