More processes than multiprocessing pool size: when do the extra processes run?

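Suppose I have a pool of size 5 and a worker function that I want to call 10 times. Assume the worker is very CPU-intensive and takes several minutes to complete its task.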

import multiprocessing

pool = multiprocessing.Pool(processes=5)
for i in range(10):
    # pool_worker is the CPU-intensive function described above
    pool.apply_async(pool_worker)

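After the 5th iteration the pool is full. What happens to the remaining worker calls? Are they queued until one of the earlier workers finishes?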

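The documentation is not very explicit about this. In general: as soon as a worker process finishes its task and becomes free, it picks up the next available task (so yes, the remaining calls are queued). If you run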

from multiprocessing import Pool
from time import sleep

def sleeping(i):
    print(f"{i} started")
    sleep(5)
    print(f"{i} ended")

if __name__ == "__main__":
    with Pool(processes=5) as p:
        results = [p.apply_async(sleeping, args=(i,)) for i in range(10)]
        results = [result.get() for result in results]

you will get output like this (the exact order varies between runs):

0 started
1 started
2 started
3 started
4 started
3 ended
0 ended
5 started
6 started
1 ended
7 started
2 ended
8 started
4 ended
9 started
5 ended
6 ended
7 ended
8 ended
9 ended
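
To make the queueing visible without reading interleaved prints, one option is to have each task return the PID of the worker that ran it and the time it started. This is only a sketch: `timed_task` and `run_tasks` are names made up for this illustration, and the 0.5 second sleep stands in for real work.

```python
import os
import time
from multiprocessing import Pool


def timed_task(i):
    """Pretend to work, then report which process ran task i and when it started."""
    started = time.time()
    time.sleep(0.5)  # stand-in for the CPU-intensive work
    return i, os.getpid(), started


def run_tasks(n_tasks=10, pool_size=5):
    """Submit all tasks at once and collect (task, pid, start time) tuples."""
    with Pool(processes=pool_size) as pool:
        pending = [pool.apply_async(timed_task, args=(i,)) for i in range(n_tasks)]
        return [r.get() for r in pending]


if __name__ == "__main__":
    records = run_tasks()
    first_start = min(started for _, _, started in records)
    for i, pid, started in records:
        print(f"task {i}: pid={pid}, started {started - first_start:.1f}s after the first task")
    print("distinct worker processes:", len({pid for _, pid, _ in records}))
```

All 10 calls to `apply_async` return immediately, but no more than 5 distinct PIDs should show up, and the later tasks should report a start time roughly one sleep period after the first ones, because they wait in the pool's task queue until a worker is free.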

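Depending on the framework and how it is configured, it may also be that once a process has completed its share of work it is terminated, a new one is started, and the next available task is taken over by the new process. From the docs: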

Note Worker processes within a Pool typically live for the complete duration of the Pool’s work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.
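
A small experiment can show the effect of `maxtasksperchild` described in that note. The sketch below counts how many distinct worker PIDs handle a batch of tasks; `report_pid` and `count_worker_pids` are hypothetical helper names, and the task itself does nothing but return its PID.

```python
import os
from multiprocessing import Pool


def report_pid(i):
    """Return the PID of the worker process that ran task i."""
    return os.getpid()


def count_worker_pids(n_tasks, pool_size, maxtasksperchild=None):
    """Run n_tasks through a pool and count the distinct worker PIDs seen."""
    with Pool(processes=pool_size, maxtasksperchild=maxtasksperchild) as pool:
        pending = [pool.apply_async(report_pid, args=(i,)) for i in range(n_tasks)]
        pids = [r.get() for r in pending]
    return len(set(pids))


if __name__ == "__main__":
    # Default: workers live as long as the pool, so at most pool_size PIDs appear.
    print("default:", count_worker_pids(10, 2))
    # maxtasksperchild=1: each worker exits after one task and is replaced.
    print("maxtasksperchild=1:", count_worker_pids(10, 2, maxtasksperchild=1))
```

With the default setting the count should not exceed the pool size. With `maxtasksperchild=1` each worker is replaced after a single task, so the count should be larger than the pool size (normally one PID per task, assuming the operating system does not reuse a PID during the run).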