Python Pebble ProcessPool: how to set max_tasks

Pebble's ProcessPool takes two parameters, max_workers and max_tasks.

https://pythonhosted.org/Pebble/#pools

The description of max_tasks is a bit unclear:

"If max_tasks is a number greater than zero each worker will be restarted after performing an equal amount of tasks."

My question is how I should set max_tasks for the following workload:

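I am running a function that has to be applied to every element of a list of roughly 160 000 elements. The work is fully parallelizable, and my server has 8 cores. Every function call takes about the same time to complete; the slowest calls take at most 3 times as long as the average.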

Thanks.

The max_tasks parameter is similar to maxtasksperchild in multiprocessing.Pool. The Python 2 documentation explains what that parameter is for:
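To see the recycling behaviour directly, the sketch below records which process ID ran each task, with and without maxtasksperchild. It uses the standard-library multiprocessing.Pool rather than Pebble (so it runs without extra packages) and assumes the POSIX-only "fork" start method; the helper names are hypothetical.

```python
import multiprocessing
import os


def worker_pid(_item):
    # Return the PID of the worker process that executed this task.
    return os.getpid()


def distinct_worker_pids(n_tasks, processes, maxtasksperchild):
    # Assumption: the "fork" start method (POSIX only), so this demo can run
    # without an `if __name__ == "__main__":` guard.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes, maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1 makes every item a separate task, so maxtasksperchild
        # counts individual items rather than chunks of items.
        pids = pool.map(worker_pid, range(n_tasks), chunksize=1)
    return set(pids)


# No recycling: the same (at most 2) processes run all 20 tasks.
print(len(distinct_worker_pids(20, 2, None)))
# Recycling after 2 tasks: each process runs at most 2 tasks, so more
# processes are started over the pool's lifetime.
print(len(distinct_worker_pids(20, 2, 2)))
```

With maxtasksperchild=2 every process handles at most 2 of the 20 tasks, so at least 10 processes are started in total (barring PID reuse by the OS), whereas without it the two original workers do everything.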

Worker processes within a Pool typically live for the complete duration of the Pool’s work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.
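Applied to the workload in the question, the main cost of a small max_tasks is extra process start-ups. The back-of-the-envelope model below estimates how many worker processes are started in total. It is a rough sketch, not a model of Pebble's actual scheduler: it assumes tasks are spread evenly over the workers, that max_tasks == 0 means workers are never restarted (as the quoted description implies), and that otherwise each process runs at most max_tasks tasks. The function names are hypothetical.

```python
def ceil_div(a, b):
    # Integer ceiling division without floating point.
    return -(-a // b)


def estimated_processes(n_tasks, max_workers, max_tasks):
    """Rough estimate of the total number of worker processes started.

    Assumptions: tasks are spread evenly over max_workers slots;
    max_tasks == 0 means "never restart"; otherwise each process runs
    at most max_tasks tasks before being replaced.
    """
    if max_tasks == 0:
        # No recycling: only the initial workers are ever started.
        return max_workers
    tasks_per_slot = ceil_div(n_tasks, max_workers)
    # Each worker slot goes through this many processes.
    return max_workers * ceil_div(tasks_per_slot, max_tasks)


print(estimated_processes(160_000, 8, 0))        # 8
print(estimated_processes(160_000, 8, 1_000))    # 160
print(estimated_processes(160_000, 8, 100_000))  # 8
```

Under this model, max_tasks=1000 on 160 000 items and 8 workers means 152 restarts on top of the 8 initial processes. Whether that overhead matters depends on how expensive one function call is compared with starting a process.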

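In other words, use max_tasks when you want to limit how much resource growth a worker process can accumulate. This is useful, for example, when you depend on a library that leaks memory or file descriptors. Another use case is capping the memory wasted by fragmentation inside a long-lived process.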
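The leak scenario can be reproduced in a few lines. In the sketch below, a hypothetical leaky_task appends to a module-level list on every call (standing in for a leaking library) and reports the list's length. The standard-library multiprocessing.Pool stands in for Pebble, and the POSIX-only "fork" start method is assumed. Each replacement worker is a fresh copy of the parent, whose list is empty, so the per-process leak stays bounded only when maxtasksperchild is set.

```python
import multiprocessing

# Hypothetical "leaky library" state: grows on every call, never released.
_LEAKED = []


def leaky_task(_item):
    _LEAKED.append(bytearray(1024))  # 1 KiB that is never freed
    # How many leaked buffers the process running this task now holds.
    return len(_LEAKED)


def max_leak_per_process(n_tasks, maxtasksperchild):
    # Assumption: the "fork" start method (POSIX only), so this demo can run
    # without an `if __name__ == "__main__":` guard.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=2, maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1: each item is one task for maxtasksperchild purposes.
        leaked_counts = pool.map(leaky_task, range(n_tasks), chunksize=1)
    return max(leaked_counts)


print(max_leak_per_process(20, None))  # grows with the number of tasks
print(max_leak_per_process(20, 3))     # never more than 3
```

With maxtasksperchild=None, one of the two workers must run at least 10 of the 20 tasks, so its leak reaches at least 10 buffers. With maxtasksperchild=3, no process lives long enough to hold more than 3.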