Python 2.7: "can't start new thread" error from "multiprocessing.Pool"

Question

Here is my situation. The code is almost identical to the example in the docs:

from multiprocessing import Pool
import numpy as np

def grad(x0, y): return 0 # does some computational-heavy work actually

if __name__ == '__main__':

    class UnrollArgs:
        def __init__(self, func):
            self.func = func

        def __call__(self, args):
            return self.func(*args)

    def batch_grad(x0, y, processes=4):
        g = Pool(processes).map(UnrollArgs(grad), [(x0, yi) for yi in y])
        return np.sum([gi for gi in g], axis=0) / len(y)

The y I pass to batch_grad has 50 elements, and Pool.map raises this error:

error: can't start new thread

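From Google I learned that this is usually caused by trying to start too many threads. Maybe it's just me, but I find the documentation for multiprocessing.Pool somewhat incomplete. In particular, I have no idea how to control the number of threads that get started. The term "thread" is not even mentioned in the documentation for the Pool class.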

The integer argument to multiprocessing.Pool is the number of processes to start, not the number of threads.

So how can I fix this?

Update: It may be worth noting that the error does not occur every time I run the code.

Answer 1

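I think the problem stems from spawning many Pools: every call to batch_grad creates a new Pool and never closes it. The error is odd, and I think @ChongMa is right that it has to do with the Python interpreter itself being unable to spawn threads. It sounds like my suggestion in the comments may have worked for you, so I'm reposting it here as an answer.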
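To see why threads are involved at all even though Pool takes a number of processes, here is a minimal sketch. It assumes CPython, where a Pool starts a few internal helper threads in the parent process to manage its workers, hand out tasks, and collect results. The function name helper_threads_started_by_pool is made up for illustration, and the exact count is an implementation detail that may differ between versions.

```python
from multiprocessing import Pool
import threading


def helper_threads_started_by_pool(processes=2):
    """Return how many extra threads the parent has while a Pool is alive."""
    before = threading.active_count()
    pool = Pool(processes)
    try:
        # Threads that appeared in this (parent) process after creating the Pool.
        during = threading.active_count()
    finally:
        pool.terminate()  # stop the workers without waiting for pending work
        pool.join()       # wait for the worker processes to exit
    return during - before


if __name__ == '__main__':
    print(helper_threads_started_by_pool())
```

If every call to batch_grad creates a Pool that is never closed, those helper threads (and the worker processes) can pile up until the interpreter cannot start another thread, which would also fit the error showing up only on some runs.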

Try these fixes: a) Use the Pool.close() method to let each Pool know that it won't receive any more work:

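def batch_grad(x0, y, processes=4):
    pool = Pool(processes)
    g = pool.map(UnrollArgs(grad), [(x0, yi) for yi in y])
    pool.close()  # tell the pool that no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit
    return np.sum([gi for gi in g], axis=0) / len(y)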

b) Reuse one Pool for all of the processing: pass a Pool object to batch_grad instead of a number of processes:

def batch_grad(x0, y, pool=None):
    if pool is None:
        pool = Pool(4)
    g = pool.map(UnrollArgs(grad), [(x0, yi) for yi in y])
    return np.sum([gi for gi in g], axis=0) / len(y)

# then call your function like so
p = Pool(4)
batch_grad(your_x0, your_y, p)
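The two fixes can also be combined. The following is only a sketch under a few assumptions: grad is a scalar placeholder for your real function, numpy is left out so the snippet has no dependencies, and batch_grads_for is a hypothetical helper name. It creates exactly one Pool, reuses it for every batch, and closes and joins it when done. UnrollArgs and grad are defined at module level so they can be pickled and sent to the workers.

```python
from multiprocessing import Pool


def grad(x0, yi):
    # Placeholder for the real, expensive gradient (hypothetical stand-in).
    return 2.0 * (x0 - yi)


class UnrollArgs(object):
    """Picklable wrapper that unpacks an argument tuple for Pool.map."""

    def __init__(self, func):
        self.func = func

    def __call__(self, args):
        return self.func(*args)


def batch_grad(x0, y, pool):
    """Average grad(x0, yi) over y, using a pool supplied by the caller."""
    g = pool.map(UnrollArgs(grad), [(x0, yi) for yi in y])
    return sum(g) / float(len(y))


def batch_grads_for(x0_values, y, processes=4):
    """Evaluate batch_grad for several x0 values while creating only one Pool."""
    pool = Pool(processes)
    try:
        return [batch_grad(x0, y, pool) for x0 in x0_values]
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit


if __name__ == '__main__':
    print(batch_grads_for([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

The try/finally makes sure the Pool is cleaned up even if one of the map calls raises.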

Hopefully this keeps working for you in the long run.
