How can I catch a memory error in a spawned thread?

I've never used the multiprocessing library before, so all advice is welcome..

I have a python program that uses the multiprocessing library to do some memory-intensive tasks in multiple processes, and it occasionally runs out of memory (I'm working on optimizations, but that's not what this question is about). Sometimes an out-of-memory error gets thrown in a way I can't seem to catch (output below), and then the program hangs on pool.join() (I'm using multiprocessing.Pool). How can I make the program do something other than wait indefinitely when this problem occurs?

Ideally, the memory error would propagate back to the main process, which would then die.

Here's the memory error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 325, in _handle_workers
    pool._maintain_pool()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 229, in _maintain_pool
    self._repopulate_pool()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 222, in _repopulate_pool
    w.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory 

Here's where I manage the multiprocessing:

mp_pool = mp.Pool(processes=num_processes)
mp_results = list()
for datum in input_data:
    data_args = {
        'value': 0  # actually some other simple dict key/values
    }
    mp_results.append(mp_pool.apply_async(_process_data, args=(common_args, data_args)))
mp_pool.close()
mp_pool.join()  # hangs here when that thread dies..
for result_async in mp_results:
    result = result_async.get()
    # do stuff to collect results
# rest of the code

When I interrupt the hung program, I get:

Process process_003:
Traceback (most recent call last):
  File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/queues.py", line 374, in get
    return recv()
    racquire()
KeyboardInterrupt

This is actually a known bug in python's multiprocessing module, fixed in python 3 (here's a summarizing blog post I found). There's a patch attached to python issue 22393, but it hasn't been officially applied.

Basically, if one of a multiprocessing pool's sub-processes dies unexpectedly (out of memory, killed externally, etc.), the pool waits indefinitely on join().