How to prevent multiprocessing from inheriting imports and globals?

I'm using multiprocessing in a larger code base where some of the import statements have side effects. How can I run a function in a background process without it inheriting the global imports?

# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp
import helper  # This prints the message.

def worker():
  pass  # Unfortunately this also prints the message again.

if __name__ == '__main__':
  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

Background: Importing TensorFlow initializes CUDA, which reserves some GPU memory. As a result, spawning too many processes leads to a CUDA OOM error, even if those processes don't use TensorFlow themselves.

A similar question without an answer:

# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp

def worker():
  pass

def main():

  # Importing the module only locally so that the background
  # worker won't import it again.
  import helper

  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

if __name__ == '__main__':
  main()
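A quick way to check whether the local-import approach actually keeps the module out of the child is to ask the spawned worker whether the module shows up in its sys.modules. The sketch below uses the stdlib decimal module as a stand-in for helper so it is self-contained; the idea transfers directly:

```python
import multiprocessing as mp
import sys

def child_has(module_name, queue):
    # Runs in the spawned child: report whether the module is
    # already loaded there.
    queue.put(module_name in sys.modules)

def import_is_inherited():
    import decimal  # imported only in the parent, inside a function
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    proc = ctx.Process(target=child_has, args=('decimal', queue))
    proc.start()
    result = queue.get()
    proc.join()
    return result

if __name__ == '__main__':
    # With the spawn start method the child starts from a fresh
    # interpreter, so the function-local import is not inherited.
    print('child inherited decimal:', import_is_inherited())
```

Note that any import at module level of main.py would still run again in the child, because the spawned child re-imports the main module.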

Is there a resource that explains exactly what the multiprocessing module does when starting an mp.Process?

Super quick version (using the spawn context rather than fork):

Prepare a few things (a pair of pipes for communication, cleanup callbacks, etc.), then fork() and exec() a new Python interpreter; on Windows it's CreateProcessW(). The new Python interpreter is called with a startup script spawn_main() and passed the communication pipe file descriptors via a crafted command string and the -c switch. The startup script cleans up the environment a little bit, then unpickles the Process object from its communication pipe. Finally it calls the run method of the unpickled Process object in the new process.
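You can peek at that crafted command string from the parent side. multiprocessing.spawn.get_command_line() is the internal helper that builds the child's argv (it's an internal API, so the exact shape may vary across Python versions; the keyword arguments below are placeholder values for illustration):

```python
import multiprocessing.spawn

# Build the argv that would be used to launch a spawned child.
# tracker_fd and pipe_handle are made-up placeholder values.
cmd = multiprocessing.spawn.get_command_line(tracker_fd=5, pipe_handle=7)
print(cmd)
# Roughly: [sys.executable, '-c',
#           'from multiprocessing.spawn import spawn_main; '
#           'spawn_main(tracker_fd=5, pipe_handle=7)',
#           '--multiprocessing-fork']
```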

So what about importing modules?

Pickle semantics handle part of it, but __main__ and sys.modules need some TLC, which is handled here (during the "cleans up the environment" bit).