How to prevent multiprocessing from inheriting imports and globals?

I'm using multiprocessing in a larger code base where some of the import statements have side effects. How can I run a function in a background process without it inheriting the global imports?

# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp
import helper  # This prints the message.

def worker():
  pass  # Unfortunately this also prints the message again.

if __name__ == '__main__':
  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

Background: Importing TensorFlow initializes CUDA, which reserves some GPU memory. As a result, spawning too many processes leads to a CUDA OOM error, even if those processes don't use TensorFlow themselves.

A similar question without an answer:

# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp

def worker():
  pass

def main():

  # Importing the module only locally so that the background
  # worker won't import it again.
  import helper

  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

if __name__ == '__main__':
  main()
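A quick way to check whether the local-import approach actually keeps the module out of the child is to ask the spawned worker whether the module shows up in its sys.modules. The sketch below uses the stdlib decimal module as a stand-in for helper so it is self-contained; the idea transfers directly:

```python
import multiprocessing as mp
import sys

def child_has(module_name, queue):
    # Runs in the spawned child: report whether the module is
    # already loaded there.
    queue.put(module_name in sys.modules)

def import_is_inherited():
    import decimal  # imported only in the parent, inside a function
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    proc = ctx.Process(target=child_has, args=('decimal', queue))
    proc.start()
    result = queue.get()
    proc.join()
    return result

if __name__ == '__main__':
    # With the spawn start method the child starts from a fresh
    # interpreter, so the function-local import is not inherited.
    print('child inherited decimal:', import_is_inherited())
```

Note that any import at module level of main.py would still run again in the child, because the spawned child re-imports the main module.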

Is there a resource that explains exactly what the multiprocessing module does when starting an mp.Process?

Super quick version (using the spawn context rather than fork):

Prepare a few things (a pair of pipes for communication, cleanup callbacks, etc.), then fork() and exec() a new Python interpreter; on Windows it's CreateProcessW(). The new Python interpreter is called with a startup script spawn_main() and passed the communication pipe file descriptors via a crafted command string and the -c switch. The startup script cleans up the environment a little bit, then unpickles the Process object from its communication pipe. Finally it calls the run method of the unpickled Process object in the new process.
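You can peek at that crafted command string from the parent side. multiprocessing.spawn.get_command_line() is the internal helper that builds the child's argv (it's an internal API, so the exact shape may vary across Python versions; the keyword arguments below are placeholder values for illustration):

```python
import multiprocessing.spawn

# Build the argv that would be used to launch a spawned child.
# tracker_fd and pipe_handle are made-up placeholder values.
cmd = multiprocessing.spawn.get_command_line(tracker_fd=5, pipe_handle=7)
print(cmd)
# Roughly: [sys.executable, '-c',
#           'from multiprocessing.spawn import spawn_main; '
#           'spawn_main(tracker_fd=5, pipe_handle=7)',
#           '--multiprocessing-fork']
```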

So what about importing modules?

Pickle semantics handle part of it, but __main__ and sys.modules need some TLC, which is handled here (during the "cleans up the environment" bit).