What happens to imports when a new process is spawned?

Question

What happens to the variables of imported modules when a new process is spawned?

i.e.

with concurrent.futures.ProcessPoolExecutor(max_workers=settings.MAX_PROCESSES) as executor:
    for stuff in executor.map(foo, paths):
        pass  # process each result

where:

def foo(str):
    x = someOtherModule.fooBar()

and fooBar is accessing something declared at the top of someOtherModule:

someOtherModule.py:

myHat='green'
def fooBar():
    return myHat

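Specifically, I have a module (call it Y) that has a py4j gateway initialized at the top, outside of any function. In module X I load several files concurrently, and the function that sorts the data after loading uses a function in Y, which in turn uses the gateway.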

Is this design pythonic? Should I be importing my Y module after each new process is spawned? Or is there a better way to do this?

Answer 1

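When you create a new process, fork() is called, which clones the entire process: stack, memory space, and so on. This is why multiprocessing is considered more expensive than multithreading: the copying is expensive.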

So to answer your question, all "imported module variables" are cloned. You can modify them as you wish, but your original parent process won't see the changes.

EDIT: This only applies to Unix-based systems. See dano's answer for Unix + Windows.
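A minimal sketch of the behaviour this answer describes, assuming a Unix-like system where the "fork" start method is available. A single file stands in for the imported module: the hypothetical module-level name my_hat (modelled on the question's myHat) is inherited by the child, and the child's reassignment never reaches the parent.

```python
import multiprocessing

# Module-level state, standing in for an imported module variable
# (hypothetical name modelled on the question's myHat).
my_hat = "green"


def change_hat(new_color):
    """Runs in the child: reports the inherited value, then rebinds it."""
    global my_hat
    inherited = my_hat
    my_hat = new_color
    return inherited, my_hat


def run_demo():
    # "fork" exists only on Unix-like systems; the child starts as a clone
    # of the parent's memory, so it inherits my_hat == "green".
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=1) as pool:
        inherited, child_value = pool.apply(change_hat, ("red",))
    # The child's assignment changed only its own copy of the variable.
    return inherited, child_value, my_hat


if __name__ == "__main__":
    inherited, child_value, parent_value = run_demo()
    print(inherited, child_value, parent_value)  # green red green
```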

Answer 2

On Linux, fork will be used to spawn the child, so anything in the global scope of the parent will also be available in the child, with copy-on-write semantics.

On Windows, anything you import at module level in the parent process's __main__ module will get re-imported in the child.
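The difference between the two start methods can be observed directly. This sketch (an illustration, not part of the original answer) records which process executed the module-level code; requesting "spawn" explicitly reproduces the Windows behaviour on any platform, while "fork" is checked only where it exists. It assumes the code is saved to a file and run as a script, because "spawn" re-imports the main module from its path. By the same mechanism, a gateway created at the top of a module would be created again in every spawned worker.

```python
import multiprocessing
import os

# Module-level code: runs once in the parent. Under "spawn" it also runs
# again in every child, because the child re-imports this module.
LOADED_IN_PID = os.getpid()


def where_loaded(_):
    # (pid that executed the module-level code, pid of this worker)
    return LOADED_IN_PID, os.getpid()


def check(method):
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(processes=1) as pool:
        loaded_pid, worker_pid = pool.map(where_loaded, [None])[0]
    # True if the module-level code was re-executed inside the worker.
    return loaded_pid == worker_pid


if __name__ == "__main__":
    available = multiprocessing.get_all_start_methods()
    for method in ("fork", "spawn"):
        if method in available:
            # fork -> False (inherited), spawn -> True (re-imported)
            print(method, "re-ran module-level code:", check(method))
```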

This means that if you have a parent module (let's call it someModule) like this:

import someOtherModule
import concurrent.futures

def foo(str):
    x = someOtherModule.fooBar()

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=settings.MAX_PROCESSES) as executor:
        for stuff in executor.map(foo, paths):
            pass  # stuff

And someOtherModule looks like this:

myHat='green'
def fooBar():
    return myHat

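In this example, someModule is the __main__ module of the script. So, on Linux, the myHat instance you get in the child will be a copy-on-write version of the one in someModule. On Windows, each child process will re-import someModule as soon as it loads, which will result in someOtherModule being re-imported as well.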
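A related consequence for the gateway question: a value the parent changes at run time, after the import, reaches "fork" children but not "spawn" children, because the latter get a freshly re-imported copy of the module. This is a single-file sketch in which the hypothetical my_hat variable stands in for someOtherModule.myHat; like the previous sketch, it assumes it is run as a script file.

```python
import multiprocessing

# Stands in for someOtherModule.myHat (hypothetical single-file version).
my_hat = "green"


def foo(_):
    # Report what this worker process sees for the module-level variable.
    return my_hat


def child_sees(method):
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(processes=1) as pool:
        return pool.map(foo, [None])[0]


if __name__ == "__main__":
    # Runtime change made only in the parent, after the module was imported.
    my_hat = "red"
    if "fork" in multiprocessing.get_all_start_methods():
        print("fork:", child_sees("fork"))  # fork: red (copied from parent)
    print("spawn:", child_sees("spawn"))  # spawn: green (module re-imported)
```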

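I don't know enough about py4j Gateway objects to tell you for sure whether this is the behavior you want. If the Gateway object is picklable, you could explicitly pass it to each child instead, but you'd have to use a multiprocessing.Pool instead of concurrent.futures.ProcessPoolExecutor: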

import someOtherModule
import multiprocessing

def foo(str):
    x = someOtherModule.fooBar()

def init(hat):
    someOtherModule.myHat = hat

if __name__ == "__main__":
    hat = someOtherModule.myHat
    pool = multiprocessing.Pool(settings.MAX_PROCESSES,
                                initializer=init, initargs=(hat,))
    for stuff in pool.map(foo, paths):
        pass  # stuff
    pool.close()
    pool.join()
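Since Python 3.7, concurrent.futures.ProcessPoolExecutor also accepts initializer and initargs (plus an mp_context argument), so the same per-worker setup can be written without switching to multiprocessing.Pool. This is a sketch under the same single-file assumption: my_hat, the file names, and run() are hypothetical, and whatever is passed through initargs must still be picklable.

```python
import concurrent.futures
import multiprocessing

# Hypothetical stand-in for someOtherModule.myHat (single-file sketch).
my_hat = "green"


def init(hat):
    # Runs once in each worker process before it handles any task.
    global my_hat
    my_hat = hat


def foo(path):
    # Every task in this worker sees the value installed by init().
    return path, my_hat


def run(paths, hat, workers=2):
    # "spawn" is requested explicitly so the sketch behaves like Windows
    # everywhere: without init(), workers would see the re-imported "green".
    ctx = multiprocessing.get_context("spawn")
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=workers, mp_context=ctx, initializer=init, initargs=(hat,)
    ) as executor:
        return list(executor.map(foo, paths))


if __name__ == "__main__":
    for path, hat in run(["a.txt", "b.txt"], "red"):
        print(path, hat)
```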

It seems unlikely you need to do that for your use-case, though. You're probably fine using the re-import.
