Why does importing a module in '__main__' not allow multiprocessing to use the module?

Question

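I've already solved my problem by moving the import to the top-level declarations, but it left me wondering: why can't I use a module that was imported in '__main__' in functions that are the targets of multiprocessing?

For example: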

import os
import multiprocessing as mp

def run(in_file, out_dir, out_q):
    arcpy.RasterToPolygon_conversion(in_file, out_dir, "NO_SIMPLIFY", "Value")
    status = str("Done with "+os.path.basename(in_file))
    out_q.put(status, block=False)

if __name__ == '__main__':
    raw_input("Program may hang, press Enter to import ArcPy...")
    import arcpy

    q = mp.Queue()
    _file = "path/to/file"  # placeholder path
    _dir = "path/to/dir"  # placeholder path
    # There are actually lots of files in a loop to build
    # processes but I just do one for context here
    p = mp.Process(target=run, args=(_file, _dir, q))
    p.start()

# I do stuff with Queue below to status user

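When you run this in IDLE it doesn't error at all... it just keeps doing the Queue check (which is good, so that's not the problem). The problem is that when you run it in a CMD terminal (either the OS shell or the Python one), it produces an error saying that arcpy is not defined!

Just a curious topic.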

Answer 1

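The situation is different on unix-like systems and on Windows. On unixy systems, multiprocessing uses fork to create child processes that share a copy-on-write view of the parent's memory space. The child sees everything the parent has imported, including anything the parent imported under if __name__ == "__main__":.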

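On Windows there is no fork, so a new process has to be executed. But simply rerunning the parent process doesn't work - it would run the whole program again. Instead, multiprocessing runs its own Python program that imports the parent's main script and then pickles/unpickles a view of the parent's object space that is, hopefully, sufficient for the child process.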
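To see which of these two mechanisms a given interpreter will use, you can ask multiprocessing for its start methods. This is a minimal sketch, assuming Python 3.4 or later (where selectable start methods were introduced); the default it reports depends on both the platform and the Python version, so no output is shown here.

```python
import multiprocessing as mp

# "spawn" is offered on every platform; "fork" only where the OS supports it.
available = mp.get_all_start_methods()
default = mp.get_start_method()

print("available start methods:", available)
print("default start method:", default)
```

If the default is "fork", the child inherits the parent's imports; if it is "spawn", the child re-imports the main script as described above.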

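That program is the __main__ of the child process, and the __main__ block of the parent script doesn't run. The main script is simply imported like any other module. The reason is simple: running the parent's __main__ would just run the full parent program again, which mp must avoid.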
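The effect of importing the main script under a different name can be simulated without starting any processes. The sketch below is only an illustration, not a reproduction of multiprocessing's own code path: it writes a small hypothetical script to a temporary file and executes it with runpy.run_path, once as __main__ and once as __mp_main__ (the module name that shows up in the Windows output of the test in this answer). The standard-library json module stands in for arcpy, and the helper name names_visible is made up for this example.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for the asker's script; json plays the role of arcpy.
SCRIPT = textwrap.dedent('''
    import os

    if __name__ == "__main__":
        import json
''')


def names_visible(run_name):
    """Execute SCRIPT under the given module name and report which imports exist."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        module_globals = runpy.run_path(path, run_name=run_name)
    return {"os": "os" in module_globals, "json": "json" in module_globals}


as_parent = names_visible("__main__")     # what the parent process sees
as_child = names_visible("__mp_main__")   # what a re-imported child sees
print(as_parent)  # {'os': True, 'json': True}
print(as_child)   # {'os': True, 'json': False}
```

The guarded import is simply never executed when the script runs under any name other than __main__, which is why the worker in the question cannot find arcpy.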

Here is a test to show what is going on: a main module called testmp.py and a second module, test2.py, that is imported by the first.

testmp.py

import os
import multiprocessing as mp

print("importing test2")
import test2

def worker():
    print('worker pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))

if __name__ == "__main__":
    print('main pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))
    print("running process")
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()

test2.py

import os

print('test2 pid: {}, module name: {}, file name: {}'.format(os.getpid(),
        __name__, __file__))

When run on Linux, test2 is imported once and the worker runs in the main module.

importing test2
test2 pid: 17840, module name: test2, file name: /media/td/USB20FD/tmp/test2.py
main pid: 17840, module name: __main__, file name: testmp.py
running process
worker pid: 17841, module name: __main__, file name: testmp.py

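Under Windows, notice that "importing test2" is printed twice - testmp.py was run two times. But "main pid" was only printed once - its __main__ block wasn't run. That is because multiprocessing changed the module name to __mp_main__ during the import.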

E:\tmp>py testmp.py
importing test2
test2 pid: 7536, module name: test2, file name: E:\tmp\test2.py
main pid: 7536, module name: __main__, file name: testmp.py
running process
importing test2
test2 pid: 7544, module name: test2, file name: E:\tmp\test2.py
worker pid: 7544, module name: __mp_main__, file name: E:\tmp\testmp.py
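Applied to the question's script, the practical consequence is that the import has to happen somewhere the child will also execute it: at module level (the fix the asker already found), or inside the worker function itself. The following is a minimal sketch of the second option, assuming Python 3; json is a hypothetical stand-in for arcpy and the file path is made up.

```python
import multiprocessing as mp
import os


def run(in_file, out_q):
    # Importing inside the target means the import runs in whichever process
    # calls run(), so it works whether the child was forked or spawned.
    # json is only a stand-in for arcpy here.
    import json

    status = json.dumps({"done": os.path.basename(in_file)})
    out_q.put(status)


if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=run, args=("some/dir/raster.tif", q))  # hypothetical path
    p.start()
    print(q.get(timeout=30))  # read the result before joining the child
    p.join()
```

Importing inside the worker also keeps the parent from loading the module at all, which may matter if, as in the question, the import itself is slow or can hang.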

Tags: python, multiprocessing, python-2.7, arcpy
