Why do subprocesses import the main module at start on Windows while they don't on Linux?

For example, the following code runs fine on Ubuntu 14.04:

# some imports
import numpy as np
import glob
import sys
import multiprocessing
import os

# creating some temporary data
tmp_dir = os.path.join('tmp', 'nptest')
if not os.path.exists(tmp_dir):
    os.makedirs(tmp_dir)
    for i in range(10):
        x = np.random.rand(100, 50)
        y = np.random.rand(200, 20)
        file_path = os.path.join(tmp_dir, '%05d.npz' % i)
        np.savez_compressed(file_path, x=x, y=y)

def read_npz(path):
    data = dict(np.load(path))
    return (data['x'], data['y'])

def parallel_read(files):
    pool = multiprocessing.Pool(processes=4)
    data_list = pool.map(read_npz, files)
    return data_list

files = glob.glob(os.path.join(tmp_dir, '*.npz'))
x = parallel_read(files)
print('done')

But on Windows 7 it fails with the following error message:

    cmd = get_command_line() + [rhandle]
    pool = multiprocessing.Pool(processes=4)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
  File "C:\Anaconda\lib\multiprocessing\__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.

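As far as I understand, this happens because subprocesses import the main module at start on Windows, while they don't on Linux. The problem on Windows can be avoided by placing x = parallel_read(files) inside an if __name__ == '__main__': block, so that it runs only in the parent process. For example: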

if __name__ == '__main__':
    x = parallel_read(files)
    print('done')

Why do subprocesses import the main module at start on Windows while they don't on Linux?

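Windows doesn't have a fork function. Most other operating systems do, and on those platforms multiprocessing uses it to start new processes that have the same state as the parent process. Windows has to set up the state of the child process by other means, including importing the __main__ module.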
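
To illustrate what "the same state as the parent process" means, here is a minimal sketch that calls os.fork directly. It is Unix-only, and the names config and child_sees_runtime_state are made up for this illustration; it is not how multiprocessing is implemented internally, only the underlying idea.

```python
import os

# Module-level state that is changed at runtime, after the module was loaded.
config = {"ready": False}

def child_sees_runtime_state():
    """Fork a child and report whether it sees the parent's runtime change."""
    config["ready"] = True           # changed in the parent before forking
    pid = os.fork()                  # Unix only; does not exist on Windows
    if pid == 0:
        # Child: its memory is a copy of the parent's, so config is already
        # set here without re-importing or re-running anything.
        os._exit(0 if config["ready"] else 1)
    _, status = os.waitpid(pid, 0)   # parent: wait for the child to finish
    return os.WEXITSTATUS(status) == 0

if __name__ == "__main__":
    if hasattr(os, "fork"):
        print("child saw parent's state:", child_sees_runtime_state())
    else:
        print("os.fork is not available on this platform")
```

A forked child never needs to import the main module, because everything the parent had already loaded is simply there. Without fork, the child starts as a fresh interpreter and has to rebuild that state, which is why the module-level code runs again.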

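Note that Python 3.4 (and later) lets you use a non-forking implementation on all operating systems, if you ask for it. See issue 8713 on the bug tracker for the discussion of this feature.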
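
As a rough sketch of how to ask for it, assuming Python 3.4 or later: multiprocessing.get_context("spawn") returns a context whose Pool starts children without forking, which should reproduce the Windows behaviour on Linux as well. The function names square and parallel_squares are made up for this example.

```python
import multiprocessing
import os

# Module-level code: runs in the parent, and runs again in every child that
# has to import this file (which is what the "spawn" start method does).
print("running module-level code as %r in pid %d" % (__name__, os.getpid()))

def square(n):
    return n * n

def parallel_squares(values, method="spawn"):
    """Map square() over values using a pool created with the given start method."""
    ctx = multiprocessing.get_context(method)   # available since Python 3.4
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)

if __name__ == "__main__":
    # Without this guard, every spawned child would try to create its own
    # pool while importing this file, and fail with a RuntimeError similar
    # to the one shown in the question.
    print(parallel_squares([1, 2, 3]))
```

Run as a script, the module-level line should be printed by the parent and again by each worker, while the guarded part runs only in the parent. Passing method="fork" on a platform that supports it should print that line only once.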