Zombie state with the Python 3 multiprocessing library

My question concerns a replacement for the join() function that avoids the defunct or zombie state of terminated processes when using Python 3's multiprocessing library. Is there an alternative that suspends the termination of the child processes until they get the green light from the main process, allowing them to terminate correctly without entering a zombie state?

I prepared a quick example with the code below, which starts 19 different processes: the first one carries a 10-second workload, and all the others a 3-second workload:

import time
from multiprocessing import Process

def exe(i):
    print(i)
    if i == 1:
        time.sleep(10)   # the first process needs 10 seconds
    else:
        time.sleep(3)    # every other process finishes after 3 seconds

if __name__ == '__main__':
    procs = []
    for i in range(1, 20):
        proc = Process(target=exe, args=(i,))
        proc.start()
        procs.append(proc)

    for proc in procs:
        print(proc)  # <-- I'm blocked from joining the others till the first process finishes its workload
        proc.join()

    print("finished")

If you run the script, you will see that all the other processes sit in a zombie state until the join() call is released by the first process. This can make the system unstable or overloaded!

Thanks

According to this thread, Marko Rauhamaa writes:

If you don't care to know when child processes exit, you can simply ignore the SIGCHLD signal:

import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

That will prevent zombies from appearing.
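On a POSIX-conforming kernel (Linux 2.6 and later), you can verify this auto-reaping with a minimal fork sketch, assuming a Unix-like OS: once SIGCHLD is ignored, an exited child leaves no zombie behind, so a later waitpid() on it fails with ECHILD.

```python
import os
import signal
import time

# Ignore SIGCHLD so the kernel reaps exited children automatically
# (POSIX behavior; Linux 2.6+).
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

pid = os.fork()
if pid == 0:
    os._exit(0)          # child: exit immediately

time.sleep(0.5)          # parent: give the child time to exit

try:
    os.waitpid(pid, 0)
    auto_reaped = False  # a zombie was still there for us to collect
except ChildProcessError:
    auto_reaped = True   # ECHILD: the kernel already reaped it, no zombie

print(auto_reaped)
```

On Linux 2.4 and earlier (per the man page excerpt below), the waitpid() call would instead behave as if SIGCHLD were not ignored.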

The wait(2) man page states:

POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD. (The original POSIX standard left the behavior of setting SIGCHLD to SIG_IGN unspecified. Note that even though the default disposition of SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN results in different treatment of zombie process children.)

Linux 2.6 conforms to the POSIX requirements. However, Linux 2.4 (and earlier) does not: if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored, that is, the call blocks until the next child terminates and then returns the process ID and status of that child.

So, if you are using Linux 2.6 or a POSIX-compliant OS, the code above will allow child processes to exit without becoming zombies. If you are not on a POSIX-compliant OS, the thread above offers a number of options. Below is one alternative, somewhat similar to Marko Rauhamaa's third suggestion.


If for some reason you need to know when child processes exit and want to handle (at least some of) them differently, then you could set up a queue that lets the child processes signal the main process when they are done. The main process can then call the appropriate join in the order in which it receives items from the queue:

import time
import multiprocessing as mp

def exe(i, q):
    try:
        print(i)
        if i == 1:
            time.sleep(10)
        elif i == 10:
            raise Exception('I quit')
        else:
            time.sleep(3)
    finally:
        # signal the main process that this child is done, even on error
        q.put(mp.current_process().name)

if __name__ == '__main__':
    procs = dict()
    q = mp.Queue()
    for i in range(1,20):
        proc = mp.Process(target=exe, args=(i, q))
        proc.start()
        procs[proc.name] = proc

    while procs:
        name = q.get()   # blocks until some child reports completion
        proc = procs[name]
        print(proc)
        proc.join()      # returns promptly; the child has already finished
        del procs[name]

    print("finished")

which produces a result like:
...    
<Process(Process-10, stopped[1])>  # <-- process with exception still gets joined
19
<Process(Process-2, started)>
<Process(Process-4, stopped)>
<Process(Process-6, started)>
<Process(Process-5, stopped)>
<Process(Process-3, stopped)>
<Process(Process-9, started)>
<Process(Process-7, stopped)>
<Process(Process-8, started)>
<Process(Process-13, started)>
<Process(Process-12, stopped)>
<Process(Process-11, stopped)>
<Process(Process-16, started)>
<Process(Process-15, stopped)>
<Process(Process-17, stopped)>
<Process(Process-14, stopped)>
<Process(Process-18, started)>
<Process(Process-19, stopped)>
<Process(Process-1, started)>      # <-- Process-1 ends last
finished