Filter the output of a process with potentially unlimited output, detect the exit code, and time out after X

I have some existing Django code running under uwsgi (on Linux, with threads disabled). For certain requests it executes a subprocess over which I have no control.

Normal operation is as follows:

In rare cases, however, the subprocess runs into a race condition that is not yet understood, and it then does the following.

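Since I don't know whether there are other race conditions that could freeze the process, with or without any output, I would also like to add a timeout. (That said, a solution that only addresses getting the return code and detecting the repeated messages would already be a great achievement.)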

What I have tried so far:

import os
import select
import subprocess
import time

CMD = ["bash", "-c", "echo hello"]
def run_proc(cmd=CMD, timeout=10):
    """ run a subprocess, fetch (and analyze stdout / stderr) and
        detect if script runs too long
        and exit when script finished
    """

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = proc.stdout.fileno()
    stderr = proc.stderr.fileno()

    t0 = time.time()
    while True:
        if time.time() - t0 > timeout:
            print("TIMEOUT")
            break
        rc = proc.returncode
        print("RC", rc)
        if proc.returncode is not None:
            break
        to_rd, to_wr, to_x = select.select([stdout, stderr], [], [], 2)
        print(to_rd, to_wr, to_x)
        if to_rd:
            if stdout in to_rd:
                rdata = os.read(stdout, 100)
                print("S:", repr(rdata))
            if stderr in to_rd:
                edata = os.read(stderr, 100)
                print("E:", repr(edata))
    print(proc.returncode)

I don't actually need to handle stdout and stderr separately, but merging them does not change anything.

However, once the subprocess has finished its output, something very strange happens.

The output of select tells me that stdout and stderr can be read from, but when I read from them I get an empty string.
proc.returncode is still None.

How can I fix the code above, or how can I solve my problem in a different way?

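At the very least, call Popen.poll(). The returncode attribute is only updated by poll(), wait(), or communicate(), so in your loop it stays None no matter how long you wait: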

def run_proc(cmd=CMD, timeout=10):
    """ run a subprocess, fetch (and analyze stdout / stderr) and
        detect if script runs too long
        and exit when script finished
    """

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = proc.stdout.fileno()
    stderr = proc.stderr.fileno()

    t0 = time.time()
    while True:
        returncode = proc.poll()
        print("RC", returncode)
        if returncode is not None:
            break

        if time.time() - t0 > timeout:
            print("TIMEOUT")
            # You need to kill the subprocess, break doesn't stop it!
            proc.terminate()
            # wait for the killed process to 'reap' the zombie
            proc.wait()
            break

        to_rd, to_wr, to_x = select.select([stdout, stderr], [], [], 2)
        print(to_rd, to_wr, to_x)
        if to_rd:
            if stdout in to_rd:
                rdata = os.read(stdout, 100)
                print("S:", repr(rdata))
            if stderr in to_rd:
                edata = os.read(stderr, 100)
                print("E:", repr(edata))
    # proc.returncode is also set after terminate() + wait() on timeout
    print(proc.returncode)

Output:

RC None
[3] [] []
S: b'hello\n'
RC None
[3, 5] [] []
S: b''
E: b''
0
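
The empty reads in that output are not a bug: when select reports a pipe as readable and os.read returns b'', the pipe has reached EOF, meaning every write end has been closed. A more complete loop can treat that as the signal to stop reading. Below is a minimal sketch of that idea, not a definitive implementation. It assumes the output is line-oriented, merges stderr into stdout (you said you don't need them separately), and treats "the same line more than max_repeats times in a row" as the symptom of the race condition. The names max_repeats and chunk_size and the reason strings are made up for illustration.

```python
import os
import select
import subprocess
import time


def run_proc(cmd, timeout=10.0, max_repeats=5, chunk_size=4096):
    """Run cmd and return (returncode, reason, kept_lines).

    reason is "exited", "timeout" or "repeated" (hypothetical labels).
    kept_lines holds the output with consecutive duplicate lines collapsed.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    deadline = time.monotonic() + timeout
    buf = b""                      # bytes of a not yet complete line
    kept = []
    last_line, repeats = None, 0
    reason = "exited"

    while reason == "exited":
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            reason = "timeout"
            break
        readable, _, _ = select.select([fd], [], [], min(remaining, 2.0))
        if not readable:
            # No data: leave the loop if the child is already gone
            # (covers a grandchild that keeps the pipe open).
            if proc.poll() is not None:
                break
            continue
        data = os.read(fd, chunk_size)
        if not data:
            break                  # b'' on a readable pipe means EOF
        buf += data
        *lines, buf = buf.split(b"\n")
        for line in lines:
            if line == last_line:
                repeats += 1
            else:
                last_line, repeats = line, 1
                kept.append(line)  # keep only the first line of each run
            if repeats > max_repeats:
                reason = "repeated"
                break

    if reason == "exited":
        if buf:
            kept.append(buf)       # trailing output without a final newline
        try:
            # The child may have closed its output but still be running.
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            reason = "timeout"
    if reason != "exited":
        proc.kill()                # break alone would leave the child running
    returncode = proc.wait()       # reap the child so no zombie is left behind
    proc.stdout.close()
    return returncode, reason, kept


if __name__ == "__main__":
    print(run_proc(["bash", "-c", "echo hello"]))
```

On timeout or repetition the child is killed with SIGKILL, so the return code is negative in those cases. Memory stays bounded only as long as the child emits newlines and its non-repeating output is finite, so a cap on buf and kept would be a sensible addition for truly unlimited output.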