Filter the output of a process with potentially unlimited output, detect the exit code, and time out after X

Question

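I have some existing Django code running under uwsgi (on Linux, with threads disabled). For some requests it executes a subprocess over which I have no control.

Normal operation is as follows:

The subprocess runs for a fairly short time and returns an exit code of 0 or something else. It writes some messages to stdout/stderr. The return code (exit code) tells me whether the job completed correctly. If it failed, it would be good to collect stdout/stderr and log it to understand why.

However, in rare cases the subprocess hits a race condition that is not yet understood, and then does the following:

It repeatedly writes a specific message to stdout and stderr, then loops and hangs forever.

Since I don't know whether there are other race conditions that could freeze the process with or without any output, I would also like to add a timeout. (That said, a solution that only addresses getting the return code and detecting the repeated message would already be a good achievement.)

What I have tried so far: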

import os
import select
import subprocess
import time

CMD = ["bash", "-c", "echo hello"]
def run_proc(cmd=CMD, timeout=10):
    """ run a subprocess, fetch (and analyze stdout / stderr) and
        detect if script runs too long
        and exit when script finished
    """

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = proc.stdout.fileno()
    stderr = proc.stderr.fileno()

    t0 = time.time()
    while True:
        if time.time() - t0 > timeout:
            print("TIMEOUT")
            break
        rc = proc.returncode
        print("RC", rc)
        if proc.returncode is not None:
            break
        to_rd, to_wr, to_x = select.select([stdout, stderr], [], [], 2)
        print(to_rd, to_wr, to_x)
        if to_rd:
            if stdout in to_rd:
                rdata = os.read(stdout, 100)
                print("S:", repr(rdata))
            if stderr in to_rd:
                edata = os.read(stderr, 100)
                print("E:", repr(edata))
    print(proc.returncode)

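I don't actually need to handle stdout and stderr separately, but that doesn't change anything.

However, once the subprocess has finished its output, something very strange happens:

The output of select tells me that stdout and stderr can be read from, but when I read I get an empty string.
proc.returncode is still None.

How can I fix my code above, or how can I solve my problem in a different way?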

Answer 1

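At the very least, call Popen.poll(). proc.returncode stays None until poll(), wait(), or communicate() has been called: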

def run_proc(cmd=CMD, timeout=10):
    """ run a subprocess, fetch (and analyze stdout / stderr) and
        detect if script runs too long
        and exit when script finished
    """

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = proc.stdout.fileno()
    stderr = proc.stderr.fileno()

    t0 = time.time()
    while True:
        returncode = proc.poll()
        print("RC", returncode)
        if returncode is not None:
            break

        if time.time() - t0 > timeout:
            print("TIMEOUT")
            # You need to kill the subprocess, break doesn't stop it!
            proc.terminate()
            # wait for the killed process to 'reap' the zombie
            proc.wait()
            break

        to_rd, to_wr, to_x = select.select([stdout, stderr], [], [], 2)
        print(to_rd, to_wr, to_x)
        if to_rd:
            if stdout in to_rd:
                rdata = os.read(stdout, 100)
                print("S:", repr(rdata))
            if stderr in to_rd:
                edata = os.read(stderr, 100)
                print("E:", repr(edata))
    print(returncode)

Output:

RC None
[3] [] []
S: b'hello\n'
RC None
[3, 5] [] []
S: b''
E: b''
0
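
The empty reads in the output above (S: b'' and E: b'') are the end-of-file signal: once every writer has closed a pipe, select reports it as readable and os.read returns an empty bytes object. Building on that, here is a minimal sketch of how the loop could handle EOF, the timeout, and the runaway message together. It rests on several assumptions that are not in the question: stderr is merged into stdout, the runaway message is known in advance and passed as the hypothetical marker parameter, and the status labels, max_hits, and keep_lines are names invented for illustration.

```python
import os
import select
import subprocess
import time
from collections import deque


def run_proc(cmd, timeout=10.0, marker=None, max_hits=3, keep_lines=200):
    """Run cmd and return (status, returncode, last_output_lines).

    status is "ok", "timeout" or "repeated-output" (labels invented here).
    marker is the known runaway message as bytes, or None to skip that check.
    """
    # Merge stderr into stdout so there is a single pipe to watch.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    deadline = time.monotonic() + timeout
    tail = deque(maxlen=keep_lines)  # bounded: endless output cannot exhaust memory
    pending = b""                    # trailing partial line
    hits = 0
    status = "ok"

    while status == "ok":
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            status = "timeout"
            break
        ready, _, _ = select.select([fd], [], [], min(remaining, 2.0))
        if not ready:
            continue
        chunk = os.read(fd, 4096)
        if not chunk:
            break  # empty read means EOF: every writer has closed the pipe
        *lines, pending = (pending + chunk).split(b"\n")
        pending = pending[-8192:]  # cap an unterminated line as well
        for line in lines:
            tail.append(line)
            if marker is not None and marker in line:
                hits += 1
                if hits >= max_hits:
                    status = "repeated-output"
                    break

    if status == "ok":
        # The pipe is closed, but the process may still be running.
        try:
            proc.wait(timeout=max(deadline - time.monotonic(), 0.0))
        except subprocess.TimeoutExpired:
            status = "timeout"
    if status != "ok":
        proc.kill()  # break alone would leave the child running
    returncode = proc.wait()  # reap the child so no zombie is left behind
    proc.stdout.close()
    if pending:
        tail.append(pending)
    return status, returncode, list(tail)
```

With the command from the question, run_proc(["bash", "-c", "echo hello"]) should return ("ok", 0, [b"hello"]). Only the last keep_lines lines are kept for logging, which is a deliberate trade-off: a subprocess stuck in its output loop cannot grow the buffer without bound.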

python

django

subprocess

uwsgi