Run command and get its stdout, stderr separately in near real time like in a terminal
I am trying to find a way in Python to run other programs in such a way that:

- The stdout and stderr of the program being run can be logged separately.
- The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see. (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)
- Bonus criteria: The program being run does not know it is being run via Python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output). This small criteria pretty much means we will need to use a pty, I think.

Here is what I've got so far...

Method 1:
def method1(command):
    ## subprocess.communicate() will give us the stdout and stderr separately,
    ## but we will have to wait until the end of command execution to print anything.
    ## This means if the child process hangs, we will never know....
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable='/bin/bash')
    stdout, stderr = proc.communicate()  # record both, but no way to print stdout/stderr in real-time
    print ' ######### REAL-TIME ######### '
    ########  Not Possible
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print stdout
    print 'STDERR:'
    print stderr
Method 2:
def method2(command):
    ## Using pexpect to run our command in a pty, we can see the child's stdout in real-time,
    ## however we cannot see the stderr from "curl google.com", presumably because it is not connected to a pty?
    ## Furthermore, I do not know how to log it beyond writing out to a file (p.logfile). I need the stdout and stderr
    ## as strings, not files on disk! On the upside, pexpect would give a lot of extra functionality (if it worked!)
    proc = pexpect.spawn('/bin/bash', ['-c', command])
    print ' ######### REAL-TIME ######### '
    proc.interact()
    print ' ########## RESULTS ########## '
    ########  Not Possible
Method 3:
def method3(command):
    ## This method is very much like method1, and would work exactly as desired
    ## if only proc.xxx.read(1) wouldn't block waiting for something. Which it does. So this is useless.
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable='/bin/bash')
    print ' ######### REAL-TIME ######### '
    out, err, outbuf, errbuf = '', '', '', ''
    firstToSpeak = None
    while proc.poll() is None:
        stdout = proc.stdout.read(1)  # blocks
        stderr = proc.stderr.read(1)  # also blocks
        if firstToSpeak is None:
            if stdout != '':
                firstToSpeak = 'stdout'; outbuf, errbuf = stdout, stderr
            elif stderr != '':
                firstToSpeak = 'stderr'; outbuf, errbuf = stdout, stderr
        else:
            if (stdout != '') or (stderr != ''):
                outbuf += stdout; errbuf += stderr
            else:
                out += outbuf; err += errbuf
                if firstToSpeak == 'stdout':
                    sys.stdout.write(outbuf + errbuf); sys.stdout.flush()
                else:
                    sys.stdout.write(errbuf + outbuf); sys.stdout.flush()
                firstToSpeak = None
    print ''
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print out
    print 'STDERR:'
    print err
To try these methods, you will need to import sys, subprocess, pexpect

pexpect is pure-Python and can be had with

sudo pip install pexpect

I think the solution will involve python's pty module - which is somewhat of a black art that I cannot find anyone who knows how to use. Perhaps SO knows :)

As a heads-up, I recommend you use 'curl www.google.com' as a test command, because it prints its status out on stderr for some reason :D
Update-1:

OK, so the pty library is not fit for human consumption. The docs, essentially, are the source code.

Any presented solution which is blocking and not asynchronous will not work here. The Threads/Queue method by Padraic Cunningham works great, although adding pty support is not possible - and it's 'dirty' (to quote Freenode's #python).

It seems like the only solution fit for production-standard code is using the Twisted framework, which even supports pty as a boolean switch to run processes exactly as if they were invoked from the shell.

But adding Twisted into a project requires a total rewrite of all the code. This is a total bummer :/
Update-2:

Two answers were provided, one of which addresses the first two criteria and will work well where you just need both the stdout and stderr using Threads and Queue. The other answer uses select, a non-blocking method for reading file descriptors, and pty, a method to "trick" the spawned process into believing it is running in a real terminal just as if it was run from Bash directly - but may or may not have side-effects. I wish I could accept both answers, because the "correct" method really depends on the situation and why you are subprocessing in the first place, but alas, I could only accept one.
If you want to read from stderr and stdout and get the output separately, you can use a Thread with a Queue; not overly tested but something like the following:
import threading
import queue
from subprocess import Popen, PIPE

def run(fd, q):
    for line in iter(fd.readline, ''):
        q.put(line)
    q.put(None)

def create(fd):
    q = queue.Queue()
    t = threading.Thread(target=run, args=(fd, q))
    t.daemon = True
    t.start()
    return q, t

process = Popen(["curl", "www.google.com"], stdout=PIPE, stderr=PIPE,
                universal_newlines=True)

std_q, std_out = create(process.stdout)
err_q, err_read = create(process.stderr)

while std_out.is_alive() or err_read.is_alive():
    for line in iter(std_q.get, None):
        print(line)
    for line in iter(err_q.get, None):
        print(line)
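Since the question asks for the output as strings rather than files, here is a lightly adapted sketch of the same Threads-and-Queue idea that also collects what it reads. The small Python child below is my own stand-in for curl so the snippet runs without network access:

```python
import sys
import threading
import queue
from subprocess import Popen, PIPE

def reader(fd, q):
    # Forward each line from the pipe into the queue; None marks EOF.
    for line in iter(fd.readline, ''):
        q.put(line)
    q.put(None)

def start_reader(fd):
    q = queue.Queue()
    t = threading.Thread(target=reader, args=(fd, q), daemon=True)
    t.start()
    return q, t

# Stand-in child (instead of curl) so the sketch runs anywhere.
child = [sys.executable, '-c',
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
p = Popen(child, stdout=PIPE, stderr=PIPE, universal_newlines=True)

out_q, _ = start_reader(p.stdout)
err_q, _ = start_reader(p.stderr)

out_lines, err_lines = [], []
# Drain stdout fully, then stderr; add a print() inside the inner loop to
# echo lines as they arrive (per-stream order is preserved, interleaving
# between the two streams is not).
for q, sink in ((out_q, out_lines), (err_q, err_lines)):
    for line in iter(q.get, None):
        sink.append(line)
p.wait()
stdout, stderr = ''.join(out_lines), ''.join(err_lines)
```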
The stdout and stderr of the program being run can be logged separately.
You can't use pexpect because both stdout and stderr go to the same pty and there is no way to separate them afterwards.
The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see. (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)
If the child's output is not a tty then it is likely that it uses block buffering, and therefore if it doesn't produce much output then it won't be "real time". E.g., if the buffer is 4K then your parent Python process won't see anything until the child process prints 4K characters and the buffer overflows, or the buffer is flushed explicitly (inside the child process). This buffer is inside the child process and there are no standard ways to manage it from outside. Here's a picture that shows stdio buffers and the pipe buffer for the command1 | command2 shell pipeline:
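This buffering difference can be demonstrated directly from Python. In the sketch below (the child program and the timings are my own stand-ins for illustration), the same child delivers nothing through a plain pipe within a second, but delivers its line immediately when started with python -u:

```python
import os
import sys
from select import select
from subprocess import Popen, PIPE

env = dict(os.environ, PYTHONUNBUFFERED='')  # force default buffering rules
prog = "import time; print('hello'); time.sleep(2)"

# Default case: stdout is a pipe, so the child block-buffers 'hello\n'
# and nothing reaches the parent until the child exits.
p = Popen([sys.executable, '-c', prog], stdout=PIPE, env=env)
got_data = bool(select([p.stdout], [], [], 1.0)[0])  # poll for one second
p.kill()
p.wait()

# Same child run with -u (unbuffered): the line arrives right away,
# long before the two-second sleep finishes.
p2 = Popen([sys.executable, '-u', '-c', prog], stdout=PIPE, env=env)
line = p2.stdout.readline()
p2.kill()
p2.wait()
```

Here got_data stays False in the buffered case, while the unbuffered child's line is available immediately.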
The program being run does not know it is being run via python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output).
It seems you mean the opposite, i.e., it is likely that your program chunks its output if it is redirected to a pipe (when you use stdout=PIPE in Python). It means that the default threading or asyncio solutions won't work as is in your case.

There are several options to work around it:
- the command may accept a command-line argument such as grep --line-buffered or python -u, to disable block buffering
- stdbuf works for some programs, i.e., you could run ['stdbuf', '-oL', '-eL'] + command
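As a concrete sketch of the stdbuf workaround (the placeholder child command is mine; stdbuf itself is part of GNU coreutils and is assumed to be installed):

```python
import sys
from subprocess import Popen, PIPE

# Any external command works here; a tiny Python child is a placeholder.
command = [sys.executable, '-c', "print('hi')"]

# Prepending stdbuf asks libc to line-buffer the child's stdout/stderr.
# (It only affects programs that use C stdio buffering.)
p = Popen(['stdbuf', '-oL', '-eL'] + command, stdout=PIPE, stderr=PIPE)
out, err = p.communicate()
```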
Using the threading or asyncio solution above, you should get stdout and stderr separately, and lines should appear in near-real time:
#!/usr/bin/env python3
import os
import sys
from select import select
from subprocess import Popen, PIPE

with Popen(['stdbuf', '-oL', '-e0', 'curl', 'www.google.com'],
           stdout=PIPE, stderr=PIPE) as p:
    readable = {
        p.stdout.fileno(): sys.stdout.buffer,  # log separately
        p.stderr.fileno(): sys.stderr.buffer,
    }
    while readable:
        for fd in select(readable, [], [])[0]:
            data = os.read(fd, 1024)  # read available
            if not data:  # EOF
                del readable[fd]
            else:
                readable[fd].write(data)
                readable[fd].flush()
At last, you can try the pty + select solution with two ptys:
#!/usr/bin/env python3
import errno
import os
import pty
import sys
from select import select
from subprocess import Popen

masters, slaves = zip(pty.openpty(), pty.openpty())
with Popen([sys.executable, '-c', r'''import sys, time
print('stdout', 1) # no explicit flush
time.sleep(.5)
print('stderr', 2, file=sys.stderr)
time.sleep(.5)
print('stdout', 3)
time.sleep(.5)
print('stderr', 4, file=sys.stderr)
'''],
           stdin=slaves[0], stdout=slaves[0], stderr=slaves[1]):
    for fd in slaves:
        os.close(fd)  # no input
    readable = {
        masters[0]: sys.stdout.buffer,  # log separately
        masters[1]: sys.stderr.buffer,
    }
    while readable:
        for fd in select(readable, [], [])[0]:
            try:
                data = os.read(fd, 1024)  # read available
            except OSError as e:
                if e.errno != errno.EIO:
                    raise  # XXX cleanup
                del readable[fd]  # EIO means EOF on some systems
            else:
                if not data:  # EOF
                    del readable[fd]
                else:
                    readable[fd].write(data)
                    readable[fd].flush()
for fd in masters:
    os.close(fd)
I don't know what the side-effects of using different ptys for stdout and stderr are. You could try whether a single pty is enough in your case, e.g., set stderr=PIPE and use p.stderr.fileno() instead of masters[1]. A comment in the sh source suggests that there are issues if stderr not in {STDOUT, pipe}.
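A sketch of that single-pty variant follows, with a tiny Python child I substituted for curl so it runs anywhere; whether a lone pty has acceptable side-effects is exactly the open question above:

```python
import errno
import os
import pty
import sys
from select import select
from subprocess import Popen, PIPE

# One pty for stdout, a regular pipe for stderr (the variant suggested above).
master, slave = pty.openpty()
prog = "import sys; print('out'); print('err', file=sys.stderr)"
with Popen([sys.executable, '-c', prog],
           stdin=slave, stdout=slave, stderr=PIPE) as p:
    os.close(slave)
    err_fd = p.stderr.fileno()
    chunks = {master: [], err_fd: []}
    readable = set(chunks)
    while readable:
        for fd in select(list(readable), [], [])[0]:
            try:
                data = os.read(fd, 1024)
            except OSError as e:
                if e.errno != errno.EIO:
                    raise
                data = b''  # EIO from a pty master means EOF
            if not data:
                readable.discard(fd)
            else:
                chunks[fd].append(data)
os.close(master)

# The pty line discipline rewrites '\n' as '\r\n' on the stdout side.
stdout = b''.join(chunks[master]).decode().replace('\r\n', '\n')
stderr = b''.join(chunks[err_fd]).decode()
```

Note the '\r\n' normalization: output that passes through the pty is cooked by the terminal line discipline, which is one of the side-effects the answer warns about.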
While J.F. Sebastian's answer certainly solves the heart of the problem, I'm running python 2.7 (which wasn't in the original criteria) so I'm just throwing this out there for anyone who just wants to cut/paste some code.

I haven't tested this thoroughly yet, but on all the commands I have tried it seems to work perfectly :)

You may want to change .decode('ascii') to .decode('utf-8') - I'm still testing that bit out.
#!/usr/bin/env python2.7
import errno
import os
import pty
import sys
from select import select
import subprocess

stdout = ''
stderr = ''
command = 'curl google.com ; sleep 5 ; echo "hey"'

masters, slaves = zip(pty.openpty(), pty.openpty())
p = subprocess.Popen(command, stdin=slaves[0], stdout=slaves[0], stderr=slaves[1], shell=True, executable='/bin/bash')
for fd in slaves:
    os.close(fd)

readable = {masters[0]: sys.stdout, masters[1]: sys.stderr}
try:
    print ' ######### REAL-TIME ######### '
    while readable:
        for fd in select(readable, [], [])[0]:
            try:
                data = os.read(fd, 1024)
            except OSError as e:
                if e.errno != errno.EIO:
                    raise
                del readable[fd]  # EIO means EOF on some systems
            else:
                if not data:  # EOF
                    del readable[fd]
                else:
                    if fd == masters[0]:
                        stdout += data.decode('ascii')
                    else:
                        stderr += data.decode('ascii')
                    readable[fd].write(data)
                    readable[fd].flush()
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
finally:
    p.wait()
    for fd in masters:
        os.close(fd)

print ''
print ' ########## RESULTS ########## '
print 'STDOUT:'
print stdout
print 'STDERR:'
print stderr