Capture output including control characters of subprocess

Question

I have the following simple program, which runs a subprocess and tees its output to both stdout and a buffer:

import subprocess
import sys
import time
import unicodedata

p = subprocess.Popen(
    "top",
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

stdout_parts = []
while p.poll() is None:
    # Names like `line` and `text` avoid shadowing the built-ins bytes and str.
    for line in iter(p.stdout.readline, b''):
        stdout_parts.append(line)
        text = line.decode("utf-8")
        sys.stdout.write(text)
        # Fail on any control character other than newline (code point 10).
        for ch in text:
            if unicodedata.category(ch)[0] == "C" and ord(ch) != 10:
                raise Exception(f"control character! {ord(ch)}")
    time.sleep(0.01)

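When running a program that updates the terminal in place, such as top or docker pull, I want to capture its entire output, even if that output is not immediately readable as such.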

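From reading "How do commands like top update output without appending in the console?", it seems this is done with control characters. However, I don't receive any of them when reading lines from the process's output streams (stdout/stderr). Or do these programs use a different technique that I cannot capture from the subprocess?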

Answer 1

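Many tools adjust their output depending on whether or not they are connected to a terminal. If you want to receive the output you would see when running the tool interactively in a terminal, use a wrapper like pexpect to emulate that behavior. (There is also a low-level pty library, but it is tricky to use, especially if you are new to this problem space.)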
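To see the terminal check in action, here is a minimal sketch that runs the same child process once through a pipe and once through a pseudo-terminal and reports what the child sees. The helper names `isatty_via_pipe` and `isatty_via_pty` are made up for this illustration, and the child is simply another Python interpreter rather than top; it assumes a POSIX system where the `pty` module is available.

```python
import os
import pty
import subprocess
import sys

# Child process: prints whether its own stdout is a terminal.
CHILD = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]


def isatty_via_pipe():
    """Run the child with stdout connected to an ordinary pipe."""
    result = subprocess.run(CHILD, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode().strip()


def isatty_via_pty():
    """Run the child with stdout connected to the slave end of a pty."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(CHILD, stdout=slave_fd)
    os.close(slave_fd)  # the parent only needs the master end
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                break  # Linux raises EIO once every slave end is closed
            if not data:
                break  # other platforms may signal end of file instead
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    # The terminal line discipline turns "\n" into "\r\n"; strip() removes both.
    return b"".join(chunks).decode().strip()


if __name__ == "__main__":
    print("pipe:", isatty_via_pipe())
    print("pty: ", isatty_via_pty())
```

A program such as top makes a comparable check and only emits its screen-control sequences when the answer is yes, which would explain why nothing of the sort arrives over a plain pipe.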

Some tools also let you specify a batch mode of operation for scripting; maybe look at top -b (though this is not available on macOS).
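As a rough sketch of the batch-mode route, the following assumes the procps version of top found on most Linux systems, where `-b` selects batch mode and `-n` limits the number of iterations. The helpers `batch_top_command` and `run_batch_top` are hypothetical names for this example only.

```python
import shutil
import subprocess


def batch_top_command(iterations=1):
    """Build a top command line: -b for batch mode, -n for the iteration count."""
    return ["top", "-b", "-n", str(iterations)]


def run_batch_top(iterations=1):
    """Return top's batch output as text, or None if top is not installed."""
    if shutil.which("top") is None:
        return None
    result = subprocess.run(
        batch_top_command(iterations),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # top builds without -b exit with an error instead
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    output = run_batch_top()
    print(output if output is not None else "top not found")
```

In batch mode the output is plain text written one snapshot after another, so it can be read line by line from a pipe without a pseudo-terminal.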

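For the record, many screen control sequences do not consist entirely, or even mainly, of control characters; for example, the sequence that moves the cursor to a particular position in curses begins with an escape character (0x1B) but otherwise consists of regular printable characters. If you really want to process these sequences, consider using a curses / ANSI control code parsing library. For most purposes, though, the better approach is to use a machine-readable API and disable screen updates entirely. On Linux, the /proc pseudo-filesystem offers a lot of machine-readable information.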
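To make the point about escape sequences concrete, here is a small sketch that applies the question's `unicodedata` check to a cursor-positioning sequence and then strips such sequences with a regular expression. The names `CSI_RE`, `control_chars`, and `strip_csi` are invented for this example, and the pattern only covers CSI sequences (ESC followed by `[`), not every sequence a terminal understands, so a dedicated parsing library remains the safer choice for real output.

```python
import re
import unicodedata

# CSI sequence: ESC [ then parameter bytes (0x30-0x3F), intermediate bytes
# (0x20-0x2F), and exactly one final byte (0x40-0x7E).
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def control_chars(text):
    """Return the characters whose Unicode category starts with "C"."""
    return [ch for ch in text if unicodedata.category(ch)[0] == "C"]


def strip_csi(text):
    """Remove CSI escape sequences, leaving the visible text."""
    return CSI_RE.sub("", text)


move_cursor = "\x1b[5;10H"  # move the cursor to row 5, column 10
sample = move_cursor + "hello" + "\x1b[0m"  # trailing sequence resets attributes

if __name__ == "__main__":
    # Only the leading ESC is a control character; "[5;10H" is printable.
    print([hex(ord(ch)) for ch in control_chars(move_cursor)])
    print(strip_csi(sample))
```

A check that looks only for control characters therefore flags the ESC byte but misses the rest of each sequence.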

Answer 2

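Content salvaged from an edit to the question that was rolled back:

A solution that prints top's output nicely, following the hints in the answer: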

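import os
import pty
import select
import subprocess
import sys

# One pseudo-terminal per stream, so the child believes it is writing to a TTY.
stdout_master_fd, stdout_slave_fd = pty.openpty()
stderr_master_fd, stderr_slave_fd = pty.openpty()

p = subprocess.Popen(
    "top",
    shell=True,
    stdout=stdout_slave_fd,
    stderr=stderr_slave_fd,
    close_fds=True
)
# The parent no longer needs the slave ends once the child has inherited them.
os.close(stdout_slave_fd)
os.close(stderr_slave_fd)

stdout_parts = []  # raw chunks from both streams, control sequences included
while p.poll() is None:
    # The timeout keeps select from blocking forever after the child exits.
    rlist, _, _ = select.select([stdout_master_fd, stderr_master_fd], [], [], 0.1)
    for fd in rlist:
        try:
            output = os.read(fd, 1000)  # does not block: select reported data
        except OSError:
            continue  # on Linux, EIO means the child closed its end
        stdout_parts.append(output)
        sys.stdout.write(output.decode("utf-8", errors="replace"))
        sys.stdout.flush()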
