Python subprocess: Giving stdin, reading stdout, then giving more stdin

Question

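I'm working with a piece of scientific software called Chimera. For some of the code downstream of this question, it requires that I use Python 2.7.

I want to call a process, give that process some input, read its output, give it more input based on that, and so on.

I've used Popen to open the process and process.stdin.write to pass standard input, but then I've gotten stuck trying to get output while the process is still running. process.communicate() stops the process, and process.stdout.readline() seems to keep me in an infinite loop.

Here's a simplified example of what I'd like to do:

Let's say I have a bash script called exampleInput.sh.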

#!/bin/bash
# exampleInput.sh

# Read a number from the input
read -p 'Enter a number: ' num

# Multiply the number by 5
ans1=$( expr $num \* 5 )

# Give the user the multiplied number
echo $ans1

# Ask the user whether they want to keep going
read -p 'Based on the previous output, would you like to continue? ' doContinue

if [ $doContinue == "yes" ]
then
    echo "Okay, moving on..."
    # [...] more code here [...]
else
    exit 0
fi

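Interacting with this through the command line, I'd run the script, type in "5" and then, if it returned "25", I'd type "yes" and, if not, I would type "no".

I want to run a python script where I pass exampleInput.sh "5" and, if it gives me back "25", then I pass "yes".

So far, this is as close as I can get: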

#!/home/user/miniconda3/bin/python2
# talk_with_example_input.py
import subprocess
process = subprocess.Popen(["./exampleInput.sh"], 
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE)
process.stdin.write("5")

answer = process.communicate()[0]

if answer == "25":
    process.stdin.write("yes")
    ## I'd like to print the STDOUT here, but the process is already terminated

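But that fails of course, because after process.communicate(), my process isn't running anymore.

(Just in case/FYI): The actual problem

Chimera is usually a GUI-based application for examining protein structure. If you run chimera --nogui, it'll open up a prompt and take input.

I often need to know what Chimera outputs before I run my next command. For example, I will often try to generate a protein surface and, if Chimera can't generate a surface, it doesn't break; it just says so through STDOUT. So, in my python script, while I'm looping through many proteins to analyze, I need to check STDOUT to know whether to continue analysis on that protein.

In other use cases, I'll run lots of commands through Chimera to clean up a protein first, and then I'll want to run lots of separate commands to get different pieces of data, and use that data to decide whether to run other commands. I could get the data, close the subprocess, and then run another process, but that would require rerunning all of those cleanup commands each time.

Anyway, those are some of the real-world reasons why I want to be able to push STDIN to a subprocess, read the STDOUT, and still be able to push more STDIN.

Thanks for your time!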

Answer 1

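You don't need to use process.communicate in your example.

Simply read and write using process.stdin.write and process.stdout.readline. Also make sure to send a newline, otherwise the script's read won't return. And when you read from the child's stdout, you also have to handle the newlines coming from echo.

Note: process.stdout.read() without a size argument will block until EOF, which is why the example uses readline.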

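# talk_with_example_input.py
import subprocess

process = subprocess.Popen(["./exampleInput.sh"],
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE)

# The trailing newline is what lets bash's `read` return.
process.stdin.write("5\n")
process.stdin.flush()
# readline() returns as soon as one full line is available.
stdout = process.stdout.readline()
print(stdout)

# echo appends a newline, so compare against "25\n".
if stdout == "25\n":
    process.stdin.write("yes\n")
    process.stdin.flush()
    print(process.stdout.readline())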

$ python2 test.py
25

Okay, moving on...
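
The write/flush/readline pattern above can be wrapped in a small helper when there are many rounds of input and output. The following is only a sketch under a few assumptions: ask() and run_dialogue() are hypothetical names, the child is a stand-in Python one-liner rather than exampleInput.sh (so the sketch runs without the bash script), and the child is assumed to answer every input line with exactly one output line.

```python
import subprocess
import sys

# Hypothetical stand-in for exampleInput.sh so the sketch runs anywhere:
# reads a number, prints number * 5, then reads a yes/no answer.
CHILD_SOURCE = r"""
import sys
num = int(sys.stdin.readline())
print(num * 5)
sys.stdout.flush()
if sys.stdin.readline().strip() == "yes":
    print("Okay, moving on...")
    sys.stdout.flush()
"""


def ask(process, line):
    """Send one line to the child and return the next line it prints."""
    process.stdin.write(line.encode() + b"\n")
    process.stdin.flush()  # push the bytes into the pipe right away
    return process.stdout.readline().decode().rstrip("\r\n")


def run_dialogue():
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    replies = [ask(process, "5")]
    if replies[0] == "25":
        replies.append(ask(process, "yes"))
    process.stdin.close()  # signal EOF so the child can exit
    process.stdout.close()
    process.wait()
    return replies


if __name__ == "__main__":
    print(run_dialogue())
```

If the child prints more than one line per input, or none at all, ask() will return too early or block, so this only fits strictly line-for-line protocols.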

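Update

When communicating with a program that way, you have to pay special attention to what the application is actually writing. It is best to analyze the output in a hex editor: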

$ chimera --nogui 2>&1 | hexdump -C

Please note that readline ^[1] only reads up to the next newline (\n). In your case, you have to call readline at least four times to get that first block of output.
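
When the number of lines per reply is not fixed, one option is to keep calling readline until a known marker line shows up. This is a sketch, not something Chimera is known to support: the marker END-OF-REPLY, the helper read_block() and the stand-in child are all hypothetical, and the approach only works if the real program can be made to print a recognizable line after each reply.

```python
import subprocess
import sys

# Hypothetical child: for each input line it prints a multi-line reply
# followed by a marker line, so the parent knows where the block ends.
CHILD_SOURCE = r"""
import sys
for line in iter(sys.stdin.readline, ""):
    word = line.strip()
    for i in range(3):
        print("%s-%d" % (word, i))
    print("END-OF-REPLY")
    sys.stdout.flush()
"""

MARKER = "END-OF-REPLY"  # assumed marker, not something Chimera prints


def read_block(stream, marker=MARKER):
    """Call readline() until the marker line (or EOF); return the lines before it."""
    lines = []
    while True:
        raw = stream.readline()
        if not raw:  # EOF: the child closed its stdout
            break
        text = raw.decode().rstrip("\r\n")
        if text == marker:
            break
        lines.append(text)
    return lines


def demo():
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    blocks = []
    for command in (b"a\n", b"b\n"):
        process.stdin.write(command)
        process.stdin.flush()
        blocks.append(read_block(process.stdout))
    process.stdin.close()  # EOF lets the child leave its loop
    process.stdout.close()
    process.wait()
    return blocks


if __name__ == "__main__":
    print(demo())
```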

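If you just want to read everything up until the subprocess stops printing, you have to read byte by byte and implement a timeout. Sadly, neither read nor readline provides such a timeout mechanism. This is probably because the underlying read syscall ^[2] (Linux) does not provide one either.

On Linux we can write a single-threaded read_with_timeout() using poll / select. For an example see ^[3].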

from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    try:
        while True:
            ret = e.poll(timeout__s)
            # Stop on timeout, or when the event carries no readable data
            # (e.g. only EPOLLHUP after the child closed its end).
            if not ret or not (ret[0][1] & EPOLLIN):
                break
            data = fd.read(1)
            if not data:  # EOF
                break
            buf.append(data)
    finally:
        e.close()
    return ''.join(buf)
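
The same idea can be sketched with select.select and os.read, which is available on Unix systems other than Linux and reads whatever is available in one call instead of one byte at a time. This is an illustrative variant, not the method used above: read_until_quiet() and the stand-in child are hypothetical, and select on pipes does not work on Windows. Because os.read bypasses Python's own file buffering, the sketch never mixes it with stream.read() or stream.readline() on the same pipe.

```python
import os
import select
import subprocess
import sys

# Hypothetical child: two quick lines, a long pause, then one more line.
CHILD_SOURCE = r"""
import sys, time
for word in ("one", "two"):
    print(word)
    sys.stdout.flush()
    time.sleep(0.1)
time.sleep(2.5)
print("late")
sys.stdout.flush()
"""


def read_until_quiet(stream, timeout_s, chunk_size=4096):
    """Collect output until nothing new arrives for timeout_s seconds (or EOF)."""
    fd = stream.fileno()
    chunks = []
    while True:
        readable, _, _ = select.select([fd], [], [], timeout_s)
        if not readable:  # timed out: the child went quiet
            break
        data = os.read(fd, chunk_size)  # returns whatever is available now
        if not data:  # empty read means EOF
            break
        chunks.append(data)
    return b"".join(chunks)


def demo():
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
    )
    first = read_until_quiet(process.stdout, 1.0)
    second = read_until_quiet(process.stdout, 4.0)
    process.stdout.close()
    process.wait()
    return first, second


if __name__ == "__main__":
    print(demo())
```

The timeout has to be longer than the longest pause inside one block of output and shorter than the pause between blocks, which is a judgment call for each program.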

If you need a reliable way to do non-blocking reads under both Windows and Linux, this answer might be helpful.
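
One commonly used portable approach is a background thread that blocks on readline and hands each line to a queue, so the main thread can wait on the queue with a timeout. The sketch below illustrates that general technique and is not necessarily what the linked answer does; start_reader(), read_available() and the stand-in child are hypothetical names.

```python
import subprocess
import sys
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

# Hypothetical child: echoes every input line back in upper case.
CHILD_SOURCE = r"""
import sys
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write(line.upper())
    sys.stdout.flush()
"""


def start_reader(stream):
    """Read lines from stream on a daemon thread and put them on a queue."""
    lines = queue.Queue()

    def pump():
        for line in iter(stream.readline, b""):
            lines.put(line)
        lines.put(None)  # sentinel: the child closed its stdout

    thread = threading.Thread(target=pump)
    thread.daemon = True  # do not keep the interpreter alive for this thread
    thread.start()
    return lines


def read_available(lines, timeout_s):
    """Return lines until the queue stays empty for timeout_s seconds (or EOF)."""
    collected = []
    while True:
        try:
            item = lines.get(timeout=timeout_s)
        except queue.Empty:
            break
        if item is None:
            break
        collected.append(item)
    return collected


def demo():
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    lines = start_reader(process.stdout)
    replies = []
    for command in (b"hello\n", b"again\n"):
        process.stdin.write(command)
        process.stdin.flush()
        replies.append(read_available(lines, 1.0))
    process.stdin.close()  # EOF lets the child exit
    process.wait()
    return replies


if __name__ == "__main__":
    print(demo())
```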

^[1] From the Python 2 docs:

readline(limit=-1)

Read and return one line from the stream. If limit is specified, at most limit bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

^[2] From man 2 read:

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);

^[3] Example

$ tree
.
├── prog.py
└── prog.sh

prog.sh

#!/usr/bin/env bash

for i in $(seq 3); do
  echo "${RANDOM}"
  sleep 1
done

sleep 3
echo "${RANDOM}"

prog.py

# prog.py
import subprocess
from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    try:
        while True:
            ret = e.poll(timeout__s)
            # Stop on timeout, or when the event carries no readable data
            # (e.g. only EPOLLHUP after the child closed its end).
            if not ret or not (ret[0][1] & EPOLLIN):
                break
            data = fd.read(1)
            if not data:  # EOF
                break
            buf.append(data)
    finally:
        e.close()
    return ''.join(buf)

process = subprocess.Popen(
    ["./prog.sh"],
    stdin = subprocess.PIPE,
    stdout = subprocess.PIPE
)

print(read_with_timeout(process.stdout, 1.5))
print('-----')
print(read_with_timeout(process.stdout, 3))

$ python2 prog.py 
6194
14508
11293

-----
10506
