How does pexpect analyze the child's stdout?

I have the following code:

import pexpect

child = pexpect.spawn("prog")
# some delay...
child.expect(r'Name .*: ')   # the pattern must be a quoted string (a regex)
child.sendline('anonymous')

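Once the child process has started, it can begin sending a lot of data to its stdout, for example log messages. Does this mean that pexpect searches through all of the child's output (from the start of the process up to the current moment)? Or does pexpect only start doing this once expect() is called?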

My child process generates a lot of log output, and the CPU becomes very slow. I suspect that this implementation of expect may be the reason.

After a child process is spawned, the child write()s its data to the pty (slave side) and waits for the parent to read() the data from the pty (master side). If child.expect() is never called, the child's write() may block once it has produced too much output, because the write buffer is full.
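The finite buffer can be observed without pexpect at all. The following is a minimal sketch using only the standard library, assuming a POSIX system where pty.openpty() is available. The function name fill_pty_until_blocked and its parameters are made up for this illustration. It writes to the slave side in non-blocking mode, with nobody reading the master side, until the kernel refuses more data. A blocking writer such as find would simply go to sleep at that point.

```python
import fcntl
import os
import pty


def fill_pty_until_blocked(chunk=b"x" * 1024, limit=16 * 1024 * 1024):
    """Write to the slave side of a pty while nobody reads the master side.

    Returns (bytes_written, blocked). `blocked` is True when the kernel
    refused further data, which is where a blocking write() would sleep.
    `limit` is only a safety cap for this sketch.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        # Non-blocking mode turns "would sleep" into BlockingIOError,
        # so the demo cannot hang.
        flags = fcntl.fcntl(slave_fd, fcntl.F_GETFL)
        fcntl.fcntl(slave_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

        written = 0
        while written < limit:
            try:
                written += os.write(slave_fd, chunk)
            except BlockingIOError:
                return written, True
        return written, False
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    written, blocked = fill_pty_until_blocked()
    print("bytes accepted before blocking:", written, "blocked:", blocked)
```

The exact number of bytes accepted depends on the operating system and kernel version, so the sketch reports it rather than assuming a value.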

When child.expect() matches a pattern it returns, and you then have to call child.expect() again. Otherwise the child may still block after it has output too much data.
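One way to keep the pty drained when you do not care about the remaining output is to keep reading in a loop. This is only a sketch: the helper name drain and its parameters are hypothetical, not part of pexpect. It relies on spawn.read_nonblocking() and on the pexpect.TIMEOUT and pexpect.EOF exceptions.

```python
import pexpect


def drain(child, chunk_size=4096, idle_timeout=0.5):
    """Read and discard the child's output until it goes quiet or exits.

    Returns the amount of data read (bytes, or characters if the child
    was spawned with an encoding).
    """
    total = 0
    while True:
        try:
            data = child.read_nonblocking(size=chunk_size, timeout=idle_timeout)
        except pexpect.TIMEOUT:
            return total  # no output for idle_timeout seconds
        except pexpect.EOF:
            return total  # the child closed its side of the pty
        total += len(data)
```

For example, after the last child.sendline() you could call drain(child) periodically so that a chatty child is never left sleeping in write(). If the output should be kept rather than discarded, setting child.logfile_read to an open file is an alternative worth considering.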

See the following example:

# python
>>> import pexpect
>>> ch = pexpect.spawn('find /')
>>> ch
<pexpect.pty_spawn.spawn object at 0x7f47390bae90>
>>>

At this point find has been spawned and has already output some data. But I have not called ch.expect(), so find is now blocked (sleeping) and is not consuming any CPU.

# ps -C find u
USER     PID %CPU %MEM  VSZ   RSS TTY     STAT START   TIME COMMAND
root  100831  0.0  0.2 9188  2348 pts/12  Ss+  10:23   0:00 /usr/bin/find /
# strace -p 100831
Process 100831 attached
write(1, "\n", 1             <-- The write() is being blocked

(Here, in the STAT column, S means sleeping, s means session leader, and + means the process is in the foreground process group.)


According to the pexpect documentation, two options of spawn() may affect performance:

The maxread attribute sets the read buffer size. This is maximum number of bytes that Pexpect will try to read from a TTY at one time. Setting the maxread size to 1 will turn off buffering. Setting the maxread value higher may help performance in cases where large amounts of output are read back from the child. This feature is useful in conjunction with searchwindowsize.

When the keyword argument searchwindowsize is None (default), the full buffer is searched at each iteration of receiving incoming data. The default number of bytes scanned at each iteration is very large and may be reduced to collaterally reduce search cost. After expect() returns, the full buffer attribute remains up to size maxread irrespective of searchwindowsize value.
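As an illustration of the two options, here is a sketch that spawns a noisy child with a larger maxread and a small searchwindowsize. The child script, the function name and the chosen values (65536 and 256) are assumptions made for this example, not recommendations from the pexpect documentation. A small window is only safe when the text you expect appears at the end of the output, as a prompt does.

```python
import sys

import pexpect

# Hypothetical stand-in for "prog": prints many log lines, then prompts.
NOISY_CHILD = r"""
import sys
for i in range(20000):
    print("log line", i)
sys.stdout.write("Name (default): ")
sys.stdout.flush()
name = sys.stdin.readline().strip()
print("hello", name)
"""


def login_to_noisy_child(maxread=65536, searchwindowsize=256):
    """Answer the prompt of a chatty child; returns the final matched text."""
    child = pexpect.spawn(
        sys.executable,
        ["-c", NOISY_CHILD],
        maxread=maxread,                    # read bigger chunks per read()
        searchwindowsize=searchwindowsize,  # only scan the tail of the buffer
        timeout=30,
    )
    try:
        child.expect(r"Name .*: ")
        child.sendline("anonymous")
        child.expect("hello anonymous")
        return child.after
    finally:
        child.close()


if __name__ == "__main__":
    print(login_to_noisy_child())
```

Whether these settings help in practice depends on how much output the child produces and how expensive the pattern is, so it is worth measuring with your own program.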