subprocess.Popen(): change stderr during child's execution

Question

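Goal: I am trying to put together a Python script that captures the network traffic that occurs as a result of executing a block of code. For simplicity, assume I want to log the network traffic resulting from a call to socket.gethostbyname('example.com'). Note: I can't simply terminate tcpdump when gethostbyname() returns, because the actual code block I want to measure triggers other external code, and I have no way of knowing when that external code finishes executing (so I have to leave tcpdump running "long enough" for it to be highly probable that I logged all traffic generated by this external code).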

Approach: I am using subprocess to start tcpdump, telling tcpdump to terminate itself after duration seconds using its -G and -W options, e.g.:

import socket
import subprocess
import time

duration = 15
nif = 'en0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
tcpdump_proc = subprocess.Popen(cmd)
socket.gethostbyname('example.com')
time.sleep(duration + 5)  # sleep longer than tcpdump is running

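The problem with this is that Popen() returns before tcpdump is fully up and running, so some or all of the traffic resulting from the call to gethostbyname() will not be captured. I could obviously add a time.sleep(x) before calling gethostbyname() to give tcpdump a bit of time to start up, but that's not a portable solution (I can't just pick some arbitrary x < duration, since a powerful system would start capturing packets earlier than a less powerful one).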

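To get around this, my idea is to parse tcpdump's output to look for when the following is written to its stderr, since that appears to indicate that the capture is fully up and running: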

tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes

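So I need to attach to stderr, but the problem is that I don't want to commit to reading all of its output, since I need my code to move on to actually executing the code block I want to measure (gethostbyname() in this example) rather than getting stuck in a loop reading from stderr.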

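I could solve this by adding a semaphore that blocks the main thread from proceeding to the gethostbyname() call, and have a background thread read from stderr and decrement the semaphore (letting the main thread move on) when it reads the string above from stderr, but I'd like to keep the code single-threaded if possible.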
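For reference, the thread-based workaround described above could look roughly like the sketch below. It uses a threading.Event rather than a semaphore, and a small Python child as a stand-in for tcpdump so that it runs without capture privileges; FAKE_CHILD and start_and_wait_for_banner are names made up for this illustration.

```python
import subprocess
import sys
import threading

# Stand-in for tcpdump (an assumption for illustration): prints a banner to
# stderr, then keeps writing a few more lines before exiting.
FAKE_CHILD = (
    "import sys, time\n"
    "print('tcpdump: listening on en0', file=sys.stderr, flush=True)\n"
    "for i in range(3):\n"
    "    print('more output', i, file=sys.stderr, flush=True)\n"
    "    time.sleep(0.1)\n"
)


def start_and_wait_for_banner(cmd, banner, timeout=10.0):
    """Start cmd and wait until banner shows up on its stderr.

    Returns (proc, ready); ready is False if the timeout expired first.
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    ready = threading.Event()

    def drain():
        # Keep reading until EOF so the child never blocks on a full pipe.
        for line in proc.stderr:
            if banner in line:
                ready.set()

    threading.Thread(target=drain, daemon=True).start()
    return proc, ready.wait(timeout)


proc, ready = start_and_wait_for_banner(
    [sys.executable, "-c", FAKE_CHILD], "listening on"
)
# ... the code to be measured would run here ...
exit_code = proc.wait()
```

The background thread keeps draining stderr for the lifetime of the child, which is what makes the pipe safe, at the cost of the second thread I was hoping to avoid.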

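From my understanding, using subprocess.PIPE for stderr and stdout without committing to reading all of the output is a big no-no, since the child will end up blocking when the buffer fills up. But can you "detach" (destroy?) the pipe mid-execution if you're only interested in reading the first part of the output? Essentially I want to end up with something like this: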

duration = 15
nif = 'en0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]
tcpdump_proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
for l in tcpdump_proc.stderr:
    if 'tcpdump: listening on' in l:
        break
socket.gethostbyname('example.com')
time.sleep(duration) # sleep at least as long as tcpdump is running

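What else do I need to add in the if block to "reassign" who is in charge of reading stderr? Can I just set stderr back to None (tcpdump_proc.stderr = None)? Or should I call tcpdump_proc.stderr.close() (and will tcpdump terminate early if I do so)?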

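It could also very well be that I'm missing something obvious and that there is a much better approach to achieve what I want - if so, please enlighten me :).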

Thanks in advance :)

Answer 1

You can use detach() or close() on stderr after receiving the listening on message:

import subprocess
import time

duration = 10
nif = 'eth0'
pcap = 'dns.pcap'
cmd = ['tcpdump', '-G', str(duration), '-W', '1', '-i', nif, '-w', pcap]

proc = subprocess.Popen(
    cmd, shell=False, stderr=subprocess.PIPE, bufsize=1, text=True
)
for i, line in enumerate(proc.stderr):
    print('read %d lines from stderr' % i)
    if 'listening on' in line:
        print('detach stderr!')
        proc.stderr.detach()
        break

while proc.poll() is None:
    print("doing something else while tcpdump is runnning!")
    time.sleep(2)

print(proc.returncode)
print(proc.stderr.read())

Output:

read 0 lines from stderr
detach stderr!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
doing something else while tcpdump is runnning!
0
Traceback (most recent call last):
  File "x.py", line 24, in <module>
    print(proc.stderr.read())
ValueError: underlying buffer has been detached
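
The traceback only shows that the parent can no longer read through the detached wrapper; it says nothing about the child. What a child sees once the read end of its stderr pipe has been closed can be checked with a stand-in process. The sketch below assumes a POSIX system and uses a small Python child (CHILD is a made-up name, and this is not tcpdump) that reports a failed stderr write through its exit code.

```python
import subprocess
import sys

# Stand-in child (an assumption; not tcpdump): writes a banner, waits for a
# line on stdin, then tries to write to stderr again.
CHILD = (
    "import os, sys\n"
    "print('listening on en0', file=sys.stderr, flush=True)\n"
    "sys.stdin.readline()\n"
    "try:\n"
    "    print('late message', file=sys.stderr, flush=True)\n"
    "except BrokenPipeError:\n"
    "    os._exit(3)  # report the failed write through the exit code\n"
    "os._exit(0)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
first_line = proc.stderr.readline()  # read only the banner
proc.stderr.close()                  # the pipe now has no reader
proc.stdin.write("go\n")             # let the child attempt its next write
proc.stdin.close()                   # closing also flushes the buffered "go"
exit_code = proc.wait()
```

On a POSIX system I would expect exit_code to be 3 here, meaning the later write failed with a broken pipe. The Python child survives because the interpreter ignores SIGPIPE; a program that leaves SIGPIPE at its default disposition would be terminated by the signal instead. In CPython the buffer returned by detach() is likely closed as soon as it is garbage-collected, so discarding it probably ends up equivalent to close().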

Note:

I haven't checked what actually happens to the stderr data, but detaching stderr doesn't seem to have any impact on tcpdump.
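
If dropping the pipe mid-run feels too uncertain, one single-threaded alternative (a different technique from detach(), sketched here under assumptions) is to avoid the pipe altogether: point the child's stderr at a file and poll that file for the listening on line. A file never fills up the way a pipe buffer does, so nobody has to keep reading. CHILD is again a stand-in for tcpdump, and wait_for_line is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile
import time

# Stand-in for tcpdump (an assumption for illustration).
CHILD = (
    "import sys, time\n"
    "time.sleep(0.2)\n"
    "print('tcpdump: listening on en0', file=sys.stderr, flush=True)\n"
    "print('more output', file=sys.stderr, flush=True)\n"
)


def wait_for_line(path, needle, proc, timeout=10.0, interval=0.05):
    """Poll the file at path until needle appears in it.

    Returns False if the child exits or the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check for exit *before* reading, so output written just before
        # the child exited is still seen by the read below.
        exited = proc.poll() is not None
        with open(path, "r", errors="replace") as f:
            if needle in f.read():
                return True
        if exited or time.monotonic() >= deadline:
            return False
        time.sleep(interval)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "stderr.log")
    with open(log_path, "w") as log:
        # The child inherits its own copy of the file descriptor, so the
        # parent can close its handle right after starting the process.
        proc = subprocess.Popen([sys.executable, "-c", CHILD], stderr=log)
    ready = wait_for_line(log_path, "listening on", proc)
    # ... the code to be measured would run here ...
    exit_code = proc.wait()
```

The polling interval adds a small delay before the measured code starts, which should be harmless here because tcpdump is already capturing by then.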
