Real time logging to file with python subprocess

Question

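I expect this is really simple, but I can't work it out.

I am trying to write the output of a DD imaging subprocess to a log file in real time. I'm using DD v 8.25, where the 'status=progress' option makes it write regular progress updates to stderr.

I can get it to log the full output in real time by passing the file object to stderr, i.e.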

import subprocess

log_file = open('mylog.log', 'a')
p = subprocess.Popen(['dd command...'], stdout=None, stderr=log_file)

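...but I would prefer to intercept the strings from stderr first, so I can parse them before writing to the file.

I have tried threading, but I can't seem to get it to write, or if it does, it only does so at the end of the process and not during it.

I'm a python noob, so example code would be appreciated. Thanks!

UPDATE - NOW WORKING (ISH)

I had a look at the link J.F. Sebastian suggested and found posts about using threads, so after that I used the "kill -USR1" trick to get DD to post its progress to stderr, which I could then pick up: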

#! /usr/bin/env python
from subprocess import PIPE, Popen
from threading import Thread
from queue import Queue, Empty
import time

q = Queue()

def parsestring(mystr):
    newstring = mystr[0:mystr.find('bytes')]
    return newstring

def enqueue(out, q):
    for line in out:
        q.put(line)
    out.close()

def getstatus():
    while proc1.poll() == None:
        proc2 = Popen(["kill -USR1 $(pgrep ^dd)"], bufsize=1, shell=True)
        time.sleep(2)

with open("log_file.log", mode="a") as log_fh:
    start_time = time.time()

    #start the imaging
    proc1 = Popen(["dd if=/dev/sda1 of=image.dd bs=524288 count=3000"], bufsize=1, stderr=PIPE, shell=True)

    #define and start the queue function thread
    t = Thread(target=enqueue, args=(proc1.stderr, q))
    t.daemon = True
    t.start()

    #define and start the getstatus function thread
    t_getstatus = Thread(target=getstatus, args=())
    t_getstatus.daemon = True
    t_getstatus.start()

    #get the string from the queue

    while proc1.poll() == None:
        try: nline = q.get_nowait()
        except Empty:
            continue
        else:
            mystr = nline.decode('utf-8')           
            if mystr.find('bytes') > 0:
                log_fh.write(str(time.time()) + ' - ' + parsestring(mystr))
                log_fh.flush()

        #put in a delay
        #time.sleep(2)

    #print duration
    end_time=time.time()
    duration=end_time-start_time
    print('Took ' + str(duration) + ' seconds')

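The only issue is that I can't work out how to improve performance. I only need it to report status every 2 seconds or so, but increasing the time delay increases the imaging time, which I don't want. That's a question for another post though...

Thanks to both J.F. Sebastian and Ali.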
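One thing that may be worth trying here, offered only as a sketch: the loop above calls q.get_nowait() and immediately retries on Empty, so it spins without pausing. A blocking q.get(timeout=...) lets the reader sleep until a line arrives. The names drain, handle, stop_event and demo below are hypothetical, and the producer thread merely stands in for the thread that reads dd's stderr.

```python
import queue
import threading
import time


def drain(q, handle, stop_event, timeout=0.5):
    """Pass queued items to handle() until stop_event is set and q is empty."""
    while not (stop_event.is_set() and q.empty()):
        try:
            item = q.get(timeout=timeout)  # blocks (sleeps) instead of spinning
        except queue.Empty:
            continue  # nothing arrived within the timeout; re-check stop_event
        handle(item)


def demo():
    q = queue.Queue()
    done = threading.Event()
    seen = []

    def producer():
        # Stand-in for the thread that reads the subprocess's stderr.
        for i in range(3):
            q.put(f"{i} bytes copied")
            time.sleep(0.05)
        done.set()  # set only after every item has been queued

    threading.Thread(target=producer, daemon=True).start()
    drain(q, seen.append, done, timeout=0.1)
    return seen
```

Calling demo() collects the three queued lines in order. The timeout only bounds how long the reader waits before re-checking the stop flag; it does not delay lines that are already queued.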

Answer 1

With this example it is possible (with python 3) to stream from stderr to the console:

#! /usr/bin/env python
from subprocess import Popen, PIPE

# emulate a program that writes to stderr
proc = Popen(["/usr/bin/yes 1>&2 "],  bufsize=512, stdout=PIPE, stderr=PIPE, shell=True)
r = b""
for line in proc.stderr:
    r += line
    print("current line", line, flush=True)

And to stream to a file:

#! /usr/bin/env python
from subprocess import Popen, PIPE

# append in binary mode; binary mode does not accept an encoding argument
with open("log_file.log", mode="ab") as log_fh:
    proc = Popen(["/usr/bin/yes 1>&2 "], bufsize=512, stdout=PIPE, stderr=PIPE, shell=True)
    r = b""
    # proc.stderr is a binary file-like object (it yields bytes)
    # iterate over it line by line
    for line in proc.stderr:
        r += line
        # print("current line", line, flush=True)
        log_fh.write(line)  # file opened in binary mode
        # log_fh.write(line.decode("utf8"))  # for a file opened in text mode
        log_fh.flush()  # flush the content
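For the text-mode variant mentioned in the commented-out line, a minimal sketch might look like the following. The helper names stream_stderr_to_log and demo are hypothetical, and a short-lived child Python process stands in for the real command so that the example terminates (yes runs until it is killed).

```python
import os
import subprocess
import sys
import tempfile


def stream_stderr_to_log(cmd, log_path, encoding="utf-8"):
    """Append each stderr line of cmd to log_path as it arrives.

    Returns (exit code, number of lines written).
    """
    count = 0
    with open(log_path, mode="a", encoding=encoding) as log_fh, \
            subprocess.Popen(cmd, stderr=subprocess.PIPE) as proc:
        for raw in proc.stderr:  # bytes, one line per iteration, until EOF
            log_fh.write(raw.decode(encoding, errors="replace"))
            log_fh.flush()  # make the line visible to readers immediately
            count += 1
    return proc.returncode, count


def demo():
    # Stand-in for a real command: a child Python that writes 3 lines to stderr.
    child = [sys.executable, "-c",
             "import sys\nfor i in range(3): print('line', i, file=sys.stderr)"]
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "log_file.log")
        code, count = stream_stderr_to_log(child, log_path)
        with open(log_path, encoding="utf-8") as fh:
            return code, count, fh.read().split()
```

Here the log file is opened in text mode, so each bytes line is decoded before it is written. The loop ends when the child closes its stderr, and leaving the with block waits for the process to exit.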

Answer 2

To display dd's progress report in a terminal and to save the (parsed) output to a log file:

#!/usr/bin/env python3
import io
from subprocess import PIPE, Popen
from time import monotonic as timer

cmd = "dd if=/dev/sda1 of=image.dd bs=524288 count=3000 status=progress".split()
with Popen(cmd, stderr=PIPE) as process, \
        open("log_file.log", "a") as log_file:
    start_time = timer()
    for line in io.TextIOWrapper(process.stderr, newline=''):
        print(line, flush=True, end='')  # no newline ('\n')
        if 'bytes' in line:
            # XXX parse line here, add flush=True if necessary
            print(line, file=log_file)
    # print duration
    print('Took {duration} seconds'.format(duration=timer() - start_time))
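The "XXX parse line here" placeholder could be filled in along these lines. This is only a sketch: parse_progress is a hypothetical helper, and it assumes that a dd progress line contains the running byte count as an integer followed by the word "bytes", as in the sample string in the comment.

```python
import re

# Assumed shape of a dd progress line, for illustration only:
#   "1572864000 bytes (1.6 GB, 1.5 GiB) copied, 3 s, 524 MB/s"
_BYTES_RE = re.compile(r"(\d+)\s+bytes")


def parse_progress(line):
    """Return the byte count found in a progress line, or None if there is none."""
    match = _BYTES_RE.search(line)
    return int(match.group(1)) if match else None
```

With this in place, the logging line could become something like print(parse_progress(line), file=log_file), so that only the byte count is recorded.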

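Notes:

no shell=True: you don't need the shell here. Popen() can run dd directly
no threads, no queues: you don't need them here
please, do not use while proc1.poll() == None. You don't need it here (you'll see EOF on proc1.stderr once proc1.poll() is not None), and you may lose data (there could be buffered content even if the process has already exited). Unrelated: if you need to compare with None, use is None instead of == None
io.TextIOWrapper(newline='') enables text mode (it uses locale.getpreferredencoding(False)) and it treats '\r' as a newline too
use the default bufsize=-1 (see io.DEFAULT_BUFFER_SIZE)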
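The points about '\r' and about reading until EOF can be tried without dd. In the sketch below, a small child Python script (an assumption, standing in for dd) writes three '\r'-terminated updates and one final '\n'-terminated line to stderr; read_progress is a hypothetical name.

```python
import io
import sys
from subprocess import PIPE, Popen

# Stand-in for dd: three '\r'-terminated updates, then one '\n'-terminated line.
CHILD = (
    "import sys, time\n"
    "for n in (100, 200, 300):\n"
    "    sys.stderr.write(f'{n} bytes copied\\r')\n"
    "    sys.stderr.flush()\n"
    "    time.sleep(0.05)\n"
    "sys.stderr.write('done\\n')\n"
)


def read_progress(cmd):
    """Collect stderr lines until EOF; with newline='' a '\\r' also ends a line."""
    lines = []
    with Popen(cmd, stderr=PIPE) as process:
        for line in io.TextIOWrapper(process.stderr, newline=''):
            lines.append(line)  # line endings are kept untranslated
    return lines, process.returncode


if __name__ == "__main__":
    lines, code = read_progress([sys.executable, "-c", CHILD])
    print(lines, code)
```

The loop needs no poll() call: it stops at EOF, which arrives only after everything the child wrote has been read, and the with block then waits for the process to exit.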
