以块的形式处理文本文件，其中每个块由时间戳分隔

Question

我正在尝试使用 Python 解析 iostat -xt 输出。 iostat 的怪癖是每秒的输出运行在多行上。例如：

06/30/2015 03:09:17 PM 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
           0.03    0.00    0.03    0.00    0.00   99.94 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
xvdap1            0.00     0.04    0.02    0.07     0.30     3.28    81.37     0.00   29.83    2.74   38.30   0.47   0.00 
xvdb              0.00     0.00    0.00    0.00     0.00     0.00    11.62     0.00    0.23    0.19    2.13   0.16   0.00 
xvdf              0.00     0.00    0.00    0.00     0.00     0.00    10.29     0.00    0.41    0.41    0.73   0.38   0.00 
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     9.12     0.00    0.36    0.35    1.20   0.34   0.00 
xvdh              0.00     0.00    0.00    0.00     0.00     0.00    33.35     0.00    1.39    0.41    8.91   0.39   0.00 
dm-0              0.00     0.00    0.00    0.00     0.00     0.00    11.66     0.00    0.46    0.46    0.00   0.37   0.00 

06/30/2015 03:09:18 PM 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
           0.00    0.00    0.50    0.00    0.00   99.50 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 

06/30/2015 03:09:19 PM 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
           0.00    0.00    0.50    0.00    0.00   99.50 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

基本上我需要解析 "chunks" 中的输出，其中每个块都由时间戳分隔。

我正在查看 itertools.groupby()，但这似乎并没有完全满足我在这里的要求 - 它似乎更适合分组行，其中每行由一个公共键联合起来，或者类似的东西你可以使用一个函数来检查。

另一个想法是这样的：

for line in f: 
    if line.count("/") == 2 and line.count(":") == 2: 
        current_time = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S') 
    while line.count("/") != 2 and line.count(":") != 2: 
        print(line) 
        continue

但这似乎不太奏效。

是否有 Pythonic 方法来解析上述 iostat 输出，并将其分成按时间戳分割的块？

Answer 1

您可以使用正则表达式：

import re
date_reg = "([0-9]{2}\/[0-9]{2}\/[0-9]{4} [0-9]{2}\:[0-9]{2}\:[0-9]{2} (?:AM|PM))"
def split_by_date(text_iter):
    date = None
    lines = []
    for line in text_iter:
        if re.match(date_reg, line):
            if lines or date:
                yield (date, lines)
            date = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S')
            lines = []
        else:
            lines.append(line)

    yield (date, lines)

for date, lines in split_by_date(f):
    # here you have lines for each encountered date
    for line in lines:
        print line

以块的形式处理文本文件，其中每个块由时间戳分隔

Processing textfile in chunks, where each chunk is separated by timestamp

python

itertools