Python: Extract a variable number of lines from a text file between markers

I have some text files with more than 1000 lines each. They contain lines in the following format:

seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)
seq call @ 2018/02/26 23:07:51 node: ttt
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
seq done @ 2018/02/26 23:07:55 node:ttt

seq call @ 2018/02/26 23:07:55 node: fff
Open the firewall
Firewall opened
seq done @ 2018/02/26 23:07:57 node: fff

seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)

seq call @ 2018/02/26 23:07:57 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)

SENDING UUTMonitor.exe /timeevent:PTEFIE
seq done @ 2018/02/26 23:07:58 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)

seq call @ 2018/02/26 23:07:58 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)

02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat

<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
seq done @ 2018/02/26 23:08:04 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)

seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios as \flags\bios_flash_wnd.trg file not exists

seq call @ 2018/02/26 23:08:04 node: aaa

Get SkeletonPO from \working\ubera.ini
seq done @ 2018/02/26 23:08:04 node: aaa

I want to extract the lines between seq call and seq done into a list, and insert NULL into the list wherever a line starts with seq open or seq log.

As you can see, there can be any number of lines between seq call and seq done, even zero. I have been trying to find an answer, but to no avail. I'm also new to Python.

Expected output for the example above:

NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
NULL
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat

<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini

Here is a quick and dirty way to get what you want:

def extractTxt(fpth, joinchar=' '):
    loglines = []
    with open(fpth) as f:
        incall = False   # are we between "seq call" and "seq done"?
        calllines = []   # stripped lines collected for the current call

        for line in f:
            if line.startswith(('seq open', 'seq log')):
                loglines.append('NULL')
            elif line.startswith('seq call'):
                incall = True
            elif incall:
                if line.startswith('seq done'):
                    incall = False
                    # join the non-empty lines of this call into one entry
                    call = joinchar.join(l for l in calllines if l)
                    calllines = []

                    # a call with no content becomes NULL
                    if not call.strip():
                        loglines.append('NULL')
                    else:
                        loglines.append(call)
                else:
                    calllines.append(line.strip())

    return loglines

extractTxt('seq.txt')

Output:

['NULL',
 'retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt BCU is working',
 'Open the firewall Firewall opened',
 'NULL',
 'SENDING UUTMonitor.exe /timeevent:PTEFIE',
 '02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat <BISCON Version=xxxx"> x y </BISCON> \process\ProcessInit.bat:::Parsing branding variables from INI files... found \flags\custom.ini PRODUCTIONLOCK not defined in custom.ini \process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data... 02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat',
 'NULL',
 'Get SkeletonPO from \working\ubera.ini']

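You can change how the individual lines of each call are joined into a single list entry by passing a different joinchar argument to extractTxt. I'll leave any further styling/organization as an exercise.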
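If you would rather keep each call's lines separate than pick a joinchar up front, one option is to yield a list per call and let the caller join it however it likes. The sketch below is a variation, not the answer's code: iter_calls and SAMPLE are hypothetical names, the sample is a trimmed, made-up copy of the log in the question, and the function accepts any iterable of lines, so a file object from open() works as well as io.StringIO.

```python
import io

def iter_calls(lines):
    """Yield 'NULL' for each 'seq open'/'seq log' line and for each call with
    no content; otherwise yield the list of non-blank lines of one call."""
    in_call = False
    call_lines = []
    for line in lines:
        if line.startswith(('seq open', 'seq log')):
            yield 'NULL'
        elif line.startswith('seq call'):
            in_call = True
            call_lines = []  # fresh list, so lists yielded earlier stay untouched
        elif in_call:
            if line.startswith('seq done'):
                in_call = False
                yield call_lines if call_lines else 'NULL'
            elif line.strip():
                call_lines.append(line.strip())

# Made-up sample shaped like the log in the question
SAMPLE = """\
seq open @ 2018/02/26 23:07:51 node: wroot
seq call @ 2018/02/26 23:07:51 node: ttt
BCU is working
seq done @ 2018/02/26 23:07:55 node:ttt

seq call @ 2018/02/26 23:07:57 node: empty
seq done @ 2018/02/26 23:07:57 node: empty

seq log @ 2018/02/26 23:08:04 node: skipping
seq call @ 2018/02/26 23:08:04 node: aaa

Get SkeletonPO
seq done @ 2018/02/26 23:08:04 node: aaa
"""

# The caller chooses the separator, here one line per row
for entry in iter_calls(io.StringIO(SAMPLE)):
    print(entry if entry == 'NULL' else '\n'.join(entry))
```

With a real file you would write with open('seq.txt') as f: entries = list(iter_calls(f)), building the list inside the with block, because the generator reads from the file lazily.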

Details

The line:

call = joinchar.join(l for l in calllines if l)

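does a few different things. The join method concatenates the strings from an iterable (such as a list), placing the string it is called on between each pair of items. For example, the following expression: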

', '.join(['foo', 'bar', 'baz', 'bof'])

will produce this output:

'foo, bar, baz, bof'
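In this answer the filter in front of join matters, because calllines can contain empty strings (blank lines become '' after strip()), and join would still put a separator on each side of them. A small illustration with made-up values:

```python
# Stripped lines of one call; the blank line became '' after strip()
calllines = ['Open the firewall', '', 'Firewall opened']

# Without a filter, the empty string still gets a separator on each side
unfiltered = ' '.join(calllines)
# With the "if l" filter, empty strings are skipped before joining
filtered = ' '.join(l for l in calllines if l)

print(repr(unfiltered))  # 'Open the firewall  Firewall opened'  (two spaces)
print(repr(filtered))    # 'Open the firewall Firewall opened'
```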

The part of the line inside the parentheses:

l for l in calllines if l

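is what's called a generator expression. It is a bit involved to explain, but what it does here is produce, one at a time, every line in calllines that is not empty, much like a list of the non-empty lines would (without actually building that list). If you're curious, see the Python documentation on generator expressions for more details. You can make the line a little easier to follow by expanding it into an explicit loop. The following lines: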

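call = ''
for l in calllines:
    # l evaluates to False if it is empty
    if l:
        call += l + joinchar

# remove the trailing joinchar (skipped when joinchar is empty,
# because call[:-0] is call[:0], which would wipe out the whole string)
if joinchar and call.endswith(joinchar):
    call = call[:-len(joinchar)]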

have the same effect as the one-liner call = joinchar.join(l for l in calllines if l).

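The equivalence between the loop and the one-liner is easy to check by running both side by side. The sketch below wraps the expanded loop in a hypothetical helper, join_with_loop (including a guard for an empty joinchar, since call[:-0] evaluates to call[:0], the empty string), and compares it with the generator-expression version on a few made-up inputs:

```python
def join_with_loop(calllines, joinchar=' '):
    """Explicit-loop version of joinchar.join(l for l in calllines if l)."""
    call = ''
    for l in calllines:
        if l:  # skip empty strings
            call += l + joinchar
    # Drop the single trailing separator; guard against an empty joinchar,
    # where call[:-0] would slice the whole string away
    if joinchar and call.endswith(joinchar):
        call = call[:-len(joinchar)]
    return call

samples = [[], [''], ['a'], ['a', '', 'b'], ['x', 'y', 'z']]
for joinchar in (' ', ', ', '\n', ''):
    for calllines in samples:
        expected = joinchar.join(l for l in calllines if l)
        assert join_with_loop(calllines, joinchar) == expected
print('loop and join agree')
```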
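import re

# Lines starting with either phrase are replaced by NULL
begins_with_open_or_log = re.compile(r'seq open|seq log')
# Marker lines that are dropped from the output
begins_with_call_or_done = re.compile(r'seq call|seq done')

with open('log.txt') as f:
    lines = f.readlines()

for i, line in enumerate(lines):
    # Pattern.match() only matches at the start of the line
    if begins_with_open_or_log.match(line):
        lines[i] = 'NULL\n'
    # drop the seq call / seq done markers and empty lines
    elif begins_with_call_or_done.match(line) or line == '\n':
        lines[i] = ''

print(''.join(lines), end='')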

I want to extract the lines between seq call and seq done in a list and insert NULL in the list if the line starts with seq open or seq log.

This is probably the output you want:

NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini

However, if you really mean:

I want to extract the lines between seq call and seq done

note, for example, that the line

retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt

is not part of your output... You need to be as precise as possible.
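If "between seq call and seq done" is taken literally, one way to be precise is to track whether the current line is inside a call block and ignore everything else. The sketch below rests on assumptions about the intended semantics: it emits NULL for an empty call (as the question's expected output does), skips blank lines, and drops text that sits outside any call; extract_between and SAMPLE are hypothetical names and the sample is made up.

```python
import io

def extract_between(lines):
    """Flat list of the lines strictly between 'seq call' and 'seq done',
    with 'NULL' for every 'seq open'/'seq log' line and for every empty call."""
    result = []
    in_call = False
    found = False  # has the current call produced any non-blank line yet?
    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith(('seq open', 'seq log')):
            result.append('NULL')
        elif line.startswith('seq call'):
            in_call, found = True, False
        elif line.startswith('seq done'):
            if in_call and not found:
                result.append('NULL')  # call with nothing in it
            in_call = False
        elif in_call and line.strip():
            result.append(line)
            found = True
        # anything else (blank lines, text outside a call) is ignored
    return result

# Made-up sample; the "stray" line sits outside any call and is dropped
SAMPLE = """\
seq open @ 2018/02/26 23:07:51 node: wroot
seq call @ 2018/02/26 23:07:51 node: ttt
BCU is working
seq done @ 2018/02/26 23:07:55 node:ttt
stray line outside any call
seq call @ 2018/02/26 23:07:57 node: empty
seq done @ 2018/02/26 23:07:57 node: empty

seq log @ 2018/02/26 23:08:04 node: skipping
"""

print(extract_between(io.StringIO(SAMPLE)))
# ['NULL', 'BCU is working', 'NULL', 'NULL']
```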


Note: for Python 2.7, change this line

print(''.join(lines), end='')

to this one:

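print ''.join(lines)

That version prints one extra newline at the end. Alternatively, put from __future__ import print_function at the very top of the script and keep the Python 3 line unchanged.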
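The regex answer works because re.match (and Pattern.match) only tries to match at the beginning of the string, so r'seq open|seq log' behaves as a "starts with" test, whereas re.search would also accept the phrase in the middle of a line. The sketch below demonstrates that, then shows an equivalent filter written with str.startswith and a tuple of prefixes; filter_log and the sample lines are hypothetical.

```python
import re

begins_with_open_or_log = re.compile(r'seq open|seq log')

# match() only tries position 0, so the alternation acts as a "starts with" test
print(begins_with_open_or_log.match('seq log @ 2018/02/26 23:08:04 node: skipping') is not None)  # True
print(begins_with_open_or_log.match('skipping seq log') is not None)   # False
# search() scans the whole string and would accept the second line too
print(begins_with_open_or_log.search('skipping seq log') is not None)  # True

def filter_log(lines):
    """Same rules as the regex answer, written with str.startswith."""
    out = []
    for line in lines:
        if line.startswith(('seq open', 'seq log')):
            out.append('NULL\n')
        elif line.startswith(('seq call', 'seq done')) or line == '\n':
            continue  # drop the markers and blank lines
        else:
            out.append(line)
    return out

sample = ['seq open x\n', 'seq call y\n', 'BCU is working\n', '\n',
          'seq done y\n', 'seq log z\n']
print(''.join(filter_log(sample)), end='')
```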