Python: Extract random no. of lines from a text file between markers
I have some text files with 1000+ lines. They contain lines in the following format:
seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)
seq call @ 2018/02/26 23:07:51 node: ttt
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
seq done @ 2018/02/26 23:07:55 node:ttt
seq call @ 2018/02/26 23:07:55 node: fff
Open the firewall
Firewall opened
seq done @ 2018/02/26 23:07:57 node: fff
seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
seq call @ 2018/02/26 23:07:57 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)
SENDING UUTMonitor.exe /timeevent:PTEFIE
seq done @ 2018/02/26 23:07:58 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)
seq call @ 2018/02/26 23:07:58 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
seq done @ 2018/02/26 23:08:04 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)
seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios as \flags\bios_flash_wnd.trg file not exists
seq call @ 2018/02/26 23:08:04 node: aaa
Get SkeletonPO from \working\ubera.ini
seq done @ 2018/02/26 23:08:04 node: aaa
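I want to extract the lines between seq call and seq done into a list, and insert NULL into the list if a line starts with seq open or seq log.
As you can see, there can be any random number of lines between seq call and seq done, even 0. I have been struggling to find an answer, but in vain. I am also new to Python.
Expected output for the example above: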
NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
NULL
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini
Here's a quick and dirty way to get what you want:
def extractTxt(fpth, joinchar=' '):
    loglines = []
    with open(fpth) as f:
        incall = False      # True while we are between 'seq call' and 'seq done'
        calllines = []      # body lines of the current call block
        for line in f:
            if line.startswith('seq open') or line.startswith('seq log'):
                loglines.append('NULL')
            elif line.startswith('seq call'):
                incall = True
            elif incall:
                if line.startswith('seq done'):
                    # end of the block: join its non-empty lines into one entry
                    incall = False
                    call = joinchar.join(l for l in calllines if l)
                    calllines = []
                    if not call.strip():
                        loglines.append('NULL')   # empty call block
                    else:
                        loglines.append(call)
                else:
                    calllines.append(line.strip())
    return loglines

extractTxt('seq.txt')
Output:
['NULL',
 'retrieve BIOS data using F:\\tools64\\BiosConfigUtility64.exe /GetConfig:\\working\\bcudump.txt BCU is working',
 'Open the firewall Firewall opened',
 'NULL',
 'SENDING UUTMonitor.exe /timeevent:PTEFIE',
 '02/26/2018 23:07:59 : @@@@ begin_\\process\\ProcessInit.bat <BISCON Version=xxxx"> x y </BISCON> \\process\\ProcessInit.bat:::Parsing branding variables from INI files... found \\flags\\custom.ini PRODUCTIONLOCK not defined in custom.ini \\process\\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data... 02/26/2018 23:08:04 : @@@@ end\\process\\ProcessInit.bat',
 'NULL',
 'Get SkeletonPO from \\working\\ubera.ini']
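The marker test in extractTxt calls startswith twice. str.startswith also accepts a tuple of prefixes and returns True if any of them matches, so the check can be written once. A small sketch; the helper name is_null_marker and the constant NULL_PREFIXES are hypothetical, not part of the answer:

```python
# Hypothetical constant: the prefixes that should produce a NULL entry
NULL_PREFIXES = ('seq open', 'seq log')

def is_null_marker(line):
    """Return True if the line is a 'seq open' or 'seq log' marker."""
    # startswith accepts a tuple; it is True if any prefix matches at position 0
    return line.startswith(NULL_PREFIXES)

print(is_null_marker('seq open @ 2018/02/26 23:07:51 node: \\nodes\\wroot.nod (wroot)'))  # True
print(is_null_marker('seq call @ 2018/02/26 23:07:51 node: ttt'))                          # False
```

Because the match is anchored at the start of the string, a line with leading whitespace before "seq open" would not count as a marker.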
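You can change how the individual lines within each call are joined into a single list entry by passing a different joinchar argument to extractTxt. I'll leave any further styling/organization tasks as an exercise.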
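To try different joinchar values without creating a file, here is a sketch that assumes the same state machine as extractTxt but reads from any iterable of lines (for example io.StringIO). The name extract_from_lines is hypothetical; the sample input is copied from the question:

```python
import io

def extract_from_lines(lines, joinchar=' '):
    """Same logic as extractTxt, but takes any iterable of lines instead of a path."""
    loglines = []
    incall = False
    calllines = []
    for line in lines:
        if line.startswith(('seq open', 'seq log')):
            loglines.append('NULL')
        elif line.startswith('seq call'):
            incall = True
        elif incall:
            if line.startswith('seq done'):
                incall = False
                call = joinchar.join(l for l in calllines if l)
                calllines = []
                # an empty call block becomes NULL, as in extractTxt
                loglines.append(call if call.strip() else 'NULL')
            else:
                calllines.append(line.strip())
    return loglines

sample = io.StringIO(
    "seq open @ 2018/02/26 23:07:51 node: \\nodes\\wroot.nod (wroot)\n"
    "seq call @ 2018/02/26 23:07:55 node: fff\n"
    "Open the firewall\n"
    "Firewall opened\n"
    "seq done @ 2018/02/26 23:07:57 node: fff\n"
)
print(extract_from_lines(sample, joinchar=' | '))
# ['NULL', 'Open the firewall | Firewall opened']
```

Passing joinchar='\n' instead keeps each call's lines on separate lines inside a single list entry.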
Details
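The line:
call = joinchar.join(l for l in calllines if l)
does a few different things. The join method joins a list of strings together, using the string it is called on as the separator. For example, the following expression: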
', '.join(['foo', 'bar', 'baz', 'bof'])
will produce this output:
'foo, bar, baz, bof'
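Note that join puts the separator between every pair of items, so an empty string in the list still contributes a separator and produces a doubled one. That is why the one-liner filters out empty lines before joining. A short illustration with values taken from the sample log:

```python
calllines = ['Open the firewall', '', 'Firewall opened']

# the `if l` filter drops the empty string before join sees it
joined = ' '.join(l for l in calllines if l)
print(joined)            # Open the firewall Firewall opened

# without the filter, the empty item still gets a separator on each side
unfiltered = ' '.join(calllines)
print(repr(unfiltered))  # 'Open the firewall  Firewall opened'
```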
The part of the line inside the parentheses:
l for l in calllines if l
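is something called a generator expression. It's a bit complicated to explain, but basically what it does here is produce a "list" of all the non-empty lines in calllines. If you're curious, the Python documentation on generator expressions has more details. You can demystify this line a bit by expanding it out. All told, the following lines: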
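call = ''
for l in calllines:
    # l evaluates to False if it is empty
    if l:
        call += l + joinchar
# remove any trailing joinchar (checking joinchar first avoids call[:-0],
# which would wipe out the whole string when joinchar is '')
if joinchar and call.endswith(joinchar):
    call = call[:-len(joinchar)]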
have the same effect as the single line call = joinchar.join(l for l in calllines if l).
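One way to check that the two forms agree is to wrap each in a function and compare them on a few inputs, including an empty block and an empty joinchar. The function names join_one_liner and join_expanded are hypothetical. The expanded version here tests joinchar before stripping, because call[:-0] evaluates to '' and would otherwise discard the whole result when joinchar is empty:

```python
def join_one_liner(calllines, joinchar=' '):
    return joinchar.join(l for l in calllines if l)

def join_expanded(calllines, joinchar=' '):
    call = ''
    for l in calllines:
        # l evaluates to False if it is empty
        if l:
            call += l + joinchar
    # strip exactly one trailing joinchar; skip when joinchar is ''
    if joinchar and call.endswith(joinchar):
        call = call[:-len(joinchar)]
    return call

cases = [
    ['retrieve BIOS data', 'BCU is working'],
    [],                                   # a call block with no lines
    ['', 'SENDING UUTMonitor.exe', ''],   # blank lines inside a block
]
for joinchar in (' ', ', ', ''):
    for calllines in cases:
        assert join_one_liner(calllines, joinchar) == join_expanded(calllines, joinchar)
print('both forms agree')
```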
import re

# re's match() only matches at the start of the line, so these act as "starts with" tests
begins_with_open_or_log = re.compile(r'seq open|seq log')
begins_with_call_and_done = re.compile(r'seq call|seq done')

with open('log.txt') as f:
    lines = f.readlines()

for i, line in enumerate(lines):
    if begins_with_open_or_log.match(line):
        lines[i] = 'NULL\n'     # open/log marker -> NULL
    elif begins_with_call_and_done.match(line):
        lines[i] = ''           # drop the call/done markers themselves
    elif line == '\n':
        lines[i] = ''           # drop blank lines

print(''.join(lines), end='')
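This approach relies on the fact that re.match (and a compiled pattern's match method) only matches at the beginning of the string, so the patterns behave like "starts with" checks. re.search, by contrast, scans the whole line and would also flag a marker mentioned mid-line. A small illustration; the second line is made up for contrast:

```python
import re

begins_with_open_or_log = re.compile(r'seq open|seq log')

marker = 'seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios'
mention = 'note: see seq log for details'   # hypothetical line, not from the log

# match() is anchored at position 0
print(bool(begins_with_open_or_log.match(marker)))    # True
print(bool(begins_with_open_or_log.match(mention)))   # False

# search() looks anywhere in the string, so it would flag the second line too
print(bool(begins_with_open_or_log.search(mention)))  # True
```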
I want to extract the lines between seq call and seq done in a list and insert NULL in the list if the line starts with seq open or seq log.
This is probably the output you want:
NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini
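The output above has no NULL for the empty call block (the wchkefierror node), whereas the question's expected output does. A minimal sketch that emits one entry per body line and also emits NULL for a call block with no lines; the generator name extract_lines is hypothetical, and treating a block that contains only blank lines as empty is an assumption:

```python
import io

def extract_lines(lines):
    """Yield body lines between 'seq call' and 'seq done', 'NULL' for
    'seq open'/'seq log' markers, and 'NULL' for empty call blocks."""
    incall = False
    seen_body = False
    for raw in lines:
        line = raw.rstrip('\r\n')
        if line.startswith(('seq open', 'seq log')):
            yield 'NULL'
        elif line.startswith('seq call'):
            incall, seen_body = True, False
        elif line.startswith('seq done'):
            if incall and not seen_body:
                yield 'NULL'  # the block had no (non-blank) lines
            incall = False
        elif incall and line.strip():
            seen_body = True
            yield line

sample = io.StringIO(
    "seq call @ 2018/02/26 23:07:55 node: fff\n"
    "Open the firewall\n"
    "Firewall opened\n"
    "seq done @ 2018/02/26 23:07:57 node: fff\n"
    "seq call @ 2018/02/26 23:07:57 node: \\nodes\\wchkefierror.bat (wroot#9^wchkefierror)\n"
    "seq done @ 2018/02/26 23:07:57 node: \\nodes\\wchkefierror.bat (wroot#9^wchkefierror)\n"
)
for out in extract_lines(sample):
    print(out)
# Open the firewall
# Firewall opened
# NULL
```

With a real file, `with open('log.txt') as f: result = list(extract_lines(f))` collects the entries into a list.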
However, if you literally mean:
I want to extract the lines between seq call and seq done
then notice, for example, that the line
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
is not part of your output... You need to be as precise as possible.
Note: for Python 2.7, change this line:
print(''.join(lines), end='')
to this:
print ''.join(lines)
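If the script must run unchanged on both Python 2.7 and Python 3, one option is to avoid print altogether: sys.stdout.write exists in both and, like print(..., end=''), adds no trailing newline. (Another standard option is `from __future__ import print_function` at the top of the file, which makes the Python 3 print call valid in 2.7.) A sketch; write_lines is a hypothetical helper name:

```python
import io
import sys

def write_lines(lines, stream=None):
    """Write already-newline-terminated lines without adding an extra newline."""
    # look up sys.stdout at call time so redirection still works
    if stream is None:
        stream = sys.stdout
    stream.write(''.join(lines))

buf = io.StringIO()
write_lines(['NULL\n', 'BCU is working\n'], stream=buf)
print(repr(buf.getvalue()))  # 'NULL\nBCU is working\n'
```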