Extract lines between specific start/end patterns from a text file
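I want to extract the lines between a specified start pattern (inclusive) and end pattern (exclusive).
My code below does extract some lines, but it misses the first line, the one that matches the start pattern.
In my desired output I also want that first matching line.
Code attempt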
import re
import xlsxwriter
linenum = 0
myline = []
pattern_start = re.compile(r"^vsi ipcbb")
pattern_stop = re.compile(r"^vsi ipcbb-ipran")
with open(r'readline.txt', 'rt') as myfile:
    for row in myfile:
        if pattern_start.search(row) != None:
            for line in myfile:
                linenum += 1
                if pattern_stop.search(line) != None:
                    break
                myline.append((linenum, line.rstrip('\n')))
with xlsxwriter.Workbook('readline.xlsx') as workbook:
    worksheet = workbook.add_worksheet('VSI')
    for row_num, data in enumerate(myline):
        worksheet.write_row(row_num + 0, 0, data)
Input as a text file
!Last configuration was updated at 2021-04-22 05:52:21 UTC by
!Last configuration was saved at 2021-04-22 19:00:49 UTC by
!PdtPrivateInfo = System current forwarding-mode: compatible
!MKHash 0000000000000000
vsi ipcbb-RAC_YBPNM01H-00 static
description *** M-ipcbb-RAC_YBPNM01H(via RAG_MBSPM01H&RAG_YBPNM01H) ***
tnl-policy TE
diffserv-mode pipe af1 green
#
vsi ipcbb-ipran-RSG_NKY2M-00 static
description *** IPCBB-IPRAN VLAN61 Inherit(RAG_NKY2M01H-RAG_NKY2M02H) ***
tnl-policy TE
diffserv-mode pipe af1 green
#
Actual output (extracted lines)
description *** M-ipcbb-RAC_YBPNM01H(via RAG_MBSPM01H&RAG_YBPNM01H) ***
tnl-policy TE
diffserv-mode pipe af1 green
#
Desired output (extracted lines)
vsi ipcbb-RAC_YBPNM01H-00 static
description *** M-ipcbb-RAC_YBPNM01H(via RAG_MBSPM01H&RAG_YBPNM01H) ***
tnl-policy TE
diffserv-mode pipe af1 green
#
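You can use a boolean mode flag such as extract_on, which signals whether the current line lies between start and stop and should therefore be extracted.
The line matching can also be done with the re.match function, which returns a match object or None.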
import re
pattern_start = re.compile(r"^vsi ipcbb")
pattern_stop = re.compile(r"^vsi ipcbb-ipran")
i = 0
extract_on = False
extracts = []
with open(r'readline.txt', 'rt') as myfile:
    for line in myfile:
        i += 1  # line counting starts with 1
        if pattern_start.match(line):
            extract_on = True   # start line reached: begin extracting (inclusive)
        if pattern_stop.search(line):
            extract_on = False  # stop line reached: end extracting (exclusive)
        if extract_on:
            extracts.append((i, line.rstrip('\n')))
for line in extracts:
    print(line)
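Given your input, it ignores the first 4 lines, extracts the middle 5 lines, and ignores the last 5 lines.
The printed output of the extracted lines, including their position in the file, is therefore: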
(5, 'vsi ipcbb-RAC_YBPNM01H-00 static')
(6, ' description *** M-ipcbb-RAC_YBPNM01H(via RAG_MBSPM01H&RAG_YBPNM01H) ***')
(7, ' tnl-policy TE')
(8, ' diffserv-mode pipe af1 green')
(9, '#')
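The same flag logic can be wrapped in a small generator so it can be tried without a file on disk. This is only a sketch: the function name `extract_between` and the shortened inline sample are assumptions for illustration, not part of the code above. It checks the stop pattern first, because with these two patterns the start pattern `^vsi ipcbb` also matches the stop line.

```python
import re


def extract_between(lines, pattern_start, pattern_stop):
    """Yield (line_number, text) from the start match (inclusive)
    up to the stop match (exclusive).

    Hypothetical helper; `lines` is any iterable of strings,
    for example an open text file.
    """
    extract_on = False
    for number, line in enumerate(lines, start=1):
        if pattern_stop.match(line):
            # Checked first: the start pattern is a prefix of the stop pattern.
            extract_on = False
        elif pattern_start.match(line):
            extract_on = True
        if extract_on:
            yield number, line.rstrip('\n')


# Shortened stand-in for readline.txt (assumption for illustration).
sample = [
    "!MKHash 0000000000000000\n",
    "vsi ipcbb-RAC_YBPNM01H-00 static\n",
    " tnl-policy TE\n",
    "#\n",
    "vsi ipcbb-ipran-RSG_NKY2M-00 static\n",
    " tnl-policy TE\n",
]
start = re.compile(r"^vsi ipcbb")
stop = re.compile(r"^vsi ipcbb-ipran")
result = list(extract_between(sample, start, stop))
print(result)
```

Passing an open file instead of the list, as in `extract_between(myfile, start, stop)`, should behave the same way, since a file object also yields one line at a time.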
The XLS writing is omitted here, assuming it works as expected.
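As background on why the nested-loop attempt lost the first line: a file object is its own iterator, so the inner `for line in myfile` resumes after the row that the outer loop has already consumed, and that row is never appended. The sketch below reproduces this with `io.StringIO` standing in for the file (an assumption, to keep the example self-contained). The function name, the `keep_start_row` switch, and the `break` after the inner loop are additions for illustration only.

```python
import io
import re

pattern_start = re.compile(r"^vsi ipcbb")
pattern_stop = re.compile(r"^vsi ipcbb-ipran")

# Shortened stand-in for readline.txt (assumption for illustration).
text = (
    "!MKHash 0000000000000000\n"
    "vsi ipcbb-RAC_YBPNM01H-00 static\n"
    " tnl-policy TE\n"
    "#\n"
    "vsi ipcbb-ipran-RSG_NKY2M-00 static\n"
)


def nested_loop_extract(myfile, keep_start_row):
    """Mimic the nested-loop attempt; optionally keep the row that matched the start."""
    found = []
    for row in myfile:
        if pattern_start.match(row):
            if keep_start_row:
                # The outer loop already consumed this row, so add it here.
                found.append(row.rstrip('\n'))
            for line in myfile:  # same iterator: continues after `row`
                if pattern_stop.match(line):
                    break
                found.append(line.rstrip('\n'))
            break  # added here: stop after the first block
    return found


without_first = nested_loop_extract(io.StringIO(text), keep_start_row=False)
with_first = nested_loop_extract(io.StringIO(text), keep_start_row=True)
print(without_first)
print(with_first)
```

So a minimal change to the original attempt would be to append `row` before entering the inner loop, although the flag-based version avoids the nested iteration altogether.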