How to parse multiline block text with Python and regex when the content differs from block to block?
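I have a configuration file that I need to parse. My idea is to put it into a dictionary later on, using regex groups in Python.
The problem I'm facing is that not every block of text contains exactly the same lines. So far my regex works for the block with the most lines, but of course it only matches that single block.
How do I do a multiline match when, for example, some "set" lines are omitted in some blocks?
Do I need to break up the regex and use if/elif/true-false statements to work through this? That doesn't feel Pythonic, IMHO.
I'm pretty sure I will have to break up my big regex and work through it sequentially: if a line matches, then ..., else skip to the next regex-matching line.
I was thinking of putting every block, from edit to next, into a list element to be parsed separately. Or can I do the whole thing in one go?
I have some ideas, but I would like a Pythonic way of doing it.
As always, your help is much appreciated.
Thank you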
TEXT, where the block to be matched runs from edit to next. Not every block contains the same "set" statements:
edit "port11"
set vdom "ACME_Prod"
set vlanforward enable
set type physical
set device-identification enable
set snmp-index 26
next
edit "port21"
set vdom "ACME_Prod"
set vlanforward enable
set type physical
set snmp-index 27
next
edit "port28"
set vdom "ACME_Prod"
set vlanforward enable
set type physical
set snmp-index 28
next
edit "port29"
set vdom "ACME_Prod"
set ip 174.244.244.244 255.255.255.224
set allowaccess ping
set vlanforward enable
set type physical
set alias "Internet-IRISnet"
set snmp-index 29
next
edit "port20"
set vdom "root"
set ip 192.168.1.1 255.255.255.0
set allowaccess ping https ssh snmp fgfm
set vlanforward enable
set type physical
set snmp-index 39
next
edit "port25"
set vdom "root"
set allowaccess fgfm
set vlanforward enable
set type physical
set snmp-index 40
next
Code snippet:
import re, pprint
file = "interfaces_2016_10_12.conf"
try:
    fileopen = open(file, 'r')
    output = open('output.txt', 'w+')
except IOError:
    exit("Input file does not exist, exiting script.")
# read whole config in 1 go instead of iterating line by line
text = fileopen.read()
# my verbose regex, verbose so it is more readable !
pattern = r'''^                        # use r for multiline usage
    \s+edit\s\"(.*)\"\n                # group(1) match int name
    \s+set\svdom\s\"(.*)\"\n           # group(2) match vdom name
    \s+set\sip\s(.*)\n                 # group(3) match interface ip
    \s+set\sallowaccess\s(.*)\n        # group(4) match allowaccess
    \s+set\svlanforward\s(.*)\n        # group(5) match vlanforward
    \s+set\stype\s(.*)\n               # group(6) match type
    \s+set\salias\s\"(.*)\"\n          # group(7) match alias
    \s+set\ssnmp-index\s\d{1,3}\n      # match snmp-index but we don't need it
    \s+next$'''                        # match end of config block
regexp = re.compile(pattern, re.VERBOSE | re.MULTILINE)
# for multiline regex matching use finditer()
for match in regexp.finditer(text):
    for z in range(1, 8):  # print groups 1 to 7 of every match
        print(match.group(z))
fileopen.close()  # always close file
output.close()  # always close file
Why use a regex at all? This looks like a very simple structure to parse:
from pprint import pprint

data = {}
with open(file, 'r') as fileopen:
    for line in fileopen:
        words = line.strip().split()
        if not words:  # Skip blank lines
            continue
        if words[0] == 'edit':  # Create a new block
            curr = data.setdefault(words[1].strip('"'), {})
        elif words[0] == 'set':  # Write config to block
            curr[words[1]] = words[2].strip('"') if len(words) == 3 else words[2:]
pprint(data)
Output:
{'port11': {'device-identification': 'enable',
            'snmp-index': '26',
            'type': 'physical',
            'vdom': 'ACME_Prod',
            'vlanforward': 'enable'},
 'port20': {'allowaccess': ['ping', 'https', 'ssh', 'snmp', 'fgfm'],
            'ip': ['192.168.1.1', '255.255.255.0'],
            'snmp-index': '39',
            'type': 'physical',
            'vdom': 'root',
            'vlanforward': 'enable'},
 ...
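If a regex is still preferred, the idea raised in the question (cut the text into edit ... next blocks first, then parse each block separately) can be sketched in two passes. This is only an illustration under some assumptions: BLOCK_RE, SET_RE, parse_blocks and SAMPLE are names made up for this example, the sample is two blocks copied from the question with assumed indentation, and every value is kept as a raw string instead of being split into a list.

```python
import re

# Two blocks copied from the question; the indentation is an assumption.
SAMPLE = '''\
    edit "port11"
        set vdom "ACME_Prod"
        set vlanforward enable
        set type physical
        set device-identification enable
        set snmp-index 26
    next
    edit "port20"
        set vdom "root"
        set ip 192.168.1.1 255.255.255.0
        set allowaccess ping https ssh snmp fgfm
        set vlanforward enable
        set type physical
        set snmp-index 39
    next
'''

# Pass 1: one match per block, from an `edit "<name>"` line up to the
# following line that contains only `next`.
BLOCK_RE = re.compile(
    r'^\s*edit\s+"([^"]*)"\s*\n(.*?)^\s*next\s*$',
    re.MULTILINE | re.DOTALL,
)

# Pass 2: one match per `set <key> <value>` line inside a block body.
SET_RE = re.compile(r'^[ \t]*set[ \t]+(\S+)[ \t]+(.*?)[ \t]*$', re.MULTILINE)


def parse_blocks(text):
    """Return {block_name: {key: value}}; values stay raw strings."""
    result = {}
    for block in BLOCK_RE.finditer(text):
        name, body = block.group(1), block.group(2)
        # Whatever "set" lines are present end up in the dict, so blocks
        # with missing lines need no special handling.
        result[name] = {key: value.strip('"')
                        for key, value in SET_RE.findall(body)}
    return result


parsed = parse_blocks(SAMPLE)
print(parsed['port20']['ip'])
```

Because the second pass collects whichever "set" lines happen to be in a block, no pattern has to list every possible line in a fixed order, which is what limited the single large regex to the fullest block.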