How to parse multiline block text with Python and regex when the content differs from block to block?

I have a configuration file that I need to parse. Because of the grouping in Python, my idea is to put it into a dictionary later on.

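The problem I'm facing is that not every text block contains exactly the same lines. So far my regex works for the block with the most lines, but of course it only matches that single block. How can I do a multiline match if, for example, some "set" lines are omitted in some blocks?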

I have some ideas, but I'd like a Pythonic approach.

As always, thanks a lot for your help.

The text, where each block to match runs from "edit" to the next "next". Not every block contains the same "set" statements:

edit "port11"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set device-identification enable
    set snmp-index 26
next
edit "port21"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 27
next
edit "port28"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 28
next
edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set allowaccess ping
    set vlanforward enable
    set type physical
    set alias "Internet-IRISnet"
    set snmp-index 29
next
edit "port20"
    set vdom "root"
    set ip 192.168.1.1 255.255.255.0
    set allowaccess ping https ssh snmp fgfm
    set vlanforward enable
    set type physical
    set snmp-index 39
next
edit "port25"
    set vdom "root"
    set allowaccess fgfm
    set vlanforward enable
    set type physical
    set snmp-index 40
next
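If the "set" lines that do appear always come in the same order, as they do in the blocks above, one regex-only option is to wrap each "set" line in an optional non-capturing group (?:...)?. This is a sketch under assumptions: it assumes that fixed order, assumes the keys seen in this sample are the only ones, and uses a shortened two-block sample; the names SAMPLE and PATTERN are illustrative.

```python
import re

# Shortened sample: two of the blocks above ("edit"/"next" unindented, as pasted).
SAMPLE = '''edit "port21"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 27
next
edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set allowaccess ping
    set vlanforward enable
    set type physical
    set alias "Internet-IRISnet"
    set snmp-index 29
next
'''

# Each "set" line sits in an optional non-capturing group (?:...)?, so a block
# still matches when that line is absent; unmatched named groups become None.
# Assumption: the "set" lines that are present always appear in this order.
# In VERBOSE mode whitespace is ignored, except inside [...] classes like [ \t].
PATTERN = re.compile(r'''
    ^[ \t]*edit[ \t]+"(?P<name>[^"]*)"[ \t]*\n
    (?:[ \t]*set[ \t]+vdom[ \t]+"(?P<vdom>[^"]*)"[ \t]*\n)?
    (?:[ \t]*set[ \t]+ip[ \t]+(?P<ip>[^\n]*)\n)?
    (?:[ \t]*set[ \t]+allowaccess[ \t]+(?P<allowaccess>[^\n]*)\n)?
    (?:[ \t]*set[ \t]+vlanforward[ \t]+(?P<vlanforward>[^\n]*)\n)?
    (?:[ \t]*set[ \t]+type[ \t]+(?P<type>[^\n]*)\n)?
    (?:[ \t]*set[ \t]+device-identification[ \t]+[^\n]*\n)?
    (?:[ \t]*set[ \t]+alias[ \t]+"(?P<alias>[^"]*)"[ \t]*\n)?
    (?:[ \t]*set[ \t]+snmp-index[ \t]+\d+[ \t]*\n)?
    [ \t]*next[ \t]*$
''', re.VERBOSE | re.MULTILINE)

for m in PATTERN.finditer(SAMPLE):
    print(m.groupdict())
```

Settings missing from a block come back as None in groupdict(). The trade-off is that every possible key has to be listed in the pattern in the right order; a block with an unlisted or reordered "set" line will not match at all.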

Code snippet:

import re
import sys

file = "interfaces_2016_10_12.conf"

try:
    fileopen = open(file, 'r')
except OSError:
    sys.exit("Input file does not exist, exiting script.")
output = open('output.txt', 'w+')

# read the whole config in one go instead of iterating line by line
text = fileopen.read()

# my verbose regex, verbose so it is more readable!

pattern = r'''^                 # raw string; triple quotes let the pattern span lines
\s+edit\s\"(.*)\"\n           # group(1) match int name
\s+set\svdom\s\"(.*)\"\n      # group(2) match vdom name
\s+set\sip\s(.*)\n            # group(3) match interface ip
\s+set\sallowaccess\s(.*)\n   # group(4) match allowaccess
\s+set\svlanforward\s(.*)\n   # group(5) match vlanforward
\s+set\stype\s(.*)\n          # group(6) match type
\s+set\salias\s\"(.*)\"\n     # group(7) match alias
\s+set\ssnmp-index\s\d{1,3}\n # match snmp-index but we don't need it
\s+next$'''                   # match end of config block

regexp = re.compile(pattern, re.VERBOSE | re.MULTILINE)

# for multiline regex matching use finditer()
for match in regexp.finditer(text):
    for z in range(1, 8):  # groups 1..7, restarted for every match
        print(match.group(z))

fileopen.close()  # always close files
output.close()
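The pattern in the snippet requires all seven "set" lines to be present in a fixed order, which is why only the largest block matches. A sketch of one way to keep using regex without that constraint: match each edit ... next block as a whole first, then collect whatever "set" lines its body contains. BLOCK_RE, SET_RE and parse_blocks are hypothetical names, and values are kept as raw strings with surrounding double quotes removed.

```python
import re

# Stage 1: grab each "edit ... next" block as a whole (name + body).
# re.DOTALL lets (.*?) span lines; being lazy, it stops at the first "next" line,
# so every block is matched separately.
BLOCK_RE = re.compile(
    r'^[ \t]*edit[ \t]+"([^"]*)"[ \t]*\n(.*?)^[ \t]*next[ \t]*$',
    re.MULTILINE | re.DOTALL,
)

# Stage 2: pull out whichever "set <key> <value>" lines the body contains.
SET_RE = re.compile(r'^[ \t]*set[ \t]+(\S+)[ \t]+(.*?)[ \t]*$', re.MULTILINE)


def parse_blocks(text):
    """Return {interface_name: {setting: value_string}} for every block."""
    data = {}
    for name, body in BLOCK_RE.findall(text):
        data[name] = {key: value.strip('"') for key, value in SET_RE.findall(body)}
    return data


sample = '''edit "port11"
    set vdom "ACME_Prod"
    set device-identification enable
    set snmp-index 26
next
edit "port25"
    set vdom "root"
    set allowaccess fgfm
    set snmp-index 40
next
'''
print(parse_blocks(sample))
```

Applied to the whole file this would be parse_blocks(text). Because the second stage accepts any key, blocks with missing or extra "set" lines need no change to either pattern.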

Why use a regex at all? This looks like a very simple structure to parse line by line:

import pprint

data = {}
with open(file, 'r') as fileopen:
    for line in fileopen:
        words = line.strip().split()
        if not words:  # skip blank lines
            continue
        if words[0] == 'edit':  # Create a new block
            curr = data.setdefault(words[1].strip('"'), {})
        elif words[0] == 'set':  # Write config to block
            # one value -> string (quotes stripped), several values -> list
            curr[words[1]] = words[2].strip('"') if len(words) == 3 else words[2:]
pprint.pprint(data)
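One limitation of the split()-based parser: str.split() breaks on every space, so a quoted value containing spaces (for example a hypothetical set alias "Internet IRISnet") would end up as the list ['"Internet', 'IRISnet"']. A variant sketch using the standard library's shlex.split, which respects double quotes and removes them, assuming the config quotes values the way a POSIX shell would; parse_config is a hypothetical helper name.

```python
import shlex


def parse_config(lines):
    """Parse edit/set/next blocks into {name: {setting: value}}.

    Like the split()-based parser: one value -> string, several -> list.
    """
    data = {}
    curr = None
    for line in lines:
        words = shlex.split(line)  # honours "..." quoting and strips the quotes
        if not words:
            continue  # blank line
        if words[0] == 'edit':  # start a new block
            curr = data.setdefault(words[1], {})
        elif words[0] == 'set' and curr is not None:
            curr[words[1]] = words[2] if len(words) == 3 else words[2:]
        elif words[0] == 'next':  # block finished
            curr = None
    return data


sample = '''edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set alias "Internet IRISnet"
next
'''.splitlines()
print(parse_config(sample))
```

It accepts any iterable of lines, so an open file object works too: with open(file) as f: data = parse_config(f). Note that shlex.split raises ValueError on an unbalanced quote, which may be preferable to silently mis-parsing a line.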

Output:

{'port11': {'device-identification': 'enable',
  'snmp-index': '26',
  'type': 'physical',
  'vdom': 'ACME_Prod',
  'vlanforward': 'enable'},
 'port20': {'allowaccess': ['ping', 'https', 'ssh', 'snmp', 'fgfm'],
  'ip': ['192.168.1.1', '255.255.255.0'],
  'snmp-index': '39',
  'type': 'physical',
  'vdom': 'root',
  'vlanforward': 'enable'},
  ...
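Since each block's dictionary only holds the "set" lines that were actually present, a missing setting is simply an absent key and can be read with dict.get and a default. A short sketch of consuming the parsed result; the summarize helper, the '-' placeholder and the trimmed data literal are illustrative assumptions.

```python
def summarize(data, default='-'):
    """Return (name, vdom, ip, alias) rows; absent settings become `default`."""
    rows = []
    for name in sorted(data):
        settings = data[name]
        ip = settings.get('ip', default)
        if isinstance(ip, list):  # "set ip <addr> <mask>" was parsed as a list
            ip = ' '.join(ip)
        rows.append((name, settings.get('vdom', default), ip,
                     settings.get('alias', default)))
    return rows


# Trimmed example of the parsed dictionary (port11 has no ip and no alias).
data = {
    'port11': {'vdom': 'ACME_Prod', 'type': 'physical', 'snmp-index': '26'},
    'port29': {'vdom': 'ACME_Prod',
               'ip': ['174.244.244.244', '255.255.255.224'],
               'alias': 'Internet-IRISnet'},
}
for row in summarize(data):
    print(row)
```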