Python multiline regex: ignore n lines in a string
I'm having trouble writing the right regular expression. Maybe someone can help me?
I have output from two network devices:
Output 1:
VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1 Gi1/1/4
Output 2:
VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3 Gi0/0/4 Gi0/1/4
I need to extract the interface names from both.
I have this regular expression:
import re

# Header line, then exactly three skipped lines, then capture line 5
rx = re.compile(r"""
VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
^.*$[\n\r]
^.*$[\n\r]
^.*$[\n\r]
(^.*)
""", re.MULTILINE | re.VERBOSE)
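But it only works for the first output: it skips 4 lines, and the 5th line is exactly the one I need. However, many routers return output like the second one.
The question is how to ignore an unknown number of lines, for example by finding the line containing the word Interfaces and extracting the line right after "Interfaces:".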
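The failure can be reproduced with a short script. This is a sketch that assumes each device output arrives as its own string with "\n" line endings and no trailing newline; the names SAMPLE_1 and SAMPLE_2 are made up for illustration. The pattern is the one from the question, written as a raw string.

```python
import re

# The two device outputs from the question (hypothetical variable names).
SAMPLE_1 = "\n".join([
    "VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>",
    "Old CLI format, supports IPv4 only",
    "Flags: 0xC",
    "Interfaces:",
    "Gi1/1/1 Gi1/1/4",
])
SAMPLE_2 = "\n".join([
    "VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>",
    "Interfaces:",
    "Gi0/0/3 Gi0/0/4 Gi0/1/4",
])

# The question's pattern: header line, exactly three skipped lines, then line 5.
rx = re.compile(r"""
VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
^.*$[\n\r]
^.*$[\n\r]
^.*$[\n\r]
(^.*)
""", re.MULTILINE | re.VERBOSE)

m1 = rx.search(SAMPLE_1)
m2 = rx.search(SAMPLE_2)
print(m1.group(3))  # line 5 of the first output: the interface list
print(m2)           # the second output has only three lines, so nothing matches
```

Because the number of skipped lines is hard-coded, any output with fewer (or more) lines between the header and "Interfaces:" falls through.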
Positive lookbehind
(?<=...)
Ensures that the given pattern will match, ending at the current position in the expression. The pattern must have a fixed width. Does not consume any characters.
The regex (?<=Interfaces:\n).+
matches the whole line that follows each "Interfaces:" line.
I tested it on regex101.com and it works with both of your examples.
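The lookbehind can be tried in Python like this. It is a sketch: the sample strings and the helper name interface_names are illustrative, and it assumes "\n" line endings, because the lookbehind spells out exactly one "\n". Python's re module only accepts fixed-width lookbehinds, which this one is.

```python
import re

SAMPLE_1 = "\n".join([
    "VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>",
    "Old CLI format, supports IPv4 only",
    "Flags: 0xC",
    "Interfaces:",
    "Gi1/1/1 Gi1/1/4",
])
SAMPLE_2 = "\n".join([
    "VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>",
    "Interfaces:",
    "Gi0/0/3 Gi0/0/4 Gi0/1/4",
])

# Match a line whose preceding text is exactly "Interfaces:\n".
# "." does not cross line breaks, so ".+" stops at the end of that line.
AFTER_INTERFACES = re.compile(r"(?<=Interfaces:\n).+")


def interface_names(text):
    """Return every interface name found on lines that follow "Interfaces:"."""
    return [name for line in AFTER_INTERFACES.findall(text) for name in line.split()]


print(interface_names(SAMPLE_1))
print(interface_names(SAMPLE_2))
```

No MULTILINE flag is needed here, since the lookbehind anchors on the literal newline rather than on "^".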
There are several options, but the one closest to your original attempt uses optional, non-capturing lines:
import re

# Up to two optional, non-capturing lines between the VRF header and "Interfaces:"
rx = re.compile(r"""
VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
(?:^.*$[\n\r])?
(?:^.*$[\n\r])?
Interfaces:[\n\r]
(.*)""", re.MULTILINE | re.VERBOSE)
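That said, the first line also looks odd to me and did not compile when I tried it (missing closing bracket). Either way, the (?:^.*$[\n\r])? lines should work in your application.
Edit: answer corrected after you gave us more information.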
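The optional-lines pattern can be checked against both outputs. The sketch below also adds a third, made-up output with three lines before "Interfaces:" (SAMPLE_3 is hypothetical) and a variant that is not from the answer: a lazy "*?" repeat instead of two "?" groups, so any number of lines can be skipped.

```python
import re

SAMPLE_1 = "\n".join([
    "VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>",
    "Old CLI format, supports IPv4 only",
    "Flags: 0xC",
    "Interfaces:",
    "Gi1/1/1 Gi1/1/4",
])
SAMPLE_2 = "\n".join([
    "VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>",
    "Interfaces:",
    "Gi0/0/3 Gi0/0/4 Gi0/1/4",
])
# Hypothetical output with three lines between the header and "Interfaces:".
SAMPLE_3 = "\n".join([
    "VRF NAME3 (VRF Id = 3); default RD 101:3; default VPNID <not set>",
    "Old CLI format, supports IPv4 only",
    "Flags: 0xC",
    "Description: lab",
    "Interfaces:",
    "Gi0/2/1",
])

# The answer's pattern: at most two optional lines are skipped.
RX_OPTIONAL = re.compile(r"""
VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
(?:^.*$[\n\r])?
(?:^.*$[\n\r])?
Interfaces:[\n\r]
(.*)""", re.MULTILINE | re.VERBOSE)

# Variant (not from the answer): skip as few whole lines as possible until
# a line that is exactly "Interfaces:".
RX_ANY = re.compile(r"""
VRF\s(.+?)\s\(.*RD\s(.*);.*\n
(?:.*\n)*?
Interfaces:\n
(.*)""", re.VERBOSE)

for rx in (RX_OPTIONAL, RX_ANY):
    for sample in (SAMPLE_1, SAMPLE_2, SAMPLE_3):
        m = rx.search(sample)
        print(m.groups() if m else None)
```

RX_ANY is best run on one record at a time: if a VRF had no "Interfaces:" line, the lazy repeat would keep going into the next VRF's lines.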
There are many ways to solve this. Have a look at regex101. The regex
(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)
reads in a complete record and captures the name, the RD value and the line after "Interfaces:".
Explanation:
(?s)         # single-line (DOTALL) mode: make "." match anything,
             # including line breaks
VRF          # every record starts with VRF
\s           # read " "
([^\s]+)     # group 1: capture the VRF name
\s           # read " "
.*?          # lazily read anything
(?:          # start non-capturing group
RD\s         # read "RD "
(            # group 2
[\d.]+:\d    # a number or IP, followed by ":" and a single digit
|            # OR
<not\sset>   # the value "<not set>"
)            # group 2 end
)            # non-capturing group end
;            # read ";"
.*?          # lazily read anything
Interfaces:  # read "Interfaces:"
(?:\r*\n)    # read the newline
\s*          # read spaces
(.*?)        # group 3: read the line after "Interfaces:"
(?:\r*\n)    # read the newline
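Laid out like this, the breakdown can be compiled directly with re.VERBOSE. The sketch below does that and compares it with the compact pattern; the record strings, including the one with "RD 65000:100", are made up for illustration. One deliberate change, flagged in a comment: the digit after the colon is widened from \d to \d+, because the compact pattern accepts only a single digit there and rejects an RD such as 65000:100. The inline (?s) is replaced by the equivalent re.DOTALL flag.

```python
import re

# The one-line pattern from the answer, unchanged.
COMPACT = r"(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)"

# The same structure written with re.VERBOSE (whitespace and comments ignored).
RECORD_RX = re.compile(r"""
    VRF \s ([^\s]+) \s      # group 1: VRF name
    .*?                     # lazily skip ahead to the RD field
    (?:
        RD \s
        (                   # group 2: RD value
            [\d.]+:\d+      # number or IP, ":" and one or more digits (widened from \d)
          | <not\sset>      # or the literal value "<not set>"
        )
    )
    ;
    .*?                     # lazily skip ahead to the interface header
    Interfaces:
    (?:\r*\n)               # newline, tolerating \r\n
    \s*                     # spaces at the start of the next line
    (.*?)                   # group 3: the line after "Interfaces:"
    (?:\r*\n)               # newline that ends it
""", re.DOTALL | re.VERBOSE)

REC_BLA2 = "\n".join([
    "VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>",
    "Interfaces:",
    "Gi0",
    "Flags: 0x0",
])
# Hypothetical record with a multi-digit RD suffix.
REC_LAB = "VRF LAB (VRF Id = 3); default RD 65000:100; default VPNID <not set>\nInterfaces:\nGi0/0/1\n"

for rec in (REC_BLA2, REC_LAB):
    compact = re.match(COMPACT, rec)
    print(compact.groups() if compact else None, RECORD_RX.match(rec).groups())
```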
Let's look at a test script. I shortened the records in the script a little, but the idea stays the same.
$ cat test.py
import re

pattern = r"(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)"

# Records are separated by a blank line; the loop below splits on it.
text = '''\
VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1.451 Gi1/1/4.2019
Address family ipv4 unicast (Table ID = 0x2):
VRF label allocation mode: per-prefix
Address family ipv6 unicast not active
Address family ipv4 multicast not active

VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>
New CLI format, supports multiple address-families
Flags: 0x1808
Interfaces:
Gi0
Address family ipv4 unicast (Table ID = 0x1):
Flags: 0x0
Address family ipv6 unicast (Table ID = 0x1E000001):
Flags: 0x0
Address family ipv4 multicast not active\
'''

# Split on "\n\n" rather than os.linesep twice: string literals always use "\n",
# while os.linesep is "\r\n" on Windows. re.match then sees one record at a time.
for rec in text.split("\n\n"):
    m = re.match(pattern, rec)
    if m:
        print("%s\tRD: %s\tInterfaces: %s" % (m.group(1), m.group(2), m.group(3)))
This results in:
$ python test.py
BLA1 RD: 9200:1 Interfaces: Gi1/1/1.451 Gi1/1/4.2019
BLA2 RD: <not set> Interfaces: Gi0
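For comparison, the approach described in the question (find the "Interfaces:" line and take the next one) also works without a regex. This is a minimal sketch, assuming each record starts with a header of the form "VRF name (VRF Id = n); ..." and that the interface list fits on the single line after "Interfaces:"; the function name interfaces_by_vrf is hypothetical.

```python
def interfaces_by_vrf(text):
    """Map each VRF name to the interface names on the line after "Interfaces:"."""
    result = {}
    vrf = None
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # A record header looks like "VRF BLA1 (VRF Id = 2); ..."; requiring
        # "(VRF Id" skips lines such as "VRF label allocation mode: per-prefix".
        if stripped.startswith("VRF ") and "(VRF Id" in stripped:
            vrf = stripped.split()[1]
        elif stripped == "Interfaces:" and vrf is not None and i + 1 < len(lines):
            result[vrf] = lines[i + 1].split()
    return result


TEXT = "\n".join([
    "VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>",
    "Old CLI format, supports IPv4 only",
    "Flags: 0xC",
    "Interfaces:",
    "Gi1/1/1.451 Gi1/1/4.2019",
    "VRF label allocation mode: per-prefix",
    "",
    "VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>",
    "Interfaces:",
    "Gi0",
    "Flags: 0x0",
])

print(interfaces_by_vrf(TEXT))
# {'BLA1': ['Gi1/1/1.451', 'Gi1/1/4.2019'], 'BLA2': ['Gi0']}
```

Because it tracks the current VRF as it walks the lines, this version does not need the records to be split on blank lines first.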