Grouping object-group configuration using RegEx

Question

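I have a configuration from a Cisco ASA, and I need to write a Python RegEx that captures everything inside each object-group and groups it for further processing.

For example: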

object-group network FTP
 description FTP Access
 network-object host BCD1
 network-object host BCD2
object-group network NTP
 description NTP Access
 network-object host ABC1
 network-object host ABC2
 network-object host ABC3
object-group service sample_service tcp
 description Ports 1 2 3
 port-object range 80 81
 port-object eq pop3
 port-object eq imap4
 port-object range 443 444
object-group service 8080 tcp
 description Servers

The end result should look like this:

Group 1: object-group network FTP
          description FTP Access
          network-object host BCD1
          network-object host BCD2

Group 2:  object-group network NTP
          description NTP Access
          network-object host ABC1
          network-object host ABC2
etc.

As I said, I'm not good at this. I tried to come up with something, but the results were poor:

(object-group\s[^!]*)object or (object-group[^!]*)

Both failed.

Answer 1

You can use a regex written with the unroll-the-loop technique:

\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*

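See the regex demo. It is basically the same as (?s)object-group(?:(?!\bobject-group\b).)* or (?s)object-group.*?(?=\bobject-group\b|$), but more efficient.

Explanation:

\bobject-group\b - the literal character sequence object-group (as a whole word, thanks to the \b word boundaries)
\S* - zero or more non-whitespace characters
(?:\s+(?!object-group\b)\S*)* - zero or more sequences of:
- \s+(?!object-group\b) - one or more whitespace characters not followed by the whole word object-group
- \S* - zero or more non-whitespace characters.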
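The breakdown above can also be written into the pattern itself. The following is a minimal sketch, assuming the re.VERBOSE flag is acceptable in your code base; the name OBJECT_GROUP_RE and the shortened sample text are only illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each piece carries its comment.
OBJECT_GROUP_RE = re.compile(r"""
    \bobject-group\b        # the literal word object-group
    \S*                     # rest of the current non-whitespace run (usually empty)
    (?:                     # then, zero or more times:
        \s+                 #   one or more whitespace characters (spaces, newlines)
        (?!object-group\b)  #   not immediately followed by the word object-group
        \S*                 #   the next non-whitespace run
    )*
""", re.VERBOSE)

# Shortened, illustrative sample (not the full configuration from the question).
sample = (
    "object-group network FTP\n"
    " description FTP Access\n"
    " network-object host BCD1\n"
    "object-group service 8080 tcp\n"
    " description Servers"
)

blocks = OBJECT_GROUP_RE.findall(sample)
for number, block in enumerate(blocks, start=1):
    print(f"Group {number}: {block}")
```

Because the pattern contains no capturing groups, findall() returns each whole match, so every element of blocks is one complete object-group block.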

Python code:

import re
p = re.compile(r'\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*')
test_str = "object-group network FTP\n description FTP Access\n network-object host BCD1\n network-object host BCD2\nobject-group network NTP\n description NTP Access\n network-object host ABC1\n network-object host ABC2\n network-object host ABC3\nobject-group service sample_service tcp\n description Ports 1 2 3\n port-object range 80 81\n port-object eq pop3\n port-object eq imap4\n port-object range 443 444\nobject-group service 8080 tcp\n description Servers"
print(re.findall(p, test_str))
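To check the claim that the three patterns behave alike, here is a small sketch that runs all of them on a shortened sample. The helper name blocks() and the sample text are assumptions made for illustration. On this input the only difference I would expect is trailing whitespace: the two alternative patterns keep the newline before the next object-group, so the results are stripped before comparing.

```python
import re

unrolled = re.compile(r'\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*')
tempered = re.compile(r'(?s)object-group(?:(?!\bobject-group\b).)*')
lazy = re.compile(r'(?s)object-group.*?(?=\bobject-group\b|$)')

# Shortened, illustrative sample based on the configuration in the question.
text = (
    "object-group network FTP\n"
    " description FTP Access\n"
    " network-object host BCD1\n"
    "object-group network NTP\n"
    " description NTP Access\n"
    " network-object host ABC1\n"
    "object-group service 8080 tcp\n"
    " description Servers"
)

def blocks(pattern, source):
    """Return every match of pattern in source, without surrounding whitespace."""
    return [match.strip() for match in pattern.findall(source)]

results = [blocks(p, text) for p in (unrolled, tempered, lazy)]
print(results[0] == results[1] == results[2])
print(len(results[0]))
```

This only shows agreement on one small input; it says nothing about relative speed, which would need to be measured on a realistic configuration.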

Answer 2

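You don't need a complicated, hard-to-understand regex for this. Just iterate over the file, breaking on lines that start with object-group, and build a dictionary of lists.

You could do it with itertools.groupby() or with a defaultdict of lists. I prefer the latter, which gives you a dictionary that is useful for further processing: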

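from collections import defaultdict
from pprint import pprint

object_groups = defaultdict(list)
key = 0  # lines before the first object-group header, if any, land under key 0
with open('cisco.cfg') as f:
    for line in f:
        if line.startswith('object-group'):
            key += 1  # a header line starts the next group
        object_groups[key].append(line.strip())

# list() so that Python 3 prints a plain list rather than dict_items([...])
pprint(list(object_groups.items()))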

Given your sample input, the output would be:

[(1,
  ['object-group network FTP',
   'description FTP Access',
   'network-object host BCD1',
   'network-object host BCD2']),
 (2,
  ['object-group network NTP',
   'description NTP Access',
   'network-object host ABC1',
   'network-object host ABC2',
   'network-object host ABC3']),
 (3,
  ['object-group service sample_service tcp',
   'description Ports 1 2 3',
   'port-object range 80 81',
   'port-object eq pop3',
   'port-object eq imap4',
   'port-object range 443 444']),
 (4, ['object-group service 8080 tcp', 'description Servers'])]
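For comparison, the itertools.groupby() route mentioned earlier might look roughly like the sketch below. It is not part of the original answer: the function name split_object_groups() is hypothetical, and the running counter built with itertools.accumulate() is just one way to give every line of a block the same grouping key.

```python
from itertools import accumulate, groupby

def split_object_groups(lines):
    """Split config lines into blocks, each starting at an object-group header."""
    lines = list(lines)
    # Running count of header lines seen so far: all lines of one block share a number.
    # Lines before the first header (if any) get 0 and form a block of their own.
    block_ids = accumulate(int(line.startswith('object-group')) for line in lines)
    return [
        [line.strip() for _, line in block]
        for _, block in groupby(zip(block_ids, lines), key=lambda pair: pair[0])
    ]

# Shortened, illustrative sample; with a file you would pass the open file object.
config_lines = [
    "object-group network FTP\n",
    " description FTP Access\n",
    " network-object host BCD1\n",
    "object-group service 8080 tcp\n",
    " description Servers\n",
]

groups = split_object_groups(config_lines)
for number, group in enumerate(groups, start=1):
    print(number, group)
```

This returns a list of blocks rather than a dictionary, which is one reason the defaultdict version can be more convenient for later lookups.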

Alternatively, you could use the object group identifier as the key instead:

from collections import defaultdict
from pprint import pprint

object_groups = defaultdict(list)
key = None  # lines before the first object-group header, if any, land under key None
with open('cisco.cfg') as f:
    for line in f:
        if line.startswith('object-group'):
            # key = line.strip()                    # the whole line
            key = line.strip().partition(' ')[-1]   # just the object group definition
        else:
            object_groups[key].append(line.strip())

# list() so that Python 3 prints a plain list rather than dict_items([...])
pprint(list(object_groups.items()))

This creates a similar dictionary, but with the keys 'network FTP', 'network NTP', 'service sample_service tcp', etc.
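If the further processing needs the parts of such a key separately, one possible follow-up is sketched below. The GroupKey type and parse_key() helper are hypothetical names, and the sketch assumes the key is whitespace-separated as kind, name, and an optional third field, which is all the sample shows (for example tcp); real configurations may use other forms.

```python
from typing import NamedTuple, Optional

class GroupKey(NamedTuple):
    kind: str                       # e.g. 'network' or 'service'
    name: str                       # e.g. 'FTP' or 'sample_service'
    protocol: Optional[str] = None  # e.g. 'tcp'; assumed meaning of a third field

def parse_key(key):
    """Split a key such as 'service sample_service tcp' into its parts."""
    parts = key.split()
    protocol = parts[2] if len(parts) > 2 else None
    return GroupKey(parts[0], parts[1], protocol)

# Illustrative dictionary in the shape produced by the code above.
object_groups = {
    'network FTP': ['description FTP Access', 'network-object host BCD1'],
    'service sample_service tcp': ['description Ports 1 2 3', 'port-object eq pop3'],
}

parsed = {parse_key(key): lines for key, lines in object_groups.items()}
for group_key, lines in parsed.items():
    print(group_key.kind, group_key.name, group_key.protocol, len(lines))
```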
