Parsing dhcpd.conf with textX

Question

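I am using https://github.com/igordejanovic/textX to parse a dhcpd.conf file (no, https://pypi.org/project/iscconf/ does not work for me; it crashes on my dhcpd.conf file), specifically to extract the hosts that have fixed addresses.

The records look like this: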

    host example1 {
    option host-name "example1";
    ddns-hostname "example1";
    fixed-address 192.168.1.181;
    }

    host example2 {
    hardware ethernet aa:bb:ff:20:fa:13;
    fixed-address 192.168.1.191;
    option host-name "example2";
    ddns-hostname "example2";
    }

Code:

from textx import metamodel_from_str


def get_hosts(s):
    # Raw string so the regex backslashes reach textX unchanged.
    grammar = r"""
    config: hosts*=host ;

    host: 'host' hostname=ID '{'
        (
            ('hardware ethernet' hardware_ethernet=/[0-9a-fA-F:]+/';')?

            'fixed-address' fixed_address=/([0-9]{1,3}\.){3}[0-9]{1,3}/';'

            ('option host-name' option_host_name=STRING';')?

            ('ddns-hostname' ddns_hostname=STRING';')?
        )#
    '}'
    ;
    """
    mm = metamodel_from_str(grammar)
    model = mm.model_from_str(s)
    # Print the name and fixed address of every parsed host record.
    for host in model.hosts:
        print(host.hostname, host.fixed_address)

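Now, I can't parse the whole dhcpd.conf with this grammar (obviously I get syntax errors, because the file contains many other elements that the grammar does not cover); on the other hand, I don't want to write a complete grammar for this file, since I only need to extract this particular kind of host record.

I could of course use a regex to extract only the host records and parse them separately, but I wonder whether there is some way to make textX extract only the host records from the file and ignore everything else?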
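For comparison, the regex-only fallback mentioned above could look roughly like this. It is a minimal sketch and not part of the original question: the names `HOST_BLOCK`, `FIXED_ADDRESS` and `extract_fixed_hosts` are made up for illustration, and it assumes that host blocks contain no nested braces and that no host records are commented out.

```python
import re

# "host <name> { ... }" -- assumes the block body contains no nested braces.
HOST_BLOCK = re.compile(r"\bhost\s+([\w.-]+)\s*\{([^{}]*)\}")
# "fixed-address a.b.c.d;" inside a block body.
FIXED_ADDRESS = re.compile(r"\bfixed-address\s+((?:\d{1,3}\.){3}\d{1,3})\s*;")


def extract_fixed_hosts(text):
    """Return (hostname, fixed_address) pairs for host blocks that declare one."""
    pairs = []
    for block in HOST_BLOCK.finditer(text):
        address = FIXED_ADDRESS.search(block.group(2))
        if address:  # skip host blocks without a fixed-address statement
            pairs.append((block.group(1), address.group(1)))
    return pairs
```

This works for simple files but knows nothing about the structure inside a record, which is why a grammar-based approach is still attractive.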

Answer 1

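textX author here. I'm not a regular on SO :). You can try using regex matches with regex lookahead to consume the unwanted content. Below is a full example that correctly handles intermediate text, even when it contains the keyword host. The config rule first consumes a single character if the word host is not ahead, and this repeats because of the zero-or-more operator. When we reach the word host, we try to match the host rule one or more times and collect all host objects; if the rule does not succeed at least once (note the use of +=), we consume the word host and repeat the process. This can probably be done better (more performantly), but you get the idea. When doing this kind of thing, it is good to know that textX skips whitespace by default, but you can turn that off either globally or per rule with noskipws (see the docs).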

from textx import metamodel_from_str


def test_get_hosts():
    grammar = r"""
    config: ( /(?!host)./ | hosts+=host | 'host' )* ;

    host: 'host' hostname=ID '{'
        (
            ('hardware ethernet' hardware_ethernet=/[0-9a-fA-F:]+/';')?
            'fixed-address' fixed_address=/([0-9]{1,3}\.){3}[0-9]{1,3}/';'
            ('option host-name' option_host_name=STRING';')?
            ('ddns-hostname' ddns_hostname=STRING';')?
        )#
    '}'
    ;
    """
    conf_file = r"""
    host example1 {
    option host-name "example1";
    ddns-hostname "example1";
    fixed-address 192.168.1.181;
    }

    some arbitrary content in between
    with word host but that fails to match host config.

    host example2 {
    hardware ethernet aa:bb:ff:20:fa:13;
    fixed-address 192.168.1.191;
    option host-name "example2";
    ddns-hostname "example2";
    }
    """
    mm = metamodel_from_str(grammar)
    model = mm.model_from_str(conf_file)
    assert len(model.hosts) == 2
    for host in model.hosts:
        print(host.hostname, host.fixed_address)


if __name__ == "__main__":
    test_get_hosts()
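To make the control flow of the `config` rule more concrete, the same ordered choice can be emulated with plain `re` and an explicit loop. This is only an illustrative sketch: `scan_hosts`, `SKIP_CHAR`, `HOST_BLOCK` and `KEYWORD` are hypothetical names, `HOST_BLOCK` is a much simplified stand-in for the `host` rule, and none of this is meant to describe how textX works internally.

```python
import re

# Alternative 1: any single character, unless the word "host" starts here.
SKIP_CHAR = re.compile(r"(?!host).", re.S)
# Alternative 2: a simplified stand-in for the `host` rule (no nested braces).
HOST_BLOCK = re.compile(r"host\s+(\w+)\s*\{[^{}]*\}")
# Alternative 3: the bare word "host" that did not start a valid record.
KEYWORD = re.compile(r"host")


def scan_hosts(text):
    """Collect host names by trying the three alternatives in order."""
    pos, names = 0, []
    while pos < len(text):
        skipped = SKIP_CHAR.match(text, pos)
        if skipped:
            pos = skipped.end()
            continue
        record = HOST_BLOCK.match(text, pos)
        if record:
            names.append(record.group(1))
            pos = record.end()
            continue
        # The lookahead failed, so the text here starts with "host":
        # consume the word and carry on.
        pos = KEYWORD.match(text, pos).end()
    return names
```

Stepping one character at a time is what makes this variant slow on large files, which is the motivation for skipping larger chunks in a single regex match.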

Edit: Here are two more ideas for the config rule. A simple one:

config: ( hosts+=host | /./ )* ;

And a (probably) more performant one, which uses the regex engine to consume as much as possible before trying host:

config: ( /(?s:.*?(?=host))/ hosts*=host | 'host' )*
        /(?s).*/;
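The behaviour of the two regexes in this last variant can be checked in isolation with Python's `re` module (Python 3.6 or later, because of the scoped `(?s:...)` flag). The snippet below is a small sketch with made-up names (`SKIP_TO_HOST`, `REST`, `skip_to_next_host`); it only exercises the regexes and does not involve textX.

```python
import re

# Lazily consume everything (newlines included) up to the next "host".
SKIP_TO_HOST = re.compile(r"(?s:.*?(?=host))")
# Consume whatever is left once no further "host" can be found.
REST = re.compile(r"(?s).*")


def skip_to_next_host(text, pos=0):
    """Return the index of the next "host" at or after pos, or None if there is none."""
    match = SKIP_TO_HOST.match(text, pos)
    return match.end() if match else None
```

When the input already starts with `host`, the lazy regex matches the empty string. When no `host` remains, it fails to match at all, and the trailing `/(?s).*/` is what swallows the rest of the file.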
