Wrong Word() string gives misleading error location

In my very deep PyParsing grammar (132 keywords), I have run into something odd. It may be my use of the logic. But then again, it may not be.

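ISC Bind9 configuration files have clauses (somewhat like INI sections): exactly one mandatory options clause, plus any number of other clauses such as acl, server, view and zone.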

Any attempt to add parser complexity to the mandatory options clause breaks the logic above.

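I had to strip out parser logic that had no effect until it started working, then shuffle code back and forth until I reached the exact breakage, which is caused by introducing this pyparsing code: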

print("Using 'example1' as a Word() to inside 'options{ };':")
clauses_mandatory_complex = (
        Keyword('options')
        + Literal('{')
        + Word('[a-zA-Z0-9]')
        + Literal(';')
        + Literal('}')
        + Literal(';')
)

As a standalone ParserElement, this clauses_mandatory_complex works fine.

That is, until I try to introduce the clause logic:

    # Exactly one parse_element ('options' clause)
    # and any number of other clauses
    clauses_all_and = (
        clauses_mandatory_complex
        & ZeroOrMore(clauses_zero_or_more)
    )

Then the clause logic starts to fail.

If I take out the Word(), like this:

print("Using 'example1' as a Literal() to inside 'options{ };':")
clauses_mandatory_simple = (
        Keyword('options')
        + Literal('{')
        + Literal('example1')
        + Literal(';')
        + Literal('}')
        + Literal(';')
)

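my clause logic starts working as expected again.

This was too odd a thing for me to run into, so I am posting it here.

Below is a working standalone test program that demonstrates the difference described above: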

#!/usr/bin/env python3
from pyparsing import ZeroOrMore, Word, Keyword, Literal
from pprint import PrettyPrinter

pp = PrettyPrinter(width=81, indent=4)

clauses_zero_or_more = (
        (Keyword('acl') + ';')
        | (Keyword('server') + ';')
        | (Keyword('view') + ';')
        | (Keyword('zone') + ';')
    )

def test_me(parse_element, test_data, fail_assert):
    # Exactly one parse_element ('options' clause)
    # and any number of other clauses
    clauses_all_and = (
        parse_element
        & ZeroOrMore(clauses_zero_or_more)
    )
    result = clauses_all_and.runTests(test_data, parseAll=True, printResults=True,
                                      failureTests=fail_assert)
    pp.pprint(result)
    return result

def print_all_results(pass_result, fail_result):
    print("Purposely passed test: {}. ".format(pass_result[0]))
    print("Purposely failed test: {}. ".format(fail_result[0]))
    print('\n')

passing_test_data = """
options { example1; };
acl; options { example1; };
options { example1; }; acl;
options { example1; }; server;
server; options { example1; };
acl; options { example1; }; server;
acl; server; options { example1; };
options { example1; }; acl; server;
options { example1; }; server; acl;
server; acl; options { example1; };
server; options { example1; }; acl;
"""
failing_test_data = """
acl;
acl; acl;
server; acl;
server;
acl; server;
options { example1; }; options { example1; };
"""


print("Using 'example1' as a Literal() to inside 'options{ };':")
clauses_mandatory_simple = (
        Keyword('options')
        + Literal('{')
        + Literal('example1')
        + Literal(';')
        + Literal('}')
        + Literal(';')
)
pass_result = test_me(clauses_mandatory_simple, passing_test_data, False)
fail_result = test_me(clauses_mandatory_simple, failing_test_data, True)
print_all_results(pass_result, fail_result)

# An attempt to introduce more qualifiers to 'options' failed
print("Using 'example1' as a Word() to inside 'options{ };':")
clauses_mandatory_complex = (
        Keyword('options')
        + Literal('{')
        + Word('[a-zA-Z0-9]')
        + Literal(';')
        + Literal('}')
        + Literal(';')
)
pass_result = test_me(clauses_mandatory_complex, passing_test_data, False)
fail_result = test_me(clauses_mandatory_complex, failing_test_data, True)
print_all_results(pass_result, fail_result)

The output of the test run is as follows:

/work/python/parsing/isc_config2/how-bad.py
Using 'example1' as a Literal() to inside 'options{ };':

options { example1; };
['options', '{', 'example1', ';', '}', ';']

acl; options { example1; };
['acl', ';', 'options', '{', 'example1', ';', '}', ';']

options { example1; }; acl;
['options', '{', 'example1', ';', '}', ';', 'acl', ';']

options { example1; }; server;
['options', '{', 'example1', ';', '}', ';', 'server', ';']

server; options { example1; };
['server', ';', 'options', '{', 'example1', ';', '}', ';']

acl; options { example1; }; server;
['acl', ';', 'options', '{', 'example1', ';', '}', ';', 'server', ';']

acl; server; options { example1; };
['acl', ';', 'server', ';', 'options', '{', 'example1', ';', '}', ';']

options { example1; }; acl; server;
['options', '{', 'example1', ';', '}', ';', 'acl', ';', 'server', ';']

options { example1; }; server; acl;
['options', '{', 'example1', ';', '}', ';', 'server', ';', 'acl', ';']

server; acl; options { example1; };
['server', ';', 'acl', ';', 'options', '{', 'example1', ';', '}', ';']

server; options { example1; }; acl;
['server', ';', 'options', '{', 'example1', ';', '}', ';', 'acl', ';']
(   True,
    [   (   'options { example1; };',
            (['options', '{', 'example1', ';', '}', ';'], {})),
        (   'acl; options { example1; };',
            (['acl', ';', 'options', '{', 'example1', ';', '}', ';'], {})),
        (   'options { example1; }; acl;',
            (['options', '{', 'example1', ';', '}', ';', 'acl', ';'], {})),
        (   'options { example1; }; server;',
            (['options', '{', 'example1', ';', '}', ';', 'server', ';'], {})),
        (   'server; options { example1; };',
            (['server', ';', 'options', '{', 'example1', ';', '}', ';'], {})),
        (   'acl; options { example1; }; server;',
            (['acl', ';', 'options', '{', 'example1', ';', '}', ';', 'server', ';'], {})),
        (   'acl; server; options { example1; };',
            (['acl', ';', 'server', ';', 'options', '{', 'example1', ';', '}', ';'], {})),
        (   'options { example1; }; acl; server;',
            (['options', '{', 'example1', ';', '}', ';', 'acl', ';', 'server', ';'], {})),
        (   'options { example1; }; server; acl;',
            (['options', '{', 'example1', ';', '}', ';', 'server', ';', 'acl', ';'], {})),
        (   'server; acl; options { example1; };',
            (['server', ';', 'acl', ';', 'options', '{', 'example1', ';', '}', ';'], {})),
        (   'server; options { example1; }; acl;',
            (['server', ';', 'options', '{', 'example1', ';', '}', ';', 'acl', ';'], {}))])

acl;
^
FAIL: Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

acl; acl;
^
FAIL: Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

server; acl;
^
FAIL: Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)

server;
^
FAIL: Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)

acl; server;
^
FAIL: Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

options { example1; }; options { example1; };
                       ^
FAIL: Expected end of text, found 'o'  (at char 23), (line:1, col:24)
(   True,
    [   (   'acl;',
            Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'acl; acl;',
            Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'server; acl;',
            Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)),
        (   'server;',
            Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)),
        (   'acl; server;',
            Missing one or more required elements ({"options" "{" "example1" ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'options { example1; }; options { example1; };',
            Expected end of text, found 'o'  (at char 23), (line:1, col:24))])
Purposely passed test: True. 
Purposely failed test: True. 


Using 'example1' as a Word() to inside 'options{ };':
/usr/local/lib/python3.7/site-packages/pyparsing.py:3161: FutureWarning: Possible nested set at position 1
  self.re = re.compile(self.reString)

options { example1; };
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)

acl; options { example1; };
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

options { example1; }; acl;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)

options { example1; }; server;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)

server; options { example1; };
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)

acl; options { example1; }; server;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

acl; server; options { example1; };
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

options { example1; }; acl; server;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)

options { example1; }; server; acl;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)

server; acl; options { example1; };
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)

server; options { example1; }; acl;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)
(   False,
    [   (   'options { example1; };',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)),
        (   'acl; options { example1; };',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'options { example1; }; acl;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)),
        (   'options { example1; }; server;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)),
        (   'server; options { example1; };',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)),
        (   'acl; options { example1; }; server;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'acl; server; options { example1; };',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'options { example1; }; acl; server;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)),
        (   'options { example1; }; server; acl;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)),
        (   'server; acl; options { example1; };',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)),
        (   'server; options { example1; }; acl;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1))])

acl;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

acl; acl;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

server; acl;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)

server;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)

acl; server;
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)

options { example1; }; options { example1; };
^
FAIL: Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1)
(   True,
    [   (   'acl;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'acl; acl;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'server; acl;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)),
        (   'server;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 's'  (at char 0), (line:1, col:1)),
        (   'acl; server;',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'a'  (at char 0), (line:1, col:1)),
        (   'options { example1; }; options { example1; };',
            Missing one or more required elements ({"options" "{" W:([a-z...) ";" "}" ";"}), found 'o'  (at char 0), (line:1, col:1))])
Purposely passed test: False. 
Purposely failed test: True. 

Edit: the error was found here:

Word('[a-zA-Z0-9]')

It should be:

Word(srange('[a-zA-Z0-9]'))

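Is there a way to improve the positioning of the caret '^' for this error, so that it points at the test data 'example1' rather than at the keyword? That would have saved a lot of time here.

Argh!

Replace the offending statement: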

Word('[a-zA-Z0-9]')

with:

Word(srange('[a-zA-Z0-9]'))

and the problem goes away.
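To see why the two spellings behave so differently, here is a minimal sketch. Word() treats its argument as a plain string of allowed characters, not as a regex character class, while srange() expands the range notation into that string. The helper name matches() is made up for this illustration.

```python
from pyparsing import ParseException, Word, srange

# Word() takes a string of allowed characters, so this set is literally
# the characters: [ a - z A Z 0 9 ]
literal_set = Word('[a-zA-Z0-9]')

# srange() expands the regex-style range into the full character string.
expanded_set = Word(srange('[a-zA-Z0-9]'))


def matches(expr, text):
    """Return True if expr matches the whole of text (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(matches(literal_set, 'example1'))   # False: 'e' is not in the literal set
print(matches(expanded_set, 'example1'))  # True
print(matches(literal_set, 'a-z'))        # True: 'a', '-' and 'z' are all in the set
```

The predefined strings alphas, nums and alphanums exported by pyparsing are another way to avoid the range notation altogether.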

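The basic answer to this kind of problem is usually to replace one or a few of the '+' operators with '-' operators. '-' tells pyparsing to disable backtracking if an error is found in the subsequent matches.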

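For example, if your grammar has a keyword that is not used anywhere else, then you can reasonably expect that any parse error after that keyword is a true error, and not merely a mismatched alternative. Following this keyword with '-' is a good way to get your parser to indicate a specific error location, instead of just flagging that none of a set of higher-level alternatives matched.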

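You do have to be careful with '-', and not just replace every '+' with '-', because that defeats all backtracking and may prevent your parser from matching legitimate alternative expressions.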
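As a rough illustration of the difference, the sketch below builds the same clause twice, once with '+' and once with '-' after the keyword, and wraps each in ZeroOrMore. The input text and the helper names options_clause() and error_of() are made up for this example. It uses a plain repetition rather than the '&' expression from the question.

```python
from pyparsing import (Keyword, Literal, ParseBaseException,
                       ParseSyntaxException, Word, ZeroOrMore, alphanums)


def options_clause(use_error_stop):
    """Build the clause with '-' (no backtracking) or '+' after the keyword."""
    keyword = Keyword('options')
    rest = (Literal('{') + Word(alphanums) + Literal(';')
            + Literal('}') + Literal(';'))
    return keyword - rest if use_error_stop else keyword + rest


def error_of(grammar, text):
    """Return the exception raised while parsing text, or None on success."""
    try:
        grammar.parseString(text, parseAll=True)
    except ParseBaseException as err:
        return err
    return None


text = 'options { !bad; };'   # '!' at char 10 is not alphanumeric
plus_err = error_of(ZeroOrMore(options_clause(False)), text)
minus_err = error_of(ZeroOrMore(options_clause(True)), text)

# With '+', ZeroOrMore backtracks and simply matches zero clauses, so the
# failure is reported by whatever comes next in the grammar.
print(type(plus_err).__name__, plus_err.loc)
# With '-', the failure after the keyword is fatal and keeps its location.
print(type(minus_err).__name__, minus_err.loc)   # ParseSyntaxException 10
```

Note that ParseSyntaxException is not a subclass of ParseException, which is why the helper catches ParseBaseException.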

So I was about to post that the following would improve your error message:

clauses_mandatory_complex = (
        Keyword('options')
        - Literal('{')
        + Word('[a-zA-Z0-9]')
        + Literal(';')
        + Literal('}')
        + Literal(';')
)

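But when I tried it, I didn't really get better results. In this case, the confounding issue is your use of '&' for the unordered Each match, which, while perfectly legal in your parser, confuses the exception handling (and may have uncovered a bug in pyparsing). If you replace '&' with '+' in your clauses_all_and expression, you will see the '-' operator at work here: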

options { example1; };
          ^(FATAL)
FAIL: Expected W:([a-z...), found 'e'  (at char 10), (line:1, col:11)

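In fact, this points to a general debugging strategy with pyparsing: if a complex expression does not give a useful exception message, try the subexpressions on their own.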
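One way to apply that strategy is sketched below: each piece of the options clause from the question is run against the fragment of input it is supposed to match. The pieces list and the failing_pieces() helper are made up for this illustration.

```python
from pyparsing import Keyword, Literal, ParseBaseException, Word

# (label, expression, fragment of input that the expression should match)
pieces = [
    ("Keyword('options')", Keyword('options'), 'options'),
    ("Literal('{')", Literal('{'), '{'),
    ("Word('[a-zA-Z0-9]')", Word('[a-zA-Z0-9]'), 'example1'),
    ("Literal(';')", Literal(';'), ';'),
    ("Literal('}')", Literal('}'), '}'),
]


def failing_pieces(pieces):
    """Return the labels of the subexpressions that reject their fragment."""
    failed = []
    for label, expr, fragment in pieces:
        try:
            expr.parseString(fragment, parseAll=True)
        except ParseBaseException:
            failed.append(label)
    return failed


print(failing_pieces(pieces))   # ["Word('[a-zA-Z0-9]')"]
```

Run this way, the Word() with the unexpanded range string stands out immediately, without the '&' expression getting in the way of the error report.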

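Pyparsing does a lot of backtracking and retrying when handling grammars that contain MatchFirst or Or expressions (the '|' and '^' operators), but also when handling Each (the '&' operator). In your case, when I used the '-' operator, a non-backtracking exception was raised, but Each demoted it to a backtracking exception so that it could keep trying other combinations. I will look into this further to see if there is a good way to avoid this demotion.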