PEG语法有序选择失败

PEG grammar ordered choice failure

我有一个使用 Python Arpeggio 包的玩具 DSL 的 PEG 语法:

from arpeggio.cleanpeg import ParserPEG

grammar = """
    root    = block* EOF
    block   = header (item1+ / item2+)
    header  = "block"
    item1   = number name comment?
    item2   = number name list comment?
    number  = r"\d+"
    name    = r"\w+"
    list    = r"\[.*\]"
    comment = r"\/\/.*"
"""

doc = """
block
  5 alpha []        //
  3 beta [a, b, c]  // this is an item2

block
  6 foo
  1 bar  // This is an item1
  4 baz  // more stuff
"""

parser = ParserPEG(grammar, 'root', debug=True)
parse_tree = parser.parse(doc)
print ('Tree:', parse_tree)

这在解析测试文档时给出了奇怪的结果:它正确地未能匹配有序选择中的 item1,但随后错误地声称(标有 xxxx 的行)它确实匹配了没有测试的选择 item2,这本来是匹配的。

>> Matching rule root=Sequence at position 0 => * block   5
   >> Matching rule ZeroOrMore in root at position 0 => * block   5
      >> Matching rule block=Sequence in root at position 0 => * block   5
         ?? Try match rule header=StrMatch(block) in block at position 1 =>  *block   5 
         ++ Match 'block' at 1 => ' *block*   5 '
         >> Matching rule OrderedChoice in block at position 6 =>  block*   5 alpha
            >> Matching rule OneOrMore in block at position 6 =>  block*   5 alpha
               >> Matching rule item1=Sequence in block at position 6 =>  block*   5 alpha
                  ?? Try match rule number=RegExMatch(\d+) in item1 at position 9 =>  block   *5 alpha []
                  ++ Match '5' at 9 => ' block   *5* alpha []'
                  ?? Try match rule name=RegExMatch(\w+) in item1 at position 11 => block   5 *alpha []  
                  ++ Match 'alpha' at 11 => 'block   5 *alpha* []  '
                  >> Matching rule Optional in item1 at position 16 =>    5 alpha* []       
                     ?? Try match rule comment=RegExMatch(\/\/.*) in item1 at position 17 =>   5 alpha *[]        
                     -- NoMatch at 17
                  <<- Not matched rule Optional in item1 at position 16 =>    5 alpha* []       
               <<+ Matched rule item1=Sequence in item1 at position 16 =>    5 alpha* []       
               >> Matching rule item1=Sequence in block at position 16 =>    5 alpha* []       
                  ?? Try match rule number=RegExMatch(\d+) in item1 at position 17 =>   5 alpha *[]        
                  -- NoMatch at 17
               <<- Not matched rule item1=Sequence in item1 at position 16 =>    5 alpha* []
xxxx-->     <<+ Matched rule OneOrMore in block at position 16 =>    5 alpha* []       
         <<+ Matched rule OrderedChoice in block at position 16 =>    5 alpha* []       
      <<+ Matched rule block=Sequence in block at position 16 =>    5 alpha* []       
      >> Matching rule block=Sequence in root at position 16 =>    5 alpha* []       
         ?? Try match rule header=StrMatch(block) in block at position 17 =>   5 alpha *[]        
         -- No match 'block' at 17 => '  5 alpha *[]   *     '
      <<- Not matched rule block=Sequence in block at position 16 =>    5 alpha* []       
   <<+ Matched rule ZeroOrMore in root at position 16 =>    5 alpha* []       
   ?? Try match rule EOF in root at position 17 =>   5 alpha *[]        
   !! EOF not matched.
<<- Not matched rule root=Sequence in root at position 0 => * block   5

因此,在 item1 失败后,解析器无法使用如果它实际匹配 item2 本应使用的 item2

这是解析器包中的错误还是我的语法中的错误?

注意颠倒顺序选择:

    block   = header (item2+ / item1+)

正确解析示例文档。但是在玩具问题中很容易解决的异常结果可能在真实语法中更难找到。一个项目明确地是 item1 或 item2,因此检查它们的顺序应该是无关紧要的,并且解析它们的代码应该一致地工作。

PEG中,表达式的顺序OrderedChoice很重要。当解析器尝试 item1+ 时,至少匹配 item1 之一就足以成功,然后整个有序选择被认为是成功的。

一般来说,始终将更具体的匹配项放在开头,将更笼统的匹配项放在有序选择的末尾。

更新:在维基百科的 Ambiguity detection and influence of rule order on language that is matched 部分有一个很好的解释。

恐怕这是你的语法错误。

item1 是一个数字,一个单词和一个可选的注释。 5 alpha 符合该描述,因此匹配成功。 item1+ 匹配一个或多个 item1 并且有一个 item1,所以这一切都很好;它找到了 block.

目标是 block* EOF,即“零个或多个 block 后跟文件结束指示符。它找到了一个 block,但是 block后面没有另一个 block 也没有 EOF。(它后面跟着一个列表。)所以现在解析器有 运行 的选择,所以它宣布失败。

PEG 解析器不会重试成功的解析;这是 PEG 解析的关键方面。一旦子表达式匹配,PEG 将不会用它来交换不同的子表达式。