How to write a PEG parser that fully consumes any and all text whilst still matching other given rules?

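I'm building an application that makes writing (PEG) parsers more approachable and friendlier for people with no experience. Yes, it has been done before, but it is a good GUI learning experience for me.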




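Below is my MRE (minimal reproducible example) using the parsimonious library. It works because match matches any top-level user-defined expression, and a fallback rule (any) matches everything else, though sadly only one character at a time.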

from parsimonious.grammar import Grammar

grammar = Grammar("""
root = (match / any)*
match = foo / bar # must include all top-level user-defined rules, but not their children (if any)
any = ~"." # fallback: exactly one character that is not a newline
foo = "foo expression" # user-defined
bar = "bar expression" # user-defined
""")

print(grammar.match("1 foo expression 2 bar expression 3"))


<Node called "root" matching "1 foo expression 2 bar expression 3">
    <Node matching "1">
        <RegexNode called "any" matching "1">
    <Node matching " ">
        <RegexNode called "any" matching " ">
    <Node matching "foo expression">
        <Node called "match" matching "foo expression">
            <Node called "foo" matching "foo expression">
    <Node matching " ">
        <RegexNode called "any" matching " ">
    <Node matching "2">
        <RegexNode called "any" matching "2">
    <Node matching " ">
        <RegexNode called "any" matching " ">
    <Node matching "bar expression">
        <Node called "match" matching "bar expression">
            <Node called "bar" matching "bar expression">
    <Node matching " ">
        <RegexNode called "any" matching " ">
    <Node matching "3">
        <RegexNode called "any" matching "3">


The Parsimonious README has an example like this:

my_grammar = Grammar(r"""
    styled_text = bold_text / italic_text
    bold_text   = "((" text "))"
    italic_text = "''" text "''"
    text        = ~"[A-Z 0-9]*"i
    """)

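To me this suggests there is a way, one I don't know about, to use such a grammar on a larger text that also contains content that is neither bold nor italic. The only alternative I can see is calling parse/match with the optional "pos" (position) argument at every position in the document, which is not elegant either.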
