Does the Peg.js engine backstep after a lookahead like regexes do?

According to the regular-expressions.info page on lookaround, the engine steps back after a lookahead:

Let's take one more look inside, to make sure you understand the implications of the lookahead. Let's apply q(?=u)i to quit. The lookahead is now positive and is followed by another token. Again, q matches q and u matches u. Again, the match from the lookahead must be discarded, so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with i. But i cannot match u. So this match attempt fails. All remaining attempts fail as well, because there are no more q's in the string.

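However, in Peg.js it SEEMS like the engine still moves past the & or !, so that in fact it isn't a lookahead in the same sense as in regexes but a decision about consumption. There is no backstepping, and therefore no true looking ahead.

Is this the case?

(If so, then certain parses aren't even possible, like this one?)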

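Lookahead works much as it does in a regex engine: the predicate checks the upcoming input but does not consume it.

This rule fails to match "quit", because after the lookahead the next letter in the rule should be 'u', not 'i':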

word = 'q' &'u' 'i' 't'

This rule succeeds:

word = 'q' &'u' 'u' 'i' 't'

This rule, with no lookahead at all, also succeeds:

word = 'q' 'u' 'i' 't'
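To make the "check but don't consume" behaviour concrete, here is a rough sketch of the three rules above using tiny hand-written parser combinators in plain JavaScript. This is not how Peg.js is implemented internally; the helper names `lit`, `seq`, `and` and `matches` are made up for illustration.

```javascript
// A parser takes (input, pos) and returns the new position, or null on failure.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : null;

// Sequence: run each parser from where the previous one stopped.
const seq = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    pos = p(input, pos);
    if (pos === null) return null;
  }
  return pos;
};

// Positive lookahead (&): succeed only if p matches here, but stay at pos.
const and = (p) => (input, pos) => (p(input, pos) === null ? null : pos);

// True only when the parser consumes the whole input.
const matches = (parser, input) => parser(input, 0) === input.length;

const withLookaheadOnly = seq(lit('q'), and(lit('u')), lit('i'), lit('t'));
const withLookaheadAndU = seq(lit('q'), and(lit('u')), lit('u'), lit('i'), lit('t'));
const plain = seq(lit('q'), lit('u'), lit('i'), lit('t'));

console.log(matches(withLookaheadOnly, 'quit')); // false
console.log(matches(withLookaheadAndU, 'quit')); // true
console.log(matches(plain, 'quit'));             // true
```

The first rule fails for the same reason as in the regex quote: after `&'u'` succeeds the position is still at "u", so `'i'` has nothing to match.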

As for your example, try something along these lines. You shouldn't need lookahead at all:

expression
    = termPair ( _ delimiter _ termPair )*

termPair
    = term ('.' term)? ' ' term ('.' term)?

term "term"
    = $([a-z0-9]+)

delimiter "delimiter"
    = "."

_ "whitespace"
    = [ \t\n\r]+

Edit: Added another example based on the comments below.

expression
    = first:term rest:delimTerm* { return [first].concat(rest); }

delimTerm
    = delimiter t:term { return t; }

term "term"
    = $((!delimiter [a-z0-9. ])+)

delimiter "delimiter"
    = _ "." _

_ "whitespace"
    = [ \t\n\r]+

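Edit: Added extra explanation of the term expression.

I'll try to break down the term rule a bit: $((!delimiter [a-z0-9. ])+)

$() turns everything inside it into a single text node, like [].join('').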

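A single "character" of a term is any character in [a-z0-9. ]; if we wanted to simplify it, we could just write . instead. Before matching a character we look ahead for a delimiter, and if we find one we stop matching there. Since we want multiple characters, we use + to do the whole thing multiple times.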

I think scanning forward this way is a common idiom in PEG parsers. I learned the idea from the Treetop documentation on matching a string.
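For anyone who finds the idiom easier to read as ordinary code, here is a hand-written JavaScript approximation of the second grammar. It is only a sketch of what the rules do, not the parser Peg.js would generate, and the function names `delimiterAt`, `termAt` and `parseExpression` are hypothetical.

```javascript
// delimiter = _ "." _   (whitespace is required on both sides)
const delimiterAt = (input, pos) => {
  const re = /[ \t\n\r]+\.[ \t\n\r]+/y; // sticky: match only at lastIndex
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? pos + m[0].length : null;
};

// term = $((!delimiter [a-z0-9. ])+)
const termAt = (input, pos) => {
  const start = pos;
  while (
    pos < input.length &&
    delimiterAt(input, pos) === null && // !delimiter: look ahead, consume nothing
    /[a-z0-9. ]/.test(input[pos])       // then consume exactly one character
  ) {
    pos += 1;
  }
  // "+" needs at least one character; $() corresponds to the slice of matched text.
  return pos > start ? { text: input.slice(start, pos), pos } : null;
};

// expression = first:term rest:delimTerm*
const parseExpression = (input) => {
  const first = termAt(input, 0);
  if (first === null) return null;
  const terms = [first.text];
  let pos = first.pos;
  for (;;) {
    const afterDelimiter = delimiterAt(input, pos);
    if (afterDelimiter === null) break;
    const next = termAt(input, afterDelimiter);
    if (next === null) break; // back out of the delimiter, as a PEG would
    terms.push(next.text);
    pos = next.pos;
  }
  return pos === input.length ? terms : null;
};

console.log(parseExpression('a.b c . d e.f')); // [ 'a.b c', 'd e.f' ]
```

Note how a "." or a space inside a term is consumed normally; only the full whitespace-dot-whitespace sequence stops the term.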