Why is nearley-unparse not including tokens in sample strings generated from a compiled Nearley grammar when using Moo as tokenizer/lexer?

I'm not sure whether this is an issue with the Nearley.js library, the Moo tokenizer/lexer, or my own code, so I might need to submit this as an issue to the Nearley repo. All the referenced files can be found in this Gist.

I'm trying to write a Nearley grammar that parses a list of homework problems for one of my classes. The problems are in problems.txt and look like this:

Section 5.2 (Due 4/23)- #3, 5*, 8*, 9, 11, 14*, 15, 17*, 18*, 20, 21*, 22*, 24*, 25 (see example 5, not discussed in class)
Section 5.3 (Due 4/30)- #1, 3*, 4, 5, 6*, 7, 9*, 11, 13*, 16, 20*, 21*, 22*, 23, 24*, 25*, 27, 28*, 31, 32

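These are just two sample lines; the full file is much larger.

The Nearley grammar I wrote is in problems-grammar.ne here, and I'm not entirely finished yet. I'm using the Moo tokenizer/lexer according to these instructions in the Nearley documentation.

I'm currently testing my grammar with nearley-unparse, as explained here, using the following command, where problems-grammar.js is the parser compiled by Nearley: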

nearley-unparse problems-grammar.js -o test.txt

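Unfortunately, the unparser doesn't seem to generate sample text for the tokens in my grammar, apart from the newlines. Here is one output from nearley-unparse: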

Section  (Due )- #*, , 
Section  (Due )- #, *, 
Section  (Due )- #*, , , *, 
Section  (Due )- #*, *
Section  (Due )- #*, *, *, *

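I'm wondering whether this is a flaw in my grammar or in Nearley/Moo itself. If the problem is in my code, how can I fix it?

Since I didn't receive an answer here, I went ahead and asked in the Nearley GitHub repo.

According to the maintainer, nearley-unparse currently cannot generate strings that match a regular expression. There are also no plans to add that capability, since it would be a project in itself.

Here is their full reply: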
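To make that limitation concrete, here is a minimal toy model in plain JavaScript. It is not nearley's actual unparser code, and the rule names, token type names (section, date, number) and sample values are all made up for illustration; they are not taken from problems-grammar.ne. The sketch assumes a generator that can emit literal strings and expand rules, but that has nothing to emit for a token defined only by a lexer regex unless it is handed sample strings for that token type.

```javascript
// Toy model only: NOT nearley's implementation. All names are hypothetical.
// A symbol is a literal string, a reference to another rule, or a lexer
// token reference of the form { type: "..." }.
const rules = {
  line: [
    [{ literal: "Section " }, { type: "section" }, { literal: " (Due " },
     { type: "date" }, { literal: ")- #" }, { rule: "problems" }],
  ],
  problems: [
    [{ type: "number" }],
    [{ type: "number" }, { literal: ", " }, { rule: "problems" }],
  ],
};

// Default picker chooses a random element; tests can pass a deterministic one.
const randomPick = (items) => items[Math.floor(Math.random() * items.length)];
const first = (items) => items[0];

function generate(ruleName, tokenSamples = {}, pick = randomPick) {
  const alternative = pick(rules[ruleName]);
  return alternative
    .map((symbol) => {
      if (symbol.literal !== undefined) return symbol.literal;
      if (symbol.rule !== undefined) return generate(symbol.rule, tokenSamples, pick);
      // Token reference: without sample strings there is nothing to emit,
      // because this toy generator cannot invent text matching a regex.
      const candidates = tokenSamples[symbol.type];
      return candidates ? pick(candidates) : "";
    })
    .join("");
}

// Hypothetical hand-written samples, one list per token type.
const samples = { section: ["5.2", "5.3"], date: ["4/23", "4/30"], number: ["3", "11"] };

console.log(generate("line", {}, first));      // prints "Section  (Due )- #"
console.log(generate("line", samples, first)); // prints "Section 5.2 (Due 4/23)- #3"
```

The first call shows the same shape as the output in the question: the literal parts appear, and every token position is empty. The second call only fills those positions because sample strings were supplied by hand, which is the piece the maintainer says the unparser does not provide.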

Hello there! Thanks for trying to post a Stack Overflow question first, I’m sorry there wasn’t anyone able to help there :-)

This is a limitation of the unparser: it doesn’t know how to generate random strings satisfying a regexp, nor are we planning to do so (that would be a project in itself!).

Your grammar looks fine to me, at a brief glance; if you test it with nearley-test, hopefully you’ll find you get the parse trees you expect.