Why is nearley-unparse not including tokens in sample strings generated from a compiled Nearley grammar when using Moo as tokenizer/lexer

Question

I'm not sure whether this is a problem with the Nearley.js library, the Moo tokenizer/lexer, or my own code, so I might need to submit this as an issue to the Nearley repo. All the referenced files can be found in this Gist.

I'm trying to write a Nearley grammar that parses a list of homework problems for one of my classes. The problems are in problems.txt and look like this:

Section 5.2 (Due 4/23)- #3, 5*, 8*, 9, 11, 14*, 15, 17*, 18*, 20, 21*, 22*, 24*, 25 (see example 5, not discussed in class)
Section 5.3 (Due 4/30)- #1, 3*, 4, 5, 6*, 7, 9*, 11, 13*, 16, 20*, 21*, 22*, 23, 24*, 25*, 27, 28*, 31, 32

These are just two sample lines; the full file is much longer.

The Nearley grammar I've written is in problems-grammar.ne; it isn't entirely finished yet. I'm using the Moo tokenizer/lexer according to these instructions in the Nearley documentation.

I'm currently testing my grammar with the nearley-unparse command, as explained here. I run the following command, where problems-grammar.js is the parser compiled by Nearley:

nearley-unparse problems-grammar.js -o test.txt

Unfortunately, the unparser doesn't seem to generate sample strings that include the tokens, except for newlines. Here is one output from nearley-unparse:

Section  (Due )- #*, , 
Section  (Due )- #, *, 
Section  (Due )- #*, , , *, 
Section  (Due )- #*, *
Section  (Due )- #*, *, *, *

I'd like to know whether this is a flaw in my grammar or in Nearley/Moo itself. If the problem is in my code, how can I fix it?

Answer 1

Since I didn't receive an answer here, I went ahead and asked in the Nearley GitHub repo.

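According to the maintainer, nearley-unparse currently cannot generate strings that match regular expressions. There are also no plans to add that capability, since it would be a project in itself.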

Here is their full reply:

Hello there! Thanks for trying to post a Stack Overflow question first, I’m sorry there wasn’t anyone able to help there :-)

This is a limitation of the unparser: it doesn’t know how to generate random strings satisfying a regexp, nor are we planning to do so (that would be a project in itself!).

Your grammar looks fine to me, at a brief glance; if you test it with nearley-test, hopefully you’ll find you get the parse trees you expect.
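
To make the limitation concrete, here is a minimal sketch in plain JavaScript. It is not the actual nearley-unparse implementation, and the token names and patterns are hypothetical (they are not taken from problems-grammar.ne). It only illustrates the idea: a literal token has exactly one possible text, so a generator can emit it, while a regex token has many possible texts, so a generator that cannot invert regular expressions has nothing to emit. The second function shows one possible workaround, assuming you are willing to hand-write a sample value for each regex token.

```javascript
// Hypothetical token table, loosely shaped like a Moo lexer definition.
// Strings are literal tokens; RegExp values are pattern tokens.
const tokens = {
  section: "Section ",
  decimal: /[0-9]+\.[0-9]+/,
  dueOpen: " (Due ",
  date: /[0-9]+\/[0-9]+/,
  dueClose: ")- ",
  hash: "#",
  number: /[0-9]+/,
  star: "*",
  comma: ", ",
};

// One possible expansion of a (hypothetical) grammar rule, as token names.
const sequence = [
  "section", "decimal", "dueOpen", "date", "dueClose",
  "hash", "number", "star", "comma", "number", "comma", "number",
];

// Emits text only for literal tokens; regex tokens contribute nothing,
// because this generator does not know how to build a matching string.
function naiveUnparse(sequence, tokens) {
  return sequence
    .map((name) => (typeof tokens[name] === "string" ? tokens[name] : ""))
    .join("");
}

// Workaround sketch: the caller supplies a sample generator per regex token.
// Each sample is checked against the full token pattern before it is used.
function unparseWithSamples(sequence, tokens, samples) {
  return sequence
    .map((name) => {
      const spec = tokens[name];
      if (typeof spec === "string") return spec;
      if (typeof samples[name] !== "function") return "";
      const text = samples[name]();
      const whole = new RegExp("^(?:" + spec.source + ")$");
      if (!whole.test(text)) {
        throw new Error("Sample for '" + name + "' does not match its pattern");
      }
      return text;
    })
    .join("");
}

// Hand-written sample values (assumptions, chosen to resemble problems.txt).
const samples = {
  decimal: () => "5.2",
  date: () => "4/23",
  number: () => "3",
};

console.log(JSON.stringify(naiveUnparse(sequence, tokens)));
// prints "Section  (Due )- #*, , "
console.log(JSON.stringify(unparseWithSamples(sequence, tokens, samples)));
// prints "Section 5.2 (Due 4/23)- #3*, 3, 3"
```

The first output has the same shape as the nearley-unparse output in the question: only the literal text survives. To check that the grammar itself parses real input, nearley-test is the tool the maintainer suggests.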

javascript

parsing

lexer

context-free-grammar

nearley