Processing complex lexicals in Rascal

Question

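What is the best practice for processing complex literals in Rascal?

Two examples from JavaScript (my DSL has similar cases):

Strings with \ escapes - these must be unescaped into their actual value.
Regular expression literals - these need their own sub-AST.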

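implode refuses to map lexicals to abstract trees: although a full parse tree is available, lexicals are apparently handled differently from syntax productions. For example, the following parser fails with IllegalArgument("Missing lexical constructor"):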

module lexicals

import Prelude;

lexical Char = "\\" ![] | ![\\]; // potentially escaped character
lexical String = "\"" Char* "\""; // if I make this "syntax", implode works as expected

start syntax Expr = string: String;

data EXPR = string(list[str] chars);

void main(list[str] args) {
    str text = "\"Hello\\nworld\"";
    print(implode(#EXPR, parse(#Expr, text)));
}

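So far, my only idea is to capture all lexicals as raw strings and then re-parse them (implode and all) with separately defined grammars that have no layout whitespace. I hope there is a better way.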

Answer 1

How implode translates a parse tree to an AST is documented in rascal tutor:implode. The documentation contains the following rule:

Unlabeled lexicals are imploded to str, int, real, bool depending on the expected type in the ADT. To implode lexical into types other than str, the PDB parse functions for integers and doubles are used. Boolean lexicals should match "true" or "false". NB: lexicals are imploded this way, even if they are ambiguous.

So solution 1 is to add a label to your production:

lexical String = string: "\"" Char* "\"";
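
To show how the labeled production might fit into the example from the question, here is a minimal sketch. It rests on an assumption taken from my reading of the implode documentation: a labeled lexical is imploded to a constructor of that name with a single str argument holding the matched text. The module name, the STRING data type, the field names and the lit label on Expr are all made up for illustration (lit is used only to keep the two constructor names distinct).

```rascal
module LexicalsLabeled

import IO;
import ParseTree;

lexical Char = "\\" ![] | ![\\];              // an escape pair or any other character
lexical String = string: "\"" Char* "\"";     // labeled, as suggested above

start syntax Expr = lit: String;              // hypothetical label, renamed from the question

// Assumed AST shape: one constructor per labeled production.
data STRING = string(str raw);                // assumed to receive the raw text of the lexical
data EXPR = lit(STRING s);

void main(list[str] args) {
    println(implode(#EXPR, parse(#Expr, "\"Hello\\nworld\"")));
}
```

Note that under this assumption the constructor gets the raw text, quotes and backslashes included, so unescaping still has to happen in a later step.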

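Also, perhaps you do not need an AST next to your parse tree? At least not one that has to match your grammar closely. Two common cases are:

You need an AST because the grammar structure does not suit your purpose. In this case, you will have to write the implode function by hand.
Your parse tree structure is good enough. In this case, check out the examples of Concrete Syntax. It is a very clean way of working with your target language nested inside Rascal.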
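
For the first case, a hand-written conversion can be quite small. The following is only a sketch, not code from the Rascal library: the module name, the field labels lit and body, and the helpers decode and toAST are hypothetical, and only \n and \t are treated as special escapes. It walks the parse tree directly and unescapes each character along the way.

```rascal
module LexicalsByHand

import IO;
import ParseTree;
import String;

lexical Char = "\\" ![] | ![\\];              // an escape pair or any other character
lexical String = "\"" Char* body "\"";        // label the list so it can be projected

start syntax Expr = string: String lit;

data EXPR = string(list[str] chars);

// Hypothetical helper: decode one Char token into the character it stands for.
str decode(Char c) {
    str raw = "<c>";                          // the text matched by this token
    if (startsWith(raw, "\\")) {
        switch (raw[1]) {
            case "n": return "\n";
            case "t": return "\t";
            default:  return raw[1];          // any other escaped character stands for itself
        }
    }
    return raw;
}

// Hand-written replacement for implode: build the AST from the parse tree fields.
EXPR toAST(Expr e) = string([ decode(c) | Char c <- e.lit.body ]);

void main(list[str] args) {
    println(toAST(parse(#Expr, "\"Hello\\nworld\"")));
}
```

Because the conversion is ordinary Rascal code, the same approach extends to regular expression literals: give them their own lexical or syntax definitions and build their sub-AST in a function like toAST.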

We are leaning more and more towards deprecating the implode function, since concrete syntax is powerful enough for most cases.

lexical-analysis

rascal