Using Alex in Haskell to make a lexer that parses Dice Rolls

Question

I'm writing a parser for a DSL in Haskell using Alex + Happy. My DSL uses dice rolls as part of its possible expressions.

Sometimes the expression I want to parse looks like this:

[some code...]  3D6  [... rest of the code]

which should roughly translate to:

TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}

My DSL also uses variables (basically strings), so I have a special token that handles variable names. With these token rules:

"D"                                 { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']*       { \pos str -> TokenName pos str}
$digit+                             { \pos str -> TokenInt pos (read str) }

what I currently get when lexing is:

TokenInt {... value = 3}, TokenName { ... , name = "D6"}

This means my lexer "reads" an Integer followed by a Variable named "D6".

I've tried many things. For example, I changed the D token to:

$digit "D" $digit                   { \pos str -> TokenD pos }

But that just consumed the digits :(

Can I lex a dice roll together with its numbers?
Or at least get TokenInt-TokenD-TokenInt?

PS: I'm using the posn wrapper, not sure if that's relevant.

Answer 1

I'd approach this by extending the TokenD type to TokenD Int Int. Using the basic wrapper for convenience, I would write:

$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read $ head ls) (read $ last ls)
  where ls = split 'D' s

split is not part of the Prelude; the original answer linked to an implementation of it.

This is an extra step that would normally be done during syntactic analysis, but it doesn't hurt much here.
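For reference, the dice action can also be written without an external split helper. The following is only a sketch: the Token type is a stripped-down, hypothetical stand-in (no position field, as with the basic wrapper), and it uses break from the Prelude to cut the lexeme at the first 'D'.

```haskell
-- Hypothetical token type for illustration only; the asker's real
-- tokens also carry a position.
data Token = TokenD Int Int
  deriving (Show, Eq)

-- Turn a lexeme such as "3D6" into a token. Prelude's break cuts the
-- string at the first 'D', so no external split helper is needed.
dice :: String -> Token
dice s = case break (== 'D') s of
  (count, _ : sides) -> TokenD (read count) (read sides)
  -- Unreachable when the lexer rule is $digit+ D $digit+, which
  -- guarantees a 'D' with digits on both sides.
  _                  -> error ("dice: not a dice lexeme: " ++ s)
```

With this definition, dice "3D6" evaluates to TokenD 3 6.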

Also, I couldn't get Alex to lex $alpha as TokenD instead of TokenName. If we had Di instead of D, that would be no problem. From Alex's documentation:

When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.

But your code should work. I don't know whether this is an issue with Alex.

Answer 2

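I decided that I could live with variables that start with a lowercase letter (like Haskell variables), so I changed my lexer to accept only variable names that begin with a lowercase letter. This also solves possible problems with some other reserved words.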
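To see why this works, the longest-match rule quoted in Answer 1 can be applied by hand to the input 3D6. The sketch below is not Alex-generated code. It is a small hand-written simulation of "longest match wins, earliest rule breaks ties" over the three rules from the question. All names (lexWith, matchD, matchName, matchInt) are hypothetical, positions are left out, and isAlpha only approximates $alpha.

```haskell
import Data.Char (isAlpha, isDigit, isLower, isSpace)

-- Simplified tokens without positions (illustration only).
data Token = TokenInt Int | TokenD | TokenName String
  deriving (Show, Eq)

-- A rule pairs a matcher (length of the matched prefix, 0 for no match)
-- with a function that builds a token from the matched text.
type Rule = (String -> Int, String -> Token)

matchD :: String -> Int
matchD ('D' : _) = 1
matchD _         = 0

-- nameStart decides which characters may begin a variable name.
matchName :: (Char -> Bool) -> String -> Int
matchName nameStart (c : rest)
  | nameStart c = 1 + length (takeWhile nameChar rest)
  where nameChar x = isAlpha x || isDigit x || x == '_' || x == '\''
matchName _ _ = 0

matchInt :: String -> Int
matchInt = length . takeWhile isDigit

-- Same order as the rules in the question: "D", names, integers.
rules :: (Char -> Bool) -> [Rule]
rules nameStart =
  [ (matchD, const TokenD)
  , (matchName nameStart, TokenName)
  , (matchInt, TokenInt . read)
  ]

-- Longest match wins; on a tie the earliest rule is kept.
lexWith :: (Char -> Bool) -> String -> [Token]
lexWith nameStart = go
  where
    go [] = []
    go s@(c : rest)
      | isSpace c = go rest
      | n == 0    = error ("lexical error at: " ++ s)
      | otherwise = mk (take n s) : go (drop n s)
      where
        (n, mk) = foldl1 pick [ (m s, build) | (m, build) <- rules nameStart ]
        pick best cand = if fst cand > fst best then cand else best
```

With names allowed to start with any letter, lexWith isAlpha "3D6" gives [TokenInt 3, TokenName "D6"]: at D6 the name rule matches two characters and "D" only one. With lexWith isLower "3D6" the name rule no longer matches at D, so the result is [TokenInt 3, TokenD, TokenInt 6].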

I'd still like to know whether there are other solutions, but the problem itself is solved.

Thanks everyone!
