How to filter reserved words in parser combinators?

Question

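I'm using Scala's Parser Combinator framework, extending the RegexParsers class. I have an identifier token that starts with a letter and can contain letters, dashes, underscores, and digits, as long as it is not one of the reserved words. I tried using the parser's not() to block reserved words, but it also rejects identifiers that merely start with a reserved word.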

    def reserved = "and" | "or"

    def identifier: Parser[String] = not(reserved) ~> """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

However, when I try to parse an identifier such as and-today, I get an error message that says Expected Failure.

How can I reject a reserved word only when it matches the entire token, not just a prefix of it?

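Also, is there a way to improve the error reporting in this case when using not()? In other cases I get the regular expression the parser expected, but here it just says Failure without any details.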

Answer 1

You can use filterWithError to filter out reserved words and supply a custom error message, as follows:

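    import scala.collection.immutable.HashSet

    val reservedWords = HashSet("and", "or")

    val idRegex = """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

    // Match the whole token first, then reject it only if the
    // complete token (not just a prefix) is a reserved word.
    val identifier: Parser[String] = Parser(input =>
      // regex(...) is the RegexParsers conversion from Regex to Parser[String]
      regex(idRegex)(input).filterWithError(
        !reservedWords.contains(_),
        reservedWord => s"YOUR ERROR MESSAGE FOR $reservedWord",
        input
      )
    )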
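To try the snippet above in isolation, here is a minimal self-contained sketch. It assumes the scala-parser-combinators library is on the classpath. The object name IdentifierParser, the helper parseIdentifier, and the error message text are made up for illustration and are not part of the original answer.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object, named only for this example.
object IdentifierParser extends RegexParsers {
  private val reservedWords = Set("and", "or")
  private val idRegex = """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

  // The regex consumes the whole token; the filter then rejects it
  // only when the complete token is a reserved word.
  val identifier: Parser[String] = Parser { input =>
    regex(idRegex)(input).filterWithError(
      word => !reservedWords.contains(word),
      word => s"'$word' is a reserved word and cannot be used as an identifier",
      input
    )
  }

  // Hypothetical helper: parse a complete string as a single identifier.
  def parseIdentifier(text: String): ParseResult[String] =
    parseAll(identifier, text)
}

object Demo {
  def main(args: Array[String]): Unit = {
    import IdentifierParser._
    println(parseIdentifier("and-today").successful) // true: "and" is only a prefix
    println(parseIdentifier("and").successful)       // false: the whole token is reserved
    parseIdentifier("or") match {
      case Success(id, _) => println(s"identifier: $id")
      case failure        => println(failure) // includes the custom message
    }
  }
}
```

Because the reserved-word check runs after the regex has matched the full token, and-today passes while a bare and is rejected, and the failure carries the message supplied to filterWithError instead of a bare Failure.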

parsing

scala

parser-combinators