Trouble parsing letters adjacent to operators with Parsec

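I'm trying to use Parsec in Haskell to parse a simplified expression language, as part of solving the Tiny Three-Pass Compiler kata on CodeWars. I'm running into the following problem: my parser does not parse correctly when there is no whitespace between an identifier and an operator. "a * a" parses to the complete expression, but "a*a" yields only the first "a".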

A Stack script that demonstrates the problem:

#!/usr/bin/env stack
-- stack --resolver lts-10.7 script

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

langDef :: Tok.LanguageDef ()
langDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = False
  , Tok.identStart      = letter
  , Tok.identLetter     = letter
  , Tok.opStart         = oneOf "+-*/"
  , Tok.opLetter        = oneOf "+-*/"
  , Tok.reservedNames   = []
  , Tok.reservedOpNames = []
  , Tok.caseSensitive   = True
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

identifier :: Parser String
identifier = Tok.identifier lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

data AST = Var String
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Eq, Show)

expression :: Parser AST
expression = term `chainl1` addSubOp

addSubOp :: Parser (AST -> AST -> AST)
addSubOp =  (reserved "+" >> return Add)
        <|> (reserved "-" >> return Sub)

term :: Parser AST
term = factor `chainl1` multDivOp

multDivOp :: Parser (AST -> AST -> AST)
multDivOp =  (reserved "*" >> return Mul)
         <|> (reserved "/" >> return Div)

factor :: Parser AST
factor = variable

variable :: Parser AST
variable = do
  varName <- identifier
  return $ Var varName

main = do
  putStrLn $ show $ parse expression "" "a + a"
  putStrLn $ show $ parse expression "" "a+a"
  putStrLn $ show $ parse expression "" "a - a"
  putStrLn $ show $ parse expression "" "a-a"
  putStrLn $ show $ parse expression "" "a * a"
  putStrLn $ show $ parse expression "" "a*a"
  putStrLn $ show $ parse expression "" "a / a"
  putStrLn $ show $ parse expression "" "a/a"

Running this outputs:

$ ./AdjacentParseIssue.hs 
Right (Add (Var "a") (Var "a"))
Right (Var "a")
Right (Sub (Var "a") (Var "a"))
Right (Var "a")
Right (Mul (Var "a") (Var "a"))
Right (Var "a")
Right (Div (Var "a") (Var "a"))
Right (Var "a")

How do I write my parser so that "a * a" and "a*a" both parse to the same result?

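Tok.reserved is meant for identifiers (reserved words). When parsing operators, you should use Tok.reservedOp instead. Consider changing your calls to reserved so that they call a function like this: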

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
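For reference, here is a sketch of the question's script with every operator switched from reserved to reservedOp. The program wrapper (whiteSpace before the expression, eof after it) is an addition that is not part of the fix itself; it is included on the assumption that you want a partial parse to show up as a Left instead of a silently truncated Right.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-10.7 script

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

langDef :: Tok.LanguageDef ()
langDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = False
  , Tok.identStart      = letter
  , Tok.identLetter     = letter
  , Tok.opStart         = oneOf "+-*/"
  , Tok.opLetter        = oneOf "+-*/"
  , Tok.reservedNames   = []
  , Tok.reservedOpNames = []
  , Tok.caseSensitive   = True
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

identifier :: Parser String
identifier = Tok.identifier lexer

-- Operators go through reservedOp, which rejects a following opLetter
-- character rather than a following identLetter character.
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

whiteSpace :: Parser ()
whiteSpace = Tok.whiteSpace lexer

data AST = Var String
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Eq, Show)

expression :: Parser AST
expression = term `chainl1` addSubOp

addSubOp :: Parser (AST -> AST -> AST)
addSubOp =  (reservedOp "+" >> return Add)
        <|> (reservedOp "-" >> return Sub)

term :: Parser AST
term = factor `chainl1` multDivOp

multDivOp :: Parser (AST -> AST -> AST)
multDivOp =  (reservedOp "*" >> return Mul)
         <|> (reservedOp "/" >> return Div)

factor :: Parser AST
factor = Var <$> identifier

-- Hypothetical wrapper: skip leading spaces and require all input to be
-- consumed, so leftover input is reported as a parse error.
program :: Parser AST
program = whiteSpace *> expression <* eof

main :: IO ()
main = mapM_ (print . parse program "")
  ["a + a", "a+a", "a - a", "a-a", "a * a", "a*a", "a / a", "a/a"]
```

With this change, each unspaced input ("a+a", "a-a", "a*a", "a/a") should produce the same Right value as its spaced counterpart.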

Edit

To clarify what is going on behind the scenes, here is the implementation of Tok.reserved:

reserved name =
    lexeme $ try $
    do{ _ <- caseString name
      ; notFollowedBy (identLetter languageDef) <?> ("end of " ++ show name)
      }

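Note that reserved blindly accepts name without checking whether it is a valid operator or identifier, but it fails if name is immediately followed by another valid identifier character (otherwise reserved "foo" would also match the first three characters of fooBar).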
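A small sketch of that notFollowedBy check in isolation. It assumes the same character classes as the question (identLetter = letter, opLetter = one of "+-*/"), built from Parsec's emptyDef for brevity; succeeds is a hypothetical helper, not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Same identifier/operator characters as the question's langDef.
langDef :: Tok.LanguageDef ()
langDef = emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

-- Hypothetical helper: True when the parser succeeds on the input.
succeeds :: Parser a -> String -> Bool
succeeds p input = either (const False) (const True) (parse p "" input)

main :: IO ()
main = do
  print (succeeds (reserved "foo") "foo bar") -- True: a space follows "foo"
  print (succeeds (reserved "foo") "fooBar")  -- False: 'B' is an identLetter
  print (succeeds (reserved "*") "* a")       -- True: a space follows '*'
  print (succeeds (reserved "*") "*a")        -- False: 'a' is an identLetter
```

The last line is exactly the situation in "a*a": reserved does not care that "*" is an operator, it only checks that no letter follows it.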

Why the result is wrong

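Because you specified that identifiers consist of letters (identLetter = letter), Tok.reserved fails whenever the operator is immediately followed by a letter, which is why "*a" fails. Concretely, reserved "*" reads the "*", then notFollowedBy (identLetter languageDef) sees the "a" and fails. Since the whole parser is wrapped in try, no input is consumed, so chainl1 concludes that no further operator follows and returns just Var "a". And because expression is not followed by eof, the leftover "*a" is silently ignored and parse still returns a Right.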
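A sketch that makes the silently ignored input visible. It keeps the question's (incorrect) use of reserved for "*", trims the AST down to Var and Mul, and uses the same assumed emptyDef-based lexer as above. Adding eof turns the truncated Right into a Left.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

langDef :: Tok.LanguageDef ()
langDef = emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

identifier :: Parser String
identifier = Tok.identifier lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

-- Trimmed-down AST: only what this demo needs.
data AST = Var String | Mul AST AST
  deriving (Eq, Show)

-- The question's term parser, still using reserved for "*".
term :: Parser AST
term = (Var <$> identifier) `chainl1` (reserved "*" >> return Mul)

main :: IO ()
main = do
  -- Prints Right (Var "a"): reserved "*" fails without consuming input,
  -- chainl1 stops, and the remaining "*a" is never looked at.
  print (parse term "" "a*a")
  -- With eof the leftover "*a" is reported as a parse error (a Left).
  print (parse (term <* eof) "" "a*a")
```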

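Tok.reservedOp includes a similar restriction: it fails when the operator is immediately followed by another operator character (from opLetter). (Otherwise, for example, you could mistake the start of ** (a common exponentiation operator) for *.)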
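A sketch of the reservedOp check, using the same assumed emptyDef-based lexer and hypothetical succeeds helper as the earlier snippets. A following letter is fine, while a following operator character is rejected.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

langDef :: Tok.LanguageDef ()
langDef = emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

-- Hypothetical helper: True when the parser succeeds on the input.
succeeds :: Parser a -> String -> Bool
succeeds p input = either (const False) (const True) (parse p "" input)

main :: IO ()
main = do
  print (succeeds (reservedOp "*") "*a")   -- True: 'a' is not an opLetter
  print (succeeds (reservedOp "*") "**")   -- False: the second '*' is an opLetter
  print (succeeds (reservedOp "**") "**")  -- True: the whole operator is matched
```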