Trouble parsing letters adjacent to operators with Parsec
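I'm trying to use Parsec in Haskell to parse a simplified expression language, as part of solving the Tiny Three-Pass Compiler kata on CodeWars. I'm running into a problem: my parser doesn't parse correctly when there is no whitespace between an identifier and an operator. "a * a" parses as the full expression, but "a*a" yields only the first "a".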
A stack script that demonstrates the problem:
#!/usr/bin/env stack
-- stack --resolver lts-10.7 script
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

langDef :: Tok.LanguageDef ()
langDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = False
  , Tok.identStart      = letter
  , Tok.identLetter     = letter
  , Tok.opStart         = oneOf "+-*/"
  , Tok.opLetter        = oneOf "+-*/"
  , Tok.reservedNames   = []
  , Tok.reservedOpNames = []
  , Tok.caseSensitive   = True
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

identifier :: Parser String
identifier = Tok.identifier lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

data AST = Var String
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Eq, Show)

expression :: Parser AST
expression = term `chainl1` addSubOp

addSubOp :: Parser (AST -> AST -> AST)
addSubOp = (reserved "+" >> return Add)
       <|> (reserved "-" >> return Sub)

term :: Parser AST
term = factor `chainl1` multDivOp

multDivOp :: Parser (AST -> AST -> AST)
multDivOp = (reserved "*" >> return Mul)
        <|> (reserved "/" >> return Div)

factor :: Parser AST
factor = variable

variable :: Parser AST
variable = do
  varName <- identifier
  return $ Var varName

main = do
  putStrLn $ show $ parse expression "" "a + a"
  putStrLn $ show $ parse expression "" "a+a"
  putStrLn $ show $ parse expression "" "a - a"
  putStrLn $ show $ parse expression "" "a-a"
  putStrLn $ show $ parse expression "" "a * a"
  putStrLn $ show $ parse expression "" "a*a"
  putStrLn $ show $ parse expression "" "a / a"
  putStrLn $ show $ parse expression "" "a/a"
Running this outputs:
$ ./AdjacentParseIssue.hs
Right (Add (Var "a") (Var "a"))
Right (Var "a")
Right (Sub (Var "a") (Var "a"))
Right (Var "a")
Right (Mul (Var "a") (Var "a"))
Right (Var "a")
Right (Div (Var "a") (Var "a"))
Right (Var "a")
How can I write my parser so that "a * a" and "a*a" both parse to the same result?
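Tok.reserved is meant for identifier-like reserved words. When parsing operators, you should use Tok.reservedOp instead. Consider changing your calls to reserved into calls to a function like this: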
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
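Putting that suggestion together, a minimal sketch of the question's script with every operator routed through reservedOp might look like this. Two details are my own additions rather than part of the answer: a hypothetical helper parseAll wraps the parser in eof, so leftover input such as "*a" is reported as an error instead of silently returning Right (Var "a"), and the variable parser is shortened to Var <$> identifier. It assumes the same parsec package the stack script already uses.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Same language definition as in the question.
langDef :: Tok.LanguageDef ()
langDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = False
  , Tok.identStart      = letter
  , Tok.identLetter     = letter
  , Tok.opStart         = oneOf "+-*/"
  , Tok.opLetter        = oneOf "+-*/"
  , Tok.reservedNames   = []
  , Tok.reservedOpNames = []
  , Tok.caseSensitive   = True
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

identifier :: Parser String
identifier = Tok.identifier lexer

-- Operators use reservedOp: it only rejects a match when the next
-- character is another opLetter, so a following letter ("*a") is fine.
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

data AST = Var String | Add AST AST | Sub AST AST | Mul AST AST | Div AST AST
  deriving (Eq, Show)

expression :: Parser AST
expression = term `chainl1` addSubOp

addSubOp :: Parser (AST -> AST -> AST)
addSubOp = (reservedOp "+" >> return Add) <|> (reservedOp "-" >> return Sub)

term :: Parser AST
term = factor `chainl1` multDivOp

multDivOp :: Parser (AST -> AST -> AST)
multDivOp = (reservedOp "*" >> return Mul) <|> (reservedOp "/" >> return Div)

factor :: Parser AST
factor = Var <$> identifier

-- Hypothetical helper (not in the answer): skip leading whitespace and
-- require the whole input to be consumed, so partial parses become errors.
parseAll :: String -> Either ParseError AST
parseAll = parse (Tok.whiteSpace lexer *> expression <* eof) ""

main :: IO ()
main = mapM_ (print . parseAll)
  ["a + a", "a+a", "a - a", "a-a", "a * a", "a*a", "a / a", "a/a"]
```

With these changes each spaced/unspaced pair should print the same Right tree, for example Right (Mul (Var "a") (Var "a")) for both "a * a" and "a*a". The eof is optional, but without it a mistake like this one shows up only as a truncated result rather than a parse error.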
Edit
To clarify what is happening behind the scenes, here is the implementation of Tok.reserved:
reserved name =
    lexeme $ try $
    do{ _ <- caseString name
      ; notFollowedBy (identLetter languageDef) <?> ("end of " ++ show name)
      }
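Note that reserved blindly accepts name without checking whether it is a valid operator or identifier, but it does stop if more valid identifier characters follow (otherwise reserved "foo" would give the wrong result on the input fooBar).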
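Since you specified that identifiers consist of letters (identLetter = letter), Tok.reserved "*" refuses to match when the "*" is immediately followed by a letter, which is why "*a" fails.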
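A small probe makes that identLetter check visible. This is a sketch, not code from the answer: it rebuilds only the relevant character classes on top of emptyDef from Text.Parsec.Language, and succeeds is a hypothetical helper that just reports whether the parser matched a prefix of the input.

```haskell
import Data.Either (isRight)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Same identifier/operator character classes as the question's langDef
-- (assumption: the other emptyDef defaults don't matter for this probe).
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

-- Hypothetical helper: True if p matches at the start of the input
-- (trailing input is allowed because there is no eof).
succeeds :: Parser a -> String -> Bool
succeeds p = isRight . parse p ""

main :: IO ()
main = do
  print (succeeds (reserved "foo") "foo bar")  -- True
  print (succeeds (reserved "foo") "fooBar")   -- False: 'B' is an identLetter
  print (succeeds (reserved "*") "* a")        -- True
  print (succeeds (reserved "*") "*a")         -- False: 'a' is an identLetter
```

Because the failure happens inside try, it consumes nothing, so chainl1 in the question simply stops after the first "a" instead of raising an error.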
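Tok.reservedOp includes a similar restriction: it short-circuits the parse when an adjacent operator character (from opLetter) is present. (Otherwise you might, for example, mistake ** (a common exponentiation operator) for *.)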
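The same kind of probe for Tok.reservedOp, again a sketch using the question's operator characters and the hypothetical succeeds helper, shows that a following letter is accepted while a following opLetter is not:

```haskell
import Data.Either (isRight)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Same identifier/operator character classes as the question's langDef.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

-- Hypothetical helper: True if p matches at the start of the input.
succeeds :: Parser a -> String -> Bool
succeeds p = isRight . parse p ""

main :: IO ()
main = do
  print (succeeds (reservedOp "*") "*a")  -- True: 'a' is not an opLetter
  print (succeeds (reservedOp "*") "**")  -- False: the second '*' is an opLetter
```

If a language did want ** as its own operator, it would need a separate reservedOp "**" alternative tried alongside "*"; the check above is what keeps "*" from matching just its first character.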