parsec: unexpected character parsing nested comments
I'm trying to parse nested C-like block comments:

    import Text.ParserCombinators.Parsec
    import Control.Monad (liftM)

    flat :: Monad m => m [[a]] -> m [a]
    flat = liftM concat

    comment :: Parser String
    comment = between (string "/*") (string "*/") (try nested <|> content)
      where
        content = many (try (noneOf "*/")
                    <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
                    <|> try (char '/' >> notFollowedBy (char '*') >> return '/'))
        nested = flat $ many comment

"1234567890"
parses fine, but when I try

    parse comment "" "/*123/*456*/789*/"

I get

    Left (line 1, column 3):
    unexpected "1"
    expecting "/*" or "*/"

I don't understand why; I have try everywhere I could think of. Please help.

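In an expression like a <|> b, if a can succeed without consuming any input (that is, it can match the empty string), then b will never be tried. That is what happens in try nested <|> content: nested is many comment, which succeeds with zero comments, so content never runs and the parser goes straight on to expect "/*" or "*/".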
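
To see the effect in isolation, here is a minimal sketch. The names manyAs, someBs, leftAlwaysWins and fixed are made up for illustration and are not from the question; it imports Text.Parsec, assuming the combinators used here behave the same as in the older Text.ParserCombinators.Parsec module.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Zero or more 'a's: succeeds on any input, possibly consuming nothing.
manyAs :: Parser String
manyAs = many (char 'a')

-- One or more 'b's.
someBs :: Parser String
someBs = many1 (char 'b')

-- The right-hand side is unreachable: manyAs always succeeds,
-- so (<|>) never gets a reason to try someBs.
leftAlwaysWins :: Parser String
leftAlwaysWins = try manyAs <|> someBs

-- Requiring at least one 'a' lets the left branch fail without
-- consuming input, so someBs gets its turn.
fixed :: Parser String
fixed = many1 (char 'a') <|> someBs
```

In GHCi, parse leftAlwaysWins "" "bbb" returns Right "" (the left branch matched nothing and still won), while parse fixed "" "bbb" returns Right "bbb". Note that try does not help here: it only matters when the left branch fails after consuming input.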
You can fix your approach by requiring, at each step, either one nested comment or one other character:

    comment :: Parser String
    comment = between (string "/*") (string "*/") (flat $ many (try comment <|> liftM toString other))
      where
        toString x = [x]
        other = try (noneOf "*/")
            <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
            <|> try (char '/' >> notFollowedBy (char '*') >> return '/')

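
As a quick check, the corrected parser can be put in a file together with the question's imports and flat helper. This is only a self-contained sketch of the code above; nothing new is introduced apart from comments.

```haskell
import Text.ParserCombinators.Parsec
import Control.Monad (liftM)

-- Concatenate the pieces produced by a parser of lists.
flat :: Monad m => m [[a]] -> m [a]
flat = liftM concat

-- Body of a comment: any number of items, each of which is either
-- a whole nested comment or a single ordinary character.
comment :: Parser String
comment = between (string "/*") (string "*/")
                  (flat $ many (try comment <|> liftM toString other))
  where
    toString x = [x]
    -- One character that neither opens nor closes a comment.
    other = try (noneOf "*/")
        <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
        <|> try (char '/' >> notFollowedBy (char '*') >> return '/')
```

With this loaded in GHCi, parse comment "" "/*123/*456*/789*/" returns Right "123456789". The delimiters of the inner comment are dropped because between discards them at every nesting level. An unterminated input such as "/*123" still fails with a Left.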
FWIW, this is how Text.Parsec.Token does it:
https://github.com/aslatter/parsec/blob/master/Text/Parsec/Token.hs#L698-714
For your specific case, the equivalent code is:

    import Data.List (nub)

    commentStart = "/*"
    commentEnd   = "*/"

    -- Skips one (possibly nested) block comment; returns no text.
    multiLineComment :: Parser ()
    multiLineComment =
        do { try (string commentStart)
           ; inComment
           }

    inComment :: Parser ()
    inComment = inCommentMulti

    inCommentMulti :: Parser ()
    inCommentMulti
        =   do{ try (string commentEnd) ; return () }
        <|> do{ multiLineComment ; inCommentMulti }
        <|> do{ skipMany1 (noneOf startEnd) ; inCommentMulti }
        <|> do{ oneOf startEnd ; inCommentMulti }
        <?> "end of comment"
        where
          startEnd = nub (commentEnd ++ commentStart)
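
If skipping comments is all you need, Text.Parsec.Token can also be used directly rather than copying its logic. The following is a rough sketch under that assumption; lexerWith and wordAfterComments are hypothetical names chosen for this example, and the "word" being parsed is just a stand-in for whatever follows the comment in your grammar.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- A token parser whose only customisation is the block-comment syntax.
-- The Bool selects whether comments are allowed to nest.
lexerWith :: Bool -> Tok.TokenParser ()
lexerWith nested = Tok.makeTokenParser emptyDef
  { Tok.commentStart   = "/*"
  , Tok.commentEnd     = "*/"
  , Tok.nestedComments = nested
  }

-- whiteSpace skips spaces and comments; then read one run of letters.
wordAfterComments :: Bool -> Parser String
wordAfterComments nested = Tok.whiteSpace (lexerWith nested) *> many1 letter
```

With nesting enabled, parse (wordAfterComments True) "" "/*123/*456*/789*/ abc" returns Right "abc". With wordAfterComments False the comment ends at the first "*/", so the parser is left looking at "789*/ abc" and fails.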