Choosing the right ReadP parse result
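I am trying to parse RFC5322 email addresses. My parser works in the sense that one of the results it returns is the correct one. But how do I select the "right" result?
Given the string Foo Bar <foo@bar.com>, my parser should produce the value Address (Just "Foo Bar") "foo@bar.com".
Alternatively, given the string foo@bar.com, my parser should produce the value Address Nothing "foo@bar.com".
The value that includes the name is preferred.
My parser looks like this: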
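import Control.Applicative
import Data.Char
import Data.Text (Text)
import qualified Data.Text as T
import Text.ParserCombinators.ReadP

-- Address is not shown in the original post; this definition is
-- reconstructed from the record fields visible in the output.
data Address = Address
  { addressName  :: Maybe Text
  , addressEmail :: Text
  } deriving Show

-- Accepts any non-empty run of ASCII characters as the email.
onlyEmail :: ReadP Address
onlyEmail = do
  skipSpaces
  email <- many1 $ satisfy isAscii
  skipSpaces
  return $ Address Nothing (T.pack email)

-- Accepts any non-empty run of ASCII characters as the name,
-- followed by an email in angle brackets.
withName :: ReadP Address
withName = do
  skipSpaces
  name <- many1 (satisfy isAscii)
  skipSpaces
  email <- between (char '<') (char '>') (many1 $ satisfy isAscii)
  skipSpaces
  return $ Address (Just $ T.pack name) (T.pack email)

rfc5322 :: ReadP Address
rfc5322 = withName <|> onlyEmail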
When I run the parser with readP_to_S rfc5322 "Foo Bar <foo@bar.com>", it produces the following results:
[ (Address {addressName = Nothing, addressEmail = "F"},"oo Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Fo"},"o Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo"},"Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo "},"Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo B"},"ar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Ba"},"r <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar"},"<foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar "},"<foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <"},"foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <f"},"oo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <fo"},"o@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo"},"@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@"},"bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@b"},"ar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@ba"},"r.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar"},".com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar."},"com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.c"},"om>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.co"},"m>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.com"},">")
, (Address {addressName = Just "Foo Bar", addressEmail = "foo@bar.com"},"")
, (Address {addressName = Just "Foo Bar ", addressEmail = "foo@bar.com"},"")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.com>"},"")
]
In this case, the result I actually want is the third from last in the list. How can I express this preference?
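You shouldn't need to express a preference. Your problem is that your sub-parsers accept a much larger set of strings than they need to. For instance, many1 (satisfy isAscii) also matches spaces, '<', '>' and '@', and many1 returns every possible prefix rather than only the longest match.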
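To make the point about over-permissive sub-parsers concrete, here is a minimal sketch comparing many1 (satisfy p) with munch1 p on a short input. The names everyPrefix, longestOnly, prefixResults and greedyResults are hypothetical and exist only for this illustration.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

-- many1 explores every possible stopping point, so each non-empty
-- prefix of letters is reported as a separate result.
everyPrefix :: ReadP String
everyPrefix = many1 (satisfy isAlpha)

-- munch1 is greedy: it consumes the longest run of letters and
-- yields exactly one result.
longestOnly :: ReadP String
longestOnly = munch1 isAlpha

-- Hypothetical sample runs on the same input.
prefixResults :: [(String, String)]
prefixResults = readP_to_S everyPrefix "Foo bar"

greedyResults :: [(String, String)]
greedyResults = readP_to_S longestOnly "Foo bar"
```

Loaded in GHCi, prefixResults contains one entry each for "F", "Fo" and "Foo", while greedyResults contains the single entry ("Foo"," bar"). The long list in the question comes from the same effect applied to isAscii.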
For example, my solution:
import Control.Bool          -- provides the lifted (<&&>) and (<||>)
import Control.Applicative
import Data.Char
import qualified Data.Text as T
import Data.Text (Text)
import Text.ParserCombinators.ReadP

-- Address is not shown in the original answer; this plain (non-record)
-- definition is an assumption that matches the printed result.
data Address = Address (Maybe Text) Text
  deriving Show

-- local-part '@' domain, where neither part may contain '@', '<' or '>'.
email :: ReadP Text
email = do
  l <- part
  a <- char '@'
  d <- part
  return . T.pack $ l ++ a:d
  where
    part = munch1 (isAscii <&&> (/='@') <&&> (/='<') <&&> (/='>'))

-- One or more words, with runs of spaces collapsed to a single space.
name :: ReadP Text
name = T.pack <$> chainr1 part sep
  where
    part = munch1 (isAlpha <||> isDigit <||> (=='\''))
    sep  = (\xs ys -> xs ++ ' ':ys) <$ munch1 (==' ')

onlyEmail :: ReadP Address
onlyEmail = Address Nothing <$> email

withName :: ReadP Address
withName = do
  n <- name
  skipSpaces
  e <- between (char '<') (char '>') email
  return $ Address (Just n) e

address :: ReadP Address
address = skipSpaces *> (withName <|> onlyEmail)

main :: IO ()
main = print $ readP_to_S address "Foo Bar <foo@bar.com>"
This prints:
[(Address (Just "Foo Bar") "foo@bar.com","")]
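If an explicit preference is still wanted on top of tighter sub-parsers, ReadP also provides the left-biased choice operator (<++) and the eof parser. The sketch below is not the approach used in the answer above. It is a simplified variant that uses String instead of Text, and the names addrPart, displayName and parseAddress are hypothetical.

```haskell
import Data.Char (isAlphaNum, isAscii)
import Text.ParserCombinators.ReadP

-- Simplified stand-in for the Address type (String instead of Text).
data Address = Address (Maybe String) String
  deriving (Eq, Show)

-- One address part: a greedy run of ASCII characters other than
-- '@', '<', '>' and space.
addrPart :: ReadP String
addrPart = munch1 (\c -> isAscii c && c `notElem` "@<> ")

email :: ReadP String
email = do
  local  <- addrPart
  _      <- char '@'
  domain <- addrPart
  return (local ++ "@" ++ domain)

-- A display name: alphanumeric words separated by runs of spaces,
-- rejoined with single spaces.
displayName :: ReadP String
displayName = unwords <$> sepBy1 (munch1 isAlphaNum) (munch1 (== ' '))

withName :: ReadP Address
withName = do
  n <- displayName
  skipSpaces
  e <- between (char '<') (char '>') email
  return (Address (Just n) e)

onlyEmail :: ReadP Address
onlyEmail = Address Nothing <$> email

-- (<++) is left-biased: onlyEmail is used only when withName yields
-- no result at all. eof rejects parses that leave input unconsumed.
address :: ReadP Address
address = skipSpaces *> (withName <++ onlyEmail) <* skipSpaces <* eof

-- Accept the input only if there is exactly one complete parse.
parseAddress :: String -> Maybe Address
parseAddress s = case readP_to_S address s of
  [(a, "")] -> Just a
  _         -> Nothing
```

With these definitions, parseAddress "Foo Bar <foo@bar.com>" gives Just (Address (Just "Foo Bar") "foo@bar.com"), parseAddress "foo@bar.com" gives Just (Address Nothing "foo@bar.com"), and input with trailing junk such as "foo@bar.com>" gives Nothing. This grammar is far narrower than what RFC5322 allows and is meant only to show the two combinators.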