Conduit and Attoparsec: unexpected termination on parse error
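I'm trying to convert a log file parser I wrote a while ago over to conduit, and I've run into a problem. I'll simplify the details of the parser itself, since they aren't relevant to the question. I have a log file that looks like this: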
200 GET
404 POST
500 GET
FOO
301 PUT
302 GET
201 POST
So the parsing code is quite simple:
data SimpleLogEntry = SimpleLogEntry {
      status :: Int
    , method :: String
    } deriving (Show, Eq)

parseHTTPStatus :: Parser Int
parseHTTPStatus = validate <$> decimal
  where validate d = if (d >= 200 && d < 999) then d else 100

parseHTTPMethod :: Parser String
parseHTTPMethod =
        (stringCI "GET" *> return "Get")
    <|> (stringCI "POST" *> return "Post")
    <|> (stringCI "PUT" *> return "Put")
    <|> return "Unknown"

parseLogLine :: Parser SimpleLogEntry
parseLogLine = fmap SimpleLogEntry
    parseHTTPStatus
    <*> (space *> parseHTTPMethod)
So far, so good. Here is how I implemented it in conduit:
import Prelude hiding (lines)
import Control.Applicative
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (runResourceT, ResourceT)
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Conduit
import qualified Data.Conduit.Attoparsec as CA
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL

-- Note: "~" is expanded by the shell, not by sourceFile;
-- substitute a real path when running this.
logLines :: Source (ResourceT IO) B.ByteString
logLines = CB.sourceFile "~/test.log" $= CB.lines

parseEntry :: ConduitM B8.ByteString SimpleLogEntry (ResourceT IO) ()
parseEntry = CA.conduitParserEither parseLogLine =$= awaitForever go
  where
    go (Left err) = liftIO $ putStrLn ("Got an error: " ++ CA.errorMessage err)
    go (Right (_, logEntry)) = yield logEntry

sink :: Sink SimpleLogEntry (ResourceT IO) ()
sink = CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a status: " ++ (show . status) t)

main :: IO ()
main = runResourceT $ logLines $= parseEntry $$ sink
When I run main, I get this output:
Got a status: 200
Got a status: 404
Got a status: 500
Got an error: Failed reading: takeWhile1
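I can't understand why the conduit terminates at this point instead of going on to parse the next line of the file, which is what I want it to do. From reading the documentation for Data.Conduit.Attoparsec, this seems to be exactly the use case conduitParserEither was designed for.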
Update

Per @Fabian, it turns out that conduitParserEither is not really what I wanted. Here is a definition of parseEntry that does what I want:
parseEntry' :: ConduitM B8.ByteString SimpleLogEntry (ResourceT IO) ()
parseEntry' = (CL.map (parseOnly parseLogLine)) =$= awaitForever go
  where
    go (Left err) = liftIO $ putStrLn ("Got an error: " ++ err)
    go (Right logEntry) = yield logEntry
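The per-line behaviour of parseOnly can be checked without conduit at all. The following is a minimal sketch under some assumptions: `parseLines` and `sampleLog` are hypothetical names introduced only for this illustration, `B8.lines` stands in for `CB.lines`, and the parser is the one from the question. Each line gets a fresh parser run, so a failure on one line cannot affect the lines after it.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, decimal, parseOnly, space, stringCI)
import qualified Data.ByteString.Char8 as B8

data SimpleLogEntry = SimpleLogEntry
  { status :: Int
  , method :: String
  } deriving (Show, Eq)

-- Same grammar as in the question: "<status> <method>".
parseHTTPStatus :: Parser Int
parseHTTPStatus = validate <$> decimal
  where
    validate d = if d >= 200 && d < 999 then d else 100

parseHTTPMethod :: Parser String
parseHTTPMethod =
      (stringCI "GET"  *> pure "Get")
  <|> (stringCI "POST" *> pure "Post")
  <|> (stringCI "PUT"  *> pure "Put")
  <|> pure "Unknown"

parseLogLine :: Parser SimpleLogEntry
parseLogLine = SimpleLogEntry <$> parseHTTPStatus <*> (space *> parseHTTPMethod)

-- Hypothetical helper: split the input into lines and run the parser
-- independently on each one, keeping one Either per line.
parseLines :: B8.ByteString -> [Either String SimpleLogEntry]
parseLines = map (parseOnly parseLogLine) . B8.lines

-- The log file from the question, inlined for the example.
sampleLog :: B8.ByteString
sampleLog = B8.unlines
  ["200 GET", "404 POST", "500 GET", "FOO", "301 PUT", "302 GET", "201 POST"]

main :: IO ()
main = mapM_ report (parseLines sampleLog)
  where
    report (Left err)    = putStrLn ("Got an error: " ++ err)
    report (Right entry) = putStrLn ("Got a status: " ++ show (status entry))
```

With this input, the FOO line produces a single Left and the three lines after it are still parsed, which is the behaviour parseEntry' gives inside the conduit.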
conduitParser (or conduitParserEither) also works with multiple tokens on a single line. For example, the following input would produce the same result:
200 GET404 POST
500 GET
FOO
301 PUT
302 GET
201 POST
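So it makes sense that the parser does not continue: it runs over the incoming stream as a whole rather than once per line, so after a failure it does not know where the next token would start.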
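To make that concrete, here is a rough simulation in plain attoparsec, without conduit. It is a sketch under stated assumptions, not a model of the library's internals: `streamView`, `parseStream`, `logA` and `logB` are hypothetical names, `B8.concat . B8.lines` approximates what the parser sees after `CB.lines` has stripped the newlines, and the status validation from the question is left out for brevity.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, decimal, parseOnly, space, stringCI, takeByteString)
import qualified Data.ByteString.Char8 as B8

data SimpleLogEntry = SimpleLogEntry
  { status :: Int
  , method :: String
  } deriving (Show, Eq)

-- Simplified version of the question's parser (no status validation).
parseLogLine :: Parser SimpleLogEntry
parseLogLine = SimpleLogEntry <$> decimal <*> (space *> httpMethod)
  where
    httpMethod :: Parser String
    httpMethod =
          (stringCI "GET"  *> pure "Get")
      <|> (stringCI "POST" *> pure "Post")
      <|> (stringCI "PUT"  *> pure "Put")
      <|> pure "Unknown"

-- Assumption: once the newlines are stripped, the parser effectively
-- sees the lines glued together as one continuous stream.
streamView :: B8.ByteString -> B8.ByteString
streamView = B8.concat . B8.lines

-- Parse as many entries as possible, then return whatever input is left.
parseStream :: B8.ByteString -> Either String ([SimpleLogEntry], B8.ByteString)
parseStream = parseOnly ((,) <$> many parseLogLine <*> takeByteString)

-- The original log file and the variant with two entries on one line.
logA, logB :: B8.ByteString
logA = B8.unlines
  ["200 GET", "404 POST", "500 GET", "FOO", "301 PUT", "302 GET", "201 POST"]
logB = B8.unlines
  ["200 GET404 POST", "500 GET", "FOO", "301 PUT", "302 GET", "201 POST"]

main :: IO ()
main = do
  -- Both files look identical once the line boundaries are gone.
  print (streamView logA == streamView logB)
  -- Entries parsed before the failure, plus the unparsed remainder.
  print (parseStream (streamView logA))
```

In this simulation the first three entries parse, and the remainder starts at FOO with nothing marking where the next entry begins. Recovering would need a resynchronisation rule such as "skip to the next newline", and that information is already gone by the time the parser sees the data.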