conduit: producing memory leak

While looking into a previous question (haskell-data-hashset-from-unordered-container-performance-for-large-sets), I stumbled upon a strange memory leak:

module Main where

import System.Environment (getArgs)
import Control.Monad.Trans.Resource (runResourceT)
import Data.Attoparsec.ByteString (sepBy, Parser, parseOnly)
import Data.Attoparsec.ByteString.Char8 (decimal, char)
import qualified Data.ByteString.Char8 as BS
import Data.Conduit
import qualified Data.Conduit.Attoparsec as CA
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL

main :: IO ()
main = do (args:_) <- getArgs
          writeFile "input.txt" $ unlines $ map show [1..4 :: Int]
          case args of "list" -> m1
                       "fail" -> m2
                       "listlist" -> m3
                       "memoryleak" -> m4
                       --UPDATE
                       "bs-lines" -> m5
                       "bs" -> m6
                       _ -> putStr $ unlines ["Usage: conduit list"
                                             ,"               fail"
                                             ,"               listlist"
                                             ,"               memoryleak"
                                             --UPDATE
                                             ,"               bs-lines"
                                             ,"               bs"
                                             ]
m1,m2,m3,m4,m5,m6 :: IO ()
-- m1: split the file into lines, then run 'decimal' repeatedly on the stream
m1 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CB.lines
           =$= CA.conduitParser (decimal :: Parser Int)
           =$= CL.map snd
           =$= CL.consume
        print hs
-- m2: run 'decimal' repeatedly on the raw file contents
m2 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CA.conduitParser (decimal :: Parser Int)
           =$= CL.map snd
           =$= CL.consume
        print hs
-- m3: split into lines, then run the newline-separated parser on the stream
m3 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CB.lines
           =$= CA.conduitParser (decimal `sepBy` (char '\n') :: Parser [Int])
           =$= CL.map snd
           =$= CL.consume
        print hs
-- m4: run the newline-separated parser on the raw file contents
m4 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CA.conduitParser (decimal `sepBy` (char '\n') :: Parser [Int])
           =$= CL.map snd
           =$= CL.consume
        print hs
-- UPDATE
-- m5: conduit-free version, parsing each line separately with parseOnly
m5 = do inpt <- BS.lines <$> BS.readFile "input.txt"
        let Right hs =  mapM (parseOnly (decimal :: Parser Int)) inpt
        print hs
-- m6: conduit-free version, parsing the whole file with the sepBy parser
m6 = do inpt <- BS.readFile "input.txt"
        let Right hs =  (parseOnly (decimal `sepBy` (char '\n') :: Parser [Int])) inpt
        print hs

Here is some sample output:

$ > stack exec -- example list
[1234]
$ > stack exec -- example listlist
[[1234]]
$ > stack exec -- conduit fail
conduit: ParseError {errorContexts = [], errorMessage = "Failed reading: takeWhile1", errorPosition = 1:2}
$ > stack exec -- example memoryleak
(Ctrl+C)

-- UPDATE
$ > stack exec -- example bs-lines
[1,2,3,4]
$ > stack exec -- example bs
[1,2,3,4]

Now my questions are:

Why is m2 failing?

The input file, as a stream of characters, is:

1\n2\n3\n4\n

Since the decimal parser does not expect a newline, the stream that remains after the first number has been consumed is:

\n2\n3\n4\n

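Because the input stream is not exhausted, conduitParser runs the parser on the stream again. This time it cannot consume even the first character, so it fails; that is why the error reports errorPosition = 1:2, the newline right after the 1.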
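The two steps above can be reproduced with attoparsec alone, without conduit. This is a sketch assuming the attoparsec and bytestring packages; `input`, `firstStep`, `secondStep` and `numberLine` are illustrative names, and `decimal <* endOfLine` is only one possible way to make every parser run consume its own newline.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Attoparsec.ByteString.Char8
  (IResult (..), Parser, decimal, endOfLine, many1, parse, parseOnly)

-- The contents of input.txt as written by the question's main.
input :: BS.ByteString
input = BS.pack "1\n2\n3\n4\n"

-- First run of 'decimal': it stops at the first '\n' and leaves it unconsumed.
firstStep :: Maybe (Int, BS.ByteString)
firstStep = case parse (decimal :: Parser Int) input of
  Done rest n -> Just (n, rest)
  _           -> Nothing

-- Second run on the leftover: 'decimal' cannot start with '\n', so it fails.
secondStep :: Either String Int
secondStep = case firstStep of
  Just (_, rest) -> parseOnly decimal rest
  Nothing        -> Left "first step did not finish"

-- Hypothetical fix: make each run consume the line terminator as well.
numberLine :: Parser Int
numberLine = decimal <* endOfLine

main :: IO ()
main = do
  print firstStep                             -- Just (1,"\n2\n3\n4\n")
  print secondStep                            -- a Left carrying attoparsec's error message
  print (parseOnly (many1 numberLine) input)  -- Right [1,2,3,4]
```

In a conduit pipeline the same idea would mean passing a parser like `numberLine` to `CA.conduitParser`, so that no run leaves a newline behind for the next one.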

Why is m4 behaving totally different compared to all other versions and producing a space leak?

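decimal `sepBy` (char '\n') only consumes the \n between two integers, so after it has successfully parsed the four numbers, a single character is left in the input stream: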

\n

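decimal `sepBy` (char '\n') cannot consume it, and worse, it does not fail either: sepBy is allowed to consume nothing and return an empty list. So conduitParser keeps running it, parsing nothing each time, and never terminates. Every one of those empty results is collected by CL.consume, which is why memory keeps growing.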
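The non-terminating loop can be observed without conduit by re-running the parser on its own leftover input. The `rerun` function below is a hypothetical, simplified model of "run the parser again while input remains"; it is not conduit's actual implementation, and its `fuel` argument exists only so the demonstration stops. The sketch assumes attoparsec and bytestring, and `fixed` shows one possible repair: a parser that always consumes the trailing newline, so every successful run makes progress.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Attoparsec.ByteString.Char8
  (Parser, char, decimal, endOfLine, parseOnly, sepBy, sepBy1, takeByteString)

input :: BS.ByteString
input = BS.pack "1\n2\n3\n4\n"

-- The parser from m4: it can succeed without consuming anything.
leaky :: Parser [Int]
leaky = decimal `sepBy` char '\n'

-- Hypothetical repair: require at least one number and eat the final newline.
fixed :: Parser [Int]
fixed = (decimal `sepBy1` char '\n') <* endOfLine

-- Simplified model of "run the parser again while input remains".
-- Collects the result of each run; 'fuel' caps the number of runs.
rerun :: Int -> Parser a -> BS.ByteString -> [a]
rerun fuel p s
  | fuel <= 0 || BS.null s = []
  | otherwise =
      case parseOnly ((,) <$> p <*> takeByteString) s of
        Left _          -> []
        Right (x, rest) -> x : rerun (fuel - 1) p rest

main :: IO ()
main = do
  print (parseOnly leaky (BS.pack "\n"))  -- Right []: success, nothing consumed
  print (rerun 5 leaky input)             -- [[1,2,3,4],[],[],[],[]]
  print (rerun 5 fixed input)             -- [[1,2,3,4]]
```

Without the fuel bound, the `leaky` case would keep producing empty lists forever, which mirrors what CL.consume accumulates in m4.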

Why is m1 not producing [1,2,3,4]?

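I'd like to know that too! My guess is that it has something to do with fusion; maybe you should contact the author of the conduit package, who just commented on your question.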

To answer the question about m1: when you use CB.lines, you are turning input that looks like this:

["1\n2\n3\n4\n"]

into this:

["1", "2", "3", "4"]

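Then attoparsec parses "1", waits for more input, sees "2", and so on. Because CB.lines has already stripped the newlines and attoparsec treats consecutive chunks as one continuous input, decimal reads the digits as a single number, 1234, which is exactly the [1234] you see (m3's [[1234]] has the same cause).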
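This chunk behaviour can be reproduced with attoparsec's incremental interface by feeding the four lines one at a time, as CB.lines would deliver them. This is a sketch assuming attoparsec and bytestring; `chunks`, `parseChunks` and `perLine` are illustrative names. Parsing each line on its own with parseOnly, as m5 does, is one way to get [1,2,3,4] back.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Attoparsec.ByteString.Char8
  (Parser, decimal, feed, maybeResult, parse, parseOnly)

-- What CB.lines hands to the parser: the lines with their '\n' stripped.
chunks :: [BS.ByteString]
chunks = map BS.pack ["1", "2", "3", "4"]

-- Feed the chunks into one incremental parse, then an empty chunk
-- to signal end of input. 'decimal' keeps asking for more digits.
parseChunks :: [BS.ByteString] -> Maybe Int
parseChunks []       = Nothing
parseChunks (c : cs) =
  maybeResult (foldl feed (parse (decimal :: Parser Int) c) (cs ++ [BS.empty]))

-- Parsing every line independently instead, as m5 does.
perLine :: Either String [Int]
perLine = traverse (parseOnly decimal) chunks

main :: IO ()
main = do
  print (parseChunks chunks)  -- Just 1234
  print perLine               -- Right [1,2,3,4]
```

In conduit terms, the per-line approach would correspond to mapping `parseOnly decimal` over the output of CB.lines rather than running `conduitParser` across the whole stream.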