How to make chunking work with amazonka, conduit and lazy bytestring

I wrote the code below to simulate uploading to S3 from a lazy ByteString (in production it will be received over a network socket; here we simulate it by reading a file of about 100MB). The problem with the code below is that it seems to force the whole file into memory instead of chunking it (cbytes). Pointers on why the chunking doesn't work would be appreciated:

import           Control.Lens
import           Network.AWS
import           Network.AWS.S3
import           Network.AWS.Data.Body
import           System.IO
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sinkLbs, sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import           Network.HTTP.Conduit (responseBody, RequestBody (..), newManager, tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS

example :: IO PutObjectResponse
example = do
    -- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
    -- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
    -- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
    e <- newEnv NorthVirginia Discover

    -- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
    l <- newLogger Debug stdout

    -- The payload for the S3 object is retrieved from a file that simulates lazy bytestring received over network
    inb <- LBS.readFile "out"
    lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

    -- We now run the AWS computation with the overridden logger, performing the PutObject request:
    runResourceT . runAWS (e & envLogger .~ l) $
        send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")

main :: IO ()
main = example >> return ()

Running the executable with +RTS -s shows that the whole content is read into memory (~113MB maximum residency; I did once see ~87MB). On the other hand, if I use chunkedFile, it is chunked correctly (~10MB maximum residency).
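For reference, here is a sketch of the chunkedFile variant that stays at ~10MB residency. As I understand it, chunkedFile (from Network.AWS.Data.Body) takes a ChunkSize and a FilePath and builds the request body by streaming the file directly; defaultChunkSize is amazonka's built-in chunk size:

-- Sketch of the chunkedFile variant (same imports and setup as above).
-- chunkedFile streams the file itself, so nothing forces the whole
-- file into memory before the upload starts.
exampleChunked :: IO PutObjectResponse
exampleChunked = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout
    cbytes <- chunkedFile defaultChunkSize "out"
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes
                  & poContentType .~ Just "text; charset=UTF-8")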

I'm not sure, but I suspect the culprit is LBS.readFile, whose documentation says:

readFile :: FilePath -> IO ByteString

Read an entire file lazily into a ByteString.
The Handle will be held open until EOF is encountered.
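(Note that getting the length via hFileSize, as above, should not itself force the file: it asks the OS for the size without reading any data. Something like LBS.length, by contrast, would have to walk every chunk of the lazy string. A minimal sketch of the difference:)

import           Data.Int (Int64)
import qualified Data.ByteString.Lazy as LBS
import           System.IO

-- Asks the filesystem for the size; no file contents are read.
sizeFromHandle :: FilePath -> IO Integer
sizeFromHandle fp = withFile fp ReadMode hFileSize

-- Forces every chunk of the lazy string, i.e. reads the whole file.
sizeByForcing :: FilePath -> IO Int64
sizeByForcing fp = LBS.length <$> LBS.readFile fp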

chunkedFile works in the conduit way. Alternatively, you could use

sourceFile :: MonadResource m => FilePath -> Producer m ByteString

from conduit-extra (Data.Conduit.Binary) instead of LBS.readFile, but I'm not an expert.

It seems clear that this:

  inb <- LBS.readFile "out"
  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

should be rewritten as

  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
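Spelled out against the question's example, a sketch of the full fix; the only changes are dropping LBS.readFile and adding a qualified import for Data.Conduit.Binary:

import qualified Data.Conduit.Binary as C

example :: IO PutObjectResponse
example = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout
    -- hFileSize asks the OS for the size; nothing is read here.
    lenb <- System.IO.withFile "out" ReadMode hFileSize
    -- C.sourceFile streams the file from disk chunk by chunk,
    -- so no lazy ByteString is ever accumulated.
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes
                  & poContentType .~ Just "text; charset=UTF-8")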

As you have written it, the purpose of conduit is defeated. The whole file has to be accumulated by LBS.readFile, only to be broken back into chunks when fed to sourceLbs. (If lazy IO works right, this might not happen.) sourceFile, by contrast, reads the file incrementally, chunk by chunk. It could be that something else, e.g. toBody, accumulates the whole file, in which case the point of conduit would be defeated at a different place; but glancing over the source of send and friends, I can't see anything that would do that.
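To see the difference in isolation, outside amazonka, here is a minimal sketch assuming conduit-extra and the same local file "out". Both pipelines just count the bytes, but the sourceFile one streams in roughly constant space, while the readFile-based one goes through a lazy ByteString that can be retained wholesale if anything holds on to its head:

import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Conduit (($$))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL

-- Streams straight from disk; residency stays around one chunk.
countViaSourceFile :: IO Int
countViaSourceFile =
    runResourceT $ CB.sourceFile "out" $$ CL.fold (\n c -> n + BS.length c) 0

-- Goes through lazy IO; fine if the lazy string is consumed once and
-- nothing retains its head, pathological otherwise.
countViaReadFile :: IO Int
countViaReadFile = do
    inb <- LBS.readFile "out"
    CB.sourceLbs inb $$ CL.fold (\n c -> n + BS.length c) 0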