How to make chunking work with amazonka, conduit and lazy bytestring

I wrote the code below to simulate uploading to S3 from a lazy ByteString (in production it will be received over a network socket; here we simulate it by reading a file of about 100MB). The problem with the code below is that it seems to force the whole file into memory instead of chunking it (cbytes). Pointers on why the chunking doesn't work would be appreciated:

import           Control.Lens
import           Network.AWS
import           Network.AWS.S3
import           Network.AWS.Data.Body
import           System.IO
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sinkLbs, sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import           Network.HTTP.Conduit (responseBody, RequestBody (..), newManager, tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS

example :: IO PutObjectResponse
example = do
    -- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
    -- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
    -- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
    e <- newEnv NorthVirginia Discover

    -- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
    l <- newLogger Debug stdout

    -- The payload for the S3 object is retrieved from a file that simulates lazy bytestring received over network
    inb <- LBS.readFile "out"
    lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

    -- We now run the AWS computation with the overridden logger, performing the PutObject request:
    runResourceT . runAWS (e & envLogger .~ l) $
        send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")

main :: IO ()
main = example >> return ()

Running the executable with +RTS -s shows that the whole content is read into memory (~113MB maximum residency; I did once see ~87MB). On the other hand, if I use chunkedFile, it is chunked correctly (~10MB maximum residency).
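For reference, here is a sketch of the chunkedFile variant that stays at ~10MB residency. As I understand it, chunkedFile (from Network.AWS.Data.Body) takes a ChunkSize and a FilePath and builds the request body by streaming the file directly; defaultChunkSize is amazonka's built-in chunk size:

-- Sketch of the chunkedFile variant (same imports and setup as above).
-- chunkedFile streams the file itself, so nothing forces the whole
-- file into memory before the upload starts.
exampleChunked :: IO PutObjectResponse
exampleChunked = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout
    cbytes <- chunkedFile defaultChunkSize "out"
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes
                  & poContentType .~ Just "text; charset=UTF-8")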

I'm not sure, but I suspect the culprit is LBS.readFile, whose documentation says:

readFile :: FilePath -> IO ByteString

Read an entire file lazily into a ByteString.
The Handle will be held open until EOF is encountered.
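(Note that getting the length via hFileSize, as above, should not itself force the file: it asks the OS for the size without reading any data. Something like LBS.length, by contrast, would have to walk every chunk of the lazy string. A minimal sketch of the difference:)

import           Data.Int (Int64)
import qualified Data.ByteString.Lazy as LBS
import           System.IO

-- Asks the filesystem for the size; no file contents are read.
sizeFromHandle :: FilePath -> IO Integer
sizeFromHandle fp = withFile fp ReadMode hFileSize

-- Forces every chunk of the lazy string, i.e. reads the whole file.
sizeByForcing :: FilePath -> IO Int64
sizeByForcing fp = LBS.length <$> LBS.readFile fp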

chunkedFile works in the conduit way. Alternatively, you could use

sourceFile :: MonadResource m => FilePath -> Producer m ByteString

from conduit-extra (Data.Conduit.Binary) instead of LBS.readFile, but I'm not an expert.

It seems clear that this:

  inb <- LBS.readFile "out"
  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

should be rewritten as

  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
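Spelled out against the question's example, a sketch of the full fix; the only changes are dropping LBS.readFile and adding a qualified import for Data.Conduit.Binary:

import qualified Data.Conduit.Binary as C

example :: IO PutObjectResponse
example = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout
    -- hFileSize asks the OS for the size; nothing is read here.
    lenb <- System.IO.withFile "out" ReadMode hFileSize
    -- C.sourceFile streams the file from disk chunk by chunk,
    -- so no lazy ByteString is ever accumulated.
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes
                  & poContentType .~ Just "text; charset=UTF-8")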

As you have written it, the purpose of conduit is defeated. The whole file has to be accumulated by LBS.readFile, only to be broken back into chunks when fed to sourceLbs. (If lazy IO works right, this might not happen.) sourceFile, by contrast, reads the file incrementally, chunk by chunk. It could be that something else, e.g. toBody, accumulates the whole file, in which case the point of conduit would be defeated at a different place; but glancing over the source of send and friends, I can't see anything that would do that.
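To see the difference in isolation, outside amazonka, here is a minimal sketch assuming conduit-extra and the same local file "out". Both pipelines just count the bytes, but the sourceFile one streams in roughly constant space, while the readFile-based one goes through a lazy ByteString that can be retained wholesale if anything holds on to its head:

import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Conduit (($$))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL

-- Streams straight from disk; residency stays around one chunk.
countViaSourceFile :: IO Int
countViaSourceFile =
    runResourceT $ CB.sourceFile "out" $$ CL.fold (\n c -> n + BS.length c) 0

-- Goes through lazy IO; fine if the lazy string is consumed once and
-- nothing retains its head, pathological otherwise.
countViaReadFile :: IO Int
countViaReadFile = do
    inb <- LBS.readFile "out"
    CB.sourceLbs inb $$ CL.fold (\n c -> n + BS.length c) 0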