How to make chunking work with amazonka, conduit and lazy bytestring
I wrote the code below to simulate uploading to S3 from a lazy ByteString (which will be received over a network socket; here, we simulate that by reading a file of about 100MB). The problem with the code below is that it seems to force the whole file into memory instead of chunking it (cbytes) - pointers on why the chunking is not working would be appreciated:
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens
import Network.AWS
import Network.AWS.S3
import Network.AWS.Data.Body
import System.IO
import Data.Conduit (($$+-))
import Data.Conduit.Binary (sinkLbs, sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import Network.HTTP.Conduit (responseBody, RequestBody(..), newManager, tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS

example :: IO PutObjectResponse
example = do
    -- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
    -- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
    -- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
    e <- newEnv NorthVirginia Discover
    -- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
    l <- newLogger Debug stdout
    -- The payload for the S3 object is read from a file that simulates a lazy bytestring received over the network:
    inb <- LBS.readFile "out"
    lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)
    -- We now run the AWS computation with the overridden logger, performing the PutObject request:
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes & poContentType .~ Just "text; charset=UTF-8")

main :: IO ()
main = example >> return ()
Running the executable with the +RTS -s option shows that the whole thing is read into memory (~113MB maximum residency - I did see ~87MB once). On the other hand, if I use chunkedFile, it is chunked correctly (~10MB maximum residency).
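For reference, the working chunkedFile variant looks roughly like this (a minimal sketch under the same imports as above; exampleChunked is a hypothetical name, and chunkedFile is exported by Network.AWS.Data.Body and opens and streams the file itself):

-- Sketch: the same upload, but letting amazonka stream the file in 128KB chunks.
exampleChunked :: IO PutObjectResponse
exampleChunked = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout
    -- chunkedFile reads the file incrementally; no lazy ByteString is accumulated:
    cbytes <- chunkedFile (1024*128) "out"
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes & poContentType .~ Just "text; charset=UTF-8")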
I am not sure, but I think the culprit is LBS.readFile, whose documentation says:
readFile :: FilePath -> IO ByteString
Read an entire file lazily into a ByteString.
The Handle will be held open until EOF is encountered.
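If one wanted to test this hypothesis in isolation (a hypothetical experiment, not part of the question), one could drain each kind of source into a byte counter and compare the +RTS -s residency of the two variants; note that if lazy IO streams as intended, both may stay low, which would suggest the retention happens elsewhere:

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import Data.Conduit (($$))
import Data.Conduit.Binary (sourceFile, sourceLbs)
import qualified Data.Conduit.List as CL
import Control.Monad.Trans.Resource (runResourceT)

main :: IO ()
main = do
    -- Variant A: read lazily, then re-chunk with sourceLbs (mirrors the question):
    inb <- LBS.readFile "out"
    nA <- sourceLbs inb $$ CL.fold (\n c -> n + BS.length c) 0
    -- Variant B: stream directly from the file with sourceFile:
    nB <- runResourceT $ sourceFile "out" $$ CL.fold (\n c -> n + BS.length c) 0
    print (nA, nB)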
chunkedFile works in the conduit way - or you could use

sourceFile :: MonadResource m => FilePath -> Producer m ByteString

from conduit-extra's Data.Conduit.Binary instead of LBS.readFile, but I'm no expert.
At this point it is clear that

inb <- LBS.readFile "out"
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

should be rewritten as

-- with: import qualified Data.Conduit.Binary as C   (from conduit-extra)
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
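Note that the file's length is still read eagerly via hFileSize: ChunkedBody takes the total payload size up front (S3's streaming signature scheme needs the decoded content length declared), so only the payload bytes themselves are now streamed.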
As you have written it, the purpose of the conduit is defeated. The whole file needs to be accumulated by LBS.readFile, and is then broken up chunk by chunk as it is fed to sourceLbs. (If lazy IO is working right, this might not happen.) sourceFile reads the file incrementally, chunk by chunk. It could be that, e.g., toBody accumulates the whole file, in which case the point of the conduit would be defeated at a different step; glancing over the source of send and friends, though, I don't see anything that would do that.
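Putting the fix together, a complete streaming version of the question's program might look like this (a sketch under the same bucket/key assumptions as the question):

{-# LANGUAGE OverloadedStrings #-}

import Control.Lens
import Network.AWS
import Network.AWS.S3
import Network.AWS.Data.Body
import System.IO
import qualified Data.Conduit.Binary as C -- conduit-extra

example :: IO PutObjectResponse
example = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout
    -- Only the length is read eagerly; the contents are streamed by the conduit:
    lenb <- System.IO.withFile "out" ReadMode hFileSize
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
    runResourceT . runAWS (e & envLogger .~ l) $
        send (putObject "yourtestenv-change-it-please" "testbucket/test" cbytes & poContentType .~ Just "text; charset=UTF-8")

main :: IO ()
main = example >> return ()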