How to speed Haskell IO with buffering?
我在 "Real World Haskell"(第 7 章,第 189 页)中阅读了有关 IO 缓冲的内容,并尝试测试不同的缓冲大小如何影响性能。
import System.IO
import Data.Time.Clock
import Data.Char(toUpper)
main :: IO ()
main = do
    hInp <- openFile "bigFile.txt" ReadMode
    -- 2^10 = 1024-byte block buffer for the input handle
    let bufferSize = truncate $ 2**10
    hSetBuffering hInp (BlockBuffering (Just bufferSize))
    bufferMode <- hGetBuffering hInp
    putStrLn $ "Current buffering mode: " ++ show bufferMode
    startTime <- getCurrentTime
    -- hGetContents is lazy: the file is actually read while writeFile
    -- consumes the upper-cased string
    inp <- hGetContents hInp
    writeFile "processed.txt" (map toUpper inp)
    hClose hInp
    finishTime <- getCurrentTime
    print $ diffUTCTime finishTime startTime
    return ()
然后我创建了一个"bigFile.txt"
-rw-rw-r-- 1 user user 96M янв. 26 09:49 bigFile.txt
and ran my program against it with different buffer sizes:
Current buffering mode: BlockBuffering (Just 32)
9.744967s
Current buffering mode: BlockBuffering (Just 1024)
9.667924s
Current buffering mode: BlockBuffering (Just 1048576)
9.494807s
Current buffering mode: BlockBuffering (Just 1073741824)
9.792453s
But the running times are almost the same. Is this normal, or am I doing something wrong?
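On a modern OS, the buffer size probably has little effect when reading a file linearly, because 1) the kernel performs read-ahead, and 2) if you have read the file recently, it may already be in the page cache.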
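To check the read side on your own machine, here is a minimal sketch that times only the reading of a file, in fixed-size chunks, without the per-character `String` work (`hGetContents`, `map toUpper`) that the question's program also includes in its timing. It swaps in strict `Data.ByteString.hGetSome` for reading, which the question does not use; the program name `readbench` and the helper `countBytes` are hypothetical.

```haskell
import qualified Data.ByteString as BS
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Environment (getArgs)
import System.IO

-- Read everything from the handle in chunks of at most chunkSize bytes and
-- return the number of bytes read. No per-character String processing is
-- done, so the measured time is mostly the read path itself.
countBytes :: Handle -> Int -> IO Int
countBytes h chunkSize = go 0
  where
    go acc = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then return acc                    -- an empty chunk means end of file
        else go $! acc + BS.length chunk   -- keep the accumulator strict

main :: IO ()
main = do
  -- usage (hypothetical): ./readbench FILE CHUNK_SIZE
  [path, sizeArg] <- getArgs
  let chunkSize = read sizeArg :: Int
  withBinaryFile path ReadMode $ \h -> do
    -- use the same size for the Handle buffer and for each read request
    hSetBuffering h (BlockBuffering (Just chunkSize))
    start <- getCurrentTime
    total <- countBytes h chunkSize
    end <- getCurrentTime
    putStrLn $ show total ++ " bytes in " ++ show (diffUTCTime end start)
```

Running it twice on the same file with the same chunk size is a quick way to see the page-cache effect: the second run is typically served from memory rather than from disk.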
Here is a program that measures the effect of buffering on writes. Typical results are:
$ ./mkbigfile 32 -- 12.864733s
$ ./mkbigfile 64 -- 9.668272s
$ ./mkbigfile 128 -- 6.993664s
$ ./mkbigfile 512 -- 4.130989s
$ ./mkbigfile 1024 -- 3.536652s
$ ./mkbigfile 16384 -- 3.055403s
$ ./mkbigfile 1000000 -- 3.004879s
Source:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import Control.Monad
import System.IO
import System.Environment
import Data.Time.Clock
main :: IO ()
main = do
    (arg:_) <- getArgs
    -- buffer size in bytes, taken from the first command-line argument
    let size = read arg :: Int
    -- bs is a String here, because System.IO.hPutStrLn takes a String
    let bs = "abcdefghijklmnopqrstuvwxyz"
        n  = 96000000 `div` length bs
    h <- openFile "bigFile.txt" WriteMode
    hSetBuffering h (BlockBuffering (Just size))
    startTime <- getCurrentTime
    replicateM_ n $ hPutStrLn h bs
    hClose h
    finishTime <- getCurrentTime
    print $ diffUTCTime finishTime startTime
    return ()
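A rough way to read the write timings above is to count how many times a full buffer has to be handed to the OS. The sketch below does that arithmetic under an idealized model, which is an assumption and not a description of GHC's actual flushing logic: one write call per filled buffer, one byte per character (ASCII), and a 1-byte `'\n'` per line (no CRLF translation). The names `linesWritten`, `bytesWritten` and `estimatedWrites` are hypothetical.

```haskell
-- Idealized model (assumption, not GHC internals): each time the Handle's
-- buffer fills, its contents go to the OS in one write call, so writing
-- `bytes` bytes with a b-byte buffer costs about ceiling (bytes / b) calls.

-- Same line count as the benchmark: 96000000 `div` 26
linesWritten :: Int
linesWritten = 96000000 `div` length "abcdefghijklmnopqrstuvwxyz"

-- hPutStrLn writes 26 letters plus '\n'; assumes 1 byte per character
-- and no CRLF newline translation
bytesWritten :: Int
bytesWritten = linesWritten * 27

-- ceiling division on Int, without going through Double
estimatedWrites :: Int -> Int -> Int
estimatedWrites bytes bufSize = (bytes + bufSize - 1) `div` bufSize

main :: IO ()
main = mapM_ report [32, 64, 128, 512, 1024, 16384, 1000000]
  where
    report b =
      putStrLn $ show b ++ "-byte buffer: ~"
        ++ show (estimatedWrites bytesWritten b) ++ " write calls"
```

Under this model, going from 32 to 1024 bytes cuts the number of calls from about 3.1 million to about 97,000, while going from 16384 to 1000000 removes only about 6,000 more. That fits the way the timings above flatten out once per-call overhead stops dominating.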
我在 "Real World Haskell"(第 7 章,第 189 页)中阅读了有关 IO 缓冲的内容,并尝试测试不同的缓冲大小如何影响性能。
import System.IO
import Data.Time.Clock
import Data.Char(toUpper)
main :: IO ()
main = do
hInp <- openFile "bigFile.txt" ReadMode
let bufferSize = truncate $ 2**10
hSetBuffering hInp (BlockBuffering (Just bufferSize))
bufferMode <- hGetBuffering hInp
putStrLn $ "Current buffering mode: " ++ (show bufferMode)
startTime <- getCurrentTime
inp <- hGetContents hInp
writeFile "processed.txt" (map toUpper inp)
hClose hInp
finishTime <- getCurrentTime
print $ diffUTCTime finishTime startTime
return ()
然后我创建了一个"bigFile.txt"
-rw-rw-r-- 1 user user 96M янв. 26 09:49 bigFile.txt
和运行我针对这个文件的程序,缓冲区大小不同:
Current buffering mode: BlockBuffering (Just 32)
9.744967s
Current buffering mode: BlockBuffering (Just 1024)
9.667924s
Current buffering mode: BlockBuffering (Just 1048576)
9.494807s
Current buffering mode: BlockBuffering (Just 1073741824)
9.792453s
但是程序运行宁时间差不多。这是正常现象,还是我做错了什么?
在现代 OS 上,缓冲区大小可能对线性读取文件影响不大,因为 1) 内核执行预读和 2) 文件可能已经在页面中如果您最近已经阅读过该文件,请缓存。
这是一个测量缓冲对写入的影响的程序。典型的结果是:
$ ./mkbigfile 32 -- 12.864733s
$ ./mkbigfile 64 -- 9.668272s
$ ./mkbigfile 128 -- 6.993664s
$ ./mkbigfile 512 -- 4.130989s
$ ./mkbigfile 1024 -- 3.536652s
$ ./mkbigfile 16384 -- 3.055403s
$ ./mkbigfile 1000000 -- 3.004879s
来源:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import Control.Monad
import System.IO
import System.Environment
import Data.Time.Clock
main = do
(arg:_) <- getArgs
let size = read arg
let bs = "abcdefghijklmnopqrstuvwxyz"
n = 96000000 `div` (length bs)
h <- openFile "bigFile.txt" WriteMode
hSetBuffering h (BlockBuffering (Just size))
startTime <- getCurrentTime
replicateM_ n $ hPutStrLn h bs
hClose h
finishTime <- getCurrentTime
print $ diffUTCTime finishTime startTime
return ()