Asynchronous code runs slower than synchronous version in Haskell
Benchmarking the following:
#!/usr/bin/env stack
-- stack --resolver lts-16.2 script --package async --package criterion
import Control.Concurrent.Async (async, replicateConcurrently_)
import Control.Monad (replicateM_, void)
import Criterion.Main

main :: IO ()
main = defaultMain [
    bgroup "tests" [ bench "sync" $ nfIO syncTest
                   , bench "async" $ nfIO asyncTest
                   ]
  ]

syncTest :: IO ()
syncTest = replicateM_ 100000 dummy

asyncTest :: IO ()
asyncTest = replicateConcurrently_ 100000 dummy

dummy :: IO Int
dummy = return $ fib 10000000000

fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
gives me this:
% ./applicative-v-monad.hs
benchmarking tests/sync
time 2.120 ms (2.075 ms .. 2.160 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 2.040 ms (2.023 ms .. 2.073 ms)
std dev 77.37 μs (54.96 μs .. 122.8 μs)
variance introduced by outliers: 23% (moderately inflated)
benchmarking tests/async
time 475.3 ms (310.7 ms .. 642.8 ms)
0.984 R² (0.943 R² .. 1.000 R²)
mean 527.2 ms (497.9 ms .. 570.9 ms)
std dev 41.30 ms (4.833 ms .. 52.83 ms)
variance introduced by outliers: 21% (moderately inflated)
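Clearly asyncTest takes longer to run than syncTest.
I would have thought that running expensive actions concurrently would be faster than running them in sequence. Is there a flaw in my reasoning?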
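There are a few problems with this benchmark.
First of all, laziness
As @David Fletcher pointed out, you are not forcing the computation of fib. The fix for this problem would normally be as easy as: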
dummy :: IO Int
dummy = return $! fib 10000000000
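That alone is enough to make us wait for an eternity, because fib 10000000000 will never finish. Lowering the argument to something more manageable is the next thing we should do: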
dummy :: IO Int
dummy = return $! fib 35
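That would normally be enough, but GHC is too smart: it sees that this computation is pure and optimizes the loop of 100000 iterations into a single computation, returning the same result 100000 times. In reality it computes this fib only once. Instead, let's make fib depend on the iteration number: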
-- mapConcurrently_ has to be imported as well
import Control.Concurrent.Async (mapConcurrently_)

xs :: [Int]
xs = [1..35]

syncTest :: IO ()
syncTest = mapM_ dummy xs

asyncTest :: IO ()
asyncTest = mapConcurrently_ dummy xs

dummy :: Int -> IO Int
dummy n = return $! fib n
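To see the difference between $ and $! in isolation, here is a minimal sketch that uses only base. The names expensive, lazyDummy, strictDummy and wasForced are made up for this illustration and are not part of the benchmark. The "expensive" value throws as soon as something evaluates it, which makes it observable whether an action forced it. Discarding the result, as replicateM_ and mapM_ do, never demands a lazily returned value.

```haskell
import Control.Exception (SomeException, try)

-- Hypothetical stand-in for an expensive pure computation: it throws as
-- soon as anything evaluates it, so we can observe whether it was forced.
expensive :: Int
expensive = error "expensive was evaluated"

-- Lazy variant: wraps an unevaluated thunk, so no work happens here.
lazyDummy :: IO Int
lazyDummy = return $ expensive

-- Strict variant: ($!) evaluates the value to WHNF before returning it.
strictDummy :: IO Int
strictDummy = return $! expensive

-- Run an action, discard its result, and report whether the value was forced.
wasForced :: IO Int -> IO Bool
wasForced action = do
  outcome <- try (action >> return ()) :: IO (Either SomeException ())
  return (either (const True) (const False) outcome)
```

Calling wasForced lazyDummy in GHCi should yield False and wasForced strictDummy should yield True. For an Int, WHNF is the fully evaluated number, which is why $! is sufficient here.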
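Next problem is compilation
stack script will run the code interpreted and without a threaded environment. So your code will run slowly and sequentially. We fix that by compiling manually with a few flags: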
$ stack exec --resolver lts-16.2 --package async --package criterion -- ghc -threaded -O2 -rtsopts -with-rtsopts=-N bench-async.hs
$ stack exec --resolver lts-16.2 -- ./bench-async
Of course, for a full stack project all these flags go into the cabal file, and running stack bench will do the rest.
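A quick way to check what the flags actually bought you is to ask the runtime itself. The sketch below uses only base; rtsSupportsBoundThreads and getNumCapabilities come from Control.Concurrent, while describeRuntime and reportRuntime are hypothetical helper names chosen for this example.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- Turn the two runtime facts into a human-readable verdict.
describeRuntime :: Bool -> Int -> String
describeRuntime threaded caps
  | not threaded = "non-threaded RTS: no parallel speedup possible"
  | caps <= 1    = "threaded RTS, but only 1 capability: pass +RTS -N"
  | otherwise    = "threaded RTS with " ++ show caps ++ " capabilities"

-- Print the verdict for the runtime this program is running on.
reportRuntime :: IO ()
reportRuntime = do
  caps <- getNumCapabilities
  putStrLn (describeRuntime rtsSupportsBoundThreads caps)
```

Calling reportRuntime at the start of main, before defaultMain, shows whether the benchmark can use more than one core at all.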
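Last, but not least: too many threads.
In your question you have asyncTest = replicateConcurrently_ 100000 dummy. Unless the number of iterations is very low, which it is not, you don't want to use async for this, because spawning at least 100000 threads is not free. It is better to use a work stealing scheduler for this type of workload. I wrote a library specifically for this purpose: scheduler
Here is an example of how to use it: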
import qualified Control.Scheduler as S

main :: IO ()
main = defaultMain [
    bgroup "tests" [ bench "sync" $ whnfIO syncTest
                   , bench "async" $ nfIO asyncTest
                   , bench "scheduler" $ nfIO schedulerTest
                   ]
  ]

schedulerTest :: IO ()
schedulerTest = S.traverseConcurrently_ S.Par dummy xs
This will now give us much more sensible numbers:
benchmarking tests/sync
time 246.7 ms (210.6 ms .. 269.0 ms)
0.989 R² (0.951 R² .. 1.000 R²)
mean 266.4 ms (256.4 ms .. 286.0 ms)
std dev 21.60 ms (457.3 μs .. 26.92 ms)
variance introduced by outliers: 18% (moderately inflated)
benchmarking tests/async
time 135.4 ms (127.8 ms .. 147.9 ms)
0.992 R² (0.980 R² .. 1.000 R²)
mean 134.8 ms (129.7 ms .. 138.0 ms)
std dev 6.578 ms (3.605 ms .. 9.807 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking tests/scheduler
time 109.0 ms (96.83 ms .. 120.3 ms)
0.989 R² (0.956 R² .. 1.000 R²)
mean 111.5 ms (108.0 ms .. 120.2 ms)
std dev 7.574 ms (2.496 ms .. 11.85 ms)
variance introduced by outliers: 12% (moderately inflated)
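For intuition on why a fixed number of workers is cheaper than one thread per job, here is a rough sketch of a bounded worker pool built only from base. It is not how scheduler is implemented: it uses a single shared job queue instead of work stealing, and it does not propagate exceptions from workers to the caller. All names in it (nextJob, worker, pooledMapM_, pooledMapCaps_) are hypothetical.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Control.Monad (replicateM)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Atomically take the next job off the shared queue, if any is left.
nextJob :: IORef [a] -> IO (Maybe a)
nextJob queue = atomicModifyIORef' queue $ \jobs -> case jobs of
  []       -> ([], Nothing)
  (j : js) -> (js, Just j)

-- A worker keeps pulling jobs until the queue is empty.
worker :: IORef [a] -> (a -> IO b) -> IO ()
worker queue action = loop
  where
    loop = do
      job <- nextJob queue
      case job of
        Nothing -> return ()
        Just x  -> action x >> loop

-- Run the action over all inputs using at most n threads (at least one).
-- Simplification: a worker that throws just stops; the error is not rethrown.
pooledMapM_ :: Int -> (a -> IO b) -> [a] -> IO ()
pooledMapM_ n action xs = do
  queue <- newIORef xs
  dones <- replicateM (max 1 n) $ do
    done <- newEmptyMVar
    -- Signal completion even if the worker dies with an exception.
    _ <- forkIO (worker queue action `finally` putMVar done ())
    return done
  -- Block until every worker has signalled that it is finished.
  mapM_ takeMVar dones

-- Convenience wrapper: one worker per capability (see +RTS -N).
pooledMapCaps_ :: (a -> IO b) -> [a] -> IO ()
pooledMapCaps_ action xs = do
  n <- getNumCapabilities
  pooledMapM_ n action xs
```

With the definitions from the benchmark, pooledMapCaps_ dummy xs would spawn only as many threads as there are capabilities, no matter how long xs is. For real code the scheduler or async libraries remain the better choice, since they handle exceptions and cancellation properly.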