Why does a minor change to function design radically change the result of a criterion benchmark?

I have two source files that do roughly the same thing. The only difference is that in the first case the generating function is passed as a parameter, while in the second case a value is.

First case:

module Main where

import Data.Vector.Unboxed as UB
import qualified Data.Vector as V

import Criterion.Main

regularVectorGenerator :: (Int -> t) -> V.Vector t
regularVectorGenerator = V.generate 99999

unboxedVectorGenerator :: Unbox t => (Int -> t) -> UB.Vector t
unboxedVectorGenerator = UB.generate 99999

main :: IO ()
main = defaultMain
    [
        bench "boxed"   $ whnf regularVectorGenerator (+2137)
      , bench "unboxed" $ whnf unboxedVectorGenerator (+2137)
    ]

Second case:

module Main where

import Data.Vector.Unboxed as UB
import qualified Data.Vector as V

import Criterion.Main

regularVectorGenerator :: Int -> V.Vector Int
regularVectorGenerator = flip V.generate (+2137)

unboxedVectorGenerator :: Int -> UB.Vector Int
unboxedVectorGenerator = flip UB.generate (+2137)

main :: IO ()
main = defaultMain
    [
        bench "boxed"   $ whnf regularVectorGenerator 99999
      , bench "unboxed" $ whnf unboxedVectorGenerator 99999
    ]

I noticed that, when benchmarking the allocated size of the vectors, the unboxed vector is always smaller, as expected, but the sizes differ wildly between the two cases. Here is the output.

First case:

 benchmarking boxed
 time                 7.626 ms   (7.515 ms .. 7.738 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
 mean                 7.532 ms   (7.472 ms .. 7.583 ms)
 std dev              164.3 μs   (133.8 μs .. 201.3 μs)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **1.680e7**    (1.680e7 .. 1.680e7)
   y                  2357.390   (1556.690 .. 3422.724)

 benchmarking unboxed
 time                 889.1 μs   (878.9 μs .. 901.8 μs)
                     0.998 R²   (0.995 R² .. 0.999 R²)
 mean                 868.6 μs   (858.6 μs .. 882.6 μs)
 std dev              39.05 μs   (28.30 μs .. 57.02 μs)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **4000009.003** (4000003.843 .. 4000014.143)
   y                  2507.089   (2025.196 .. 3035.962)
 variance introduced by outliers: 36% (moderately inflated)

Second case:

 benchmarking boxed
 time                 1.366 ms   (1.357 ms .. 1.379 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
 mean                 1.350 ms   (1.343 ms .. 1.361 ms)
 std dev              29.96 μs   (21.74 μs .. 43.56 μs)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **2400818.350** (2400810.284 .. 2400826.685)
   y                  2423.216   (1910.901 .. 3008.024)
 variance introduced by outliers: 12% (moderately inflated)

 benchmarking unboxed
 time                 61.30 μs   (61.24 μs .. 61.37 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 61.29 μs   (61.25 μs .. 61.33 μs)
 std dev              122.1 ns   (91.64 ns .. 173.9 ns)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **800040.029** (800039.745 .. 800040.354)
   y                  2553.830   (2264.684 .. 2865.637)

Just by de-parameterizing the function, the benchmarked size of the vector drops by an order of magnitude. Can somebody explain why?

I compiled both examples with these flags:

-O2 -rtsopts

and ran them with:

--regress allocated:iters +RTS -T
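
For reference, the full build and run commands were presumably along these lines (the file name Main.hs is an assumption, not given in the post):

ghc -O2 -rtsopts Main.hs
./Main --regress allocated:iters +RTS -T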

The difference is that if the generating function is statically known inside the benchmarked function, the generator is inlined and the Int-s involved are unboxed as well. If the generating function is a parameter of the benchmark, it cannot be inlined.
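
A minimal sketch (my own illustration, not from the original post) of one way to probe this hypothesis: give the second version's generating function a top-level name and mark it NOINLINE. Under that assumption, one would expect the allocation figure to climb back toward the parameterized case.

module Main where

import qualified Data.Vector.Unboxed as UB

import Criterion.Main

-- Hypothetical top-level generating function.  NOINLINE keeps GHC from
-- inlining it into the generate loop, mimicking the "function as a
-- parameter" situation even though the function is nominally known.
step :: Int -> Int
step = (+2137)
{-# NOINLINE step #-}

unboxedVectorGenerator :: Int -> UB.Vector Int
unboxedVectorGenerator = flip UB.generate step

main :: IO ()
main = defaultMain
    [
        bench "unboxed-noinline" $ whnf unboxedVectorGenerator 99999
    ]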

From a benchmarking point of view, the second version is the correct one, since in normal usage we expect the generating function to be inlined.
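
As a sketch of what that means in practice (the size list and benchmark names below are my own illustration, not from the post): write the generating function out at the use site, so GHC can inline and unbox it, and let the vector size be what varies between benchmarks.

module Main where

import qualified Data.Vector.Unboxed as UB

import Criterion.Main

-- The generating function is statically known here, so GHC can inline it
-- into the generate loop and unbox the intermediate Int-s.
unboxedVectorGenerator :: Int -> UB.Vector Int
unboxedVectorGenerator n = UB.generate n (+2137)

main :: IO ()
main = defaultMain
    [ bench ("unboxed-" ++ show n) $ whnf unboxedVectorGenerator n
    | n <- [9999, 99999, 999999]
    ]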