Haskell/GHC `any`/`all` 的表现
Haskell/GHC performance of `any`/`all`
我为 Haskell 的内置 []
列表数据类型编写了量化函数 exists
、forall
和 none
。在多个场合,这些似乎比 Prelude
/Data.List
s any
和 all
更有效。我天真地怀疑这种性能是由于 any
和 all
是使用 Θ(n) 折叠实现的。由于我对Haskell比较陌生,我想我一定是弄错了,或者说这种现象是有充分理由的。
来自Data.Foldable
:
-- | Determines whether any element of the structure satisfies the predicate.
any :: Foldable t => (a -> Bool) -> t a -> Bool
any p = getAny #. foldMap (Any #. p)
-- | Determines whether all elements of the structure satisfy the predicate.
all :: Foldable t => (a -> Bool) -> t a -> Bool
all p = getAll #. foldMap (All #. p)
我的实现:
exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs) | pred x = True
| otherwise = exists pred xs
和
forall pred = not . exists (not . pred)
none pred = not . exists pred = forall (not . pred)
消除布尔反转:
forall, none :: (a -> Bool) -> [a] -> Bool
forall _ [] = True
forall pred (x : xs) | pred x = forall pred xs
| otherwise = False
none _ [] = True
none pred (x : xs) | pred x = False
| otherwise = none pred xs
all
:
time 327.8 μs (322.4 μs .. 333.0 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 328.7 μs (324.1 μs .. 334.2 μs)
std dev 16.95 μs (14.63 μs .. 22.02 μs)
和forall
:
time 113.2 μs (111.2 μs .. 115.0 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 112.0 μs (110.0 μs .. 113.9 μs)
std dev 6.333 μs (5.127 μs .. 7.896 μs)
使用标准 nf
.
衡量的性能
正如预期的那样,我没有重新发明折叠,但低估了编译器标志,并且天真地没想到 -O2
与默认优化级别性能相比会产生如此巨大的整体差异,也没有想到两者之间的优化效果差异个人定制和库公式。许多高效的专用标准功能优化显然只有在明确启用时才会启动。
Haskell 标签信息的 "performance" 部分强调了优化级别编译器标志在测试代码效率时的重要性。通常建议相信库函数实现的复杂性,而不是重新连接 RULES
编译指示或重新制定基本形式,尝试利用已经培养的优化潜力。
我发现它对 re-implement any
有很多启发:
import Prelude hiding (any)
import Criterion.Main
import Data.Foldable (foldMap)
import Data.Monoid
你的exists
:
exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs)
= if pred x
then True
else exists pred xs
使用(||)
的版本:
existsOr :: (a -> Bool) -> [a] -> Bool
existsOr _ [] = False
existsOr pred (x : xs) = pred x || existsOr pred xs
使用foldr
:
any :: (a -> Bool) -> [a] -> Bool
any pred = foldr ((||) . pred) False
使用 foldr
和 Any
:
anyF :: (a -> Bool) -> [a] -> Bool
anyF pred = getAny . foldr (mappend . (Any . pred)) mempty
使用 foldMap
和 Any
:
anyFM :: (a -> Bool) -> [a] -> Bool
anyFM pred = getAny . foldMap (Any . pred)
ghc -O0
的基准:
benchmarking exists
time 1.552 μs (1.504 μs .. 1.593 μs)
0.989 R² (0.983 R² .. 0.993 R²)
mean 1.482 μs (1.427 μs .. 1.545 μs)
std dev 196.1 ns (168.8 ns .. 229.2 ns)
variance introduced by outliers: 93% (severely inflated)
benchmarking existsOr
time 2.699 μs (2.616 μs .. 2.768 μs)
0.992 R² (0.988 R² .. 0.995 R²)
mean 2.629 μs (2.554 μs .. 2.704 μs)
std dev 277.8 ns (235.8 ns .. 351.1 ns)
variance introduced by outliers: 89% (severely inflated)
benchmarking any
time 5.551 μs (5.354 μs .. 5.777 μs)
0.990 R² (0.986 R² .. 0.995 R²)
mean 5.553 μs (5.395 μs .. 5.750 μs)
std dev 584.2 ns (447.5 ns .. 835.5 ns)
variance introduced by outliers: 88% (severely inflated)
benchmarking anyF
time 7.330 μs (7.081 μs .. 7.612 μs)
0.988 R² (0.982 R² .. 0.994 R²)
mean 7.502 μs (7.272 μs .. 7.762 μs)
std dev 848.2 ns (712.6 ns .. 1.022 μs)
variance introduced by outliers: 89% (severely inflated)
benchmarking anyFM
time 5.668 μs (5.451 μs .. 6.008 μs)
0.987 R² (0.975 R² .. 0.996 R²)
mean 5.807 μs (5.659 μs .. 5.975 μs)
std dev 542.5 ns (446.4 ns .. 721.8 ns)
variance introduced by outliers: 86% (severely inflated)
你的版本(exists
)确实是最快的,foldr
版本比较慢
在 ghc -O2
中,您的版本 (exists
) 是最慢的,而所有其他函数彼此几乎同样快:
benchmarking exists
time 753.5 ns (725.4 ns .. 779.9 ns)
0.990 R² (0.986 R² .. 0.995 R²)
mean 762.4 ns (737.0 ns .. 787.0 ns)
std dev 82.47 ns (66.79 ns .. 105.1 ns)
variance introduced by outliers: 91% (severely inflated)
benchmarking existsOr
time 491.5 ns (478.2 ns .. 503.2 ns)
0.994 R² (0.992 R² .. 0.996 R²)
mean 494.5 ns (481.1 ns .. 512.9 ns)
std dev 54.97 ns (42.54 ns .. 80.34 ns)
variance introduced by outliers: 92% (severely inflated)
benchmarking any
time 461.2 ns (442.0 ns .. 479.7 ns)
0.989 R² (0.985 R² .. 0.993 R²)
mean 456.0 ns (439.3 ns .. 476.3 ns)
std dev 60.04 ns (47.27 ns .. 89.47 ns)
variance introduced by outliers: 94% (severely inflated)
benchmarking anyF
time 436.9 ns (415.8 ns .. 461.0 ns)
0.978 R² (0.967 R² .. 0.988 R²)
mean 450.8 ns (430.1 ns .. 472.6 ns)
std dev 70.64 ns (57.04 ns .. 85.92 ns)
variance introduced by outliers: 96% (severely inflated)
benchmarking anyFM
time 438.9 ns (426.9 ns .. 449.5 ns)
0.993 R² (0.989 R² .. 0.996 R²)
mean 435.8 ns (421.4 ns .. 447.6 ns)
std dev 45.32 ns (36.73 ns .. 58.74 ns)
variance introduced by outliers: 90% (severely inflated)
如果查看简化的核心代码 (ghc -O2 -ddump-simpl
),就会发现不再有 foldr
(-O0
,一切都还在,[=包括 34=]s)。
因此我敢说您的代码(在 un-optimized 版本中,-O0
)比库代码更快,因为它更简单(潜在的代价是不太通用).优化后的库代码比您的版本更快,因为它是以编译器识别其优化潜力的方式编写的。 (不可否认,这是一些猜测)
我为 Haskell 的内置 []
列表数据类型编写了量化函数 exists
、forall
和 none
。在多个场合,这些似乎比 Prelude
/Data.List
s any
和 all
更有效。我天真地怀疑这种性能是由于 any
和 all
是使用 Θ(n) 折叠实现的。由于我对Haskell比较陌生,我想我一定是弄错了,或者说这种现象是有充分理由的。
来自Data.Foldable
:
-- | Determines whether any element of the structure satisfies the predicate.
any :: Foldable t => (a -> Bool) -> t a -> Bool
any p = getAny #. foldMap (Any #. p)
-- | Determines whether all elements of the structure satisfy the predicate.
all :: Foldable t => (a -> Bool) -> t a -> Bool
all p = getAll #. foldMap (All #. p)
我的实现:
exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs) | pred x = True
| otherwise = exists pred xs
和
forall pred = not . exists (not . pred)
none pred = not . exists pred = forall (not . pred)
消除布尔反转:
forall, none :: (a -> Bool) -> [a] -> Bool
forall _ [] = True
forall pred (x : xs) | pred x = forall pred xs
| otherwise = False
none _ [] = True
none pred (x : xs) | pred x = False
| otherwise = none pred xs
all
:
time 327.8 μs (322.4 μs .. 333.0 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 328.7 μs (324.1 μs .. 334.2 μs)
std dev 16.95 μs (14.63 μs .. 22.02 μs)
和forall
:
time 113.2 μs (111.2 μs .. 115.0 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 112.0 μs (110.0 μs .. 113.9 μs)
std dev 6.333 μs (5.127 μs .. 7.896 μs)
使用标准 nf
.
正如预期的那样,我没有重新发明折叠,但低估了编译器标志,并且天真地没想到 -O2
与默认优化级别性能相比会产生如此巨大的整体差异,也没有想到两者之间的优化效果差异个人定制和库公式。许多高效的专用标准功能优化显然只有在明确启用时才会启动。
Haskell 标签信息的 "performance" 部分强调了优化级别编译器标志在测试代码效率时的重要性。通常建议相信库函数实现的复杂性,而不是重新连接 RULES
编译指示或重新制定基本形式,尝试利用已经培养的优化潜力。
我发现它对 re-implement any
有很多启发:
import Prelude hiding (any)
import Criterion.Main
import Data.Foldable (foldMap)
import Data.Monoid
你的exists
:
exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs)
= if pred x
then True
else exists pred xs
使用(||)
的版本:
existsOr :: (a -> Bool) -> [a] -> Bool
existsOr _ [] = False
existsOr pred (x : xs) = pred x || existsOr pred xs
使用foldr
:
any :: (a -> Bool) -> [a] -> Bool
any pred = foldr ((||) . pred) False
使用 foldr
和 Any
:
anyF :: (a -> Bool) -> [a] -> Bool
anyF pred = getAny . foldr (mappend . (Any . pred)) mempty
使用 foldMap
和 Any
:
anyFM :: (a -> Bool) -> [a] -> Bool
anyFM pred = getAny . foldMap (Any . pred)
ghc -O0
的基准:
benchmarking exists
time 1.552 μs (1.504 μs .. 1.593 μs)
0.989 R² (0.983 R² .. 0.993 R²)
mean 1.482 μs (1.427 μs .. 1.545 μs)
std dev 196.1 ns (168.8 ns .. 229.2 ns)
variance introduced by outliers: 93% (severely inflated)
benchmarking existsOr
time 2.699 μs (2.616 μs .. 2.768 μs)
0.992 R² (0.988 R² .. 0.995 R²)
mean 2.629 μs (2.554 μs .. 2.704 μs)
std dev 277.8 ns (235.8 ns .. 351.1 ns)
variance introduced by outliers: 89% (severely inflated)
benchmarking any
time 5.551 μs (5.354 μs .. 5.777 μs)
0.990 R² (0.986 R² .. 0.995 R²)
mean 5.553 μs (5.395 μs .. 5.750 μs)
std dev 584.2 ns (447.5 ns .. 835.5 ns)
variance introduced by outliers: 88% (severely inflated)
benchmarking anyF
time 7.330 μs (7.081 μs .. 7.612 μs)
0.988 R² (0.982 R² .. 0.994 R²)
mean 7.502 μs (7.272 μs .. 7.762 μs)
std dev 848.2 ns (712.6 ns .. 1.022 μs)
variance introduced by outliers: 89% (severely inflated)
benchmarking anyFM
time 5.668 μs (5.451 μs .. 6.008 μs)
0.987 R² (0.975 R² .. 0.996 R²)
mean 5.807 μs (5.659 μs .. 5.975 μs)
std dev 542.5 ns (446.4 ns .. 721.8 ns)
variance introduced by outliers: 86% (severely inflated)
你的版本(exists
)确实是最快的,foldr
版本比较慢
在 ghc -O2
中,您的版本 (exists
) 是最慢的,而所有其他函数彼此几乎同样快:
benchmarking exists
time 753.5 ns (725.4 ns .. 779.9 ns)
0.990 R² (0.986 R² .. 0.995 R²)
mean 762.4 ns (737.0 ns .. 787.0 ns)
std dev 82.47 ns (66.79 ns .. 105.1 ns)
variance introduced by outliers: 91% (severely inflated)
benchmarking existsOr
time 491.5 ns (478.2 ns .. 503.2 ns)
0.994 R² (0.992 R² .. 0.996 R²)
mean 494.5 ns (481.1 ns .. 512.9 ns)
std dev 54.97 ns (42.54 ns .. 80.34 ns)
variance introduced by outliers: 92% (severely inflated)
benchmarking any
time 461.2 ns (442.0 ns .. 479.7 ns)
0.989 R² (0.985 R² .. 0.993 R²)
mean 456.0 ns (439.3 ns .. 476.3 ns)
std dev 60.04 ns (47.27 ns .. 89.47 ns)
variance introduced by outliers: 94% (severely inflated)
benchmarking anyF
time 436.9 ns (415.8 ns .. 461.0 ns)
0.978 R² (0.967 R² .. 0.988 R²)
mean 450.8 ns (430.1 ns .. 472.6 ns)
std dev 70.64 ns (57.04 ns .. 85.92 ns)
variance introduced by outliers: 96% (severely inflated)
benchmarking anyFM
time 438.9 ns (426.9 ns .. 449.5 ns)
0.993 R² (0.989 R² .. 0.996 R²)
mean 435.8 ns (421.4 ns .. 447.6 ns)
std dev 45.32 ns (36.73 ns .. 58.74 ns)
variance introduced by outliers: 90% (severely inflated)
如果查看简化的核心代码 (ghc -O2 -ddump-simpl
),就会发现不再有 foldr
(-O0
,一切都还在,[=包括 34=]s)。
因此我敢说您的代码(在 un-optimized 版本中,-O0
)比库代码更快,因为它更简单(潜在的代价是不太通用).优化后的库代码比您的版本更快,因为它是以编译器识别其优化潜力的方式编写的。 (不可否认,这是一些猜测)