为什么这些定点 cata / ana 态射定义优于递归定义?
Why do these fixpoint cata / ana morphism definitions outperform the recursive ones?
考虑来自 的这些定义:
type Algebra f a = f a -> a
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . unFix
fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg = fix $ \f -> alg . fmap f . unFix
type CoAlgebra f a = a -> f a
ana :: Functor f => CoAlgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg
fixana :: Functor f => CoAlgebra f a -> a -> Fix f
fixana coalg = fix $ \f -> Fix . fmap f . coalg
我 运行 进行了一些基准测试,结果让我感到惊讶。 criterion
报告类似十倍的加速,特别是在启用 O2
时。我想知道是什么导致了如此巨大的改进,并开始严重怀疑我的基准测试能力。
这正是我使用的 criterion
代码:
smallWord, largeWord :: Word
smallWord = 2^10
largeWord = 2^20
shortEnv, longEnv :: Fix Maybe
shortEnv = ana coAlg smallWord
longEnv = ana coAlg largeWord
benchCata = nf (cata alg)
benchFixcata = nf (fixcata alg)
benchAna = nf (ana coAlg)
benchFixana = nf (fixana coAlg)
main = defaultMain
[ bgroup "cata"
[ bgroup "short input"
[ env (return shortEnv) $ \x -> bench "cata" (benchCata x)
, env (return shortEnv) $ \x -> bench "fixcata" (benchFixcata x)
]
, bgroup "long input"
[ env (return longEnv) $ \x -> bench "cata" (benchCata x)
, env (return longEnv) $ \x -> bench "fixcata" (benchFixcata x)
]
]
, bgroup "ana"
[ bgroup "small word"
[ bench "ana" $ benchAna smallWord
, bench "fixana" $ benchFixana smallWord
]
, bgroup "large word"
[ bench "ana" $ benchAna largeWord
, bench "fixana" $ benchFixana largeWord
]
]
]
还有一些辅助代码:
alg :: Algebra Maybe Word
alg Nothing = 0
alg (Just x) = succ x
coAlg :: CoAlgebra Maybe Word
coAlg 0 = Nothing
coAlg x = Just (pred x)
用O0
编译,位数比较均匀。对于 O2
,fix~
函数似乎优于普通函数:
benchmarking cata/short input/cata
time 31.67 μs (31.10 μs .. 32.26 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 31.20 μs (31.05 μs .. 31.46 μs)
std dev 633.9 ns (385.3 ns .. 1.029 μs)
variance introduced by outliers: 18% (moderately inflated)
benchmarking cata/short input/fixcata
time 2.422 μs (2.407 μs .. 2.440 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.399 μs (2.388 μs .. 2.410 μs)
std dev 37.12 ns (31.44 ns .. 47.06 ns)
variance introduced by outliers: 14% (moderately inflated)
如果有人能确认或发现缺陷,我将不胜感激。
*这次我用 ghc 8.2.2
编译了东西。)
后记
This post from back in 2012 非常详细地阐述了 fix
的性能。 (感谢 @chi
的 link。)
这是由于 how the fixed point is computed by fix
。
上面的@duplode 指出了这一点(以及我自己在 中指出的)。无论如何,我们可以将问题总结如下。
我们有
fix f = f (fix f)
有效,但在每次递归时都会进行 fix f
新调用。相反,
fix f = go
where go = f go
计算避免该调用的相同不动点。在库中 fix
以这种更有效的方式实现。
回到问题,考虑以下三个实现cata
:
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg' = alg' . fmap (cata alg') . unFix
cata2 :: Functor f => Algebra f b -> Fix f -> b
cata2 alg' = go
where
go = alg' . fmap go . unFix
fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg' = fix $ \f -> alg' . fmap f . unFix
第一个在每次递归时调用 cata alg'
。第二个没有。第三个也没有,因为库 fix
是高效的。
事实上,我们可以使用 Criterion 来确认这一点,甚至使用 OP 使用的相同测试:
benchmarking cata/short input/cata
time 16.58 us (16.54 us .. 16.62 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 16.62 us (16.58 us .. 16.65 us)
std dev 111.6 ns (89.76 ns .. 144.0 ns)
benchmarking cata/short input/cata2
time 1.746 us (1.742 us .. 1.749 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.741 us (1.736 us .. 1.744 us)
std dev 12.69 ns (10.50 ns .. 17.31 ns)
benchmarking cata/short input/fixcata
time 2.010 us (2.003 us .. 2.016 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.006 us (2.001 us .. 2.011 us)
std dev 16.40 ns (14.05 ns .. 19.27 ns)
长输入也显示出改进。
benchmarking cata/long input/cata
time 119.3 ms (113.4 ms .. 125.8 ms)
0.996 R² (0.992 R² .. 1.000 R²)
mean 119.8 ms (117.7 ms .. 121.7 ms)
std dev 2.924 ms (2.073 ms .. 4.064 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking cata/long input/cata2
time 17.89 ms (17.43 ms .. 18.36 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 18.02 ms (17.49 ms .. 18.62 ms)
std dev 1.362 ms (853.9 us .. 2.022 ms)
variance introduced by outliers: 33% (moderately inflated)
benchmarking cata/long input/fixcata
time 18.03 ms (17.56 ms .. 18.50 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 18.17 ms (17.57 ms .. 18.72 ms)
std dev 1.365 ms (852.1 us .. 2.045 ms)
variance introduced by outliers: 33% (moderately inflated)
我也用 ana
进行了实验,观察到类似改进的 ana2
的性能与 fixana
一致。那里也没有惊喜。
考虑来自
type Algebra f a = f a -> a
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . unFix
fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg = fix $ \f -> alg . fmap f . unFix
type CoAlgebra f a = a -> f a
ana :: Functor f => CoAlgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg
fixana :: Functor f => CoAlgebra f a -> a -> Fix f
fixana coalg = fix $ \f -> Fix . fmap f . coalg
我 运行 进行了一些基准测试,结果让我感到惊讶。 criterion
报告类似十倍的加速,特别是在启用 O2
时。我想知道是什么导致了如此巨大的改进,并开始严重怀疑我的基准测试能力。
这正是我使用的 criterion
代码:
smallWord, largeWord :: Word
smallWord = 2^10
largeWord = 2^20
shortEnv, longEnv :: Fix Maybe
shortEnv = ana coAlg smallWord
longEnv = ana coAlg largeWord
benchCata = nf (cata alg)
benchFixcata = nf (fixcata alg)
benchAna = nf (ana coAlg)
benchFixana = nf (fixana coAlg)
main = defaultMain
[ bgroup "cata"
[ bgroup "short input"
[ env (return shortEnv) $ \x -> bench "cata" (benchCata x)
, env (return shortEnv) $ \x -> bench "fixcata" (benchFixcata x)
]
, bgroup "long input"
[ env (return longEnv) $ \x -> bench "cata" (benchCata x)
, env (return longEnv) $ \x -> bench "fixcata" (benchFixcata x)
]
]
, bgroup "ana"
[ bgroup "small word"
[ bench "ana" $ benchAna smallWord
, bench "fixana" $ benchFixana smallWord
]
, bgroup "large word"
[ bench "ana" $ benchAna largeWord
, bench "fixana" $ benchFixana largeWord
]
]
]
还有一些辅助代码:
alg :: Algebra Maybe Word
alg Nothing = 0
alg (Just x) = succ x
coAlg :: CoAlgebra Maybe Word
coAlg 0 = Nothing
coAlg x = Just (pred x)
用O0
编译,位数比较均匀。对于 O2
,fix~
函数似乎优于普通函数:
benchmarking cata/short input/cata
time 31.67 μs (31.10 μs .. 32.26 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 31.20 μs (31.05 μs .. 31.46 μs)
std dev 633.9 ns (385.3 ns .. 1.029 μs)
variance introduced by outliers: 18% (moderately inflated)
benchmarking cata/short input/fixcata
time 2.422 μs (2.407 μs .. 2.440 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.399 μs (2.388 μs .. 2.410 μs)
std dev 37.12 ns (31.44 ns .. 47.06 ns)
variance introduced by outliers: 14% (moderately inflated)
如果有人能确认或发现缺陷,我将不胜感激。
*这次我用 ghc 8.2.2
编译了东西。)
后记
This post from back in 2012 非常详细地阐述了 fix
的性能。 (感谢 @chi
的 link。)
这是由于 how the fixed point is computed by fix
。
上面的@duplode 指出了这一点(以及我自己在
我们有
fix f = f (fix f)
有效,但在每次递归时都会进行 fix f
新调用。相反,
fix f = go
where go = f go
计算避免该调用的相同不动点。在库中 fix
以这种更有效的方式实现。
回到问题,考虑以下三个实现cata
:
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg' = alg' . fmap (cata alg') . unFix
cata2 :: Functor f => Algebra f b -> Fix f -> b
cata2 alg' = go
where
go = alg' . fmap go . unFix
fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg' = fix $ \f -> alg' . fmap f . unFix
第一个在每次递归时调用 cata alg'
。第二个没有。第三个也没有,因为库 fix
是高效的。
事实上,我们可以使用 Criterion 来确认这一点,甚至使用 OP 使用的相同测试:
benchmarking cata/short input/cata
time 16.58 us (16.54 us .. 16.62 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 16.62 us (16.58 us .. 16.65 us)
std dev 111.6 ns (89.76 ns .. 144.0 ns)
benchmarking cata/short input/cata2
time 1.746 us (1.742 us .. 1.749 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.741 us (1.736 us .. 1.744 us)
std dev 12.69 ns (10.50 ns .. 17.31 ns)
benchmarking cata/short input/fixcata
time 2.010 us (2.003 us .. 2.016 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.006 us (2.001 us .. 2.011 us)
std dev 16.40 ns (14.05 ns .. 19.27 ns)
长输入也显示出改进。
benchmarking cata/long input/cata
time 119.3 ms (113.4 ms .. 125.8 ms)
0.996 R² (0.992 R² .. 1.000 R²)
mean 119.8 ms (117.7 ms .. 121.7 ms)
std dev 2.924 ms (2.073 ms .. 4.064 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking cata/long input/cata2
time 17.89 ms (17.43 ms .. 18.36 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 18.02 ms (17.49 ms .. 18.62 ms)
std dev 1.362 ms (853.9 us .. 2.022 ms)
variance introduced by outliers: 33% (moderately inflated)
benchmarking cata/long input/fixcata
time 18.03 ms (17.56 ms .. 18.50 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 18.17 ms (17.57 ms .. 18.72 ms)
std dev 1.365 ms (852.1 us .. 2.045 ms)
variance introduced by outliers: 33% (moderately inflated)
我也用 ana
进行了实验,观察到类似改进的 ana2
的性能与 fixana
一致。那里也没有惊喜。