内存使用异常(内存泄漏?)
Memory usage oddity (memory leak?)
我打算将以下类型和两个相关函数作为大型列表折叠的一部分进行测量:
类型和访问函数:
data Aggregate a = Aggregate (Maybe a) (a -> Aggregate a)
get :: Aggregate a -> Maybe a
get (Aggregate get' _) = get'
put :: Aggregate a -> a -> Aggregate a
put (Aggregate _ put') = put'
第一个函数:
updateFirst :: Maybe a -> a -> Aggregate a
updateFirst cur val = Aggregate new (updateFirst new)
where new = mplus cur (Just val)
first :: Aggregate a
first = Aggregate Nothing (updateFirst Nothing)
第二个函数:
updateMinimum :: Ord a => Maybe a -> a -> Aggregate a
updateMinimum cur val = Aggregate new (updateMinimum new)
where new = min <$> (mplus cur (Just val)) <*> Just val
minimum :: Ord a => Aggregate a
minimum = Aggregate Nothing (updateMinimum Nothing)
函数的编写方式是内存应该是常量。因此,我选择在 GHC 中使用 Strict
语言扩展,这会强制对 thunk 进行评估。 Weigh
库用于执行分配测量:
test :: A.Aggregate Double -> Int -> Maybe Double
test agg len = A.get $ F.foldl' A.put agg (take len $ iterate (+0.3) 2.0)
testGroup :: String -> A.Aggregate Double -> Weigh ()
testGroup name agg = sequence_ $ map (\cnt -> func (str cnt) (test agg) cnt) counts
where
counts = map (10^) [0 .. 6]
str cnt = name ++ (show cnt)
main :: IO ()
main =
mainWith
(do setColumns [Case, Allocated, Max, GCs]
testGroup "fst" A.first
testGroup "min" A.minimum
)
Weigh
输出如下:
Case Allocated Max GCs
fst1 304 96 0
fst10 2,248 96 0
fst100 21,688 96 0
fst1000 216,088 96 0
fst10000 2,160,088 96 2
fst100000 21,600,088 96 20
fst1000000 216,000,088 96 207
min1 552 96 0
min10 4,728 96 0
min100 46,488 96 0
min1000 464,088 96 0
min10000 4,967,768 96 4
min100000 49,709,656 6,537,712 44
min1000000 497,226,840 103,345,512 445
为什么 GHC 突然在大小为 10^5 和 10^6 的输入中分配更多内存?我的 GHC 版本是 8.4.4
。
Haskell 中的严格注释是 "relational" so to speak. They say that something must be evaluated to WHNF (weak head normal form) 每当对 WHNF 评估其他内容时。
对于函数参数,这意味着在函数应用程序本身被评估为 WHNF 之前,函数参数将被评估为 WHNF。
对于严格字段,这意味着每当包含值被评估为 WHNF 时,该字段将被评估为 WHNF。这对于在用作累加器的数据类型中维护 "chain of strictness" 很有用(例如,用作 foldl'
的累加器的数据类型)。否则,即使包含值保留在 WHNF 中,thunk 也可以隐藏在惰性字段后面,并导致 space 泄漏。特别是,元组没有严格的组件,并且是累加器中 space 泄漏的常见来源。
Maybe
对 Just
构造函数中包含的值也不严格。事实上,这就是问题的根源。 Just
中的值在 foldl'
的过程中从未被强制使用,thunk 在那里积累。
为什么 Strict
prevent the problem? Because it only affects function and datatype definitions in the current module, and Maybe
was defined elsewhere. The solution is to manually force the value inside the Just
at each iteration, or perhaps define your own "strict Maybe
".
我打算将以下类型和两个相关函数作为大型列表折叠的一部分进行测量:
类型和访问函数:
data Aggregate a = Aggregate (Maybe a) (a -> Aggregate a)
get :: Aggregate a -> Maybe a
get (Aggregate get' _) = get'
put :: Aggregate a -> a -> Aggregate a
put (Aggregate _ put') = put'
第一个函数:
updateFirst :: Maybe a -> a -> Aggregate a
updateFirst cur val = Aggregate new (updateFirst new)
where new = mplus cur (Just val)
first :: Aggregate a
first = Aggregate Nothing (updateFirst Nothing)
第二个函数:
updateMinimum :: Ord a => Maybe a -> a -> Aggregate a
updateMinimum cur val = Aggregate new (updateMinimum new)
where new = min <$> (mplus cur (Just val)) <*> Just val
minimum :: Ord a => Aggregate a
minimum = Aggregate Nothing (updateMinimum Nothing)
函数的编写方式是内存应该是常量。因此,我选择在 GHC 中使用 Strict
语言扩展,这会强制对 thunk 进行评估。 Weigh
库用于执行分配测量:
test :: A.Aggregate Double -> Int -> Maybe Double
test agg len = A.get $ F.foldl' A.put agg (take len $ iterate (+0.3) 2.0)
testGroup :: String -> A.Aggregate Double -> Weigh ()
testGroup name agg = sequence_ $ map (\cnt -> func (str cnt) (test agg) cnt) counts
where
counts = map (10^) [0 .. 6]
str cnt = name ++ (show cnt)
main :: IO ()
main =
mainWith
(do setColumns [Case, Allocated, Max, GCs]
testGroup "fst" A.first
testGroup "min" A.minimum
)
Weigh
输出如下:
Case Allocated Max GCs
fst1 304 96 0
fst10 2,248 96 0
fst100 21,688 96 0
fst1000 216,088 96 0
fst10000 2,160,088 96 2
fst100000 21,600,088 96 20
fst1000000 216,000,088 96 207
min1 552 96 0
min10 4,728 96 0
min100 46,488 96 0
min1000 464,088 96 0
min10000 4,967,768 96 4
min100000 49,709,656 6,537,712 44
min1000000 497,226,840 103,345,512 445
为什么 GHC 突然在大小为 10^5 和 10^6 的输入中分配更多内存?我的 GHC 版本是 8.4.4
。
Haskell 中的严格注释是 "relational" so to speak. They say that something must be evaluated to WHNF (weak head normal form) 每当对 WHNF 评估其他内容时。
对于函数参数,这意味着在函数应用程序本身被评估为 WHNF 之前,函数参数将被评估为 WHNF。
对于严格字段,这意味着每当包含值被评估为 WHNF 时,该字段将被评估为 WHNF。这对于在用作累加器的数据类型中维护 "chain of strictness" 很有用(例如,用作 foldl'
的累加器的数据类型)。否则,即使包含值保留在 WHNF 中,thunk 也可以隐藏在惰性字段后面,并导致 space 泄漏。特别是,元组没有严格的组件,并且是累加器中 space 泄漏的常见来源。
Maybe
对 Just
构造函数中包含的值也不严格。事实上,这就是问题的根源。 Just
中的值在 foldl'
的过程中从未被强制使用,thunk 在那里积累。
为什么 Strict
prevent the problem? Because it only affects function and datatype definitions in the current module, and Maybe
was defined elsewhere. The solution is to manually force the value inside the Just
at each iteration, or perhaps define your own "strict Maybe
".