内存使用异常(内存泄漏?)

Memory usage oddity (memory leak?)

我打算将以下类型和两个相关函数作为大型列表折叠的一部分进行测量:

类型和访问函数:

data Aggregate a = Aggregate (Maybe a) (a -> Aggregate a)

get :: Aggregate a -> Maybe a
get (Aggregate get' _) = get'

put :: Aggregate a -> a -> Aggregate a
put (Aggregate _ put') = put'

第一个函数:

updateFirst :: Maybe a -> a -> Aggregate a
updateFirst cur val = Aggregate new (updateFirst new)
  where new = mplus cur (Just val)

first :: Aggregate a
first = Aggregate Nothing (updateFirst Nothing)

第二个函数:

updateMinimum :: Ord a => Maybe a -> a -> Aggregate a
updateMinimum cur val = Aggregate new (updateMinimum new)
  where new = min <$> (mplus cur (Just val)) <*> Just val

minimum :: Ord a => Aggregate a
minimum = Aggregate Nothing (updateMinimum Nothing)

函数的编写方式是内存应该是常量。因此,我选择在 GHC 中使用 Strict 语言扩展,这会强制对 thunk 进行评估。 Weigh 库用于执行分配测量:

test :: A.Aggregate Double -> Int -> Maybe Double
test agg len = A.get $ F.foldl' A.put agg (take len $ iterate (+0.3) 2.0)

testGroup :: String -> A.Aggregate Double -> Weigh ()
testGroup name agg = sequence_ $ map (\cnt -> func (str cnt) (test agg) cnt) counts
  where
    counts  = map (10^) [0 .. 6]
    str cnt = name ++ (show cnt)

main :: IO ()
main =
  mainWith
    (do setColumns [Case, Allocated, Max, GCs]
        testGroup "fst" A.first
        testGroup "min" A.minimum
    )

Weigh输出如下:

Case          Allocated          Max  GCs
fst1                304           96    0
fst10             2,248           96    0
fst100           21,688           96    0
fst1000         216,088           96    0
fst10000      2,160,088           96    2
fst100000    21,600,088           96   20
fst1000000  216,000,088           96  207
min1                552           96    0
min10             4,728           96    0
min100           46,488           96    0
min1000         464,088           96    0
min10000      4,967,768           96    4
min100000    49,709,656    6,537,712   44
min1000000  497,226,840  103,345,512  445

为什么 GHC 突然在大小为 10^5 和 10^6 的输入中分配更多内存?我的 GHC 版本是 8.4.4

Haskell 中的严格注释是 "relational" so to speak. They say that something must be evaluated to WHNF (weak head normal form) 每当对 WHNF 评估其他内容时。

对于函数参数,这意味着在函数应用程序本身被评估为 WHNF 之前,函数参数将被评估为 WHNF。

对于严格字段,这意味着每当包含值被评估为 WHNF 时,该字段将被评估为 WHNF。这对于在用作累加器的数据类型中维护 "chain of strictness" 很有用(例如,用作 foldl' 的累加器的数据类型)。否则,即使包含值保留在 WHNF 中,thunk 也可以隐藏在惰性字段后面,并导致 space 泄漏。特别是,元组没有严格的组件,并且是累加器中 space 泄漏的常见来源。

MaybeJust 构造函数中包含的值也不严格。事实上,这就是问题的根源。 Just 中的值在 foldl' 的过程中从未被强制使用,thunk 在那里积累。

为什么 Strict prevent the problem? Because it only affects function and datatype definitions in the current module, and Maybe was defined elsewhere. The solution is to manually force the value inside the Just at each iteration, or perhaps define your own "strict Maybe".