Why aren't the earlier terms here being garbage-collected?
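If I define the Kolakoski sequence as
kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))
and find the 500,000,000th term with
kolakoski () !! 500000000
I find that it quickly consumes a large amount of memory when compiled with ghc -O. With optimization turned off, however, it uses almost nothing. Which optimization causes this, and how can I turn it off?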
Let's compare the actual numbers. Compiled without optimization, your version of kolakoski has a maximum residency of about 70 kB:
$ ghc --make Kolakoski-Unit && ./Kolakoski-Unit +RTS -s
2
288,002,359,096 bytes allocated in the heap
1,343,933,816 bytes copied during GC
67,576 bytes maximum residency (422 sample(s))
52,128 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 551615 colls, 0 par 1.89s 2.30s 0.0000s 0.0001s
Gen 1 422 colls, 0 par 0.02s 0.02s 0.0001s 0.0001s
INIT time 0.00s ( 0.00s elapsed)
MUT time 37.34s ( 37.25s elapsed)
GC time 1.91s ( 2.33s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 39.25s ( 39.58s elapsed)
%GC time 4.9% (5.9% elapsed)
Alloc rate 7,712,197,063 bytes per MUT second
Productivity 95.1% of total user, 94.3% of total elapsed
With optimization, maximum residency is about 4 GB (although the actual memory usage shown in the task manager goes up to about 6 GB):
$ ghc --make Kolakoski-Unit -O && ./Kolakoski-Unit +RTS -s
2
64,000,066,608 bytes allocated in the heap
27,971,527,816 bytes copied during GC
3,899,031,480 bytes maximum residency (34 sample(s))
91,679,728 bytes maximum slop
9549 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 122806 colls, 0 par 8.67s 9.48s 0.0001s 0.0148s
Gen 1 34 colls, 0 par 11.55s 69.78s 2.0524s 56.2970s
INIT time 0.00s ( 0.00s elapsed)
MUT time 8.80s ( 8.43s elapsed)
GC time 20.22s ( 79.26s elapsed)
EXIT time 0.03s ( 0.89s elapsed)
Total time 29.05s ( 88.58s elapsed)
%GC time 69.6% (89.5% elapsed)
Alloc rate 7,275,318,406 bytes per MUT second
Productivity 30.4% of total user, 10.0% of total elapsed
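If we use a list-based version instead, compiled without optimization, the memory consumption is very similar to that of the optimized version:
kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))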
$ ghc --make Kolakoski-List && ./Kolakoski-List +RTS -s
2
96,000,143,328 bytes allocated in the heap
26,615,974,536 bytes copied during GC
3,565,429,808 bytes maximum residency (34 sample(s))
83,610,688 bytes maximum slop
8732 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 184252 colls, 0 par 9.98s 10.16s 0.0001s 0.0006s
Gen 1 34 colls, 0 par 10.45s 21.61s 0.6357s 12.0792s
INIT time 0.00s ( 0.00s elapsed)
MUT time 12.02s ( 11.88s elapsed)
GC time 20.44s ( 31.77s elapsed)
EXIT time 0.05s ( 0.69s elapsed)
Total time 32.50s ( 44.34s elapsed)
%GC time 62.9% (71.7% elapsed)
Alloc rate 7,989,608,807 bytes per MUT second
Productivity 37.1% of total user, 27.2% of total elapsed
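Both definitions denote the same sequence and differ only in what is shared between evaluations. A minimal sketch to confirm this on a short prefix follows; agreeOnPrefix is a hypothetical helper introduced here for illustration and is not part of the original code.

```haskell
module Main (main) where

-- Unit-argument version from the question: without optimization,
-- nothing is shared between calls to helper ().
kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))

-- List version: helper is a single self-referential list.
kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))

-- Hypothetical helper (an assumption, not in the original):
-- do both definitions agree on their first n terms?
agreeOnPrefix :: Int -> Bool
agreeOnPrefix n = take n (kolakoski ()) == take n kolakoskiList

main :: IO ()
main = do
  print (take 10 kolakoskiList) -- [1,2,2,1,1,2,1,2,2,1]
  print (agreeOnPrefix 1000)    -- True
```

The check only compares values; it says nothing about memory behaviour, which is what the +RTS -s figures measure.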
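Now we can check the GHC flag reference to see which flags -O sets automatically. Since the list version seems to do the same thing as the optimized version, one might think that GHC transforms kolakoski into kolakoskiList:
kolakoskiOptimized _ = kolakoskiList
Let's check this in Core with -ddump-simpl -dsuppress-all: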
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 45, types: 30, coercions: 0}
kolakoski
kolakoski =
\ ds_d10r ->
case ds_d10r of _ { () ->
: (I# 1)
(: (I# 2)
(letrec {
helper_aNo
helper_aNo =
\ ds1_d10s ->
case ds1_d10s of _ { () ->
: (I# 2)
(concat
(zipWith
(replicate) (helper_aNo ()) (cycle (: (I# 1) (: (I# 2) ([]))))))
}; } in
helper_aNo ()))
}
main
main = print $fShowInt (!! (kolakoski ()) (I# 500000000))
main
main = runMainIO main
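Even if you're not familiar with GHC's Core, you can see that kolakoski is essentially the same as your original version. Now compare this with the output of -O -ddump-simpl -dsuppress-all: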
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 125, types: 102, coercions: 9}
kolakoski6
kolakoski6 = I# 1
kolakoski5
kolakoski5 = I# 2
Rec {
go_r1NG
go_r1NG =
\ ds_a14B _ys_a14C ->
case ds_a14B of _ {
[] -> [];
: ipv_a14H ipv1_a14I ->
case _ys_a14C of _ {
[] -> [];
: ipv2_a14O ipv3_a14P ->
case ipv_a14H of _ { I# n#_a13J ->
case tagToEnum# (<=# n#_a13J 0) of _ {
False ->
let {
lvl2_s1N3
lvl2_s1N3 = : ipv2_a14O ([]) } in
letrec {
xs_a1LH
xs_a1LH =
\ m_a1LO ->
case tagToEnum# (<=# m_a1LO 1) of _ {
False -> : ipv2_a14O (xs_a1LH (-# m_a1LO 1));
True -> lvl2_s1N3
}; } in
++ (xs_a1LH n#_a13J) (go_r1NG ipv1_a14I ipv3_a14P);
True -> ++ ([]) (go_r1NG ipv1_a14I ipv3_a14P)
}
}
}
}
end Rec }
lvl_r1NH
lvl_r1NH = : kolakoski5 ([])
lvl1_r1NI
lvl1_r1NI = : kolakoski6 lvl_r1NH
Rec {
xs'_r1NJ
xs'_r1NJ = ++ lvl1_r1NI xs'_r1NJ
end Rec }
Rec {
kolakoski3
kolakoski3 = : kolakoski5 kolakoski4
kolakoski4
kolakoski4 = go_r1NG kolakoski3 xs'_r1NJ
end Rec }
kolakoski2
kolakoski2 = : kolakoski5 kolakoski3
kolakoski1
kolakoski1 = : kolakoski6 kolakoski2
kolakoski
kolakoski = \ ds_d13p -> case ds_d13p of _ { () -> kolakoski1 }
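This version contains several top-level CAFs, which stay alive throughout the lifetime of your program. So you really do generate a list up to the 500,000,000th value and keep it.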
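Read back into source form, the optimized Core corresponds roughly to the following. This is a hand-written approximation rather than anything GHC literally emits, and the names helperFloated and kolakoskiFloated are hypothetical.

```haskell
module Main (main) where

-- The recursive list no longer depends on the () argument, so it sits at
-- top level as a CAF (compare kolakoski3/kolakoski4 in the Core above).
helperFloated :: [Int]
helperFloated = 2 : concat (zipWith replicate helperFloated (cycle [1, 2]))

-- The function only pattern-matches on () and returns the shared list
-- (compare kolakoski and kolakoski1 in the Core above).
kolakoskiFloated :: () -> [Int]
kolakoskiFloated () = 1 : 2 : helperFloated

main :: IO ()
main = print (take 10 (kolakoskiFloated ())) -- [1,2,2,1,1,2,1,2,2,1]
```

Written this way, every call to kolakoskiFloated () returns the same list, which is the sharing that the list-based version has by construction.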
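So what happened there? Something inside the function was floated outwards. Checking the flag reference again, there is a promising flag that is implied by -O:
-ffull-laziness
Turn on full laziness (floating bindings outwards). Implied by -O.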
That's the flag causing your problem. Indeed, you can get your original memory consumption back with ghc --make -O -fno-full-laziness Kolakoski-Unit.hs:
$ ghc --make Kolakoski-Unit.hs -O -fno-full-laziness && ./Kolakoski-Unit +RTS -s
2
192,001,417,688 bytes allocated in the heap
637,962,464 bytes copied during GC
66,104 bytes maximum residency (151 sample(s))
43,448 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 368364 colls, 0 par 1.34s 1.32s 0.0000s 0.0002s
Gen 1 151 colls, 0 par 0.00s 0.01s 0.0001s 0.0003s
INIT time 0.00s ( 0.00s elapsed)
MUT time 27.89s ( 28.13s elapsed)
GC time 1.34s ( 1.33s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 29.25s ( 29.46s elapsed)
%GC time 4.6% (4.5% elapsed)
Alloc rate 6,884,084,443 bytes per MUT second
Productivity 95.4% of total user, 94.7% of total elapsed
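If you would rather not pass the flag on the command line, the same setting can be scoped to one module with an OPTIONS_GHC pragma. The sketch below assumes the whole program lives in a single Main module. Reading the index from the command line, and the small default of 1000, are assumptions added here so the example finishes quickly; the measurements above used 500000000.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
module Main (main) where

import System.Environment (getArgs)
import Text.Read (readMaybe)

-- Same definition as in the question; with full laziness disabled for this
-- module, helper () is not floated out to a top-level CAF.
kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))

main :: IO ()
main = do
  args <- getArgs
  -- Use the first argument as the index if it parses as an Int;
  -- otherwise fall back to a small default (an assumption of this sketch).
  let index = case args of
        (a : _) | Just n <- readMaybe a -> n
        _ -> 1000
  print (kolakoski () !! index)
```

Compiled with ghc --make -O and run as ./Kolakoski-Unit 500000000 +RTS -s, this should behave like the -fno-full-laziness build above, although the exact figures will vary by GHC version and machine.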
Related questions
- How to make a CAF not a CAF in Haskell?