Free ~90% Gen2 .NET heap

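We are trying to debug a memory leak in a hosted Windows service. I took a process dump and started analyzing it in WinDbg. !heapstat shows that more than ~90% of the memory in the SOH is free, but it is not getting garbage collected. The system is now throwing OutOfMemory exceptions.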

0:000> !heapstat
Heap             Gen0         Gen1         Gen2          LOH
Heap0      1280897288     14491488   3047752776      5809312
Heap1       363914880     15115160   4352729656      4666416
Heap2      1703707464     30418232   2747904232     12655040
Heap3       494016304     20954808   4365778560       800136
Total      3842535936     80979688  14514165224     23930904

Free space:                                                 Percentage
Heap0      1249220440      4930840   2915300544        48424SOH: 96% LOH:  0%
Heap1       331677752      4231032   4180971712          184SOH: 95% LOH:  0%
Heap2      1681027112      6764328   2612922440      2073728SOH: 95% LOH: 16%
Heap3       462287616      5282384   4230317520           88SOH: 96% LOH:  0%
Total      3724212920     21208584  13939512216      2122424

0:000> !EEHeap -gc
Number of GC Heaps: 4
------------------------------
Heap 0 (000001621d6173a0)
generation 0 starts at 0x0000016882f12f60
generation 1 starts at 0x0000016882141000
generation 2 starts at 0x000001621deb1000
ephemeral segment allocation context: none
         segment             begin         allocated              size
000001621deb0000  000001621deb1000  00000162d3941448  0xb5a90448(3047752776)
0000016882140000  0000016882141000  00000168cf4a2068  0x4d361068(1295388776)
Large object heap starts at 0x000001661deb1000
         segment             begin         allocated              size
000001661deb0000  000001661deb1000  000001661e43b4a0  0x58a4a0(5809312)
Heap Size:               Size: 0x10337b950 (4348950864) bytes.
------------------------------
Heap 1 (000001621d640760)
generation 0 starts at 0x0000016672c34480
generation 1 starts at 0x0000016671dca0e8
generation 2 starts at 0x000001631deb1000
ephemeral segment allocation context: none
         segment             begin         allocated              size
000001631deb0000  000001631deb1000  000001641deae150  0xffffd150(4294955344)
000001666e6b0000  000001666e6b1000  0000016688742b00  0x1a091b00(436804352)
Large object heap starts at 0x000001662deb1000
         segment             begin         allocated              size
000001662deb0000  000001662deb1000  000001662e324430  0x473430(4666416)
Heap Size:               Size: 0x11a502080 (4736426112) bytes.
------------------------------
Heap 2 (000001621d669bb0)
generation 0 starts at 0x0000016783e43538
generation 1 starts at 0x0000016782141000
generation 2 starts at 0x000001641deb1000
ephemeral segment allocation context: none
         segment             begin         allocated              size
000001641deb0000  000001641deb1000  00000164c1b4c0e8  0xa3c9b0e8(2747904232)
0000016782140000  0000016782141000  00000167e970b880  0x675ca880(1734125696)
Large object heap starts at 0x000001663deb1000
         segment             begin         allocated              size
000001663deb0000  000001663deb1000  000001663eac29c0  0xc119c0(12655040)
Heap Size:               Size: 0x10be77328 (4494684968) bytes.
------------------------------
Heap 3 (000001621d692760)
generation 0 starts at 0x000001698794b530
generation 1 starts at 0x000001698654f678
generation 2 starts at 0x000001651deb1000
ephemeral segment allocation context: none
         segment             begin         allocated              size
000001651deb0000  000001651deb1000  000001661de2a808  0xfff79808(4294416392)
0000016982140000  0000016982141000  00000169a506cc60  0x22f2bc60(586333280)
Large object heap starts at 0x000001664deb1000
         segment             begin         allocated              size
000001664deb0000  000001664deb1000  000001664df74588  0xc3588(800136)
Heap Size:               Size: 0x122f689f0 (4881549808) bytes.
------------------------------
GC Heap Size:            Size: 0x44c65d6e8 (18461611752) bytes.

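I tried looking at the GC handles with !gchandles; the results are below. The handle counts are large, but we are now stuck on how to debug further to find the root cause.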

!gchandles
Handles:
    Strong Handles:       130
    Pinned Handles:       16
    Async Pinned Handles: 297
    Ref Count Handles:    88
    Weak Long Handles:    1261
    Weak Short Handles:   829
    SizedRef Handles:     8

Most of the pinned handles are System.Object[] or System.String[], but that does not help in finding the root cause.

000001621d541710 Pinned      000001661dfd7498   130584                  System.Object[]
000001621d541798 Pinned      000001661df214c0    65304                  System.Object[]
000001621d5417a0 Pinned      000001621deb1420       26                  System.String
000001621d5417a8 Pinned      000001621deb1420       26                  System.String
000001621d5417d0 Pinned      000001661deb9a30    32664                  System.Object[]
000001621d5417d8 Pinned      000001661deb5a38    16344                  System.Object[]
000001621d5417e0 Pinned      000001661deb3a20     8184                  System.Object[]
000001621d5417e8 Pinned      000001661deb35e8     1048                  System.Object[]
000001621d5417f0 Pinned      000001621deb1408       24                  System.Object
000001621d5417f8 Pinned      000001661deb1038     9616                  System.Object[]

Is there any way to track down what is causing the SOH to be fragmented and preventing this free space from being reclaimed?

There are no objects ready for finalization; I checked with !FinalizeQueue.

Heapstat showing that >~90% memory in SOH is free, but not getting garbage collected.

If it had not been garbage collected, it would not be free. Do you mean "but not getting compacted"?

Run !dumpheap -type Free to see what the garbage collector has already collected.
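
As a cross-check, the SOH percentages printed by !heapstat can be reproduced from the raw byte counts in the question. The sketch below is only illustrative arithmetic: the dictionary and function names are made up for this example, and the use of truncating integer division is an assumption chosen because it matches the printed figures, not a statement about how SOS computes them.

```python
# Per-heap SOH sizes and free space in bytes, copied from the !heapstat
# output in the question. Each tuple is (Gen0, Gen1, Gen2).
sizes = {
    "Heap0": (1280897288, 14491488, 3047752776),
    "Heap1": (363914880, 15115160, 4352729656),
    "Heap2": (1703707464, 30418232, 2747904232),
    "Heap3": (494016304, 20954808, 4365778560),
}
free = {
    "Heap0": (1249220440, 4930840, 2915300544),
    "Heap1": (331677752, 4231032, 4180971712),
    "Heap2": (1681027112, 6764328, 2612922440),
    "Heap3": (462287616, 5282384, 4230317520),
}


def soh_free_percent(heap):
    """Free SOH bytes as a whole-number percentage of total SOH bytes.

    Assumption: the percentage is truncated, not rounded. This matches
    the figures shown in the question for all four heaps.
    """
    return 100 * sum(free[heap]) // sum(sizes[heap])


percents = {heap: soh_free_percent(heap) for heap in sizes}
for heap, pct in percents.items():
    print(f"{heap}: SOH {pct}% free")
```

The result agrees with the "SOH: 96% / 95% / 95% / 96%" column, which confirms that the space has already been collected and is simply sitting in the heap as free blocks.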

Most of the pinned handles are System.Object[] or System.String[], but does not help in reaching to root cause.

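I would say the root cause is the pinned objects. What more of a root cause do you want? Now you know what to look for in a code review. If it helps, you can also dump the Object[] instances to see what they contain.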

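If you want a stack trace for each object, you need a dedicated tool such as JetBrains dotMemory. Doing that with 18 GB will certainly be a bit slow, so you should try to reproduce the problem on a smaller scale.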

Is there any way to track what is causing the SOH to be fragmented and preventing this free space to reclaim?

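The SOH is fragmented because of the pinned objects. Pinned objects prevent the SOH from being compacted, which leaves free space in places where it normally should not exist.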
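
To illustrate the effect, here is a deliberately simplified toy model of a sliding compactor, written in Python. It is not how the .NET GC is implemented (there are no segments, generations, or demotion here), and the names `Obj`, `compact`, and `make_heap` are hypothetical. It only shows that one pinned survivor at a high address keeps the heap extent large even though almost everything below it is free.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    addr: int             # start address in the toy heap
    size: int             # size in bytes
    live: bool = True     # does the object survive the collection?
    pinned: bool = False  # pinned objects may not be moved


def compact(objs):
    """Slide live unpinned objects toward address 0; pinned objects stay put.

    Returns (extent, free_bytes): the end address of the last survivor and
    the number of free bytes left below that address.
    """
    cursor = 0  # next address at which a moved object would be placed
    used = 0    # total bytes occupied by survivors
    for o in sorted(objs, key=lambda o: o.addr):
        if not o.live:
            continue  # dead objects simply become free space
        if o.pinned:
            cursor = o.addr + o.size  # cannot move: skip the gap before it
        else:
            cursor += o.size          # moved down to the cursor
        used += o.size
    return cursor, cursor - used


def make_heap(pin_last):
    """100 objects of 10 bytes each; only the first and the last survive."""
    objs = [Obj(addr=i * 10, size=10, live=False) for i in range(100)]
    objs[0].live = True
    objs[99].live = True
    objs[99].pinned = pin_last
    return objs


print(compact(make_heap(pin_last=False)))  # (20, 0)
print(compact(make_heap(pin_last=True)))   # (1000, 980)
```

With the last object pinned, 980 of 1000 bytes stay free inside the heap, the same shape as the roughly 95% free SOH in the dump.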

In summary, look for GCHandle.Alloc() calls and make sure each one has a matching Free() call.
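
As a starting point for that code review, a rough text scan can list where handles are allocated and where something is freed. The sketch below is a heuristic only: the `audit` function and the embedded C# `SAMPLE` (including `Native.Send`) are hypothetical, the `.Free()` pattern will also match unrelated types that have a Free() method, and a line-based search cannot prove that a given handle is actually released on every code path.

```python
import re

# Textual patterns only; this does not parse C#.
ALLOC_RE = re.compile(r"\bGCHandle\.Alloc\s*\(")
FREE_RE = re.compile(r"\.Free\s*\(\s*\)")


def audit(source):
    """Return (alloc_lines, free_lines) as 1-based line numbers."""
    allocs, frees = [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if ALLOC_RE.search(line):
            allocs.append(lineno)
        if FREE_RE.search(line):
            frees.append(lineno)
    return allocs, frees


# Hypothetical C# fragment: the first handle is freed in a finally block,
# the second one is never freed.
SAMPLE = """\
var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try { Native.Send(handle.AddrOfPinnedObject()); }
finally { handle.Free(); }
var leaked = GCHandle.Alloc(other, GCHandleType.Pinned);
"""

allocs, frees = audit(SAMPLE)
print("Alloc on lines:", allocs)  # [1, 4]
print("Free on lines:", frees)    # [3]
```

A mismatch between the two lists is not proof of a leak, but it narrows down which files deserve a manual look.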