Why does ctypes.cast() appear to trigger a memory leak?

Using Python 3.9.9 (on Windows 10), I keep running into "out of memory" problems in an application that makes heavy use of ctypes.

I was able to boil these problems down to a simple reproducer that quickly triggers a similar MemoryError:

import ctypes

if __name__ == '__main__':
    
    i = 0
    while True:
        print("i = %d" % (i))
        i = i + 1
        barray = bytearray(10485760)
        ubuffer = (ctypes.c_char * len(barray)).from_buffer(barray)
        c_ptr = ctypes.cast(ubuffer, ctypes.POINTER(ctypes.c_char))

On my system, a MemoryError is triggered after about 350 iterations of the while loop:

...
i = 336
i = 337
i = 338
i = 339
i = 340
i = 341
Traceback (most recent call last):
  File "C:\..\crash_reproducer.py", line 9, in <module>
    barray = bytearray(10485760)
MemoryError

Can someone help explain what is going on here?

Secondly, after removing the 'ctypes.cast' call, I no longer hit the MemoryError. Any idea why that is?

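I'll answer your question with several assertions, which I will then support below:

  1. What determines whether your process stops running is not the operating system it runs on, or even whether the process is 32-bit or 64-bit, but rather the maximum size the process is allowed to reach before it can no longer allocate memory or gets killed.

  2. Even in an environment like yours, where the program crashes, the cause of the crash is not a memory leak but rather that garbage collection does not happen soon enough to prevent the crash.

  3. The process grows mainly because the bytearray object assigned to barray on each pass through the loop is, as the program is currently written, not actually freed until garbage collection happens, and each bytearray object holds a large buffer that is allocated when the bytearray is created.

  4. The reason garbage collection happens too late on some platforms is that the garbage collection logic does not take into account the total memory held by bytearray objects.

  5. The reason the statement using ctypes.cast introduces the crash in your case is that each time the statement runs it introduces a new cycle of Python object references, plus a chain of references from one object in the cycle to the bytearray. As the program is written, none of the objects in each reference cycle can be freed until the cycle is broken, which in turn means that the reference count of any bytearray kept alive through such a cycle will not go to 0. I will show what the cycle is and the reference chain that leads to the bytearray.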
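
To make the idea behind assertions (2) and (5) concrete without involving ctypes at all, here is a minimal sketch. The class Holder and the function demo are hypothetical names invented for this illustration; the sketch only assumes CPython's usual reference counting plus its cyclic garbage collector.

```python
import gc
import weakref


class Holder:
    """Hypothetical stand-in for an object that keeps a large buffer alive."""

    def __init__(self, size):
        self.payload = bytearray(size)  # large buffer owned by this object
        self.me = self                  # self-reference: creates a cycle


def demo(size=1024 * 1024):
    was_enabled = gc.isenabled()
    gc.disable()                 # keep the automatic collector out of the way
    try:
        holder = Holder(size)
        probe = weakref.ref(holder)
        del holder               # the last external reference is gone
        alive_before = probe() is not None   # the cycle still holds it
        gc.collect()             # the cyclic collector breaks the cycle
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    print(demo())
```

The object, and with it the buffer, survives the del and only goes away once the collector runs, which is the same pattern as in the reproducer, just with a cycle built by hand.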

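To support assertion (1), it suffices to show that reducing the maximum virtual memory the process is allowed to use makes it stop earlier. On my Linux system, a 64-bit process running your program does not stop, but if I reduce the maximum size it does: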

$ (ulimit -v 131072; python3 usectypes.py)
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
Traceback (most recent call last):
  File "usectypes.py", line 9, in <module>
    barray = bytearray(10485760)
MemoryError
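
If you prefer to set the limit from inside Python rather than from the shell, something along these lines should be roughly equivalent on Linux. This is only a sketch: kib_to_bytes and limit_address_space are hypothetical helper names, and it assumes that RLIMIT_AS is the limit that ulimit -v controls on your platform.

```python
ULIMIT_V_KIB = 131072  # same value as passed to `ulimit -v` above, in KiB


def kib_to_bytes(kib):
    """ulimit -v takes KiB, while setrlimit takes bytes."""
    return kib * 1024


def limit_address_space(kib=ULIMIT_V_KIB):
    """Cap the virtual address space of the current process (Unix only)."""
    import resource  # imported here because it does not exist on Windows

    limit = kib_to_bytes(kib)
    # Set both the soft and the hard limit; lowering the hard limit
    # cannot be undone by an unprivileged process.
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    return limit
```

Calling limit_address_space() at the top of the reproducer should make it hit MemoryError after a handful of iterations, much like the shell session above.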

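To support assertion (2), it suffices to show that making garbage collection happen sooner prevents the crash, which can be done by adding two lines to the program in the question, running it, and observing that it does not crash. I leave running the program as an exercise for the reader, but here is what it looks like with the two extra lines: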

import ctypes
import gc

if __name__ == '__main__':
    
    i = 0
    while True:
        print("i = %d" % (i))
        i = i + 1
        barray = bytearray(10485760)
        ubuffer = (ctypes.c_char * len(barray)).from_buffer(barray)
        c_ptr = ctypes.cast(ubuffer, ctypes.POINTER(ctypes.c_char))
        gc.collect()
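
For a quicker check that does not need to run until memory is exhausted, the following sketch watches the lifetime of a single buffer. TrackedBytearray and buffer_lifetime are hypothetical names; the subclass exists only because a plain bytearray cannot be weakly referenced, and the first returned value depends on whether your Python version creates the cycle discussed in this answer.

```python
import ctypes
import gc
import weakref


class TrackedBytearray(bytearray):
    """bytearray subclass; unlike plain bytearray it supports weak references."""


def buffer_lifetime(size=4096):
    was_enabled = gc.isenabled()
    gc.disable()  # make sure no automatic collection interferes
    try:
        barray = TrackedBytearray(size)
        probe = weakref.ref(barray)
        ubuffer = (ctypes.c_char * len(barray)).from_buffer(barray)
        c_ptr = ctypes.cast(ubuffer, ctypes.POINTER(ctypes.c_char))
        del barray, ubuffer, c_ptr      # drop every name we hold
        alive_before_collect = probe() is not None
        gc.collect()                    # explicit collection
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before_collect, alive_after_collect
```

If the first value is True and the second is False, the buffer was kept alive only by something the garbage collector can clean up, which is what assertion (2) claims.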

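Assertion (3) can be supported, again as an exercise for the reader, by showing that reducing the constant on the line that creates the bytearray lets the original program run even as a 32-bit Windows process, and that increasing the constant on that line makes the program terminate early even when it runs as a 64-bit Linux process. Here is the line in question: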

        barray = bytearray(10485760)

So, for example, you should be able to make your program run as a 32-bit Windows process without running out of memory by changing the line to:

        barray = bytearray(1048576)

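Similarly, I made the program, running in a 64-bit Python process on my Linux system, terminate on the 15th pass through the loop, even without using ulimit to artificially reduce the virtual memory the program is allowed to use, by changing the line to the following: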

        barray = bytearray(1048576000)

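The above does not fully demonstrate assertion (4), because it does not actually look at the Python source code to show that the garbage collection logic takes no account at all of the size held by a bytearray. However, it at least strongly suggests that the garbage collection logic does not sufficiently account for that size, because the Linux case can be made to stop working after multiple passes through the loop. If the constant used above were simply too large for a single allocation, the program would have failed on the first pass through the loop. The CPython code is open source and one could verify this assertion further by looking at it, but I do not plan to show that here.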
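
Short of reading the CPython source, the gc module itself gives a hint. As far as I know, automatic collections are triggered by counts of tracked container objects rather than by bytes allocated, and a bytearray is not a tracked container. The function name gc_bookkeeping below is made up for this sketch.

```python
import gc


def gc_bookkeeping():
    """Report a few facts about how the collector sees these objects."""
    return {
        # A bytearray cannot refer to other objects, so it is not tracked.
        "bytearray_tracked": gc.is_tracked(bytearray(1024 * 1024)),
        # A list can refer to other objects, so it is tracked.
        "list_tracked": gc.is_tracked([]),
        # Thresholds are expressed in object counts, not in bytes.
        "thresholds": gc.get_threshold(),
    }


if __name__ == "__main__":
    for key, value in gc_bookkeeping().items():
        print(key, value)
```

So a loop that creates only a few small container objects per pass, each indirectly holding 10 MiB, can consume a great deal of memory before the object-count thresholds are reached.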

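To prove assertion (5), I ran the original program for a few minutes on Linux, where it did not crash for me, gathered a live core of the process using gcore, and analyzed the core using the open source tool chap.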

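I started it in the background and ignored the output, because there are ways to see the value of i from the core if needed, but I gave it more than 3 minutes to run: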

$ python3 usectypes.py >/dev/null &
[1] 658100
$ sleep 180

To get a suitable core, I set the coredump_filter for the process so that I would get all the sections I needed in the core, then created the core using gcore.

$ echo 0x37 >/proc/658100/coredump_filter
$ sudo gcore 658100
[sudo] password for tim: 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007ffb381f8a51 in __memset_avx2_erms () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/658100/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile core.658100
[Inferior 1 (process 658100) detached]
$ 

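Once I had the core, I started chap and used two commands to show that, of the 3,800,186,880 bytes of memory used by the process, 3,764,390,712 bytes were used by allocations of size 0xa00008. Since 0xa00008 is 10 MiB + 8, those allocations are almost certainly associated with the bytearray instances, which supports the theory that they are not freed until garbage collection happens.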

$ chap core.658100*
chap> count writable
27 writable ranges use 0xe2824000 (3,800,186,880) bytes.
chap> summarize used /minsize a00000
Unrecognized allocations have 359 instances taking 0xe0600b38(3,764,390,712) bytes.
   Unrecognized allocations of size 0xa00008 have 359 instances taking 0xe0600b38(3,764,390,712) bytes.
359 allocations use 0xe0600b38 (3,764,390,712) bytes.
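
The numbers in that output can be cross-checked with a few lines of arithmetic. All constants below are copied from the reproducer and from the chap output above; only the variable names are mine.

```python
BUFFER_SIZE = 10485760        # constant from the reproducer (10 MiB)
ALLOC_SIZE = 0xA00008         # allocation size reported by chap
INSTANCES = 359               # number of such allocations
TOTAL_WRITABLE = 0xE2824000   # bytes in writable ranges reported by chap

overhead = ALLOC_SIZE - BUFFER_SIZE     # bytes beyond the requested 10 MiB
large_total = ALLOC_SIZE * INSTANCES    # bytes held by the large allocations
share = large_total / TOTAL_WRITABLE    # fraction of writable memory

print(overhead)                  # 8
print(f"{large_total:,}")        # 3,764,390,712
print(f"{share:.2%}")            # 99.06%
```

In other words, roughly 99% of the writable memory in the process is taken up by buffers of exactly the size the loop allocates.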

I sampled those large allocations using the following command:

chap> describe used /minsize a00000 /geometricSample 100
Anchored allocation at 212d4b0 of size a00008

Anchored allocation at 3ff2dae0 of size a00008

2 allocations use 0x1400010 (20,971,536) bytes.

I picked one of those and looked at the allocations that reference it, confirming that it is associated with a bytearray:

chap> describe exactincoming 212d4b0
Anchored allocation at 7ffb376addb0 of size c0
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x904080 (memoryview)

Anchored allocation at 7ffb376dc930 of size 40
This allocation matches pattern SimplePythonObject.
This has reference count 1 and python type 0x9005a0 (bytearray)

Anchored allocation at 7ffb376e32b0 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x904220 (managedbuffer)

Anchored allocation at 7ffb376e3830 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x211e830 (c_char_Array_10485760)

4 allocations use 0x200 (512) bytes.

One can look at bytearrayobject.c (https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c) in the CPython source to see that a bytearray object considers itself the sole owner of the large buffer, which is the target of the ob_bytes field.

static void
bytearray_dealloc(PyByteArrayObject *self)
{
    if (self->ob_exports > 0) {
        PyErr_SetString(PyExc_SystemError,
                        "deallocated bytearray object has exported buffers");
        PyErr_Print();
    }
    if (self->ob_bytes != 0) {
        PyObject_Free(self->ob_bytes);
    }
    Py_TYPE(self)->tp_free((PyObject *)self);
}
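
The ob_exports check in that function is also visible from Python: while something holds an exported view of the buffer, the bytearray refuses to resize. The sketch below (the function name is hypothetical) uses that to show that from_buffer creates such an export, and that without the cast the export goes away as soon as the ctypes array is dropped.

```python
import ctypes
import gc


def resize_blocked_while_exported(size=16):
    barray = bytearray(size)
    ubuffer = (ctypes.c_char * size).from_buffer(barray)
    try:
        barray.append(0)       # resizing needs exclusive ownership
        blocked = False
    except BufferError:
        blocked = True         # the ctypes array holds an export
    del ubuffer                # no cast was done, so no cycle is involved
    gc.collect()               # not strictly needed in CPython; harmless
    barray.append(0)           # succeeds once the export is gone
    return blocked, len(barray)


if __name__ == "__main__":
    print(resize_blocked_while_exported())
```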

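This means that to understand why the large buffers are being kept in memory, we need to understand why the corresponding bytearray objects are still in memory. One way to do this from chap is with a command like the following: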

chap> describe allocation 7ffb376dc930 /extend %SimplePythonObject<- /extend %ContainerPythonObject<- /extend %PyDictKeysObject<- /skipUnfavoredReferences true /commentExtensions true
Anchored allocation at 7ffb376dc930 of size 40
This allocation matches pattern SimplePythonObject.
This has reference count 1 and python type 0x9005a0 (bytearray)

# Allocation at 0x7ffb376dc930 is referenced by allocation at 0x7ffb376addb0.
Anchored allocation at 7ffb376addb0 of size c0
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x904080 (memoryview)

# Allocation at 0x7ffb376addb0 is referenced by allocation at 0x7ffb37722710.
Anchored allocation at 7ffb37722710 of size b0
This allocation matches pattern PyDictKeysObject.

# Allocation at 0x7ffb37722710 is referenced by allocation at 0x7ffb376dca30.
Anchored allocation at 7ffb376dca30 of size 40
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x90bf00 (dict)

# Allocation at 0x7ffb376dca30 is referenced by allocation at 0x7ffb376e3830.
Anchored allocation at 7ffb376e3830 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x211e830 (c_char_Array_10485760)

# Allocation at 0x7ffb376e3830 is referenced by allocation at 0x7ffb37722710.
# Allocation at 0x7ffb37722710 was already visited.

# Allocation at 0x7ffb376dc930 is referenced by allocation at 0x7ffb376e32b0.
Anchored allocation at 7ffb376e32b0 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x904220 (managedbuffer)

# Allocation at 0x7ffb376e32b0 is referenced by allocation at 0x7ffb376addb0.
# Allocation at 0x7ffb376addb0 was already visited.

6 allocations use 0x2f0 (752) bytes.

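The above shows a cycle that holds the bytearray: the bytearray is held by a memoryview, which is held by a %PyDictKeysObject, which is held by a dict, which is held by a c_char_Array_10485760, which in turn is referenced by that same %PyDictKeysObject, closing the cycle. This is not surprising, because we expected a cycle, but one interesting thing is that the c_char_Array_10485760 type is associated with ubuffer rather than c_ptr. So even though the assignment to c_ptr is evidently necessary to cause the cycle, the target of c_ptr is not actually part of the cycle.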
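
The same structure can be probed from inside Python through the _objects attribute, which the ctypes documentation describes as exposed for debugging only. This is a sketch with a made-up function name, and what it reports may well differ between Python versions, so treat the results as observations rather than guarantees.

```python
import ctypes


def inspect_keepalive(size=16):
    barray = bytearray(size)
    ubuffer = (ctypes.c_char * size).from_buffer(barray)
    c_ptr = ctypes.cast(ubuffer, ctypes.POINTER(ctypes.c_char))

    kept = ubuffer._objects  # debugging-only attribute: a dict or None
    values = list(kept.values()) if isinstance(kept, dict) else []
    return {
        # Do the array and the pointer share one keep-alive dict?
        "shared_dict": c_ptr._objects is ubuffer._objects,
        # Does the array's keep-alive dict contain the array itself?
        "array_in_own_dict": any(v is ubuffer for v in values),
        # Is a memoryview on the bytearray among the kept objects?
        "holds_memoryview": any(isinstance(v, memoryview) for v in values),
    }


if __name__ == "__main__":
    print(inspect_keepalive())
```

If array_in_own_dict comes back True, that is the cycle seen in the core: the array refers to a dict that refers back to the array, and the same dict also holds the memoryview that pins the bytearray.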

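To verify this, we need to look at what the statement that assigns c_ptr actually does. Without actually reading the ctypes code, one way to do this is to gather a core before and after the statement and observe that the cycle does not exist before the statement but does exist after it.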

To do this, we can modify the original program slightly so that it sleeps before and after the assignment to c_ptr, as shown:

import ctypes
import time

if __name__ == '__main__':
    
    i = 0
    while True:
        print("i = %d" % (i))
        i = i + 1
        barray = bytearray(10485760)
        ubuffer = (ctypes.c_char * len(barray)).from_buffer(barray)
        print("sleep after assign ubuffer")
        time.sleep(300)
        c_ptr = ctypes.cast(ubuffer, ctypes.POINTER(ctypes.c_char))
        print("sleep after assign c_ptr")
        time.sleep(300)

We can run the slightly modified program roughly as before, remembering to set the coredump_filter for it so that we get enough information in the cores:

$ python3 usectypeswithsleeps.py &
[1] 76026
$ i = 0
sleep after assign ubuffer
echo 0x37 >/proc/76026/coredump_filter
$ sudo gcore 76026
[sudo] password for tim: 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fd4131a0faa in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/76026/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile core.76026
[Inferior 1 (process 76026) detached]
$ mv core.76026 core.76026_before_assign_c_ptr
$ sleep after assign c_ptr
$ sudo gcore 76026
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fd4131a0faa in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/76026/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile core.76026
[Inferior 1 (process 76026) detached]
$ mv core.76026 core.76026_after_assign_c_ptr

Now it is just a matter of looking at the two cores to see that the cycle does not exist in the first core but does exist in the second.

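Looking at the core from before the assignment, there is just one large buffer, because we gathered the core before the first assignment to c_ptr, and as expected it is referenced by a bytearray: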

chap> describe used ? /minsize a00000 /extend ?@0<-%SimplePythonObject
Anchored allocation at 7fd411cc7010 of size a00ff0

Anchored allocation at 7fd412755cb0 of size 40
This allocation matches pattern SimplePythonObject.
This has reference count 2 and python type 0x9005a0 (bytearray)

2 allocations use 0xa01030 (10,489,904) bytes.

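That bytearray is referenced from a few more places than we saw in the larger core. For example, in this case the bytearray is actually the current target of barray, which explains why it is referenced by the %PyDictKeysObject associated with the main module.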

chap> describe incoming 7fd412755cb0 /skipUnfavoredReferences true
Anchored allocation at 2972e20 of size 248
This allocation matches pattern PyDictKeysObject.
"__name__" : "__main__"
"__file__" : "usectypeswithsleeps.py"

Anchored allocation at 7fd412751440 of size 50
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 0 and python type 0x90c0a0 (tuple)

Anchored allocation at 7fd4127a0630 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x904220 (managedbuffer)

Anchored allocation at 7fd4127bcf30 of size c0
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x904080 (memoryview)

4 allocations use 0x3d8 (984) bytes.

As shown above, the bytearray is referenced by a memoryview, as expected, and as shown below, the memoryview is held by a %PyDictKeysObject, which is held by a dict.

chap> describe incoming 7fd4127bcf30 /skipUnfavoredReferences true
Anchored allocation at 7fd412765a80 of size b0
This allocation matches pattern PyDictKeysObject.

1 allocations use 0xb0 (176) bytes.
chap> describe incoming 7fd412765a80 /skipUnfavoredReferences true
Anchored allocation at 7fd41287edb0 of size 40
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x90bf00 (dict)

1 allocations use 0x40 (64) bytes.
chap> describe incoming 7fd41287edb0 /skipUnfavoredReferences true
Anchored allocation at 7fd41273ad30 of size 40
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 0 and python type 0x90c0a0 (tuple)

Anchored allocation at 7fd4127a0e30 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x2968980 (c_char_Array_10485760)

2 allocations use 0xc0 (192) bytes.

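However, at this point, when we look at the incoming references to the c_char_Array_10485760, we can see that it is referenced only by a variable of the main module, meaning that it does not yet participate in a cycle.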

chap> describe incoming 7fd4127a0e30 /skipUnfavoredReferences true
Anchored allocation at 2972e20 of size 248
This allocation matches pattern PyDictKeysObject.
"__name__" : "__main__"
"__file__" : "usectypeswithsleeps.py"

1 allocations use 0x248 (584) bytes.

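When we look at the core from after the assignment to c_ptr, we can see that in addition to that variable reference it has a few others, including one from the %PyDictKeysObject at 0x7fd412765a80, and we can then see that the c_char_Array_10485760 is now in a cycle: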

chap> describe incoming 7fd4127a0e30 /skipUnfavoredReferences true
Anchored allocation at 2972e20 of size 248
This allocation matches pattern PyDictKeysObject.
"__name__" : "__main__"
"__file__" : "usectypeswithsleeps.py"

Anchored allocation at 7fd41273ad30 of size 40
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 0 and python type 0x90c0a0 (tuple)

Anchored allocation at 7fd412765a80 of size b0
This allocation matches pattern PyDictKeysObject.

Anchored allocation at 7fd4127c49e0 of size 1f0
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 0 and python type 0x906420 (frame)

4 allocations use 0x528 (1,320) bytes.
chap> describe incoming 7fd412765a80 /skipUnfavoredReferences true
Anchored allocation at 7fd41287edb0 of size 40
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 2 and python type 0x90bf00 (dict)

1 allocations use 0x40 (64) bytes.
chap> describe incoming 7fd41287edb0 /skipUnfavoredReferences true
Anchored allocation at 7fd4127a0e30 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 2 and python type 0x2968980 (c_char_Array_10485760)

Anchored allocation at 7fd4127a0eb0 of size 80
This allocation matches pattern ContainerPythonObject.
This has a PyGC_Head at the start so the real PyObject is at offset 0x10.
This has reference count 1 and python type 0x2977010 (LP_c_char)

2 allocations use 0x100 (256) bytes.

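So assertion 5 is now demonstrated. One thing that explains why the statement assigning c_ptr is involved is that, after the statement, the LP_c_char at 0x7fd4127a0eb0 (which is the target of c_ptr) references the same dict that the c_char_Array_10485760 references. It seems to me to be a slight bug that, when c_ptr is reassigned and the LP_c_char that was previously the target of c_ptr is freed, the cycle remains in place, but at least for now you can work around it by calling gc.collect() from time to time.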
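
As a rough illustration of what "from time to time" could look like, here is a bounded version of the loop that collects every few passes instead of on every pass. The function name and the every parameter are made up for this sketch, and a sensible interval depends on how large your buffers are relative to the memory you can spare.

```python
import ctypes
import gc


def run_with_periodic_collect(iterations=20, size=64 * 1024, every=5):
    """Run the reproducer's loop body, collecting every `every` passes."""
    collections = 0
    for i in range(iterations):
        barray = bytearray(size)
        ubuffer = (ctypes.c_char * len(barray)).from_buffer(barray)
        c_ptr = ctypes.cast(ubuffer, ctypes.POINTER(ctypes.c_char))
        if (i + 1) % every == 0:
            gc.collect()       # frees the cycles left by earlier passes
            collections += 1
    return collections


if __name__ == "__main__":
    print(run_with_periodic_collect())
```

With this approach at most about every buffers plus the current one should be alive at any moment, so peak memory is bounded by the interval rather than by when the automatic collector happens to run.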