从分配的数组在 C 中创建一个 numpy 数组会导致内存泄漏

Creating a numpy array in C from an allocated array is causing memory leaks

我已将我的程序中的内存泄漏跟踪到我用 C 编写的 Python 模块,以有效地解析以 ASCII-hex 表示的数组。 (例如 "FF 39 00 FC ...")

char* buf;
unsigned short bytesPerTable;
if (!PyArg_ParseTuple(args, "sH", &buf, &bytesPerTable))
{
    return NULL;
}

unsigned short rowSize = bytesPerTable;
char* CArray = malloc(rowSize * sizeof(char));

// Populate CArray with data parsed from buf
ascii_buf_to_table(buf, bytesPerTable, rowSize, CArray);

int dims[1] = {rowSize};

PyObject* pythonArray = PyArray_SimpleNewFromData(1, (npy_intp*)dims, NPY_INT8, (void*)CArray);
return Py_BuildValue("(O)", pythonArray);

我意识到numpy不知道释放为CArray分配的内存,从而导致内存泄漏。在对这个问题进行一些研究之后,根据 this article 中评论的建议,我添加了以下行,它应该告诉数组它 "owns" 它的数据,并在它被删除时释放它。

PyArray_ENABLEFLAGS((PyArrayObject*)pythonArray, NPY_ARRAY_OWNDATA);

但我仍然遇到内存泄漏问题。我究竟做错了什么?如何让 NPY_ARRAY_OWNDATA 标志正常工作?

作为参考,ndarraytypes.h 中的文档使它看起来应该有效:

/*
 * If set, the array owns the data: it will be free'd when the array
 * is deleted.
 *
 * This flag may be tested for in PyArray_FLAGS(arr).
 */
#define NPY_ARRAY_OWNDATA         0x0004

同样作为参考,以下代码(调用 C 中定义的 Python 函数)演示了内存泄漏。

tableData = "FF 39 00 FC FD 37 FF FF F9 38 FE FF F1 39 FE FC \n" \
            "EF 38 FF FE 47 40 00 FB 3D 3B 00 FE 41 3D 00 FE \n" \
            "43 3E 00 FF 42 3C FE 02 3C 40 FD 02 31 40 FE FF \n" \
            "2E 3E FF FE 24 3D FF FE 15 3E 00 FC 0D 3C 01 FA \n" \
            "02 3E 01 FE 01 3E 00 FF F7 3F FF FB F4 3F FF FB \n" \
            "F1 3D FE 00 F4 3D FE 00 F9 3E FE FC FE 3E FD FE \n" \
            "F6 3E FE 02 03 3E 00 FE 04 3E 00 FC 0B 3D 00 FD \n" \
            "09 3A 00 01 03 3D 00 FD FB 3B FE FB FD 3E FD FF \n"

for i in xrange(1000000):
    PES = ParseTable(tableData, 128, 4) //Causes memory usage to skyrocket

可能是引用计数问题(来自How to extend NumPy):

One common source of reference-count errors is the Py_BuildValue function. Pay careful attention to the difference between the ‘N’ format character and the ‘O’ format character. If you create a new object in your subroutine (such as an output array), and you are passing it back in a tuple of return values, then you should most- likely use the ‘N’ format character in Py_BuildValue. The ‘O’ character will increase the reference count by one. This will leave the caller with two reference counts for a brand-new array. When the variable is deleted and the reference count decremented by one, there will still be that extra reference count, and the array will never be deallocated. You will have a reference-counting induced memory leak. Using the ‘N’ character will avoid this situation as it will return to the caller an object (inside the tuple) with a single reference count.