"Pointer being freed was not allocated" error using cython in parallel with an external c library based on FFTW

Question

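What I'm trying

I have some code that was previously parallelised in python using multiprocess, and it worked fine (albeit slowly and with heavy memory use). I decided to try converting it to cython. I am new to cython and have little experience with c. The example below is as simplified as I could make it: it works in serial, but as soon as I parallelise it, it stops working. Because it has to run in parallel, I went through all of the code and released the gil.

The code depends on an external C library https://github.com/astro-informatics/ssht/ (compilation instructions are in the README), which uses fftw in the background. This library has its own cython file, which calls the same c function that I am using (ssht_core_mw_inverse_sov_sym_ss). A function very similar to mine (in that repo's cython file) looks like this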

def ssht_inverse_mwss_complex(
    np.ndarray[ double complex, ndim=1, mode="c"] f_lm not None,
    int L,
    int spin
):
    cdef ssht_dl_method_t dl_method = SSHT_DL_RISBO
    f_mwss_c = np.empty([L+1,2*L,], dtype=complex)
    ssht_core_mw_inverse_sov_sym_ss(
        <double complex*> np.PyArray_DATA(f_mwss_c),
        <const double complex*> np.PyArray_DATA(f_lm),
        L,
        spin,
        dl_method,
        0
    )
    return f_mwss_c

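I basically had to recreate this locally, as needed, without the gil.

The problem

When I run a script that uses the cython module I get a segmentation fault, but the error is slightly different each time. Either there is no explanatory message at all, or it talks about pointer memory allocation: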

malloc: *** error for object 0x7f83de8b58e0: pointer being freed was not allocated
python(21855,0x70000d333000) malloc: Double free of object 0x7f83de8b58e0
python(21855,0x70000d536000) malloc: *** set a breakpoint in malloc_error_break to debug

or it looks specific to FFTW:

fftw: /Users/runner/.conan/data/fftw/3.3.8/_/_/build/55f3919d9a41efc78a625ee65e5d1ea60d02b2ff/source_subfolder/kernel/planner.c:261: assertion failed: SLVNDX(slot) == slvndx

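Looking around, I found this kind of issue https://github.com/bytedeco/javacpp-presets/issues/435 so I am hoping that fftw does not make what I am trying to do impossible (and that the real cause is just that I am not good at c).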

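What I've tried

I tried using free from libc.stdlib, but that did not help. I also tried creating the arrays with cython.view arrays, but struggled to make them double complex (which the ssht library requires). I tried to get cython debugging working, but ran into problems getting it to work on my mac. I have also spent 2 days banging my head against a wall...

My system

I compile my extension in the usual way, python setup.py build_ext --inplace. I am using python3.8.5 and Cython==0.29.21. I am running macOS 11.0.1.

The code

My cython file: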

import numpy as np
from libc.stdio cimport printf
from libc.stdlib cimport calloc, malloc
from cython.parallel import parallel, prange
from openmp cimport omp_get_thread_num

# needed to recreate without importing (for nogil)
cdef extern from "ssht/ssht.h" nogil:
    ctypedef enum ssht_dl_method_t:
        SSHT_DL_RISBO, SSHT_DL_TRAPANI
    void ssht_core_mw_inverse_sov_sym_ss(
        double complex *f,
        const double complex *flm,
        int L,
        int spin,
        ssht_dl_method_t dl_method,
        int verbosity
    )

def my_cython_module(int L, int threads):
    """
    dummy function more to show that parallel loops fails
    """
    cdef int ell, tid
    with nogil, parallel(num_threads=threads):
        tid = omp_get_thread_num()
        for ell in prange(L * L, schedule="guided"):
            printf("ell: %i\n", ell)
            _ssht_inverse(L, ell)

cdef double complex * _ssht_inverse(int L, int ind) nogil:
    """
    function creates a 1D complex array flm  with zeros and a 1
    then calls c function to get 2D complex array f
    not returning anything as it's just for demonstration
    """
    cdef ssht_dl_method_t dl_method = SSHT_DL_RISBO
    cdef double complex *flm = NULL
    cdef double complex *f = NULL
    flm = <double complex *> calloc(L * L, sizeof(double complex))
    flm[ind] = 1
    f = <double complex *> malloc((L + 1) * (2 * L) * sizeof(double complex))
    ssht_core_mw_inverse_sov_sym_ss(f, flm, L, 0, dl_method, 0)
    return f

My setup.py:

import os
from Cython.Build import cythonize
from setuptools import Extension, setup

# running on mac so need GCC instead of clang
os.environ["CC"] = "gcc-10"

setup(
    ext_modules=cythonize(
        Extension(
            "test",
            ["*.pyx"],
            extra_compile_args=["-fopenmp"],
            extra_link_args=["-fopenmp"],
            include_dirs=["/usr/local/include"],
        ),
        annotate=True,
        language_level=3,
        compiler_directives=dict(boundscheck=False, embedsignature=True),
    ),
)

Minimal working example using parallelism in python

The following works (pip install pyssht) and runs successfully in parallel, so the problem seems to lie in the c/cython

# the cython wrapper from the external library
from pyssht import ssht_inverse_mwss_complex
import numpy as np
from multiprocess import Pool

def my_python_implementation(L, threads):
    """
    the python equivalent in parallel
    """
    def func(chunk):
        """
        deals with each chunk
        """
        for ell in chunk:
            print(f"ell: {ell}")
            flm = np.zeros(L * L, dtype=np.complex_)
            flm[ell] = 1
            ssht_inverse_mwss_complex(flm, L, 0)

    chunks = np.array_split(np.arange(L * L), threads)
    with Pool(processes=threads) as p:
        p.map(func, chunks)

Thanks in advance!

Seeing that I was able to run it in parallel in python, I am really hoping it can be done.

Answer 1

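As @DavidW pointed out, this happens because FFTW cannot be run multithreaded in this way (whereas it does work in python with multiprocessing, where each process has its own copy of the library state). The problem is with the external code I am using, which depends on FFTW. I have raised an issue to see whether we can force the FFTW parts to be single-threaded https://github.com/astro-informatics/ssht/issues/44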
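The thread-safety point can be illustrated with a small, library-independent sketch. FFTW's documentation states that its planner routines are not thread-safe and that only plan execution may be called from several threads at once. The Python below is only an analogy under that assumption: make_plan, execute, transform, run_all and the module-level cache are hypothetical stand-ins, not part of FFTW or ssht. It serialises the step that touches shared state with a lock and leaves the read-only step free to run concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a library whose "planning" step mutates shared
# state (as the FFTW planner does) while its "execute" step only reads.
_plan_cache = {}                 # shared state, not safe to mutate concurrently
_plan_lock = threading.Lock()    # guards every access to _plan_cache


def make_plan(size):
    """Planning step: touches shared state, so it runs under the lock."""
    with _plan_lock:
        if size not in _plan_cache:
            _plan_cache[size] = list(range(size))
        return _plan_cache[size]


def execute(plan, index):
    """Execution step: reads the plan only, so threads may overlap here."""
    return sum(plan) + index


def transform(size, index):
    """One unit of work: serialised planning, then concurrent execution."""
    plan = make_plan(size)
    return execute(plan, index)


def run_all(size, threads):
    """Run size * size transforms on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda i: transform(size, i), range(size * size)))


if __name__ == "__main__":
    print(run_all(3, 4))
```

In the cython code from the question, planning and execution presumably both happen inside the single call to ssht_core_mw_inverse_sov_sym_ss, so a lock (or an OpenMP critical section) placed around that call would cover all of the work and remove most of the benefit of prange. Separating the two steps, or calling fftw_make_planner_thread_safe() (documented for FFTW 3.3.5 and later, and requiring the FFTW threads library to be linked), would need a change on the ssht side, which is why the question was raised upstream.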