Custom C++ library dependent on ARPACK wrapped in pybind11 segfaults when NumPy is also imported

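I am creating a custom library (written in C++) that uses ARPACK-NG to perform some numerical operations. The function is wrapped with pybind11 so that it can be called from the Python methods in the package. I am observing strange behavior.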

Problem overview

When NumPy is imported before my method is called, a segfault occurs.

import numpy as np
from mylib import mymethod

mymethod() # Segfault

If the order of the imports is swapped, the result is the same.

from mylib import mymethod
import numpy as np

mymethod() # Segfault

When NumPy is imported after my method has been called, everything works fine.

from mylib import mymethod
mymethod() # Works fine

import numpy as np

# Further calls to NumPy or my library also work.

GDB trace

The backtrace looks like this.

#0  0x00007fffec59d2ef in mkl_blas.cdotc () from /home/myname/.conda/envs/mylib/lib/./libmkl_intel_lp64.so.1
#1  0x00007ffff7281974 in cneupd_ () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#2  0x00007ffff72af228 in cneupd_c () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#3  0x00007ffff76b25cd in void complex_symmetric_runner<float>(double const&) ()
   from /home/myname/Documents/mylib/build/lib.linux-x86_64-3.9/mylib/libmylib.so
#4  0x00007ffff76b102b in mymethod() ()

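Reproducible example

The test code is essentially the same as the C++ example provided by ARPACK-NG, with the main method replaced by mymethod(). The minimal binding code is: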

#include<pybind11/pybind11.h>

// ARPACK-NG C++ example code goes here. The main method is replaced with mymethod so it can be called from pybind11.

void mymethod(){
   // ...Contents of the main() function in the example ARPACK-NG code...
}

PYBIND11_MODULE(mylib, m){
    m.def("mymethod", mymethod);
}

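My guess at the problem

I think there is some problem with how NumPy and MKL are initialized, similar to this issue. From what I have gathered, NumPy links dynamically to MKL through libmkl_rt.so, as shown in the NumPy configuration below.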

import numpy
numpy.show_config()
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CNL

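My library links dynamically to ARPACK-NG through its shared library and, according to the GDB trace, the call ultimately ends up in libmkl_intel_lp64.so. This is confusing, however, because running ldd /home/myname/.conda/envs/mylib/lib/libarpack.so.2 gives no indication that libarpack.so links to MKL.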

linux-vdso.so.1 (0x0000697945afd000)
libblas.so.3 => /home/myname/.conda/envs/mylib/lib/./libblas.so.3 (0x0000697945200000)
libgfortran.so.4 => /home/myname/.conda/envs/mylib/lib/./libgfortran.so.4 (0x0000697945972000)
libm.so.6 => /usr/lib/libm.so.6 (0x00006979450db000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000697944ed1000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x000069794596d000)
libquadmath.so.0 => /home/myname/.conda/envs/mylib/lib/./libquadmath.so.0 (0x0000697944e97000)
libgcc_s.so.1 => /home/myname/.conda/envs/mylib/lib/./libgcc_s.so.1 (0x0000697944e82000)
/usr/lib64/ld-linux-x86-64.so.2 (0x0000697945aff000)
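
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: in conda environments libblas.so.3 can be a symlink to a specific BLAS implementation rather than a standalone library, which would explain why ldd names only libblas.so.3. A minimal sketch for resolving it is below. The helper name resolve_library is hypothetical, and the path is the one from the ldd output above.

```python
import os


def resolve_library(path):
    """Return the real file that a (possibly symlinked) library path points to."""
    return os.path.realpath(path)


if __name__ == "__main__":
    # Path taken from the ldd output above; adjust it for your environment.
    blas_path = "/home/myname/.conda/envs/mylib/lib/libblas.so.3"
    print(blas_path, "->", resolve_library(blas_path))
```

If the resolved name contains mkl, then libarpack.so is using MKL as its BLAS even though ldd does not say so directly.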

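My guess at what is happening: NumPy checks at import time whether some BLAS library has already been loaded. If my code is called first, libblas.so is loaded by it and NumPy happens to use it. If NumPy is imported first, however, it loads MKL as the BLAS library, and this somehow interferes with libarpack.so.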
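
A way to test this guess on Linux is to list which BLAS-related shared libraries are actually mapped into the Python process after each import. The sketch below reads /proc/self/maps. The function names and the name filter (mkl, blas, lapack, arpack) are my own assumptions for illustration, not part of NumPy or ARPACK-NG.

```python
import os
import re

# Assumed filter: library file names that look BLAS/LAPACK/ARPACK related.
BLAS_PATTERN = re.compile(r"(mkl|blas|lapack|arpack)", re.IGNORECASE)


def blas_libraries(maps_text):
    """Extract BLAS-related library paths from the text of /proc/<pid>/maps."""
    libs = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if path.startswith("/") and BLAS_PATTERN.search(name):
            libs.add(path)
    return sorted(libs)


def loaded_blas_libraries(maps_path="/proc/self/maps"):
    """Return BLAS-related libraries mapped into the current process (Linux only)."""
    if not os.path.exists(maps_path):
        return []
    with open(maps_path) as f:
        return blas_libraries(f.read())


if __name__ == "__main__":
    # Import numpy and/or mylib before this point to see what each one pulls in.
    for lib in loaded_blas_libraries():
        print(lib)
```

Running this once after only importing mylib and once after only importing numpy, then comparing the two lists, should show whether two different BLAS implementations end up in the same process.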

Is my assessment correct, and is there a way to fix this?

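As far as I can tell, my guess about the root cause is correct. I found a solution that, while not entirely satisfying, does fix the problem: create the Anaconda environment with the nomkl package (i.e. conda create -n mylib_nomkl nomkl python=3.9 numpy). NumPy will then no longer try to swap out BLAS from under ARPACK-NG.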