Custom C++ library dependent on ARPACK wrapped in pybind11 segfaults when NumPy is also imported
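I'm creating a custom library (written in C++) that uses ARPACK-NG to perform some numerical operations. The function is wrapped with pybind11 so that it can be called as a Python method from the package. I'm observing strange behavior.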
Problem overview
A segfault occurs when NumPy is imported before my method is called:
import numpy as np
from mylib import mymethod
mymethod() # Segfault
The same happens if the import order is reversed:
from mylib import mymethod
import numpy as np
mymethod() # Segfault
When NumPy is imported after my method has been called, everything works fine:
from mylib import mymethod
mymethod() # Works fine
import numpy as np
# Further calls to NumPy or my library also work.
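Because a crash kills the interpreter and shared libraries stay loaded for the rest of a process's life, each ordering is best checked in a fresh interpreter. A minimal sketch of such a harness, assuming `mylib` is importable from the active environment and a POSIX system (where a child killed by signal N reports return code -N); the helper names are illustrative, not part of any library:

```python
import signal
import subprocess
import sys

# The three orderings from the question; each one runs in its own interpreter.
ORDERINGS = {
    "numpy, mylib, call": "import numpy as np\nfrom mylib import mymethod\nmymethod()",
    "mylib, numpy, call": "from mylib import mymethod\nimport numpy as np\nmymethod()",
    "mylib, call, numpy": "from mylib import mymethod\nmymethod()\nimport numpy as np",
}


def run_snippet(code: str) -> int:
    """Run `code` in a fresh Python interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe(returncode: int) -> str:
    """Translate a subprocess return code into a short label."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGSEGV:  # child was killed by SIGSEGV (POSIX)
        return "segfault"
    return f"exit status {returncode}"


if __name__ == "__main__":
    for name, code in ORDERINGS.items():
        print(f"{name:20} -> {describe(run_snippet(code))}")
```

Running each case in a subprocess also means the harness itself survives the segfault and can report all three results in one go.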
GDB trace
The backtrace looks like this:
#0 0x00007fffec59d2ef in mkl_blas.cdotc () from /home/myname/.conda/envs/mylib/lib/./libmkl_intel_lp64.so.1
#1 0x00007ffff7281974 in cneupd_ () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#2 0x00007ffff72af228 in cneupd_c () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#3 0x00007ffff76b25cd in void complex_symmetric_runner<float>(double const&) ()
from /home/myname/Documents/mylib/build/lib.linux-x86_64-3.9/mylib/libmylib.so
#4 0x00007ffff76b102b in mymethod() ()
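GDB shows the native frames. To also see which Python line triggered the crash, the standard-library faulthandler module can dump the Python traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch to put at the top of the reproducing script (running it as `python -X faulthandler script.py` has the same effect):

```python
import faulthandler
import sys

# Dump the Python traceback of all threads on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS and SIGILL. This complements GDB rather than replacing it: it shows
# the Python call site, while GDB shows the native frames underneath.
# sys.__stderr__ is the original stderr, which keeps a real file descriptor
# even if sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# The failing order from the question, commented out so that this snippet
# runs on its own:
# import numpy as np
# from mylib import mymethod
# mymethod()
```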
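Reproducible example
The test code is essentially the same as the C++ example provided by ARPACK-NG, with its main() function replaced by mymethod(). The minimal binding code is: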
#include <pybind11/pybind11.h>
// ARPACK-NG C++ example code goes here. The main() function is replaced with mymethod so it can be called from pybind11.
void mymethod() {
    // ...contents of the main() function in the example ARPACK-NG code...
}
PYBIND11_MODULE(mylib, m) {
    m.def("mymethod", mymethod);
}
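What I think the problem is
I think there is some issue with how NumPy and MKL are initialized, similar to this issue. From what I've gathered, NumPy is dynamically linked to MKL via mkl_rt (libmkl_rt.so), as the NumPy configuration below shows: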
import numpy
numpy.show_config()
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CNL
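numpy.show_config() reports how NumPy was built, not which shared objects are actually mapped into the running process. On Linux the latter can be read from /proc/self/maps. A minimal, Linux-only sketch that lists BLAS-, MKL- and ARPACK-related libraries; the keyword list and helper names are assumptions made for illustration:

```python
import os

# Substrings identifying the libraries of interest (an assumption; extend as needed).
KEYWORDS = ("mkl", "blas", "lapack", "arpack")


def matching_libs(maps_text: str, keywords=KEYWORDS) -> list[str]:
    """Return sorted, de-duplicated paths from /proc/<pid>/maps text whose
    file name contains any of `keywords`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; the pathname may contain spaces.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5]
            if any(k in os.path.basename(path) for k in keywords):
                paths.add(path)
    return sorted(paths)


def loaded_libs(keywords=KEYWORDS) -> list[str]:
    """Libraries matching `keywords` that are mapped into this process (Linux only)."""
    with open("/proc/self/maps") as f:
        return matching_libs(f.read(), keywords)


if __name__ == "__main__":
    print("before:", loaded_libs())
    try:
        import numpy  # noqa: F401  (the import under suspicion)
    except ImportError:
        print("NumPy is not installed in this environment")
    else:
        print("after numpy:", loaded_libs())
```

Calling loaded_libs() again after mymethod() in each of the three orderings would show which BLAS implementation every step pulls into the process.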
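My library is dynamically linked to MKL through ARPACK-NG's shared library and, according to the GDB trace, ends up calling into libmkl_intel_lp64.so. This is confusing, however, because running ldd /home/myname/.conda/envs/mylib/lib/libarpack.so.2 gives no indication that libarpack.so links to MKL: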
linux-vdso.so.1 (0x0000697945afd000)
libblas.so.3 => /home/myname/.conda/envs/mylib/lib/./libblas.so.3 (0x0000697945200000)
libgfortran.so.4 => /home/myname/.conda/envs/mylib/lib/./libgfortran.so.4 (0x0000697945972000)
libm.so.6 => /usr/lib/libm.so.6 (0x00006979450db000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000697944ed1000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x000069794596d000)
libquadmath.so.0 => /home/myname/.conda/envs/mylib/lib/./libquadmath.so.0 (0x0000697944e97000)
libgcc_s.so.1 => /home/myname/.conda/envs/mylib/lib/./libgcc_s.so.1 (0x0000697944e82000)
/usr/lib64/ld-linux-x86-64.so.2 (0x0000697945aff000)
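ldd prints the path at which each dependency was found, but not where that path ultimately points if it is a symbolic link. One thing worth ruling out (an assumption, not something the output above confirms) is that libblas.so.3 in the environment is a link to another BLAS implementation's library. A minimal sketch that follows the link chain; the path is copied from the ldd output and the helper name is illustrative:

```python
import os

# Path taken from the ldd output above; adjust for your environment.
LIBBLAS = "/home/myname/.conda/envs/mylib/lib/libblas.so.3"


def resolve_chain(path: str) -> list[str]:
    """Follow a chain of symbolic links, returning every hop including the final target."""
    chain = [path]
    seen = {path}
    while os.path.islink(chain[-1]):
        target = os.readlink(chain[-1])
        # A relative link target is relative to the directory containing the link.
        nxt = os.path.normpath(os.path.join(os.path.dirname(chain[-1]), target))
        if nxt in seen:  # guard against link loops
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


if __name__ == "__main__":
    if os.path.exists(LIBBLAS):
        print(" -> ".join(resolve_chain(LIBBLAS)))
    else:
        print(f"{LIBBLAS} not found")
```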
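If I had to guess what is happening: NumPy checks at import time whether some BLAS library has already been loaded. If my code is called first, libblas.so is loaded by it and NumPy happens to use it as well. However, if NumPy is imported before my code is called, it loads MKL's BLAS library, which somehow interferes with libarpack.so.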
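One way to probe this hypothesis is to check whether the BLAS routine from the trace resolves in the process's global symbol scope before and after importing NumPy. A minimal sketch using ctypes; it assumes the Fortran-mangled name cdotc_ (lowercase with a trailing underscore, as on typical Linux toolchains), and the helper name is illustrative. It only sees the main executable, its dependencies and libraries loaded with RTLD_GLOBAL, so a library loaded with RTLD_LOCAL will not show up:

```python
import ctypes


def globally_visible(symbol: str) -> bool:
    """Return True if `symbol` resolves in the process's global symbol scope.

    ctypes.CDLL(None) wraps dlopen(NULL); looking a name up on it searches the
    main executable, its dependencies and libraries loaded with RTLD_GLOBAL.
    """
    try:
        getattr(ctypes.CDLL(None), symbol)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # Assumed symbol name for the BLAS routine seen in the GDB trace.
    print("cdotc_ before numpy:", globally_visible("cdotc_"))
    try:
        import numpy  # noqa: F401
    except ImportError:
        print("NumPy is not installed in this environment")
    else:
        print("cdotc_ after numpy: ", globally_visible("cdotc_"))
```

If the symbol only becomes visible after `import numpy`, that is consistent with (though not proof of) ARPACK's BLAS calls being bound to MKL instead of libblas.so.3.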
Is my assessment correct, and is there a way to fix this?
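As far as I can tell, my assessment of the root cause was correct. I found a workaround that, while not entirely satisfying, does solve the problem: create the Anaconda environment with the nomkl package (i.e. conda create -n mylib_nomkl nomkl python=3.9 numpy). NumPy then no longer tries to swap out the BLAS from under ARPACK-NG.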