Import c-modules from embedded Python interpreter (pybind11) in a shared object raises an undefined symbol exception
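Update (1): The same problem can be seen with some compiled stdlib modules. It is not related to numpy (I removed the numpy tag and "numpy" from the title).
I am writing a shared object (a plugin for a piece of software) that contains an embedded Python interpreter. The shared object launches the interpreter, and the interpreter imports a Python module to be executed. If the imported module includes numpy, I get an undefined symbol error. The actual undefined symbol changes as a function of the Python version or the numpy version, but it is always a struct of the PyExc_* family.
I have simplified the issue to this minimal example (it actually comprises two files):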
// main.cc
#include "pybind11/embed.h"
namespace py = pybind11;
extern "C" {
int main() {
    py::scoped_interpreter guard{};
    auto py_module = py::module::import("numpy");
    auto version = py_module.attr("__version__");
    py::print(version);
    return 0;
}
}
// load.cc
#include <dlfcn.h>
int main() {
    void * lib = dlopen("./libissue.so", RTLD_NOW);
    int(*fnc)(void) = (int(*)(void))dlsym(lib, "main");
    fnc();
    dlclose(lib);
    return 0;
}
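The loader above does not check the results of dlopen or dlsym, so a library that fails to load would only show up as a crash. A minimal sketch of the same steps with error reporting through dlerror might look like the following. The helper name call_exported and its return codes are hypothetical and not part of the original example.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper (not in the original example): opens `path`, looks up
// `symbol` as an `int(void)` function, calls it, and closes the library.
// Returns 0 on success, 1 if dlopen fails, 2 if dlsym fails.
int call_exported(const char* path, const char* symbol) {
    void* lib = dlopen(path, RTLD_NOW);
    if (lib == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return 1;
    }
    dlerror();  // clear any stale error state before calling dlsym
    void* sym = dlsym(lib, symbol);
    if (sym == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n",
                     err ? err : "symbol resolved to a null address");
        dlclose(lib);
        return 2;
    }
    // POSIX guarantees that the void* returned by dlsym can be converted
    // to a function pointer.
    auto fnc = reinterpret_cast<int (*)(void)>(sym);
    fnc();
    dlclose(lib);
    return 0;
}
```

With this helper, the body of the loader's main reduces to return call_exported("./libissue.so", "main");. It does not change the import error itself, it only makes loading failures visible.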
I am compiling with this CMakeFile:
cmake_minimum_required(VERSION 3.14)
include(FetchContent)
FetchContent_Declare(
    pybind11
    GIT_REPOSITORY https://github.com/pybind/pybind11
    GIT_TAG v2.8.1)
FetchContent_MakeAvailable(pybind11)
project(
    pybind_issue
    LANGUAGES C CXX
    VERSION 1.0.0)
add_library(issue SHARED main.cc)
set_target_properties(issue PROPERTIES
    POSITION_INDEPENDENT_CODE ON
    CXX_STANDARD 11)
target_link_libraries(issue PRIVATE pybind11::embed)
# also tested with
# target_link_libraries(main PRIVATE mylib pybind11::lto pybind11::embed pybind11::module)
add_executable(issue_main main.cc)
set_target_properties(issue_main PROPERTIES
    POSITION_INDEPENDENT_CODE ON
    CXX_STANDARD 11)
target_link_libraries(issue_main PRIVATE pybind11::embed)
add_executable(loader load.cc)
target_link_libraries(loader PRIVATE ${CMAKE_DL_LIBS})
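This CMakeFile compiles three targets:
- an executable that loads the interpreter, imports numpy, and prints its version
- a shared object that exports a C function doing exactly the same thing
- a simple loader for the shared object that tries to run the "main" function exported by the shared object
If I run the issue_main executable, the numpy version is printed correctly on screen. If I run loader, I get this error: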
terminate called after throwing an instance of 'pybind11::error_already_set'
what(): ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.8 from "/usr/bin/python3"
* The NumPy version is: "1.20.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyExc_RecursionError
At:
/usr/local/lib/python3.8/dist-packages/numpy/core/__init__.py(51): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(1050): _handle_fromlist
/usr/local/lib/python3.8/dist-packages/numpy/__init__.py(145): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
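The "undefined symbol" text in this message is produced by the dynamic linker while it loads the extension module, so one way to narrow the problem down is to ask the linker directly whether a given interpreter symbol can be resolved. The sketch below does this from the loader executable with dlsym and RTLD_DEFAULT. It assumes glibc on Linux, and the helper name is_globally_visible is hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_DEFAULT is a GNU extension in glibc's <dlfcn.h>
#endif
#include <dlfcn.h>

// Hypothetical diagnostic helper: returns true if `name` can be resolved
// through the default symbol search order of the calling object. When it is
// called from the main executable, that order is the global scope: the
// executable, its dependencies, and objects opened with RTLD_GLOBAL.
bool is_globally_visible(const char* name) {
    return dlsym(RTLD_DEFAULT, name) != nullptr;
}
```

Called from the loader's main right after dlopen("./libissue.so", RTLD_NOW), is_globally_visible("PyExc_RecursionError") would be expected to return false if the Python library was pulled in only as a private dependency of libissue.so, which would be consistent with the error above. The call has to be made from the executable rather than from inside the shared object, because glibc resolves RTLD_DEFAULT relative to the caller.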
The problem is specific to Linux (not tested on OSX), while on Windows everything works as expected (the code is slightly changed and reported here for completeness):
// main.cc
#include "pybind11/embed.h"
namespace py = pybind11;
extern "C" {
__declspec(dllexport) int main() {
    py::scoped_interpreter guard{};
    auto py_module = py::module::import("numpy");
    auto version = py_module.attr("__version__");
    py::print(version);
    return 0;
}
}
// load.cc
#include <windows.h>
int main() {
    HMODULE lib = LoadLibrary("./issue.dll");
    int(*fnc)(void) = (int(*)(void))GetProcAddress(lib, "main");
    fnc();
    FreeLibrary(lib);
    return 0;
}
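Is there something I am missing?
Notes:
- My first problem was a bug in the pybind cmake, which is why I posted this bug report
- My problem seems similar to the one described in this bug report, but I am not sure, and I am not even sure it is a bug
- The problem is similar to the one described here, but I do not think I am loading the interpreter more than once in the minimal example. I think I have seen an SO question related to the same problem with the same solution (do not load the interpreter more than once), but I cannot find the reference right now.
- I tested several numpy versions (from 1.19 to 1.22, installed from the Ubuntu repository, installed from pip, and built locally), but the problem persists. Only the undefined symbol changes (but it is always a PyExc_ symbol)
- Tested with python3.6 and 3.8 on Ubuntu 18.04 and Ubuntu 20.04
- Tested on pybind 2.6, 2.7, 2.8.1
- I tried to link to the Python static library, but it is not compiled with -fPIC, so the compilation fails...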
Notes on update (1): This does not seem to be strictly related to numpy. If I import decimal (a stdlib numeric class with a C-module component), I get a similar error:
#include "pybind11/embed.h"
namespace py = pybind11;
extern "C" {
int main() {
    py::scoped_interpreter guard{};
    auto py_module = py::module::import("decimal");
    auto version = py_module.attr("__name__");
    py::print(version);
    return 0;
}
}
which gives me
terminate called after throwing an instance of 'pybind11::error_already_set'
what(): ImportError: /usr/lib/python3.8/lib-dynload/_contextvars.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyContextVar_Type
At:
/usr/lib/python3.8/contextvars.py(1): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
/usr/lib/python3.8/_pydecimal.py(440): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
/usr/lib/python3.8/decimal.py(8): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
[1] 3095287 abort (core dumped) ./loader
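I found a solution. Knowing that it was not tied to numpy helped a lot in shifting the focus to the real problem: a missing symbol. I took the suggestion from this answer, and in particular this point: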
Solve a problem. Load the library found in step 1 by dlopen first (use RTLD_GLOBAL there as well).
I modified the minimal example as follows:
// main.cc
#include "pybind11/embed.h"
#include <dlfcn.h>
namespace py = pybind11;
extern "C" {
// Handle to libpython, kept so that destroy() can release it.
void * python;
int create() {
    // RTLD_GLOBAL makes libpython's symbols available to shared objects
    // loaded afterwards, such as the C extension modules Python imports.
    python = dlopen("/usr/lib/x86_64-linux-gnu/libpython3.8.so", RTLD_NOW | RTLD_GLOBAL);
    return 0;
}
int destroy() {
    dlclose(python);
    return 0;
}
int main() {
    py::scoped_interpreter guard{};
    auto py_module = py::module::import("numpy");
    auto version = py_module.attr("__version__");
    py::print(version);
    return 0;
}
}
// load.cc
#include <dlfcn.h>
int main() {
    void * lib = dlopen("./libissue.so", RTLD_NOW | RTLD_DEEPBIND);
    int(*fnc)(void) = (int(*)(void))dlsym(lib, "main");
    int(*create)(void) = (int(*)(void))dlsym(lib, "create");
    int(*destroy)(void) = (int(*)(void))dlsym(lib, "destroy");
    // create() must run before fnc() so that libpython is already global
    // when the interpreter imports its extension modules.
    create();
    fnc();
    destroy();
    dlclose(lib);
    return 0;
}
(Obviously, in cmake I had to add ${CMAKE_DL_LIBS} as a target link library for the issue target.)
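The path to libpython3.8.so in create() is hard-coded, which ties the plugin to one distribution layout. Assuming libissue.so is linked against the shared Python library, that library is already mapped by the time any plugin code runs, so one possible variation is to look up its path with dladdr and reopen it with RTLD_NOLOAD | RTLD_GLOBAL, a combination the dlopen man page describes for promoting an already loaded object to the global scope. The sketch below assumes glibc; promote_to_global is a hypothetical helper name, and this variation is not part of the solution above.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dladdr and RTLD_NOLOAD are GNU extensions in glibc
#endif
#include <dlfcn.h>

// Hypothetical helper: finds the shared object that contains `addr` and
// reopens it with RTLD_GLOBAL so that its symbols join the global scope.
// Returns a handle to pass to dlclose later, or nullptr on failure.
void* promote_to_global(const void* addr) {
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return nullptr;  // the address does not belong to any loaded object
    }
    // RTLD_NOLOAD: do not load anything new, only update the flags of an
    // object that is already resident in the process.
    return dlopen(info.dli_fname, RTLD_NOW | RTLD_GLOBAL | RTLD_NOLOAD);
}
```

Inside the plugin, create() could then store the result of promote_to_global(reinterpret_cast<const void*>(&Py_IsInitialized)) instead of calling dlopen on a fixed path. Py_IsInitialized is used here only as a convenient address that lives inside the Python library; any other symbol exported by it would do.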