Import c-modules from embedded Python interpreter (pybind11) in a shared object raises an undefined symbol exception


Update (1): the same problem shows up with some compiled stdlib modules, so it is not specific to numpy (I removed the numpy tag and "numpy" from the title).

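I am writing a shared object (a plugin for another application) that contains an embedded Python interpreter. The shared object starts the interpreter, and the interpreter imports a Python module to execute. If the imported module uses numpy, I get an undefined symbol error. The exact undefined symbol changes with the Python version or the numpy version, but it is always a struct of the PyExc_* family.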

I have reduced the problem to this minimal example (it consists of two files):

// main.cc
#include "pybind11/embed.h"
namespace py = pybind11;

extern "C" {
int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("numpy");
  auto version   = py_module.attr("__version__");
  py::print(version);
  return 0;
}
}

// load.cc
#include <dlfcn.h>

int main() {
  void * lib = dlopen("./libissue.so", RTLD_NOW);
  int(*fnc)(void) = (int(*)(void))dlsym(lib, "main");
  fnc();
  dlclose(lib);
  return 0;
}
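
As a side note, the loader above does not check the results of dlopen or dlsym, so a failed load could surface only as a crash when the function pointer is called. A minimal sketch of checked wrappers follows; the helper names open_library and find_function are hypothetical and not part of any library used here.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <stdexcept>

// Hypothetical helper: dlopen that throws with the dlerror() text on failure.
void *open_library(const char *path, int flags) {
  void *handle = dlopen(path, flags);
  if (handle == nullptr) {
    const char *msg = dlerror();
    throw std::runtime_error(msg ? msg : "dlopen failed");
  }
  return handle;
}

// Hypothetical helper: look up a function and cast it to the requested
// function-pointer type, throwing if the symbol cannot be resolved.
template <typename Fn>
Fn find_function(void *handle, const char *name) {
  assert(handle != nullptr);
  dlerror();  // clear stale error state before the lookup
  void *sym = dlsym(handle, name);
  const char *msg = dlerror();
  if (msg != nullptr || sym == nullptr) {
    throw std::runtime_error(msg ? msg : "symbol resolved to null");
  }
  return reinterpret_cast<Fn>(sym);
}
```

In load.cc this would be used as `void *lib = open_library("./libissue.so", RTLD_NOW);` followed by `auto fnc = find_function<int (*)(void)>(lib, "main");`. It does not change the symbol-resolution behaviour discussed here; it only makes loading failures visible.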

I compile it with this CMake file:

cmake_minimum_required(VERSION 3.14)

include(FetchContent)
FetchContent_Declare(
  pybind11
  GIT_REPOSITORY https://github.com/pybind/pybind11
  GIT_TAG v2.8.1)
FetchContent_MakeAvailable(pybind11)

project(
  pybind_issue
  LANGUAGES C CXX
  VERSION 1.0.0)

add_library(issue SHARED main.cc)
set_target_properties(issue PROPERTIES 
  POSITION_INDEPENDENT_CODE ON 
  CXX_STANDARD 11)
target_link_libraries(issue PRIVATE pybind11::embed)
# also tested with
# target_link_libraries(main PRIVATE mylib pybind11::lto pybind11::embed pybind11::module)

add_executable(issue_main main.cc)
set_target_properties(issue_main PROPERTIES 
  POSITION_INDEPENDENT_CODE ON
  CXX_STANDARD 11)
target_link_libraries(issue_main PRIVATE pybind11::embed)

add_executable(loader load.cc)
target_link_libraries(loader PRIVATE ${CMAKE_DL_LIBS})

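This CMake file builds three targets: issue, the shared library (libissue.so) built from main.cc; issue_main, an executable built from the same main.cc; and loader, an executable built from load.cc that loads the shared library with dlopen.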

If I run the issue_main executable, the numpy version is printed correctly. If I run loader, I get this error:

terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: 


    https://numpy.org/devdocs/user/troubleshooting-importerror.html

  * The Python version is: Python3.8 from "/usr/bin/python3"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyExc_RecursionError


At:
  /usr/local/lib/python3.8/dist-packages/numpy/core/__init__.py(51): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(1050): _handle_fromlist
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(961): _find_and_load_unlocked

irb(main):003:1* module TestMain
=> #<FFI::Function address=0x00007f9d0ba43bb6>
irb(main):008:0> 
irb(main):009:0> TestMain.main
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.8 from "/usr/bin/python3"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyExc_RecursionError


At:
  /usr/local/lib/python3.8/dist-packages/numpy/core/__init__.py(51): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(1050): _handle_fromlist
  /usr/local/lib/python3.8/dist-packages/numpy/__init__.py(145): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(961): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load

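This problem is specific to Linux (I have not tested on OSX). On Windows everything works as expected (the code is slightly different; I report it here for completeness):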

// main.cc
#include "pybind11/embed.h"
namespace py = pybind11;

extern "C" {
__declspec(dllexport) int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("numpy");
  auto version   = py_module.attr("__version__");
  py::print(version);
  return 0;
}
}
// load.cc
#include <windows.h>

int main() {
  HMODULE lib = LoadLibrary("./issue.dll");
  int(*fnc)(void) = (int(*)(void))GetProcAddress(lib, "main");
  fnc();
  FreeLibrary(lib);
  return 0;
}

Is there something I am missing?

Notes:

Update (1): this does not seem to be related only to numpy. If I import decimal (a standard-library numeric class with a C-module component), I get a similar error:

#include "pybind11/embed.h"
namespace py = pybind11;

extern "C" {
int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("decimal");
  auto version   = py_module.attr("__name__");
  py::print(version);
  return 0;
}
}

which gives me:

terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: /usr/lib/python3.8/lib-dynload/_contextvars.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyContextVar_Type

At:
  /usr/lib/python3.8/contextvars.py(1): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  /usr/lib/python3.8/_pydecimal.py(440): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  /usr/lib/python3.8/decimal.py(8): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load

[1]    3095287 abort (core dumped)  ./loader
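
Both failures have the same shape: a compiled extension module cannot find a symbol that libpython provides. As far as I understand it, such modules are often not linked against libpython themselves and expect the symbols to be in the process-wide (global) scope. A minimal sketch of a check for this is below, assuming glibc on Linux; symbol_is_global and report_python_symbols are hypothetical helper names.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_DEFAULT is a GNU extension in <dlfcn.h>
#endif
#include <dlfcn.h>
#include <cassert>
#include <cstdio>

// Hypothetical helper: true if `name` can be resolved through the default
// lookup scope of the object that calls this function.
bool symbol_is_global(const char *name) {
  assert(name != nullptr);
  dlerror();  // clear stale error state
  return dlsym(RTLD_DEFAULT, name) != nullptr;
}

// Hypothetical helper: print the visibility of the two symbols named in the
// error messages above.
void report_python_symbols() {
  const char *names[] = {"PyExc_RecursionError", "PyContextVar_Type"};
  for (const char *name : names) {
    std::printf("%-22s %s\n", name,
                symbol_is_global(name) ? "visible" : "NOT visible");
  }
}
```

The check is meant to be compiled into the loader executable and called after dlopen("./libissue.so", ...). My understanding is that glibc resolves RTLD_DEFAULT relative to the calling object, so a call made from inside the plugin could also see the plugin's own dependencies and hide the problem.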

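I found a solution. Knowing that it was not related to numpy helped a lot in shifting the focus to the real problem: missing symbols. I followed the advice in this answer, in particular this point: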

Solve a problem. Load the library found in step 1 by dlopen first (use RTLD_GLOBAL there as well).

I modified the minimal example as follows:

// main.cc
#include "pybind11/embed.h"
#include <dlfcn.h>
namespace py = pybind11;

extern "C" {
void * python;

int create() {
  python = dlopen("/usr/lib/x86_64-linux-gnu/libpython3.8.so", RTLD_NOW | RTLD_GLOBAL);
  return 0;
}

int destroy() {
  dlclose(python);
  return 0;
}

int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("numpy");
  auto version   = py_module.attr("__version__");
  py::print(version);
  return 0;
}
}
// load.cc
#include <dlfcn.h>

int main() {
  void * lib = dlopen("./libissue.so", RTLD_NOW | RTLD_DEEPBIND);
  int(*fnc)(void) = (int(*)(void))dlsym(lib, "main");
  int(*create)(void) = (int(*)(void))dlsym(lib, "create");
  int(*destroy)(void) = (int(*)(void))dlsym(lib, "destroy");
  create();
  fnc();
  destroy();
  dlclose(lib);
  return 0;
}

(Obviously, in CMake I had to add ${CMAKE_DL_LIBS} to the link libraries of the issue target.)
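
The hard-coded libpython path in create() ties the plugin to one distribution layout. A possible variation, which I have not tested with the plugin itself, is to ask the dynamic linker which shared object contains a known Python symbol and re-open that object with RTLD_GLOBAL | RTLD_NOLOAD. The dlopen man page describes this flag combination as a way to promote an already loaded object. The helper name promote_to_global is hypothetical, and the sketch assumes glibc on Linux.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dladdr and RTLD_NOLOAD are GNU extensions
#endif
#include <dlfcn.h>
#include <cassert>

// Hypothetical helper: re-open the already loaded shared object that
// contains `addr`, this time with RTLD_GLOBAL, so that its symbols become
// visible to objects loaded later. Returns a handle to pass to dlclose, or
// nullptr if the object could not be identified or re-opened.
void *promote_to_global(const void *addr) {
  assert(addr != nullptr);
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
    return nullptr;
  }
  // RTLD_NOLOAD: do not load anything new, only update the flags of an
  // object that is already resident in the process.
  return dlopen(info.dli_fname, RTLD_NOW | RTLD_GLOBAL | RTLD_NOLOAD);
}
```

Inside the plugin, create() could then call something like `python = promote_to_global(reinterpret_cast<const void *>(&Py_IsInitialized));` instead of naming the library file. This relies on the address of Py_IsInitialized resolving into libpython, which should hold when the plugin links against the shared libpython.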