Calling parallel C++ code in Python using Pybind11

Question

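I have C++ code that runs in parallel with OpenMP and performs some long computations. This part works well.

Now I am building a GUI around this code in Python, so I want to call my C++ code from my Python program. For this I am using Pybind11 (but I suppose I could use something else if needed).

The problem is that when it is called from Python, my C++ code runs serially, on only one thread/CPU.

I tried (in two ways) to do what the pybind11 documentation describes here, but it does not seem to work at all.

My bindings look like this: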

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "../cpp/include/myHeader.hpp"
namespace py = pybind11;

PYBIND11_MODULE(my_module, m) {
    m.def("testFunction", &testFunction, py::call_guard<py::gil_scoped_release>());

    m.def("testFunction2", [](inputType input) -> outputType {
        /* Release GIL before calling into (potentially long-running) C++ code */
        py::gil_scoped_release release;
        outputType output = testFunction(input);
        // No explicit py::gil_scoped_acquire is needed: the GIL is re-acquired
        // when `release` is destroyed at the end of this scope.
        return output;
    });
}

Problem: this still does not work and uses only one thread (I verified this by printing omp_get_num_threads() inside an omp parallel region).
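Where the check described above is placed matters: outside a parallel region `omp_get_num_threads()` always returns 1. A minimal sketch of a more robust check, assuming a hypothetical helper named `count_openmp_threads`, lets one thread of the team record the team size; the `#ifdef _OPENMP` guard makes it compile even when OpenMP is disabled, in which case it reports 1:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns how many threads an OpenMP parallel region
// actually gets. If OpenMP support was not compiled in, the parallel code
// is skipped and the function reports 1.
int count_openmp_threads()
{
  int team_size = 1;
#ifdef _OPENMP
  #pragma omp parallel
  {
    // Exactly one thread writes team_size; the implicit barriers at the end
    // of the single construct and of the parallel region make the value
    // visible once the region has finished.
    #pragma omp single
    team_size = omp_get_num_threads();
  }
#endif
  return team_size;
}
```

If this returns 1 on a multi-core machine when called through the Python module, the parallel region itself is not being parallelized, which points at the build configuration rather than at the GIL.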

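Question: what am I doing wrong? What do I need to do to use parallel C++ code from Python?

Disclaimer: I have to admit I don't really understand the GIL business, especially since my C++ code does not use Python at all and is, in theory, completely "independent". I just want to be able to call it from other (Python) code.

Have a nice day.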

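EDIT: Thanks to pptaszni's answer, I have solved my problem. Indeed, the GIL handling was not needed at all; I had misunderstood the documentation. pptaszni's code works, and the actual problem was in my CMake file. Thanks.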
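Since the root cause turned out to be the CMake configuration, it can help to check from C++ whether the extension module was compiled with OpenMP enabled at all. When the compiler flag (for example `-fopenmp` with GCC) is missing, `#pragma omp` lines are ignored and the `_OPENMP` macro is not defined. A minimal sketch, assuming a hypothetical function name `openmp_build_info`:

```cpp
#include <string>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical diagnostic helper: reports whether this translation unit was
// compiled with OpenMP support and, if so, the maximum number of threads the
// runtime would use for the next parallel region.
std::string openmp_build_info()
{
#ifdef _OPENMP
  // _OPENMP expands to the yyyymm date of the supported OpenMP version,
  // e.g. 201511 for OpenMP 4.5.
  return "OpenMP " + std::to_string(_OPENMP) +
         ", max threads: " + std::to_string(omp_get_max_threads());
#else
  return "compiled without OpenMP";
#endif
}
```

It could be exposed next to the other bindings, e.g. `m.def("openmp_build_info", &openmp_build_info);`, and called from the Python interpreter; the string `"compiled without OpenMP"` would indicate that the OpenMP flags never reached the compiler.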

Answer 1

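This is not a great answer (it is too long for a comment), because I did not reproduce your problem, but maybe you can isolate the problem in your code by trying this example, which works for me:

C++ code: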

#include "OpenMpExample.hpp"

#include <algorithm>
#include <iostream>
#include <numeric>   // std::accumulate
#include <random>
#include <vector>

#include <omp.h>

constexpr int DATA_SIZE = 10000000;

std::vector<int> testFunction()
{
  int nthreads = 0, tid = 0;
  std::vector<std::vector<int> > data;
  std::vector<int> results;
  std::random_device rnd_device;

  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    if (tid == 0)
    {
      nthreads = omp_get_num_threads();
      std::cout << "Num threads: " << nthreads << std::endl;
      data.resize(nthreads);
      results.resize(nthreads);
    }
  }

  // One seed per thread, drawn serially: a single std::mt19937 (or
  // std::random_device) must not be called concurrently from several threads.
  std::vector<unsigned> seeds(nthreads);
  for (auto& s : seeds)
    s = rnd_device();

  #pragma omp parallel private(tid) shared(data, seeds)
  {
    tid = omp_get_thread_num();
    // Each thread owns its engine and distribution, so there is no data race.
    std::mt19937 mersenne_engine {seeds[tid]};
    std::uniform_int_distribution<int> dist {-10, 10};
    data[tid].resize(DATA_SIZE);
    std::generate(data[tid].begin(), data[tid].end(),
                  [&dist, &mersenne_engine]() { return dist(mersenne_engine); });
  }
  #pragma omp parallel private(tid) shared(data, results)
  {
    tid = omp_get_thread_num();
    results[tid] = std::accumulate(data[tid].begin(), data[tid].end(), 0);
  }
  for (auto r : results)
  {
    std::cout << r << ", ";
  }
  std::cout << std::endl;
  return results;
}

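I tried to keep the code short while still forcing the machine to do some real computation: each thread generates 10^7 random integers and then sums them. The Python binding then does not even need gil_scoped_release: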

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "OpenMpExample.hpp"
namespace py = pybind11;

// both versions work for me
// PYBIND11_MODULE(mylib, m) {
//     m.def("testFunction", &testFunction, py::call_guard<py::gil_scoped_release>());
// }

PYBIND11_MODULE(mylib, m) {
    m.def("testFunction", &testFunction);
}

Example output from Python:

Python 3.6.8 (default, Jun 29 2020, 16:38:14) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mylib
>>> x = mylib.testFunction()
Num threads: 12
-10975, -22101, -11333, -28603, -471, -15505, -18141, 2887, -6813, -5328, -13975, -4321,

My environment: Ubuntu 18.04.3 LTS, gcc 8.4.0, OpenMP 201511, Python 3.6.8.
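For comparison, the same kind of workload (generate random integers and sum them) is often written as a `parallel for` loop with a `reduction` clause instead of per-thread buffers indexed by thread id, which also avoids storing 10^7 integers per thread. This is a minimal sketch, not the answer author's code; the function name `parallel_random_sum` and the seeding scheme (base seed plus thread number) are illustrative assumptions:

```cpp
#include <random>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical variant of testFunction: sums n random integers in [-10, 10].
// Each thread owns its own engine, seeded from base_seed plus its thread
// number, so no random-number state is shared between threads.
long long parallel_random_sum(long long n, unsigned base_seed)
{
  long long total = 0;
  #pragma omp parallel reduction(+ : total)
  {
    int tid = 0;
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    std::mt19937 engine{base_seed + static_cast<unsigned>(tid)};
    std::uniform_int_distribution<int> dist{-10, 10};
    // The loop iterations are divided among the threads of the team; each
    // thread adds into its private copy of total, and the copies are summed
    // when the parallel region ends.
    #pragma omp for
    for (long long i = 0; i < n; ++i)
      total += dist(engine);
  }
  return total;
}
```

Because each thread uses its own seed, the exact sum depends on the number of threads; like the original example, it is only meant to keep all cores busy so the thread count can be observed.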
