Static OpenCL class not properly released in Python module using boost.python

Question

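Edit: OK, all the edits made the layout of the question a bit messy, so I will try to rewrite the question (without changing the content, but improving its structure).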

Short description of the problem

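I have an OpenCL program that works fine when I compile it as an executable. Now I am trying to call it from Python using boost.python. However, as soon as I exit Python (after importing my module), Python crashes.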

The reason seems to have something to do with statically storing only GPU CommandQueues and their release mechanism when the program terminates.

MWE and setup

Setup

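IDE: Visual Studio 2015
OS: Windows 7 64-bit
Python version: 3.5
AMD OpenCL APP 3.0 headers
cl2.hpp taken directly from Khronos, as suggested here:
I also have an Intel CPU with integrated graphics hardware and no other dedicated graphics card
I use boost library version 1.60, compiled as a 64-bit build
The boost DLL I use is called boost_python-vc140-mt-1_60.dll
The OpenCL program without Python works fine
The Python module without OpenCL works fine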

MWE

#include <algorithm> // std::remove_if
#include <vector>

#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_MINIMUM_OPENCL_VERSION 200 // I have the same issue for 100 and 110
#include "cl2.hpp"
#include <boost/python.hpp>

using namespace std;

class TestClass
{
private:
    std::vector<cl::CommandQueue> queues;
    TestClass();

public:
    static const TestClass& getInstance()
    {
        static TestClass instance;
        return instance;
    }
};

TestClass::TestClass()
{
    std::vector<cl::Device> devices;
    vector<cl::Platform> platforms;

    cl::Platform::get(&platforms);

    //remove non 2.0 platforms (as suggested by doqtor)
    platforms.erase(
        std::remove_if(platforms.begin(), platforms.end(),
            [](const cl::Platform& platform)
    {
        int v = cl::detail::getPlatformVersion(platform());
        short version_major = v >> 16;
        return !(version_major >= 2);
    }),
        platforms.end());

    //Get all available GPUs
    for (const cl::Platform& pl : platforms)
    {
        vector<cl::Device> plDevices;
        try {
            pl.getDevices(CL_DEVICE_TYPE_GPU, &plDevices);
        }
        catch (cl::Error&)
        {

            // Doesn't matter. No GPU is available on the current machine for 
            // this platform. Just check afterwards, that you have at least one
            // device
            continue;
        }       
        devices.insert(end(devices), begin(plDevices), end(plDevices));
    }

    if (devices.empty())
        return; // no GPU found on any OpenCL 2.x platform

    cl::Context context(devices[0]);
    cl::CommandQueue queue(context, devices[0]);

    queues.push_back(queue);
}

int main()
{
    TestClass::getInstance();

    return 0;
}

BOOST_PYTHON_MODULE(FrameWork)
{
    TestClass::getInstance();
}

Calling the program

So after compiling the program as a DLL, I start Python and run the following script:

import FrameWork
exit()

The import works without problems, but Python crashes on exit(). When I click Debug, Visual Studio reports an exception in the following part of cl2.hpp:

template <>
struct ReferenceHandler<cl_command_queue>
{
    static cl_int retain(cl_command_queue queue)
    { return ::clRetainCommandQueue(queue); }
    static cl_int release(cl_command_queue queue)  //  --  HERE  --
    { return ::clReleaseCommandQueue(queue); }
};

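If you compile the code above as a simple executable, it runs fine. The code also works if one of the following conditions is met:

CL_DEVICE_TYPE_GPU is replaced by CL_DEVICE_TYPE_ALL
the line queues.push_back(queue) is removed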

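Question

So what could be the reason for this, and what are possible solutions? I suspect it has something to do with the fact that my test class is static, but since it works with the executable I am at a loss as to what is causing it.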

Answer 1

I have run into a similar problem before.

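The clRetain*/clRelease* functions for devices (clRetainDevice, clReleaseDevice) are only supported from OpenCL 1.2 onward. When getting devices for the first GPU platform (platforms[0].getDevices(...) for CL_DEVICE_TYPE_GPU), in your case that platform must happen to be a pre-OpenCL 1.2 one, hence the crash. When getting devices of any type (GPU/CPU/...), your first platform becomes an OpenCL 1.2+ one and everything works.

To fix the problem, set: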

#define CL_HPP_MINIMUM_OPENCL_VERSION 110

This ensures that clRetain* is not called for unsupported (pre-OpenCL 1.2) platforms.
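cl2.hpp's cl::detail::getPlatformVersion() reads the platform's CL_PLATFORM_VERSION string ("OpenCL <major>.<minor> <vendor info>") and packs it into one integer with the major version in the upper 16 bits and the minor version in the lower 16, which is why the code above uses `v >> 16`. The following is a minimal, self-contained sketch that mimics that packing so the version filter can be tried without an OpenCL runtime; it is not the cl2.hpp code, and packOpenCLVersion, supportsDeviceRetain and keepOpenCL2 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: parses "OpenCL <major>.<minor> ..." and returns
// (major << 16) | minor, the same layout the question's `v >> 16` assumes.
std::uint32_t packOpenCLVersion(const std::string& versionString)
{
    const std::string prefix = "OpenCL ";
    assert(versionString.compare(0, prefix.size(), prefix) == 0);

    std::size_t i = prefix.size();
    std::uint32_t majorVer = 0;
    while (i < versionString.size() && versionString[i] != '.')
        majorVer = majorVer * 10 + static_cast<std::uint32_t>(versionString[i++] - '0');
    ++i; // skip the '.'
    std::uint32_t minorVer = 0;
    while (i < versionString.size() && versionString[i] != ' ')
        minorVer = minorVer * 10 + static_cast<std::uint32_t>(versionString[i++] - '0');
    return (majorVer << 16) | minorVer;
}

// True for OpenCL >= 1.2, the version that introduced clRetainDevice and
// clReleaseDevice.
bool supportsDeviceRetain(std::uint32_t packed)
{
    return packed >= ((1u << 16) | 2u);
}

// Keeps the version strings of platforms reporting OpenCL >= 2.0, i.e. the
// same test as `(v >> 16) >= 2` in the question's remove_if lambda.
std::vector<std::string> keepOpenCL2(const std::vector<std::string>& versions)
{
    std::vector<std::string> kept;
    for (const auto& v : versions)
        if ((packOpenCLVersion(v) >> 16) >= 2)
            kept.push_back(v);
    return kept;
}
```

Packing both numbers into one integer turns "at least version X.Y" into a single unsigned comparison, which is what supportsDeviceRetain does.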

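Update: I think there is a bug in cl2.hpp: despite the minimum OpenCL version being set to 1.1, it still tries to use clRetain* (an OpenCL 1.2 function) on the device when creating a command queue. Setting the minimum OpenCL version to 110 and filtering platforms by version works fine for me.

Full working example: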

#include "stdafx.h" // Visual Studio precompiled header; drop outside a VS project
#include <vector>

#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_MINIMUM_OPENCL_VERSION 110
#include <CL/cl2.hpp>

using namespace std;

class TestClass
{
private:
    std::vector<cl::CommandQueue> queues;
    TestClass();

public:
    static const TestClass& getInstance()
    {
        static TestClass instance;
        return instance;
    }
};

TestClass::TestClass()
{
    std::vector<cl::Device> devices;
    vector<cl::Platform> platforms;

    cl::Platform::get(&platforms);

    size_t x = 0;
    for (; x < platforms.size(); ++x)
    {
        cl::Platform &p = platforms[x];
        int v = cl::detail::getPlatformVersion(p());
        short version_major = v >> 16;
        if (version_major >= 2) // OpenCL 2.x
            break;
    }
    if (x == platforms.size())
        return; // no OpenCL 2.0 platform available

    platforms[x].getDevices(CL_DEVICE_TYPE_GPU, &devices);
    if (devices.empty())
        return; // no GPU on this platform
    cl::Context context(devices);
    cl::CommandQueue queue(context, devices[0]);

    queues.push_back(queue); 
}

int main()
{
    TestClass::getInstance();
    return 0;
}

Update 2:

So what could be the reason for this and what are possible solutions? I suspect it has something to do with the fact that my testclass is static, but since it works with the executable I am at a loss what is causing it.

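TestClass being static seems to be one reason. When running from Python, it looks like memory is released in the wrong order. To fix this, you may want to add a method that must be called explicitly to release the OpenCL objects before Python starts releasing memory.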

static TestClass& getInstance() // <- const removed
{
    static TestClass instance;
    return instance;
}

void release()
{
    queues.clear();
}

BOOST_PYTHON_MODULE(FrameWork)
{
    TestClass::getInstance();
    TestClass::getInstance().release();
}
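The explicit-release pattern can be exercised without OpenCL or boost.python. In this minimal sketch, FakeQueue is a hypothetical stand-in for cl::CommandQueue that counts live handles, and TestClass mirrors the class above with a non-const getInstance() and a release() method. Note that calling release() right after getInstance() inside BOOST_PYTHON_MODULE, as in the snippet above, frees the queue immediately. In a real module one would more likely expose a free function that calls TestClass::getInstance().release() (for instance with boost::python::def) and invoke it before the interpreter shuts down, e.g. by registering it with Python's atexit module; that wiring is an assumption and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cl::CommandQueue: counts live handles so we can
// check that everything was released before static destruction would run.
class FakeQueue {
public:
    static int live;
    FakeQueue() { ++live; }
    FakeQueue(const FakeQueue&) { ++live; }
    FakeQueue& operator=(const FakeQueue&) = default;
    ~FakeQueue() { --live; }
};
int FakeQueue::live = 0;

class TestClass {
public:
    // Non-const reference, so release() can be called on the instance.
    static TestClass& getInstance()
    {
        static TestClass instance;
        return instance;
    }

    // Drops all queues at a point the caller controls, instead of leaving
    // it to the static destructor that runs during process exit.
    void release() { queues.clear(); }

    std::size_t queueCount() const { return queues.size(); }

private:
    TestClass() { queues.push_back(FakeQueue()); }
    std::vector<FakeQueue> queues;
};
```

Because release() only clears the vector, calling it twice is harmless, and the static destructor later finds nothing left to release.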

Answer 2

"I would appreciate an answer that explains to me what the problem actually is and if there are ways to fix it."

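First, let me say that doqtor has already answered how to fix the problem: by ensuring a well-defined destruction time for all OpenCL resources in use. IMO, this is not a "hack" but the right thing to do. Trying to rely on static init/cleanup magic to do the right thing, and watching it fail, is the real hack!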

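Second, some thoughts on the problem: the actual problem is more complicated than the usual static initialization order fiasco story. It involves the DLL loading/unloading order, both in connection with Python loading your custom DLL at runtime and (more importantly) with OpenCL's installable client driver (ICD) model.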

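Which DLLs are involved when running an application or DLL that uses OpenCL? For the application, the only relevant DLL is the opencl.dll you link against. It is loaded into process memory during application startup (or when your custom DLL that needs OpenCL is dynamically loaded in Python). Then, the first time you call clGetPlatformInfo() or similar in your code, the ICD logic kicks in: opencl.dll looks for installed drivers (on Windows, these are listed somewhere in the registry) and dynamically loads their respective DLLs (using something like the LoadLibrary() system call). That might be, for example, nvopencl.dll for NVIDIA, or some other DLL for the Intel driver you have installed. Now, in contrast to the relatively simple opencl.dll, this ICD DLL can and will have lots of dependencies of its own, perhaps Intel IPP, or TBB, or whatever. So by now, things have already become quite messy.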
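The delegation from opencl.dll to the vendor DLL can be sketched in a few lines. Under the ICD model, every object a vendor driver hands out starts with a pointer to that driver's dispatch table, and the loader forwards API calls such as clReleaseCommandQueue through that pointer. This minimal sketch models only the forwarding step; all names (FakeDispatch, FakeQueueObject, vendorReleaseCommandQueue, fakeLoaderReleaseCommandQueue) are hypothetical, and the real dispatch table has one entry per OpenCL API function.

```cpp
#include <cassert>

// Hypothetical, heavily simplified dispatch table with a single entry.
struct FakeDispatch {
    int (*releaseCommandQueue)(void* queue);
};

// Every object created by a vendor driver starts with a pointer to that
// vendor's dispatch table; the remaining fields are private to the driver.
struct FakeQueueObject {
    const FakeDispatch* dispatch;
    int refCount;
};

// "Vendor driver" code, which in reality lives in the ICD DLL.
// Returns 0, standing in for CL_SUCCESS.
int vendorReleaseCommandQueue(void* queue)
{
    auto* q = static_cast<FakeQueueObject*>(queue);
    --q->refCount;
    return 0;
}
const FakeDispatch vendorTable = { &vendorReleaseCommandQueue };

// "Loader" code, i.e. the role of opencl.dll: it only knows the object's
// first field and forwards the call into the vendor's code.
int fakeLoaderReleaseCommandQueue(void* queue)
{
    auto* q = static_cast<FakeQueueObject*>(queue);
    return q->dispatch->releaseCommandQueue(queue);
}
```

The loader's side stays tiny, but every release ends up executing code inside the vendor DLL, so that DLL has to still be loaded when the last queue is released.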

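Now, during shutdown, the Windows loader has to decide which DLLs to unload, and in which order. When you compile the example into a single executable, the number and order of DLLs being loaded and unloaded will certainly differ from the "Python loads your custom DLL at runtime" scenario. That is quite likely why you only see the problem in the latter case, and only if an OpenCL context and command queue are still alive while your custom DLL shuts down. The destruction of the queue (triggered via clRelease... during static destruction of your test class instance) is delegated to the Intel ICD DLL, so that DLL must still be fully functional at that point. If, for whatever reason, it is not (perhaps because the loader chose to unload it, or one of the DLLs it depends on), you crash.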
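The ordering problem can be modeled without Windows or OpenCL. In this minimal sketch, FakeDriver is a hypothetical stand-in for the ICD DLL, "unloading" is just a flag, and FakeQueue plays the role of cl::CommandQueue by calling into the driver from its destructor. It contrasts an explicit release before the unload with a release that only arrives after the driver is gone; the latter is where a real process would crash.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the vendor ICD DLL. "Unloading" only clears a
// flag here; in a real process the code would be gone and the call would crash.
struct FakeDriver {
    bool loaded = true;
    int releasesAfterUnload = 0;

    void releaseQueue()
    {
        if (!loaded)
            ++releasesAfterUnload; // the real-world crash
    }
};

// Stand-in for cl::CommandQueue: its destructor calls into the driver,
// much like cl::CommandQueue's destructor ends up in clReleaseCommandQueue.
class FakeQueue {
public:
    explicit FakeQueue(FakeDriver& driver) : driver_(driver) {}
    ~FakeQueue() { driver_.releaseQueue(); }
    FakeQueue(const FakeQueue&) = delete;
    FakeQueue& operator=(const FakeQueue&) = delete;

private:
    FakeDriver& driver_;
};

// Simulates shutdown in one of two orders and reports whether every
// release reached a driver that was still loaded.
bool shutdownIsSafe(bool releaseBeforeUnload)
{
    FakeDriver driver;
    auto queue = std::make_unique<FakeQueue>(driver);
    if (releaseBeforeUnload) {
        queue.reset();         // explicit cleanup, like TestClass::release()
        driver.loaded = false; // then the loader unloads the driver
    } else {
        driver.loaded = false; // the loader unloads the driver first...
        queue.reset();         // ...and the static destructor runs afterwards
    }
    return driver.releasesAfterUnload == 0;
}
```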

This line of thought reminded me of this article:

https://blogs.msdn.microsoft.com/larryosterman/2004/06/10/dll_process_detach-is-the-last-thing-my-dlls-going-to-see-right/

There is a paragraph about "COM objects" that might apply equally well to "OpenCL resources":

"So consider the case where you have a DLL that instantiates a COM object at some point during its lifetime. If that DLL keeps a reference to the COM object in a global variable, and doesn’t release the COM object until the DLL_PROCESS_DETACH, then the DLL that implements the COM object will be kept in memory during the lifetime of the COM object. Effectively the DLL implementing the COM object has become dependant on the DLL that holds the reference to the COM object. But the loader has no way of knowing about this dependency. All it knows is that the DLL’s are loaded into memory."

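Now, I have written a lot of words without arriving at any definitive proof of what exactly goes wrong. The main lesson I have learned from bugs like these: don't go into that snake pit, and do your resource cleanup in well-defined places, as doqtor suggested. Good night.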

Tags: c++, python, boost, opencl, boost-python