How do the CUDA Runtime's current device and the driver context stack interact?

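The CUDA Runtime has a notion of a "current device", while the CUDA driver does not. Instead, the driver has a context stack, where the "current context" is the one at the top of the stack.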

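How do the two interact? That is, how do driver API calls affect the Runtime API's current device, and how does changing the current device affect the driver API's context stack or other state?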

Somewhat related question:

Runtime current device -> driver context stack

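If you set the current device (with cudaSetDevice()), the primary context of the chosen device is placed at the top of the stack:

  • If the stack is empty, it is pushed onto the stack.
  • If the stack is non-empty, it replaces the top of the stack.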

Driver context stack -> runtime current device

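(I'm not 100% sure about this part, so take it with a grain of salt.)

The runtime reports the current device as the device of the current context, regardless of whether that context is a primary context.

If the context stack is empty, the runtime reports the current device as 0.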

A program illustrating this behavior:

#include <cuda/api.hpp>
#include <iostream>

// Print the device that the CUDA Runtime currently considers "current"
void report_current_device()
{
    std::cout << "Runtime believes the current device index is: "
        << cuda::device::current::detail_::get_id() << '\n';
}

int main()
{
    namespace context = cuda::context::detail_;
    namespace cur_dev = cuda::device::current::detail_;
    namespace pc = cuda::device::primary_context::detail_;
    namespace cur_ctx = cuda::context::current::detail_;
    using std::cout;

    // Note: this demonstration needs at least two CUDA devices
    cuda::device::id_t dev_idx[2];
    cuda::context::handle_t pc_handle[2];

    cuda::initialize_driver();
    // Nothing has been pushed yet, so the context stack is empty here
    dev_idx[0] = cur_dev::get_id();
    report_current_device();
    dev_idx[1] = (dev_idx[0] == 0) ? 1 : 0;

    // Retaining the primary contexts does not make either of them current
    pc_handle[0] = pc::obtain_and_increase_refcount(dev_idx[0]);
    cout << "Obtained primary context handle for device " << dev_idx[0] << '\n';
    pc_handle[1] = pc::obtain_and_increase_refcount(dev_idx[1]);
    cout << "Obtained primary context handle for device " << dev_idx[1] << '\n';
    report_current_device();

    cur_ctx::push(pc_handle[1]);
    cout << "Pushed primary context handle for device " << dev_idx[1] << " onto the stack\n";
    report_current_device();

    // Create a new, non-primary context for device 0 (it becomes the top of the stack)
    auto ctx = context::create_and_push(dev_idx[0]);
    cout << "Created a new context for device " << dev_idx[0] << " and pushed it onto the stack\n";
    report_current_device();

    cur_ctx::push(pc_handle[0]);
    cout << "Pushed primary context handle for device " << dev_idx[0] << " onto the stack\n";
    report_current_device();

    cur_ctx::push(pc_handle[1]);
    cout << "Pushed primary context for device " << dev_idx[1] << " onto the stack\n";
    report_current_device();

    // Drop the reference taken above on device 1's primary context
    pc::decrease_refcount(dev_idx[1]);
    cout << "Deactivated/destroyed primary context for device " << dev_idx[1] << '\n';
    report_current_device();
}

... and the output is:

Runtime believes the current device index is: 0
Obtained primary context handle for device 0
Obtained primary context handle for device 1
Runtime believes the current device index is: 0
Pushed primary context handle for device 1 onto the stack
Runtime believes the current device index is: 1
Created a new context for device 0 and pushed it onto the stack
Runtime believes the current device index is: 0
Pushed primary context handle for device 0 onto the stack
Runtime believes the current device index is: 0
Pushed primary context for device 1 onto the stack
Runtime believes the current device index is: 1
Deactivated/destroyed primary context for device 1
Runtime believes the current device index is: 1

The program uses this library of mine.