How do the CUDA Runtime's current device and the driver context stack interact?
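The CUDA Runtime has a notion of a "current device", while the CUDA Driver does not. Instead, the driver has a stack of contexts, where the "current context" is at the top of the stack.
How do the two interact? That is, how do Driver API calls affect the Runtime API's current device, and how does changing the current device affect the Driver API's context stack or other state?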
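Runtime current device -> Driver context stack
If you set the current device (with cudaSetDevice()), the primary context of the chosen device is placed at the top of the stack:
- If the stack was empty, it is pushed onto the stack.
- If the stack was non-empty, it replaces the top of the stack.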
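The two cases above can be sketched as a toy model. This is not CUDA code: `Context`, `ContextStack` and `model_set_device` are hypothetical names introduced only for illustration, and the model assumes the behavior is exactly as described in the list.

```cpp
#include <vector>

// Toy model only: these types are hypothetical and not part of any CUDA API.
struct Context {
    int  device;      // index of the device the context belongs to
    bool is_primary;  // true for a device's primary context
};

using ContextStack = std::vector<Context>;  // back() is the top of the stack

// Models the described effect of cudaSetDevice(device) on the context stack.
void model_set_device(ContextStack& stack, int device)
{
    Context primary{device, true};
    if (stack.empty()) {
        stack.push_back(primary);  // empty stack: the primary context is pushed
    } else {
        stack.back() = primary;    // non-empty stack: the top entry is replaced
    }
}
```

In this model the stack depth only changes when the stack was empty; otherwise only the top entry changes, and whatever sat below it stays in place.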
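Driver context stack -> Runtime current device
(I'm not 100% sure about this part, so take it with a grain of salt.)
The runtime reports the current device as the device of the current context, whether or not it is a primary context.
If the context stack is empty, the runtime reports the current device as 0.
A program illustrating this behavior: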
#include <cuda/api.hpp>
#include <iostream>

// Print the device index that the Runtime API currently reports.
void report_current_device()
{
    std::cout << "Runtime believes the current device index is: "
        << cuda::device::current::detail_::get_id() << '\n';
}

int main()
{
    namespace context = cuda::context::detail_;
    namespace cur_dev = cuda::device::current::detail_;
    namespace pc = cuda::device::primary_context::detail_;
    namespace cur_ctx = cuda::context::current::detail_;
    using std::cout;

    cuda::device::id_t dev_idx[2];
    cuda::context::handle_t pc_handle[2];

    cuda::initialize_driver();

    // No context has been pushed explicitly at this point.
    dev_idx[0] = cur_dev::get_id();
    report_current_device();
    dev_idx[1] = (dev_idx[0] == 0) ? 1 : 0;  // some other device; needs at least two GPUs

    // Get handles to the primary contexts of both devices (no explicit push yet).
    pc_handle[0] = pc::obtain_and_increase_refcount(dev_idx[0]);
    cout << "Obtained primary context handle for device " << dev_idx[0] << '\n';
    pc_handle[1] = pc::obtain_and_increase_refcount(dev_idx[1]);
    cout << "Obtained primary context handle for device " << dev_idx[1] << '\n';
    report_current_device();

    // Push the primary context of the second device.
    cur_ctx::push(pc_handle[1]);
    cout << "Pushed primary context handle for device " << dev_idx[1] << " onto the stack\n";
    report_current_device();

    // Create a new, non-primary context for the first device; it is pushed on creation.
    auto ctx = context::create_and_push(dev_idx[0]);
    cout << "Created a new context for device " << dev_idx[0] << " and pushed it onto the stack\n";
    report_current_device();

    // Push that same non-primary context again.
    // Note: the message below calls it a primary context, which it is not.
    cur_ctx::push(ctx);
    cout << "Pushed primary context handle for device " << dev_idx[0] << " onto the stack\n";
    report_current_device();

    // Push the primary context of the second device once more.
    cur_ctx::push(pc_handle[1]);
    cout << "Pushed primary context for device " << dev_idx[1] << " onto the stack\n";
    report_current_device();

    // Drop one reference to that primary context while it is still on top of the stack.
    pc::decrease_refcount(dev_idx[1]);
    cout << "Deactivated/destroyed primary context for device " << dev_idx[1] << '\n';
    report_current_device();
}
... and the result is:
Runtime believes the current device index is: 0
Obtained primary context handle for device 0
Obtained primary context handle for device 1
Runtime believes the current device index is: 0
Pushed primary context handle for device 1 onto the stack
Runtime believes the current device index is: 1
Created a new context for device 0 and pushed it onto the stack
Runtime believes the current device index is: 0
Pushed primary context handle for device 0 onto the stack
Runtime believes the current device index is: 0
Pushed primary context for device 1 onto the stack
Runtime believes the current device index is: 1
Deactivated/destroyed primary context for device 1
Runtime believes the current device index is: 1
The program uses this library of mine.
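For readers without two GPUs at hand, the reporting rule can be replayed with a toy model that needs no CUDA installation. This is only a sketch of the behavior as described in this answer, not of how the runtime is implemented: `Context`, `model_current_device` and `replay_trace` are hypothetical names, and the replay assumes the initial device is 0 and the other device is 1, as in the output above.

```cpp
#include <vector>

// Toy model only: hypothetical types, not part of any CUDA API.
struct Context {
    int  device;      // index of the device the context belongs to
    bool is_primary;  // true for a device's primary context
};

// Models what the runtime is described to report: the device of the top
// context, or 0 when the context stack is empty.
int model_current_device(const std::vector<Context>& stack)
{
    return stack.empty() ? 0 : stack.back().device;
}

// Replays the pushes performed by the program and records each report.
std::vector<int> replay_trace()
{
    std::vector<Context> stack;  // back() is the top of the stack
    std::vector<int> reported;

    reported.push_back(model_current_device(stack));  // nothing pushed yet
    // Obtaining primary context handles does not push anything in this model.
    reported.push_back(model_current_device(stack));

    Context pc1{1, true};    // primary context of device 1
    Context ctx0{0, false};  // newly created, non-primary context on device 0

    stack.push_back(pc1);
    reported.push_back(model_current_device(stack));
    stack.push_back(ctx0);   // create-and-push
    reported.push_back(model_current_device(stack));
    stack.push_back(ctx0);   // the same context pushed again
    reported.push_back(model_current_device(stack));
    stack.push_back(pc1);
    reported.push_back(model_current_device(stack));

    // Dropping a reference to the primary context leaves the stack untouched here.
    reported.push_back(model_current_device(stack));
    return reported;
}
```

The recorded sequence matches the seven "Runtime believes..." lines of the output above, which is consistent with the rule stated in the answer but does not prove it for other call sequences.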