How to find where a program crashed when a CUDA API error is detected: cudaMemcpy returned (0xb)

I am debugging a CUDA program and get the following warnings:

warning: Cuda API error detected: cudaMemcpy returned (0xb)

warning: Cuda API error detected: cudaMemcpy returned (0xb)

warning: Cuda API error detected: cudaGetLastError returned (0xb)

Error in kernel
GPUassert: invalid argument

When I type "where" in cuda-gdb, it reports "No stack.":

(cuda-gdb) where
No stack.

How can I find where my program crashed?

I found the answer here: http://on-demand.gputechconf.com/gtc/2012/presentations/S0027A-Monday-Debugging-Experience-CUDA.pdf (page 27).

You first need to run:

(cuda-gdb) set cuda api_failures stop

Then, when the error occurs, execution stops and a stack is available:

Cuda API error detected: cudaMemcpy returned (0xb)
(cuda-gdb) where
#0  0x00007fffea6a06d0 in cudbgReportDriverApiError () from /usr/lib64/nvidia/libcuda.so.1
#1  0x00007fffea6a2c36 in cudbgReportDriverInternalError () from /usr/lib64/nvidia/libcuda.so.1
#2  0x00007fffea6eed93 in cudbgGetAPIVersion () from /usr/lib64/nvidia/libcuda.so.1
...
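
As a complement to the debugger setting, the failing call can also be located from the program itself. The "GPUassert: invalid argument" line in the question suggests an error-checking macro is already in use; the asker's actual macro is not shown, so the following is only a minimal sketch of that common pattern. The names gpuErrchk and gpuAssert, the buffer size, and the main function are assumptions made for illustration. Wrapping each runtime call this way prints the source file and line of the first call that returns an error.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not taken from the question): reports the error
// string plus the file and line of the wrapped call, then exits.
inline void gpuAssert(cudaError_t code, const char *file, int line)
{
    if (code != cudaSuccess) {
        std::fprintf(stderr, "GPUassert: %s %s:%d\n",
                     cudaGetErrorString(code), file, line);
        std::exit(EXIT_FAILURE);
    }
}

// Wrap a CUDA runtime call so that its call site is recorded.
#define gpuErrchk(call) gpuAssert((call), __FILE__, __LINE__)

int main()
{
    const size_t n = 256;          // illustrative buffer size
    float host[n] = {0};
    float *dev = nullptr;

    // Each call is checked individually, so a bad argument to cudaMemcpy
    // is reported at the line of that specific call.
    gpuErrchk(cudaMalloc(&dev, n * sizeof(float)));
    gpuErrchk(cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice));
    gpuErrchk(cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost));
    gpuErrchk(cudaFree(dev));
    return 0;
}
```

If the existing macro prints only the error string, as the output in the question seems to, adding __FILE__ and __LINE__ to it should be enough to identify which cudaMemcpy received the invalid argument.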