OpenACC-compiled code: cuStreamSynchronize returned error 700

I compiled a program that uses a few simple OpenACC directives. It compiles fine with no errors. However, when I run the program, I get a generic "call to cuStreamSynchronize returned error 700: Illegal address during kernel execution" error.

I ran cuda-memcheck and got the errors below. Can anyone help me figure out what the problem is?

========= CUDA-MEMCHECK
simpleGridingRatio: 300
========= Invalid __global__ read of size 4
=========     at 0x000007a8 in /home/forwardSolver/ChannelCppSolver.h:135:void linearDiscretization_135_gpu<double>(caseProp<double>&, std::vector<double, std::allocator<double>>&, std::vector<double, std::allocator<double>>&, std::vector<double, std::allocator<double>>&, std::vector<double, std::allocator<double>>&)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7ffca4f9a7b0 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x2fe) [0x28187e]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch3 + 0x1d59) [0x1a64a]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so [0x1b392]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch + 0x13a) [0x1b4ce]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccg.so (__pgi_uacc_launch + 0x1ff) [0x18f92]
=========     Host Frame:./ChannelCppProposal [0x2ffd5]
=========     Host Frame:./ChannelCppProposal [0x2dfe4]
=========     Host Frame:./ChannelCppProposal [0x2dd77]
=========     Host Frame:./ChannelCppProposal [0x2fcc5]
=========     Host Frame:./ChannelCppProposal [0x2eaf7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./ChannelCppProposal [0x65fa]
=========
========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to "unspecified launch failure" on CUDA API call to cuStreamSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuStreamSynchronize + 0x165) [0x281355]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch3 + 0x20c9) [0x1a9ba]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so [0x1b392]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch + 0x13a) [0x1b4ce]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccg.so (__pgi_uacc_launch + 0x1ff) [0x18f92]
=========     Host Frame:./ChannelCppProposal [0x2ffd5]
=========     Host Frame:./ChannelCppProposal [0x2dfe4]
=========     Host Frame:./ChannelCppProposal [0x2dd77]
=========     Host Frame:./ChannelCppProposal [0x2fcc5]
=========     Host Frame:./ChannelCppProposal [0x2eaf7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./ChannelCppProposal [0x65fa]
=========
Failing in Thread:1
========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to "unspecified launch failure" on CUDA API call to cuCtxSynchronize. 
=========     Saved host backtrace up to driver entry point at error
call to cuStreamSynchronize returned error 719: Launch failed (often invalid pointer dereference)

=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuCtxSynchronize + 0x152) [0x258c22]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_error_handler + 0x258) [0xef30]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch3 + 0x20ec) [0x1a9dd]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so [0x1b392]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch + 0x13a) [0x1b4ce]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccg.so (__pgi_uacc_launch + 0x1ff) [0x18f92]
=========     Host Frame:./ChannelCppProposal [0x2ffd5]
=========     Host Frame:./ChannelCppProposal [0x2dfe4]
=========     Host Frame:./ChannelCppProposal [0x2dd77]
=========     Host Frame:./ChannelCppProposal [0x2fcc5]
=========     Host Frame:./ChannelCppProposal [0x2eaf7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./ChannelCppProposal [0x65fa]
=========
========= ERROR SUMMARY: 3 errors

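"Illegal address during kernel execution" is the device equivalent of a segmentation violation (segv) on the host: the kernel dereferenced an address it is not allowed to access.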

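While I can't be certain, "Address 0x7ffca4f9a7b0" looks to me like a host address (addresses starting with 0x7ff are typical of the host stack on x86-64 Linux).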

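Also, judging from the signature of linearDiscretization_135_gpu, it looks like you are using vectors in your code. How are you managing the data for these vectors? A vector is an opaque class containing three pointers. Because OpenACC data regions perform a shallow copy, if you put a vector in a data clause, only those pointers are copied, not the data they point to. So, if I'm right about this being a host address, one possible cause is that you are copying a vector: its host pointer values are copied to the device, and dereferencing them there produces the illegal address error.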

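For vectors you need to do a manual deep copy or, if you are using PGI, try compiling with "-ta=tesla:managed" so that CUDA Unified Memory is used. The vectors' data pointers will then be unified addresses that are accessible from both the host and the device.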

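Of course, this is pure speculation, so you may need to investigate further. You can try setting the environment variable PGI_ACC_DEBUG=1 (for PGI) or CRAY_ACC_DEBUG=1 (for Cray) to make the runtime print detailed information. I'm not sure whether GNU provides an equivalent environment variable for its OpenACC implementation.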

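If you need more help investigating, please post a small example that reproduces the problem, and we can see whether we can pin down the cause.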