Is it possible to call cublas functions from a device function?

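Here, Robert Crovella said that cublas routines can be called from device code. Although I am using dynamic parallelism and compiling for compute capability 3.5, I cannot manage to call cublas routines from a device function. I always get the error "calling a __host__ function from a __device__/__global__ function is not allowed". My code contains device functions that call CUBLAS routines such as cublasAlloc, cublasGetVector, cublasSetVector and cublasDgemm.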

My compile and link commands:

  
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc GPUutil.cu -o ./build/GPUutil.o   
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc DivideParalelo.cu -o ./build/DivideParalelo.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -dlink ./build/io.o ./build/GPUutil.o ./build/DivideParalelo.o -lcudadevrt -o ./build/link.o
icc -Wwrite-strings ./build/GPUutil.o ./build/DivideParalelo.o ./build/link.o -lcudadevrt -L/usr/local/cuda/lib64  -L~/Intel/composer_xe_2015.0.090/mkl/lib/intel64  -L~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64  -Wl,--start-group ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_core.a ~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64/libiomp5.a -Wl,--end-group -lpthread  -lm  -lcublas -lcudart   -o DivideParalelo   
 

Here you can find all the details about the cuBLAS device API, such as:

Starting with release 5.0, the CUDA Toolkit now provides a static cuBLAS Library cublas_device.a that contains device routines with the same API as the regular cuBLAS Library. Those routines use internally the Dynamic Parallelism feature to launch kernel from within and thus is only available for device with compute capability at least equal to 3.5.

In order to use those library routines from the device the user must include the header file “cublas_v2.h” corresponding to the new cuBLAS API and link against the static cuBLAS library cublas_device.a.
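As a rough illustration of what the quoted passage describes, here is a minimal sketch of a kernel that calls cublasDgemm from device code. The kernel name deviceDgemm, the file name dgemm_dev.cu and the matrix size are made up for the example, and the build line in the comment is an assumption based on the quoted documentation. It presumes a toolkit that still ships cublas_device.a (the library was removed in CUDA 10.0) and a GPU with compute capability 3.5 or higher. Host/device transfer helpers such as cublasSetVector are assumed to stay on the host side, so the sketch uses plain cudaMemcpy there.

```cuda
// Sketch only. Assumed build line (file name is hypothetical):
//   nvcc -arch=sm_35 -rdc=true dgemm_dev.cu -o dgemm_dev -lcublas -lcublas_device -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical kernel: a single thread creates a cuBLAS handle on the device
// and calls cublasDgemm. alpha and beta are kept in global memory so that the
// child kernels launched by cuBLAS can read them.
__global__ void deviceDgemm(cublasStatus_t *status, int n,
                            const double *alpha, const double *A,
                            const double *B, const double *beta, double *C)
{
    cublasHandle_t handle;
    cublasStatus_t st = cublasCreate(&handle);
    if (st == CUBLAS_STATUS_SUCCESS) {
        // C = alpha * A * B + beta * C, column-major, all matrices n x n
        st = cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                         alpha, A, n, B, n, beta, C, n);
        cublasDestroy(handle);
    }
    *status = st;
}

int main()
{
    const int n = 4;
    const size_t bytes = n * n * sizeof(double);
    std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n, 0.0);
    const double hAlpha = 1.0, hBeta = 0.0;

    double *dA, *dB, *dC, *dAlpha, *dBeta;
    cublasStatus_t *dStatus;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMalloc(&dAlpha, sizeof(double));
    cudaMalloc(&dBeta, sizeof(double));
    cudaMalloc(&dStatus, sizeof(cublasStatus_t));

    // Host-to-device copies are done with the runtime API on the host.
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dAlpha, &hAlpha, sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dBeta, &hBeta, sizeof(double), cudaMemcpyHostToDevice);

    deviceDgemm<<<1, 1>>>(dStatus, n, dAlpha, dA, dB, dBeta, dC);
    cudaError_t err = cudaDeviceSynchronize();

    cublasStatus_t hStatus;
    cudaMemcpy(&hStatus, dStatus, sizeof(cublasStatus_t), cudaMemcpyDeviceToHost);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);

    // With A filled with 1.0 and B with 2.0, every entry of C should equal 2 * n.
    printf("cuda: %s, cublas status: %d, C[0] = %f\n",
           cudaGetErrorString(err), (int)hStatus, hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaFree(dAlpha); cudaFree(dBeta); cudaFree(dStatus);
    return (err == cudaSuccess && hStatus == CUBLAS_STATUS_SUCCESS) ? 0 : 1;
}
```

For a multi-step build like the one in the question, this would presumably mean adding -lcublas_device to both the -dlink step and the final link line, since neither currently names that library.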

If you still have problems after reading through the documentation and applying all the steps described there, ask for additional help.