CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE in Python

I get this error when I try to run the code below in Python using CUDA. I am following this tutorial, but I am trying it on a Windows 7 x64 machine.

https://www.youtube.com/watch?v=jKV1m8APttU

In fact, I ran check_cuda() and all the tests passed. Can anyone help me figure out what exactly the problem is here?

My code:

import numpy as np
from timeit import default_timer as timer
from numbapro import vectorize, cuda

@vectorize(['float64(float64, float64)'], target='gpu')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000

    A = np.ones(N, dtype=np.float64)
    B = np.ones(N, dtype=np.float64)
    C = np.zeros(N, dtype=np.float64)

    start = timer()
    C = VectorAdd(A, B)
    vectoradd_time = timer() - start

    print("C[:5] = " + str(C[:5]))
    print("C[-5:] = " + str(C[-5:]))

    print("VectorAdd took %f seconds" % vectoradd_time)

if __name__ == '__main__':
    main()

Error message:

---------------------------------------------------------------------------
CudaAPIError                              Traceback (most recent call last)
<ipython-input-18-2436fc2ab63a> in <module>()
      1 if __name__ == '__main__':
----> 2     main()

<ipython-input-17-64de53fdbe77> in main()
      7 
      8     start = timer()
----> 9     C = VectorAdd(A, B)
     10     vectoradd_time = timer() - start
     11 

C:\Anaconda2\lib\site-packages\numba\cuda\dispatcher.pyc in __call__(self, *args, **kws)
     93                       the input arguments.
     94         """
---> 95         return CUDAUFuncMechanism.call(self.functions, args, kws)
     96 
     97     def reduce(self, arg, stream=0):

C:\Anaconda2\lib\site-packages\numba\npyufunc\deviceufunc.pyc in call(cls, typemap, args, kws)
    297 
    298             devarys.extend([devout])
--> 299             cr.launch(func, shape[0], stream, devarys)
    300 
    301             if any_device:

C:\Anaconda2\lib\site-packages\numba\cuda\dispatcher.pyc in launch(self, func, count, stream, args)
    202 
    203     def launch(self, func, count, stream, args):
--> 204         func.forall(count, stream=stream)(*args)
    205 
    206     def is_device_array(self, obj):

C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in __call__(self, *args)
    193 
    194         return kernel.configure(blkct, tpb, stream=self.stream,
--> 195                                 sharedmem=self.sharedmem)(*args)
    196 
    197 class CUDAKernelBase(object):

C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in __call__(self, *args, **kwargs)
    357                           blockdim=self.blockdim,
    358                           stream=self.stream,
--> 359                           sharedmem=self.sharedmem)
    360 
    361     def bind(self):

C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in _kernel_call(self, args, griddim, blockdim, stream, sharedmem)
    431                                    sharedmem=sharedmem)
    432         # Invoke kernel
--> 433         cu_func(*kernelargs)
    434 
    435         if self.debug:

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in __call__(self, *args)
   1114 
   1115         launch_kernel(self.handle, self.griddim, self.blockdim,
-> 1116                       self.sharedmem, streamhandle, args)
   1117 
   1118     @property

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in launch_kernel(cufunc_handle, griddim, blockdim, sharedmem, hstream, args)
   1158                           hstream,
   1159                           params,
-> 1160                           None)
   1161 
   1162 

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in safe_cuda_api_call(*args)
    220         def safe_cuda_api_call(*args):
    221             retcode = libfn(*args)
--> 222             self._check_error(fname, retcode)
    223 
    224         setattr(self, fname, safe_cuda_api_call)

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in _check_error(self, fname, retcode)
    250             errname = ERROR_MAP.get(retcode, "UNKNOWN_CUDA_ERROR")
    251             msg = "Call to %s results in %s" % (fname, errname)
--> 252             raise CudaAPIError(retcode, msg)
    253 
    254     def get_device(self, devnum=0):

CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

I found the solution to my problem through the NVIDIA developer forum. For more details on the solution, see this link:

https://devtalk.nvidia.com/default/topic/962843/cuda-programming-and-performance/cudaapierror-1-call-to-culaunchkernel-results-in-cuda_error_invalid_value-in-python/?offset=3#4968130

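In short:

  • When I change N to 32000, or any other smaller value, it works fine.
  • In fact, this means that I was not compiling it for the correct GPU type (check_cuda is the function call to verify that).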
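To see why a smaller N can make the launch succeed, here is a rough sketch of the launch arithmetic. It assumes a 1-D launch with one thread per element, and it assumes a grid limit of 65535 blocks in the x dimension, which is the limit on compute capability 2.x and older devices. The threads-per-block values are hypothetical; I did not check which value numba actually picked on my machine.

```python
# Assumed limit, for illustration only: 65535 is the x-dimension grid limit
# on compute capability 2.x and older; newer devices allow far more blocks.
MAX_GRID_DIM_X = 65535


def blocks_needed(n_elements, threads_per_block):
    """Blocks required for a 1-D launch with one thread per element."""
    # Ceiling division, so a partially filled last block is still counted.
    return (n_elements + threads_per_block - 1) // threads_per_block


def fits_in_grid(n_elements, threads_per_block, max_grid_x=MAX_GRID_DIM_X):
    """True if the required block count stays within the assumed grid limit."""
    return blocks_needed(n_elements, threads_per_block) <= max_grid_x


# Hypothetical threads-per-block values; the real one is chosen by the library.
for n in (32000, 32000000):
    for tpb in (128, 1024):
        print(n, tpb, blocks_needed(n, tpb), fits_in_grid(n, tpb))
```

Under these assumptions, N = 32000 fits easily with either block size, while N = 32000000 with 128 threads per block would need 250000 blocks, which is over the assumed limit.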

I hope my answer is helpful to everyone.

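This probably means that you are trying to run more threads in one block than is actually allowed. That was the case for me. So try to split your execution into chunks.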
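A minimal sketch of what running in chunks could look like. The helper apply_in_chunks and the stand-in add_elementwise are names I made up for illustration and are not part of numba or numbapro. The stand-in is pure Python so the sketch runs without a GPU.

```python
def apply_in_chunks(func, a, b, out, chunk_size):
    """Call func on consecutive slices of a and b, writing each result into out."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n = len(a)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Each call sees at most chunk_size elements, so a GPU-backed func
        # would launch a correspondingly smaller grid per call.
        out[start:stop] = func(a[start:stop], b[start:stop])
    return out


def add_elementwise(x, y):
    """Pure-Python stand-in for VectorAdd (hypothetical, for illustration)."""
    return [p + q for p, q in zip(x, y)]


a = [1.0] * 10
b = [1.0] * 10
c = apply_in_chunks(add_elementwise, a, b, [0.0] * 10, chunk_size=4)
print(c[:5])
```

With the NumPy arrays from the question, the same helper should work by passing VectorAdd as func, for example apply_in_chunks(VectorAdd, A, B, C, chunk_size=1000000), since slice assignment works the same way on arrays. The chunk size there is an arbitrary value; pick one that your device accepts.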