Is the Maxwell architecture supported in NumbaPro?

I want to use the NumbaPro API to run CUDA kernels from Python. I have this code:

import math
from timeit import default_timer as timer  # timer() is used in main() to time the GPU run
import numpy
from numbapro import jit, cuda, int32, float32
from matplotlib import pyplot

@cuda.jit('void(float32[:], float32[:], float32[:], float32[:], float32, float32, float32, int32)')
def calculate_velocity_field(X, Y, u_source, v_source, x_source, y_source, strength_source, N):
    start  = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    end    = N
    stride = cuda.gridDim.x * cuda.blockDim.x
    for i in range(start, end, stride):
        u_source[i] = strength_source/(2*math.pi) * (X[i]-x_source)/((X[i]-x_source)**2 + (Y[i]-y_source)**2)
        v_source[i] = strength_source/(2*math.pi) * (Y[i]-y_source)/((X[i]-x_source)**2 + (Y[i]-y_source)**2)


def main():
    N = 200                                # number of points in each direction
    x_start, x_end = -4.0, 4.0            # boundaries in the x-direction
    y_start, y_end = -2.0, 2.0            # boundaries in the y-direction
    x = numpy.linspace(x_start, x_end, N)    # creates a 1D-array with the x-coordinates
    y = numpy.linspace(y_start, y_end, N)    # creates a 1D-array with the y-coordinates

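    X, Y = numpy.meshgrid(x, y)              # generates a mesh grid
    # the kernel signature expects 1D float32 arrays, so flatten and cast
    X = X.ravel().astype(numpy.float32)
    Y = Y.ravel().astype(numpy.float32)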

    strength_source = 5.0                      # source strength
    x_source, y_source = -1.0, 0.0             # location of the source

    start = timer()

    #calculate grid dimensions
    blockSize = 1024
    gridSize  = int(math.ceil(float(N)/blockSize))

    #transfer memory to device
    X_d        = cuda.to_device(X)
    Y_d        = cuda.to_device(Y)
    u_source_d = cuda.device_array_like(X)
    v_source_d = cuda.device_array_like(Y)

    #launch kernel over every grid point (X holds N*N values, not N)
    calculate_velocity_field[gridSize,blockSize](X_d,Y_d,u_source_d,v_source_d,x_source,y_source,strength_source,X.size)

    #transfer memory to host
    u_source = numpy.empty_like(X)
    v_source = numpy.empty_like(Y)
    u_source_d.to_host(u_source)
    v_source_d.to_host(v_source)

    elapsed_time = timer() - start
    print("Exec time with GPU %f s" % elapsed_time)

if __name__ == "__main__":
    main()

It gives me this error:

NvvmError                                 Traceback (most recent call last)
<ipython-input-17-85e4a6e56a14> in <module>()
----> 1 @cuda.jit('void(float32[:], float32[:], float32[:], float32[:], float32, float32, float32, int32)')
      2 def calculate_velocity_field(X, Y, u_source, v_source, x_source, y_source, strength_source, N):
      3     start  = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
      4     end    = N
      5     stride = cuda.gridDim.x * cuda.blockDim.x

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/decorators.py in kernel_jit(func)
     89             # Force compilation for the current context
     90             if bind:
---> 91                 kernel.bind()
     92 
     93             return kernel

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in bind(self)
    319         Force binding to current CUDA context
    320         """
--> 321         self._func.get()
    322 
    323     @property

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in get(self)
    254         cufunc = self.cache.get(device.id)
    255         if cufunc is None:
--> 256             ptx = self.ptx.get()
    257 
    258             # Link

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in get(self)
    226             arch = nvvm.get_arch_option(*cc)
    227             ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 228                                    **self._extra_options)
    229             self.cache[cc] = ptx
    230             if config.DUMP_ASSEMBLY:

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in llvm_to_ptx(llvmir, **opts)
    420     cu.add_module(llvmir.encode('utf8'))
    421     cu.add_module(libdevice.get())
--> 422     ptx = cu.compile(**opts)
    423     return ptx
    424 

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in compile(self, **options)
    211                                           for x in opts])
    212         err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
--> 213         self._try_error(err, 'Failed to compile\n')
    214 
    215         # get result

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in _try_error(self, err, msg)
    229 
    230     def _try_error(self, err, msg):
--> 231         self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
    232 
    233     def get_log(self):

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in check_error(self, error, msg, exit)
    118                 sys.exit(1)
    119             else:
--> 120                 raise exc
    121 
    122 

NvvmError: Failed to compile

libnvvm : error: -arch=compute_52 is an unsupported option
NVVM_ERROR_INVALID_OPTION
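
The compute_52 in the last lines of the log is derived from the device's compute capability: the traceback shows numba calling nvvm.get_arch_option(*cc) and passing the result to libnvvm as the arch option. The following is a minimal sketch of that mapping, not numba's actual implementation. The helper names arch_option and detect_compute_capability are hypothetical, and the device query assumes a numba installation that exposes numba.cuda.get_current_device().

```python
def arch_option(major, minor):
    """Format a compute capability such as (5, 2) as an NVVM arch name."""
    return "compute_%d%d" % (major, minor)


def detect_compute_capability():
    """Return (major, minor) for the current GPU, or None if unavailable."""
    try:
        # Assumption: numba with CUDA support is installed on this machine.
        from numba import cuda
        return tuple(cuda.get_current_device().compute_capability)
    except Exception:
        # numba is missing, or no CUDA driver/device could be initialised.
        return None
```

For the GTX 970 in this question, arch_option(5, 2) yields compute_52, which is the value the installed libnvvm rejected.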

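I tried another NumbaPro example and got the same error. I don't know whether this is a NumbaPro bug (no support for compute capability 5.2) or a problem with NVIDIA's NVVM. Any suggestions?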

In theory it should be supported, but I don't know what is going on.

I am using Linux with CUDA 7.0 and driver version 346.29.
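
Independently of the compile error, the kernel's arithmetic can be cross-checked on the CPU. The sketch below computes the same source-flow velocity components with plain NumPy. The name velocity_field_reference is hypothetical, and the sketch assumes the v component is meant to use y_source in its numerator.

```python
import math
import numpy


def velocity_field_reference(X, Y, x_source, y_source, strength_source):
    """CPU reference for the velocity induced by a point source."""
    dx = X - x_source
    dy = Y - y_source
    r2 = dx**2 + dy**2                        # squared distance to the source
    factor = strength_source / (2 * math.pi)
    return factor * dx / r2, factor * dy / r2


# Same grid and source parameters as in the question.
x = numpy.linspace(-4.0, 4.0, 200)
y = numpy.linspace(-2.0, 2.0, 200)
X, Y = numpy.meshgrid(x, y)
u_ref, v_ref = velocity_field_reference(X, Y, -1.0, 0.0, 5.0)
```

Once the kernel compiles, comparing u_ref.ravel() with the array copied back from the device via numpy.allclose (with a tolerance suited to float32) is one way to validate the GPU result.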

I finally found a solution.

  • Solution 1:

conda update cudatoolkit

Fetching package metadata: ....
# All requested packages already installed.
# packages in environment at ~/.anaconda3:
#
cudatoolkit               6.0                          p0

It seems that updating the CUDA toolkit this way did not bring it up to CUDA 7.0 (conda still reports 6.0). The second solution does:

  • Solution 2:

conda install -c numba cudatoolkit

Fetching package metadata: ...... 
Solving package specifications: .
Package plan for installation in environment ~/.anaconda3:

The following packages will be downloaded:
     package                    |            build
    ---------------------------|-----------------
    cudatoolkit-7.0            |                1       190.8 MB

The following packages will be UPDATED:

    cudatoolkit: 6.0-p0 --> 7.0-1

Proceed ([y]/n)? y

Before:

In [4]: check_cuda()
------------------------------libraries detection-------------------------------
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.6.0.37
    trying to open library...   ok
Finding cusparse
    located at ~/.anaconda3/lib/libcusparse.so.6.0.37
    trying to open library...   ok
Finding cufft
    located at ~/.anaconda3/lib/libcufft.so.6.0.37
    trying to open library...   ok
Finding curand
    located at ~/.anaconda3/lib/libcurand.so.6.0.37
    trying to open library...   ok
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.2.0.0
    trying to open library...   ok
    finding libdevice for compute_20... ok
    finding libdevice for compute_30... ok
    finding libdevice for compute_35... ok
-------------------------------hardware detection-------------------------------
Found 1 CUDA devices
id 0      b'GeForce GTX 970'                              [SUPPORTED]
                      compute capability: 5.2
                           pci device id: 0
                              pci bus id: 7
Summary:
    1/1 devices are supported
PASSED
Out[4]: True

After:

In [6]:  check_cuda()
------------------------------libraries detection-------------------------------
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.7.0.28
    trying to open library...   ok
Finding cusparse
    located at ~/.anaconda3/lib/libcusparse.so.7.0.28
    trying to open library...   ok
Finding cufft
    located at ~/.anaconda3/lib/libcufft.so.7.0.35
    trying to open library...   ok
Finding curand
    located at ~/.anaconda3/lib/libcurand.so.7.0.28
    trying to open library...   ok
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.3.0.0
    trying to open library...   ok
    finding libdevice for compute_20... ok
    finding libdevice for compute_30... ok
    finding libdevice for compute_35... ok
-------------------------------hardware detection-------------------------------
Found 1 CUDA devices
id 0      b'GeForce GTX 970'                              [SUPPORTED]
                      compute capability: 5.2
                           pci device id: 0
                              pci bus id: 7
Summary:
    1/1 devices are supported
PASSED
Out[6]:  True
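
The relevant difference between the two reports is the library versions, in particular libnvvm going from 2.0.0 to 3.0.0. As a small sketch, the versions can be pulled out of such a report with a regular expression. The helper library_versions is hypothetical, and the line format is assumed to match the check_cuda output shown above.

```python
import re

# Matches lines like "    located at ~/.anaconda3/lib/libnvvm.so.3.0.0"
_LOCATED = re.compile(r"located at \S*/lib(\w+)\.so\.([\d.]+)")


def library_versions(report):
    """Map library name -> version string found in a check_cuda-style report."""
    return {m.group(1): m.group(2) for m in _LOCATED.finditer(report)}


# Excerpts copied from the "Before" and "After" reports.
before = """
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.6.0.37
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.2.0.0
"""
after = """
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.7.0.28
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.3.0.0
"""

old, new = library_versions(before), library_versions(after)
for name in sorted(old):
    print("%-8s %s -> %s" % (name, old[name], new.get(name, "missing")))
```

Running the same helper on a full report would show whether every CUDA library, not only libnvvm, was moved to the 7.0 series.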