CUDA Error - Kernel execution failed with invalid device function

After successfully compiling cuda-convnet2, I tried to run it on CIFAR10 and got this error:

src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function .

I am running Linux with a Zotac NVIDIA GeForce 750 Ti GPU. Here is the log output:

$ python convnet.py --data-provider cifar --test-range 6 --train-range 1-5 --data-path cifar/cifar-10-py-colmajor --inner-size 24 --save-path cifar/save/ --gpu 0 --layer-def layers/layers-cifar10-11pct.cfg --layer-params layers/layer-params-cifar10-11pct.cfg
python: can't open file 'convnet.py': [Errno 2] No such file or directory
pbu@pbu-OptiPlex-740-Enhanced:~/Desktop$ cd cuda-convnet2
pbu@pbu-OptiPlex-740-Enhanced:~/Desktop/cuda-convnet2$ python convnet.py --data-provider cifar --test-range 6 --train-range 1-5 --data-path cifar/cifar-10-py-colmajor --inner-size 24 --save-path cifar/save/ --gpu 0 --layer-def layers/layers-cifar10-11pct.cfg --layer-params layers/layer-params-cifar10-11pct.cfg
Initialized data layer 'data', producing 1728 outputs
Initialized data layer 'labels', producing 1 outputs
Initialized convolutional layer 'conv1' on GPUs 0, producing 24x24 64-channel output
Initialized max-pooling layer 'pool1' on GPUs 0, producing 12x12 64-channel output
Initialized cross-map response-normalization layer 'rnorm1' on GPUs 0, producing 12x12 64-channel output
Initialized convolutional layer 'conv2' on GPUs 0, producing 12x12 64-channel output
Initialized cross-map response-normalization layer 'rnorm2' on GPUs 0, producing 12x12 64-channel output
Initialized max-pooling layer 'pool2' on GPUs 0, producing 6x6 64-channel output
Initialized locally-connected layer 'local3' on GPUs 0, producing 6x6 64-channel output
Initialized locally-connected layer 'local4' on GPUs 0, producing 6x6 32-channel output
Initialized fully-connected layer 'fc10' on GPUs 0, producing 10 outputs
Initialized softmax layer 'probs' on GPUs 0, producing 10 outputs
Initialized logistic regression cost 'logprob' on GPUs 0
Initialized neuron layer 'conv2_neuron' on GPUs 0, producing 9216 outputs
Initialized neuron layer 'conv1_neuron' on GPUs 0, producing 36864 outputs
Initialized neuron layer 'local4_neuron' on GPUs 0, producing 1152 outputs
Initialized neuron layer 'local3_neuron' on GPUs 0, producing 2304 outputs
Layer local4_neuron using acts from layer local4
Layer conv2_neuron using acts from layer conv2
Layer local3_neuron using acts from layer local3
Layer conv1_neuron using acts from layer conv1
=========================
Importing cudaconvnet._ConvNet C++ module
Fwd terminal: logprob
found bwd terminal conv1[0] in passIdx=0
=========================
Training ConvNet
Add PCA noise to color channels with given scale                        : 0     [DEFAULT]
Check gradients and quit?                                               : 0     [DEFAULT]
Conserve GPU memory (slower)?                                           : 0     [DEFAULT]
Convert given conv layers to unshared local                             :       
Cropped DP: crop size (0 = don't crop)                                  : 24    
Cropped DP: test on multiple patches?                                   : 0     [DEFAULT]
Data batch range: testing                                               : 6-6   
Data batch range: training                                              : 1-5   
Data path                                                               : cifar/cifar-10-py-colmajor 
Data provider                                                           : cifar 
Force save before quitting                                              : 0     [DEFAULT]
GPU override                                                            : 0     
Layer definition file                                                   : layers/layers-cifar10-11pct.cfg 
Layer file path prefix                                                  :       [DEFAULT]
Layer parameter file                                                    : layers/layer-params-cifar10-11pct.cfg 
Load file                                                               :       [DEFAULT]
Logreg cost layer name (for --test-out)                                 :       [DEFAULT]
Minibatch size                                                          : 128   [DEFAULT]
Number of epochs                                                        : 50000 [DEFAULT]
Output test case predictions to given path                              :       [DEFAULT]
Save file override                                                      :       
Save path                                                               : cifar/save/ 
Subtract this scalar from image (-1 = don't)                            : -1    [DEFAULT]
Test and quit?                                                          : 0     [DEFAULT]
Test on one batch at a time?                                            : 1     [DEFAULT]
Testing frequency                                                       : 57    [DEFAULT]
Unshare weight matrices in given layers                                 :       
Write test data features from given layer                               :       [DEFAULT]
Write test data features to this path (to be used with --write-features):       [DEFAULT]
=========================
Running on CUDA device(s) 0
Current time: Thu Jan 15 20:15:50 2015
Saving checkpoints to cifar/save/ConvNet__2015-01-15_20.15.47
=========================
src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function .

You probably need to modify these Makefiles:

  • cudaconv3/Makefile
  • cudaconvnet/Makefile
  • nvmatrix/Makefile

and change

GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS   := $(GENCODE_SM35)

to

GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_SM50    := -gencode arch=compute_50,code=sm_50
GENCODE_FLAGS   := $(GENCODE_SM50)

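because the 750 Ti has compute capability 5.0. A binary built only for sm_35 contains no code that a compute capability 5.0 device can load, which is what "invalid device function" is reporting. Rebuild the project after editing the Makefiles so that the kernels are recompiled for sm_50.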
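If you are unsure which flag your card needs, the mapping from compute capability to the -gencode option can be sketched as below. This is only an illustration: the helper names gencode_flag and has_exact_target are made up for this example and are not part of cuda-convnet2 or the CUDA toolkit, and the check is deliberately simplified to an exact sm_XX match (it ignores embedded PTX and same-major compatibility).

```python
import re


def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc -gencode option from a capability string such as '5.0'.

    Hypothetical helper for illustration only.
    """
    major, minor = compute_capability.strip().split(".")
    arch = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def has_exact_target(gencode_flags: str, compute_capability: str) -> bool:
    """Check whether the expanded GENCODE_FLAGS value names sm_XX for this device.

    Simplification: only an exact sm_XX match counts.
    """
    major, minor = compute_capability.strip().split(".")
    wanted = f"{int(major)}{int(minor)}"
    return wanted in re.findall(r"code=sm_(\d+)", gencode_flags)


if __name__ == "__main__":
    original = "-gencode arch=compute_35,code=sm_35"
    print(gencode_flag("5.0"))
    print(has_exact_target(original, "5.0"))             # original Makefile setting
    print(has_exact_target(gencode_flag("5.0"), "5.0"))  # after the change
```

The compute capability itself can be read from the deviceQuery sample that ships with the CUDA toolkit samples. If the same build also has to run on a compute capability 3.5 card, listing both variables in GENCODE_FLAGS should work, since nvcc accepts multiple -gencode options.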