CUDA Error - Kernel execution failed with invalid device function
After successfully compiling cuda-convnet2, I tried to run the CIFAR-10 example and got this error:
src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function .
I am running Linux with a Zotac NVIDIA GeForce GTX 750 Ti GPU. Here is the log output:
$ python convnet.py --data-provider cifar --test-range 6 --train-range 1-5 --data-path cifar/cifar-10-py-colmajor --inner-size 24 --save-path cifar/save/ --gpu 0 --layer-def layers/layers-cifar10-11pct.cfg --layer-params layers/layer-params-cifar10-11pct.cfg
python: can't open file 'convnet.py': [Errno 2] No such file or directory
pbu@pbu-OptiPlex-740-Enhanced:~/Desktop$ cd cuda-convnet2
pbu@pbu-OptiPlex-740-Enhanced:~/Desktop/cuda-convnet2$ python convnet.py --data-provider cifar --test-range 6 --train-range 1-5 --data-path cifar/cifar-10-py-colmajor --inner-size 24 --save-path cifar/save/ --gpu 0 --layer-def layers/layers-cifar10-11pct.cfg --layer-params layers/layer-params-cifar10-11pct.cfg
Initialized data layer 'data', producing 1728 outputs
Initialized data layer 'labels', producing 1 outputs
Initialized convolutional layer 'conv1' on GPUs 0, producing 24x24 64-channel output
Initialized max-pooling layer 'pool1' on GPUs 0, producing 12x12 64-channel output
Initialized cross-map response-normalization layer 'rnorm1' on GPUs 0, producing 12x12 64-channel output
Initialized convolutional layer 'conv2' on GPUs 0, producing 12x12 64-channel output
Initialized cross-map response-normalization layer 'rnorm2' on GPUs 0, producing 12x12 64-channel output
Initialized max-pooling layer 'pool2' on GPUs 0, producing 6x6 64-channel output
Initialized locally-connected layer 'local3' on GPUs 0, producing 6x6 64-channel output
Initialized locally-connected layer 'local4' on GPUs 0, producing 6x6 32-channel output
Initialized fully-connected layer 'fc10' on GPUs 0, producing 10 outputs
Initialized softmax layer 'probs' on GPUs 0, producing 10 outputs
Initialized logistic regression cost 'logprob' on GPUs 0
Initialized neuron layer 'conv2_neuron' on GPUs 0, producing 9216 outputs
Initialized neuron layer 'conv1_neuron' on GPUs 0, producing 36864 outputs
Initialized neuron layer 'local4_neuron' on GPUs 0, producing 1152 outputs
Initialized neuron layer 'local3_neuron' on GPUs 0, producing 2304 outputs
Layer local4_neuron using acts from layer local4
Layer conv2_neuron using acts from layer conv2
Layer local3_neuron using acts from layer local3
Layer conv1_neuron using acts from layer conv1
=========================
Importing cudaconvnet._ConvNet C++ module
Fwd terminal: logprob
found bwd terminal conv1[0] in passIdx=0
=========================
Training ConvNet
Add PCA noise to color channels with given scale : 0 [DEFAULT]
Check gradients and quit? : 0 [DEFAULT]
Conserve GPU memory (slower)? : 0 [DEFAULT]
Convert given conv layers to unshared local :
Cropped DP: crop size (0 = don't crop) : 24
Cropped DP: test on multiple patches? : 0 [DEFAULT]
Data batch range: testing : 6-6
Data batch range: training : 1-5
Data path : cifar/cifar-10-py-colmajor
Data provider : cifar
Force save before quitting : 0 [DEFAULT]
GPU override : 0
Layer definition file : layers/layers-cifar10-11pct.cfg
Layer file path prefix : [DEFAULT]
Layer parameter file : layers/layer-params-cifar10-11pct.cfg
Load file : [DEFAULT]
Logreg cost layer name (for --test-out) : [DEFAULT]
Minibatch size : 128 [DEFAULT]
Number of epochs : 50000 [DEFAULT]
Output test case predictions to given path : [DEFAULT]
Save file override :
Save path : cifar/save/
Subtract this scalar from image (-1 = don't) : -1 [DEFAULT]
Test and quit? : 0 [DEFAULT]
Test on one batch at a time? : 1 [DEFAULT]
Testing frequency : 57 [DEFAULT]
Unshare weight matrices in given layers :
Write test data features from given layer : [DEFAULT]
Write test data features to this path (to be used with --write-features): [DEFAULT]
=========================
Running on CUDA device(s) 0
Current time: Thu Jan 15 20:15:50 2015
Saving checkpoints to cifar/save/ConvNet__2015-01-15_20.15.47
=========================
src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function .
You probably need to modify these Makefiles:
- cudaconv3/Makefile
- cudaconvnet/Makefile
- nvmatrix/Makefile
and change
GENCODE_SM35 := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS := $(GENCODE_SM35)
to
GENCODE_SM35 := -gencode arch=compute_35,code=sm_35
GENCODE_SM50 := -gencode arch=compute_50,code=sm_50
GENCODE_FLAGS := $(GENCODE_SM50)
because the 750 Ti has compute capability 5.0, while the original flags build device code only for compute capability 3.5.
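For other GPUs, the same pattern should apply with a different number. Here is a minimal sketch of how the flag follows from the compute capability. Note that gencode_flag is a hypothetical helper written for illustration, not part of cuda-convnet2, and it assumes the capability is given as a "major.minor" string:

```python
def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc -gencode flag from a compute capability such as "5.0".

    Hypothetical helper for illustration only; not part of cuda-convnet2.
    """
    major, minor = compute_capability.strip().split(".")
    # nvcc names architectures by joining the two digits: 5.0 -> 50, 3.5 -> 35
    sm = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{sm},code=sm_{sm}"


if __name__ == "__main__":
    # The 750 Ti from the question has compute capability 5.0
    print(gencode_flag("5.0"))
```

For the 750 Ti this prints the same value as the GENCODE_SM50 line above. If the build has to run on both kinds of card, nvcc also accepts several -gencode options at once, so listing both variables in GENCODE_FLAGS should work as well.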