Caffe compiled fine with cudnn however runtest fails with error: CUDNN_STATUS_ARCH_MISMATCH
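When running runtest with Caffe, I get the output below. Compiling with cuDNN went fine and produced no errors. I have also included the output of build_release/tools/caffe device_query -gpu <0,1> for the NVIDIA Tesla GPUs, running CUDA driver and runtime version 7.0. Can anyone help?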
[----------] 1 test from SolverTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN ] SolverTest/0.TestInitTrainTestNets
F0907 18:53:28.279698 309 cudnn_softmax_layer.cpp:19] Check failed: status == CUDNN_STATUS_SUCCESS (6 vs. 0) CUDNN_STATUS_ARCH_MISMATCH
*** Check failure stack trace: ***
@ 0x2b8a426cfdaa (unknown)
@ 0x2b8a426cfce4 (unknown)
@ 0x2b8a426cf6e6 (unknown)
@ 0x2b8a426d2687 (unknown)
@ 0x2b8a4404b3c5 caffe::CuDNNSoftmaxLayer<>::LayerSetUp()
@ 0x2b8a440bf8a7 caffe::SoftmaxWithLossLayer<>::LayerSetUp()
@ 0x2b8a440eb9dd caffe::Net<>::Init()
@ 0x2b8a440eca25 caffe::Net<>::Net()
@ 0x2b8a4410335a caffe::Solver<>::InitTrainNet()
@ 0x2b8a44104354 caffe::Solver<>::Init()
@ 0x2b8a44104659 caffe::Solver<>::Solver()
@ 0x787e9c caffe::SolverTest<>::InitSolverFromProtoString()
@ 0x785170 caffe::SolverTest_TestInitTrainTestNets_Test<>::TestBody()
@ 0x7e9943 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x7e0627 testing::Test::Run()
@ 0x7e06ce testing::TestInfo::Run()
@ 0x7e07d5 testing::TestCase::Run()
@ 0x7e3b18 testing::internal::UnitTestImpl::RunAllTests()
@ 0x7e3da7 testing::UnitTest::Run()
@ 0x45552a main
@ 0x2b8a44d70ec5 (unknown)
@ 0x45bd69 (unknown)
@ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)
% ./build_release/tools/caffe device_query -gpu 0
I0907 18:55:04.830653 729 caffe.cpp:111] Querying GPUs 0
I0907 18:55:05.037142 729 common.cpp:168] Device id: 0
I0907 18:55:05.037195 729 common.cpp:169] Major revision number: 2
I0907 18:55:05.037201 729 common.cpp:170] Minor revision number: 0
I0907 18:55:05.037207 729 common.cpp:171] Name: Tesla M2090
I0907 18:55:05.037214 729 common.cpp:172] Total global memory: 5636554752
I0907 18:55:05.037220 729 common.cpp:173] Total shared memory per block: 49152
I0907 18:55:05.037225 729 common.cpp:174] Total registers per block: 32768
I0907 18:55:05.037231 729 common.cpp:175] Warp size: 32
I0907 18:55:05.037236 729 common.cpp:176] Maximum memory pitch: 2147483647
I0907 18:55:05.037241 729 common.cpp:177] Maximum threads per block: 1024
I0907 18:55:05.037246 729 common.cpp:178] Maximum dimension of block: 1024, 1024, 64
I0907 18:55:05.037253 729 common.cpp:181] Maximum dimension of grid: 65535, 65535, 65535
I0907 18:55:05.037258 729 common.cpp:184] Clock rate: 1301000
I0907 18:55:05.037263 729 common.cpp:185] Total constant memory: 65536
I0907 18:55:05.037268 729 common.cpp:186] Texture alignment: 512
I0907 18:55:05.037272 729 common.cpp:187] Concurrent copy and execution: Yes
I0907 18:55:05.037278 729 common.cpp:189] Number of multiprocessors: 16
I0907 18:55:05.037283 729 common.cpp:190] Kernel execution timeout: No
% ./build_release/tools/caffe device_query -gpu 1
I0907 18:55:15.162884 784 caffe.cpp:111] Querying GPUs 1
I0907 18:55:20.532964 784 common.cpp:168] Device id: 1
I0907 18:55:20.533093 784 common.cpp:169] Major revision number: 2
I0907 18:55:20.533129 784 common.cpp:170] Minor revision number: 0
I0907 18:55:20.533161 784 common.cpp:171] Name: Tesla M2090
I0907 18:55:20.533193 784 common.cpp:172] Total global memory: 5636554752
I0907 18:55:20.533227 784 common.cpp:173] Total shared memory per block: 49152
I0907 18:55:20.533252 784 common.cpp:174] Total registers per block: 32768
I0907 18:55:20.533277 784 common.cpp:175] Warp size: 32
I0907 18:55:20.533298 784 common.cpp:176] Maximum memory pitch: 2147483647
I0907 18:55:20.533323 784 common.cpp:177] Maximum threads per block: 1024
I0907 18:55:20.533345 784 common.cpp:178] Maximum dimension of block: 1024, 1024, 64
I0907 18:55:20.533371 784 common.cpp:181] Maximum dimension of grid: 65535, 65535, 65535
I0907 18:55:20.533404 784 common.cpp:184] Clock rate: 1301000
I0907 18:55:20.533428 784 common.cpp:185] Total constant memory: 65536
I0907 18:55:20.533452 784 common.cpp:186] Texture alignment: 512
I0907 18:55:20.533476 784 common.cpp:187] Concurrent copy and execution: Yes
I0907 18:55:20.533500 784 common.cpp:189] Number of multiprocessors: 16
I0907 18:55:20.533524 784 common.cpp:190] Kernel execution timeout: No
The cuDNN library requires a GPU of compute capability 3.0 or higher:
Supported on Windows, Linux and MacOS systems with Kepler, Maxwell or Tegra K1 GPUs.
Your Fermi M2090 is a compute capability 2.0 GPU:
I0907 18:55:05.037195 729 common.cpp:169] Major revision number: 2
I0907 18:55:05.037201 729 common.cpp:170] Minor revision number: 0
I0907 18:55:05.037207 729 common.cpp:171] Name: Tesla M2090
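The same check can be done programmatically. Below is a minimal Python sketch, assuming the `device_query` log format shown above and taking the 3.0 threshold from the quoted requirement (newer cuDNN releases may require a higher compute capability). `parse_compute_capability` and `supports_cudnn` are hypothetical helpers for illustration, not part of Caffe.

```python
import re

# Minimum compute capability taken from the quoted cuDNN requirement.
# Assumption: this matches the cuDNN release in use; later releases raise it.
CUDNN_MIN_CC = (3, 0)


def parse_compute_capability(log_text):
    """Extract (major, minor) from Caffe `device_query` log lines."""
    major = re.search(r"Major revision number:\s*(\d+)", log_text)
    minor = re.search(r"Minor revision number:\s*(\d+)", log_text)
    if major is None or minor is None:
        raise ValueError("revision numbers not found in device_query output")
    return int(major.group(1)), int(minor.group(1))


def supports_cudnn(log_text, minimum=CUDNN_MIN_CC):
    """True if the reported compute capability meets the cuDNN minimum.

    Tuples compare element by element, so (2, 0) < (3, 0) < (3, 5).
    """
    return parse_compute_capability(log_text) >= minimum


log = """\
I0907 18:55:05.037195 729 common.cpp:169] Major revision number: 2
I0907 18:55:05.037201 729 common.cpp:170] Minor revision number: 0
I0907 18:55:05.037207 729 common.cpp:171] Name: Tesla M2090
"""

print(parse_compute_capability(log))  # (2, 0)
print(supports_cudnn(log))            # False
```

For the Tesla M2090 this reports (2, 0), which is below the 3.0 minimum, consistent with the `CUDNN_STATUS_ARCH_MISMATCH` error in the test run.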
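First, check which GPU you have by typing this command in a terminal: nvidia-smi
Then go to this link and try to find your GPU in the table. You will probably find that its compute capability is below 3.0, which cuDNN does not support.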
Based on your output, the Tesla M2090 is listed in the same compute capability row of that table as the following GPUs:
GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 480,
GeForce GTX 470, GeForce GTX 465, GeForce GTX 480M
All of the GPUs above have compute capability 2.0. So my suggestion is to build Caffe without cuDNN, or at least not to use cuDNN on this machine.
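Building without cuDNN usually comes down to commenting out `USE_CUDNN := 1` in Caffe's `Makefile.config` and rebuilding. The Python sketch below performs that edit on the file's text. `disable_cudnn` is a hypothetical helper, and it assumes the switch sits on its own line as in `Makefile.config.example`.

```python
import re

# Matches an active (uncommented) "USE_CUDNN := 1" line. [ \t]* is used instead
# of \s* so the pattern never swallows the newline that ends the line.
_ACTIVE_CUDNN = re.compile(r"^([ \t]*)USE_CUDNN[ \t]*:=[ \t]*1[ \t]*$", re.MULTILINE)


def disable_cudnn(config_text):
    """Comment out USE_CUDNN in Makefile.config text; other lines are untouched.

    An already-commented line ("# USE_CUDNN := 1") does not match, so running
    this twice gives the same result as running it once.
    """
    return _ACTIVE_CUDNN.sub(r"\1# USE_CUDNN := 1", config_text)


config = "USE_CUDNN := 1\n\nCUDA_DIR := /usr/local/cuda\n"
print(disable_cudnn(config))
```

To apply it, read `Makefile.config`, pass its contents through `disable_cudnn`, and write the result back. After that, a clean rebuild (`make clean`, then `make all`, `make test` and `make runtest`) should use Caffe's plain CUDA layer implementations instead of the cuDNN ones.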