Caffe compiled fine with cudnn however runtest fails with error: CUDNN_STATUS_ARCH_MISMATCH

Question

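When I run make runtest for Caffe, I get the output below. Compiling with cuDNN went fine and produced no errors. I have also included the output of build_release/tools/caffe device_query -gpu <0,1> for the NVIDIA Tesla GPUs, which run CUDA driver and runtime version 7.0. Can anyone help?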

[----------] 1 test from SolverTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] SolverTest/0.TestInitTrainTestNets
F0907 18:53:28.279698   309 cudnn_softmax_layer.cpp:19] Check failed: status == CUDNN_STATUS_SUCCESS (6 vs. 0)  CUDNN_STATUS_ARCH_MISMATCH
*** Check failure stack trace: ***
    @     0x2b8a426cfdaa  (unknown)
    @     0x2b8a426cfce4  (unknown)
    @     0x2b8a426cf6e6  (unknown)
    @     0x2b8a426d2687  (unknown)
    @     0x2b8a4404b3c5  caffe::CuDNNSoftmaxLayer<>::LayerSetUp()
    @     0x2b8a440bf8a7  caffe::SoftmaxWithLossLayer<>::LayerSetUp()
    @     0x2b8a440eb9dd  caffe::Net<>::Init()
    @     0x2b8a440eca25  caffe::Net<>::Net()
    @     0x2b8a4410335a  caffe::Solver<>::InitTrainNet()
    @     0x2b8a44104354  caffe::Solver<>::Init()
    @     0x2b8a44104659  caffe::Solver<>::Solver()
    @           0x787e9c  caffe::SolverTest<>::InitSolverFromProtoString()
    @           0x785170  caffe::SolverTest_TestInitTrainTestNets_Test<>::TestBody()
    @           0x7e9943  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x7e0627  testing::Test::Run()
    @           0x7e06ce  testing::TestInfo::Run()
    @           0x7e07d5  testing::TestCase::Run()
    @           0x7e3b18  testing::internal::UnitTestImpl::RunAllTests()
    @           0x7e3da7  testing::UnitTest::Run()
    @           0x45552a  main
    @     0x2b8a44d70ec5  (unknown)
    @           0x45bd69  (unknown)
    @              (nil)  (unknown)
make: *** [runtest] Aborted (core dumped)


% ./build_release/tools/caffe device_query -gpu 0
I0907 18:55:04.830653   729 caffe.cpp:111] Querying GPUs 0
I0907 18:55:05.037142   729 common.cpp:168] Device id:                     0
I0907 18:55:05.037195   729 common.cpp:169] Major revision number:         2
I0907 18:55:05.037201   729 common.cpp:170] Minor revision number:         0
I0907 18:55:05.037207   729 common.cpp:171] Name:                          Tesla M2090
I0907 18:55:05.037214   729 common.cpp:172] Total global memory:           5636554752
I0907 18:55:05.037220   729 common.cpp:173] Total shared memory per block: 49152
I0907 18:55:05.037225   729 common.cpp:174] Total registers per block:     32768
I0907 18:55:05.037231   729 common.cpp:175] Warp size:                     32
I0907 18:55:05.037236   729 common.cpp:176] Maximum memory pitch:          2147483647
I0907 18:55:05.037241   729 common.cpp:177] Maximum threads per block:     1024
I0907 18:55:05.037246   729 common.cpp:178] Maximum dimension of block:    1024, 1024, 64
I0907 18:55:05.037253   729 common.cpp:181] Maximum dimension of grid:     65535, 65535, 65535
I0907 18:55:05.037258   729 common.cpp:184] Clock rate:                    1301000
I0907 18:55:05.037263   729 common.cpp:185] Total constant memory:         65536
I0907 18:55:05.037268   729 common.cpp:186] Texture alignment:             512
I0907 18:55:05.037272   729 common.cpp:187] Concurrent copy and execution: Yes
I0907 18:55:05.037278   729 common.cpp:189] Number of multiprocessors:     16
I0907 18:55:05.037283   729 common.cpp:190] Kernel execution timeout:      No


% ./build_release/tools/caffe device_query -gpu 1
I0907 18:55:15.162884   784 caffe.cpp:111] Querying GPUs 1
I0907 18:55:20.532964   784 common.cpp:168] Device id:                     1
I0907 18:55:20.533093   784 common.cpp:169] Major revision number:         2
I0907 18:55:20.533129   784 common.cpp:170] Minor revision number:         0
I0907 18:55:20.533161   784 common.cpp:171] Name:                          Tesla M2090
I0907 18:55:20.533193   784 common.cpp:172] Total global memory:           5636554752
I0907 18:55:20.533227   784 common.cpp:173] Total shared memory per block: 49152
I0907 18:55:20.533252   784 common.cpp:174] Total registers per block:     32768
I0907 18:55:20.533277   784 common.cpp:175] Warp size:                     32
I0907 18:55:20.533298   784 common.cpp:176] Maximum memory pitch:          2147483647
I0907 18:55:20.533323   784 common.cpp:177] Maximum threads per block:     1024
I0907 18:55:20.533345   784 common.cpp:178] Maximum dimension of block:    1024, 1024, 64
I0907 18:55:20.533371   784 common.cpp:181] Maximum dimension of grid:     65535, 65535, 65535
I0907 18:55:20.533404   784 common.cpp:184] Clock rate:                    1301000
I0907 18:55:20.533428   784 common.cpp:185] Total constant memory:         65536
I0907 18:55:20.533452   784 common.cpp:186] Texture alignment:             512
I0907 18:55:20.533476   784 common.cpp:187] Concurrent copy and execution: Yes
I0907 18:55:20.533500   784 common.cpp:189] Number of multiprocessors:     16
I0907 18:55:20.533524   784 common.cpp:190] Kernel execution timeout:      No

Answer 1

The cuDNN library requires a GPU of compute capability 3.0 or higher:

Supported on Windows, Linux and MacOS systems with Kepler, Maxwell or Tegra K1 GPUs.

Your Fermi M2090 is a compute capability 2.0 GPU:

I0907 18:55:05.037195   729 common.cpp:169] Major revision number:         2
I0907 18:55:05.037201   729 common.cpp:170] Minor revision number:         0
I0907 18:55:05.037207   729 common.cpp:171] Name:                          Tesla M2090
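
As a rough illustration of that check, here is a minimal Python sketch that reads the major and minor revision numbers out of the device_query log text and compares them with the 3.0 minimum quoted above. The function names and the MIN_CUDNN_CAPABILITY constant are made up for this example and are not part of Caffe or cuDNN; the log format is assumed to match the lines shown in the question.

```python
import re

# Minimum compute capability for cuDNN, as quoted in this answer.
# (Hypothetical constant name, used only in this sketch.)
MIN_CUDNN_CAPABILITY = (3, 0)


def compute_capability(device_query_log):
    """Extract (major, minor) from `caffe device_query` log text."""
    major = re.search(r"Major revision number:\s*(\d+)", device_query_log)
    minor = re.search(r"Minor revision number:\s*(\d+)", device_query_log)
    if major is None or minor is None:
        raise ValueError("no compute capability found in log text")
    return int(major.group(1)), int(minor.group(1))


def supports_cudnn(device_query_log):
    """True if the reported capability meets the quoted 3.0 minimum."""
    return compute_capability(device_query_log) >= MIN_CUDNN_CAPABILITY


# The three log lines for GPU 0 from the question.
log = """\
I0907 18:55:05.037195   729 common.cpp:169] Major revision number:         2
I0907 18:55:05.037201   729 common.cpp:170] Minor revision number:         0
I0907 18:55:05.037207   729 common.cpp:171] Name:                          Tesla M2090
"""

print(compute_capability(log))  # (2, 0)
print(supports_cudnn(log))      # False
```

For the M2090 output this reports (2, 0) and False, which matches the CUDNN_STATUS_ARCH_MISMATCH failure at runtime.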

Answer 2

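First, check which GPU you have by running nvidia-smi in a terminal. Then go to this link and try to find your GPU in the table. You will probably find that your GPU's compute capability is below 3.0, which cuDNN does not support.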

Based on your output and the Tesla M2090, you probably have one of the following GPUs:

GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 480M

The GPUs above have compute capability 2.0. So my suggestion is to build Caffe without cuDNN; at the very least, do not use cuDNN on your current machine.

