Caffe2好像没有看到cuDNN?

Caffe2 doesn't seem to see cuDNN?

我正在尝试安装 SpConv(空间稀疏卷积库),但是当 运行 python3 setup.py bdist_wheel 时,我收到一个错误,似乎与 Caffe2 无法看到 cuDNN 有关,因为我从这条消息推断:

CMake Error at /home/ivan/.local/lib/python3.6/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
  Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
  libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.

我尝试重新安装 cuDNN,但没有成功。我可以手动将 Caffe2 指向 cuDNN 吗?这是因为通常 cuda 位于 /usr/local/cuda/,但我在 /usr/lib/cuda.

完整输出如下:

running bdist_wheel
running build
running build_py
running build_ext
Release
|||||CMAKE ARGS||||| ['-DCMAKE_PREFIX_PATH=/home/ivan/.local/lib/python3.6/site-packages/torch', '-DPYBIND11_PYTHON_VERSION=3.6', '-DSPCONV_BuildTests=OFF', '-DCMAKE_CUDA_FLAGS="--expt-relaxed-constexpr" -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/mnt/home/ivan/second.pytorch/second/spconv/build/lib.linux-x86_64-3.6/spconv', '-DCMAKE_BUILD_TYPE=Release']
-- Caffe2: CUDA detected: 9.1
-- Caffe2: CUDA nvcc is: /usr/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr
-- Caffe2: Header version is: 9.1
-- Could NOT find CUDNN (missing: CUDNN_INCLUDE_DIR) 
CMake Warning at /home/ivan/.local/lib/python3.6/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:131 (message):
  Caffe2: Cannot find cuDNN library.  Turning the option off
Call Stack (most recent call first):
  /home/ivan/.local/lib/python3.6/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
  /home/ivan/.local/lib/python3.6/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
  CMakeLists.txt:20 (find_package)


-- Autodetected CUDA architecture(s):  6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
CMake Error at /home/ivan/.local/lib/python3.6/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
  Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
  libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
Call Stack (most recent call first):
  /home/ivan/.local/lib/python3.6/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
  CMakeLists.txt:20 (find_package)


-- Configuring incomplete, errors occurred!
See also "/mnt/home/ivan/second.pytorch/second/spconv/build/temp.linux-x86_64-3.6/CMakeFiles/CMakeOutput.log".
See also "/mnt/home/ivan/second.pytorch/second/spconv/build/temp.linux-x86_64-3.6/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
  File "setup.py", line 99, in <module>
    zip_safe=False,
  File "/home/ivan/.local/lib/python3.6/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 204, in run
    self.run_command('build')
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.6/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 40, in run
    self.build_extension(ext)
  File "setup.py", line 82, in build_extension
    subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/mnt/home/ivan/second.pytorch/second/spconv', '-DCMAKE_PREFIX_PATH=/home/ivan/.local/lib/python3.6/site-packages/torch', '-DPYBIND11_PYTHON_VERSION=3.6', '-DSPCONV_BuildTests=OFF', '-DCMAKE_CUDA_FLAGS="--expt-relaxed-constexpr" -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/mnt/home/ivan/second.pytorch/second/spconv/build/lib.linux-x86_64-3.6/spconv', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.

系统:Ubuntu18.04,CUDA 9.1,cuDNN 7.6.3

找到问题了。我需要将 CUDNN_INCLUDE_DIR 设置为 /usr/lib/cuda/include(即 cudnn.h 文件所在的位置)。

经验教训:花时间了解错误,missing: CUDNN_INCLUDE_DIR 是关键。

解决此问题的另一种方法:

  1. here 获取 cuDNN。您可能需要注册才能执行此操作,但注册是免费的。
  2. 解压缩 cuDNN 存档文件。
  3. 如下移动cuDNN文件:
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
  1. 或者,在 /usr/local/ 处放置一个符号 link,指向通过解压 cuDNN 存档生成的 cuda 目录。