tensorflow-gpu 2.2 works with CUDA 10.2 but requires cuDNN 7.6.4, which doesn't have a download file in the NVIDIA archive for CUDA 10.2

The error is shown below; the full log can be found here: https://pastebin.com/raw/0WQw8ktB

2021-06-10 22:03:04.201770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-10 22:03:04.420481: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-10 22:03:05.034154: E tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN library: 7.4.2 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-06-10 22:03:05.038684: E tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN library: 7.4.2 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
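
The rule stated in the error message can be sketched as a small check. This only illustrates the wording in the log (same major version and, for cuDNN 7.0 or later, a runtime minor version at least as high as the compiled one). The names `parse_version` and `cudnn_runtime_ok` are hypothetical, and this is not TensorFlow's actual code.

```python
def parse_version(text):
    """Split a dotted version such as '7.6.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule from the log message to two version strings."""
    l_major, l_minor = parse_version(loaded)[:2]
    c_major, c_minor = parse_version(compiled)[:2]
    if l_major != c_major:
        return False  # major versions must always match
    if c_major >= 7:
        return l_minor >= c_minor  # a higher minor is accepted from 7.0 on
    return l_minor == c_minor  # older releases need an exact minor match


print(cudnn_runtime_ok("7.4.2", "7.6.4"))  # the case in the log: False
print(cudnn_runtime_ok("7.6.5", "7.6.4"))  # a 7.6.5 runtime: True
```

Read this way, the message does not demand exactly 7.6.4; any 7.x runtime with minor version 6 or higher would pass the check as worded.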

This is what I see in the NVIDIA archive:

https://developer.nvidia.com/rdp/cudnn-archive

Download cuDNN v7.6.4 (September 27, 2019), for CUDA 10.1
Download cuDNN v7.6.4 (September 27, 2019), for CUDA 10.0
Download cuDNN v7.6.4 (September 27, 2019), for CUDA 9.2
Download cuDNN v7.6.4 (September 27, 2019), for CUDA 9.0

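As you can see, there is no cuDNN 7.6.4 download for CUDA 10.2, but I need CUDA 10.2 for the rest of the framework. tensorflow-gpu 2.2 works with CUDA 10.2 but fails with this error, which means I need to use cuDNN 7.6.4 instead of 7.4.2.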

python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
v2.2.0-rc4-8-g2b96f3662b 2.2.0

GPU model and memory:

GeForce 1080 Ti (2x) each 12GB memory

$ stat /usr/local/cuda
  File: ‘/usr/local/cuda’ -> ‘/usr/local/cuda-10.2’
  Size: 20          Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 67157410    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:usr_t:s0
Access: 2021-06-10 22:12:20.673080083 -0400
Modify: 2020-09-21 09:39:18.559883390 -0400
Change: 2020-09-21 09:39:18.559883390 -0400
 Birth: -

[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux

Python 3.8.5 (default, Mar 31 2021, 02:37:07)

tensorflow-gpu 2.2 was installed using pip, and the OS is:

$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.9.2009 (Core)
Release:    7.9.2009
Codename:   Core

I also saw this here, but I can't find the download file:

After downloading cudnn-10.2-linux-x64-v7.6.5.32.tgz from the official NVIDIA website, I installed cuDNN 7.6.5 for CUDA 10.2 using these commands:

$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include 

$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64 

$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
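
To confirm which version the copied header declares, one option is to read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros from cudnn.h. This is a minimal sketch, assuming the cuDNN 7.x layout where these macros are defined in cudnn.h and the /usr/local/cuda/include path used by the commands above. `header_cudnn_version` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed location, taken from the cp commands above.
DEFAULT_HEADER = Path("/usr/local/cuda/include/cudnn.h")


def header_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None."""
    found = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # macro missing, so the header is not what we expect
        found[name] = int(match.group(1))
    return found["CUDNN_MAJOR"], found["CUDNN_MINOR"], found["CUDNN_PATCHLEVEL"]


if __name__ == "__main__":
    if DEFAULT_HEADER.exists():
        print(header_cudnn_version(DEFAULT_HEADER.read_text(errors="replace")))
    else:
        print("cudnn.h not found at", DEFAULT_HEADER)
```

If the install went as intended, this should report 7.6.5 for the header; it says nothing about which shared library is picked up at runtime.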

Then:

$ export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
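
Since the original error came from the library loaded at runtime (7.4.2), it may be worth checking what the dynamic loader now resolves for libcudnn.so.7. The sketch below calls cuDNN's `cudnnGetVersion()` through `ctypes` and decodes the result, assuming the cuDNN 7.x encoding of major*1000 + minor*100 + patch. `decode_cudnn_version` and `loaded_cudnn_version` are hypothetical helper names.

```python
import ctypes


def decode_cudnn_version(number):
    """Split cuDNN 7.x's packed version (major*1000 + minor*100 + patch)."""
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def loaded_cudnn_version(soname="libcudnn.so.7"):
    """Ask the library the dynamic loader resolves for its version, or None."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None  # not found (or not loadable) on the loader's search path
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    print(loaded_cudnn_version())
```

Run it from the same shell in which LD_LIBRARY_PATH was exported, because the loader reads that variable when the Python process starts. If it still reports (7, 4, 2), an older copy of libcudnn is probably found earlier on the search path.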