Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
I just updated my graphics card drivers with:
sudo apt install nvidia-driver-470
sudo apt install cuda-drivers-470
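I installed them this way because they were being held back when I tried sudo apt upgrade. Then I mistakenly ran sudo apt autoremove to clean up old packages. After rebooting so the new drivers would be set up properly, I can no longer use GPU acceleration with TensorFlow.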
import tensorflow as tf
tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-12-07 16:52:01.771391: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-07 16:52:01.807283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 16:52:01.807973: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.808017: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.808048: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.856391: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.856466: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.857601: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
False
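To see which of these libraries the dynamic loader can resolve without starting TensorFlow, a minimal sketch using only the standard library could look like the following. The list of names is copied from the warnings above; the helper name probe_libraries is made up for illustration.

```python
import ctypes

# Library names copied from the dso_loader warnings in the log above.
REQUIRED_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def probe_libraries(names):
    """Return {name: bool}: True if the dynamic loader can open the library."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # searched via the normal loader path (LD_LIBRARY_PATH, ldconfig cache)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_libraries(REQUIRED_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any name reported as MISSING here should match one of the "Could not load dynamic library" warnings.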
You can create symbolic links in the /usr/lib/x86_64-linux-gnu directory. I found it with:
$ whereis libcudart
libcudart: /usr/lib/x86_64-linux-gnu/libcudart.so /usr/share/man/man7/libcudart.7.gz
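In this folder you can find other versions of these CUDA libraries. Create symbolic links like this (run the commands from inside that folder, since the link targets are relative names). The specific versions you link to may differ slightly.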
$ sudo ln -s libcublas.so.10.2.1.243 libcublas.so.11
$ sudo ln -s libcublasLt.so.10.2.1.243 libcublasLt.so.11
$ sudo ln -s libcusolver.so.10.2.0.243 libcusolver.so.11
$ sudo ln -s libcusparse.so.10.3.0.243 libcusparse.so.11
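Because the exact version suffixes differ between systems, it can help to list what is installed before choosing link targets. This is a small sketch, assuming the libraries live in /usr/lib/x86_64-linux-gnu as above; the function name installed_versions is hypothetical.

```python
from pathlib import Path


def installed_versions(lib_dir, stem):
    """List file names such as 'libcublas.so.10.2.1.243' for the stem 'libcublas'."""
    # The pattern requires '.so' directly after the stem, so 'libcublas'
    # does not also match 'libcublasLt'.
    return sorted(p.name for p in Path(lib_dir).glob(f"{stem}.so*"))


if __name__ == "__main__":
    for stem in ("libcublas", "libcublasLt", "libcusolver", "libcusparse"):
        print(stem, installed_versions("/usr/lib/x86_64-linux-gnu", stem))
```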
Your GPU should now be detected.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-12-07 17:07:26.914296: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-07 17:07:26.950731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.029687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.030421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.325218: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.325642: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.326022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.326408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 9280 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:06:00.0, compute capability: 8.6
True
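The deprecation warning in this output recommends tf.config.list_physical_devices('GPU') instead of is_gpu_available(). A minimal sketch of that check follows; the wrapper list_gpus is my own name, and it returns None when TensorFlow is not importable so the snippet runs anywhere.

```python
def list_gpus():
    """Return the GPU devices TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Returns a list of PhysicalDevice objects; an empty list means no usable GPU.
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPUs detected:", [gpu.name for gpu in gpus])
    else:
        print("No GPU detected")
```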
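This works because these CUDA libraries are very similar, and even NVIDIA often uses symbolic links when laying them out. If TensorFlow is looking for libcublas.so.11, you can create a file with that name that simply points to another installed version of libcublas.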
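The mechanism can be illustrated with a throwaway directory: opening the link name transparently opens the file it points to. This is only a sketch with a placeholder file (libdemo is a made-up name, not a real CUDA library). The loader resolves the name in the same way, but whether a version-10 library provides every symbol that code built against version 11 expects is a separate question this demo does not answer.

```python
import os
import tempfile


def demo_symlink():
    """Create a fake versioned library plus a symlink and show where the link resolves."""
    with tempfile.TemporaryDirectory() as lib_dir:
        target = os.path.join(lib_dir, "libdemo.so.10.2.1")
        with open(target, "wb") as f:
            f.write(b"placeholder, not a real shared object")

        link = os.path.join(lib_dir, "libdemo.so.11")
        # Relative target, like running `ln -s libdemo.so.10.2.1 libdemo.so.11` inside lib_dir.
        os.symlink("libdemo.so.10.2.1", link)

        resolved = os.path.basename(os.path.realpath(link))
        with open(link, "rb") as f:  # opening the link reads the target's bytes
            data = f.read()
        return resolved, data


if __name__ == "__main__":
    name, data = demo_symlink()
    print("libdemo.so.11 resolves to", name)
```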
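Did you install cuda-toolkit? The error says that the version 11 libraries cannot be found. The problem is that your cudatoolkit and cudnn versions may be incompatible with your TensorFlow version.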
If you have already installed the correct version of the toolkit, go straight to step 5. (You can check the version with the command nvcc --version.)
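If you want to check the toolkit version from a script, one option is to parse the "release X.Y" part of the nvcc --version output. The sketch below assumes that wording. The sample line is illustrative rather than copied from the original post, and parse_nvcc_release is a hypothetical helper name.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from text containing 'release 11.4'; None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Illustrative sample of the relevant line (an assumption, not from the post).
SAMPLE = "Cuda compilation tools, release 11.4, V11.4.152"

if __name__ == "__main__":
    if shutil.which("nvcc"):
        text = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        print("Installed toolkit:", parse_nvcc_release(text))
    else:
        print("nvcc is not on PATH; sample parses to", parse_nvcc_release(SAMPLE))
```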
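1. Download the installer from https://developer.nvidia.com/cuda-11-4-4-download-archive?target_os=Linux (this version is compatible with the nvidia-470 driver you installed). The following steps are specific to the runfile option.
2. Because you have already installed nvidia-drivers, press Continue if the installer shows a message about this.
3. Accept the terms.
4. Again, because you have already installed the drivers, just disable the Driver option and press Install.
5. Now you need to configure the paths to the binaries and libraries. Use the find command to search for nvcc and libcublas.so.*: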
sudo find / -name 'nvcc' # Path to binaries
sudo find / -name 'libcublas.so.*' # Path to libraries
6. Finally, based on the paths you found above, add the following lines to the end of the file ~/.profile. On my system CUDA is installed in /usr/local/cuda-11.4.
if [ -d "/usr/local/cuda-11.4" ]; then
    # Prepend the CUDA directories; export so child processes (e.g. python) inherit them
    export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-11.4/targets/x86_64-linux/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
If updating ~/.profile does not work, try updating .bashrc or .zshrc (in case you use zsh instead of bash).
- Restart your computer.
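The ${PATH:+:${PATH}} expansion in the ~/.profile lines appends ":" plus the old value only when the variable is already set and non-empty, so an unset variable does not leave a stray trailing colon (an empty PATH entry is treated as the current directory). A small Python sketch of the same rule, with a made-up helper name prepend_path:

```python
def prepend_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + existing only if existing is non-empty."""
    return f"{new_dir}:{existing}" if existing else new_dir


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda-11.4/bin", ""))
    print(prepend_path("/usr/local/cuda-11.4/bin", "/usr/bin:/bin"))
```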