TensorFlow: Could not load dynamic library 'libcudart.so.10.0' on Ubuntu 18.04
I have
$ python3 -c "import tensorflow as tf;print(tf.__version__)"
1.15.0
and
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
and
$ python --version
Python 3.6.9
$ pip --version
pip 19.3.1 from /usr/local/lib/python3.6/dist-packages/pip (python 3.6)
but nvidia-smi reports CUDA 10.2:
$ nvidia-smi
Tue Nov 17 18:40:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    On   | 00000000:01:00.0 Off |                  N/A |
| 32%   42C    P2    56W / 215W |    265MiB /  7979MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1840      G   /usr/lib/xorg/Xorg                            57MiB |
|    0      1895      G   /usr/bin/gnome-shell                          85MiB |
|    0     29999      C   /usr/bin/python                              109MiB |
+-----------------------------------------------------------------------------+
I can see both toolkits installed:
$ ls /usr/local/
bin cuda cuda-10.1 cuda-10.2 etc games include lib man sbin share src
In .profile I can see:
# set PATH for cuda 10.2 installation
if [ -d "/usr/local/cuda-10.2/bin/" ]; then
export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
So I overrode PATH and LD_LIBRARY_PATH with:
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
but that does not seem to fix it:
2020-11-17 18:38:39.470074: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-17 18:38:39.487544: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-11-17 18:38:39.489215: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47007e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-17 18:38:39.489273: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-17 18:38:39.494309: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-17 18:38:39.542010: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 18:38:39.542387: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b1bf40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-17 18:38:39.542399: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-11-17 18:38:39.542519: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 18:38:39.542788: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2020-11-17 18:38:39.542872: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-11-17 18:38:39.542919: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-11-17 18:38:39.543012: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-11-17 18:38:39.543059: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-11-17 18:38:39.543093: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-11-17 18:38:39.543125: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-11-17 18:38:39.545590: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-17 18:38:39.545617: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-11-17 18:38:39.545653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-17 18:38:39.545658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-17 18:38:39.545662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
['/device:CPU:0', '/device:XLA_CPU:0', '/device:XLA_GPU:0']
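The warnings in the log above name every library TensorFlow failed to open. As a rough aid for reading such logs, the sketch below extracts those names from `Could not load dynamic library` lines. The function name `missing_libraries` and the regular expression are illustrative assumptions based on the log format shown here; they are not part of TensorFlow.

```python
import re

# Pattern assumed from the dso_loader warning format shown in the log above.
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names reported as not loadable, in order, without duplicates."""
    seen = []
    for name in _MISSING.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "W dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; "
        "dlerror: libcudart.so.10.0: cannot open shared object file\n"
        "I dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7\n"
    )
    print(missing_libraries(sample))
```

In the log above every missing name ends in `.so.10.0` while LD_LIBRARY_PATH points at /usr/local/cuda-10.1/lib64, which suggests that this TensorFlow 1.15.0 build expects the CUDA 10.0 runtime libraries rather than the installed 10.1 or 10.2 ones.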
I assume the library exists at /usr/local/lib/libcudart.so.11.0.
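One way to check where the library actually lives is to search the likely directories for it. The following is a minimal sketch: the helper names `find_library` and `ld_library_path_dirs` are hypothetical, and the candidate directories in the usage part are assumptions that should be adjusted to your installation.

```python
import glob
import os


def find_library(pattern, directories):
    """Return sorted paths matching `pattern` (e.g. 'libcudart.so*') in the given directories."""
    matches = []
    for directory in directories:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)


def ld_library_path_dirs(environ=os.environ):
    """Split LD_LIBRARY_PATH into its non-empty directory entries."""
    value = environ.get("LD_LIBRARY_PATH", "")
    return [d for d in value.split(":") if d]


if __name__ == "__main__":
    # Candidate directories are assumptions; adjust them to your installation.
    candidates = (
        ld_library_path_dirs()
        + ["/usr/local/lib"]
        + glob.glob("/usr/local/cuda*/lib64")
    )
    for path in find_library("libcudart.so*", candidates):
        print(path)
```

The directory that contains the version named in the TensorFlow warning is the one to put in LD_LIBRARY_PATH.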
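First, activate your Python virtual environment, for example:
source ./venv/bin/activate
Then, inside the virtual environment, set LD_LIBRARY_PATH to the directory that contains the library:
export LD_LIBRARY_PATH=/usr/local/lib
Finally, re-run your program.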
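In my case, TensorFlow was looking for libcudart.so.11.0, which turned out to be in /usr/local/cuda-11.2/targets/x86_64-linux/lib, and the steps above worked for me: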
devbox1@devbox1:~/onibex/algo$ source ./venv/bin/activate
(venv) devbox1@devbox1:~/onibex/algo$
(venv) devbox1@devbox1:~/onibex/algo$ cd /home/devbox1/docs/onibex/wa/data/sprint0/code/algo ; /usr/bin/env /home/devbox1/docs/onibex/wa/data/sprint0/code/algo/venv/bin/python3 /home/devbox1/.vscode/extensions/ms-python.python-2021.2.636928669/pythonFiles/lib/python/debugpy/launcher 34287 -- /home/devbox1/docs/onibex/wa/data/sprint0/code/algo/quickly_tensor_flow.py
2021-03-14 00:12:18.588232: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory;
(venv) devbox1@devbox1:~/onibex/algo$ export LD_LIBRARY_PATH=/usr/local/cuda-11.2/targets/x86_64-linux/lib
(venv) devbox1@devbox1:~/onibex/algo$ echo $LD_LIBRARY_PATH
/usr/local/cuda-11.2/targets/x86_64-linux/lib
(venv) devbox1@devbox1:~/onibex/algo$ cd /home/devbox1/docs/onibex/wa/data/sprint0/code/algo ; /usr/bin/env /home/devbox1/docs/onibex/wa/data/sprint0/code/algo/venv/bin/python3 /home/devbox1/.vscode/extensions/ms-python.python-2021.2.636928669/pythonFiles/lib/python/debugpy/launcher 34089 -- /home/devbox1/docs/onibex/wa/data/sprint0/code/algo/quickly_tensor_flow.py
2021-03-14 21:36:49.207430: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
... hello world!
(venv) devbox1@devbox1:~/onibex/algo$
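To confirm the result without starting TensorFlow, one can ask the dynamic loader directly. The sketch below uses `ctypes.CDLL` from the Python standard library; the helper name `can_load` is hypothetical, and the library name is taken from the warning in the transcript above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name` in this process."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the warning in the transcript above.
    name = "libcudart.so.11.0"
    print(name, "loadable" if can_load(name) else "not found by the dynamic loader")
```

Run it from the same shell, after the `export`, as the program you want to fix. On Linux with glibc, LD_LIBRARY_PATH is read when the process starts, so changing it from inside an already running Python interpreter generally has no effect on later library loads.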