Keras with TensorFlow backend not using GPU

I built the GPU version of the Docker image https://github.com/floydhub/dl-docker with Keras version 2.0.0 and TensorFlow version 0.12.1. I then ran the MNIST tutorial https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py but realized that Keras is not using the GPU. Below is my output:

root@b79b8a57fb1f:~/sharedfolder# python test.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2017-09-06 16:26:54.866833: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866855: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866863: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866870: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866876: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

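Can anyone tell me whether some setup is needed before Keras will use the GPU? I am very new to all this, so please let me know if I need to provide more information.

I have installed the prerequisites mentioned on the page.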

I am able to launch the Docker image:

docker run -it -p 8888:8888 -p 6006:6006 -v /sharedfolder:/root/sharedfolder floydhub/dl-docker:cpu bash

I am able to run the last step:

cv@cv-P15SM:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.66  Mon May  1 15:29:16 PDT 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I am able to run the step here:

# Test nvidia-smi
cv@cv-P15SM:~$ nvidia-docker run --rm nvidia/cuda nvidia-smi

Thu Sep  7 00:33:06 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780M    Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   55C    P0    N/A /  N/A |    310MiB /  4036MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

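I am also able to run the nvidia-docker command to launch a GPU-supported image.

What I have tried

I have tried the suggestions below:

  1. Check that you have completed step 9 of this tutorial (https://github.com/ignaciorlando/skinner/wiki/Keras-and-TensorFlow-installation). Note: your file paths may be completely different inside the Docker image, so you will have to locate them yourself.

I appended the suggested lines to my bashrc and verified that the bashrc file was updated.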

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda-8.0' >> ~/.bashrc

  2. Import the following commands in my Python file:

    import os
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
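For anyone trying the same suggestion: it is safest to set these variables before TensorFlow or Keras is imported, so that they are in place by the time CUDA is initialized. Note also that they only select among GPUs that a GPU-enabled TensorFlow build can already see; they do not turn a CPU-only build into a GPU one. A minimal sketch, where `select_gpu` is a hypothetical helper name and not part of any library:

```python
import os
import sys


def select_gpu(device_ids="0"):
    """Set the CUDA device environment variables (hypothetical helper).

    Returns True if TensorFlow had not yet been imported when the
    variables were set, which is the order this sketch assumes is safe.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    return "tensorflow" not in sys.modules


set_early = select_gpu("0")
if not set_early:
    print("Warning: tensorflow was imported before the variables were set")
```

Call it at the very top of the script, before `import keras` or `import tensorflow`.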

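Unfortunately, neither of these steps, alone or together, solved the problem. Keras is still running with the CPU version of TensorFlow as its backend. However, I may have found the likely cause. I checked my TensorFlow version with the following commands and found two installations.

This is the CPU version: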

root@08b5fff06800:~# pip show tensorflow
Name: tensorflow
Version: 1.3.0
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: tensorflow-tensorboard, six, protobuf, mock, numpy, backports.weakref, wheel

This is the GPU version:

root@08b5fff06800:~# pip show tensorflow-gpu
Name: tensorflow-gpu
Version: 0.12.1
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: mock, numpy, protobuf, wheel, six

Interestingly, the output shows that Keras is using TensorFlow version 1.3.0, which is the CPU version, and not 0.12.1, which is the GPU version:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

import tensorflow as tf
print('Tensorflow: ', tf.__version__)

Output:

root@08b5fff06800:~/sharedfolder# python test.py
Using TensorFlow backend.
Tensorflow:  1.3.0

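I guess now I need to figure out how to have Keras use the GPU version of TensorFlow.

It is never a good idea to have both the tensorflow and tensorflow-gpu packages installed side by side (the one time it happened to me accidentally, Keras was using the CPU version).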
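To see at a glance whether both packages are present, something like the following could be run instead of two separate `pip show` calls. This is only a sketch: it assumes Python 3.8 or later for `importlib.metadata` (the container in the question runs Python 2.7, where `pip show` remains the way to check), and `installed_tensorflow_packages` is a name made up for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_tensorflow_packages(names=("tensorflow", "tensorflow-gpu")):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # this distribution is not installed
    return found


packages = installed_tensorflow_packages()
print(packages)
if len(packages) > 1:
    print("Both packages are installed; they share the same import name.")
```

If the dictionary has two entries, `import tensorflow` can resolve to either build, which matches what the question observed.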

"I guess now I need to figure out how to have keras use the gpu version of tensorflow."

You should simply remove both packages from your system, and then reinstall tensorflow-gpu [UPDATED after comment]:

pip uninstall tensorflow tensorflow-gpu
pip install tensorflow-gpu

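Moreover, it is puzzling why you seem to be using the floydhub/dl-docker:cpu container, while according to the instructions you should be using the floydhub/dl-docker:gpu one...

In the future, you can try using virtual environments to keep the CPU and GPU versions of TensorFlow separate, for example: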
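A quick way to check, from inside a running container, whether it was started with access to the GPU at all is to look for NVIDIA device nodes. This sketch assumes the GPU is exposed to the container as `/dev/nvidia*` device files, which is an assumption about how nvidia-docker sets things up and not something confirmed in the question; `visible_nvidia_devices` is a hypothetical helper name.

```python
import glob


def visible_nvidia_devices(pattern="/dev/nvidia*"):
    """Return the sorted NVIDIA device nodes visible to this process."""
    return sorted(glob.glob(pattern))


devices = visible_nvidia_devices()
if devices:
    print("NVIDIA device nodes:", ", ".join(devices))
else:
    print("No /dev/nvidia* nodes found; the container may lack GPU access")
```

An empty list inside a container started with plain `docker run` would suggest the container has no GPU access, whichever TensorFlow package is installed.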

conda create --name tensorflow python=3.5
activate tensorflow
pip install tensorflow

conda create --name tensorflow-gpu python=3.5
activate tensorflow-gpu
pip install tensorflow-gpu

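I had a similar problem: Keras was not using my GPU. I had installed tensorflow-gpu in conda following the instructions, but after installing Keras it simply did not list the GPU as an available device. I then realized that installing Keras adds the tensorflow package! So I had both the tensorflow and tensorflow-gpu packages. I found that there is a keras-gpu package available. After completely uninstalling keras, tensorflow, and tensorflow-gpu, and then installing tensorflow-gpu and keras-gpu, the problem was solved.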
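For reference, a common way to see which devices TensorFlow lists is `device_lib.list_local_devices()`. The sketch below wraps it so that it also runs where TensorFlow is missing. Note that `tensorflow.python.client.device_lib` is an internal module and not a documented public API, so treat this as an assumption that may not hold for every TensorFlow version; `list_tensorflow_devices` is a hypothetical helper name.

```python
def list_tensorflow_devices():
    """Return the device names TensorFlow reports, or None if it is not installed."""
    try:
        # Internal module; widely used, but not a documented public API.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [device.name for device in device_lib.list_local_devices()]


devices = list_tensorflow_devices()
if devices is None:
    print("TensorFlow is not installed in this environment")
else:
    print(devices)
    # Device names contain "gpu" or "GPU" depending on the TensorFlow version.
    has_gpu = any("GPU" in name.upper() for name in devices)
    print("GPU visible to TensorFlow:", has_gpu)
```

If only CPU entries appear, the CPU build is most likely the one being imported.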

This worked for me: install tensorflow v2.2.0

pip install tensorflow==2.2.0

Also remove tensorflow-gpu (if it exists).