TensorFlow 2.0rc not detecting GPUs

Question

TF2 is currently not detecting my GPUs. I migrated from TF 1.14, where I was using

tf.keras.utils.multi_gpu_model(model=model, gpus=2)

This call now returns the following error:

ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2', '/xla_gpu:3']. Try reducing `gpus`.

Running nvidia-smi returns the following information:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:09:00.0 Off |                    0 |
| N/A   46C    P0    62W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:0A:00.0 Off |                    0 |
| N/A   36C    P0    71W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P0    58W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:87:00.0 Off |                    0 |
| N/A   31C    P0    82W / 149W |      0MiB / 11441MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here is my TF version, which is built for CUDA:

2.0.0-rc0

Please let me know what I am doing wrong so that I can fix it.

Answer 1

CUDA should be version 10.0, not 10.1.
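One caveat before reinstalling anything: the "CUDA Version" field in the nvidia-smi header reports the highest CUDA version the driver supports, which is not necessarily the toolkit installed on the machine. What matters to TensorFlow is whether the CUDA 10.0 shared libraries can actually be loaded. Below is a minimal sketch of such a check, assuming a Linux machine. The library file names are my assumption of what a CUDA 10.0 / cuDNN 7 install provides, and find_loadable_libs is a hypothetical helper, not a TensorFlow API.

```python
import ctypes

# Assumed shared-library names for a CUDA 10.0 + cuDNN 7 install on Linux.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcudnn.so.7",
]


def find_loadable_libs(names):
    """Return {library name: bool} telling whether the dynamic loader can open it."""
    result = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be found/loaded
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in find_loadable_libs(CUDA_10_0_LIBS).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

If any of these come back as not found, the CUDA 10.0 libraries are probably missing from the loader path (for example LD_LIBRARY_PATH), even though nvidia-smi itself works.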

Answer 2

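I would suggest the following:

First, check your CUDA version. Make sure it is 10.0.
If it is 10.0, check whether your TF build is the GPU version.
Check whether TF can access the GPUs with the following command: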

import tensorflow as tf

value = tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)
# Prints True only if TensorFlow can access a GPU
print('*** Can TF access a GPU? ***\n\n', value)

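I assume you have already sorted out the first two points. If TF can also access your GPUs then, as you can see in the ValueError, it does list the GPUs by name. I cannot comment on the tf.keras.utils.multi_gpu_model() function because I have not used it in TF. However, I suggest that you use with tf.device('/gpu:0'): and call or define your model inside that block.
If using tf.device does not work either, just add the following lines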

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # indices of the GPUs to expose to TensorFlow

at the top of the file, before TensorFlow is imported, and remove the with tf.device('/gpu:0') block.
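To make the ValueError in the question easier to read: multi_gpu_model expects devices named '/gpu:N', while the machine only reports '/xla_gpu:N' entries. The sketch below splits such a device list into the two groups. split_gpu_devices is a hypothetical helper written only for illustration, and the input list is copied from the error message.

```python
def split_gpu_devices(device_names):
    """Separate regular GPU devices ('/gpu:N') from XLA-only ones ('/xla_gpu:N')."""
    regular = [d for d in device_names if d.lower().startswith("/gpu:")]
    xla_only = [d for d in device_names if d.lower().startswith("/xla_gpu:")]
    return regular, xla_only


# Device list copied from the ValueError in the question.
available = ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2', '/xla_gpu:3']

regular, xla_only = split_gpu_devices(available)
print("regular GPU devices:", regular)
print("XLA-only GPU devices:", xla_only)
```

An empty first list next to a non-empty second one suggests that the cards are visible to the system but TensorFlow did not register them as regular GPU devices, which would be consistent with the CUDA version mismatch pointed out in Answer 1.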
