tensorflow-gpu taking too long

Question

Solved

I recently bought a laptop with an Nvidia RTX 3080 and installed the libraries required for tensorflow-gpu. After installing them, I ran the following code as a sanity check:

import tensorflow as tf
import time


print(f"TensorFlow version: {tf.__version__}")
# TensorFlow version: 2.3.0

start = time.time()
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
end = time.time()

print(f"it took = {end - start} seconds")

"""
2021-05-18 22:43:03.963371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-05-18 22:43:05.775204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-18 22:43:05.775328: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-18 22:43:05.780061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-18 22:43:05.782762: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-18 22:43:05.783655: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-18 22:43:05.786527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-18 22:43:05.788290: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-18 22:43:05.798942: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-18 22:43:05.799065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-18 22:43:05.799697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-18 22:43:05.805786: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ace28679f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-18 22:43:05.805863: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-05-18 22:43:05.806387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-18 22:43:05.806547: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-18 22:43:05.807051: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-18 22:43:05.807346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-18 22:43:05.807641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-18 22:43:05.807948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-18 22:43:05.808240: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-18 22:43:05.808529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-18 22:43:05.808841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-18 22:46:57.375562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-18 22:46:57.375695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-05-18 22:46:57.376038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-05-18 22:46:57.376271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14255 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-05-18 22:46:57.378538: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1aca510dc20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-18 22:46:57.378605: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3080 Laptop GPU, Compute Capability 8.6
tf.Tensor(-1331.8541, shape=(), dtype=float32)
it took = 233.85769605636597 seconds
"""

That single line took about 4 minutes, which is unacceptable. Something is wrong. Here is more information about the installed system:

sys_details = tf.sysconfig.get_build_info()

sys_details['cuda_version']
# '64_101'

sys_details['cuda_compute_capabilities']
'''
['compute_30',
 'compute_35',
 'compute_52',
 'compute_60',
 'compute_61',
 'compute_70',
 'compute_75']
'''

sys_details['cudnn_version']
# '64_7'

What is going wrong?

Answer 1

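Nvidia RTX 3080 cards are based on the Ampere architecture, for which the compatible CUDA versions start with 11.x. The TensorFlow 2.3 build in the question loads CUDA 10.1 libraries (cudart64_101.dll) and lists compute capabilities only up to 7.5, while the card reports compute capability 8.6. This mismatch most likely explains the pause of nearly four minutes in the log between 22:43:05 and 22:46:57, while the GPU device is being set up.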
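As a rough illustration of that mismatch, the sketch below compares the compute capabilities a build reports with the capability of the GPU. The helper names `parse_capability` and `build_targets_gpu_generation` are hypothetical, and comparing only the major version is a simplifying heuristic, not an official compatibility rule. The input values are copied from the question.

```python
import re


def parse_capability(entry):
    """Turn an entry such as 'compute_75' or 'sm_80' into a (major, minor) tuple."""
    match = re.fullmatch(r"(?:compute|sm)_(\d+)(\d)", entry)
    if match is None:
        raise ValueError(f"unrecognized capability entry: {entry!r}")
    return int(match.group(1)), int(match.group(2))


def build_targets_gpu_generation(build_capabilities, device_capability):
    """Heuristic (assumption): the build targets the GPU's generation if any
    listed capability shares the device's major version."""
    device_major = device_capability[0]
    return any(parse_capability(e)[0] == device_major for e in build_capabilities)


# Values copied from the question: sys_details['cuda_compute_capabilities']
# and the "computeCapability: 8.6" line in the log.
build_caps = [
    "compute_30", "compute_35", "compute_52", "compute_60",
    "compute_61", "compute_70", "compute_75",
]
device_cap = (8, 6)

print(build_targets_gpu_generation(build_caps, device_cap))  # prints False
```

In a live session, the list would come from `tf.sysconfig.get_build_info()['cuda_compute_capabilities']`, as in the question.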

Upgrading TensorFlow from 2.3 to 2.4 or 2.5 will solve this issue. More details can be found here.
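One way to check the result after upgrading is to time the first call separately from later calls, since one-off device setup is included in the first call but not in the rest. The following is a minimal sketch. `time_calls` is a hypothetical helper, and the workload is a plain-Python stand-in so that the snippet runs without TensorFlow installed.

```python
import time


def time_calls(fn, repeats=3):
    """Call fn() once, then `repeats` more times.

    Returns (first_call_seconds, list_of_later_call_seconds).
    """
    durations = []
    for _ in range(repeats + 1):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations[0], durations[1:]


# Stand-in workload (assumption); replace it with the TensorFlow op under test.
first, later = time_calls(lambda: sum(range(100_000)))
print(f"first call: {first:.4f} s")
print(f"later calls: {[round(d, 4) for d in later]}")
```

With TensorFlow, the callable could be `lambda: tf.reduce_sum(tf.random.normal([1000, 1000])).numpy()`. If only the first call is slow, the time is being spent in startup rather than in the computation itself.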

nvidia

tensorflow

tensorflow2.0