Cannot convert a symbolic Tensor to Numpy array (using RTX 30xx GPU)
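I have googled every error and tried a lot of solutions, but I just cannot get TensorFlow to run an LSTM/GRU network for me. I used to be able to do this.
I installed it with Anaconda in the prescribed way, conda create -n tf-gpu tensorflow-gpu, and then installed jupyterlab, spyder, matplotlib, scikit-learn and pandas, nothing more. There were no compatibility errors or warnings.
I fire up a notebook and try this: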
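from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, GRU, Dense, Dropout

def make_model(X_train, y_train):
    # Input shape is (timesteps, features), taken from the training data
    model = Sequential()
    model.add(InputLayer(input_shape = (X_train.shape[1], X_train.shape[2])))
    model.add(GRU(units = 100))
    model.add(Dense(units = 100, activation = 'relu'))
    model.add(Dropout(0.2))
    # One output unit per target column
    model.add(Dense(units = y_train.shape[1]))
    model.compile(loss = 'mse', optimizer = 'adam', metrics = ['mae'])
    return model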
But no matter what I do, I get this error:
NotImplementedError: Cannot convert a symbolic Tensor
(gru_1/strided_slice:0) to a numpy array. This error may indicate that
you're trying to pass a Tensor to a NumPy call, which is not supported
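Everything I can find about this error says it is a numpy version problem. I tried downgrading to 1.18.5 with pip, but that completely broke my environment. I am trying it again now, even though Anaconda tells me it is incompatible with Python 3.9. But this wild goose chase has gotten out of hand.
As far as I know I am not trying to do anything exotic. This should work out of the box, and if it does not, what is the point of Anaconda? The thing is, I am reusing code and data that I am certain worked at some point (about 9 months ago).
I started over in a fresh environment, this time installing tensorflow-gpu with conda install tensorflow-gpu rather than pulling down a complete environment. After downgrading numpy to 1.18.5 with conda install numpy=1.18.5, it seems to work! But now TensorFlow does not detect my GPU...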
>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
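The check above can be wrapped in a small helper that also reports whether the installed TensorFlow build was compiled with CUDA at all. This is only a sketch: the function name gpu_report and the dictionary keys are made up for illustration, and it returns an empty result instead of failing when TensorFlow is not installed.

```python
import importlib.util


def gpu_report():
    """Summarise what TensorFlow can see, without failing if it is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"tensorflow_installed": False, "gpus": []}

    import tensorflow as tf

    return {
        "tensorflow_installed": True,
        "version": tf.__version__,
        # False means this build cannot use a GPU, whatever drivers are installed.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(gpu_report())
```

An empty "gpus" list together with "built_with_cuda": True would suggest that the CUDA or cuDNN libraries could not be loaded, rather than that a CPU-only package was installed.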
I followed this guide and concluded that conda had not installed cudnn or cudatoolkit. Running nvcc -V at the command prompt produced this output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:25:35_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
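If the release number needs to be picked out of that output programmatically, a regular expression is enough. The sketch below is illustrative only; parse_nvcc_release is a hypothetical helper name, and the sample text is the output shown above.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (e.g. "11.4") from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:25:35_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0"""

print(parse_nvcc_release(sample))  # 11.4
```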
The guide says to run conda search cudnn and match the build it lists against the release reported by nvcc -V, which in my case is release 11.4. Of course, when I run conda search cudnn I get this:
# Name Version Build Channel
cudnn 7.1.4 cuda8.0_0 pkgs/main
cudnn 7.1.4 cuda9.0_0 pkgs/main
cudnn 7.3.1 cuda10.0_0 pkgs/main
cudnn 7.3.1 cuda9.0_0 pkgs/main
cudnn 7.6.0 cuda10.0_0 pkgs/main
cudnn 7.6.0 cuda10.1_0 pkgs/main
cudnn 7.6.0 cuda9.0_0 pkgs/main
cudnn 7.6.4 cuda10.0_0 pkgs/main
cudnn 7.6.4 cuda10.1_0 pkgs/main
cudnn 7.6.4 cuda9.0_0 pkgs/main
cudnn 7.6.5 cuda10.0_0 pkgs/main
cudnn 7.6.5 cuda10.1_0 pkgs/main
cudnn 7.6.5 cuda10.2_0 pkgs/main
cudnn 7.6.5 cuda9.0_0 pkgs/main
cudnn 7.6.5 cuda9.2_0 pkgs/main
cudnn 8.2.1 cuda11.3_0 pkgs/main
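The matching step the guide asks for can be sketched as a filter over this listing. The helper name cudnn_builds_for is hypothetical, and the code assumes each data row has exactly four whitespace-separated fields as in the output above.

```python
def cudnn_builds_for(conda_search_output, cuda_release):
    """Return (version, build) pairs whose build string targets cuda_release."""
    wanted = "cuda" + cuda_release + "_"
    matches = []
    for line in conda_search_output.splitlines():
        fields = line.split()
        # Data rows look like: cudnn <version> <build> <channel>
        if len(fields) == 4 and fields[0] == "cudnn" and fields[2].startswith(wanted):
            matches.append((fields[1], fields[2]))
    return matches


listing = """# Name Version Build Channel
cudnn 7.6.5 cuda10.1_0 pkgs/main
cudnn 7.6.5 cuda10.2_0 pkgs/main
cudnn 8.2.1 cuda11.3_0 pkgs/main"""

print(cudnn_builds_for(listing, "11.4"))  # [] : nothing built for CUDA 11.4
print(cudnn_builds_for(listing, "11.3"))  # [('8.2.1', 'cuda11.3_0')]
```

For release 11.4 the result is empty, which is exactly the dead end described here.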
With no other option, I decided to install 8.2.1 for build cuda11.3_0 in a new environment and then install tensorflow-gpu. Unsurprisingly, that did not work.
>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
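So I downloaded the CUDA 11.3 driver from here, but when I run nvcc -V the output stays the same. I am considering running DisplayDriverUninstaller and trying again. But really, this is 2 versions behind the latest release just to get tensorflow-gpu working!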
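My hardware:
Ryzen 9 5950X
NVIDIA RTX 3060 Ti
64 GB DDR4 RAM
I am writing this before actually trying DDU because I cannot access the physical machine right now. If it changes anything, I will post an update tomorrow.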
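There is a completely different workaround for this problem. I suspect it will not be good enough for many people, but since my goal for today was simple, I will take the win.
Steps to reproduce:
- Create a new environment with Python 3.7
- Install CUDA 10.1
- Restart your PC (do not skip this!)
- In the new environment, run
conda install tensorflow-gpu=2.1
- Then run
pip install tensorflow-gpu==2.3
Congratulations: if you had the same (still unknown) problem I had, it should now be solved. Keep in mind that many other libraries (or their updates) that do not work with Python <3.8 are now off the table, and that the TensorFlow version you will be using is a year old.
Also, the tensorflow library (non-gpu) is still at version 2.1 in my environment. But before I break the environment again, I will stop here and leave that experiment to someone else.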
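edit: It turns out this only works from the command prompt, and it crashes without an error. I tried a few things from Spyder's IPython console (to be honest, I do not know how that works), without success.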
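Final definitive answer:
Hardware:
- Ryzen 9 5950X
- 64 GB DDR4 RAM
- RTX 3060 Ti
I really wanted to work with Anaconda because I am very familiar with it and everything I do runs in Anaconda. On top of that, I had it working in Anaconda without problems last year, so it has to be possible!
The problem: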
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import LSTM
from numpy.random import rand
X, y = rand(8000, 50, 5), rand(8000, 10)
model = keras.Sequential()
model.add(keras.Input(shape = (X.shape[1], X.shape[2])))
So far, so good.
The next line:
model.add(LSTM(units = 100))
produces the following error:
NotImplementedError: Cannot convert a symbolic Tensor
(lstm_1/strided_slice:0) to a numpy array. This error may indicate that
you're trying to pass a Tensor to a NumPy call, which is not supported
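Cause / solution:
For a definitive answer I have to refer you to the TensorFlow developers, but I was able to deduce the following:
Another post describes exactly the same problem, and it was solved by downgrading numpy from 1.20.x to 1.19.x. The discussion on that post is interesting: basically, TensorFlow versions >2.3.x are compiled against numpy 1.19.5. Anaconda installs version 1.20.x by default when you use conda install tensorflow-gpu, and the two do not play well together. The downgrade by itself is an easy fix.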
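A quick way to tell whether an environment is affected is to compare the numpy version against the 1.20 boundary described above. This is a sketch under that assumption only; numpy_ok_for_tf24 is a hypothetical helper, not part of numpy or TensorFlow.

```python
def numpy_ok_for_tf24(numpy_version):
    """True if numpy_version is older than 1.20 (the boundary described above)."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 20)


print(numpy_ok_for_tf24("1.19.5"))  # True
print(numpy_ok_for_tf24("1.20.3"))  # False
```

In a live environment the argument would be numpy.__version__, or the version shown by conda list numpy.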
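If you have an NVIDIA RTX 30xx GPU, you are not done yet!
Long story short, RTX 30xx uses the Ampere architecture, which requires a newer version of CUDA, which in turn requires a newer version of TensorFlow: version 2.4.x or later, to be precise. At the time of writing, this version is not available on conda.
As a result, all the convenience of conda automatically installing cuDNN and cudatoolkit is gone. Simply running pip install tensorflow==2.4.0 will not work. Worst of all, it may appear to work until, more than an hour into training, it suddenly stops with a seemingly random error. (Sorry, by that point I was about ready to go berserk, it was late, and I did not write the errors down. There were many, and none of them got resolved.)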
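The dependency chain above (Ampere needs CUDA 11, CUDA 11 needs TensorFlow 2.4) can be written down as a small lookup. The table below reflects the tested build configurations for the three versions mentioned in this post as I understand them from the TensorFlow install documentation; treat it as an assumption to double-check, and note that supports_ampere is a hypothetical helper.

```python
# CUDA / cuDNN versions each TensorFlow GPU release was built against
# (assumed from the TensorFlow "tested build configurations" table).
TESTED_CONFIGS = {
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def supports_ampere(tf_version):
    """RTX 30xx (Ampere) needs CUDA 11 or newer; None if the version is unknown."""
    config = TESTED_CONFIGS.get(".".join(tf_version.split(".")[:2]))
    if config is None:
        return None  # version not in this small table
    return int(config["cuda"].split(".")[0]) >= 11


print(supports_ampere("2.3.0"))  # False
print(supports_ampere("2.4.0"))  # True
```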
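This guide goes into detail on how to compile cuDNN and CUDA from source. Before you follow this guide: go to Control Panel > Programs and Features and uninstall everything from NVIDIA that is not one of: NVIDIA graphics driver, NVIDIA GeForce Experience, NVIDIA HD audio driver, NVIDIA PhysX.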
Another important note:
There is a serious typo in the step Building CUDA/cuDNN: Set 3. The guide tells you to copy files
from:
# 1. cuDNN
\...\cudnn-11.0-windows-x64-v8.0.4.30.zip\cuda\bin
to:
# 2. NVIDIA GPU Computing Toolkit
\...\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
This is incorrect!!
It should be from:
# 1. cuDNN
\...\cudnn-11.0-windows-x64-v8.0.4.30.zip\cuda\bin
to:
# 2. NVIDIA GPU Computing Toolkit
\...\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
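To avoid this kind of mix-up, the copy can be scripted so that each cuDNN sub-folder always lands in the CUDA sub-folder of the same name. This is a rough sketch, not the guide's method: copy_cudnn is a hypothetical helper, the include and lib/x64 entries are my assumption about the archive layout (only bin is confirmed above), and the real paths have to be filled in by hand.

```python
import shutil
from pathlib import Path

# Each cuDNN sub-folder goes into the CUDA sub-folder of the same name.
# "include" and "lib/x64" are assumptions about the archive layout.
SUBDIRS = ("bin", "include", "lib/x64")


def copy_cudnn(cudnn_root, cuda_root):
    """Copy cuDNN files into the CUDA toolkit tree; return the copied paths."""
    copied = []
    for sub in SUBDIRS:
        source_dir = Path(cudnn_root) / sub
        target_dir = Path(cuda_root) / sub
        if not source_dir.is_dir():
            continue  # tolerate archives that lack a folder
        target_dir.mkdir(parents=True, exist_ok=True)
        for item in sorted(source_dir.iterdir()):
            if item.is_file():
                copied.append(Path(shutil.copy2(item, target_dir)))
    return copied
```

Here cudnn_root would be the extracted cuda folder of the cuDNN archive and cuda_root the v11.0 folder of the toolkit. Writing into Program Files usually requires an elevated prompt.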
After following the guide, I restarted my PC (do not skip this) and created a new environment with Python 3.8.11:
conda create -n tf python=3.8
From the command prompt, inside my new tf environment, I installed tensorflow 2.4.0 directly with pip:
pip install tensorflow==2.4.0
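This also installs the GPU functionality of tensorflow, whereas the Anaconda version installs CPU only when you call conda install tensorflow. Of course, it still does not work, because you now have numpy 1.20.3 installed (you can check with conda list numpy). Simply downgrade it with conda install numpy=1.19. On top of that, on my system the example provided in the guide: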
import tensorflow as tf  # needed for tf.keras.losses below
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Scale pixel values to the range [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.compile(optimizer='Adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
history = model.fit(train_images, train_labels, batch_size=10, epochs=100)
throws an error (at least for me):
NotFoundError: No algorithm worked!
[[node sequential/conv2d/Relu (defined at <ipython-input-1-bf665ec77ee4>:18) ]] [Op:__inference_train_function_580]
However, we are not interested in this example; we want to run an LSTM / GRU, not fix this example. So we will discard it and move on. Now we try:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense  # Dense is used for the output layer
from numpy.random import rand
X, y = rand(8000, 50, 5), rand(8000, 10)
model = keras.Sequential()
model.add(keras.Input(shape = (X.shape[1], X.shape[2])))
model.add(LSTM(units = 100))
model.add(Dense(units = 10))
Lo and behold, no errors!
model.compile(loss = 'mse', optimizer = 'adam')
Still no errors!
history = model.fit(X, y, epochs = 10)
Still no errors! But is it even using the GPU? The messages in the console do seem to indicate that it is:
2021-08-19 13:04:09.234795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Default GPU Device: /device:GPU:0
training model
2021-08-19 13:04:09.234795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.645028: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-19 13:04:10.647857: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-08-19 13:04:10.662783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.662799: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.667119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.667133: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.669347: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.670066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.675548: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.677202: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.677612: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.677658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.979738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.979763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 13:04:10.979770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 13:04:10.979886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.980387: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.980542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.980555: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.980563: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.980569: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.980575: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.980580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.980586: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.980592: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.980646: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.980676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.980693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.980698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 13:04:10.980703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 13:04:10.980744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.980757: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.984016: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.984082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.984094: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.984100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.984106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.984112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.984117: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.984122: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.984127: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.984132: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.984158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.984332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.984344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.984350: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.984355: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.984360: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.984365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.984369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.984374: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.984420: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.984445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.984470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.984475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 13:04:10.984479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 13:04:10.984533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.984546: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:11.334311: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
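Looking at Task Manager, I can see that memory is fully allocated and the 3D graph shows 99% utilization! Compared with using the CPU, the training time required was reduced by a quarter. All in all, a great success!
I now really hope that running a Conv2D network of my own design will not cause the same error as the example, but only time will tell. For now, this is good enough for my purposes.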