Cannot convert a symbolic Tensor to Numpy array (using RTX 30xx GPU)

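I have googled every error and tried a lot of solutions, but I just cannot get TensorFlow to run an LSTM/GRU network for me. I used to be able to do this.

I installed it the prescribed way using Anaconda: conda create -n tf-gpu tensorflow-gpu, then installed jupyterlab, spyder, matplotlib, scikit-learn and pandas, and nothing else. There were no compatibility errors or warnings.

I start a notebook and try this: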

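# Imports the snippet relies on (TensorFlow 2.x Keras API)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, GRU, Dense, Dropout

def make_model(X_train, y_train):
    model = Sequential()
    # Each sample has shape (timesteps, features)
    model.add(InputLayer(input_shape = (X_train.shape[1], X_train.shape[2])))
    model.add(GRU(units = 100))
    model.add(Dense(units = 100, activation = 'relu'))
    model.add(Dropout(0.2))
    # One output unit per target column
    model.add(Dense(units = y_train.shape[1]))
    model.compile(loss = 'mse', optimizer = 'adam', metrics = ['mae'])
    return model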

But no matter what I do, I get this error:

NotImplementedError: Cannot convert a symbolic Tensor
(gru_1/strided_slice:0) to a numpy array. This error may indicate that
you're trying to pass a Tensor to a NumPy call, which is not supported

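Everything I could find about this error says it is a numpy version problem. I tried downgrading to 1.18.5 with pip, but that completely broke my environment. I am now trying this with Python 3.9, even though Anaconda tells me it is incompatible. This wild goose chase has gotten out of hand.

As far as I can tell I am not trying to do anything special. This should work out of the box, and if it does not, what is the point of Anaconda? The thing is, I am reusing code and data that I am certain worked at some point (about 9 months ago).

I started over in a fresh environment, this time installing tensorflow-gpu with conda install tensorflow-gpu instead of creating a complete environment in one go. After downgrading numpy to 1.18.5 with conda install numpy=1.18.5, it seemed to work! But now tensorflow does not detect my GPU...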

>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')] 

I followed this guide and concluded that conda had not installed cudnn or cudatoolkit. Running nvcc -V in a command prompt produced this output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:25:35_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0

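The guide says to run conda search cudnn and match the CUDA version in the listed build strings against the release reported by nvcc -V, so in my case: release 11.4. Of course, when I run conda search cudnn I get this: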

# Name                       Version           Build  Channel
cudnn                          7.1.4       cuda8.0_0  pkgs/main
cudnn                          7.1.4       cuda9.0_0  pkgs/main
cudnn                          7.3.1      cuda10.0_0  pkgs/main
cudnn                          7.3.1       cuda9.0_0  pkgs/main
cudnn                          7.6.0      cuda10.0_0  pkgs/main
cudnn                          7.6.0      cuda10.1_0  pkgs/main
cudnn                          7.6.0       cuda9.0_0  pkgs/main
cudnn                          7.6.4      cuda10.0_0  pkgs/main
cudnn                          7.6.4      cuda10.1_0  pkgs/main
cudnn                          7.6.4       cuda9.0_0  pkgs/main
cudnn                          7.6.5      cuda10.0_0  pkgs/main
cudnn                          7.6.5      cuda10.1_0  pkgs/main
cudnn                          7.6.5      cuda10.2_0  pkgs/main
cudnn                          7.6.5       cuda9.0_0  pkgs/main
cudnn                          7.6.5       cuda9.2_0  pkgs/main
cudnn                          8.2.1      cuda11.3_0  pkgs/main
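The matching step the guide asks for can be sketched in a few lines of Python. This is only an illustration, assuming the two command outputs have been pasted in as text; the helper names `nvcc_release` and `matching_cudnn` are hypothetical and not part of conda or CUDA.

```python
import re


def nvcc_release(nvcc_output):
    """Return the CUDA release (e.g. '11.4') found in `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def matching_cudnn(conda_search_output, release):
    """Return (version, build) pairs whose build string targets `release`."""
    matches = []
    for line in conda_search_output.splitlines():
        parts = line.split()
        # Data rows look like: cudnn 8.2.1 cuda11.3_0 pkgs/main
        if len(parts) >= 3 and parts[0] == "cudnn":
            version, build = parts[1], parts[2]
            if build.startswith("cuda" + release + "_"):
                matches.append((version, build))
    return matches


# Two rows copied from the listing above, plus the nvcc release line.
nvcc_text = "Cuda compilation tools, release 11.4, V11.4.48"
conda_text = """\
cudnn                          7.6.5      cuda10.2_0  pkgs/main
cudnn                          8.2.1      cuda11.3_0  pkgs/main
"""

release = nvcc_release(nvcc_text)
print(release, matching_cudnn(conda_text, release))  # no build for 11.4
print(matching_cudnn(conda_text, "11.3"))            # closest available build
```

With the listing above, the lookup for release 11.4 comes back empty, which is exactly the dead end described here.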

Having no other option, I decided to install 8.2.1 for build cuda11.3_0 in a new environment and then install tensorflow-gpu. Unsurprisingly, this did not work.

>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

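So I downloaded the cuda 11.3 drivers from here, but when I run nvcc -V the output stays the same. I am considering running DisplayDriverUninstaller (DDU) and trying again. But really, to get tensorflow-gpu working, it has to be 2 versions behind the latest!

My hardware: Ryzen 9 5950X, NVIDIA RTX 3060 Ti, 64GB DDR4 RAM

I am writing this before actually trying DDU, because I do not have access to the physical machine right now. If it changes anything, I will post an update tomorrow.

A completely different solution to this problem turned up. I realize it will not be good enough for a lot of people, but since my goal today is simple, I am going to take the win.

Steps to reproduce: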

  1. Use python 3.7
  2. Create a new environment
  3. Install Cuda 10.1
  4. Restart the PC (do not skip this!)
  5. In the new environment, run conda install tensorflow-gpu=2.1
  6. Then run pip install tensorflow-gpu==2.3

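Congratulations, if you had the same (but still unknown) problem I had, it should now be solved. Keep in mind that many other libraries (or their updates) that require python 3.8 or newer are now off the table, and that the tensorflow version you will be using is a year old.

Also, the tensorflow library (non-gpu) is still at version 2.1 in my environment. But before I break the environment again, I will stop here and leave that experiment to someone else.

edit: It turns out this only works from the command prompt, and it crashes without any error message. I tried a few things from Spyder's IPython console (honestly, I do not know how that works), without success.

Final, definitive answer:

Hardware:

  • Ryzen 9 5950X
  • 64GB DDR4 RAM
  • RTX 3060 Ti

I really wanted to work with Anaconda, because I am very familiar with it and everything I do runs in Anaconda. On top of that, I had this working in Anaconda last year without any problems, so it has to be possible!

The problem: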

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import LSTM
from numpy.random import rand


X, y = rand(8000, 50, 5), rand(8000, 10)

model = keras.Sequential()
model.add(keras.Input(shape = (X.shape[1], X.shape[2])))

So far, so good.

The next line:

model.add(LSTM(units = 100))

produces the following error:

NotImplementedError: Cannot convert a symbolic Tensor
(lstm_1/strided_slice:0) to a numpy array. This error may indicate that
you're trying to pass a Tensor to a NumPy call, which is not supported

Cause / solution: For a definitive answer I have to refer you to the TensorFlow developers, but I was able to deduce the following:

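Another post describes exactly the same problem I had, and there it was solved by downgrading numpy from 1.20.x to 1.19.x. The discussion on that post is interesting: basically, TensorFlow versions >2.3.x are compiled against numpy 1.19.5. Anaconda installs numpy 1.20.x by default when you use conda install tensorflow-gpu, and the two do not play nicely together. The downgrade itself is an easy fix.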
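A quick way to tell whether an environment is affected is to compare the installed numpy version against the 1.20 threshold mentioned above. The sketch below is a minimal illustration; `parse_version` and `numpy_needs_downgrade` are hypothetical helper names, and the 1.20 cut-off is taken from this post rather than from any official compatibility table.

```python
import re


def parse_version(text):
    """Turn '1.20.3' into (1, 20, 3); stops at the first non-numeric part."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def numpy_needs_downgrade(numpy_version, first_bad=(1, 20)):
    """True if the major.minor version is at or above the first problematic one."""
    return parse_version(numpy_version)[:2] >= first_bad


# Versions mentioned in this post.
for version in ("1.18.5", "1.19.5", "1.20.3"):
    print(version, numpy_needs_downgrade(version))
```

In a live environment the version string would come from `numpy.__version__` or from `conda list numpy`.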

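If you have an NVIDIA RTX 30xx GPU, you are not done yet!

Long story short, the RTX 30xx uses the Ampere architecture, which requires a newer version of CUDA, which in turn requires a newer version of TensorFlow: version 2.4.x or later, to be precise. At the time of writing, this version is not available on conda.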
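The dependency chain can be written down as a small lookup. The table below is an assumption: the CUDA and cuDNN versions are recalled from TensorFlow's published "tested build configurations" and agree with the versions that appear in this post (CUDA 10.1 for 2.1/2.3, cudart64_110.dll and cudnn64_8.dll for 2.4.0), but they should be checked against the official table. `TESTED_GPU_BUILDS` and `built_for_cuda11` are hypothetical names.

```python
# Assumed values; verify against TensorFlow's tested build configurations.
TESTED_GPU_BUILDS = {
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def built_for_cuda11(tf_version):
    """True if the listed build of this TensorFlow version targets CUDA 11 or newer."""
    major_minor = ".".join(tf_version.split(".")[:2])
    if major_minor not in TESTED_GPU_BUILDS:
        raise KeyError("no entry for TensorFlow " + major_minor)
    cuda_major = int(TESTED_GPU_BUILDS[major_minor]["cuda"].split(".")[0])
    return cuda_major >= 11


for version in ("2.1.0", "2.3.0", "2.4.0"):
    print(version, built_for_cuda11(version))
```

Only the 2.4 entry passes the check, which is why the older conda packages are of no use on an Ampere card.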

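As a result, all the convenience of conda automatically installing cuDNN and cudatoolkit is gone. Simply running pip install tensorflow==2.4.0 will not work. Worst of all, it may appear to work, only to stop suddenly with a completely random error after more than an hour of training. (Sorry, by that point I was ready to go berserk, it was late, and I did not write down the errors. There were many, and none of them were resolved.)

This guide explains in detail how to build cuDNN and CUDA from source. Before you follow the guide: go to Control Panel > Programs and Features and uninstall everything from NVIDIA that is not one of: NVIDIA graphics driver, NVIDIA GeForce Experience, NVIDIA HD audio driver, NVIDIA PhysX.

Another important note:

In the step Building CUDA/cuDNN: Step 3 there is a serious typo. The guide instructs you to copy files

from: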

# 1. cuDNN
\...\cudnn-11.0-windows-x64-v8.0.4.30.zip\cuda\bin

to:

# 2. NVIDIA GPU Computing Toolkit
\...\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include

This is incorrect!!

It should be from:

# 1. cuDNN
\...\cudnn-11.0-windows-x64-v8.0.4.30.zip\cuda\bin

to:

# 2. NVIDIA GPU Computing Toolkit
\...\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
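The rule behind the correction is that files go into the folder with the same name on the CUDA side (bin to bin). A minimal sketch of that copy step, assuming the cuDNN archive has already been extracted; `copy_cudnn_subdir` is a hypothetical helper, and the demo uses throwaway temporary folders rather than the real install paths, which stay elided as above.

```python
import shutil
import tempfile
from pathlib import Path


def copy_cudnn_subdir(cudnn_root, cuda_root, subdir):
    """Copy every file from cudnn_root/subdir into cuda_root/subdir (same name)."""
    source = Path(cudnn_root) / subdir
    target = Path(cuda_root) / subdir
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied


# Demo with temporary stand-in folders (not the real locations).
with tempfile.TemporaryDirectory() as tmp:
    cudnn_root = Path(tmp) / "cudnn" / "cuda"
    cuda_root = Path(tmp) / "CUDA" / "v11.0"
    (cudnn_root / "bin").mkdir(parents=True)
    (cudnn_root / "bin" / "cudnn64_8.dll").write_bytes(b"")
    print(copy_cudnn_subdir(cudnn_root, cuda_root, "bin"))
    print((cuda_root / "bin" / "cudnn64_8.dll").exists())
```

Writing to the real CUDA folder under Program Files normally needs an elevated prompt, so copying by hand in Explorer works just as well.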

After following this guide, I restarted my PC (do not skip this) and created a new environment with python 3.8.11:

conda create -n tf python=3.8

I installed tensorflow 2.4.0 with pip, directly from the command prompt, inside my new tf environment:

pip install tensorflow==2.4.0

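This also installs tensorflow's GPU functionality, whereas the Anaconda version installs only the CPU build when you call conda install tensorflow. Of course it still does not work, because you now have numpy 1.20.3 installed (you can check with conda list numpy). Simply downgrade it with conda install numpy=1.19. On top of that, on my system, the example provided in the guide: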

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

train_images, test_images = train_images / 255.0, test_images / 255.0

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

model.compile(optimizer='Adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

history = model.fit(train_images, train_labels, batch_size=10, epochs=100)

throws an error (at least for me):

NotFoundError:  No algorithm worked!
     [[node sequential/conv2d/Relu (defined at <ipython-input-1-bf665ec77ee4>:18) ]] [Op:__inference_train_function_580]

However, we are not interested in this example. We want to run LSTM / GRU, not fix this example, so we will discard it and move on. Now we try:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense
from numpy.random import rand


X, y = rand(8000, 50, 5), rand(8000, 10)

model = keras.Sequential()
model.add(keras.Input(shape = (X.shape[1], X.shape[2])))           

model.add(LSTM(units = 100))
model.add(Dense(units = 10))

Lo and behold, no errors!

model.compile(loss = 'mse', optimizer = 'adam')

Still no errors!

history = model.fit(X, y, epochs = 10)

Still no errors! But is it even using the GPU? The messages in the console do seem to suggest so:

2021-08-19 13:04:09.234795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Default GPU Device: /device:GPU:0
training model

2021-08-19 13:04:09.234795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.645028: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-19 13:04:10.647857: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-08-19 13:04:10.662783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.662799: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.667119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.667133: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.669347: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.670066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.675548: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.677202: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.677612: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.677658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.979738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.979763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-08-19 13:04:10.979770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-08-19 13:04:10.979886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.980387: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.980542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.980555: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.980563: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.980569: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.980575: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.980580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.980586: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.980592: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.980646: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.980676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.980693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.980698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-08-19 13:04:10.980703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-08-19 13:04:10.980744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.980757: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.984016: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.984082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.984094: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.984100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.984106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.984112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.984117: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.984122: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.984127: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.984132: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.984158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.984332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.984344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.984350: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.984355: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.984360: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.984365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.984369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.984374: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.984420: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.984445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.984470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.984475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-08-19 13:04:10.984479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-08-19 13:04:10.984533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.984546: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:11.334311: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
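Rather than reading through the whole log, the relevant lines can be picked out with a short script. This is only a sketch that assumes the log keeps the "Created TensorFlow device" wording shown above, which may differ between TensorFlow versions; `gpus_in_log` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: ..., ...)
DEVICE_PATTERN = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<memory_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
)


def gpus_in_log(log_text):
    """Return sorted unique (gpu_name, memory_mb) pairs found in the log text."""
    found = set()
    for match in DEVICE_PATTERN.finditer(log_text):
        found.add((match.group("name"), int(match.group("memory_mb"))))
    return sorted(found)


# One line copied from the log above.
sample_log = (
    "2021-08-19 13:04:10.979886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] "
    "Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU "
    "(device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, "
    "compute capability: 8.6)"
)

print(gpus_in_log(sample_log))
```

An empty result would mean TensorFlow never created a GPU device, which is the CPU-only situation from earlier in this post.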

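Looking at Task Manager, I can see that the GPU memory is fully allocated and that the 3D graph shows 99% utilization! Compared to using the CPU, the required training time dropped by a quarter. All in all, a great success!

I really hope that running a Conv2D network of my own design will not lead to the same error as the example, but only time will tell. For now, this is good enough for my purposes.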