How can you use a TPU from Google Colab in TensorFlow 2.0?

Question

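I am trying to use Google Colab to train a neural network on a Tensor Processing Unit (TPU). TensorFlow has just released a major version, 2.0, so I am trying to do this in TensorFlow 2.0. I have tried following three guides, but all of them were written for TensorFlow 1.14 and fail in TensorFlow 2.0:

1) Following the guide TPUs in Colab, I get the error: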

AttributeError: module 'tensorflow' has no attribute 'Session'

(from the line: with tf.Session(tpu_address) as session:)

2) Following the guide Simple Classification Model using Keras on Colab TPU, I get the same error.

3) Following the guide cloud_tpu_custom_training, I get the error:

AttributeError: module 'tensorflow' has no attribute 'contrib'

(from the line: resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=TPU_WORKER))

Does anyone have an example of training a neural network on a TPU in TensorFlow 2.0?

Edit: This issue also appears to have been raised on GitHub: InvalidArgumentError: Unable to find a context_id matching the specified one #1

Answer 1

Before running the code, go to

Edit --> Notebook Settings

and under that, select

Hardware Accelerator --> TPU
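
To confirm that the setting took effect, one option is to look for the COLAB_TPU_ADDR environment variable, which the walkthrough in Answer 4 reads to build the TPU address. This is a minimal sketch, assuming the Colab TPU runtime exports that variable as "host:port"; the helper name colab_tpu_address is made up for illustration.

```python
import os


def colab_tpu_address(environ=None):
    """Return the gRPC address of the Colab TPU, or None if none is attached.

    Assumes the Colab TPU runtime sets COLAB_TPU_ADDR to "host:port".
    """
    environ = os.environ if environ is None else environ
    addr = environ.get("COLAB_TPU_ADDR")
    return "grpc://" + addr if addr else None


address = colab_tpu_address()
if address is None:
    print("No TPU found; check the hardware accelerator setting.")
else:
    print("TPU available at", address)
```

If the helper returns None, the notebook is most likely still running on a CPU or GPU runtime.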

Answer 2

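TensorFlow 2.0 is not really backwards compatible with TensorFlow 1.X code. There are quite a few changes in how TensorFlow works between these versions, so I strongly recommend reading the official guide on how to migrate your code: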

https://www.tensorflow.org/guide/migrate#estimators

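I will say that the automatic conversion script, while technically successful, only changed my code to a compatibility version of the TensorFlow 1.X code. If you want to use any actual TensorFlow 2.0 features, you will probably need to change the code by hand.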

Answer 3

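First of all, the code given in the tutorials is not compatible with 2.x.

You need to select TPU as the runtime in Colab to execute the code on a TPU.
For the error

AttributeError: module 'tensorflow' has no attribute 'Session'

you need to use tf.compat.v1.Session(), because tf.Session is no longer available at the top level in 2.x.
Instead of tf.contrib.cluster_resolver, use tf.distribute.cluster_resolver.

Please refer to the Tensorflow Addon-repo to convert the code from 1.x to 2.x compatible.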
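
The two replacements above can be collected into a small lookup that flags lines still using 1.x-only names. This is only an illustrative sketch: the helper suggest_replacements and the table TF1_TO_TF2 are made up here, cover just the two names mentioned in this answer, and use plain substring matching rather than real parsing.

```python
# Hypothetical lookup of 1.x-only names and the 2.x names suggested above.
TF1_TO_TF2 = {
    "tf.Session": "tf.compat.v1.Session",
    "tf.contrib.cluster_resolver": "tf.distribute.cluster_resolver",
}


def suggest_replacements(source):
    """Return (line_number, old_name, new_name) for each 1.x-only name found."""
    suggestions = []
    for number, line in enumerate(source.splitlines(), start=1):
        for old, new in TF1_TO_TF2.items():
            if old in line:
                suggestions.append((number, old, new))
    return suggestions


# The two failing lines quoted in the question.
old_code = """with tf.Session(tpu_address) as session:
    pass
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=TPU_WORKER)
"""

for number, old, new in suggest_replacements(old_code):
    print(f"line {number}: replace {old} with {new}")
```

Swapping the names only removes the AttributeError; code built around sessions may still need the manual migration described in Answer 2.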

Answer 4

TensorFlow 2.1.0 finally added support for TPUs (as of January 8, 2020). From the release notes at https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0:

Experimental support for Keras .compile, .fit, .evaluate, and .predict is available for Cloud TPUs, Cloud TPU, for all types of Keras models (sequential, functional and subclassing models).

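The tutorial is available here: https://www.tensorflow.org/guide/tpu

For completeness, I will add a walkthrough here:

Go to Google Colab and create a new Python 3 notebook here: https://colab.research.google.com/
In the toolbar, click Runtime / Change runtime type, then select "TPU" under Hardware accelerator.
Copy and paste the code below into the notebook and click Run cell (the play button).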

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import os
import tensorflow_datasets as tfds

# Distribution strategies
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# MNIST model
def create_model():
  return tf.keras.Sequential(
      [tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(128, activation='relu'),
       tf.keras.layers.Dense(10)])

# Input datasets
def get_dataset(batch_size=200):
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True,
                             try_gcs=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.0

    return image, label

  train_dataset = mnist_train.map(scale).shuffle(10000).batch(batch_size)
  test_dataset = mnist_test.map(scale).batch(batch_size)

  return train_dataset, test_dataset

# Create and train a model
strategy = tf.distribute.experimental.TPUStrategy(resolver)
with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])

train_dataset, test_dataset = get_dataset()

model.fit(train_dataset,
          epochs=5,
          steps_per_epoch=50,
          validation_data=test_dataset)

Note that when I ran the code from the TensorFlow tutorial as-is, I got the error below. I fixed it by adding the steps_per_epoch argument to model.fit().

ValueError: Number of steps could not be inferred from the data, please pass the steps_per_epoch argument.
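
The value 50 is a choice rather than a requirement. Here is a rough sketch of how one might work out the number of steps for a full pass over the data, assuming the MNIST training split holds 60,000 examples and using the batch size of 200 from get_dataset(); the helper steps_for_full_pass is made up for illustration.

```python
import math


def steps_for_full_pass(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


NUM_TRAIN_EXAMPLES = 60000  # assumed size of the MNIST training split
BATCH_SIZE = 200            # default batch size in get_dataset() above

full_pass = steps_for_full_pass(NUM_TRAIN_EXAMPLES, BATCH_SIZE)
print(full_pass)  # 300
```

With steps_per_epoch=50, each epoch covers 50 * 200 = 10,000 examples rather than the whole training split.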

Answer 5

Upgrading TensorFlow ("tf") will resolve the issue above.

!pip install tensorflow==2.7.0
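
Since Answer 4 notes that Keras TPU support arrived in 2.1.0, it can help to check the installed version after upgrading. This is a small sketch, assuming version strings of the form major.minor.patch with an optional suffix such as -rc1; the helper has_keras_tpu_support is made up for illustration and takes the string that tf.__version__ would return.

```python
def has_keras_tpu_support(version):
    """Return True if the version string is at least 2.1."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) >= (2, 1)


# In a notebook, pass tf.__version__ instead of a literal string.
print(has_keras_tpu_support("2.0.0"))  # False
print(has_keras_tpu_support("2.7.0"))  # True
```

After a pip upgrade in Colab, the runtime usually has to be restarted before the new version is picked up.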
