Keras/TensorFlow training on GCP with TPU

I am trying to train a model on GCP using Keras and TensorFlow 1.15. So far, my code is similar to what I would run on Colab, namely:

# TPUs
import tensorflow as tf
print(tf.__version__)
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver("tpu-name")
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
print("Number of accelerators: ", tpu_strategy.num_replicas_in_sync)


import numpy as np


np.random.seed(123)  # for reproducibility
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Input
from tensorflow.keras import utils
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.models import Model

# 4. Load data into train and test sets
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) =  load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
print(X_train.shape, X_test.shape)

# 5. Preprocess input data
#X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
#X_test = X_test.reshape(X_test.shape[0], 28, 28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0

print(y_train.shape, y_test.shape)
# 6. Preprocess class labels One hot encoding
Y_train = utils.to_categorical(y_train, 2)
Y_test = utils.to_categorical(y_test, 2)
print(Y_train.shape, Y_test.shape)

with tpu_strategy.scope():
  model = make_model((img_size, img_size, 3))
  # 8. Compile model
  model.compile(loss='categorical_crossentropy',
                optimizer="sgd",
                metrics=['accuracy'])

model.summary()

batch_size = 1250 * tpu_strategy.num_replicas_in_sync
# 9. Fit model on training data
model.fit(X_train, Y_train, steps_per_epoch=len(X_train)//batch_size,  
            epochs=5, verbose=1)

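But my data is in a bucket, while my code runs on a VM. So how do I load it? I tried loading my data with "gs://BUCKETS", but it doesn't work. What should I do? Edit: I am adding the code I use to load the data, which I had forgotten, sorry.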

def load_data(sets="dogcats/train/", k = 5000, target_size=250):
  # define location of dataset
  folder = sets
  photos, labels = list(), list()
  # determine class
  output = 0.0
  for i, dog in enumerate(listdir(folder + "dogs/")):
    if i >= k:
      break
    # load image
    photo = load_img(folder + "dogs/" +dog, target_size=(target_size, target_size))
    # convert to numpy array
    photo = img_to_array(photo)
    # store
    photos.append(photo)
    labels.append(output)

  output = 1.0

  for i, cat in enumerate(listdir(folder + "cats/") ):
    if i >= k:
      break
    # load image
    photo = load_img(folder + "cats/"+cat, target_size=(target_size, target_size))
    # convert to numpy array
    photo = img_to_array(photo)
    # store
    photos.append(photo)
    labels.append(output)

  # convert to a numpy arrays
  photos = asarray(photos)
  labels = asarray(labels)
  print(photos.shape, labels.shape)
  photos, labels = shuffle(photos, labels, random_state=0)
  return photos, labels

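EDIT2: Completing @daudnadeem's answer, in case someone else runs into the same situation.

My goal was to fetch images from the bucket, and the code in the answer works well and returns a bytes object. To convert it into an image, you only need the PIL library: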

from PIL import Image
from io import BytesIO
import numpy as np

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()

img = Image.open(BytesIO(data))
img = np.array(img)

(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) =  load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)

This obviously won't work, because all you are doing is passing a string to sets. What you need to do is download the data as a string, and then use it.
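
To make the failure concrete: listdir and load_img treat the "gs://..." string as a local path, so the bucket is never contacted. Below is a minimal sketch of splitting such a URI into the bucket name and object prefix that the storage client expects. split_gs_uri is a hypothetical helper written for illustration, not part of any library.

```python
import os


def split_gs_uri(uri):
    """Split 'gs://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError("not a gs:// URI: %r" % uri)
    # Everything up to the first "/" is the bucket, the rest is the prefix.
    bucket, _, prefix = uri[len(scheme):].partition("/")
    return bucket, prefix


uri = "gs://BUCKETS/dogscats/train/"
# os.path only looks at the local filesystem; it knows nothing about GCS.
print(os.path.isdir(uri))
print(split_gs_uri(uri))
```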

First, install the package: pip install google-cloud-storage or pip3 install google-cloud-storage

pip -> Python

pip3 -> Python3

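Take a look at this: you need a service account to interact with GCP from your code. It is used for authentication.

Once you have the service account JSON file, you need to do one of the following two things:

Set it as an environment variable: export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"

Or, the approach I prefer: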

gcloud auth activate-service-account \
  <replace-with-email-from-json-file> \
          --key-file=<path/to/your/json/file> --project=<name-of-your-gcp-project>
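
Before creating the client, it can help to check from Python that the key file is actually visible. This is a small sketch that assumes the environment-variable route. credentials_path is a hypothetical helper name.

```python
import os


def credentials_path(env=os.environ):
    """Return the service account key path, or raise if it is unusable."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("key file not found: %s" % path)
    return path
```

If you would rather not depend on the environment variable, storage.Client.from_service_account_json(path) builds a client directly from the key file.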

Now let's see how to download a file as a string using the google-cloud-storage library:

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()

Now that you have the data as a string, you can simply pass data into load_data, like so: (X_train, y_train) = load_data(sets=data,target_size=img_size)
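
Because a blob is a single file, loading a whole folder such as dogs/ means listing the blobs under a prefix and decoding each one. The sketch below combines the download with the PIL decoding from EDIT2. The names label_for, decode_image and load_data_from_bucket are hypothetical. The dogs/ and cats/ layout and the label values come from the question's load_data. Treat it as one possible approach under those assumptions, not the answer's exact method.

```python
from io import BytesIO


def label_for(blob_name):
    """Map an object name to the question's labels: dogs -> 0.0, cats -> 1.0."""
    # Assumes the class folder sits below a prefix, e.g. "dogscats/train/dogs/x.jpg".
    if "/dogs/" in blob_name:
        return 0.0
    if "/cats/" in blob_name:
        return 1.0
    return None  # not one of the two class folders


def decode_image(data, target_size):
    """Turn raw image bytes into a (target_size, target_size, 3) array."""
    import numpy as np
    from PIL import Image
    img = Image.open(BytesIO(data)).convert("RGB")
    return np.asarray(img.resize((target_size, target_size)))


def load_data_from_bucket(bucket_name, prefix, k=5000, target_size=250):
    """Download up to k images per class found under prefix."""
    import numpy as np
    from google.cloud import storage  # requires credentials to be configured
    client = storage.Client()
    photos, labels, seen = [], [], {0.0: 0, 1.0: 0}
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        label = label_for(blob.name)
        # Skip other objects, "directory" placeholders, and classes already full.
        if label is None or blob.name.endswith("/") or seen[label] >= k:
            continue
        photos.append(decode_image(blob.download_as_string(), target_size))
        labels.append(label)
        seen[label] += 1
    return np.asarray(photos), np.asarray(labels)
```

Usage would look like X_train, y_train = load_data_from_bucket("BUCKETS", "dogscats/train/", target_size=img_size). Unlike the question's load_data, this sketch does not shuffle the result.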

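This sounds complicated, but here is a quick pseudo-layout:

  1. Install google-cloud-storage
  2. Go to the Google Cloud Platform console -> IAM & Admin -> Service Accounts
  3. Create a service account with the relevant permissions (google-cloud-storage)
  4. Download the (JSON) key file and remember its location.
  5. Activate the service account
  6. Download the file as a string and pass that string to your load_data(data)

Hope this helps!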