How to write TFRecords in a Kaggle notebook?

I am currently trying to use a Kaggle TPU with the CIFAR-10 dataset. The code below shows how I encode the data as TFRecords, but I don't know how to store them in a file afterwards.

import tensorflow as tf

def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def image_example(image, label, dimension):
    """Packs one image (a NumPy array) and its integer label into a tf.train.Example."""
    feature = {
        'dimension': _int64_feature(dimension),
        'label': _int64_feature(label),
        'image_raw': _bytes_feature(image.tobytes()),  # raw uint8 pixel bytes
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
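To see what each record actually contains, a single Example can be serialized and decoded again in memory, without any file. The sketch below uses a dummy 32x32x3 uint8 array in place of a real CIFAR-10 image and builds the same three features that image_example() produces; `tf.train.Example.FromString` is the standard protobuf parsing classmethod.

```python
import numpy as np
import tensorflow as tf

# Dummy stand-in for one CIFAR-10 image (32x32 RGB, uint8); not real data.
image = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)
label = 6

# Same keys and feature types as image_example() above.
example = tf.train.Example(features=tf.train.Features(feature={
    "dimension": tf.train.Feature(int64_list=tf.train.Int64List(value=[32])),
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
}))

# This byte string is what writer.write(...) stores for each record.
serialized = example.SerializeToString()

# Decode the bytes back into an Example and read the fields.
restored = tf.train.Example.FromString(serialized)
feats = restored.features.feature
restored_label = feats["label"].int64_list.value[0]
restored_dim = feats["dimension"].int64_list.value[0]
restored_image = np.frombuffer(
    feats["image_raw"].bytes_list.value[0], dtype=np.uint8
).reshape(restored_dim, restored_dim, 3)
```

Because `image.tobytes()` drops the shape and dtype, the reader has to know both (here uint8 and 32x32x3) to rebuild the array.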

And to write the data to a TFRecord file:

# x_train, y_train: the CIFAR-10 arrays, e.g. from tf.keras.datasets.cifar10.load_data()
record_file = './cifar10.tfrecords'
n_samples = x_train.shape[0]
dimension = x_train.shape[1]
depth = x_train.shape[3]

with tf.io.TFRecordWriter(record_file) as writer:
    for i in range(n_samples):
        image = x_train[i]
        label = y_train[i].item()  # plain Python int; Keras CIFAR-10 labels have shape (n, 1)
        tf_example = image_example(image, label, dimension)  # function defined above
        writer.write(tf_example.SerializeToString())  # serialize the Example and append it to the file

Now I thought I would only need to run this to get my data back:

data = tf.data.TFRecordDataset(record_file)
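TFRecordDataset only yields the serialized bytes of each record; parsing them needs a feature description that mirrors image_example(). The sketch below does a local round trip on CPU with two dummy images in a temporary file. The names `parse_record` and `feature_description` are illustrative, and the fixed 32x32x3 shape is an assumption matching CIFAR-10. On the TPU setup discussed in the answer, `record_file` would instead have to be a `gs://` path.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Must match the keys and types written by image_example().
feature_description = {
    "dimension": tf.io.FixedLenFeature([], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
    "image_raw": tf.io.FixedLenFeature([], tf.string),
}

def parse_record(serialized, dimension=32, depth=3):
    """Turns one serialized tf.train.Example back into (image, label)."""
    parsed = tf.io.parse_single_example(serialized, feature_description)
    # image_raw holds the bytes from image.tobytes(), so decode them as uint8.
    image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
    image = tf.reshape(image, [dimension, dimension, depth])
    return image, parsed["label"]

# Local round trip: write two dummy images filled with their label value.
record_file = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
with tf.io.TFRecordWriter(record_file) as writer:
    for label in (3, 7):
        img = np.full((32, 32, 3), label, dtype=np.uint8)
        ex = tf.train.Example(features=tf.train.Features(feature={
            "dimension": tf.train.Feature(int64_list=tf.train.Int64List(value=[32])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img.tobytes()])),
        }))
        writer.write(ex.SerializeToString())

# Read the file back and decode every record.
data = tf.data.TFRecordDataset(record_file).map(parse_record)
decoded = [(img.numpy(), int(lbl)) for img, lbl in data]
```

Using a static shape in `tf.reshape` (rather than the stored `dimension` feature) keeps the element shape known to `tf.data`, which batching on a TPU generally requires.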

If I try to parse the records, I get the following error:

UnimplementedError: File system scheme '[local]' not implemented (file: './cifar10.tfrecords')

Apart from that, it does nothing (in fact, it reinitializes the Kaggle session, as if I had never run anything before). Do you know what mistake I'm making?

Thanks a lot for any help!

UnimplementedError: File system scheme '[local]' not implemented (file: './cifar10.tfrecords')

This happens because the Cloud TPU cannot write (or read) files on the Kaggle/Colab local file system. The files need to be in a Google Cloud Storage (GCS) bucket.

https://cloud.google.com/tpu/docs/troubleshooting#cannot_use_local_filesystem
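In practice the fix is to point both the writer and the reader at a `gs://` path instead of `./cifar10.tfrecords`. The sketch below assumes a bucket you own (the name `your-bucket-name` is hypothetical), write credentials for it in the notebook, and read access for the TPU; `require_gcs_path`, `write_records` and `tpu_dataset` are illustrative helper names, not part of any library. The guard fails early with a readable message instead of the TPU's UnimplementedError.

```python
import tensorflow as tf

# Hypothetical bucket: replace with a GCS bucket you own. The notebook needs
# write credentials for it, and the TPU needs read access.
RECORD_FILE = "gs://your-bucket-name/cifar10.tfrecords"

def require_gcs_path(path):
    """Raises a clear error for local paths the TPU cannot read."""
    if not path.startswith("gs://"):
        raise ValueError(
            f"TPU input files must live in a GCS bucket (gs://...), got {path!r}"
        )
    return path

def write_records(path, serialized_examples):
    """Writes already-serialized tf.train.Example bytes to a GCS TFRecord file."""
    # tf.io.TFRecordWriter accepts gs:// paths through TensorFlow's GCS filesystem.
    with tf.io.TFRecordWriter(require_gcs_path(path)) as writer:
        for serialized in serialized_examples:
            writer.write(serialized)

def tpu_dataset(path):
    """Builds the input dataset from the bucket, not from the notebook's disk."""
    return tf.data.TFRecordDataset(require_gcs_path(path))
```

With the question's helpers, the writing step would become something like `write_records(RECORD_FILE, (image_example(x, y.item(), 32).SerializeToString() for x, y in zip(x_train, y_train)))`, followed by `tpu_dataset(RECORD_FILE)` on the TPU side.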