How do you write a fixed-length feature to a TFRecord?

Question

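I'm trying to learn the basics of writing TensorFlow TFRecord files. I'm writing a simple example with an ndarray in Python, but for some reason, when I read it back, it is treated as variable-length and comes back as a SparseTensor.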

The example is as follows:

def serialize_tf_record(features, targets):
    record = {
        'shape': tf.train.Int64List(value=features.shape),
        'features': tf.train.FloatList(value=features.flatten()),
        'targets': tf.train.Int64List(value=targets),
    }

    return build_tf_example(record)

def deserialize_tf_record(record):
    tfrecord_format = {
        'shape': tf.io.VarLenFeature(tf.int64),
        'features': tf.io.VarLenFeature(tf.float32),
        'targets': tf.io.VarLenFeature(tf.int64),
    }

    features_tensor = tf.io.parse_single_example(record, tfrecord_format)
    return features_tensor

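Can anyone explain to me why this writes a variable-length record? The length is fixed in the code, but I can't seem to write it in a way that lets TensorFlow know it is fixed. The TensorFlow documentation is pretty terrible here. Can someone clarify the API for me?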

Answer 1

You should provide more context code, such as your build_tf_example function and examples of your features and targets.

Here is an example that returns dense tensors:


import numpy as np
import tensorflow as tf

def build_tf_example(record):
    return tf.train.Example(features=tf.train.Features(feature=record)).SerializeToString()

def serialize_tf_record(features, targets):
    record = {
        'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=features.shape)),
        'features': tf.train.Feature(float_list=tf.train.FloatList(value=features.flatten())),
        'targets': tf.train.Feature(int64_list=tf.train.Int64List(value=targets)),
    }

    return build_tf_example(record)

def deserialize_tf_record(record):
    tfrecord_format = {
        'shape': tf.io.FixedLenSequenceFeature((), dtype=tf.int64, allow_missing=True),
        'features': tf.io.FixedLenSequenceFeature((), dtype=tf.float32, allow_missing=True),
        'targets': tf.io.FixedLenSequenceFeature((), dtype=tf.int64, allow_missing=True),
    }

    features_tensor = tf.io.parse_single_example(record, tfrecord_format)
    return features_tensor

def main():
    features = np.zeros((3, 5, 7))
    targets = np.ones((4,), dtype=int)
    tf.print(deserialize_tf_record(serialize_tf_record(features, targets)))


if __name__ == '__main__':
    main()

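I converted record into a dictionary of tf.train.Feature objects (so that it can be serialized easily).
As I understand it, each of your features can be an array (as opposed to a scalar value), so you can parse it with a FixedLenSequenceFeature input feature to build a dense tensor instead of a sparse one.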
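
If the sizes really are fixed and known ahead of time, another option is tf.io.FixedLenFeature. The serialized Example does not record whether a feature is "fixed" or "variable"; the parsing spec decides that, which is why VarLenFeature always yields a SparseTensor. The following is a minimal sketch, not part of the code above: the names FEATURES_SHAPE, TARGETS_LEN, serialize_fixed, and deserialize_fixed are made up for illustration, and the shapes are assumed to match the (3, 5, 7) and (4,) arrays used in main().

```python
import numpy as np
import tensorflow as tf

# Assumed shapes for illustration; they must be known when the parsing spec is built.
FEATURES_SHAPE = (3, 5, 7)
TARGETS_LEN = 4

def serialize_fixed(features, targets):
    # The values are written as flat lists; no shape information is stored here.
    feature = {
        'features': tf.train.Feature(
            float_list=tf.train.FloatList(value=features.flatten())),
        'targets': tf.train.Feature(
            int64_list=tf.train.Int64List(value=targets)),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()

def deserialize_fixed(record):
    # FixedLenFeature returns dense tensors of exactly the given shape and
    # raises an error if the number of stored values does not match.
    spec = {
        'features': tf.io.FixedLenFeature(FEATURES_SHAPE, tf.float32),
        'targets': tf.io.FixedLenFeature([TARGETS_LEN], tf.int64),
    }
    return tf.io.parse_single_example(record, spec)

parsed = deserialize_fixed(
    serialize_fixed(np.zeros(FEATURES_SHAPE, dtype=np.float32),
                    np.ones((TARGETS_LEN,), dtype=np.int64)))
print(parsed['features'].shape, parsed['targets'].shape)
```

With this spec there is no need to store a separate 'shape' feature or to reshape after parsing, at the cost of hard-coding the shape in the parser. If the shape varies between records, the FixedLenSequenceFeature approach above plus a stored 'shape' and a tf.reshape is the more flexible choice.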

python

tensorflow

tensorflow-datasets