Does tf.data.Dataset.from_tensor_slices() preserve the order of examples?

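If I have a set of TFRecord files and use .from_tensor_slices() on them, will the resulting dataset preserve the order of the data? For example, suppose I have 3 TFRecord files (the first contains 40 examples, the second 30, and the third 70) named 1.tfrecord, 2.tfrecord, and 3.tfrecord, and I build dataset = tf.data.Dataset.from_tensor_slices(['1.tfrecord', '2.tfrecord', '3.tfrecord']). Will the order of the examples be preserved during loading?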

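If I understood your question correctly, then yes: the order of examples is preserved when you use tf.data.Dataset.from_tensor_slices with TFRecord files. from_tensor_slices yields the file names in the order they appear in the list, and each file is read sequentially from its first record to its last. Here is a simple example: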

import tensorflow as tf

with tf.io.TFRecordWriter("sample1.tfrecord") as w:
    w.write(b"Record A")
    w.write(b"Record B")

with tf.io.TFRecordWriter("sample2.tfrecord") as w:
    w.write(b"Record C")
    w.write(b"Record D")
    w.write(b"Record E")
    w.write(b"Record F")

with tf.io.TFRecordWriter("sample3.tfrecord") as w:
    w.write(b"Record G")
    w.write(b"Record H")
    w.write(b"Record I")
    w.write(b"Record J")
    w.write(b"Record K")
    w.write(b"Record L")

dataset = tf.data.Dataset.from_tensor_slices(["sample1.tfrecord",
                                              "sample2.tfrecord",
                                              "sample3.tfrecord"])
# Iterate over the file names in list order, then over the records in each file.
for record in dataset:
    for item in tf.data.TFRecordDataset(record):
        tf.print('Record:', record, 'Item -->', item)

Output:

Record: "sample1.tfrecord" Item --> "Record A"
Record: "sample1.tfrecord" Item --> "Record B"
Record: "sample2.tfrecord" Item --> "Record C"
Record: "sample2.tfrecord" Item --> "Record D"
Record: "sample2.tfrecord" Item --> "Record E"
Record: "sample2.tfrecord" Item --> "Record F"
Record: "sample3.tfrecord" Item --> "Record G"
Record: "sample3.tfrecord" Item --> "Record H"
Record: "sample3.tfrecord" Item --> "Record I"
Record: "sample3.tfrecord" Item --> "Record J"
Record: "sample3.tfrecord" Item --> "Record K"
Record: "sample3.tfrecord" Item --> "Record L"

Alternatively, pass the dataset of file names directly to tf.data.TFRecordDataset:

dataset = tf.data.Dataset.from_tensor_slices(["sample1.tfrecord",
                                              "sample2.tfrecord",
                                              "sample3.tfrecord"])
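# TFRecordDataset also accepts a dataset of file names and reads the files one after another.
for item in tf.data.TFRecordDataset(dataset):
    tf.print('Item -->', item)

Output:
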
Item --> "Record A"
Item --> "Record B"
Item --> "Record C"
Item --> "Record D"
Item --> "Record E"
Item --> "Record F"
Item --> "Record G"
Item --> "Record H"
Item --> "Record I"
Item --> "Record J"
Item --> "Record K"
Item --> "Record L"
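
If you would rather check the ordering programmatically than read printed output, a minimal sketch along the following lines may help. The temporary directory, the file names, and the record payloads are illustrative assumptions, not part of the original question. It collects the records through two sequential read paths and compares them with the order in which they were written. Note that it only covers sequential reading: adding shuffle(), or an interleave() with deterministic=False, falls outside this sketch and can change the order.

```python
import os
import tempfile

import tensorflow as tf

# Illustrative files and payloads (assumed for this sketch only).
payloads = {
    "sample1.tfrecord": [b"Record A", b"Record B"],
    "sample2.tfrecord": [b"Record C", b"Record D", b"Record E"],
    "sample3.tfrecord": [b"Record F"],
}

# Write each file into a temporary directory, remembering the path order.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, records in payloads.items():
    path = os.path.join(tmp_dir, name)
    with tf.io.TFRecordWriter(path) as writer:
        for record in records:
            writer.write(record)
    paths.append(path)

# The order we expect: file by file, record by record, as written.
expected = [record for records in payloads.values() for record in records]

# from_tensor_slices yields the file names in list order.
files = tf.data.Dataset.from_tensor_slices(paths)

# Path 1: flat_map opens one file at a time and exhausts it before the next.
sequential_records = list(files.flat_map(tf.data.TFRecordDataset).as_numpy_iterator())

# Path 2: hand the dataset of file names straight to TFRecordDataset
# (without num_parallel_reads, the files are read sequentially).
direct_records = list(tf.data.TFRecordDataset(files).as_numpy_iterator())

print(sequential_records == expected)
print(direct_records == expected)
```

Both comparisons should print True when the files are read sequentially as above.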