What does reset actually mean in a TensorFlow 2 dataset?

Question

I am following the TensorFlow 2 Keras documentation. My model looks like this:

train_dataset = tf.data.Dataset.from_tensor_slices((np.array([_my_cus_func(i) for i in X_train]), y_train))
train_dataset = train_dataset.map(lambda vals,lab: _process_tensors(vals,lab), num_parallel_calls=4)
train_dataset = train_dataset.shuffle(buffer_size=10000)
train_dataset = train_dataset.batch(64,drop_remainder=True)
train_dataset = train_dataset.prefetch(1)
model=get_compiled_model()
model.fit(train_dataset, epochs=100)

The documentation says:

Note that the Dataset is reset at the end of each epoch, so it can be reused of the next epoch.

If you want to run training only on a specific number of batches from this Dataset, you can pass the steps_per_epoch argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

If you do this, the dataset is not reset at the end of each epoch, instead we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).

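What does reset actually mean? Will TensorFlow read the data from the tensor slices again after every epoch, or does it just reshuffle and rerun the map function? I want TensorFlow to read the data from numpy after each epoch and run _my_cus_func again. I could pass _my_cus_func through the dataset map or apply API, but I would prefer to run it on a Python list or numpy array.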

Answer 1

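In this context, reset means starting to iterate over the dataset from the beginning again. In your particular case, the code is missing a repeat() call. So if you specify the steps_per_epoch argument like this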

model.fit(train_dataset, steps_per_epoch=N, epochs=100)

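it will iterate over the dataset for N steps (batches) per epoch without resetting it in between, and training will terminate once the data is exhausted. If N is smaller than the number of batches in the dataset, that happens in a later epoch; if N is larger, the data runs out during the first epoch and training still terminates. If you add repeat,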

train_dataset = train_dataset.shuffle(buffer_size=10000).repeat()

it will start a new pass over the dataset whenever it reaches the end of the actual examples, rather than when a new epoch begins.
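
To make the meaning of "reset" concrete, here is a minimal sketch on a toy dataset, assuming TensorFlow 2.x running eagerly. The name python_preprocess is a hypothetical stand-in for _my_cus_func, and the sizes are illustrative only. It suggests that the Python-side preprocessing runs once, when the numpy array is built, while each new pass over the dataset only re-executes the tf.data pipeline (map, shuffle, batch) on the stored tensors. It also shows how repeat() lets you draw more batches than a single pass contains.

```python
import numpy as np
import tensorflow as tf

calls = {"python_preprocess": 0}

def python_preprocess(x):
    # Hypothetical stand-in for _my_cus_func: ordinary Python code.
    calls["python_preprocess"] += 1
    return x * 2

# The list comprehension runs here, once, before the dataset exists.
features = np.array([python_preprocess(i) for i in range(6)], dtype=np.int64)
labels = np.arange(6, dtype=np.int64)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .map(lambda vals, lab: (vals + 1, lab))  # part of the tf.data pipeline
    .shuffle(buffer_size=6)                  # reshuffles on each new pass by default
    .batch(2, drop_remainder=True)
)

# Each for-loop creates a fresh iterator, i.e. the dataset is "reset".
epoch_1 = [vals.numpy().tolist() for vals, _ in dataset]
epoch_2 = [vals.numpy().tolist() for vals, _ in dataset]

# Without repeat() one pass yields a fixed number of batches and then stops.
batches_per_pass = sum(1 for _ in dataset)

# With repeat() the pipeline restarts whenever the data runs out, so more
# batches than one pass contains can be drawn.
batches_drawn = sum(1 for _ in dataset.repeat().take(7))

print("python_preprocess calls:", calls["python_preprocess"])
print("epoch 1 batches:", epoch_1)
print("epoch 2 batches:", epoch_2)
print("batches per pass:", batches_per_pass, "| drawn with repeat():", batches_drawn)
```

Both passes contain the same values, usually in a different order, and the call counter stays at the number of input items. If the preprocessing really has to run again every epoch, it would need to live inside the dataset pipeline rather than in the list comprehension that builds the array.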

python

keras

tensorflow

tensorflow-datasets

tensorflow2.0