tf.data.Dataset: Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized

Question

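I need to convert a Sequence-based data generator to the tf.data.Dataset format. To do so, I use the from_generator function to create a repeating BatchedDataset for each of my training, validation, and test sets.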

  dataset = tf.data.Dataset.from_generator(gen_function,
                                           output_signature=output_signature)
  dataset = dataset.shuffle(shuffle_buffer,
                            reshuffle_each_iteration=True)
  dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)

The resulting datasets are then used to fit the model:

OCR.model.fit(x=training_generator,
              validation_data=validation_generator,
              steps_per_epoch=steps_per_epoch, 
              epochs=epochs,
              use_multiprocessing=True,
              callbacks=callbacks,
              workers=workers,
              verbose=verbose)

This results in the following error:

    /user/.../python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py, 
    line 739, in _validate_args raise ValueError(
    ValueError: When providing an infinite dataset, you must specify the number of 
    steps to run (if you did not intend to create an infinite dataset, make sure to 
    not call `repeat()` on the dataset).
    [date time]: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error 
    occurred when finalizing GeneratorDataset iterator: Failed precondition: Python 
    interpreter state is not initialized. The process may be terminated.
         [[{{node PyFunc}}]]

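This is confusing, because I did specify the number of steps for my repeating, infinite dataset, as the message suggests. Moreover, when I previously used the Sequence-based data generator, it worked with steps_per_epoch specified in exactly this way.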

Answer 1

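The fix is simple: in addition to steps_per_epoch, specify the validation_steps argument in the call to fit. The validation dataset is also repeated indefinitely, so Keras needs to know how many batches to draw in each validation pass.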
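As a rough illustration, here is a minimal sketch of a fit call that passes both step counts. The generator body, the tensor shapes, the tiny model, and the step values are all made up for the example and merely stand in for the question's gen_function, output_signature, and OCR.model.

```python
import numpy as np
import tensorflow as tf


def gen_function():
    # Hypothetical stand-in for the question's generator:
    # yields (features, label) pairs of fixed shape.
    rng = np.random.default_rng(0)
    for _ in range(32):
        yield rng.random(4, dtype=np.float32), np.ones(1, dtype=np.float32)


# Assumed signature matching the toy generator above.
output_signature = (
    tf.TensorSpec(shape=(4,), dtype=tf.float32),
    tf.TensorSpec(shape=(1,), dtype=tf.float32),
)


def make_dataset(batch_size=8, shuffle_buffer=16):
    # Same pipeline as in the question: shuffle, repeat forever, then batch.
    dataset = tf.data.Dataset.from_generator(
        gen_function, output_signature=output_signature)
    dataset = dataset.shuffle(shuffle_buffer, reshuffle_each_iteration=True)
    dataset = dataset.repeat()
    return dataset.batch(batch_size)


training_dataset = make_dataset()
validation_dataset = make_dataset()

# Toy model, only here so that fit() can run.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

history = model.fit(
    x=training_dataset,
    validation_data=validation_dataset,
    steps_per_epoch=4,    # batches per training epoch
    validation_steps=2,   # batches per validation pass (the missing argument)
    epochs=1,
    verbose=0,
)
```

The sketch leaves out use_multiprocessing and workers, since Keras documents those arguments as applying to Python generator or Sequence input rather than to tf.data.Dataset input.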

python

data-generation

keras

tensorflow

tf.data.dataset