How to use tf.data's initializable iterators within a tf.estimator's input_fn?

Question

I would like to manage my training with a tf.estimator.Estimator, but I am having trouble using it alongside the tf.data API.

I have something like this:

def model_fn(features, labels, params, mode):
  # Defines the model's ops.
  # Initializes with tf.train.Scaffold.
  # Returns a tf.estimator.EstimatorSpec.
  ...

def input_fn():
  dataset = tf.data.TextLineDataset("test.txt")
  # map, shuffle, padded_batch, etc.

  iterator = dataset.make_initializable_iterator()

  return iterator.get_next()

estimator = tf.estimator.Estimator(model_fn)
estimator.train(input_fn)

Since I can't use make_one_shot_iterator for my use case, my issue is that input_fn contains an iterator that should be initialized within model_fn (here, I use tf.train.Scaffold to initialize local ops).

Besides, I understand that we can't just use input_fn = iterator.get_next, because then the other ops would not be added to the same graph.

What is the recommended way to initialize the iterator?

Answer 1

As of TensorFlow 1.5, input_fn can return a tf.data.Dataset directly, e.g.:

def input_fn():
  dataset = tf.data.TextLineDataset("test.txt")
  # map, shuffle, padded_batch, etc.
  return dataset

See c294fcfd.

For previous versions, you can add the iterator's initializer to the tf.GraphKeys.TABLE_INITIALIZERS collection and rely on the default initializer to run it:

tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)
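For context, here is a minimal sketch of how that line could fit into the question's input_fn. It is written against tf.compat.v1 so that it also builds in graph mode on later releases; on TF 1.x the plain tf. names and dataset.make_initializable_iterator() should behave the same way. The path and batch_size parameters and the single batch step are illustrative assumptions, not part of the original code.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x style graph-mode API


def input_fn(path="test.txt", batch_size=2):
    """Build the input pipeline and register the iterator's initializer.

    `path` and `batch_size` are hypothetical parameters added for this sketch.
    """
    dataset = tf1.data.TextLineDataset(path)
    # Stand-in for the question's "map, shuffle, padded_batch, etc."
    dataset = dataset.batch(batch_size)

    iterator = tf1.data.make_initializable_iterator(dataset)

    # Ops in this collection are grouped and run by tf.tables_initializer(),
    # so the iterator gets initialized together with lookup tables.
    tf1.add_to_collection(tf1.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)

    return iterator.get_next()
```

This relies on the default local init op of tf.train.Scaffold, which includes tf.tables_initializer(). If model_fn passes a Scaffold with a custom local_init_op, that op would presumably need to include tf.tables_initializer() (or iterator.initializer) itself.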

python

tensorflow

tensorflow-datasets

tensorflow-estimator