What is the intuition behind the Iterator.get_next method?

The name of the method, get_next(), is a little misleading. The documentation says:

Returns a nested structure of tf.Tensors representing the next element.

In graph mode, you should typically call this method once and use its result as the input to another computation. A typical loop will then call tf.Session.run on the result of that computation. The loop will terminate when the Iterator.get_next() operation raises tf.errors.OutOfRangeError. The following skeleton shows how to use this method when building a training loop:

dataset = ...  # A `tf.data.Dataset` object.
iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
next_element = iterator.get_next()

# Build a TensorFlow graph that does something with each element.
loss = model_function(next_element)
optimizer = ...  # A `tf.compat.v1.train.Optimizer` object.
train_op = optimizer.minimize(loss)

with tf.compat.v1.Session() as sess:
  # The initializable iterator must be initialized before the first run.
  sess.run(iterator.initializer)
  try:
    while True:
      sess.run(train_op)
  except tf.errors.OutOfRangeError:
    pass

Python also has a built-in function called next, which needs to be called every time we want the next element of an iterator. However, according to the documentation of get_next() quoted above, get_next() should be called only once and its result should be evaluated by calling the method run of the session, so this is a little unintuitive, because I was used to Python's built-in next. In this script, too, get_next() is called only once, and the result of that call is evaluated at every step of the computation.
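
For comparison, this is how iteration looks with Python's built-in next, where the call itself has to be repeated for every element (plain Python, included here just to contrast the two models):

it = iter([1, 2, 3, 4])
while True:
    try:
        # next() must be called again for every element
        print(next(it))
    except StopIteration:
        break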

What is the intuition behind get_next()? How is it different from next()? I think that, in the second example I linked above, the next element of the dataset (or of the feedable iterator) is retrieved every time the result of the first call to get_next() is evaluated with the method run, but this is a little counter-intuitive. I do not understand why we do not need to call get_next at every step of the computation (in order to get the next element of the feedable iterator), even after reading the following note in the documentation:

NOTE: It is legitimate to call Iterator.get_next() multiple times, e.g. when you are distributing different elements to multiple devices in a single step. However, a common pitfall arises when users call Iterator.get_next() in each iteration of their training loop. Iterator.get_next() adds ops to the graph, and executing each op allocates resources (including threads); as a consequence, invoking it in every iteration of a training loop causes slowdown and eventual resource exhaustion. To guard against this outcome, we log a warning when the number of uses crosses a fixed threshold of suspiciousness.
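
To see concretely what this note warns about, one can count the graph operations while calling get_next() inside the loop; this is a sketch, assuming TensorFlow 2.x with v2 behavior disabled:

import tensorflow as tf
tf.compat.v1.disable_v2_behavior()

dataset = tf.compat.v1.data.Dataset.range(5)
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
graph = tf.compat.v1.get_default_graph()

with tf.compat.v1.Session() as sess:
    for _ in range(3):
        # Anti-pattern: every call to get_next() adds new ops to the graph
        next_element = iterator.get_next()
        print(len(graph.get_operations()), sess.run(next_element))

The printed op count keeps growing, which is exactly the resource leak the note describes.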

In general, it is not clear how the iterators work.

The idea is that get_next adds some operations to the graph such that, every time you evaluate them, you get the next element in the dataset. On each iteration you just need to run the operations that get_next created; you do not need to create them over and over again.
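
As a minimal illustration of that point (a sketch, assuming TensorFlow 2.x with v2 behavior disabled):

import tensorflow as tf
tf.compat.v1.disable_v2_behavior()

dataset = tf.compat.v1.data.Dataset.range(4)
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
# Called once: this only adds ops to the graph
next_element = iterator.get_next()

with tf.compat.v1.Session() as sess:
    while True:
        try:
            # Evaluating the same ops again and again yields successive elements
            print(sess.run(next_element))
        except tf.errors.OutOfRangeError:
            break

This prints 0, 1, 2, 3: the graph is built once and only run repeatedly.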

Maybe a good way to gain some intuition is to try to write an iterator yourself. Consider something like the following:

import tensorflow as tf
tf.compat.v1.disable_v2_behavior()

# Make an iterator, returns next element and initializer
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    i = tf.Variable(0)
    # Check we are not out of bounds
    with tf.control_dependencies([tf.assert_less(i, tf.shape(data)[0])]):
        # Get next value
        next_val_1 = data[i]
    # Update index after the value is read
    with tf.control_dependencies([next_val_1]):
        i_updated = tf.compat.v1.assign_add(i, 1)
        with tf.control_dependencies([i_updated]):
            next_val_2 = tf.identity(next_val_1)
    return next_val_2, i.initializer

# Test
with tf.compat.v1.Graph().as_default(), tf.compat.v1.Session() as sess:
    # Example data
    data = tf.constant([1, 2, 3, 4])
    # Make operations that give you the next element
    next_val, iter_init = iterator_next(data)
    # Initialize iterator
    sess.run(iter_init)
    # Iterate until exception is raised
    while True:
        try:
            print(sess.run(next_val))
        # The assert above raises InvalidArgumentError when i goes out of bounds
        except tf.errors.InvalidArgumentError:
            break

Output:

1
2
3
4

Here iterator_next gives you something comparable to get_next in an iterator, plus an initialization operation. Every time you run next_val you get a new element from data; you do not need to call the function every time (which is how next works in Python), you call it once and then evaluate its result multiple times.
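
Note that, just like get_next, every call to iterator_next adds a fresh set of ops (including a new index variable) to the graph, which is exactly why you would not want to call it inside the loop.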

Edit: The function iterator_next above can also be simplified to the following:

def iterator_next(data):
    data = tf.convert_to_tensor(data)
    # Start from -1
    i = tf.Variable(-1)
    # First increment i
    i_updated = tf.compat.v1.assign_add(i, 1)
    with tf.control_dependencies([i_updated]):
        # Check i is not out of bounds
        with tf.control_dependencies([tf.assert_less(i, tf.shape(data)[0])]):
            # Get next value
            next_val = data[i]
    return next_val, i.initializer

Or even simpler:

def iterator_next(data):
    data = tf.convert_to_tensor(data)
    i = tf.Variable(-1)
    i_updated = tf.compat.v1.assign_add(i, 1)
    # Using i_updated directly as a value is equivalent to reading i under
    # a control dependency on i_updated
    with tf.control_dependencies([tf.assert_less(i_updated, tf.shape(data)[0])]):
        next_val = data[i_updated]
    return next_val, i.initializer
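
Both simplified versions behave the same as the original in the test loop above: once the incremented index reaches the length of data, the assert fails and sess.run raises tf.errors.InvalidArgumentError.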