MirroredStrategy causes IndexError: pop from empty list when using Keras Sequences as model input

While the MirroredStrategy IndexError: pop from empty list is by now infamous and has many possible causes, such as those reported in several other questions, none of them applies to my use case.

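In my use case, I am using Keras Sequence objects to generate the training inputs, because I am working with large datasets (they do not fit in RAM) that have a single known positive class and unknown negatives.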

Following the tutorials available in the Keras documentation and the TensorFlow documentation, my code looks like the following:


import tensorflow as tf

my_training_sequence = MySequenceObject()

if tf.config.list_physical_devices('GPU'):
    # With no arguments, MirroredStrategy uses all the visible GPUs
    strategy = tf.distribute.MirroredStrategy()
else:
    # Use the Default Strategy
    strategy = tf.distribute.get_strategy()

with strategy.scope():
    model = CreateMyKerasModel()
    # While in the TensorFlow documentation the compilation step
    # is shown OUTSIDE the scope, in the Keras one it happens
    # within the scope.
    # I have found that it is NECESSARY to place it inside the scope,
    # as the Keras metrics need to be in the same strategy scope as the model
    # to work properly.
    model.compile(...)

# Then, OUTSIDE of the scope, run the fit,
# which causes the IndexError
model.fit(my_training_sequence)

Any ideas on how to deal with this problem?

After much pain, I realized that in the Keras documentation they make use of TensorFlow Dataset objects.

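Now, normal inputs such as vectors are converted to Datasets within the fit process and therefore do not cause problems, but Keras currently does not support converting Keras Sequences into Datasets under the hood. I do not know why this is, but fortunately it is relatively easy to write a method that converts a Sequence into a Dataset.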
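
As a rough illustration of the idea, the sketch below wraps a Sequence-like object in a zero-argument generator function that yields one batch per index. ToySequence and batch_generator are hypothetical names made up for this sketch, and plain lists stand in for NumPy arrays so that it runs without TensorFlow. With TensorFlow available, a callable like gen could presumably be handed to tf.data.Dataset.from_generator together with a description of the batch types and shapes.

```python
from typing import Callable, Iterator, List, Tuple

Batch = Tuple[List[List[int]], List[bool]]


class ToySequence:
    """Hypothetical stand-in for a Keras Sequence: indexable and with a length."""

    def __init__(self, batch_size: int, batches_per_epoch: int):
        self._batch_size = batch_size
        self._batches_per_epoch = batches_per_epoch

    def __len__(self) -> int:
        return self._batches_per_epoch

    def __getitem__(self, idx: int) -> Batch:
        # Deterministic placeholder data instead of real features and labels.
        X = [[idx] * 10 for _ in range(self._batch_size)]
        y = [(idx + row) % 2 == 0 for row in range(self._batch_size)]
        return X, y


def batch_generator(sequence) -> Callable[[], Iterator[Batch]]:
    """Return a zero-argument callable yielding every batch of one epoch."""

    def generator() -> Iterator[Batch]:
        for idx in range(len(sequence)):
            yield sequence[idx]

    return generator


gen = batch_generator(ToySequence(batch_size=4, batches_per_epoch=3))
batches = list(gen())
print(len(batches))  # prints 3
```

Because the callable builds a fresh generator on every call, it can be consumed once per epoch without any manual index bookkeeping.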

Unfortunately, the conversion depends on the version of TensorFlow you are using: in recent versions you want to use TensorSpec objects, while in older ones the combination of TensorFlow data types and TensorShape is enough.

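In the following example, I will show a high-level approach to writing a Keras Sequence class that can be converted into a Dataset. Afterwards, I will link all the Keras Sequences I have already implemented in this fashion as examples for posterity (or for myself, once I forget some detail of this devilish thing).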

import tensorflow as tf
import numpy as np
from packaging import version
from validate_version_code import validate_version_code


def tensorflow_version_is_higher_or_equal_than(tensorflow_version: str) -> bool:
    """Return whether the installed TensorFlow version is higher than or equal to the provided one.

    Parameters
    ----------------------
    tensorflow_version: str,
        The version of TensorFlow to check against.

    Raises
    ----------------------
    ValueError,
        If the provided version code is not a valid one.

    Returns
    ----------------------
    Boolean representing if installed TensorFlow version is higher than or equal to the given one.
    """
    if not validate_version_code(tensorflow_version):
        raise ValueError(
            (
                "The provided TensorFlow version code `{}` "
                "is not a valid version code."
            ).format(tensorflow_version)
        )
    return version.parse(tf.__version__) >= version.parse(tensorflow_version)


class ExampleSequence:
    """Keras Sequence convertible into a TensorFlow Dataset."""

    def __init__(
        self,
        # Parameters without a default must come before those with one
        batches_per_epoch: int,
        batch_size: int = 32,
        # Your other parameters go here
    ):
        """

        Parameters
        --------------------------------
        batches_per_epoch: int
            The number of batches within an epoch.
        batch_size: int = 32
            Size of the batches to generate,
            if the size is expected to be CONSTANT.
            Use None if some batches have a different size.
        """
        self._batch_size = batch_size
        self._batches_per_epoch = batches_per_epoch
        # Initialize the index of the batch for the Dataset calls
        self._current_index = 0
        # Your other parameters go here

    def __call__(self):
        """Return next batch using an infinite generator model."""
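        # Fetch the current batch first, so that batch 0 is not skipped,
        # then advance the index, wrapping around at the end of the epoch.
        batch = self[self._current_index]
        self._current_index = (self._current_index + 1) % self._batches_per_epoch
        return batch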

    def into_dataset(self) -> tf.data.Dataset:
        """Return dataset generated out of the current sequence instance.

        Implementation details
        ---------------------------------
        This method handles the conversion of this Keras Sequence into
        a TensorFlow dataset, also handling the proper dispatching according
        to what version of TensorFlow is installed in this system.

        Returns
        ----------------------------------
        Dataset to be used for the training of a model
        """

        ##################################################################
        # Handling dataset creation when TensorFlow is a modern version. #
        ##################################################################

        if tensorflow_version_is_higher_or_equal_than("2.5.0"):
            return tf.data.Dataset.from_generator(
                self,
                output_signature=(
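                    # The trailing comma matters: the inputs are a
                    # one-element tuple, matching the (X, ) returned below.
                    (
                        tf.TensorSpec(
                            shape=(self._batch_size, 10),
                            dtype=tf.uint32
                        ),
                    ),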
                    tf.TensorSpec(
                        shape=(self._batch_size,),
                        dtype=tf.bool
                    )
                )
            )

        return tf.data.Dataset.from_generator(
            self,
            output_types=(
                (tf.uint32, ),
                tf.bool
            ),
            output_shapes=(
                (tf.TensorShape([self._batch_size, 10]),),
                tf.TensorShape([self._batch_size, ]),
            )
        )

    def __getitem__(self, idx: int):
        """Return batch corresponding to given index.

        Parameters
        ---------------
        idx: int,
            Index corresponding to batch to be returned.

        Returns
        ---------------
        Return Tuple containing X and Y numpy arrays corresponding to given batch index.
        """
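        # Random placeholder data: the upper bound of 100 is arbitrary.
        X = np.random.randint(0, 100, size=(self._batch_size, 10), dtype=np.uint32)
        # np.bool is not available in every NumPy version, so cast to the builtin bool.
        y = np.random.randint(0, 2, size=(self._batch_size, )).astype(bool)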

        # Please do observe that the return value
        # has multiple layers of tuple wrapping, and they are ALL needed!
        # It is weird, but it is the only way this thing worked.
        return (((X, ), y,),)
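

The wrapping becomes less mysterious once you recall that tf.data.Dataset.from_generator expects a callable that returns something iterable: it calls the object and then iterates over whatever comes back. The sketch below emulates that in plain Python. OneBatchPerCall and consume are hypothetical names, lists stand in for arrays, and this is only an approximation of the behaviour, not TensorFlow's actual implementation.

```python
class OneBatchPerCall:
    """Hypothetical callable mimicking the __call__ of the class above."""

    def __call__(self):
        X = [[0] * 10 for _ in range(4)]  # placeholder features
        y = [False] * 4                   # placeholder labels
        # Outer tuple: the iterable consumed by the caller.
        # Inner tuple: one dataset element, i.e. ((inputs,), labels).
        return (((X,), y),)


def consume(generator):
    """Roughly what from_generator does: call once, then iterate the result."""
    return [element for element in generator()]


elements = consume(OneBatchPerCall())
(inputs,), labels = elements[0]
print(len(elements), len(inputs), len(labels))  # prints: 1 4 4
```

If this reading is right, each pass over the dataset yields a single batch, so chaining .repeat() on the dataset and passing steps_per_epoch to fit may be needed to train on more than one batch per epoch.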


Then, when you run the fit, you can use:

model.fit(my_training_sequence.into_dataset())