Tensorflow tf.dataset.shuffle very slow

Question

I am training a VAE model on 9100 images (each of size 256 x 64), using an Nvidia RTX 3080. First, I load all the images into a numpy array of size 9100 x 256 x 64 called traindata. Then, to form the training dataset, I use

train_dataset = (tf.data.Dataset.from_tensor_slices(traindata).shuffle(len(traindata)).batch(batch_size))

Here I use a batch_size of 65. I have two main questions about what I see during training:

Question 1:

According to the docs, the whole dataset is re-shuffled for every epoch. However, training is very slow this way (around 50 seconds per epoch). For comparison, I trained without shuffling by not calling .shuffle(len(traindata)) when creating the dataset, and training was much faster (around 20 s/epoch). I am wondering why the .shuffle() operation is so slow and whether there is any way to make it faster. According to this StatsSE thread, shuffling is quite important for training, which is why I included the shuffle operation.

Question 2:

When I call .shuffle() while creating the dataset, Tensorflow always prints the following message

I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 4294967295

I searched online but still do not understand what this message means. Does it indicate an error, or is it just a warning that I can ignore?

Answer 1

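That is because holding all the elements of the dataset in a buffer is expensive. Unless you absolutely need perfect randomness, you should use a smaller buffer_size. All elements will still be taken eventually, but in a more deterministic order.

This is what happens with a smaller buffer_size, say 3. The buffer is shown in brackets, and Tensorflow samples a random value from inside the brackets. The sampled value is marked with ^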

1) [1 2 3]4 5 6 7 8 9
      ^
2) [1 3 4]5 6 7 8 9
        ^
3) [1 3 5]6 7 8 9
        ^
4) [1 3 6]7 8 9
    ^
5) [3 6 7]8 9

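And so on.

So earlier values will be taken earlier in your epoch, but you will still get some shuffling, and all samples will eventually be taken.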
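The walkthrough above can be reproduced with a small pure-Python simulation. This is only a simplified model of a fixed-size shuffle buffer, not Tensorflow's actual implementation; the function name buffered_shuffle and its seed parameter are made up for illustration.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order a fixed-size shuffle buffer could produce.

    Hypothetical helper for illustration only; it mimics the idea behind
    Dataset.shuffle(buffer_size), not its real implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    source = iter(items)
    buffer = []

    # Fill the buffer with the first buffer_size items.
    for item in source:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break

    # Emit a random buffered item, then put the next input item in its slot.
    for item in source:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item

    # Input is exhausted: drain what is left in the buffer in random order.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


order = list(buffered_shuffle(range(1, 10), buffer_size=3, seed=0))
print(order)
```

Every value appears exactly once, and the value emitted at position i (counting from 0) can never be larger than i + buffer_size, which is why early samples tend to come out early in the epoch. With buffer_size=1 the order is unchanged, and with buffer_size equal to the dataset length the result is a full uniform shuffle.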

tl;dr: reduce buffer_size by a lot
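Applied to the pipeline from the question, this could look roughly like the sketch below. The array here is a small all-zeros stand-in so the snippet runs quickly, and the buffer size of 130 is an arbitrary illustrative value, not a tuned recommendation; both are assumptions rather than something taken from the question.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the real image array (assumption: 650 images instead of 9100).
traindata = np.zeros((650, 256, 64), dtype=np.float32)
batch_size = 65

# Assumed illustrative value: far smaller than len(traindata), so only a
# small window of images is held in the shuffle buffer at any time.
shuffle_buffer_size = 130

train_dataset = (
    tf.data.Dataset.from_tensor_slices(traindata)
    .shuffle(shuffle_buffer_size)  # re-shuffled on each pass by default
    .batch(batch_size)
)

# One pass over the dataset corresponds to one epoch.
for batch in train_dataset:
    pass
```

Whether a given buffer size is random enough depends on how the images are ordered in traindata. If similar images are stored next to each other, a larger buffer (or shuffling the numpy array once before building the dataset) may be needed.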


