Quick Question: shuffling data in practice with shuffle_files in tfds.load

Question

When calling tfds.load with shuffle_files in the latest version of TF, if a dataset such as imagenet (which I believe is split into 1024 separate files) is loaded with:

tfds.load(name = 'imagenet', shuffle_files = True)

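this shuffles the order of the files, but it does not shuffle the actual images inside each of the 1024 files. Is there a reason for doing it this way in practice? Is it the same reason you would normally shuffle a set of 100 images before feeding it to a neural network?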

Thanks!

Answer 1

I think you are talking about 'imagenet2012', so your code should be:

ds = tfds.load('imagenet2012', split='train', shuffle_files=True)

If you really do mean imagenet, you need to look at this page: load imagenet

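Here the shuffle_files parameter shuffles the files as they are loaded, so the order of the files changes but the order of the examples inside each file does not. You should therefore also shuffle the dataset itself. There is a tutorial on how dataset shuffling works here: shuffle_repeat_explained. You can also read about how shuffle_files improves performance here: shuffle and training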
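To make the difference concrete, here is a small pure-Python sketch (no TensorFlow needed) of the two levels of shuffling. It is only an illustration under my own assumptions: the helper names `make_shards`, `read_shards` and `buffer_shuffle` are made up for this example and are not part of tfds. The buffer logic mimics the general idea of a shuffle buffer, which in tf.data you would get from `ds.shuffle(buffer_size)`. It is not the library's actual implementation.

```python
import random


def make_shards(num_shards, examples_per_shard):
    """Build toy 'files': shard i holds a run of consecutive example ids."""
    return [
        list(range(i * examples_per_shard, (i + 1) * examples_per_shard))
        for i in range(num_shards)
    ]


def read_shards(shards, shuffle_files, rng):
    """Yield examples shard by shard; optionally shuffle only the shard order."""
    order = list(range(len(shards)))
    if shuffle_files:
        rng.shuffle(order)
    for idx in order:
        # Examples inside one shard always come out in their stored order.
        yield from shards[idx]


def buffer_shuffle(stream, buffer_size, rng):
    """Shuffle a stream using a fixed-size buffer (hypothetical helper)."""
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        pos = rng.randrange(buffer_size)
        yield buffer[pos]        # emit a random buffered element
        buffer[pos] = item       # and put the new element in its slot
    rng.shuffle(buffer)          # drain whatever is left at the end
    yield from buffer


rng = random.Random(0)
shards = make_shards(num_shards=4, examples_per_shard=5)

# File-level shuffling only: shard order changes, each shard stays in sequence.
files_only = list(read_shards(shards, shuffle_files=True, rng=rng))

# File-level shuffling plus an example-level shuffle buffer.
both = list(
    buffer_shuffle(read_shards(shards, shuffle_files=True, rng=rng), 8, rng)
)

print(files_only)
print(both)
```

With file-level shuffling alone, every run of 5 consecutive ids stays together, so a batch drawn from one shard would always contain the same neighbours. Adding the buffer mixes examples from neighbouring shards. The larger the buffer is relative to the shard size, the closer the result gets to a full shuffle.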
