When training with Caffe, should file lists be sorted?

Question

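When training with Caffe without lmdb files, you must provide list files for the training and validation inputs. These two list files are usually called train.txt and val.txt. They share the same structure, shown below: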

/path/to/a/file.jpg 0
/path/to/another/file.jpg 0
...
/path/to/another/file.jpg M
/path/to/another/file.jpg M
...
/path/to/another/file.jpg N
/path/to/another/file.jpg N

for a set of N+1 classes.
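To make the list format concrete, here is a minimal Python sketch that parses such lines into (path, label) pairs and counts the entries per class. It is not part of Caffe; the helper name parse_list_file and the sample paths are made up for illustration.

```python
from collections import Counter


def parse_list_file(text):
    """Parse 'path label' lines into (path, label) pairs (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the label is always the final field.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    return entries


# Illustrative content only; a real train.txt would point at existing images.
sample = """/path/to/a/file.jpg 0
/path/to/another/file.jpg 0
/path/to/another/file.jpg 1
/path/to/another/file.jpg 2
"""

entries = parse_list_file(sample)
counts = Counter(label for _, label in entries)
print(len(counts), "classes,", len(entries), "entries")
```

Running a check like this on train.txt and val.txt is a quick way to confirm that the labels cover the expected range 0..N.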

train.txt and val.txt are then referenced in train_val.prototxt, in the sections for the TRAIN phase and the TEST phase respectively.

My question: should train.txt and val.txt be sorted by class number (that is, by the numeric second field)?

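Why I ask: in the examples, the files are always sorted by class number. If I shuffle the lines of train.txt and val.txt, training does not break: caffe.bin neither crashes nor reports a warning. On the other hand, I do not know whether Caffe reads train.txt and val.txt in line order or samples from them randomly.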

Answer 1

Caffe can read the list either in line order or shuffled: https://github.com/BVLC/caffe/blob/2a1c552b66f026c7508d390b526f2495ed3be594/src/caffe/layers/image_data_layer.cpp#L51

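To enable shuffling, add shuffle: true to the parameters of the ImageDataLayer, that is, inside the layer's image_data_param block in the prototxt (see https://github.com/BVLC/caffe/blob/2a1c552b66f026c7508d390b526f2495ed3be594/src/caffe/proto/caffe.proto#L810). Shuffling is off by default, so without it the entries are consumed in the order they appear in the list file.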
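The difference between the two reading modes can be sketched in a few lines of Python. This only mimics the ordering behaviour described above; it is not Caffe's implementation, and the names epoch_order and batches, the seed, and the file names are assumptions made for illustration.

```python
import random


def epoch_order(entries, shuffle=False, seed=None):
    """Return the order in which entries would be visited in one pass."""
    order = list(entries)  # copy, so the caller's list keeps its file order
    if shuffle:
        random.Random(seed).shuffle(order)
    return order


def batches(order, batch_size):
    """Cut the visiting order into consecutive mini-batches."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# A list sorted by class number, as in the question (illustrative names).
entries = [("a.jpg", 0), ("b.jpg", 0), ("c.jpg", 1), ("d.jpg", 1)]

sequential = epoch_order(entries)                      # shuffle off: file order
shuffled = epoch_order(entries, shuffle=True, seed=0)  # shuffle on: permuted

for batch in batches(sequential, 2):
    # With a class-sorted list and no shuffling, each batch holds one class.
    print(sorted({label for _, label in batch}))
```

With a class-sorted list and shuffling off, every batch in this toy example contains a single class. Either shuffling the list file beforehand or setting shuffle: true avoids that.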

configuration

machine-learning

training-data

caffe