YOLOv4 reports a 30-hour training time on Colab Pro with only 340 training images

I am trying to test out my model on Colab Pro. I am only using 340 training images with 16 classes, purely for testing purposes. However, Colab Pro is telling me there are still about 30 hours of training time left:

(next mAP calculation at 1200 iterations) 
 Last accuracy mAP@0.5 = 0.37 %, best = 0.37 % 
 1187: 3.270728, 3.027621 avg loss, 0.010000 rate, 1.429193 seconds, 75968 images, 30.824708 hours left
Loaded: 1.136631 seconds - performance bottleneck on CPU or Disk HDD/SSD
...
...
...
 (next mAP calculation at 1300 iterations) 
 Last accuracy mAP@0.5 = 0.33 %, best = 0.37 % 
 1278: 3.231166, 2.967602 avg loss, 0.010000 rate, 2.552415 seconds, 81792 images, 30.512658 hours left
Loaded: 0.712928 seconds - performance bottleneck on CPU or Disk HDD/SSD

I don't know why it is doing this; I only have a small dataset.

Here are my cfg parameters:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=1024
height=1024
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
 
learning_rate=0.01
burn_in=1000
max_batches = {max_batches}
policy=steps
steps={steps_str}
scales=.1,.1
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers=-1
groups=2
group_id=1
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers = -1,-2
 
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
 
[route]
layers = -6,-1
 
[maxpool]
size=2
stride=2
 
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers=-1
groups=2
group_id=1
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers = -1,-2
 
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
 
[route]
layers = -6,-1
 
[maxpool]
size=2
stride=2
 
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers=-1
groups=2
group_id=1
 
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
 
[route]
layers = -1,-2
 
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
 
[route]
layers = -6,-1
 
[maxpool]
size=2
stride=2
 
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
 
##################################
 
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
 
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
size=1
stride=1
pad=1
filters={num_filters}
activation=linear
 
 
 
[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes={num_classes}
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
truth_thresh = 1
random=1
nms_kind=greedynms
beta_nms=0.6
ignore_thresh = .9 
iou_normalizer=0.5 
iou_loss=giou
 
[route]
layers = -4
 
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
 
[upsample]
stride=2
 
[route]
layers = -1, 23
 
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
 
[convolutional]
size=1
stride=1
pad=1
filters={num_filters}
activation=linear
 
[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes={num_classes}
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
ignore_thresh = .9 
iou_normalizer=0.5
iou_loss=giou
truth_thresh = 1
random=1
nms_kind=greedynms
beta_nms=0.6

Your training time is governed by the max_batches parameter, which is simply the total number of training iterations (batches) to run; it does not depend on the size of your dataset.
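In rough terms (an assumption about how darknet extrapolates the estimate, based on the numbers it prints):

 hours_left ≈ (max_batches - current_iteration) * average_seconds_per_iteration / 3600

so the countdown is driven by max_batches, not by how many images you have.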

Per the darknet repo's recommendation, max_batches should be classes*2000 (and not less than 6000). So in your case that is 16*2000 = 32,000 iterations. That is why it still needs this much time even though the dataset is small.
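Plugging your numbers in: at iteration 1187, the printed 30.824708 hours left works out to 30.824708 * 3600 / (32000 - 1187) ≈ 3.6 s per iteration on average, a bit above the 1.4 s of pure compute shown, which is consistent with the "performance bottleneck on CPU or Disk HDD/SSD" warning about data loading in your log.

Since your cfg is templated ({max_batches}, {steps_str}, {num_filters}, {num_classes}), here is a minimal sketch of how those values are conventionally derived. The file names and the template path are hypothetical; adjust them to whatever your notebook actually uses.

# Minimal sketch: fill the {placeholders} in the cfg above using the
# usual darknet conventions. "cfg_template.cfg" and the output file
# name are hypothetical; adjust to your setup.

num_classes = 16

# darknet README guideline: classes * 2000, but not less than 6000
max_batches = max(num_classes * 2000, 6000)                        # 32000
# learning-rate steps at 80% and 90% of max_batches
steps_str = f"{int(max_batches * 0.8)},{int(max_batches * 0.9)}"   # 25600,28800
# filters before each [yolo] layer: (classes + 5) * number_of_masks (3 here)
num_filters = (num_classes + 5) * 3                                # 63

with open("cfg_template.cfg") as f:
    cfg = f.read().format(
        max_batches=max_batches,
        steps_str=steps_str,
        num_filters=num_filters,
        num_classes=num_classes,
    )

with open("yolov4-tiny-custom.cfg", "w") as f:
    f.write(cfg)

If you only need a quick smoke test rather than a final model, you can hand-edit max_batches down (for example, to a value slightly past the burn_in of 1000) to shrink that 30-hour estimate, at the cost of accuracy.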