KeyError: 'val_loss' when training model

Question

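I am training a model with Keras and get an error in a callback of the fit_generator function. Training always runs until the 3rd epoch and then fails with this error.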

    annotation_path = 'train2.txt'
    log_dir = 'logs/000/'
    classes_path = 'model_data/deplao_classes.txt'
    anchors_path = 'model_data/yolo_anchors.txt'
    class_names = get_classes(classes_path)
    num_classes = len(class_names)
    anchors = get_anchors(anchors_path)

    input_shape = (416,416) # multiple of 32, hw

    is_tiny_version = len(anchors)==6 # default setting
    if is_tiny_version:
        model = create_tiny_model(input_shape, anchors, num_classes,
            freeze_body=2, weights_path='model_data/tiny_yolo_weights.h5')
    else:
        model = create_model(input_shape, anchors, num_classes,
            freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze

    logging = TensorBoard(log_dir=log_dir)
    checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
        monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)

    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)


[error]
Traceback (most recent call last):
  File "train.py", line 194, in <module>
    _main()
  File "train.py", line 69, in _main
    callbacks=[logging, checkpoint])
  File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\engine\training_generator.py", line 251, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\callbacks.py", line 79, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\callbacks.py", line 429, in on_epoch_end
    filepath = self.filepath.format(epoch=epoch + 1, **logs)
KeyError: 'val_loss'

Can anyone spot the problem and help me?

Thanks in advance for your help.

Answer 1

This callback runs at the end of epoch 3.

    checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
        monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)

The error says that the logs variable contains no val_loss when this line executes:

filepath = self.filepath.format(epoch=epoch + 1, **logs)

This happens if fit is called without validation_data.

I would start by simplifying the path name of the model checkpoint. Including the epoch in the name is probably enough.
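The failure can be reproduced without Keras, because the line from the traceback is an ordinary str.format call on the checkpoint path. A minimal sketch, assuming a hypothetical logs dict that holds only the training loss, as it would when no validation data is passed; the render helper and the numbers are made up for illustration:

```python
# Hypothetical logs dict for an epoch trained without validation data.
logs = {'loss': 12.3456}

full_template = 'logs/000/ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
simple_template = 'logs/000/ep{epoch:03d}.h5'


def render(template, epoch, logs):
    """Mimic the call from the traceback: filepath.format(epoch=epoch + 1, **logs)."""
    return template.format(epoch=epoch + 1, **logs)


try:
    render(full_template, 2, logs)
    error = None
except KeyError as exc:
    error = exc.args[0]  # name of the field missing from logs

print(error)                             # val_loss
print(render(simple_template, 2, logs))  # logs/000/ep003.h5
```

The simplified template only needs epoch, which the callback always supplies, so it formats whether or not validation ran.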

Answer 2

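This answer does not apply to the question, but this page is the top Google result for keras "KeyError: 'val_loss'", so I will share the solution to my problem.

The error was the same for me: when I used val_loss in the checkpoint file name, I got KeyError: 'val_loss'. My checkpoint was also monitoring that field, so even after I removed the field from the file name, I still got this warning from the checkpoint: WARNING:tensorflow:Can save best model only with val_loss available, skipping.

In my case, the problem was that I had upgraded from using Keras and TensorFlow 1 separately to using the Keras that ships with TensorFlow 2. The period argument of ModelCheckpoint has been replaced by save_freq. I wrongly assumed that save_freq behaved the same way, so I set save_freq=1, thinking this would save every epoch. However, the docs state: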

save_freq: 'epoch' or integer. When using 'epoch', the callback saves the model after each epoch. When using integer, the callback saves the model at end of a batch at which this many samples have been seen since last saving. Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable (it could reflect as little as 1 batch, since the metrics get reset every epoch). Defaults to 'epoch'

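Setting save_freq='epoch' solved the problem for me. Note: the OP is still using the period argument (period=3 in the question), so this is definitely not what caused their problem.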
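A sketch of checkpoint settings that save at the end of each epoch, assuming the Keras bundled with TensorFlow 2. The helper name checkpoint_kwargs and the file names are assumptions made for illustration; the keyword names (filepath, monitor, save_weights_only, save_best_only, save_freq) are ModelCheckpoint arguments. The dict is built separately so it can be inspected without importing TensorFlow:

```python
def checkpoint_kwargs(log_dir, has_validation):
    """Build keyword arguments for ModelCheckpoint (hypothetical helper)."""
    if has_validation:
        name, monitor = 'ep{epoch:03d}-val_loss{val_loss:.3f}.h5', 'val_loss'
    else:
        # Without validation data only training metrics appear in the logs.
        name, monitor = 'ep{epoch:03d}-loss{loss:.3f}.h5', 'loss'
    return {
        'filepath': log_dir + name,
        'monitor': monitor,
        'save_weights_only': True,
        'save_best_only': True,
        # 'epoch' saves at epoch ends; an integer saves at batch ends instead.
        'save_freq': 'epoch',
    }


kwargs = checkpoint_kwargs('logs/000/', has_validation=True)
# With TensorFlow 2 installed:
#   from tensorflow.keras.callbacks import ModelCheckpoint
#   checkpoint = ModelCheckpoint(**kwargs)
```

Keeping the file name and the monitored metric in one place makes it harder for one of them to reference val_loss when the other does not.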

Answer 3

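For me, the problem was that I was trying to set initial_epoch (in model.fit) to something other than the default 0. I was doing so because I run model.fit in a loop that trains for 10 epochs per iteration, then retrieves the history data, checks whether the loss has decreased, and runs model.fit again until satisfied.
I thought I had to update that value because I was restarting from the previous model, but apparently not...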

switch = True
epoch = 0
wait = 0
previous = 10E+10
while switch:
    history = model.fit( X, y, batch_size=1, epochs=step, verbose=False )
    epoch += step
    current = history.history["loss"][-1]
    if current >= previous:
        wait += 1
        if wait >= tolerance:
            switch = False
    else:
        wait = 0
    if epoch >= max_epochs:
        switch = False
    previous = current
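
The stopping rule in the loop above can be checked in isolation, without a model. A minimal sketch, assuming the final loss of each model.fit call has already been collected into a list; the function name and the sample numbers are made up for illustration:

```python
def chunks_until_stop(chunk_losses, tolerance, step, max_epochs):
    """Return how many fit calls run before the loop above would stop."""
    previous = float('inf')
    wait = 0
    epoch = 0
    for n, current in enumerate(chunk_losses, start=1):
        epoch += step
        if current >= previous:
            # Loss did not improve compared with the previous chunk.
            wait += 1
            if wait >= tolerance:
                return n
        else:
            wait = 0
        if epoch >= max_epochs:
            return n
        previous = current
    return len(chunk_losses)


losses = [1.0, 0.8, 0.9, 0.95, 0.7]
print(chunks_until_stop(losses, tolerance=2, step=10, max_epochs=1000))  # 4
print(chunks_until_stop(losses, tolerance=2, step=10, max_epochs=20))    # 2
```

As in the original loop, each loss is compared with the loss of the previous chunk, not with the best loss seen so far.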

Answer 4

In my case, val_generator broke when the Colab notebook tried to read images from Google Drive. I re-ran the cell that creates val_generator and it worked.

Answer 5

I don't know whether this applies in all cases, but for me, restarting my computer seemed to fix it.

Answer 6

Use val_accuracy in both the file path and the checkpoint. If that still does not help, restart your computer or Colab.

Answer 7

This error occurs when no validation data is provided to the model. Check the arguments of model.fit_generator (or model.fit): (train_data, steps_per_epoch, validation_data, validation_steps, epochs, initial_epoch, callbacks).
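
A check like this can run before training starts, so the problem surfaces immediately instead of at the first checkpoint. A minimal sketch using only the standard library; needs_validation and check_fit_arguments are hypothetical helper names, not part of Keras:

```python
from string import Formatter


def needs_validation(filepath, monitor):
    """Return True if the file name template or the monitored metric uses a val_* value."""
    fields = [name for _, name, _, _ in Formatter().parse(filepath) if name]
    return monitor.startswith('val_') or any(f.startswith('val_') for f in fields)


def check_fit_arguments(filepath, monitor, validation_data):
    """Raise before training if a val_* metric is required but no validation data is given."""
    if needs_validation(filepath, monitor) and validation_data is None:
        raise ValueError('callback uses val_* metrics but validation_data is None')


template = 'logs/000/ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
print(needs_validation(template, 'val_loss'))       # True
print(needs_validation('ep{epoch:03d}.h5', 'loss'))  # False
```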

Answer 8

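I ran into this error and could not find the cause anywhere online.

What happened in my case was that I was asking for more training samples than I actually had. TF did not give me an explicit error, and it even provided me with the saved value for the loss. I only got the esoteric KeyError: "val_loss" when trying to save it.

Hope this helps people track down the silly mistake they are running into.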
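
One way to avoid asking for more data than exists is to derive the number of steps from the dataset size. A minimal sketch; the function name and the sample counts are assumptions for illustration, and the floor division means a partial final batch is dropped, not requested:

```python
def safe_steps(num_samples, batch_size):
    """Number of full batches the data can supply, with a minimum of 1."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError('num_samples and batch_size must be positive')
    return max(1, num_samples // batch_size)


steps_per_epoch = safe_steps(num_samples=900, batch_size=32)
validation_steps = safe_steps(num_samples=100, batch_size=32)
print(steps_per_epoch, validation_steps)  # 28 3
```

These values would then be passed as steps_per_epoch and validation_steps to model.fit or model.fit_generator.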

python

keras

yolo