Why does my model always return 0 validation loss in Keras/TensorFlow when trained on Google Colab?

I am trying to train a simple model on Colab, but when I run my own code with !python train.py, it always returns 0 validation loss. The same code runs fine on my own computer. Does anyone know why?

Epoch 1/500
2020-06-17 19:53:31.689547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-17 19:53:31.889892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
47/47 - 7s - loss: 52.6930 - mse: 2876.5457 - mae: 52.5915 - val_loss: 0.1029 - val_mse: 0.0000e+00 - val_mae: 0.0000e+00

Training code:

    def build_model(self):
        new_model = self.base_model

        opt = Adam(lr=self.lr)
        new_model.compile(loss='mae',
                          optimizer=opt,
                          metrics=['mse', 'mae'])

        return new_model

    def train(self, base_epochs=500,
              save_model=False, save_path=None, cal_time=True):
        model = self.build_model()

        early_stopping = EarlyStopping(monitor='val_loss',
                                       patience=50,
                                       mode='min')
        save_best = ModelCheckpoint(filepath=save_file,
                                    monitor='val_loss',
                                    save_best_only=True,
                                    mode='min')
        cp_callback = [early_stopping, save_best]

        history = model.fit(
            x=self.standardize(self.train_data),
            y=self.train_labels,
            batch_size=self.batch_size,
            epochs=base_epochs,
            verbose=2,
            callbacks=cp_callback,
            validation_data=[self.standardize(self.val_data), self.val_labels],
        )
        return history

I also wrote code to check the image data:

    def check_data(self):
        data_name = ['Train Data', 'Train Labels', 'Validation Data', 'Validation Labels']
        for i, data in enumerate([self.train_data, self.train_labels, self.val_data, self.val_labels]):
            print('{0:<20}:  shape-{1:<20} type--{2}' \
                  .format(data_name[i], str(data.shape), data.dtype))

Here is the information about the data. They are all NumPy arrays:

Train Data          :  shape-(3000, 224, 224, 1)  type--float32
Train Labels        :  shape-(3000, 2)            type--float64
Validation Data     :  shape-(200, 224, 224, 1)   type--float32
Validation Labels   :  shape-(200, 2)             type--float64

OK, I finally found the problem:

I was passing a list to validation_data=, but according to the official documentation it should be a tuple.

It should be:

    history = model.fit(
        x=self.standardize(self.train_data),
        y=self.train_labels,
        batch_size=self.batch_size,
        epochs=base_epochs,
        verbose=2,
        callbacks=cp_callback,
        validation_data=(self.standardize(self.val_data), self.val_labels),
    )
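
To guard against this kind of mistake, a small check can be run before calling fit. The sketch below is only an illustration: as_validation_tuple is a hypothetical helper name, not part of Keras, and it assumes the validation inputs and labels both support len(), as NumPy arrays and lists do. My understanding, which I have not verified against the exact TensorFlow version on Colab, is that some TF 2.x releases only unpack a tuple into inputs and targets and treat a list as multiple inputs without targets, which would explain the zero validation metrics.

```python
def as_validation_tuple(validation_data):
    """Return validation_data as a tuple suitable for model.fit.

    Hypothetical helper, not a Keras API. Accepts (x_val, y_val) or
    (x_val, y_val, sample_weight) given as a list or a tuple.
    """
    if not isinstance(validation_data, (list, tuple)):
        raise TypeError(
            "validation_data must be a list or tuple, got "
            + type(validation_data).__name__
        )
    if len(validation_data) not in (2, 3):
        raise ValueError(
            "validation_data must have 2 or 3 elements, got "
            + str(len(validation_data))
        )

    x_val, y_val = validation_data[0], validation_data[1]
    # Inputs and labels must contain the same number of samples.
    if len(x_val) != len(y_val):
        raise ValueError(
            "x_val has %d samples but y_val has %d" % (len(x_val), len(y_val))
        )

    # Convert a list container into a tuple; the elements are left untouched.
    return tuple(validation_data)
```

With this helper, the call in the question could pass validation_data=as_validation_tuple([self.standardize(self.val_data), self.val_labels]) and get a tuple regardless of how the pair was originally written.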