Why does my model always return 0 validation loss in Keras/TensorFlow when trained on Google Colab?
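I am trying to train a simple model on Colab, but when I run my own code with !python train.py, it always reports 0 for the validation metrics. However, the same code runs fine on my own computer. Does anyone know why?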
Epoch 1/500
2020-06-17 19:53:31.689547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-17 19:53:31.889892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
47/47 - 7s - loss: 52.6930 - mse: 2876.5457 - mae: 52.5915 - val_loss: 0.1029 - val_mse: 0.0000e+00 - val_mae: 0.0000e+00
Training code:
def build_model(self):
    new_model = self.base_model
    opt = Adam(lr=self.lr)
    new_model.compile(loss='mae',
                      optimizer=opt,
                      metrics=['mse', 'mae'])
    return new_model

def train(self, base_epochs=500,
          save_model=False, save_path=None, cal_time=True):
    model = self.build_model()
    early_stopping = EarlyStopping(monitor='val_loss',
                                   patience=50,
                                   mode='min')
    save_best = ModelCheckpoint(filepath=save_file,
                                monitor='val_loss',
                                save_best_only=True,
                                mode='min')
    cp_callback = [early_stopping, save_best]
    history = model.fit(
        x=self.standardize(self.train_data),
        y=self.train_labels,
        batch_size=self.batch_size,
        epochs=base_epochs,
        verbose=2,
        callbacks=cp_callback,
        validation_data=[self.standardize(self.val_data), self.val_labels],
    )
    return history
I also wrote some code to check the image data.
def check_data(self):
    data_name = ['Train Data', 'Train Labels', 'Validation Data', 'Validation Labels']
    for i, data in enumerate([self.train_data, self.train_labels, self.val_data, self.val_labels]):
        print('{0:<20}: shape-{1:<20} type--{2}' \
              .format(data_name[i], str(data.shape), data.dtype))
Here is the information about the data; they are all numpy arrays:
Train Data : shape-(3000, 224, 224, 1) type--float32
Train Labels : shape-(3000, 2) type--float64
Validation Data : shape-(200, 224, 224, 1) type--float32
Validation Labels : shape-(200, 2) type--float64
OK, I finally found the problem:
I was passing a list to validation_data=, but according to the official documentation it should be a tuple.
It should be:
history = model.fit(
    x=self.standardize(self.train_data),
    y=self.train_labels,
    batch_size=self.batch_size,
    epochs=base_epochs,
    verbose=2,
    callbacks=cp_callback,
    validation_data=(self.standardize(self.val_data), self.val_labels),
)
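To guard against the same mistake elsewhere, one option is a small helper that normalizes the argument before it reaches model.fit. This is only a sketch: as_validation_tuple is a hypothetical name and not part of Keras, and it assumes validation data is supplied either as a 2-element (x_val, y_val) or 3-element (x_val, y_val, val_sample_weights) sequence, or as some other object such as a dataset that should be passed through untouched.

```python
def as_validation_tuple(validation_data):
    """Return list-style validation data as a tuple.

    Hypothetical helper (not a Keras API). Lists and tuples of
    length 2 or 3 are converted to tuples; None and any other
    object are returned unchanged.
    """
    if validation_data is None:
        return None
    if isinstance(validation_data, (list, tuple)):
        if len(validation_data) not in (2, 3):
            raise ValueError(
                "expected (x_val, y_val) or (x_val, y_val, val_sample_weights), "
                "got a sequence of length {}".format(len(validation_data))
            )
        return tuple(validation_data)
    # Assumption: anything else (e.g. a dataset) is already in a valid form.
    return validation_data


if __name__ == "__main__":
    # Plain lists stand in for the numpy arrays used in the question.
    val = as_validation_tuple([[1.0, 2.0], [0.0, 1.0]])
    print(type(val).__name__, len(val))
```

In the training code above, this would be used as validation_data=as_validation_tuple([self.standardize(self.val_data), self.val_labels]), so that a list passed by accident still arrives as a tuple.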