Saving accuracy and loss with a callback on Colab
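So I'm trying to train a model on Colab, and it will take about 70-72 hours to run. I have a free account, so I often get kicked off for overuse or inactivity, which means I can't simply dump the history into a pickle file at the end.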
```python
history = model.fit_generator(custom_generator(train_csv_list, batch_size),
                              steps_per_epoch=len(train_csv_list[:13400]) // (batch_size),
                              epochs=1000,
                              verbose=1,
                              callbacks=[stop_training],
                              validation_data=(x_valid, y_valid))
```
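I found CSVLogger among the callback methods and added it to my callback as shown below. But for some reason it doesn't create model_history_log.csv, and I get no errors or warnings. What am I doing wrong?
My goal is just to save the accuracy and loss throughout the whole training process.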
```python
from pathlib import Path

from tensorflow.keras.callbacks import Callback, CSVLogger


class stop_(Callback):
    def on_epoch_end(self, epoch, logs={}):
        model.save(Path("/content/drive/MyDrive/.../model" + str(int(epoch))))
        CSVLogger("/content/drive/MyDrive/.../model_history_log.csv", append=True)
        if(logs.get('accuracy') > ACCURACY_THRESHOLD):
            print("\nReached %2.2f%% accuracy, so stopping training!!" % (ACCURACY_THRESHOLD*100))
            self.model.stop_training = True


stop_training = stop_()
```
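Also, since I save the model at every epoch, does the saved model contain this information? I haven't found anything so far, and I doubt it stores accuracy, loss, val accuracy, etc.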
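The CSVLogger line in your on_epoch_end only constructs a CSVLogger object and then discards it. It is never passed to model.fit, so Keras never calls it and no file is written. (Passing it directly, e.g. callbacks=[stop_training, CSVLogger(path, append=True)], makes it log every epoch.) If you want to collect accuracy and loss in your own callback instead, write it like this: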
```python
import os

import pandas as pd
import tensorflow as tf


class STOP(tf.keras.callbacks.Callback):
    def __init__(self, model, csv_path, model_save_dir, epochs, acc_thld):  # initialization of the callback
        # model is your compiled model
        # csv_path is the path where the csv file will be stored
        # model_save_dir is the path to the directory where model files will be saved
        # epochs is the number of epochs you set in model.fit
        # acc_thld is the accuracy threshold at which training stops
        super().__init__()
        self.set_model(model)  # fit() also attaches the model via set_model()
        self.csv_path = csv_path
        self.model_save_dir = model_save_dir
        self.epochs = epochs
        self.acc_thld = acc_thld
        self.acc_list = []  # accuracy of each epoch
        self.loss_list = []  # loss of each epoch
        self.epoch_list = []  # 1-based epoch numbers

    def on_epoch_end(self, epoch, logs=None):  # runs at the end of each epoch
        logs = logs or {}
        savestr = '_' + str(epoch + 1) + '.h5'  # model is saved as an .h5 file named _<epoch>.h5
        save_path = os.path.join(self.model_save_dir, savestr)
        acc = logs.get('accuracy')  # accuracy for this epoch
        loss = logs.get('loss')  # loss for this epoch
        self.model.save(save_path)  # save the model
        self.acc_list.append(acc)
        self.loss_list.append(loss)
        self.epoch_list.append(epoch + 1)
        # stop if accuracy exceeded the threshold or if this was the last epoch
        if acc > self.acc_thld or epoch + 1 == self.epochs:
            self.model.stop_training = True  # stop training
            Eseries = pd.Series(self.epoch_list, name='Epoch')
            Accseries = pd.Series(self.acc_list, name='accuracy')
            Lseries = pd.Series(self.loss_list, name='loss')
            df = pd.concat([Eseries, Lseries, Accseries], axis=1)  # dataframe with columns Epoch, loss, accuracy
            df.to_csv(self.csv_path, index=False)  # save the dataframe as a csv file
            if acc > self.acc_thld:
                print('\nTraining halted on epoch', epoch + 1, 'when accuracy exceeded the threshold')
```
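The STOP callback keeps the metrics in lists and writes traindata.csv only when it stops training (that is why it also checks for the last epoch), so a Colab disconnect before that point loses the log. A minimal sketch of an alternative, assuming a hypothetical helper named `append_epoch_row` that you would call from `on_epoch_end`: it appends one row per epoch with the standard `csv` module and writes the header only when the file is new or empty, so a restarted run keeps extending the same file.

```python
import csv
import os


def append_epoch_row(csv_path, epoch, logs, keys=("loss", "accuracy")):
    """Hypothetical helper: append one row (1-based epoch + selected metrics) to csv_path."""
    logs = logs or {}
    # Write the header only if the file does not exist yet or is empty
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["Epoch", *keys])
        # Keras passes a 0-based epoch index; store it 1-based like the STOP callback.
        # Missing metrics (logs.get -> None) are written as empty fields.
        writer.writerow([epoch + 1, *(logs.get(k) for k in keys)])
    # Leaving the with-block closes the file, flushing this epoch's row to it


# Assumed wiring inside a Keras callback:
#     def on_epoch_end(self, epoch, logs=None):
#         append_epoch_row(self.csv_path, epoch, logs)
```

This is similar in spirit to what `CSVLogger(..., append=True)` does once it is actually passed to `model.fit`; writing your own row only makes sense if you want a fixed set of columns.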
Then, before you run model.fit, use this code:
```python
import os

epochs = 20  # number of epochs for model.fit and the callback
sdir = r'C:\Temp\stooges'  # directory where the model files and the csv file will be stored
acc_thld = .98  # accuracy threshold
csv_path = os.path.join(sdir, 'traindata.csv')  # csv file to be saved in sdir
callbacks = STOP(model, csv_path, sdir, epochs, acc_thld)  # instantiate the callback
```
Remember to pass callbacks=[callbacks] to model.fit. I tested this on a simple dataset: training ran for only 3 epochs before accuracy exceeded the 0.98 threshold, so it created 3 saved model files in sdir, named
_1.h5
_2.h5
_3.h5
It also created a CSV file named traindata.csv with the following contents:
Epoch loss accuracy
1 8.086007 .817778
2 6.911876 .974444
3 6.129871 .987778
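Once a log with Epoch, loss and accuracy columns exists (for example traindata.csv above), you can read it back after a disconnect to see how far training got. A small sketch using the standard `csv` module; `summarize_log` is a hypothetical name, and it assumes the column names shown above.

```python
import csv


def summarize_log(csv_path):
    """Hypothetical helper: return (last_epoch, best_epoch, best_accuracy), or None if no rows."""
    with open(csv_path, newline="") as f:
        # Skip rows whose accuracy field is empty
        rows = [r for r in csv.DictReader(f) if r.get("accuracy")]
    if not rows:
        return None
    last_epoch = max(int(r["Epoch"]) for r in rows)
    best = max(rows, key=lambda r: float(r["accuracy"]))  # first row wins on ties
    return last_epoch, int(best["Epoch"]), float(best["accuracy"])
```

If the last completed epoch is N, the matching model file from the callback is `_N.h5`; after loading it with `tf.keras.models.load_model`, passing `initial_epoch=N` to `model.fit` continues the epoch numbering where it stopped.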