How to save checkpoints as filenames with every epoch and then load the weights from the latest saved one in Tensorflow 2?

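When I run the following code, it creates folders named cp_01, cp_02, and so on, whereas I want a checkpoint file saved at every epoch. I then want to load the weights from the most recently saved checkpoint into my model instance with model.load_weights(tf.train.latest_checkpoint('model_checkpoints_5000')).

How can I do this?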

import os
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

# Use the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

# Use smaller subset -- speeds things up
x_train = x_train[:10000]
y_train = y_train[:10000]
x_test = x_test[:1000]
y_test = y_test[:1000]

# define a function that creates a new instance of a simple CNN.
def create_model():
    model = Sequential([
        Conv2D(filters=16, input_shape=(32, 32, 3), kernel_size=(3, 3), 
               activation='relu', name='conv_1'),
        Conv2D(filters=8, kernel_size=(3, 3), activation='relu', name='conv_2'),
        MaxPooling2D(pool_size=(4, 4), name='pool_1'),
        Flatten(name='flatten'),
        Dense(units=32, activation='relu', name='dense_1'),
        Dense(units=10, activation='softmax', name='dense_2')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


checkpoint_5000_path = './model_checkpoints_5000/cp_{epoch:02d}'
checkpoint_5000 = ModelCheckpoint(filepath = checkpoint_5000_path,
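                                 # NOTE: ModelCheckpoint has no `save_weights` argument (the weights-only
                                 # option is `save_weights_only`); it appears to be ignored here, so full
                                 # models are saved as SavedModel folders such as cp_01.
                                 save_weights= True,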
                                 save_freq = 'epoch',
                                 verbose = 1)


model = create_model()
model.fit(x = x_train,
          y = y_train,
          epochs = 3,
          validation_data = (x_test, y_test),
          batch_size = 10,
          callbacks = [checkpoint_5000])
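`ModelCheckpoint` builds each save path by calling `str.format` on `filepath` with the 1-based epoch number (and the epoch's logged metrics), so `{epoch:02d}` yields zero-padded names such as `cp_01`. A minimal sketch of just that naming step, using a hypothetical helper `checkpoint_name` that is not part of Keras:

```python
# Hypothetical helper (not a Keras API) that mimics how ModelCheckpoint
# turns its filepath template into a concrete path for one epoch.
CHECKPOINT_TEMPLATE = './model_checkpoints_5000/cp_{epoch:02d}'


def checkpoint_name(template: str, epoch_index: int) -> str:
    """Return the path produced for a 0-based epoch index.

    Keras passes the 1-based epoch number to str.format, so index 0
    becomes '01'. Keras also passes logged metrics; omitted here.
    """
    return template.format(epoch=epoch_index + 1)


# Names for a 3-epoch run, as in model.fit(..., epochs=3).
for i in range(3):
    print(checkpoint_name(CHECKPOINT_TEMPLATE, i))
```

The same template with a `.h5` suffix produces `cp_01.h5`, `cp_02.h5`, and so on.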

My output is as follows:

Epoch 00001: saving model to ./model_checkpoints_5000\cp_01
INFO:tensorflow:Assets written to: ./model_checkpoints_5000\cp_01\assets
Epoch 2/3
1000/1000 [==============================] - 3s 3ms/step - loss: 1.4493 - accuracy: 0.4744 - val_loss: 1.4664 - val_accuracy: 0.4770
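The `Assets written to: ...\cp_01\assets` line is logged when a full model is exported in the SavedModel directory format, not when only weights are saved. A SavedModel folder contains `saved_model.pb` (plus `variables/` and `assets/`), whereas a TensorFlow weights checkpoint is a file prefix with `.index` and `.data-00000-of-00001` files. A minimal stdlib sketch that tells these apart, using a hypothetical `classify_entry` helper; the rules are assumptions based on the usual on-disk layout, not a TensorFlow API:

```python
import os
import tempfile


def classify_entry(directory: str, name: str) -> str:
    """Guess the save format of one entry in a checkpoint directory.

    Hypothetical heuristic based on typical file layouts, not a TF API.
    """
    path = os.path.join(directory, name)
    if os.path.isdir(path):
        # A SavedModel export is a folder holding saved_model.pb.
        if os.path.exists(os.path.join(path, 'saved_model.pb')):
            return 'savedmodel'
        return 'directory'
    if name == 'checkpoint':
        return 'checkpoint-state'  # state file read by tf.train.latest_checkpoint
    if name.endswith('.h5'):
        return 'hdf5'
    if name.endswith('.index') or '.data-' in name:
        return 'tf-checkpoint-shard'  # part of a TF-format weights checkpoint
    return 'other'


# Demo: recreate a few entries like those in the question's directory.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, 'cp_01'))
    open(os.path.join(demo_dir, 'cp_01', 'saved_model.pb'), 'wb').close()
    for name in ['cp_01.h5', 'cp_00.index', 'cp_00.data-00000-of-00001', 'checkpoint']:
        open(os.path.join(demo_dir, name), 'wb').close()
    kinds = {name: classify_entry(demo_dir, name) for name in sorted(os.listdir(demo_dir))}

print(kinds)
```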

I tried adding .h5 to the file path:

'./model_checkpoints_5000/cp_{epoch:02d}.h5'

However, when I call tf.train.latest_checkpoint('model_checkpoints_5000'), I get None. Shouldn't I get the filename cp_03.h5?
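`tf.train.latest_checkpoint` does not scan the directory for files; it reads the `checkpoint` state file that TensorFlow-format checkpoints maintain, so `.h5` files are never reported and the result is None when that file is absent. If you keep the `.h5` naming, one alternative is to pick the newest file yourself. A minimal sketch with a hypothetical `latest_h5_checkpoint` helper, assuming filenames follow the `cp_{epoch:02d}.h5` template:

```python
import os
import re
import tempfile
from typing import Optional

# Matches names produced by the template 'cp_{epoch:02d}.h5' (assumption).
H5_PATTERN = re.compile(r'^cp_(\d+)\.h5$')


def latest_h5_checkpoint(directory: str) -> Optional[str]:
    """Hypothetical helper: path of the cp_XX.h5 file with the highest epoch, or None."""
    best_epoch, best_name = -1, None
    for name in os.listdir(directory):
        match = H5_PATTERN.match(name)
        # Compare epochs as integers, not as strings.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return os.path.join(directory, best_name) if best_name is not None else None


# Demo: a directory resembling the question's, with .h5 files and a SavedModel folder.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['cp_00.h5', 'cp_01.h5', 'cp_02.h5', 'cp_03.h5', 'checkpoint']:
        open(os.path.join(demo_dir, name), 'wb').close()
    os.makedirs(os.path.join(demo_dir, 'cp_03'))  # folder without .h5, ignored
    latest_name = os.path.basename(latest_h5_checkpoint(demo_dir))

print(latest_name)  # cp_03.h5
```

With tf.keras (Keras 2), the returned path can then be passed to `model.load_weights(...)`, assuming the file was saved for the same architecture. Comparing integer epochs rather than sorting names keeps `cp_100.h5` ahead of `cp_99.h5`.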

After training the model, use the following code to list the saved files:

checkpoint_dir = os.path.dirname(checkpoint_5000_path)
os.listdir(checkpoint_dir)

Output:

['cp_01',
 'cp_00.h5',
 'cp_03',
 'cp_00.data-00000-of-00001',
 'cp_00.index',
 'cp_03.h5',
 'cp_02',
 'cp_01.h5',
 'cp_02.h5',
 'checkpoint']
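The `checkpoint` entry in this listing is the state file that `tf.train.latest_checkpoint` reads. It records TensorFlow-format checkpoints (such as the `cp_00.index` / `cp_00.data-00000-of-00001` pair) and is not updated by HDF5 saves or SavedModel exports, which is why the `.h5` files and the `cp_01` to `cp_03` folders are invisible to it. A simplified stdlib imitation of that lookup, for illustration only; `latest_tf_checkpoint` is hypothetical and skips details the real function handles (such as older V1 checkpoints):

```python
import os
import re
import tempfile
from typing import Optional

STATE_FILE = 'checkpoint'  # state file TensorFlow writes next to TF-format checkpoints
PATH_LINE = re.compile(r'^model_checkpoint_path:\s*"(.*)"\s*$', re.MULTILINE)


def latest_tf_checkpoint(directory: str) -> Optional[str]:
    """Simplified imitation of tf.train.latest_checkpoint (not the real implementation)."""
    state_path = os.path.join(directory, STATE_FILE)
    if not os.path.isfile(state_path):
        return None  # no state file, e.g. when only .h5 files were saved
    with open(state_path, encoding='utf-8') as f:
        match = PATH_LINE.search(f.read())
    if match is None:
        return None
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        prefix = os.path.join(directory, prefix)  # relative paths are resolved against the directory
    # A TF-format checkpoint prefix is only usable if its .index file exists.
    return prefix if os.path.exists(prefix + '.index') else None


with tempfile.TemporaryDirectory() as demo_dir:
    # Only an .h5 file: no state file, so nothing is found.
    open(os.path.join(demo_dir, 'cp_03.h5'), 'wb').close()
    before = latest_tf_checkpoint(demo_dir)
    # Add a TF-format checkpoint plus a state file mimicking TensorFlow's text format.
    for name in ['cp_00.index', 'cp_00.data-00000-of-00001']:
        open(os.path.join(demo_dir, name), 'wb').close()
    with open(os.path.join(demo_dir, STATE_FILE), 'w', encoding='utf-8') as f:
        f.write('model_checkpoint_path: "cp_00"\nall_model_checkpoint_paths: "cp_00"\n')
    after_name = os.path.relpath(latest_tf_checkpoint(demo_dir), demo_dir)

print(before, after_name)  # None cp_00
```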

Please see this link for more details.