How to save/restore a model after training?
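After training a model in TensorFlow:
- How do you save the trained model?
- How do you later restore this saved model?
For TensorFlow versions < 0.11.0RC1:
The saved checkpoints contain values for the Variables in your model, not the model/graph itself, which means that the graph should be the same when you restore the checkpoint.
Here's an example for a linear regression where there's a training loop that saves variable checkpoints and an evaluation section that will restore variables saved in a prior run and compute predictions. Of course, you can also restore variables and continue training if you'd like.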
import tensorflow as tf
# FLAGS is assumed to come from tf.app.flags (train, training_steps, checkpoint_steps, checkpoint_dir)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
w = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
b = tf.Variable(tf.ones([1, 1], dtype=tf.float32))
y_hat = tf.add(b, tf.matmul(x, w))
# ...more setup for optimization and what not...
saver = tf.train.Saver()  # defaults to saving all variables - in this case w and b
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    if FLAGS.train:
        for i in range(FLAGS.training_steps):
            # ...training loop...
            if (i + 1) % FLAGS.checkpoint_steps == 0:
                saver.save(sess, FLAGS.checkpoint_dir + 'model.ckpt',
                           global_step=i+1)
    else:
        # Here's where you're restoring the variables w and b.
        # Note that the graph is exactly as it was when the variables were
        # saved in a prior training run.
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            pass  # ...no checkpoint found...
        # Now you can run the model to get predictions
        batch_x = ...  # ...load some data...
        predictions = sess.run(y_hat, feed_dict={x: batch_x})
See the TensorFlow docs for Variables, which cover saving and restoring, and the docs for Saver.
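To make the "values, not graph" point concrete, here is a framework-free sketch. It is not TensorFlow's checkpoint format: JSON and the helpers save_values/restore_values are stand-ins chosen for illustration. The "checkpoint" is only a name-to-value mapping, so restoring works only when the rebuilt model uses the same names.

```python
import json
import os
import tempfile


def save_values(variables, path):
    """Write only the variable *values*, keyed by name (no graph structure)."""
    with open(path, "w") as f:
        json.dump(variables, f)


def restore_values(variables, path):
    """Overwrite the values of an already-built model, matching by name."""
    with open(path) as f:
        saved = json.load(f)
    missing = set(variables) - set(saved)
    if missing:
        # Same failure mode as restoring into a graph whose names differ.
        raise KeyError("not found in checkpoint: %s" % sorted(missing))
    for name in variables:
        variables[name] = saved[name]
    return variables


# "Training" run: the model is just the dict of named variables w and b.
trained = {"w": [[0.5]], "b": [[1.5]]}
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_values(trained, ckpt_path)

# "Evaluation" run: rebuild the same structure with fresh values, then restore.
rebuilt = {"w": [[0.0]], "b": [[1.0]]}
restore_values(rebuilt, ckpt_path)
print(rebuilt)  # {'w': [[0.5]], 'b': [[1.5]]}
```

Nothing about how y_hat is computed is stored; the evaluation run has to recreate that code itself, which is exactly why the graph must match.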
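There are two parts to the model: the model definition, saved by Supervisor as graph.pbtxt in the model directory, and the numerical values of tensors, saved into checkpoint files like model.ckpt-1003418.
The model definition can be restored using tf.import_graph_def, and the weights are restored using Saver.
However, Saver uses a special collection holding the list of variables that's attached to the model graph, and this collection is not initialized by import_graph_def, so you can't use the two together at the moment (fixing this is on our roadmap). For now, you have to use Ryan Sepassi's approach: manually construct a graph with identical node names, and use Saver to load the weights into it.
(Alternatively, you could hack it by using import_graph_def, creating the variables manually, calling tf.add_to_collection(tf.GraphKeys.VARIABLES, variable) for each variable, and then using Saver.)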
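As Yaroslav said, you can restore from a graph_def and a checkpoint by importing the graph, manually creating the variables, and then using a Saver.
I implemented this for my personal use, so I thought I'd share the code here.
Link: https://gist.github.com/nikitakit/6ef3b72be67b86cb7868
(This is, of course, a hack, and there is no guarantee that models saved this way will remain readable in future versions of TensorFlow.)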
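You can also check out the examples in TensorFlow/skflow, which offers save and restore methods that can help you easily manage your models. It has parameters with which you can also control how frequently you want to back up your model.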
If it's an internally saved model, you just specify a restorer for all variables as
restorer = tf.train.Saver(tf.all_variables())
and use it to restore the variables in the current session:
restorer.restore(self._sess, model_file)
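For an external model, you need to specify the mapping from its variable names to your variable names. You can view the model's variable names using the command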
python /path/to/tensorflow/tensorflow/python/tools/inspect_checkpoint.py --file_name=/path/to/pretrained_model/model.ckpt
The inspect_checkpoint.py script can be found in the './tensorflow/python/tools' folder of the TensorFlow source.
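The mapping itself is just a dictionary from the checkpoint's variable names to yours. In TF 1.x you can pass such a dictionary as var_list to tf.train.Saver (keys are the names stored in the checkpoint, values are your variables). The sketch below shows only the renaming step on plain Python values; remap_checkpoint and the resnet_v1_50/... and my_net/... names are hypothetical.

```python
def remap_checkpoint(ckpt_values, name_map):
    """Return ({your_name: value}, [unmapped checkpoint names]).

    ckpt_values: {checkpoint_name: value}, e.g. as listed by inspect_checkpoint.py
    name_map:    {checkpoint_name: your_name}
    """
    missing = [k for k in name_map if k not in ckpt_values]
    if missing:
        raise KeyError("names not in checkpoint: %s" % missing)
    remapped = {ours: ckpt_values[theirs] for theirs, ours in name_map.items()}
    unused = sorted(set(ckpt_values) - set(name_map))
    return remapped, unused


# Hypothetical names, for illustration only.
pretrained = {
    "resnet_v1_50/conv1/weights": [0.1, 0.2],
    "resnet_v1_50/logits/weights": [0.3],
}
mapping = {"resnet_v1_50/conv1/weights": "my_net/conv1/kernel"}
values, skipped = remap_checkpoint(pretrained, mapping)
print(values)   # {'my_net/conv1/kernel': [0.1, 0.2]}
print(skipped)  # ['resnet_v1_50/logits/weights']
```

Reporting the unmapped names is useful when retraining: a new classifier head is usually expected to be left out on purpose.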
To specify the mapping, you can use my Tensorflow-Worklab, which contains a set of classes and scripts to train and retrain different models. It includes an example of retraining ResNet models.
In TensorFlow 0.11.0RC1 (and later), you can save and restore your model directly by calling tf.train.export_meta_graph and tf.train.import_meta_graph, according to https://www.tensorflow.org/programmers_guide/meta_graph.
Save the model
w1 = tf.Variable(tf.truncated_normal(shape=[10]), name='w1')
w2 = tf.Variable(tf.truncated_normal(shape=[20]), name='w2')
tf.add_to_collection('vars', w1)
tf.add_to_collection('vars', w2)
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my-model')
# `save` method will call `export_meta_graph` implicitly.
# you will get saved graph files:my-model.meta
Restore the model
sess = tf.Session()
new_saver = tf.train.import_meta_graph('my-model.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
all_vars = tf.get_collection('vars')
for v in all_vars:
    v_ = sess.run(v)
    print(v_)
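tf.train.latest_checkpoint('./') works because Saver.save() maintains a small text file named checkpoint in the save directory. The following is a rough stdlib approximation of that lookup, assuming the file's model_checkpoint_path line and the V2 ".index" file; it is not TensorFlow's code and simplifies details.

```python
import os
import re
import tempfile


def latest_checkpoint_sketch(checkpoint_dir):
    """Rough stand-in for tf.train.latest_checkpoint (not TF's implementation).

    Reads the "checkpoint" state file that Saver.save() maintains and returns
    the prefix on its model_checkpoint_path line, or None.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        match = re.search(r'^model_checkpoint_path:\s*"(.*)"', f.read(), re.M)
    if match is None:
        return None
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        prefix = os.path.join(checkpoint_dir, prefix)
    # V2 checkpoints store an "<prefix>.index" file next to the data files.
    return prefix if os.path.exists(prefix + ".index") else None


# Fake the files that saver.save(sess, 'my-model') would leave behind.
d = tempfile.mkdtemp()
with open(os.path.join(d, "checkpoint"), "w") as f:
    f.write('model_checkpoint_path: "my-model"\n'
            'all_model_checkpoint_paths: "my-model"\n')
open(os.path.join(d, "my-model.index"), "w").close()
print(latest_checkpoint_sketch(d))  # <d>/my-model
```

Note that the result is a prefix, not a file name: it is what saver.restore expects as its second argument.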
As mentioned in issue 6255, use './model_name.ckpt':
saver.restore(sess, './my_model_final.ckpt')
instead of
saver.restore(sess, 'my_model_final.ckpt')
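If you build checkpoint paths programmatically, a small helper can add the "./" only when the name has no directory part. with_explicit_dir is a hypothetical convenience function, not part of TensorFlow.

```python
import os


def with_explicit_dir(path):
    """Prefix a bare file name with the current directory ('./' on POSIX)."""
    if os.path.dirname(path):
        return path  # already has a directory component
    return os.path.join(os.curdir, path)


print(with_explicit_dir("my_model_final.ckpt"))        # ./my_model_final.ckpt on POSIX
print(with_explicit_dir("ckpts/my_model_final.ckpt"))  # unchanged
```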
You can also take this easier approach.
Step 1: Define all variables (with explicit names)
W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1), name="W1")
B1 = tf.Variable(tf.constant(0.1, tf.float32, [K]), name="B1")
# Similarly, W2, B2, W3, .....
Step 2: Save the session with the model Saver
model_saver = tf.train.Saver()
# Train the model and save it in the end
model_saver.save(session, "saved_models/CNN_New.ckpt")
Step 3: Restore the model
with tf.Session(graph=graph_cnn) as session:
    model_saver.restore(session, "saved_models/CNN_New.ckpt")
    print("Model restored.")
    print('Initialized')
Step 4: Check your variables (while the restored session is still open)
W1 = session.run(W1)
print(W1)
When running in a different Python instance, use
with tf.Session() as sess:
    # Restore the latest checkpoint (assumes the graph and `saver` were rebuilt
    # in this process, e.g. via tf.train.import_meta_graph)
    saver.restore(sess, tf.train.latest_checkpoint('saved_model/.'))
    # Do NOT run tf.global_variables_initializer() after restoring: it would
    # overwrite the restored values with fresh initial values.
    # Get the default graph (supply your custom graph if you have one)
    graph = tf.get_default_graph()
    # This gives a tensor object
    W1 = graph.get_tensor_by_name('W1:0')
    # To get the value (a NumPy array)
    W1_value = sess.run(W1)
In most cases, saving and restoring from disk using a tf.train.Saver is your best option:
... # build your model
saver = tf.train.Saver()
with tf.Session() as sess:
    ... # train the model
    saver.save(sess, "/tmp/my_great_model")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model
You can also save/restore the graph structure itself (see the MetaGraph documentation for details). By default, the Saver saves the graph structure into a .meta file. You can call import_meta_graph() to restore it. It restores the graph structure and returns a Saver that you can use to restore the model's state:
saver = tf.train.import_meta_graph("/tmp/my_great_model.meta")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model
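However, there are cases where you need something much faster. For example, if you implement early stopping, you want to save checkpoints every time the model improves during training (as measured on the validation set), and then, if there is no progress for some time, you want to roll back to the best model. If you save the model to disk every time it improves, it will tremendously slow down training. The trick is to save the variable states to memory, then just restore them later:
... # build your model
# get a handle on the graph nodes we need to save/restore the model
graph = tf.get_default_graph()
gvars = graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
assign_ops = [graph.get_operation_by_name(v.op.name + "/Assign") for v in gvars]
init_values = [assign_op.inputs[1] for assign_op in assign_ops]
with tf.Session() as sess:
    ... # train the model
    # when needed, save the model state to memory
    gvars_state = sess.run(gvars)
    # when needed, restore the model state
    feed_dict = {init_value: val
                 for init_value, val in zip(init_values, gvars_state)}
    sess.run(assign_ops, feed_dict=feed_dict)
A quick explanation: when you create a variable X, TensorFlow automatically creates an assignment operation X/Assign to set the variable's initial value. Instead of creating placeholders and extra assignment ops (which would just make the graph messy), we just use these existing assignment ops. The first input of each assignment op is a reference to the variable it is supposed to initialize, and the second input (assign_op.inputs[1]) is the initial value. So, in order to set any value we want (instead of the initial value), we need to use a feed_dict and replace the initial value. Yes, TensorFlow lets you feed a value for any tensor, not just for placeholders, so this works fine.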
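The same in-memory rollback idea, stripped of TensorFlow, looks like this: snapshot the parameter values whenever the validation loss improves, and write the best snapshot back once patience runs out. A plain dict stands in for the values returned by sess.run(gvars); the validation losses and the patience parameter are made up for the example.

```python
import copy


def train_with_early_stopping(weights, val_losses, patience=2):
    """Keep the best weights in memory and roll back when progress stalls.

    weights:    dict of parameter name -> value (stand-in for sess.run(gvars))
    val_losses: validation loss after each epoch (simulated here)
    """
    best_loss = float("inf")
    best_state = None
    epochs_without_progress = 0
    for epoch, loss in enumerate(val_losses):
        weights["step"] = epoch  # stand-in for "training changed the weights"
        if loss < best_loss:
            best_loss = loss
            best_state = copy.deepcopy(weights)  # in-memory snapshot, no disk I/O
            epochs_without_progress = 0
        else:
            epochs_without_progress += 1
            if epochs_without_progress >= patience:
                break
    if best_state is not None:
        weights.clear()
        weights.update(best_state)  # roll back, like feeding gvars_state to assign_ops
    return best_loss, weights


best, state = train_with_early_stopping({"w": 0.0}, [0.9, 0.5, 0.6, 0.7, 0.4])
print(best, state)  # 0.5 {'w': 0.0, 'step': 1}
```

Training stops after two epochs without improvement, so the later 0.4 is never reached and the state from epoch 1 is restored.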
Here's my simple solution for the two basic cases, which differ in whether you want to load the graph from a file or build it at runtime.
This answer holds for TensorFlow 0.12+ (including 1.0).
Rebuilding the graph in code
Saving
graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')
Loading
graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    # now you can use the graph, continue training or whatever
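Also loading the graph from a file
When using this technique, make sure all your layers/variables have explicitly set unique names. Otherwise TensorFlow will make the names unique itself, and they will thus differ from the names stored in the file. This isn't a problem in the previous technique, because the names are "mangled" the same way on both loading and saving.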
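A simplified model of that renaming: when a name is already taken in the graph, a numeric suffix is appended, so a layer that was saved as dense and is built again next to the imported graph comes out as dense_1 (and its variables as dense_1/...). make_unique is an illustration only; TensorFlow's own logic (tf.Graph.unique_name) handles more cases.

```python
def make_unique(name, used):
    """Return name, or name_1, name_2, ... if it is already taken (simplified)."""
    if name not in used:
        used.add(name)
        return name
    i = 1
    while "%s_%d" % (name, i) in used:
        i += 1
    unique = "%s_%d" % (name, i)
    used.add(unique)
    return unique


# Scope names already present after import_meta_graph() loaded the saved graph.
names_in_graph = {"dense"}
# Building the "same" layer again without a distinct explicit name:
print(make_unique("dense", names_in_graph))  # dense_1
print(make_unique("dense", names_in_graph))  # dense_2
```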
Saving
graph = ... # build the graph
for op in [ ... ]:  # operators you want to use after restoring the model
    tf.add_to_collection('ops_to_restore', op)
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')
Loading
with ... as sess:  # your session object
    saver = tf.train.import_meta_graph('my-model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    ops = tf.get_collection('ops_to_restore')  # here are your operators in the same order in which you saved them to the collection
I'm improving my answer to add more details on saving and restoring models.
In (and after) TensorFlow version 0.11:
Saving the model:
import tensorflow as tf
#Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias")
feed_dict ={w1:4,w2:8}
#Define a test operation that we will restore
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3,b1,name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
#Create a saver object which will save all the variables
saver = tf.train.Saver()
#Run the operation by feeding input
print(sess.run(w4, feed_dict))
# Prints 24, which is (w1 + w2) * b1 = (4 + 8) * 2
#Now, save the graph
saver.save(sess, 'my_test_model',global_step=1000)
Restoring the model:
import tensorflow as tf
sess=tf.Session()
#First let's load meta graph and restore weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))
# Access saved Variables directly
print(sess.run('bias:0'))
# This will print 2, which is the value of bias that we saved
# Now, let's access and create placeholders variables and
# create feed-dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict ={w1:13.0,w2:17.0}
#Now, access the op that you want to run.
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")
print(sess.run(op_to_restore, feed_dict))
# This will print 60, which is calculated as (13 + 17) * 2
This and some more advanced use cases have been explained very well here:
A quick complete tutorial to save and restore Tensorflow models
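If you use tf.train.MonitoredTrainingSession as your default session, you don't need to add extra code to do save/restore. Just pass a checkpoint directory name to MonitoredTrainingSession's constructor (its checkpoint_dir argument), and it will use session hooks to handle these.
All the answers here are great, but I want to add two things.
First, to elaborate on @user7505159's answer, the "./" can be important to add to the beginning of the file name that you are restoring.
For example, you can save a graph with no "./" in the file name like so:
# Some graph defined up here with specific names
saver = tf.train.Saver()
save_file = 'model.ckpt'
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
But in order to restore the graph, you may need to prepend a "./" to the file_name:
# Same graph defined up here
saver = tf.train.Saver()
save_file = './' + 'model.ckpt'  # String addition used for emphasis
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, save_file)
You will not always need the "./", but it can cause problems depending on your environment and version of TensorFlow.
I'd also like to mention that running sess.run(tf.global_variables_initializer()) before restoring the session can be important.
If you receive an error regarding uninitialized variables when trying to restore a saved session, make sure you include sess.run(tf.global_variables_initializer()) before the saver.restore(sess, save_file) line. It can save you a headache.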
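My environment: Python 3.6, TensorFlow 1.3.0
Although there are many solutions, most of them are based on tf.train.Saver. When we load a .ckpt saved by Saver, we have to either redefine the TensorFlow network or use some weird and hard-to-remember names, e.g. 'placehold_0:0', 'dense/Adam/Weight:0'. Here I recommend using tf.saved_model. The simplest example is given below; you can learn more from Serving a TensorFlow Model:
Saving the model: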
import tensorflow as tf
# define the tensorflow network and do some training
x = tf.placeholder("float", name="x")
w = tf.Variable(2.0, name="w")
b = tf.Variable(0.0, name="bias")
h = tf.multiply(x, w)
y = tf.add(h, b, name="y")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# save the model
export_path = './savedmodel'
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y)
prediction_signature = (
    tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'x_input': tensor_info_x},
        outputs={'y_output': tensor_info_y},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            prediction_signature
    },
)
builder.save()
Loading the model:
import tensorflow as tf
sess = tf.Session()
signature_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
input_key = 'x_input'
output_key = 'y_output'
export_path = './savedmodel'
meta_graph_def = tf.saved_model.loader.load(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    export_path)
signature = meta_graph_def.signature_def
x_tensor_name = signature[signature_key].inputs[input_key].name
y_tensor_name = signature[signature_key].outputs[output_key].name
x = sess.graph.get_tensor_by_name(x_tensor_name)
y = sess.graph.get_tensor_by_name(y_tensor_name)
y_out = sess.run(y, {x: 3.0})
TensorFlow 2 docs
Saving checkpoints
Adapted from the docs
# -------------------------
# -----  Toy Context  -----
# -------------------------
import tensorflow as tf
class Net(tf.keras.Model):
    """A simple linear model."""
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)
    def call(self, x):
        return self.l1(x)
def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )
def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss
# ----------------------------
# -----  Create Objects  -----
# ----------------------------
net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)
# ----------------------------
# -----  Train and Save  -----
# ----------------------------
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")
for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))
# ---------------------
# -----  Restore  -----
# ---------------------
# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)
# Re-use the manager code above ^
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")
for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.
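max_to_keep=3 means that older checkpoint files are deleted as new ones are written. Below is a toy, file-based sketch of that observable behaviour; KeepLastN is hypothetical and does not reproduce CheckpointManager's file layout (real checkpoints consist of .index and .data files plus a checkpoint state file).

```python
import os
import tempfile


class KeepLastN:
    """Toy checkpoint rotation: write numbered files, delete all but the newest N."""

    def __init__(self, directory, max_to_keep=3):
        self.directory = directory
        self.max_to_keep = max_to_keep
        self.checkpoints = []  # oldest first
        self.counter = 0

    def save(self, payload):
        self.counter += 1
        path = os.path.join(self.directory, "ckpt-%d" % self.counter)
        with open(path, "w") as f:
            f.write(payload)
        self.checkpoints.append(path)
        while len(self.checkpoints) > self.max_to_keep:
            os.remove(self.checkpoints.pop(0))  # drop the oldest checkpoint
        return path

    @property
    def latest_checkpoint(self):
        return self.checkpoints[-1] if self.checkpoints else None


toy_manager = KeepLastN(tempfile.mkdtemp(), max_to_keep=3)
for step in range(10, 60, 10):  # saves at steps 10..50, like the loop above
    toy_manager.save("state at step %d" % step)
print(sorted(os.listdir(toy_manager.directory)))  # ['ckpt-3', 'ckpt-4', 'ckpt-5']
```

Five saves with max_to_keep=3 leave only the three newest on disk, and latest_checkpoint points at the last one.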
More links
An exhaustive and useful tutorial on saved_model -> https://www.tensorflow.org/guide/saved_model
A detailed guide to saving Keras models -> https://www.tensorflow.org/guide/keras/save_and_serialize
Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.
The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).
(Emphasis my own)
TensorFlow < 2
From the docs:
Save
# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer=tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer=tf.zeros_initializer)
inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    inc_v1.op.run()
    dec_v2.op.run()
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in path: %s" % save_path)
Restore
tf.reset_default_graph()
# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Check the values of the variables
    print("v1 : %s" % v1.eval())
    print("v2 : %s" % v2.eval())
simple_save
Many good answers; for completeness I'll add my 2 cents: simple_save. Also a standalone code example using the tf.data.Dataset API.
Python 3; TensorFlow 1.14
import tensorflow as tf
from tensorflow.saved_model import tag_constants
with tf.Graph().as_default():
    with tf.Session() as sess:
        ...
        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, 'path/to/your/location/', inputs, outputs
        )
Restoring:
graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            'path/to/your/location/',
        )
        batch_size_placeholder = graph.get_tensor_by_name('batch_size_placeholder:0')
        features_placeholder = graph.get_tensor_by_name('features_placeholder:0')
        labels_placeholder = graph.get_tensor_by_name('labels_placeholder:0')
        prediction = graph.get_tensor_by_name('dense/BiasAdd:0')
        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })
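Standalone example
For demonstration purposes, the following code generates random data.
- We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator's generated tensor, called input_tensor, which will serve as input to our model.
- The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not.
- The loss is a softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, the model will be saved in a folder called simple/ in your current working directory.
- In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name and the Iterator initializing operation with graph.get_operation_by_name.
- Lastly, we run an inference for both batches in the dataset, and check that the saved and the restored model both yield the same values. They do!
Code: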
import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants
def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier
    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model
    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)
    return dense
def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model's logits and labels
    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels
    Returns:
        tf.Operation: the operation performing a step of Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                logits=logits, labels=labels_tensor, name='xent'),
                name="mean-xent"
            )
        with tf.variable_scope('optimizer'):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
    return opt_op
if __name__ == '__main__':
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]
    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name='batch_size_ph')
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], 'features_data_ph')
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], 'labels_data_ph')
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
        input_tensor, labels_tensor = iterator.get_next()
        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)
        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print('Epoch {}, batch {} | Sample value: {}'.format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print('Epoch {}, batch {} | Final inference | Sample value: {}'.format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print('\nSaving...')
            cwd = os.getcwd()
            path = os.path.join(cwd, 'simple')
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print('Ok')
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print('\nRestoring...')
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print('Ok')
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name('labels_data_ph:0')
            features_data_ph = graph2.get_tensor_by_name('features_data_ph:0')
            batch_size_ph = graph2.get_tensor_by_name('batch_size_ph:0')
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name('dense/BiasAdd:0')
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name('dataset_init')
            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }
            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print('Restored values: ', restored_values[i][0])
    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print('\nInferences match: ', valid)
This will print:
$ python3 save_and_restore.py
Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595 0.12804556 0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045 -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792 -0.00602257 0.07465433 0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984 0.05981954 -0.15913513 -0.3244143 0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553 -0.04276478 0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117 0.11119192 -0.20817074 -0.35660955 0.16990358]
Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/some/path/simple/saved_model.pb'
Ok
Restoring...
INFO:tensorflow:Restoring parameters from b'/some/path/simple/variables/variables'
Ok
Restored values: [-0.26331693 -0.13013336 -0.12553 -0.04276478 0.2933622 ]
Restored values: [-0.07730117 0.11119192 -0.20817074 -0.35660955 0.16990358]
Inferences match: True
Use tf.train.Saver to save a model. Remember that you need to specify the var_list if you want to reduce the model size. The var_list can be:
tf.trainable_variables()
or
tf.global_variables()
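To see how much var_list can matter, count parameters. With Adam, TF 1.x keeps two extra slot variables per trainable weight (named like .../Adam and .../Adam_1) plus a few scalars, and all of them are global variables. The shapes below are hypothetical; the point is the roughly threefold difference.

```python
from math import prod  # Python 3.8+

# Hypothetical shapes for a single dense layer trained with Adam in TF 1.x.
trainable = {
    "dense/kernel": (784, 128),
    "dense/bias": (128,),
}
global_vars = dict(trainable)
for name, shape in trainable.items():
    global_vars[name + "/Adam"] = shape    # first-moment slot
    global_vars[name + "/Adam_1"] = shape  # second-moment slot
for scalar in ("beta1_power", "beta2_power", "global_step"):
    global_vars[scalar] = ()  # prod(()) == 1


def num_params(variables):
    """Total number of scalar values across all variable shapes."""
    return sum(prod(shape) for shape in variables.values())


print(num_params(trainable))    # 100480
print(num_params(global_vars))  # 301443
```

Saving only the trainable variables is enough for inference, but the optimizer state is then lost, so resumed training will not continue exactly where it stopped.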
According to newer TensorFlow versions, tf.train.Checkpoint is the preferred way of saving and restoring a model:
Checkpoint.save and Checkpoint.restore write and read object-based checkpoints, in contrast to tf.train.Saver which writes and reads variable.name based checkpoints. Object-based checkpointing saves a graph of dependencies between Python objects (Layers, Optimizers, Variables, etc.) with named edges, and this graph is used to match variables when restoring a checkpoint. It can be more robust to changes in the Python program, and helps to support restore-on-create for variables when executing eagerly. Prefer tf.train.Checkpoint over tf.train.Saver for new code.
Here's an example:
import tensorflow as tf
import os
tf.enable_eager_execution()
# `optimizer`, `model` and `num_training_steps` are assumed to be defined elsewhere.
checkpoint_directory = "/tmp/training_checkpoints"
checkpoint_prefix = os.path.join(checkpoint_directory, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_directory))
for _ in range(num_training_steps):
    optimizer.minimize( ... )  # Variables will be restored on creation.
status.assert_consumed()  # Optional sanity checks.
checkpoint.save(file_prefix=checkpoint_prefix)
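A toy illustration of the difference the quoted docs describe: name-based keys come from variable.name, which can change when scopes are uniquified, while object-based keys follow the attribute path from the root object (here l1/kernel). Variable, Dense, Net and collect are stand-ins written for this sketch; TensorFlow's real checkpoint keys look different.

```python
class Variable:
    def __init__(self, name, value):
        self.name = name    # what a name-based Saver would key on
        self.value = value


class Dense:
    def __init__(self, scope):
        self.kernel = Variable(scope + "/kernel:0", [1.0, 2.0])


class Net:
    def __init__(self, scope):
        self.l1 = Dense(scope)


def collect(obj, path=""):
    """Yield (attribute_path, variable) for every Variable reachable from obj."""
    for attr, child in vars(obj).items():
        child_path = path + attr
        if isinstance(child, Variable):
            yield child_path, child
        else:
            yield from collect(child, child_path + "/")


def name_based(obj):
    return {var.name: var.value for _, var in collect(obj)}


def object_based(obj):
    return {path: var.value for path, var in collect(obj)}


# Same architecture built twice; the second time the scope came out as "dense_1".
saved, rebuilt = Net("dense"), Net("dense_1")
print(name_based(saved).keys() == name_based(rebuilt).keys())  # False
print(object_based(saved) == object_based(rebuilt))            # True
print(object_based(rebuilt))                                    # {'l1/kernel': [1.0, 2.0]}
```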
Wherever you want to save the model,
self.saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ...
    self.saver.save(sess, filename)
Make sure all your tf.Variables have names, because you may want to restore them later using their names.
And where you want to predict,
name = 'name given when you saved the file'
saver = tf.train.import_meta_graph(name + '.meta')
with tf.Session() as sess:
    saver.restore(sess, name)
    print(sess.run('W1:0'))  # example to retrieve by variable name
Make sure that the saver runs inside the corresponding session.
Remember that if you use tf.train.latest_checkpoint('./'), only the latest checkpoint will be used.
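You can save the variables in the network using
saver = tf.train.Saver()
saver.save(sess, 'path of save/fileName.ckpt')
To restore the network for reuse later or in another script, use:
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('path of save/'))
sess.run(...)
Key points:
The graph in sess must have the same structure in the first and the later runs.
tf.train.latest_checkpoint needs the path of the folder where the files were saved, not an individual file path; it returns the checkpoint prefix that saver.restore expects.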
For TensorFlow 2.0, it is as simple as
# Save the model
model.save('path_to_my_model.h5')
Restore:
new_model = tf.keras.models.load_model('path_to_my_model.h5')
My versions:
tensorflow (1.13.1)
tensorflow-gpu (1.13.1)
The simple way is
Save:
model.save("model.h5")
Restore:
model = tf.keras.models.load_model("model.h5")
In the new TensorFlow 2.0 release, the process of saving/loading a model is a lot easier, thanks to the implementation of the Keras API, TensorFlow's high-level API.
To save a model:
Check the documentation for reference:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/save_model
tf.keras.models.save_model(model, filepath, save_format)
To load a model:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/load_model
model = tf.keras.models.load_model(filepath)
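tf.keras model saving with TF2.0
I see great answers on saving models with TF1.x. I would like to provide a few more pointers on saving tensorflow.keras models, which is a bit complicated because there are many ways to save a model.
Here I provide an example of saving a tensorflow.keras model to a model_path folder under the current directory. This works well with the most recent TensorFlow (TF2.0). I will update this description if anything changes in the near future.
Saving and loading the entire model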
import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist
# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# create a model
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    # compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# Create a basic model instance
model = create_model()
model.fit(x_train, y_train, epochs=1)
loss, acc = model.evaluate(x_test, y_test, verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))
# Save entire model to a HDF5 file
model.save('./model_path/my_model.h5')
# Recreate the exact same model, including weights and optimizer.
new_model = keras.models.load_model('./model_path/my_model.h5')
loss, acc = new_model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
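Saving and loading only the model weights
If you only want to save the model weights and then load the weights to restore the model, then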
model.fit(x_train, y_train, epochs=5)
loss, acc = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))
# Save the weights
model.save_weights('./checkpoints/my_checkpoint')
# Restore the weights
model = create_model()
model.load_weights('./checkpoints/my_checkpoint')
loss,acc = model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
Saving and restoring with the Keras checkpoint callback
import os
# train_images, train_labels, test_images and test_labels are assumed to be
# defined beforehand (x_train, y_train, x_test, y_test from above work too).
# include the epoch in the file name. (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path, verbose=1, save_weights_only=True,
    # Save weights every 5 epochs (`period` is deprecated in later TF 2.x
    # releases in favor of `save_freq`).
    period=5)
model = create_model()
model.save_weights(checkpoint_path.format(epoch=0))
model.fit(train_images, train_labels,
          epochs=50, callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
latest = tf.train.latest_checkpoint(checkpoint_dir)
new_model = create_model()
new_model.load_weights(latest)
loss, acc = new_model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
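Two details of the callback above, checked with plain Python: the {epoch:04d} placeholder is filled in with str.format, and with period=5 (assuming Keras's 1-based epoch numbering in file names) a 50-epoch run writes ten weight files, cp-0005 through cp-0050, in addition to the cp-0000 saved manually before fit().

```python
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"

# The manual save before fit() uses epoch=0.
print(checkpoint_path.format(epoch=0))  # training_2/cp-0000.ckpt

# Epochs at which a period=5 callback would save during a 50-epoch run.
epochs, period = 50, 5
saved = [checkpoint_path.format(epoch=e)
         for e in range(1, epochs + 1) if e % period == 0]
print(len(saved))  # 10
print(saved[0])    # training_2/cp-0005.ckpt
print(saved[-1])   # training_2/cp-0050.ckpt
```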
Saving a model with custom metrics
import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Custom Loss1 (for example)
@tf.function()
def customLoss1(yTrue, yPred):
    return tf.reduce_mean(yTrue-yPred)
# Custom Loss2 (for example)
@tf.function()
def customLoss2(yTrue, yPred):
    return tf.reduce_mean(tf.square(tf.subtract(yTrue, yPred)))
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy', customLoss1, customLoss2])
    return model
# Create a basic model instance
model = create_model()
# Fit and evaluate model
model.fit(x_train, y_train, epochs=1)
loss, acc, loss1, loss2 = model.evaluate(x_test, y_test, verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))
model.save("./model.h5")
new_model = tf.keras.models.load_model("./model.h5", custom_objects={'customLoss1': customLoss1, 'customLoss2': customLoss2})
Saving a Keras model with custom ops
When we have custom ops, as in the following case (tf.tile), we need to create a function and wrap it with a Lambda layer. Otherwise, the model cannot be saved.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras import Model
def my_fun(a):
    out = tf.tile(a, (1, tf.shape(a)[0]))
    return out
a = Input(shape=(10,))
# out = tf.tile(a, (1, tf.shape(a)[0]))
out = Lambda(lambda x: my_fun(x))(a)
model = Model(a, out)
x = np.zeros((50, 10), dtype=np.float32)
print(model(x).numpy())
model.save('my_model.h5')
# load the model
new_model = tf.keras.models.load_model("my_model.h5")
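I think I have covered a few of the many ways to save tf.keras models. However, there are many other ways. Please comment below if your use case is not covered above. Thanks!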
Following @Vishnuvardhan Janapati's answer, here is another way to save and reload a model with custom layers/metrics/losses under TensorFlow 2.0.0:
import tensorflow as tf
from tensorflow.keras.layers import Layer
from tensorflow.keras.utils import get_custom_objects
# custom loss (for example)
def custom_loss(y_true, y_pred):
    return tf.reduce_mean(y_true - y_pred)
get_custom_objects().update({'custom_loss': custom_loss})
# custom layer (for example)
class CustomLayer(Layer):
    def __init__(self, **kwargs):  # add your own constructor arguments here
        super(CustomLayer, self).__init__(**kwargs)
        # define the custom layer and all necessary custom operations inside it
get_custom_objects().update({'CustomLayer': CustomLayer})
This way, once you have executed such code and saved your model with tf.keras.models.save_model, model.save or the ModelCheckpoint callback, you can reload your model without passing the custom objects explicitly, as simply as
new_model = tf.keras.models.load_model("./model.h5")
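Why registering helps: a saved model stores only the name of a custom loss or layer, and at load time that name has to be resolved back to a Python object. The toy registry below (register and deserialize are hypothetical, not Keras internals) shows the idea; in this toy version, explicitly passed custom_objects override registered entries.

```python
_CUSTOM_OBJECTS = {}


def register(obj):
    """Record obj under its __name__ (toy analogue of get_custom_objects().update)."""
    _CUSTOM_OBJECTS[obj.__name__] = obj
    return obj


def deserialize(config, custom_objects=None):
    """Resolve the stored name back to a Python object."""
    lookup = dict(_CUSTOM_OBJECTS)
    lookup.update(custom_objects or {})  # in this toy version explicit entries win
    name = config["loss"]
    if name not in lookup:
        raise ValueError("Unknown loss function: %s" % name)
    return lookup[name]


@register
def custom_loss(y_true, y_pred):
    return sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)


saved_config = {"loss": "custom_loss"}  # a saved file only keeps the name
loss_fn = deserialize(saved_config)
print(loss_fn([1.0, 2.0], [0.5, 1.0]))  # 0.75
```

Without the register call, deserialize would raise the "Unknown loss function" error, which is the situation custom_objects= or get_custom_objects() is meant to fix.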
For tensorflow-2.0
it's very simple:
import tensorflow as tf
Save
model.save("model_name")
Restore
model = tf.keras.models.load_model('model_name')
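Here's a simple example using the TensorFlow 2.0 SavedModel format (which is the recommended format, according to the docs) for a simple MNIST dataset classifier, using the Keras functional API without too much fancy going on: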
# Imports
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
# Load data
mnist = tf.keras.datasets.mnist # 28 x 28
(x_train,y_train), (x_test, y_test) = mnist.load_data()
# Normalize pixels [0,255] -> [0,1]
x_train = tf.keras.utils.normalize(x_train,axis=1)
x_test = tf.keras.utils.normalize(x_test,axis=1)
# Create model
input = Input(shape=(28,28), dtype='float64', name='graph_input')
x = Flatten()(input)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
output = Dense(10, activation='softmax', name='graph_output', dtype='float64')(x)
model = Model(inputs=input, outputs=output)
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train
model.fit(x_train, y_train, epochs=3)
# Save model in SavedModel format (Tensorflow 2.0)
export_path = 'model'
tf.saved_model.save(model, export_path)
# ... possibly another python program
# Reload model
loaded_model = tf.keras.models.load_model(export_path)
# Get image sample for testing
index = 0
img = x_test[index] # I normalized the image on a previous step
# Predict using the signature definition (Tensorflow 2.0)
predict = loaded_model.signatures["serving_default"]
prediction = predict(tf.constant(img))
# Show results
print(np.argmax(prediction['graph_output'])) # prints the class number
plt.imshow(x_test[index], cmap=plt.cm.binary) # prints the image
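tf.keras.utils.normalize rescales each slice along axis to unit L2 norm, meaning it divides by the slice's Euclidean length. That is not the same as dividing pixel values by 255. The pure-Python sketch below contrasts the two on a single made-up 3-pixel slice. The helper names are hypothetical.

```python
import math

def l2_normalize(values):
    """Divide by the Euclidean norm; an all-zero slice is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else list(values)

def scale_to_unit_range(values):
    """Map raw pixel values from [0, 255] to [0, 1]."""
    return [v / 255.0 for v in values]

column = [0, 3, 4]  # a made-up slice of pixel values along one axis
print(l2_normalize(column))         # [0.0, 0.6, 0.8]
print(scale_to_unit_range(column))  # small values in [0, 1], not unit norm
```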
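What is serving_default?
It is the name of the signature def you selected for the chosen tag (in this case, the default serve tag was selected). Also, here is an explanation of how to use saved_model_cli to find a model's tags and signatures.
Disclaimer
This is only a basic example for getting something running. It is by no means a complete answer, and maybe I can update it in the future. I just wanted to give a simple example of using SavedModel in TF 2.0, because I had not seen one anywhere, not even one this simple.
@'s answer is a SavedModel example, but it does not work with TensorFlow 2.0 because, unfortunately, there are some breaking changes.
@'s answer is TF 2.0, but not in the SavedModel format.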
TensorFlow 2.6: it is even simpler now. You can save the model in two formats:
- SavedModel (compatible with TF Serving)
- H5 or HDF5
Save the model in both formats:
import tensorflow as tf
inputs = tf.keras.Input(shape=(224, 224, 3))
y = tf.keras.layers.Conv2D(24, 3, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(y)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.save("saved_model/my_model")  # save in SavedModel format
model.save("my_model.h5")  # save in H5/HDF5 format
Load the model in both formats:
import tensorflow as tf
h5_model = tf.keras.models.load_model("my_model.h5")  # load the model in h5 format
h5_model.summary()
saved_m = tf.keras.models.load_model("saved_model/my_model")  # load the model in SavedModel format
saved_m.summary()
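The two save calls above differ only in the path. In TF 2.x, model.save picks the format from the file name unless save_format is given: a path ending in .h5 or .hdf5 is written as a single HDF5 file, and anything else becomes a SavedModel directory. The hypothetical helper below is a stdlib sketch of that rule as I understand it for TF 2.x releases before Keras 3. Newer versions add other suffixes, so check the save_model docs for your version.

```python
def infer_save_format(filepath, save_format=None):
    """Sketch of how model.save chooses a format in TF 2.x (assumption, pre-Keras 3)."""
    if save_format is not None:
        return save_format  # an explicit save_format wins
    if filepath.endswith((".h5", ".hdf5")):
        return "h5"  # one HDF5 file
    return "tf"  # a SavedModel directory

print(infer_save_format("saved_model/my_model"))        # tf
print(infer_save_format("my_model.h5"))                 # h5
print(infer_save_format("my_model", save_format="h5"))  # h5
```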
The simplest way is to use the Keras API: one line to save the model and one line to load it.
from keras.models import load_model
my_model.save('my_model.h5')  # creates an HDF5 file 'my_model.h5'
del my_model  # deletes the existing model
my_model = load_model('my_model.h5')  # returns a compiled model identical to the previous one
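You can also look at the examples in TensorFlow/skflow, which provides save and restore methods that help you manage your models easily. It also has parameters that let you control how often the model is backed up.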
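If it is a model you saved yourself, you only need to create a restorer for all variables
restorer = tf.train.Saver(tf.all_variables())
and use it to restore the variables in the current session:
restorer.restore(self._sess, model_file)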
For an external model, you need to specify a mapping from its variable names to your variable names. You can view the model's variable names with the command
python /path/to/tensorflow/tensorflow/python/tools/inspect_checkpoint.py --file_name=/path/to/pretrained_model/model.ckpt
The inspect_checkpoint.py script is in the "./tensorflow/python/tools" folder of the TensorFlow source.
To specify the mapping, you can use my Tensorflow-Worklab, which contains a set of classes and scripts to train and retrain different models. It includes an example of retraining ResNet models, located here
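Mapping an external checkpoint comes down to renaming: for each of your variables, you state the key it was saved under. In TF 1.x, tf.train.Saver accepts such a mapping directly as a dict var_list of the form {name_in_checkpoint: your_variable}. The sketch below shows only the renaming logic in plain Python. The variable names are made up, and a dict stands in for the checkpoint contents.

```python
def remap_checkpoint(saved, name_map):
    """Return {your_name: value} for every entry of name_map {ckpt_name: your_name}.

    Raises KeyError listing checkpoint names that are missing; a real restore
    would also fail in that situation.
    """
    missing = [k for k in name_map if k not in saved]
    if missing:
        raise KeyError("not in checkpoint: " + ", ".join(missing))
    return {mine: saved[ckpt] for ckpt, mine in name_map.items()}

# Hypothetical checkpoint contents, as inspect_checkpoint.py might list them
saved = {"resnet_v1/conv1/weights": [0.1, 0.2], "resnet_v1/logits/biases": [0.0]}
name_map = {"resnet_v1/conv1/weights": "my_net/conv1/kernel"}
print(remap_checkpoint(saved, name_map))  # {'my_net/conv1/kernel': [0.1, 0.2]}
```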
In TensorFlow 0.11.0RC1 (and later), you can save and restore your model directly by calling tf.train.export_meta_graph and tf.train.import_meta_graph, according to https://www.tensorflow.org/programmers_guide/meta_graph.
Save the model
import tensorflow as tf
w1 = tf.Variable(tf.truncated_normal(shape=[10]), name='w1')
w2 = tf.Variable(tf.truncated_normal(shape=[20]), name='w2')
tf.add_to_collection('vars', w1)
tf.add_to_collection('vars', w2)
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my-model')
# `save` calls `export_meta_graph` implicitly.
# You will get the saved graph file: my-model.meta
Restore the model
import tensorflow as tf
sess = tf.Session()
new_saver = tf.train.import_meta_graph('my-model.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
all_vars = tf.get_collection('vars')
for v in all_vars:
    v_ = sess.run(v)
    print(v_)
As mentioned in issue 6255, use './model_name.ckpt':
saver.restore(sess, './my_model_final.ckpt')
instead of
saver.restore(sess, 'my_model_final.ckpt')
You can also take this simpler approach.
Step 1: Initialize all variables
W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1), name="W1")
B1 = tf.Variable(tf.constant(0.1, tf.float32, [K]), name="B1")
# Similarly for W2, B2, W3, ...
Step 2: Create a Saver for the model and save the session with it
model_saver = tf.train.Saver()
# Train the model and save it at the end
model_saver.save(session, "saved_models/CNN_New.ckpt")
Step 3: Restore the model
with tf.Session(graph=graph_cnn) as session:
    model_saver.restore(session, "saved_models/CNN_New.ckpt")
    print("Model restored.")
    print('Initialized')
    # Step 4: Check your variables (still inside the session)
    W1_value = session.run(W1)
    print(W1_value)
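When running in a different Python instance, use:
import tensorflow as tf
# Rebuild the graph from the .meta file that Saver wrote next to the checkpoint
saver = tf.train.import_meta_graph('saved_models/CNN_New.ckpt.meta')
with tf.Session() as sess:
    # Restore latest checkpoint
    saver.restore(sess, tf.train.latest_checkpoint('saved_models/'))
    # Do NOT run tf.global_variables_initializer() here: it would overwrite the restored values
    # Get default graph (supply your custom graph if you have one)
    graph = tf.get_default_graph()
    # It will give a tensor object
    W1 = graph.get_tensor_by_name('W1:0')
    # To get the value (numpy array)
    W1_value = sess.run(W1)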
In most cases, saving to and restoring from disk with tf.train.Saver is your best option:
... # build your model
saver = tf.train.Saver()
with tf.Session() as sess:
    ... # train the model
    saver.save(sess, "/tmp/my_great_model")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model
You can also save/restore the graph structure itself (see the MetaGraph documentation for details). By default, Saver saves the graph structure into a .meta file. You can call import_meta_graph() to restore it. It restores the graph structure and returns a Saver that you can use to restore the model's state:
saver = tf.train.import_meta_graph("/tmp/my_great_model.meta")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model
However, in some cases you need something much faster. For example, if you implement early stopping, you want to save a checkpoint every time the model improves during training (as measured on the validation set). Then, if there is no progress for some time, you want to roll back to the best model. Saving the model to disk every time it improves would slow training down tremendously. The trick is to save the variable state to memory and restore it later:
... # build your model
# get a handle on the graph nodes we need to save/restore the model
graph = tf.get_default_graph()
gvars = graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
assign_ops = [graph.get_operation_by_name(v.op.name + "/Assign") for v in gvars]
init_values = [assign_op.inputs[1] for assign_op in assign_ops]
with tf.Session() as sess:
    ... # train the model
    # when needed, save the model state to memory
    gvars_state = sess.run(gvars)
    # when needed, restore the model state
    feed_dict = {init_value: val
                 for init_value, val in zip(init_values, gvars_state)}
    sess.run(assign_ops, feed_dict=feed_dict)
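A quick explanation: when you create a variable X, TensorFlow automatically creates an assignment operation X/Assign that sets the variable's initial value. Instead of creating placeholders and extra assignment operations, which would only clutter the graph, we reuse these existing assignment operations. The first input of each assignment operation is a reference to the variable it initializes. The second input (assign_op.inputs[1]) is the initial value. So to set any value we want instead of the initial value, we use a feed_dict that replaces the initial value. TensorFlow lets you feed a value for any op's output tensor, not just placeholders, so this works fine.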
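Here is the same early-stopping pattern without TensorFlow. It keeps a copy of the best state in memory and copies it back once validation loss has not improved for patience evaluations. This is a stdlib sketch. The dict of lists stands in for what sess.run(gvars) returns, and the validation losses are made-up numbers.

```python
import copy

def train_with_rollback(state, val_losses, patience=2):
    """Snapshot `state` in memory on every improvement; roll back when stuck.

    state: dict of parameter lists, mutated in place by a fake training step.
    val_losses: one made-up validation loss per evaluation.
    Returns the best loss seen.
    """
    best_loss = float("inf")
    best_state = copy.deepcopy(state)  # in-memory snapshot (like gvars_state)
    bad_evals = 0
    for loss in val_losses:
        # Fake training step: nudge every parameter.
        for values in state.values():
            for i in range(len(values)):
                values[i] += 1.0
        if loss < best_loss:
            best_loss, bad_evals = loss, 0
            best_state = copy.deepcopy(state)  # cheap compared with a disk write
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break
    # Roll back to the best snapshot (like feeding gvars_state to the Assign ops)
    for name, values in best_state.items():
        state[name][:] = values
    return best_loss

params = {"w": [0.0, 0.0], "b": [0.0]}
best = train_with_rollback(params, [0.9, 0.5, 0.6, 0.7, 0.4])
print(best, params)  # 0.5 {'w': [2.0, 2.0], 'b': [2.0]}
```

Training stops after two evaluations without improvement, so the final loss of 0.4 is never reached, and the parameters are rolled back to the snapshot taken at loss 0.5.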
Here is my simple solution for the two basic cases. They differ in whether you load the graph from a file or build it at runtime.
This answer applies to TensorFlow 0.12+ (including 1.0).
Rebuilding the graph in code
Saving
graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')
Loading
graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    # now you can use the graph, continue training or whatever
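Loading the graph from the file as well
When using this technique, make sure all your layers/variables have explicitly set, unique names. Otherwise TensorFlow makes the names unique on its own, and they will then differ from the names stored in the file. This was not a problem with the previous technique, because the names were "mangled" the same way when saving and when loading.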
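The stdlib sketch below shows why implicit names drift. It imitates how a TF 1.x graph makes op names unique: the first request for a name gets it unchanged, and later requests get _1, _2, and so on. The NameScope class is hypothetical. It mirrors the behaviour of tf.Graph.unique_name, not its code.

```python
class NameScope:
    """Hypothetical stand-in for a graph's name bookkeeping."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        # First use keeps the name; later uses get a numeric suffix.
        return name if count == 0 else "{}_{}".format(name, count)

g = NameScope()
auto_names = [g.unique_name("Variable") for _ in range(3)]
print(auto_names)            # ['Variable', 'Variable_1', 'Variable_2']
print(g.unique_name("W1"))   # W1 (an explicit, unique name stays stable)
```

If a loading script creates one extra unnamed variable before the one you meant, your variable silently becomes Variable_1, which is exactly the mismatch explicit names avoid.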
Saving
graph = ... # build the graph
for op in [ ... ]:  # operators you want to use after restoring the model
    tf.add_to_collection('ops_to_restore', op)
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')
Loading
with ... as sess:  # your session object
    saver = tf.train.import_meta_graph('my-model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    ops = tf.get_collection('ops_to_restore')  # your operators, in the order you added them to the collection
I am improving my answer to add more details about saving and restoring models.
In TensorFlow version 0.11 (and later):
Saving the model:
import tensorflow as tf
# Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1 = tf.Variable(2.0, name="bias")
feed_dict = {w1: 4, w2: 8}
# Define a test operation that we will restore
w3 = tf.add(w1, w2)
w4 = tf.multiply(w3, b1, name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Create a saver object which will save all the variables
saver = tf.train.Saver()
# Run the operation by feeding input
print(sess.run(w4, feed_dict))
# Prints 24, which is (w1 + w2) * b1 = (4 + 8) * 2
# Now, save the graph
saver.save(sess, 'my_test_model', global_step=1000)
Restoring the model:
import tensorflow as tf
sess = tf.Session()
# First, load the meta graph and restore the weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
# Access saved Variables directly
print(sess.run('bias:0'))
# This will print 2, which is the value of bias that we saved
# Now, access the placeholders and
# create a feed_dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict = {w1: 13.0, w2: 17.0}
# Now, access the op that you want to run
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")
print(sess.run(op_to_restore, feed_dict))
# This will print 60, which is (13 + 17) * 2
This and some more advanced use cases are explained well here:
A quick complete tutorial to save and restore Tensorflow models
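Names such as "op_to_restore:0" and "bias:0" follow the <op name>:<output index> convention. get_tensor_by_name wants the full tensor name (op name plus output index), while get_operation_by_name wants the bare op name. Here is a small hypothetical stdlib helper that splits such a name:

```python
def split_tensor_name(tensor_name):
    """Split 'scope/op:index' into ('scope/op', index)."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError("expected '<op name>:<output index>', got " + repr(tensor_name))
    return op_name, int(index)

print(split_tensor_name("op_to_restore:0"))  # ('op_to_restore', 0)
print(split_tensor_name("dense/BiasAdd:0"))  # ('dense/BiasAdd', 0)
```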
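If you use tf.train.MonitoredTrainingSession as your default session, you don't need any extra code for save/restore. Just pass a checkpoint directory name to the MonitoredTrainingSession constructor, and it will use session hooks to handle both.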
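All the answers here are great, but I want to add two things.
First, to elaborate on @user7505159's answer, it can be important to add "./" to the beginning of the file name you are restoring.
For example, you can save a graph with no "./" in the file name like this:
# Some graph defined up here with specific names
saver = tf.train.Saver()
save_file = 'model.ckpt'
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
But to restore the graph, you may need to prepend "./" to the file name:
# Same graph defined up here
saver = tf.train.Saver()
save_file = './' + 'model.ckpt'  # String addition used for emphasis
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, save_file)
You won't always need the "./", but leaving it out can cause problems depending on your environment and TensorFlow version.
I also want to mention that running sess.run(tf.global_variables_initializer()) before restoring the session can be important.
If you get an error about uninitialized variables when trying to restore a saved session, make sure sess.run(tf.global_variables_initializer()) comes before the saver.restore(sess, save_file) line. It can save you a headache.
My environment: Python 3.6, TensorFlow 1.3.0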
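Although there are many solutions, most of them are based on tf.train.Saver. When we load a .ckpt saved by Saver, we have to either redefine the TensorFlow network or use strange, hard-to-remember names such as 'placehold_0:0' or 'dense/Adam/Weight:0'. I recommend tf.saved_model instead. The simplest example is given below; for more, see Serving a TensorFlow Model: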
Saving the model:
import tensorflow as tf
# define the tensorflow network and do some training
x = tf.placeholder("float", name="x")
w = tf.Variable(2.0, name="w")
b = tf.Variable(0.0, name="bias")
h = tf.multiply(x, w)
y = tf.add(h, b, name="y")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# save the model
export_path = './savedmodel'
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y)
prediction_signature = (
    tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'x_input': tensor_info_x},
        outputs={'y_output': tensor_info_y},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            prediction_signature
    },
)
builder.save()
Loading the model:
import tensorflow as tf
sess = tf.Session()
signature_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
input_key = 'x_input'
output_key = 'y_output'
export_path = './savedmodel'
meta_graph_def = tf.saved_model.loader.load(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    export_path)
signature = meta_graph_def.signature_def
x_tensor_name = signature[signature_key].inputs[input_key].name
y_tensor_name = signature[signature_key].outputs[output_key].name
x = sess.graph.get_tensor_by_name(x_tensor_name)
y = sess.graph.get_tensor_by_name(y_tensor_name)
y_out = sess.run(y, {x: 3.0})
TensorFlow 2 documentation
Saving checkpoints
Adapted from the docs
# -------------------------
# ----- Toy Context -----
# -------------------------
import tensorflow as tf
class Net(tf.keras.Model):
    """A simple linear model."""
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)
    def call(self, x):
        return self.l1(x)
def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )
def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss
# ----------------------------
# ----- Create Objects -----
# ----------------------------
net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)
# ----------------------------
# ----- Train and Save -----
# ----------------------------
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")
for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))
# ---------------------
# ----- Restore -----
# ---------------------
# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)
# Re-use the manager code above ^
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")
for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.
    loss = train_step(net, example, opt)
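In the training loop, ckpt.step goes from 2 to 51, so manager.save() runs at steps 10, 20, 30, 40 and 50. With max_to_keep=3, only the three newest checkpoints stay on disk. The stdlib sketch below simulates that bookkeeping. The ckpt-N names follow the save counter that tf.train.CheckpointManager uses by default, as I understand it. The deletion logic is a simplification, not its actual code.

```python
from collections import deque

def simulate_manager(num_iterations=50, save_every=10, max_to_keep=3,
                     directory="./tf_ckpts"):
    """Return (kept checkpoint paths, all paths ever written)."""
    step = 1           # ckpt.step starts as tf.Variable(1)
    save_counter = 0   # incremented on every save
    kept = deque()
    written = []
    for _ in range(num_iterations):
        step += 1      # ckpt.step.assign_add(1)
        if step % save_every == 0:
            save_counter += 1
            path = "{}/ckpt-{}".format(directory, save_counter)
            written.append(path)
            kept.append(path)
            if len(kept) > max_to_keep:
                kept.popleft()  # the oldest checkpoint gets deleted
    return list(kept), written

kept, written = simulate_manager()
print(len(written))  # 5
print(kept)          # ['./tf_ckpts/ckpt-3', './tf_ckpts/ckpt-4', './tf_ckpts/ckpt-5']
```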
More links
A thorough and useful tutorial on saved_model -> https://www.tensorflow.org/guide/saved_model
A detailed guide to saving Keras models -> https://www.tensorflow.org/guide/keras/save_and_serialize
Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.
The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).
(emphasis mine)
TensorFlow < 2
From the docs:
Save
import tensorflow as tf
# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer=tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer=tf.zeros_initializer)
inc_v1 = v1.assign(v1 + 1)
dec_v2 = v2.assign(v2 - 1)
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    inc_v1.op.run()
    dec_v2.op.run()
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in path: %s" % save_path)
Restore
tf.reset_default_graph()
# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Check the values of the variables
    print("v1 : %s" % v1.eval())
    print("v2 : %s" % v2.eval())
simple_save
Lots of good answers; for completeness I'll add my 2 cents: simple_save, also using the tf.data.Dataset API.
Python 3; TensorFlow 1.14
import tensorflow as tf
from tensorflow.saved_model import tag_constants
with tf.Graph().as_default():
    with tf.Session() as sess:
        ...
        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, 'path/to/your/location/', inputs, outputs
        )
Restoring:
restored_graph = tf.Graph()
with restored_graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            'path/to/your/location/',
        )
        batch_size_placeholder = restored_graph.get_tensor_by_name('batch_size_placeholder:0')
        features_placeholder = restored_graph.get_tensor_by_name('features_placeholder:0')
        labels_placeholder = restored_graph.get_tensor_by_name('labels_placeholder:0')
        prediction = restored_graph.get_tensor_by_name('dense/BiasAdd:0')
        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })
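Standalone example
For demonstration purposes, the following code generates random data.
- We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator's generated tensor, called input_tensor, which serves as input to our model.
- The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not.
- The loss is softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, the model is saved in a folder called simple/ in your current working directory.
- In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name, and the Iterator initializing operation with graph.get_operation_by_name.
- Lastly, we run inference on both batches in the dataset and check that the saved and the restored model both yield the same values. They do!
Code: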
import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants
def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier
    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model
    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)
        return dense
def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model's logits and labels
    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels
    Returns:
        tf.Operation: the operation performing a step of Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                logits=logits, labels=labels_tensor, name='xent'),
                name="mean-xent"
            )
        with tf.variable_scope('optimizer'):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
        return opt_op
if __name__ == '__main__':
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]
    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name='batch_size_ph')
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], 'features_data_ph')
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], 'labels_data_ph')
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
        input_tensor, labels_tensor = iterator.get_next()
        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)
        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print('Epoch {}, batch {} | Sample value: {}'.format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print('Epoch {}, batch {} | Final inference | Sample value: {}'.format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print('\nSaving...')
            cwd = os.getcwd()
            path = os.path.join(cwd, 'simple')
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print('Ok')
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print('\nRestoring...')
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print('Ok')
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name('labels_data_ph:0')
            features_data_ph = graph2.get_tensor_by_name('features_data_ph:0')
            batch_size_ph = graph2.get_tensor_by_name('batch_size_ph:0')
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name('dense/BiasAdd:0')
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name('dataset_init')
            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }
            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print('Restored values: ', restored_values[i][0])
    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print('\nInferences match: ', valid)
This will print:
$ python3 save_and_restore.py
Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595 0.12804556 0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045 -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792 -0.00602257 0.07465433 0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984 0.05981954 -0.15913513 -0.3244143 0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553 -0.04276478 0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117 0.11119192 -0.20817074 -0.35660955 0.16990358]
Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/some/path/simple/saved_model.pb'
Ok
Restoring...
INFO:tensorflow:Restoring parameters from b'/some/path/simple/variables/variables'
Ok
Restored values: [-0.26331693 -0.13013336 -0.12553 -0.04276478 0.2933622 ]
Restored values: [-0.07730117 0.11119192 -0.20817074 -0.35660955 0.16990358]
Inferences match: True
Save the model with tf.train.Saver. Keep in mind that you need to specify var_list if you want to reduce the model size. var_list can be tf.trainable_variables or tf.global_variables.
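The choice of var_list matters because, with an optimizer such as Adam, tf.global_variables also includes the optimizer's slot variables and some bookkeeping scalars. None of these are needed for inference. TF 1.x Adam keeps two slot variables per trainable variable plus the beta1_power/beta2_power accumulators. The sketch below uses a made-up single dense layer to estimate how much smaller a trainable-only checkpoint is. The Var tuple and the 4-bytes-per-float32 estimate are illustrative assumptions.

```python
from collections import namedtuple

Var = namedtuple("Var", ["name", "num_elements", "trainable"])

def checkpoint_size(variables, bytes_per_element=4):
    """Approximate bytes needed to store the given float32 variables."""
    return sum(v.num_elements for v in variables) * bytes_per_element

model_vars = [Var("dense/kernel", 784 * 128, True), Var("dense/bias", 128, True)]
# Adam: two slot variables per trainable variable, plus two scalar accumulators.
adam_vars = [Var(v.name + suffix, v.num_elements, False)
             for v in model_vars for suffix in ("/Adam", "/Adam_1")]
adam_vars += [Var("beta1_power", 1, False), Var("beta2_power", 1, False)]
global_vars = model_vars + adam_vars
trainable_vars = [v for v in global_vars if v.trainable]

print(checkpoint_size(global_vars))     # 1205768
print(checkpoint_size(trainable_vars))  # 401920
```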
According to newer TensorFlow versions, tf.train.Checkpoint is the preferred way to save and restore a model:
Checkpoint.save and Checkpoint.restore write and read object-based checkpoints, in contrast to tf.train.Saver which writes and reads variable.name based checkpoints. Object-based checkpointing saves a graph of dependencies between Python objects (Layers, Optimizers, Variables, etc.) with named edges, and this graph is used to match variables when restoring a checkpoint. It can be more robust to changes in the Python program, and helps to support restore-on-create for variables when executing eagerly. Prefer tf.train.Checkpoint over tf.train.Saver for new code.
Here is an example:
import tensorflow as tf
import os
tf.enable_eager_execution()
checkpoint_directory = "/tmp/training_checkpoints"
checkpoint_prefix = os.path.join(checkpoint_directory, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_directory))
for _ in range(num_training_steps):
    optimizer.minimize( ... )  # Variables will be restored on creation.
status.assert_consumed()  # Optional sanity checks.
checkpoint.save(file_prefix=checkpoint_prefix)
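Wherever you want to save the model:
self.saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ...
    self.saver.save(sess, filename)
Make sure all your tf.Variable objects have names, because you may want to restore them by name later.
And where you want to predict:
saver = tf.train.import_meta_graph(filename + '.meta')  # the .meta file written next to the checkpoint
name = 'name given when you saved the file'
with tf.Session() as sess:
    saver.restore(sess, name)
    print(sess.run('W1:0'))  # example of retrieving a value by variable name
Make sure the saver runs inside the corresponding session.
Keep in mind that if you use tf.train.latest_checkpoint('./'), only the latest checkpoint will be used.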
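tf.train.latest_checkpoint reads the small "checkpoint" state file that the Saver keeps in the directory and returns the prefix recorded there. That is why you pass it a folder rather than a file. If that state file is missing, one workaround is to pick the prefix with the highest global step from the file names. The helper below is a hypothetical stdlib sketch of that fallback. It assumes the <name>-<step>.index naming produced by saver.save(..., global_step=...) and is not how TensorFlow itself resolves checkpoints.

```python
import os
import re
import tempfile

def newest_checkpoint_prefix(directory, basename="model.ckpt"):
    """Return the '<basename>-<step>' prefix with the largest step, or None."""
    pattern = re.compile(re.escape(basename) + r"-(\d+)\.index$")
    best_step, best_prefix = -1, None
    for filename in os.listdir(directory):
        match = pattern.match(filename)
        # Compare steps as integers: '500' > '1500' as strings, but not as numbers.
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_prefix = os.path.join(directory, filename[:-len(".index")])
    return best_prefix

# Usage: fake the files that saver.save(sess, 'model.ckpt', global_step=...) leaves behind.
with tempfile.TemporaryDirectory() as ckpt_dir:
    for step in (500, 1000, 1500):
        for ext in (".index", ".meta", ".data-00000-of-00001"):
            open(os.path.join(ckpt_dir, "model.ckpt-{}{}".format(step, ext)), "w").close()
    latest_name = os.path.basename(newest_checkpoint_prefix(ckpt_dir))
print(latest_name)  # model.ckpt-1500
```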
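You can save the variables in the network using
saver = tf.train.Saver()
saver.save(sess, 'path of save/fileName.ckpt')
To restore the network for later reuse, or in another script, use:
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('path of save/'))
sess.run(...)
Key points:
- The graph in sess must have the same (coherent) structure in the first run and in later runs.
- tf.train.latest_checkpoint needs the path of the folder where the files were saved, not the path of an individual file.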
For TensorFlow 2.0, it is as simple as
# Save the model
model.save('path_to_my_model.h5')
Restore:
new_model = tensorflow.keras.models.load_model('path_to_my_model.h5')
My versions:
tensorflow (1.13.1)
tensorflow-gpu (1.13.1)
The simple way is:
Save:
model.save("model.h5")
Restore:
model = tf.keras.models.load_model("model.h5")
In the new TensorFlow 2.0, saving/loading a model is much simpler, thanks to the Keras API, TensorFlow's high-level API.
To save a model (see the documentation for reference: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/save_model):
tf.keras.models.save_model(model, filepath, save_format=save_format)
To load a model:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/load_model
model = tf.keras.models.load_model(filepath)
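Saving tf.keras models with TF 2.0
I have seen great answers about saving models with TF 1.x. I would like to add some more tips on saving tensorflow.keras models, which is a bit more complicated because there are many ways to save a model.
Here is an example of saving a tensorflow.keras model to the model_path folder in the current directory. This works with the latest TensorFlow (TF 2.0). I will update this description if anything changes in the near future.
Saving and loading the entire model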
import os
import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist
# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# create a model
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    # compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# Create a basic model instance
model = create_model()
model.fit(x_train, y_train, epochs=1)
loss, acc = model.evaluate(x_test, y_test, verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))
# Save the entire model to an HDF5 file (make sure the folder exists first)
os.makedirs('./model_path', exist_ok=True)
model.save('./model_path/my_model.h5')
# Recreate the exact same model, including weights and optimizer.
new_model = keras.models.load_model('./model_path/my_model.h5')
loss, acc = new_model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
Saving and loading only the model weights
If you only want to save the model weights and later load them to restore the model:
model.fit(x_train, y_train, epochs=5)
loss, acc = model.evaluate(x_test, y_test, verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))
# Save the weights
model.save_weights('./checkpoints/my_checkpoint')
# Restore the weights
model = create_model()
model.load_weights('./checkpoints/my_checkpoint')
loss, acc = model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
Saving and restoring with the Keras checkpoint callback
import os
# include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path, verbose=1, save_weights_only=True,
    # Save weights every 5 epochs.
    period=5)
model = create_model()
model.save_weights(checkpoint_path.format(epoch=0))
# x_train, y_train, x_test, y_test come from the MNIST example above
model.fit(x_train, y_train,
          epochs=50, callbacks=[cp_callback],
          validation_data=(x_test, y_test),
          verbose=0)
latest = tf.train.latest_checkpoint(checkpoint_dir)
new_model = create_model()
new_model.load_weights(latest)
loss, acc = new_model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
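With period=5 and 50 epochs, the callback writes weights every fifth epoch, and each file name comes from checkpoint_path.format(epoch=...). Assuming Keras fills {epoch} with the 1-based epoch number (which is how I understand tf.keras to behave), the callback writes cp-0005.ckpt through cp-0050.ckpt. The explicit save_weights call before training adds cp-0000.ckpt. The {epoch:04d} field zero-pads the number to four digits so the names sort in epoch order. A stdlib sketch of that bookkeeping, with a hypothetical helper name:

```python
import os

def expected_checkpoint_paths(checkpoint_path, epochs, period):
    """Names written by save_weights(epoch=0) plus one save every `period` epochs."""
    paths = [checkpoint_path.format(epoch=0)]
    for epoch in range(1, epochs + 1):  # 1-based, as used in the file name
        if epoch % period == 0:
            paths.append(checkpoint_path.format(epoch=epoch))
    return paths

paths = expected_checkpoint_paths("training_2/cp-{epoch:04d}.ckpt", epochs=50, period=5)
print(len(paths))                 # 11
print(paths[1], paths[-1])        # training_2/cp-0005.ckpt training_2/cp-0050.ckpt
print(os.path.dirname(paths[0]))  # training_2
```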
使用自定义指标保存模型
import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Custom Loss1 (for example)
@tf.function()
def customLoss1(yTrue, yPred):
    return tf.reduce_mean(yTrue - yPred)
# Custom Loss2 (for example)
@tf.function()
def customLoss2(yTrue, yPred):
    return tf.reduce_mean(tf.square(tf.subtract(yTrue, yPred)))
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy', customLoss1, customLoss2])
    return model
# Create a basic model instance
model=create_model()
# Fit and evaluate model
model.fit(x_train, y_train, epochs=1)
loss, acc,loss1, loss2 = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))
model.save("./model.h5")
new_model=tf.keras.models.load_model("./model.h5",custom_objects={'customLoss1':customLoss1,'customLoss2':customLoss2})
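Why custom_objects is needed here: the saved file refers to custom metrics and losses by name, so the loader has to be told which Python callable each name means, and the dictionary keys must match those names ('customLoss1', 'customLoss2'). The following is only a toy, plain-Python model of that lookup; save_config and load_config are hypothetical helpers, not Keras functions, and Keras's real serialization is more involved.

```python
import json

def customLoss1(yTrue, yPred):
    # Plain-Python stand-in for the TF version above: mean of (yTrue - yPred)
    return sum(t - p for t, p in zip(yTrue, yPred)) / len(yTrue)

def save_config(metrics):
    # Toy serializer: only the function names are stored, not the function code.
    return json.dumps({"metrics": [m.__name__ for m in metrics]})

def load_config(text, custom_objects=None):
    # Toy loader: resolve each saved name through the caller-supplied mapping.
    lookup = dict(custom_objects or {})
    names = json.loads(text)["metrics"]
    unknown = [n for n in names if n not in lookup]
    if unknown:
        raise ValueError("Unknown metric function(s): %s" % unknown)
    return [lookup[n] for n in names]

saved = save_config([customLoss1])
restored = load_config(saved, custom_objects={"customLoss1": customLoss1})
print(saved)                                 # {"metrics": ["customLoss1"]}
print(restored[0]([1.0, 2.0], [0.0, 1.0]))   # 1.0
```

Calling load_config(saved) without the mapping raises ValueError, which mirrors why load_model fails when custom_objects is omitted.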
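Save a Keras model with custom ops
When we have custom operations like the one below (tf.tile), we need to create a function and wrap it in a Lambda layer. Otherwise, the model cannot be saved.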
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras import Model
def my_fun(a):
    out = tf.tile(a, (1, tf.shape(a)[0]))
    return out
a = Input(shape=(10,))
#out = tf.tile(a, (1, tf.shape(a)[0]))
out = Lambda(lambda x : my_fun(x))(a)
model = Model(a, out)
x = np.zeros((50,10), dtype=np.float32)
print(model(x).numpy())
model.save('my_model.h5')
#load the model
new_model=tf.keras.models.load_model("my_model.h5")
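To see what my_fun computes, the same tiling can be reproduced with NumPy, whose np.tile behaves like tf.tile when the repetition tuple has one entry per axis. Axis 0 is kept as-is and axis 1 is repeated batch-size times, so for the x used above (50 rows of 10 features) the output has 50 * 10 = 500 columns. my_fun_np is a hypothetical NumPy mirror for illustration only.

```python
import numpy as np

def my_fun_np(a):
    # NumPy mirror of my_fun: keep axis 0, repeat axis 1 a.shape[0] times
    return np.tile(a, (1, a.shape[0]))

small = np.arange(6).reshape(3, 2)   # batch of 3 rows, 2 features
print(my_fun_np(small))
# [[0 1 0 1 0 1]
#  [2 3 2 3 2 3]
#  [4 5 4 5 4 5]]

x = np.zeros((50, 10), dtype=np.float32)
print(my_fun_np(x).shape)  # (50, 500)
```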
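I think I have covered a few of the many ways to save tf.keras models. However, there are many other ways. If you find that your use case is not covered above, please comment below. Thanks!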
Building on @Vishnuvardhan Janapati's answer, here is another way to save and reload a model with a custom layer/metric/loss under TensorFlow 2.0.0:
import tensorflow as tf
from tensorflow.keras.layers import Layer
from tensorflow.keras.utils import get_custom_objects
# custom loss (for example)
def custom_loss(y_true, y_pred):
    return tf.reduce_mean(y_true - y_pred)
get_custom_objects().update({'custom_loss': custom_loss})
# custom layer (for example)
class CustomLayer(Layer):
    def __init__(self, **kwargs):  # add the layer's own arguments here
        super().__init__(**kwargs)
        # define custom layer and all necessary custom operations inside custom layer
    def call(self, inputs):
        # placeholder: replace with the layer's custom computation
        return inputs
get_custom_objects().update({'CustomLayer': CustomLayer})
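This way, once you have run code like this and saved your model with tf.keras.models.save_model, model.save, or the ModelCheckpoint callback, you can reload the model without passing the exact custom objects, as simply as
new_model = tf.keras.models.load_model("./model.h5")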
For tensorflow-2.0 it's very simple
import tensorflow as tf
Save
model.save("model_name")
Restore
model = tf.keras.models.load_model('model_name')
Here is a simple example using the Tensorflow 2.0 SavedModel format (the recommended format, according to the docs): a simple MNIST classifier built with the Keras functional API, without anything too fancy:
# Imports
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
import numpy as np
# Load data
mnist = tf.keras.datasets.mnist # 28 x 28
(x_train,y_train), (x_test, y_test) = mnist.load_data()
# Normalize pixels [0,255] -> [0,1]
x_train = tf.keras.utils.normalize(x_train,axis=1)
x_test = tf.keras.utils.normalize(x_test,axis=1)
# Create model
input = Input(shape=(28,28), dtype='float64', name='graph_input')
x = Flatten()(input)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
output = Dense(10, activation='softmax', name='graph_output', dtype='float64')(x)
model = Model(inputs=input, outputs=output)
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train
model.fit(x_train, y_train, epochs=3)
# Save model in SavedModel format (Tensorflow 2.0)
export_path = 'model'
tf.saved_model.save(model, export_path)
# ... possibly another python program
# Reload model
loaded_model = tf.keras.models.load_model(export_path)
# Get image sample for testing
index = 0
img = x_test[index] # I normalized the image on a previous step
# Predict using the signature definition (Tensorflow 2.0)
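predict = loaded_model.signatures["serving_default"]
# The signature expects a batch, so add a leading batch axis: (28, 28) -> (1, 28, 28)
prediction = predict(tf.constant(img[np.newaxis, ...]))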
# Show results
print(np.argmax(prediction['graph_output'])) # prints the class number
plt.imshow(x_test[index], cmap=plt.cm.binary) # prints the image
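A note on shapes: the serving signature is built from Input(shape=(28,28)), so it expects a batch of shape (None, 28, 28), and a single image needs a leading batch axis before it is passed in. The output then has shape (1, 10); np.argmax without an axis flattens it, which for a single row gives the class index. A NumPy-only sketch of that bookkeeping, with a made-up score vector standing in for prediction['graph_output']:

```python
import numpy as np

img = np.zeros((28, 28))             # stand-in for x_test[index]
batch = img[np.newaxis, ...]         # add batch axis: (28, 28) -> (1, 28, 28)

# Made-up softmax output for a batch of one image (10 classes)
scores = np.full((1, 10), 0.05)
scores[0, 3] = 0.55

print(batch.shape)                   # (1, 28, 28)
print(np.argmax(scores))             # 3 (flattened index equals column index for one row)
print(np.argmax(scores, axis=-1))    # [3] (one class index per image in the batch)
```

For a batch of several images, np.argmax(..., axis=-1) is the safer choice, since it returns one class per image instead of a single flattened index.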
What is serving_default?
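It is the name of the signature def of the selected tag (in this case, the default serve tag was selected). Also, the page linked here explains how to find a model's tags and signatures using saved_model_cli.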
Disclaimer
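This is just a basic example if you only want to get it running, and by no means a complete answer; maybe I can update it in the future. I just wanted to give a simple example of using SavedModel in TF 2.0, because I haven't seen one anywhere, even one this simple.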
Tensorflow 2.6: it has become simpler now; you can save a model in two formats
- Saved_model (compatible with TF Serving)
- H5 or HDF5
Saving the model in both formats:
import tensorflow as tf
inputs = tf.keras.Input(shape=(224,224,3))
y = tf.keras.layers.Conv2D(24, 3, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(y)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.save("saved_model/my_model") #To Save in Saved_model format
model.save("my_model.h5") #To save model in H5 or HDF5 format
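When save_format is not passed, the format in these TF 2.x examples follows the file path: a name ending in .h5 gives a single HDF5 file, anything else gives a SavedModel directory. The helper below is a hypothetical illustration of that rule, assuming TF 2.6-era tf.keras behaviour (newer Keras releases introduced a separate .keras format and behave differently); passing save_format='h5' or save_format='tf' to model.save overrides the file-name rule.

```python
def guess_save_format(filepath, save_format=None):
    """Hypothetical helper mimicking how TF 2.x tf.keras picks a save format."""
    if save_format is not None:
        # An explicit save_format ('tf' or 'h5') wins over the file name.
        return save_format
    # Assumed rule: an HDF5 file extension selects HDF5, otherwise SavedModel.
    if filepath.endswith((".h5", ".hdf5")):
        return "h5"
    return "tf"

print(guess_save_format("saved_model/my_model"))  # tf
print(guess_save_format("my_model.h5"))           # h5
```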
Loading the model in both formats:
import tensorflow as tf
h5_model = tf.keras.models.load_model("my_model.h5") # loading model in h5 format
h5_model.summary()
saved_m = tf.keras.models.load_model("saved_model/my_model") #loading model in saved_model format
saved_m.summary()
The easiest way is to use the Keras API: one line to save the model and one line to load it
from keras.models import load_model
my_model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del my_model # deletes the existing model
my_model = load_model('my_model.h5') # returns a compiled model identical to the previous one