训练后如何save/restore一个模型？

Question

在 Tensorflow 中训练模型后：

如何保存训练好的模型？
你以后如何恢复这个保存的模型？

Answer 1

对于 TensorFlow 版本 < 0.11.0RC1:

保存的检查点包含模型中 Variable 的值，而不是 model/graph 本身的值，这意味着恢复检查点时图形应该相同。

这是一个线性回归示例，其中有一个保存变量检查点的训练循环和一个将恢复先前运行中保存的变量并计算预测的评估部分。当然你也可以恢复变量继续训练

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

w = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
b = tf.Variable(tf.ones([1, 1], dtype=tf.float32))
y_hat = tf.add(b, tf.matmul(x, w))

...more setup for optimization and what not...

saver = tf.train.Saver()  # defaults to saving all variables - in this case w and b

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    if FLAGS.train:
        for i in xrange(FLAGS.training_steps):
            ...training loop...
            if (i + 1) % FLAGS.checkpoint_steps == 0:
                saver.save(sess, FLAGS.checkpoint_dir + 'model.ckpt',
                           global_step=i+1)
    else:
        # Here's where you're restoring the variables w and b.
        # Note that the graph is exactly as it was when the variables were
        # saved in a prior training run.
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            ...no checkpoint found...

        # Now you can run the model to get predictions
        batch_x = ...load some data...
        predictions = sess.run(y_hat, feed_dict={x: batch_x})

这是 Saver 的 docs for Variables, which cover saving and restoring. And here are the docs。

Answer 2

模型有两部分，模型定义，由 Supervisor 在模型目录中保存为 graph.pbtxt 和张量的数值，保存到检查点文件中，如 model.ckpt-1003418.

可以使用tf.import_graph_def恢复模型定义，使用Saver恢复权重。

但是，Saver 使用特殊的集合保存附加到模型图的变量列表，并且此集合未使用 import_graph_def 初始化，因此您不能同时使用两者时刻（这是我们修复的路线图）。现在，您必须使用 Ryan Sepassi 的方法——手动构建具有相同节点名称的图，并使用 Saver 将权重加载到其中。

（或者，您可以通过使用 import_graph_def、手动创建变量、对每个变量使用 tf.add_to_collection(tf.GraphKeys.VARIABLES, variable)，然后使用 Saver）

Answer 3

正如 Yaroslav 所说，您可以通过导入图表、手动创建变量然后使用 Saver 来从 graph_def 和检查点恢复。

我实现这个是为了个人使用，所以我想在这里分享代码。

Link: https://gist.github.com/nikitakit/6ef3b72be67b86cb7868

（当然，这是一种 hack，不能保证以这种方式保存的模型在未来版本的 TensorFlow 中仍然可读。）

Answer 4

您还可以查看 examples in TensorFlow/skflow，它提供了 save 和 restore 方法，可以帮助您轻松管理模型。它具有您还可以控制备份模型的频率的参数。

Answer 5

如果是内部保存的模型，你只需要为所有变量指定一个恢复器即可

restorer = tf.train.Saver(tf.all_variables())

并使用它来恢复当前会话中的变量：

restorer.restore(self._sess, model_file)

对于外部模型，您需要指定从其变量名到您的变量名的映射。您可以使用命令

查看模型变量名称

python /path/to/tensorflow/tensorflow/python/tools/inspect_checkpoint.py --file_name=/path/to/pretrained_model/model.ckpt

inspect_checkpoint.py 脚本可以在 Tensorflow 源的“./tensorflow/python/tools”文件夹中找到。

要指定映射，可以使用我的Tensorflow-Worklab, which contains a set of classes and scripts to train and retrain different models. It includes an example of retraining ResNet models, located here

Answer 6

在TensorFlow 0.11.0RC1（及之后）版本中，您可以根据https://www.tensorflow.org/programmers_guide/meta_graph.

调用tf.train.export_meta_graph和tf.train.import_meta_graph直接保存和恢复模型

保存模型

w1 = tf.Variable(tf.truncated_normal(shape=[10]), name='w1')
w2 = tf.Variable(tf.truncated_normal(shape=[20]), name='w2')
tf.add_to_collection('vars', w1)
tf.add_to_collection('vars', w2)
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my-model')
# `save` method will call `export_meta_graph` implicitly.
# you will get saved graph files:my-model.meta

恢复模型

sess = tf.Session()
new_saver = tf.train.import_meta_graph('my-model.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
all_vars = tf.get_collection('vars')
for v in all_vars:
    v_ = sess.run(v)
    print(v_)

Answer 7

如问题 6255 中所述：

use '**./**model_name.ckpt'
saver.restore(sess,'./my_model_final.ckpt')

而不是

saver.restore('my_model_final.ckpt')

Answer 8

您也可以采用这种更简单的方法。

第 1 步：初始化所有变量

W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1), name="W1")
B1 = tf.Variable(tf.constant(0.1, tf.float32, [K]), name="B1")

Similarly, W2, B2, W3, .....

第 2 步：在模型 `Saver` 中保存会话并保存

model_saver = tf.train.Saver()

# Train the model and save it in the end
model_saver.save(session, "saved_models/CNN_New.ckpt")

第三步：恢复模型

with tf.Session(graph=graph_cnn) as session:
    model_saver.restore(session, "saved_models/CNN_New.ckpt")
    print("Model restored.") 
    print('Initialized')

第 4 步：检查您的变量

W1 = session.run(W1)
print(W1)

虽然运行在不同的 python 实例中，使用

with tf.Session() as sess:
    # Restore latest checkpoint
    saver.restore(sess, tf.train.latest_checkpoint('saved_model/.'))

    # Initalize the variables
    sess.run(tf.global_variables_initializer())

    # Get default graph (supply your custom graph if you have one)
    graph = tf.get_default_graph()

    # It will give tensor object
    W1 = graph.get_tensor_by_name('W1:0')

    # To get the value (numpy array)
    W1_value = session.run(W1)

Answer 9

在大多数情况下，使用 tf.train.Saver 从磁盘保存和恢复是您的最佳选择：

... # build your model
saver = tf.train.Saver()

with tf.Session() as sess:
    ... # train the model
    saver.save(sess, "/tmp/my_great_model")

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model

您还可以 save/restore 图形结构本身（有关详细信息，请参阅 MetaGraph documentation）。默认情况下，Saver 将图形结构保存到 .meta 文件中。您可以调用 import_meta_graph() 来恢复它。它恢复了图形结构和 returns 一个 Saver，你可以用它来恢复模型的状态：

saver = tf.train.import_meta_graph("/tmp/my_great_model.meta")

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model

但是，有些情况下您需要更快的速度。例如，如果你实施提前停止，你希望在训练期间每次模型改进时保存检查点（如在验证集上测量的那样），然后如果一段时间没有进展，你想回滚到最佳模型。如果每次模型改进时都将模型保存到磁盘，它将极大地减慢训练速度。诀窍是将变量状态保存到内存，然后稍后恢复它们：

... # build your model

# get a handle on the graph nodes we need to save/restore the model
graph = tf.get_default_graph()
gvars = graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
assign_ops = [graph.get_operation_by_name(v.op.name + "/Assign") for v in gvars]
init_values = [assign_op.inputs[1] for assign_op in assign_ops]

with tf.Session() as sess:
    ... # train the model

    # when needed, save the model state to memory
    gvars_state = sess.run(gvars)

    # when needed, restore the model state
    feed_dict = {init_value: val
                 for init_value, val in zip(init_values, gvars_state)}
    sess.run(assign_ops, feed_dict=feed_dict)

快速解释：当你创建一个变量X时，TensorFlow会自动创建一个赋值操作X/Assign来设置变量的初始值。我们没有创建占位符和额外的赋值操作（这只会让图表变得混乱），而是使用这些现有的赋值操作。每个赋值操作的第一个输入是对它应该初始化的变量的引用，第二个输入（assign_op.inputs[1]）是初始值。因此，为了设置我们想要的任何值（而不是初始值），我们需要使用 feed_dict 并替换初始值。是的，TensorFlow 允许您为任何操作提供一个值，而不仅仅是占位符，所以这很好用。

Answer 10

这是我针对两种基本情况的简单解决方案，这两种情况在您是要从文件加载图表还是在运行时构建图表方面有所不同。

此答案适用于 Tensorflow 0.12+（包括 1.0）。

在代码中重建图表

节省

graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')

正在加载

graph = ... # build the graph
saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    # now you can use the graph, continue training or whatever

同时从文件加载图表

使用此技术时，请确保您的所有 layers/variables 都明确设置了唯一的名称。 否则 Tensorflow 会使名称本身独一无二，因此它们会有所不同从存储在文件中的名称。这在以前的技术中不是问题，因为名称在加载和保存时 "mangled" 相同。

节省

graph = ... # build the graph

for op in [ ... ]:  # operators you want to use after restoring the model
    tf.add_to_collection('ops_to_restore', op)

saver = tf.train.Saver()  # create the saver after the graph
with ... as sess:  # your session object
    saver.save(sess, 'my-model')

正在加载

with ... as sess:  # your session object
    saver = tf.train.import_meta_graph('my-model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    ops = tf.get_collection('ops_to_restore')  # here are your operators in the same order in which you saved them to the collection

Answer 11

我正在改进我的答案以添加更多关于保存和恢复模型的细节。

在（及之后）Tensorflow 版本 0.11：

保存模型：

import tensorflow as tf

#Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias")
feed_dict ={w1:4,w2:8}

#Define a test operation that we will restore
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3,b1,name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

#Create a saver object which will save all the variables
saver = tf.train.Saver()

#Run the operation by feeding input
print sess.run(w4,feed_dict)
#Prints 24 which is sum of (w1+w2)*b1 

#Now, save the graph
saver.save(sess, 'my_test_model',global_step=1000)

恢复模型：

import tensorflow as tf

sess=tf.Session()    
#First let's load meta graph and restore weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))


# Access saved Variables directly
print(sess.run('bias:0'))
# This will print 2, which is the value of bias that we saved


# Now, let's access and create placeholders variables and
# create feed-dict to feed new data

graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict ={w1:13.0,w2:17.0}

#Now, access the op that you want to run. 
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

print sess.run(op_to_restore,feed_dict)
#This will print 60 which is calculated

这个和一些更高级的用例已经在这里得到了很好的解释。

A quick complete tutorial to save and restore Tensorflow models

Answer 12

如果您使用 tf.train.MonitoredTrainingSession 作为默认会话，则无需添加额外的代码来执行 save/restore 操作。只需将检查点目录名称传递给 MonitoredTrainingSession 的构造函数，它将使用会话挂钩来处理这些。

Answer 13

这里所有的答案都很棒，但我想补充两点。

首先，详细说明@user7505159 的回答，将“./”添加到要恢复的文件名的开头可能很重要。

例如，您可以像这样保存文件名中没有“./”的图形：

# Some graph defined up here with specific names

saver = tf.train.Saver()
save_file = 'model.ckpt'

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

但为了恢复图形，您可能需要在 file_name:

前添加一个“./”

# Same graph defined up here

saver = tf.train.Saver()
save_file = './' + 'model.ckpt' # String addition used for emphasis

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, save_file)

您并不总是需要“./”，但它可能会导致问题，具体取决于您的环境和 TensorFlow 版本。

还想提一下 sess.run(tf.global_variables_initializer()) 在恢复会话之前可能很重要。

如果您在尝试恢复已保存的会话时收到有关未初始化变量的错误，请确保在 saver.restore(sess, save_file) 行之前包含 sess.run(tf.global_variables_initializer())。它可以让你省去头痛。

Answer 14

我的环境：Python3.6，Tensorflow 1.3.0

虽然有很多解决方案，但大多数都是基于tf.train.Saver。当我们加载由 Saver 保存的 .ckpt 时，我们必须重新定义 tensorflow 网络或使用一些奇怪且难以记住的名称，例如'placehold_0:0'、'dense/Adam/Weight:0'。这里推荐使用tf.saved_model，下面给出一个最简单的例子，大家可以参考Serving a TensorFlow Model:

保存模型：

import tensorflow as tf

# define the tensorflow network and do some trains
x = tf.placeholder("float", name="x")
w = tf.Variable(2.0, name="w")
b = tf.Variable(0.0, name="bias")

h = tf.multiply(x, w)
y = tf.add(h, b, name="y")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# save the model
export_path =  './savedmodel'
builder = tf.saved_model.builder.SavedModelBuilder(export_path)

tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y)

prediction_signature = (
  tf.saved_model.signature_def_utils.build_signature_def(
      inputs={'x_input': tensor_info_x},
      outputs={'y_output': tensor_info_y},
      method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

builder.add_meta_graph_and_variables(
  sess, [tf.saved_model.tag_constants.SERVING],
  signature_def_map={
      tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
          prediction_signature 
  },
  )
builder.save()

加载模型：

import tensorflow as tf
sess=tf.Session() 
signature_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
input_key = 'x_input'
output_key = 'y_output'

export_path =  './savedmodel'
meta_graph_def = tf.saved_model.loader.load(
           sess,
          [tf.saved_model.tag_constants.SERVING],
          export_path)
signature = meta_graph_def.signature_def

x_tensor_name = signature[signature_key].inputs[input_key].name
y_tensor_name = signature[signature_key].outputs[output_key].name

x = sess.graph.get_tensor_by_name(x_tensor_name)
y = sess.graph.get_tensor_by_name(y_tensor_name)

y_out = sess.run(y, {x: 3.0})

Answer 15

Tensorflow 2 文档

保存检查点

改编自the docs

# -------------------------
# -----  Toy Context  -----
# -------------------------
import tensorflow as tf


class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)


def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )


def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss


# ----------------------------
# -----  Create Objects  -----
# ----------------------------

net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# ----------------------------
# -----  Train and Save  -----
# ----------------------------

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))


# ---------------------
# -----  Restore  -----
# ---------------------

# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# Re-use the manager code above ^

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.

张量流 < 2

来自文档：

保存

# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer = tf.zeros_initializer)

inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.
  inc_v1.op.run()
  dec_v2.op.run()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)

恢复

tf.reset_default_graph()

# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Check the values of the variables
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())

`simple_save`

很多好的答案，为了完整起见，我将添加我的 2 美分：simple_save。也是使用 tf.data.Dataset API.

的独立代码示例

Python 3 ;张量流 1.14

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Graph().as_default():
    with tf.Session() as sess:
        ...

        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, 'path/to/your/location/', inputs, outputs
        )

正在恢复：

graph = tf.Graph()
with restored_graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            'path/to/your/location/',
        )
        batch_size_placeholder = graph.get_tensor_by_name('batch_size_placeholder:0')
        features_placeholder = graph.get_tensor_by_name('features_placeholder:0')
        labels_placeholder = graph.get_tensor_by_name('labels_placeholder:0')
        prediction = restored_graph.get_tensor_by_name('dense/BiasAdd:0')

        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })

独立示例

Original blog post

为了演示，以下代码生成随机数据。

我们从创建占位符开始。他们将在运行时间保存数据。从他们那里，我们创建了 Dataset，然后是 Iterator。我们得到迭代器生成的张量，称为 input_tensor，它将作为我们模型的输入。
模型本身是从 input_tensor 构建的：一个基于 GRU 的双向 RNN，后跟一个密集分类器。因为为什么不呢。
损失是 softmax_cross_entropy_with_logits，用 Adam 优化。在 2 个时期（每个时期 2 个批次）之后，我们用 tf.saved_model.simple_save 保存“训练有素”的模型。如果您运行按原样编写代码，则模型将保存在当前工作目录中名为 simple/ 的文件夹中。
在一个新的图表中，我们然后用 tf.saved_model.loader.load 恢复保存的模型。我们使用 graph.get_tensor_by_name 获取占位符和 logits，使用 graph.get_operation_by_name.

Iterator

最后，我们运行对数据集中的两个批次进行推断，并检查保存和恢复的模型是否都产生相同的值。他们做到了！

代码：

import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants


def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier

    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model

    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)

        return dense


def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model's logits and labels

    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels

    Returns:
        tf.Operation: the operation performing a stem of Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                    logits=logits, labels=labels_tensor, name='xent'),
                    name="mean-xent"
                    )
        with tf.variable_scope('optimizer'):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
        return opt_op


if __name__ == '__main__':
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]

    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name='batch_size_ph')
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], 'features_data_ph')
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], 'labels_data_ph')
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
        input_tensor, labels_tensor = iterator.get_next()

        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)

        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print('Epoch {}, batch {} | Sample value: {}'.format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print('Epoch {}, batch {} | Final inference | Sample value: {}'.format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print('\nSaving...')
            cwd = os.getcwd()
            path = os.path.join(cwd, 'simple')
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print('Ok')
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print('\nRestoring...')
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print('Ok')
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name('labels_data_ph:0')
            features_data_ph = graph2.get_tensor_by_name('features_data_ph:0')
            batch_size_ph = graph2.get_tensor_by_name('batch_size_ph:0')
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name('dense/BiasAdd:0')
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name('dataset_init')

            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }

            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print('Restored values: ', restored_values[i][0])

    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print('\nInferences match: ', valid)

这将打印：

$ python3 save_and_restore.py

Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595   0.12804556  0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045  -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792  -0.00602257  0.07465433  0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984  0.05981954 -0.15913513 -0.3244143   0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/some/path/simple/saved_model.pb'
Ok

Restoring...
INFO:tensorflow:Restoring parameters from b'/some/path/simple/variables/variables'
Ok
Restored values:  [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Restored values:  [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Inferences match:  True

Answer 16

使用tf.train.Saver保存模型。请记住，如果要减小模型大小，则需要指定 var_list。 val_list 可以是：

tf.trainable_variables 或
tf.global_variables.

Answer 17

根据新的 Tensorflow 版本，tf.train.Checkpoint 是保存和恢复模型的首选方式：

Checkpoint.save and Checkpoint.restore write and read object-based checkpoints, in contrast to tf.train.Saver which writes and reads variable.name based checkpoints. Object-based checkpointing saves a graph of dependencies between Python objects (Layers, Optimizers, Variables, etc.) with named edges, and this graph is used to match variables when restoring a checkpoint. It can be more robust to changes in the Python program, and helps to support restore-on-create for variables when executing eagerly. Prefer tf.train.Checkpoint over tf.train.Saver for new code.

这是一个例子：

import tensorflow as tf
import os

tf.enable_eager_execution()

checkpoint_directory = "/tmp/training_checkpoints"
checkpoint_prefix = os.path.join(checkpoint_directory, "ckpt")

checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_directory))
for _ in range(num_training_steps):
  optimizer.minimize( ... )  # Variables will be restored on creation.
status.assert_consumed()  # Optional sanity checks.
checkpoint.save(file_prefix=checkpoint_prefix)

More information and example here.

Answer 18

无论你想把模型保存到哪里，

self.saver = tf.train.Saver()
with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            ...
            self.saver.save(sess, filename)

确保所有 tf.Variable 都有名字，因为您以后可能想使用它们的名字来恢复它们。而你想要预测的地方，

saver = tf.train.import_meta_graph(filename)
name = 'name given when you saved the file' 
with tf.Session() as sess:
      saver.restore(sess, name)
      print(sess.run('W1:0')) #example to retrieve by variable name

确保保护程序在相应的会话中运行。请记住，如果您使用 tf.train.latest_checkpoint('./')，则只会使用最新的检查点。

Answer 19

您可以使用

保存网络中的变量

saver = tf.train.Saver() 
saver.save(sess, 'path of save/fileName.ckpt')

要恢复网络以便以后或在另一个脚本中重用，请使用：

saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('path of save/')
sess.run(....)

要点：

sess 必须在第一次和后来的运行之间相同（连贯结构）。
saver.restore 需要保存文件的文件夹路径，而不是单个文件路径。

Answer 20

对于tensorflow 2.0，是as simple as

# Save the model
model.save('path_to_my_model.h5')

恢复：

new_model = tensorflow.keras.models.load_model('path_to_my_model.h5')

Answer 21

我的版本：

tensorflow (1.13.1)
tensorflow-gpu (1.13.1)

简单的方法是

保存：

model.save("model.h5")

恢复：

model = tf.keras.models.load_model("model.h5")

Answer 22

在新版本的tensorflow 2.0中，saving/loading一个模型的过程要简单很多。由于 Keras API 的实施，TensorFlow 的高级 API。

要保存模型：查看文档以供参考： https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/save_model

tf.keras.models.save_model(model_name, filepath, save_format)

加载模型：

https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/models/load_model

model = tf.keras.models.load_model(filepath)

Answer 23

tf.keras 使用 `TF2.0`

保存模型

我看到关于使用 TF1.x 保存模型的很好的答案。我想提供更多关于保存 tensorflow.keras 模型的建议，这有点复杂，因为有很多方法可以保存模型。

这里我提供一个例子，将tensorflow.keras模型保存到当前目录下的model_path文件夹中。这适用于最新的 tensorflow (TF2.0)。如果近期有任何变化，我会更新此描述。

保存和加载整个模型

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

#import data
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# create a model
def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
# compile the model
  model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
  return model

# Create a basic model instance
model=create_model()

model.fit(x_train, y_train, epochs=1)
loss, acc = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))

# Save entire model to a HDF5 file
model.save('./model_path/my_model.h5')

# Recreate the exact same model, including weights and optimizer.
new_model = keras.models.load_model('./model_path/my_model.h5')
loss, acc = new_model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

仅保存和加载模型权重

如果您只想保存模型权重然后加载权重来恢复模型，那么

model.fit(x_train, y_train, epochs=5)
loss, acc = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))

# Save the weights
model.save_weights('./checkpoints/my_checkpoint')

# Restore the weights
model = create_model()
model.load_weights('./checkpoints/my_checkpoint')

loss,acc = model.evaluate(x_test, y_test)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

使用 keras 检查点回调保存和恢复

# include the epoch in the file name. (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path, verbose=1, save_weights_only=True,
    # Save weights, every 5-epochs.
    period=5)

model = create_model()
model.save_weights(checkpoint_path.format(epoch=0))
model.fit(train_images, train_labels,
          epochs = 50, callbacks = [cp_callback],
          validation_data = (test_images,test_labels),
          verbose=0)

latest = tf.train.latest_checkpoint(checkpoint_dir)

new_model = create_model()
new_model.load_weights(latest)
loss, acc = new_model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

使用自定义指标保存模型

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Custom Loss1 (for example) 
@tf.function() 
def customLoss1(yTrue,yPred):
  return tf.reduce_mean(yTrue-yPred) 

# Custom Loss2 (for example) 
@tf.function() 
def customLoss2(yTrue, yPred):
  return tf.reduce_mean(tf.square(tf.subtract(yTrue,yPred))) 

def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),  
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
  model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy', customLoss1, customLoss2])
  return model

# Create a basic model instance
model=create_model()

# Fit and evaluate model 
model.fit(x_train, y_train, epochs=1)
loss, acc,loss1, loss2 = model.evaluate(x_test, y_test,verbose=1)
print("Original model, accuracy: {:5.2f}%".format(100*acc))

model.save("./model.h5")

new_model=tf.keras.models.load_model("./model.h5",custom_objects={'customLoss1':customLoss1,'customLoss2':customLoss2})

使用自定义操作保存 keras 模型

当我们有以下情况中的自定义操作时 (tf.tile)，我们需要创建一个函数并用 Lambda 层包装。否则，模型无法保存。

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras import Model

def my_fun(a):
  out = tf.tile(a, (1, tf.shape(a)[0]))
  return out

a = Input(shape=(10,))
#out = tf.tile(a, (1, tf.shape(a)[0]))
out = Lambda(lambda x : my_fun(x))(a)
model = Model(a, out)

x = np.zeros((50,10), dtype=np.float32)
print(model(x).numpy())

model.save('my_model.h5')

#load the model
new_model=tf.keras.models.load_model("my_model.h5")

我想我已经介绍了保存 tf.keras 模型的众多方法中的一些。但是，还有许多其他方法。如果您发现上面未涵盖您的用例，请在下方评论。谢谢！

Answer 24

根据@Vishnuvardhan Janapati 的回答，这是另一种在 TensorFlow 2.0.0[下使用 自定义 layer/metric/loss 保存和重新加载模型的方法

import tensorflow as tf
from tensorflow.keras.layers import Layer
from tensorflow.keras.utils.generic_utils import get_custom_objects

# custom loss (for example)  
def custom_loss(y_true,y_pred):
  return tf.reduce_mean(y_true - y_pred)
get_custom_objects().update({'custom_loss': custom_loss}) 

# custom loss (for example) 
class CustomLayer(Layer):
  def __init__(self, ...):
      ...
  # define custom layer and all necessary custom operations inside custom layer

get_custom_objects().update({'CustomLayer': CustomLayer})

这样，一旦你执行了这样的代码，并用tf.keras.models.save_model或model.save或ModelCheckpoint回调保存你的模型，你可以重新加载你的模型而不需要精确的自定义对象，简单到

new_model = tf.keras.models.load_model("./model.h5"})

Answer 25

对于tensorflow-2.0

很简单

import tensorflow as tf

保存

model.save("model_name")

恢复

model = tf.keras.models.load_model('model_name')

Answer 26

这是一个使用 Tensorflow 2.0 SavedModel 格式 （推荐格式，according to the docs） 的简单示例简单的 MNIST 数据集分类器，使用 Keras 函数 API，没有太多花哨的东西：

# Imports
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt

# Load data
mnist = tf.keras.datasets.mnist # 28 x 28
(x_train,y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixels [0,255] -> [0,1]
x_train = tf.keras.utils.normalize(x_train,axis=1)
x_test = tf.keras.utils.normalize(x_test,axis=1)

# Create model
input = Input(shape=(28,28), dtype='float64', name='graph_input')
x = Flatten()(input)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
output = Dense(10, activation='softmax', name='graph_output', dtype='float64')(x)
model = Model(inputs=input, outputs=output)

model.compile(optimizer='adam',
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])

# Train
model.fit(x_train, y_train, epochs=3)

# Save model in SavedModel format (Tensorflow 2.0)
export_path = 'model'
tf.saved_model.save(model, export_path)

# ... possibly another python program 

# Reload model
loaded_model = tf.keras.models.load_model(export_path) 

# Get image sample for testing
index = 0
img = x_test[index] # I normalized the image on a previous step

# Predict using the signature definition (Tensorflow 2.0)
predict = loaded_model.signatures["serving_default"]
prediction = predict(tf.constant(img))

# Show results
print(np.argmax(prediction['graph_output']))  # prints the class number
plt.imshow(x_test[index], cmap=plt.cm.binary)  # prints the image

什么是serving_default？

这是 you selected (in this case, the default serve tag was selected). Also, here 的名称解释了如何使用 saved_model_cli 查找模型的标签和签名。

免责声明

这只是一个基本的例子，如果你只是想得到它运行，但绝不是一个完整的答案 - 也许我可以在未来更新它。我只是想给出一个使用 TF 2.0 中的 SavedModel 的简单示例，因为我在任何地方都没有看到过，即使是这么简单。

@的答案是一个 SavedModel 示例，但它不适用于 Tensorflow 2.0，因为不幸的是有一些重大变化。

@的回答是TF 2.0，但不是SavedModel格式。

Answer 27

Tensorflow 2.6 : 现在变得更简单了，你可以用两种格式保存模型

Saved_model（与 tf 服务兼容）
H5 或 HDF5

以两种格式保存模型：

 from tensorflow.keras import Model
 inputs = tf.keras.Input(shape=(224,224,3))
 y = tf.keras.layers.Conv2D(24, 3, activation='relu', input_shape=input_shape[1:])(inputs)
 outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(y)
 model = tf.keras.Model(inputs=inputs, outputs=outputs)
 model.save("saved_model/my_model") #To Save in Saved_model format
 model.save("my_model.h5") #To save model in H5 or HDF5 format

以两种格式加载模型

import tensorflow as tf
h5_model = tf.keras.models.load_model("my_model.h5") # loading model in h5 format
h5_model.summary()
saved_m = tf.keras.models.load_model("saved_model/my_model") #loading model in saved_model format
saved_m.summary()

Answer 28

最简单的方法是使用keras api，在线保存模型，在线加载模型

from keras.models import load_model

my_model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'

del my_model  # deletes the existing model


my_model = load_model('my_model.h5') # returns a compiled model identical to the previous one

训练后如何save/restore一个模型？

How to save/restore a model after training?

python

tensorflow

保存模型

恢复模型

第 1 步：初始化所有变量

第 2 步：在模型 `Saver` 中保存会话并保存

第三步：恢复模型

第 4 步：检查您的变量

在代码中重建图表

节省

正在加载

同时从文件加载图表

节省

正在加载

Tensorflow 2 文档

保存检查点

更多链接

张量流 < 2

保存

恢复

`simple_save`

独立示例

tf.keras 使用 `TF2.0`

保存和加载整个模型

仅保存和加载模型权重

使用 keras 检查点回调保存和恢复

使用自定义指标保存模型

使用自定义操作保存 keras 模型

保存

恢复

训练后如何save/restore一个模型？

How to save/restore a model after training?

python

tensorflow

保存模型

恢复模型

第 1 步：初始化所有变量

第 2 步：在模型 Saver 中保存会话并保存

第三步：恢复模型

第 4 步：检查您的变量

在代码中重建图表

节省

正在加载

同时从文件加载图表

节省

正在加载

Tensorflow 2 文档

保存检查点

更多链接

张量流 < 2

保存

恢复

simple_save

独立示例

tf.keras 使用 TF2.0

保存和加载整个模型

仅保存和加载模型权重

使用 keras 检查点回调保存和恢复

使用自定义指标保存模型

使用自定义操作保存 keras 模型

保存

恢复

第 2 步：在模型 `Saver` 中保存会话并保存

`simple_save`

tf.keras 使用 `TF2.0`