Error using Estimator export_saved_model (Not found: Key global_step not found in checkpoint)

Question

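I'm trying to prepare a pretrained model for Google Cloud ML, and I'm exporting it with an Estimator. While the Estimator is loading the checkpoint, I get the following error: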

2018-11-19 13:28:57.526564: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key global_step not found in checkpoint
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key global_step not found in checkpoint
         [[{{node save_1/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

TensorFlow version (use command below): 1.12.0
Python version: 3.6

Here is the code I'm using:

import tensorflow as tf

MODEL_DIR = 'model/'  # directory holding the pretrained checkpoint

def decode_image(image_bytes):
    # Decode the encoded image bytes into a uint8 tensor
    image = tf.image.decode_image(image_bytes)
    image = tf.cast(image, dtype=tf.uint8)
    return image

def serving_input_fn():
    createmodel()  # my own function (not shown) that builds the pretrained graph
    inputs = {'image_bytes': tf.placeholder(tf.string, shape=(), name="image_bytes")}
    imagebytes = tf.squeeze(inputs['image_bytes'])  # make it a scalar
    image = decode_image(imagebytes)
    # make the outer dimension unknown (and not 1)
    image = tf.placeholder_with_default(image, shape=[None, None, None, 3])

    features = {'image_bytes': image}
    return tf.estimator.export.ServingInputReceiver(features, inputs)

def model_fn(features, labels, mode, params):
    # Look up the output tensor of the graph built in serving_input_fn
    pred = tf.get_default_graph().get_tensor_by_name("fc1_voc12:0")
    return tf.estimator.EstimatorSpec(
        mode=tf.estimator.ModeKeys.PREDICT,
        predictions=pred,
        export_outputs={'pred': tf.estimator.export.PredictOutput(pred)}
    )

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR)

estimator.export_savedmodel('deployment_gcp_1', serving_input_fn, strip_default_attrs=True)

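I've searched a lot for this issue. There is a bug report for an older version of TensorFlow (1.2.0, I think, but I'm no longer sure). I can load and save the model with tf.saved_model.simple_save, and it works when I run predictions on it.

I'm not sure whether this is a bug or whether I'm missing something very simple. I posted the same thing on the TensorFlow GitHub repo, but there has been no response yet.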

Answer 1

Somewhere in your code you use an optimizer to minimize the loss:

train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

You may have left out the global_step= argument, which would cause this error.
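To illustrate what that argument does, here is a minimal sketch with a toy loss. It assumes a TensorFlow build that ships `tensorflow.compat.v1` (1.13+ or 2.x; on 1.12 you would write `import tensorflow as tf` instead), and the variable `w`, the loss, and the learning rate are made up for the example. Inside an Estimator the global step is normally created for you before `model_fn` runs; here it is created by hand.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF 1.x graph mode

with tf.Graph().as_default():
    # An Estimator would create this variable itself; done manually here.
    global_step = tf.train.get_or_create_global_step()

    # Hypothetical toy model: one weight and a quadratic loss.
    w = tf.Variable(1.0, name="w")
    loss = tf.square(w)

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    # Passing global_step makes the optimizer increment it on every update.
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)
        step_after = sess.run(global_step)

print(step_after)
```

Because `global_step` is an ordinary variable in the graph, a `tf.train.Saver` created for that graph writes it into the checkpoint alongside the weights. Whether this applies to a pretrained model that was never trained through an Estimator is a separate question.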

Answer 2

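I finally got it working by manually adding a "global_step" variable, exporting the result as a new checkpoint, and loading that instead. After loading the current checkpoint, I run the following code: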

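# sess and load_step come from the surrounding code (not shown)
b = tf.Variable(load_step, name="global_step", dtype=tf.int64)
sess.run(b.initializer)
saver = tf.train.Saver()  # created after b, so the new checkpoint includes global_step
saver.save(sess, 'UpdatedModel/model.ckpt', global_step=load_step)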

Then, in my earlier code, I pointed MODEL_DIR at the new folder and it worked.
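For anyone who wants to try the workaround end to end, here is a self-contained sketch of it. It is not the exact code from my project: the toy variable `w` stands in for the real pretrained graph, the helper names (`has_global_step`, `write_toy_checkpoint`, `add_global_step`) are made up for the example, and it assumes a TensorFlow build that provides `tensorflow.compat.v1` (on 1.12 use `import tensorflow as tf`).

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF 1.x graph mode


def has_global_step(ckpt_path):
    """Return True if the checkpoint stores a variable named 'global_step'."""
    return any(name == "global_step" for name, _ in tf.train.list_variables(ckpt_path))


def write_toy_checkpoint(directory):
    """Hypothetical stand-in for the pretrained checkpoint: one weight, no global_step."""
    with tf.Graph().as_default():
        tf.Variable([1.0, 2.0], name="w")
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            return tf.train.Saver().save(sess, os.path.join(directory, "model.ckpt"))


def add_global_step(src_ckpt, dst_dir, load_step=0):
    """Restore src_ckpt, add a global_step variable, and save a new checkpoint."""
    with tf.Graph().as_default():
        tf.Variable([0.0, 0.0], name="w")  # rebuild the real model graph here
        # Created before global_step exists, so it only restores stored variables.
        restore_saver = tf.train.Saver()
        step = tf.Variable(load_step, name="global_step", dtype=tf.int64, trainable=False)
        with tf.Session() as sess:
            restore_saver.restore(sess, src_ckpt)
            sess.run(step.initializer)
            # A Saver created now covers every variable, including global_step.
            return tf.train.Saver().save(
                sess, os.path.join(dst_dir, "model.ckpt"), global_step=load_step)


old_ckpt = write_toy_checkpoint(tempfile.mkdtemp())
new_ckpt = add_global_step(old_ckpt, tempfile.mkdtemp(), load_step=0)
print(has_global_step(old_ckpt), has_global_step(new_ckpt))
```

Running `has_global_step` on the original checkpoint first is a quick way to confirm that the missing key really is the problem before rewriting anything.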

tensorflow

google-cloud-ml