Bazel error parsing tf.estimator model

I'm trying to produce a *.pb model with tf.estimator's export_savedmodel(). It's a simple classifier for the iris dataset (4 features, 3 classes):

import tensorflow as tf


num_epoch = 500
num_train = 120
num_test = 30

# 1 Define input function
def input_function(x, y, is_train):
    dict_x = {
        "thisisinput" : x,
    }

    dataset = tf.data.Dataset.from_tensor_slices((
        dict_x, y
    ))

    if is_train:
        dataset = dataset.shuffle(num_train).repeat(num_epoch).batch(num_train)
    else:   
        dataset = dataset.batch(num_test)

    return dataset


def my_serving_input_fn():
    input_data = tf.placeholder(tf.string, [None], name='input_tensors')
    receiver_tensors = {"inputs" : input_data}

    # 2 Define feature columns
    feature_columns = [
        tf.feature_column.numeric_column(key="thisisinput", shape=4),
    ]
    features = tf.parse_example(
        input_data, 
        tf.feature_column.make_parse_example_spec(feature_columns))

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)


def main(argv):
    tf.set_random_seed(1103)  # fix the seed so runs are reproducible

    # 2 Define feature columns
    feature_columns = [
        tf.feature_column.numeric_column(key="thisisinput", shape=4),
    ]

    # 3 Define an estimator
    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,
        hidden_units=[10],
        n_classes=3,
        optimizer=tf.train.GradientDescentOptimizer(0.001),
        activation_fn=tf.nn.relu,
        model_dir = 'modeliris2/'
    )

    # Train the model (xtrain/ytrain and xtest/ytest are loaded elsewhere;
    # the data-loading code is omitted from this snippet)
    classifier.train(
        input_fn=lambda: input_function(xtrain, ytrain, True)
    )

    # Evaluate the model
    eval_result = classifier.evaluate(
        input_fn=lambda:input_function(xtest, ytest, False)
    )

    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
    print('\nSaving models...')
    classifier.export_savedmodel("modeliris2pb", my_serving_input_fn)


if __name__ == "__main__":
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main)

This produces a saved_model.pb file. I've verified that the model works; I can load and run it from another program (a minimal sketch of that check is included below, before the update). Now I want to summarize and freeze the model using Bazel. If I build the Bazel tools and then run the following command:

bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
--in_graph=saved_model.pb

I get the following error:

[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/text_format.cc:307] Error parsing text-format tensorflow.GraphDef: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/text_format.cc:307] Error parsing text-format tensorflow.GraphDef: 1:4: Interpreting non ascii codepoint 218.
[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/text_format.cc:307] Error parsing text-format tensorflow.GraphDef: 1:4: Expected identifier, got: �
2018-08-14 11:50:17.759617: E tensorflow/tools/graph_transforms/summarize_graph_main.cc:320] Loading graph 'saved_model.pb' failed with Can't parse saved_model.pb as binary proto
(both text and binary parsing failed for file saved_model.pb)
2018-08-14 11:50:17.759670: E tensorflow/tools/graph_transforms/summarize_graph_main.cc:322] usage: bazel-bin/tensorflow/tools/graph_transforms/summarize_graph
Flags:
--in_graph="" string input graph file name
--print_structure=false bool whether to print the network connections of the graph

I don't understand this error. I tried the same command on the Inception pb file and it works fine, so I think the problem is in how tf.estimator builds the .pb file.

Am I missing something when creating a saved model with tf.estimator's export_savedmodel()?
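For reference, here is roughly how I load and run the exported model to verify it (a minimal sketch; tf.contrib.predictor is one way to do this in TF 1.9, and the timestamped export directory name is a placeholder you would replace with your own):

import tensorflow as tf

# export_savedmodel() writes into a timestamped subdirectory;
# the exact name here is a placeholder
export_dir = "modeliris2pb/1534210316"

predict_fn = tf.contrib.predictor.from_saved_model(export_dir)

# my_serving_input_fn expects serialized tf.Example protos under "inputs"
example = tf.train.Example(features=tf.train.Features(feature={
    "thisisinput": tf.train.Feature(
        float_list=tf.train.FloatList(value=[5.1, 3.3, 1.7, 0.5])),
}))

print(predict_fn({"inputs": [example.SerializeToString()]}))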

Update

TensorFlow version: v1.9.0-0-g25c197e023 1.9.0

Output of tf_env_collect.sh:

== cat /etc/issue ===============================================
Linux rianadam 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
VERSION="18.04.1 LTS (Bionic Beaver)"
VERSION_ID="18.04"
VERSION_CODENAME=bionic

== are we in docker =============================================
No

== compiler =====================================================
c++ (Ubuntu 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


== uname -a =====================================================
Linux rianadam 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

== check pips ===================================================
numpy               1.15.0 
protobuf            3.6.0  
tensorflow-gpu      1.9.0  

== check for virtualenv =========================================
True

== tensorflow import ============================================
tf.VERSION = 1.9.0
tf.GIT_VERSION = v1.9.0-0-g25c197e023
tf.COMPILER_VERSION = v1.9.0-0-g25c197e023
Sanity check: array([1], dtype=int32)
/home/rian/NgodingYuk/tf_env/env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/rian/NgodingYuk/tf_env/env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)

== env ==========================================================
LD_LIBRARY_PATH /usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64:/usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64:
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
Tue Aug 21 11:13:55 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:04:00.0 N/A |                  N/A |
| N/A   51C    P0    N/A /  N/A |    367MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

== cuda libs  ===================================================
/usr/local/cuda-9.0/lib64/libcudart_static.a
/usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
/usr/local/cuda-9.0/doc/man/man7/libcudart.7
/usr/local/cuda-9.0/doc/man/man7/libcudart.so.7

I ran into the same problem when trying to find the input/output nodes of a model I had trained with a custom tf.Estimator. The error occurs because what you get from export_savedmodel is a servable (which, as I currently understand it, is a GraphDef plus other metadata) and not just a GraphDef.
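You can see this by parsing the file as a SavedModel proto rather than a GraphDef (a short sketch; summarize_graph fails precisely because it tries, and fails, the GraphDef parse):

from tensorflow.core.protobuf import saved_model_pb2

# saved_model.pb is a SavedModel proto, not a raw GraphDef
sm = saved_model_pb2.SavedModel()
with open("saved_model.pb", "rb") as f:
    sm.ParseFromString(f.read())

# the GraphDef is nested inside a MetaGraphDef, next to signatures etc.
print(sm.meta_graphs[0].graph_def.node[0].name)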

To find the input and output nodes, you can do the following:

# -*- coding: utf-8 -*-

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Session(graph=tf.Graph()) as sess:
    gf = tf.saved_model.loader.load(
        sess,
        [tag_constants.SERVING],
        "/path/to/saved/model/")

    nodes = gf.graph_def.node
    print([n.name + " -> " + n.op for n in nodes
           if n.op in ('Softmax', 'Placeholder')])

    # ... ['Placeholder -> Placeholder',
    #      'dnn/head/predictions/probabilities -> Softmax']

I also used a canned DNNEstimator, so the OP's nodes should be the same as mine; for other readers, your op names may differ from Placeholder and Softmax depending on your classifier.
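If neither of those ops appears in your graph, a quick (if verbose) way to find candidates is to dump every node, reusing gf from the snippet above:

# print every node name and op type; inputs are usually Placeholder ops
for n in gf.graph_def.node:
    print(n.name, n.op)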

Now that you have the names of your input/output nodes, you can freeze the graph, as described here:

If you want to work with the values of your trained parameters, for example to quantize weights, you'll need to run tensorflow/python/tools/freeze_graph.py to convert the checkpoint values into embedded constants within the graph file itself.

#!/bin/bash

python ./freeze_graph.py \
  --in_graph="/path/to/model/saved_model.pb" \
  --input_checkpoint="/MyModel/model.ckpt-xxxx" \
  --output_graph="/home/user/pruned_saved_model_or_whatever.pb" \
  --input_saved_model_dir="/path/to/model" \
  --output_node_names="dnn/head/predictions/probabilities"

Then, assuming you have already built the graph_transforms tools:
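(If not, the tool lives in the TensorFlow source tree; something like the following, run from the root of the TensorFlow checkout, should produce the binary used below:)

#!/bin/bash

bazel build tensorflow/tools/graph_transforms:summarize_graph

With that in place, run summarize_graph on the frozen graph: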

#!/bin/bash

tensorflow/bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
  --in_graph=pruned_saved_model_or_whatever.pb

Output:

Found 1 possible inputs: (name=Placeholder, type=string(7), shape=[?])
No variables spotted.
Found 1 possible outputs: (name=dnn/head/predictions/probabilities, op=Softmax)
Found 256974297 (256.97M) const parameters, 0 (0) variable parameters, and 0 control_edges
Op types used: 155 Const, 41 Identity, 32 RegexReplace, 18 Gather, 9 StridedSlice, 9 MatMul, 6 Shape, 6 Reshape, 6 Relu, 5 ConcatV2, 4 BiasAdd, 4 Add, 3 ExpandDims, 3 Pack, 2 NotEqual, 2 Where, 2 Select, 2 StringJoin, 2 Cast, 2 DynamicPartition, 2 Fill, 2 Maximum, 1 Size, 1 Unique, 1 Tanh, 1 Sum, 1 StringToHashBucketFast, 1 StringSplit, 1 Equal, 1 Squeeze, 1 Square, 1 SparseToDense, 1 SparseSegmentSqrtN, 1 SparseFillEmptyRows, 1 Softmax, 1 FloorDiv, 1 Rsqrt, 1 FloorMod, 1 HashTableV2, 1 LookupTableFindV2, 1 Range, 1 Prod, 1 Placeholder, 1 ParallelDynamicStitch, 1 LookupTableSizeV2, 1 Max, 1 Mul
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=pruned_saved_model.pb --show_flops --input_layer=Placeholder --input_layer_type=string --input_layer_shape=-1 --output_layer=dnn/head/predictions/probabilities

Hope this helps.

Update (2018-12-03):

I opened a related GitHub issue; it appears to have been resolved in a detailed blog post, which is linked at the end of the ticket.