gcloud ml-engine local predict --text-instances fails with "Could not parse" error

Question

I am trying to get the TensorFlow Boston sample (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/input_fn) to run on Google CloudML. Training seems to work, but I am stuck on the predictions that follow.

I adapted the code to fit tf.contrib.learn.Experiment and learn_runner.run(). It runs both locally and in the cloud with "gcloud ml-engine local train ..." / "gcloud ml-engine jobs submit training ...".
With the trained model I can run estimator.predict(input_fn=predict_input_fn) on the given boston_predict.csv set.
I can also use "gcloud ml-engine models create ..." and "gcloud ml-engine versions create ..." to create the model and a version of it.

BUT

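Local prediction with "gcloud ml-engine local predict --model-dir=/export/Servo/XXX --text-instances boston_predict.csv" fails with "InvalidArgumentError (see above for traceback): Could not parse example input <..> (Error code: 2)". See the transcripts below. It fails the same way with a headerless boston_predict.csv.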

I looked up the expected format with "$ gcloud ml-engine local predict --help" and read https://cloud.google.com/ml-engine/docs/how-tos/troubleshooting, but I could not find any report of this specific error through Google or Stack Exchange.

I am a newbie, so I have probably made some basic mistake, but I cannot spot it.

All help is appreciated,

:-)

yarc68000。

-------- Environment ----------

(env1) $ gcloud --version
Google Cloud SDK 170.0.0
alpha 2017.03.24
beta 2017.03.24
bq 2.0.25
core 2017.09.01
datalab 20170818
gcloud 
gsutil 4.27

(env1) $ python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)

(env1) $ conda list | grep tensorflow
tensorflow                1.3.0                     <pip>
tensorflow-tensorboard    0.1.6                     <pip>

------------- Execution and error: boston_predict.csv ----------

$ gcloud ml-engine local predict --model-dir=<..>/export/Servo/1504780684 --text-instances 1709boston/boston_predict.csv
<..>
ERROR:root:Exception during running the graph: Could not parse example input, value: 'CRIM,ZN,INDUS,NOX,RM,AGE,DIS,TAX,PTRATIO'
[[Node: ParseExample/ParseExample = ParseExample[Ndense=9, Nsparse=0, Tdense=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], dense_shapes=[[1], [1], [1], [1], [1], [1], [1], [1], [1]], sparse_types=[], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_Placeholder_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/dense_keys_0, ParseExample/ParseExample/dense_keys_1, ParseExample/ParseExample/dense_keys_2, ParseExample/ParseExample/dense_keys_3, ParseExample/ParseExample/dense_keys_4, ParseExample/ParseExample/dense_keys_5, ParseExample/ParseExample/dense_keys_6, ParseExample/ParseExample/dense_keys_7, ParseExample/ParseExample/dense_keys_8, ParseExample/Const, ParseExample/Const_1, ParseExample/Const_2, ParseExample/Const_3, ParseExample/Const_4, ParseExample/Const_5, ParseExample/Const_6, ParseExample/Const_7, ParseExample/Const_8)]]
<..>

-------- Execution and error: headerless boston_predict.csv ------

(Here I tried a copy of boston_predict.csv with the first line omitted.)

$ gcloud ml-engine local predict --model-dir=<..>/export/Servo/1504780684 --text-instances 1709boston/boston_predict_headerless.csv
<..>
ERROR:root:Exception during running the graph: Could not parse example input, value: '0.03359,75.0,2.95,0.428,7.024,15.8,5.4011,252,18.3'
[[Node: ParseExample/ParseExample = ParseExample[Ndense=9, Nsparse=0, Tdense=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], dense_shapes=[[1], [1], [1], [1], [1], [1], [1], [1], [1]], sparse_types=[], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_Placeholder_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/dense_keys_0, ParseExample/ParseExample/dense_keys_1, ParseExample/ParseExample/dense_keys_2, ParseExample/ParseExample/dense_keys_3, ParseExample/ParseExample/dense_keys_4, ParseExample/ParseExample/dense_keys_5, ParseExample/ParseExample/dense_keys_6, ParseExample/ParseExample/dense_keys_7, ParseExample/ParseExample/dense_keys_8, ParseExample/Const, ParseExample/Const_1, ParseExample/Const_2, ParseExample/Const_3, ParseExample/Const_4, ParseExample/Const_5, ParseExample/Const_6, ParseExample/Const_7, ParseExample/Const_8)]]
<..>

Answer 1

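There are probably two issues.

First, it looks like the graph you are exporting expects tf.Example protos as input, i.e., it contains a parse_example(...) op. That is why both runs fail: each line of the file is passed to that op as if it were a serialized tf.Example, and a raw CSV line is not one. The Boston sample does not appear to add that op, so I suspect it comes from your modifications.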

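Before showing the input_fn code you want, we need to talk about the second issue: versioning. Estimators lived under tensorflow.contrib in earlier versions of TensorFlow. With successive releases, however, various pieces have been migrating to tensorflow.estimator, and the APIs have changed along the way.

CloudML Engine currently (as of 7 Sep 2017) only supports TF 1.0 and 1.2, while your environment shows TensorFlow 1.3.0, so I will provide a solution that works with 1.2. It is based on the census sample. This is the input_fn you need in order to accept CSV data, although I generally recommend exporting models that are independent of the input format: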

import tensorflow as tf

# Default values; they also fix the data type of each CSV column.
FEATURE_DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0], [0.0]]

def predict_input_fn():
  # A serving input function takes no arguments. The placeholder receives a
  # batch of raw CSV lines, one string per instance.
  csv_row = tf.placeholder(shape=[None], dtype=tf.string)
  # Convert the rank-1 tensor into a rank-2 tensor, e.g.
  # ['csv,line,1', 'csv,line,2', ..] becomes [['csv,line,1'], ['csv,line,2']],
  # which after parsing results in one tensor per column:
  # [['csv'], ['csv']], [['line'], ['line']], [[1], [2]]
  row_columns = tf.expand_dims(csv_row, -1)
  columns = tf.decode_csv(row_columns, record_defaults=FEATURE_DEFAULTS)
  # FEATURES is the list of feature names defined in your training code.
  features = dict(zip(FEATURES, columns))

  return tf.contrib.learn.InputFnOps(features, None, {'csv_row': csv_row})
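
To make the parsing step easier to picture, here is a rough pure-Python analogue of what the tf.decode_csv call followed by dict(zip(FEATURES, columns)) is meant to produce. It is only an illustration and does not use TensorFlow. The feature names are an assumption copied from the CSV header in the error transcript (your training code may name them differently), and parse_rows is a hypothetical helper, not part of any library.

```python
import csv

# Assumption: names copied from the CSV header shown in the error transcript.
FEATURES = ["CRIM", "ZN", "INDUS", "NOX", "RM", "AGE", "DIS", "TAX", "PTRATIO"]
# One default per column; its Python type decides how the field is converted.
FEATURE_DEFAULTS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0.0]


def parse_rows(rows):
    """Hypothetical helper: CSV lines -> {feature name: [[value], ...]}."""
    features = {name: [] for name in FEATURES}
    for fields in csv.reader(rows):
        if len(fields) != len(FEATURES):
            raise ValueError(
                "expected %d fields, got %d" % (len(FEATURES), len(fields)))
        for name, default, field in zip(FEATURES, FEATURE_DEFAULTS, fields):
            # An empty field falls back to the default; anything else is
            # converted with the default's type (float or int).
            value = default if field == "" else type(default)(field)
            # Each value is wrapped in a list to mimic the [batch, 1] shape.
            features[name].append([value])
    return features


# The data row that appears in the headerless transcript.
rows = ["0.03359,75.0,2.95,0.428,7.024,15.8,5.4011,252,18.3"]
parsed = parse_rows(rows)
print(parsed["CRIM"], parsed["TAX"])
```

Each feature ends up as a column of single-element rows, one per input line. Note that a header line such as 'CRIM,ZN,...' cannot be converted to numbers here, and a CSV-parsing graph would presumably reject it as well, so the file given to --text-instances should contain data rows only.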

You will also need an export strategy like this:

from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

saved_model_export_utils.make_export_strategy(
    predict_input_fn,
    exports_to_keep=1,
    default_output_alternative_key=None,
)

You pass this, wrapped in a list of size 1, to the constructor of tf.contrib.learn.Experiment (its export_strategies argument).

gcloud

tensorflow

google-cloud-ml-engine