Getting batch predictions for TFrecords via CloudML

I followed this great tutorial and successfully trained a model (on CloudML). My code also makes predictions offline, but now I am trying to use Cloud ML for predictions and I am running into some problems.

To deploy my model I followed this tutorial. Now I have code that generates TFRecords via apache_beam.io.WriteToTFRecord, and I want to make predictions for those TFRecords. To do so I am following this article, and my command looks like this:

gcloud ml-engine jobs submit prediction $JOB_ID --model $MODEL --input-paths gs://"$FILE_INPUT".gz --output-path gs://"$OUTPUT"/predictions --region us-west1 --data-format TF_RECORD_GZIP
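For context, the input file referenced by --input-paths is a GZIP-compressed TFRecord file of serialized tf.Example protos. A minimal sketch of producing such a file locally (the feature name "x" and the values are illustrative assumptions, not taken from my pipeline) would be:

```python
import tensorflow as tf

# Hypothetical helper: serialize a tf.Example with one float feature.
# The feature name "x" is illustrative -- use whatever your model parses.
def make_example(values):
    return tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=values)),
    })).SerializeToString()

# Write a GZIP-compressed TFRecord file, which is the format
# --data-format TF_RECORD_GZIP expects as input.
path = "input.tfrecord.gz"
with tf.io.TFRecordWriter(path, options="GZIP") as writer:
    for record in [make_example([1.0, 2.0]), make_example([3.0, 4.0])]:
        writer.write(record)
```

In the real pipeline the equivalent output comes from apache_beam.io.WriteToTFRecord with a GZIP coder/compression setting.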

But all I get is the error: 'Exception during running the graph: Expected serialized to be a scalar, got shape: [64]'

It seems to expect the data in a different format. I found the format specification for JSON here, but could not find how to do it with TFRecords.
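For comparison, the JSON data format takes one JSON object per line, keyed by the signature's input alias; binary strings have to be wrapped in a "b64" object, and (per the Cloud ML docs' convention) the alias should end in _bytes for the service to base64-decode it. A hypothetical instance line would look like:

```json
{"example_proto_bytes": {"b64": "<base64-encoded serialized tf.Example>"}}
```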

Update: here is the output of saved_model_cli show --all --dir:

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['prediction']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['example_proto'] tensor_info:
    dtype: DT_STRING
    shape: unknown_rank
    name: input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['probability'] tensor_info:
    dtype: DT_FLOAT
    shape: (1, 1)
    name: probability:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['example_proto'] tensor_info:
    dtype: DT_STRING
    shape: unknown_rank
    name: input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['probability'] tensor_info:
    dtype: DT_FLOAT
    shape: (1, 1)
    name: probability:0
  Method name is: tensorflow/serving/predict

When you export your model, you need to make sure it is "batchable", i.e., the outer dimension of the input placeholder has shape=[None], e.g.:

input = tf.placeholder(dtype=tf.string, shape=[None])
...

This may entail reworking the graph slightly. For example, I see that your output's shape is hard-coded to [1, 1]. The outermost dimension should be None; that may happen automatically once you fix the placeholder, or it may require other changes.

Given that the name of the output is probabilities, I would also expect the innermost dimension to be >1, i.e. the number of classes being predicted, so something like [None, NUM_CLASSES].
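Putting both fixes together, here is a minimal sketch of a batchable graph (written against tf.compat.v1 for the TF 1.x-style export; the feature spec and the tiny sigmoid "model" are placeholder assumptions, the point is the [None] outer dimension on input and output):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # A batch of serialized tf.Example protos, not a single scalar.
    serialized = tf.compat.v1.placeholder(tf.string, shape=[None], name="input")
    # Hypothetical feature spec -- replace with your model's actual features.
    features = tf.io.parse_example(
        serialized, {"x": tf.io.FixedLenFeature([2], tf.float32)})
    # Stand-in "model": the batch dimension stays None all the way through.
    logits = tf.reduce_sum(features["x"], axis=1, keepdims=True)
    probability = tf.sigmoid(logits, name="probability")  # shape [None, 1]

with tf.compat.v1.Session(graph=graph) as sess:
    examples = [
        tf.train.Example(features=tf.train.Features(feature={
            "x": tf.train.Feature(float_list=tf.train.FloatList(value=[1.0, 2.0])),
        })).SerializeToString()
        for _ in range(64)
    ]
    # Feeding 64 records at once now works instead of raising
    # "Expected serialized to be a scalar, got shape: [64]".
    probs = sess.run(probability, feed_dict={serialized: examples})
    print(probs.shape)  # one probability row per record in the batch
```

With the outer dimension left as None, the batch prediction service can feed any number of records per call.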