ML Engine serving does not seem to be working as intended

When I export the model with the code below and run gcloud ml-engine local predict, I get:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype string and shape [?]
  [[Node: Placeholder = Placeholder[dtype=DT_STRING, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] (Error code: 2)

import tensorflow as tf

tf_files_path = './tf'
# os.makedirs(tf_files_path)  # temp dir
estimator = tf.keras.estimator.model_to_estimator(
    keras_model_path="model_data/yolo.h5",
    model_dir=tf_files_path)

#up_one_dir(os.path.join(tf_files_path, 'keras'))

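def serving_input_receiver_fn():
    def prepare_image(image_str_tensor):
        # Decode one JPEG-encoded string into an HxWx3 uint8 image.
        image = tf.image.decode_jpeg(image_str_tensor, channels=3)
        # convert_image_dtype scales uint8 values into [0, 1] floats,
        # so no separate division by 255 is needed.
        return tf.image.convert_image_dtype(image, tf.float32)

    # Ensure the model is batchable: accept a 1-D batch of encoded images.
    input_ph = tf.placeholder(tf.string, shape=[None])
    images_tensor = tf.map_fn(
        prepare_image, input_ph, back_prop=False, dtype=tf.float32)

    # NOTE: `model` is not defined in this snippet; it is assumed to be the
    # Keras model loaded from model_data/yolo.h5.
    return tf.estimator.export.ServingInputReceiver(
        {model.input_names[0]: images_tensor},
        {'image_bytes': input_ph})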

export_path = './export'
estimator.export_savedmodel(
    export_path,
    serving_input_receiver_fn=serving_input_receiver_fn)

The JSON I send to ML Engine looks like this:

{"image_bytes": {"b64": "/9j/4AAQSkZJRgABAQAAAQABAAD/2w..."}}
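For reference, a request line in this shape can be produced with a few lines of Python. This is only a minimal sketch: `build_instance` is a hypothetical helper name, and the sample bytes stand in for the contents of a real JPEG file. The `image_bytes` key matches the receiver tensor name in the export code, and the `b64` wrapper is how binary data is encoded in the JSON request.

```python
import base64
import json


def build_instance(image_bytes: bytes) -> str:
    """Return one JSON line of the form {"image_bytes": {"b64": "..."}}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_bytes": {"b64": encoded}})


# Hypothetical stand-in for real JPEG file contents (just the JPEG magic bytes).
sample_bytes = b"\xff\xd8\xff\xe0"
request_line = build_instance(sample_bytes)
```

Writing one such line per image to a file gives the newline-delimited input that gcloud expects for --json-instances.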

When I skip local prediction and send the request to ML Engine itself, I get:

ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
"error": {
"code": 500,
"message": "Internal error encountered.",
"status": "INTERNAL"
}
}

saved_model_cli gives:

saved_model_cli show --all --dir export/1547848897/

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['conv2d_59'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 255)
        name: conv2d_59/BiasAdd:0
    outputs['conv2d_67'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 255)
        name: conv2d_67/BiasAdd:0
    outputs['conv2d_75'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 255)
        name: conv2d_75/BiasAdd:0
  Method name is: tensorflow/serving/predict

Does anyone see what is going wrong here?

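Problem solved. The model's output appears to have been too large for ML Engine to send back, and the service did not report this with anything more informative than a 500 internal error. We added some post-processing steps to the model, and it now works fine.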
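To get a feel for how large the raw output can be, here is a rough back-of-the-envelope sketch. The three output heads with 255 channels come from the saved_model_cli dump. The 416x416 input size and the 32/16/8 strides are assumptions (common YOLOv3 defaults), not something stated in the post, and `yolo_output_values` is a hypothetical helper name.

```python
def yolo_output_values(input_size, strides=(32, 16, 8), channels=255):
    """Count the floats returned per image across all detection heads."""
    total = 0
    for stride in strides:
        grid = input_size // stride  # spatial side length of this head's grid
        total += grid * grid * channels
    return total


# Assumed 416x416 input; the real model's input size may differ.
values = yolo_output_values(416)
approx_bytes = values * 4  # size as raw float32, before any JSON encoding
```

Under those assumptions, each image yields roughly 900,000 floats, and serializing them as JSON text makes the response larger still. That is consistent with post-processing inside the model (so that only the final detections are returned) making the error go away.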

As for the error returned by the gcloud ml-engine local predict command, this looks like a bug in the command itself: the model now runs on ML Engine, yet local prediction still returns the same error.