Unable to create a version in Cloud AI Platform using custom containers for prediction

Question

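Because of some VPC restrictions, I have to use a custom container to serve predictions for a model trained with TensorFlow. As the documentation requires, I created an HTTP server using TensorFlow Serving. The Dockerfile used to build the image is as follows: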

FROM tensorflow/serving:2.3.0-gpu

# copy the model file
ENV MODEL_NAME=my_model
COPY my_model /models/my_model

where my_model contains the saved_model inside a folder named 1/.
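TensorFlow Serving treats each numeric subdirectory of the model base path as a model version, so the layout above should look like my_model/1/saved_model.pb (plus a variables/ folder). As a quick pre-flight check before building the image, here is a minimal sketch; the helper name list_servable_versions is hypothetical, and it only checks that a SavedModel file is present, not that the graph actually loads.

```python
import os


def list_servable_versions(model_base_path):
    """Return the sorted integer versions under model_base_path that
    contain a SavedModel file (hypothetical pre-flight helper)."""
    versions = []
    for entry in os.listdir(model_base_path):
        version_dir = os.path.join(model_base_path, entry)
        # TF Serving only treats numeric subdirectories as model versions.
        if not (entry.isdigit() and os.path.isdir(version_dir)):
            continue
        files = os.listdir(version_dir)
        if "saved_model.pb" in files or "saved_model.pbtxt" in files:
            versions.append(int(entry))
    return sorted(versions)
```

Running list_servable_versions("my_model") on the folder from the question should return [1]; an empty list means the container would start with nothing to serve.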

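I then pushed the container image to Artifact Registry and created a Model. To create a Version, I selected Custom Container in the Cloud Console UI and added the path to the Container Image. I set both the prediction route and the health route to /v1/models/my_model:predict and changed the port to 8501. I also chose a single compute node of machine type n1-standard-16 with 1 P100 GPU, and left scaling set to Auto scaling.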

After I click Save, I can see the TensorFlow server starting up, and the logs show the following messages:

Successfully loaded servable version {name: my_model version: 1}

Running gRPC ModelServer at 0.0.0.0:8500

Exporting HTTP/REST API at:localhost:8501

NET_LOG: Entering the event loop

However, after about 20-25 minutes, version creation stops with the following error:

Error: model server never became ready. Please validate that your model file or container configuration are valid.

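I don't understand why this happens. I can run the same Docker image on my local machine and successfully get predictions by hitting the endpoint it creates: http://localhost:8501/v1/models/my_model:predict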
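For reference, a local request like the one described above can be sketched as follows. This is a minimal example assuming the TF Serving REST predict API's row format ({"instances": [...]}); the helper names and any sample input are hypothetical placeholders and must match your model's serving signature.

```python
import json
import urllib.request


def build_predict_request(host, port, model_name, instances):
    """Build the URL and JSON body for TF Serving's REST predict API."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return url, body


def predict(host, port, model_name, instances, timeout=10):
    """POST the instances and return the decoded 'predictions' list."""
    url, body = build_predict_request(host, port, model_name, instances)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # the :predict route only accepts POST
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["predictions"]
```

For example, predict("localhost", 8501, "my_model", [[1.0, 2.0, 3.0]]) would hit the same URL as in the question, with the input shape adjusted to the real model.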

Any help with this would be greatly appreciated.

Answer 1

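Have you tried using a different health route? I believe /v1/models/my_model:predict expects HTTP POST, whereas health checks usually use HTTP GET.

Your health check route probably needs to be a GET endpoint.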

Edit: According to the documentation at https://www.tensorflow.org/tfx/serving/api_rest, you may be able to test with just /v1/models/my_model as your health endpoint.
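Following this suggestion, the health probe would be a plain GET against the model status endpoint, which reports the state of each loaded version. A minimal sketch, assuming TF Serving's model status response shape (a model_version_status list whose entries carry a state field such as "AVAILABLE"); the helper names are hypothetical:

```python
import json
import urllib.request


def model_is_ready(status):
    """Return True if any version in a model-status response is AVAILABLE."""
    return any(
        v.get("state") == "AVAILABLE"
        for v in status.get("model_version_status", [])
    )


def check_health(host, port, model_name, timeout=5):
    """GET (not POST) the model status endpoint and report readiness.

    urlopen issues a GET when no data is passed, and raises
    urllib.error.HTTPError for non-2xx responses.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_is_ready(json.loads(resp.read()))
```

A GET to the :predict route, by contrast, is not a valid predict call, which is consistent with the health check never passing in the question.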

Answer 2

I'm answering my own question after working with the Google Cloud support team to track down the error.

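It turned out that the port I created the Version with conflicted with the Kubernetes deployment on the Cloud AI Platform side. So I changed the Dockerfile to the following, and was then able to run online predictions successfully on both Classic AI Platform and Unified AI Platform: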

FROM tensorflow/serving:2.3.0-gpu

# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models
RUN mkdir -p ${MODEL_BASE_PATH}

# copy the model file
ENV MODEL_NAME=my_model
COPY my_model /models/my_model

EXPOSE 5000

EXPOSE 8080

CMD ["tensorflow_model_server", "--rest_api_port=8080", "--port=5000", "--model_name=my_model", "--model_base_path=/models/my_model"]
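The CMD above hard-codes the REST port to 8080. If you would rather take the port from the environment, here is a sketch that assembles the same command line. It assumes the platform passes the port in an AIP_HTTP_PORT variable (documented for Vertex AI custom containers; verify this for your setup) and falls back to 8080 otherwise. The helper name is hypothetical, and the stock tensorflow/serving image may not include a Python interpreter, so in practice the same logic would more likely live in a shell entrypoint.

```python
import os


def build_serving_command(env=None):
    """Build a tensorflow_model_server argv equivalent to the CMD above.

    AIP_HTTP_PORT is an assumed platform-provided variable; 8080 is the
    fallback used in the answer's Dockerfile.
    """
    env = os.environ if env is None else env
    rest_port = env.get("AIP_HTTP_PORT", "8080")
    model_name = env.get("MODEL_NAME", "my_model")
    base_path = env.get("MODEL_BASE_PATH", "/models")
    return [
        "tensorflow_model_server",
        f"--rest_api_port={rest_port}",
        "--port=5000",  # gRPC port, kept separate from the REST port as in the answer
        f"--model_name={model_name}",
        f"--model_base_path={base_path}/{model_name}",
    ]
```

With no variables set, this yields exactly the CMD from the answer; the port entered for the Version would presumably need to match the REST port.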


docker

google-cloud-platform

tensorflow

tensorflow-serving

google-cloud-ml