Create Version Failed. Bad model detected with error: "Error loading the model" - AI Platform Prediction

Question

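I created a model through the AI Platform UI that uses the global endpoint. I am trying to deploy a basic TensorFlow 1.15.0 model that I exported with the SavedModel builder. When I try to deploy it, the UI reports Create Version Failed. Bad model detected with error: "Error loading the model", and I see the following in the logs: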

ERROR:root:Failed to import GA GRPC module. This is OK if the runtime version is 1.x

Failure: Could not reach metadata service: Internal Server Error.

ERROR:root:Command '['/tools/google-cloud-sdk/bin/gsutil', '-o', 'GoogleCompute:service_account=default', 'cp', '-R', 'gs://cml-365057443918-1608667078774578/models/xsqr_global/v6/7349456410861999293/model/*', '/tmp/model/0001']' returned non-zero exit status 1.

ERROR:root:Error loading model: 'generator' object has no attribute 'next'

ERROR:root:Error loading the model

Framework/ML runtime version: TensorFlow 1.15.0
Python: 3.7.3

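The odd thing is that gcloud ai-platform local predict works correctly with this exported model, and I can deploy it to a regional endpoint without any problem. The error only appears when I try to use a model on the global endpoint. I need the global endpoint, though, because I plan to use a custom prediction routine (if I can get this basic model working first).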

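The logs seem to indicate a problem copying the model from storage? I have tried granting additional viewer permissions to various IAM roles, but I still get the same error.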

Thanks for your help.

Answer 1

I think this is the same issue as https://issuetracker.google.com/issues/175316320

The comments on that issue say a fix is now being rolled out.

Answer 2

I ran into the same error today (ERROR: (gcloud.ai-platform.versions.create) Create Version failed. Bad model detected with error: "Error loading the model"). For those who want a summary:

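When deploying a version, it is recommended to use an n1* machine type (for example, n1-standard-4) through a regional endpoint (for example, us-central1) rather than an mls1* machine. I also made sure to specify the same region (us-central1) when creating the model itself, using the command below. That resolved the error above.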

!gcloud ai-platform models create $model_name --region=$REGION
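
For completeness, here is a rough sketch of how the model and version steps described above might look together. It is not taken from the original answer: the model name, version name, and bucket path are hypothetical placeholders, and the runtime and Python versions are assumed from the question (TensorFlow 1.15, Python 3.7). The script defaults to a dry run that only prints the commands, so you can check them before running anything.

```shell
#!/bin/sh
# Sketch only: all values below are hypothetical placeholders.
MODEL_NAME="${MODEL_NAME:-my_model}"
VERSION_NAME="${VERSION_NAME:-v1}"
REGION="${REGION:-us-central1}"
MODEL_DIR="${MODEL_DIR:-gs://my-bucket/path/to/saved_model}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually call gcloud

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

deploy() {
  # Create the model on the regional endpoint.
  run gcloud ai-platform models create "$MODEL_NAME" \
    --region="$REGION"

  # Create the version in the same region, on an n1 machine type
  # instead of the mls1 default.
  run gcloud ai-platform versions create "$VERSION_NAME" \
    --model="$MODEL_NAME" \
    --region="$REGION" \
    --origin="$MODEL_DIR" \
    --framework=tensorflow \
    --runtime-version=1.15 \
    --python-version=3.7 \
    --machine-type=n1-standard-4
}

deploy
```

The point of passing the same --region value to both commands is that the version is created against the regional endpoint where the model lives, which is the combination the answer reports as working.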

google-prediction

tensorflow

google-cloud-ml

google-ai-platform