Is there any way to use both a GPU accelerator and Torch in Google Cloud AI Platform for model deployment?

Question

I have a Torch model (BERT) that I want to serve for online prediction on a GPU using the ai-platform service, but I can't figure out how to do it.

The following command works without an accelerator:

gcloud alpha ai-platform versions create {VERSION} --model {MODEL_NAME} --origin=gs://{BUCKET}/models/ --python-version=3.5 --runtime-version=1.14 --package-uris=gs://{BUCKET}/packages/my-torch-package-0.1.tar.gz,gs://cloud-ai-pytorch/torch-1.0.0-cp35-cp35m-linux_x86_64.whl --machine-type=mls1-c4-m4 --prediction-class=predictor.CustomModelPrediction

But if I try to add the accelerator parameter:

--accelerator=^:^count=1:type=nvidia-tesla-k80

I get the following error message:

ERROR: (gcloud.alpha.ai-platform.versions.create) INVALID_ARGUMENT: Field: version.machine_type Error: GPU accelerators are not supported on the requested machine type: mls1-c4-m4
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: 'GPU accelerators are not supported on the requested machine type:
      mls1-c4-m4'
    field: version.machine_type

But if I switch to a different machine type, one that I know can be used with accelerators, I get the following error:

ERROR: (gcloud.alpha.ai-platform.versions.create) FAILED_PRECONDITION: Field: framework Error: Machine type n1-highcpu-4 does not support CUSTOM_CLASS.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: Machine type n1-highcpu-4 does not support CUSTOM_CLASS.
    field: framework

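It seems that any machine type that supports GPU accelerators does not support custom classes (which, AFAIK, are required to use Torch), and any machine type that supports custom classes does not support GPU accelerators.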

Is there any way to make this work?

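There are plenty of tutorials on using ai-platform with Torch, but I don't see the point of using gcloud for training and prediction if everything has to run on the CPU, so this seems strange to me.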

Answer 1

As of now, using Custom Prediction Routines is in Beta. In addition, using machine types other than mls1-c1-m2 is also in Beta.

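However, as you can see in the link cited above, GPUs are not available for mls1-like machines. At the same time, these are the only machine types that allow models built outside of TensorFlow.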

In summary, deploying your prediction model in Torch with a GPU is probably not a feasible option right now.
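The two constraints described in this answer can be written down as a small check. This is only a sketch of the behavior reported in this thread at the time, not the service's actual validation logic; the function name `deployment_conflicts` and the rule "anything not starting with `mls1-` rejects custom classes" are assumptions made for illustration.

```python
def deployment_conflicts(machine_type, custom_class, gpu):
    """Return the reasons a version config would be rejected.

    Assumption: machine types starting with "mls1-" are the legacy types
    that accept custom prediction classes but no GPUs, and every other
    machine type accepts GPUs but no custom prediction classes.
    """
    legacy = machine_type.startswith("mls1-")
    problems = []
    if gpu and legacy:
        problems.append(
            "GPU accelerators are not supported on " + machine_type
        )
    if custom_class and not legacy:
        problems.append(machine_type + " does not support CUSTOM_CLASS")
    return problems


if __name__ == "__main__":
    # The two attempts from the question: both end with exactly one conflict.
    for machine in ("mls1-c4-m4", "n1-highcpu-4"):
        print(machine, deployment_conflicts(machine, custom_class=True, gpu=True))
```

Under these assumptions, no machine type satisfies both requirements at once, which is why both commands in the question fail.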

Answer 2

PyTorch + GPU is not available in AI Platform Prediction, but you can use Deep Learning VM images to create a custom PyTorch service with a GPU.

Update: you can now use AI Platform Prediction with containers.
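For the custom PyTorch service mentioned in this answer, a minimal sketch could look like the following. It is an illustration, not an official recipe: the names `make_torch_predict_fn`, `make_handler`, and `serve`, the port, and the `{"instances": [...]}` / `{"predictions": [...]}` JSON shape are all assumptions. It also assumes the model takes a single tensor and returns a single tensor, which is not true of a BERT model as-is (tokenization and output handling would have to be added).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_torch_predict_fn(model):
    """Wrap a torch.nn.Module so it runs on the GPU when one is visible.

    Assumption: the model maps one input tensor to one output tensor.
    """
    import torch  # imported lazily so the server code has no torch dependency

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()

    def predict(instances):
        with torch.no_grad():
            batch = torch.tensor(instances, device=device)
            return model(batch).cpu().tolist()

    return predict


def make_handler(predict_fn):
    """Build a request handler that answers POSTs using predict_fn."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))
                body = {"predictions": predict_fn(payload["instances"])}
                status = 200
            except Exception as exc:  # report bad input as a 400
                body = {"error": str(exc)}
                status = 400
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, fmt, *args):
            pass  # silence per-request logging in this sketch

    return PredictHandler


def serve(predict_fn, host="0.0.0.0", port=8080):
    """Start the server in a background thread and return it."""
    server = HTTPServer((host, port), make_handler(predict_fn))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

On a GPU VM, one would call something like `serve(make_torch_predict_fn(model))` after building the model; the same handler could also serve as the starting point for a container image. A production service would normally sit behind a proper WSGI/ASGI server and batch its requests.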

pytorch

gcloud

google-cloud-ml