PyTorch on GCP: Machine type is not available on this endpoint

Question

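I'm new to GCP, so pardon me if I'm asking about or missing something obvious here.

I'm trying to deploy a custom PyTorch model on GCP and create a version resource for it. Everything worked fine until I tried to create a new version of the model. Then I keep getting: INVALID_ARGUMENT: Machine type is not available on this endpoint.

I've tried switching between the different types in their list here, but with no luck. What am I missing?

Here's the script I run to deploy: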

MODEL_NAME='test_iris'
MODEL_VERSION='v1'
RUNTIME_VERSION='2.4'
MODEL_CLASS='model.PyTorchIrisClassifier'
# Double quotes so that ${BUCKET_NAME} is expanded
PYTORCH_PACKAGE="gs://${BUCKET_NAME}/packages/torch-1.8.1+cpu-cp37-cp37m-linux_x86_64.whl"

DIST_PACKAGE="gs://${BUCKET_NAME}/models/Test_model-0.1.tar.gz"
GCS_MODEL_DIR='models/'
REGION="europe-west1"


# Creating model on AI platform
gcloud alpha ai-platform models create ${MODEL_NAME} \
    --region=europe-west1 --enable-logging \
    --enable-console-logging

# Creating the model version (this is the step that fails)
gcloud beta ai-platform versions create ${MODEL_VERSION} --model=${MODEL_NAME} \
    --origin=gs://${BUCKET_NAME}/${GCS_MODEL_DIR} \
    --python-version=3.7 \
    --machine-type=mls1-c4-m2 \
    --runtime-version=${RUNTIME_VERSION} \
    --package-uris=${DIST_PACKAGE},${PYTORCH_PACKAGE} \
    --prediction-class=${MODEL_CLASS}

Thanks!

Answer 1

According to the documentation, you can only deploy a custom prediction routine when using a legacy (MLS1) machine type for your model version. However, you cannot use a regional endpoint with this type of machine, as stated here:

Regional endpoints only support Compute Engine (N1) machine types. You cannot use legacy (MLS1) machine types on regional endpoints.

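As far as I can see, you specified a regional endpoint with the --region flag, and regional endpoints do not support the machine type your use case requires. You therefore need to move the model and its version to the global endpoint, after which the error should no longer occur.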
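The rule quoted above can be checked before calling gcloud. The following is only a minimal sketch: the function name `machine_type_allowed` is hypothetical, and it encodes nothing beyond the single restriction stated in the quote (MLS1 types are rejected on regional endpoints). It does not validate anything else about the machine type.

```shell
# Hypothetical helper: returns 0 when the machine type passes the rule
# quoted above, 1 otherwise.
# $1: "global" or a region name such as "europe-west1" (regional endpoint)
# $2: machine type, e.g. "mls1-c4-m2"
machine_type_allowed() {
  endpoint="$1"
  machine_type="$2"
  case "${machine_type}" in
    mls1-*)
      # Legacy (MLS1) machine types are only accepted on the global endpoint.
      [ "${endpoint}" = "global" ]
      ;;
    *)
      # Other machine types are not rejected by this rule; no further checks here.
      return 0
      ;;
  esac
}

# The combination from the question: regional endpoint plus an MLS1 type.
if ! machine_type_allowed "europe-west1" "mls1-c4-m2"; then
  echo "mls1-c4-m2 cannot be used on the europe-west1 regional endpoint"
fi
```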

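In addition, when you specify a regional endpoint with gcloud ai-platform models create --region, you need to specify the same region when creating the model version. On the other hand, when the model is created on the global endpoint with gcloud ai-platform models create --regions, you can omit the region flag in gcloud ai-platform versions create. Note that the --regions flag is only used for the global endpoint.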
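Applied to the script from the question, the change could look roughly like the sketch below. It reuses the question's variables and commands, swaps --region for --regions on model creation, and passes no region flag when creating the version. Two things are assumptions added for illustration and are not part of the original script: the `my-bucket` default for BUCKET_NAME is a placeholder, and the `run` wrapper with DRY_RUN only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: deploy on the global endpoint so the MLS1 machine type is accepted.
BUCKET_NAME="${BUCKET_NAME:-my-bucket}"   # placeholder default (assumption)
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands, 0 = execute them

MODEL_NAME='test_iris'
MODEL_VERSION='v1'
RUNTIME_VERSION='2.4'
MODEL_CLASS='model.PyTorchIrisClassifier'
PYTORCH_PACKAGE="gs://${BUCKET_NAME}/packages/torch-1.8.1+cpu-cp37-cp37m-linux_x86_64.whl"
DIST_PACKAGE="gs://${BUCKET_NAME}/models/Test_model-0.1.tar.gz"
GCS_MODEL_DIR='models/'

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# --regions (plural) creates the model on the global endpoint.
run gcloud alpha ai-platform models create "${MODEL_NAME}" \
  --regions=europe-west1 \
  --enable-logging \
  --enable-console-logging

# No region flag here: the version is created on the global endpoint,
# where the legacy mls1-c4-m2 machine type is available.
run gcloud beta ai-platform versions create "${MODEL_VERSION}" \
  --model="${MODEL_NAME}" \
  --origin="gs://${BUCKET_NAME}/${GCS_MODEL_DIR}" \
  --python-version=3.7 \
  --machine-type=mls1-c4-m2 \
  --runtime-version="${RUNTIME_VERSION}" \
  --package-uris="${DIST_PACKAGE},${PYTORCH_PACKAGE}" \
  --prediction-class="${MODEL_CLASS}"
```

Running it with DRY_RUN=0 executes the two gcloud commands instead of printing them. A model that was already created on the regional endpoint would have to be created again on the global endpoint.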

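Finally, I must point out that, according to the documentation, when you select a region for the global endpoint by using the --regions flag at model creation, your prediction nodes run in the specified region. However, the AI Platform Prediction infrastructure that manages your resources might not necessarily run in the same region.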


continuous-integration

continuous-deployment

google-cloud-platform

pytorch

mlops