Why are there insufficient accelerators when I run gcloud ml-engine jobs?

Question

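I am trying to run a machine learning job on Google Cloud, but it keeps telling me that insufficient accelerators are available. I have already tried the parameter --scale-tier=BASIC | BASIC_GPU | STANDARD_1 | PREMIUM_1, and the result is the same each time.

The command and its output are as follows: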

gcloud ml-engine jobs submit training object_detection_`date +%s` \
    --job-dir=gs://${TRAIN_DIR} \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    --config ${PATH_TO_LOCAL_YAML_FILE} \
    -- \
    --train_dir=gs://${TRAIN_DIR} \
    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
ERROR: (gcloud.ml-engine.jobs.submit.training) RESOURCE_EXHAUSTED: Field: scale_tier Error: Insufficient accelerators are available in region us-central1 to schedule the job which requests 6 K80 accelerators. Please wait and try again or else try submitting your job to a different region.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: Insufficient accelerators are available in region us-central1 to
      schedule the job which requests 6 K80 accelerators. Please wait and try again
      or else try submitting your job to a different region.
    field: scale_tier

Answer 1

GPUs in us-central1 are in high demand. If possible, I suggest running your job in us-east1 until more GPUs become available.
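
To make the "try a different region" advice concrete, here is a minimal sketch of a shell helper that attempts the same submission in several regions in turn. The function name submit_with_region_fallback and the region order are hypothetical. It assumes that a rejected submission (such as the RESOURCE_EXHAUSTED error above) makes gcloud exit with a non-zero status and does not reserve the job name; I have not verified either. Your Cloud Storage buckets should ideally live in the region you submit to.

```shell
# Hypothetical helper: try each region in order until a submission succeeds.
# $1   = space-separated list of regions to try
# rest = arguments for "gcloud ml-engine jobs submit training", WITHOUT --region
submit_with_region_fallback() {
  regions="$1"
  shift
  for region in $regions; do
    echo "Trying region ${region}..." >&2
    # --region goes before "$@" so it never lands after the "--" separator,
    # where it would be passed to the training module instead of gcloud.
    if gcloud ml-engine jobs submit training --region "$region" "$@"; then
      echo "$region"   # report the region that accepted the job
      return 0
    fi
  done
  echo "No region accepted the job." >&2
  return 1
}

# Example call (same flags as in the question, minus --region):
# submit_with_region_fallback "us-east1 us-central1" \
#     "object_detection_$(date +%s)" \
#     --job-dir="gs://${TRAIN_DIR}" \
#     --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
#     --module-name object_detection.train \
#     --config "${PATH_TO_LOCAL_YAML_FILE}" \
#     -- \
#     --train_dir="gs://${TRAIN_DIR}" \
#     --pipeline_config_path="gs://${PIPELINE_CONFIG_PATH}"
```

The helper stops at the first region that accepts the job, so list the preferred region first.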


google-cloud-platform

google-cloud-ml-engine