GCP deploy instance fails from ansible script

For over a year I have been deploying clusters on GCP with Ansible scripts, but suddenly one of my scripts keeps failing with this error:

libcloud.common.google.GoogleBaseError: u"The zone 'projects/[project]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later."

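The obvious explanation would be that I don't have enough resources, but not much has changed and the quotas look fine.

The Ansible script itself is not demanding: it creates 3 n1-standard-4 instances, each with a 100 GB SSD boot disk. See the script snippet below: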

tasks:
    - name: create boot disks
      gce_pd:
        disk_type: pd-ssd
        image: "debian-9-stretch-v20171025"
        name: "{{ item.node }}-disk"
        size_gb: 100
        state: present
        zone: "europe-west1-d"
        service_account_email: "{{ service_account_email }}"
        credentials_file: "{{ credentials_file }}"
        project_id: "{{ project_id }}"
      with_items: "{{ nodes }}"
      async: 3600
      poll: 2

    - name: create instances
      gce:
        instance_names: "{{ item.node }}"
        zone: "europe-west1-d"
        machine_type: "n1-standard-4"
        # node 0 is a regular instance; every other node is preemptible
        preemptible: "{{ false if item.num == '0' else true }}"
        disk_auto_delete: true
        disks:
          - name: "{{ item.node }}-disk"
            mode: READ_WRITE
        state: present
        service_account_email: "{{ service_account_email }}"
        service_account_permissions: "compute-rw"
        credentials_file: "{{ credentials_file }}"
        project_id: "{{ project_id }}"
        tags: "elasticsearch"
      register: gce_raw_results
      with_items: "{{ nodes }}"
      async: 3600
      poll: 2

Update 1:

The full error is:

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [create boot disks] *******************************************************
changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
ok: [localhost] => (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})

TASK [create instances] ********************************************************
changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
failed: [localhost] (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) => {"ansible_job_id": "371957735383.2688", "changed": false, "cmd": "/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/gce.py", "data": "", "failed": 1, "finished": 1, "item": {"cluster_name": "elasticsearch-link", "ip_field": "private_ip", "machine_type": "n1-standard-4", "node": "elasticsearch-link-2", "num": "2", "project_id": "[projectid]", "zone": "europe-west1-d"}, "msg": "Traceback (most recent call last):\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 158, in _run_module\n (filtered_outdata, json_warnings) = _filter_non_json_lines(outdata)\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 99, in _filter_non_json_lines\n raise ValueError('No start of json char found')\nValueError: No start of json char found\n", "stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in <module>\n main()\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main\n module, gce, inames, number)\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances\n instance, lc_machine_type, lc_image(), **gce_args\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node\n self.connection.async_request(request, method='POST', data=node_data)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request\n response = request(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request\n response = super(GCEConnection, self).request(*args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request\n *args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request\n response = responseCls(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in __init__\n self.object = self.parse_body()\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body\n raise GoogleBaseError(message, self.status, code)\nlibcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\"\n", "stderr_lines": ["Traceback (most recent call last):", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in <module>", " main()", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main", " module, gce, inames, number)", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances", " instance, lc_machine_type, lc_image(), **gce_args", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node", " self.connection.async_request(request, method='POST', data=node_data)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request", " response = request(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request", " response = super(GCEConnection, self).request(*args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request", " *args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request", " response = responseCls(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in __init__", " self.object = self.parse_body()", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body", " raise GoogleBaseError(message, self.status, code)", "libcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\""]}
to retry, use: --limit @/usr/local/airflow/ansible/playbooks/elasticsearch-link-cluster-create.retry

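The error message does not point to a quota problem; it reports that the zone itself is out of resources, so I suggest trying a different zone.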
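A minimal sketch of how a script could tell these two failure modes apart by looking at the error text. The stockout phrase is copied from the error above; the quota pattern assumes GCE quota errors read like "Quota 'CPUS' exceeded. Limit: ...", which you should verify against a real quota error. The function name `classify_gce_error` is hypothetical, and matching on message text is more fragile than checking a structured error code.

```python
import re

# Phrase copied from the zone-stockout error in the question.
ZONE_EXHAUSTED = "does not have enough resources available to fulfill the request"

# Assumption: quota errors look like "Quota 'CPUS' exceeded. Limit: 24.0 in region ...".
QUOTA_PATTERN = re.compile(r"Quota '[^']+' exceeded")


def classify_gce_error(message):
    """Return 'zone_exhausted', 'quota' or 'other' for a GCE error message."""
    if ZONE_EXHAUSTED in message:
        return "zone_exhausted"
    if QUOTA_PATTERN.search(message):
        return "quota"
    return "other"
```

For the error in this question, `classify_gce_error(...)` returns `'zone_exhausted'`, which is the case where switching zones (rather than requesting more quota) is the right reaction.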

Quoting from the documentation:

Even if you have a regional quota, it is possible that a resource might not be available in a specific zone. For example, you might have quota in region us-central1 to create VM instances, but might not be able to create VM instances in the zone us-central1-a if the zone is depleted. In such cases, try creating the same resource in another zone, such as us-central1-f.

So when writing your scripts you should account for this possibility, even though it is uncommon.
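One way to account for it is to retry the same request in other zones of the region. The sketch below assumes a caller-supplied `create(zone)` function that raises an exception whose string form contains the stockout text (as libcloud's `GoogleBaseError` did in the traceback above). The helper name and the zone list are hypothetical; check which zones actually exist in your region before using them.

```python
ZONE_EXHAUSTED = "does not have enough resources available to fulfill the request"

# Hypothetical candidate zones, in order of preference; verify for your region.
DEFAULT_ZONES = ["europe-west1-d", "europe-west1-b", "europe-west1-c"]


def create_in_first_available_zone(create, zones=DEFAULT_ZONES):
    """Call create(zone) for each zone until one succeeds.

    Only zone-stockout errors move on to the next zone; any other error
    (bad credentials, quota, typo in a parameter) is re-raised at once.
    Returns a (zone, result) tuple for the first successful call.
    """
    last_error = None
    for zone in zones:
        try:
            return zone, create(zone)
        except Exception as exc:  # e.g. libcloud.common.google.GoogleBaseError
            if ZONE_EXHAUSTED not in str(exc):
                raise
            last_error = exc
    raise RuntimeError("no zone had capacity: %s" % last_error)
```

Re-raising anything that is not a stockout keeps real configuration errors from being hidden behind a silent zone switch. Note that the boot disks in the question are zonal, so a fallback zone also needs its own disks.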

The problem is more pronounced with preemptible instances because:

Preemptible instances are finite Compute Engine resources, so they might not always be available. [...] these instances if it requires access to those resources for other tasks. Preemptible instances are excess Compute Engine capacity so their availability varies with usage.

Update

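To double-check this, you can try keeping the preemptible flag and changing the zone. That confirms the script itself works and that the failures are a stockout happening at night (which fits the fact that it works during the day).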

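  • If the problem really is availability, you could start preemptible instances and, if none are available, catch the error and fall back to regular instances or to another zone.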
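The fallback in the bullet above could be sketched as follows. In the question the failing node (`num` 2) was preemptible, so the stockout text is the same one seen in the traceback. `create(zone, preemptible)` is a hypothetical caller-supplied function that raises on failure; the helper name is also hypothetical, and the attempt order is one possible choice, not a recommendation from Google.

```python
ZONE_EXHAUSTED = "does not have enough resources available to fulfill the request"


def create_preemptible_with_fallback(create, zones):
    """Try preemptible instances first, then regular ones.

    Attempt order: preemptible in every zone, then regular in every zone.
    Non-stockout errors are re-raised immediately.
    Returns (zone, preemptible, result) for the first success.
    """
    attempts = [(z, True) for z in zones] + [(z, False) for z in zones]
    last_error = None
    for zone, preemptible in attempts:
        try:
            return zone, preemptible, create(zone, preemptible)
        except Exception as exc:
            if ZONE_EXHAUSTED not in str(exc):
                raise  # not a capacity problem: surface it to the caller
            last_error = exc
    raise RuntimeError("no capacity for preemptible or regular instances: %s" % last_error)
```

This order favours the cheaper preemptible capacity anywhere in the region before paying for a regular instance; if keeping all nodes in one zone matters more, you would instead try regular in the same zone before moving on.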

Update 2

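As promised, I created a feature request on your behalf; you can follow its progress on the public issue tracker. I suggest starring it so that you receive updates by email.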