Error Creating Google Cloud Dataproc Cluster - no access to initialization proxy script
I'm trying to create my first Google Cloud Dataproc cluster with the following command:
gcloud dataproc clusters create hive-cluster \
--scopes sql-admin \
--image-version 1.3 \
--initialization-actions "gs://goog-dataproc-${PROJECT}:${REGION}:hive-metastore" \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 15 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--worker-boot-disk-size 15 \
--region us-east1 \
--zone us-east1-b
However, I get the following error:
Dataproc could not validate the initialization action using the service-owned service accounts. Cluster creation may still succeed if the initialization action is accessible from GCE VMs.
Reason: service-1456309104734317@dataproc-accounts.iam.gserviceaccount.com does not have storage.objects.get access to goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh.
Waiting for cluster creation operation...done.
ERROR: (gcloud.dataproc.clusters.create) Operation [projects/traits-seater-824109/regions/us-east1/operations/5b36fb82-ade2-3d5f-a6bd-cb1a206bb54e] failed: Multiple Errors:
- Error downloading script 'gs://goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh': 1456309104734317-compute@developer.gserviceaccount.com does not have storage.objects.get access to goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh.
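I checked the permissions in IAM and granted the Storage -> Object Viewer role to the service accounts mentioned in the error messages above, but I still get the same error.
Any suggestions on how to resolve this error?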
The problem is probably the scopes you provided when creating the cluster. You restricted the cluster to the sql-admin API only (https://www.googleapis.com/auth/sqlservice.admin).
You probably need to add the storage-ro scope (or https://www.googleapis.com/auth/devstorage.read_only):
gcloud dataproc clusters create hive-cluster \
--scopes sql-admin,storage-ro \
[...]
Without the storage-ro scope, I think the Dataproc cluster will not be able to retrieve files from GCS, even though the bucket goog-dataproc-initialization-actions-us-east1 is public.
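As a rough way to check a scopes list before creating the cluster, here is a minimal sketch. The function name has_storage_scope is hypothetical and not part of gcloud; it only inspects the string, using the gcloud scope aliases storage-ro, storage-rw, storage-full and cloud-platform plus the devstorage scope URLs, and it rests on this answer's assumption that the scopes list is what decides GCS access.

```shell
# Hypothetical helper (not part of gcloud): succeed if a comma-separated
# --scopes value contains a scope that allows reading from Cloud Storage.
has_storage_scope() {
  case ",$1," in
    # gcloud scope aliases that include Cloud Storage read access
    *,storage-ro,*|*,storage-rw,*|*,storage-full,*|*,cloud-platform,*) return 0 ;;
    # full scope URLs (devstorage.read_only, devstorage.read_write, ...)
    */auth/devstorage.*|*/auth/cloud-platform,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example value taken from the command above
SCOPES="sql-admin,storage-ro"
if has_storage_scope "$SCOPES"; then
  echo "scopes include Cloud Storage read access: $SCOPES"
else
  echo "scopes lack Cloud Storage read access: $SCOPES" >&2
fi
```

The same $SCOPES value can then be passed to --scopes in the create command.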
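There appears to be a temporary issue with the permission settings on the regionally hosted copies of the Dataproc initialization actions. In the long run, those regional copies are indeed what you should use, both to better isolate the regional reliability of initialization actions and to avoid copying init actions across regions. In the meantime, you can use the shared "global" copy of the init action: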
gcloud dataproc clusters create hive-cluster \
--initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
...
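If several scripts reference the regional buckets, the swap to the global copy can be done mechanically. The sketch below assumes the bucket naming seen in this thread (goog-dataproc-initialization-actions-REGION for regional copies, dataproc-initialization-actions for the global one); the function name to_global_init_action is hypothetical.

```shell
# Hypothetical helper: rewrite a regional init-action URI to the shared
# "global" bucket; any other URI is printed unchanged.
to_global_init_action() {
  case "$1" in
    gs://goog-dataproc-initialization-actions-*/*)
      # Strip the shortest prefix up to and including the regional bucket name
      printf 'gs://dataproc-initialization-actions/%s\n' \
        "${1#gs://goog-dataproc-initialization-actions-*/}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# URI taken from the error message in the question
to_global_init_action \
  "gs://goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh"
```

The printed URI is the value to pass to --initialization-actions until the regional copies are readable again.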