Google Dataproc initialization script error: File not found
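I'm using Google Dataproc to create a cluster with Jupyter.
At first I used the Jupyter script from the "dataproc-initialization-actions" repository on GitHub, and it worked perfectly.
This is the cluster-creation call given in the documentation: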
gcloud dataproc clusters create my-dataproc-cluster \
--metadata "JUPYTER_PORT=8124" \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--bucket my-dataproc-bucket \
--num-workers 2 \
--properties spark:spark.executorEnv.PYTHONHASHSEED=0,spark:spark.yarn.am.memory=1024m \
--worker-machine-type=n1-standard-4 \
--master-machine-type=n1-standard-4
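But I wanted to customize it, so I downloaded the initialization script and saved it to my own Google Cloud Storage bucket (in the same project where I'm creating the cluster). I then changed the call to point to my copy of the script, as follows: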
gcloud dataproc clusters create my-dataproc-cluster \
--metadata "JUPYTER_PORT=8124" \
--initialization-actions \
gs://myjupyterbucketname/jupyter.sh \
--bucket my-dataproc-bucket \
--num-workers 2 \
--properties spark:spark.executorEnv.PYTHONHASHSEED=0,spark:spark.yarn.am.memory=1024m \
--worker-machine-type=n1-standard-4 \
--master-machine-type=n1-standard-4
But when I ran it, I got the following error:
Waiting on operation [projects/myprojectname/regions/global/operations/cf20466c-ccb1-4c0c-aae6-fac0b99c9a35].
Waiting for cluster creation operation...done.
ERROR: (gcloud.dataproc.clusters.create) Operation [projects/myprojectname/regions/global/operations/cf20466c-ccb1-4c0c-aae6-fac0b99c9a35] failed: Multiple Errors:
- Google Cloud Dataproc Agent reports failure. If logs are available, they can be found in 'gs://myjupyterbucketname/google-cloud-dataproc-metainfo/231e5160-75f3-487c-9cc3-06a5918b77f5/my-dataproc-cluster-m'.
- Google Cloud Dataproc Agent reports failure. If logs are available, they can be found in 'gs://myjupyterbucketname/google-cloud-dataproc-metainfo/231e5160-75f3-487c-9cc3-06a5918b77f5/my-dataproc-cluster-w-1'..
The log files are there, so I don't think this is an access-permission problem. The file named "dataproc-initialization-script-0_output" contains the following:
/usr/bin/env: bash: No such file or directory
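One commonly reported cause of this exact message (an assumption here; the question does not confirm it) is a script saved with Windows-style CRLF line endings. The shebang line then reads `#!/usr/bin/env bash\r`, so `env` looks for a program literally named `bash\r`, and because the carriage return is invisible when the log is printed, the message appears to say that `bash` itself is missing. The sketch below checks a local copy of the script for carriage returns and writes an LF-only copy; `has_crlf` and `strip_crlf` are hypothetical helper names, not part of gcloud or gsutil.

```shell
#!/usr/bin/env bash
# Sketch: detect and strip Windows (CRLF) line endings from a local copy of
# an initialization script before uploading it to Cloud Storage.
# Assumption: the script may have been edited or saved on Windows.
# has_crlf and strip_crlf are hypothetical helpers, not gcloud/gsutil features.
# A quick manual check is also possible: `head -n 1 jupyter.sh | od -c`
# shows a `\r` before the final `\n` if the shebang line ends in CRLF.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_crlf() {
  grep -q $'\r' "$1"
}

# Writes a copy of $1 to $2 with every carriage return removed,
# leaving LF-only line endings.
strip_crlf() {
  tr -d '\r' < "$1" > "$2"
}

# Example (local file names are assumptions):
#   if has_crlf jupyter.sh; then strip_crlf jupyter.sh jupyter.lf.sh; fi
```

If a carriage return is found, the cleaned copy can be uploaded over the original with `gsutil cp jupyter.lf.sh gs://myjupyterbucketname/jupyter.sh` before re-running `gcloud dataproc clusters create`. `tr -d '\r'` was chosen over `sed -i` because it behaves the same on GNU and BSD systems; it removes every carriage return, which is acceptable for a shell script.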
Any ideas?