(bdutil) Unable to get hadoop/spark cluster working with a fresh install

Question

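I'm setting up a small cluster in GCE to play around with, but although the instances get created, some failures prevent it from working. I'm following the steps in https://cloud.google.com/hadoop/downloads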

I'm using what are currently the latest versions of gcloud (143.0.0) and bdutil (1.3.5), both freshly installed.

./bdutil deploy -e extensions/spark/spark_env.sh

using debian-8 as the image (since bdutil still uses debian-7-backports).

At some point I got

Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1

The full debug output is at https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272 (project ID and bucket names have been changed).

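The instances are created, but Spark is not even installed. After digging a bit, I managed to ssh into the master and manually run the Spark installation and the Hadoop start commands there. But it fails badly when launching spark-shell: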

17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)

and I'm not able to import sparkSQL. From what I've read, everything should be started automatically.

At this point I'm a bit lost and don't know what else to try. Am I missing a step? Is one of the commands wrong? Thanks in advance.

Update: solved

As pointed out in the accepted answer, I cloned the repository and the cluster was created without issues. When trying to start spark-shell, though, it gave

java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.

That sounded like the connectors had not been initialized properly, so after running

 ./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors

it worked as expected.

Answer 1

https://cloud.google.com/hadoop/downloads is a bit stale, and I'd instead recommend using the version of bdutil at head on GitHub: https://github.com/GoogleCloudPlatform/bdutil
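
For anyone landing here later, the sequence that ended up working for the asker can be sketched roughly as below. This is only an illustration, not an official procedure: the `run` helper, the `deploy_from_head` function, and the `BDUTIL_DIR` and `DRY_RUN` variables are names made up for this example, and it assumes `git` is installed and that the project and bucket are already configured for bdutil. The two `./bdutil` invocations are the ones quoted in the question.

```shell
#!/usr/bin/env bash
# Sketch only: clone bdutil at head, deploy with the Spark extension, then
# re-run the connector installation as described in the question's update.

BDUTIL_REPO="https://github.com/GoogleCloudPlatform/bdutil.git"
BDUTIL_DIR="${BDUTIL_DIR:-bdutil}"   # hypothetical checkout directory

# Hypothetical helper: print the command when DRY_RUN is set, else execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The function body is a subshell, so the cd does not leak into the caller.
deploy_from_head() (
  run git clone "$BDUTIL_REPO" "$BDUTIL_DIR" &&
    run cd "$BDUTIL_DIR" &&
    run ./bdutil deploy -e extensions/spark/spark_env.sh &&
    run ./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh \
      run_command_group install_connectors
)
```

Calling `DRY_RUN=1 deploy_from_head` only prints the four commands, which is a cheap way to check them before running `deploy_from_head` for real. The last step may be unnecessary if spark-shell starts cleanly after the deploy; the asker needed it only because of the "GoogleHadoopFileSystem has been closed or not initialized" error.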
