Google Compute Engine: LibSnappy not installed error during command-line installation of Hadoop

I am trying to install a custom Hadoop implementation (>2.0) on Google Compute Engine using the command-line option. I modified the following parameters in my bdutil_env.sh file:

GCE_IMAGE='ubuntu-14-04'
GCE_MACHINE_TYPE='n1-standard-1'
GCE_ZONE='us-central1-a'
DEFAULT_FS='hdfs'
HADOOP_TARBALL_URI='gs://<mybucket>/<my_hadoop_tar.gz>'

./bdutil deploy fails with exit code 1. I found the following errors in the generated debug.info file:

    ssh: connect to host 130.211.161.181 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
ssh: connect to host 104.197.63.39 port 22: Connection refused
ssh: connect to host 104.197.7.106 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
.....
.....
Connection to 104.197.7.106 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [123].
Connection to 104.197.63.39 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [123].
Connection to 130.211.161.181 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [123].
...
...
hadoop-w-1: ==> deploy-core-setup_deploy.stderr <==
....
....
hadoop-w-1: dpkg-query: package 'libsnappy1' is not installed and no information is available
hadoop-w-1: Use dpkg --info (= dpkg-deb --info) to examine archive files,
hadoop-w-1: and dpkg --contents (= dpkg-deb --contents) to list their contents.
hadoop-w-1: dpkg-preconfigure: unable to re-open stdin: No such file or directory
hadoop-w-1: dpkg-query: package 'libsnappy-dev' is not installed and no information is available
hadoop-w-1: Use dpkg --info (= dpkg-deb --info) to examine archive files,
hadoop-w-1: and dpkg --contents (= dpkg-deb --contents) to list their contents.
hadoop-w-1: dpkg-preconfigure: unable to re-open stdin: No such file or directory
hadoop-w-1: ./hadoop-env-setup.sh: line 612: Package:: command not found
....
....
hadoop-w-1: find: `/home/hadoop/hadoop-install/lib': No such file or directory

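I don't understand why the initial ssh attempts fail; I can see the VMs and log in to them correctly from the UI, and my tar.gz was also copied to the appropriate place.

I also don't understand why libsnappy was not installed; is there anything special I need to do? The shell script appears to have commands to install it, but it is somehow failing.

I checked all the VMs; Hadoop is not running on any of them.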

Edit: To try to fix the ssh issue, I ran the following command:

gcutil --project= addfirewall --allowed=tcp:22 default-ssh

It made no difference.

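Looking at your error output, it seems you need to download the snappy library and put it on the classpath. If you are using Java, you can get it from https://github.com/xerial/snappy-java, or try https://code.google.com/p/snappy/.

Download the library from one of those links.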

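In this case, the ssh and libsnappy errors are both red herrings. When the VMs aren't immediately SSH-able, bdutil polls for a while until they are, at which point it should print something like: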

...Thu May 14 16:52:23 PDT 2015: Waiting on async 'wait_for_ssh' jobs to finish. Might take a while...
...
Thu May 14 16:52:33 PDT 2015: Instances all ssh-able
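
The polling behaviour described above can be illustrated with a small retry loop. This is only a sketch of the general idea, not bdutil's actual wait_for_ssh code; the function name wait_until and its arguments are hypothetical.

```shell
# Hypothetical helper (not part of bdutil): run a command repeatedly until it
# succeeds or the attempt budget is exhausted.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local max_attempts="$1"
  local delay_seconds="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Failures while the VM is still booting are expected, so the probe's
    # own output (e.g. "Connection refused") is discarded here.
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${attempt} attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final failed one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after ${max_attempts} attempt(s)" >&2
  return 1
}

# Example (assumes gcloud is configured for your project):
#   wait_until 30 10 gcloud compute ssh hadoop-w-1 --zone us-central1-a --command exit
```

Early "Connection refused" lines in debug.info are consistent with a loop like this making its first few attempts before sshd is up on the new instances.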

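Similarly, the libsnappy error you saw is a red herring: it comes from a call to dpkg -s that checks whether the package is already installed and, if it is not, goes on to apt-get install it: https://github.com/GoogleCloudPlatform/bdutil/blob/master/libexec/bdutil_helpers.sh#L163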
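
To make the check-then-install pattern concrete, here is a minimal sketch. It is not the code at the link above; install_if_missing, PKG_CHECK_CMD and PKG_INSTALL_CMD are hypothetical names introduced so the function can be dry-run on a machine without dpkg.

```shell
# Hypothetical helper (not the actual bdutil function): install a package
# only if the package manager does not already report it as installed.
# PKG_CHECK_CMD and PKG_INSTALL_CMD are assumptions added for illustration;
# they default to the dpkg/apt-get commands mentioned in the answer.
install_if_missing() {
  local pkg="$1"
  local check_cmd="${PKG_CHECK_CMD:-dpkg -s}"
  local install_cmd="${PKG_INSTALL_CMD:-apt-get install -y}"
  # dpkg -s exits non-zero and prints "package '...' is not installed" on
  # stderr when the package is absent. That message is expected, so this
  # sketch discards it instead of letting it reach the log.
  if $check_cmd "$pkg" >/dev/null 2>&1; then
    echo "${pkg}: already installed"
  else
    echo "${pkg}: not installed, installing"
    $install_cmd "$pkg"
  fi
}

# Example (needs root on a Debian-family VM):
#   install_if_missing libsnappy1
```

The "is not installed and no information is available" lines in the log would then just be the check reporting a missing package before the install step runs, not a failure of the install itself.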

We'll work on cleaning up these error messages, since they can be misleading. In the meantime, the main issue here is that Ubuntu has not historically been one of the images supported by bdutil; we thoroughly validate the CentOS and Debian images, but not the Ubuntu images, since they were only added as GCE options in November 2014. Your deployment should work fine with your custom tarball on any debian-7 or centos-6 image. We've filed an issue on GitHub to track Ubuntu support for bdutil: https://github.com/GoogleCloudPlatform/bdutil/issues/29
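
As a sketch of that workaround, the bdutil_env.sh settings from the question could be kept as they are with only the image changed. The exact image name is an assumption here: 'debian-7' is taken from the wording above, and the name your project actually accepts may differ.

```shell
# bdutil_env.sh (fragment): same settings as in the question, but with a
# Debian image instead of Ubuntu.
# Assumption: 'debian-7' is accepted as an image name in your project;
# check the list of available images if the deploy rejects it.
GCE_IMAGE='debian-7'
GCE_MACHINE_TYPE='n1-standard-1'
GCE_ZONE='us-central1-a'
DEFAULT_FS='hdfs'
HADOOP_TARBALL_URI='gs://<mybucket>/<my_hadoop_tar.gz>'
```

After changing the image, rerun ./bdutil deploy as before.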

Edit: The issue has been resolved, and Ubuntu is now supported at head in the master repository; you can download it at the most recent commit.