GCS Hadoop connector error: ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer ls: No FileSystem for scheme gs
I am trying to set up hadoop-connectors on my local Ubuntu 20.04 machine and run the test command hadoop fs -ls gs://my-bucket, but I keep getting errors like the following:
$ hadoop fs -ls gs://my-bucket
2020-08-22 03:29:06,976 WARN fs.FileSystem: Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem Unable to get public no-arg constructor
2020-08-22 03:29:06,977 WARN fs.FileSystem: java.lang.NoClassDefFoundError: com/google/api/client/http/HttpRequestInitializer
2020-08-22 03:29:06,977 WARN fs.FileSystem: java.lang.ClassNotFoundException: com.google.api.client.http.HttpRequestInitializer
ls: No FileSystem for scheme "gs"
Note that I can access the bucket with gsutil ls gs://my-bucket.
I downloaded gcs-connector-hadoop3-latest.jar from here and placed it in /usr/local/hadoop/share/hadoop/common/lib. Is this the right location for the jar file?
I configured core-site.xml with the properties listed here and set GOOGLE_APPLICATION_CREDENTIALS to point to my service account key file. In hadoop-env.sh I exported:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export HADOOP_CLASSPATH+="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/common/lib/*.jar:$HADOOP_HOME/lib/*.jar"
I am not sure whether I set HADOOP_CLASSPATH correctly, or whether hadoop picks up the jar files in /usr/local/hadoop/share/hadoop/common/lib. Also, how does that directory differ from /usr/local/hadoop/lib?
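As a side note on the export above, here is a minimal sketch of how that line could be written. It is not from the original post. It assumes HADOOP_HOME is already exported; the fallback path and the extra_cp variable name are illustrative. It is based on two facts: the JVM's classpath wildcard is a trailing "/*" (it matches every .jar in the directory, whereas "*.jar" is not a recognized form), and "+=" combined with "$HADOOP_CLASSPATH" on the right-hand side appends the old value a second time.

```shell
# Sketch only: assumes HADOOP_HOME is already exported; the fallback below
# simply mirrors the install path used in this post.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# A classpath entry ending in "/*" stands for every .jar in that directory.
# Keep it quoted so the shell passes the "*" through unexpanded.
extra_cp="$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/lib/*"

# Plain assignment instead of "+=", so the previous value is not duplicated.
# ${VAR:+...} adds the old value and a ":" only when the variable is non-empty.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$extra_cp"
```

As far as I know, a stock Hadoop 3 install already puts share/hadoop/common/lib/* on its default classpath, so this export may not be needed at all for a jar placed there.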
Here is the relevant content of core-site.xml:
<configuration>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: uris.</description>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>my-project-id</value>
    <description>
      Optional. Google Cloud Project ID with access to GCS buckets.
      Required only for list buckets and create bucket operations.
    </description>
  </property>
  <property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
    <description>
      Whether to use a service account for GCS authorization.
      Setting this property to `false` will disable use of service accounts for
      authentication.
    </description>
  </property>
  <property>
    <name>google.cloud.auth.service.account.json.keyfile</name>
    <value>/path/to/service-account.json</value>
    <description>
      The JSON key file of the service account used for GCS
      access when google.cloud.auth.service.account.enable is true.
    </description>
  </property>
</configuration>
$ java --version
openjdk 11.0.8 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)
$ hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
bashrc:
...
export PDSH_RCMD_TYPE=ssh
export HADOOP_HOME="/usr/local/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
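It looks like a reboot resolved the problem. After restarting, the command hadoop fs -ls gs://my-bucket worked and listed the bucket contents as expected.
Thanks to @IgorDvorzhak for suggesting the command hadoop classpath --glob to check whether gcs-connector-hadoop3-latest.jar can be found. I used: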
hadoop classpath --glob | grep gcs-connector
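The one-liner above greps the whole colon-separated classpath, so a match prints the entire classpath as a single long line. A small sketch of a variant that prints only the matching entries, one per line, is shown here. The function name find_on_classpath is hypothetical and not part of Hadoop; it takes the classpath string as an argument so it can also be tried without Hadoop installed.

```shell
# Hypothetical helper (not from the original post): print each classpath
# entry that contains the given substring; exits non-zero if none match.
find_on_classpath() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -F -- "$2"
}

# Only call hadoop when it is actually installed on this machine.
if command -v hadoop >/dev/null 2>&1; then
  find_on_classpath "$(hadoop classpath --glob)" gcs-connector \
    || echo "gcs-connector jar is NOT on the Hadoop classpath" >&2
fi
```

If nothing is printed, the connector jar is not visible to Hadoop, which would be consistent with the "No FileSystem for scheme" error in the question.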