Spark-HBase - GCP template (3/3) - Missing libraries?
I am trying to test the Spark-HBase connector in a GCP context. Following the instructions, which ask to package the connector locally, I get the following error when submitting the job on Dataproc (after having completed ).
Command
(base) gcloud dataproc jobs submit spark --cluster $SPARK_CLUSTER --class com.example.bigtable.spark.shc.BigtableSource --jars target/scala-2.11/cloud-bigtable-dataproc-spark-shc-assembly-0.1.jar --region us-east1 -- $BIGTABLE_TABLE
Error
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
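A `NoClassDefFoundError` for `org/apache/hadoop/hbase/HBaseConfiguration` means the HBase classes were simply not on the driver's runtime classpath, i.e. they were not bundled into the assembly jar. As a quick sanity check (this `ClasspathProbe` object is my own illustration, not part of the job), you can probe whether a class is visible at runtime:

```scala
// Minimal sketch: probe whether a given class is on the runtime classpath.
object ClasspathProbe {
  def has(className: String): Boolean =
    try { Class.forName(className); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    println(has("java.lang.String")) // always present on the JVM
    // Returns false unless the HBase jars are on the classpath,
    // which is exactly the situation the error above reports.
    println(has("org.apache.hadoop.hbase.HBaseConfiguration"))
  }
}
```

Running the same probe inside the Spark job (or listing the assembly jar's contents) confirms whether the fix below actually got the classes into the fat jar.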
I found a working fix by adding the following dependencies to build.sbt. Thanks to @jccampanero for the guidance!
libraryDependencies += "org.apache.hbase" % "hbase-common" % "2.0.2"
libraryDependencies += "org.apache.hbase" % "hbase-mapreduce" % "2.0.2"
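For reference, here is how those two lines might sit in a minimal build.sbt. The Spark coordinates and version below are assumptions for illustration (adjust them to whatever your project already uses); the point is that the HBase artifacts must *not* be `provided`, so that sbt-assembly bundles them into the fat jar:

```scala
// build.sbt — minimal sketch; Spark version/coordinates are illustrative
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark is provided by the Dataproc cluster, so exclude it from the assembly
  "org.apache.spark" %% "spark-sql"       % "2.4.8" % "provided",
  // HBase classes are NOT on the cluster classpath: bundle them in the jar
  "org.apache.hbase" %  "hbase-common"    % "2.0.2",
  "org.apache.hbase" %  "hbase-mapreduce" % "2.0.2"
)
```

With this in place, `sbt assembly` produces a jar that contains `HBaseConfiguration`, and the `gcloud dataproc jobs submit spark` command above runs without the `NoClassDefFoundError`.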
Output (Bigtablesource.scala)
+------+-----+----+----+
| col0| col1|col2|col3|
+------+-----+----+----+
|row000| true| 0.0| 0|
|row001|false| 1.0| 1|
|row002| true| 2.0| 2|
|row003|false| 3.0| 3|
|row004| true| 4.0| 4|
|row005|false| 5.0| 5|
|row006| true| 6.0| 6|
|row007|false| 7.0| 7|
|row008| true| 8.0| 8|
|row009|false| 9.0| 9|
|row010| true|10.0| 10|
|row011|false|11.0| 11|
|row012| true|12.0| 12|
|row013|false|13.0| 13|
|row014| true|14.0| 14|
|row015|false|15.0| 15|
|row016| true|16.0| 16|
|row017|false|17.0| 17|
|row018| true|18.0| 18|
|row019|false|19.0| 19|
+------+-----+----+----+
only showing top 20 rows
+------+-----+
| col0| col1|
+------+-----+
|row000| true|
|row001|false|
|row002| true|
|row003|false|
|row004| true|
|row005|false|
+------+-----+
+------+-----+
| col0| col1|
+------+-----+
|row000| true|
|row001|false|
|row002| true|
|row003|false|
|row004| true|
|row005|false|
+------+-----+
+------+-----+
| col0| col1|
+------+-----+
|row251|false|
|row252| true|
|row253|false|
|row254| true|
|row255|false|
+------+-----+
+-----------+
|count(col1)|
+-----------+
| 50|
+-----------+