shc-core: NoSuchMethodError org.apache.hadoop.hbase.client.Put.addColumn
I am trying to use shc-core to save a Spark DataFrame into HBase.
My versions:
- hbase: 1.1.2.2.6.4.0-91
- spark: 1.6
- scala:2.10
- shc: 1.1.1-1.6-s_2.10
- hdp: 2.6.4.0-91
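For reference, these versions of shc-core would typically be wired into the build roughly like this (coordinates and repository URL taken from the SHC README; verify them against your environment):

// build.sbt sketch, assuming the Hortonworks public repo hosts this release
resolvers += "Hortonworks" at "http://repo.hortonworks.com/content/groups/public/"
libraryDependencies += "com.hortonworks" % "shc-core" % "1.1.1-1.6-s_2.10"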
The configuration looks like this:
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Avro schema for the array column; including "null" makes the items nullable.
val schema_array = s"""{"type": "array", "items": ["string","null"]}"""

// SHC catalog: maps DataFrame columns onto the HBase row key and column family.
def catalog: String = s"""{
  |"table":{"namespace":"default", "name":"tblename"},
  |"rowkey":"id",
  |"columns":{
  |"id":{"cf":"rowkey", "col":"id", "type":"string"},
  |"col1":{"cf":"data", "col":"col1", "avro":"schema_array"}
  |}
  |}""".stripMargin

df.write
  .options(Map(
    "schema_array" -> schema_array,
    HBaseTableCatalog.tableCatalog -> catalog,
    HBaseTableCatalog.newTable -> "5" // create the table with 5 regions if it does not exist
  ))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
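For completeness, the same catalog drives the read path as well; a minimal sketch against the Spark 1.6 API, assuming sqlContext is in scope and that the Avro schema option is also needed when reading:

// Read the table back through SHC using the same catalog definition.
val readBack = sqlContext.read
  .options(Map(
    "schema_array" -> schema_array,
    HBaseTableCatalog.tableCatalog -> catalog
  ))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
readBack.show()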
Sometimes it works as expected, creates the table, and saves all the data into HBase. But sometimes it fails with the following error:
Lost task 35.0 in stage 9.0 (TID 301, host): java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.addColumn([B[B[B)Lorg/apache/hadoop/hbase/client/Put;
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut.apply(HBaseRelation.scala:211)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut.apply(HBaseRelation.scala:210)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut(HBaseRelation.scala:210)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$insert.apply(HBaseRelation.scala:219)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$insert.apply(HBaseRelation.scala:219)
at scala.collection.Iterator$$anon.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$$anonfun$$anonfun$apply.apply$mcV$sp(PairRDDFunctions.scala:1112)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$$anonfun$$anonfun$apply.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$$anonfun$$anonfun$apply.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$$anonfun.apply(PairRDDFunctions.scala:1119)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$$anonfun.apply(PairRDDFunctions.scala:1091)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any ideas?
This was actually a classpath problem: I had two different versions of hbase-client on the classpath. The Put.addColumn(byte[], byte[], byte[]) overload only exists in hbase-client 1.0 and later, so whenever an executor happened to resolve the older jar first, the call failed with NoSuchMethodError, which is why the job only failed some of the time. With Maven, mvn dependency:tree usually exposes the duplicate.
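A quick way to confirm which jar a class is actually resolved from at runtime is to ask for its code source; a minimal diagnostic sketch (run it inside the job so it reflects the executor classpath, not just the driver's):

import org.apache.hadoop.hbase.client.Put

// Print the jar that Put was loaded from; on a healthy classpath this should
// point at the hbase-client 1.x jar that ships with the cluster.
val source = classOf[Put].getProtectionDomain.getCodeSource
println(s"Put loaded from: ${Option(source).map(_.getLocation).getOrElse("bootstrap classpath")}")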