Spark program to bulk-delete HBase rows throws AbstractMethodError
Below is the code block in my Spark application that deletes a set of row keys (rePartitionedRowKeys) from an HBase table:
hbaseContext.bulkDelete[Array[Byte]](rePartitionedRowKeys,
  TableName.valueOf(hbaseTableName),
  putRecord => new Delete(putRecord), batchSize)
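For context, here is a minimal, self-contained sketch of how such a call is typically wired up. The SparkContext setup, the sample row keys, the table name and the batch size below are assumptions for illustration, not part of my actual application:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object BulkDeleteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-bulk-delete"))

    // HBaseContext is built from the cluster's HBase configuration;
    // hbase-site.xml must be on the classpath for this to resolve the cluster.
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // Hypothetical row keys; in the real application these come from an RDD
    // named rePartitionedRowKeys.
    val rePartitionedRowKeys =
      sc.parallelize(Seq("row1", "row2", "row3")).map(k => Bytes.toBytes(k))

    val hbaseTableName = "my_table" // assumed table name
    val batchSize = 100             // number of Deletes sent to HBase per batch

    // Same call shape as above: each row key is wrapped in a Delete mutation.
    hbaseContext.bulkDelete[Array[Byte]](
      rePartitionedRowKeys,
      TableName.valueOf(hbaseTableName),
      rowKey => new Delete(rowKey),
      batchSize)

    sc.stop()
  }
}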
The relevant dependencies in pom.xml are:
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <version>1.2.0-cdh5.7.0</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.6.0-cdh5.7.0</version>
  <scope>provided</scope>
</dependency>
When I run the application, I get an AbstractMethodError from one of the logging methods:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 4.0 failed 4 times, most recent failure: Lost task 3.3 in stage 4.0 (TID 17, c175vjt.int.westgroup.com, executor 2): java.lang.AbstractMethodError: org.apache.hadoop.hbase.spark.HBaseContext.initializeLogIfNecessary(Z)V
at org.apache.spark.Logging$class.log(Logging.scala:50)
at org.apache.hadoop.hbase.spark.HBaseContext.log(HBaseContext.scala:60)
at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
at org.apache.hadoop.hbase.spark.HBaseContext.logDebug(HBaseContext.scala:60)
at org.apache.hadoop.hbase.spark.HBaseContext.applyCreds(HBaseContext.scala:235)
at org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:482)
at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$org$apache$hadoop$hbase$spark$HBaseContext$$bulkMutation.apply(HBaseContext.scala:322)
at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$org$apache$hadoop$hbase$spark$HBaseContext$$bulkMutation.apply(HBaseContext.scala:322)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$$anonfun$apply.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$$anonfun$apply.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1866)
at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1866)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1457) ~[spark-assembly-1.6.0-cdh5.10.2-hadoop2.6.0-cdh5.10.2.jar:na]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1445) ~[spark-assembly-1.6.0-cdh5.10.2-hadoop2.6.0-cdh5.10.2.jar:na]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1444) ~[spark-assembly-1.6.0-cdh5.10.2-hadoop2.6.0-cdh5.10.2.jar:na]
Am I missing any dependency jar, or is this caused by a jar conflict?
Thanks in advance.
I was using CDH5.5.7 jars with Spark 1.5.1, but the spark-hbase related jars pointed to cdh5.7.0.
I then upgraded the CDH version to 5.10.2, and making all the other jars compatible with cdh5.10.2 resolved the AbstractMethodError.
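In other words, HBaseContext mixes in Spark's Logging trait, so compiling hbase-spark against one Spark release while running on another can leave methods such as initializeLogIfNecessary unimplemented at runtime, which is exactly what the AbstractMethodError reports. For reference, an aligned dependency set would look roughly like the following; the exact cdh5.10.2 version strings are an assumption based on the Spark 1.6.0 / HBase 1.2.0 releases that CDH 5.10.2 ships, so confirm them against the Cloudera Maven repository:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <!-- assumed cdh5.10.2 version string; verify against the Cloudera repo -->
  <version>1.2.0-cdh5.10.2</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <!-- matches the spark-assembly-1.6.0-cdh5.10.2 jar visible in the driver stack trace -->
  <version>1.6.0-cdh5.10.2</version>
  <scope>provided</scope>
</dependency>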