Exception when using the saveToPhoenix method to load/save an RDD on HBase

I want to use the apache-phoenix framework. The problem is that I keep getting an exception telling me that the class HBaseConfiguration cannot be found. Here is the code I am trying to use:

import org.apache.spark.SparkContext
import org.apache.spark.sql._
import org.apache.phoenix.spark._

// Load INPUT_TABLE
object MainTest2 extends App {
  val sc = new SparkContext("local", "phoenix-test")
  val sqlContext = new SQLContext(sc)
  val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "INPUT_TABLE",
    "zkUrl" -> "localhost:3888"))
}
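
For completeness, the save direction mentioned in the title would look roughly like the sketch below, using the saveToPhoenix implicit brought in by org.apache.phoenix.spark._ (the OUTPUT_TABLE name and columns are placeholders, not part of my actual code):

// Hypothetical write-back sketch; table and column names are placeholders.
// saveToPhoenix comes from the phoenix-spark implicits on RDD[Product].
sc.parallelize(Seq((1L, "foo"), (2L, "bar")))
  .saveToPhoenix("OUTPUT_TABLE", Seq("ID", "COL1"), zkUrl = Some("localhost:3888"))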

Here is the SBT build file I am using:

name := "spark-to-hbase"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.3.0",
  "org.apache.phoenix" % "phoenix-core" % "4.11.0-HBase-1.3",
  "org.apache.spark" % "spark-core_2.11" % "2.1.1",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1",
  "org.apache.phoenix" % "phoenix-spark" % "4.11.0-HBase-1.3"
)

And here is the exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
    at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.call(ConfigurationFactory.java:49)
    at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.call(ConfigurationFactory.java:46)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
    at org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:91)
    at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
    at org.apache.phoenix.jdbc.PhoenixDriver.initializeConnectionCache(PhoenixDriver.java:151)
    at org.apache.phoenix.jdbc.PhoenixDriver.<init>(PhoenixDriver.java:142)
    at org.apache.phoenix.jdbc.PhoenixDriver.<clinit>(PhoenixDriver.java:69)
    at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:43)
    at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:52)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:965)
    at MainTest2$.delayedEndpoint$MainTest2(MainTest2.scala:9)
    at MainTest2$delayedInit$body.apply(MainTest2.scala:6)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main.apply(App.scala:76)
    at scala.App$$anonfun$main.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at MainTest2$.main(MainTest2.scala:6)
    at MainTest2.main(MainTest2.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 26 more

I have already tried changing HADOOP_CLASSPATH in hadoop-env.sh, as suggested in a previous post.

What can I do to overcome this problem?

I found the solution to my problem. As the exception says, the compiler could not find the class HBaseConfiguration. HBaseConfiguration lives in the org.apache.hadoop.hbase package, so it has to be on the classpath at compile time. I noticed that this class is not shipped in the org.apache.hadoop artifacts as I had assumed. For the HBase 1.3.1 installation on my machine, I finally found it in the hbase-common-1.3.1 jar in my HBASE_HOME/lib folder.
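
A quick way to check which jar actually provides the class, for example from a Scala REPL started with the application classpath, is to ask for its code source (this is plain JDK reflection, not Phoenix-specific):

// Prints the jar that HBaseConfiguration was loaded from, or throws
// ClassNotFoundException if it is missing from the classpath.
println(
  Class.forName("org.apache.hadoop.hbase.HBaseConfiguration")
    .getProtectionDomain.getCodeSource.getLocation
)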

I then added this dependency to my build.sbt:

"org.apache.hbase" % "hbase-common" % "1.3.1"

The exception is gone.
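
For reference, this is roughly what the complete libraryDependencies block looks like with hbase-common added (same versions as above; 1.3.1 is assumed to match the installed HBase version):

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.3.0",
  "org.apache.phoenix" % "phoenix-core" % "4.11.0-HBase-1.3",
  "org.apache.spark" % "spark-core_2.11" % "2.1.1",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1",
  "org.apache.phoenix" % "phoenix-spark" % "4.11.0-HBase-1.3",
  // hbase-common ships org.apache.hadoop.hbase.HBaseConfiguration
  "org.apache.hbase" % "hbase-common" % "1.3.1"
)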