Resolve spark-avro error: Failed to load class for data source: com.databricks.spark.avro
I am trying to use the spark-avro library to process Avro files. I am using SBT:
build.sbt:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "1.3.0",
  "com.databricks" %% "spark-avro" % "1.0.0")
tester.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import com.databricks.spark.avro._
object tester {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("SimpleApplication").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Creates a DataFrame from a specified file
    val df = sqlContext.load("episodes.avro", "com.databricks.spark.avro")
  }
}
When I run tester in the IntelliJ IDE, I get the following stack trace:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/mapred/FsInput
at com.databricks.spark.avro.AvroRelation.newReader(AvroRelation.scala:111)
at com.databricks.spark.avro.AvroRelation.<init>(AvroRelation.scala:53)
at com.databricks.spark.avro.DefaultSource.createRelation(DefaultSource.scala:41)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
When I run:
$ sbt package
$ ~/spark-1.3.1/bin/spark-submit --class "tester" target/scala-2.10/project_2.10-0.1-SNAPSHOT.jar
I get the following stack trace:
Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.avro
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:194)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:205)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
What can I do to resolve this error? Any help is greatly appreciated. Thanks!
"sbt package" 不会包含您的依赖项,请尝试 sbt-assembly。
I changed the build.sbt file to the following, adding the avro and avro-mapred dependencies (avro-mapred provides the org.apache.avro.mapred.FsInput class from the first stack trace) and a merge strategy for files that appear in more than one dependency jar:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "1.3.0",
  "com.databricks" %% "spark-avro" % "1.0.0",
  "org.apache.avro" % "avro" % "1.7.7",
  "org.apache.avro" % "avro-mapred" % "1.7.7")
assemblyMergeStrategy in assembly := {
  case PathList("org", "slf4j", xs @ _*) => MergeStrategy.first
  case PathList("org", "apache", "spark", xs @ _*) => MergeStrategy.first
  case PathList("com", "esotericsoftware", "minlog", xs @ _*) => MergeStrategy.first
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.first
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case PathList("javax", "xml", "stream", xs @ _*) => MergeStrategy.first
  case PathList("org", "apache", "commons", xs @ _*) => MergeStrategy.first
  case PathList("com", "google", "common", xs @ _*) => MergeStrategy.first
  case "org/apache/hadoop/yarn/factories/package-info.class" => MergeStrategy.first
  case "org/apache/hadoop/yarn/factory/providers/package-info.class" => MergeStrategy.first
  case "org/apache/hadoop/yarn/util/package-info.class" => MergeStrategy.first
  case x if x.startsWith("META-INF") => MergeStrategy.discard
  case x if x.startsWith("plugin.properties") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
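The merge strategy matters because several of these jars ship overlapping files: MergeStrategy.first keeps the first copy of each duplicated class instead of failing the build, while discarding META-INF entries drops signature and manifest files that would otherwise make the merged jar invalid.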
Then I built the jar with:
$ sbt assembly
Now everything works.
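The assembled jar can then be submitted without any extra classpath flags, since spark-avro and its Avro dependencies are bundled inside. A sketch of the submit command; the assembly jar name is an assumption and depends on your project name and sbt-assembly settings:

$ ~/spark-1.3.1/bin/spark-submit --class "tester" target/scala-2.10/project-assembly-0.1-SNAPSHOT.jar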