How to use Delta Lake in a regular Scala project in an IDE
I have added the Delta dependency in build.sbt:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  // logging
  "org.apache.logging.log4j" % "log4j-api" % "2.4.1",
  "org.apache.logging.log4j" % "log4j-core" % "2.4.1",
  // postgres for DB connectivity
  "org.postgresql" % "postgresql" % postgresVersion,
  "io.delta" %% "delta-core" % "0.7.0"
)
However, I don't know what configuration the Spark session must include. The code below fails:
val spark = SparkSession.builder()
  .appName("Spark SQL Practice")
  .config("spark.master", "local")
  .config("spark.network.timeout", "10000000s") // to avoid heartbeat exceptions
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()
Exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable
You need to upgrade Apache Spark. The MergeIntoTable feature was introduced in version 3.0.0. Link sources: AstBuilder.scala, Analyzer.scala, the GitHub pull request, and the release notes (check the feature enhancements section).
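For context, MergeIntoTable is the logical plan that Delta Lake 0.7.0 builds its MERGE support on, which is why the class must be present at runtime. Once you are on Spark 3, a merge through Delta's Scala API looks roughly like this (a sketch; the table path, column names, and data are made up):

import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("merge demo")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()
import spark.implicits._

// upsert some (made-up) rows into an existing Delta table
val updates = Seq((1, "updated"), (4, "new")).toDF("id", "value")

DeltaTable.forPath(spark, "/tmp/delta-events").as("t")
  .merge(updates.as("s"), "t.id = s.id")
  .whenMatched().updateAll()     // update rows whose id already exists
  .whenNotMatched().insertAll()  // insert rows that do not
  .execute()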
Here is an example project I made that should help you.
The build.sbt file should contain these dependencies:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
libraryDependencies += "io.delta" %% "delta-core" % "0.7.0" % "provided"
I think you need to use Spark 3 for Delta Lake 0.7.0.
You don't need any special SparkSession configuration options; something like this should work fine:
lazy val spark: SparkSession = {
  SparkSession
    .builder()
    .master("local")
    .appName("spark session")
    // allow vacuum to run with a retention period under 7 days (handy in tests)
    .config("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    .getOrCreate()
}
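As a quick sanity check that the setup works end to end, here is a minimal sketch (the path is an arbitrary local directory) that writes a small Delta table with that session and reads it back:

import spark.implicits._

// write a tiny DataFrame in Delta format, then read it back
val path = "/tmp/delta-sanity-check" // arbitrary local path
Seq((1, "a"), (2, "b")).toDF("id", "value")
  .write.format("delta").mode("overwrite").save(path)

spark.read.format("delta").load(path).show()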
This happens when a class your code depends on existed at compile time but cannot be found at runtime. Look for differences between your build-time and runtime classpaths.
More specifically for your scenario:
If you get a java.lang.NoClassDefFoundError for
org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable,
the Spark JARs on your runtime classpath are from a version that does not include the MergeIntoTable class. The solution is to move to a recent Apache Spark version (3.0.0 or later), which ships with org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable.scala.
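A quick way to see what is actually on the runtime classpath is to probe it from inside the running application (a sketch using standard Spark and JVM APIs):

// Spark version loaded at runtime; should print 3.0.0 or later
println(org.apache.spark.SPARK_VERSION)

// throws ClassNotFoundException on Spark 2.x, where this class does not exist
Class.forName("org.apache.spark.sql.catalyst.plans.logical.MergeIntoTable")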
More information on the Spark 3.x.x upgrade and release: https://github.com/apache/spark/pull/26167.