SBT in Apache Spark GraphFrames
I have the following SBT file; I am compiling Scala code that uses Apache Spark GraphFrames and reads a CSV file.
name := "Simple"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1",
"graphframes" % "graphframes" % "0.2.0-spark1.6-s_2.10",
"org.apache.spark" %% "spark-sql" % "1.0.0",
"com.databricks" % "spark-csv" % "1.0.3"
)
Here is my Scala code:
import org.graphframes._
import org.apache.spark.sql.DataFrame
val nodesList = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/Users/Desktop/GraphFrame/NodesList.csv")
val edgesList = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/Users/Desktop/GraphFrame/EdgesList.csv")
val v = nodesList.toDF("id", "name")
val e = edgesList.toDF("src", "dst", "dist")
val g = GraphFrame(v, e)
When I try to build a jar file with SBT, I get the following error during compilation:
[trace] Stack trace suppressed: run last *:update for the full output.
[error] (*:update) sbt.ResolveException: unresolved dependency: graphframes#graphframes;0.2.0-spark1.6-s_2.10: not found
[error] Total time:
GraphFrames are not yet in the Maven Central repository.
You can either:
- download the artifact from the Spark Packages page and install it into your local repository, or
- add the Spark Packages repository to your SBT build.sbt:
Code in build.sbt:
resolvers += Resolver.url("SparkPackages", url("https://dl.bintray.com/spark-packages/maven/"))
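For reference, a minimal sketch of what the whole build.sbt could look like with that resolver added is below. The GraphFrames coordinates are the ones from the question; aligning spark-sql with spark-core 1.6.1 and using %% for spark-csv are my assumptions, not part of the answer.
name := "Simple"
version := "1.0"
scalaVersion := "2.10.5"
// Spark Packages hosts the GraphFrames artifact that Maven Central does not
resolvers += Resolver.url("SparkPackages", url("https://dl.bintray.com/spark-packages/maven/"))
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1",
  "org.apache.spark" %% "spark-sql" % "1.6.1", // assumed: aligned with spark-core instead of 1.0.0
  "graphframes" % "graphframes" % "0.2.0-spark1.6-s_2.10",
  "com.databricks" %% "spark-csv" % "1.0.3" // assumed: %% so the Scala 2.10 artifact is selected
)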
I managed to get it working using the sbt-spark-package plugin.
In project/plugins.sbt, I added:
resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.5")
Then, in build.sbt, I added:
spDependencies += "graphframes/graphframes:0.5.0-spark2.1-s_2.11"
It worked.
Hope it helps.
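For context, a fuller sketch of this setup is below. Only the plugins.sbt lines and the spDependencies line come from the answer above; the Scala version, Spark version, and the sparkVersion/sparkComponents settings are assumptions chosen to match the 0.5.0-spark2.1-s_2.11 artifact.
In project/plugins.sbt:
resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.5")
In build.sbt:
name := "Simple"
scalaVersion := "2.11.8" // assumed: the s_2.11 suffix means a Scala 2.11 build
sparkVersion := "2.1.0" // sbt-spark-package setting; assumed to match spark2.1 in the artifact name
sparkComponents ++= Seq("sql") // pulls in spark-sql as a provided dependency
spDependencies += "graphframes/graphframes:0.5.0-spark2.1-s_2.11"
With spDependencies, the plugin takes care of resolving the package from the Spark Packages repository, so no extra resolvers entry is needed in build.sbt itself.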
For some reason, the Resolver.url approach from Gawęda's answer did not work for me; the following did:
resolvers += "SparkPackages" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "graphframes" % "graphframes" % "0.7.0-spark2.4-s_2.11"