Error in Spark Cassandra integration with spark-cassandra-connector
I am trying to save data to Cassandra from Spark in standalone mode, by running the following command:
bin/spark-submit --packages datastax:spark-cassandra-connector:1.6.0-s_2.10 \
  --class "pl.japila.spark.SparkMeApp" --master local /home/hduser2/code14/target/scala-2.10/simple-project_2.10-1.0.jar
My build.sbt file is:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.0-s_2.10"
libraryDependencies ++= Seq(
  "org.apache.cassandra" % "cassandra-thrift" % "3.5",
  "org.apache.cassandra" % "cassandra-clientutil" % "3.5",
  "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0"
)
My Spark code is:
package pl.japila.spark

import org.apache.spark.sql._
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.driver.core._
import com.datastax.driver.core.QueryOptions._
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import com.datastax.spark.connector.rdd._

object SparkMeApp {
  def main(args: Array[String]) {
    // Point the connector at the local Cassandra node
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("local", "test", conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val rdd = sc.cassandraTable("test", "kv")
    // Write two (key, value) rows into test.kv
    val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
    collection.saveToCassandra("test", "kv", SomeColumns("key", "value"))
  }
}
I get this error:
Exception in thread "main" java.lang.NoSuchMethodError: com.datastax.driver.core.QueryOptions.setRefreshNodeIntervalMillis(I)Lcom/datastax/driver/core/QueryOptions;
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:49)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:92)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:153)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
Versions used:
Spark - 1.6.0
Scala - 2.10.4
cassandra-driver-core jar - 3.0.0
Cassandra version - 2.2.7
spark-cassandra-connector - 1.6.0-s_2.10
Can someone please help?
I would start by removing
libraryDependencies ++= Seq(
"org.apache.cassandra" % "cassandra-thrift" % "3.5" ,
"org.apache.cassandra" % "cassandra-clientutil" % "3.5",
"com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0"
)
since the libraries that are dependencies of the connector will be included automatically by the packages dependency.
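As a rough sketch, the build.sbt from the question might look like this once those three explicit entries are removed. Nothing new is introduced here; every line is copied from the question, and whether this alone resolves the error is an assumption that still needs to be checked against the resolution output.

```scala
// build.sbt (sketch): the question's build with the explicit Cassandra
// artifacts removed, so the connector's own transitive dependencies are used.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0"

// Repository URL taken verbatim from the question
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"

// The connector declares the driver version it was built against
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.0-s_2.10"
```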
Then I would test the package resolution by launching the spark-shell with
./bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
where you should see the following resolutions happening correctly
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found datastax#spark-cassandra-connector;1.6.0-s_2.10 in spark-packages
found org.apache.cassandra#cassandra-clientutil;3.0.2 in list
found com.datastax.cassandra#cassandra-driver-core;3.0.0 in list
...
[2.10.5] org.scala-lang#scala-reflect;2.10.5
:: resolution report :: resolve 627ms :: artifacts dl 10ms
:: modules in use:
com.datastax.cassandra#cassandra-driver-core;3.0.0 from list in [default]
com.google.guava#guava;16.0.1 from list in [default]
com.twitter#jsr166e;1.1.0 from list in [default]
datastax#spark-cassandra-connector;1.6.0-s_2.10 from spark-packages in [default]
...
If these appear to resolve correctly but everything still doesn't work, I would try clearing out the cache for these artifacts.
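A NoSuchMethodError like the one in the question usually means the class found at runtime comes from a different version of the library than the one the caller was compiled against. Before clearing caches, it can help to see which jar the driver class is actually loaded from. The snippet below is a minimal sketch for pasting into spark-shell (using :paste); the DriverCheck object and its two helpers are hypothetical names made up for this illustration and are not part of Spark or the connector. It uses only standard JVM reflection.

```scala
import scala.util.Try

// Hypothetical helper for illustration only; not part of any library.
object DriverCheck {
  /** Location (jar or directory) the class was loaded from, if it can be determined. */
  def sourceOf(className: String): Option[String] =
    for {
      cls      <- Try(Class.forName(className)).toOption
      source   <- Option(cls.getProtectionDomain.getCodeSource) // null for JDK classes
      location <- Option(source.getLocation)
    } yield location.toString

  /** True if the class is on the classpath and has a public method with this name. */
  def hasMethod(className: String, methodName: String): Boolean =
    Try(Class.forName(className)).toOption
      .exists(_.getMethods.exists(_.getName == methodName))
}

// Class and method names taken from the stack trace in the question
val queryOptions = "com.datastax.driver.core.QueryOptions"
println(DriverCheck.sourceOf(queryOptions))
println(DriverCheck.hasMethod(queryOptions, "setRefreshNodeIntervalMillis"))
```

If the printed location is not the cassandra-driver-core 3.0.0 jar listed in the resolution report, or the method check prints false, an older driver jar is probably shadowing the one the connector expects.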