Why does spark-submit fail with "Failed to load class for data source: org.apache.spark.sql.cassandra" with the Cassandra connector in --jars?
Spark version: 1.4.1
Cassandra version: 2.1.8
DataStax Cassandra connector: 1.4.2-SNAPSHOT.jar
Command I ran:
./spark-submit \
  --jars /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar \
  --driver-class-path /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar \
  --jars /usr/local/lib/spark-1.4.1/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.4.1.jar \
  --jars /usr/local/lib/spark-1.4.1/external/kafka-assembly/target/scala-2.10/spark-streaming-kafka-assembly_2.10-1.4.1.jar \
  --driver-class-path /usr/local/lib/spark-1.4.1/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.4.1.jar \
  --driver-class-path /usr/local/lib/spark-1.4.1/external/kafka-assembly/target/scala-2.10/spark-streaming-kafka-assembly_2.10-1.4.1.jar \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.4.1 \
  --executor-memory 6g \
  --executor-cores 6 \
  --master local[4] \
  kafka_streaming.py
Below is the error I get:
Py4JJavaError: An error occurred while calling o169.save.
: java.lang.RuntimeException: Failed to load class for data source: org.apache.spark.sql.cassandra
I must be doing something silly here. Any help would be much appreciated.
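For context, the kind of PySpark call that produces this traceback is a DataFrame save through the Cassandra data source. A minimal sketch (the keyspace/table names and the DataFrame contents are hypothetical, and it needs a running Spark 1.4 install plus the connector jar to actually succeed):

```python
# Sketch of the save that fails when the Cassandra connector jar is not
# on the classpath. Keyspace/table names here are made up for illustration.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[4]", "cassandra-demo")
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# This is the o169.save call from the traceback: Spark SQL tries to
# resolve the data source class "org.apache.spark.sql.cassandra" and
# raises RuntimeException if the connector jar was never shipped.
df.write.format("org.apache.spark.sql.cassandra") \
    .options(keyspace="test_ks", table="test_table") \
    .save()
```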
Try providing all the jars in a single --jars option, comma-separated:
--jars yourFirstJar.jar,yourSecondJar.jar
A more convenient solution for development is to pull the jars from Maven Central instead, again comma-separated:
--packages org.apache.spark:spark-streaming-kafka_2.10:1.4.1,com.datastax.spark:spark-cassandra-connector_2.10:1.4.1
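Putting both suggestions together, the original command could be collapsed into something like the following. This is a sketch: the local jar paths are the ones from the question, and you would normally pick one mechanism per dependency (either a local jar via --jars or a Maven coordinate via --packages, not both):

```shell
./spark-submit \
  --master local[4] \
  --executor-memory 6g \
  --executor-cores 6 \
  --jars /usr/local/src/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.4.2-SNAPSHOT.jar,/usr/local/lib/spark-1.4.1/external/kafka-assembly/target/scala-2.10/spark-streaming-kafka-assembly_2.10-1.4.1.jar \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.4.1 \
  kafka_streaming.py
```

Note that repeating --jars (or --driver-class-path) on the command line means only the last occurrence takes effect, which is why the Cassandra connector jar in the first --jars never reached the classpath.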