Spark 1.5 and datastax-ddc-3.2.1 Cassandra Dependency Jars?
I am using Spark 1.5 and Cassandra 3.2.1. Can anyone specify the exact jars required on the build path to connect to, query, and insert data into Cassandra?
Right now I am using the following jars:
spark-cassandra-connector_2.10-1.5.0-M3.jar
apache-cassandra-clientutil-3.2.1.jar
cassandra-driver-core-3.0.0-beta1-bb1bce4-SNAPSHOT-shaded.jar
spark-assembly-1.5.1-hadoop2.0.0-mr1-cdh4.2.0.jar
guava-18.0.jar
netty-all-4.0.23.Final.jar
With the above jars I am able to connect to Cassandra. I can truncate tables and drop tables, but I cannot insert any data, not even with a simple insert query.
The code is as follows:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;

public class Test {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://blr-lt-203:7077")
                .set("spark.cassandra.connection.host", "blr-lt-203")
                .setAppName("testinsert")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryoserializer.buffer.max", "1024mb");
        JavaSparkContext ctx = new JavaSparkContext(conf);

        // Open a plain driver session through the connector and run a CQL insert.
        CassandraConnector connector = CassandraConnector.apply(ctx.getConf());
        Session session = connector.openSession();
        session.execute("insert into test.table1 (name) values ('abcd')");

        session.close();
        ctx.stop();
    }
}
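For reference, the insert statement assumes a keyspace and table roughly like the following (a hypothetical sketch; the real schema is not shown in the question, and a replication factor of 3 is used here only because LOCAL_QUORUM requiring 2 replicas, as in the exception further down, implies a factor of at least 2):

-- Hypothetical definitions; the actual schema is not part of the question.
CREATE KEYSPACE IF NOT EXISTS test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TABLE IF NOT EXISTS test.table1 (
  name text PRIMARY KEY  -- the insert above only references this column
);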
Here is the log:
16/03/28 21:24:52 INFO BlockManagerMaster: Trying to register BlockManager
16/03/28 21:24:52 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50238 with 944.7 MB RAM,BlockManagerId(driver, localhost, 50238)
16/03/28 21:24:52 INFO BlockManagerMaster: Registered BlockManager
16/03/28 21:24:53 INFO NettyUtil: Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16/03/28 21:24:53 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
16/03/28 21:24:53 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
It just hangs here for a while, then times out with the following exception:
Exception in thread "main" com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_QUORUM (2 required but only 1 alive)
What am I doing wrong?
Please let me know which jars are required, or whether this is a version compatibility problem.
Which versions of Spark (1.5) and Cassandra are the most stable together?
Thanks in advance.
The problem was caused by a conflict between versions of Google's Guava library.
The solution is to shade the Guava library that comes in through the spark-cassandra-connector dependency, which you can do with the maven-shade-plugin.
Here is my pom.xml, which shades the Guava library:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.pc.test</groupId>
  <artifactId>casparktest</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>casparktest</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector_2.10</artifactId>
      <version>1.5.0</version>
    </dependency>
    <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>3.0.0-beta1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <!-- Strip signature files so the merged jar is not rejected as tampered. -->
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <!-- Relocate Guava (com.google) so it cannot clash with the Guava already on Spark's classpath. -->
              <relocations>
                <relocation>
                  <pattern>com.google</pattern>
                  <shadedPattern>com.pointcross.shaded.google</shadedPattern>
                </relocation>
              </relocations>
              <minimizeJar>false</minimizeJar>
              <shadedArtifactAttached>true</shadedArtifactAttached>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
After that, run a Maven build. It will generate a jar containing all the dependencies declared in pom.xml, with the Guava library shaded, which you can then use to submit your Spark job.
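For example (a minimal sketch: the jar name follows from the pom above, since shadedArtifactAttached adds the -shaded classifier, and the master URL and class name are the ones from the question; adjust both for your setup):

mvn clean package
spark-submit --class Test --master spark://blr-lt-203:7077 target/casparktest-0.0.1-SNAPSHOT-shaded.jar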