Installing the Cassandra Spark connector

Following the instructions at

https://github.com/datastax/spark-cassandra-connector
http://spark-packages.org/package/datastax/spark-cassandra-connector

I ran the command below, but there appear to be errors at the end of the output. Are these fatal, or do I need to resolve them?

[idf@node1 bin]$ spark-shell --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11
Ivy Default Cache set to: /home/idf/.ivy2/cache
The jars for the packages stored in: /home/idf/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found datastax#spark-cassandra-connector;1.6.0-M1-s_2.11 in spark-packages
        found org.apache.cassandra#cassandra-clientutil;3.0.2 in central
        found com.datastax.cassandra#cassandra-driver-core;3.0.0 in central
        found io.netty#netty-handler;4.0.33.Final in central
        found io.netty#netty-buffer;4.0.33.Final in central
        found io.netty#netty-common;4.0.33.Final in central
        found io.netty#netty-transport;4.0.33.Final in central
        found io.netty#netty-codec;4.0.33.Final in central
        found io.dropwizard.metrics#metrics-core;3.1.2 in list
        found org.slf4j#slf4j-api;1.7.7 in central
        found org.apache.commons#commons-lang3;3.3.2 in list
        found com.google.guava#guava;16.0.1 in central
        found org.joda#joda-convert;1.2 in central
        found joda-time#joda-time;2.3 in central
        found com.twitter#jsr166e;1.1.0 in central
        found org.scala-lang#scala-reflect;2.11.7 in list
        [2.11.7] org.scala-lang#scala-reflect;2.11.7
downloading http://dl.bintray.com/spark-packages/maven/datastax/spark-cassandra-connector/1.6.0-M1-s_2.11/spark-cassandra-connector-1.6.0-M1-s_2.11.jar ...
        [SUCCESSFUL ] datastax#spark-cassandra-connector;1.6.0-M1-s_2.11!spark-cassandra-connector.jar (2430ms)
downloading https://repo1.maven.org/maven2/org/apache/cassandra/cassandra-clientutil/3.0.2/cassandra-clientutil-3.0.2.jar ...
        [SUCCESSFUL ] org.apache.cassandra#cassandra-clientutil;3.0.2!cassandra-clientutil.jar (195ms)
downloading https://repo1.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.0.0/cassandra-driver-core-3.0.0.jar ...
        [SUCCESSFUL ] com.datastax.cassandra#cassandra-driver-core;3.0.0!cassandra-driver-core.jar(bundle) (874ms)
downloading https://repo1.maven.org/maven2/com/google/guava/guava/16.0.1/guava-16.0.1.jar ...
        [SUCCESSFUL ] com.google.guava#guava;16.0.1!guava.jar(bundle) (1930ms)
downloading https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.jar ...
        [SUCCESSFUL ] org.joda#joda-convert;1.2!joda-convert.jar (68ms)
downloading https://repo1.maven.org/maven2/joda-time/joda-time/2.3/joda-time-2.3.jar ...
        [SUCCESSFUL ] joda-time#joda-time;2.3!joda-time.jar (524ms)
downloading https://repo1.maven.org/maven2/com/twitter/jsr166e/1.1.0/jsr166e-1.1.0.jar ...
        [SUCCESSFUL ] com.twitter#jsr166e;1.1.0!jsr166e.jar (138ms)
downloading https://repo1.maven.org/maven2/io/netty/netty-handler/4.0.33.Final/netty-handler-4.0.33.Final.jar ...
        [SUCCESSFUL ] io.netty#netty-handler;4.0.33.Final!netty-handler.jar (266ms)
downloading https://repo1.maven.org/maven2/io/netty/netty-buffer/4.0.33.Final/netty-buffer-4.0.33.Final.jar ...
        [SUCCESSFUL ] io.netty#netty-buffer;4.0.33.Final!netty-buffer.jar (202ms)
downloading https://repo1.maven.org/maven2/io/netty/netty-transport/4.0.33.Final/netty-transport-4.0.33.Final.jar ...
        [SUCCESSFUL ] io.netty#netty-transport;4.0.33.Final!netty-transport.jar (330ms)
downloading https://repo1.maven.org/maven2/io/netty/netty-codec/4.0.33.Final/netty-codec-4.0.33.Final.jar ...
        [SUCCESSFUL ] io.netty#netty-codec;4.0.33.Final!netty-codec.jar (157ms)
downloading https://repo1.maven.org/maven2/io/netty/netty-common/4.0.33.Final/netty-common-4.0.33.Final.jar ...
        [SUCCESSFUL ] io.netty#netty-common;4.0.33.Final!netty-common.jar (409ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.7/slf4j-api-1.7.7.jar ...
        [SUCCESSFUL ] org.slf4j#slf4j-api;1.7.7!slf4j-api.jar (57ms)
:: resolution report :: resolve 5827ms :: artifacts dl 7749ms
        :: modules in use:
        com.datastax.cassandra#cassandra-driver-core;3.0.0 from central in [default]
        com.google.guava#guava;16.0.1 from central in [default]
        com.twitter#jsr166e;1.1.0 from central in [default]
        datastax#spark-cassandra-connector;1.6.0-M1-s_2.11 from spark-packages in [default]
        io.dropwizard.metrics#metrics-core;3.1.2 from list in [default]
        io.netty#netty-buffer;4.0.33.Final from central in [default]
        io.netty#netty-codec;4.0.33.Final from central in [default]
        io.netty#netty-common;4.0.33.Final from central in [default]
        io.netty#netty-handler;4.0.33.Final from central in [default]
        io.netty#netty-transport;4.0.33.Final from central in [default]
        joda-time#joda-time;2.3 from central in [default]
        org.apache.cassandra#cassandra-clientutil;3.0.2 from central in [default]
        org.apache.commons#commons-lang3;3.3.2 from list in [default]
        org.joda#joda-convert;1.2 from central in [default]
        org.scala-lang#scala-reflect;2.11.7 from list in [default]
        org.slf4j#slf4j-api;1.7.7 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   16  |   13  |   13  |   0   ||   16  |   13  |
        ---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
        unknown resolver sbt-chain

        unknown resolver null

        unknown resolver sbt-chain

        unknown resolver null

        unknown resolver sbt-chain

        unknown resolver null

        unknown resolver sbt-chain

        unknown resolver null

        unknown resolver null

        unknown resolver sbt-chain

        unknown resolver null


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        16 artifacts copied, 0 already retrieved (12730kB/549ms)
16/04/08 14:48:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
error: bad symbolic reference. A signature in package.class refers to type compileTimeOnly
in package scala.annotation which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling package.class.
<console>:14: error: Reference to value sc should not have survived past type checking,
it should have been processed and eliminated during expansion of an enclosing macro.
                @transient val sc = {
                               ^
<console>:15: error: Reference to method createSQLContext in class SparkILoop should not have survived past type checking,
it should have been processed and eliminated during expansion of an enclosing macro.
                  val _sqlContext = org.apache.spark.repl.Main.interp.createSQLContext()
                                                                      ^
<console>:14: error: Reference to value sqlContext should not have survived past type checking,
it should have been processed and eliminated during expansion of an enclosing macro.
                @transient val sqlContext = {
                               ^
<console>:16: error: not found: value sqlContext
         import sqlContext.implicits._
                ^
<console>:16: error: not found: value sqlContext
         import sqlContext.sql
                ^

scala>

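Edit 1

After selecting the correct Scala version it seems to get further, but I am not sure whether the output below still contains errors that need to be resolved: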

[idf@node1 bin]$ spark-shell --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
Ivy Default Cache set to: /home/idf/.ivy2/cache
The jars for the packages stored in: /home/idf/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found datastax#spark-cassandra-connector;1.6.0-M1-s_2.10 in spark-packages
        found org.apache.cassandra#cassandra-clientutil;3.0.2 in central
        found com.datastax.cassandra#cassandra-driver-core;3.0.0 in central
        found io.netty#netty-handler;4.0.33.Final in central
        found io.netty#netty-buffer;4.0.33.Final in central
        found io.netty#netty-common;4.0.33.Final in central
        found io.netty#netty-transport;4.0.33.Final in central
        found io.netty#netty-codec;4.0.33.Final in central
        found io.dropwizard.metrics#metrics-core;3.1.2 in list
        found org.slf4j#slf4j-api;1.7.7 in central
        found org.apache.commons#commons-lang3;3.3.2 in list
        found com.google.guava#guava;16.0.1 in central
        found org.joda#joda-convert;1.2 in central
        found joda-time#joda-time;2.3 in central
        found com.twitter#jsr166e;1.1.0 in central
        found org.scala-lang#scala-reflect;2.10.5 in list
downloading http://dl.bintray.com/spark-packages/maven/datastax/spark-cassandra-connector/1.6.0-M1-s_2.10/spark-cassandra-connector-1.6.0-M1-s_2.10.jar ...
        [SUCCESSFUL ] datastax#spark-cassandra-connector;1.6.0-M1-s_2.10!spark-cassandra-connector.jar (2414ms)
:: resolution report :: resolve 3281ms :: artifacts dl 2430ms
        :: modules in use:
        com.datastax.cassandra#cassandra-driver-core;3.0.0 from central in [default]
        com.google.guava#guava;16.0.1 from central in [default]
        com.twitter#jsr166e;1.1.0 from central in [default]
        datastax#spark-cassandra-connector;1.6.0-M1-s_2.10 from spark-packages in [default]
        io.dropwizard.metrics#metrics-core;3.1.2 from list in [default]
        io.netty#netty-buffer;4.0.33.Final from central in [default]
        io.netty#netty-codec;4.0.33.Final from central in [default]
        io.netty#netty-common;4.0.33.Final from central in [default]
        io.netty#netty-handler;4.0.33.Final from central in [default]
        io.netty#netty-transport;4.0.33.Final from central in [default]
        joda-time#joda-time;2.3 from central in [default]
        org.apache.cassandra#cassandra-clientutil;3.0.2 from central in [default]
        org.apache.commons#commons-lang3;3.3.2 from list in [default]
        org.joda#joda-convert;1.2 from central in [default]
        org.scala-lang#scala-reflect;2.10.5 from list in [default]
        org.slf4j#slf4j-api;1.7.7 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   16  |   6   |   6   |   0   ||   16  |   1   |
        ---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
        unknown resolver null

        unknown resolver sbt-chain

        unknown resolver null


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        2 artifacts copied, 14 already retrieved (5453kB/69ms)
16/04/08 15:50:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/04/08 15:50:28 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
16/04/08 15:50:28 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
16/04/08 15:50:28 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
16/04/08 15:50:45 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/04/08 15:50:45 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/04/08 15:50:49 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
16/04/08 15:50:49 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
16/04/08 15:50:49 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
16/04/08 15:51:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/04/08 15:51:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

scala>

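You chose the Scala 2.11 build of the artifact (s_2.11). Your Spark was most likely built with Scala 2.10 (the shell banner reports "Using Scala version 2.10.5"), so use the s_2.10 artifact instead: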

spark-shell --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
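To avoid picking the suffix by hand, the Scala version can be read off the banner line that spark-shell prints. The following is only a minimal sketch of that idea: the helper names `scala_binary_version` and `connector_coordinate` are hypothetical, and the `-s_<scala version>` suffix pattern is assumed from the two coordinates shown in this thread rather than from any documented rule.

```python
import re


def scala_binary_version(banner_line: str) -> str:
    """Return the Scala binary version (e.g. '2.10') found in a spark-shell banner line."""
    match = re.search(r"Using Scala version (\d+)\.(\d+)", banner_line)
    if match is None:
        raise ValueError("no Scala version found in banner line")
    return f"{match.group(1)}.{match.group(2)}"


def connector_coordinate(scala_binary: str, connector_version: str = "1.6.0-M1") -> str:
    """Build a spark-packages coordinate using the '-s_<scala>' suffix seen in this thread."""
    return f"datastax:spark-cassandra-connector:{connector_version}-s_{scala_binary}"


# Banner line copied from the spark-shell output in the question.
banner = "Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)"
coordinate = connector_coordinate(scala_binary_version(banner))
print(f"spark-shell --packages {coordinate}")
```

With the banner from the question, this prints the same command as above.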

I ran into the same problem when using the package com.databricks:spark-redshift_2.11:2.0.1. My command was

pyspark --packages com.databricks:spark-redshift_2.11:2.0.1

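I found that the most common cause of the "unknown resolver null" and "unknown resolver sbt-chain" problem is a mismatch between your Spark version, your Scala version, and your package version. So what you need to do is find the package version that matches.

For my package, spark-redshift:

Scala 2.10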

groupId: com.databricks
artifactId: spark-redshift_2.10
version: 2.0.1

Scala 2.11

groupId: com.databricks
artifactId: spark-redshift_2.11
version: 2.0.1
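The groupId, artifactId and version listed above combine into the `groupId:artifactId:version` string passed to `--packages`. A small sketch of building that string and checking its Scala suffix before launching might look like this; the function names are hypothetical, and the check relies only on the `_<scala version>` artifactId suffix shown above.

```python
def maven_coordinate(group_id: str, artifact_base: str, scala_binary: str, version: str) -> str:
    """Join the parts into 'groupId:artifactId_<scala>:version'."""
    return f"{group_id}:{artifact_base}_{scala_binary}:{version}"


def coordinate_matches_scala(coordinate: str, scala_binary: str) -> bool:
    """Return True if the artifactId in the coordinate ends with '_<scala_binary>'."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError("expected a coordinate of the form groupId:artifactId:version")
    artifact_id = parts[1]
    return artifact_id.endswith(f"_{scala_binary}")


# Values taken from the listing above; "2.10" stands for a Spark built with Scala 2.10.
spark_scala = "2.10"
wanted = maven_coordinate("com.databricks", "spark-redshift", spark_scala, "2.0.1")
print(f"pyspark --packages {wanted}")

# The coordinate from the failing command does not match a Scala 2.10 Spark.
print(coordinate_matches_scala("com.databricks:spark-redshift_2.11:2.0.1", spark_scala))
```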