How to read Kudu through Impala in Spark2
I want to read a Kudu table in spark2-shell through Impala, but I have failed in many ways :(
I launch spark2-shell with:
spark2-shell --jars commons-codec-1.3.jar,hive_metastore.jar,httpclient-4.1.3.jar,ImpalaJDBC41.jar,libthrift-0.9.0.jar,ql.jar,slf4j-log4j12-1.5.11.jar,zookeeper-3.4.6.jar,commons-logging-1.1.1.jar,hive_service.jar,httpcore-4.1.3.jar,libfb303-0.9.0.jar,log4j-1.2.14.jar,slf4j-api-1.5.11.jar,TCLIServiceClient.jar
My code:
spark.read.format("jdbc")
  .option("driver", "com.cloudera.impala.jdbc41.Driver")
  .option("url", "jdbc:impala:Domainname")
  // .option("databaseName", "default") (also tried with impala::default)
  .option("dbtable", "impala::default.tablename")
  .load()
By the way, I got the name "impala::default.tablename" from the output of desc formatted tablename.
Output:
java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
... 48 elided
It is better to connect Spark to Kudu directly than to go through Impala. The documentation is here: https://blog.cloudera.com/blog/2017/02/up-and-running-with-apache-spark-on-apache-kudu/
Going through Kudu directly offers performance competitive with the Impala/Spark SQL route.
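Following that documentation, a minimal sketch of a direct kudu-spark read in spark2-shell. The master address kudu-master-host:7051 and the connector version are placeholders; Impala-created Kudu tables keep the impala:: prefix, so the table name from desc formatted can be reused here:

```scala
// Launch the shell with the Kudu connector instead of the Impala JDBC jars, e.g.:
//   spark2-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0
// (artifact version is an assumption; match it to your cluster's Kudu release)

val df = spark.read
  .format("org.apache.kudu.spark.kudu")
  .option("kudu.master", "kudu-master-host:7051")        // placeholder: your Kudu master RPC address
  .option("kudu.table", "impala::default.tablename")     // name as reported by `desc formatted`
  .load()

df.show()
```

This reads the table via the Kudu client directly, so no Impala daemon or JDBC driver is involved at all.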