Apache-Zeppelin / Spark：为什么我不能使用此代码示例访问远程数据库

Question

我正在使用 Spark 和 Zeppelin 执行我自己的第一步，但不明白为什么此代码示例不起作用。

第一块：

%dep
z.reset()                                                     // clean up 
z.load("/data/extraJarFiles/postgresql-9.4.1208.jar")         // load a jdbc driver for postgresql

第二块

%spark
// This code loads some data from a PostGreSql DB with the help of a JDBC driver.
// The JDBC driver is stored on the Zeppelin server, the necessary Code is transfered to the Spark Workers and the workers build the connection with the DB.
// 
// The connection between table and data source is "lazy".  So the data will only be loaded in the case that an action need them.
// With the current script means this the DB is queried twice.   ==> Q:  How can I keep a RDD in Mem or on disk?

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.JdbcRDD
import java.sql.Connection
import java.sql.DriverManager
import java.sql.ResultSet

import org.apache.spark.sql.hive._ 
import org.apache.spark.sql._

val url = "jdbc:postgresql://10.222.22.222:5432/myDatabase"
val username = "postgres"
val pw = "geheim"

Class.forName("org.postgresql.Driver").newInstance                             // activating the jdbc driver. The jar file was loaded inside of the %dep block


case class RowClass(Id:Integer, Col1:String , Col2:String)                         // create a class with possible values

val myRDD = new JdbcRDD(sc,                                                    // SparkContext sc
                        () => DriverManager.getConnection(url,username,pw),    // scala.Function0<java.sql.Connection> getConnection
                        "select * from tab1 where \"Id\">=? and \"Id\" <=? ",  // String sql    Important: we need here two '?' for the lower/upper Bounds vlaues
                        0,                                                     // long lowerBound   = start value
                        10000,                                                // long upperBound,  = end value that is still included
                        1,                                                     // int numPartitions  = the area is spitted into x sub commands.   
                                                                               //  e.g. 0,1000,2  => first cmd from 0 ... 499, second cmd from 500..1000
                        row => RowClass(row.getInt("Id"),
                                        row.getString("Col1"),
                                        row.getString("Col2"))
                       )

myRDD.toDF().registerTempTable("Tab1")

// --- improved methode (not working at the moment)----
val prop = new java.util.Properties
prop.setProperty("user",username)
prop.setProperty("password",pw)

val tab1b = sqlContext.read.jdbc(url,"tab1",prop)             // <-- not working

tab1b.show

所以问题是什么

我想连接到外部 PostgreSql 数据库。

块 I 正在为数据库添加必要的 JAR 文件，第二个块的第一行已经在使用 JAR，它能够从数据库中获取一些数据。

但是第一种方式比较难看，因为必须自己将数据转换成table，所以我想在脚本的最后使用更简单的方法。

但我收到错误消息

java.sql.SQLException: No suitable driver found for jdbc:postgresql://10.222.22.222:5432/myDatabase

但与上面的代码相同URL/相同的登录名/相同的密码。为什么这不起作用？

也许有人对我有帮助的提示。

---- 更新：24.3。 12:15 ---

我不认为 JAR 的加载不起作用。我添加了一个额外的 val db = DriverManager.getConnection(url, username, pw); 用于测试。（在异常内部失败的函数）这很好用。

另一个有趣的细节。如果我删除 %dep 块和 class 行，则会在第一个块中产生一个非常相似的错误。相同的错误信息；相同的函数+失败的行号，但函数堆栈有点不同。

我在这里找到了源代码：http://code.metager.de/source/xref/openjdk/jdk8/jdk/src/share/classes/java/sql/DriverManager.java

我的问题在第 689 行。所以如果所有参数都正常，可能来自 isDriverAllowed() 检查？

Answer 1

我在 Zeppelin 中遇到过与依赖项相同的问题，我不得不将我的 jar 添加到 zeepelin 中的 SPARK_SUBMIT_OPTIONS-env.sh 以将它们包含在所有笔记本和段落中

所以在 zeppelin-env.sh 你修改 SPARK_SUBMIT_OPTIONS 为：

export SPARK_SUBMIT_OPTIONS="--jars /data/extraJarFiles/postgresql-9.4.1208.jar

然后你必须重新启动你的 zeppelin 实例。

Answer 2

就我而言，在执行 spark/scala 代码时，我收到了同样的错误。我之前在我的 spark-env.sh conf 文件中设置了 SPARK_CLASSPATH - 它指向一个 jar 文件。我 removed/commented 退出了 spark-env.sh 中的行并重新启动了 zepplin。这消除了错误。

Apache-Zeppelin / Spark：为什么我不能使用此代码示例访问远程数据库

Apache-Zeppelin / Spark : Why can't I access a remote DB with this code sample

java

scalar

scala

apache-spark

apache-zeppelin