Can't access sqlite db from Spark

I have the following code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Spark Test")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val data = sqlContext.read.format("jdbc").options(
  Map(
    "url" -> "jdbc:sqlite:/nv/pricing/ix_tri_pi.sqlite3",
    "dbtable" -> "SELECT security_id FROM ix_tri_pi")).load()

data.foreach {
  row => println(row.getInt(0)) // security_id is the first (index 0) column
}

I then try to submit it with:

spark-submit \
  --class "com.novus.analytics.spark.SparkTest" \
  --master "local[4]" \
  /Users/smabie/workspace/analytics/analytics-spark/target/scala-2.10/analytics-spark.jar \
  --conf spark.executer.extraClassPath=sqlite-jdbc-3.8.7.jar \
  --conf spark.driver.extraClassPath=sqlite-jdbc-3.8.7.jar \
  --driver-class-path sqlite-jdbc-3.8.7.jar \
  --jars sqlite-jdbc-3.8.7.jar

But I get the following exception:

Exception in thread "main" java.sql.SQLException: No suitable driver

I'm using Spark version 1.6.1, if that helps. Thanks!

Have you tried explicitly specifying the driver class in the options?

options(
  Map(
    "url" -> "jdbc:sqlite:/nv/pricing/ix_tri_pi.sqlite3",
    "driver" -> "org.sqlite.JDBC", // name the JDBC driver class explicitly
    "dbtable" -> "SELECT security_id FROM ix_tri_pi"))

I ran into a similar problem when trying to load a PostgreSQL table.

Also, the cause may lie in class loading:

The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.

http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting

Try defining your jar as the last parameter of spark-submit.
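Everything after the application jar is treated by spark-submit as arguments to your application's main method, so any flags placed after the jar are silently ignored. A sketch of the reordered command (assuming the driver jar is in the current directory; note also that the property name is spark.executor.extraClassPath, with "executor" spelled out, not "executer"):

spark-submit \
  --class "com.novus.analytics.spark.SparkTest" \
  --master "local[4]" \
  --conf spark.executor.extraClassPath=sqlite-jdbc-3.8.7.jar \
  --conf spark.driver.extraClassPath=sqlite-jdbc-3.8.7.jar \
  --driver-class-path sqlite-jdbc-3.8.7.jar \
  --jars sqlite-jdbc-3.8.7.jar \
  /Users/smabie/workspace/analytics/analytics-spark/target/scala-2.10/analytics-spark.jar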