com.cloudant.spark data source not found in DSX Notebook
I am trying to follow https://developer.ibm.com/clouddataservices/docs/ibm-data-science-experience/docs/load-and-filter-cloudant-data-with-spark/ to load Cloudant data with Spark. I have a Scala 2.11 notebook with Spark 2.1 (the same happens with Spark 2.0) containing the following code:
// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
  "username" -> "<redacted>",
  "password" -> """<redacted>""",
  "host" -> "<redacted>",
  "port" -> "443",
  "url" -> "<redacted>"
)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val cloudantdata = sqlContext.read.format("com.cloudant.spark").
  option("cloudant.host", credentials("host")).
  option("cloudant.username", credentials("username")).
  option("cloudant.password", credentials("password")).
  load("crimes")
Attempting to execute that cell only ends in the following error:
Name: java.lang.ClassNotFoundException
Message: Failed to find data source: com.cloudant.spark. Please find packages at http://spark.apache.org/third-party-projects.html
StackTrace: at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
... 42 elided
Caused by: java.lang.ClassNotFoundException: com.cloudant.spark.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$$anonfun$apply.apply(DataSource.scala:554)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$$anonfun$apply.apply(DataSource.scala:554)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun.apply(DataSource.scala:554)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun.apply(DataSource.scala:554)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)
How can I overcome this error and connect to my Cloudant database?
Something must have gone wrong that caused the cloudant driver, which is normally present by default in DSX notebooks, to go missing.
Please change to a Python 2 with Spark 2.1 kernel and run a one-time install of the cloudant connector (once per Spark service) so that it becomes available to all Spark 2.0+ kernels:
# Install or upgrade PixieDust in the notebook environment
!pip install --upgrade pixiedust
import pixiedust
# Fetch the spark-cloudant connector by its Maven coordinates and add it to the Spark service
pixiedust.installPackage("cloudant-labs:spark-cloudant:2.0.0-s_2.11")
Restart the kernel once.
Then change the kernel back to your Scala kernel and run your Cloudant connection code (see the sketch below).
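For reference, here is a minimal sketch of what the connection code can look like after the install. It assumes the spark-cloudant 2.0.0-s_2.11 package installed above and the pre-created SparkSession that Spark 2.x notebooks expose as spark; the SQLContext version from the question should work just as well.

// Minimal sketch, assuming the spark-cloudant package installed above.
// Spark 2.x notebooks provide a ready-made SparkSession named `spark`,
// so there is no need to construct a SQLContext by hand.
val cloudantdata = spark.read.format("com.cloudant.spark").
  option("cloudant.host", credentials("host")).
  option("cloudant.username", credentials("username")).
  option("cloudant.password", credentials("password")).
  load("crimes")

// Quick sanity checks that the connector resolved and the database loaded
cloudantdata.printSchema()
println(cloudantdata.count())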
Thanks,
Charles.