Spark Cloudant error: 'nothing was saved because the number of records was 0!'

I am using the spark-cloudant library 1.6.3 that is installed by default with the Spark service.

I am trying to save some data to Cloudant:

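import java.util.Date

// Keep only the recommendations for user IDs above 6035
val df = getTopXRecommendationsForAllUsers().toDF.filter($"_1" > 6035)

println(s"Saving ${df.count()} ratings to Cloudant: " + new Date())

// show() prints the rows itself and returns Unit, so this println adds a trailing "()"
println(df.show(5))

// Name the target database after the current Unix timestamp (in seconds)
val timestamp: Long = System.currentTimeMillis / 1000
val dbName: String = s"${destDB.database}_${timestamp}"

// Also write the same rows out as JSON files, as a sanity check
df.write.mode("append").json(s"${dbName}.json")

// Configure the Cloudant writer; credentials are set only when provided
val dfWriter = df.write.format("com.cloudant.spark")
dfWriter.option("cloudant.host", destDB.host)
if (destDB.username.isDefined && destDB.username.get.nonEmpty) dfWriter.option("cloudant.username", destDB.username.get)
if (destDB.password.isDefined && destDB.password.get.nonEmpty) dfWriter.option("cloudant.password", destDB.password.get)
dfWriter.save(dbName)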

However, I get the following error:

Starting getTopXRecommendationsForAllUsers: Sat Dec 24 08:50:11 CST 2016
Finished getTopXRecommendationsForAllUsers: Sat Dec 24 08:50:11 CST 2016
Saving 6 ratings to Cloudant: Sat Dec 24 08:50:17 CST 2016
+----+--------------------+
|  _1|                  _2|
+----+--------------------+
|6036|[[6036,2503,4.395...|
|6037|[[6037,572,4.5785...|
|6038|[[6038,1696,4.894...|
|6039|[[6039,572,4.6854...|
|6040|[[6040,670,4.6820...|
+----+--------------------+
only showing top 5 rows

()
Use connectorVersion=1.6.3, dbName=recommendationdb_1482591017, indexName=null, viewName=null,jsonstore.rdd.partitions=5, + jsonstore.rdd.maxInPartition=-1,jsonstore.rdd.minInPartition=10, jsonstore.rdd.requestTimeout=900000,bulkSize=20, schemaSampleSize=1
Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 2 in stage 642.0 failed 10 times, most recent failure: Lost task 2.9 in stage 642.0 (TID 409, yp-spark-dal09-env5-0049): java.lang.RuntimeException: Database recommendationdb_1482591017: nothing was saved because the number of records was 0!
    at com.cloudant.spark.common.JsonStoreDataAccess.saveAll(JsonStoreDataAccess.scala:187)

I know there is data, because I also saved it to a file:

! cat recommendationdb_1482591017.json/*

{"_1":6036,"_2":[{"user":6036,"product":2503,"rating":4.3957030284620355},{"user":6036,"product":2019,"rating":4.351395783537379},{"user":6036,"product":1178,"rating":4.3373212302468165},{"user":6036,"product":923,"rating":4.3328207761734605},{"user":6036,"product":922,"rating":4.320787353937724},{"user":6036,"product":750,"rating":4.307312349612301},{"user":6036,"product":53,"rating":4.304341611330176},{"user":6036,"product":858,"rating":4.297961629128419},{"user":6036,"product":1212,"rating":4.285360675560061},{"user":6036,"product":1423,"rating":4.275255129149407}]}
{"_1":6037,"_2":[{"user":6037,"product":572,"rating":4.578508339835482},{"user":6037,"product":858,"rating":4.247809350206506},{"user":6037,"product":904,"rating":4.1222486445799404},{"user":6037,"product":527,"rating":4.117342524702621},{"user":6037,"product":787,"rating":4.115781026855997},{"user":6037,"product":2503,"rating":4.109861422105844},{"user":6037,"product":1193,"rating":4.088453520710152},{"user":6037,"product":912,"rating":4.085139017248665},{"user":6037,"product":1221,"rating":4.084368219857013},{"user":6037,"product":1207,"rating":4.082536396283374}]}
{"_1":6038,"_2":[{"user":6038,"product":1696,"rating":4.894442132848873},{"user":6038,"product":2998,"rating":4.887752985607918},{"user":6038,"product":2562,"rating":4.740442462948304},{"user":6038,"product":3245,"rating":4.7366090605162094},{"user":6038,"product":2609,"rating":4.736125582066063},{"user":6038,"product":1669,"rating":4.678373819044571},{"user":6038,"product":572,"rating":4.606132758047402},{"user":6038,"product":1493,"rating":4.577140478430046},{"user":6038,"product":745,"rating":4.56568047928448},{"user":6038,"product":213,"rating":4.546054686400765}]}
{"_1":6039,"_2":[{"user":6039,"product":572,"rating":4.685425482619273},{"user":6039,"product":527,"rating":4.291256016077275},{"user":6039,"product":904,"rating":4.27766400846558},{"user":6039,"product":2019,"rating":4.273486883864949},{"user":6039,"product":2905,"rating":4.266371181044469},{"user":6039,"product":912,"rating":4.26006044096224},{"user":6039,"product":1207,"rating":4.259935289367192},{"user":6039,"product":2503,"rating":4.250370780277651},{"user":6039,"product":1148,"rating":4.247288578998062},{"user":6039,"product":745,"rating":4.223697008637559}]}
{"_1":6040,"_2":[{"user":6040,"product":670,"rating":4.682008703927743},{"user":6040,"product":3134,"rating":4.603656534071515},{"user":6040,"product":2503,"rating":4.571906881428182},{"user":6040,"product":3415,"rating":4.523567737705732},{"user":6040,"product":3808,"rating":4.516778146579665},{"user":6040,"product":3245,"rating":4.496176019230939},{"user":6040,"product":53,"rating":4.491020821805015},{"user":6040,"product":668,"rating":4.471757243976877},{"user":6040,"product":3030,"rating":4.464674231353673},{"user":6040,"product":923,"rating":4.446195112198678}]}
{"_1":6042,"_2":[{"user":6042,"product":3389,"rating":3.331488167984286},{"user":6042,"product":572,"rating":3.3312810949271903},{"user":6042,"product":231,"rating":3.2622287749148926},{"user":6042,"product":1439,"rating":3.0988533259613944},{"user":6042,"product":333,"rating":3.0859809743588706},{"user":6042,"product":404,"rating":3.0573976830913203},{"user":6042,"product":216,"rating":3.044620107397873},{"user":6042,"product":408,"rating":3.038302525994588},{"user":6042,"product":2411,"rating":3.0190834747311244},{"user":6042,"product":875,"rating":2.9860048032439095}]}

This is a bug in spark-cloudant 1.6.3 that is fixed in 1.6.4. The pull request is https://github.com/cloudant-labs/spark-cloudant/pull/61
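
The stack trace shows the exception being raised inside a single task (Task 2), even though df.count() reported 6 rows, which suggests that the failing task was handed a partition with no rows. As a rough way to check this, the sketch below counts the rows in each partition of the DataFrame. `partitionSizes` is a hypothetical helper name and is not part of spark-cloudant, and the empty-partition reading is an assumption drawn from the log above, not something confirmed here.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical diagnostic helper (not part of spark-cloudant):
// returns one (partitionIndex, rowCount) pair per partition of the DataFrame.
def partitionSizes(df: DataFrame): Array[(Int, Int)] =
  df.rdd
    .mapPartitionsWithIndex((idx, rows) => Iterator((idx, rows.size)))
    .collect()

// Example use in the notebook, with `df` being the DataFrame from the question:
// partitionSizes(df).foreach { case (idx, n) => println(s"partition $idx: $n rows") }
```

If some partitions do report zero rows and an upgrade is not possible right away, writing from `df.coalesce(1)` might sidestep the error for a small DataFrame like this one, but that is only a guess under the same assumption. Upgrading remains the actual fix.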

The answer is to upgrade to spark-cloudant 1.6.4. If you are trying to do this on the IBM Bluemix Spark service, see this answer: