Zeppelin fails to load a MongoDB collection with the Spark interpreter
I am using Zeppelin 0.8.0, MongoDB 4.0, Spark 2.2.0, the mongo-spark connector 2.2.4, and the MongoDB Java driver 3.8.
sc.version
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.{ReadConfig, WriteConfig}
import com.mongodb.spark.sql._
import org.apache.spark.sql.functions._
import org.bson.Document
import collection.JavaConverters._
import org.apache.zeppelin.display.angular.paragraphscope._
import AngularElem._
val readConfig = ReadConfig(Map("uri" -> "mongodb://127.0.0.1:27017/",
"database" -> "test","collection" -> "Collection_f"))
val zipDf = spark.sparkSession.read.mongo(readConfig).toDF()
Gives:
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.{ReadConfig, WriteConfig}
import com.mongodb.spark.sql._
import org.apache.spark.sql.functions._
import org.bson.Document
import collection.JavaConverters._
import org.apache.zeppelin.display.angular.paragraphscope._
import AngularElem._
readConfig: com.mongodb.spark.config.ReadConfig.Self =
ReadConfig(test,Collection_f,Some(mongodb://127.0.0.1:27017/),1000,
DefaultMongoPartitioner,Map(),15,ReadPreferenceConfig(primary,None),
ReadConcernConfig(None),AggregationConfig(None,None),false,true,250,true,true)
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task
0.0 in stage 0.0 (TID 0, localhost, executor driver):
com.mongodb.MongoCommandException: Command failed with error 16820
(Location16820): 'Sort exceeded memory limit of 104857600 bytes,
but did not opt in to external sorting. Aborting operation. Pass
allowDiskUse:true to opt in.' on server 127.0.0.1:27017. The full
response is { "ok" : 0.0, "errmsg" : "Sort exceeded memory limit of
104857600 bytes, but did not opt in to external sorting. Aborting
operation. Pass allowDiskUse:true to opt in.", "code" : 16820,
"codeName" : "Location16820" }
I think this is a problem that depends on the allowDiskUse variable. Where can I set it to true?
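For what it's worth, I could not find a ReadConfig key in the 2.2.x connector that forwards allowDiskUse, and the failing sort here seems to be issued by the partitioner rather than by my own query. A minimal sketch of one possible workaround, assuming the standard partitioner option names from the connector documentation (untested against this exact setup):

val pagedReadConfig = ReadConfig(Map(
  "uri"         -> "mongodb://127.0.0.1:27017/",
  "database"    -> "test",
  "collection"  -> "Collection_f",
  // Paginate over _id ranges instead of sampling, which may avoid
  // the large server-side sort that trips the 100 MB limit.
  "partitioner" -> "MongoPaginateBySizePartitioner"
))
val pagedDf = spark.sparkSession.read.mongo(pagedReadConfig).toDF()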
Solved by changing to the 2.2.3 connector.
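In case it helps anyone, a minimal sketch of pinning the connector to 2.2.3 in Zeppelin (assuming Scala 2.11 artifacts; run this paragraph before the Spark interpreter starts):

%dep
// Load mongo-spark-connector 2.2.3 instead of 2.2.4 so the Spark
// interpreter picks up the older partitioning behaviour.
z.load("org.mongodb.spark:mongo-spark-connector_2.11:2.2.3")

Alternatively, the same Maven coordinate can go into the Spark interpreter setting spark.jars.packages.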