elasticsearch-spark indexing error : Cannot handle type Map within Map using ScalaValueWriter
I am trying to index data into Elasticsearch using elasticsearch-spark-2.1.0 with spark-1.3.1, but I get the following error:
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: Cannot handle type [class scala.collection.immutable.Map$Map3] within type [class scala.collection.immutable.Map$Map4], instance [Map(word -> ..., pos -> ...)] within instance [Map(page_title -> ..., full -> ..., tokens -> [Lscala.collection.immutable.Map;@1efb3e9)] using writer [org.elasticsearch.spark.serialization.ScalaValueWriter@200c86fd]
Here is the code that indexes the Spark RDD.
val spark = new SparkContext(...)
val filesRDD = spark.wholeTextFiles("hdfs://" + source_dir + "/*", 200)
// val sentenceList: RDD[Map[String, Object with Serializable { .. }]]
val sentenceList = filesRDD.flatMap(file => ...)
  .flatMap { page =>
    page.sentences.map { sentence =>
      Map("page_title" -> page.title,
          "full" -> sentence.map(_.word).mkString(" "),
          "tokens" -> sentence.map { t =>
            Map("word" -> t.word, "pos" -> t.pos)
          }.toArray)
    }
  }
EsSpark.saveToEs(sentenceList, ES_RESOURCE)
Why can't I index a Map within a Map, and how can I fix it?
Thanks
Finally solved the problem.
I simply removed the .toArray call inside the Map; it seems the library cannot serialize an Array of Maps.
The resulting map is:
Map("page_title" -> page.title,
    "full" -> sentence.map(_.word).mkString(" "),
    "tokens" -> sentence.map { t =>
      Map("word" -> t.word, "pos" -> t.pos)
    })
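For reference, the fix can be sketched as a minimal self-contained job. This is an illustrative sketch, not the original code: the index name "pages/sentence", the es.nodes value, and the sample token data are assumptions; it assumes an Elasticsearch cluster is reachable and the elasticsearch-spark connector is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark.rdd.EsSpark

object NestedMapIndexing {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("nested-map-indexing")
      .set("es.nodes", "localhost")   // assumed ES location
    val sc = new SparkContext(conf)

    // Nested structures are kept as Scala Seq/Map.
    // Note: no .toArray on "tokens" -- ScalaValueWriter handles
    // Scala collections, but choked on an Array of Maps here.
    val docs = sc.makeRDD(Seq(
      Map(
        "page_title" -> "Example page",
        "full"       -> "Hello world",
        "tokens"     -> Seq(
          Map("word" -> "Hello", "pos" -> "UH"),
          Map("word" -> "world", "pos" -> "NN")
        )
      )
    ))

    // "pages/sentence" is a hypothetical index/type resource.
    EsSpark.saveToEs(docs, "pages/sentence")

    sc.stop()
  }
}
```

Each Map in the RDD becomes one Elasticsearch document, with "tokens" indexed as a nested array of objects.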