Why do I get the error "Unable to find encoder for type stored in a Dataset" when reading JSON into a case class?

Question

I have written a Spark job:

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
    case class Person2(name: String, age: Long, city: String)

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}

When I run the main function in the IDE, I get two errors:

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

Error:(15, 67) not enough arguments for method as: (implicit evidence: org.apache.spark.sql.Encoder[Person])org.apache.spark.sql.Dataset[Person].
Unspecified value parameter evidence.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

But in the Spark shell I can run this job without any errors. What is the problem?

Answer 1

The error message says that no Encoder could be found for the Person case class.

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.

Move the declaration of the case class outside the scope of SimpleApp.
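A minimal sketch of that change, assuming the Spark 1.6-style API used in the question (SQLContext and read.json); the file path and the field list are copied from the question. A likely reason it helps is that the implicit encoder for case classes needs a TypeTag, which the compiler cannot supply for a class declared inside a method body.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at the top level, outside SimpleApp and its main method.
case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new SQLContext(sc)
    // Brings the implicit encoders for primitives and case classes into scope.
    import ctx.implicits._

    // With Person defined outside main, the compiler can resolve Encoder[Person].
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}
```

Only the location of the case class changes; the rest of the job is the same as in the question.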

Answer 2

You get the same error if you import both sqlContext.implicits._ and spark.implicits._ in SimpleApp (the order does not matter).

Removing one or the other is the solution:

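import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .getOrCreate()

val sqlContext = spark.sqlContext
import sqlContext.implicits._ // sqlContext OR spark implicits, not both
// import spark.implicits._   // sqlContext OR spark implicits, not both

case class Person(age: Long, city: String)
// Read through the session defined above (the original snippet referred to an undefined ctx).
val persons = spark.read.json("/tmp/persons.json").as[Person]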

Tested with Spark 2.1.0.

The funny thing is that if you import the implicits of the same object twice, you will not have any problems.
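The behaviour can be reproduced without Spark. The sketch below assumes Scala 2 name-binding rules; FakeEncoder, FakeImplicits, first, second and ImportDemo are hypothetical stand-ins invented for illustration, not Spark types. Two different objects that expose an implicit under the same name clash when both are wildcard-imported, while importing one object twice still refers to a single member.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Encoder; not Spark's real type.
trait FakeEncoder[T] { def name: String }

// Plays the role of a shared base class behind both implicits objects.
class FakeImplicits {
  implicit def intEncoder: FakeEncoder[Int] =
    new FakeEncoder[Int] { val name = "int-encoder" }
}

// Two distinct objects exposing implicits under the same names.
object first extends FakeImplicits
object second extends FakeImplicits

object ImportDemo {
  // Requires an implicit FakeEncoder[T], much like Dataset.as[T] requires an Encoder[T].
  def describe[T](implicit enc: FakeEncoder[T]): String = enc.name

  def lookup(): String = {
    import first._
    import first._ // same object again: intEncoder still names a single member
    // import second._ // uncommenting this makes intEncoder ambiguous, and describe[Int] stops compiling
    describe[Int]
  }

  def main(args: Array[String]): Unit = println(lookup())
}
```

Keeping a single source of implicits in scope, as the answer suggests, avoids the clash.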

Answer 3

@Milad Khajavi

Define the Person case class outside the SimpleApp object. Also, add import sqlContext.implicits._ inside the main() function.

scala

apache-spark

apache-spark-dataset

apache-spark-encoders