Spark Dataset from JSON with inner array
I am trying to read JSON into a Dataset (Spark 2.1.1). Unfortunately it does not work, and fails with:
Caused by: java.lang.NullPointerException: Null value appeared in non-
nullable field:
- field (class: "scala.Long", name: "age")
Any idea what I am doing wrong?
case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Long)
val sampleJson = """{"id":"kotek", "pets":[{"name":"miauczek", "age":18}, {"name":"miauczek2", "age":9}]}"""
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().master("local").getOrCreate()
import session.implicits._
val rdd = session.sparkContext.parallelize(Seq(sampleJson))
val ds = session.read.json(rdd).as[Owner].collect()
In general, if a field can be missing or null, use Option:
case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Option[Long])
or a nullable type:
case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: java.lang.Long)
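The difference between the two fixes can be seen without Spark at all. A minimal sketch in plain Scala (no Spark types involved), showing why a null can live in `java.lang.Long` but not in `scala.Long`:

```scala
// scala.Long is a primitive: forcing a null into it throws the same
// NullPointerException the encoder reports above.
val boxed: java.lang.Long = null               // legal: nullable reference type
// val primitive: Long = boxed                 // would throw NullPointerException at unboxing

// Option(...) turns the null into None instead of blowing up.
val age: Option[Long] = Option(boxed).map(_.longValue)
println(age)                                   // prints None
```

This is why `Option[Long]` is the safer of the two: a missing `age` surfaces as `None` at the type level rather than as a runtime exception.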
That said, this one does look like a bug. I tested it in Spark 2.2 and it is resolved there. A quick workaround for 2.1 is to order the case class fields alphabetically by name, so they line up with the schema that `read.json` infers:
case class Owner(id: String, pets: Seq[Pet])
case class Pet(age: java.lang.Long, name: String)