Upgrading to Spark 2.0 dataframe.map
I'm updating some Spark 1.6 code to 2.0.1 and have run into problems using map.
I've seen other SO questions about this, but I couldn't get those techniques to work, and they seem like overkill for the simple case below.
val df = spark.sqlContext.read.parquet(inputFile)
df: org.apache.spark.sql.DataFrame = [device_id: string, hour: string ... 9 more fields]
val deviceAggDF = df.select("device_id").distinct
deviceAggDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [device_id: string]
deviceAggDF.map( x =>
(
Map("ID" -> x.getAs[String](0)),
Map()
)
)
scala.MatchError: Nothing (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:667)
at org.apache.spark.sql.catalyst.ScalaReflection$.toCatalystArray(ScalaReflection.scala:448)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:482)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun.apply(ScalaReflection.scala:592)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun.apply(ScalaReflection.scala:583)
at scala.collection.TraversableLike$$anonfun$flatMap.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
To return an empty Map you should specify a type that can be encoded, for example:
deviceAggDF.map( x =>
(
Map("ID" -> x.getAs[String](0)),
Map[String, String]()
)
)
Map() is Map[Nothing,Nothing], which cannot be used with a Dataset.
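The inference behind the MatchError can be seen in plain Scala, without Spark: an empty Map() literal with no expected type is typed as Map[Nothing, Nothing], and Spark's encoder machinery cannot derive a Catalyst schema for Nothing. A minimal sketch (the object and helper names are illustrative, not part of any API):

```scala
import scala.reflect.ClassTag

// Illustrates the inference behind the error above: with no expected
// type, an empty Map() literal is typed Map[Nothing, Nothing].
object EmptyMapInference {
  // Report the inferred key type of a Map via its ClassTag.
  def keyType[K: ClassTag, V](m: Map[K, V]): String =
    implicitly[ClassTag[K]].toString

  def main(args: Array[String]): Unit = {
    println(keyType(Map()))                 // key type inferred as Nothing
    println(keyType(Map[String, String]())) // explicit, encodable key type
  }
}
```

This is why adding the explicit `Map[String, String]()` type annotation is enough to fix the `deviceAggDF.map` call: the tuple's element types become concrete, so the implicit product encoder can be derived.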