Spark / Scala code no longer working in Spark 3.x
This ran fine under 2.x:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Window
import spark.implicits._
// Generate example data via a DataFrame; it could come from files whose ordering is assumed, i.e. no need to sort.
val df = Seq(
  "1 February", "n", "c", "b",
  "2 February", "hh", "www", "e",
  "3 February", "y", "s", "j",
  "1 March", "c", "b", "x",
  "1 March", "c", "b", "x",
  "2 March", "c", "b", "x",
  "3 March", "c", "b", "x", "y", "z"
).toDF("line")
// Define case classes to avoid handling Row in the df --> rdd --> DF round trip, which I always have to look up again.
case class X(line: String)
case class Xtra(key: Long, line: String)
// Add the sequence number using zipWithIndex, then swap to (key, value).
val rdd = df.as[X].rdd.zipWithIndex().map { case (v, k) => (k, v) }
val ds = rdd.toDF("key", "line").as[Xtra]
Under 3.x, the last statement now returns:
AnalysisException: Cannot up cast line from struct<line:string> to string.
The type path of the target object is:
- field (class: "java.lang.String", name: "line")
- root class: "$linecfabb246f6fc445196875da751b278e883.$read.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.Xtra"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object
I find the message hard to understand, and it's also hard to see the reason for the change. I just tested this under 2.4.5 and it works fine.
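What 3.x is objecting to becomes clearer if you print the schema of the intermediate DataFrame (a quick check in the same session; the commented output is roughly what the snippet above yields):

rdd.toDF("key", "line").printSchema()
// root
//  |-- key: long (nullable = false)
//  |-- line: struct (nullable = true)
//  |    |-- line: string (nullable = true)

Because rdd is an RDD[(Long, X)], the second column carries the whole X struct, which .as[Xtra] apparently could still coerce to String in 2.x but not in 3.x.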
Since line is inferred as a struct, you can change your schema (the case classes) slightly:
case class X(line: String)
case class Xtra(key: Long, nested_line: X)
and then get the desired result with:
val ds = rdd.toDF("key", "nested_line").as[Xtra].select("key", "nested_line.line")
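If you would rather keep a flat result type instead of a nested one, another option (a sketch, with a hypothetical flat case class standing in for the original Xtra) is to build the target objects on the RDD side, so no struct column is ever created:

case class XtraFlat(key: Long, line: String)  // hypothetical; same shape as the original Xtra
val ds2 = df.as[X].rdd.zipWithIndex()
  .map { case (x, i) => XtraFlat(i, x.line) }  // unwrap X before converting back
  .toDS()  // Dataset[XtraFlat], via spark.implicits._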