Converting a List of Lists to a DataFrame

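I am reading data (shown below) into a list of lists, and I want to convert it into a DataFrame with seven columns. The error I get is: requirement failed: number of columns doesn't match. Old column names (1): value, new column names (7): <list of columns>

What am I doing wrong, and how can I fix it?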

Data:

Column1, Column2, Column3, Column4, Column5, Column6, Column7
a,b,c,d,e,f,g
a2,b2,c2,d2,e2,f2,g2

Code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._
val erResponse = response.body.toString.split("\\n")
val header = erResponse(0)
val body = erResponse.drop(1).map(x => x.split(",").toList).toList
val erDf = body.toDF()
erDf.show()

You get this "number of columns doesn't match" error because your erDf DataFrame contains only one column, named value, and each row of that column holds an array:

+----------------------------+
|value                       |
+----------------------------+
|[a, b, c, d, e, f, g]       |
|[a2, b2, c2, d2, e2, f2, g2]|
+----------------------------+

You cannot map this single column onto the seven column names contained in header.
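To confirm this, you can inspect the schema Spark produces for body.toDF(). The sketch below is written as a script (for example, pasted into spark-shell) and assumes Spark 2.2 or later on the classpath. It uses the two sample rows inline instead of response, which is not defined in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.ArrayType

val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._

// Hypothetical inline data standing in for the parsed HTTP response body
val body: List[List[String]] = List(
  List("a", "b", "c", "d", "e", "f", "g"),
  List("a2", "b2", "c2", "d2", "e2", "f2", "g2")
)

// Each List[String] is encoded as ONE array value, so toDF() yields a single
// column named "value" whose type is an array of strings
val erDf = body.toDF()
erDf.printSchema()
```

Because the DataFrame has exactly one column, passing seven names to toDF fails the column-count check.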

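The solution is to start from this single-column erDf DataFrame and iterate over your list of header names, building each column one by one from the array element at the matching index. Your full code therefore becomes:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._
val erResponse = response.body.toString.split("\\n")
val header = erResponse(0).split(", ") // build header columns list
val body = erResponse.drop(1).map(x => x.split(",").toList).toList
val erDf = header
  .zipWithIndex // pair each column name with its position in the array
  .foldLeft(body.toDF())((acc, elem) => acc.withColumn(elem._1, col("value")(elem._2))) // add column elem._1 = value[elem._2]
  .drop("value") // remove the original array column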

This gives you the following erDf DataFrame:

+-------+-------+-------+-------+-------+-------+-------+
|Column1|Column2|Column3|Column4|Column5|Column6|Column7|
+-------+-------+-------+-------+-------+-------+-------+
|      a|      b|      c|      d|      e|      f|      g|
|     a2|     b2|     c2|     d2|     e2|     f2|     g2|
+-------+-------+-------+-------+-------+-------+-------+
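As an alternative to the foldLeft/withColumn loop, the same columns can be built in a single select. The Spark API docs for Dataset.withColumn warn that calling it repeatedly (for instance in a loop) adds one projection per call and can produce large query plans, and they suggest select with multiple columns instead. The sketch below is a script that assumes Spark 2.2 or later. responseText is a hypothetical stand-in for response.body.toString. The sketch trims every field and uses require to reject rows whose width differs from the header's, because indexing past the end of an array returns null or throws, depending on Spark's ANSI mode.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for response.body.toString
val responseText =
  """Column1, Column2, Column3, Column4, Column5, Column6, Column7
    |a,b,c,d,e,f,g
    |a2,b2,c2,d2,e2,f2,g2""".stripMargin

// Split into lines, tolerating "\r\n" endings and skipping blank lines
val lines = responseText.split("\\r?\\n").filter(_.trim.nonEmpty)

// Header names, trimmed so "Column1, Column2" and "Column1,Column2" both work;
// limit -1 keeps trailing empty fields instead of silently dropping them
val header: Array[String] = lines.head.split(",", -1).map(_.trim)

// One List[String] per data row, fields trimmed
val body: List[List[String]] =
  lines.tail.map(_.split(",", -1).map(_.trim).toList).toList

// Fail fast if any row does not have exactly one field per header name
body.zipWithIndex.foreach { case (row, i) =>
  require(row.length == header.length,
    s"row $i has ${row.length} fields, expected ${header.length}")
}

// Element i of the "value" array becomes a column named header(i)
val columns: Seq[Column] = header.toSeq.zipWithIndex.map { case (name, i) =>
  col("value")(i).as(name)
}

// A single projection instead of one withColumn call per column
val erDf: DataFrame = body.toDF().select(columns: _*)
erDf.show()
```

Since select never keeps the value column, no trailing drop("value") is needed.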