Converting a List of Lists to a DataFrame
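I am reading data (shown below) into a list of lists, and I want to convert it into a dataframe with seven columns. The error I get is: requirement failed: number of columns doesn't match. Old column names (1): value, new column names (7): <list of columns>
What am I doing wrong, and how can I fix it?
Data: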
Column1, Column2, Column3, Column4, Column5, Column6, Column7
a,b,c,d,e,f,g
a2,b2,c2,d2,e2,f2,g2
Code:
val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._
val erResponse = response.body.toString.split("\\n")
val header = erResponse(0)
val body = erResponse.drop(1).map(x => x.split(",").toList).toList
val erDf = body.toDF()
erDf.show()
You get this "number of columns doesn't match" error because your erDf dataframe contains only one column, which holds an array:
+----------------------------+
|value                       |
+----------------------------+
|[a, b, c, d, e, f, g]       |
|[a2, b2, c2, d2, e2, f2, g2]|
+----------------------------+
You cannot match this single column against the seven column names contained in header.
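The shape mismatch can be seen without Spark by looking only at the parsing step. The following is a minimal sketch in plain Scala; the object name ParseSketch and the raw string standing in for response.body.toString are assumptions made for illustration, not part of the original code.

```scala
object ParseSketch {
  // Hypothetical stand-in for response.body.toString from the question.
  val raw: String =
    "Column1, Column2, Column3, Column4, Column5, Column6, Column7\n" +
      "a,b,c,d,e,f,g\n" +
      "a2,b2,c2,d2,e2,f2,g2"

  val lines: Array[String] = raw.split("\\n")

  // As in the question, the header is kept as one String and never split into names.
  val headerLine: String = lines(0)

  // Each data row becomes one List[String], so body is a List[List[String]]:
  // one value (a list) per row, not seven separate fields.
  val body: List[List[String]] = lines.drop(1).map(_.split(",").toList).toList

  // Splitting on "," and trimming yields the seven column names.
  val headerNames: List[String] = headerLine.split(",").map(_.trim).toList

  def main(args: Array[String]): Unit = {
    println(s"rows: ${body.length}")
    println(s"fields in first row: ${body.head.length}")
    println(s"header names: ${headerNames.length}")
  }
}
```

Each element of body is a single list, which is why toDF() produces one array column named value rather than seven columns.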
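The solution here is, starting from this erDf dataframe, to iterate over your header column list and build the columns one by one. Your complete code then becomes: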
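import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._

val erResponse = response.body.toString.split("\\n")
val header = erResponse(0).split(", ") // build header columns list
val body = erResponse.drop(1).map(x => x.split(",").toList).toList

// Start from the single-column dataframe, add one column per header name
// (taken from the matching index of the array column), then drop the array column.
val erDf = header
  .zipWithIndex
  .foldLeft(body.toDF())((acc, elem) => acc.withColumn(elem._1, col("value")(elem._2)))
  .drop("value")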
This gives you the following erDf dataframe:
+-------+-------+-------+-------+-------+-------+-------+
|Column1|Column2|Column3|Column4|Column5|Column6|Column7|
+-------+-------+-------+-------+-------+-------+-------+
|      a|      b|      c|      d|      e|      f|      g|
|     a2|     b2|     c2|     d2|     e2|     f2|     g2|
+-------+-------+-------+-------+-------+-------+-------+
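As a variation on the fold with withColumn, the same seven columns could probably also be produced with a single select that projects each array element under its header name. This is a sketch under stated assumptions, not the approach given in the answer: the object name SelectSketch, the build helper, and the raw string standing in for response.body.toString are hypothetical, and the header is split on "," and trimmed rather than split on ", ".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectSketch {
  // Hypothetical stand-in for response.body.toString from the question.
  val raw: String =
    "Column1, Column2, Column3, Column4, Column5, Column6, Column7\n" +
      "a,b,c,d,e,f,g\n" +
      "a2,b2,c2,d2,e2,f2,g2"

  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val lines = raw.split("\\n")
    // Split on "," and trim, so the space after each comma is not part of the name.
    val header = lines(0).split(",").map(_.trim)
    val body = lines.drop(1).map(_.split(",").toList).toList

    // One projection per header name: element i of the array column, aliased to that name.
    val projections = header.zipWithIndex.map { case (name, i) => col("value")(i).as(name) }

    // body.toDF() has the single array column "value"; select replaces it with the projections.
    body.toDF().select(projections: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
    build(spark).show()
    spark.stop()
  }
}
```

A single select avoids adding the columns one at a time and then dropping value, which may keep the query plan smaller when the header has many names.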