Apache Spark - Transformation - Row values as column headers - Pivot
I have a dataset like the one below (id, date, price):
- 1, 2017-01-10, 100
- 1, 2017-01-11, 110
- 2, 2017-01-10, 100
- 2, 2017-01-12, 120
I need the following result:
pidx/date : 2017-01-10  2017-01-11  2017-01-12
1:          100         110         -
2:          100         -           120
Which transformations would produce the above output?
You can use pivot together with groupBy to get the desired output:
import org.apache.spark.sql.functions.sum
import spark.implicits._

// dummy data
val df = Seq(
(1, "2017-01-10", 100),
(1, "2017-01-11", 110),
(2, "2017-01-10", 100),
(2, "2017-01-12", 120)
).toDF("id", "date", "price")
// group by id, pivot on date, and aggregate price with sum
val resultDF = df.groupBy("id").pivot("date").agg(sum("price"))
resultDF.show()
Output:
+---+----------+----------+----------+
|id |2017-01-10|2017-01-11|2017-01-12|
+---+----------+----------+----------+
| 1 |100 |110 |null |
| 2 |100 |null |120 |
+---+----------+----------+----------+
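If you also want the missing combinations to show as "-" (as in the expected output above) rather than null, one option is to cast the pivoted columns to string and fill the nulls afterwards. The sketch below reuses the same df; the dates list and the withPlaceholders name are just illustrative, and passing the values explicitly to pivot also saves Spark the extra job it otherwise runs to collect the distinct dates:

import org.apache.spark.sql.functions.{col, sum}

// pivot values taken from the sample data; listing them explicitly avoids a distinct scan
val dates = Seq("2017-01-10", "2017-01-11", "2017-01-12")

val withPlaceholders = df
  .groupBy("id")
  .pivot("date", dates)
  .agg(sum("price"))
  // cast the numeric pivoted columns to string so na.fill can substitute "-" for null
  .select(col("id") +: dates.map(d => col(d).cast("string").as(d)): _*)
  .na.fill("-")

withPlaceholders.show()

This keeps the same grouped/pivoted shape as the answer above and only changes how the empty cells are rendered.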