How to rename newly created columns from an operation on a GroupedDataset in Apache Spark?
How can I rename the columns created by the count operation without converting the result to a DataFrame?
case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)
val log = List(
  LogRow("1", "a", 1), LogRow("1", "a", 2), LogRow("1", "b", 3),
  LogRow("1", "a", 4), LogRow("1", "b", 5), LogRow("1", "b", 6),
  LogRow("1", "c", 7), LogRow("2", "a", 1), LogRow("2", "b", 2),
  LogRow("2", "b", 3), LogRow("2", "a", 4), LogRow("2", "a", 5),
  LogRow("2", "a", 6), LogRow("2", "c", 7))
log.toDS().groupBy(l => {
(l.id, l.location)
}).count().toDF().toDF("key", "value").as[KeyValue].show
+-----+-----+
| key|value|
+-----+-----+
|[1,a]| 3|
|[1,b]| 3|
|[1,c]| 1|
|[2,a]| 4|
|[2,b]| 2|
|[2,c]| 1|
+-----+-----+
Map directly to the type you need:
log.toDS.groupBy(l => {
(l.id, l.location)
}).count.as[KeyValue]
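To see what the typed group-and-count produces, the same logic can be sketched with plain Scala collections, no Spark required. This is only an illustration of the semantics of `groupBy(l => (l.id, l.location)).count.as[KeyValue]`; the `CountSketch` object is a hypothetical name introduced here, while `LogRow` and `KeyValue` are the case classes defined above:

```scala
case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)

object CountSketch {
  val log = List(
    LogRow("1", "a", 1), LogRow("1", "a", 2), LogRow("1", "b", 3),
    LogRow("1", "a", 4), LogRow("1", "b", 5), LogRow("1", "b", 6),
    LogRow("1", "c", 7), LogRow("2", "a", 1), LogRow("2", "b", 2),
    LogRow("2", "b", 3), LogRow("2", "a", 4), LogRow("2", "a", 5),
    LogRow("2", "a", 6), LogRow("2", "c", 7))

  // Group rows by the (id, location) key and count each group,
  // mirroring the Dataset's groupBy(...).count.as[KeyValue].
  val counts: List[KeyValue] = log
    .groupBy(l => (l.id, l.location))
    .map { case (key, rows) => KeyValue(key, rows.size.toLong) }
    .toList
    .sortBy(_.key)
}
```

Because the result is already typed as `KeyValue`, the fields are named `key` and `value` with no rename step needed.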