Copy the current row, modify it, and add it as a new row in Spark

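I am using spark-sql 2.4.1 with Java 8. I have a scenario where I need to copy the current row and create another row in which a few column values are modified. How can this be achieved in spark-sql?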

For example, given:

val data = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14)
)
val df = data.toDF("id", "code", "entity", "value1", "value2")

Current output:

+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school|    14|    12|
| 21|score|school|    13|    13|
| 22| rate|school|    11|    14|
+---+-----+------+------+------+

When the "code" column is "rate", the row should be copied so that it appears twice: once as the original, and once as a second row with the new code "new_rate", as shown below.

Expected output:

+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+

How can this be achieved?

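Use when to check whether code === "rate". If it matches, replace the column value with array(lit("rate"), lit("new_rate")); otherwise use the single-element array($"code"). Then explode the code column, so that each array element becomes its own row.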

See the code below.

scala> df.show(false)
+---+-----+------+------+------+
|id |code |entity|value1|value2|
+---+-----+------+------+------+
|20 |score|school|14    |12    |
|21 |score|school|13    |13    |
|22 |rate |school|11    |14    |
+---+-----+------+------+------+

val colExpr = explode(
    when(
        $"code" === "rate",
        array(
            lit("rate"),
            lit("new_rate")
        )
    )
    .otherwise(array($"code"))
)
scala> df.withColumn("code",colExpr).show(false)
+---+--------+------+------+------+
|id |code    |entity|value1|value2|
+---+--------+------+------+------+
|20 |score   |school|14    |12    |
|21 |score   |school|13    |13    |
|22 |rate    |school|11    |14    |
|22 |new_rate|school|11    |14    |
+---+--------+------+------+------+
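
If the copied row needs changes in more than one column, one possible variation is to explode a temporary marker column instead of code itself, and then derive each modified column from that marker. The following is only a minimal sketch under assumptions: the is_copy column name, the local SparkSession setup, and the doubling of value1 are hypothetical and are there purely for illustration; none of them come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("copy-row-sketch")
  .getOrCreate()
import spark.implicits._

val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14)
).toDF("id", "code", "entity", "value1", "value2")

// Hypothetical marker column: "rate" rows become two rows (original + copy),
// every other row stays a single row flagged as not being a copy.
val marked = df.withColumn(
  "is_copy",
  explode(
    when($"code" === "rate", array(lit(false), lit(true)))
      .otherwise(array(lit(false)))
  )
)

// Change as many columns as needed on the copy only, then drop the marker.
val result = marked
  .withColumn("code", when($"is_copy", concat(lit("new_"), $"code")).otherwise($"code"))
  .withColumn("value1", when($"is_copy", $"value1" * 2).otherwise($"value1")) // hypothetical change
  .drop("is_copy")

result.show(false)
```

The marker keeps the "which row is the copy" decision in one place, so each additional modified column is just one more when($"is_copy", ...) expression.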

You can also use this approach for your scenario:

df.union(df.filter($"code"==="rate").withColumn("code",concat(lit("new_"), $"code"))).show()
/*
+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+
*/
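
Two things may be worth keeping in mind with this approach: union matches columns by position, and the row order of the result is not guaranteed. Here withColumn replaces code in place, so the positions already line up, but a more defensive variation could look like the sketch below. It assumes Spark 2.3 or later, where unionByName is available; the names copies and result and the local SparkSession setup are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-copy-sketch")
  .getOrCreate()
import spark.implicits._

val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14)
).toDF("id", "code", "entity", "value1", "value2")

// Rows to duplicate, with the code column rewritten on the copy.
val copies = df
  .filter($"code" === "rate")
  .withColumn("code", concat(lit("new_"), $"code"))

// unionByName matches columns by name rather than by position;
// orderBy groups each copy next to its original by id.
val result = df.unionByName(copies).orderBy($"id")

result.show(false)
```

Sorting by id only places a copy next to its original; it does not fix which of the two comes first.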