How to union 2 DataFrames without creating additional rows?

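I have 2 DataFrames, and I want to apply .filter($"item" === "a") while still keeping every "S/N" number in the result.

I tried the following, but when I use union I end up with extra rows. Is there a way to combine the 2 DataFrames without creating additional rows?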

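import org.apache.spark.sql.functions.lit  // already in scope in spark-shell
import spark.implicits._                   // enables toDF and the $"..." syntax

val DF1 = Seq(
  ("1", "a", 2),
  ("2", "a", 3),
  ("3", "b", 3),
  ("4", "b", 4),
  ("5", "a", 2)
).toDF("S/N", "item", "value")

// Same data as DF1, keeping only the rows where item is "a"
val DF2 = Seq(
  ("1", "a", 2),
  ("2", "a", 3),
  ("3", "b", 3),
  ("4", "b", 4),
  ("5", "a", 2)
).toDF("S/N", "item", "value")
  .filter($"item" === "a")

// Every S/N from DF1, with item and value replaced by 0
val DF3 = DF1.withColumn("item", lit(0)).withColumn("value", lit(0))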

DF1.show()
+---+----+-----+
|S/N|item|value|
+---+----+-----+
|  1|   a|    2|
|  2|   a|    3|
|  3|   b|    3|
|  4|   b|    4|
|  5|   a|    2|
+---+----+-----+

DF2.show()
+---+----+-----+
|S/N|item|value|
+---+----+-----+
|  1|   a|    2|
|  2|   a|    3|
|  5|   a|    2|
+---+----+-----+

DF3.show()
+---+----+-----+
|S/N|item|value|
+---+----+-----+
|  1|   0|    0|
|  2|   0|    0|
|  3|   0|    0|
|  4|   0|    0|
|  5|   0|    0|
+---+----+-----+

DF2.union(DF3).show()
+---+----+-----+
|S/N|item|value|
+---+----+-----+
|  1|   a|    2|
|  2|   a|    3|
|  5|   a|    2|
|  1|   0|    0|
|  2|   0|    0|
|  3|   0|    0|
|  4|   0|    0|
|  5|   0|    0|
+---+----+-----+

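union only appends the rows of one DataFrame to the other, so every S/N that appears in both inputs shows up twice. Instead, left outer join the full list of S/Ns with the filtered DataFrame, then use coalesce to replace the resulting nulls: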

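// One row per S/N, taken from the unfiltered DataFrame
val DF3 = DF1.select("S/N")

// The left outer join keeps every S/N; rows that the filter dropped
// come back with null item and value, which coalesce replaces with 0
val DF4 = (DF3.join(DF2, Seq("S/N"), joinType="leftouter")
              .withColumn("item", coalesce($"item", lit(0)))
              .withColumn("value", coalesce($"value", lit(0))))
DF4.show()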
+---+----+-----+
|S/N|item|value|
+---+----+-----+
|  1|   a|    2|
|  2|   a|    3|
|  3|   0|    0|
|  4|   0|    0|
|  5|   a|    2|
+---+----+-----+
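
For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of the same left outer join plus coalesce idea. The object name KeepAllSerials, the helper keepAllSerials and the local[*] master are assumptions made for illustration, not part of the original code. It also fills the string column with "0" instead of 0 so it does not rely on implicit type coercion, and sorts by S/N because a join does not guarantee row order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object KeepAllSerials {

  // Hypothetical helper: keep every S/N from df, but keep item/value only
  // for rows whose item equals `wanted`; all other rows get "0" and 0.
  def keepAllSerials(df: DataFrame, wanted: String): DataFrame = {
    val filtered = df.filter(col("item") === wanted)

    df.select("S/N")                                        // one row per S/N
      .join(filtered, Seq("S/N"), "left_outer")             // unmatched rows get nulls
      .withColumn("item", coalesce(col("item"), lit("0")))  // string default
      .withColumn("value", coalesce(col("value"), lit(0)))  // integer default
      .orderBy(col("S/N"))                                  // string sort, for stable display
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("keep-all-serials")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(
      ("1", "a", 2),
      ("2", "a", 3),
      ("3", "b", 3),
      ("4", "b", 4),
      ("5", "a", 2)
    ).toDF("S/N", "item", "value")

    keepAllSerials(df1, "a").show()

    spark.stop()
  }
}
```

With the sample data from the question this should give the same five rows as DF4 above. In spark-shell, where spark already exists, KeepAllSerials.keepAllSerials(DF1, "a").show() would be enough.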