How to add data frame contents in Scala, ignoring null values

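I have a data frame in Scala as shown below. I got this result from a full outer join of two data frames of different sizes.

These are the key-value pairs I get after running the following query: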

select * from TEMP1 a FULL OUTER JOIN TEMP2 b ON a.T_ROWKEY = b.N_ROWKEY

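The df below shows the key-value pairs. We need to add up the values for matching keys and create a new data frame; where a key has no match, its value should be kept as is.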

[2552195C312,100,2552195C312,5]
[null,null,175831A638,1]
[48061B887,1,null,null]
[null,null,171539C177,1]
[null,null,5584D2379,4]
[118732EE7792,3,null,null]
[null,null,8157FF1915,1]
[14310AA872,1000,14310AA872,7]
[148BB41539,5,148BB41539,1]
[40513SS68,1,null,null]
[null,null,199915UY72,11]
[11429401AW5,3,null,null]
[187755CD00,4,null,null]
[834413CV18,1,null,null]
[185475XS2,14,null,null]
[11716817SD8,2,null,null]
[2552998AS99,12,null,null]
[null,null,19792WS37,2]
[153054WE02,1,null,null]
[null,null,8131128ER1,7]

I am expecting a result like this:
[2552195C312,105]
[175831A638,1]
[48061B887,1]
[171539C177,1]
[5584D2379,4]
[118732EE7792,3]
[8157FF1915,1]
[14310AA872,1007]
[148BB41539,6]
[40513SS68,1]
[199915UY72,11]
[11429401AW5,3]
[187755CD00,4]
[834413CV18,1]
[185475XS2,14]
[11716817SD8,2]
[2552998AS99,12]
[19792WS37,2]
[153054WE02,1]
[8131128ER1,7]

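Could someone please help with this? Thanks for your help.

Since you haven't given the value column names, I'm assuming your dataframe has the following schema after the join: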

root
 |-- T_ROWKEY: string (nullable = true)
 |-- T_ROWVALUE: integer (nullable = true)
 |-- N_ROWKEY: string (nullable = true)
 |-- N_ROWVALUE: integer (nullable = true)

Given that schema, you can do the outer join and register the result as a temporary view:

sqlContext.sql("select * from TEMP1 a FULL OUTER JOIN TEMP2 b ON a.T_ROWKEY = b.N_ROWKEY").createOrReplaceTempView("JOINED")

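Then a simple case when ... then ... else ... end, which replaces a null key with the key from the other side and a null value with 0, should give you the final result you expect: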

sqlContext.sql("select case when T_ROWKEY is null then `N_ROWKEY` else `T_ROWKEY` end as ROWKEY, case when T_ROWVALUE is null then 0 else `T_ROWVALUE` end  + case when N_ROWVALUE is null then 0 else `N_ROWVALUE` end as VALUE  from JOINED").show(false)

which should give you

+------------+-----+
|ROWKEY      |VALUE|
+------------+-----+
|14310AA872  |1007 |
|19792WS37   |2    |
|5584D2379   |4    |
|40513SS68   |1    |
|11716817SD8 |2    |
|11429401AW5 |3    |
|118732EE7792|3    |
|171539C177  |1    |
|187755CD00  |4    |
|8131128ER1  |7    |
|2552998AS99 |12   |
|834413CV18  |1    |
|8157FF1915  |1    |
|2552195C312 |105  |
|48061B887   |1    |
|148BB41539  |6    |
|153054WE02  |1    |
|175831A638  |1    |
|199915UY72  |11   |
|185475XS2   |14   |
+------------+-----+

Using the API

Using the built-in when and otherwise functions is simpler and more concise:
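import org.apache.spark.sql.functions._
import sqlContext.implicits._ // enables the 'T_ROWKEY symbol-to-Column syntax

// joined is the full outer join result, registered above as the JOINED view
val joined = sqlContext.table("JOINED")
joined.select(when('T_ROWKEY.isNull, 'N_ROWKEY).otherwise('T_ROWKEY).as("ROWKEY"),
              when('T_ROWVALUE.isNull, 0).otherwise('T_ROWVALUE) + when('N_ROWVALUE.isNull, 0).otherwise('N_ROWVALUE) as "VALUE")
  .show(false)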

This should give you the same result as above.

I hope the answer is helpful.
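The same null handling can probably be written more compactly with Spark's coalesce function, which returns its first non-null argument. The following is only a minimal sketch, not part of the original answer: the object name CoalesceSketch, the helper sumJoined, and the two small sample frames are made up for illustration, and the column names are the ones assumed in the schema above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object CoalesceSketch {
  // Hypothetical helper: collapses the four joined columns into (ROWKEY, VALUE).
  // coalesce picks the first non-null argument, so a missing side contributes 0.
  def sumJoined(joined: DataFrame): DataFrame =
    joined.select(
      coalesce(col("T_ROWKEY"), col("N_ROWKEY")).as("ROWKEY"),
      (coalesce(col("T_ROWVALUE"), lit(0)) + coalesce(col("N_ROWVALUE"), lit(0))).as("VALUE")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its existing session.
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
    import spark.implicits._

    // Small stand-ins for TEMP1 and TEMP2, using a few rows from the question.
    val temp1 = Seq(("2552195C312", 100), ("48061B887", 1)).toDF("T_ROWKEY", "T_ROWVALUE")
    val temp2 = Seq(("2552195C312", 5), ("175831A638", 1)).toDF("N_ROWKEY", "N_ROWVALUE")

    // Full outer join, as in the SQL query above.
    val joined = temp1.join(temp2, col("T_ROWKEY") === col("N_ROWKEY"), "outer")

    sumJoined(joined).show(false)
    spark.stop()
  }
}
```

With these sample rows, the key present on both sides (2552195C312) should come out with 100 + 5 = 105, while the two unmatched keys keep their original value of 1. Row order in the output is not guaranteed.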