How to add data frame contents in Scala, ignoring null values
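I have a data frame in Scala like the one shown below. I got this result by doing a full outer join of two data frames of different sizes.
These are the key-value pairs obtained after executing the following query:
select * from TEMP1 a FULL OUTER JOIN TEMP2 b ON a.T_ROWKEY = b.N_ROWKEY
The df below holds the key-value pairs. We need to add the values of matching keys and create a new data frame; where a key has no match, its value should be kept as is.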
[2552195C312,100,2552195C312,5]
[null,null,175831A638,1]
[48061B887,1,null,null]
[null,null,171539C177,1]
[null,null,5584D2379,4]
[118732EE7792,3,null,null]
[null,null,8157FF1915,1]
[14310AA872,1000,14310AA872,7]
[148BB41539,5,148BB41539,1]
[40513SS68,1,null,null]
[null,null,199915UY72,11]
[11429401AW5,3,null,null]
[187755CD00,4,null,null]
[834413CV18,1,null,null]
[185475XS2,14,null,null]
[11716817SD8,2,null,null]
[2552998AS99,12,null,null]
[null,null,19792WS37,2]
[153054WE02,1,null,null]
[null,null,8131128ER1,7]
I am expecting a result like this:
[2552195C312,105]
[175831A638,1]
[48061B887,1]
[171539C177,1]
[5584D2379,4]
[118732EE7792,3]
[8157FF1915,1]
[14310AA872,1007]
[148BB41539,6]
[40513SS68,1]
[199915UY72,11]
[11429401AW5,3]
[187755CD00,4]
[834413CV18,1]
[185475XS2,14]
[11716817SD8,2]
[2552998AS99,12]
[19792WS37,2]
[153054WE02,1]
[8131128ER1,7]
Could someone please help me solve this? Thanks for your help.
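To make the intended merge rule precise, here is a minimal plain-Scala sketch that models it with ordinary Maps, without Spark. The names `MergeModel`, `mergeCounts`, `temp1` and `temp2` are hypothetical, and the sample data is only a small subset of the rows above; it assumes each key appears at most once per table.

```scala
// Plain-Scala model of the desired merge (no Spark required).
// temp1 and temp2 are hypothetical stand-ins for the TEMP1 and TEMP2 tables.
object MergeModel {
  def mergeCounts(left: Map[String, Int], right: Map[String, Int]): Map[String, Int] =
    (left.keySet ++ right.keySet).map { key =>
      // A key missing on one side contributes 0, which mirrors "ignore nulls".
      key -> (left.getOrElse(key, 0) + right.getOrElse(key, 0))
    }.toMap

  // A few rows taken from the sample data in the question.
  val temp1: Map[String, Int] =
    Map("2552195C312" -> 100, "48061B887" -> 1, "14310AA872" -> 1000)
  val temp2: Map[String, Int] =
    Map("2552195C312" -> 5, "175831A638" -> 1, "14310AA872" -> 7)

  val merged: Map[String, Int] = mergeCounts(temp1, temp2)
}
```

Keys present in both maps are summed, and keys present in only one map keep their value, which is the behaviour the expected output describes.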
Since you did not mention the names of the value columns, I am assuming that the schema of your dataframe after the join is
root
|-- T_ROWKEY: string (nullable = true)
|-- T_ROWVALUE: integer (nullable = true)
|-- N_ROWKEY: string (nullable = true)
|-- N_ROWVALUE: integer (nullable = true)
So, given that schema, you should have the outer join as
sqlContext.sql("select * from TEMP1 a FULL OUTER JOIN TEMP2 b ON a.T_ROWKEY = b.N_ROWKEY").createOrReplaceTempView("JOINED")
Then a simple case when then else end should give you your expected final result:
sqlContext.sql("select case when T_ROWKEY is null then `N_ROWKEY` else `T_ROWKEY` end as ROWKEY, case when T_ROWVALUE is null then 0 else `T_ROWVALUE` end + case when N_ROWVALUE is null then 0 else `N_ROWVALUE` end as VALUE from JOINED").show(false)
which should give you
+------------+-----+
|ROWKEY |VALUE|
+------------+-----+
|14310AA872 |1007 |
|19792WS37 |2 |
|5584D2379 |4 |
|40513SS68 |1 |
|11716817SD8 |2 |
|11429401AW5 |3 |
|118732EE7792|3 |
|171539C177 |1 |
|187755CD00 |4 |
|8131128ER1 |7 |
|2552998AS99 |12 |
|834413CV18 |1 |
|8157FF1915 |1 |
|2552195C312 |105 |
|48061B887 |1 |
|148BB41539 |6 |
|153054WE02 |1 |
|175831A638 |1 |
|199915UY72 |11 |
|185475XS2 |14 |
+------------+-----+
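As a possible shortening of the query, Spark SQL's built-in coalesce function returns its first non-null argument, so each case when ... is null expression can be written as a coalesce call. The following is a sketch only: the object and method names (`CoalesceSqlSketch`, `mergedViaSql`) are hypothetical, and it assumes a `SparkSession` is available and that a view named JOINED with the four columns above has already been registered.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CoalesceSqlSketch {
  // Assumes a temp view named JOINED with the columns
  // T_ROWKEY, T_ROWVALUE, N_ROWKEY, N_ROWVALUE already exists.
  def mergedViaSql(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT coalesce(T_ROWKEY, N_ROWKEY) AS ROWKEY,
        |       coalesce(T_ROWVALUE, 0) + coalesce(N_ROWVALUE, 0) AS VALUE
        |FROM JOINED""".stripMargin)
}
```

Calling `CoalesceSqlSketch.mergedViaSql(spark).show(false)` should print the same kind of two-column table, although the row order is not guaranteed.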
Using the API
Using the when and otherwise built-in functions is simpler and more concise than the SQL query:
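import org.apache.spark.sql.functions._
import sqlContext.implicits._ // enables the 'COLUMN symbol syntax used below
val joined = sqlContext.table("JOINED") // the view registered by the join above
joined.select(when('T_ROWKEY.isNull, 'N_ROWKEY).otherwise('T_ROWKEY).as("ROWKEY"),
(when('T_ROWVALUE.isNull, 0).otherwise('T_ROWVALUE) + when('N_ROWVALUE.isNull, 0).otherwise('N_ROWVALUE)).as("VALUE"))
.show(false)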
This should give you the same result as above.
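For completeness, the join and the null handling can also be done in one step with the DataFrame API, using the coalesce function from org.apache.spark.sql.functions in place of when and otherwise. This is a sketch under assumptions: `CoalesceApiSketch` and `mergeTables` are hypothetical names, `temp1` is assumed to have the columns (T_ROWKEY, T_ROWVALUE) and `temp2` the columns (N_ROWKEY, N_ROWVALUE), and the "full_outer" join type string assumes Spark 2.x or later.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, lit}

object CoalesceApiSketch {
  // temp1 is assumed to have columns (T_ROWKEY, T_ROWVALUE),
  // temp2 is assumed to have columns (N_ROWKEY, N_ROWVALUE).
  def mergeTables(temp1: DataFrame, temp2: DataFrame): DataFrame =
    temp1
      .join(temp2, col("T_ROWKEY") === col("N_ROWKEY"), "full_outer")
      .select(
        // Take whichever key is non-null.
        coalesce(col("T_ROWKEY"), col("N_ROWKEY")).as("ROWKEY"),
        // Treat a missing value on either side as 0 before adding.
        (coalesce(col("T_ROWVALUE"), lit(0)) + coalesce(col("N_ROWVALUE"), lit(0))).as("VALUE")
      )
}
```

Using `col("...")` instead of the symbol syntax avoids the need for the implicits import, at the cost of slightly longer expressions.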
I hope the answer is helpful.