Left outer join for unequal records from two data frames in Spark Scala
I have two data frames.
Data frame one:
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|DataPartition|TimeStamp |OrganizationID|SourceID|_auditorId|sr:AuditorEnumerationId|sr:AuditorOpinionCode|sr:AuditorOpinionId|sr:IsPlayingAuditorRole|sr:IsPlayingCSRAuditorRole|sr:IsPlayingTaxAdvisorRole|FFAction|!||
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|Japan |2018-05-03T09:52:48+00:00|4295876589 |195 |null |null |null |null |null |null |null |O|!| |
|Japan |2018-05-03T08:10:19+00:00|4295876589 |196 |null |null |null |null |null |null |null |D|!| |
|Japan |2018-05-03T09:52:48+00:00|4295876589 |194 |null |null |null |null |null |null |null |O|!| |
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
Data frame two is:
DataPartition TimeStamp OrganizationID SourceID _auditorId sr:AuditorEnumerationId sr:AuditorOpinionCode sr:AuditorOpinionId sr:IsPlayingAuditorRole sr:IsPlayingCSRAuditorRole sr:IsPlayingTaxAdvisorRole FFAction|!|
Japan 2018-05-03T08:06:06+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T08:06:06+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T09:48:33+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:48:33+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:27:10+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:27:10+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:27:10+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:35:42+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:35:42+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:35:42+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:34:46+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:34:46+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T08:10:19+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T08:10:19+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:28:16+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:28:16+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:28:16+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-02T09:05:04+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-02T09:05:04+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-02T09:05:04+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:31:28+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:31:28+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:31:28+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:22:58+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:22:58+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:22:58+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:45:22+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:45:22+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:11:26+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:11:26+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:11:26+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:00:45+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:00:45+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:00:45+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:36:47+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:36:47+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:36:47+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:01:52+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:01:52+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:01:52+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-02T10:28:22+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-02T10:28:22+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-02T10:28:22+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:52:48+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:52:48+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T09:41:09+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:41:09+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-02T10:30:32+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-02T10:30:32+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-02T10:30:32+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T06:56:32+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T06:56:32+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T06:56:32+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:05:04+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:05:04+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:05:04+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:38:59+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:38:59+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:08:14+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:08:14+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:08:14+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
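Now I want to add the records of data frame one to data frame two, except for the records whose three columns TimeStamp, OrganizationID and SourceID already match a record in data frame two.
In this case the rows of data frame one with SourceID 194 and 195 will not be added to data frame two, because their TimeStamp, OrganizationID and SourceID values match rows in both data frames.
Only the 1 row with SourceID 196 should be added.
Will a left_outer join work in this case?
When I do that, I get duplicate columns.
So in short: records of data frame 1 that match on the three columns will not be added; all other records will be added to data frame 2.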
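You can try a leftanti join and then union the result with df2. A join on a Seq of column names puts the join keys first in its output, so combine by column name with unionByName (Spark 2.3+), or reorder the columns to match df2, instead of relying on the positional union:
df1.join(df2, Seq("TimeStamp", "OrganizationID", "SourceID"), "leftanti").unionByName(df2)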
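To make the suggestion easier to try out, here is a minimal sketch on a trimmed-down stand-in for the two data frames. The object name `LeftAntiUnionSketch`, the helpers `appendUnmatched` and `sampleFrames`, the local `SparkSession` settings and the reduced five-column schema are assumptions made for illustration; they are not part of the original post. It assumes Spark 2.3 or later, since it aligns columns by name with `unionByName`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LeftAntiUnionSketch {
  // The three key columns named in the question.
  val keys: Seq[String] = Seq("TimeStamp", "OrganizationID", "SourceID")

  // Keep only the df1 rows that have no key match in df2 (left anti join),
  // then append df2. unionByName matches columns by name, so the key columns
  // being moved to the front by the join does not misalign the data.
  def appendUnmatched(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, keys, "leftanti").unionByName(df2)

  // Hypothetical sample: a few rows and columns copied from the question.
  def sampleFrames(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val cols = Seq("DataPartition", "TimeStamp", "OrganizationID", "SourceID", "_auditorId")

    val df1 = Seq[(String, String, String, String, Option[String])](
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "195", None),
      ("Japan", "2018-05-03T08:10:19+00:00", "4295876589", "196", None),
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "194", None)
    ).toDF(cols: _*)

    val df2 = Seq[(String, String, String, String, Option[String])](
      ("Japan", "2018-05-03T08:10:19+00:00", "4295876589", "194", Some("2719")),
      ("Japan", "2018-05-03T08:10:19+00:00", "4295876589", "195", Some("16157")),
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "194", Some("2719")),
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "195", Some("16157"))
    ).toDF(cols: _*)

    (df1, df2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local run for the demo
      .appName("leftanti-union-sketch")
      .getOrCreate()

    val (df1, df2) = sampleFrames(spark)
    appendUnmatched(df1, df2).show(false)

    spark.stop()
  }
}
```

On this sample the result should hold the four df2 rows plus the single df1 row with SourceID 196, because that row is the only one whose three key values do not appear in df2. The output column order follows the left side of `unionByName`, so the key columns come first; add a `select` in df2's column order afterwards if the original layout matters.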