Left outer join for unequal records from two data frames in Spark Scala

I have two data frames. Data frame one is:

+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|DataPartition|TimeStamp                |OrganizationID|SourceID|_auditorId|sr:AuditorEnumerationId|sr:AuditorOpinionCode|sr:AuditorOpinionId|sr:IsPlayingAuditorRole|sr:IsPlayingCSRAuditorRole|sr:IsPlayingTaxAdvisorRole|FFAction|!||
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|Japan        |2018-05-03T09:52:48+00:00|4295876589    |195     |null      |null                   |null                 |null               |null                   |null                      |null                      |O|!|       |
|Japan        |2018-05-03T08:10:19+00:00|4295876589    |196     |null      |null                   |null                 |null               |null                   |null                      |null                      |D|!|       |
|Japan        |2018-05-03T09:52:48+00:00|4295876589    |194     |null      |null                   |null                 |null               |null                   |null                      |null                      |O|!|       |
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+

Data frame two is:

DataPartition   TimeStamp   OrganizationID  SourceID    _auditorId  sr:AuditorEnumerationId sr:AuditorOpinionCode   sr:AuditorOpinionId sr:IsPlayingAuditorRole sr:IsPlayingCSRAuditorRole  sr:IsPlayingTaxAdvisorRole  FFAction|!|
Japan   2018-05-03T08:06:06+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T08:06:06+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T09:48:33+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:48:33+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:27:10+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:27:10+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:27:10+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:35:42+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:35:42+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:35:42+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:34:46+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:34:46+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T08:10:19+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T08:10:19+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:28:16+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:28:16+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:28:16+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-02T09:05:04+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-02T09:05:04+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-02T09:05:04+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:31:28+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:31:28+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:31:28+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:22:58+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:22:58+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:22:58+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:45:22+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:45:22+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:11:26+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:11:26+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:11:26+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:00:45+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:00:45+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:00:45+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:36:47+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:36:47+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:36:47+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:01:52+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:01:52+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:01:52+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-02T10:28:22+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-02T10:28:22+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-02T10:28:22+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:52:48+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:52:48+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T09:41:09+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:41:09+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-02T10:30:32+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-02T10:30:32+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-02T10:30:32+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T06:56:32+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T06:56:32+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T06:56:32+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:05:04+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:05:04+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:05:04+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:38:59+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:38:59+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:08:14+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:08:14+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:08:14+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|

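Now I want to add the records of data frame one to data frame two, except for records whose three columns TimeStamp, OrganizationID and SourceID match a record in data frame two. In this case, the data frame one records with SourceID 194 and 195 will not be added to data frame two, because their TimeStamp, OrganizationID and SourceID values match in both data frames.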

Only the one row with SourceID 196 should be added.

Will a left_outer join work in this case? When I do that, I get duplicate columns.

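So in short, records from data frame one that match on the three columns will not be added; all other records will be added to data frame two.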

You can try a leftanti join and then union the result with df2:

df1.join(df2, Seq("TimeStamp", "OrganizationID", "SourceID"), "leftanti").union(df2)
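
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. Joining on a Seq of column names keeps a single copy of each key column, and a leftanti join returns only df1's columns, so the duplicate columns from the left_outer attempt should not appear. One caveat: union matches columns by position, and a join on column names may put the key columns first in its output, so the sketch uses unionByName (available from Spark 2.3), which matches columns by name. The object name AntiJoinUnionSketch and the helper addUnmatched are made up for illustration, the sample is trimmed to four of the question's columns, and it assumes both data frames have the same column names with compatible types.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names: AntiJoinUnionSketch and addUnmatched are illustrative only.
object AntiJoinUnionSketch {
  // Columns that identify a record in both data frames (taken from the question).
  val keys: Seq[String] = Seq("TimeStamp", "OrganizationID", "SourceID")

  // Keep the df1 rows whose key has no match in df2, then append them to df2.
  def addUnmatched(df1: DataFrame, df2: DataFrame): DataFrame = {
    val onlyInDf1 = df1.join(df2, keys, "leftanti")
    // unionByName (Spark 2.3+) aligns columns by name rather than by position.
    df2.unionByName(onlyInDf1)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small sample.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("anti-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Trimmed-down sample: only four of the question's columns are kept.
    val cols = Seq("DataPartition", "TimeStamp", "OrganizationID", "SourceID")
    val df1 = Seq(
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "195"),
      ("Japan", "2018-05-03T08:10:19+00:00", "4295876589", "196"),
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "194")
    ).toDF(cols: _*)
    val df2 = Seq(
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "194"),
      ("Japan", "2018-05-03T09:52:48+00:00", "4295876589", "195"),
      ("Japan", "2018-05-03T08:10:19+00:00", "4295876589", "194"),
      ("Japan", "2018-05-03T08:10:19+00:00", "4295876589", "195")
    ).toDF(cols: _*)

    // Of the df1 rows, only the SourceID 196 row has no key match in df2.
    val result = addUnmatched(df1, df2)
    result.show(false)

    spark.stop()
  }
}
```

With this sample the result should hold the four df2 rows plus the single df1 row with SourceID 196. On a Spark version older than 2.3, a possible alternative is to reorder the anti-join output with select to match df2's column order before calling union.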