How to join two tables using a third table when NULLS are involved

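I have two timecard tables that I need to join. They should be joined on the week ID and, where applicable, on the employee's resource code. However, apart from a single week, the two tables contain data from different time ranges (i.e. in most cases there is no matching data between the two tables).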

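The first table (dt5) has the week ID, the employee's resource code, the employee's capacity for that week, and the hours they actually reported for that week.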

dt5:

+---------------+---------------+----------+---------------+
| id            | Resource_code | capacity | time_reported |
+---------------+---------------+----------+---------------+
|             1 |           555 |       40 |            40 |
|             1 |           333 |       25 |            20 |
|             2 |           555 |       40 |            40 |
|             2 |           333 |       25 |            20 |
|             3 |           555 |       40 |            40 |
|             3 |           333 |       25 |            20 |
|             4 |           555 |       40 |            39 |
|             4 |           333 |       25 |            24 |
+---------------+---------------+----------+---------------+

The second table (dt4) has the week ID, the employee's resource code, and the employee's planned hours for that week.

dt4:

+---------------+---------------+---------------+
| id            | Resource_code | planned_hours |
+---------------+---------------+---------------+
|             4 |           555 |            30 |
|             4 |           333 |            20 | 
|             5 |           555 |            30 |
|             5 |           333 |            20 |
|             6 |           555 |            30 |
|             6 |           333 |            20 | 
+---------------+---------------+---------------+ 

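When an employee completes a timecard, the planned-hours data for that week is deleted. Until that happens there is a short window in which the data overlaps, i.e. both tables hold data for the same period (period 4 in my example tables). Because the two tables only share one period at any given time, I use a third table (gtd), which contains the ID of every week, to help join them.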

gtd:

+----+------------+----------+
| id | start_date | end_date |
+----+------------+----------+
|  1 |         10 |       20 |
|  2 |         30 |       40 |
|  3 |         50 |       60 |
|  4 |         70 |       80 |
|  5 |         90 |      100 |
|  6 |        110 |      120 |
|  7 |        130 |      140 |
|  8 |        150 |      160 |
|  9 |        170 |      180 |
| 10 |        190 |      200 |
+----+------------+----------+

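My result should look like this:

Note that the week 4 rows contain data from both dt4 and dt5 (capacity, time reported, planned hours), because week 4 is the only week that overlaps.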

+----+---------------+----------+---------------+---------------+---------------+
| id | Resource_code | capacity | time_reported | Resource_code | planned_hours |
+----+---------------+----------+---------------+---------------+---------------+
|  1 | 555           | 40       | 40            | NULL          | NULL          |
|  1 | 333           | 25       | 20            | NULL          | NULL          |
|  2 | 555           | 40       | 40            | NULL          | NULL          |
|  2 | 333           | 25       | 20            | NULL          | NULL          |
|  3 | 555           | 40       | 40            | NULL          | NULL          |
|  3 | 333           | 25       | 20            | NULL          | NULL          |
|  4 | 555           | 40       | 39            | 555           | 30            |
|  4 | 333           | 25       | 24            | 333           | 20            |
|  5 | NULL          | NULL     | NULL          | 555           | 30            |
|  5 | NULL          | NULL     | NULL          | 333           | 20            |
|  6 | NULL          | NULL     | NULL          | 555           | 30            |
|  6 | NULL          | NULL     | NULL          | 333           | 20            |
|  7 | NULL          | NULL     | NULL          | NULL          | NULL          |
|  8 | NULL          | NULL     | NULL          | NULL          | NULL          |
|  9 | NULL          | NULL     | NULL          | NULL          | NULL          |
| 10 | NULL          | NULL     | NULL          | NULL          | NULL          |
+----+---------------+----------+---------------+---------------+---------------+

Here is my current SQL:

SELECT 
  gtd.id,   
  dt5.resource_code,    
  dt5.capacity, 
  dt5.time_reported,    
  dt4.resource_code,    
  dt4.planned_hours
FROM gtd
  LEFT JOIN dt5 ON gtd.id = dt5.id
  LEFT OUTER JOIN dt4 ON gtd.id = dt4.id

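My (incorrect) result looks like this:

The error is in the week 4 rows. Because dt4 is joined on the week ID only, each week 4 row from dt5 is paired with every week 4 row from dt4, so week 4 has four rows instead of two, and in two of them the resource code and planned hours from dt4 do not belong to the resource code from dt5.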

+----+---------------+----------+---------------+---------------+---------------+
| id | resource_code | capacity | time_reported | resource_code | planned_hours |
+----+---------------+----------+---------------+---------------+---------------+
|  1 | 555           | 40       | 40            | NULL          | NULL          |
|  1 | 333           | 25       | 20            | NULL          | NULL          |
|  2 | 555           | 40       | 40            | NULL          | NULL          |
|  2 | 333           | 25       | 20            | NULL          | NULL          |
|  3 | 555           | 40       | 40            | NULL          | NULL          |
|  3 | 333           | 25       | 20            | NULL          | NULL          |
|  4 | 555           | 40       | 39            | 555 (Correct) | 30            |
|  4 | 555           | 40       | 39            | 333 (Wrong)   | 20            |
|  4 | 333           | 25       | 24            | 555 (Wrong)   | 30            |
|  4 | 333           | 25       | 24            | 333 (Correct) | 20            |
|  5 | NULL          | NULL     | NULL          | 555           | 30            |
|  5 | NULL          | NULL     | NULL          | 333           | 20            |
|  6 | NULL          | NULL     | NULL          | 555           | 30            |
|  6 | NULL          | NULL     | NULL          | 333           | 20            |
|  7 | NULL          | NULL     | NULL          | NULL          | NULL          |
|  8 | NULL          | NULL     | NULL          | NULL          | NULL          |
|  9 | NULL          | NULL     | NULL          | NULL          | NULL          |
| 10 | NULL          | NULL     | NULL          | NULL          | NULL          |
+----+---------------+----------+---------------+---------------+---------------+

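From my research, I think I am either using the JOINs incorrectly or I need a CASE statement somewhere. I also tried joining the tables on the resource code, but that eliminated a lot of my data. Any solution or pointers in the right direction would be greatly appreciated.

I am using T-SQL.

*Edited my question to fix inconsistent column names (period_number changed to id)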
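For reference, joining on the resource code is the right idea; the catch is which resource code you compare against. If the condition uses dt5.Resource_code, it can never match in weeks 5 and 6, where dt5 has no rows, which would explain the lost data. A minimal sketch, assuming the question's sample data: keep gtd as the driver, derive a key list of every (id, Resource_code) pair found in either table, and match dt5 and dt4 against that key on both columns. SQLite (through Python's sqlite3 module) stands in for SQL Server so the sketch runs anywhere; the query itself uses only LEFT JOIN and UNION, which T-SQL also supports. The names DT5, DT4, GTD, FIXED_QUERY and run_fixed_query are hypothetical.

```python
import sqlite3

# Sample rows copied from the question's dt5, dt4 and gtd tables.
DT5 = [(1, 555, 40, 40), (1, 333, 25, 20), (2, 555, 40, 40), (2, 333, 25, 20),
       (3, 555, 40, 40), (3, 333, 25, 20), (4, 555, 40, 39), (4, 333, 25, 24)]
DT4 = [(4, 555, 30), (4, 333, 20), (5, 555, 30), (5, 333, 20),
       (6, 555, 30), (6, 333, 20)]
GTD = [(week, 20 * week - 10, 20 * week) for week in range(1, 11)]

# gtd supplies every week; k lists each (id, Resource_code) pair seen in
# either table exactly once, so dt5 and dt4 can both be matched on BOTH
# columns instead of on id alone (which is what caused the week 4 fan-out).
FIXED_QUERY = """
SELECT gtd.id,
       dt5.Resource_code, dt5.capacity, dt5.time_reported,
       dt4.Resource_code, dt4.planned_hours
FROM gtd
LEFT JOIN (SELECT id, Resource_code FROM dt5
           UNION
           SELECT id, Resource_code FROM dt4) AS k
       ON k.id = gtd.id
LEFT JOIN dt5 ON dt5.id = k.id AND dt5.Resource_code = k.Resource_code
LEFT JOIN dt4 ON dt4.id = k.id AND dt4.Resource_code = k.Resource_code
ORDER BY gtd.id, k.Resource_code DESC
"""


def run_fixed_query():
    """Load the sample data into an in-memory SQLite database and run FIXED_QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dt5 (id INT, Resource_code INT, capacity INT, time_reported INT)")
    con.execute("CREATE TABLE dt4 (id INT, Resource_code INT, planned_hours INT)")
    con.execute("CREATE TABLE gtd (id INT, start_date INT, end_date INT)")
    con.executemany("INSERT INTO dt5 VALUES (?, ?, ?, ?)", DT5)
    con.executemany("INSERT INTO dt4 VALUES (?, ?, ?)", DT4)
    con.executemany("INSERT INTO gtd VALUES (?, ?, ?)", GTD)
    rows = con.execute(FIXED_QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_fixed_query():
        print(row)
```

Run against the sample data, this should give the 16 rows of the expected result: two rows per employee-week for weeks 1 to 6, with dt4 paired to the matching resource code in week 4, and one all-NULL row each for weeks 7 to 10.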

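There is no doubt a simpler, more elegant solution than mine, but since I'm tired, here is a brute-force approach:

Use UNION ALL to stack the two tables together. You'll need to manufacture dummy values for the columns that exist in only one table (e.g. capacity).

Then take the combined table and GROUP BY it, aggregating the value columns: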

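-- Organize the data: one row per period and resource code.
-- Every non-grouped column must be aggregated (SQL Server rejects it otherwise);
-- SUM adds the real value to the dummy 0 from the other table.
SELECT f1.Period, f1.RC,
  SUM(f1.ActCap) AS ActCap,
  SUM(f1.PlanTime) AS PlanTime,
  SUM(f1.ActTime) AS ActTime
FROM
(SELECT
  dt5.period_number AS Period,
  dt5.resource_code AS RC,
  dt5.capacity AS ActCap,
  0 AS PlanTime,
  dt5.time_reported AS ActTime
FROM dt5
UNION ALL
SELECT
  dt4.period_number AS Period,
  dt4.resource_code AS RC,
  0 AS ActCap,
  dt4.planned_hours AS PlanTime,
  0 AS ActTime
FROM dt4) AS f1
GROUP BY f1.Period, f1.RC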
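A quick way to check the aggregated version of this brute-force query is to run it on the sample data. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for SQL Server and creates the tables with the pre-edit column name period_number, as the answer does; DT5, DT4, GROUPED_QUERY and run_grouped_query are hypothetical names. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question, using the pre-edit column name period_number.
DT5 = [(1, 555, 40, 40), (1, 333, 25, 20), (2, 555, 40, 40), (2, 333, 25, 20),
       (3, 555, 40, 40), (3, 333, 25, 20), (4, 555, 40, 39), (4, 333, 25, 24)]
DT4 = [(4, 555, 30), (4, 333, 20), (5, 555, 30), (5, 333, 20),
       (6, 555, 30), (6, 333, 20)]

# UNION ALL stacks both tables (zeros stand in for the columns a table lacks);
# GROUP BY + SUM folds each (Period, RC) pair back into one row.
GROUPED_QUERY = """
SELECT f1.Period, f1.RC,
       SUM(f1.ActCap)   AS ActCap,
       SUM(f1.PlanTime) AS PlanTime,
       SUM(f1.ActTime)  AS ActTime
FROM (SELECT period_number AS Period, resource_code AS RC,
             capacity AS ActCap, 0 AS PlanTime, time_reported AS ActTime
      FROM dt5
      UNION ALL
      SELECT period_number, resource_code, 0, planned_hours, 0
      FROM dt4) AS f1
GROUP BY f1.Period, f1.RC
ORDER BY f1.Period, f1.RC DESC
"""


def run_grouped_query():
    """Run GROUPED_QUERY against an in-memory copy of the sample tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dt5 (period_number INT, Resource_code INT, capacity INT, time_reported INT)")
    con.execute("CREATE TABLE dt4 (period_number INT, Resource_code INT, planned_hours INT)")
    con.executemany("INSERT INTO dt5 VALUES (?, ?, ?, ?)", DT5)
    con.executemany("INSERT INTO dt4 VALUES (?, ?, ?)", DT4)
    rows = con.execute(GROUPED_QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_grouped_query():
        print(row)
```

Week 4 should collapse to one row per resource code carrying capacity, planned hours and reported hours together. Note that this layout has a single RC column rather than the two Resource_code columns the question asks for, weeks with no data in one table show 0 rather than NULL, and weeks 7 to 10 do not appear because gtd is not used.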

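I don't think you need the gtd table. Please try the query below and see whether it works for you, and correct me if I have misunderstood your requirements.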

SELECT COALESCE(dt5.period_number, dt4.period_number) AS period_number,
    dt5.Resource_code,
    dt5.capacity,
    dt5.time_reported,
    dt4.Resource_code,
    dt4.planned_hours
FROM dt5
FULL OUTER JOIN (
    SELECT *
    FROM dt4 a
    WHERE NOT EXISTS (
            SELECT 1
            FROM dt5 b
            WHERE b.period_number = a.period_number
                AND b.Resource_code = a.Resource_code
            )
    ) dt4
    ON dt5.period_number = dt4.period_number
        AND dt4.Resource_code = dt5.Resource_code
ORDER BY COALESCE(dt5.period_number, dt4.period_number) ASC

Test data

;WITH cte_dt5(period_number,Resource_code,capacity,time_reported) AS 
(
SELECT 1, 555, 40, 40 UNION ALL
SELECT 1, 333, 25, 20 UNION ALL
SELECT 2, 555, 40, 40 UNION ALL
SELECT 2, 333, 25, 20 UNION ALL
SELECT 3, 555, 40, 40 UNION ALL
SELECT 3, 333, 25, 20 UNION ALL
SELECT 4, 555, 40, 39 UNION ALL
SELECT 4, 333, 25, 24
)
,cte_dt4 (period_number, Resource_code, planned_hours) AS
(
SELECT 4, 555, 30 UNION ALL
SELECT 4, 333, 20 UNION ALL
SELECT 5, 555, 30 UNION ALL
SELECT 5, 333, 20 UNION ALL
SELECT 6, 555, 30 UNION ALL
SELECT 6, 333, 20
)
SELECT COALESCE(dt5.period_number, dt4.period_number) AS period_number,
    dt5.Resource_code,
    dt5.capacity,
    dt5.time_reported,
    dt4.Resource_code,
    dt4.planned_hours
FROM cte_dt5 AS dt5
FULL OUTER JOIN (
    SELECT *
    FROM cte_dt4 a
    WHERE NOT EXISTS (
            SELECT 1
            FROM cte_dt5 b
            WHERE b.period_number = a.period_number
                AND b.Resource_code = a.Resource_code
            )
    ) dt4
    ON dt5.period_number = dt4.period_number
        AND dt4.Resource_code = dt5.Resource_code
ORDER BY COALESCE(dt5.period_number, dt4.period_number) ASC

Result

+---------------------------------------------------------------------------------+
|period_number|Resource_code|capacity   |time_reported|Resource_code|planned_hours|
+-------------|-------------|-----------|-------------|-------------|-------------+
|1            |555          |40         |40           |NULL         |NULL         |
|1            |333          |25         |20           |NULL         |NULL         |
|2            |555          |40         |40           |NULL         |NULL         |
|2            |333          |25         |20           |NULL         |NULL         |
|3            |555          |40         |40           |NULL         |NULL         |
|3            |333          |25         |20           |NULL         |NULL         |
|4            |555          |40         |39           |NULL         |NULL         |
|4            |333          |25         |24           |NULL         |NULL         |
|5            |NULL         |NULL       |NULL         |333          |20           |
|5            |NULL         |NULL       |NULL         |555          |30           |
|6            |NULL         |NULL       |NULL         |333          |20           |
|6            |NULL         |NULL       |NULL         |555          |30           |
+---------------------------------------------------------------------------------+
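The NULL planned hours in week 4 above come from the NOT EXISTS filter: it removes every dt4 row that already has a dt5 row with the same period and resource code, and in the sample data those are exactly the week 4 rows, so nothing is left for the FULL OUTER JOIN to pair them with. The sketch below, assuming SQLite (via Python's sqlite3) as a stand-in for SQL Server, runs only the derived table the answer aliases as dt4, once with and once without the filter. DT5, DT4, DERIVED_DT4, NOT_EXISTS_FILTER and derived_dt4 are hypothetical names.

```python
import sqlite3

# Sample rows from the answer's cte_dt5 / cte_dt4 (column name period_number).
DT5 = [(1, 555, 40, 40), (1, 333, 25, 20), (2, 555, 40, 40), (2, 333, 25, 20),
       (3, 555, 40, 40), (3, 333, 25, 20), (4, 555, 40, 39), (4, 333, 25, 24)]
DT4 = [(4, 555, 30), (4, 333, 20), (5, 555, 30), (5, 333, 20),
       (6, 555, 30), (6, 333, 20)]

# The answer's derived table "dt4"; {where} is either empty or the NOT EXISTS filter.
DERIVED_DT4 = """
SELECT a.period_number, a.Resource_code, a.planned_hours
FROM dt4 a
{where}
ORDER BY a.period_number, a.Resource_code DESC
"""
NOT_EXISTS_FILTER = """
WHERE NOT EXISTS (SELECT 1 FROM dt5 b
                  WHERE b.period_number = a.period_number
                    AND b.Resource_code = a.Resource_code)
"""


def derived_dt4(apply_filter):
    """Return the rows of the derived dt4 table, with or without NOT EXISTS."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dt5 (period_number INT, Resource_code INT, capacity INT, time_reported INT)")
    con.execute("CREATE TABLE dt4 (period_number INT, Resource_code INT, planned_hours INT)")
    con.executemany("INSERT INTO dt5 VALUES (?, ?, ?, ?)", DT5)
    con.executemany("INSERT INTO dt4 VALUES (?, ?, ?)", DT4)
    sql = DERIVED_DT4.format(where=NOT_EXISTS_FILTER if apply_filter else "")
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print("with NOT EXISTS:   ", derived_dt4(True))
    print("without NOT EXISTS:", derived_dt4(False))
```

With the filter, only the week 5 and week 6 rows survive; without it, the week 4 rows are kept and can then match dt5 in the join.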

Edit: code changed at the OP's request (comment below). Commenting out the NOT EXISTS clause gives the desired result.

user7571220: Thank you for your help! Everything is correct except for the planned hours and resource code (which come from dt4) in week 4. I am trying to include data from both tables in the week that they overlap (week 4). I'm essentially trying to get the data for week 4 to look like the comments I've posted below.

| 4 | 555 | 40 | 39 | 555 | 30 |

| 4 | 333 | 25 | 24 | 333 | 20 |

SELECT COALESCE(dt5.period_number, dt4.period_number) AS period_number,
        dt5.Resource_code,
        dt5.capacity,
        dt5.time_reported,
        dt4.Resource_code,
        dt4.planned_hours
    FROM cte_dt5 AS dt5
    FULL OUTER JOIN (
        SELECT *
        FROM cte_dt4 a
        --WHERE NOT EXISTS (
        --        SELECT 1
        --        FROM cte_dt5 b
        --        WHERE b.period_number = a.period_number
        --            AND b.Resource_code = a.Resource_code
        --        )
        ) dt4
        ON dt5.period_number = dt4.period_number
            AND dt4.Resource_code = dt5.Resource_code
    ORDER BY COALESCE(dt5.period_number, dt4.period_number) ASC

Result

+---------------------------------------------------------------------------------+
|period_number|Resource_code|capacity   |time_reported|Resource_code|planned_hours|
+-------------|-------------|-----------|-------------|-------------|-------------+
|1            |555          |40         |40           |NULL         |NULL         |
|1            |333          |25         |20           |NULL         |NULL         |
|2            |555          |40         |40           |NULL         |NULL         |
|2            |333          |25         |20           |NULL         |NULL         |
|3            |555          |40         |40           |NULL         |NULL         |
|3            |333          |25         |20           |NULL         |NULL         |
|4            |555          |40         |39           |555          |30           |
|4            |333          |25         |24           |333          |20           |
|5            |NULL         |NULL       |NULL         |333          |20           |
|5            |NULL         |NULL       |NULL         |555          |30           |
|6            |NULL         |NULL       |NULL         |333          |20           |
|6            |NULL         |NULL       |NULL         |555          |30           |
+---------------------------------------------------------------------------------+
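The result above stops at week 6, while the question's expected output also lists weeks 7 to 10 with all NULLs; those rows can only come from gtd. One way to get them back, sketched here under the assumption that the full-join result is simply hung off gtd with a LEFT JOIN: in SQL Server this would be along the lines of `gtd LEFT JOIN (dt5 FULL OUTER JOIN dt4 ON ...) ON gtd.id = COALESCE(dt5.period_number, dt4.period_number)`. Because SQLite only gained FULL OUTER JOIN in version 3.39, the sketch below (SQLite via Python's sqlite3, standing in for SQL Server) spells the full join out as a LEFT JOIN plus the right-only rows found with NOT EXISTS. DT5, DT4, GTD, FULL_JOIN_WITH_GTD and run_full_join_with_gtd are hypothetical names.

```python
import sqlite3

# Sample rows from the answer's CTEs, plus the question's gtd week list.
DT5 = [(1, 555, 40, 40), (1, 333, 25, 20), (2, 555, 40, 40), (2, 333, 25, 20),
       (3, 555, 40, 40), (3, 333, 25, 20), (4, 555, 40, 39), (4, 333, 25, 24)]
DT4 = [(4, 555, 30), (4, 333, 20), (5, 555, 30), (5, 333, 20),
       (6, 555, 30), (6, 333, 20)]
GTD = [(week, 20 * week - 10, 20 * week) for week in range(1, 11)]

FULL_JOIN_WITH_GTD = """
SELECT gtd.id, fj.rc5, fj.capacity, fj.time_reported, fj.rc4, fj.planned_hours
FROM gtd
LEFT JOIN (
    -- left half: every dt5 row plus its dt4 partner, if any
    SELECT dt5.period_number AS period_number, dt5.Resource_code AS rc5,
           dt5.capacity AS capacity, dt5.time_reported AS time_reported,
           dt4.Resource_code AS rc4, dt4.planned_hours AS planned_hours
    FROM dt5
    LEFT JOIN dt4 ON dt4.period_number = dt5.period_number
                 AND dt4.Resource_code = dt5.Resource_code
    UNION ALL
    -- right-only half: dt4 rows with no dt5 partner
    SELECT a.period_number, NULL, NULL, NULL, a.Resource_code, a.planned_hours
    FROM dt4 a
    WHERE NOT EXISTS (SELECT 1 FROM dt5 b
                      WHERE b.period_number = a.period_number
                        AND b.Resource_code = a.Resource_code)
) AS fj ON fj.period_number = gtd.id
ORDER BY gtd.id, COALESCE(fj.rc5, fj.rc4) DESC
"""


def run_full_join_with_gtd():
    """Run FULL_JOIN_WITH_GTD against an in-memory copy of the sample tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dt5 (period_number INT, Resource_code INT, capacity INT, time_reported INT)")
    con.execute("CREATE TABLE dt4 (period_number INT, Resource_code INT, planned_hours INT)")
    con.execute("CREATE TABLE gtd (id INT, start_date INT, end_date INT)")
    con.executemany("INSERT INTO dt5 VALUES (?, ?, ?, ?)", DT5)
    con.executemany("INSERT INTO dt4 VALUES (?, ?, ?)", DT4)
    con.executemany("INSERT INTO gtd VALUES (?, ?, ?)", GTD)
    rows = con.execute(FULL_JOIN_WITH_GTD).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_full_join_with_gtd():
        print(row)
```

Here the NOT EXISTS check does the job a FULL OUTER JOIN would do for the right side, rather than filtering dt4 before the join, so week 4 keeps its dt4 values and weeks 7 to 10 appear as all-NULL rows.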