JOIN Tables based on Service Date

I have 2 tables (History and Responsible). They need to be joined based on the service date.

History table:

Id  ServiceDate  Hours  ClientId  ClientName
1   2021-10-15   3      123       Tom Holland
2   2021-10-25   5      123       Tom Holland
3   2022-01-14   2      123       Tom Holland

Responsible table:

2999-12-31 means the responsible person has no end date (current).

ClientId  ClientName   ResponsibleId  ResponsibleName  ResponsibleStartDate  ResponsibleEndtDate
123       Tom Holland  77             Thomas Anderson  2020-09-17            2021-10-17
123       Tom Holland  88             Tom Cruise       2021-10-18            2999-12-31
123       Tom Holland  99             Sten Lee         2022-01-07            2999-12-31

My code produces multiple rows because the 2022-01-14 service date falls within more than one date range in the Responsible table:

SELECT h.Id, 
       h.ServiceDate, 
       h.Hours, 
       h.ClientId, 
       h.ClientName, 
       r.ResponsibleName
FROM History AS h
LEFT JOIN Responsible AS r
   ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)

The output of the query above is:

Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Sten Lee

Technically the output is correct (2022-01-14 falls both between 2021-10-18 and 2999-12-31 and between 2022-01-07 and 2999-12-31), but it is not what I need.
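To reproduce the duplicate outside SQL Server, here is a minimal sketch that rebuilds the two tables in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is a stand-in assumption, not the poster's engine; dates are stored as ISO-8601 text, which compares correctly with `BETWEEN` for this data.

```python
import sqlite3

# Stand-in for the question's SQL Server tables (assumption: SQLite in memory).
# ISO-8601 date strings sort and compare correctly as plain text.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE History (Id INTEGER, ServiceDate TEXT, Hours INTEGER,
                          ClientId INTEGER, ClientName TEXT);
    CREATE TABLE Responsible (ClientId INTEGER, ClientName TEXT,
                              ResponsibleId INTEGER, ResponsibleName TEXT,
                              ResponsibleStartDate TEXT, ResponsibleEndtDate TEXT);
    INSERT INTO History VALUES
        (1, '2021-10-15', 3, 123, 'Tom Holland'),
        (2, '2021-10-25', 5, 123, 'Tom Holland'),
        (3, '2022-01-14', 2, 123, 'Tom Holland');
    INSERT INTO Responsible VALUES
        (123, 'Tom Holland', 77, 'Thomas Anderson', '2020-09-17', '2021-10-17'),
        (123, 'Tom Holland', 88, 'Tom Cruise',      '2021-10-18', '2999-12-31'),
        (123, 'Tom Holland', 99, 'Sten Lee',        '2022-01-07', '2999-12-31');
""")

# The original range join: every range that contains ServiceDate yields a row.
rows = con.execute("""
    SELECT h.Id, r.ResponsibleName
    FROM History AS h
    LEFT JOIN Responsible AS r
      ON h.ClientId = r.ClientId
     AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    ORDER BY h.Id, r.ResponsibleName
""").fetchall()

# Id 3 appears twice because 2022-01-14 lies inside both open-ended ranges.
matches_for_id_3 = [name for row_id, name in rows if row_id == 3]
print(rows)
```

Printing `rows` gives `[(1, 'Thomas Anderson'), (2, 'Tom Cruise'), (3, 'Sten Lee'), (3, 'Tom Cruise')]`: the two ranges ending in 2999-12-31 overlap from 2022-01-07 onwards, so any service on or after that date matches both.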

I wonder whether it is possible to achieve two different outputs:

1) If the service date falls within multiple date ranges in the Responsible table, the responsible person should be the one whose ResponsibleStartDate is closest to the ServiceDate:

Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Sten Lee

2) Keep all the rows when the service date falls within multiple date ranges in the Responsible table, but split the Hours evenly among the responsible people:

Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   1      123       Tom Holland  Tom Cruise
3   2022-01-14   1      123       Tom Holland  Sten Lee

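For the first, we can use a window function to assign a row number based on how close ResponsibleStartDate is to ServiceDate, and then select only the first row per h.Id. If there is a tie, we can break it by adding something that gives a deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.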

;WITH cte AS 
(
  SELECT h.Id, 
       h.ServiceDate, 
       h.Hours, 
       h.ClientId, 
       h.ClientName, 
       r.ResponsibleName, 
       RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER
         (PARTITION BY h.Id
          ORDER BY ABS(DATEDIFF(DAY, r.ResponsibleStartDate, h.ServiceDate)),
                   r.ResponsibleName) -- tie-breaker for a deterministic order
  FROM dbo.History AS h
  LEFT JOIN dbo.Responsible AS r
     ON (h.ClientId = r.ClientId 
     AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte WHERE RankOrderedByProximityToServiceDate = 1;

Output:

Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Sten Lee
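The ROW_NUMBER approach can be checked against the sample data with a SQLite stand-in (an assumption, not SQL Server; window functions need SQLite 3.25 or later). SQLite has no DATEDIFF, so this sketch uses the difference of `julianday()` values as the day count, and it includes ResponsibleName as the tie-breaker suggested above.

```python
import sqlite3

# Stand-in for the question's tables (assumption: SQLite in memory).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE History (Id INTEGER, ServiceDate TEXT, Hours INTEGER,
                          ClientId INTEGER, ClientName TEXT);
    CREATE TABLE Responsible (ClientId INTEGER, ClientName TEXT,
                              ResponsibleId INTEGER, ResponsibleName TEXT,
                              ResponsibleStartDate TEXT, ResponsibleEndtDate TEXT);
    INSERT INTO History VALUES
        (1, '2021-10-15', 3, 123, 'Tom Holland'),
        (2, '2021-10-25', 5, 123, 'Tom Holland'),
        (3, '2022-01-14', 2, 123, 'Tom Holland');
    INSERT INTO Responsible VALUES
        (123, 'Tom Holland', 77, 'Thomas Anderson', '2020-09-17', '2021-10-17'),
        (123, 'Tom Holland', 88, 'Tom Cruise',      '2021-10-18', '2999-12-31'),
        (123, 'Tom Holland', 99, 'Sten Lee',        '2022-01-07', '2999-12-31');
""")

best = con.execute("""
    WITH cte AS (
        SELECT h.Id, h.ServiceDate, h.Hours, r.ResponsibleName,
               ROW_NUMBER() OVER (
                   PARTITION BY h.Id
                   -- julianday() difference stands in for DATEDIFF(DAY, ...)
                   ORDER BY ABS(julianday(h.ServiceDate)
                                - julianday(r.ResponsibleStartDate)),
                            r.ResponsibleName
               ) AS rn
        FROM History AS h
        LEFT JOIN Responsible AS r
          ON h.ClientId = r.ClientId
         AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    )
    SELECT Id, ServiceDate, Hours, ResponsibleName
    FROM cte
    WHERE rn = 1
    ORDER BY Id
""").fetchall()
print(best)
```

For Id 3, Sten Lee's start date is 7 days before the service and Tom Cruise's is 88 days before, so Sten Lee gets row number 1. Because the join only keeps ranges that contain ServiceDate, every matched start date is on or before it, so "closest" here is the same as "latest start date".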

The second doesn't need a CTE: we can simply divide the Hours from h by the number of rows for each h.Id, then limit the result to 2 decimal places:

SELECT h.Id, 
       h.ServiceDate,
       Hours = CONVERT(decimal(11,2), 
         h.Hours * 1.0
         / COUNT(h.Id) OVER (PARTITION BY h.Id)),
       h.ClientId, 
       h.ClientName, 
       r.ResponsibleName
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
   ON (h.ClientId = r.ClientId 
   AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);

Output:

Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3.00   123       Tom Holland  Thomas Anderson
2   2021-10-25   5.00   123       Tom Holland  Tom Cruise
3   2022-01-14   1.00   123       Tom Holland  Tom Cruise
3   2022-01-14   1.00   123       Tom Holland  Sten Lee
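A minimal sketch of the same split against a SQLite stand-in (an assumption, not SQL Server): `ROUND(..., 2)` plays the role of `CONVERT(decimal(11,2), ...)`, and the sketch also checks that the total number of hours is unchanged for this data.

```python
import sqlite3

# Stand-in for the question's tables (assumption: SQLite in memory).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE History (Id INTEGER, ServiceDate TEXT, Hours INTEGER,
                          ClientId INTEGER, ClientName TEXT);
    CREATE TABLE Responsible (ClientId INTEGER, ClientName TEXT,
                              ResponsibleId INTEGER, ResponsibleName TEXT,
                              ResponsibleStartDate TEXT, ResponsibleEndtDate TEXT);
    INSERT INTO History VALUES
        (1, '2021-10-15', 3, 123, 'Tom Holland'),
        (2, '2021-10-25', 5, 123, 'Tom Holland'),
        (3, '2022-01-14', 2, 123, 'Tom Holland');
    INSERT INTO Responsible VALUES
        (123, 'Tom Holland', 77, 'Thomas Anderson', '2020-09-17', '2021-10-17'),
        (123, 'Tom Holland', 88, 'Tom Cruise',      '2021-10-18', '2999-12-31'),
        (123, 'Tom Holland', 99, 'Sten Lee',        '2022-01-07', '2999-12-31');
""")

# Divide each service's Hours by the number of joined rows for that Id.
split = con.execute("""
    SELECT h.Id, h.ServiceDate,
           ROUND(h.Hours * 1.0 / COUNT(h.Id) OVER (PARTITION BY h.Id), 2) AS Hours,
           r.ResponsibleName
    FROM History AS h
    LEFT JOIN Responsible AS r
      ON h.ClientId = r.ClientId
     AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    ORDER BY h.Id, r.ResponsibleName
""").fetchall()

# Compare the hours before and after the split.
total_before = con.execute("SELECT SUM(Hours) FROM History").fetchone()[0]
total_after = sum(hours for _, _, hours, _ in split)
print(split, total_before, total_after)
```

Rounding each share separately means the parts do not always add back up exactly: 5 hours split three ways gives 1.67 three times, i.e. 5.01. If the totals must reconcile, the remainder has to be assigned to one of the rows explicitly.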

Both are demonstrated in this db<>fiddle.

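My attempt at part 1. It doesn't work if more than one responsible person has the same start date. (It is also written for a dialect that supports JOIN ... USING, which SQL Server does not.)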

WITH
"all_services" AS (
    SELECT
        h.Id, 
        h.ServiceDate, 
        h.Hours, 
        h.ClientId, 
        h.ClientName, 
        r.ResponsibleName,
        r.ResponsibleStartDate
    FROM History AS h
    LEFT JOIN Responsible AS r
           ON h.ClientId = r.ClientId
          AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
"most_recent_key" AS (
    SELECT
        ServiceDate,
        ClientId,
        MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
    FROM all_services
    GROUP BY ServiceDate, ClientId
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM all_services
INNER JOIN most_recent_key
     USING (ServiceDate, ClientId, ResponsibleStartDate)

Posting it anyway, as a contrast to Aaron's better solution and as a learning point for myself.
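The start-date tie problem described above can be shown with a SQLite stand-in (an assumption; SQLite accepts USING, unlike SQL Server). The sketch adds one hypothetical responsible, "Hypothetical Co-Lead" (ResponsibleId 100), who starts on the same day as Sten Lee. The MAX-based key then matches both people for Id 3, while ROW_NUMBER with a name tie-breaker keeps exactly one.

```python
import sqlite3

# Stand-in tables plus ONE HYPOTHETICAL row that ties with Sten Lee's start date.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE History (Id INTEGER, ServiceDate TEXT, Hours INTEGER,
                          ClientId INTEGER, ClientName TEXT);
    CREATE TABLE Responsible (ClientId INTEGER, ClientName TEXT,
                              ResponsibleId INTEGER, ResponsibleName TEXT,
                              ResponsibleStartDate TEXT, ResponsibleEndtDate TEXT);
    INSERT INTO History VALUES
        (1, '2021-10-15', 3, 123, 'Tom Holland'),
        (2, '2021-10-25', 5, 123, 'Tom Holland'),
        (3, '2022-01-14', 2, 123, 'Tom Holland');
    INSERT INTO Responsible VALUES
        (123, 'Tom Holland', 77,  'Thomas Anderson',      '2020-09-17', '2021-10-17'),
        (123, 'Tom Holland', 88,  'Tom Cruise',           '2021-10-18', '2999-12-31'),
        (123, 'Tom Holland', 99,  'Sten Lee',             '2022-01-07', '2999-12-31'),
        (123, 'Tom Holland', 100, 'Hypothetical Co-Lead', '2022-01-07', '2999-12-31');
""")

# The MAX(ResponsibleStartDate) attempt: a tie on the max date keeps both rows.
max_attempt = con.execute("""
    WITH all_services AS (
        SELECT h.Id, h.ServiceDate, h.Hours, h.ClientId, h.ClientName,
               r.ResponsibleName, r.ResponsibleStartDate
        FROM History AS h
        LEFT JOIN Responsible AS r
          ON h.ClientId = r.ClientId
         AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    ),
    most_recent_key AS (
        SELECT ServiceDate, ClientId, MAX(ResponsibleStartDate) AS ResponsibleStartDate
        FROM all_services
        GROUP BY ServiceDate, ClientId
    )
    SELECT Id, ResponsibleName
    FROM all_services
    INNER JOIN most_recent_key USING (ServiceDate, ClientId, ResponsibleStartDate)
    WHERE Id = 3
    ORDER BY ResponsibleName
""").fetchall()

# ROW_NUMBER with a tie-breaker: always exactly one row per Id.
row_number = con.execute("""
    WITH cte AS (
        SELECT h.Id, r.ResponsibleName,
               ROW_NUMBER() OVER (
                   PARTITION BY h.Id
                   ORDER BY ABS(julianday(h.ServiceDate)
                                - julianday(r.ResponsibleStartDate)),
                            r.ResponsibleName
               ) AS rn
        FROM History AS h
        LEFT JOIN Responsible AS r
          ON h.ClientId = r.ClientId
         AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    )
    SELECT Id, ResponsibleName FROM cte WHERE rn = 1 AND Id = 3
""").fetchall()
print(max_attempt, row_number)
```

The tie-breaker only makes the choice deterministic; picking alphabetically is arbitrary, so a real business rule (for example, the lowest ResponsibleId) may be a better second sort key.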