JOIN Tables based on Service Date
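I have 2 tables (History and Responsible). They need to be joined based on ServiceDate.

History table:

Id | ServiceDate | Hours | ClientId | ClientName |
---|---|---|---|---|
1 | 2021-10-15 | 3 | 123 | Tom Holland |
2 | 2021-10-25 | 5 | 123 | Tom Holland |
3 | 2022-01-14 | 2 | 123 | Tom Holland |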

Responsible table:

2999-12-31 means the Responsible has no end date (current).

ClientId | ClientName | ResponsibleId | ResponsibleName | ResponsibleStartDate | ResponsibleEndtDate |
---|---|---|---|---|---|
123 | Tom Holland | 77 | Thomas Anderson | 2020-09-17 | 2021-10-17 |
123 | Tom Holland | 88 | Tom Cruise | 2021-10-18 | 2999-12-31 |
123 | Tom Holland | 99 | Sten Lee | 2022-01-07 | 2999-12-31 |

My code generates multiple rows because the service date 2022-01-14 falls within multiple date ranges in the Responsible table:

    SELECT h.Id,
           h.ServiceDate,
           h.Hours,
           h.ClientId,
           h.ClientName,
           r.ResponsibleName
    FROM History AS h
    LEFT JOIN Responsible AS r
      ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)

The output of the above query is:

Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
---|---|---|---|---|---|
1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 2 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 2 | 123 | Tom Holland | Sten Lee |
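
Technically, the output is correct (because 2022-01-14 is between 2021-10-18 and 2999-12-31 and also between 2022-01-07 and 2999-12-31), but it is not what I need.

I would like to know whether it is possible to achieve 2 outputs:

1) If the service date falls within multiple date ranges in the Responsible table, the Responsible should be the one whose ResponsibleStartDate is closer to the ServiceDate:

Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
---|---|---|---|---|---|
1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 2 | 123 | Tom Holland | Sten Lee |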
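
2) Keep all rows if the service date falls within multiple date ranges in the Responsible table, but split the Hours evenly among the Responsibles:

Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
---|---|---|---|---|---|
1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 1 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 1 | 123 | Tom Holland | Sten Lee |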
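
For the first one, we can use a window function to apply a row number based on how close ResponsibleStartDate is to ServiceDate, and then pick only the first row for each h.Id. If there are ties, we can break them by adding something that gives a deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.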

    ;WITH cte AS
    (
      SELECT h.Id,
             h.ServiceDate,
             h.Hours,
             h.ClientId,
             h.ClientName,
             r.ResponsibleName,
             RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER
               (PARTITION BY h.Id
                ORDER BY ABS(DATEDIFF(DAY, ResponsibleStartDate, ServiceDate)))
      FROM dbo.History AS h
      LEFT JOIN dbo.Responsible AS r
        ON (h.ClientId = r.ClientId
        AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
    )
    SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
    FROM cte WHERE RankOrderedByProximityToServiceDate = 1;

Output:

Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
---|---|---|---|---|---|
1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 2 | 123 | Tom Holland | Sten Lee |
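
The tie-breaker mentioned above could be written out roughly as follows. This is only a sketch: it reuses the tables and columns from the question, shortens the row-number alias to rn (a name chosen here for illustration), and picks ResponsibleName as the secondary sort key, which is one possible choice and not a requirement.

```sql
-- Sketch: same query as above, with a secondary ORDER BY key so that
-- ties on date distance are resolved deterministically.
;WITH cte AS
(
  SELECT h.Id,
         h.ServiceDate,
         h.Hours,
         h.ClientId,
         h.ClientName,
         r.ResponsibleName,
         rn = ROW_NUMBER() OVER
           (PARTITION BY h.Id
            ORDER BY ABS(DATEDIFF(DAY, r.ResponsibleStartDate, h.ServiceDate)),
                     r.ResponsibleName)  -- tie-breaker (assumed choice)
  FROM dbo.History AS h
  LEFT JOIN dbo.Responsible AS r
    ON h.ClientId = r.ClientId
   AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte
WHERE rn = 1;
```

With the sample data there are no ties, so the result is unchanged. The extra key only matters when two responsibles for the same client start on the same date.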
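
The second one doesn't need a CTE; we can simply divide the Hours from h by the number of rows for each h.Id, and then limit the result to 2 decimal places: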

    SELECT h.Id,
           h.ServiceDate,
           Hours = CONVERT(decimal(11,2),
                     h.Hours * 1.0
                     / COUNT(h.Id) OVER (PARTITION BY h.Id)),
           h.ClientId,
           h.ClientName,
           r.ResponsibleName
    FROM dbo.History AS h
    LEFT JOIN dbo.Responsible AS r
      ON (h.ClientId = r.ClientId
      AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);

Output:

Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
---|---|---|---|---|---|
1 | 2021-10-15 | 3.00 | 123 | Tom Holland | Thomas Anderson |
2 | 2021-10-25 | 5.00 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 1.00 | 123 | Tom Holland | Tom Cruise |
3 | 2022-01-14 | 1.00 | 123 | Tom Holland | Sten Lee |

Both are demonstrated in this db<>fiddle.
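
To try the queries locally, the sample data can be recreated with something like the script below. The column types and the primary key are assumptions, since the question does not state them; the table and column names, including the ResponsibleEndtDate spelling, are taken from the question.

```sql
-- Sketch of a setup script; data types are assumed, not given in the question.
CREATE TABLE dbo.History
(
  Id          int          NOT NULL PRIMARY KEY,
  ServiceDate date         NOT NULL,
  Hours       int          NOT NULL,
  ClientId    int          NOT NULL,
  ClientName  varchar(100) NOT NULL
);

CREATE TABLE dbo.Responsible
(
  ClientId             int          NOT NULL,
  ClientName           varchar(100) NOT NULL,
  ResponsibleId        int          NOT NULL,
  ResponsibleName      varchar(100) NOT NULL,
  ResponsibleStartDate date         NOT NULL,
  ResponsibleEndtDate  date         NOT NULL  -- spelled as in the question
);

INSERT INTO dbo.History (Id, ServiceDate, Hours, ClientId, ClientName)
VALUES (1, '20211015', 3, 123, 'Tom Holland'),
       (2, '20211025', 5, 123, 'Tom Holland'),
       (3, '20220114', 2, 123, 'Tom Holland');

INSERT INTO dbo.Responsible
  (ClientId, ClientName, ResponsibleId, ResponsibleName,
   ResponsibleStartDate, ResponsibleEndtDate)
VALUES (123, 'Tom Holland', 77, 'Thomas Anderson', '20200917', '20211017'),
       (123, 'Tom Holland', 88, 'Tom Cruise',      '20211018', '29991231'),
       (123, 'Tom Holland', 99, 'Sten Lee',        '20220107', '29991231');
```

Hours is declared as int here only because the sample values are whole numbers; the `* 1.0` in the second query is what keeps the division from being integer division in that case.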

My attempt at part 1. It doesn't work if multiple responsibles share the same start date.

    WITH
    "all_services" AS (
      SELECT
        h.Id,
        h.ServiceDate,
        h.Hours,
        h.ClientId,
        h.ClientName,
        r.ResponsibleName,
        r.ResponsibleStartDate
      FROM History AS h
      LEFT JOIN Responsible AS r
        ON h.ClientId = r.ClientId
        AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    ),
    "most_recent_key" AS (
      SELECT
        ServiceDate,
        ClientId,
        MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
      FROM all_services
      GROUP BY ServiceDate, ClientId
    )
    SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
    FROM all_services
    INNER JOIN most_recent_key
      USING (ServiceDate, ClientId, ResponsibleStartDate)

Posting it anyway, as a contrast to Aaron's better solution and as a learning point for myself.
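
One note on portability: the attempt above relies on JOIN ... USING, which SQL Server does not support. Assuming the target is SQL Server (the dbo. prefix and DATEDIFF(DAY, ...) in the other answer suggest so), the same logic might be sketched with an explicit ON clause as below. The aliases s and k are introduced here for illustration only.

```sql
-- Sketch: same MAX(start date) approach, written with ON instead of USING.
WITH all_services AS (
  SELECT h.Id,
         h.ServiceDate,
         h.Hours,
         h.ClientId,
         h.ClientName,
         r.ResponsibleName,
         r.ResponsibleStartDate
  FROM dbo.History AS h
  LEFT JOIN dbo.Responsible AS r
    ON h.ClientId = r.ClientId
   AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
most_recent_key AS (
  -- Latest matching start date per service date and client.
  SELECT ServiceDate,
         ClientId,
         MAX(ResponsibleStartDate) AS ResponsibleStartDate
  FROM all_services
  GROUP BY ServiceDate, ClientId
)
SELECT s.Id, s.ServiceDate, s.Hours, s.ClientId, s.ClientName, s.ResponsibleName
FROM all_services AS s
INNER JOIN most_recent_key AS k
  ON s.ServiceDate = k.ServiceDate
 AND s.ClientId = k.ClientId
 AND s.ResponsibleStartDate = k.ResponsibleStartDate;
```

It keeps the limitations of the original: several responsibles sharing the latest start date still produce several rows, and History rows that matched no Responsible are dropped, because their NULL start date never compares equal in the inner join.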