How to determine consecutive date count/days in SQL Server ("finding islands", consecutive rows)
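I have a student table and I want to know how long their sessions/training lasted.
I want to count consecutive days, excluding weekends.
A class has a Start Date and an End Date. For example, student ID S1 can book a class in January and then book again in February, and I want to know how many days the January booking and the February booking cover, excluding weekends. Basically, I am looking for consecutive dates from start date to end date per student ID, with no breaks other than weekends.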
SELECT
[ID]
,[StartDate]
,[EndDate]
,[BookingDays] AS Consecutive_Booking
FROM StudentBooking
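Student classification (Type): if a student has booked a class for 5 days, or 2 times, in the last 3 months (start date to end date, Monday to Friday), they are a Resident, otherwise a Visitor. Start dates and end dates are only recorded for Monday to Friday.
Please note that student ID 1 has consecutive dates, which should be counted as one block (02/01/2018-12/01/2018). The second block is 22/01-26/01.
I want to reproduce the table below.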
ID  StartDate   EndDate     Duration  Type
1   02/01/2018  05/01/2018                      ==> please note: continuous dates
1   08/01/2018  12/01/2018  9         Resident
1   22/01/2018  26/01/2018  5         Resident
2   23/01/2018  26/01/2018  4         Visitor
3   29/01/2018  31/01/2018  3         Visitor
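Here is my solution to your problem.
In the CTE "comparison", I join every record with itself and with all later records of the same student. This gives me a possible start of a consecutive training block (from the left side of the join) and a possible end of such a block (from the right side of the join).
Using the two CROSS APPLYs, I calculate 2 values:
- the workdays from the start of the first possible interval of the chain to the end of the last possible interval
- the workdays in the possible last interval of the chain.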
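As a quick check of the weekday arithmetic, here is a minimal sketch for a single interval. The variables @StartDate and @EndDate are made up for illustration; the formula itself is the one used in the queries below. It relies on day 0 (1900-01-01) being a Monday, so the first integer division by 7 counts the Sundays in the range and the same division shifted by one day counts the Saturdays.

```sql
-- Hypothetical single interval: 2 Jan 2018 to 12 Jan 2018, both inclusive.
DECLARE @StartDate date = '20180102',
        @EndDate   date = '20180112';

SELECT (d.e - d.s)                     -- calendar days in [s, e)
     - (d.e/7 - d.s/7)                 -- minus the Sundays
     - ((d.e+1)/7 - (d.s+1)/7)         -- minus the Saturdays
       AS Workdays
FROM (VALUES (
        DATEDIFF(day, 0, @StartDate),      -- s: day number of the first day
        1 + DATEDIFF(day, 0, @EndDate))    -- e: day number after the last day (exclusive)
     ) d (s, e);
-- Workdays = 9 (11 calendar days minus one Saturday and one Sunday)
```

The result of 9 matches the Duration expected in the question for student 1's first block.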
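From the second of these values (the workdays in the last interval), I build a running total of workdays over the possible start and end intervals, using a window function.
You tagged the question with "SQL 2012", so this window function should be available.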
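To make the running total concrete, here is a small sketch on made-up rows: the three bookings of student 1 from the question, with their workday counts entered by hand. The column names Workdays and RunningWorkdays are only for illustration.

```sql
-- Hypothetical rows: one line per booking, ordered by start date.
SELECT b.StartDate, b.Workdays
     , SUM(b.Workdays) OVER (
           ORDER BY b.StartDate
           ROWS UNBOUNDED PRECEDING) AS RunningWorkdays
FROM (VALUES
        (CAST('20180102' AS date), 4),
        (CAST('20180108' AS date), 5),
        (CAST('20180122' AS date), 5)
     ) b (StartDate, Workdays)
ORDER BY b.StartDate;
-- RunningWorkdays: 4, 9, 14
```

Between 2 Jan and 12 Jan there are 9 workdays, which equals the running total after the second booking, so the first two bookings form one block. Between 2 Jan and 26 Jan there are 19 workdays but the running total is only 14, so the third booking is not part of that block.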
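In the next CTE ("sorting"), I restrict the previous result to the rows where the running total equals the number of workdays between the first start date and the last end date. This way, only consecutive blocks remain. These are then numbered in two ways:
- consecutive blocks with the same EndDate are numbered by StartDate in ascending order
- consecutive blocks with the same StartDate are numbered by EndDate in descending order.
For each EndDate, I want the earliest StartDate, and for that StartDate, I only want the latest EndDate, so I filter for 1 in both numberings. Here it is: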
WITH
comparison (ID, StartDate, EndDate, TotalDays, SumSingleDays) AS (
    -- Pair every booking (bStart) with itself and all later bookings (bEnd) of the same student
    SELECT bStart.ID, bStart.StartDate, bEnd.EndDate, Workdays.Total
         , SUM(Workdays.Single) OVER (
               PARTITION BY bStart.ID, bStart.StartDate
               ORDER BY bEnd.StartDate
               ROWS UNBOUNDED PRECEDING)
    FROM StudentBookings bStart
    INNER JOIN StudentBookings bEnd
        ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
    -- Day numbers counted from day 0 (1900-01-01, a Monday); e2 is exclusive
    CROSS APPLY (VALUES (
        DATEDIFF(day, 0, bStart.StartDate),
        DATEDIFF(day, 0, bEnd.StartDate),
        1+DATEDIFF(day, 0, bEnd.EndDate))
    ) d (s1, s2, e2)
    -- Calendar days minus Sundays minus Saturdays
    CROSS APPLY (VALUES (
        (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
        (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
    ) Workdays (Total, Single)
),
sorting (ID, StartDate, EndDate, Duration, RowNumStart, RowNumEnd) AS (
    SELECT ID, StartDate, EndDate, TotalDays
         , ROW_NUMBER() OVER (PARTITION BY ID, EndDate ORDER BY StartDate)
         , ROW_NUMBER() OVER (PARTITION BY ID, StartDate ORDER BY EndDate DESC)
    FROM comparison
    -- Keep only chains without gaps
    WHERE TotalDays = SumSingleDays
)
SELECT ID, StartDate, EndDate, Duration
     , CASE WHEN Duration >= 5 THEN 'Resident' ELSE 'Visitor' END AS [Type]
FROM sorting
WHERE (RowNumStart = 1)
  AND (RowNumEnd = 1)
ORDER BY ID, StartDate;
Result:
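To reproduce it, the sample rows from the question can be loaded as sketched below. The table name matches the query above, but the column types (int, date) are assumptions, since the question does not show the table definition. The expected rows in the trailing comment were worked out by hand from the query logic and agree with the table in the question.

```sql
-- Assumed schema: the question does not show the actual column types.
CREATE TABLE StudentBookings (
    ID        int  NOT NULL,
    StartDate date NOT NULL,
    EndDate   date NOT NULL
);

-- Rows from the question (dd/mm/yyyy there, yyyymmdd literals here).
INSERT INTO StudentBookings (ID, StartDate, EndDate) VALUES
    (1, '20180102', '20180105'),
    (1, '20180108', '20180112'),
    (1, '20180122', '20180126'),
    (2, '20180123', '20180126'),
    (3, '20180129', '20180131');

-- Expected rows from the first query above:
-- ID  StartDate   EndDate     Duration  Type
-- 1   2018-01-02  2018-01-12  9         Resident
-- 1   2018-01-22  2018-01-26  5         Resident
-- 2   2018-01-23  2018-01-26  4         Visitor
-- 3   2018-01-29  2018-01-31  3         Visitor
```

Note that queries restricted to recent bookings via GETDATE() will return nothing for these 2018 dates unless the filter window is widened.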
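There may be a more elegant way to solve this using Itzik Ben-Gan's packing intervals solution. I will post it when I have figured it out.
Added:
In addition, I count the number of bookings in every booking block and sum them per student (ID) to make the final "Resident" decision. The bookings in the first CTE ("comparison") are restricted to the last 3 months: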
WITH
comparison (ID, StartDate, EndDate, TotalDays, CountBookings, SumSingleDays) AS (
    SELECT bStart.ID, bStart.StartDate, bEnd.EndDate, Workdays.Total
         -- Running number of bookings in the chain
         , COUNT(Workdays.Single) OVER (
               PARTITION BY bStart.ID, bStart.StartDate
               ORDER BY bEnd.StartDate
               ROWS UNBOUNDED PRECEDING)
         -- Running total of workdays in the chain
         , SUM(Workdays.Single) OVER (
               PARTITION BY bStart.ID, bStart.StartDate
               ORDER BY bEnd.StartDate
               ROWS UNBOUNDED PRECEDING)
    FROM StudentBookings bStart
    INNER JOIN StudentBookings bEnd
        ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
    CROSS APPLY (VALUES (
        DATEDIFF(day, 0, bStart.StartDate),
        DATEDIFF(day, 0, bEnd.StartDate),
        1+DATEDIFF(day, 0, bEnd.EndDate))
    ) d (s1, s2, e2)
    CROSS APPLY (VALUES (
        (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
        (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
    ) Workdays (Total, Single)
    -- Only bookings that started in the last 3 months
    WHERE bStart.StartDate >= DATEADD(month, -3, GETDATE())
),
sorting (ID, StartDate, EndDate, Duration, CountBookings, RowNumStart, RowNumEnd) AS (
    SELECT ID, StartDate, EndDate, TotalDays, CountBookings
         , ROW_NUMBER() OVER (PARTITION BY ID, EndDate ORDER BY StartDate)
         , ROW_NUMBER() OVER (PARTITION BY ID, StartDate ORDER BY EndDate DESC)
    FROM comparison
    WHERE TotalDays = SumSingleDays
),
counting (ID, StartDate, EndDate, Duration, Bookings) AS (
    SELECT ID, StartDate, EndDate, Duration
         -- Total number of bookings per student across all blocks
         , SUM(CountBookings) OVER (PARTITION BY ID)
    FROM sorting
    WHERE (RowNumStart = 1) AND (RowNumEnd = 1)
)
SELECT ID, StartDate, EndDate, Duration, Bookings
     , CASE
         WHEN Duration >= 5 OR Bookings >= 2 THEN 'Resident' ELSE 'Visitor'
       END AS [Type]
FROM counting
ORDER BY ID, StartDate;
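Filtering on the class reference:
The class reference is taken from the bStart table reference and filtered there. To be able to add this field to the final query, it must also be used in the join to the bEnd table reference, so that only booking intervals with the same ClassReference value are combined into blocks: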
WITH
comparison (ID, ClassReference, StartDate, EndDate, TotalDays, CountBookings, SumSingleDays) AS (
    SELECT bStart.ID, bStart.ClassReference, bStart.StartDate, bEnd.EndDate, Workdays.Total
         -- ClassReference is part of the partition so that bookings of different
         -- classes starting on the same day do not share a running total
         , COUNT(Workdays.Single) OVER (
               PARTITION BY bStart.ID, bStart.ClassReference, bStart.StartDate
               ORDER BY bEnd.StartDate
               ROWS UNBOUNDED PRECEDING)
         , SUM(Workdays.Single) OVER (
               PARTITION BY bStart.ID, bStart.ClassReference, bStart.StartDate
               ORDER BY bEnd.StartDate
               ROWS UNBOUNDED PRECEDING)
    FROM StudentBookings bStart
    INNER JOIN StudentBookings bEnd
        ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
        AND bStart.ClassReference = bEnd.ClassReference
    CROSS APPLY (VALUES (
        DATEDIFF(day, 0, bStart.StartDate),
        DATEDIFF(day, 0, bEnd.StartDate),
        1+DATEDIFF(day, 0, bEnd.EndDate))
    ) d (s1, s2, e2)
    CROSS APPLY (VALUES (
        (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
        (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
    ) Workdays (Total, Single)
    WHERE bStart.StartDate >= DATEADD(month, -3, GETDATE())
      AND bStart.ClassReference IN (N'C1', N'C2')
),
sorting (ID, ClassReference, StartDate, EndDate, Duration, CountBookings, RowNumStart, RowNumEnd) AS (
    SELECT ID, ClassReference, StartDate, EndDate, TotalDays, CountBookings
         , ROW_NUMBER() OVER (PARTITION BY ID, ClassReference, EndDate ORDER BY StartDate)
         , ROW_NUMBER() OVER (PARTITION BY ID, ClassReference, StartDate ORDER BY EndDate DESC)
    FROM comparison
    WHERE TotalDays = SumSingleDays
),
counting (ID, ClassReference, StartDate, EndDate, Duration, Bookings) AS (
    SELECT ID, ClassReference, StartDate, EndDate, Duration
         , SUM(CountBookings) OVER (PARTITION BY ID, ClassReference)
    FROM sorting
    WHERE (RowNumStart = 1) AND (RowNumEnd = 1)
)
SELECT ID, ClassReference, StartDate, EndDate, Duration, Bookings
     , CASE
         WHEN Duration >= 5 OR Bookings >= 2 THEN 'Resident' ELSE 'Visitor'
       END AS [Type]
FROM counting
ORDER BY ID, StartDate;
Tested with this data:
With a filter on the last 12 months, the query returns:
So student 1 is a "Resident" in class C2, but a "Visitor" in class C1.