Start date and end date calculation based on a date column
I am trying to calculate StartDate and EndDate from a date column in a table. The source table looks like this.
Scenario 1
ID | SERIAL_NUMBER | STATUS | READ_DT |
---|---|---|---|
123456789 | 42007 | D | 15-12-2021 |
123456789 | 42007 | D | 16-12-2021 |
123456789 | 42007 | D | 17-12-2021 |
123456789 | 42007 | D | 18-12-2021 |
123456789 | 42007 | D | 19-12-2021 |
123456789 | 42007 | D | 20-12-2021 |
123456789 | 42007 | D | 21-12-2021 |
I want to calculate start_date and end_date from READ_DT for each ID and SERIAL_NUMBER. If every READ_DT is present (no gaps), the output should be as follows:
ID | SERIAL_NUMBER | STATUS | Start_Date | End_Date |
---|---|---|---|---|
123456789 | 42007 | D | 15-12-2021 | 21-12-2021 |
Scenario 2
ID | SERIAL_NUMBER | STATUS | READ_DT |
---|---|---|---|
123456789 | 42007 | D | 15-12-2021 |
123456789 | 42007 | D | 16-12-2021 |
123456789 | 42007 | D | 17-12-2021 |
123456789 | 42007 | D | 19-12-2021 |
123456789 | 42007 | D | 20-12-2021 |
123456789 | 42007 | D | 21-12-2021 |
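If there is any gap between the READ_DT values, the expected output should be split into the following two rows, one per unbroken run of dates.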
ID | SERIAL_NUMBER | STATUS | Start_Date | End_Date |
---|---|---|---|---|
123456789 | 42007 | D | 15-12-2021 | 17-12-2021 |
123456789 | 42007 | D | 19-12-2021 | 21-12-2021 |
For Scenario 1, you can simply use the MIN and MAX aggregate functions and group by the remaining columns.
select ID,SERIAL_NUMBER, STATUS, convert(varchar, min(READ_DT), 105) as Start_Date, convert(varchar, max(READ_DT), 105) as End_Date
from tb1
group by ID,SERIAL_NUMBER, STATUS
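For Scenario 2, I used the LAG function to get the difference in days between the current row and the previous row, and then aggregated.
This code works for both the Scenario 1 and the Scenario 2 data.
Code: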
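drop table if exists #t1;

-- Store each row's day difference from the previous row (diff_day) and, where the
-- sequence breaks, the missing date just before the row (diff_read_dt).
-- Note: LAG is not partitioned, so this assumes tb2 holds one ID/SERIAL_NUMBER/STATUS group.
select READ_DT,
       case when datediff(day, lag(READ_DT) over (order by READ_DT), READ_DT) is null then 1
            else datediff(day, lag(READ_DT) over (order by READ_DT), READ_DT)
       end as diff_day,
       case when datediff(day, lag(READ_DT) over (order by READ_DT), READ_DT) > 1
            then dateadd(day, -1, READ_DT)
       end as diff_read_dt
into #t1
from tb2;

-- Bump diff_day for the rows after the first row that follows the gap, so that every
-- row after the gap shares the same diff_day value and can be grouped together.
-- Note: this relies on the data having at most one gap, and on that gap being a single
-- missing day; with several gaps the scalar subquery returns more than one row and the UPDATE fails.
update #t1
set diff_day = diff_day + 1
where convert(date, READ_DT) > (select dateadd(day, 1, convert(date, diff_read_dt))
                                from #t1
                                where diff_read_dt is not null);

-- Get the required results using MIN and MAX, grouping on diff_day as well.
select a.ID, a.SERIAL_NUMBER, a.STATUS,
       convert(varchar, min(a.READ_DT), 105) as Start_Date,
       convert(varchar, max(a.READ_DT), 105) as End_Date
from tb2 a
inner join #t1 b on convert(date, a.READ_DT) = convert(date, b.READ_DT)
group by a.ID, a.SERIAL_NUMBER, a.STATUS, b.diff_day;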
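The LAG idea can be stretched to any number of gaps, and to several ID/SERIAL_NUMBER/STATUS groups, by flagging the first row of each run and taking a running total of those flags. The following is only a minimal sketch of that variation, not the code from the answer above. The table variable `@reads`, its sample rows (the Scenario 2 data plus an extra isolated date, 25-12-2021) and the column names `is_start` and `run_no` are made up for illustration, and it assumes SQL Server 2012 or later with READ_DT stored as `date`.

```sql
-- Hypothetical sample data: Scenario 2 plus one more isolated date.
DECLARE @reads TABLE (ID int, SERIAL_NUMBER int, STATUS char(1), READ_DT date);

INSERT INTO @reads (ID, SERIAL_NUMBER, STATUS, READ_DT) VALUES
 (123456789, 42007, 'D', '20211215'),
 (123456789, 42007, 'D', '20211216'),
 (123456789, 42007, 'D', '20211217'),
 (123456789, 42007, 'D', '20211219'),
 (123456789, 42007, 'D', '20211220'),
 (123456789, 42007, 'D', '20211221'),
 (123456789, 42007, 'D', '20211225');

WITH flagged AS
(   -- is_start = 1 for the first row of a group and for any row that is more
    -- than one day after the previous row; 0 for rows that continue a run.
    SELECT ID, SERIAL_NUMBER, STATUS, READ_DT,
           is_start = CASE WHEN DATEDIFF(day,
                                         LAG(READ_DT) OVER (PARTITION BY ID, SERIAL_NUMBER, STATUS
                                                            ORDER BY READ_DT),
                                         READ_DT) <= 1
                           THEN 0 ELSE 1 END
    FROM @reads
),
numbered AS
(   -- The running total of the flags gives every run its own number.
    SELECT ID, SERIAL_NUMBER, STATUS, READ_DT,
           run_no = SUM(is_start) OVER (PARTITION BY ID, SERIAL_NUMBER, STATUS
                                        ORDER BY READ_DT
                                        ROWS UNBOUNDED PRECEDING)
    FROM flagged
)
SELECT ID, SERIAL_NUMBER, STATUS,
       Start_Date = CONVERT(varchar(10), MIN(READ_DT), 105),
       End_Date   = CONVERT(varchar(10), MAX(READ_DT), 105)
FROM numbered
GROUP BY ID, SERIAL_NUMBER, STATUS, run_no
ORDER BY ID, SERIAL_NUMBER, STATUS, MIN(READ_DT);
-- Returns three rows for the sample data:
--   15-12-2021 to 17-12-2021, 19-12-2021 to 21-12-2021, 25-12-2021 to 25-12-2021
```

Because the first row of each group has no previous row, LAG returns NULL there, the comparison is not true, and the row falls into the ELSE branch as a run start. No temp table or UPDATE step is needed.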
A little sequential date math can simplify things.
--===== This will work for either scenario
WITH cteDTgrp AS
(--==== Subtract an increasing number of days from each date to create the date groups.
SELECT *
,DT_Grp = DATEADD(dd,-ROW_NUMBER() OVER (PARTITION BY ID,SERIAL_NUMBER,STATUS ORDER BY READ_DT),READ_DT)
FROM dbo.YourTableNameHere
)--==== Then the grouping to get the start and end dates is trivial.
SELECT ID,SERIAL_NUMBER,STATUS
,Start_Date = MIN(READ_DT)
,End_Date = MAX(READ_DT)
FROM cteDTgrp
GROUP BY ID,SERIAL_NUMBER,STATUS,DT_Grp --<----This is the key!
ORDER BY ID,SERIAL_NUMBER,STATUS,Start_Date
;
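To see why subtracting the row number works, it can help to look at the intermediate values. The sketch below is illustrative only: the table variable `@reads` and the column alias `rn` are assumptions, the rows are the Scenario 2 data, and READ_DT is assumed to be stored as `date`.

```sql
-- Hypothetical copy of the Scenario 2 data.
DECLARE @reads TABLE (ID int, SERIAL_NUMBER int, STATUS char(1), READ_DT date);

INSERT INTO @reads (ID, SERIAL_NUMBER, STATUS, READ_DT) VALUES
 (123456789, 42007, 'D', '20211215'),
 (123456789, 42007, 'D', '20211216'),
 (123456789, 42007, 'D', '20211217'),
 (123456789, 42007, 'D', '20211219'),
 (123456789, 42007, 'D', '20211220'),
 (123456789, 42007, 'D', '20211221');

-- Show the row number next to the date it is subtracted from.
SELECT READ_DT,
       rn     = ROW_NUMBER() OVER (PARTITION BY ID, SERIAL_NUMBER, STATUS ORDER BY READ_DT),
       DT_Grp = DATEADD(dd,
                        -ROW_NUMBER() OVER (PARTITION BY ID, SERIAL_NUMBER, STATUS ORDER BY READ_DT),
                        READ_DT)
FROM @reads
ORDER BY READ_DT;
-- READ_DT      rn   DT_Grp
-- 2021-12-15    1   2021-12-14
-- 2021-12-16    2   2021-12-14
-- 2021-12-17    3   2021-12-14
-- 2021-12-19    4   2021-12-15
-- 2021-12-20    5   2021-12-15
-- 2021-12-21    6   2021-12-15
```

Within an unbroken run the date and the row number both go up by one per row, so their difference stays constant. Each gap pushes the date ahead of the row number, which moves DT_Grp to a new value. The DT_Grp date has no meaning of its own; it only serves as a grouping key.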
Note that this only works if READ_DT is unique within each ID, SERIAL_NUMBER, STATUS group.
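If duplicate READ_DT values are possible, one common workaround is to swap ROW_NUMBER for DENSE_RANK, which gives tied dates the same number so a repeated date does not shift the group. This is a sketch under assumptions, not part of the original answer: the table variable `@reads` and its rows (with 16-12-2021 deliberately duplicated) are hypothetical, and READ_DT is assumed to be a `date` with no time portion.

```sql
-- Hypothetical sample data with a duplicated date (16-12-2021) and one gap (18-12-2021).
DECLARE @reads TABLE (ID int, SERIAL_NUMBER int, STATUS char(1), READ_DT date);

INSERT INTO @reads (ID, SERIAL_NUMBER, STATUS, READ_DT) VALUES
 (123456789, 42007, 'D', '20211215'),
 (123456789, 42007, 'D', '20211216'),
 (123456789, 42007, 'D', '20211216'),
 (123456789, 42007, 'D', '20211217'),
 (123456789, 42007, 'D', '20211219'),
 (123456789, 42007, 'D', '20211220');

WITH cteDTgrp AS
(   -- DENSE_RANK numbers distinct dates 1, 2, 3, ... and repeats the number for ties.
    SELECT ID, SERIAL_NUMBER, STATUS, READ_DT,
           DT_Grp = DATEADD(dd,
                            -DENSE_RANK() OVER (PARTITION BY ID, SERIAL_NUMBER, STATUS ORDER BY READ_DT),
                            READ_DT)
    FROM @reads
)
SELECT ID, SERIAL_NUMBER, STATUS,
       Start_Date = MIN(READ_DT),
       End_Date   = MAX(READ_DT)
FROM cteDTgrp
GROUP BY ID, SERIAL_NUMBER, STATUS, DT_Grp
ORDER BY ID, SERIAL_NUMBER, STATUS, Start_Date;
-- Returns two rows for the sample data:
--   2021-12-15 to 2021-12-17 and 2021-12-19 to 2021-12-20
```

With ROW_NUMBER the second 16-12-2021 row would receive its own number and fall into a different DT_Grp, splitting the first run. If READ_DT carries a time portion, casting it to `date` before ranking would be needed for this to hold.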