Merging consecutive dates for a group together
I have a table of employees that looks like this:
Name | Department | Manager | Date |
---|---|---|---|
Employee 1 | Dept 1 | Manager X | 202101 |
Employee 1 | Dept 1 | Manager X | 202102 |
Employee 1 | Dept 2 | Manager X | 202103 |
Employee 1 | Dept 2 | Manager X | 202104 |
Employee 1 | Dept 1 | Manager X | 202105 |
Employee 1 | Dept 1 | Manager X | 202106 |
Employee 2 | Dept 1 | Manager X | 202101 |
Employee 2 | Dept 1 | Manager X | 202102 |
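To experiment with the queries in this thread, the sample rows above can be loaded with a short script. This is only a sketch: the table name `TestingTable` is taken from the query in the question, while the column types (`varchar(50)` for the text columns and an `int` holding `yyyymm` for the date) are assumptions, since the post does not state them.

```sql
-- Assumed schema: the post shows only the column names, not their types.
CREATE TABLE TestingTable (
    [Name]     varchar(50) NOT NULL,
    Department varchar(50) NOT NULL,
    Manager    varchar(50) NOT NULL,
    [Date]     int         NULL      -- year-month stored as yyyymm, e.g. 202101
);

-- The eight sample rows from the table above.
INSERT INTO TestingTable ([Name], Department, Manager, [Date])
VALUES
    ('Employee 1', 'Dept 1', 'Manager X', 202101),
    ('Employee 1', 'Dept 1', 'Manager X', 202102),
    ('Employee 1', 'Dept 2', 'Manager X', 202103),
    ('Employee 1', 'Dept 2', 'Manager X', 202104),
    ('Employee 1', 'Dept 1', 'Manager X', 202105),
    ('Employee 1', 'Dept 1', 'Manager X', 202106),
    ('Employee 2', 'Dept 1', 'Manager X', 202101),
    ('Employee 2', 'Dept 1', 'Manager X', 202102);
```

The answers refer to the table under other names (`t`, `employees`, `tbl`), so the `FROM` clause needs adjusting when running them against this script.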
I need to build a view that shows the data in the following format:
Name | Department | Manager | Valid_From | Valid_To |
---|---|---|---|---|
Employee 1 | Dept 1 | Manager X | 202101 | 202102 |
Employee 1 | Dept 2 | Manager X | 202103 | 202104 |
Employee 1 | Dept 1 | Manager X | 202105 | 999912 |
Employee 2 | Dept 1 | Manager X | 202101 | 999912 |
So far, my code looks like this:
WITH cte AS
(
SELECT [Name], Department, Manager, Valid_From = min([Date]), Valid_To = max([Date]),
RowNum = ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY max([Date]) DESC)
FROM TestingTable
WHERE ([Date] IS NOT NULL)
GROUP BY [Name], Department, Manager
)
SELECT [Name], Department, Manager, Valid_From,
CASE WHEN RowNum = 1 THEN 999912 ELSE Valid_To END AS Valid_To, CASE WHEN RowNum = 1 THEN 1 ELSE 0 END AS Is_Latest
FROM cte
The output is shown below. It merges both of Employee 1's stints in Dept 1 into a single interval, whereas I need them as two separate intervals.
Name | Department | Manager | Valid_From | Valid_To |
---|---|---|---|---|
Employee 1 | Dept 1 | Manager X | 202101 | 999912 |
Employee 1 | Dept 2 | Manager X | 202103 | 202104 |
Employee 2 | Dept 1 | Manager X | 202101 | 999912 |
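I tried a few lag and lead functions to compare the dates, but I got lost.
It looks like you need to group consecutive year-months for each employee-department-manager combination. You can do it like this: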
with cte1 as (
select name
, department
, manager
, datefromparts(date / 100, date % 100, 1) as yymm
from t
), cte2 as (
select *
, case when lag(yymm) over (partition by name, department, manager order by yymm) = dateadd(month, -1, yymm) then 0 else 1 end as new_grp
from cte1
), cte3 as (
select *
, sum(new_grp) over (partition by name, department, manager order by yymm) as grp_num
from cte2
)
select name
, department
, manager
, min(yymm) as valid_from
, max(yymm) as valid_to
from cte3
group by name, department, manager, grp_num
order by name, valid_from, department, manager
Note that I had to convert the year-months to dates to make them easier to compare. Result:
name | department | manager | valid_from | valid_to |
---|---|---|---|---|
Employee 1 | Dept 1 | Manager X | 2021-01-01 | 2021-02-01 |
Employee 1 | Dept 2 | Manager X | 2021-03-01 | 2021-04-01 |
Employee 1 | Dept 1 | Manager X | 2021-05-01 | 2021-06-01 |
Employee 2 | Dept 1 | Manager X | 2021-01-01 | 2021-02-01 |
Replacing each employee's last `valid_to` with `9999-12-01` is straightforward; for example, you can check whether `lead(valid_from) over (partition by name order by valid_from)` is null.
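The `lead` check described above could be applied roughly as follows. This is a sketch rather than the answerer's own code: it repeats the three CTEs from the query above, wraps the final aggregation in an extra CTE (named `islands` here purely for illustration), and assumes, as the original query does, a table `t` whose `date` column is an integer in `yyyymm` form.

```sql
with cte1 as (
    -- Turn the yyyymm integer into a real date (first day of the month).
    select name, department, manager,
           datefromparts(date / 100, date % 100, 1) as yymm
    from t
), cte2 as (
    -- Flag a row as the start of a new group unless the previous row
    -- for the same combination is exactly one month earlier.
    select *,
           case when lag(yymm) over (partition by name, department, manager order by yymm)
                     = dateadd(month, -1, yymm)
                then 0 else 1 end as new_grp
    from cte1
), cte3 as (
    -- A running sum of the flags gives each group its own number.
    select *,
           sum(new_grp) over (partition by name, department, manager order by yymm) as grp_num
    from cte2
), islands as (
    -- One row per consecutive run (hypothetical CTE name).
    select name, department, manager,
           min(yymm) as valid_from,
           max(yymm) as valid_to
    from cte3
    group by name, department, manager, grp_num
)
select name, department, manager, valid_from,
       -- No later interval for this employee means the interval is still open.
       case when lead(valid_from) over (partition by name order by valid_from) is null
            then datefromparts(9999, 12, 1)
            else valid_to
       end as valid_to
from islands
order by name, valid_from;
```

If the view should expose integers such as `999912` again instead of dates, `year(valid_to) * 100 + month(valid_to)` converts a date back to the `yyyymm` form.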
Try this:
with tbl1 as (select
[name]
,[department]
,[manager]
,[date]
,ROW_NUMBER () over (partition by name, department, manager order by [date]) as rownum
from employees)
select [name]
,[department]
,[manager]
,[date] as [valid_from]
,(select [date] from tbl1 t2
where t2.[rownum] = t1.[rownum] + 1
and t1.name = t2.name
and t1.department = t2.department
and t1.manager = t2.manager
) as [valid_to]
from tbl1 t1
where rownum % 2 = 1
order by name, valid_from
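It is missing the 999912 value because I don't understand the logic behind the replacement in the bottom two rows.
This looks like a kind of gaps-and-islands problem, with a twist in how the open-ended interval is detected: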
select distinct Name, Department, Manager,
min([date]) over(partition by Name, g) fromd,
max([date]) over(partition by Name, g) tod
from (
select *, sum(flag) over(partition by Name order by [date]) g
from (
select Name, Department, Manager,
case when lead(name) over(partition by Name order by [date]) is null then 999912 else [date] end [date],
case when department != lag(Department, 1, '') over(partition by Name order by [date])
or Manager != lag(Manager, 1, '') over(partition by Name order by [date])
then 1 else 0 end flag
from tbl
) t
) t
order by fromd
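One caveat with substituting 999912 before the grouping step: if an employee's most recent island consists of a single row, that row's date is overwritten, so `fromd` becomes 999912 as well. A possible variation, sketched under the same assumptions as the query above (a table named `tbl` with an integer `[date]` column and non-empty `Department` and `Manager` values), builds the islands first and applies the sentinel afterwards. The CTE names are illustrative only.

```sql
with flagged as (
    -- Mark a row as the start of a new island when the department
    -- or the manager differs from the employee's previous row.
    select Name, Department, Manager, [date],
           case when Department != lag(Department, 1, '') over (partition by Name order by [date])
                  or Manager    != lag(Manager, 1, '')    over (partition by Name order by [date])
                then 1 else 0 end as flag
    from tbl
), numbered as (
    -- A running sum of the flags numbers the islands per employee.
    select *,
           sum(flag) over (partition by Name order by [date]) as g
    from flagged
), islands as (
    select Name, Department, Manager, g,
           min([date]) as fromd,
           max([date]) as tod
    from numbered
    group by Name, Department, Manager, g
)
select Name, Department, Manager, fromd,
       -- Only the employee's latest island gets the open-ended sentinel.
       case when g = max(g) over (partition by Name) then 999912 else tod end as tod
from islands
order by Name, fromd;
```

Like the query above, this treats any change of department or manager as an island boundary and does not check for missing months within an island.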