Merging consecutive dates for a group together

I have a table of employees, as shown below:

Name        Department  Manager    Date
Employee 1  Dept 1      Manager X  202101
Employee 1  Dept 1      Manager X  202102
Employee 1  Dept 2      Manager X  202103
Employee 1  Dept 2      Manager X  202104
Employee 1  Dept 1      Manager X  202105
Employee 1  Dept 1      Manager X  202106
Employee 2  Dept 1      Manager X  202101
Employee 2  Dept 1      Manager X  202102

I need to build a view that displays the data in the following format:

Name        Department  Manager    Valid_From  Valid_To
Employee 1  Dept 1      Manager X  202101      202102
Employee 1  Dept 2      Manager X  202103      202104
Employee 1  Dept 1      Manager X  202105      999912
Employee 2  Dept 1      Manager X  202101      999912

So far, my code looks like this:

WITH cte AS
(
   SELECT [Name], Department, Manager, Valid_From = min([Date]), Valid_To = max([Date]),
      RowNum = ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY max([Date]) DESC)
   FROM TestingTable
   WHERE ([Date] IS NOT NULL)
   GROUP BY [Name], Department, Manager
)
SELECT [Name], Department, Manager, Valid_From,
    CASE WHEN RowNum = 1 THEN 999912 ELSE Valid_To END AS Valid_To, CASE WHEN RowNum = 1 THEN 1 ELSE 0 END AS Is_Latest
FROM cte

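The output looks like this. It merges the periods in which Employee 1 worked in Dept 1 into a single interval, whereas I need them as 2 separate intervals.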

Name        Department  Manager    Valid_From  Valid_To
Employee 1  Dept 1      Manager X  202101      999912
Employee 1  Dept 2      Manager X  202103      202104
Employee 2  Dept 1      Manager X  202101      999912

I tried some LAG and LEAD functions to compare the dates, but I got lost.

It looks like you need to group consecutive year-months together for each employee-department-manager combination. This can be done like so:

with cte1 as (
    -- convert the yyyymm integer to the first day of that month
    select name
         , department
         , manager
         , datefromparts(date / 100, date % 100, 1) as yymm
    from t
), cte2 as (
    -- flag a row as the start of a new group unless the previous row is exactly one month earlier
    select *
         , case when lag(yymm) over (partition by name, department, manager order by yymm) = dateadd(month, -1, yymm) then 0 else 1 end as new_grp
    from cte1
), cte3 as (
    -- a running sum of the flags gives each run of consecutive months its own number
    select *
         , sum(new_grp) over (partition by name, department, manager order by yymm) as grp_num
    from cte2
)
-- collapse each run into a single row
select name
     , department
     , manager
     , min(yymm) as valid_from
     , max(yymm) as valid_to
from cte3
group by name, department, manager, grp_num
order by name, valid_from, department, manager

Note that I had to convert the year-months to dates to make the comparison easier. Result:

name        department  manager    valid_from  valid_to
Employee 1  Dept 1      Manager X  2021-01-01  2021-02-01
Employee 1  Dept 2      Manager X  2021-03-01  2021-04-01
Employee 1  Dept 1      Manager X  2021-05-01  2021-06-01
Employee 2  Dept 1      Manager X  2021-01-01  2021-02-01
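
As a small illustration of why the conversion to dates helps: comparing the raw yyyymm integers with +1 / -1 goes wrong at a year boundary, while dateadd handles the rollover. The value 202112 below is only an example and does not come from the question's data.

```sql
-- Plain integer arithmetic does not roll over into the next year:
-- 202112 + 1 is 202113, which is not a valid year-month.
select 202112 + 1 as next_as_int;

-- Converting to a date first lets dateadd move from December 2021
-- to January 2022 correctly.
select dateadd(month, 1, datefromparts(202112 / 100, 202112 % 100, 1)) as next_as_date;
```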

Replacing the last valid_to of each employee with 9999-12-01 is trivial; for example, you can check whether lead(valid_from) over (partition by name order by valid_from) is null.
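
A minimal sketch of how that check might be wired into the query, assuming SQL Server. The inline t with the question's sample rows, the extra cte4, and the column names from_date and to_date are illustrative additions, not part of the original answer. The final select also converts the dates back to yyyymm integers so that the open-ended interval can be shown as 999912, as the question asks.

```sql
with t as (
    -- sample rows copied from the question
    select * from (values
        ('Employee 1', 'Dept 1', 'Manager X', 202101),
        ('Employee 1', 'Dept 1', 'Manager X', 202102),
        ('Employee 1', 'Dept 2', 'Manager X', 202103),
        ('Employee 1', 'Dept 2', 'Manager X', 202104),
        ('Employee 1', 'Dept 1', 'Manager X', 202105),
        ('Employee 1', 'Dept 1', 'Manager X', 202106),
        ('Employee 2', 'Dept 1', 'Manager X', 202101),
        ('Employee 2', 'Dept 1', 'Manager X', 202102)
    ) as v(name, department, manager, [date])
), cte1 as (
    select name, department, manager
         , datefromparts([date] / 100, [date] % 100, 1) as yymm
    from t
), cte2 as (
    select *
         , case when lag(yymm) over (partition by name, department, manager order by yymm) = dateadd(month, -1, yymm) then 0 else 1 end as new_grp
    from cte1
), cte3 as (
    select *
         , sum(new_grp) over (partition by name, department, manager order by yymm) as grp_num
    from cte2
), cte4 as (
    -- one row per run of consecutive months
    select name, department, manager
         , min(yymm) as from_date
         , max(yymm) as to_date
    from cte3
    group by name, department, manager, grp_num
)
select name
     , department
     , manager
     , year(from_date) * 100 + month(from_date) as valid_from
       -- the employee's last interval has no following interval, so lead() is null there
     , case when lead(from_date) over (partition by name order by from_date) is null
            then 999912
            else year(to_date) * 100 + month(to_date)
       end as valid_to
from cte4
order by name, valid_from;
```

The check partitions by name only, so it marks the last interval of each employee across all departments rather than the last interval within each department.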

Try this:

with tbl1 as (select 
    [name]
    ,[department]
    ,[manager]
    ,[date]
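    -- ordered by [date] so the numbering is deterministic; pairing odd and even rows below assumes every interval spans exactly two rows
    ,ROW_NUMBER () over (partition by name, department, manager order by [date]) as rownum 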
from employees)
select  [name]
    ,[department]
    ,[manager]
    ,[date] as [valid_from]
    ,(select [date] from tbl1 t2 
    where t2.[rownum]  = t1.[rownum] + 1
    and t1.name = t2.name
    and t1.department = t2.department
    and t1.manager = t2.manager
    ) as [valid_to]
from tbl1 t1
where rownum % 2 = 1
order by name, valid_from

It is missing the 999912 because I don't understand the logic behind the replacement in the bottom two rows.

This looks like a kind of gaps-and-islands problem, with a twist in how the open-ended interval is detected:

select distinct Name, Department, Manager, 
      min([date]) over(partition by Name, g) fromd,
      max([date]) over(partition by Name, g) tod
from (
  select *, sum(flag) over(partition by Name order by [date]) g
  from (
    select Name, Department, Manager,
      case when lead(name) over(partition by Name order by [date]) is null then 999912 else [date] end [date],
      case when department != lag(Department, 1, '') over(partition by Name order by [date]) 
            or  Manager != lag(Manager, 1, '') over(partition by Name order by [date])
           then 1 else 0 end flag
    from tbl
  ) t
) t
order by fromd
