Collapse multiple rows into a single row based upon a break condition

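I have a simple-sounding requirement that has had me stumped for a day or so now, so it is time to ask the experts for help.

My requirement is simply to roll up multiple rows into a single row based upon a break condition: whenever any of the columns Employee ID, Allowance Plan, Allowance Amount or To Date changes, the row is to be kept, if that makes sense.

An example source data set looks like this:

The target data after collapsing the rows should look like this:

As you can see, I don't need any kind of running total calculation; I just need to collapse the rows into one record per From Date/To Date combination.

So far I have tried the following SQL, using GROUP BY with the MIN and MAX functions: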

select [Employee ID], [Allowance Plan], 
       min([From Date]), max([To Date]), [Allowance Amount] 
from   [dbo].[#AllowInfo] 
group by [Employee ID], [Allowance Plan], [Allowance Amount]

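But this gives me just a single row and does not take the break condition into account.

What do I need to do so that the records are rolled up correctly (please correct me if that is not the right term), taking the break condition into account?

Any help is appreciated.

Thanks.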

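Note that your test data doesn't exercise the algorithm very well, e.g. you only have one employee and one plan. Also, as you have described it, you would end up with 4 rows, because the dates change between rows 7->8, 8->9, 9->10 and 10->11.

But I can see what you're trying to do, so this should at least put you on the right track, and it returns the expected 3 rows. I have taken the end of a group to be where the employee/plan/amount has changed, or where the ToDate is not null (or where we reach the end of the data).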

CREATE TABLE #data
(
    RowID INT,
    EmployeeID INT,
    AllowancePlan VARCHAR(30),
    FromDate DATE, 
    ToDate DATE,
    AllowanceAmount DECIMAL(12,2)
);

-- ISO yyyy-mm-dd literals convert to DATE the same way under any DATEFORMAT/language setting
INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)
VALUES
(1,200690,'CarAllowance','2017-03-30', NULL, 1000.0),
(2,200690,'CarAllowance','2017-08-01', NULL, 1000.0),
(6,200690,'CarAllowance','2018-04-23', NULL, 1000.0),
(7,200690,'CarAllowance','2018-03-30', NULL, 1000.0),
(8,200690,'CarAllowance','2018-06-21', '2019-04-01', 1000.0),
(9,200690,'CarAllowance','2021-11-04', NULL, 1000.0),
(10,200690,'CarAllowance','2017-03-30', '2022-05-13', 1000.0),
(11,200690,'CarAllowance','2022-05-14', NULL, 850.0);

-- find where the break points are
WITH chg AS 
(
    SELECT *, 
        CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
               OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan  
               OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
               OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
    THEN 1 ELSE 0 END AS NewGroup
    FROM #data   
), 
-- count the number of break points as we go to group the related rows
grp AS
(
    SELECT chg.*,
        ISNULL(
            SUM(NewGroup) 
                OVER (ORDER BY RowID 
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
              0) AS grpNum
    FROM chg
) 
SELECT  MIN(grp.RowID) AS RowID, 
        MAX(grp.EmployeeID) AS EmployeeID,      
        MAX(grp.AllowancePlan) AS AllowancePlan,
        MIN(grp.FromDate) AS FromDate, 
        MAX(grp.ToDate) AS ToDate,  
        MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum
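
Since the sample data only covers one employee and one plan, here is a hedged variant of the same idea for the case where rows of several employees or plans are interleaved by RowID. It is only a sketch: it assumes the #data table defined above, that RowID gives the intended order within each employee/plan, and that AllowanceAmount is never NULL. Moving EmployeeID and AllowancePlan into PARTITION BY is my own adjustment, not something the question asked for.

```sql
-- Sketch only: assumes the #data table from above exists in this session.
-- Terminate any preceding statement with a semicolon before this CTE.
WITH chg AS
(
    SELECT *,
        -- Flag a break when the amount differs from the previous row of the
        -- same employee/plan, or when the previous row closed a period.
        -- The first row of a partition has no previous row, so it gets 0.
        CASE WHEN LAG(AllowanceAmount) OVER (PARTITION BY EmployeeID, AllowancePlan
                                             ORDER BY RowID) <> AllowanceAmount
               OR LAG(ToDate) OVER (PARTITION BY EmployeeID, AllowancePlan
                                    ORDER BY RowID) IS NOT NULL
             THEN 1 ELSE 0 END AS NewGroup
    FROM #data
),
grp AS
(
    -- Running count of break flags, restarted for each employee/plan
    SELECT chg.*,
        SUM(NewGroup) OVER (PARTITION BY EmployeeID, AllowancePlan
                            ORDER BY RowID
                            ROWS UNBOUNDED PRECEDING) AS grpNum
    FROM chg
)
SELECT  MIN(RowID) AS RowID,
        EmployeeID,
        AllowancePlan,
        MIN(FromDate) AS FromDate,
        MAX(ToDate) AS ToDate,
        MAX(AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY EmployeeID, AllowancePlan, grpNum
ORDER BY EmployeeID, AllowancePlan, MIN(RowID);
```

For the eight sample rows this still produces three groups (rows 1 to 8, rows 9 and 10, and row 11). The difference only shows up once a second employee or plan is added, because a change of employee then starts a new partition instead of relying on the LAG default values.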

One way is to work out the end date for every row, and then group on that:

select min(t.RowID) as RowID,
       t.EmployeeID,
       min(t.AllowancePlan) as AllowancePlan,
       min(t.FromDate) as FromDate,
       max(t.ToDate) as ToDate,
       min(t.AllowanceAmount) as AllowanceAmount
from   ( select t.RowID,
                t.EmployeeID,
                t.FromDate,
                t.AllowancePlan,
                t.AllowanceAmount,
                case when t.ToDate is null then ( select top 1 t2.ToDate 
                                                  from   test t2
                                                  where  t2.EmployeeID = t.EmployeeID
                                                  and    t2.ToDate is not null
                                                  and    t2.FromDate > t.FromDate -- t2.RowID > t.RowID
                                                  order by t2.RowID, t2.FromDate 
                                                )
                     else t.ToDate
                end as ToDate  -- same casing as the outer t.ToDate, so it also resolves under a case-sensitive collation
         from   test t
       ) t

group by t.EmployeeID, t.ToDate
order by t.EmployeeID, min(t.RowID)

See and test it yourself in this DBFiddle.

The result is

RowID  EmployeeID  AllowancePlan  FromDate    ToDate      AllowanceAmount
1      200690      CarAllowance   2017-03-30  2019-04-01  1000
9      200690      CarAllowance   2021-11-04  2022-05-13  1000
11     200690      CarAllowance   2022-05-14  (null)      850
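
The correlated subquery in this answer could possibly be swapped for a window aggregate. The sketch below is not part of the original answer. It assumes the same test table and columns as the query above, and that non-null ToDate values never decrease as RowID increases within an employee. If that does not hold for your data, it will pick the wrong end date. The EndDate alias is just a name chosen for illustration.

```sql
-- Sketch only: assumes the test table used above, and that non-null ToDate
-- values never decrease as RowID increases for a given employee.
SELECT MIN(f.RowID)           AS RowID,
       f.EmployeeID,
       MIN(f.AllowancePlan)   AS AllowancePlan,
       MIN(f.FromDate)        AS FromDate,
       f.EndDate              AS ToDate,
       MIN(f.AllowanceAmount) AS AllowanceAmount
FROM (
    SELECT t.*,
           -- MIN skips NULLs, so under the assumption above this is the
           -- closing date of the current or next closed row (NULL if none)
           MIN(t.ToDate) OVER (PARTITION BY t.EmployeeID
                               ORDER BY t.RowID
                               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS EndDate
    FROM test t
) f
GROUP BY f.EmployeeID, f.EndDate
ORDER BY f.EmployeeID, MIN(f.RowID);
```

Like the original query, this groups only on the employee and the derived end date, so a change in plan or amount inside an open period would not start a new row by itself.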