Creating a new RANK based on delta of previous row

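I have been working on a problem for a few days, but I can't seem to find the right way to solve it. Does anyone have an idea?

Case

We want to start a new sequence number whenever an employee has been out of employment for more than 1 day. We have the delta between the current employment record and the previous one, so we can check the sequence. We want to calculate the min(Start) and max(End) for each run of employment records in which no gap is longer than 1 day.

Data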

Employee Contract Unit Start End Delta
John Doe 1 Unit A 2014-01-01 2017-12-31 NULL
John Doe 2 Unit A 2018-02-01 2018-12-31 31
John Doe 3 Unit B 2019-01-01 2020-05-31 1
John Doe 4 Unit A 2020-06-01 NULL 1

The query should return:

Employee Contract Unit Start End Delta Sequence
John Doe 1 Unit A 2014-01-01 2017-12-31 NULL 1
John Doe 2 Unit A 2018-02-01 2018-12-31 31 2
John Doe 3 Unit B 2019-01-01 2020-05-31 1 2
John Doe 4 Unit A 2020-06-01 NULL 1 2

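That is because sequence 1 ends on 2017-12-31 and the new sequence (2) starts on 2018-02-01, so the records are more than 1 day apart. All following records get sequence 2 because the employment continues without a gap.

Query

I have tried a few things with lag() and lead(), but I keep struggling with the data sample I have. It does not work when I run it on the full set.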

SELECT
    Employee,
    Start,
    End,
    DeltaPrevious,
    Delta,
    DeltaNext,
    case
        when DeltaPrevious IS NULL AND Delta = 1 then 1
        when DeltaPrevious = 1 AND Delta > 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
        when DeltaPrevious > 1 AND Delta = 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    end as Sequence
FROM
    Contracts
ORDER BY
    Employee, Start ASC

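Hopefully someone has a good idea.

Thanks,

If I understand the definition of Sequence in your second table correctly, you are more interested in DeltaNext than in Delta (previous). Here is an attempt, including code that creates the sample input data plus two more employees: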

CREATE TABLE #input_table (Employee VARCHAR(255), [Contract] INT, Unit VARCHAR(6), [Start] DATE, [End] DATE)

INSERT INTO #input_table
VALUES
('John Doe',    1,  'Unit A',   '2014-01-01',   '2017-12-31'),
('John Doe',    2,  'Unit A',   '2018-02-01',   '2018-12-31'),
('John Doe',    3,  'Unit B',   '2019-01-01',   '2020-05-31'),
('John Doe',    4,  'Unit A',   '2020-06-01',   NULL),
('Alice',       1,  'Unit A',   '2020-01-01',   NULL),
('Bob',         1,  'Unit C',   '2020-01-01',   '2020-02-20')

First we create the deltas:

SELECT *
    , DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]), [Start])  -- Not relevant (?)
    , DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
INTO #cte_delta -- I'll create a CTE at the end
FROM #input_table
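
To check the delta arithmetic without a SQL Server instance, here is a minimal sketch that runs the same LAG/LEAD idea through Python's built-in sqlite3 module. SQLite is only a stand-in: it has no DATEDIFF, so the day difference is computed with julianday(), and the table name contracts and the columns start_date / end_date are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (employee TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [
        ("John Doe", "2014-01-01", "2017-12-31"),
        ("John Doe", "2018-02-01", "2018-12-31"),
        ("John Doe", "2019-01-01", "2020-05-31"),
        ("John Doe", "2020-06-01", None),
    ],
)

# julianday() differences play the role of DATEDIFF(DAY, ...) here.
deltas = conn.execute(
    """
    SELECT start_date,
           CAST(julianday(start_date) - julianday(LAG(end_date) OVER w) AS INTEGER) AS delta_prev,
           CAST(julianday(LEAD(start_date) OVER w) - julianday(end_date) AS INTEGER) AS delta_next
    FROM contracts
    WINDOW w AS (PARTITION BY employee ORDER BY start_date)
    ORDER BY start_date
    """
).fetchall()

for row in deltas:
    print(row)
```

For the sample rows, the gap between 2017-12-31 and 2018-02-01 comes out as 32 days, and the deltas are NULL (None in Python) wherever there is no previous or next contract, or the contract is still open.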

Then we define Sequence:

SELECT *
    , [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
INTO #cte_sequence
FROM #cte_delta

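Then we group rows with the same Sequence together: for each employee, consecutive rows with an identical Sequence share the same difference between two ROW_NUMBER values, and that difference becomes the group identifier: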

SELECT *
    , GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start]) 
INTO #cte_grp
FROM #cte_sequence
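
To see why the difference of the two row numbers is constant within a run, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The table seq and its columns are hypothetical; the Sequence values are the ones this approach produces for John Doe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (employee TEXT, start_date TEXT, sequence INTEGER)")
conn.executemany(
    "INSERT INTO seq VALUES (?, ?, ?)",
    [
        ("John Doe", "2014-01-01", 1),
        ("John Doe", "2018-02-01", 2),
        ("John Doe", "2019-01-01", 2),
        ("John Doe", "2020-06-01", 2),
    ],
)

# rn_all numbers every row of the employee; rn_seq restarts for each Sequence value.
# Within a run of equal Sequence values both counters advance together,
# so their difference stays the same for the whole run.
rows = conn.execute(
    """
    SELECT start_date,
           sequence,
           ROW_NUMBER() OVER (PARTITION BY employee ORDER BY start_date) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY employee, sequence ORDER BY start_date) AS rn_seq,
           ROW_NUMBER() OVER (PARTITION BY employee ORDER BY start_date)
             - ROW_NUMBER() OVER (PARTITION BY employee, sequence ORDER BY start_date) AS grp
    FROM seq
    ORDER BY start_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first row gets grp 0 and the remaining three rows get grp 1. Note that the difference is only guaranteed to be unique within one Sequence value, so partitioning later steps by Employee, Sequence and GRP together is a common precaution.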

Finally we compute the min and max of the contract period:

SELECT *
    , MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
    , CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End]) OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM #cte_grp

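The comparison between COUNT(*) and COUNT([End]) is necessary: otherwise ContractEnd would be the largest non-NULL value in the group, i.e. 2020-05-31, even though the last contract is still open.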
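
A quick way to convince yourself of the COUNT behaviour is the sketch below, again using Python's sqlite3 module as a stand-in for SQL Server. The single-column table grp is hypothetical and holds the three End values of John Doe's second group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grp (end_date TEXT)")
conn.executemany(
    "INSERT INTO grp VALUES (?)",
    [("2018-12-31",), ("2020-05-31",), (None,)],
)

# COUNT(*) counts rows, COUNT(end_date) skips NULLs, MAX ignores NULLs too.
# The CASE only returns MAX(end_date) when no end date in the group is NULL.
total, non_null, plain_max, guarded_max = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(end_date),
           MAX(end_date),
           CASE WHEN COUNT(*) = COUNT(end_date) THEN MAX(end_date) END
    FROM grp
    """
).fetchone()

print(total, non_null, plain_max, guarded_max)
```

Here the plain MAX silently skips the open-ended contract, while the guarded version returns NULL, which is what an ongoing employment should show.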

Here is the whole code using CTEs:

WITH cte_delta AS (
    SELECT *
        , DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]), [Start])  -- Not relevant (?)
        , DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
    FROM #input_table
)
, cte_sequence AS (
    SELECT *
        , [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
    FROM cte_delta
)
, cte_grp AS (
    SELECT *
        , GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
    FROM cte_sequence
) 
SELECT *
    , MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
    , CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End]) OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd  
FROM cte_grp

Output:

Employee Contract Unit Start End DeltaPrev DeltaNext Sequence GRP ContractStart ContractEnd
Alice 1 Unit A 2020-01-01 NULL NULL NULL 2 0 2020-01-01 NULL
Bob 1 Unit C 2020-01-01 2020-02-20 NULL NULL 2 0 2020-01-01 2020-02-20
John Doe 1 Unit A 2014-01-01 2017-12-31 NULL 32 1 0 2014-01-01 2017-12-31
John Doe 2 Unit A 2018-02-01 2018-12-31 32 1 2 1 2018-02-01 NULL
John Doe 3 Unit B 2019-01-01 2020-05-31 1 1 2 1 2018-02-01 NULL
John Doe 4 Unit A 2020-06-01 NULL 1 NULL 2 1 2018-02-01 NULL

Feel free to SELECT DISTINCT records as needed.

Basically, you want to use lag() to get the previous end date and then take a cumulative sum. This looks like:

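select c.*,
       -- add 1 whenever the previous contract ended more than 1 day before this one starts
       -- (the first row has prev_end = NULL, so it falls into the else branch and gets 1)
       sum(case when prev_end >= dateadd(day, -1, [Start]) then 0 else 1
           end) over (partition by employee order by [Start]) as ranking
from (select c.*,
             -- [End] is bracketed because END is a reserved word in SQL Server
             lag([End]) over (partition by employee order by [Start]) as prev_end
      from contracts c
     ) c;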

You mentioned that you might want to recompute the new start and end. You would just use the above as a subquery/CTE and aggregate by employee and ranking.
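
One way that aggregation might look is sketched below. It runs through Python's sqlite3 module as a stand-in for SQL Server, so date(start_date, '-1 day') replaces dateadd(), and the column names start_date / end_date and the CTE names lagged / ranked are assumptions made for this example. The open-ended check with COUNT is borrowed from the other approach in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (employee TEXT, contract INTEGER, unit TEXT, "
    "start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?, ?)",
    [
        ("John Doe", 1, "Unit A", "2014-01-01", "2017-12-31"),
        ("John Doe", 2, "Unit A", "2018-02-01", "2018-12-31"),
        ("John Doe", 3, "Unit B", "2019-01-01", "2020-05-31"),
        ("John Doe", 4, "Unit A", "2020-06-01", None),
    ],
)

query = """
WITH lagged AS (
    SELECT c.*,
           LAG(end_date) OVER (PARTITION BY employee ORDER BY start_date) AS prev_end
    FROM contracts c
),
ranked AS (
    -- running count of gaps longer than 1 day; ISO date strings compare correctly as text
    SELECT l.*,
           SUM(CASE WHEN prev_end >= date(start_date, '-1 day') THEN 0 ELSE 1 END)
               OVER (PARTITION BY employee ORDER BY start_date) AS ranking
    FROM lagged l
)
SELECT employee,
       ranking,
       MIN(start_date) AS seq_start,
       -- keep the end open (NULL) if any contract in the sequence has no end date
       CASE WHEN COUNT(*) = COUNT(end_date) THEN MAX(end_date) END AS seq_end
FROM ranked
GROUP BY employee, ranking
ORDER BY employee, ranking
"""
periods = conn.execute(query).fetchall()

for period in periods:
    print(period)
```

For the sample data this yields two periods for John Doe: sequence 1 from 2014-01-01 to 2017-12-31, and sequence 2 starting 2018-02-01 with no end date because the last contract is still running.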