Creating a new RANK based on delta of previous row
I've been working on a problem for a few days now, but I can't seem to find the right solution. Does anyone have an idea?
Case
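We want to create a new sequence number when an employee has been out of employment for more than 1 day. We have the delta between the current and the previous employment record, so we can check for sequences. For each stretch of employment records that are no more than 1 day apart, we want to calculate the min (Start) and max (End).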
Data
Employee | Contract | Unit | Start | End | Delta |
---|---|---|---|---|---|
John Doe | 1 | Unit A | 2014-01-01 | 2017-12-31 | NULL |
John Doe | 2 | Unit A | 2018-02-01 | 2018-12-31 | 31 |
John Doe | 3 | Unit B | 2019-01-01 | 2020-05-31 | 1 |
John Doe | 4 | Unit A | 2020-06-01 | NULL | 1 |
The query should return:
Employee | Contract | Unit | Start | End | Delta | Sequence |
---|---|---|---|---|---|---|
John Doe | 1 | Unit A | 2014-01-01 | 2017-12-31 | NULL | 1 |
John Doe | 2 | Unit A | 2018-02-01 | 2018-12-31 | 31 | 2 |
John Doe | 3 | Unit B | 2019-01-01 | 2020-05-31 | 1 | 2 |
John Doe | 4 | Unit A | 2020-06-01 | NULL | 1 | 2 |
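That is because sequence 1 ends on 2017-12-31 and the new sequence 2 starts in February 2018, so there is more than 1 day between the records. All following records belong to sequence 2, because the employment continues without a gap.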
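To make the rule concrete, here is a minimal plain-Python sketch (not SQL) that recomputes the expected Sequence column from the sample rows. The function name `sequence_numbers` is hypothetical, the gap is measured like DATEDIFF(DAY, previous End, Start) (so the January 2018 break counts as 32 days), and it assumes only the last contract of an employee can be open-ended.

```python
from datetime import date

# Sample rows from the question: (Contract, Start, End); None stands for NULL.
CONTRACTS = [
    (1, date(2014, 1, 1), date(2017, 12, 31)),
    (2, date(2018, 2, 1), date(2018, 12, 31)),
    (3, date(2019, 1, 1), date(2020, 5, 31)),
    (4, date(2020, 6, 1), None),
]


def sequence_numbers(rows):
    """Start a new sequence whenever the gap to the previous End is more than 1 day."""
    seq, prev_end, result = 0, None, []
    for _contract, start, end in sorted(rows, key=lambda r: r[1]):
        # The first row (prev_end is None) always opens sequence 1.
        # Assumption: only the last contract is open-ended, so None only occurs here.
        if prev_end is None or (start - prev_end).days > 1:
            seq += 1
        result.append(seq)
        prev_end = end
    return result


print(sequence_numbers(CONTRACTS))  # [1, 2, 2, 2]
```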
Query
I've tried some things with lag() and lead(), but I keep struggling with the data sample I have. When I run it on the full set, it doesn't work.
SELECT
Employee,
[Start],
[End],
DeltaPrevious,
Delta,
DeltaNext,
case
when DeltaPrevious IS NULL AND Delta = 1 then 1
when DeltaPrevious = 1 AND Delta > 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
when DeltaPrevious > 1 AND Delta = 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
end as Sequence
FROM
Contracts
ORDER BY
Employee, [Start] ASC
Hope someone has a good idea.
Thanks,
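If I understand your definition of Sequence from your second table correctly, you are more interested in DeltaNext than in Delta(Previous). Here is an attempt, including code that creates sample input data with two additional employees: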
CREATE TABLE #input_table (Employee VARCHAR(255), [Contract] INT, Unit VARCHAR(6), [Start] DATE, [End] DATE)
INSERT INTO #input_table
VALUES
('John Doe', 1, 'Unit A', '2014-01-01', '2017-12-31'),
('John Doe', 2, 'Unit A', '2018-02-01', '2018-12-31'),
('John Doe', 3, 'Unit B', '2019-01-01', '2020-05-31'),
('John Doe', 4, 'Unit A', '2020-06-01', NULL),
('Alice', 1, 'Unit A', '2020-01-01', NULL),
('Bob', 1, 'Unit C', '2020-01-01', '2020-02-20')
First, we create the deltas:
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee
ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
INTO #cte_delta -- I'll create a CTE at the end
FROM #input_table
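The LAG/LEAD step can be modelled in plain Python for a single employee, which helps when checking the DeltaPrev/DeltaNext values by hand. This is a hedged sketch, not T-SQL: the function name `deltas` is hypothetical, rows are assumed to be (Start, End) pairs for one employee, and None plays the role of NULL.

```python
from datetime import date


def deltas(rows):
    """Per row: (DeltaPrev, DeltaNext) for one employee, ordered by Start.

    DeltaPrev ~ DATEDIFF(DAY, LAG([End]), [Start])
    DeltaNext ~ DATEDIFF(DAY, [End], LEAD([Start]))
    A NULL operand gives NULL, modelled here as None.
    """
    rows = sorted(rows, key=lambda r: r[0])  # ORDER BY [Start]
    out = []
    for i, (start, end) in enumerate(rows):
        prev_end = rows[i - 1][1] if i > 0 else None                # LAG([End])
        next_start = rows[i + 1][0] if i + 1 < len(rows) else None  # LEAD([Start])
        d_prev = (start - prev_end).days if prev_end is not None else None
        d_next = (next_start - end).days if next_start is not None and end is not None else None
        out.append((d_prev, d_next))
    return out


john = [  # (Start, End) for John Doe
    (date(2014, 1, 1), date(2017, 12, 31)),
    (date(2018, 2, 1), date(2018, 12, 31)),
    (date(2019, 1, 1), date(2020, 5, 31)),
    (date(2020, 6, 1), None),
]
print(deltas(john))  # [(None, 32), (32, 1), (1, 1), (1, None)]
```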
Then we define Sequence:
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
INTO #cte_sequence
FROM #cte_delta
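The CASE expression treats a NULL DeltaNext (no following contract) as "not greater than 1", so those rows get 2. A small Python sketch of that behaviour, with None standing in for NULL and a hypothetical helper name `sequence_flag`:

```python
def sequence_flag(delta_next):
    """CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END, with None standing for NULL.

    In SQL, NULL > 1 is UNKNOWN rather than true, so a NULL falls through to ELSE 2.
    """
    return 1 if delta_next is not None and delta_next > 1 else 2


john_next = [32, 1, 1, None]  # John Doe's DeltaNext values in Start order
print([sequence_flag(d) for d in john_next])  # [1, 2, 2, 2]
print(sequence_flag(None))  # 2, as for Alice and Bob
```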
Next, we group consecutive rows with the same Sequence by giving each such run a number that is unique per employee (the difference of two ROW_NUMBERs):
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
INTO #cte_grp
FROM #cte_sequence
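The ROW_NUMBER difference is the classic gaps-and-islands trick: within one employee it stays constant across a run of equal Sequence values. Below is a plain-Python sketch for one employee's rows, already ordered by Start (the name `grp_ids` is hypothetical). The second call is an invented case, not from the question: two contracts that are each followed by a gap of more than 1 day both get Sequence 1 and therefore share GRP 0, so with this Sequence definition they would be merged into one period.

```python
from collections import defaultdict


def grp_ids(flags):
    """GRP for one employee's Sequence values, already ordered by Start.

    ROW_NUMBER() OVER (ORDER BY Start)
      - ROW_NUMBER() OVER (PARTITION BY Sequence ORDER BY Start)
    """
    rn_within_value = defaultdict(int)  # running ROW_NUMBER per Sequence value
    out = []
    for rn_overall, flag in enumerate(flags, start=1):
        rn_within_value[flag] += 1
        out.append(rn_overall - rn_within_value[flag])
    return out


print(grp_ids([1, 2, 2, 2]))  # [0, 1, 1, 1], John Doe as in the output table
print(grp_ids([1, 1, 2, 2]))  # [0, 0, 2, 2], hypothetical: first two rows share GRP 0
```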
Finally, we calculate the min and max of the contract period:
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End])
OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM #cte_grp -- temp table created in the previous step
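Comparing COUNT(*) with COUNT([End]) is necessary because MAX ignores NULL values: otherwise ContractEnd would be the largest non-NULL value in the group, i.e. 2020-05-31, even though John Doe's last contract is still open.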
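The effect of that comparison can be checked with a short Python sketch (hypothetical helper `contract_end`, None standing in for NULL) using the End values of John Doe's GRP 1 rows from the sample:

```python
from datetime import date


def contract_end(ends):
    """Mimic the CASE: MAX([End]) only if COUNT(*) = COUNT([End]), else NULL."""
    non_null = [e for e in ends if e is not None]  # COUNT([End]) and MAX skip NULLs
    if len(non_null) == len(ends):                 # COUNT(*) = COUNT([End])
        return max(non_null) if non_null else None
    return None                                    # at least one contract still open


group_1 = [date(2018, 12, 31), date(2020, 5, 31), None]  # John Doe, GRP = 1
print(max(e for e in group_1 if e is not None))  # 2020-05-31 (plain MAX)
print(contract_end(group_1))                     # None (open-ended period)
```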
Here is the whole code as CTEs:
WITH cte_delta AS (
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
FROM #input_table
)
, cte_sequence AS (
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
FROM cte_delta
)
, cte_grp AS (
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
FROM cte_sequence
)
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End]) OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM cte_grp
Output:
Employee | Contract | Unit | Start | End | DeltaPrev | DeltaNext | Sequence | GRP | ContractStart | ContractEnd |
---|---|---|---|---|---|---|---|---|---|---|
Alice | 1 | Unit A | 2020-01-01 | NULL | NULL | NULL | 2 | 0 | 2020-01-01 | NULL |
Bob | 1 | Unit C | 2020-01-01 | 2020-02-20 | NULL | NULL | 2 | 0 | 2020-01-01 | 2020-02-20 |
John Doe | 1 | Unit A | 2014-01-01 | 2017-12-31 | NULL | 32 | 1 | 0 | 2014-01-01 | 2017-12-31 |
John Doe | 2 | Unit A | 2018-02-01 | 2018-12-31 | 32 | 1 | 2 | 1 | 2018-02-01 | NULL |
John Doe | 3 | Unit B | 2019-01-01 | 2020-05-31 | 1 | 1 | 2 | 1 | 2018-02-01 | NULL |
John Doe | 4 | Unit A | 2020-06-01 | NULL | 1 | NULL | 2 | 1 | 2018-02-01 | NULL |
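Feel free to SELECT DISTINCT records as needed (for example, only Employee, ContractStart and ContractEnd to get one row per period).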
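Collapsing the output to one row per period is just a DISTINCT over the period columns. A minimal Python illustration, using the Employee, ContractStart and ContractEnd values from the output above; `dict.fromkeys` keeps the first occurrence of each tuple in order, which SQL's DISTINCT does not promise without an ORDER BY:

```python
from datetime import date

# (Employee, ContractStart, ContractEnd) per output row; None stands for NULL.
output_rows = [
    ("Alice", date(2020, 1, 1), None),
    ("Bob", date(2020, 1, 1), date(2020, 2, 20)),
    ("John Doe", date(2014, 1, 1), date(2017, 12, 31)),
    ("John Doe", date(2018, 2, 1), None),
    ("John Doe", date(2018, 2, 1), None),
    ("John Doe", date(2018, 2, 1), None),
]

# Equivalent of SELECT DISTINCT Employee, ContractStart, ContractEnd
distinct_periods = list(dict.fromkeys(output_rows))
for period in distinct_periods:
    print(period)
```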
Basically, you want to use lag() to get the previous end date and then take a cumulative sum. That looks like:
select c.*,
       sum(case when prev_end >= dateadd(day, -1, [start]) then 0 else 1
           end) over (partition by employee order by [start]) as ranking
from (select c.*,
             lag([end]) over (partition by employee order by [start]) as prev_end
      from contracts c
     ) c;
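The running sum works because each row adds 1 only when it does not continue the previous contract, so the total counts how many new stretches have started so far. A plain-Python sketch for one employee (hypothetical name `rankings`; rows are (start, end) pairs, None for NULL), assuming distinct start dates, since the SQL default RANGE frame treats tied starts together:

```python
from datetime import date, timedelta


def rankings(rows):
    """Running count of new stretches for one employee, rows = (start, end).

    SUM(CASE WHEN prev_end >= DATEADD(day, -1, start) THEN 0 ELSE 1 END)
        OVER (PARTITION BY employee ORDER BY start)
    """
    total, prev_end, out = 0, None, []
    for start, end in sorted(rows, key=lambda r: r[0]):
        # A NULL prev_end makes the SQL comparison UNKNOWN, so the CASE returns 1.
        continues = prev_end is not None and prev_end >= start - timedelta(days=1)
        total += 0 if continues else 1
        out.append(total)
        prev_end = end  # becomes lag(end) for the next row
    return out


john = [
    (date(2014, 1, 1), date(2017, 12, 31)),
    (date(2018, 2, 1), date(2018, 12, 31)),
    (date(2019, 1, 1), date(2020, 5, 31)),
    (date(2020, 6, 1), None),
]
print(rankings(john))  # [1, 2, 2, 2]
```

Because the test is `>=`, a contract that overlaps the previous one (previous end after the new start) is also treated as continuous.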
You mentioned that you may want to recalculate the new start and end. You can simply use the query above as a subquery/CTE and aggregate by employee and ranking.
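Aggregating by employee and ranking might look like this in plain Python (the name `aggregate_periods` is hypothetical). One assumption goes beyond this answer: a plain SQL MAX(end) ignores NULLs, so the sketch treats any NULL end in a group as an open period, the same idea as the COUNT(*) vs COUNT([End]) comparison in the other answer.

```python
from collections import defaultdict
from datetime import date


def aggregate_periods(rows):
    """GROUP BY employee, ranking -> (MIN(start), MAX(end)).

    rows: (employee, ranking, start, end), end None = still open.
    Assumption: a group containing an open contract stays open (None),
    whereas plain SQL MAX(end) would skip the NULL.
    """
    groups = defaultdict(list)
    for employee, ranking, start, end in rows:
        groups[(employee, ranking)].append((start, end))
    result = {}
    for key, items in groups.items():
        starts = [s for s, _ in items]
        ends = [e for _, e in items]
        result[key] = (min(starts), None if None in ends else max(ends))
    return result


rows = [  # John Doe with the ranking values 1, 2, 2, 2 from the query above
    ("John Doe", 1, date(2014, 1, 1), date(2017, 12, 31)),
    ("John Doe", 2, date(2018, 2, 1), date(2018, 12, 31)),
    ("John Doe", 2, date(2019, 1, 1), date(2020, 5, 31)),
    ("John Doe", 2, date(2020, 6, 1), None),
]
for key, (start, end) in sorted(aggregate_periods(rows).items()):
    print(key, start, end)
```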