SQL rank counting occurrences

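In my table I have employee contract data; the sample schema and rows are at the end of this question.

I want to find the period during which each job position was held, like this: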

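| login | ValidFrom | ValidTo | JobPosition |
|-------|------------|------------|------------------------------|
| bcde | 2019-07-01 | 2019-09-30 | Project Manager |
| bcde | 2020-01-09 | 2020-06-16 | Head of Center of Excellence |
| bcde | 2020-06-17 | 2021-07-31 | Team Leader |
| bcde | 2021-08-01 | 2099-12-31 | Head of Center of Excellence |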

So I wrote this query:

select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, login

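But that doesn't work when someone holds the same JobPosition in more than one separate period (the periods get merged into one), so I decided to use dense_rank, like this: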

select login, ValidFrom, ValidTo, JobPosition
,dense_rank() OVER (Partition BY JobPosition ORDER BY login, ValidFrom, ValidTo, JobPosition) as no
from employeeContracts

and then:

select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login

The problem is that dense_rank doesn't work the way I need ;) I want something like this:

| login | ValidFrom | ValidTo | JobPosition | no |
|-------|------------|------------|------------------------------|----|
| bcde | 2019-07-01 | 2019-09-30 | Project Manager | 1 |
| bcde | 2020-01-09 | 2020-06-16 | Head of Center of Excellence | 2 |
| bcde | 2020-06-17 | 2020-07-31 | Team Leader | 3 |
| bcde | 2020-08-01 | 2021-03-31 | Team Leader | 3 |
| bcde | 2021-04-01 | 2021-06-30 | Team Leader | 3 |
| bcde | 2021-07-01 | 2021-07-31 | Team Leader | 3 |
| bcde | 2021-08-01 | 2021-12-31 | Head of Center of Excellence | 4 |
| bcde | 2022-01-01 | 2022-05-09 | Head of Center of Excellence | 4 |
| bcde | 2022-02-01 | 2022-05-09 | Head of Center of Excellence | 4 |
| bcde | 2022-05-09 | 2099-12-31 | Head of Center of Excellence | 4 |

and then get the final result with this query:

select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login

Sample schema and data:

CREATE TABLE employeeContracts (
  login text,
  ValidFrom datetime,
  ValidTo datetime,
  JobPosition text
);

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2019-07-01', '2019-09-30', 'Project Manager');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2020-01-09', '2020-06-16', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2020-06-17', '2020-07-31', 'Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2020-08-01', '2021-03-31', 'Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-04-01', '2021-06-30', 'Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-07-01', '2021-07-31', 'Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-08-01', '2021-12-31', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-01-01', '2022-05-09', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2022-02-01', '2022-05-09', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2022-05-09', '2099-12-31', 'Head of Center of Excellence');

This can be tested here.

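This sounds like a "gaps and islands" problem. One way to handle it is to identify each "island" (that is, each contiguous date range for a JobPosition) and assign it a rank.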
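
As a rough sketch of that step (assuming MySQL 8+ or another engine with window functions, and the employeeContracts sample table from the question): LAG fetches the previous row's JobPosition, and a running SUM adds 1 each time the position differs from the previous one, so every island gets its own number. CASE is used here as a portable stand-in for MySQL's IF().

```sql
-- Sketch only: assumes MySQL 8+ and the employeeContracts sample table.
SELECT cron.*
     -- Running total of "position changed" flags = island number.
     -- On the first row PrevPosition is NULL, so the comparison is not
     -- true and the ELSE branch starts the count at 1.
     , SUM(CASE WHEN JobPosition = PrevPosition THEN 0 ELSE 1 END)
         OVER (ORDER BY ValidFrom, ValidTo) AS JobPositionGroup
FROM (
    SELECT ec.*
         -- JobPosition of the previous contract in date order
         , LAG(JobPosition) OVER (ORDER BY ValidFrom, ValidTo) AS PrevPosition
    FROM employeeContracts ec
) cron
ORDER BY ValidFrom, ValidTo;
```

For the sample data, this yields the following intermediate rows: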

| login | ValidFrom | ValidTo | JobPosition | PrevPosition | JobPositionGroup |
|-------|---------------------|---------------------|------------------------------|------------------------------|------------------|
| bcde | 2019-07-01 00:00:00 | 2019-09-30 00:00:00 | Project Manager | null | 1 |
| bcde | 2020-01-09 00:00:00 | 2020-06-16 00:00:00 | Head of Center of Excellence | Project Manager | 2 |
| bcde | 2020-06-17 00:00:00 | 2020-07-31 00:00:00 | Team Leader | Head of Center of Excellence | 3 |
| bcde | 2020-08-01 00:00:00 | 2021-03-31 00:00:00 | Team Leader | Team Leader | 3 |
| bcde | 2021-04-01 00:00:00 | 2021-06-30 00:00:00 | Team Leader | Team Leader | 3 |
| bcde | 2021-07-01 00:00:00 | 2021-07-31 00:00:00 | Team Leader | Team Leader | 3 |
| bcde | 2021-08-01 00:00:00 | 2021-12-31 00:00:00 | Head of Center of Excellence | Team Leader | 4 |
| bcde | 2022-01-01 00:00:00 | 2022-05-09 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
| bcde | 2022-02-01 00:00:00 | 2022-05-09 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
| bcde | 2022-05-09 00:00:00 | 2099-12-31 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |

Then apply your min/max logic, grouping by that rank:

 SELECT grp.JobPositionGroup
      , grp.JobPosition
      , grp.Login
      , MIN(grp.ValidFrom) AS ValidFrom
      , MAX(grp.ValidTo)   AS ValidTo
 FROM (
         -- Running total of "position changed" flags = island number
         SELECT cron.*
              , SUM( IF(JobPosition = PrevPosition, 0, 1) ) OVER (
                    ORDER BY ValidFrom, ValidTo
                ) AS JobPositionGroup
         FROM (
                 -- Previous row's JobPosition (NULL for the first row)
                 SELECT ec.*
                      , LAG(JobPosition, 1) OVER (
                            ORDER BY ValidFrom, ValidTo
                        ) AS PrevPosition
                 FROM   employeeContracts ec
              ) cron
      ) grp
 GROUP BY grp.JobPositionGroup
        , grp.JobPosition
        , grp.Login
 -- Without an ORDER BY the row order of the result is not guaranteed
 ORDER BY grp.JobPositionGroup

Result:

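| JobPositionGroup | JobPosition | Login | ValidFrom | ValidTo |
|------------------|------------------------------|-------|---------------------|---------------------|
| 1 | Project Manager | bcde | 2019-07-01 00:00:00 | 2019-09-30 00:00:00 |
| 2 | Head of Center of Excellence | bcde | 2020-01-09 00:00:00 | 2020-06-16 00:00:00 |
| 3 | Team Leader | bcde | 2020-06-17 00:00:00 | 2021-07-31 00:00:00 |
| 4 | Head of Center of Excellence | bcde | 2021-08-01 00:00:00 | 2099-12-31 00:00:00 |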

db<>fiddle here
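
Both windows in the query above order the whole table at once, which is fine for the single login in the sample data. If the table holds several employees, partitioning the windows by login keeps one employee's rows from being compared with another's. A possible variant, assuming MySQL 8+ and the same employeeContracts sample table (CASE is used as a portable equivalent of IF()):

```sql
-- Sketch only: per-employee islands, assuming MySQL 8+ and the
-- employeeContracts sample table from the question.
SELECT grp.JobPositionGroup
     , grp.JobPosition
     , grp.login
     , MIN(grp.ValidFrom) AS ValidFrom
     , MAX(grp.ValidTo)   AS ValidTo
FROM (
    SELECT cron.*
         -- Island number restarts at 1 for every login
         , SUM(CASE WHEN JobPosition = PrevPosition THEN 0 ELSE 1 END)
             OVER (PARTITION BY login ORDER BY ValidFrom, ValidTo) AS JobPositionGroup
    FROM (
        SELECT ec.*
             -- Previous position of the same employee only
             , LAG(JobPosition)
                 OVER (PARTITION BY login ORDER BY ValidFrom, ValidTo) AS PrevPosition
        FROM employeeContracts ec
    ) cron
) grp
GROUP BY grp.login
       , grp.JobPositionGroup
       , grp.JobPosition
ORDER BY grp.login
       , grp.JobPositionGroup;
```

With only the login bcde in the table, the partitions cover the same rows as before, so the grouping is unchanged. The group numbers are only meaningful together with login, since they restart for each employee.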