SQL rank counting occurrences
In my table I have employee contract data (see the sample data below),
and I want to find the job position held in each time period, like this:
login | ValidFrom | ValidTo | JobPosition |
---|---|---|---|
bcde | 2019-07-01 | 2019-09-30 | Project Manager |
bcde | 2020-01-09 | 2020-06-16 | Head of Center of Excellence |
bcde | 2020-06-17 | 2021-07-31 | Team Leader |
bcde | 2021-08-01 | 2099-12-31 | Head of Center of Excellence |
So I wrote this query:
select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, login
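But it does not work when someone holds the same JobPosition more than once (the separate periods get merged into one row), so I decided to use dense_rank, like this: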
select login, ValidFrom, ValidTo, JobPosition
,dense_rank() OVER (Partition BY JobPosition ORDER BY login, ValidFrom, ValidTo, JobPosition) as no
from employeeContracts
and after that:
select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login
The problem is that dense_rank does not work the way I need ;) I want something like this:
login | ValidFrom | ValidTo | JobPosition | no |
---|---|---|---|---|
bcde | 2019-07-01 | 2019-09-30 | Project Manager | 1 |
bcde | 2020-01-09 | 2020-06-16 | Head of Center of Excellence | 2 |
bcde | 2020-06-17 | 2020-07-31 | Team Leader | 3 |
bcde | 2020-08-01 | 2021-03-31 | Team Leader | 3 |
bcde | 2021-04-01 | 2021-06-30 | Team Leader | 3 |
bcde | 2021-07-01 | 2021-07-31 | Team Leader | 3 |
bcde | 2021-08-01 | 2021-12-31 | Head of Center of Excellence | 4 |
bcde | 2022-01-01 | 2022-05-09 | Head of Center of Excellence | 4 |
bcde | 2022-02-01 | 2022-05-09 | Head of Center of Excellence | 4 |
bcde | 2022-05-09 | 2099-12-31 | Head of Center of Excellence | 4 |
and then get the final result with this query:
select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login
Sample data schema:
CREATE TABLE employeeContracts (
login text,
ValidFrom datetime,
ValidTo datetime,
JobPosition text
);
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2019-07-01', '2019-09-30', 'Project Manager');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2020-01-09', '2020-06-16', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2020-06-17', '2020-07-31', 'Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2020-08-01', '2021-03-31', 'Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-04-01', '2021-06-30', 'Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-07-01', '2021-07-31', 'Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-08-01', '2021-12-31', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-01-01', '2022-05-09', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-02-01', '2022-05-09', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-05-09', '2099-12-31', 'Head of Center of Excellence');
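This sounds like a "gaps and islands" problem. One way to handle it is to identify each "island" (that is, each consecutive run of rows with the same JobPosition) and assign it a rank: compare each row's JobPosition with the previous row's (PrevPosition) and start a new group whenever it changes.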
login | ValidFrom | ValidTo | JobPosition | PrevPosition | JobPositionGroup |
---|---|---|---|---|---|
bcde | 2019-07-01 00:00:00 | 2019-09-30 00:00:00 | Project Manager | null | 1 |
bcde | 2020-01-09 00:00:00 | 2020-06-16 00:00:00 | Head of Center of Excellence | Project Manager | 2 |
bcde | 2020-06-17 00:00:00 | 2020-07-31 00:00:00 | Team Leader | Head of Center of Excellence | 3 |
bcde | 2020-08-01 00:00:00 | 2021-03-31 00:00:00 | Team Leader | Team Leader | 3 |
bcde | 2021-04-01 00:00:00 | 2021-06-30 00:00:00 | Team Leader | Team Leader | 3 |
bcde | 2021-07-01 00:00:00 | 2021-07-31 00:00:00 | Team Leader | Team Leader | 3 |
bcde | 2021-08-01 00:00:00 | 2021-12-31 00:00:00 | Head of Center of Excellence | Team Leader | 4 |
bcde | 2022-01-01 00:00:00 | 2022-05-09 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
bcde | 2022-02-01 00:00:00 | 2022-05-09 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
bcde | 2022-05-09 00:00:00 | 2099-12-31 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
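
The intermediate table above can be produced on its own, without the final aggregation. A minimal sketch, assuming MySQL 8+ (or another engine with window functions) and the sample `employeeContracts` table; `CASE` is used instead of `IF` only for portability, and the alias `flagged` is purely illustrative:

```sql
-- Step 1 (inner query): fetch the previous row's JobPosition in chronological order.
-- Step 2 (outer query): keep a running count of rows where the position changed.
SELECT flagged.*
     , SUM(CASE WHEN JobPosition = PrevPosition THEN 0 ELSE 1 END) OVER (
           ORDER BY ValidFrom, ValidTo
       ) AS JobPositionGroup
FROM (
    SELECT ec.*
         , LAG(JobPosition) OVER (ORDER BY ValidFrom, ValidTo) AS PrevPosition
    FROM employeeContracts ec
) flagged
ORDER BY ValidFrom, ValidTo;
```

For the first row PrevPosition is NULL, so the comparison is not true and the row counts as a change, which is why the numbering starts at 1.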
Then apply your min/max logic, grouping by that rank:
SELECT grp.JobPositionGroup
     , grp.JobPosition
     , grp.Login
     , MIN(grp.ValidFrom) AS ValidFrom
     , MAX(grp.ValidTo)   AS ValidTo
FROM (
    -- Running total of position changes: every change starts a new group
    SELECT cron.*
         , SUM( IF(JobPosition = PrevPosition, 0, 1) ) OVER (
               ORDER BY ValidFrom, ValidTo
           ) AS JobPositionGroup
    FROM (
        -- Previous row's JobPosition in chronological order (NULL for the first row)
        SELECT ec.*
             , LAG(JobPosition, 1) OVER (
                   ORDER BY ValidFrom, ValidTo
               ) AS PrevPosition
        FROM employeeContracts ec
    ) cron
) grp
GROUP BY grp.JobPositionGroup
       , grp.JobPosition
       , grp.Login
Result:
JobPositionGroup | JobPosition | Login | ValidFrom | ValidTo |
---|---|---|---|---|
1 | Project Manager | bcde | 2019-07-01 00:00:00 | 2019-09-30 00:00:00 |
2 | Head of Center of Excellence | bcde | 2020-01-09 00:00:00 | 2020-06-16 00:00:00 |
3 | Team Leader | bcde | 2020-06-17 00:00:00 | 2021-07-31 00:00:00 |
4 | Head of Center of Excellence | bcde | 2021-08-01 00:00:00 | 2099-12-31 00:00:00 |
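
The query above orders all rows together, which is fine for the single login in the sample data. If the table holds several employees, the windows would presumably need a `PARTITION BY login`, so that one employee's last position is not compared with another employee's first. A sketch under that assumption (MySQL 8+ syntax with CTEs; the CTE names `with_prev` and `with_group` are illustrative, not from the original query):

```sql
WITH with_prev AS (
    -- Previous position of the same employee, in chronological order
    SELECT ec.*
         , LAG(JobPosition) OVER (
               PARTITION BY login
               ORDER BY ValidFrom, ValidTo
           ) AS PrevPosition
    FROM employeeContracts ec
),
with_group AS (
    -- Per-employee running count of position changes
    SELECT wp.*
         , SUM(CASE WHEN JobPosition = PrevPosition THEN 0 ELSE 1 END) OVER (
               PARTITION BY login
               ORDER BY ValidFrom, ValidTo
           ) AS JobPositionGroup
    FROM with_prev wp
)
SELECT login
     , JobPositionGroup
     , JobPosition
     , MIN(ValidFrom) AS ValidFrom
     , MAX(ValidTo)   AS ValidTo
FROM with_group
GROUP BY login, JobPositionGroup, JobPosition
ORDER BY login, JobPositionGroup;
```

Note that this variant, like the original, only looks at whether the position changed: two consecutive contracts with the same JobPosition end up in one group even if there is a date gap between them.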