Using RANK() in SQL as ID Number for Groups of Records
This is my table:
employeeid  workdate                workstatus
----------- ----------------------- ----------
1           2020-09-01 00:00:00.000 ON
1           2020-09-02 00:00:00.000 ON
1           2020-09-03 00:00:00.000 ON
1           2020-09-04 00:00:00.000 OFF
1           2020-09-05 00:00:00.000 OFF
2           2020-09-01 00:00:00.000 ON
2           2020-09-02 00:00:00.000 ON
2           2020-09-03 00:00:00.000 OFF
2           2020-09-04 00:00:00.000 OFF
2           2020-09-05 00:00:00.000 ON
I'm running this query:
select employeeid, workdate, workstatus, rank() over(partition by employeeid, workstatus order by workdate) as cycle
from #workstatus
order by 1, 2
This is the result:
employeeid  workdate                workstatus cycle
----------- ----------------------- ---------- --------------------
1           2020-09-01 00:00:00.000 ON         1
1           2020-09-02 00:00:00.000 ON         2
1           2020-09-03 00:00:00.000 ON         3
1           2020-09-04 00:00:00.000 OFF        1
1           2020-09-05 00:00:00.000 OFF        2
2           2020-09-01 00:00:00.000 ON         1
2           2020-09-02 00:00:00.000 ON         2
2           2020-09-03 00:00:00.000 OFF        1
2           2020-09-04 00:00:00.000 OFF        2
2           2020-09-05 00:00:00.000 ON         3
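My goal is to identify each on/off "cycle" of work with a number that is unique per employee. So employee 1's three ON days would be cycle 1, and the two OFF days that follow would be cycle 2.
Employee 2's first two ON days would be cycle 1, the two OFF days would be cycle 2, and the final ON day would be cycle 3.
I'm not sure whether RANK() can be used for this, or whether there is a better solution. Thanks!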
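To pin down the numbering being asked for, here is a small procedural reference in Python. It is an illustrative sketch, not part of the original question: it walks each employee's rows in date order and starts a new cycle whenever the status changes. The names `ROWS` and `assign_cycles` are hypothetical, and dates are plain ISO strings rather than `datetime` values.

```python
from itertools import groupby

# Hypothetical sample data copied from the question: (employeeid, workdate, workstatus).
ROWS = [
    (1, "2020-09-01", "ON"), (1, "2020-09-02", "ON"), (1, "2020-09-03", "ON"),
    (1, "2020-09-04", "OFF"), (1, "2020-09-05", "OFF"),
    (2, "2020-09-01", "ON"), (2, "2020-09-02", "ON"),
    (2, "2020-09-03", "OFF"), (2, "2020-09-04", "OFF"),
    (2, "2020-09-05", "ON"),
]


def assign_cycles(rows):
    """Return (employeeid, workdate, workstatus, cycle) tuples.

    A cycle is a maximal run of consecutive rows (per employee, ordered by
    workdate) with the same workstatus; cycles are numbered from 1 for each
    employee.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _emp, emp_rows in groupby(ordered, key=lambda r: r[0]):
        # groupby only merges *adjacent* equal statuses, which matches the
        # "island" definition: ON, ON, OFF, OFF, ON -> three runs.
        runs = groupby(emp_rows, key=lambda r: r[2])
        for cycle, (_status, run) in enumerate(runs, start=1):
            for row in run:
                result.append((*row, cycle))
    return result


for row in assign_cycles(ROWS):
    print(row)
```

This is exactly what RANK() partitioned by `workstatus` cannot express: RANK() groups all ON rows together regardless of adjacency, whereas a cycle is a run of *adjacent* equal statuses.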
This is a type of gaps-and-islands problem. For this version, use lag() and a cumulative sum:
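select t.*,
       -- a new cycle starts whenever the status differs from the previous day's;
       -- prev_ws is NULL on each employee's first day, so that row also counts as 1
       sum(case when prev_ws = workstatus then 0 else 1 end) over
           (partition by employeeid order by workdate
            rows between unbounded preceding and current row) as ranking
from (select t.*,
             lag(workstatus) over (partition by employeeid order by workdate) as prev_ws
      from t   -- t = your table (#workstatus in the question)
     ) t;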
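A quick way to sanity-check the lag() + running-sum query is to run it against the sample data. The sketch below assumes Python's bundled SQLite (version 3.25 or later, which supports `lag()` and windowed `sum()`) as a stand-in for SQL Server; the table name `t` and the output column name `cycle` are illustrative choices.

```python
import sqlite3

# In-memory SQLite stand-in for the question's #workstatus table (assumption:
# the bundled SQLite is >= 3.25 so window functions are available).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (employeeid int, workdate text, workstatus text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2020-09-01", "ON"), (1, "2020-09-02", "ON"), (1, "2020-09-03", "ON"),
        (1, "2020-09-04", "OFF"), (1, "2020-09-05", "OFF"),
        (2, "2020-09-01", "ON"), (2, "2020-09-02", "ON"),
        (2, "2020-09-03", "OFF"), (2, "2020-09-04", "OFF"),
        (2, "2020-09-05", "ON"),
    ],
)

LAG_SUM_QUERY = """
select employeeid, workdate, workstatus,
       -- running count of status changes = cycle number
       sum(case when prev_ws = workstatus then 0 else 1 end)
           over (partition by employeeid order by workdate
                 rows between unbounded preceding and current row) as cycle
from (select t.*,
             -- previous day's status for the same employee (NULL on day 1)
             lag(workstatus) over (partition by employeeid order by workdate) as prev_ws
      from t) s
order by employeeid, workdate
"""

rows = conn.execute(LAG_SUM_QUERY).fetchall()
for row in rows:
    print(row)
```

The explicit `rows between ...` frame is optional here because `workdate` is unique per employee; without it the default `range` frame gives the same totals.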
Use DENSE_RANK() instead of RANK().
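You can use window functions to solve this gaps-and-islands problem. One approach is to use the difference between two row numbers to build groups of "adjacent" records: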
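select employeeid, workdate, workstatus,
    -- number each island (run of equal statuses) by its first day, per employee
    dense_rank() over(partition by employeeid order by grp_start) cycle
from (
    select s.*,
        -- rn1 - rn2 is constant within each run of identical statuses
        min(workdate) over(partition by employeeid, workstatus, rn1 - rn2) grp_start
    from (
        select t.*,
            row_number() over(partition by employeeid order by workdate) rn1,
            row_number() over(partition by employeeid, workstatus order by workdate) rn2
        from mytable t
    ) s
) g
order by employeeid, workdate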
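To see why the row-number difference identifies the groups, note that within a run of equal statuses both `rn1` and `rn2` increase by 1 per row, so `rn1 - rn2` stays constant; when the status changes, the difference jumps. For employee 2, `rn1` is 1, 2, 3, 4, 5 and `rn2` is 1, 2, 1, 2, 3, giving differences 0, 0, 2, 2, 2, so `(workstatus, rn1 - rn2)` yields three islands. The sketch below runs this technique against the sample data using Python's bundled SQLite (assumed to be 3.25 or later) as a stand-in for SQL Server. It numbers islands by their first `workdate` with `dense_rank()`; the table name `mytable` and the column `grp_start` are illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table (assumption: the
# bundled SQLite supports window functions, i.e. version >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (employeeid int, workdate text, workstatus text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        (1, "2020-09-01", "ON"), (1, "2020-09-02", "ON"), (1, "2020-09-03", "ON"),
        (1, "2020-09-04", "OFF"), (1, "2020-09-05", "OFF"),
        (2, "2020-09-01", "ON"), (2, "2020-09-02", "ON"),
        (2, "2020-09-03", "OFF"), (2, "2020-09-04", "OFF"),
        (2, "2020-09-05", "ON"),
    ],
)

ISLANDS_QUERY = """
select employeeid, workdate, workstatus,
       -- islands numbered per employee in order of their first day
       dense_rank() over (partition by employeeid order by grp_start) as cycle
from (
    select s.*,
           -- (workstatus, rn1 - rn2) identifies an island; take its first day
           min(workdate) over (partition by employeeid, workstatus, rn1 - rn2) as grp_start
    from (
        select t.*,
               row_number() over (partition by employeeid order by workdate) as rn1,
               row_number() over (partition by employeeid, workstatus order by workdate) as rn2
        from mytable t
    ) s
) g
order by employeeid, workdate
"""

rows = conn.execute(ISLANDS_QUERY).fetchall()
for row in rows:
    print(row)
```

Partitioning by `workstatus` as well as `rn1 - rn2` matters: on its own the difference can coincide for an ON run and an OFF run of the same employee.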