Get the time intervals during which a process was in each stage (Jinja SQL)
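I need to get the time intervals during which a process was in each stage, taking into account that the process can return to a stage it has already been in. For example: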
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 3 |
B | 3 | 4 |
B | 4 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 | |
The process is currently in stage D.
I want a table like this:
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 | |
This is a gaps-and-islands problem. You can solve it with either of the following approaches.
Approach 1
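SELECT
    stage_name,
    MIN(from_day) AS from_day,
    MAX(to_day) AS to_day
FROM (
    SELECT
        *,
        -- The difference between the overall row number and the per-stage
        -- row number stays constant within a run of consecutive rows
        -- that share the same stage.
        ROW_NUMBER() OVER (ORDER BY from_day)
            - ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
    FROM
        my_table
) t1
GROUP BY stage_name, grp
ORDER BY from_day, stage_name;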
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 | |
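To make the row-number difference in Approach 1 easier to inspect, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite and Python is an assumption made purely for illustration (the answer does not name a database engine), and window functions need SQLite 3.25 or later. The column aliases `rn_all` and `rn_stage` and the helper `make_db` are hypothetical names that do not appear in the original query.

```python
import sqlite3

# Sample rows from the question; None marks the open-ended current stage.
ROWS = [
    ("A", 1, 2), ("B", 2, 3), ("B", 3, 4), ("B", 4, 5),
    ("C", 5, 6), ("B", 6, 7), ("D", 7, None),
]

def make_db():
    """Create an in-memory table named my_table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (stage_name TEXT, from_day INTEGER, to_day INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    return conn

# Intermediate step: expose both row numbers so their difference can be seen.
GROUPS_SQL = """
SELECT
    stage_name,
    from_day,
    ROW_NUMBER() OVER (ORDER BY from_day) AS rn_all,
    ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) AS rn_stage
FROM my_table
ORDER BY from_day
"""

# The Approach 1 query itself.
ISLANDS_SQL = """
SELECT stage_name, MIN(from_day) AS from_day, MAX(to_day) AS to_day
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY from_day)
            - ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
    FROM my_table
) t1
GROUP BY stage_name, grp
ORDER BY from_day, stage_name
"""

conn = make_db()
# (stage_name, from_day, grp) where grp = rn_all - rn_stage
groups = [(s, f, a - b) for s, f, a, b in conn.execute(GROUPS_SQL)]
islands = conn.execute(ISLANDS_SQL).fetchall()
conn.close()

for row in islands:
    print(row)
```

In `groups`, the three consecutive B rows share one `grp` value while the later B row gets a different one, which is what lets `GROUP BY stage_name, grp` keep the two B intervals apart.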
Approach 2
SELECT
    stage_name,
    MIN(from_day) AS from_day,
    MAX(to_day) AS to_day
FROM (
    SELECT
        *,
        -- Running total of the flags: increases by 1 each time a new run starts.
        SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
    FROM (
        SELECT
            *,
            -- 0 when this row continues the previous row of the same stage, else 1.
            CASE
                WHEN from_day = LAG(to_day, 1, from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
                ELSE 1
            END AS same
        FROM
            my_table
    ) t1
) t2
GROUP BY stage_name, grp
ORDER BY from_day, stage_name;
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 | |
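The example above uses `from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day)` to determine whether the current row's `from_day` equals the `to_day` of the previous row in the same `stage_name` partition, ordered by `from_day`. The third argument to `LAG` is the default used when there is no previous row, so the first row of each partition is compared with its own `from_day` and is never flagged. If the values are equal, 0 is assigned; otherwise 1. The output of this subquery is shown below: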
SELECT
    *,
    CASE
        WHEN from_day = LAG(to_day, 1, from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
        ELSE 1
    END AS same
FROM
    my_table
ORDER BY from_day, stage_name;
stage_name | from_day | to_day | same |
---|---|---|---|
A | 1 | 2 | 0 |
B | 2 | 3 | 0 |
B | 3 | 4 | 0 |
B | 4 | 5 | 0 |
C | 5 | 6 | 0 |
B | 6 | 7 | 1 |
D | 7 | | 0 |
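The flagging step can also be read as a plain loop. The sketch below is an illustrative Python emulation of the `LAG(to_day, 1, from_day)` comparison, not part of the original answer; the function name `flag_breaks` is hypothetical, and it assumes `from_day` is unique within each stage so that the ordering is unambiguous.

```python
# Sample rows from the question; None marks the open-ended current stage.
ROWS = [
    ("A", 1, 2), ("B", 2, 3), ("B", 3, 4), ("B", 4, 5),
    ("C", 5, 6), ("B", 6, 7), ("D", 7, None),
]

def flag_breaks(rows):
    """Append a `same` flag to each row.

    The flag is 0 when from_day equals the to_day of the previous row
    of the same stage (or when the stage has no previous row), else 1.
    """
    prev_to_day = {}  # last to_day seen so far for each stage_name
    flagged = []
    for stage, from_day, to_day in sorted(rows, key=lambda r: r[1]):
        # Mirrors the LAG default: with no previous row for this stage,
        # compare from_day with itself, which always yields 0.
        previous = prev_to_day.get(stage, from_day)
        same = 0 if from_day == previous else 1
        flagged.append((stage, from_day, to_day, same))
        prev_to_day[stage] = to_day
    return flagged

for row in flag_breaks(ROWS):
    print(row)
```

Only the row `B | 6 | 7` is flagged with 1, because the previous B row ended on day 5 rather than day 6.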
Then the window function `SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day)` computes a running total of these flags, which creates the groups:
SELECT
    *,
    SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
FROM (
    SELECT
        *,
        CASE
            WHEN from_day = LAG(to_day, 1, from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
            ELSE 1
        END AS same
    FROM
        my_table
) t1
ORDER BY from_day, stage_name;
stage_name | from_day | to_day | same | grp |
---|---|---|---|---|
A | 1 | 2 | 0 | 0 |
B | 2 | 3 | 0 | 0 |
B | 3 | 4 | 0 | 0 |
B | 4 | 5 | 0 | 0 |
C | 5 | 6 | 0 | 0 |
B | 6 | 7 | 1 | 1 |
D | 7 | | 0 | 0 |
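As a rough illustration of what the running total does, the following Python sketch keeps one counter per stage and adds each row's `same` flag to it. The name `assign_groups` is hypothetical, the input is the flagged table shown earlier, and the sketch assumes `from_day` is unique within a stage (with ties, the SQL window's default frame would treat tied rows as peers and give them the same total).

```python
from collections import defaultdict

# Rows with the `same` flag, as produced by the previous step.
FLAGGED = [
    ("A", 1, 2, 0), ("B", 2, 3, 0), ("B", 3, 4, 0), ("B", 4, 5, 0),
    ("C", 5, 6, 0), ("B", 6, 7, 1), ("D", 7, None, 0),
]

def assign_groups(flagged):
    """Append grp, the per-stage running total of the `same` flag."""
    running = defaultdict(int)  # one running total per stage_name
    grouped = []
    for stage, from_day, to_day, same in sorted(flagged, key=lambda r: r[1]):
        running[stage] += same
        grouped.append((stage, from_day, to_day, same, running[stage]))
    return grouped

for row in assign_groups(FLAGGED):
    print(row)
```

Every flag of 1 bumps the counter for its stage, so rows before and after a break end up with different `grp` values.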
Finally, by grouping on `stage_name` and `grp`, we can find the desired values within each group: in this case the earliest `from_day` using `MIN` and the latest `to_day` using `MAX`.
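Putting the three steps together, here is a minimal end-to-end sketch that runs the full Approach 2 query on the sample data. As an assumption for illustration only, it uses an in-memory SQLite database (3.25 or later) through Python's `sqlite3` module; the helper name `merge_stages` is hypothetical.

```python
import sqlite3

# Sample rows from the question; None marks the open-ended current stage.
ROWS = [
    ("A", 1, 2), ("B", 2, 3), ("B", 3, 4), ("B", 4, 5),
    ("C", 5, 6), ("B", 6, 7), ("D", 7, None),
]

APPROACH_2_SQL = """
SELECT stage_name, MIN(from_day) AS from_day, MAX(to_day) AS to_day
FROM (
    SELECT
        *,
        SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
    FROM (
        SELECT
            *,
            CASE
                WHEN from_day = LAG(to_day, 1, from_day)
                         OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
                ELSE 1
            END AS same
        FROM my_table
    ) t1
) t2
GROUP BY stage_name, grp
ORDER BY from_day, stage_name
"""

def merge_stages(rows):
    """Load rows into an in-memory table and return the merged intervals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE my_table (stage_name TEXT, from_day INTEGER, to_day INTEGER)"
        )
        conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
        return conn.execute(APPROACH_2_SQL).fetchall()
    finally:
        conn.close()

merged = merge_stages(ROWS)
for row in merged:
    print(row)
```

One design difference between the two approaches is worth keeping in mind: Approach 1 looks only at row order, so two same-stage rows that are consecutive in `from_day` order are merged even if there is a gap between them, whereas Approach 2 starts a new group whenever `from_day` differs from the previous `to_day` of that stage.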