Get time intervals where the process was on each stage on Jinja SQL

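I need to get the time intervals during which the process was in each stage, taking into account that the process can return to a stage it was in before. For example: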

stage_name from_day to_day
A 1 2
B 2 3
B 3 4
B 4 5
C 5 6
B 6 7
D 7

The process is currently in stage D.

I want a table like this:

stage_name from_day to_day
A 1 2
B 2 5
C 5 6
B 6 7
D 7

This is a gaps-and-islands problem. You can solve it with either of the following approaches.
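To try the queries that follow, the sample rows from the question can be loaded into a table. This is only a minimal sketch: the table name my_table is the one used by the queries in this answer, but the column types are assumptions (the question shows column names only), and the open-ended stage D is stored with a NULL to_day.

```sql
-- Assumed schema: the question gives column names but no types.
CREATE TABLE my_table (
    stage_name VARCHAR(10) NOT NULL,
    from_day   INTEGER     NOT NULL,
    to_day     INTEGER              -- NULL while the process is still in this stage
);

-- Sample rows copied from the question.
INSERT INTO my_table (stage_name, from_day, to_day) VALUES
    ('A', 1, 2),
    ('B', 2, 3),
    ('B', 3, 4),
    ('B', 4, 5),
    ('C', 5, 6),
    ('B', 6, 7),
    ('D', 7, NULL);
```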

Approach 1

SELECT
    stage_name,
    MIN(from_day) as from_day,
    MAX(to_day) as to_day
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY from_day) - ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) as grp
    FROM
        my_table
) t1
GROUP BY stage_name, grp
ORDER BY from_day, stage_name;
stage_name from_day to_day
A 1 2
B 2 5
C 5 6
B 6 7
D 7

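Why the subtraction in approach 1 yields a usable group key may be easier to see with the two row numbers side by side. The sketch below is not part of the original answer. It inlines the question's sample rows as a CTE named my_table, using the VALUES-in-CTE syntax accepted by PostgreSQL and SQLite (which database is in use is an assumption), and the aliases rn_all and rn_stage are illustrative names.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH my_table (stage_name, from_day, to_day) AS (
    VALUES ('A', 1, 2), ('B', 2, 3), ('B', 3, 4), ('B', 4, 5),
           ('C', 5, 6), ('B', 6, 7), ('D', 7, NULL)
)
SELECT
    stage_name,
    from_day,
    to_day,
    -- position of the row among all rows
    ROW_NUMBER() OVER (ORDER BY from_day) AS rn_all,
    -- position of the row among rows of the same stage
    ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) AS rn_stage,
    -- the difference is constant within one unbroken run of a stage
    ROW_NUMBER() OVER (ORDER BY from_day)
      - ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
FROM my_table
ORDER BY from_day;
```

Within an unbroken run of one stage, both counters increase by one per row, so their difference stays constant (1 for the B rows covering days 2 to 5). When another stage interrupts the run, only the overall counter advances, so the next run of B gets a different difference (2 for the B row covering days 6 to 7). The grp value has no meaning on its own and may repeat across different stages, which is why the outer query groups by both stage_name and grp.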

Approach 2

SELECT
    stage_name,
    MIN(from_day) as from_day,
    MAX(to_day) as to_day
FROM (
    SELECT
        *,
        SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) as grp
    FROM (
        SELECT
            *,
            CASE 
                WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
                ELSE 1
            END as same
        FROM
            my_table
    ) t1
) t2
GROUP BY stage_name,grp
ORDER BY from_day, stage_name;
stage_name from_day to_day
A 1 2
B 2 5
C 5 6
B 6 7
D 7


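The example above uses from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) to check whether the current row's from_day equals the to_day of the previous row in the same stage_name partition, with rows ordered by from_day. The third argument to LAG is the default used when there is no previous row, so the first row of each partition compares from_day with itself and counts as a match. If the values match, the row is assigned 0; otherwise it is assigned 1. The output of this subquery is included below for reference: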

SELECT
    *,
    CASE 
        WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
        ELSE 1
    END as same
FROM
    my_table
ORDER BY from_day, stage_name;
stage_name from_day to_day same
A 1 2 0
B 2 3 0
B 3 4 0
B 4 5 0
C 5 6 0
B 6 7 1
D 7 0

The window function SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) then computes a running total of these flags within each stage, which creates the groups:

SELECT
    *,
    SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) as grp
FROM (
    SELECT
        *,
        CASE
            WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
            ELSE 1
        END as same
    FROM
        my_table
) t1
ORDER BY from_day, stage_name;
stage_name from_day to_day same grp
A 1 2 0 0
B 2 3 0 0
B 3 4 0 0
B 4 5 0 0
C 5 6 0 0
B 6 7 1 1
D 7 0 0

Finally, by grouping on stage_name and grp, we can find the desired values in each group: in this case the earliest from_day using MIN and the latest to_day using MAX.
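One caveat worth checking against your real data: MAX ignores NULLs. In the sample, stage D is a single row, so MAX(to_day) stays empty. If the stage the process is currently in were split over several rows, with only the last one open, MAX(to_day) would report the last closed day instead of an open interval. The sketch below is a variant of approach 1 and not part of the original answer. It assumes a hypothetical extra split of stage D into D 7 8 and D 8 (open), and returns NULL for to_day whenever any row in the group is still open. The inline VALUES CTE assumes PostgreSQL or SQLite syntax.

```sql
WITH my_table (stage_name, from_day, to_day) AS (
    VALUES ('A', 1, 2), ('B', 2, 3), ('B', 3, 4), ('B', 4, 5),
           ('C', 5, 6), ('B', 6, 7),
           ('D', 7, 8), ('D', 8, NULL)   -- hypothetical: open stage split over two rows
)
SELECT
    stage_name,
    MIN(from_day) AS from_day,
    -- COUNT(*) counts every row, COUNT(to_day) only rows with a non-NULL to_day;
    -- they differ exactly when the group contains an open row.
    CASE WHEN COUNT(*) = COUNT(to_day) THEN MAX(to_day) END AS to_day
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY from_day)
          - ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) AS grp
    FROM my_table
) t1
GROUP BY stage_name, grp
ORDER BY from_day, stage_name;
```

With this data, a plain MAX(to_day) would give D a to_day of 8, while the CASE expression leaves it NULL, matching the "currently in stage D" reading from the question.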