RANK() OVER (PARTITION BY ...) with a reset depending on another column

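I have a table with two different IDs and a timestamp that I want to rank by. The twist is that I want to rank within S_ID only until a row has an entry in O_ID. After a row with an O_ID entry, the ranking for that S_ID should start again at 1 on the next row.

Here is an example: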

select 
    S_ID,
    timestamp,
    O_ID,
    rank() OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS RANK
from table
order by S_ID, timestamp;

S_ID      Timestamp                O_ID     Rank
2e114e9f  2021-11-26 08:57:44.049  NULL     1
2e114e9f  2021-12-26 17:07:26.272  NULL     2
2e114e9f  2021-12-27 08:13:24.277  NULL     3
2e114e9f  2021-12-29 11:32:56.952  2287549  4
2e114e9f  2021-12-30 13:41:28.821  NULL     5
2e114e9f  2021-12-30 19:53:28.590  NULL     6
2e114e9f  2022-02-05 09:50:54.104  2333002  7
2e114e9f  2022-02-19 10:14:31.389  NULL     8

How can I add a second rank that resets based on the entries in the O_ID column? The result should be:

S_ID      Timestamp                O_ID     Rank S_ID  Rank both
2e114e9f  2021-11-26 08:57:44.049  NULL     1          1
2e114e9f  2021-12-26 17:07:26.272  NULL     2          2
2e114e9f  2021-12-27 08:13:24.277  NULL     3          3
2e114e9f  2021-12-29 11:32:56.952  2287549  4          4
2e114e9f  2021-12-30 13:41:28.821  NULL     5          1
2e114e9f  2021-12-30 19:53:28.590  NULL     6          2
2e114e9f  2022-02-05 09:50:54.104  2333002  7          3
2e114e9f  2022-02-19 10:14:31.389  NULL     8          1

I would appreciate any food for thought!

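It looks like the gaps-and-islands approach can help here: use lag to split the data into groups (by comparing the current O_ID with the previous one, plus some null handling), and then use the group value as an additional partition for the rank() function.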

-- sample data
WITH dataset (S_ID, Timestamp, O_ID) AS (
    VALUES ('2e114e9f', timestamp '2021-11-26 08:57:44.049',    NULL),
    ('2e114e9f',    timestamp '2021-12-26 17:07:26.272',    NULL),
    ('2e114e9f',    timestamp '2021-12-27 08:13:24.277',    NULL),
    ('2e114e9f',    timestamp '2021-12-29 11:32:56.952',    2287549),
    ('2e114e9f',    timestamp '2021-12-30 13:41:28.821',    NULL),
    ('2e114e9f',    timestamp '2021-12-30 19:53:28.590',    NULL),
    ('2e114e9f',    timestamp '2022-02-05 09:50:54.104',    2333002),
    ('2e114e9f',    timestamp '2022-02-19 10:14:31.389',    NULL)
) 

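-- query
select S_ID,
    Timestamp,
    O_ID,
    -- rank restarts at 1 within each (S_ID, grp) island
    rank() OVER (PARTITION BY S_ID, grp ORDER BY timestamp asc) AS RANK
from(
        select *,
            -- running count of rows whose previous row had a non-null, different O_ID
            sum(if(prev is not null and (O_ID is null or O_ID != prev), 1, 0)) 
                OVER (PARTITION BY S_ID ORDER BY timestamp asc) as grp
        from (
                select *,
                    -- O_ID of the previous row within the same S_ID
                    lag(O_ID) OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS prev
                from dataset
            )
    )
order by S_ID, Timestamp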

Output:

S_ID      Timestamp                O_ID     RANK
2e114e9f  2021-11-26 08:57:44.049           1
2e114e9f  2021-12-26 17:07:26.272           2
2e114e9f  2021-12-27 08:13:24.277           3
2e114e9f  2021-12-29 11:32:56.952  2287549  4
2e114e9f  2021-12-30 13:41:28.821           1
2e114e9f  2021-12-30 19:53:28.590           2
2e114e9f  2022-02-05 09:50:54.104  2333002  3
2e114e9f  2022-02-19 10:14:31.389           1
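
To see how the islands are formed, it can help to look at the intermediate prev and grp columns next to the final rank. The following is only a sketch under a few assumptions: it uses Python's built-in sqlite3 module as a stand-in engine (the question does not say which database is used), so if() is rewritten as CASE WHEN, and the table name events, the column name ts, and the alias rnk are made up for the illustration. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question: (S_ID, ts, O_ID); None stands for NULL.
ROWS = [
    ("2e114e9f", "2021-11-26 08:57:44.049", None),
    ("2e114e9f", "2021-12-26 17:07:26.272", None),
    ("2e114e9f", "2021-12-27 08:13:24.277", None),
    ("2e114e9f", "2021-12-29 11:32:56.952", 2287549),
    ("2e114e9f", "2021-12-30 13:41:28.821", None),
    ("2e114e9f", "2021-12-30 19:53:28.590", None),
    ("2e114e9f", "2022-02-05 09:50:54.104", 2333002),
    ("2e114e9f", "2022-02-19 10:14:31.389", None),
]

# Same three steps as the query above (lag -> running sum -> rank),
# written as CTEs and with CASE WHEN instead of if().
# `events`, `ts` and `rnk` are hypothetical names used only in this sketch.
QUERY = """
WITH lagged AS (
    SELECT S_ID, ts, O_ID,
           lag(O_ID) OVER (PARTITION BY S_ID ORDER BY ts) AS prev
    FROM events
),
grouped AS (
    SELECT S_ID, ts, O_ID, prev,
           sum(CASE WHEN prev IS NOT NULL AND (O_ID IS NULL OR O_ID != prev)
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY S_ID ORDER BY ts) AS grp
    FROM lagged
)
SELECT S_ID, ts, O_ID, prev, grp,
       rank() OVER (PARTITION BY S_ID, grp ORDER BY ts) AS rnk
FROM grouped
ORDER BY S_ID, ts
"""


def ranked_rows(rows):
    """Load rows into an in-memory SQLite table and run QUERY on them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (S_ID TEXT, ts TEXT, O_ID INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Each printed tuple is (S_ID, ts, O_ID, prev, grp, rnk).
    for row in ranked_rows(ROWS):
        print(row)
```

For the sample data, grp comes out as 0, 0, 0, 0, 1, 1, 1, 2: it increases on the row right after each non-null O_ID, which is what makes rank() start over at 1 there. Timestamps are stored as ISO-formatted text in this sketch, so ordering by ts as a string gives chronological order.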