RANK() OVER partition that resets depending on another column
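I have a table with two different IDs and a timestamp, which I would like to rank. The twist is that I want to rank within S_ID only until there is an entry in O_ID. Once O_ID has an entry, I want the next rank for that S_ID to start again at 1.

Here is an example: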

    select
        S_ID,
        timestamp,
        O_ID,
        rank() OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS RANK
    from table
    order by S_ID, timestamp;

S_ID | Timestamp | O_ID | Rank |
---|---|---|---|
2e114e9f | 2021-11-26 08:57:44.049 | NULL | 1 |
2e114e9f | 2021-12-26 17:07:26.272 | NULL | 2 |
2e114e9f | 2021-12-27 08:13:24.277 | NULL | 3 |
2e114e9f | 2021-12-29 11:32:56.952 | 2287549 | 4 |
2e114e9f | 2021-12-30 13:41:28.821 | NULL | 5 |
2e114e9f | 2021-12-30 19:53:28.590 | NULL | 6 |
2e114e9f | 2022-02-05 09:50:54.104 | 2333002 | 7 |
2e114e9f | 2022-02-19 10:14:31.389 | NULL | 8 |

How can I now add another rank that depends on the entries in column O_ID? The result should be:

S_ID | Timestamp | O_ID | Rank S_ID | Rank both |
---|---|---|---|---|
2e114e9f | 2021-11-26 08:57:44.049 | NULL | 1 | 1 |
2e114e9f | 2021-12-26 17:07:26.272 | NULL | 2 | 2 |
2e114e9f | 2021-12-27 08:13:24.277 | NULL | 3 | 3 |
2e114e9f | 2021-12-29 11:32:56.952 | 2287549 | 4 | 4 |
2e114e9f | 2021-12-30 13:41:28.821 | NULL | 5 | 1 |
2e114e9f | 2021-12-30 19:53:28.590 | NULL | 6 | 2 |
2e114e9f | 2022-02-05 09:50:54.104 | 2333002 | 7 | 3 |
2e114e9f | 2022-02-19 10:14:31.389 | NULL | 8 | 1 |

I'm grateful for any food for thought!
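
It looks like the gaps-and-islands approach can help here: use `lag` to split the data into groups (based on whether the current and previous O_ID values are equal, plus some NULL handling), then use the group value as an additional partition in the `rank()` function.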

    -- sample data
    WITH dataset (S_ID, Timestamp, O_ID) AS (
        VALUES ('2e114e9f', timestamp '2021-11-26 08:57:44.049', NULL),
            ('2e114e9f', timestamp '2021-12-26 17:07:26.272', NULL),
            ('2e114e9f', timestamp '2021-12-27 08:13:24.277', NULL),
            ('2e114e9f', timestamp '2021-12-29 11:32:56.952', 2287549),
            ('2e114e9f', timestamp '2021-12-30 13:41:28.821', NULL),
            ('2e114e9f', timestamp '2021-12-30 19:53:28.590', NULL),
            ('2e114e9f', timestamp '2022-02-05 09:50:54.104', 2333002),
            ('2e114e9f', timestamp '2022-02-19 10:14:31.389', NULL)
    )

    -- query
    select S_ID,
        Timestamp,
        O_ID,
        -- rank within each (S_ID, grp) island
        rank() OVER (PARTITION BY S_ID, grp ORDER BY timestamp asc) AS RANK
    from (
        select *,
            -- running count of rows that follow a row with a different, non-NULL O_ID
            sum(if(prev is not null and (O_ID is null or O_ID != prev), 1, 0))
                OVER (PARTITION BY S_ID ORDER BY timestamp asc) as grp
        from (
            select *,
                -- O_ID of the previous row for the same S_ID
                lag(O_ID) OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS prev
            from dataset
        )
    )

Output:

S_ID | Timestamp | O_ID | RANK |
---|---|---|---|
2e114e9f | 2021-11-26 08:57:44.049 | NULL | 1 |
2e114e9f | 2021-12-26 17:07:26.272 | NULL | 2 |
2e114e9f | 2021-12-27 08:13:24.277 | NULL | 3 |
2e114e9f | 2021-12-29 11:32:56.952 | 2287549 | 4 |
2e114e9f | 2021-12-30 13:41:28.821 | NULL | 1 |
2e114e9f | 2021-12-30 19:53:28.590 | NULL | 2 |
2e114e9f | 2022-02-05 09:50:54.104 | 2333002 | 3 |
2e114e9f | 2022-02-19 10:14:31.389 | NULL | 1 |
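
To see how the groups form, it can help to trace the intermediate `prev` and `grp` values row by row. The following is only a rough Python sketch of the same logic, not part of the query itself: the function name `trace` and the tuple layout are made up for illustration, and it assumes the rows belong to a single S_ID, are already sorted by timestamp, and have distinct timestamps (so `rank()` equals the position within the group).

```python
# Rows for one S_ID, already sorted by timestamp: (timestamp, O_ID).
rows = [
    ("2021-11-26 08:57:44.049", None),
    ("2021-12-26 17:07:26.272", None),
    ("2021-12-27 08:13:24.277", None),
    ("2021-12-29 11:32:56.952", 2287549),
    ("2021-12-30 13:41:28.821", None),
    ("2021-12-30 19:53:28.590", None),
    ("2022-02-05 09:50:54.104", 2333002),
    ("2022-02-19 10:14:31.389", None),
]


def trace(rows):
    """Hypothetical helper: return (timestamp, o_id, prev, grp, rank) per row."""
    out = []
    prev = None  # plays the role of lag(O_ID); NULL for the first row
    grp = 0      # running sum of the "new group" flag
    rank = 0     # position inside the current group
    for ts, o_id in rows:
        # Same condition as the if(...) expression in the SQL query.
        starts_new_group = prev is not None and (o_id is None or o_id != prev)
        if starts_new_group:
            grp += 1
            rank = 0
        rank += 1
        out.append((ts, o_id, prev, grp, rank))
        prev = o_id
    return out


for row in trace(rows):
    print(row)
```

The flag is raised on the row after an O_ID entry, not on the entry itself, which is why the row carrying the O_ID still closes the old group (rank 4 and rank 3 in the sample) and the following row restarts at 1.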