ROW_NUMBER over PARTITION BY: restart the row counter between breaks
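I have a list of activities that is currently ordered by user, the date and time of the activity, and an ID. I want to number the rows within each group defined by those same fields. The code below gets me most of the way there. However, a problem arises when the same ID recurs later: I need the row-number count to start over rather than continue from the previous run.
Here is my code: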
ROW_NUMBER() OVER (PARTITION BY USER_ID, foc_id ORDER BY USER_ID, to_char(activity_date, 'MM/DD/YYYY HH24:MI:SS'), foc_id) seq_nbr
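In the image below, we see that FOC_ID "A240" has activity around 2:20 PM. Then FOC_ID "B410" has activity around 3:19 PM, and finally the user returns to "A240" with additional activity around 3:20. Because there is other activity between the first and second runs of "A240" events, I need the row number (seq_nbr) to start over rather than continue from the previous activity.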
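To make the problem concrete, here is a minimal sketch, assuming a hypothetical inline data set `t` whose four rows are illustrative stand-ins for the real table. It orders by `activity_date` directly rather than by its `to_char` string, and otherwise uses the same `PARTITION BY USER_ID, foc_id` as the code above.

```sql
-- Hypothetical inline rows; the CTE name t and the values are assumptions.
WITH t ( user_id, activity_date, foc_id ) AS (
  SELECT 'UVAC3', TIMESTAMP '2020-11-04 14:20:34', 'A240' FROM DUAL UNION ALL
  SELECT 'UVAC3', TIMESTAMP '2020-11-04 14:21:23', 'A240' FROM DUAL UNION ALL
  SELECT 'UVAC3', TIMESTAMP '2020-11-04 15:19:39', 'B410' FROM DUAL UNION ALL
  SELECT 'UVAC3', TIMESTAMP '2020-11-04 15:22:16', 'A240' FROM DUAL
)
SELECT user_id,
       activity_date,
       foc_id,
       -- All A240 rows share one partition, whatever happens in between.
       ROW_NUMBER() OVER ( PARTITION BY user_id, foc_id ORDER BY activity_date ) AS seq_nbr
FROM   t
ORDER BY user_id, activity_date;
```

Because the partition is keyed only on the user and the ID, the last A240 row is numbered 3, whereas the desired value is 1.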
You can use MATCH_RECOGNIZE:
SELECT user_id,
       activity_date,
       foc_id,
       ROW_NUMBER() OVER ( PARTITION BY user_id, mno ORDER BY activity_date ) AS seq_num
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY activity_date
  MEASURES
    MATCH_NUMBER() AS mno
  ALL ROWS PER MATCH
  PATTERN ( same_foc_id* last_row )
  DEFINE
    same_foc_id AS FIRST( foc_id ) = NEXT( foc_id )
)
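To see how the pattern splits the rows into runs, a sketch like the following may help. It uses the same MATCH_RECOGNIZE clause against the sample `table_name` defined further down, but selects the match number directly; the extra `cls` measure is an addition for illustration only and is not part of the query above.

```sql
-- Sketch only: expose the per-run match number instead of wrapping it in ROW_NUMBER().
SELECT user_id,
       activity_date,
       foc_id,
       mno,
       cls
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY activity_date
  MEASURES
    MATCH_NUMBER() AS mno,   -- one number per consecutive run of the same foc_id
    CLASSIFIER()   AS cls    -- name of the pattern variable that matched the row
  ALL ROWS PER MATCH
  PATTERN ( same_foc_id* last_row )
  DEFINE
    same_foc_id AS FIRST( foc_id ) = NEXT( foc_id )
)
ORDER BY user_id, activity_date;
```

Each row whose next row has the same foc_id as the start of the current match is classified as same_foc_id. The first row that fails that test is consumed by the undefined (always true) last_row variable, which closes the match. On the sample data this should give mno = 1 for the first A240 run, 2 for the B410 run and 3 for the second A240 run, which is why partitioning ROW_NUMBER by user_id, mno restarts the count.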
Or, multiple ROW_NUMBERs:
SELECT user_id,
       activity_date,
       foc_id,
       ROW_NUMBER() OVER ( PARTITION BY user_id, foc_id, grp ORDER BY activity_date ) AS seq_num
FROM   (
  SELECT user_id,
         activity_date,
         foc_id,
         ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date )
         - ROW_NUMBER() OVER ( PARTITION BY user_id, foc_id ORDER BY activity_date ) AS grp
  FROM   table_name
)
ORDER BY user_id, activity_date
Which, for the sample data:
CREATE TABLE table_name ( user_id, activity_date, foc_id ) AS
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:20:34' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:21:23' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:21:23' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:21:23' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:19:39' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:19:44' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:19:58' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:20:11' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:22:16' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:22:33' HOUR TO SECOND, 'A240' FROM DUAL;
Both output:
USER_ID | ACTIVITY_DATE | FOC_ID | SEQ_NUM
:------ | :------------------ | :----- | ------:
UVAC3 | 2020-11-04 14:20:34 | A240 | 1
UVAC3 | 2020-11-04 14:21:23 | A240 | 2
UVAC3 | 2020-11-04 14:21:23 | A240 | 3
UVAC3 | 2020-11-04 14:21:23 | A240 | 4
UVAC3 | 2020-11-04 15:19:39 | B410 | 1
UVAC3 | 2020-11-04 15:19:44 | B410 | 2
UVAC3 | 2020-11-04 15:19:58 | B410 | 3
UVAC3 | 2020-11-04 15:20:11 | B410 | 4
UVAC3 | 2020-11-04 15:22:16 | A240 | 1
UVAC3 | 2020-11-04 15:22:33 | A240 | 2
db<>fiddle here
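If it is unclear why the difference of two ROW_NUMBERs identifies each run, the following sketch exposes the intermediate values against the same `table_name`. The aliases `rn_all` and `rn_foc` are names chosen here for illustration and do not appear in the queries above.

```sql
-- Sketch only: show the two row numbers and their difference side by side.
SELECT user_id,
       activity_date,
       foc_id,
       rn_all,                 -- position among all of the user's rows
       rn_foc,                 -- position among the user's rows with this foc_id
       rn_all - rn_foc AS grp  -- constant while rows of one foc_id are consecutive
FROM   (
  SELECT user_id,
         activity_date,
         foc_id,
         ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date ) AS rn_all,
         ROW_NUMBER() OVER ( PARTITION BY user_id, foc_id ORDER BY activity_date ) AS rn_foc
  FROM   table_name
)
ORDER BY user_id, activity_date;
```

Within a consecutive run both counters advance by one per row, so their difference stays fixed; when rows of another foc_id intervene, only rn_all advances and the difference jumps. On the sample data, and assuming the tied 14:21:23 rows are numbered in the same order by both ROW_NUMBER calls, grp should be 0 for the first A240 run, 4 for the B410 run and 4 for the second A240 run. The value 4 appears for two different runs, which is why the outer query partitions by foc_id as well as grp.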