Iterate over data set and insert sequence ids for rows

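I have a data set of more than 2 million rows in an Oracle SQL table and want to run some association analysis on it. To apply a sequence mining algorithm to this data, I need a column named 'sequenceId' and a column named 'eventId'.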

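The table structure looks like this:

Now I need an id that is incremented every time the uId changes. How can I do this in Oracle SQL? I tried it in R, but there it takes more than 12 hours...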


Sample data:

time                 pId   uId
2019-10-01 12:12:24  3806  535447446
2019-10-01 19:51:55  3762  535447446
2019-10-02 18:09:34  3806  552286734
2019-10-02 17:54:01  3928  493964166

Expected result:

time                 pId   uId        sequence id
2019-10-01 12:12:24  3806  535447446  1
2019-10-01 19:51:55  3762  535447446  1
2019-10-02 18:09:34  3806  552286734  2
2019-10-02 17:54:01  3928  493964166  3

The ID should increase whenever the user_id changes.

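WITH 
source_data AS (
    -- Sample rows from the question; in practice, select from the real table instead
    SELECT '2019-10-01 12:12:24' AS timestamp, 3806 AS product_id, 535447446 AS user_id FROM DUAL UNION ALL
    SELECT '2019-10-01 19:51:55', 3762, 535447446 FROM DUAL UNION ALL
    SELECT '2019-10-02 18:09:34', 3806, 552286734 FROM DUAL UNION ALL
    SELECT '2019-10-02 17:54:01', 3928, 493964166 FROM DUAL 
),
cte AS (
    -- Flag rows whose user_id differs from the previous row in timestamp order.
    -- LAG returns NULL for the first row, so that row falls into ELSE and is flagged too.
    SELECT timestamp,
           product_id,
           user_id,
           CASE WHEN user_id = LAG(user_id) OVER (ORDER BY timestamp) 
                THEN 0
                ELSE 1
                END new_user
    FROM source_data
)
-- Running total of the flags: the id grows by 1 at every change of user_id.
-- Rows are processed in timestamp order, not in the order listed in the question.
SELECT timestamp,
       product_id,
       user_id,
       SUM(new_user) OVER (ORDER BY timestamp) sequence_id
FROM cte;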

Result:

TIMESTAMP            PRODUCT_ID  USER_ID    SEQUENCE_ID
2019-10-01 12:12:24  3806        535447446  1
2019-10-01 19:51:55  3762        535447446  1
2019-10-02 17:54:01  3928        493964166  2
2019-10-02 18:09:34  3806        552286734  3
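
The query above starts a new id whenever user_id changes between consecutive rows in timestamp order, so a user whose rows are interleaved in time with another user's rows would receive several ids. If each user should instead end up with exactly one sequence id, ranking by user_id may be worth considering. The following is only a minimal sketch under that assumption, not the approach of the query above. It reuses the same sample rows and column names, and event_id is a hypothetical name for the question's 'eventId', assumed here to number each user's events in timestamp order.

```sql
WITH
source_data AS (
    -- Same sample rows as above; in practice, select from the real table instead
    SELECT '2019-10-01 12:12:24' AS timestamp, 3806 AS product_id, 535447446 AS user_id FROM DUAL UNION ALL
    SELECT '2019-10-01 19:51:55', 3762, 535447446 FROM DUAL UNION ALL
    SELECT '2019-10-02 18:09:34', 3806, 552286734 FROM DUAL UNION ALL
    SELECT '2019-10-02 17:54:01', 3928, 493964166 FROM DUAL
)
SELECT timestamp,
       product_id,
       user_id,
       -- One id per distinct user_id, with no gaps in the numbering
       DENSE_RANK() OVER (ORDER BY user_id) AS sequence_id,
       -- Assumption: eventId counts a user's events in timestamp order
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp) AS event_id
FROM source_data
ORDER BY sequence_id, event_id;
```

With this variant the ids follow the numeric order of user_id rather than the order in which users first appear, so user 493964166 gets sequence_id 1. That is enough if the ids only need to be distinct per user.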