Iterate over data set and insert sequence ids for rows
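I have a data set of more than 2 million rows in an Oracle SQL table and want to run some association analysis on it. To apply a sequence mining algorithm to this data, I need a column named 'sequenceId' and a column named 'eventId'.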
The table structure looks like this:
- time
- pId
- uId
Now I need an ID that increments every time uId changes. How can I do this in Oracle SQL? I tried it in R, but there it takes more than 12 hours...
Sample data:
time | pId | uId |
---|---|---|
2019-10-01 12:12:24 | 3806 | 535447446 |
2019-10-01 19:51:55 | 3762 | 535447446 |
2019-10-02 18:09:34 | 3806 | 552286734 |
2019-10-02 17:54:01 | 3928 | 493964166 |
Expected result:

time | pId | uId | sequence id |
---|---|---|---|
2019-10-01 12:12:24 | 3806 | 535447446 | 1 |
2019-10-01 19:51:55 | 3762 | 535447446 | 1 |
2019-10-02 18:09:34 | 3806 | 552286734 | 2 |
2019-10-02 17:54:01 | 3928 | 493964166 | 3 |
The ID should increase whenever the user ID (uId) changes.
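The rule itself needs only one pass over rows that are already in the desired order. Below is a minimal Python sketch of that rule. It assumes the rows are processed exactly in the order listed (no sorting), and `assign_sequence_ids` is a hypothetical helper name. Under those assumptions it reproduces the expected result above.

```python
# Sample rows from the question: (time, pId, uId), in the listed order.
rows = [
    ("2019-10-01 12:12:24", 3806, 535447446),
    ("2019-10-01 19:51:55", 3762, 535447446),
    ("2019-10-02 18:09:34", 3806, 552286734),
    ("2019-10-02 17:54:01", 3928, 493964166),
]


def assign_sequence_ids(rows):
    """Hypothetical helper: return (time, pId, uId, sequence_id) tuples.

    The counter starts at 1 and increments whenever uId differs from the
    uId of the previous row. Rows are taken in the order given; nothing
    is sorted here.
    """
    result = []
    seq_id = 0
    prev_uid = object()  # sentinel that never equals a real uId
    for time, pid, uid in rows:
        if uid != prev_uid:
            seq_id += 1
            prev_uid = uid
        result.append((time, pid, uid, seq_id))
    return result


for row in assign_sequence_ids(rows):
    print(row)
```

Sorting the rows by time first changes which rows are adjacent, and therefore which user ends up with which ID.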
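One way to do this in Oracle is with the `LAG` analytic function. First flag each row whose `user_id` differs from the previous row's `user_id` (in timestamp order). Then take a running `SUM` of those flags:

```sql
WITH
  -- Sample data from the question (time, pId, uId renamed to
  -- timestamp, product_id, user_id). Replace with the real table.
  -- The timestamps are ISO-formatted strings, so sorting them as text
  -- matches chronological order.
  source_data AS (
    SELECT '2019-10-01 12:12:24' AS timestamp, 3806 AS product_id, 535447446 AS user_id FROM DUAL UNION ALL
    SELECT '2019-10-01 19:51:55', 3762, 535447446 FROM DUAL UNION ALL
    SELECT '2019-10-02 18:09:34', 3806, 552286734 FROM DUAL UNION ALL
    SELECT '2019-10-02 17:54:01', 3928, 493964166 FROM DUAL
  ),
  -- new_user = 1 when user_id differs from the previous row's user_id.
  -- For the first row LAG returns NULL, the comparison is not true,
  -- and that row is flagged with 1 as well.
  cte AS (
    SELECT timestamp,
           product_id,
           user_id,
           CASE WHEN user_id = LAG(user_id) OVER (ORDER BY timestamp)
                THEN 0
                ELSE 1
           END new_user
    FROM source_data
  )
-- The running total of the flags is the sequence ID.
SELECT timestamp,
       product_id,
       user_id,
       SUM(new_user) OVER (ORDER BY timestamp) sequence_id
FROM cte
ORDER BY timestamp;
```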
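To try the same LAG / running-SUM logic without an Oracle instance, here is a sketch that runs it against an in-memory SQLite database through Python's standard `sqlite3` module. Several assumptions apply:

- SQLite 3.25 or later is required, since that is the first version with window functions.
- The `timestamp` column is renamed `event_time`.
- `DUAL` is dropped because the data goes into a temporary table instead.
- The outer query ends with `ORDER BY` so the printed order is deterministic.

```python
import sqlite3

# In-memory stand-in for the Oracle table (assumed column names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_data (event_time TEXT, product_id INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO source_data VALUES (?, ?, ?)",
    [
        ("2019-10-01 12:12:24", 3806, 535447446),
        ("2019-10-01 19:51:55", 3762, 535447446),
        ("2019-10-02 18:09:34", 3806, 552286734),
        ("2019-10-02 17:54:01", 3928, 493964166),
    ],
)

# Same logic as the Oracle query: flag a row with 1 when its user_id
# differs from the previous row's (or there is no previous row), then
# take a running sum of the flags in time order.
QUERY = """
WITH cte AS (
    SELECT event_time,
           product_id,
           user_id,
           CASE WHEN user_id = LAG(user_id) OVER (ORDER BY event_time)
                THEN 0
                ELSE 1
           END AS new_user
    FROM source_data
)
SELECT event_time,
       product_id,
       user_id,
       SUM(new_user) OVER (ORDER BY event_time) AS sequence_id
FROM cte
ORDER BY event_time
"""

for row in conn.execute(QUERY):
    print(row)
```

Note that `SUM(...) OVER (ORDER BY ...)` uses the default frame, which treats rows with identical timestamps as peers. With duplicate timestamps, a tie-breaking column in the `ORDER BY` would be needed.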
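Result:

TIMESTAMP | PRODUCT_ID | USER_ID | SEQUENCE_ID |
---|---|---|---|
2019-10-01 12:12:24 | 3806 | 535447446 | 1 |
2019-10-01 19:51:55 | 3762 | 535447446 | 1 |
2019-10-02 17:54:01 | 3928 | 493964166 | 2 |
2019-10-02 18:09:34 | 3806 | 552286734 | 3 |

Because the query orders rows by timestamp, user 493964166 (17:54:01) comes before user 552286734 (18:09:34) and receives sequence ID 2. The expected result in the question instead numbers the rows in the order they were listed. Both apply the same rule: the ID increments whenever the user changes from one row to the next.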
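The question also asks for an 'eventId' column, which the query above does not produce. One option is to make eventId the 1-based position of each row within its sequence, ordered by time. That definition is an assumption; check what the sequence mining tool actually expects. Under it, eventId can be added with `ROW_NUMBER() OVER (PARTITION BY sequence_id ORDER BY ...)`, an analytic function Oracle also provides.

The sketch below runs this in SQLite through Python's `sqlite3` module. It assumes SQLite 3.25+ and renames the time column to `event_time`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_data (event_time TEXT, product_id INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO source_data VALUES (?, ?, ?)",
    [
        ("2019-10-01 12:12:24", 3806, 535447446),
        ("2019-10-01 19:51:55", 3762, 535447446),
        ("2019-10-02 18:09:34", 3806, 552286734),
        ("2019-10-02 17:54:01", 3928, 493964166),
    ],
)

# ASSUMPTION: event_id = position of the row inside its sequence, by time.
QUERY = """
WITH flagged AS (
    SELECT event_time, product_id, user_id,
           CASE WHEN user_id = LAG(user_id) OVER (ORDER BY event_time)
                THEN 0 ELSE 1 END AS new_user
    FROM source_data
),
sequenced AS (
    SELECT event_time, product_id, user_id,
           SUM(new_user) OVER (ORDER BY event_time) AS sequence_id
    FROM flagged
)
SELECT event_time, product_id, user_id, sequence_id,
       ROW_NUMBER() OVER (PARTITION BY sequence_id ORDER BY event_time) AS event_id
FROM sequenced
ORDER BY sequence_id, event_id
"""

for row in conn.execute(QUERY):
    print(row)
```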