Create ("force") an island in a "gaps and islands" problem

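I have code that partitions my data using a gaps-and-islands approach. The data reports user activity, working time, and idle time based on logged timestamps and activities. My code works well, but every so often a user_id logs a series of activities in one application, goes idle, and then returns to the same application to log additional activity. With my current code, it looks as if the user spent almost two hours in one application, when in fact there was a long stretch of downtime in the middle. I would like to "force" a new island, restarting the partition whenever the gap between activities is greater than 30 minutes.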

ACTIVITY_DATE    | USER_ID | APPL_ID | PR1 | PR2
:--------------- | :------ | ------: | --: | --:
11/20/2020 10:55 | A       |    9340 |   1 |   1
11/20/2020 10:55 | A       |    9340 |   2 |   2
11/20/2020 10:58 | A       |    9340 |   3 |   3
11/20/2020 10:58 | A       |    9340 |   4 |   4
11/20/2020 10:59 | A       |    9340 |   5 |   5
11/20/2020 13:09 | A       |    9340 |   6 |   6
11/20/2020 13:09 | A       |    9340 |   7 |   7
11/20/2020 13:10 | A       |    9340 |   8 |   8
11/20/2020 13:10 | A       |    9340 |   9 |   9
11/20/2020 17:12 | A       |    8354 |  10 |   1
11/20/2020 17:14 | A       |    8354 |  11 |   2
11/20/2020 17:14 | A       |    8354 |  12 |   3

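The end result needs to restart the partition for column PR2 at the sixth row in this example, because the gap between activities logged for the same appl_id is greater than 30 minutes: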

ACTIVITY_DATE    | USER_ID | APPL_ID | PR1 | PR2
:--------------- | :------ | ------: | --: | --:
11/20/2020 10:55 | A       |    9340 |   1 |   1
11/20/2020 10:55 | A       |    9340 |   2 |   2
11/20/2020 10:58 | A       |    9340 |   3 |   3
11/20/2020 10:58 | A       |    9340 |   4 |   4
11/20/2020 10:59 | A       |    9340 |   5 |   5
11/20/2020 13:09 | A       |    9340 |   6 |   1
11/20/2020 13:09 | A       |    9340 |   7 |   2
11/20/2020 13:10 | A       |    9340 |   8 |   3
11/20/2020 13:10 | A       |    9340 |   9 |   4
11/20/2020 17:12 | A       |    8354 |  10 |   1
11/20/2020 17:14 | A       |    8354 |  11 |   2
11/20/2020 17:14 | A       |    8354 |  12 |   3

This is my current code:

    select activity_date, user_id, appl_id,
        row_number() over(partition by user_id order by activity_date) rn1,
        row_number() over(partition by user_id, appl_id order by activity_date) rn2
    from (
        select activity_date, user_id, appl_id, count(*)
        from mytable tt
        where user_id in ('A', 'B', 'C')
            and activity_date >= trunc(sysdate - 4, 'DD')
            and activity_date <= trunc(sysdate - 3, 'DD')
        group by activity_date, user_id, appl_id
    ) tt

You can use MATCH_RECOGNIZE:

    SELECT activity_date,
           user_id,
           appl_id,
           pr1,
           -- Restart the numbering for every match (island) within an application.
           ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, mno ORDER BY pr1 )
             AS pr2
    FROM   (
      SELECT t.*,
             ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date) AS pr1
      FROM   table_name t
    )
    MATCH_RECOGNIZE(
      PARTITION BY user_id, appl_id
      ORDER     BY pr1
      MEASURES
        MATCH_NUMBER() AS mno
      ALL ROWS PER MATCH
      -- An island is zero or more rows that are each followed within
      -- 30 minutes by the next row, then one closing row.
      PATTERN ( activities* last_activity )
      DEFINE activities AS
        NEXT(activity_date) <= LAST(activity_date) + INTERVAL '30' MINUTE
    )
    ORDER BY user_id, pr1;

Which, for the sample data:

    CREATE TABLE table_name ( ACTIVITY_DATE, USER_ID, APPL_ID ) AS
    SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '10:59' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '17:12' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
    SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL;

Outputs:

ACTIVITY_DATE       | USER_ID | APPL_ID | PR1 | PR2
:------------------ | :------ | ------: | --: | --:
2020-11-20 10:55:00 | A       |    9340 |   1 |   1
2020-11-20 10:55:00 | A       |    9340 |   2 |   2
2020-11-20 10:58:00 | A       |    9340 |   3 |   3
2020-11-20 10:58:00 | A       |    9340 |   4 |   4
2020-11-20 10:59:00 | A       |    9340 |   5 |   5
2020-11-20 13:09:00 | A       |    9340 |   6 |   1
2020-11-20 13:09:00 | A       |    9340 |   7 |   2
2020-11-20 13:10:00 | A       |    9340 |   8 |   3
2020-11-20 13:10:00 | A       |    9340 |   9 |   4
2020-11-20 17:12:00 | A       |    8354 |  10 |   1
2020-11-20 17:14:00 | A       |    8354 |  11 |   2
2020-11-20 17:14:00 | A       |    8354 |  12 |   3
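
If MATCH_RECOGNIZE is not available (it was introduced in Oracle Database 12c), the same islands can probably be built with analytic functions alone. The following is only a sketch of that alternative technique, not part of the query above: it flags each row that starts a new island with LAG, then turns the flags into a group number with a running SUM. It reuses `table_name` and `pr1` from above; the names `is_start` and `grp` are made up for illustration.

    SELECT activity_date,
           user_id,
           appl_id,
           pr1,
           ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, grp ORDER BY pr1 )
             AS pr2
    FROM   (
      SELECT s.*,
             -- Running count of island starts per user and application.
             SUM(is_start) OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 ) AS grp
      FROM   (
        SELECT r.*,
               -- 1 when the row opens a new island: either it is the first row
               -- for the application (LAG returns NULL, so the ELSE branch is
               -- taken) or it is more than 30 minutes after the previous row.
               CASE
                 WHEN activity_date
                      <= LAG(activity_date)
                           OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 )
                         + INTERVAL '30' MINUTE
                 THEN 0
                 ELSE 1
               END AS is_start
        FROM   (
          SELECT t.*,
                 ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date) AS pr1
          FROM   table_name t
        ) r
      ) s
    )
    ORDER BY user_id, pr1;

For the sample data this should restart `pr2` at the 13:09 row in the same way, since 13:09 is more than 30 minutes after 10:59.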

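
Since the underlying goal is to report how long a user actually spent in an application, it may also help to collapse each island into a single row. The following is a sketch under the same assumptions as the query above (same `table_name`, same 30-minute rule), switching to ONE ROW PER MATCH; the measure names `island_no`, `island_start`, `island_end` and `activity_count` are illustrative and do not come from the original query.

    SELECT *
    FROM   table_name
    MATCH_RECOGNIZE(
      PARTITION BY user_id, appl_id
      ORDER     BY activity_date
      MEASURES
        MATCH_NUMBER()       AS island_no,      -- island counter per user and application
        FIRST(activity_date) AS island_start,   -- first activity in the island
        LAST(activity_date)  AS island_end,     -- last activity in the island
        COUNT(*)             AS activity_count  -- rows that fell into the island
      ONE ROW PER MATCH
      PATTERN ( activities* last_activity )
      DEFINE activities AS
        NEXT(activity_date) <= LAST(activity_date) + INTERVAL '30' MINUTE
    )
    ORDER BY user_id, island_start;

With one row per island, the difference between `island_end` and `island_start` gives the active time without the idle stretch in the middle.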