SQL Oracle - Group by ID, task ID, min and max timestamp
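I have data on users performing different tasks.
I want to group this data by user ID and task ID to get the start and end time of each task. When the employee switches to another task, there should be a new row with the new start and end time.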
Sample simplified dataset:

userid | taskid | date_time_stamp (ascending)
---|---|---
1 | task-A | 16/6/2021 04:17:00
1 | task-A | 16/6/2021 04:19:00
1 | task-A | 16/6/2021 04:27:00
1 | task-B | 16/6/2021 04:31:00
1 | task-B | 16/6/2021 04:33:00
1 | task-B | 16/6/2021 04:36:00
1 | task-A | 16/6/2021 04:42:00
1 | task-A | 16/6/2021 04:44:00
Sample result:

userid | taskid | first_dtm | last_dtm
---|---|---|---
1 | task-A | 16/6/2021 04:17:00 | 16/6/2021 04:27:00
1 | task-B | 16/6/2021 04:31:00 | 16/6/2021 04:36:00
1 | task-A | 16/6/2021 04:42:00 | 16/6/2021 04:44:00
I know I should use MIN() and MAX() together with GROUP BY. However, if I group by userid and taskid, this example gives only one row for task-A.
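To make the problem concrete, here is a minimal sketch of that naive query. It assumes the sample rows are stored in a table named table_name (the name used in the answer below); the column aliases first_dtm and last_dtm are taken from the sample result.

```sql
-- Naive approach, shown only to illustrate the problem:
-- grouping by userid and taskid merges every task-A row into one group,
-- no matter how the rows are interleaved with other tasks in time.
SELECT userid,
       taskid,
       MIN(date_time_stamp) AS first_dtm,
       MAX(date_time_stamp) AS last_dtm
FROM   table_name
GROUP  BY userid, taskid
ORDER  BY userid, first_dtm;
```

For the sample data, this returns only two rows. The task-A row runs from 04:17:00 to 04:44:00, which swallows the task-B interval, and the task-B row runs from 04:31:00 to 04:36:00.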
From Oracle 12, you can use MATCH_RECOGNIZE:
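```sql
SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY userid                     -- look for runs separately for each user
  ORDER BY date_time_stamp                -- in chronological order
  MEASURES
    FIRST(taskid)          AS taskid,     -- task of the run
    FIRST(date_time_stamp) AS start_date, -- first timestamp in the run
    LAST(date_time_stamp)  AS end_date    -- last timestamp in the run
  ONE ROW PER MATCH                       -- collapse each run to a single row
  PATTERN ( same_task+ )                  -- one or more consecutive rows ...
  DEFINE same_task AS FIRST(taskid) = taskid -- ... with the same taskid as the run's first row
)
```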
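To see how the pattern splits the rows, one option is a diagnostic variant of the same query that keeps every input row and tags it with MATCH_NUMBER(), which numbers the matches within each partition. This is a sketch that assumes the table_name created from the sample data below; the alias match_no is chosen here for illustration.

```sql
-- Diagnostic variant: keep all rows and show which match (run) each belongs to.
SELECT userid, date_time_stamp, taskid, match_no
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY userid
  ORDER BY date_time_stamp
  MEASURES MATCH_NUMBER() AS match_no   -- 1, 2, 3, ... within each userid
  ALL ROWS PER MATCH                    -- return every row, not one per match
  PATTERN ( same_task+ )
  DEFINE same_task AS FIRST(taskid) = taskid
)
ORDER BY userid, date_time_stamp;
```

For the sample data, the first three task-A rows get match_no 1, the three task-B rows get 2, and the last two task-A rows get 3. Each match_no then corresponds to one row of the ONE ROW PER MATCH result.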
Before that, you can use the ROW_NUMBER analytic function and aggregation:
```sql
SELECT userid,
       taskid,
       MIN(date_time_stamp) AS start_date,
       MAX(date_time_stamp) AS end_date
FROM   (
  SELECT t.*,
         -- Row number over all of the user's rows minus row number over the
         -- user's rows for this task: constant within each consecutive run
         ROW_NUMBER() OVER ( PARTITION BY userid ORDER BY date_time_stamp )
         - ROW_NUMBER() OVER ( PARTITION BY userid, taskid ORDER BY date_time_stamp )
           AS grp
  FROM   table_name t
)
GROUP BY userid, taskid, grp
ORDER BY userid, start_date
```
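To see why subtracting the two row numbers identifies each run, the intermediate values can be listed directly. This is a diagnostic sketch that assumes the table_name created from the sample data below; the aliases rn_user and rn_task are chosen here for illustration.

```sql
-- Diagnostic: expose the two row numbers and their difference (grp).
SELECT userid,
       taskid,
       date_time_stamp,
       rn_user,
       rn_task,
       rn_user - rn_task AS grp
FROM   (
  SELECT t.*,
         -- position among all of the user's rows
         ROW_NUMBER() OVER ( PARTITION BY userid ORDER BY date_time_stamp ) AS rn_user,
         -- position among the user's rows for the same task
         ROW_NUMBER() OVER ( PARTITION BY userid, taskid ORDER BY date_time_stamp ) AS rn_task
  FROM   table_name t
)
ORDER BY userid, date_time_stamp;
```

For the sample data, grp is 0 for the first three task-A rows and 3 for the three task-B rows. It is also 3 for the last two task-A rows (rn_user 7 and 8 minus rn_task 4 and 5). So grp on its own is not unique across tasks. That is why the GROUP BY above includes taskid as well as grp.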
Which, for your sample data:
```sql
CREATE TABLE table_name ( userid, taskid, date_time_stamp ) AS
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:17:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:19:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:27:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:31:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:33:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-B', DATE '2021-06-16' + INTERVAL '04:36:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:42:00' HOUR TO SECOND FROM DUAL UNION ALL
SELECT 1, 'task-A', DATE '2021-06-16' + INTERVAL '04:44:00' HOUR TO SECOND FROM DUAL
```
Both output:

USERID | TASKID | START_DATE | END_DATE
---|---|---|---
1 | task-A | 2021-06-16 04:17:00 | 2021-06-16 04:27:00
1 | task-B | 2021-06-16 04:31:00 | 2021-06-16 04:36:00
1 | task-A | 2021-06-16 04:42:00 | 2021-06-16 04:44:00
db<>fiddle here