Row Numbering and Sub Grouping
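I'm hoping someone can help. I class myself as a novice in Oracle/SQL; so far I've managed to get what I need, but I've hit a wall with how to approach this query.
I have a dataset of activities. Each activity has a unique ID that stays the same throughout its life cycle, each activity has multiple timestamped events, and each event can have a different status. See the example set below.
What I want to achieve is a list of the data ordered by activity ID and time, with an incremental ID (1, 2, 3, 4) per activity. I also need a secondary column that starts at 1 and increments whenever the status differs from the previous row.
Here is a sample of my data: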
ACTIVITY_ID | EVENT_TIMESTAMP | EVENT_STATUS
--------------------------------------------------------
A001 | 01/01/2020 09:00:00 | STATUS A
A001 | 01/01/2020 10:10:00 | STATUS B
A001 | 01/01/2020 11:20:00 | STATUS C
A001 | 01/01/2020 12:30:00 | STATUS C
A002 | 01/01/2020 13:40:00 | STATUS F
A002 | 01/01/2020 17:50:00 | STATUS F
A002 | 01/01/2020 17:53:00 | STATUS G
Using ROW_NUMBER and PARTITION BY, I get an output that gives me my ordered list, as follows:
ACTIVITY_ID | EVENT_TIMESTAMP | EVENT_STATUS | EVENT_NUMBER
--------------------------------------------------------------------
A001 | 01/01/2020 09:00:00 | STATUS A | 1
A001 | 01/01/2020 10:10:00 | STATUS B | 2
A001 | 01/01/2020 11:20:00 | STATUS C | 3
A001 | 01/01/2020 12:30:00 | STATUS C | 4
A002 | 01/01/2020 13:40:00 | STATUS F | 1
A002 | 01/01/2020 17:50:00 | STATUS F | 2
A002 | 01/01/2020 17:53:00 | STATUS G | 3
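What I'm struggling with is the sub-grouping result I'm looking for (below). Should this be the same as ROW_NUMBER but with a partition on the event status? I've made various attempts, but the numbering always resets to 1 when the status changes, rather than starting at 1 and then incrementing with each change.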
ACTIVITY_ID | EVENT_TIMESTAMP | EVENT_STATUS | EVENT_NUMBER | EVENT_STATUS_GROUP
----------------------------------------------------------------------------------------
A001 | 01/01/2020 09:00:00 | STATUS A | 1 | 1
A001 | 01/01/2020 10:10:00 | STATUS B | 2 | 2
A001 | 01/01/2020 11:20:00 | STATUS C | 3 | 3
A001 | 01/01/2020 12:30:00 | STATUS C | 4 | 3
A001 | 01/01/2020 12:30:00 | STATUS A | 5 | 4
A002 | 01/01/2020 13:40:00 | STATUS F | 1 | 1
A002 | 01/01/2020 17:50:00 | STATUS F | 2 | 1
A002 | 01/01/2020 17:53:00 | STATUS G | 3 | 2
I hope this is clear enough; if not, please ask any questions.
You can use the DENSE_RANK() analytic function:
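SELECT t.*,
       -- number the events chronologically within each activity
       ROW_NUMBER() OVER (PARTITION BY ACTIVITY_ID ORDER BY EVENT_TIMESTAMP)
         AS EVENT_NUMBER,
       -- ranks by status value: this matches the requested grouping only while
       -- statuses sort in the same order as time and never recur
       DENSE_RANK() OVER (PARTITION BY ACTIVITY_ID ORDER BY EVENT_STATUS)
         AS EVENT_STATUS_GROUP
  FROM tab t
 ORDER BY ACTIVITY_ID, EVENT_NUMBER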
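One caveat that seems worth checking: DENSE_RANK() ordered by EVENT_STATUS ranks rows by status value, not by change points, so a status that comes back later gets its old rank again. The sketch below is a hedged illustration only. It uses Python's standard sqlite3 module (an assumption made for easy local testing; it needs SQLite 3.25 or later for window functions and is not Oracle), and the fifth row's 13:00 timestamp is a hypothetical value standing in for the extra 'STATUS A' row in the question's desired output.

```python
import sqlite3

# Hypothetical data: activity A001 returns to 'STATUS A' after 'STATUS C'.
ROWS = [
    ("A001", "2020-01-01 09:00:00", "STATUS A"),
    ("A001", "2020-01-01 10:10:00", "STATUS B"),
    ("A001", "2020-01-01 11:20:00", "STATUS C"),
    ("A001", "2020-01-01 12:30:00", "STATUS C"),
    ("A001", "2020-01-01 13:00:00", "STATUS A"),
]


def dense_rank_groups(rows):
    """Return the DENSE_RANK-by-status value for each row, in time order."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tab (activity_id TEXT, event_timestamp TEXT, event_status TEXT)"
    )
    con.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT DENSE_RANK() OVER (PARTITION BY activity_id ORDER BY event_status)
          FROM tab
         ORDER BY activity_id, event_timestamp
        """
    ).fetchall()
    con.close()
    return [r[0] for r in result]


# The question asks for 1, 2, 3, 3, 4; ranking by status value gives:
print(dense_rank_groups(ROWS))  # [1, 2, 3, 3, 1]
```

The last row falls back to group 1 because 'STATUS A' already holds rank 1, which is why an approach based on comparing each row with its predecessor is the safer fit when statuses can repeat.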
You can use lag() and a cumulative sum to count the number of changes:
SELECT t.*,
       ROW_NUMBER() OVER (PARTITION BY ACTIVITY_ID ORDER BY EVENT_TIMESTAMP) AS EVENT_NUMBER,
       SUM(CASE WHEN PREV_EVENT_STATUS = EVENT_STATUS THEN 0 ELSE 1 END) OVER
           (PARTITION BY ACTIVITY_ID ORDER BY EVENT_TIMESTAMP) AS EVENT_STATUS_GROUP
FROM (SELECT t.*,
             LAG(EVENT_STATUS) OVER (PARTITION BY ACTIVITY_ID ORDER BY EVENT_TIMESTAMP) AS PREV_EVENT_STATUS
      FROM t
     ) t
ORDER BY ACTIVITY_ID, EVENT_NUMBER;
Here is a db<>fiddle.
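For anyone who wants to try the lag-plus-running-sum idea without an Oracle instance, here is a minimal sketch that runs the same query shape through Python's standard sqlite3 module. The choice of SQLite (3.25 or later, for window functions) and the ISO-formatted text timestamps are assumptions made for this illustration; the table name t and the column names follow the query above.

```python
import sqlite3

# The seven sample rows from the question, with ISO timestamps so that
# text ordering equals chronological ordering (an assumption of this sketch).
SAMPLE = [
    ("A001", "2020-01-01 09:00:00", "STATUS A"),
    ("A001", "2020-01-01 10:10:00", "STATUS B"),
    ("A001", "2020-01-01 11:20:00", "STATUS C"),
    ("A001", "2020-01-01 12:30:00", "STATUS C"),
    ("A002", "2020-01-01 13:40:00", "STATUS F"),
    ("A002", "2020-01-01 17:50:00", "STATUS F"),
    ("A002", "2020-01-01 17:53:00", "STATUS G"),
]

QUERY = """
SELECT activity_id, event_timestamp, event_status,
       ROW_NUMBER() OVER (PARTITION BY activity_id ORDER BY event_timestamp)
           AS event_number,
       -- 1 whenever the status differs from the previous row (or there is no
       -- previous row); the running sum of those flags is the group number
       SUM(CASE WHEN prev_event_status = event_status THEN 0 ELSE 1 END)
           OVER (PARTITION BY activity_id ORDER BY event_timestamp)
           AS event_status_group
  FROM (SELECT t.*,
               LAG(event_status) OVER (PARTITION BY activity_id
                                       ORDER BY event_timestamp)
                   AS prev_event_status
          FROM t)
 ORDER BY activity_id, event_number
"""


def status_groups(rows):
    """Load rows into an in-memory table and return the numbered result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (activity_id TEXT, event_timestamp TEXT, event_status TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in status_groups(SAMPLE):
    print(row)
```

The first row of each activity has no predecessor, so LAG returns NULL, the comparison is not true, and the CASE falls through to 1. That is what makes every activity start at group 1.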
You can use the modern MATCH_RECOGNIZE:
--main query:
select
   ACTIVITY_ID, EVENT_TIMESTAMP, EVENT_STATUS
  ,EVENT_NUMBER
  ,EVENT_STATUS_GROUP
  ,CLS
from (select t.*
            ,row_number()over(partition by ACTIVITY_ID order by EVENT_TIMESTAMP) EVENT_NUMBER
      from your_tab t
     )
match_recognize(
   partition by ACTIVITY_ID
   order by EVENT_TIMESTAMP
   measures
      MATCH_NUMBER() AS EVENT_STATUS_GROUP,
      case when classifier()='B' then 'DUP' end as cls
   all rows per match
   pattern(A B*)
   define
      b AS b.EVENT_STATUS = PREV(b.EVENT_STATUS)
);
Results:
ACTIVITY_ID EVENT_TIMESTAMP EVENT_STATUS EVENT_NUMBER EVENT_STATUS_GROUP CLS
------------- ------------------- ------------- ------------ ------------------ ---
A001 2020-01-01 09:00:00 STATUS A 1 1
A001 2020-01-01 10:10:00 STATUS B 2 2
A001 2020-01-01 11:20:00 STATUS C 3 3
A001 2020-01-01 12:30:00 STATUS C 4 3 DUP
A001 2020-01-01 13:10:00 STATUS D 5 4
A002 2020-01-01 13:40:00 STATUS F 1 1
A002 2020-01-01 17:50:00 STATUS F 2 1 DUP
A002 2020-01-01 17:53:00 STATUS G 3 2
8 rows selected.
Full example (I added one row to your sample data):
-- your sample data:
with your_tab(ACTIVITY_ID, EVENT_TIMESTAMP, EVENT_STATUS) as (
select 'A001', to_date('01/01/2020 09:00:00','dd/mm/yyyy hh24:mi:ss'),'STATUS A' from dual union all
select 'A001', to_date('01/01/2020 10:10:00','dd/mm/yyyy hh24:mi:ss'),'STATUS B' from dual union all
select 'A001', to_date('01/01/2020 11:20:00','dd/mm/yyyy hh24:mi:ss'),'STATUS C' from dual union all
select 'A001', to_date('01/01/2020 12:30:00','dd/mm/yyyy hh24:mi:ss'),'STATUS C' from dual union all
select 'A001', to_date('01/01/2020 13:10:00','dd/mm/yyyy hh24:mi:ss'),'STATUS D' from dual union all
select 'A002', to_date('01/01/2020 13:40:00','dd/mm/yyyy hh24:mi:ss'),'STATUS F' from dual union all
select 'A002', to_date('01/01/2020 17:50:00','dd/mm/yyyy hh24:mi:ss'),'STATUS F' from dual union all
select 'A002', to_date('01/01/2020 17:53:00','dd/mm/yyyy hh24:mi:ss'),'STATUS G' from dual
)
--main query:
select
   ACTIVITY_ID, EVENT_TIMESTAMP, EVENT_STATUS
  ,EVENT_NUMBER
  ,EVENT_STATUS_GROUP
  ,CLS
from (select t.*
            ,row_number()over(partition by ACTIVITY_ID order by EVENT_TIMESTAMP) EVENT_NUMBER
      from your_tab t
     )
match_recognize(
   partition by ACTIVITY_ID
   order by EVENT_TIMESTAMP
   measures
      MATCH_NUMBER() AS EVENT_STATUS_GROUP,
      case when classifier()='B' then 'DUP' end as cls
   all rows per match
   pattern(A B*)
   define
      b AS b.EVENT_STATUS = PREV(b.EVENT_STATUS)
);
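To make the pattern(A B*) logic easier to follow, here is a rough plain-Python emulation of what the query does per partition: a row whose status equals the previous row's status plays the role of B and extends the current match, and any other row plays the role of A and starts a new match. This is only a sketch of the idea, not how Oracle evaluates MATCH_RECOGNIZE; the function name match_groups and the ISO text timestamps are assumptions made for the illustration.

```python
from itertools import groupby
from operator import itemgetter

# The eight rows of the full example above, with ISO text timestamps.
DATA = [
    ("A001", "2020-01-01 09:00:00", "STATUS A"),
    ("A001", "2020-01-01 10:10:00", "STATUS B"),
    ("A001", "2020-01-01 11:20:00", "STATUS C"),
    ("A001", "2020-01-01 12:30:00", "STATUS C"),
    ("A001", "2020-01-01 13:10:00", "STATUS D"),
    ("A002", "2020-01-01 13:40:00", "STATUS F"),
    ("A002", "2020-01-01 17:50:00", "STATUS F"),
    ("A002", "2020-01-01 17:53:00", "STATUS G"),
]


def match_groups(rows):
    """Emulate pattern(A B*) per activity.

    Returns tuples of (activity_id, event_timestamp, event_status,
    event_number, event_status_group, cls).
    """
    out = []
    # "partition by ACTIVITY_ID order by EVENT_TIMESTAMP"
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, events in groupby(ordered, key=itemgetter(0)):
        match_number = 0
        prev_status = None
        for event_number, (aid, ts, status) in enumerate(events, start=1):
            if event_number > 1 and status == prev_status:
                cls = "DUP"        # classified as B: same status as previous row
            else:
                match_number += 1  # classified as A: a new match begins
                cls = None
            out.append((aid, ts, status, event_number, match_number, cls))
            prev_status = status
    return out


for row in match_groups(DATA):
    print(row)
```

On these eight rows the group numbers come out as 1, 2, 3, 3, 4 for A001 and 1, 1, 2 for A002, with 'DUP' on the two repeated-status rows, in line with the result table shown earlier.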