How to define a pattern with match_recognize to find ordered events that aren't consecutive?

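I'm trying to find sessions that contain 3 specific events. They need to be ordered, meaning event_1 happens first, then event_2, then event_3, but they don't need to come immediately one after another. There can be any number of other random events between them. How do I define the pattern in the match_recognize clause so that I can tag these events with the classifier, and also tag them when the sequence is incomplete, e.g. when only event_1 happened, or only event_1 + event_2?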

Or is there another, more efficient way to do this that doesn't involve match_recognize? I'm trying to avoid multiple joins because the data is large.

Here is a dummy query for demonstration:

select 
    session_id,
    event,
    event_dttm
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier as var
    all rows per match with unmatched rows
    pattern (???answer needed???)
    define
        event_1 as event = 'Click image',
        event_2 as event = 'Open profile',
        event_3 as event = 'Leave review');

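You can define one more pattern variable that means "none of those other events", and then allow zero or more matches of it between the events you care about:

Using this data: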

with events(session_id, event, event_dttm) as (
    SELECT * FROM VALUES
    (99, 0, 10)
    ,(99, 1, 11)
    ,(99, 2, 12)
    ,(99, 3, 13)
    
    ,(98, 1, 10)
    ,(98, 2, 11)
    ,(98, 3, 12)
    ,(98, 0, 13)
    
    ,(100, 1, 10)
    ,(100, 2, 11)
    ,(100, 3, 12)
    
    ,(101, 1, 10)
    ,(101, 0, 11)
    ,(101, 2, 12)
    ,(101, 3, 13)

    ,(102, 1, 10)
    ,(102, 0, 11)
    ,(102, 0, 12)
    ,(102, 2, 13)
    ,(102, 3, 14)

    ,(103, 1, 10)
    ,(103, 0, 11)
    ,(103, 2, 12)
    ,(103, 0, 13)
    ,(103, 3, 14)
    
    ,(104, 1, 10)
    ,(104, 0, 11)
    ,(104, 2, 12)
    ,(104, 0, 13)
    /* incomplete ,(104, 3, 14) */
)
select 
    *
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier as var
    all rows per match with unmatched rows
    pattern (e1 ex* e2 ex* e3)
    define
        e1 as event = 1,
        e2 as event = 2,
        e3 as event = 3,
        ex as event not in (1,2,3))
ORDER BY 1,3;

This gives:

SESSION_ID  EVENT  EVENT_DTTM  VAR
98          1      10          E1
98          2      11          E2
98          3      12          E3
98          0      13
99          0      10
99          1      11          E1
99          2      12          E2
99          3      13          E3
100         1      10          E1
100         2      11          E2
100         3      12          E3
101         1      10          E1
101         0      11          EX
101         2      12          E2
101         3      13          E3
102         1      10          E1
102         0      11          EX
102         0      12          EX
102         2      13          E2
102         3      14          E3
103         1      10          E1
103         0      11          EX
103         2      12          E2
103         0      13          EX
103         3      14          E3
104         1      10
104         0      11
104         2      12
104         0      13
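
Note that session 104 comes back completely untagged, because the pattern above requires all three events. The question also asks for incomplete sequences to be tagged. One possible way, as an untested sketch, is to make the later parts of the pattern optional by nesting groups with the `?` quantifier. The data for session 105 below is a hypothetical addition to illustrate the "only event 1" case, and `classifier()` is written with parentheses as in the Snowflake documentation:

```sql
with events(session_id, event, event_dttm) as (
    select * from values
    (104, 1, 10)
    ,(104, 0, 11)
    ,(104, 2, 12)
    ,(104, 0, 13)

    /* hypothetical session: only event 1 ever occurs */
    ,(105, 1, 10)
    ,(105, 0, 11)
)
select
    *
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier() as var
    all rows per match with unmatched rows
    -- e1 is required; "ex* e2" and, inside it, "ex* e3" are each optional,
    -- so a leading part of the sequence (e1, or e1 .. e2) still counts as a match
    pattern (e1 (ex* e2 (ex* e3)?)?)
    define
        e1 as event = 1,
        e2 as event = 2,
        e3 as event = 3,
        ex as event not in (1,2,3))
order by 1,3;
```

With this pattern I would expect session 104 to be tagged E1, EX, E2 for the first three rows, with the trailing row left unmatched, and session 105 to have only its first row tagged E1. Filler rows that do not lead to a later target event should stay unmatched, since the optional groups only succeed when they end on e2 or e3.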