Snowflake MATCH_RECOGNIZE to skip unimportant events
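I have the following events, ordered by the time they occurred:
e4 -> e2 -> e2 -> e3 -> e10 -> e4
How should I write the PATTERN part of MATCH_RECOGNIZE so that a sequence matches whenever an e2 event occurs and an e4 event occurs later (e2 before e4), regardless of whether there are zero or more other events between the two?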
e4 -> e2 -> e2 -> e3 -> e10 -> e4 - matched
e4 -> e2 -> e4 - matched
e4 -> e4 -> e2 -> e3 - not matched
e2 -> e10 -> e2 -> e5 -> e4 - matched
So for these four sequences, a minimal match can be written as:
WITH data AS (
    -- column1 = sequence id, column2 = event, column3 = position within the sequence
    SELECT * FROM VALUES
        (1,'e4',1),(1,'e2',2),(1,'e2',3),(1,'e3',4),(1,'e10',5),(1,'e4',6),
        (2,'e4',1),(2,'e2',2),(2,'e4',3),
        (3,'e4',1),(3,'e4',2),(3,'e2',3),(3,'e3',4),
        (4,'e2',1),(4,'10',2),(4,'e2',3),(4,'e5',4),(4,'e4',5)
)
SELECT * FROM data
match_recognize(
    partition by column1
    order by column3
    measures
        match_number() as "MATCH_NUMBER",
        match_sequence_number() as msq,
        classifier() as cl
    all rows per match with unmatched rows
    PATTERN (d1 d2* d3)
    DEFINE
        d1 as column2 = 'e2',              -- the e2 that starts the match
        d2 as column2 NOT IN ('e2','e4'),  -- zero or more other events in between
        d3 as column2 = 'e4'               -- the e4 that closes the match
)
ORDER BY 1,3;
Gives:
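| COLUMN1 | COLUMN2 | COLUMN3 | MATCH_NUMBER | MSQ | CL |
|---|---|---|---|---|---|
| 1 | e4 | 1 | | | |
| 1 | e2 | 2 | | | |
| 1 | e2 | 3 | 1 | 1 | D1 |
| 1 | e3 | 4 | 1 | 2 | D2 |
| 1 | e10 | 5 | 1 | 3 | D2 |
| 1 | e4 | 6 | 1 | 4 | D3 |
| 2 | e4 | 1 | | | |
| 2 | e2 | 2 | 1 | 1 | D1 |
| 2 | e4 | 3 | 1 | 2 | D3 |
| 3 | e4 | 1 | | | |
| 3 | e4 | 2 | | | |
| 3 | e2 | 3 | | | |
| 3 | e3 | 4 | | | |
| 4 | e2 | 1 | | | |
| 4 | 10 | 2 | | | |
| 4 | e2 | 3 | 1 | 1 | D1 |
| 4 | e5 | 4 | 1 | 2 | D2 |
| 4 | e4 | 5 | 1 | 3 | D3 |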
But if by "matched" you mean that you only want the details of the matched range, then:
-- reuses the data CTE from the first query
SELECT * FROM data
match_recognize(
    partition by column1
    order by column3
    measures
        first_value(column1) as batch,
        first_value(column3) as seq_start,  -- position of the starting e2
        last_value(column3) as seq_end,     -- position of the closing e4
        match_number() as "MATCH_NUMBER",
        match_sequence_number() as msq,
        classifier() as cl
    one row per match
    PATTERN (d1 d2* d3)
    DEFINE
        d1 as column2 = 'e2',
        d2 as column2 NOT IN ('e2','e4'),
        d3 as column2 = 'e4'
)
ORDER BY 1,3;
This may be what you want:
| COLUMN1 | BATCH | SEQ_START | SEQ_END | MATCH_NUMBER | MSQ | CL |
|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 6 | 1 | 4 | D3 |
| 2 | 2 | 2 | 3 | 1 | 2 | D3 |
| 4 | 4 | 3 | 5 | 1 | 3 | D3 |
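As a sanity check outside Snowflake, the same pattern can be emulated with a regular expression. The sketch below is plain Python, not Snowflake code. It maps each event to a one-letter symbol (e2 becomes A, e4 becomes C, anything else becomes B), so that PATTERN (d1 d2* d3) turns into the regex AB*C. The names SEQUENCES, classify and find_ranges are made up for this illustration, and it assumes that re.finditer resuming after the end of each match is a fair stand-in for skipping past the last matched row.

```python
import re

# Event sequences from the question, keyed by sequence id (column1).
# '10' is kept exactly as it appears in the sample data.
SEQUENCES = {
    1: ["e4", "e2", "e2", "e3", "e10", "e4"],
    2: ["e4", "e2", "e4"],
    3: ["e4", "e4", "e2", "e3"],
    4: ["e2", "10", "e2", "e5", "e4"],
}

# Regex counterpart of PATTERN (d1 d2* d3).
PATTERN = re.compile(r"AB*C")


def classify(event):
    """Map an event to a one-letter symbol mirroring the DEFINE clause."""
    if event == "e2":
        return "A"  # d1: the starting e2
    if event == "e4":
        return "C"  # d3: the closing e4
    return "B"      # d2: anything that is neither e2 nor e4


def find_ranges(sequences):
    """Return (sequence id, seq_start, seq_end) per match, 1-based like column3."""
    ranges = []
    for seq_id, events in sorted(sequences.items()):
        symbols = "".join(classify(e) for e in events)
        for m in PATTERN.finditer(symbols):
            # m.start() is 0-based; m.end() is exclusive, so it already
            # equals the 1-based position of the last matched event.
            ranges.append((seq_id, m.start() + 1, m.end()))
    return ranges


print(find_ranges(SEQUENCES))
```

This prints [(1, 3, 6), (2, 2, 3), (4, 3, 5)], the same ranges as in the table above. Because d2 excludes e2, a second e2 cannot be absorbed as an "in between" event, so the match has to start at the last e2 before the e4. That is why sequence 1 starts at position 3 rather than 2.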