How to get specific rows for each id?
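To understand a business process that has several statuses, I want to get rows based on the created_at column using the following rules:
First row with status 'created'
Last row with 'missing_info' after 'created' (row_no 4)
First row with 'pending' (row_no 5)
Last row with 'missing_info' after 'pending' (row_no 7)
First row with 'pending' after 'missing_info' (row_no 8)
Last row with 'successful' (row_no 10)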
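Below I have highlighted the rows I want to retrieve.
Here is the sample data on DB-FIDDLE.
This is the general flow: created > missing_info > pending > successful. But it can also be just: created > successful.
I know I can use QUALIFY with a window function to get the 'created' and 'successful' rows, as shown below. But I don't know how to get the interim statuses. How can I achieve the desired output?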
WITH created AS (
SELECT *
FROM t1
WHERE status = 'created'
-- keep only the earliest 'created' row per id
QUALIFY ROW_NUMBER() OVER (PARTITION BY status, id ORDER BY created_at) = 1
)
SELECT * FROM created;
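Please note that created and successful are the start and end statuses, so each appears in only one row of the output. Other statuses such as missing_info or pending are interim statuses, so they can appear multiple times in the desired output.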
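Edit:
To understand a business process that has several statuses, I want to get some rows based on the created_at column using the following two rules:
Last row with status 'missing_info' before 'pending' (row 2)
'pending' (row 3)
Last row with status 'missing_info' before 'pending' (row 5)
'pending' (row 6)
Sample data: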
WITH t1 AS (
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:20:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 12:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 12:20:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:30:00'::timestamp AS created_at
)
SELECT *
FROM t1
Expected output:
Snowflake implements MATCH_RECOGNIZE, which is the easiest tool for finding complex patterns in pure SQL:
Recognizes matches of a pattern in a set of rows. MATCH_RECOGNIZE accepts a set of rows (from a table, view, subquery, or other source) as input, and returns all matches for a given row pattern within this set. The pattern is defined similarly to a regular expression.
Data preparation:
CREATE OR REPLACE TABLE t
AS
WITH t1 AS (
SELECT 'A' AS id, 'created' AS status, '2021-07-15 10:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-07-15 10:38:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:12:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 13:36:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 14:36:00'::timestamp AS created_at UNION ALL
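-- Note: this timestamp equals that of the earlier 'pending' row, so ORDER BY created_at sorts this row before the 13:36 'missing_info' row
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:05:00'::timestamp AS created_at UNION ALL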
SELECT 'A' AS id, 'successful' AS status, '2021-07-15 16:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'successful' AS status, '2021-07-15 17:00:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-07-16 10:30:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-07-16 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'successful' AS status, '2021-07-16 12:30:00'::timestamp AS created_at
)
SELECT * FROM t1;
Query for scenario 1:
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY ID
ORDER BY CREATED_AT
-- MEASURES MATCH_NUMBER() AS m, --LAST/FIRST/CLASSIFIER/...
ALL ROWS PER MATCH
PATTERN (c+m+)
DEFINE
c AS status='created'
,m AS status='missing_info'
,p AS status='pending'
,s AS status='successful'
) mr
ORDER BY ID, CREATED_AT;
-- returns rows 1-4
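The key point here is the pattern, which is provided as a Perl-style regular expression. Here we are searching for one or more 'created' rows followed by one or more 'missing_info' rows.
ALL ROWS PER MATCH - returns all rows of each match; this can be changed (for example to ONE ROW PER MATCH) if necessary
MEASURES (Specifying Additional Output Columns) - can be used to add extra information such as MATCH_NUMBER/MATCH_SEQUENCE_NUMBER/CLASSIFIER and more, depending on the specific need.
More patterns (alternatives) can be provided in a single query by using "|": (c+m+|pm+|...)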
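To make the regular-expression analogy concrete, here is a minimal Python sketch, not Snowflake code. It assumes the rows of id 'A' arrive in the order the question numbers them (row_no 1 to 10), encodes each status as the one-letter symbol used in DEFINE, and runs an ordinary regex over the resulting string. The names SYMBOLS and find_matches are hypothetical helpers made up for this illustration, and Python's regex engine only approximates how MATCH_RECOGNIZE evaluates a pattern.

```python
import re

# Statuses for id 'A' in the order the question numbers them (row_no 1..10).
statuses = [
    "created", "created", "missing_info", "missing_info", "pending",
    "missing_info", "missing_info", "pending", "successful", "successful",
]

# One letter per status, mirroring the DEFINE symbols c, m, p, s.
SYMBOLS = {"created": "c", "missing_info": "m", "pending": "p", "successful": "s"}


def find_matches(statuses, pattern):
    """Return 1-based (first_row, last_row) pairs for each non-overlapping match."""
    encoded = "".join(SYMBOLS[s] for s in statuses)  # e.g. "ccmmpmmpss"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, encoded)]


print(find_matches(statuses, r"c+m+"))      # [(1, 4)]
print(find_matches(statuses, r"c+m+|pm+"))  # [(1, 4), (5, 7)]
```

With the single pattern c+m+ only rows 1 to 4 match, which is what the query above returns for id 'A'. Adding the alternative pm+ also picks up rows 5 to 7 in this emulation.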
Edit:
"Thanks for the answer! It returns first 4 rows. I was essentially needed 1st and 4th row."
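Once the groups are identified, the first and last row of each can be filtered, for example with QUALIFY. The key is to use the MEASURES I mentioned earlier: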
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY ID
ORDER BY CREATED_AT
MEASURES MATCH_NUMBER() AS mn,
MATCH_SEQUENCE_NUMBER() AS msn
ALL ROWS PER MATCH
PATTERN (c+m+)
DEFINE
c AS status='created'
,m AS status='missing_info'
,p AS status='pending'
,s AS status='successful'
) mr
QUALIFY (ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn) = 1)
OR(ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn DESC)=1)
ORDER BY ID, CREATED_AT;
-- returns the first and last row of each group defined by ID and MATCH_NUMBER
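The filtering step can also be sketched outside the database. The Python below is only an illustration under assumptions: it takes the rows of id 'A' in the order the question numbers them, finds each c+m+ match with an ordinary regex, and keeps the first and last row of every match, much as the QUALIFY clause does with mn and msn. The helper first_and_last and the (row_no, status) tuples are hypothetical and are not part of the Snowflake query.

```python
import re

# (row_no, status) for id 'A', in the order the question numbers the rows.
rows = [
    (1, "created"), (2, "created"), (3, "missing_info"), (4, "missing_info"),
    (5, "pending"), (6, "missing_info"), (7, "missing_info"), (8, "pending"),
    (9, "successful"), (10, "successful"),
]

# One letter per status, mirroring the DEFINE symbols c, m, p, s.
SYMBOLS = {"created": "c", "missing_info": "m", "pending": "p", "successful": "s"}


def first_and_last(rows, pattern):
    """Keep the first and last row of every non-overlapping pattern match."""
    encoded = "".join(SYMBOLS[status] for _, status in rows)
    kept = []
    for match in re.finditer(pattern, encoded):
        first = rows[match.start()]    # lowest position within the match
        last = rows[match.end() - 1]   # highest position within the match
        kept.append(first)
        if last != first:              # a one-row match is kept only once
            kept.append(last)
    return kept


print(first_and_last(rows, r"c+m+"))  # [(1, 'created'), (4, 'missing_info')]
```

For the c+m+ pattern this keeps rows 1 and 4, the two rows asked for in the comment quoted above.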