Grouping subsets within ordered data in SQL
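I have a set of manufacturing operations data. In some parts of the process there may be steps that can be done in parallel, meaning they can be completed in any order and may even overlap. For instance, in the example below, steps 2, 3 and 4 of order 1001 can be completed in any order. Type = C indicates a parallel operation.
Because historical data may show the parallel steps completed in any order, I want to treat each block of C steps as a single row, using the minimum start time and maximum end time within that group, as shown in the desired table.
How can I achieve this in SQL? HANA SQL specifically, but any relevant example would help.
Current: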
+-----------+------+------+---------------------+---------------------+
| order_nbr | step | type | start | end |
+-----------+------+------+---------------------+---------------------+
| 1001 | 1 | P | 2021-01-01 00:00:00 | 2021-01-01 09:00:00 |
| 1001 | 2 | C | 2021-01-04 03:00:00 | 2021-01-04 06:00:00 |
| 1001 | 3 | C | 2021-01-03 07:00:00 | 2021-01-03 08:00:00 |
| 1001 | 4 | C | 2021-01-05 10:00:00 | 2021-01-05 15:00:00 |
| 1001 | 5 | Z | 2021-01-06 00:00:00 | 2021-01-06 06:00:00 |
| 1001 | 6 | Z | 2021-01-06 16:00:00 | 2021-01-06 20:00:00 |
| 1001 | 7 | C | 2021-01-07 08:00:00 | 2021-01-07 09:00:00 |
| 1001 | 8 | C | 2021-01-07 10:00:00 | 2021-01-07 12:00:00 |
| 1002 | 1 | P | 2021-01-04 08:00:00 | 2021-01-04 16:00:00 |
+-----------+------+------+---------------------+---------------------+
Desired:
+-----------+---------+------+---------------------+---------------------+
| order_nbr | step | type | start | end |
+-----------+---------+------+---------------------+---------------------+
| 1001 | 1 | P | 2021-01-01 00:00:00 | 2021-01-01 09:00:00 |
| 1001 | 2, 3, 4 | C | 2021-01-03 07:00:00 | 2021-01-05 15:00:00 |
| 1001 | 5 | Z | 2021-01-06 00:00:00 | 2021-01-06 06:00:00 |
| 1001 | 6 | Z | 2021-01-06 16:00:00 | 2021-01-06 20:00:00 |
| 1001 | 7, 8 | C | 2021-01-07 08:00:00 | 2021-01-07 12:00:00 |
| 1002 | 1 | P | 2021-01-04 08:00:00 | 2021-01-04 16:00:00 |
+-----------+---------+------+---------------------+---------------------+
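This is a gaps-and-islands problem, so it is worth reading up on that technique for a deeper look. Once the islands are found, though, you need to group the data conditionally, because only the type = 'C' items should be collapsed. Within a run of consecutive rows of the same type, the row number within the order (r1) and the row number within the order and type (r2) grow in step, so r1 - r2 stays constant and identifies the island.
The code is below: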
with s as (
select '1001' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-01 00:00:00' as start_ts, timestamp '2021-01-01 09:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '2' as step, 'C' as ex_type, timestamp '2021-01-04 03:00:00' as start_ts, timestamp '2021-01-04 06:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '3' as step, 'C' as ex_type, timestamp '2021-01-03 07:00:00' as start_ts, timestamp '2021-01-03 08:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '4' as step, 'C' as ex_type, timestamp '2021-01-05 10:00:00' as start_ts, timestamp '2021-01-05 15:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '5' as step, 'Z' as ex_type, timestamp '2021-01-06 00:00:00' as start_ts, timestamp '2021-01-06 06:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '6' as step, 'Z' as ex_type, timestamp '2021-01-06 16:00:00' as start_ts, timestamp '2021-01-06 20:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '7' as step, 'C' as ex_type, timestamp '2021-01-07 08:00:00' as start_ts, timestamp '2021-01-07 09:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '8' as step, 'C' as ex_type, timestamp '2021-01-07 10:00:00' as start_ts, timestamp '2021-01-07 12:00:00' as end_ts from dummy union all
select '1002' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-04 08:00:00' as start_ts, timestamp '2021-01-04 16:00:00' as end_ts from dummy
)
, num as (
  select
    s.*
    /* Find consecutive rows on the ex_type field */
    , row_number() over(partition by order_nbr order by start_ts asc) as r1
    , row_number() over(partition by order_nbr, ex_type order by start_ts asc) as r2
  from s
)
select
  order_nbr
  , ex_type
  , min(start_ts) as start_ts
  , max(end_ts) as end_ts
  , string_agg(step, ',' order by start_ts asc) as steps
from num
group by
  order_nbr
  , ex_type
  , case
      /* For C use the group number; for others use the original row number so they are not collapsed */
      when ex_type = 'C'
        then r1 - r2
      else r1
    end
order by
  order_nbr
  , start_ts asc
Here is a db<>fiddle on PostgreSQL, a platform whose syntax for the features involved is the same as HANA's.
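To make the r1 - r2 grouping key easier to follow, here is a minimal Python sketch that mimics the two row_number() calls on the sample data. It is only an illustration of the logic, not HANA code; the names ROWS and collapse_parallel and the tuple layout are made up for this example, and timestamps are kept as ISO strings, which sort correctly as text.

```python
from collections import defaultdict

# Sample data: (order_nbr, step, ex_type, start_ts, end_ts)
ROWS = [
    ("1001", "1", "P", "2021-01-01 00:00:00", "2021-01-01 09:00:00"),
    ("1001", "2", "C", "2021-01-04 03:00:00", "2021-01-04 06:00:00"),
    ("1001", "3", "C", "2021-01-03 07:00:00", "2021-01-03 08:00:00"),
    ("1001", "4", "C", "2021-01-05 10:00:00", "2021-01-05 15:00:00"),
    ("1001", "5", "Z", "2021-01-06 00:00:00", "2021-01-06 06:00:00"),
    ("1001", "6", "Z", "2021-01-06 16:00:00", "2021-01-06 20:00:00"),
    ("1001", "7", "C", "2021-01-07 08:00:00", "2021-01-07 09:00:00"),
    ("1001", "8", "C", "2021-01-07 10:00:00", "2021-01-07 12:00:00"),
    ("1002", "1", "P", "2021-01-04 08:00:00", "2021-01-04 16:00:00"),
]


def collapse_parallel(rows, parallel_type="C"):
    """Collapse consecutive rows of parallel_type using the r1 - r2 key."""
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))  # order_nbr, start_ts
    r1 = defaultdict(int)  # row number per order
    r2 = defaultdict(int)  # row number per (order, type)
    groups = {}
    for order_nbr, step, ex_type, start_ts, end_ts in ordered:
        r1[order_nbr] += 1
        r2[(order_nbr, ex_type)] += 1
        n1, n2 = r1[order_nbr], r2[(order_nbr, ex_type)]
        # Parallel rows share n1 - n2 within one island; other rows keep n1,
        # which is unique per row, so they are never merged.
        key = (order_nbr, ex_type, n1 - n2 if ex_type == parallel_type else n1)
        if key not in groups:
            groups[key] = [order_nbr, [step], ex_type, start_ts, end_ts]
        else:
            group = groups[key]
            group[1].append(step)
            group[3] = min(group[3], start_ts)
            group[4] = max(group[4], end_ts)
    result = sorted(groups.values(), key=lambda g: (g[0], g[3]))
    return [(g[0], ",".join(g[1]), g[2], g[3], g[4]) for g in result]


if __name__ == "__main__":
    for row in collapse_parallel(ROWS):
        print(row)
```

As in the query, the steps are listed in start-time order, so the first C block comes out as 3,2,4 rather than 2,3,4.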
This was my approach before I used the answer provided by astentx. It creates an id for each non-C row and for each group of C rows.
with s as (
select '1001' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-01 00:00:00' as start_ts, timestamp '2021-01-01 09:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '2' as step, 'C' as ex_type, timestamp '2021-01-04 03:00:00' as start_ts, timestamp '2021-01-04 06:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '3' as step, 'C' as ex_type, timestamp '2021-01-03 07:00:00' as start_ts, timestamp '2021-01-03 08:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '4' as step, 'C' as ex_type, timestamp '2021-01-05 10:00:00' as start_ts, timestamp '2021-01-05 15:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '5' as step, 'Z' as ex_type, timestamp '2021-01-06 00:00:00' as start_ts, timestamp '2021-01-06 06:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '6' as step, 'Z' as ex_type, timestamp '2021-01-06 16:00:00' as start_ts, timestamp '2021-01-06 20:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '7' as step, 'C' as ex_type, timestamp '2021-01-07 08:00:00' as start_ts, timestamp '2021-01-07 09:00:00' as end_ts from dummy union all
select '1001' as order_nbr, '8' as step, 'C' as ex_type, timestamp '2021-01-07 10:00:00' as start_ts, timestamp '2021-01-07 12:00:00' as end_ts from dummy union all
select '1002' as order_nbr, '1' as step, 'P' as ex_type, timestamp '2021-01-04 08:00:00' as start_ts, timestamp '2021-01-04 16:00:00' as end_ts from dummy
)
select
  b.order_nbr,
  b.ex_type,
  min(b.start_ts) as start_ts,
  max(b.end_ts) as end_ts,
  /* Order the steps explicitly so the concatenated list is deterministic */
  string_agg(b.step, ',' order by b.start_ts asc) as steps
from
  (select
    a.order_nbr,
    a.step,
    a.ex_type,
    a.start_ts,
    a.end_ts,
    /* Running total of inc: every non-C row and every block of C rows gets its own id */
    sum(a.inc) over (order by a.order_nbr asc, a.start_ts asc) as id
  from
    (select
      s.order_nbr,
      s.step,
      s.ex_type,
      s.start_ts,
      s.end_ts,
      case
        /* A C row that directly follows another C row of the same order stays in the same group */
        when s.ex_type = 'C' and s.ex_type = lag(s.ex_type) over (partition by s.order_nbr order by s.start_ts)
          then 0
        else 1
      end as inc
    from
      s
    order by
      s.order_nbr asc,
      s.start_ts asc
    ) as a
  ) as b
group by
  b.order_nbr,
  b.ex_type,
  b.id
order by
  b.order_nbr asc,
  min(b.start_ts) asc
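The lag-plus-running-sum idea can also be traced by hand. Below is a rough Python sketch of it on the same sample data, assuming ISO timestamp strings that sort as text; running_ids and collapse_by_id are hypothetical helper names used only for this illustration, not part of any library.

```python
from itertools import accumulate, groupby

# Sample data: (order_nbr, step, ex_type, start_ts, end_ts)
ROWS = [
    ("1001", "1", "P", "2021-01-01 00:00:00", "2021-01-01 09:00:00"),
    ("1001", "2", "C", "2021-01-04 03:00:00", "2021-01-04 06:00:00"),
    ("1001", "3", "C", "2021-01-03 07:00:00", "2021-01-03 08:00:00"),
    ("1001", "4", "C", "2021-01-05 10:00:00", "2021-01-05 15:00:00"),
    ("1001", "5", "Z", "2021-01-06 00:00:00", "2021-01-06 06:00:00"),
    ("1001", "6", "Z", "2021-01-06 16:00:00", "2021-01-06 20:00:00"),
    ("1001", "7", "C", "2021-01-07 08:00:00", "2021-01-07 09:00:00"),
    ("1001", "8", "C", "2021-01-07 10:00:00", "2021-01-07 12:00:00"),
    ("1002", "1", "P", "2021-01-04 08:00:00", "2021-01-04 16:00:00"),
]


def running_ids(rows, parallel_type="C"):
    """Pair every row with a group id built from a running sum of inc."""
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))  # order_nbr, start_ts
    incs = []
    prev = None
    for row in ordered:
        # inc = 0 only when a parallel row follows a parallel row of the same
        # order (the lag() comparison); every other row starts a new group.
        continues = (
            prev is not None
            and row[0] == prev[0]
            and row[2] == parallel_type
            and prev[2] == parallel_type
        )
        incs.append(0 if continues else 1)
        prev = row
    return list(zip(accumulate(incs), ordered))


def collapse_by_id(rows, parallel_type="C"):
    """Aggregate rows sharing an id: min start, max end, joined steps."""
    result = []
    pairs = running_ids(rows, parallel_type)
    for _, members in groupby(pairs, key=lambda pair: pair[0]):
        group = [row for _, row in members]
        result.append((
            group[0][0],
            ",".join(r[1] for r in group),
            group[0][2],
            min(r[3] for r in group),
            max(r[4] for r in group),
        ))
    return result


if __name__ == "__main__":
    for row in collapse_by_id(ROWS):
        print(row)
```

Because the ids never decrease along the sort order, a single pass with groupby is enough here; in SQL the same role is played by the group by on the id column.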