Merge duplicate records, based on the combination of two columns, into one record with the Min start_date and Max end_date
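This image shows the structure of the problem.
This should be the desired output.
I want to build one master record from each record and its duplicates (duplicates meaning rows with the same Present and Absent values), filled with the Min Start_Date and Max End_Date of that group, and do the same for the other combinations of the Present and Absent columns.
I have sorted the data by ID and start date to understand how it behaves.
In this scenario I have shown only one ID; other IDs should produce the same kind of output. If I get a working solution for this example, I can apply the correct logic to the whole table.
I tried using window functions but didn't find a solution.
Thanks in advance.
Please note that the number of duplicate records per ID varies.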
This is a gaps-and-islands problem. Consider using the difference between row numbers to build groups of "adjacent" records, which you can then aggregate:
select id, min(start_date) start_date, max(end_date) end_date, present, absent
from (
select t.*,
row_number() over(partition by id order by start_date) rn1,
row_number() over(partition by id, present, absent order by start_date) rn2
from mytable t
) t
group by id, present, absent, rn1 - rn2
order by 1, 2
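The row-number difference is the whole trick, so here is a minimal Python sketch (an illustration, not part of the original answer and not SQL) that replays the two row_number() calls for a single id. It assumes the input is the (present, absent) pairs of one id already ordered by start_date, with None standing in for NULL; the function name island_keys is hypothetical.

```python
from collections import Counter


def island_keys(flags):
    """Replay the two row_number() calls for one id.

    flags: (present, absent) pairs already ordered by start_date.
    Returns (rn1, rn2, rn1 - rn2) for each row.
    """
    rn2_counter = Counter()  # rows seen so far in each (present, absent) partition
    keys = []
    # rn1 mirrors row_number() over (partition by id order by start_date)
    for rn1, pair in enumerate(flags, start=1):
        # rn2 mirrors row_number() over (partition by id, present, absent order by start_date)
        rn2_counter[pair] += 1
        rn2 = rn2_counter[pair]
        keys.append((rn1, rn2, rn1 - rn2))
    return keys


# (present, absent) for the seven sample rows of id 1, in start_date order.
flags = [("Y", "N"), ("Y", "N"), ("N", "N"), ("N", "Y"), ("N", "Y"), (None, None), (None, None)]
for pair, key in zip(flags, island_keys(flags)):
    print(pair, key)
```

Inside a run of equal flags both counters advance together, so rn1 - rn2 stays constant; when a run is interrupted, rn1 keeps counting but rn2 for that pair does not, so a later run of the same pair gets a larger difference. For the sample rows the difference comes out 0, 0, 2, 3, 3, 5, 5. The difference alone is not unique (for the sequence Y/N, N/N, Y/N it is 0, 1, 1), which is why present and absent also stay in the group by.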
You can use MATCH_RECOGNIZE:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY start_date
MEASURES FIRST( start_date ) AS start_date,
MAX( end_date ) AS end_date,
FIRST( present ) AS present,
FIRST( absent ) AS absent
ONE ROW PER MATCH
PATTERN (FIRST_ROW EQUAL_ROWS*)
DEFINE EQUAL_ROWS AS
(
(
(
EQUAL_ROWS.present = PREV(EQUAL_ROWS.present)
) OR (
EQUAL_ROWS.present IS NULL AND PREV(EQUAL_ROWS.present) IS NULL
)
) AND (
(
EQUAL_ROWS.absent = PREV(EQUAL_ROWS.absent)
) OR (
EQUAL_ROWS.absent IS NULL AND PREV(EQUAL_ROWS.absent) IS NULL
)
)
)
)
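To show what the pattern does outside Oracle, here is a hedged Python sketch of the same matching rule using itertools.groupby: a match starts with any row (FIRST_ROW) and absorbs following rows while (present, absent) equals the previous row's values. It assumes rows are (id, start_date, end_date, present, absent) tuples with None for NULL; match_runs is a hypothetical name, and this is an emulation, not Oracle's implementation.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def match_runs(rows):
    """Emulate PATTERN (FIRST_ROW EQUAL_ROWS*) with ONE ROW PER MATCH: a new match
    starts whenever (present, absent) differs from the previous row of the same id."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY id ORDER BY start_date
    for id_, partition in groupby(ordered, key=itemgetter(0)):
        # groupby merges only *consecutive* rows with equal keys, like the PREV() test.
        # None == None is True in Python, which covers the "both IS NULL" branches.
        for (present, absent), run in groupby(partition, key=itemgetter(3, 4)):
            run = list(run)
            result.append((
                id_,
                run[0][1],               # FIRST(start_date)
                max(r[2] for r in run),  # MAX(end_date)
                present,                 # FIRST(present)
                absent,                  # FIRST(absent)
            ))
    return result


# The sample rows from this thread, with Python dates in place of Oracle DATEs.
rows = [
    (1, date(2020, 2, 1), date(2020, 3, 1), "Y", "N"),
    (1, date(2020, 3, 4), date(2020, 4, 19), "Y", "N"),
    (1, date(2020, 3, 6), date(2020, 3, 9), "N", "N"),
    (1, date(2020, 5, 4), date(2020, 9, 4), "N", "Y"),
    (1, date(2020, 5, 6), date(2020, 6, 26), "N", "Y"),
    (1, date(2020, 7, 12), date(2020, 8, 12), None, None),
    (1, date(2020, 8, 13), date(2020, 8, 12), None, None),
]
for row in match_runs(rows):
    print(row)
```

Python's None == None being True is what lets a single key comparison stand in for the explicit `IS NULL AND PREV(...) IS NULL` branches that SQL needs, because `NULL = NULL` is not true in SQL.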
So, for your sample data:
CREATE TABLE table_name ( id, start_date, end_date, present, absent ) AS
SELECT 1, DATE '2020-02-01', DATE '2020-03-01', 'Y', 'N' FROM DUAL UNION ALL
SELECT 1, DATE '2020-03-04', DATE '2020-04-19', 'Y', 'N' FROM DUAL UNION ALL
SELECT 1, DATE '2020-03-06', DATE '2020-03-09', 'N', 'N' FROM DUAL UNION ALL
SELECT 1, DATE '2020-05-04', DATE '2020-09-04', 'N', 'Y' FROM DUAL UNION ALL
SELECT 1, DATE '2020-05-06', DATE '2020-06-26', 'N', 'Y' FROM DUAL UNION ALL
SELECT 1, DATE '2020-07-12', DATE '2020-08-12', NULL, NULL FROM DUAL UNION ALL
SELECT 1, DATE '2020-08-13', DATE '2020-08-12', NULL, NULL FROM DUAL;
This outputs:
ID | START_DATE | END_DATE | PRESENT | ABSENT
-: | :--------- | :-------- | :------ | :-----
1 | 01-FEB-20 | 19-APR-20 | Y | N
1 | 06-MAR-20 | 09-MAR-20 | N | N
1 | 04-MAY-20 | 04-SEP-20 | N | Y
1 | 12-JUL-20 | 12-AUG-20 | null | null
db<>fiddle here
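As a cross-check that the row-number approach gives the same result as MATCH_RECOGNIZE, here is a minimal sketch that runs the row-number query (with `max(end_date)` closed properly and the table named table_name to match the sample data) against the sample rows. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module instead of Oracle, and stores dates as ISO-8601 text so MIN/MAX and ORDER BY compare them correctly; merge_islands is a hypothetical helper name.

```python
import sqlite3

# Sample rows from this thread; ISO-8601 text dates sort correctly as strings.
SAMPLE = [
    (1, "2020-02-01", "2020-03-01", "Y", "N"),
    (1, "2020-03-04", "2020-04-19", "Y", "N"),
    (1, "2020-03-06", "2020-03-09", "N", "N"),
    (1, "2020-05-04", "2020-09-04", "N", "Y"),
    (1, "2020-05-06", "2020-06-26", "N", "Y"),
    (1, "2020-07-12", "2020-08-12", None, None),
    (1, "2020-08-13", "2020-08-12", None, None),
]

# The gaps-and-islands query; PARTITION BY and GROUP BY both treat NULLs as one group.
QUERY = """
SELECT id, MIN(start_date) AS start_date, MAX(end_date) AS end_date, present, absent
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date) AS rn1,
         ROW_NUMBER() OVER (PARTITION BY id, present, absent ORDER BY start_date) AS rn2
  FROM table_name t
) t
GROUP BY id, present, absent, rn1 - rn2
ORDER BY 1, 2
"""


def merge_islands(rows):
    """Load rows into an in-memory SQLite table and run the gaps-and-islands query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table_name "
            "(id INTEGER, start_date TEXT, end_date TEXT, present TEXT, absent TEXT)"
        )
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in merge_islands(SAMPLE):
    print(row)
```

The four rows it returns correspond to the four rows of the output table above, only with dates in ISO format rather than Oracle's default DD-MON-YY display.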