Oracle SQL data migration: rows to columns based on month
CODE1 CODE2 CODE3 RATE VALUE MONTH
A B C 1 1 202001
A B C 1 1 202002
A B C 1 1 202003
A B C 2 1 202004
A B C 2 1 202005
A B C 1 1 202006
A B C 1 1 202007
A B C 1 1 202008
A B C 1 1 202009
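I am migrating data from an old system to a new one.
In the old system the data is maintained monthly: the table holds one row per month, and when the data changes that same row is updated.
The new system I am migrating to uses a start date and an end date to mark the active record, so new data requires inserting a new row and updating the end date of the old row.
My expected data: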
CODE1 CODE2 CODE3 RATE VALUE START_DT END_DT
A B C 1 1 20200101 20200331
A B C 2 1 20200401 20200531
A B C 1 1 20200601 99991230
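If the record is still active, we set the end date to infinity, i.e. 999912.
But I am getting only two records. The output and my query are below: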
CODE1 CODE2 CODE3 RATE VALUE START_DT END_DT
A B C 2 1 20200401 20200531
A B C 1 1 20200601 99991230
SELECT CODE1, CODE2, CODE3, RATE, VALUE,
TO_DATE(MIN(bus_month), 'yyyymm') AS START_DT,
last_day(TO_DATE(replace(MAX(bus_month), $CURRENTMONTG, '999912'), 'yyyymm')) AS end_date
FROM TEST_TABLE
GROUP BY CODE1, CODE2, CODE3, RATE, VALUE
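Because I group by CODE1, CODE2, CODE3, RATE, VALUE, I get the latest data for each group, but I cannot get the older period.
Please help me get the expected table structure.
Thanks in advance.
Please comment if more details are needed.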
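This is a gaps-and-islands problem: you want to group together "adjacent" rows that have the same rate and value.
One approach uses the difference between row numbers to build the groups. Assuming that the three codes define the base group, and that you want a new row whenever the rate or value changes: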
select code1, code2, code3, rate, value, min(month) start_dt,
case when row_number() over(partition by code1, code2, code3 order by max(month) desc) = 1 then 999912 else max(month) end end_dt
from (
select t.*,
row_number() over(partition by code1, code2, code3 order by month) rn1,
row_number() over(partition by code1, code2, code3, rate, value order by month) rn2
from mytable t
) t
group by code1, code2, code3, rate, value, rn1 - rn2
order by start_dt
The conditional expression in the outer query sets the end date of the "last" period to 999912.
CODE1 | CODE2 | CODE3 | RATE | VALUE | START_DT | END_DT
:---- | :---- | :---- | ---: | ----: | -------: | -----:
A | B | C | 1 | 1 | 202001 | 202003
A | B | C | 2 | 1 | 202004 | 202005
A | B | C | 1 | 1 | 202006 | 999912
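To see why grouping on `rn1 - rn2` separates the two runs of rate 1, the same bookkeeping can be traced outside the database. The following is only an illustrative Python sketch, not part of the answer: the function name `islands` and the tuple layout are assumptions, and the rows are the sample data from the question. It leaves out the final step that replaces the last end month with 999912.

```python
from collections import defaultdict

# Sample rows from the question: (code1, code2, code3, rate, value, month).
rows = [
    ("A", "B", "C", 1, 1, 202001),
    ("A", "B", "C", 1, 1, 202002),
    ("A", "B", "C", 1, 1, 202003),
    ("A", "B", "C", 2, 1, 202004),
    ("A", "B", "C", 2, 1, 202005),
    ("A", "B", "C", 1, 1, 202006),
    ("A", "B", "C", 1, 1, 202007),
    ("A", "B", "C", 1, 1, 202008),
    ("A", "B", "C", 1, 1, 202009),
]


def islands(rows):
    """Hypothetical helper: group rows by (codes, rate, value, rn1 - rn2)."""
    rn1 = defaultdict(int)  # row number within (code1, code2, code3)
    rn2 = defaultdict(int)  # row number within (code1, code2, code3, rate, value)
    groups = defaultdict(list)
    for code1, code2, code3, rate, value, month in sorted(rows, key=lambda r: r[5]):
        base = (code1, code2, code3)
        full = base + (rate, value)
        rn1[base] += 1
        rn2[full] += 1
        # The difference stays constant while rate/value repeat on consecutive rows
        # and changes once another rate/value has appeared in between.
        groups[full + (rn1[base] - rn2[full],)].append(month)
    result = [key[:5] + (min(months), max(months)) for key, months in groups.items()]
    return sorted(result, key=lambda r: r[5])


for period in islands(rows):
    print(period)
```

For the sample data the differences are 0 for 202001 to 202003, 3 for 202004 to 202005 and 2 for 202006 to 202009, so the two rate-1 runs land in different groups even though their rate and value are identical.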
You can use MATCH_RECOGNIZE to perform a row-by-row comparison of the data:
SELECT code1,
code2,
code3,
rate,
value,
start_dt,
CASE end_dt
WHEN TO_NUMBER( TO_CHAR( SYSDATE, 'YYYYMM' ) )
THEN 999912
ELSE end_dt
END AS end_dt
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY code1, code2, code3
ORDER BY month
MEASURES FIRST( rate ) AS rate,
FIRST( value ) AS value,
FIRST( month ) AS start_dt,
LAST( month ) AS end_dt
ONE ROW PER MATCH
PATTERN (FIRST_ROW EQUAL_ROWS*)
DEFINE EQUAL_ROWS AS (
EQUAL_ROWS.rate = PREV(EQUAL_ROWS.rate)
AND EQUAL_ROWS.value = PREV(EQUAL_ROWS.value)
AND TO_DATE( EQUAL_ROWS.month, 'YYYYMM' )
= ADD_MONTHS( TO_DATE( PREV(EQUAL_ROWS.month), 'YYYYMM' ), 1 )
)
)
So, for your sample data:
CREATE TABLE table_name ( CODE1, CODE2, CODE3, RATE, VALUE, MONTH ) AS
SELECT 'A', 'B', 'C', 1, 1, 201912 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202001 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202002 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202003 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 2, 1, 202004 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 2, 1, 202005 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202006 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202007 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202008 FROM DUAL UNION ALL
SELECT 'A', 'B', 'C', 1, 1, 202009 FROM DUAL;
This outputs:
CODE1 | CODE2 | CODE3 | RATE | VALUE | START_DT | END_DT
:---- | :---- | :---- | ---: | ----: | -------: | -----:
A | B | C | 1 | 1 | 201912 | 202003
A | B | C | 2 | 1 | 202004 | 202005
A | B | C | 1 | 1 | 202006 | 999912
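The pattern `FIRST_ROW EQUAL_ROWS*` amounts to a single ordered scan: a row extends the current match only if its rate and value equal the previous row's and its month is exactly one month later. A rough Python sketch of that scan follows. The names `next_month` and `match_runs` are made up for illustration, and the current month is passed in as a parameter (an assumption) instead of being read from `SYSDATE`.

```python
def next_month(yyyymm):
    """Return the month following yyyymm, both as YYYYMM integers."""
    year, month = divmod(yyyymm, 100)
    return (year + 1) * 100 + 1 if month == 12 else yyyymm + 1


def match_runs(rows, current_month):
    """Hypothetical helper: rows are (code1, code2, code3, rate, value, month)."""
    runs = []
    prev = None
    for row in sorted(rows, key=lambda r: (r[0], r[1], r[2], r[5])):
        same_partition = prev is not None and prev[:3] == row[:3]
        if (same_partition
                and prev[3] == row[3]                  # same rate
                and prev[4] == row[4]                  # same value
                and row[5] == next_month(prev[5])):    # consecutive month
            runs[-1][6] = row[5]                       # extend the current run
        else:
            runs.append([*row[:5], row[5], row[5]])    # start a new run
        prev = row
    # A run that ends in the current month is treated as still open.
    return [tuple(r[:6]) + (999912 if r[6] == current_month else r[6],) for r in runs]


sample = [("A", "B", "C", rate, 1, month) for rate, month in [
    (1, 201912), (1, 202001), (1, 202002), (1, 202003), (2, 202004),
    (2, 202005), (1, 202006), (1, 202007), (1, 202008), (1, 202009),
]]
for period in match_runs(sample, current_month=202009):
    print(period)
```

Unlike the row-number approach, this version also starts a new period when a month is missing, because the month adjacency is part of the comparison. The end date becomes 999912 only when the last month of a run equals the supplied current month, which mirrors the `CASE end_dt WHEN TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMM'))` expression in the query.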
Oracle SQL:
SELECT
code1,code2,code3,rate,value,min(MONTH) start_dt,
CASE
WHEN ROW_NUMBER() OVER(PARTITION BY code1, code2, code3 ORDER BY max(MONTH) DESC) = 1 THEN 99991230
ELSE max(MONTH)
END end_dt
FROM
(
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY code1, code2, code3 ORDER BY MONTH) rn1,
ROW_NUMBER() OVER(PARTITION BY code1, code2, code3, rate, value ORDER BY MONTH) rn2
FROM
TBLTEST t
) t
GROUP BY
code1,code2,code3,rate,value,rn1 - rn2
ORDER BY
start_dt
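Thought of naturally, the task is quite simple: compare the values of the first five columns between adjacent rows, put the current row in the same group as the previous row when they match, and start a new group when they differ, until the last record has been compared. Because SQL sets are unordered, we first have to build two index columns by hand in a rather convoluted way and then group on the relationship between them. It takes a lot of ingenuity to come up with that solution.
But the code is easy to write in SPL, the language of the open-source esProc: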
| | A |
|---|---|
| 1 | =connect("oracle") |
| 2 | =A1.query@x("SELECT * FROM TBLTEST ORDER BY MONTH") |
| 3 | =A2.groups@o(CODE1,CODE2,CODE3,RATE,VALUE;min(MONTH)/"01":STARTDT,string(date((max(MONTH)+1)/"01","yyyyMMdd")-1,"yyyyMMdd"):ENDDT) |
| 4 | >A3.m(-1).modify("99991230":ENDDT) |
SPL directly supports ordered sets, so it can conveniently start a new group whenever adjacent values differ.
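The expected output in the question uses full dates (20200101, 20200331), while the periods produced by grouping are YYYYMM months. The conversion is the first day of the start month and the last day of the end month. Below is a small Python sketch of that arithmetic. The function name `month_bounds` and the use of 999912 as the open-ended marker are assumptions for illustration, and 99991230 is taken from the expected data in the question.

```python
import calendar


def month_bounds(start_month, end_month, open_marker=999912, open_end=99991230):
    """Hypothetical helper: map YYYYMM integers to YYYYMMDD start and end dates."""
    start_dt = start_month * 100 + 1                 # first day of the start month
    if end_month == open_marker:
        return start_dt, open_end                    # still-active record
    year, month = divmod(end_month, 100)
    last_day = calendar.monthrange(year, month)[1]   # number of days in that month
    return start_dt, end_month * 100 + last_day


periods = [(202001, 202003), (202004, 202005), (202006, 999912)]
for start_month, end_month in periods:
    print(month_bounds(start_month, end_month))
```

`calendar.monthrange` accounts for leap years, so an end month of 202002 maps to 20200229.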