GROUP BY involving CLOB data
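There is a join between three tables: test_3, test_2 and test_1.
test_1 and test_3 are the main tables and have no column in common; there is also table test_2 to join on.
test_1 has sr_id and last_updated_date,
test_2 has sr_id and sm_id, and test_3 has sm_id and sql_statement.
test_3 has the CLOB data that is causing all the problems.
I have to find the latest sr_id associated with each sm_id. My idea was to use the aggregate function max(last_updated_date) with a GROUP BY.
It did not work, for a number of reasons.
The data includes a CLOB column, sql_statement.
I used a join that I am not familiar with.
Any ideas would be helpful.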
WITH xx as (
(select ANSWER ,sr_id AS ID from test
WHERE Q_ID in (SELECT Q_ID FROM test_2 WHERE field_id='LM_LRE_Q6')
)
)
-- end of source data
SELECT t.ID, t1.n, t1.SM_ID,seg_dtls.SEGMENTation_NAME ,to_char(mst.LAST_UPDATED_DATE,'dd-mon-yyyy hh24:mi:ss'),seg_dtls.sql_statement
FROM xx t
CROSS JOIN LATERAL (
select LEVEL AS n, regexp_substr( t.answer, '\d+', 1, level) as SM_ID
from dual
connect by regexp_substr( t.answer, '\d+', 1, level) IS NOT NULL
) t1
left join test_1 mst
on mst.sr_id=t.id
right join test_3 seg_dtls
on seg_dtls.sm_id=t1.sm_id;
The sample data looks like:
sr_id sm_id SEGMENTATION_NAME LAST_UPDATED_DATE
1108197 958 test_not_in 05-feb-2017 23:56:59
1108217 958 test_not_in 14-feb-2017 00:37:39
1108218 958 test_not_in 14-feb-2017 01:39:50
1108220 958 test_not_in 14-feb-2017 03:39:07
The expected output is:
1108220 958 test_not_in 14-feb-2017 03:39:07
I have not posted the CLOB data because it is large.
Every row contains CLOB data.
Table test_3 contains:
q_id sr_id answer
1009330 1108246 976~feb_24^941~Test_regionwithcountry
1009330 1108247 941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24
1009330 1108239 972~test_emea
1009330 1108240 972~test_emea^827~test_with_region_country
1009330 1108251 981~MSE100579729 testing.
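The sample data above is what test_3 looks like.
The answer column contains the sm_id values, and I have to pull them out of it.
For example, in
941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24
the sm_id values are 941, 787 and 976.
So I came up with the query posted above.
As for the left and right joins: I need all the sm_id values from test_3, so I used a right join here.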
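Edit 1: The accepted answer gives the sr_id of the segments with max(last_updated_date).
I need all of the sr_id values, so I used the MINUS operator to get the ones that do not have max(last_updated_date).
I need to append that result set to the accepted answer's output.
This is what I did to get the other sr_id values: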
select sr_id,segmentation_name,request_status from (with test_31 (q_id, sr_id, answer) as (
(SELECT Q_ID,SR_ID,ANSWER FROM test_3 WHERE Q_ID=(SELECT Q_ID FROM test_4 WHERE FIELD_ID='LM_LRE_Q6'))
),
answer_extraction as (
select q_id, sr_id,
regexp_substr(regexp_substr(answer, '[^^]+', 1, level),'\d+') as sm_id
from test_31
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select sr_id,
sm_id,
segmentation_name,
LAST_UPDATED_DATE,
sql_statement,request_status
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
t1.request_status
from test_4 t4
join answer_extraction t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.sm_id
join test_1 t1 on t1.sr_id = t3.sr_id
)
)
minus
(select sr_id,segmentation_name , request_status from (with test_31 (q_id, sr_id, answer) as (
(SELECT Q_ID,SR_ID,ANSWER FROM test_3 WHERE Q_ID=(SELECT Q_ID FROM test_4 WHERE FIELD_ID='LM_LRE_Q6'))
),
answer_extraction as (
select q_id, sr_id,
regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+') as sm_id
from test_31
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select sr_id,
segmentation_name,
sql_statement,
request_status
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
t1.request_status,
max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
from test_4 t4
join answer_extraction t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.sm_id
join test_1 t1 on t1.sr_id = t3.sr_id
)
where last_updated_date = max_updated_date));
Sample data:
The accepted answer gives the following output, which has the max(last_updated_date) for the segment:
1097661 Submitted o2k lad 30-NOV-15 01-DEC-16 62 CLOB DATA
The query posted above gives the output below, which is the sr_id values for the segment that have other updated dates:
1097621 o2k lad Submitted
1097625 o2k lad Submitted
1097627 o2k lad Submitted
1097632 o2k lad Submitted
1097633 o2k lad Submitted
1097658 o2k lad Pending
1097640 o2k lad Submitted
1097644 o2k lad Submitted
1097646 o2k lad Submitted
Expected output:
sr_id status segment_name updated_date sql_statement other_sr_id
1097661 Submitted o2k lad 30-NOV-15 CLOB DATA 1097618,1097621,1097625,1097627,1097632,1097633,1097658,1097640,1097644,1097646
Combine the two queries so that the last column contains all the older sr_id values.
A fairly simple option is to modify your current query to add an analytic function that finds the maximum date for each ID, for example:
..., max(mst.last_updated_date) over (partition by id) as max_updated_date
A quick demo of the general idea:
with cte (id, last_updated_date, sql_statement) as (
select 1, date '2017-01-01', to_clob('stmt 1') from dual
union all select 1, date '2017-01-02', to_clob('stmt 2') from dual
union all select 1, date '2017-01-03', to_clob('stmt 3') from dual
union all select 2, date '2017-01-02', to_clob('stmt 4') from dual
)
select id, last_updated_date, sql_statement
from (
select id, last_updated_date, sql_statement,
max(last_updated_date) over (partition by id) as max_updated_date
from cte
)
where last_updated_date = max_updated_date;
ID LAST_UPDAT SQL_STATEMENT
---------- ---------- --------------------------------------------------------------------------------
1 2017-01-03 stmt 3
2 2017-01-02 stmt 4
You could instead use row_number(), rank() or dense_rank() to identify the rows with the latest date and filter on that, but the general idea is the same.
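As a rough sketch of the row_number() variant, reusing the made-up cte from the demo above: each id's rows are numbered from newest to oldest and only row 1 is kept. The alias rn is just an illustrative name. Unlike the max() comparison, this keeps exactly one row per id even if two rows tie on the latest date; rank() or dense_rank() would keep the ties.

```sql
with cte (id, last_updated_date, sql_statement) as (
  select 1, date '2017-01-01', to_clob('stmt 1') from dual
  union all select 1, date '2017-01-02', to_clob('stmt 2') from dual
  union all select 1, date '2017-01-03', to_clob('stmt 3') from dual
  union all select 2, date '2017-01-02', to_clob('stmt 4') from dual
)
select id, last_updated_date, sql_statement
from (
  select id, last_updated_date, sql_statement,
    -- number each id's rows from newest to oldest (rn is an illustrative alias)
    row_number() over (partition by id order by last_updated_date desc) as rn
  from cte
)
where rn = 1;
```

For this data it should pick the same two rows as the max() version, 'stmt 3' for id 1 and 'stmt 4' for id 2.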
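However, your current query is not very clear to begin with (and is not valid before 12c). Rather than trying to guess how to include such a function and filter in it, it may be simpler to start again from your base tables. That makes a lot of assumptions about what you are doing, and may overlook some things, such as the left and right joins, which may or may not be needed.
Making up some data via CTEs: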
with test_1 (sr_id, last_updated_date) as (
select 1108197, timestamp '2017-02-05 23:56:59' from dual
union all select 1108217, timestamp '2017-02-14 00:37:39' from dual
union all select 1108218, timestamp '2017-02-14 01:39:50' from dual
union all select 1108220, timestamp '2017-02-14 03:39:07' from dual
),
test_2 (sm_id, segmentation_name, sql_statement) as (
select 958, 'test_not_in', to_clob('select * from dual') from dual
),
test_3 (q_id, sr_id, answer) as (
select 41, 1108197, 958 from dual
union all select 42, 1108217, 958 from dual
union all select 43, 1108218, 958 from dual
union all select 44, 1108220, 958 from dual
),
test_4 (q_id, field_id) as (
select 41, 'LM_LRE_Q6' from dual
union all select 42, 'LM_LRE_Q6' from dual
union all select 43, 'LM_LRE_Q6' from dual
union all select 44, 'LM_LRE_Q6' from dual
)
Then this gets the same output you showed in the question:
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
to_char(t1.last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
t2.sql_statement
from test_4 t4
join test_3 t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.answer
join test_1 t1 on t1.sr_id = t3.sr_id;
SR_ID SM_ID SEGMENTATIO LAST_UPDATED_DATE SQL_STATEMENT
---------- ----- ----------- ----------------------------- --------------------------------------------------------------------------------
1108197 958 test_not_in 05-feb-2017 23:56:59 select * from dual
1108217 958 test_not_in 14-feb-2017 00:37:39 select * from dual
1108218 958 test_not_in 14-feb-2017 01:39:50 select * from dual
1108220 958 test_not_in 14-feb-2017 03:39:07 select * from dual
On the wild assumption that this is close to right, you can find the row with the latest date for each sm_id like this:
select sr_id,
sm_id,
segmentation_name,
to_char(last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
sql_statement
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
from test_4 t4
join test_3 t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.answer
join test_1 t1 on t1.sr_id = t3.sr_id
)
where last_updated_date = max_updated_date;
SR_ID SM_ID SEGMENTATIO LAST_UPDATED_DATE SQL_STATEMENT
---------- ----- ----------- ----------------------------- --------------------------------------------------------------------------------
1108220 958 test_not_in 14-feb-2017 03:39:07 select * from dual
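You will need to adapt that to handle any other restrictions or requirements that are not yet clear (for example, including your left/right outer joins).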
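One such adaptation is the extra column asked for in edit 1 of the question, listing the older sr_id values next to the latest row. A minimal sketch of how that might be done with an analytic listagg() follows. It is an assumption about what is wanted, not a tested solution against your real tables: joined is a hypothetical CTE standing in for the result of the four-table join, and other_sr_id is the column name taken from your expected output.

```sql
-- "joined" is a made-up stand-in for the result of the four-table join:
-- one segment (sm_id 958) referenced by four requests.
with joined (sr_id, sm_id, segmentation_name, last_updated_date, sql_statement) as (
  select 1108197, 958, 'test_not_in', timestamp '2017-02-05 23:56:59', to_clob('select * from dual') from dual
  union all select 1108217, 958, 'test_not_in', timestamp '2017-02-14 00:37:39', to_clob('select * from dual') from dual
  union all select 1108218, 958, 'test_not_in', timestamp '2017-02-14 01:39:50', to_clob('select * from dual') from dual
  union all select 1108220, 958, 'test_not_in', timestamp '2017-02-14 03:39:07', to_clob('select * from dual') from dual
)
select sr_id, sm_id, segmentation_name, last_updated_date, sql_statement, other_sr_id
from (
  select j.*,
    -- listagg() skips nulls, so only rows older than the latest one contribute
    listagg(case when last_updated_date < max_updated_date then sr_id end, ',')
      within group (order by sr_id)
      over (partition by sm_id) as other_sr_id
  from (
    select sr_id, sm_id, segmentation_name, last_updated_date, sql_statement,
      max(last_updated_date) over (partition by sm_id) as max_updated_date
    from joined
  ) j
)
where last_updated_date = max_updated_date;
```

The max() has to be computed one level below the listagg() because an analytic function cannot be nested inside another at the same query level. With the made-up rows this should keep only sr_id 1108220, with the three older sr_id values in other_sr_id. Note that listagg() returns a string with a limited length, so a segment with a very large number of requests could overflow it.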
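I have deliberately ignored the subquery you are using to split 'answer' into multiple values. Possibly you have something horrible in there, like a delimited list of IDs, which is a data model problem. If that is the case then you still need to extract the individual sm_id values; something like: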
with answer_extraction as (
select q_id, sr_id, regexp_substr(answer, '\d+', 1, level) as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '\d+', 1, level) is not null
)
select sr_id,
sm_id,
segmentation_name,
to_char(last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
sql_statement
from (
select t1.sr_id,
t2.sm_id,
t2.segmentation_name,
t1.last_updated_date,
t2.sql_statement,
max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
from test_4 t4
join answer_extraction t3 on t3.q_id = t4.q_id
join test_2 t2 on t2.sm_id = t3.sm_id
join test_1 t1 on t1.sr_id = t3.sr_id
)
where last_updated_date = max_updated_date;
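Based on the actual contents of test_3 that you have now added, your regular expression is not quite doing what you need. With the pattern you are using it finds 14 values, that is, every run of digits: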
with test_3 (q_id, sr_id, answer) as (
select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual
union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
union all select 1009330, 1108239, '972~test_emea' from dual
union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual
),
answer_extraction as (
select q_id, sr_id, regexp_substr(answer, '\d+', 1, level) as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '\d+', 1, level) is not null
)
select * from answer_extraction;
Q_ID SR_ID SM_ID
---------- ---------- ----------
1009330 1108239 972
1009330 1108240 972
1009330 1108240 827
1009330 1108246 976
1009330 1108246 24
1009330 1108246 941
1009330 1108247 941
1009330 1108247 2016
1009330 1108247 787
1009330 1108247 28
1009330 1108247 976
1009330 1108247 24
1009330 1108251 981
1009330 1108251 100579729
It seems you only want the bit between each ^ delimiter and the ~ marker. The usual way to split a delimited string is:
with test_3 (q_id, sr_id, answer) as (
select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual
union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
union all select 1009330, 1108239, '972~test_emea' from dual
union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual
),
answer_extraction as (
select q_id, sr_id, regexp_substr(answer, '[^^]+', 1, level) as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select * from answer_extraction;
Q_ID SR_ID SM_ID
---------- ---------- ----------------------------------------
1009330 1108239 972~test_emea
1009330 1108240 972~test_emea
1009330 1108240 827~test_with_region_country
1009330 1108246 976~feb_24
1009330 1108246 941~Test_regionwithcountry
1009330 1108247 941~Test_regionwithcountry_2016
1009330 1108247 787~Test_Request_28
1009330 1108247 976~feb_24
1009330 1108251 981~MSE100579729 testing.
But you need to get just the first part of each element, for example by borrowing your original pattern (others would work too!):
column sm_id format a10
with test_3 (q_id, sr_id, answer) as (
select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual
union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
union all select 1009330, 1108239, '972~test_emea' from dual
union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual
),
answer_extraction as (
select q_id, sr_id,
regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+') as sm_id
from test_3
connect by q_id = prior q_id
and sr_id = prior sr_id
and prior dbms_random.value is not null
and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select * from answer_extraction;
Q_ID SR_ID SM_ID
---------- ---------- ----------
1009330 1108239 972
1009330 1108240 972
1009330 1108240 827
1009330 1108246 976
1009330 1108246 941
1009330 1108247 941
1009330 1108247 787
1009330 1108247 976
1009330 1108251 981
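Note that the extra regexp_substr() is only in the select list, not in the connect-by clause; and the extracted sm_id is still a string. If test_2.sm_id is a number then also add a to_number() call around the pair of regexp_substr() calls in the select list.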
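A minimal sketch of that to_number() wrapping, assuming test_2.sm_id really is a number and using a single made-up row of test_3 taken from your sample data:

```sql
with test_3 (q_id, sr_id, answer) as (
  select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual
),
answer_extraction as (
  select q_id, sr_id,
    -- inner call: the level-th ^-separated element; outer call: its first run of digits;
    -- to_number() then converts that string so it can be compared with a numeric sm_id
    to_number(regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+')) as sm_id
  from test_3
  connect by q_id = prior q_id
  and sr_id = prior sr_id
  and prior dbms_random.value is not null
  and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select * from answer_extraction;
```

For this row it should produce three rows with the numeric sm_id values 941, 787 and 976. An element with no digits at all would give a null sm_id, since regexp_substr() returns null when nothing matches.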