Need to roughly group similar query executions in Oracle that have slightly different constraints
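I'm evaluating the impact of some planned database decommissions. Because of the number of users involved, contacting everyone who has recently accessed the affected data individually isn't feasible.
I was thinking that if I could perform some form of fuzzy-logic lookup to group queries by user, I could at least identify recurring queries that differ only slightly because their constraints are expected to change between runs. Although far from perfect, this could help distinguish regularly run queries that support a recurring business function from purely ad hoc queries.
Can anyone offer some ideas to get me started, or point me to alternative approaches worth researching given the goals above?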
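To make the idea concrete, here is a minimal Python sketch of a simpler, exact-match variant of it. The sketch strips literal values (string literals, numbers, bind variables) out of each statement, so queries that differ only in their constraint values collapse to the same normalized "shape". It then groups statements by user and shape. This is not an Oracle feature. The function names and regular expressions are hypothetical, and the patterns do not handle every SQL construct (comments, q-quoted strings, and so on), so treat it as a starting point rather than a parser.

```python
import re
from collections import defaultdict

# Hypothetical normalization rules: each literal becomes a "?" placeholder.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")   # 'abc', 'O''Brien'
_BIND_VARIABLE = re.compile(r":\w+")               # :1, :b1, :cust_id
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b") # 42, 3.14 (not the 1 in t1)
_WHITESPACE = re.compile(r"\s+")


def normalize_sql(sql_text: str) -> str:
    """Return a literal-free, whitespace-collapsed, lower-cased form of the SQL."""
    text = _STRING_LITERAL.sub("?", sql_text)  # strings first, so digits inside them vanish
    text = _BIND_VARIABLE.sub("?", text)
    text = _NUMBER_LITERAL.sub("?", text)
    text = _WHITESPACE.sub(" ", text).strip()
    return text.lower()


def group_by_shape(queries):
    """Group (user, sql_text) pairs by user and normalized SQL text."""
    groups = defaultdict(list)
    for user, sql_text in queries:
        groups[(user, normalize_sql(sql_text))].append(sql_text)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        ("SCOTT", "select * from orders where order_date >= DATE '2024-01-01' and region = 'EAST'"),
        ("SCOTT", "SELECT * FROM orders WHERE order_date >= DATE '2024-02-01' AND region = 'WEST'"),
        ("SCOTT", "select count(*) from customers where id = 42"),
    ]
    for (user, shape), members in group_by_shape(sample).items():
        print(user, len(members), shape)
```

Exact grouping on normalized text only catches changes in literal values. A change that also alters the statement's structure, such as an extra predicate, needs a fuzzy score like the one UTL_MATCH provides.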
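You can combine UTL_MATCH, DBA_HIST_SQLTEXT, and DBA_HIST_SQLSTAT to find similar query executions. If you are not licensed for AWR, or are only interested in recent queries, you can use GV$SQLSTATS instead of the DBA_HIST tables.
Besides being complicated, the query below requires you to adjust some of its literals through trial and error. As written, it only looks at each user's 10 most-executed queries. For each of those, it returns at most the 5 most closely related queries that have a similarity score of at least 60%.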
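UTL_MATCH.EDIT_DISTANCE_SIMILARITY scores two strings from 0 (no match) to 100 (perfect match), based on the number of edits needed to turn one into the other. The following Python sketch lets you experiment with a cutoff outside the database. It computes a Levenshtein distance and normalizes it by the length of the longer string. That normalization and the rounding are assumptions, so the scores may not match Oracle's exactly.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def edit_distance_similarity(a: str, b: str) -> int:
    """Approximate 0-100 similarity score (assumed normalization, not Oracle's code)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # sketch's own convention for two empty strings
    return round(100 * (1 - edit_distance(a, b) / longest))


if __name__ == "__main__":
    q1 = "select * from orders where order_date >= DATE '2024-01-01'"
    q2 = "select * from orders where order_date >= DATE '2024-02-01'"
    print(edit_distance_similarity(q1, q2))
```

Two statements that differ only in one date literal score close to 100, well above a 60% cutoff. Structurally different statements tend to score much lower.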
```sql
--Common queries and the top 5 most closely related queries.
with statements as
(
    --All relevant SQL statements.
    select
        sqlstats.parsing_schema_name,
        sqlstats.total_executions,
        sqltext.sql_id,
        --Convert CLOB to VARCHAR2 for UTL_MATCH.
        --Truncating to 1,000 characters won't matter, since we're only interested in fuzzy matches anyway.
        to_char(substr(sqltext.sql_text, 1, 1000)) sql_text,
        sqltext.command_type
    from
    (
        --All queries in AWR.
        select sql_id, sql_text, command_type
        from dba_hist_sqltext
    ) sqltext
    join
    (
        --Statistics for all queries in AWR.
        select sql_id, parsing_schema_name, sum(executions_delta) total_executions
        from dba_hist_sqlstat
        group by sql_id, parsing_schema_name
    ) sqlstats
        on sqltext.sql_id = sqlstats.sql_id
)
--Top N most similar queries.
select *
from
(
    --Ranked similarity.
    --Partition by schema as well: the same SQL_ID can be parsed by more than one user.
    select
        similarity.*,
        row_number() over (partition by parsing_schema_name, sql_id1 order by similarity desc, sql_id2) top_similarity
    from
    (
        --Similarity between the Top N SQL and other SQL run by the same user.
        select
            top_n.parsing_schema_name, top_n.sql_id sql_id1, top_n.sql_text sql_text1, top_n.total_executions,
            statements.sql_id sql_id2, statements.sql_text sql_text2,
            utl_match.edit_distance_similarity(top_n.sql_text, statements.sql_text) similarity
        from
        (
            --Top N most executed queries.
            select *
            from
            (
                --Most executed queries per user.
                select
                    statements.*,
                    row_number() over (partition by parsing_schema_name order by total_executions desc) top_n
                from statements
            )
            where top_n <= 10
        ) top_n
        join statements
            on top_n.parsing_schema_name = statements.parsing_schema_name
            and top_n.command_type = statements.command_type
            and top_n.sql_id <> statements.sql_id
    ) similarity
) ranked_similarity
where top_similarity <= 5
    and similarity >= 60
order by parsing_schema_name, sql_id1, top_similarity;
```
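The query returns pairs of similar statements rather than groups. If A resembles B and B resembles C, you may want all three in one group, and a union-find pass over the fetched rows does that. The sketch below is a hypothetical post-processing step in Python. It assumes the parsing_schema_name, sql_id1, and sql_id2 columns have already been fetched into tuples by whatever client you use. The function name is illustrative only.

```python
from collections import defaultdict


def cluster_similar_pairs(pairs):
    """Group SQL_IDs into clusters of transitively similar statements.

    `pairs` is an iterable of (parsing_schema_name, sql_id1, sql_id2) tuples.
    Returns a sorted list of clusters; each cluster is a sorted list of
    (schema, sql_id) tuples.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for schema, sql_id1, sql_id2 in pairs:
        # Key on (schema, sql_id) so the same SQL_ID under two users stays separate.
        union((schema, sql_id1), (schema, sql_id2))

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

Transitive grouping can chain moderately similar statements into one large cluster. If that happens, raising the similarity cutoff in the query is one way to keep clusters tight.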