GROUP BY query uses indexes, but window function query doesn't
I use an IBM COTS system called Maximo Asset Management. The system has a WORKORDER table with 350,000 rows.
Maximo has a concept called relationships that can be used to pull in data from related records.
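How relationships work:
For each individual WORKORDER record, the system uses the WHERE clause from the relationship to run a select query that pulls in the related records (screenshot).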
Related records:
In this case, the related records are rows in a custom database view called WOTASKROLLUP_VW.
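In a related post, I explored different SQL rollup techniques that could be used in the view: Group by x, get other fields too. When I ran them on the full WORKORDER table, the options I explored performed similarly to one another.
However, in practice, Maximo is designed to fetch only one row at a time, via individual select statements. As a result, the queries perform very differently when selecting only a single WORKORDER record.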
I've wrapped each query in an outer query with a WHERE clause that selects a specific work order. I did this to mimic what Maximo does when it uses relationships.
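To illustrate the relationship mechanism, here is a minimal sketch built on two hypothetical assumptions. The first is that WOTASKROLLUP_VW uses the GROUP BY technique. The definition below is abbreviated to two of the cost columns. The second is that the relationship's WHERE clause is wonum = :wonum, with Maximo binding :wonum to the current WORKORDER record. Neither the view body nor the exact WHERE clause is given in the question.

```sql
-- Hypothetical, abbreviated view definition using the GROUP BY technique.
-- WOGROUP is exposed under the alias WONUM, so a filter on the view's WONUM
-- column is really a filter on WORKORDER.WOGROUP.
create or replace view wotaskrollup_vw as
select
    wogroup as wonum,
    sum(actlabcost) as actlabcost_tasks_incl,
    sum(actlabcost + actmatcost + acttoolcost + actservcost) as acttotalcost_tasks_incl
from
    maximo.workorder
group by
    wogroup;

-- Assumed relationship WHERE clause: wonum = :wonum
-- For each WORKORDER record, Maximo would then run something like this,
-- with :wonum bound to that record's WONUM.
select *
from wotaskrollup_vw
where wonum = :wonum;
```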
Query 1b: (GROUP BY; select 5 aggregates)
Performance is very good, even when selecting only one record, because an index is used (only 37 ms).
select
*
from
(
select
wogroup as wonum,
sum(actlabcost) as actlabcost_tasks_incl,
sum(actmatcost) as actmatcost_tasks_incl,
sum(acttoolcost) as acttoolcost_tasks_incl,
sum(actservcost) as actservcost_tasks_incl,
sum(actlabcost + actmatcost + acttoolcost + actservcost) as acttotalcost_tasks_incl,
max(case when istask = 0 then rowstamp end) as other_wo_columns
from
maximo.workorder
group by
wogroup
)
where
wonum in ('WO360996')
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | SORT GROUP BY NOSORT | | 1 | 34 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| WORKORDER | 1 | 34 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | WORKORDER_NDX32 | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("WOGROUP"='WO360996')
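The question doesn't say how these plans were captured. One standard way in Oracle is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY, sketched here for an abbreviated version of query 1b. EXPLAIN PLAN shows the optimizer's predicted plan. DBMS_XPLAN.DISPLAY_CURSOR shows the plan actually used by a cursor that has already run.

```sql
-- Ask the optimizer for its plan without executing the query
-- (abbreviated version of query 1b).
explain plan for
select *
from (
    select
        wogroup as wonum,
        sum(actlabcost) as actlabcost_tasks_incl
    from maximo.workorder
    group by wogroup
)
where wonum in ('WO360996');

-- Show the plan stored in PLAN_TABLE. The Predicate Information section
-- tells you whether the WHERE clause became an index "access" predicate
-- or a "filter" applied after the rows were read.
select * from table(dbms_xplan.display);
```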
Query #2: (SUM window function)
Performance is relatively slow when selecting a single record, because no index is used (3 seconds).
select
*
from
(
select
wonum,
actlabcost_tasks_incl,
actmatcost_tasks_incl,
acttoolcost_tasks_incl,
actservcost_tasks_incl,
acttotalcost_tasks_incl,
other_wo_columns
from
(
select
wonum,
istask,
sum(actlabcost ) over (partition by wogroup) as actlabcost_tasks_incl,
sum(actmatcost ) over (partition by wogroup) as actmatcost_tasks_incl,
sum(acttoolcost) over (partition by wogroup) as acttoolcost_tasks_incl,
sum(actservcost) over (partition by wogroup) as actservcost_tasks_incl,
sum(actlabcost + actmatcost + acttoolcost + actservcost) over (partition by wogroup) as acttotalcost_tasks_incl,
rowstamp as other_wo_columns
from
maximo.workorder
)
where
istask = 0
)
where
wonum in ('WO360996')
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 355K| 61M| | 14789 (1)| 00:00:01 |
|* 1 | VIEW | | 355K| 61M| | 14789 (1)| 00:00:01 |
| 2 | WINDOW SORT | | 355K| 14M| 21M| 14789 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| WORKORDER | 355K| 14M| | 10863 (2)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("WONUM"='WO360996' AND "ISTASK"=0)
Question:
Why is the GROUP BY query in #1b able to use the index (fast), while the SUM window function query in #2 can't use the index (slow)?
Answer:
Your two queries differ. In the first one you used:
select wogroup as wonum,
whereas in the second one you just used:
select wonum,
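This means you won't use the index on WOGROUP, because you are filtering on the WONUM column rather than on the WOGROUP column (which, in the first query, just happens to be aliased as WONUM).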
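Below is a minimal sketch of a fix that keeps the PARTITION BY wogroup window functions. It exposes WOGROUP from the inner query and filters on it in the outer query.

Because that predicate is on the PARTITION BY column, Oracle can generally push it into the inline view, where the WOGROUP index (WORKORDER_NDX32 in query 1b's plan) becomes usable. Whether it actually does so is worth confirming in the plan.

The extra wonum predicate keeps the original query's row selection. This assumes WO360996 is the top-level work order of its group, so WOGROUP = WONUM for that row.

```sql
-- Keep the window functions, but expose WOGROUP and filter on it outside.
-- A predicate on the PARTITION BY column does not change any window sum,
-- so it can be pushed into the inline view. The wonum and istask predicates
-- must be applied after the sums are computed.
select
    wonum,
    actlabcost_tasks_incl,
    actmatcost_tasks_incl,
    acttoolcost_tasks_incl,
    actservcost_tasks_incl,
    acttotalcost_tasks_incl,
    other_wo_columns
from
(
    select
        wogroup,
        wonum,
        istask,
        sum(actlabcost ) over (partition by wogroup) as actlabcost_tasks_incl,
        sum(actmatcost ) over (partition by wogroup) as actmatcost_tasks_incl,
        sum(acttoolcost) over (partition by wogroup) as acttoolcost_tasks_incl,
        sum(actservcost) over (partition by wogroup) as actservcost_tasks_incl,
        sum(actlabcost + actmatcost + acttoolcost + actservcost) over (partition by wogroup) as acttotalcost_tasks_incl,
        rowstamp as other_wo_columns
    from
        maximo.workorder
)
where
    wogroup = 'WO360996'    -- pushable: matches the PARTITION BY key
    and wonum = 'WO360996'  -- keeps the original query's row selection
    and istask = 0;
```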
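It looks like your second query can be corrected and reduced to the following. The filter moves into the inner subquery, and the partition can be dropped because you are already filtering to a single WOGROUP: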
select wonum,
actlabcost_tasks_incl,
actmatcost_tasks_incl,
acttoolcost_tasks_incl,
actservcost_tasks_incl,
acttotalcost_tasks_incl,
other_wo_columns
from (
select wogroup AS wonum,
istask,
sum(actlabcost ) over () as actlabcost_tasks_incl,
sum(actmatcost ) over () as actmatcost_tasks_incl,
sum(acttoolcost) over () as acttoolcost_tasks_incl,
sum(actservcost) over () as actservcost_tasks_incl,
sum(actlabcost + actmatcost + acttoolcost + actservcost) over () as acttotalcost_tasks_incl,
rowstamp as other_wo_columns
from maximo.workorder
where wogroup = 'WO360996'
)
where istask = 0;
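Before switching the view from one technique to the other, it may be worth checking that both return the same totals. The sketch below uses MINUS to return the GROUP BY rows that the reduced window-function query does not reproduce. An empty result means the two agree in that direction; swap the branches to check the other direction.

Only the four individual cost sums are compared. ROWSTAMP is left out because the GROUP BY version takes a MAX over the non-task rows, whereas the window version returns one row per non-task row.

```sql
-- Sums from the GROUP BY approach (query 1b), restricted to one work order group
select wogroup as wonum,
       sum(actlabcost)  as actlabcost_tasks_incl,
       sum(actmatcost)  as actmatcost_tasks_incl,
       sum(acttoolcost) as acttoolcost_tasks_incl,
       sum(actservcost) as actservcost_tasks_incl
from maximo.workorder
where wogroup = 'WO360996'
group by wogroup
minus
-- Sums from the reduced window-function query
select wonum,
       actlabcost_tasks_incl,
       actmatcost_tasks_incl,
       acttoolcost_tasks_incl,
       actservcost_tasks_incl
from (
    select wogroup as wonum,
           istask,
           sum(actlabcost)  over () as actlabcost_tasks_incl,
           sum(actmatcost)  over () as actmatcost_tasks_incl,
           sum(acttoolcost) over () as acttoolcost_tasks_incl,
           sum(actservcost) over () as actservcost_tasks_incl
    from maximo.workorder
    where wogroup = 'WO360996'
)
where istask = 0;
```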