SQL Query Optimization on Large Data
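I'm trying to figure out why my code below goes from a 2-second run time to a 23-minute run time just by adding the indicated WHERE clause (at the bottom).
A fix would be great, but I'm also trying to understand why this makes it run 4573468975468% longer (I'm not using a very large data set, < 100,000 rows).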
USE [HDWarehouse]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
with AncestryTree as (
    select WbsCode, ParentWbsCode
    from ProgressItemsView
    where ParentWbsCode is not null
      and BidMasterJobCode = '01525'
    union all
    select ProgressItemsView.WbsCode, t.ParentWbsCode
    from AncestryTree t
    join ProgressItemsView on t.WbsCode = ProgressItemsView.ParentWbsCode
    where BidMasterJobCode = '01525'
)
Select ResourceCode, ResourceType, AccountCode, CostItemWbsCode, [Progress Level], TotalCost,
       ISNULL([Percent Complete]*totalcost/100, 0) as [Earned Value], IsSuspended
from (
    Select *
    from (
        select *
        from ResourceEmploymentsView y
        left join (
            select t.WbsCode as [Resource Level], t.ParentWbsCode as [Progress Level],
                   v.QuantityCompletePercent as [Percent Complete], BidMasterJobCode as jobcode
            from AncestryTree T
            left join ProgressItemsView V
                on t.ParentWbsCode = v.WbsCode
               and BidMasterJobCode = '01525'
            where v.HasProgressRecorded = '1'
            --Bring in all WBS codes
            union
            select wbscode, wbscode, QuantityCompletePercent, BidMasterJobCode as jobcode
            from ProgressItemsView
            where IsLeaf = '1'
              and HasProgressRecorded = '1'
              and BidMasterJobCode = '01525'
        ) x
            --on y.BidMasterJobCode = x.BidMasterJobCode
            --and
            on y.CostItemWbsCode = x.[Resource Level]
    ) z
    left join (
        select bidmasterjobcode as jobecode, wbscode, issuspended
        from CostItemsView
        --where IsSuspended <> '1'
    ) CI
        on z.BidMasterJobCode = ci.jobecode
       and z.CostItemWbsCode = ci.WbsCode
) q
where q.BidMasterJobCode = '01525'
  and q.ResourceType <> 'Resource Assembly'
  and IsSuspended <> '1' --this is what slows down my code; without it, it runs in seconds...
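To make the recursive AncestryTree CTE easier to follow, here is a minimal sketch of what it computes on a hypothetical toy hierarchy: every WbsCode paired with each of its ancestors (not just its direct parent), restricted to one job. It uses SQLite through Python's standard sqlite3 module as a stand-in for SQL Server; SQLite needs the WITH RECURSIVE keyword, which T-SQL omits. The table name progress_items and the sample WBS codes are invented for illustration.

```python
import sqlite3


def ancestry_pairs(rows, job_code="01525"):
    """Return every (WbsCode, ancestor WbsCode) pair for one job, like AncestryTree."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for ProgressItemsView.
    conn.execute(
        "CREATE TABLE progress_items (WbsCode TEXT, ParentWbsCode TEXT, BidMasterJobCode TEXT)"
    )
    conn.executemany("INSERT INTO progress_items VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE AncestryTree(WbsCode, ParentWbsCode) AS (
            -- anchor: each item paired with its direct parent
            SELECT WbsCode, ParentWbsCode
            FROM progress_items
            WHERE ParentWbsCode IS NOT NULL AND BidMasterJobCode = :job
            UNION ALL
            -- recursive step: a child inherits every ancestor found for its parent
            SELECT p.WbsCode, t.ParentWbsCode
            FROM AncestryTree t
            JOIN progress_items p ON t.WbsCode = p.ParentWbsCode
            WHERE p.BidMasterJobCode = :job
        )
        SELECT WbsCode, ParentWbsCode FROM AncestryTree ORDER BY WbsCode, ParentWbsCode
    """
    result = conn.execute(query, {"job": job_code}).fetchall()
    conn.close()
    return result


toy = [
    ("A", None, "01525"),
    ("A.1", "A", "01525"),
    ("A.1.1", "A.1", "01525"),
    ("B", None, "99999"),
    ("B.1", "B", "99999"),
]
print(ancestry_pairs(toy))
# [('A.1', 'A'), ('A.1.1', 'A'), ('A.1.1', 'A.1')]
```

The grandchild A.1.1 appears twice, once with its parent and once with its grandparent, which is why the outer query can join resources to progress recorded at any higher WBS level.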
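My best guess is that IsSuspended is a low-cardinality column, most likely a flag with two values. If so, the server may look at your predicate
IsSuspended <> '1'
and fall back to a worst-case evaluation: you will probably see an index scan or a table scan in the query plan. If you add this flag to the end of an existing index that involves BidMasterJobCode and ResourceType, you may get the performance back. When building composite indexes like this, and especially covering indexes, think carefully about which columns go into them and in what order. As a rule of thumb, put high-cardinality columns first and follow them with lower-cardinality columns in descending order of cardinality; this helps the optimizer choose the index.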
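A minimal sketch of the index suggestion, again using SQLite as a stand-in: it creates a hypothetical base table resource_items (in the real database these columns come from views, so any index would have to go on the base tables underneath them), adds a composite index with IsSuspended appended after BidMasterJobCode and ResourceType, and asks the planner whether it would use that index. SQLite's EXPLAIN QUERY PLAN is not SQL Server's execution plan; on SQL Server you would compare the actual execution plans before and after creating the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table; the names mirror the columns used in the question.
conn.execute(
    "CREATE TABLE resource_items ("
    " ResourceCode TEXT, BidMasterJobCode TEXT, ResourceType TEXT, IsSuspended TEXT)"
)
# Hypothetical composite index: the equality column first, the low-cardinality flag last.
conn.execute(
    "CREATE INDEX ix_job_type_suspended "
    "ON resource_items (BidMasterJobCode, ResourceType, IsSuspended)"
)

query = """
    SELECT BidMasterJobCode, ResourceType, IsSuspended
    FROM resource_items
    WHERE BidMasterJobCode = '01525'
      AND ResourceType <> 'Resource Assembly'
      AND IsSuspended <> '1'
"""
# Each plan row is a 4-tuple whose last field is the human-readable detail text;
# the detail names the index when the planner chooses it.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```

The seek itself comes from the equality on BidMasterJobCode; the two `<>` predicates are then checked against the index entries rather than the base rows, which is the saving the answer is after.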
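In your case, the optimizer may assume that IsSuspended <> '1' will need to read the vast majority of rows, especially if the percentage of rows where IsSuspended = '1' is relatively low.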
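To put numbers on that selectivity argument, here is a sketch on invented data (a hypothetical cost_items table with 850 active rows, 50 suspended rows, and 100 rows whose flag is NULL, e.g. because the LEFT JOIN to CostItemsView found no match). It also shows a related point worth checking: NULL <> '1' evaluates to UNKNOWN, not true, so rows with a NULL flag are discarded along with the suspended ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cost_items (WbsCode TEXT, IsSuspended TEXT)")
# Hypothetical distribution: 850 active, 50 suspended, 100 with no flag (NULL).
rows = (
    [(f"W{i}", "0") for i in range(850)]
    + [(f"S{i}", "1") for i in range(50)]
    + [(f"N{i}", None) for i in range(100)]
)
conn.executemany("INSERT INTO cost_items VALUES (?, ?)", rows)

total = conn.execute("SELECT COUNT(*) FROM cost_items").fetchone()[0]
# The predicate from the question: NULL flags fail it as well as '1'.
kept = conn.execute(
    "SELECT COUNT(*) FROM cost_items WHERE IsSuspended <> '1'"
).fetchone()[0]
# COALESCE keeps NULL-flag rows; T-SQL's ISNULL(IsSuspended, '0') behaves the same here.
kept_with_nulls = conn.execute(
    "SELECT COUNT(*) FROM cost_items WHERE COALESCE(IsSuspended, '0') <> '1'"
).fetchone()[0]

print(total, kept, kept_with_nulls, kept / total)
# 1000 850 950 0.85
```

With 85% of rows passing, no index seek on the flag alone can save much work, which is why the answer suggests appending it to an index that already narrows by job code. Wrapping the column in COALESCE or ISNULL changes which rows are returned and stops an index seek on that column, so treat it as a correctness decision rather than a speed fix.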