SQL Query Optimizing on large data

Question

I'm trying to figure out why my code below goes from a 2-second run time to a 23-minute run time just by adding the indicated where clause (at the bottom).

A fix would be great, but I'm also trying to understand why this makes it run 4573468975468% longer (I'm not working with a very large dataset, < 100,000).

USE [HDWarehouse]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

with AncestryTree as (
  select WbsCode, ParentWbsCode
  from ProgressItemsView
  where ParentWbsCode is not null
  and
  BidMasterJobCode = '01525'
  union all
  select ProgressItemsView.WbsCode, t.ParentWbsCode
  from AncestryTree t 
  join ProgressItemsView on t.WbsCode = ProgressItemsView.ParentWbsCode
  where BidMasterJobCode ='01525'
)
Select ResourceCode, ResourceType, AccountCode, CostItemWbsCode, [Progress Level], TotalCost, ISNULL([Percent Complete]*totalcost/100,0) as [Earned Value], IsSuspended
from(
    Select *
    from(
        select * 
        from ResourceEmploymentsView y
        left join 
            (
            select t.WbsCode as [Resource Level], t.ParentWbsCode as [Progress Level], v.QuantityCompletePercent as [Percent Complete], BidMasterJobCode as jobcode
            from AncestryTree T
                left join ProgressItemsView V
                on t.ParentWbsCode = v.WbsCode
                and BidMasterJobCode = '01525'
            where v.HasProgressRecorded = '1'
            --Bring in all WBS codes 
            union
            select wbscode, wbscode, QuantityCompletePercent, BidMasterJobCode as jobcode
            from ProgressItemsView
                where IsLeaf = '1'
                and
                HasProgressRecorded = '1'
                and
                BidMasterJobCode = '01525'
            ) x
        --on y.BidMasterJobCode = x.BidMasterJobCode
        --and
        on y.CostItemWbsCode = x.[Resource Level]
        )z
    left join
        (select bidmasterjobcode as jobecode, wbscode, issuspended 
        from CostItemsView
        --where IsSuspended <> '1'
        ) CI
    on z.BidMasterJobCode = ci.jobecode
    and
    z.CostItemWbsCode = ci.WbsCode
    )q
where q.BidMasterJobCode = '01525'
and q.ResourceType <> 'Resource Assembly'
and IsSuspended <> '1' --this is what slows down my code; without it, it runs in seconds...

Answer 1

I'm going to take a guess that IsSuspended is a low-cardinality column, most likely a flag with two values. If so, the server may look at your predicate:

IsSuspended <> '1'

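The server will probably make a worst-case estimate for it, and you'll likely see an index scan or table scan in the query plan. If you add this flag to the end of an existing index that involves BidMasterJobCode and ResourceType, you may get the performance back. When building composite indexes like this, or even covering indexes, think carefully about which columns go in and in what order. As a rule of thumb, high-cardinality columns should come first, with the lower-cardinality columns following in descending order of cardinality. This helps the optimizer make better index choices.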
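
As a rough sketch of what that could look like: the names dbo.CostItems and IX_CostItems_Job_Wbs_IsSuspended below are made up, since the question only shows views and not the base tables behind them. I'm also assuming the flag lives in the table behind CostItemsView, because that is where the posted query selects IsSuspended from, so the leading keys here are the two columns the query joins that view on. Swap in whichever table and existing index you actually have.

```sql
-- Hypothetical base table: dbo.CostItems stands in for whatever table
-- CostItemsView reads from; the question does not show it.
-- Join keys come first, the low-cardinality flag goes last.
CREATE NONCLUSTERED INDEX IX_CostItems_Job_Wbs_IsSuspended
    ON dbo.CostItems (BidMasterJobCode, WbsCode, IsSuspended);
```

Compare the actual execution plan before and after. If the scan does not turn into a seek, the index is not helping this query and can be dropped again.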

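In your case, the optimizer may assume that IsSuspended <> '1' will require reading the vast majority of rows, especially if the percentage of rows with IsSuspended = '1' is fairly low.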
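
To check whether that guess matches your data, one option is to count the rows on each side of the flag and then turn on I/O and timing statistics before re-running the slow query. This is only a sketch. It uses the view and column names from the question, and RowsPerValue is just an alias I picked.

```sql
-- How many rows fall on each side of the flag for this job?
-- A NULL group, if any, shows rows where the flag is not set at all.
SELECT IsSuspended, COUNT(*) AS RowsPerValue
FROM CostItemsView
WHERE BidMasterJobCode = '01525'
GROUP BY IsSuspended;

-- Report logical reads and CPU/elapsed time for statements
-- that run afterwards in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
```

With those settings on, run the query once with and once without the IsSuspended filter and compare the logical reads per table in the Messages output. A large jump on one table points to where the scan is happening.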
