SQL Server - why is scanning done twice for the same table?

Does anyone know why SQL Server chooses to access the table 'building' twice? Is there an explanation for it? Could this be done with a single seek on the table?

Here is the code sample:

DECLARE @id1stBuild INT = 1
    ,@number1stBuild INT = 2
    ,@idLastBuild INT = 5
    ,@numberLastBuild INT = 1;
DECLARE @nr TABLE (nr INT);

INSERT @nr
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

CREATE TABLE building (
    id INT PRIMARY KEY identity(1, 1)
    ,number INT NOT NULL
    ,idStreet INT NOT NULL
    ,surface INT NOT NULL
    )

INSERT INTO building (number,idStreet,surface)
SELECT bl.b
    ,n.nr
    ,abs(convert(BIGINT, convert(VARBINARY, NEWID()))) % 500
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY n1.nr) b
    FROM @nr n1
    CROSS JOIN @nr n2
    CROSS JOIN @nr n3
    ) bl
CROSS JOIN @nr n

--***** execution plan for the select below
SELECT *
FROM building b
WHERE b.id = @id1stBuild
    AND b.number = @number1stBuild
    OR b.id = @idLastBuild
    AND b.number = @numberLastBuild

DROP TABLE building

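The execution plan is always the same: two Clustered Index Seeks combined by a Merge Join (Concatenation). The rest of the plan is less important. Here is the execution plan: [execution plan image]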

It isn't scanning twice. It is seeking twice.

Your query is semantically the same as the following.

SELECT *
FROM   building b
WHERE  b.id = @id1stBuild
       AND b.number = @number1stBuild
UNION
SELECT *
FROM   building b
WHERE  b.id = @idLastBuild
       AND b.number = @numberLastBuild 

And the execution plan performs the two seeks and unions the results.
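One way to check this yourself is to run the OR form and the UNION form side by side with I/O statistics switched on. The following is only a sketch: the temp table `#building_demo` and its five rows are made up for illustration, and on a table this small the optimizer may well pick a different plan than it does for the 10,000-row table in the question.

```sql
-- Hypothetical demo table: columns mirror the question, rows are invented.
CREATE TABLE #building_demo (
    id INT IDENTITY(1, 1) PRIMARY KEY,
    number INT NOT NULL,
    idStreet INT NOT NULL,
    surface INT NOT NULL
);

INSERT INTO #building_demo (number, idStreet, surface)
VALUES (2, 1, 100), (7, 1, 120), (3, 2, 90), (4, 2, 80), (1, 3, 150);

DECLARE @id1stBuild INT = 1,
        @number1stBuild INT = 2,
        @idLastBuild INT = 5,
        @numberLastBuild INT = 1;

-- Report logical reads per statement on the Messages tab.
SET STATISTICS IO ON;

-- OR form (parentheses added only to make the AND/OR precedence explicit).
SELECT *
FROM #building_demo b
WHERE (b.id = @id1stBuild AND b.number = @number1stBuild)
   OR (b.id = @idLastBuild AND b.number = @numberLastBuild);

-- UNION form: same rows, written as two separate lookups.
SELECT *
FROM #building_demo b
WHERE b.id = @id1stBuild AND b.number = @number1stBuild
UNION
SELECT *
FROM #building_demo b
WHERE b.id = @idLastBuild AND b.number = @numberLastBuild;

SET STATISTICS IO OFF;

DROP TABLE #building_demo;
```

Both statements should return the same rows. Comparing the logical reads, and the actual execution plans if you enable them, shows whether the optimizer treated the two forms alike on your instance.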

why is scanning done twice for the same table?

It's not a scan, it's a seek, and that makes all the difference.

The OR is implemented as a UNION, and the UNION is then implemented with a MERGE JOIN. This is called a 'merge union':

Merge union

Now let’s change the query slightly:

select a from T where b = 1 or c = 3

  |--Stream Aggregate(GROUP BY:([T].[a]))
   |--Merge Join(Concatenation)
        |--Index Seek(OBJECT:([T].[Tb]), SEEK:([T].[b]=(1)) ORDERED FORWARD)
        |--Index Seek(OBJECT:([T].[Tc]), SEEK:([T].[c]=(3)) ORDERED FORWARD)

Instead of the concatenation and sort distinct operators, we now have a merge join (concatenation) and a stream aggregate. What happened? The merge join (concatenation) or “merge union” is not really a join at all. It is implemented by the same iterator as the merge join, but it really performs a union all while preserving the order of the input rows. Finally, we use the stream aggregate to eliminate duplicates. (See this post for more about using stream aggregate to eliminate duplicates.) This plan is generally a better choice since the sort distinct uses memory and could spill data to disk if it runs out of memory while the stream aggregate does not use memory.
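The quoted passage does not show how `T` was defined. A minimal sketch of a setup that could produce a plan of that shape is given below. The schema, the index definitions and the generated rows are all assumptions, and a temp table `#T` stands in for `T`. Whether you actually get the merge union depends on the data, the statistics and the SQL Server version.

```sql
-- Assumed schema: the quote only reveals the names T, a, b, c, Tb and Tc.
CREATE TABLE #T (
    a INT NOT NULL,
    b INT NOT NULL,
    c INT NOT NULL,
    x CHAR(200) NULL
);

CREATE UNIQUE CLUSTERED INDEX Ta ON #T (a);
CREATE INDEX Tb ON #T (b);
CREATE INDEX Tc ON #T (c);

-- Invented data: 10,000 rows so that seeks look attractive to the optimizer.
INSERT INTO #T (a, b, c)
SELECT nums.n, nums.n % 100, nums.n % 250
FROM (
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects o1
    CROSS JOIN sys.all_objects o2
) AS nums;

-- Return the actual execution plan as XML alongside the result set.
SET STATISTICS XML ON;

SELECT a
FROM #T
WHERE b = 1 OR c = 3;

SET STATISTICS XML OFF;

DROP TABLE #T;
```

Each nonclustered index also carries the clustering key `a`, so an equality seek on `b` or on `c` returns its rows ordered by `a`. That ordering is what allows the two inputs to be merged and then de-duplicated by a stream aggregate without a sort.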

You can try the following, which results in only a single seek and performs slightly better. As @Martin_Smith said, the code you wrote is equivalent to a UNION.

SELECT *
FROM building b
WHERE b.id IN (@id1stBuild , @idLastBuild) 
    AND 
        (
            (b.id = @id1stBuild AND b.number = @number1stBuild) OR 
            (b.id = @idLastBuild AND b.number = @numberLastBuild)
        )