Why did assigning the count of a query to a variable perform better than checking it directly?

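I recently had a performance-tuning experience that I want to share here, and I would like to understand why the improvement happened.

In one of my procedures, I wanted to return a data set based on other records.

My query: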

IF (SELECT COUNT(1) FROM ...) > 0
    SELECT …

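This query took about 5 seconds.

I made a change: I assigned the result of the COUNT query from the IF condition to a variable and then checked the variable.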

DECLARE @cnt INT = 0
SELECT @cnt = COUNT(1) FROM …

IF @cnt > 0
    SELECT …

This version runs in less than 1 second.

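I also tried IF EXISTS, but got the same result as before the improvement (5 seconds).

I am very curious why the compiler behaves so differently, and whether there is a specific explanation.

Thanks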

There are two parts to this.

1) The SQL Server optimizer transforms

IF (SELECT COUNT(1) FROM ...) > 0
    SELECT …

into

IF EXISTS(SELECT 1 FROM ...)
    SELECT …

I saw Adam Machanic point this out in his comment on Andrew Kelly's post, Exists Vs. Count(*) - The battle never ends:

It's interesting to note that in SQL Server 2005 if there is a index available to allow a seek, the COUNT(*) > 0 test will be optimized and behave the same as EXISTS.

Adam provided a demo there.

2) Sometimes EXISTS is worse than COUNT:

IF EXISTS taking longer than embedded select statement

Check existence with EXISTS outperform COUNT! … Not?

Paul White wrote:

Using EXISTS introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing this, it assumes that the data is uniformly distributed. For example, if statistics show there are 100 expected matches in 100,000 rows, it will assume it will have to read only 1,000 rows to find the first match.

This will result in longer than expected execution times if this assumption turns out to be faulty. For example, if SQL Server chooses an access method (e.g. unordered scan) that happens to locate the first matching value very late on in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found amongst the first few rows, performance will be very good. This is the fundamental risk with row goals - inconsistent performance.
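
If your data distribution is skewed, or if you expect COUNT to be zero in most cases (i.e. you would have to scan the whole table to get the answer anyway), then you should try to get a plan without a row goal (i.e. without EXISTS).

One obvious way, as you have already discovered, is to save the result of COUNT into a variable.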
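
As a minimal sketch of that workaround: the table dbo.Orders and the columns Status, OrderId, and CustomerId below are hypothetical, made up only for illustration, since the question elides the real query. The idea is simply that the count is computed in full and stored first, so the IF test no longer wraps a subquery that the optimizer could rewrite into EXISTS.

```sql
-- Hypothetical schema: dbo.Orders(OrderId, CustomerId, Status).
-- None of these names come from the original question.
DECLARE @cnt INT;

-- The COUNT is evaluated in full and stored in the variable,
-- so this statement is not turned into an EXISTS-style test.
SELECT @cnt = COUNT(*)
FROM dbo.Orders
WHERE Status = 'Pending';

-- The IF now compares a plain variable, not a subquery.
IF @cnt > 0
    SELECT OrderId, CustomerId
    FROM dbo.Orders
    WHERE Status = 'Pending';
```

Whether this is faster depends on the data: when a matching row tends to be found early, the EXISTS form may well win, so it is worth comparing the actual execution plans for both versions.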