COUNT(1) + COUNT(DISTINCT()) much slower than running the 2 queries separately
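Query description:
- A Person (identified by PersonID) may or may not have a corresponding Job (identified by JobID).
- If there is a corresponding Job, the binding is stored in the table PersonJob (PersonID <=> JobID).
- A Person without a Job is ignored.
- A Job also has a CityID.
- For each Job.CityID, the query should return the total count of Person rows as well as the count of unique Person.HouseID values.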
Query:
SELECT
Job.CityID, COUNT(1) NumTotal, COUNT(DISTINCT(Person.HouseID)) NumDistinct
FROM
Job
INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
Job.CityID
Statistics:
- SELECT COUNT(1) FROM PersonJob: ~600,000
- SELECT COUNT(1) FROM Person: ~800,000
- SELECT COUNT(DISTINCT(Person.HouseID)) FROM Person: ~10,000
- SELECT COUNT(1) FROM Job: ~500
- MS SQL Server 10.50 (that is, SQL Server 2008 R2)
Problem:
The COUNT(1) part of the query, when run separately, runs in 0.25 seconds.
SELECT
Job.CityID, COUNT(1) NumTotal
FROM
Job
INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
Job.CityID
The COUNT(DISTINCT(Person.HouseID)) part of the query, when run separately, runs in 0.80 seconds.
SELECT
Job.CityID, COUNT(DISTINCT(Person.HouseID)) NumDistinct
FROM
Job
INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
Job.CityID
The whole query runs in 3.10 seconds, which is 3 times slower than the two parts run separately. Why?
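Execution plan:
- Sorry, I'm not an expert at reading these.
- As far as I can tell, the problem is in the COUNT(DISTINCT).
- In the partial query:
  - 25% Hash Match (Aggregate) (outputs Job.CityID)
  - 15% Hash Match (Inner Join) (outputs Job.CityID, Person.HouseID)
    - 30% Index Scan (outputs Person.PersonID, Person.HouseID)
    - 14% Index Seek (outputs PersonJob.PersonID)
- In the full query:
  - 03% Hash Match (Partial Aggregate) (outputs Job.CityID, COUNT(*))
  - 31% Hash Match (Aggregate) (outputs Job.CityID)
  - 29% Table Spool (outputs Job.CityID, Person.HouseID)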
This is a known problem in versions of SQL Server prior to 2012.
You can try this rewrite, which is based on the code here.
WITH T1
     AS (SELECT Job.CityID,
                Person.HouseID
         FROM   Job
                INNER JOIN PersonJob
                        ON ( PersonJob.JobID = Job.JobID )
                INNER JOIN Person
                        ON ( Person.PersonID = PersonJob.PersonID )),
     PartialSums
     AS (SELECT COUNT(*) AS CountStarPartialCount,
                HouseID,
                CityID
         FROM   T1
         GROUP  BY CityID,
                   HouseID)
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID)             AS NumDistinct
FROM   PartialSums
GROUP  BY CityID
SQL Server 2012 brings some improvements in this area. See Is Distinct Aggregation Still Considered Harmful?
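As a quick sanity check that the partial-sums rewrite returns the same counts as the original query, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for SQL Server here, and the handful of rows is made up for illustration, so this checks the results and says nothing about the timings.

```python
import sqlite3

# Tiny made-up data set; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID INTEGER PRIMARY KEY, CityID INTEGER);
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, HouseID INTEGER);
CREATE TABLE PersonJob (PersonID INTEGER, JobID INTEGER);
INSERT INTO Job VALUES (1, 10), (2, 10), (3, 20);
-- Person 5 has no job and must be ignored by both queries.
INSERT INTO Person VALUES (1, 100), (2, 100), (3, 101), (4, 102), (5, 103);
INSERT INTO PersonJob VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# The original query: both aggregates in a single GROUP BY.
ORIGINAL = """
SELECT Job.CityID, COUNT(1) AS NumTotal,
       COUNT(DISTINCT Person.HouseID) AS NumDistinct
FROM Job
INNER JOIN PersonJob ON PersonJob.JobID = Job.JobID
INNER JOIN Person ON Person.PersonID = PersonJob.PersonID
GROUP BY Job.CityID
ORDER BY Job.CityID
"""

# The rewrite: aggregate per (CityID, HouseID) first, then per CityID.
REWRITE = """
WITH T1 AS (
    SELECT Job.CityID, Person.HouseID
    FROM Job
    INNER JOIN PersonJob ON PersonJob.JobID = Job.JobID
    INNER JOIN Person ON Person.PersonID = PersonJob.PersonID
),
PartialSums AS (
    SELECT COUNT(*) AS CountStarPartialCount, HouseID, CityID
    FROM T1
    GROUP BY CityID, HouseID
)
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID) AS NumDistinct
FROM PartialSums
GROUP BY CityID
ORDER BY CityID
"""

original_rows = conn.execute(ORIGINAL).fetchall()
rewrite_rows = conn.execute(REWRITE).fetchall()
print(original_rows)
print(rewrite_rows)
```

Each (CityID, HouseID) pair ends up as exactly one row of PartialSums, so counting the non-NULL HouseID values per city gives the distinct count, and summing the partial counts gives the total.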
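After reading the workaround provided by Martin Smith, I found it too hard to read and understand, and it would turn into a mess if additional DISTINCT columns were needed. I decided to LEFT JOIN the partial queries as follows: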
SELECT
Job.CityID, NumTotal.Value, NumDistinct.Value
FROM
Job
LEFT JOIN
(
SELECT
Job.CityID, COUNT(1) AS Value
FROM
Job
INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
Job.CityID
) NumTotal ON (NumTotal.CityID = Job.CityID)
LEFT JOIN
(
SELECT
Job.CityID, COUNT(DISTINCT Person.HouseID) AS Value
FROM
Job
INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
Job.CityID
) NumDistinct ON (NumDistinct.CityID = Job.CityID)
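GROUP BY
-- The joined values are listed as well: SQL Server rejects select-list columns
-- that are neither aggregated nor part of the GROUP BY clause.
Job.CityID, NumTotal.Value, NumDistinct.Value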
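This runs in 0.70 seconds, while the "workaround" SQL runs in 0.60 seconds. That means the LEFT JOIN version is about 4.4 times faster than the "original full query" (3.10 seconds) and only about 17% slower than the "workaround", while being much easier to read and extend.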
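For anyone who wants to try the LEFT JOIN approach on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server and the rows are made up, so it illustrates the shape of the result, not the timings. The sketch lists the joined values in the outer GROUP BY, which SQL Server requires for non-aggregated select columns, and it includes a hypothetical city 30 whose only job has nobody assigned.

```python
import sqlite3

# Tiny made-up data set; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID INTEGER PRIMARY KEY, CityID INTEGER);
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, HouseID INTEGER);
CREATE TABLE PersonJob (PersonID INTEGER, JobID INTEGER);
-- Job 4 (city 30) has nobody assigned to it.
INSERT INTO Job VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO Person VALUES (1, 100), (2, 100), (3, 101), (4, 102), (5, 103);
INSERT INTO PersonJob VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Each aggregate is computed in its own subquery and joined back by CityID.
LEFT_JOIN_QUERY = """
SELECT Job.CityID, NumTotal.Value, NumDistinct.Value
FROM Job
LEFT JOIN (
    SELECT Job.CityID AS CityID, COUNT(1) AS Value
    FROM Job
    INNER JOIN PersonJob ON PersonJob.JobID = Job.JobID
    INNER JOIN Person ON Person.PersonID = PersonJob.PersonID
    GROUP BY Job.CityID
) NumTotal ON NumTotal.CityID = Job.CityID
LEFT JOIN (
    SELECT Job.CityID AS CityID, COUNT(DISTINCT Person.HouseID) AS Value
    FROM Job
    INNER JOIN PersonJob ON PersonJob.JobID = Job.JobID
    INNER JOIN Person ON Person.PersonID = PersonJob.PersonID
    GROUP BY Job.CityID
) NumDistinct ON NumDistinct.CityID = Job.CityID
GROUP BY Job.CityID, NumTotal.Value, NumDistinct.Value
ORDER BY Job.CityID
"""

rows = conn.execute(LEFT_JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

One difference from the original single query is visible on this data: because the outer query starts from Job and uses LEFT JOIN, a city whose jobs have no people still appears, with NULL in both count columns, whereas the original INNER JOIN query omits such a city.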