COUNT(1) + COUNT(DISTINCT()) is much slower than running 2 queries separately

Query description:

Query:

SELECT
  Job.CityID, COUNT(1) NumTotal, COUNT(DISTINCT(Person.HouseID)) NumDistinct
FROM
  Job
  INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
  INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
  Job.CityID

Statistics:

Question:

Execution plan:

This is a known issue in SQL Server versions prior to 2012.

You could try this rewrite, which is based on the code here.

WITH T1
     AS (SELECT Job.CityID,
                Person.HouseID
         FROM   Job
                INNER JOIN PersonJob
                        ON ( PersonJob.JobID = Job.JobID )
                INNER JOIN Person
                        ON ( Person.PersonID = PersonJob.PersonID )),
     PartialSums
     AS (SELECT COUNT(*) AS CountStarPartialCount,
                HouseID,
                CityID
         FROM   T1
         GROUP  BY CityID,
                   HouseID)
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID)             AS NumDistinct
FROM   PartialSums
GROUP  BY CityID 
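
The rewrite returns the same numbers because grouping by (CityID, HouseID) first leaves one row per distinct house per city: counting those rows gives the distinct count, and summing the partial counts gives back the total. COUNT(HouseID) skips the NULL group, just as COUNT(DISTINCT ...) ignores NULLs. Below is a minimal sketch that checks this on made-up sample data. It uses Python's sqlite3 as a stand-in for SQL Server (an assumption made only so the example is runnable), so it says nothing about performance or execution plans.

```python
import sqlite3

# In-memory stand-in for the Job / PersonJob / Person tables.
# The rows are invented sample data, including one person with a NULL HouseID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID INTEGER PRIMARY KEY, CityID INTEGER);
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, HouseID INTEGER);
CREATE TABLE PersonJob (PersonID INTEGER, JobID INTEGER);
INSERT INTO Job VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO Person VALUES (1, 100), (2, 100), (3, 200), (4, NULL);
INSERT INTO PersonJob VALUES (1, 1), (2, 1), (3, 2), (1, 3), (4, 3);
""")

# The query from the question: both aggregates in a single pass.
ORIGINAL = """
SELECT Job.CityID, COUNT(1) NumTotal, COUNT(DISTINCT Person.HouseID) NumDistinct
FROM Job
  INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
  INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY Job.CityID
ORDER BY Job.CityID
"""

# The two-level rewrite: pre-aggregate per (CityID, HouseID), then per CityID.
REWRITE = """
WITH T1 AS (
  SELECT Job.CityID, Person.HouseID
  FROM Job
    INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
    INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
),
PartialSums AS (
  SELECT COUNT(*) AS CountStarPartialCount, HouseID, CityID
  FROM T1
  GROUP BY CityID, HouseID
)
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID)             AS NumDistinct
FROM PartialSums
GROUP BY CityID
ORDER BY CityID
"""

original = conn.execute(ORIGINAL).fetchall()
rewritten = conn.execute(REWRITE).fetchall()
print(original)               # [(10, 3, 2), (20, 2, 1)]
print(rewritten == original)  # True
```

City 20 is the interesting row: it has two joined rows, one of them with a NULL HouseID, and both queries report 2 total and 1 distinct.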

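SQL Server 2012 brings some improvements in this area. See Is Distinct Aggregation Still Considered Harmful?

After reading the workaround provided by Martin Smith, I found it too hard to read and understand, and it would turn into a mess if additional DISTINCT columns were needed. I decided to LEFT JOIN partial queries instead, as follows: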

SELECT
  Job.CityID, NumTotal.Value AS NumTotal, NumDistinct.Value AS NumDistinct
FROM
  Job
  LEFT JOIN
  (
    SELECT
      Job.CityID, COUNT(1) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumTotal ON (NumTotal.CityID = Job.CityID)
  LEFT JOIN
  (
    SELECT
      Job.CityID, COUNT(DISTINCT Person.HouseID) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumDistinct ON (NumDistinct.CityID = Job.CityID)
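GROUP BY
  -- SQL Server requires every non-aggregated column in the SELECT list to appear here
  Job.CityID, NumTotal.Value, NumDistinct.Value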

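This runs in 0.70 seconds, while the "workaround" SQL runs in 0.60 seconds. That makes the LEFT JOIN approach 5 times faster than the "original full query" and only about 20% slower than the "workaround", while being easier to read and extend.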
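
To illustrate the "easier to extend" point, here is a rough sketch of the same pattern with a second DISTINCT column, COUNT(DISTINCT Person.PersonID), added as one more derived table. It runs on invented sample data in Python's sqlite3, which is only a stand-in for SQL Server, and the `per_city` helper is a hypothetical convenience for this example, not part of the original answer. The sample data includes a city with a job but no people, which shows one behavioral difference worth knowing: because the outer query starts from Job and uses LEFT JOIN, such a city is returned with NULL counts, whereas the original INNER JOIN query would omit it.

```python
import sqlite3

# Invented sample data; city 30 has a job but nobody assigned to it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID INTEGER PRIMARY KEY, CityID INTEGER);
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, HouseID INTEGER);
CREATE TABLE PersonJob (PersonID INTEGER, JobID INTEGER);
INSERT INTO Job VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO Person VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO PersonJob VALUES (1, 1), (2, 1), (3, 2), (1, 3), (1, 2);
""")

JOINED = """Job
    INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
    INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)"""


def per_city(aggregate):
    """Build one derived table that computes a single aggregate per CityID."""
    return f"(SELECT Job.CityID, {aggregate} AS Value FROM {JOINED} GROUP BY Job.CityID)"


num_total = per_city("COUNT(1)")
num_houses = per_city("COUNT(DISTINCT Person.HouseID)")
num_persons = per_city("COUNT(DISTINCT Person.PersonID)")  # the extra DISTINCT column

query = f"""
SELECT Job.CityID, NumTotal.Value, NumHouses.Value, NumPersons.Value
FROM Job
  LEFT JOIN {num_total} NumTotal ON (NumTotal.CityID = Job.CityID)
  LEFT JOIN {num_houses} NumHouses ON (NumHouses.CityID = Job.CityID)
  LEFT JOIN {num_persons} NumPersons ON (NumPersons.CityID = Job.CityID)
GROUP BY Job.CityID, NumTotal.Value, NumHouses.Value, NumPersons.Value
ORDER BY Job.CityID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (10, 4, 2, 3)
# (20, 1, 1, 1)
# (30, None, None, None)
```

Each additional distinct count costs one more derived table and one more LEFT JOIN, so the query grows linearly but each piece stays independent. Whether that remains competitive on timing would need to be measured on the real data.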