COUNT(DISTINCT) and COUNT(*) + GROUP BY give different results

Question

We are querying one of our datasets for the number of unique IDs:

SELECT count(distinct id) FROM [MyTable] LIMIT 1

Another query runs a similar command:

SELECT count(*) FROM (SELECT id FROM MyTable GROUP BY id) A;

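The first command is more efficient, but the output should be the same. However, the two queries return different results. The first query returns a higher count, by about 1.5% of the dataset, which has over 100 million rows.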

Answer 1

The value returned by COUNT(DISTINCT) is a statistical approximation and is not guaranteed to be exact.

The second query returns an exact count, hence the difference.

Answer 2

COUNT(DISTINCT field) is only an estimate. If you need an exact result, you can use EXACT_COUNT_DISTINCT(field).
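
For reference, a minimal sketch of the exact variants in BigQuery legacy SQL, reusing the table and column names from the question. The second statement is an assumption based on the legacy SQL reference, which describes an optional second argument n to COUNT(DISTINCT) below which results are exact (default 1000). The threshold value shown is purely illustrative.

```sql
-- Exact distinct count (legacy SQL). Expect it to be slower than the approximate form.
SELECT EXACT_COUNT_DISTINCT(id) FROM [MyTable];

-- Assumed alternative: raise the threshold n below which COUNT(DISTINCT) is exact.
-- 200000000 is an illustrative value chosen to exceed the expected number of distinct ids.
SELECT COUNT(DISTINCT id, 200000000) FROM [MyTable];
```

Run the statements one at a time. A very large n may increase execution time considerably or cause the query to fail on a table of this size, so EXACT_COUNT_DISTINCT or the GROUP BY form from the question is probably the safer choice here.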
