Why is the COUNT result different from the row count of a normal SELECT in Aurora PostgreSQL?

Question

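I'm using PostgreSQL on Aurora. When I query records with an asterisk (SELECT *), the number of rows returned does not match the number from a normal COUNT query. What is the reason?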

The system in question uses AWS Aurora with PostgreSQL 9.6.8 as the engine. As shown below, the result of a normal query and the COUNT result do not match.

Normal query

SELECT * FROM samples WHERE date BETWEEN '2019-09-25 00:00:00' AND '2019-09-26 00:00:00';

It returns 17613 records.

Count query

SELECT COUNT(*) FROM samples WHERE date BETWEEN '2019-09-25 00:00:00' AND '2019-09-26 00:00:00';

The count query returns 17875.

This table has multiple primary key columns and several nullable columns. Why do SELECT * and SELECT COUNT(*) return different numbers?

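By the way, if I specify the table name or a primary key column inside COUNT, the result matches the number of records from the normal query.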

SELECT COUNT(sample_id) FROM samples WHERE date BETWEEN '2019-09-25 00:00:00' AND '2019-09-26 00:00:00';

or

SELECT COUNT(samples) FROM samples WHERE date BETWEEN '2019-09-25 00:00:00' AND '2019-09-26 00:00:00';

Both count queries return 17613.

I'm about to cry because I can't get my work done. Thanks.

Answer 1

count(<column_name>) returns the number of non-NULL values in that column, while count(*) returns the number of rows, counting both NULL and non-NULL values.
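A minimal sketch of these semantics, using Python's built-in sqlite3 module rather than Aurora/PostgreSQL (both follow the standard SQL rule for COUNT). The table and its rows are hypothetical: sample_id is a NOT NULL primary key and note is a nullable column.

```python
import sqlite3


def count_demo() -> tuple[int, int, int]:
    """Return (COUNT(*), COUNT(note), COUNT(sample_id)) for a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: sample_id can never be NULL, note can.
    conn.execute(
        "CREATE TABLE samples (sample_id INTEGER NOT NULL PRIMARY KEY, note TEXT)"
    )
    conn.executemany(
        "INSERT INTO samples (sample_id, note) VALUES (?, ?)",
        [(1, "a"), (2, None), (3, "c"), (4, None)],
    )
    # COUNT(*) counts rows; COUNT(column) skips rows where that column is NULL.
    row = conn.execute(
        "SELECT COUNT(*), COUNT(note), COUNT(sample_id) FROM samples"
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    total, non_null_notes, non_null_ids = count_demo()
    print(total, non_null_notes, non_null_ids)  # 4 2 4
```

Note that a column declared NOT NULL, such as a primary key, always gives the same result as COUNT(*) over the same rows.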

Answer 2

I don't know the root cause, but I found that this happens when an index-only scan is executed on an Aurora Postgres read replica.

I looked at each execution plan; they are shown below. The index is on the date column.

SELECT *

testdb=> explain analyse select * from samples where date between '2019-09-25 00:00:00' and '2019-09-26 00:00:00';
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_samples on samples  (cost=0.43..5852.48 rows=3906 width=627) (actual time=0.014..18.325 rows=17613 loops=1)
   Index Cond: ((date >= '2019-09-25'::date) AND (date <= '2019-09-26'::date))
 Planning time: 1.154 ms
 Execution time: 18.969 ms
(4 rows)

In this case, returning the data as records requires reading not only the index but also the actual table data. As a result, 17613 rows are returned.

SELECT COUNT(*)

testdb=> explain analyse select count(*) from samples where date between '2019-09-25 00:00:00' and '2019-09-26 00:00:00';
                                                                             QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=200.32..200.33 rows=1 width=8) (actual time=19.971..19.972 rows=1 loops=1)
   ->  Index Only Scan using idx_samples on samples  (cost=0.43..190.56 rows=3906 width=0) (actual time=0.022..18.901 rows=17875 loops=1)
         Index Cond: ((date >= '2019-09-25'::date) AND (date <= '2019-09-26'::date))
         Heap Fetches: 59983
 Planning time: 1.125 ms
 Execution time: 19.994 ms
(6 rows)

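However, when only the number of records is requested, the date column used in the condition is covered by the index, so an Index Only Scan is used. As a result, 17875 rows are returned.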

This does not happen on the Aurora primary node, only on the read replicas.

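I searched extensively but found no article explaining the root cause of these useless (dead) tuples appearing, or why they had not been vacuumed. I also searched the Aurora and PostgreSQL forums and found nothing similar. The engineers who investigated this with me could not hide their surprise.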

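I believe the count from the actual table data, not the Index Only Scan result, is the correct answer. As a workaround, I will count a column that is not in the index, for example SELECT COUNT(sample_id).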
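Why counting a column outside the index changes the access path can be illustrated with a rough analogy in SQLite, run from Python's built-in sqlite3 module. This is not Aurora or PostgreSQL, and it does not reproduce the miscount itself. The schema is hypothetical: it assumes idx_samples covers only date and that sample_id is not stored in it. SQLite reports "COVERING INDEX" when a query can be answered from the index alone, which is its counterpart to PostgreSQL's Index Only Scan.

```python
import sqlite3

WHERE = "WHERE date BETWEEN '2019-09-25 00:00:00' AND '2019-09-26 00:00:00'"


def query_plans() -> tuple[str, str]:
    """Return SQLite's plan text for COUNT(*) and COUNT(sample_id) on a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: a TEXT primary key is not part of idx_samples in SQLite.
    conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, date TEXT, note TEXT)")
    conn.execute("CREATE INDEX idx_samples ON samples (date)")

    def plan(sql: str) -> str:
        # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return " | ".join(str(row[-1]) for row in rows)

    # COUNT(*) only needs the indexed date column, so the index alone can answer it.
    count_star = plan(f"SELECT COUNT(*) FROM samples {WHERE}")
    # COUNT(sample_id) needs a column outside the index, so the table must be read too.
    count_id = plan(f"SELECT COUNT(sample_id) FROM samples {WHERE}")
    conn.close()
    return count_star, count_id


if __name__ == "__main__":
    star_plan, id_plan = query_plans()
    print("COUNT(*):        ", star_plan)
    print("COUNT(sample_id):", id_plan)
```

The first plan should mention a covering index and the second should not; the exact wording of the plan text varies between SQLite versions.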

It would be helpful if someone could share an article on the cause of this issue, or one confirming that this is documented behavior.


Tags: sql, postgresql, amazon-aurora