How to increase performance of COUNT SQL query in PostgreSQL?

Question

I have a table with many columns, but for simplicity we can consider the following table:

create table tmp_table
(
    entity_1_id varchar(255) not null,
    status integer default 1 not null,
    entity_2_id varchar(255)
);

create index tmp_table_entity_1_id_idx
    on tmp_table (entity_1_id);

create index tmp_table_entity_2_id_idx
    on tmp_table (entity_2_id);

I need to execute this query:

SELECT tmp_table.entity_2_id, COUNT(*) FROM tmp_table 
    WHERE tmp_table.entity_1_id='cedca236-3f27-4db3-876c-a6c159f4d15e' AND 
          tmp_table.status <> 2 AND 
          tmp_table.entity_2_id = ANY (string_to_array('21c5598b-0620-4a8c-b6fd-a4bfee024254,af0f9cb9-da47-4f6b-a3c4-218b901842f7', ',')) 
    GROUP BY tmp_table.entity_2_id;

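When I pass a string with a small number of values (say, 1 to 20) to the string_to_array function, it works fine. But when I try to pass 500 elements, it runs far too slowly. Unfortunately, I really do need 100 to 500 elements.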

Answer 1

For this query:

SELECT t.entity_2_id, COUNT(*)
FROM tmp_table t
WHERE t.entity_1_id = 'cedca236-3f27-4db3-876c-a6c159f4d15e' AND 
      t.status <> 2 AND 
      t.entity_2_id = ANY (string_to_array('21c5598b-0620-4a8c-b6fd-a4bfee024254,af0f9cb9-da47-4f6b-a3c4-218b901842f7', ',')) 
GROUP BY t.entity_2_id;

I would recommend an index on tmp_table(entity_1_id, entity_2_id, status).
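A minimal sketch of that index, assuming the tmp_table definition from the question; the index name tmp_table_e1_e2_status_idx is hypothetical. The two equality-style filters come first so the index can seek to entity_1_id and then to each entity_2_id in the list, and status is the trailing column so the status <> 2 condition can be checked from the index entries.

```sql
-- Composite index for the GROUP BY query (index name is hypothetical).
-- entity_1_id: equality filter
-- entity_2_id: = ANY(...) filter and GROUP BY key
-- status:      trailing column so "status <> 2" can be evaluated from the index
CREATE INDEX tmp_table_e1_e2_status_idx
    ON tmp_table (entity_1_id, entity_2_id, status);
```

Because entity_1_id is the leading column, this index can also serve lookups on entity_1_id alone.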

However, you may find this version faster:

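-- The comma-separated list of entity_2_id values is passed as the first
-- argument to regexp_split_to_table (one output row per ID).
select rst.entity_2_id,
       (select count(*)
        from tmp_table t
        where t.entity_2_id = rst.entity_2_id and
              t.entity_1_id = 'cedca236-3f27-4db3-876c-a6c159f4d15e' and
              t.status <> 2
       ) as cnt
from regexp_split_to_table('21c5598b-0620-4a8c-b6fd-a4bfee024254,af0f9cb9-da47-4f6b-a3c4-218b901842f7', ',') rst(entity_2_id);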

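For this version, you want an index on tmp_table(entity_2_id, entity_1_id, status). Note that, unlike the GROUP BY query, it also returns a row with a count of 0 for any ID in the list that has no matching rows.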
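A minimal sketch of that index, again assuming the tmp_table from the question; the name tmp_table_e2_e1_status_idx is hypothetical. Here entity_2_id leads because each correlated subquery looks up a single entity_2_id first.

```sql
-- Index for the correlated-subquery version (index name is hypothetical).
-- Each subquery seeks to one entity_2_id, then entity_1_id,
-- and checks "status <> 2" against the trailing column.
CREATE INDEX tmp_table_e2_e1_status_idx
    ON tmp_table (entity_2_id, entity_1_id, status);
```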

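In most databases this would be faster, because the index is a covering index and this form avoids a final aggregation over the entire result set. However, Postgres keeps row visibility (MVCC) information in the table's data pages rather than in the index, so those pages may still have to be read; an index-only scan can skip a page only when the visibility map marks it as all-visible. It is still worth a try.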
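One rough way to try it, assuming the tmp_table from the question and an index on (entity_2_id, entity_1_id, status) exist: vacuum the table so the visibility map is current, then inspect the plan. An "Index Only Scan" node with a low "Heap Fetches" count suggests the data pages are mostly being skipped; a high count means Postgres is still reading them.

```sql
-- Update the visibility map and planner statistics.
-- (VACUUM cannot run inside a transaction block.)
VACUUM (ANALYZE) tmp_table;

-- Look for "Index Only Scan" and its "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
select rst.entity_2_id,
       (select count(*)
        from tmp_table t
        where t.entity_2_id = rst.entity_2_id and
              t.entity_1_id = 'cedca236-3f27-4db3-876c-a6c159f4d15e' and
              t.status <> 2
       ) as cnt
from regexp_split_to_table('21c5598b-0620-4a8c-b6fd-a4bfee024254,af0f9cb9-da47-4f6b-a3c4-218b901842f7', ',') rst(entity_2_id);
```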


Tags: sql, postgresql, count, sql-optimization, postgresql-10