Why isn't PostgreSQL using an index with my GROUP BY aggregate?

Question

I have a table in a PostgreSQL database with the timescaledb extension enabled. It looks like this:

+------------+--------------------------+-------------+
| Column     | Type                     | Modifiers   |
|------------+--------------------------+-------------|
| time       | timestamp with time zone |  not null   |
| value      | double precision         |  not null   |
| being      | metric_being             |  not null   |
| device     | integer                  |  not null   |
+------------+--------------------------+-------------+

and this index on the table:

"metrics_device_time_idx" btree (device, "time" DESC)

But when I query the table with a GROUP BY:

explain select max(time), device from metrics group by device;

the index is not used:

+-------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                        |
|-------------------------------------------------------------------------------------------------------------------|
| Finalize GroupAggregate  (cost=104577.41..104588.61 rows=22 width=12)                                             |
|   Group Key: _hyper_9_95_chunk.device                                                                             |
|   ->  Gather Merge  (cost=104577.41..104587.95 rows=88 width=12)                                                  |
|         Workers Planned: 4                                                                                        |
|         ->  Sort  (cost=103577.35..103577.41 rows=22 width=12)                                                    |
|               Sort Key: _hyper_9_95_chunk.device                                                                  |
|               ->  Partial HashAggregate  (cost=103576.64..103576.86 rows=22 width=12)                             |
|                     Group Key: _hyper_9_95_chunk.device                                                           |
|                     ->  Parallel Append  (cost=0.00..95035.06 rows=1708317 width=12)                              |
|                           ->  Parallel Seq Scan on _hyper_9_95_chunk  (cost=0.00..44602.70 rows=1122370 width=12) |
|                           ->  Parallel Seq Scan on _hyper_9_92_chunk  (cost=0.00..24807.61 rows=756061 width=12)  |
+-------------------------------------------------------------------------------------------------------------------+

This ends up being rather slow. On the other hand, the following is about 10 times faster:

select max(time), 29 from metrics where device = 29
union
select max(time), 30 from metrics where device = 30
union
...

Why is this? Can I change my index or my query so that the GROUP BY version is fast too? And why is the UNION version so fast?

Answer 1

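Postgres cannot use an index in this case; the optimizer does not currently support it. You can find some information about this: there is a patch called "index skip scan", but that work has not been finished yet. There are some workarounds you can use.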
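One common workaround is to emulate a skip scan ("loose index scan") with a recursive CTE, the pattern described on the PostgreSQL wiki. Below is a minimal sketch, assuming the `metrics` table and the `metrics_device_time_idx` index from the question; the CTE name `devices` is arbitrary. Each recursive step jumps to the next distinct `device` with a single index probe, and the per-device `max(time)` is then answered from the same `(device, "time" DESC)` index. This is also essentially why the hand-written UNION is fast: every branch is one cheap index lookup rather than a scan over all rows. It has not been benchmarked here, so check the plan with EXPLAIN on your own data.

```sql
-- Emulated loose index scan (skip scan) using a recursive CTE.
-- Assumes the metrics table and the (device, "time" DESC) index from the question.
WITH RECURSIVE devices AS (
    -- Anchor: the smallest device id, found with one index probe.
    (SELECT device FROM metrics ORDER BY device LIMIT 1)
    UNION ALL
    -- Recursive step: jump to the next larger device id, again one index probe.
    -- The scalar subquery yields NULL once no larger device exists.
    SELECT (SELECT m.device
            FROM metrics m
            WHERE m.device > d.device
            ORDER BY m.device
            LIMIT 1)
    FROM devices d
    WHERE d.device IS NOT NULL
)
SELECT d.device,
       -- max(time) for one device can be read from the index.
       (SELECT max(m.time) FROM metrics m WHERE m.device = d.device) AS max_time
FROM devices d
WHERE d.device IS NOT NULL;  -- drop the terminating NULL row
```

Unlike the UNION version, this does not require listing the device ids by hand; it discovers them from the index.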

Answer 2

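As @Pavel Stehule mentioned in his answer, Postgres does not implement index skip scan, which is needed to optimize these kinds of queries. TimescaleDB recognized that these queries are really useful in time-series analysis, so they implemented index skip scan themselves. It has been part of their extension since version 2.2.1; see their blog post about it here.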

After upgrading the extension to >= 2.2.1, the query can be rewritten so that it uses index skip scan:

select distinct on (device) device, time from metrics order by device, time desc

The query then uses their index skip scan implementation; in my case it became roughly 100 times faster.
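To check the version requirement and confirm that the planner actually picks the skip scan, something like the following can be run in psql. This is a sketch: `ALTER EXTENSION ... UPDATE` only works once the newer TimescaleDB package is installed on the server, and TimescaleDB's upgrade documentation recommends running it as the first command in a fresh session (for example with `psql -X`), so check the docs for your version. In the plan, look for a `Custom Scan (SkipScan)` node in place of the parallel sequential scans shown in the question; the exact plan text may vary between versions.

```sql
-- Which TimescaleDB version is installed in this database?
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';

-- Upgrade to the newest TimescaleDB version installed on the server.
ALTER EXTENSION timescaledb UPDATE;

-- Check that the DISTINCT ON query is planned with SkipScan.
EXPLAIN
SELECT DISTINCT ON (device) device, time
FROM metrics
ORDER BY device, time DESC;
```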
