Performance: Using indexing and partitioning (PostgreSQL)

Question

I have a fairly simple database model. My table "main" looks like this:

| id (PK) | device_id (int) | msg_type (int) | rawdata (text) | timestamp (date+time) |

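So every message received is stored in this table, including the message type, the timestamp, the device that sent it, and the raw data.

In addition, for each possible msg_type (about 30 in total) I have a separate table that stores the parsed raw data. Example for the table "main_type1":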

| id (PK) | main_id (FK) | device_id (int) | attribute_1 | attribute_2 | attribute_n |

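(The structure differs for each msg_type, and the messages are not evenly distributed, which means some tables are very large and some are very small.)

Note that device_id is always contained in the raw data, so every table has this column.

Now my problem is:

I used to run queries like this: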

select attribute_1, attribute_2 from main_type1 inner join main on main_type1.main_id = main.id where timestamp > X and timestamp < Y and main.device_id = Z

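At first everything was adequate and fast. But now my database has more than 400,000,000 entries in "main", and the query takes up to 15 minutes.

Indexing

I tried using an index, such as: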

CREATE INDEX device_id_index ON main (device_id);

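Well, now I can retrieve data from the main table faster, but that doesn't help with the join. My biggest problem here is that the timestamp information is stored only in the main table, so I have to join every time. Is this a general flaw in my database model? I was trying to avoid storing the timestamp twice.

Partitioning

Would one solution be to use partitioning to create a new table with the raw data for each device_id? I would then (automatically, of course) create the appropriate partitions, for example: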

main_device_id_343223
main_device_id_4563
main_device_id_92338
main_device_id_4142315

Would this give me a speed advantage for the join? What other options do I have? For completeness: I am using PostgreSQL.

Answer 1

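Since your problem is the execution time of the join, the first thing to do is to try to speed up the query by creating indexes of the following kinds:

An index that helps the join itself, in this case an index on main_type1.main_id, the foreign key that references main.id (note that a foreign key declaration does not automatically create an index):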
```
CREATE INDEX main_type_main_id_index ON main_type1(main_id);
```
An index that helps restrict the set of rows the query has to consider, in this case on the timestamp attribute:
```
CREATE INDEX main_timestamp_index ON main(timestamp);
```

If your queries only look for a specific subset of values, you could also consider creating a partial index on the timestamp attribute.
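As a rough sketch of what such a partial index could look like, assuming the queries only ever touch recent data: the index name `main_timestamp_recent_index` and the cutoff date are made up for illustration and do not come from the question.

```sql
-- Hypothetical partial index: only rows at or after an assumed cutoff are indexed.
-- The cutoff literal is an illustrative assumption; choose one that matches your queries.
CREATE INDEX main_timestamp_recent_index
    ON main ("timestamp")
    WHERE "timestamp" >= '2020-01-01 00:00:00';
```

The planner can only use this index when the query's WHERE clause implies the index predicate, so a query asking for rows before the cutoff would not benefit from it.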

If these indexes do not speed up the query significantly, then you should follow the advice in the other answer.
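One way to check whether the indexes are actually being used is to look at the execution plan. The statement below is a sketch based on the query from the question; the two timestamps and the device id stand in for X, Y and Z and are illustrative values only.

```sql
-- EXPLAIN ANALYZE really executes the query, so expect it to take as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT attribute_1, attribute_2
FROM main_type1
INNER JOIN main ON main_type1.main_id = main.id
WHERE main."timestamp" > '2020-01-01 00:00:00'   -- placeholder for X
  AND main."timestamp" < '2020-01-02 00:00:00'   -- placeholder for Y
  AND main.device_id = 4563;                     -- placeholder for Z
```

If the plan still shows sequential scans on "main" or "main_type1", the indexes are not being chosen for this query, and it may be worth running ANALYZE on both tables before drawing conclusions.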

Answer 2

My suggested approach is: first, create the indexes proposed by Renzo. If that does not improve performance enough, try partitioning.

From the documentation:

Partitioning can provide several benefits: Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. (...)

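If you use partitioning, all queries that refer to a specific device (like the one in your question) will be much faster. Only queries that span many device_ids (for example, those containing aggregates) may be slower.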
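A minimal sketch of partitioning by device, assuming PostgreSQL 11 or later with declarative partitioning: the table name `main_partitioned` and the column types are assumptions, while the partition names follow the pattern from the question.

```sql
-- Hypothetical partitioned variant of "main"; column types are assumed.
CREATE TABLE main_partitioned (
    id          bigint    NOT NULL,
    device_id   integer   NOT NULL,
    msg_type    integer   NOT NULL,
    rawdata     text,
    "timestamp" timestamp NOT NULL,
    -- A primary key on a partitioned table must include the partition key column.
    PRIMARY KEY (device_id, id)
) PARTITION BY LIST (device_id);

-- One partition per device, named as in the question.
CREATE TABLE main_device_id_4563
    PARTITION OF main_partitioned FOR VALUES IN (4563);
CREATE TABLE main_device_id_92338
    PARTITION OF main_partitioned FOR VALUES IN (92338);

-- An index created on the parent is created on every partition as well.
CREATE INDEX main_partitioned_timestamp_index
    ON main_partitioned ("timestamp");
```

With this layout a query that filters on a single device_id only has to scan that device's partition. Two caveats: because "id" alone can no longer be declared unique, a foreign key from "main_type1" would have to reference (device_id, id), which needs PostgreSQL 12 or later; and with a very large number of devices, one partition per device can become unwieldy, in which case `PARTITION BY HASH (device_id)` with a fixed number of partitions may be an alternative worth testing.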

