How should I properly index MySQL columns when sorting?

Question

I have a log table, but I've found that queries against it become very slow when I sort the results.

Here is a brief outline of my table structure.

CREATE TABLE `webhook_logs` (
  `ID` bigint(20) UNSIGNED NOT NULL,
  `event_id` bigint(20) UNSIGNED DEFAULT NULL,
  `object_id` bigint(20) UNSIGNED DEFAULT NULL,
  `occurred_at` bigint(20) UNSIGNED DEFAULT NULL,
  `payload` text COLLATE utf8mb4_unicode_520_ci,
  `priority` bigint(1) UNSIGNED DEFAULT NULL,
  `status` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

ALTER TABLE `webhook_logs`
  ADD PRIMARY KEY (`ID`),
  ADD KEY `event_id` (`event_id`),
  ADD KEY `object_id` (`object_id`),
  ADD KEY `occurred_at` (`occurred_at`),
  ADD KEY `priority` (`priority`),
  ADD KEY `status` (`status`);

It has 5M+ records.

When I run

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 ORDER BY priority ASC LIMIT 100

it takes about 5 seconds to fetch the records.

But when I drop the sort and just run

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 LIMIT 100

it takes only 0.0022 seconds.

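I have been experimenting with indexes to see whether the time improves, but without success. I'd like to know what I'm doing wrong here.

I tried creating a composite index on occurred_at and priority, and another on all three of occurred_at, priority, and status. Neither improved the speed; the query still takes around 5 seconds. In case it helps, the server is running MySQL 5.7.12.

Any help would be appreciated. Thanks.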

Answer 1

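An index alone will not solve your problem. In your query, the database must first find all records with occurred_at < 1652838913000 and then sort them to get the ones with the highest priority. No index can help reduce that sort.

There is a workaround, though, because priority only ever takes a few values. You can create an index on (status, priority, occurred_at) and then write a query like this: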

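SELECT * FROM (
  (SELECT * FROM `webhook_logs` WHERE status = 'pending' AND priority = 1 AND occurred_at < 1652838913000 LIMIT 100)
  UNION ALL  -- branches differ in priority and cannot overlap, so no de-duplication is needed
  (SELECT * FROM `webhook_logs` WHERE status = 'pending' AND priority = 2 AND occurred_at < 1652838913000 LIMIT 100)
  -- add one branch like the above for each remaining priority value
) a ORDER BY priority ASC LIMIT 100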

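In this query, the database uses the index for each subquery of the union and then sorts only a small number of rows. The result comes back in under 0.1 seconds.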
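The index this answer relies on could be created roughly as follows. This is a minimal sketch: the index name idx_status_priority_occurred is hypothetical and not from the original post.

```sql
-- Composite index: equality columns (status, priority) first,
-- then the range column (occurred_at) last.
-- The index name is an assumption; pick any name you like.
ALTER TABLE `webhook_logs`
  ADD INDEX `idx_status_priority_occurred` (`status`, `priority`, `occurred_at`);
```

With the columns in this order, each branch of the union fixes status and priority with equality and then reads a contiguous range of occurred_at values, so it can stop after 100 index entries.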

Answer 2

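You don't need BIGINT for most of those columns. That datatype takes 8 bytes, and there are smaller datatypes. priority could be TINYINT UNSIGNED (1 byte, range 0..255). status could be changed to a 1-byte ENUM. Such changes shrink the data and index sizes, which speeds up most operations somewhat.

Replace INDEX(status) with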
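As a rough sketch, the datatype changes might look like the statement below. The ENUM value list is an assumption: only 'pending' appears in the question, and the other values are hypothetical placeholders that would need to match the statuses actually stored in the table.

```sql
-- Shrink priority from 8 bytes to 1 byte, and status from VARCHAR(32) to a 1-byte ENUM.
-- ASSUMPTION: every value except 'pending' is a made-up placeholder.
-- Every status value already stored must appear in the list, otherwise the
-- ALTER fails (strict SQL mode) or converts the value to '' (non-strict mode).
ALTER TABLE `webhook_logs`
  MODIFY `priority` TINYINT UNSIGNED DEFAULT NULL,
  MODIFY `status` ENUM('pending', 'processing', 'done', 'failed') DEFAULT NULL;
```

On a 5M-row table this rebuilds the table, so it is probably worth trying on a copy first.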

INDEX(status, occurred_at, priority, id)  -- in this order

Your query will then run somewhat faster, depending on the distribution of the data.

This may run faster still:
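One way to make that swap, as a sketch: the existing single-column index is named status in the question's table definition, while the new index name below is hypothetical.

```sql
-- Drop the single-column index on status and add the composite one
-- in a single ALTER. The new index name is an assumption.
ALTER TABLE `webhook_logs`
  DROP INDEX `status`,
  ADD INDEX `idx_status_occurred_priority_id` (`status`, `occurred_at`, `priority`, `ID`);
```

InnoDB secondary indexes already carry the primary key implicitly, so listing ID explicitly is harmless and mainly documents the intent.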

SELECT  w.*
    FROM  (
        SELECT id
            FROM `webhook_logs`
            WHERE  status = 'pending'
              AND  occurred_at < 1652838913000
            ORDER BY  priority ASC
            LIMIT  100
          ) AS t
    JOIN webhook_logs AS w  USING(id)   -- alias w is needed for SELECT w.*
    ORDER BY priority ASC    -- yes, this is repeated
    ;

This is because it can pick the 100 ids from my index much faster, since the index is "covering", and then do 100 lookups to fetch the "*".
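To check whether the inner query really is served from the index alone, one option is to run EXPLAIN on it. This sketch assumes the four-column index suggested above has already been created.

```sql
-- If the index covers the query, the Extra column of the EXPLAIN output
-- should include "Using index".
EXPLAIN
SELECT id
    FROM `webhook_logs`
    WHERE  status = 'pending'
      AND  occurred_at < 1652838913000
    ORDER BY  priority ASC
    LIMIT  100;
```

If Extra also shows "Using filesort", the sort is still happening, but it works on narrow index entries rather than full rows that include the payload column.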

