Big MySQL database takes time fetching data

Question

I have a large messages database with about 2.4 million rows:

Showing rows 0 - 24 (2455455 total, Query took 0.0006 seconds.)

I need conversations to load faster. For users with fewer conversations the query is quick (this user has 3.2k conversations):

 Showing rows 0 - 24 (3266 total, Query took 0.0345 seconds.) [id: 5009666... - 4375619...]

For users with many conversations it loads much more slowly (this user has 40k conversations):

 Showing rows 0 - 24 (40296 total, Query took 5.1763 seconds.) [id: 5021561... - 5015545...]

I have index keys on these columns:

id, to_id, from_id, time, seen

Table structure:

CREATE TABLE `messages` (
  `id` int(255) NOT NULL,
  `to_id` int(20) NOT NULL,
  `from_id` int(20) NOT NULL,
  `message` longtext NOT NULL,
  `time` double NOT NULL,
  `seen` int(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;



INSERT INTO `messages` (`id`, `to_id`, `from_id`, `message`, `time`, `seen`) VALUES
(2, 6001, 2, 'Hi there', 1587581995.5222, 1);


ALTER TABLE `messages`
  ADD PRIMARY KEY (`id`),
  ADD KEY `time_idx` (`time`),
  ADD KEY `from_idx` (`from_id`),
  ADD KEY `to_idx` (`to_id`),
  ADD KEY `seenx` (`seen`),
  ADD KEY `idx` (`id`);


ALTER TABLE `messages`
  MODIFY `id` int(255) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=5021570;
COMMIT;

I am using this query:

SELECT
  *
FROM
  messages,
  (
    SELECT
      MAX(id) as lastid
    FROM
      messages
    WHERE
      (
        messages.to_id = '1' -- ID to compare with (the logged-in user's ID)
        OR messages.from_id = '1' -- ID to compare with (the logged-in user's ID)
      )
    GROUP BY
      CONCAT(
        LEAST(messages.to_id, messages.from_id),
        '.',
        GREATEST(messages.to_id, messages.from_id)
      )
  ) as conversations
WHERE
  id = conversations.lastid
ORDER BY
  messages.id DESC

I don't know how to make this faster for users with many conversations, or whether I should redesign the database structure.

Answer 1

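Well, maybe you could try adding an index to your table: https://www.drupal.org/docs/7/guidelines-for-sql/the-benefits-of-indexing-large-mysql-tables#:~:text=Creating%20Indexes&text=The%20statement%20to%20create%20index,the%20index%20must%20be%20distinct. Make sure to add a composite index on the columns you are querying.

If that does not improve your query time, then the query itself should be improved.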
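
One way to see whether an index is being used at all is to prefix the query with EXPLAIN. This is only a sketch against the inner query from the question, with the user id 1 taken from there; the plan you get depends on your MySQL version and data, so no output is shown.

```sql
-- Ask the optimizer how it would run the per-conversation subquery.
-- In the result, `key` names the index chosen (NULL means none),
-- `rows` is the estimated number of rows examined, and `Extra` may
-- show "Using temporary; Using filesort" for an expensive GROUP BY.
EXPLAIN
SELECT MAX(id) AS lastid
FROM messages
WHERE to_id = 1 OR from_id = 1
GROUP BY CONCAT(
           LEAST(to_id, from_id),
           '.',
           GREATEST(to_id, from_id)
         );
```

Running the same EXPLAIN before and after an index change shows whether the change had any effect on the plan.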

Answer 2

You could also partition the messages table by time.

Partitioning is a way in which a database (MySQL in this case) splits its actual data down into separate tables, but still get treated as a single table by the SQL layer. When partitioning in MySQL, it's a good idea to find a natural partition key

https://www.percona.com/blog/2017/07/27/what-is-mysql-partitioning/#:~:text=So%2C%20What%20is%20MySQL%20Partitioning,find%20a%20natural%20partition%20key.

Answer 3

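Notes:

Use UNION instead of OR (see below).
There are redundant keys. A PRIMARY KEY is a key, so toss KEY(id).
Don't blindly index every column; instead, use your queries to determine which indexes, especially composite ones, would actually be useful.
The CONCAT is unnecessary and may be counterproductive in GROUP BY and ORDER BY.
The length field on INT is ignored. You are limited to values of about 2 billion. (Which is overkill for seen, which is presumably only 0 or 1?)
Use the newer syntax: JOIN..ON.
If seen is just true/false, toss the index on it. (Or show me a query that you think would benefit from it.)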
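
The schema-related notes above could be applied roughly like this. It is a sketch that assumes seen only ever holds 0 or 1 and that no query of yours relies on the seen index; the index names are the ones from the question's ALTER TABLE.

```sql
-- Sketch only: check your own queries before dropping any index.
ALTER TABLE `messages`
  DROP INDEX `idx`,                         -- duplicates PRIMARY KEY (`id`)
  DROP INDEX `seenx`,                       -- index on a two-valued flag
  MODIFY `seen` TINYINT UNSIGNED NOT NULL;  -- 1 byte instead of INT's 4
```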

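CONCAT-LEAST-GREATEST: is this meant to construct a "friends_id"? Perhaps what you really want is a "conversation_id"? Currently, two users can never have more than one "conversation", correct?

If it is really needed, create a new column for conversation_id. (Currently, the GROUP BY is inefficient.) The code below avoids the need for such an id.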

( SELECT lastid FROM (
    ( SELECT from_id, MAX(id) AS lastid FROM messages
           WHERE to_id = ? GROUP BY from_id )
    UNION DISTINCT
    ( SELECT to_id,   MAX(id) AS lastid FROM messages 
           WHERE from_id = ? GROUP BY to_id )
                     ) AS x
) AS conversations

And have these 'covering' and 'composite' indexes:

INDEX(to_id, from_id, id)
INDEX(from_id, to_id, id)

And toss KEY(to_id) and KEY(from_id), since my new indexes handle anything else those were doing.
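
Spelled out as DDL, the index change might look like the following. The names to_from_id and from_to_id are made up for this sketch; to_idx and from_idx are the single-column keys from the question.

```sql
-- Replace the two single-column keys with composite, covering indexes.
-- Each subquery (WHERE to_id = ? GROUP BY from_id, and the mirror image)
-- can then be answered from the index alone.
ALTER TABLE `messages`
  DROP INDEX `to_idx`,
  DROP INDEX `from_idx`,
  ADD INDEX `to_from_id` (`to_id`, `from_id`, `id`),
  ADD INDEX `from_to_id` (`from_id`, `to_id`, `id`);
```

InnoDB appends the primary key to every secondary index anyway, so listing id explicitly costs nothing and makes the intent visible.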

I think this has the same effect, but will run a lot faster.

Putting it all together:

SELECT  *
    FROM (
            ( SELECT from_id AS other_id,
                     MAX(id) AS lastid
                  FROM messages
                  WHERE to_id = ? GROUP BY from_id )
            UNION ALL
            ( SELECT to_id AS other_id,
                     MAX(id) AS lastid
                  FROM messages 
                  WHERE from_id = ? GROUP BY to_id )
         ) AS latest
    JOIN  messages  ON messages.id = latest.lastid
    ORDER BY  messages.id DESC

(plus the two indexes)
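
To try the query from the mysql client, the ? placeholders can be bound through a server-side prepared statement. A minimal sketch; the statement name latest_messages and the variable @me are illustrative, and the user id 1 comes from the question.

```sql
-- `latest_messages` and `@me` are illustrative names, not from the answer.
PREPARE latest_messages FROM '
  SELECT *
  FROM (
      ( SELECT from_id AS other_id, MAX(id) AS lastid
          FROM messages WHERE to_id = ? GROUP BY from_id )
      UNION ALL
      ( SELECT to_id AS other_id, MAX(id) AS lastid
          FROM messages WHERE from_id = ? GROUP BY to_id )
  ) AS latest
  JOIN messages ON messages.id = latest.lastid
  ORDER BY messages.id DESC';

SET @me = 1;                             -- the logged-in user's id
EXECUTE latest_messages USING @me, @me;  -- one value per ? placeholder
DEALLOCATE PREPARE latest_messages;
```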

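More

I was thinking (mistakenly) that UNION DISTINCT would replace the need for a conversation_id. But it does not. Offhand, I see a few solutions:

Add a conversation_id and use it for de-duplication. (Meanwhile, I changed UNION DISTINCT to UNION ALL, which makes the query a little faster without changing the results.)
Put the output of my query, as (from_id, to_id, latestid), into a temp table; then use the CONCAT-LEAST-GREATEST trick to de-dup the conversations; finally JOIN back to messages to get the rest of the columns.
The temp table technique makes it easier to write and debug. My third suggestion is to simply piece the parts together into a single (hard-to-read) query with SELECTs nested 3 deep.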
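
Here is a rough sketch of the temp table route and of the nested single query. One substitution to be aware of: instead of the CONCAT-LEAST-GREATEST trick it de-dups with GROUP BY other_id, which should be equivalent here because one side of every pair is the logged-in user. The names latest_per_direction, per_direction and @me are made up for the example.

```sql
SET @me = 1;  -- logged-in user's id (value taken from the question)

-- Variant A: temporary table, easy to inspect step by step.
CREATE TEMPORARY TABLE latest_per_direction (
  other_id INT NOT NULL,
  lastid   INT NOT NULL
);

INSERT INTO latest_per_direction
  SELECT from_id, MAX(id) FROM messages WHERE to_id = @me GROUP BY from_id;
INSERT INTO latest_per_direction
  SELECT to_id, MAX(id) FROM messages WHERE from_id = @me GROUP BY to_id;

-- A conversation partner can appear twice (once per direction);
-- keep only the newer of the two ids, then fetch the full rows.
SELECT m.*
FROM ( SELECT other_id, MAX(lastid) AS lastid
         FROM latest_per_direction
        GROUP BY other_id ) AS latest
JOIN messages AS m ON m.id = latest.lastid
ORDER BY m.id DESC;

DROP TEMPORARY TABLE latest_per_direction;

-- Variant B: the same logic as one query, SELECTs nested three deep.
SELECT m.*
FROM ( SELECT other_id, MAX(lastid) AS lastid
         FROM ( ( SELECT from_id AS other_id, MAX(id) AS lastid
                    FROM messages WHERE to_id = @me GROUP BY from_id )
                UNION ALL
                ( SELECT to_id AS other_id, MAX(id) AS lastid
                    FROM messages WHERE from_id = @me GROUP BY to_id )
              ) AS per_direction
        GROUP BY other_id ) AS latest
JOIN messages AS m ON m.id = latest.lastid
ORDER BY m.id DESC;
```

Both variants return one row per conversation partner, newest first. Variant A lets you SELECT from the temp table between steps while debugging.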
