Optimizing SQL subquery through a LEFT JOIN

Question

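I want to insert all records from the actualEntries table into the uniqueEntries table, but only for actualEntries User_IDs that do not already exist in uniqueEntries.

I started with an SQL statement containing a NOT IN subquery, which was very slow (when processing 400K records). I then turned it into a LEFT JOIN, but the speed did not improve.

Here is the original SQL statement with the NOT IN subquery: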

INSERT INTO uniqueEntries 
  SELECT * 
  FROM actualEntries 
  WHERE actualEntries.User_ID NOT IN (
    SELECT uniqueEntries.User_ID 
    FROM uniqueEntries
  )
  GROUP BY User_ID

Here is the SQL statement after converting it to a LEFT JOIN:

INSERT INTO uniqueEntries 
  SELECT actualEntries.* 
  FROM actualEntries 
  LEFT JOIN uniqueEntries 
  ON uniqueEntries.User_ID = actualEntries.User_ID 
  WHERE uniqueEntries.User_ID IS NULL 
  GROUP BY User_ID

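When I run the queries on 50 records they finish immediately, but when I run them on 400K records they never finish.

What is the fastest way to accomplish this?

Update/Solution: Following @Rahul, @Steve E and @fhthiella, I updated the LEFT JOIN as follows and reduced the processing time for 470K records to 2 minutes.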

INSERT INTO uniqueEntries 
  SELECT actualEntries.* 
  FROM actualEntries 
  LEFT JOIN uniqueEntries 
  ON uniqueEntries.id = actualEntries.id 
  WHERE uniqueEntries.User_ID IS NULL 
  GROUP BY User_ID

Answer 1

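First, remove the GROUP BY User_ID clause, since it is not needed at all. Also, you should have an index on the User_ID column in both uniqueEntries and actualEntries, because you are using it as the join column. With that, your query should look like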

INSERT INTO uniqueEntries 
  SELECT actualEntries.* 
  FROM actualEntries 
  LEFT JOIN uniqueEntries 
  ON uniqueEntries.User_ID = actualEntries.User_ID 
  WHERE uniqueEntries.User_ID IS NULL
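
One way to check whether the join actually benefits from an index is to run only the SELECT part under EXPLAIN. This is a minimal sketch, assuming an index on uniqueEntries.User_ID already exists; the aliases ae and ue are introduced here for readability and are not part of the original query.

```sql
-- Show the execution plan for the SELECT part only; nothing is inserted.
EXPLAIN
SELECT ae.*
FROM actualEntries AS ae
LEFT JOIN uniqueEntries AS ue
  ON ue.User_ID = ae.User_ID
WHERE ue.User_ID IS NULL;
```

In the output, the row for uniqueEntries would be expected to name an index in the key column. A NULL key together with type ALL suggests that the table is being scanned without an index.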

Answer 2

Put a unique key or a primary key on uniqueEntries.User_ID. Then

INSERT IGNORE INTO uniqueEntries 
  SELECT actualEntries.* 
  FROM actualEntries

The IGNORE clause makes MySQL skip errors that occur during the insert. The manual says:

If you use the IGNORE keyword, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors may generate warnings instead, although duplicate-key errors do not.
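
Put together, the two steps might look like the sketch below. It assumes that uniqueEntries currently holds no duplicate User_ID values (otherwise adding the unique key fails) and that both tables have the same column layout. The key name uq_ue_user_id is a made-up name for illustration.

```sql
-- Enforce one row per User_ID; this fails if duplicates already exist.
ALTER TABLE uniqueEntries ADD UNIQUE KEY uq_ue_user_id (User_ID);

-- Rows whose User_ID is already present are skipped without an error.
INSERT IGNORE INTO uniqueEntries
  SELECT actualEntries.*
  FROM actualEntries;

-- IGNORE also turns some other errors into warnings, so inspect them.
SHOW WARNINGS;
```

Because IGNORE downgrades more than duplicate-key errors (data conversion problems, for example), checking the warnings afterwards is a cheap safeguard.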

Answer 3

You should add indexes on the uniqueEntries.User_ID and actualEntries.User_ID columns:

ALTER TABLE uniqueEntries ADD INDEX idx_ue_id (User_ID);
ALTER TABLE actualEntries ADD INDEX idx_ae_id (User_ID);

This should make the join faster. I also see that you are selecting all of the table's columns:

SELECT actualEntries.*

but you are grouping by User_ID:

GROUP BY User_ID

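I assume you do this because there may be multiple rows per User_ID. MySQL allows this, but note that if there are multiple rows, your query will keep only one of them, and the values of the non-grouped columns will be indeterminate (they can come from any row in the group).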
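
If exactly one predictable row per User_ID is wanted, one option is to pick that row explicitly instead of relying on GROUP BY. The sketch below keeps the row with the smallest id for each User_ID. It assumes actualEntries has an id column that is unique per row (the question's update joins on id, but its uniqueness is an assumption here). The aliases ae, ue and firstRow and the column name min_id are illustrative.

```sql
INSERT INTO uniqueEntries
  SELECT ae.*
  FROM actualEntries AS ae
  -- Keep only the row with the lowest id for each User_ID.
  JOIN (
    SELECT User_ID, MIN(id) AS min_id
    FROM actualEntries
    GROUP BY User_ID
  ) AS firstRow
    ON firstRow.min_id = ae.id
  -- Anti-join: skip User_IDs that are already in uniqueEntries.
  LEFT JOIN uniqueEntries AS ue
    ON ue.User_ID = ae.User_ID
  WHERE ue.User_ID IS NULL;
```

This form also runs when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5), which rejects SELECT * combined with GROUP BY User_ID.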

mysql

join

subquery

sql-insert

notin