Google Cloud SQL: selecting rows that don't exist in another table

Question

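I need to audit some data, but I've hit a roadblock: my queries never return. What could I do differently here? I really don't understand why these queries aren't returning (executed from MySQL Workbench), even after letting them run for several hours. Is my instance under-provisioned?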

I have an n1-standard-4 Google Cloud SQL instance (4 CPUs, 15 GB of memory). The two tables are shown below. Indexes also exist on customer_id in the tables. Table 2 has 885,481 rows and Table 1 has 1,891,653 rows.

I have tried three variations of a query to find the customer IDs in Table 1 that do not exist in Table 2 (where they are stored as account_group_id).

The one I expected to be the most efficient, and to actually return:

SELECT customer_id
FROM Table1 AS a
WHERE NOT EXISTS (
    SELECT account_group_id
    FROM Table2 AS b
    WHERE b.account_group_id = a.customer_id
)

As a NOT IN subquery:

SELECT customer_id
FROM Table1
WHERE customer_id NOT IN (
    SELECT account_group_id
    FROM Table2
)

As a left join:

SELECT customer_id
FROM Table1 as a
LEFT OUTER JOIN Table2 as b ON a.customer_id = b.account_group_id
WHERE b.account_group_id IS NULL

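Edit: After some tinkering, and after actually using EXPLAIN as I should have done before posting my question, I found that the Table2 subquery is, for some reason, performing a full table scan. I have tried this query in my test/staging environment, which has the same index schema, and I see an index seek there. Now I am even more confused.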

Even when I add a FORCE INDEX hint, the query optimizer refuses to use the primary key.
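
For readers unfamiliar with the hint syntax, a forced-index version of the NOT EXISTS query might look like the sketch below. It assumes that account_group_id is the primary key of Table2, which the post does not confirm because the table definitions are missing; if Table2 has no primary key, MySQL rejects the hint with an error.

```sql
-- Sketch only: assumes account_group_id is the PRIMARY KEY of Table2.
-- EXPLAIN shows the chosen plan without running the query itself.
EXPLAIN
SELECT a.customer_id
FROM Table1 AS a
WHERE NOT EXISTS (
    SELECT 1
    FROM Table2 AS b FORCE INDEX (PRIMARY)
    WHERE b.account_group_id = a.customer_id
);
```

In the EXPLAIN output, a `type` of `ALL` on the Table2 row indicates a full table scan, while `ref` or `eq_ref` indicates an index lookup.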

This is what the query plan looks like in my staging environment:

Any ideas on why this is happening?

Table1:

Table 2:

Answer 1

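Two things:

1. Make sure Table 2 has an index on account_group_id. Otherwise every lookup requires a full table scan, which is inefficient.
2. The subquery option is the better choice, rather than the OUTER JOIN, because the join multiplies the rows of the two tables and produces a horrible (and seemingly never-ending!) result set.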
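
One way to check the first point before creating anything is to list the indexes that cover the column. This is a minimal sketch using the table and column names from the question; the actual index names in the asker's schema are unknown.

```sql
-- List every index on Table2 that includes account_group_id.
-- An empty result means no index covers the column.
SHOW INDEX FROM Table2 WHERE Column_name = 'account_group_id';
```

If a row comes back with `Seq_in_index` equal to 1, the column is the leading column of that index, which is what an equality lookup on account_group_id needs.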

In case the index does not exist:

-- Create the index in case it does not exist yet
CREATE INDEX T2_agi ON Table2(account_group_id);

-- Customers in Table1 with no matching account_group_id in Table2
SELECT customer_id
FROM Table1 AS a
WHERE customer_id NOT IN (
    SELECT account_group_id
    FROM Table2
);
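
One caveat with NOT IN is worth illustrating: if the subquery returns even a single NULL, the NOT IN predicate never evaluates to true and the outer query returns no rows. Whether account_group_id is nullable in the real Table2 is not stated in the post (it cannot be if it is the primary key), so the sketch below uses two hypothetical miniature tables, t1_demo and t2_demo, purely to show the difference.

```sql
-- Hypothetical miniature tables, used only to illustrate NULL handling.
CREATE TEMPORARY TABLE t1_demo (customer_id INT);
CREATE TEMPORARY TABLE t2_demo (account_group_id INT NULL);

INSERT INTO t1_demo VALUES (1), (2), (3);
INSERT INTO t2_demo VALUES (1), (NULL);

-- NOT IN: returns no rows, because comparing 2 or 3 with NULL
-- yields UNKNOWN rather than TRUE.
SELECT customer_id
FROM t1_demo
WHERE customer_id NOT IN (SELECT account_group_id FROM t2_demo);

-- NOT EXISTS: returns customer_id 2 and 3.
SELECT a.customer_id
FROM t1_demo AS a
WHERE NOT EXISTS (
    SELECT 1
    FROM t2_demo AS b
    WHERE b.account_group_id = a.customer_id
);
```

If the column can contain NULLs, adding `WHERE account_group_id IS NOT NULL` to the NOT IN subquery makes the two forms return the same rows.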

Answer 2

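Well, after a lot of tinkering, I completely restructured the query to get the stupid optimizer to use the index I wanted... it must have something to do with the size of the tables: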

SELECT a.customer_id
FROM Table1 AS a
WHERE a.customer_id NOT IN (
    SELECT b.customer_id
    FROM Table1 AS b
    JOIN (SELECT account_group_id FROM Table2) AS x ON x.account_group_id = b.customer_id
)
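
To confirm that a rewrite like this really changes the plan, the same EXPLAIN check used in the question can be applied to it. This is only a sketch with the names from the post; the plan it produces depends on the MySQL version and on the actual indexes, neither of which is given.

```sql
-- Show the plan for the restructured query without executing it.
EXPLAIN
SELECT a.customer_id
FROM Table1 AS a
WHERE a.customer_id NOT IN (
    SELECT b.customer_id
    FROM Table1 AS b
    JOIN (SELECT account_group_id FROM Table2) AS x
        ON x.account_group_id = b.customer_id
);
```

Comparing the `type` and `key` columns for the Table2 row against the plan of the original query shows whether the full table scan has been replaced by an index lookup.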

mysql

sql

google-cloud-sql