Simple WHERE EXISTS ... ORDER BY ... query very slow in PostgreSQL

Question

I have this very simple query, generated by my ORM (Entity Framework Core):

SELECT *
 FROM "table1" AS "t1"
 WHERE EXISTS (
     SELECT 1
     FROM "table2" AS "t2"
     WHERE ("t2"."is_active" = TRUE) AND ("t1"."table2_id" = "t2"."id"))
 ORDER BY "t1"."table2_id"

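There are 2 "is_active" records. The other columns involved ("id") are primary keys. The query returns exactly 4 rows.
Table1 has 96 million records.
Table2 has 30 million records.
All 3 columns involved in this query are indexed (is_active, id, table2_id).
The C#/LINQ code that generates this simple query is: Table2.Where(t => t.IsActive).Include(t => t.Table1).ToList();
SET STATISTICS was set to 10000 for each of the 3 columns.
VACUUM FULL ANALYZE was run on both tables.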

Without the ORDER BY clause, the query returns within a few milliseconds, which is what I would expect for 4 records. EXPLAIN output:

Nested Loop  (cost=1.13..13.42 rows=103961024 width=121)
  ->  Index Scan using table2_is_active_idx on table2  (cost=0.56..4.58 rows=1 width=8)
        Index Cond: (is_active = true)
        Filter: is_active
  ->  Index Scan using table1_table2_id_fkey on table1 t1 (cost=0.57..8.74 rows=10 width=121)
        Index Cond: (table2_id = table1.id)

With the ORDER BY clause, the query takes 5 minutes to complete! EXPLAIN output:

Merge Semi Join  (cost=10.95..4822984.67 rows=103961040 width=121)
  Merge Cond: (t1.table2_id = t2.id)
  ->  Index Scan using table1_table2_id_fkey on table1 t1  (cost=0.57..4563070.61 rows=103961040 width=121)
  ->  Sort  (cost=4.59..4.59 rows=2 width=8)
        Sort Key: t2.id
        ->  Index Scan using table2_is_active_idx on table2 a  (cost=0.56..4.58 rows=2 width=8)
              Index Cond: (is_active = true)
              Filter: is_active

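The inner index scan on table2 should return no more than 2 rows. The outer index scan on table1 then makes no sense, with its cost of 4563070 and 103961040 rows. It only has to match 2 rows in table2 with 4 rows in table1!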

This is a very simple query with very few records to return. Why does Postgres fail to execute it properly?

Answer 1

Add an index:

CREATE INDEX _index 
ON table2 
USING btree (id) 
WHERE is_active IS TRUE;

And rewrite the query like this:

SELECT table1.*
FROM table2
INNER JOIN table1 ON (table1.table2_id = table2.id)
WHERE table2.is_active IS TRUE 
ORDER BY table2.id

Keep in mind that PostgreSQL treats "is_active IS TRUE" and "is_active = TRUE" differently, so the expression in the index predicate and the expression in the query must match.

If you cannot rewrite the query, try adding this index instead:

CREATE INDEX _index 
ON table2 
USING btree (id) 
WHERE is_active = TRUE;
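
One way to check whether the planner actually picks up a partial index is to look at the plan after creating it. The statements below are a minimal sketch, assuming one of the indexes above has been created; note that EXPLAIN with the ANALYZE option really executes the query, so on the slow plan it can take as long as the query itself.

```sql
-- Refresh planner statistics so the new index is taken into account.
ANALYZE table2;

-- Show the chosen plan together with actual row counts and timings.
-- Caution: the ANALYZE option runs the query for real.
EXPLAIN (ANALYZE, BUFFERS)
SELECT table1.*
FROM table2
INNER JOIN table1 ON (table1.table2_id = table2.id)
WHERE table2.is_active = TRUE
ORDER BY table2.id;
```

If the partial index is used, its name should appear in an Index Scan or Index Only Scan node on table2.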

Answer 2

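OK, I solved my problem in the most unexpected way. I upgraded PostgreSQL from 9.6.1 to 9.6.3, and that was it. After restarting the service, the EXPLAIN plan looks good and the query runs just fine. I did not change anything else: no new index, nothing. The only explanation I can think of is a query planner bug in 9.6.1 that was fixed by 9.6.3. Thank you all for your answers!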

Answer 3

Your guess is correct: there was a bug in Postgres 9.6.1 that matches your use case exactly. Upgrading was the right thing to do. Upgrading to the latest point-release is always the right thing to do.

Quoting the release notes for Postgres 9.6.2:

Fix foreign-key-based join selectivity estimation for semi-joins and anti-joins, as well as inheritance cases (Tom Lane)

The new code for taking the existence of a foreign key relationship into account did the wrong thing in these cases, making the estimates worse not better than the pre-9.6 code.
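
To see which point release a server is actually running (and so whether it predates the 9.6.2 fix quoted above), either of these standard commands can be used; a minimal sketch:

```sql
-- Short version string of the running server.
SHOW server_version;

-- Full build string, including platform and compiler.
SELECT version();
```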

You should still create a partial index like that. But keep it simple:

is_active = TRUE and is_active IS TRUE differ subtly: the second returns FALSE instead of NULL for NULL input. But none of that matters in a WHERE clause, where only TRUE qualifies. Both expressions are just noise. In Postgres, you can use boolean values directly:

CREATE INDEX t2_id_idx ON table2 (id) WHERE is_active;  -- that's all
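
The NULL-handling difference between the two expressions can be checked with a throwaway query that needs no tables. This is only an illustration; the column aliases are made up for the example.

```sql
-- Compare "= TRUE" and "IS TRUE" for every possible boolean input.
SELECT v           AS input,
       v = TRUE    AS equals_true,  -- NULL when v is NULL
       v IS TRUE   AS is_true       -- FALSE when v is NULL
FROM  (VALUES (TRUE), (FALSE), (NULL::boolean)) AS t(v);
```

Only the NULL row differs between the two result columns, and in a WHERE clause both NULL and FALSE exclude the row, so the filter result is the same.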

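And do not rewrite your query with a LEFT JOIN. That would add rows of NULL values to the result for "active" rows in table2 that have no siblings in table1. To match your current logic, it has to be an [INNER] JOIN: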

SELECT t1.*
FROM   table2 t2
JOIN   table1 t1 ON t1.table2_id = t2.id  -- and no parentheses needed
WHERE  t2.is_active  -- that's all
ORDER  BY t1.table2_id;
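
The difference between the two join types can be seen on a tiny data set. The temporary tables demo_t1 and demo_t2 below are hypothetical stand-ins for table1 and table2, made up for this sketch:

```sql
-- Throwaway demo tables (hypothetical names, not the real schema).
CREATE TEMP TABLE demo_t2 (id int PRIMARY KEY, is_active boolean);
CREATE TEMP TABLE demo_t1 (id int PRIMARY KEY, table2_id int REFERENCES demo_t2);

INSERT INTO demo_t2 VALUES (1, true), (2, true), (3, false);
INSERT INTO demo_t1 VALUES (10, 1), (11, 1);  -- active row 2 has no siblings

-- LEFT JOIN: 3 rows, one of them all NULL (for active row 2).
SELECT t1.*
FROM   demo_t2 t2
LEFT   JOIN demo_t1 t1 ON t1.table2_id = t2.id
WHERE  t2.is_active;

-- INNER JOIN: 2 rows, matching the EXISTS logic.
SELECT t1.*
FROM   demo_t2 t2
JOIN   demo_t1 t1 ON t1.table2_id = t2.id
WHERE  t2.is_active;
```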

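But there is no need to rewrite your query that way at all. The EXISTS semi-join you have is just as good. It results in the same query plan once you have the partial index.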

SELECT *
FROM   table1 t1
WHERE  EXISTS (
   SELECT 1 FROM table2
   WHERE  is_active  -- that's all
   AND    id = t1.table2_id
   )
ORDER  BY table2_id;

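By the way, since you fixed the bug by upgrading, once you have created that partial index (and run ANALYZE or VACUUM ANALYZE on the table at least once, or autovacuum has done that for you), you will never get a bad query plan for this again, because Postgres maintains separate estimates for the partial index, and those are unambiguous for your numbers.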
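
The estimates Postgres keeps per relation, indexes included, can be inspected in the pg_class system catalog. A minimal sketch, assuming the partial index was created under the name t2_id_idx as shown earlier:

```sql
-- Make sure statistics exist for the table and its indexes.
ANALYZE table2;

-- reltuples is the planner's row estimate; for the partial index it
-- should be close to the number of active rows, not the table size.
SELECT relname, reltuples, relpages
FROM   pg_class
WHERE  relname IN ('table2', 't2_id_idx');
```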
