Planner 不使用索引顺序使用 CTE 对记录进行排序

Question

我正在尝试将一些 ID 传递到按条件排序的已排序索引的子句中，但查询计划程序在执行索引搜索后显式地对数据进行排序。以下是我的查询。

生成一个临时文件table.

SELECT a.n/20 as n, md5(a.n::TEXT) as b INTO temp_table 
From generate_series(1, 100000) as a(n);

创建索引

CREATE INDEX idx_temp_table ON temp_table(n ASC, b ASC);

在下面的查询中，规划器使用索引排序并且没有显式地对数据进行排序。（预期）
```
EXPLAIN ANALYSE
SELECT * from 
temp_table WHERE n = 10
ORDER BY  n, b
limit 5;
```

查询计划

QUERY PLAN Limit  (cost=0.42..16.07 rows=5 width=36) (actual time=0.098..0.101 rows=5 loops=1)   
          ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..1565.17 rows=500 width=36) (actual time=0.095..0.098 rows=5 loops=1)
            Index Cond: (n = 10)
            Heap Fetches: 5 Planning time: 0.551 ms Execution time: 0.128 ms

但是当我使用来自 cte 的一个或多个 id 并将它们传递到子句中时，planner 仅使用索引来获取值但之后显式地对它们进行排序（不是预期的）。
```
EXPLAIN ANALYSE
WITH cte(x) AS (VALUES (10))
SELECT * from temp_table 
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;
```
然后规划器使用下面的查询计划

查询计划

QUERY PLAN
Limit  (cost=85.18..85.20 rows=5 width=37) (actual time=0.073..0.075 rows=5 loops=1)
  CTE cte
    ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)
  ->  Sort  (cost=85.16..85.26 rows=40 width=37) (actual time=0.072..0.073 rows=5 loops=1)
        Sort Key: temp_table.n, temp_table.b
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Nested Loop  (cost=0.47..84.50 rows=40 width=37) (actual time=0.037..0.056 rows=40 loops=1)
              ->  Unique  (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
                    ->  Sort  (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
                          Sort Key: cte.x
                          Sort Method: quicksort  Memory: 25kB
                          ->  CTE Scan on cte  (cost=0.00..0.04 rows=2 width=4) (actual time=0.004..0.005 rows=2 loops=1)
              ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..42.02 rows=20 width=37) (actual time=0.012..0.018 rows=20 loops=2)
                    Index Cond: (n = cte.x)
                    Heap Fetches: 40
Planning time: 0.166 ms
Execution time: 0.101 ms

我尝试在 where 子句中传递 id 时进行显式排序，以便保持 id 中的排序顺序，但计划程序仍然显式排序
```
EXPLAIN ANALYSE
WITH cte(x) AS (VALUES (10))
SELECT * from temp_table 
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;
```

查询计划

QUERY PLAN
Limit  (cost=42.62..42.63 rows=5 width=37) (actual time=0.042..0.044 rows=5 loops=1)
  CTE cte
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
  ->  Sort  (cost=42.61..42.66 rows=20 width=37) (actual time=0.042..0.042 rows=5 loops=1)
        Sort Key: temp_table.n, temp_table.b
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Nested Loop  (cost=0.46..42.28 rows=20 width=37) (actual time=0.025..0.033 rows=20 loops=1)
              ->  HashAggregate  (cost=0.05..0.06 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                    Group Key: cte.x
                    ->  Sort  (cost=0.03..0.04 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
                          Sort Key: cte.x
                          Sort Method: quicksort  Memory: 25kB
                          ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
              ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..42.02 rows=20 width=37) (actual time=0.014..0.020 rows=20 loops=1)
                    Index Cond: (n = cte.x)
                    Heap Fetches: 20
Planning time: 0.167 ms
Execution time: 0.074 ms

任何人都可以解释为什么规划器对数据使用显式排序吗？有没有办法绕过这个并让计划者使用索引排序顺序，以便可以保存对记录的额外排序。在生产中，我们有类似的情况，但我们的选择规模太大，但只有少数记录需要分页获取。感谢期待！

Answer 1

优化器不知道 CTE 已排序。如果您为多个值扫描一个索引并且有一个 ORDER BY，PostgreSQL 将始终排序。

我唯一想到的是用 IN 列表中的值创建一个临时 table 并在该临时 table 上放置一个索引。然后，当您加入那个 table 时，PostgreSQL 会知道顺序，并且可能会选择一个可以使用索引的合并联接。

当然这意味着大量的开销，并且很可能是原始排序胜出。

Answer 2

其实是planner做的决定，有更大的集合values()，Postgres会切换到一个更聪明的plan，排序在merge之前完成。

select version();

\echo +++++ Original

EXPLAIN ANALYSE
WITH cte(x) AS (VALUES (10))
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

\echo +++++ TEN Values
EXPLAIN ANALYSE
WITH cte(x) AS (VALUES (10),(11),(12),(13),(14),(15),(16),(17),(18),(19)
        )
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

\echo ++++++++ one row from table

EXPLAIN ANALYSE
WITH cte(x) AS (SELECT n FROM temp_table WHERE n = 10)
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

\echo ++++++++ one row from table TWO ctes

EXPLAIN ANALYSE
WITH val(x) AS (VALUES (10))
,  cte(x) AS (
        SELECT n FROM temp_table WHERE n IN (select x from val)
        )
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

生成的计划：

                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 11.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)

+++++ Original
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.72..13.73 rows=5 width=37) (actual time=0.197..0.200 rows=5 loops=1)
   CTE cte
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
   ->  Sort  (cost=13.71..13.76 rows=20 width=37) (actual time=0.194..0.194 rows=5 loops=1)
         Sort Key: temp_table.n, temp_table.b
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.44..13.37 rows=20 width=37) (actual time=0.083..0.097 rows=20 loops=1)
               ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
                     Group Key: cte.x
                     ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
               ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..13.14 rows=20 width=37) (actual time=0.058..0.068 rows=20 loops=1)
                     Index Cond: (n = cte.x)
                     Heap Fetches: 20
 Planning Time: 1.328 ms
 Execution Time: 0.360 ms
(15 rows)

+++++ TEN Values
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.91..89.11 rows=5 width=37) (actual time=0.179..0.183 rows=5 loops=1)
   CTE cte
     ->  Values Scan on "*VALUES*"  (cost=0.00..0.12 rows=10 width=4) (actual time=0.001..0.007 rows=10 loops=1)
   ->  Merge Semi Join  (cost=0.78..3528.72 rows=200 width=37) (actual time=0.178..0.181 rows=5 loops=1)
         Merge Cond: (temp_table.n = cte.x)
         ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.030..0.123 rows=204 loops=1)
               Heap Fetches: 204
         ->  Sort  (cost=0.37..0.39 rows=10 width=4) (actual time=0.023..0.023 rows=1 loops=1)
               Sort Key: cte.x
               Sort Method: quicksort  Memory: 25kB
               ->  CTE Scan on cte  (cost=0.00..0.20 rows=10 width=4) (actual time=0.003..0.013 rows=10 loops=1)
 Planning Time: 0.197 ms
 Execution Time: 0.226 ms
(13 rows)
++++++++ one row from table
                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14.39..58.52 rows=5 width=37) (actual time=0.168..0.173 rows=5 loops=1)
   CTE cte
     ->  Index Only Scan using idx_temp_table on temp_table temp_table_1  (cost=0.42..13.14 rows=20 width=4) (actual time=0.010..0.020 rows=20 loops=1)
           Index Cond: (n = 10)
           Heap Fetches: 20
   ->  Merge Semi Join  (cost=1.25..3531.24 rows=400 width=37) (actual time=0.167..0.170 rows=5 loops=1)
         Merge Cond: (temp_table.n = cte.x)
         ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.025..0.101 rows=204 loops=1)
               Heap Fetches: 204
         ->  Sort  (cost=0.83..0.88 rows=20 width=4) (actual time=0.039..0.039 rows=1 loops=1)
               Sort Key: cte.x
               Sort Method: quicksort  Memory: 25kB
               ->  CTE Scan on cte  (cost=0.00..0.40 rows=20 width=4) (actual time=0.012..0.031 rows=20 loops=1)
 Planning Time: 0.243 ms
 Execution Time: 0.211 ms
(15 rows)

++++++++ one row from table TWO ctes
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14.63..58.76 rows=5 width=37) (actual time=0.224..0.229 rows=5 loops=1)
   CTE val
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
   CTE cte
     ->  Nested Loop  (cost=0.44..13.37 rows=20 width=4) (actual time=0.038..0.052 rows=20 loops=1)
           ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                 Group Key: val.x
                 ->  CTE Scan on val  (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
           ->  Index Only Scan using idx_temp_table on temp_table temp_table_1  (cost=0.42..13.14 rows=20 width=4) (actual time=0.029..0.038 rows=20 loops=1)
                 Index Cond: (n = val.x)
                 Heap Fetches: 20
   ->  Merge Semi Join  (cost=1.25..3531.24 rows=400 width=37) (actual time=0.223..0.226 rows=5 loops=1)
         Merge Cond: (temp_table.n = cte.x)
         ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.038..0.114 rows=204 loops=1)
               Heap Fetches: 204
         ->  Sort  (cost=0.83..0.88 rows=20 width=4) (actual time=0.082..0.082 rows=1 loops=1)
               Sort Key: cte.x
               Sort Method: quicksort  Memory: 25kB
               ->  CTE Scan on cte  (cost=0.00..0.40 rows=20 width=4) (actual time=0.040..0.062 rows=20 loops=1)
 Planning Time: 0.362 ms
 Execution Time: 0.313 ms
(21 rows)

小心 CTE！

对于计划者来说，CTE 或多或少是黑匣子，并且对预期的行数、统计分布或内部排序知之甚少。

在 CTE 导致错误计划的情况下（原始问题不是这种情况），CTE 通常可以用（临时）视图替换，是被策划者看到了它完全赤裸的荣耀。

更新

从版本 11 开始，计划程序对 CTE 的处理方式有所不同：如果它们没有副作用，则可以与主查询合并。（但检查您的查询计划仍然是个好主意）

Planner 不使用索引顺序使用 CTE 对记录进行排序

Planner not using index order to sort the records using CTE

sorting

postgresql

query-optimization

common-table-expression

postgresql-9.6

小心 CTE！

更新