等效于 Postgres 11 中的 FETCH FIRST WITH TIES,具有相当的性能

Equivalent for FETCH FIRST WITH TIES in Postgres 11 with comparable performance

给定以下 DDL

CREATE TABLE numbers (
    id     BIGSERIAL PRIMARY KEY,
    number BIGINT
);
CREATE INDEX ON numbers (number);

在 PostgreSQL 13 中,我可以使用以下查询:

SELECT *
FROM numbers
WHERE number > 1
ORDER BY number
FETCH FIRST 1000 ROWS WITH TIES

它生成了一个非常有效的查询计划并且在处理大型表时表现良好:

Limit  (cost=...)                                                   
  ->  Index Scan using numbers_number_idx on numbers  (cost=...) 
        Index Cond: (number > 1)  

是否有 PostgreSQL 11 中的等效项为大型 (10TB+) 表提供类似的查询计划和可比较的性能?

This answer 建议使用以下查询:

WITH cte AS (
    SELECT *, RANK() OVER (ORDER BY number) AS rnk
    FROM numbers
    WHERE number > 1
)
SELECT *
FROM cte
WHERE rnk <= 1000;

但是对于大表来说它并不能真正使用,因为性能要差很多倍。

Subquery Scan on cte  (cost=...)                                       
  Filter: (cte.rnk <= 1000)                                                                       
  ->  WindowAgg  (cost=...)                                            
        ->  Index Scan using numbers_number_idx on numbers  (cost=...) 
              Index Cond: (number > 1)

您将无法在 Postgres 13 或更高版本中获得极其方便的 WITH TIES 的性能。参见:

但总有选择...

更快 SQL 变体

对于大 tables 应该更快 因为它避免像 simple solution in the referenced answer.[=25= 那样对整个 table 进行排名]

WITH cte AS (
   SELECT *, row_number() OVER (ORDER  BY number, id) AS rn  -- ORDER must match
   FROM   numbers
   WHERE  number > 1
   ORDER  BY number, id  -- tiebreaker to make it deterministic
   LIMIT  1000
   )
TABLE cte
UNION ALL
(
SELECT n.*, 1000 + row_number() OVER (ORDER BY n.id) AS rn
FROM   numbers n, (SELECT number, id FROM cte WHERE rn = 1000) AS max
WHERE  n.number = max.number
AND    n.id > max.id
ORDER  BY n.id
);

id 是您 table 中的任何 UNIQUE NOT NULL 列,可以作为决胜局,通常是 PRIMARY KEY.

另外 进行排序,它具有(可能受欢迎的)副作用,即您会获得确定性排序的结果。

理想情况下,您在 (number, id) 上有一个多列 index。但它也将仅使用 (number) 上的现有索引。结果性能取决于基数和数据类型。
相关:

由于 CTE 查询算作 一个 命令,因此在并发写入负载下 没有竞争条件 。命令的所有部分在默认隔离级别 READ COMMITTED.

中看到相同的 table 快照

为了方便起见,我可能会将其包装成一个简单的 SQL 函数(或准备好的语句)并传递 LIMIT 给它。

替代PL/pgSQL

只有 运行 一个 SELECT 可能超过增加的开销。仅在 (number):

上与现有索引配合使用效果最佳
CREATE OR REPLACE FUNCTION foo(_bound int, _limit int, OUT _row public.numbers)
  RETURNS SETOF public.numbers
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct int := 0;
   _number int;  -- match type!
BEGIN
   FOR _row IN
      SELECT *
      FROM   public.numbers
      WHERE  number > _bound
      ORDER  BY number
   LOOP
      _ct := _ct + 1;
      CASE 
      WHEN _ct < _limit THEN
         -- do nothing
      WHEN _ct > _limit THEN
         EXIT WHEN _row.number > _number;
      WHEN _ct = _limit THEN
         _number := _row.number;   
      END CASE;

      RETURN NEXT;
   END LOOP;
END
$func$;

但它只针对一个查询。对于不同的查询变得乏味。

我用下面的 select 查询得到了很好的结果。没有其他答案那么快,但查询也简单了很多。偏移值必须比所需的行数少 1(1000 为 999),因为偏移从 0 开始。

-- query 1
select *
from numbers 
where number <= (
  select number 
  from numbers
  where number > 1
  order by number
  offset 999 limit 1
  ) 
  and number > 1

性能

测试 运行 问题中定义的 table 和以下插入语句。

INSERT INTO numbers (n) 
SELECT floor(random() * 100000000) 
FROM generate_series(1, 100000000);

我承认这不是很科学,因为这是在我微不足道的 windows 10 笔记本电脑和 运行ning 在 wsl 上的 postgresql 13 集群上完成的,所有默认电池电量设置。由于缓存中存在数据,我没有在 运行 秒之间重新启动服务器,也没有采取任何措施来阻止优化。

解释分析结果,其中查询 1 指的是上面的查询,查询 2 指的是问题中的简单查询,查询 3 指的是其他答案中的查询,查询 4 ​​是优雅的 with ties 变体。

query execution time factor slower
1 80.642 ms 63.6
2 142142.026 ms 112011
3 59.551 ms 46.9
4 1.269 ms 1

解释分析查询计划

-- query 1
 Gather  (cost=10965.26..1178206.10 rows=500000 width=16) (actual time=64.157..79.186 rows=1000 loops=1)
   Workers Planned: 2
   Params Evaluated: [=12=]
   Workers Launched: 2
   InitPlan 1 (returns [=12=])
     ->  Limit  (cost=27.66..27.69 rows=1 width=8) (actual time=63.613..63.615 rows=1 loops=1)
           ->  Index Only Scan using numbers_n_idx on numbers numbers_1  (cost=0.57..2712066.05 rows=100000085 width=8) (actual time=0.029..0.318 rows=1000 loops=1)
                 Index Cond: (number > 1)
                 Heap Fetches: 0
   ->  Parallel Bitmap Heap Scan on numbers  (cost=9937.57..1127178.41 rows=208333 width=16) (actual time=0.090..0.498 rows=333 loops=3)
         Recheck Cond: ((number <= [=12=]) AND (number > 1))
         Heap Blocks: exact=1000
         ->  Bitmap Index Scan on numbers_n_idx  (cost=0.00..9812.57 rows=500000 width=0) (actual time=0.135..0.135 rows=1000 loops=1)
               Index Cond: ((number <= [=12=]) AND (number > 1))
 Planning Time: 0.230 ms
 JIT:
   Functions: 18
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.936 ms, Inlining 44.863 ms, Optimization 12.045 ms, Emission 6.182 ms, Total 65.026 ms
 Execution Time: 80.642 ms
(20 rows)
-- query 2
 Subquery Scan on cte  (cost=8472029.70..22868689.85 rows=33333362 width=24) (actual time=40001.966..142074.323 rows=1000 loops=1)
   Filter: (cte.rnk <= 1000)
   Rows Removed by Filter: 99998999
   ->  WindowAgg  (cost=8472029.70..21618688.79 rows=100000085 width=24) (actual time=40001.964..129209.141 rows=99999999 loops=1)
         ->  Gather Merge  (cost=8472029.70..20118687.52 rows=100000085 width=16) (actual time=40001.921..75176.460 rows=99999999 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=8471029.68..8575196.43 rows=41666702 width=16) (actual time=39707.316..46741.888 rows=33333333 loops=3)
                     Sort Key: numbers.number
                     Sort Method: external merge  Disk: 856888kB
                     Worker 0:  Sort Method: external merge  Disk: 839696kB
                     Worker 1:  Sort Method: external merge  Disk: 847688kB
                     ->  Parallel Seq Scan on numbers  (cost=0.00..1061374.79 rows=41666702 width=16) (actual time=60.334..13404.824 rows=33333333 loops=3)
                           Filter: (number > 1)
                           Rows Removed by Filter: 0
 Planning Time: 0.179 ms
 JIT:
   Functions: 16
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.773 ms, Inlining 90.193 ms, Optimization 55.148 ms, Emission 34.709 ms, Total 181.823 ms
 Execution Time: 142142.026 ms
(21 rows)
-- query 3
 Append  (cost=2497.62..2628.54 rows=1005 width=24) (actual time=12.881..59.360 rows=1000 loops=1)
   CTE cte
     ->  Limit  (cost=1004.33..2497.62 rows=1000 width=24) (actual time=12.877..58.683 rows=1000 loops=1)
           ->  WindowAgg  (cost=1004.33..149329948.33 rows=100000085 width=24) (actual time=12.875..58.435 rows=1000 loops=1)
                 ->  Gather Merge  (cost=1004.33..147579946.84 rows=100000085 width=16) (actual time=12.858..57.988 rows=1001 loops=1)
                       Workers Planned: 2
                       Workers Launched: 2
                       ->  Incremental Sort  (cost=4.31..136036455.76 rows=41666702 width=16) (actual time=0.147..29.746 rows=366 loops=3)
                             Sort Key: numbers.number, numbers.id
                             Presorted Key: numbers.number
                             Full-sort Groups: 13  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                             Worker 0:  Full-sort Groups: 15  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                             Worker 1:  Full-sort Groups: 8  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                             ->  Parallel Index Scan using numbers_n_idx on numbers  (cost=0.57..134359882.50 rows=41666702 width=16) (actual time=0.042..29.576 rows=393 loops=3)
                                   Index Cond: (number > 1)
   ->  CTE Scan on cte  (cost=0.00..20.00 rows=1000 width=24) (actual time=12.880..14.618 rows=1000 loops=1)
   ->  WindowAgg  (cost=105.75..105.85 rows=5 width=24) (actual time=0.079..0.081 rows=0 loops=1)
         ->  Sort  (cost=105.75..105.76 rows=5 width=16) (actual time=0.077..0.079 rows=0 loops=1)
               Sort Key: n.id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.57..105.69 rows=5 width=16) (actual time=0.070..0.071 rows=0 loops=1)
                     ->  CTE Scan on cte cte_1  (cost=0.00..22.50 rows=5 width=16) (actual time=0.051..0.051 rows=1 loops=1)
                           Filter: (rn = 1000)
                           Rows Removed by Filter: 999
                     ->  Index Scan using numbers_n_idx on numbers n  (cost=0.57..16.63 rows=1 width=16) (actual time=0.015..0.016 rows=0 loops=1)
                           Index Cond: (number = cte_1.number)
                           Filter: (id > cte_1.id)
                           Rows Removed by Filter: 1
 Planning Time: 0.202 ms
 Execution Time: 59.551 ms
(30 rows)
-- query 4
Limit  (cost=0.57..1350.00 rows=1000 width=16) (actual time=0.013..1.113 rows=1000 loops=1)
   ->  Index Scan using numbers_n_idx on numbers  (cost=0.57..134943216.33 rows=100000085 width=16) (actual time=0.012..0.869 rows=1001 loops=1)
         Index Cond: (number > 1)
 Planning Time: 0.142 ms
 Execution Time: 1.269 ms
(5 rows)