Returning the value of the id column of every n-th row in PostgreSQL

Question

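I need to process the id column (primary key, integer) of a table with millions of records (about 25M). However, I am only interested in every n-th id.

Currently I use a simple approach: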

select id from big order by id;

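and then the client software (cursor-based) picks out and processes every n-th id.

I was wondering whether it would be more efficient to delegate the selection of every n-th id to PostgreSQL. I tried this: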

select id from 
    (select id, row_number() over (order by id) from big) _ 
    where row_number % 10000 = 0;

However, this approach is much slower:

                                                                       QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on _  (cost=0.44..1291886.20 rows=115413 width=4) (actual time=9.385..10731.994 rows=2308 loops=1)
   Filter: ((_.row_number % '10000'::bigint) = 0)
   Rows Removed by Filter: 23080220
   ->  WindowAgg  (cost=0.44..945648.28 rows=23082528 width=12) (actual time=0.107..9450.396 rows=23082528 loops=1)
         ->  Index Only Scan using big_pkey on big  (cost=0.44..599410.36 rows=23082528 width=4) (actual time=0.093..2403.921 rows=23082528 loops=1)
               Heap Fetches: 0
 Planning Time: 0.172 ms
 Execution Time: 10732.229 ms
(8 rows)

The simple query takes 2721.101 ms to execute (almost 4 times faster).

Question: is there a better way? (Using PostgreSQL 11)

Answer 1

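Are you taking every n-th ID for sampling purposes, or is there some other reason why it has to be the actual n-th row of the sorted result?

If all you need is a random sample, TABLESAMPLE is great. All you have to do is add a simple clause to your SELECT, plus a LIMIT if you need one.

Here is a recent question with more details: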

I need a function to select 88 random rows from a table (without duplicates)
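
As a rough sketch of what that could look like for the table in the question (the 0.01 percent rate and the seed 42 are illustrative assumptions, chosen to approximate 1 row in 10000):

```sql
-- BERNOULLI considers every row independently, so the sample is
-- evenly spread but the whole table is still scanned.
-- The argument is a percentage (0 to 100), not a fraction.
SELECT id
FROM big TABLESAMPLE BERNOULLI (0.01)
ORDER BY id;

-- SYSTEM samples whole pages instead of rows: usually faster,
-- but the ids come in clusters. REPEATABLE fixes the seed so the
-- same sample is returned while the table is unchanged.
SELECT id
FROM big TABLESAMPLE SYSTEM (0.01) REPEATABLE (42)
LIMIT 2000;
```

Note that neither variant returns exactly every 10000th id: the number of rows and their spacing vary from run to run, so this only fits if an approximate sample is acceptable.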

Answer 2

Create a cursor and fetch only every tenth row (this works for other intervals too):

BEGIN; -- must be in a transaction

DECLARE cc CURSOR FOR
   SELECT id FROM big ORDER BY id;

/* skip 9 rows */
MOVE 9 IN cc;

FETCH NEXT FROM cc;

Keep executing MOVE and FETCH in a loop until you run past the last row.
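
For the interval of 10000 used in the question, the whole session might look roughly like this. It is only a sketch: the loop itself has to be driven by the client, which repeats the MOVE/FETCH pair until FETCH returns no row.

```sql
BEGIN;  -- a cursor without WITH HOLD only lives inside a transaction

DECLARE cc CURSOR FOR
   SELECT id FROM big ORDER BY id;

-- First iteration: skip 9999 rows, then read the 10000th.
MOVE 9999 IN cc;
FETCH NEXT FROM cc;

-- Second iteration: skip the next 9999 rows, read the 20000th.
MOVE 9999 IN cc;
FETCH NEXT FROM cc;

-- ... the client repeats the pair above until FETCH returns zero rows ...

CLOSE cc;
COMMIT;
```

The server still walks the index over every row, but only one id per 10000 is sent to the client, and no window function has to be computed.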

postgresql

postgresql-11