Quick random row in PostgreSQL: why is the time of (floor(random()*N)) + (select * from a where id = const) 100 times less than select where id = random?

Question

I need to quickly select a random row from a PostgreSQL table. I have read "Best way to select random rows PostgreSQL" and "quick random row selection in Postgres".

The fastest approach I have read about so far is:

CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
SELECT myid  FROM mytable TABLESAMPLE SYSTEM_ROWS(1);

It takes 2 ms on average. But as noted in the comments, it is not "completely random".

I tried

SELECT id FROM a OFFSET floor(random()*3000000) LIMIT 1;

15-200 ms

The simplest idea is to select by id, since my ids are consecutive. But

select floor(random()*1000);     -- 2 ms
select * from a where id = 233;  -- 2 ms (and again 2 ms for other constants)

But

SELECT * FROM a WHERE id = floor(random()*1000)::integer;  -- 300 ms!!!

Why 300 and not 4? Is it possible to somehow reorder, hint, etc. to get it down to 4 ms?

Answer 1

This is because random() is defined as volatile, so Postgres evaluates it again for each row, effectively scanning all rows.
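
To see the volatility for yourself, a small sketch like the following may help. It only uses the system catalog pg_proc and generate_series(), both standard in PostgreSQL; the three-row series is just an illustrative stand-in for a real table.

```sql
-- provolatile = 'v' means the function is marked volatile,
-- so the planner may not treat its result as a constant.
SELECT proname, provolatile
FROM pg_proc
WHERE proname = 'random';

-- A volatile call is evaluated once per row: each of the three
-- rows gets its own random value.
SELECT g, random() AS r
FROM generate_series(1, 3) AS g;
```

A side effect of the per-row evaluation is that a filter like id = floor(random()*1000)::integer compares every row against a different value, so it can return zero rows, one row, or several.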

If you want to avoid that, "hide" it behind an (otherwise useless) sub-select:

SELECT * 
FROM a 
where id = (select trunc(random ()*1000)::integer);
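
One way to check the difference is to compare the two plans with EXPLAIN. This is a sketch that assumes table a has an index on id (for example a primary key); the exact plan output depends on the PostgreSQL version and table statistics, so none is shown here.

```sql
-- Volatile expression in WHERE: it has to be re-evaluated for every row,
-- so expect a sequential scan with the expression as a filter.
EXPLAIN
SELECT * FROM a WHERE id = floor(random()*1000)::integer;

-- Uncorrelated sub-select: evaluated once (shown as an InitPlan),
-- and its single result can then be used for an index lookup on id.
EXPLAIN
SELECT * FROM a WHERE id = (SELECT trunc(random()*1000)::integer);
```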

Answer 2

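The following strictly addresses the OP's follow-up question after @a-horse-with_no-name's answer: "Strangely, it becomes long without ::integer. Why is that?"

Because ::integer is a Postgres extension to the SQL-standard "select cast(number as integer)". The type returned by RANDOM() is double precision, and it remains double precision after the TRUNC() function is applied. How the value is displayed is determined by your system.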

In its general form, the construct val::data_type means "cast val to the specified data_type" (provided a valid cast function exists). If val is itself an expression, the form becomes (val)::data_type.
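
As an illustration, the two cast spellings can be compared side by side, and EXPLAIN can show why dropping the cast hurts. This is a sketch that assumes a.id is of type integer and indexed; with a double precision value on the right-hand side, Postgres has to compare id as double precision, which would typically prevent the integer index on id from being used.

```sql
-- Standard CAST syntax and the Postgres :: shorthand yield the same type.
SELECT pg_typeof(CAST(trunc(1.5::double precision) AS integer)) AS standard_form,
       pg_typeof(trunc(1.5::double precision)::integer)         AS postgres_form;

-- Without the cast the sub-select returns double precision; compare
-- this plan with the one obtained when ::integer is present.
EXPLAIN
SELECT * FROM a WHERE id = (SELECT trunc(random()*1000));
```
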
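The following shows step by step what a-horse-with-no-name's query is doing and indicates the data type at each step. The CTE is there strictly so that every step uses the same value, because each use of random() generates a different value.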

with gen as (select random() as n)
select n, pg_typeof(n)                          -- step 1: random value in [0, 1), double precision
     , n*1000, pg_typeof(n*1000)                -- step 2: scale to [0, 1000), still double precision
     , trunc(n*1000), pg_typeof(trunc(n*1000))  -- step 3: drop the fraction, whole values 0..999, still double precision
     , trunc(n*1000)::integer
     , pg_typeof(trunc(n*1000)::integer)        -- step 4: cast to integer, values 0..999
  from gen;

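By the way, the trunc() function (or floor()) is in fact needed above: in Postgres, casting double precision to integer rounds to the nearest integer instead of discarding the fractional digits, so without it the result could also be 1000.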
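
Before dropping trunc(), it is worth checking how the cast treats the fractional part on your server. A minimal check, using fixed values instead of random() so that the result is reproducible:

```sql
-- In PostgreSQL, casting double precision to integer rounds to the
-- nearest integer, whereas trunc() discards the fractional digits.
SELECT (999.7::double precision)::integer      AS cast_only,    -- 1000
       trunc(999.7::double precision)::integer AS trunc_first;  -- 999
```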

I hope this helps you understand what is going on.

postgresql

random-access
