What Postgres 13 index types support distance searches?

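Original question

We've had excellent results using K-NN searches with GiST indexes using gist_trgm_ops. Pure magic. I have other situations, with other data types such as timestamp, where a distance function would be very useful. If I didn't dream it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search for indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under the hood.

Just checked, and before 9.6 pg_am did have a lot more attributes than it does now.

Is there another way to figure out, with a catalog search, what options the various indexes support?

Catalogs

jjanes's answer inspired me to look more closely at the system information functions, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This article proved very useful for getting a handle on things: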

https://postgrespro.com/blog/pgsql/4161264

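I think the conclusion is "no, you can't easily figure out which data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and there are readily available index operator classes that add K-NN searching to a large range of common types. Happy to hear corrections to these conclusions, or to the details below.

Built-in distance support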

https://www.postgresql.org/docs/current/gist-builtin-opclasses.html

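From various parts of the docs, it sounds like there are distance (proximity, nearest-neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.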

box
circle
point
poly

B-tree operator classes

Not listed in the docs, but visible with this query:

select am.amname              as index_method,
       opc.opcname            as opclass_name,
       opc.opcintype::regtype as indexed_type,
       opc.opcdefault         as is_default
  from pg_am am
  join pg_opclass opc on opc.opcmethod = am.oid
 where am.amname = 'btree'
 order by 1, 2;

B-tree GiST distance support

https://www.postgresql.org/docs/current/btree-gist.html

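As I understand it, the btree_gist extension provides GiST operator classes that mimic B-tree behavior for ordinary scalar types. The docs say the <-> distance operator is supported for these native types: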

int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money

BRIN built-in operator classes

https://www.postgresql.org/docs/current/brin-builtin-opclasses.html

More than 70 are listed in the docs.

GIN built-in operator classes

https://www.postgresql.org/docs/12/gin-builtin-opclasses.html

array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops

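Alternative text options

https://www.postgresql.org/docs/current/indexes-opclass.html describes special operator classes that compare text character by character rather than by collation rules. Or so the docs say: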

text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops
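
For context, here is a minimal sketch of how one of these operator classes is typically used. The table and index names (customers_demo, customers_demo_name_pattern_idx) are hypothetical and only for illustration. These classes help with left-anchored pattern matching, not with distance ordering.

```sql
-- Hypothetical demo table; not part of the original post.
create table if not exists customers_demo (name text);

-- B-tree index that compares strings character by character,
-- ignoring the database collation.
create index if not exists customers_demo_name_pattern_idx
    on customers_demo (name text_pattern_ops);

-- A left-anchored LIKE pattern is the kind of predicate this index can serve.
select name
  from customers_demo
 where name like 'Smi%';
```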

pg_trgm

Beyond that, the bundled pg_trgm module includes operator classes for GIN and GiST, and the GiST version can optimize <->. I think this shows up for:

text

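Note: Postgres 14 modifies pg_trgm to let you adjust the "signature length" of index entries. Longer signatures are likely to be more precise; shorter signatures take less space on disk. If you have been using pg_trgm, it may be worth experimenting with the signature length in PG 14.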

https://www.postgresql.org/docs/current/pgtrgm.html
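
As a rough illustration of the kind of K-NN search discussed here, the sketch below follows the pattern shown in the pg_trgm docs. The table and index names (test_trgm, trgm_idx) are placeholders, and the search term is arbitrary.

```sql
create extension if not exists pg_trgm;

-- Placeholder table for illustration only.
create table if not exists test_trgm (t text);

-- GiST index using the trigram operator class.
create index if not exists trgm_idx
    on test_trgm using gist (t gist_trgm_ops);

-- Nearest-neighbor search: ordering by <-> lets the planner walk the
-- GiST index in distance order instead of sorting every row.
select t, t <-> 'word' as dist
  from test_trgm
 order by dist
 limit 10;
```

On versions that accept operator class parameters, the docs show the signature length being set as gist_trgm_ops(siglen=32) in the index definition.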

SP-GiST built-in operator classes

box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops

pg_operator search

Here is a search of pg_operator that starts from the <-> operator itself and looks for matches:

select oprnamespace::regnamespace::text  as schema_name,
       oprowner::regrole                 as owner,
       oprname                           as operator,
       
       oprleft::regtype                  as left,
       oprright::regtype                 as right,
       oprresult::regtype                as result,
       
       oprcom::regoperator              as commutator
              
 from pg_operator
where oprname = '<->'

order by 1

Output from one of our servers:

| schema_name | owner    | operator | left                        | right                       | result           | commutator                                                   |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions  | postgres | <->      | text                        | text                        | real             | <->(text,text)                                               |
| extensions  | postgres | <->      | money                       | money                       | money            | <->(money,money)                                             |
| extensions  | postgres | <->      | date                        | date                        | integer          | <->(date,date)                                               |
| extensions  | postgres | <->      | real                        | real                        | real             | <->(real,real)                                               |
| extensions  | postgres | <->      | double precision            | double precision            | double precision | <->(double precision,double precision)                       |
| extensions  | postgres | <->      | smallint                    | smallint                    | smallint         | <->(smallint,smallint)                                       |
| extensions  | postgres | <->      | integer                     | integer                     | integer          | <->(integer,integer)                                         |
| extensions  | postgres | <->      | bigint                      | bigint                      | bigint           | <->(bigint,bigint)                                           |
| extensions  | postgres | <->      | interval                    | interval                    | interval         | <->(interval,interval)                                       |
| extensions  | postgres | <->      | oid                         | oid                         | oid              | <->(oid,oid)                                                 |
| extensions  | postgres | <->      | time without time zone      | time without time zone      | interval         | <->(time without time zone,time without time zone)           |
| extensions  | postgres | <->      | timestamp without time zone | timestamp without time zone | interval         | <->(timestamp without time zone,timestamp without time zone) |
| extensions  | postgres | <->      | timestamp with time zone    | timestamp with time zone    | interval         | <->(timestamp with time zone,timestamp with time zone)       |
| pg_catalog  | postgres | <->      | box                         | box                         | double precision | <->(box,box)                                                 |
| pg_catalog  | postgres | <->      | path                        | path                        | double precision | <->(path,path)                                               |
| pg_catalog  | postgres | <->      | line                        | line                        | double precision | <->(line,line)                                               |
| pg_catalog  | postgres | <->      | lseg                        | lseg                        | double precision | <->(lseg,lseg)                                               |
| pg_catalog  | postgres | <->      | polygon                     | polygon                     | double precision | <->(polygon,polygon)                                         |
| pg_catalog  | postgres | <->      | circle                      | circle                      | double precision | <->(circle,circle)                                           |
| pg_catalog  | postgres | <->      | point                       | circle                      | double precision | <->(circle,point)                                            |
| pg_catalog  | postgres | <->      | circle                      | point                       | double precision | <->(point,circle)                                            |
| pg_catalog  | postgres | <->      | point                       | polygon                     | double precision | <->(polygon,point)                                           |
| pg_catalog  | postgres | <->      | polygon                     | point                       | double precision | <->(point,polygon)                                           |
| pg_catalog  | postgres | <->      | circle                      | polygon                     | double precision | <->(polygon,circle)                                          |
| pg_catalog  | postgres | <->      | polygon                     | circle                      | double precision | <->(circle,polygon)                                          |
| pg_catalog  | postgres | <->      | point                       | point                       | double precision | <->(point,point)                                             |
| pg_catalog  | postgres | <->      | box                         | line                        | double precision | <->(line,box)                                                |
| pg_catalog  | postgres | <->      | tsquery                     | tsquery                     | tsquery          | 0                                                            |
| pg_catalog  | postgres | <->      | line                        | box                         | double precision | <->(box,line)                                                |
| pg_catalog  | postgres | <->      | point                       | line                        | double precision | <->(line,point)                                              |
| pg_catalog  | postgres | <->      | line                        | point                       | double precision | <->(point,line)                                              |
| pg_catalog  | postgres | <->      | point                       | lseg                        | double precision | <->(lseg,point)                                              |
| pg_catalog  | postgres | <->      | lseg                        | point                       | double precision | <->(point,lseg)                                              |
| pg_catalog  | postgres | <->      | point                       | box                         | double precision | <->(box,point)                                               |
| pg_catalog  | postgres | <->      | box                         | point                       | double precision | <->(point,box)                                               |
| pg_catalog  | postgres | <->      | lseg                        | line                        | double precision | <->(line,lseg)                                               |
| pg_catalog  | postgres | <->      | line                        | lseg                        | double precision | <->(lseg,line)                                               |
| pg_catalog  | postgres | <->      | lseg                        | box                         | double precision | <->(box,lseg)                                                |
| pg_catalog  | postgres | <->      | box                         | lseg                        | double precision | <->(lseg,box)                                                |
| pg_catalog  | postgres | <->      | point                       | path                        | double precision | <->(path,point)                                              |
| pg_catalog  | postgres | <->      | path                        | point                       | double precision | <->(point,path)                                              |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
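
The listing above shows where <-> is defined, not whether any index can use it for ordering. One way to narrow that down, offered as a sketch rather than a definitive answer, is to look in pg_amop, where operators registered for ordering (as opposed to searching) carry amoppurpose = 'o'. The column aliases are my own.

```sql
-- Operators that an index access method can use to return rows
-- in distance order, grouped by access method and operator family.
select am.amname                   as index_method,
       opf.opfname                 as operator_family,
       amop.amoplefttype::regtype  as left_type,
       amop.amoprighttype::regtype as right_type,
       amop.amopopr::regoperator   as ordering_operator
  from pg_amop amop
  join pg_am am        on am.oid  = amop.amopmethod
  join pg_opfamily opf on opf.oid = amop.amopfamily
 where amop.amoppurpose = 'o'
 order by 1, 2, 3, 4;
```

The result depends on which extensions are installed in the current database, so running it before and after creating btree_gist or pg_trgm should show the difference.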

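Am I missing any index options worth knowing about?

Viewing live indexes

Here is a query that is longer than it should be, because I still find the catalogs confusing. It pulls the columns out of every user index and reports their more interesting properties. For a practical, short catalog search, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc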

with 
basic_details as (
select relnamespace::regnamespace::text     as schema_name,
       indrelid::regclass::text             as table_name,
       indexrelid::regclass::text           as index_name,
       unnest(indkey)                       as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
       generate_subscripts(indkey, 1) + 1   as column_position_in_index  -- 
                          
  from pg_index 
  join pg_class on pg_class.oid = pg_index.indrelid
),

enriched_details as (

  select basic_details.schema_name,
         basic_details.table_name,
         basic_details.index_name,
         basic_details.column_ordinal_position,
         basic_details.column_position_in_index,
                  
         columns.column_name,
         columns.udt_name     as column_type_name      
  
    from basic_details 
    
    join information_schema.columns as columns 
      on columns.table_schema     = basic_details.schema_name
     and columns.table_name       = basic_details.table_name
     and columns.ordinal_position = basic_details.column_ordinal_position
                     
    where schema_name not like 'pg_%'
  )
  
  select *,
        -- https://postgrespro.com/blog/pgsql/4161264
         coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
         coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false)       as supports_in_searches,
         coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false)         as supports_index_only_scans,
        
        
         (select indexdef 
             from pg_indexes 
            where pg_indexes.schemaname  = enriched_details.schema_name
              and pg_indexes.indexname   = enriched_details.index_name) as index_definition
  
     from enriched_details 
  
 order by supports_in_searches desc,
          schema_name,
          table_name,
          index_name
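
The comment in the query above mentions not getting WITH ORDINALITY to work. A shorter variant along those lines might look like the sketch below. It assumes that casting indkey to int2[] and unnesting it in a lateral join is acceptable, and it reports only the distance_orderable property; the aliases (k, attnum, pos) are my own.

```sql
select c.relnamespace::regnamespace::text as schema_name,
       i.indrelid::regclass::text         as table_name,
       i.indexrelid::regclass::text       as index_name,
       k.attnum                           as column_ordinal_position,  -- 0 for expression columns
       k.pos                              as column_position_in_index, -- 1-based
       coalesce(pg_index_column_has_property(i.indexrelid::regclass,
                                             k.pos::int,
                                             'distance_orderable'),
                false)                    as supports_knn_searches
  from pg_index i
  join pg_class c on c.oid = i.indexrelid
 cross join lateral unnest(i.indkey::int2[]) with ordinality as k(attnum, pos)
 where c.relnamespace::regnamespace::text not like 'pg\_%'
 order by 1, 2, 3, 5;
```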

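Timestamp types support KNN with a GiST index, using the <-> operator created by the btree_gist extension.

You can check whether a specific column of a specific index supports it, like this: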

select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
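
For completeness, here is a sketch of how such an index could be created and used. It assumes a database initialized by pgbench, where pgbench_history has a timestamp column named mtime; the reference timestamp is arbitrary.

```sql
create extension if not exists btree_gist;

-- GiST index on a plain timestamp column; btree_gist supplies the
-- operator class, including the <-> distance operator.
create index if not exists pgbench_history_mtime_idx
    on pgbench_history using gist (mtime);

-- The ten history rows closest in time to the given moment.
select tid, aid, delta, mtime
  from pgbench_history
 order by mtime <-> timestamp '2021-01-01 00:00:00'
 limit 10;
```

Running the query under EXPLAIN should show whether the planner picks an index scan with an "Order By" on the <-> expression.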

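As far as I can tell, this is the state of play as of PG 14:

  • GiST indexes may support nearest-neighbor (K-NN) proximity <-> searches, and have for a long time (since PostgreSQL 9.1).

  • SP-GiST added such support as of PG 12.

  • RUM indexes (not in core) also support K-NN.

In all cases, the support is implemented in the operator class: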

https://www.postgresql.org/docs/current/indexes-opclass.html

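That is what determines whether distance_orderable applies to a specific data type on a specific type of index. A few built-in geometric and text vector types work out of the box. Beyond that small set, many more types are supported through specific operator classes, such as: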

https://www.postgresql.org/docs/current/btree-gist.html

https://www.postgresql.org/docs/current/pgtrgm.html

For SP-GiST, far fewer types are supported than for GiST once you have installed btree_gist:

https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html

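Going by that table, text_ops and range_ops do not appear to support proximity searches, since no <-> ordering operator is listed for them. However, for tsrange and similar types, other tools may offer enough options.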