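What Postgres 13 index types support distance searches?
Original Question
We've had excellent results with K-NN searches using GiST indexes with gist_trgm_ops. Pure magic. I've run into other situations, with other data types such as timestamp, where a distance function would be quite useful. If I'm not imagining it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search for indexes by such a property. I think that what I'm after in this case is AMPROP_DISTANCE_ORDERABLE under the hood.
Just checked, and pg_am did have a lot more attributes before 9.6 than it does now.
Is there another way to figure out, with a catalog search, which options the various index types have?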
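Catalogs
jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This article proved very useful for making sense of things:
https://postgrespro.com/blog/pgsql/4161264
I think the conclusion is "no, you can't readily figure out which data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and there are readily available index operator classes that add K-NN searching to a large range of common types. Happy for corrections on these conclusions, or on the details below.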
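Built-in Distance Support
https://www.postgresql.org/docs/current/gist-builtin-opclasses.html
From various parts of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a few built-in geometric types: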
box
circle
point
poly
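As a minimal sketch of what this looks like in practice, here is a K-NN query against a default GiST index on a point column. The table places and its columns are hypothetical names made up for illustration; only the <-> operator and the GiST index method come from the discussion above.

```sql
-- Hypothetical table, for illustration only.
create table places (
    id       bigint generated always as identity primary key,
    name     text  not null,
    location point not null
);

-- No operator class is named, so the default GiST class for point is used.
create index places_location_gist on places using gist (location);

-- K-NN: the five rows closest to the origin, nearest first.
select id, name, location <-> point '(0,0)' as distance
from places
order by location <-> point '(0,0)'
limit 5;
```

Whether the planner actually chooses the index depends on table size and statistics, so it is worth confirming with EXPLAIN on real data.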
B-tree Operator Classes
Not listed in the docs, but visible with this query:
select am.amname AS index_method
, opc.opcname AS opclass_name
, opc.opcintype::regtype AS indexed_type
, opc.opcdefault AS is_default
from pg_am am
, pg_opclass opc
where opc.opcmethod = am.oid
and am.amname = 'btree'
order by 1,2;
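B-tree GiST Distance Support
https://www.postgresql.org/docs/current/btree-gist.html
I guess that B-tree is a special case of GiST, and that there's a GiST operator class to match each B-tree one. The docs say that the distance operator is supported for these native types: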
int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money
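To make the timestamp case from the question concrete, here is a sketch assuming the btree_gist extension can be installed and using a hypothetical table named events. The table, column, and index names are invented for the example.

```sql
-- btree_gist supplies GiST operator classes (and <->) for scalar types.
create extension if not exists btree_gist;

-- Hypothetical table, for illustration only.
create table events (
    id          bigint generated always as identity primary key,
    occurred_at timestamptz not null
);

create index events_occurred_at_gist on events using gist (occurred_at);

-- The ten events closest in time to a reference instant, before or after it.
select id, occurred_at
from events
order by occurred_at <-> timestamptz '2021-06-01 12:00:00+00'
limit 10;
```

If the extension lives in a dedicated schema (as in the operator listing further down, where it is in extensions), that schema needs to be on the search_path for the unqualified <-> to resolve.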
BRIN Built-in Operator Classes
https://www.postgresql.org/docs/current/brin-builtin-opclasses.html
There are over 70 listed in the docs.
GIN Built-in Operator Classes
https://www.postgresql.org/docs/12/gin-builtin-opclasses.html
array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops
Alternative Text Options
https://www.postgresql.org/docs/current/indexes-opclass.html
There are special operator classes for comparing text character by character, rather than by collation. Or so the docs say:
text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops
pg_trgm
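Beyond that, the included pg_trgm module provides operator classes for GIN and GiST, with the GiST version optimizing <->. I think this shows up as:
text
Note: Postgres 14 modifies pg_trgm to let you tune the "signature length" of index entries. Longer signatures may be more accurate; shorter signatures are smaller on disk. If you've been using pg_trgm, it may be worth experimenting with the signature length in PG 14.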
https://www.postgresql.org/docs/current/pgtrgm.html
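For reference, a trigram K-NN search of the kind described in the question might look like the sketch below. The customers table and its columns are hypothetical; gist_trgm_ops, the siglen parameter, and the <-> operator are the documented pg_trgm pieces.

```sql
create extension if not exists pg_trgm;

-- Hypothetical table, for illustration only.
create table customers (
    id   bigint generated always as identity primary key,
    name text not null
);

create index customers_name_trgm on customers using gist (name gist_trgm_ops);

-- Optional variant with an explicit signature length, as mentioned in the note above.
-- The value 32 is an arbitrary example, not a recommendation.
-- create index customers_name_trgm_sig on customers
--     using gist (name gist_trgm_ops (siglen = 32));

-- The ten names most similar to the search string; smaller distance = more similar.
select id, name, name <-> 'jon smith' as distance
from customers
order by name <-> 'jon smith'
limit 10;
```

Only the GiST operator class can serve the ORDER BY ... <-> form; the GIN class supports the similarity filters but not distance ordering.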
SP-GiST Built-in Operator Classes
box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops
pg_operator Search
Here's a search on pg_operator that looks for matches starting from the <-> operator itself:
select oprnamespace::regnamespace::text as schema_name,
oprowner::regrole as owner,
oprname as operator,
oprleft::regtype as left,
oprright::regtype as right,
oprresult::regtype as result,
oprcom::regoperator as commutator
from pg_operator
where oprname = '<->'
order by 1
Output from one of our servers:
| schema_name | owner | operator | left | right | result | commutator |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions | postgres | <-> | text | text | real | <->(text,text) |
| extensions | postgres | <-> | money | money | money | <->(money,money) |
| extensions | postgres | <-> | date | date | integer | <->(date,date) |
| extensions | postgres | <-> | real | real | real | <->(real,real) |
| extensions | postgres | <-> | double precision | double precision | double precision | <->(double precision,double precision) |
| extensions | postgres | <-> | smallint | smallint | smallint | <->(smallint,smallint) |
| extensions | postgres | <-> | integer | integer | integer | <->(integer,integer) |
| extensions | postgres | <-> | bigint | bigint | bigint | <->(bigint,bigint) |
| extensions | postgres | <-> | interval | interval | interval | <->(interval,interval) |
| extensions | postgres | <-> | oid | oid | oid | <->(oid,oid) |
| extensions | postgres | <-> | time without time zone | time without time zone | interval | <->(time without time zone,time without time zone) |
| extensions | postgres | <-> | timestamp without time zone | timestamp without time zone | interval | <->(timestamp without time zone,timestamp without time zone) |
| extensions | postgres | <-> | timestamp with time zone | timestamp with time zone | interval | <->(timestamp with time zone,timestamp with time zone) |
| pg_catalog | postgres | <-> | box | box | double precision | <->(box,box) |
| pg_catalog | postgres | <-> | path | path | double precision | <->(path,path) |
| pg_catalog | postgres | <-> | line | line | double precision | <->(line,line) |
| pg_catalog | postgres | <-> | lseg | lseg | double precision | <->(lseg,lseg) |
| pg_catalog | postgres | <-> | polygon | polygon | double precision | <->(polygon,polygon) |
| pg_catalog | postgres | <-> | circle | circle | double precision | <->(circle,circle) |
| pg_catalog | postgres | <-> | point | circle | double precision | <->(circle,point) |
| pg_catalog | postgres | <-> | circle | point | double precision | <->(point,circle) |
| pg_catalog | postgres | <-> | point | polygon | double precision | <->(polygon,point) |
| pg_catalog | postgres | <-> | polygon | point | double precision | <->(point,polygon) |
| pg_catalog | postgres | <-> | circle | polygon | double precision | <->(polygon,circle) |
| pg_catalog | postgres | <-> | polygon | circle | double precision | <->(circle,polygon) |
| pg_catalog | postgres | <-> | point | point | double precision | <->(point,point) |
| pg_catalog | postgres | <-> | box | line | double precision | <->(line,box) |
| pg_catalog | postgres | <-> | tsquery | tsquery | tsquery | 0 |
| pg_catalog | postgres | <-> | line | box | double precision | <->(box,line) |
| pg_catalog | postgres | <-> | point | line | double precision | <->(line,point) |
| pg_catalog | postgres | <-> | line | point | double precision | <->(point,line) |
| pg_catalog | postgres | <-> | point | lseg | double precision | <->(lseg,point) |
| pg_catalog | postgres | <-> | lseg | point | double precision | <->(point,lseg) |
| pg_catalog | postgres | <-> | point | box | double precision | <->(box,point) |
| pg_catalog | postgres | <-> | box | point | double precision | <->(point,box) |
| pg_catalog | postgres | <-> | lseg | line | double precision | <->(line,lseg) |
| pg_catalog | postgres | <-> | line | lseg | double precision | <->(lseg,line) |
| pg_catalog | postgres | <-> | lseg | box | double precision | <->(box,lseg) |
| pg_catalog | postgres | <-> | box | lseg | double precision | <->(lseg,box) |
| pg_catalog | postgres | <-> | point | path | double precision | <->(path,point) |
| pg_catalog | postgres | <-> | path | point | double precision | <->(point,path) |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
Am I missing any index options worth knowing about?
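One more catalog angle, offered as a sketch rather than a definitive answer: pg_amop records, for each operator family, whether an operator is used for searching or for ordering (amoppurpose = 'o'). Listing the ordering entries should show which access methods and type pairs have a distance-style operator wired into an operator family on a given server. The column aliases are my own.

```sql
-- Operators registered as ORDER BY (ordering) operators in any operator family.
select am.amname                   as index_method,
       opf.opfname                 as operator_family,
       amop.amoplefttype::regtype  as left_type,
       amop.amoprighttype::regtype as right_type,
       amop.amopopr::regoperator   as ordering_operator
from pg_amop amop
join pg_am       am  on am.oid  = amop.amopmethod
join pg_opfamily opf on opf.oid = amop.amopfamily
where amop.amoppurpose = 'o'
order by 1, 2, 3, 4;
```

The result depends on which extensions are installed, so it reflects one database rather than Postgres in general. Unlike the pg_operator search, it only shows operators that an index can actually use for ordering.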
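Checking Live Indexes
This query is longer than it should be, as I still find the catalogs confusing. It pulls the columns out of every user index and reports some of their more interesting properties. For a practical, short catalog search, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc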
with
basic_details as (
select relnamespace::regnamespace::text as schema_name,
indrelid::regclass::text as table_name,
indexrelid::regclass::text as index_name,
unnest(indkey) as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
generate_subscripts(indkey, 1) + 1 as column_position_in_index --
from pg_index
join pg_class on pg_class.oid = pg_index.indrelid
),
enriched_details as (
select basic_details.schema_name,
basic_details.table_name,
basic_details.index_name,
basic_details.column_ordinal_position,
basic_details.column_position_in_index,
columns.column_name,
columns.udt_name as column_type_name
from basic_details
join information_schema.columns as columns
on columns.table_schema = basic_details.schema_name
and columns.table_name = basic_details.table_name
and columns.ordinal_position = basic_details.column_ordinal_position
where schema_name not like 'pg_%'
)
select *,
-- https://postgrespro.com/blog/pgsql/4161264
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false) as supports_in_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false) as supports_index_only_scans,
(select indexdef
from pg_indexes
where pg_indexes.schemaname = enriched_details.schema_name
and pg_indexes.indexname = enriched_details.index_name) as index_definition
from enriched_details
order by supports_in_searches desc,
schema_name,
table_name,
index_name
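Timestamp types support KNN with GiST indexes using the <-> operator, which is created by the btree_gist extension.
You can check whether a specific column of a specific index supports it, like this: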
select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
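As far as I can tell, this is the state of play as of PG 14:
GiST indexes may support nearest-neighbor (K-NN) proximity <-> searches, and have done so since PostgreSQL 9.1.
SP-GiST added such support as of PG 12.
RUM indexes (not in core) also support K-NN.
In all cases, the support is implemented in the operator class:
https://www.postgresql.org/docs/current/indexes-opclass.html
That is what determines whether distance_orderable is true for a specific data type on a specific type of index. Some built-in geometric and text vector types work out of the box. Beyond that small set, many more types are supported through specific operator classes, such as:
https://www.postgresql.org/docs/current/btree-gist.html
https://www.postgresql.org/docs/current/pgtrgm.html
For SP-GiST, far fewer types are supported than for GiST once you've installed btree_gist:
https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html
It looks like text_ops and range_ops do not support proximity searches. However, for tsrange and the like, other tools may offer enough options.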
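A quick way to test the SP-GiST claims on a given server is to build throwaway indexes and ask pg_index_column_has_property directly. This is only a sketch: the table spgist_demo and its index names are hypothetical, and I'm assuming PG 12 or later.

```sql
-- Throwaway table, for illustration only.
create table spgist_demo (
    label    text,
    location point
);

-- Default SP-GiST operator classes: text_ops for text, quad_point_ops for point.
create index spgist_demo_label_idx    on spgist_demo using spgist (label);
create index spgist_demo_location_idx on spgist_demo using spgist (location);

-- Report distance_orderable for the first (only) column of each index.
select i.index_name,
       pg_index_column_has_property(i.index_name::regclass, 1, 'distance_orderable')
           as distance_orderable
from (values ('spgist_demo_label_idx'),
             ('spgist_demo_location_idx')) as i(index_name);

drop table spgist_demo;
```

If the summary above is right, I'd expect the point index to report true and the text index to report false.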