GIN index on smallint[] column not used or error "operator is not unique"

Question

create table test(
    id serial primary key,
    tagged smallint[]
);

There is a GIN index on the tagged column, using the _int2_ops operator class:

CREATE INDEX ix ON test USING GIN(tagged _int2_ops);

When I run this query:

select * from test
where tagged @> ARRAY[11]
order by id limit 100;

EXPLAIN ANALYZE shows:

Limit  (cost=0.43..19524.39 rows=100 width=36) (actual time=25024.124..25027.263 rows=100 loops=1)
  ->  Index Scan using test_pkey on test  (cost=0.43..508404.37 rows=2604 width=36) (actual time=25024.121..25027.251 rows=100 loops=1)
        Filter: ((tagged)::integer[] @> '{11}'::integer[])
        Rows Removed by Filter: 2399999
Planning time: 6.912 ms
Execution time: 25027.307 ms

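Note the cast in the Filter line. Why is the tagged column cast to type integer[]? I think that is why the GIN index is not used and the query runs slowly.

I tried WHERE tagged @> ARRAY[11]::smallint[] but got this error: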

operator is not unique: smallint[] @> smallint[]

If I do the same but declare the column as tagged int[] and create the index as

CREATE INDEX ix ON test USING GIN(tagged gin__int_ops);

then the query above uses the GIN index:

"->  Bitmap Index Scan on ix  (cost=0.00..1575.53 rows=2604 width=0) (actual time=382.840..382.840 rows=2604480 loops=1)"
"   Index Cond: (tagged @> '{11}'::integer[])"

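This is a bit faster than before, but it still takes 10 seconds on average, which is too slow. I would like to try smallint[] instead of int[]; maybe that will be faster...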

Answer 1

Solution

Most probably, the solution is to schema-qualify the operator:

SELECT *
FROM   test
WHERE  tagged OPERATOR(pg_catalog.@>) '{11}'::int2[]
ORDER  BY id
LIMIT  100;
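
One way to check whether the schema-qualified operator lets the planner consider the GIN index is to prefix the query with EXPLAIN. This is only a sketch: it assumes the index ix exists on tagged with the _int2_ops operator class, and the plan actually chosen depends on your data and cost estimates (with ORDER BY id LIMIT 100 the planner may still prefer the primary-key index).

```sql
-- Show the chosen plan without executing the query.
EXPLAIN
SELECT *
FROM   test
WHERE  tagged OPERATOR(pg_catalog.@>) '{11}'::int2[]
ORDER  BY id
LIMIT  100;
```

If the index is applicable, the condition on tagged should be able to appear as an Index Cond of a Bitmap Index Scan rather than as a Filter with a cast to integer[].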

Why?

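It is a problem of operator resolution (in combination with type resolution and cast context).

In standard Postgres, there is only a single candidate operator, anyarray @> anyarray, and that is the one you want.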

Your setup would work just fine if you had not installed the additional module intarray (my assumption), which provides another operator for integer[] @> integer[].
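
To confirm that assumption, you could look the extension up in the system catalogs. A minimal sketch (the alias ext_schema is just an illustrative name); an empty result means intarray is not installed as an extension in the current database:

```sql
-- List the intarray extension and the schema it was installed into, if any.
SELECT e.extname, n.nspname AS ext_schema
FROM   pg_extension e
JOIN   pg_namespace n ON n.oid = e.extnamespace
WHERE  e.extname = 'intarray';
```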

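Hence, another solution would be to use integer[] instead and have a GIN index with the gin__int_ops operator class. Or try an index with gist__int_ops (the intarray default for GiST). Either might be faster, but neither allows NULL elements in the arrays.
Or you could rename the intarray operator @> to disambiguate. (I would not do that. Upgrade and portability issues ensue.)

For expressions involving at least one operand of type integer[], Postgres knows which operator to pick: the intarray operator. But then the index is not applicable, because the intarray operator only operates on integer (int4), not int2. And indexes are strictly bound to operators: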

Can PostgreSQL index array columns?
PostgreSQL behavior in presence of two different type of indexes on the same column

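But for int2[] @> int2[], Postgres is unable to decide on the best operator. Both seem equally applicable. Since the default operator is provided in the pg_catalog schema and the intarray operator is provided in the public schema (by default, or wherever you installed the extension), you can help solve the conundrum by schema-qualifying the operator with the OPERATOR() construct. Related: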

Compare arrays for equality, ignoring order of elements
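
The OPERATOR() construct can be tried in isolation, without the table. A minimal sketch using literal arrays (the column alias contains_11 is just an illustrative name):

```sql
-- Explicitly pick the built-in anyarray @> anyarray operator from pg_catalog,
-- so no ambiguity with the intarray operator can arise.
SELECT '{11,12}'::int2[] OPERATOR(pg_catalog.@>) '{11}'::int2[] AS contains_11;  -- true
```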

The error message you get is a bit misleading. But if you look closely, there is an added HINT line that hints (tada!) in the right direction:

ERROR:  operator is not unique: smallint[] @> smallint[]
LINE 1: SELECT NULL::int2[] @> NULL::int2[]
                            ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

You can investigate the existing candidate operators for @> with:

SELECT o.oid, *, oprleft::regtype, oprright::regtype, n.nspname
FROM   pg_operator o
JOIN   pg_namespace n ON n.oid = o.oprnamespace
WHERE  oprname = '@>';

Another alternative would be to temporarily (!) set a different search_path, so that only the desired operator is found. In the same transaction:

SET LOCAL search_path = pg_catalog;
SELECT ...

But then you have to schema-qualify all tables in the query.
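
Spelled out for the query from the question, this could look as follows. It is a sketch that assumes the table test lives in the public schema and that intarray is not installed into pg_catalog:

```sql
BEGIN;

-- Only visible inside this transaction; reverts at COMMIT or ROLLBACK.
SET LOCAL search_path = pg_catalog;

SELECT *
FROM   public.test                 -- table must now be schema-qualified
WHERE  tagged @> '{11}'::int2[]    -- resolves to pg_catalog.@> only
ORDER  BY id
LIMIT  100;

COMMIT;
```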

About cast context:

Generate series of dates - using date type as input

You could change the castcontext of the cast int2 -> int4. But I strongly advise against it. Too many possible side effects:

Is there any way to cast postgresql 9.3 data type so that it can affect only one side
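
Inspecting the current cast context is harmless, unlike changing it. A read-only sketch against the pg_cast system catalog:

```sql
-- castcontext: 'i' = implicit, 'a' = assignment, 'e' = explicit only.
-- The built-in int2 -> int4 cast is implicit.
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM   pg_cast
WHERE  castsource = 'int2'::regtype
AND    casttarget = 'int4'::regtype;
```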
