Are JSONB indexes slower than native indexes?

Question

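I have a large table (30 million rows) with about 10 B-tree indexes on jsonb expressions.

When I build a query with only a few conditions, it is relatively fast.

When I add more conditions, especially ones that use a sparse jsonb index (for example, an integer between 0 and 1,000,000), query speed drops dramatically.

Are jsonb indexes slower than native indexes? Can I expect better performance if I switch to native columns instead of JSON?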

Table definition:

id  integer 
type    text    
data    jsonb   
company_index   ARRAY   
exchange_index  ARRAY   
eligible boolean

Example query:

SELECT id, data, type 
FROM collection.bundles    
WHERE ( (ARRAY['.X'] && bundles.exchange_index)  AND   
type IN ('discussion') AND  
( ((data->>'sentiment_score')::bigint > 0 AND 
(data->'display_tweet'->'stocktwit'->'id') IS NOT NULL) )  AND  
(  eligible = true  )  AND  
((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL )  
ORDER BY id DESC   
LIMIT 50

Output:

Limit  (cost=0.56..16197.56 rows=50 width=212) (actual time=31900.874..31900.874 rows=0 loops=1)
  Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
  I/O Timings: read=7644.206 write=7.294
  ->  Index Scan using bundles2_id_desc_idx on bundles  (cost=0.56..2401044.17 rows=7412 width=212) (actual time=31900.871..31900.871 rows=0 loops=1)
        Filter: (eligible AND ('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text) AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
        Rows Removed by Filter: 16093269
        Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
        I/O Timings: read=7644.206 write=7.294
Planning time: 0.366 ms
Execution time: 31900.909 ms

Note: every jsonb condition used in this query has a jsonb B-tree index. exchange_index and company_index have GIN indexes.
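
The actual index definitions are not shown. For orientation only, indexes of the kind described would typically be declared along the following lines. This is a sketch: the index names are hypothetical, and the expressions are assumed to match the ones in the query exactly, which an expression index needs in order to be considered at all.

```sql
-- Hypothetical names; the real definitions were not posted.
-- B-tree index on a jsonb expression (note the extra parentheses
-- that PostgreSQL requires around an indexed expression).
CREATE INDEX bundles_sentiment_score_example_idx
    ON collection.bundles (((data->>'sentiment_score')::bigint));

-- GIN index on an array column, usable by the && (overlap) operator.
CREATE INDEX bundles_exchange_index_example_idx
    ON collection.bundles USING gin (exchange_index);
```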

Update, after changing the query as Laurenz suggested:

Limit  (cost=150634.15..150634.27 rows=50 width=211) (actual time=15925.828..15925.828 rows=0 loops=1)
  Buffers: shared hit=1137490 read=680349 written=2
  I/O Timings: read=2896.702 write=0.038
  ->  Sort  (cost=150634.15..150652.53 rows=7352 width=211) (actual time=15925.827..15925.827 rows=0 loops=1)
        Sort Key: bundles.id DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=1137490 read=680349 written=2
        I/O Timings: read=2896.702 write=0.038
        ->  Bitmap Heap Scan on bundles  (cost=56666.15..150316.40 rows=7352 width=211) (actual time=15925.816..15925.816 rows=0 loops=1)
              Recheck Cond: (('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text))
              Filter: (eligible AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
              Rows Removed by Filter: 273230
              Heap Blocks: exact=175975
              Buffers: shared hit=1137490 read=680349 written=2
              I/O Timings: read=2896.702 write=0.038
              ->  BitmapAnd  (cost=56666.15..56666.15 rows=23817 width=0) (actual time=1895.890..1895.890 rows=0 loops=1)
                    Buffers: shared hit=37488 read=85559
                    I/O Timings: read=325.535
                    ->  Bitmap Index Scan on bundles2_exchange_index_ops_idx  (cost=0.00..6515.57 rows=863703 width=0) (actual time=218.690..218.690 rows=892669 loops=1)
                          Index Cond: ('{.X}'::text[] && exchange_index)
                          Buffers: shared hit=7 read=313
                          I/O Timings: read=1.458
                    ->  Bitmap Index Scan on bundles_eligible_idx  (cost=0.00..23561.74 rows=2476877 width=0) (actual time=436.719..436.719 rows=2569331 loops=1)
                          Index Cond: (eligible = true)
                          Buffers: shared hit=37473
                    ->  Bitmap Index Scan on bundles2_type_idx  (cost=0.00..26582.83 rows=2706276 width=0) (actual time=1052.267..1052.267 rows=2794517 loops=1)
                          Index Cond: (type = 'discussion'::text)
                          Buffers: shared hit=8 read=85246
                          I/O Timings: read=324.077
Planning time: 0.433 ms
Execution time: 15928.959 ms

Answer 1

Your fancy indexes aren't used at all, so the problem is not whether they are fast or slow.

There are several things at play here:

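Seeing dirtied and written pages in an index scan, I suspect that there are quite a lot of "dead tuples" in your table. When the index scan visits them and notices that they are dead, it "kills" those index entries so that subsequent index scans don't have to repeat that work.

If you repeat the query, you will probably notice that the number of blocks and the execution time go down.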
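
One way to check whether dead tuples are a plausible factor is to look at the table's statistics and, if needed, vacuum it by hand. This is a minimal sketch that assumes the table is `collection.bundles`, as in the question; the counters are estimates maintained by the statistics collector, not exact values.

```sql
-- Estimated live vs. dead tuples and the last (auto)vacuum times.
SELECT n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'collection'
  AND relname = 'bundles';

-- Manual cleanup; VERBOSE reports how many dead row versions were removed,
-- ANALYZE refreshes the planner statistics in the same pass.
VACUUM (VERBOSE, ANALYZE) collection.bundles;
```

A large `n_dead_tup` relative to `n_live_tup`, together with an old or empty `last_autovacuum`, would support the suspicion.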

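You can reduce that problem by running VACUUM on the table or by making sure that autovacuum processes the table often enough.

However, your main problem is that the LIMIT clause tempts PostgreSQL to use the following strategy: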

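Since you only want 50 result rows and there is an index on id, just examine the table rows in index order and discard all rows that don't match the complicated condition until you have 50 results.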

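Unfortunately, that went badly: according to the plan, it scanned 16093269 rows in index order, filtered out every one of them, and never found a match, let alone 50. The rows at the "high id" end of the table don't match the condition, and PostgreSQL doesn't know about that correlation.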
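
The cost estimates in the first plan illustrate why this strategy looked cheap to the planner. As a rough sketch, assuming the usual linear interpolation applied to a LIMIT node and taking the numbers straight from the EXPLAIN output in the question: the planner expected 7412 matching rows spread evenly through the index, so it priced the first 50 of them as 50/7412 of the full index scan.

```sql
-- startup cost + (total cost - startup cost) * rows wanted / rows estimated
SELECT round(0.56 + (2401044.17 - 0.56) * 50 / 7412, 2) AS estimated_limit_cost;
```

That works out to roughly 16197.56, the total cost shown on the Limit node, compared with about 150634 for the bitmap plan in the update. The estimate only holds if matching rows are evenly distributed in id order, which is not the case here.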

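The solution is to discourage PostgreSQL from going down that route. The simplest way would be to drop all indexes on id, but given the column's name that is probably not feasible.

The other way is to keep PostgreSQL from "seeing" the LIMIT clause when it plans the scan: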
```
SELECT id, data, type
FROM (SELECT id, data, type
      FROM collection.bundles
      WHERE /* all your complicated conditions */
      OFFSET 0) subquery
ORDER BY id DESC
LIMIT 50;
```
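
With the conditions from the question filled in, the rewrite could look like the sketch below. It is wrapped in `EXPLAIN (ANALYZE, BUFFERS)` so the resulting plan can be compared with the original one; the conditions are copied from the question's query and only reformatted.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, data, type
FROM (SELECT id, data, type
      FROM collection.bundles
      WHERE ARRAY['.X'] && exchange_index
        AND type IN ('discussion')
        AND (data->>'sentiment_score')::bigint > 0
        AND (data->'display_tweet'->'stocktwit'->'id') IS NOT NULL
        AND eligible = true
        AND ((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL
      OFFSET 0) subquery  -- OFFSET 0 keeps the subquery from being flattened
ORDER BY id DESC
LIMIT 50;
```

If the plan no longer starts with an index scan on the id index, the LIMIT is no longer driving the scan strategy.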

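Remark: You didn't show your index definitions, but it sounds like you have quite a lot of them, possibly too many. Indexes are expensive, so make sure that you only define those that give you a clear benefit.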
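
To judge which indexes earn their keep, the cumulative usage statistics can help. This is a sketch that again assumes the table is `collection.bundles`; `idx_scan` counts scans since the statistics were last reset, so a low number is a hint rather than proof that an index is unused.

```sql
-- Indexes on the table, least used first, with their on-disk size.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'collection'
  AND relname = 'bundles'
ORDER BY idx_scan, indexrelname;
```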

postgresql

postgresql-9.6