Query gets very slow when the ::jsonb ?& operator is used

I have the following SQL query, which runs quickly:

SELECT
   phone_number.id,
   phone_number.phone_number,
   phone_number.account_id,
   phone_number.used AS used,
   (now() AT TIME ZONE account.timezone)::time AS local_time
FROM
   phone_number
   INNER JOIN account
      ON account.id = phone_number.account_id
WHERE
   phone_number.used = false
   AND phone_number.account_id IN
   (
      SELECT
         phone_number.account_id
      FROM
         phone_number
      WHERE
         insert_timestamp < (now() - interval '10 hours')
   )
   AND (now() AT TIME ZONE account.timezone)::time BETWEEN
      CASE
         WHEN EXTRACT(DOW FROM now() AT TIME ZONE account.timezone) IN (6, 0)
         THEN '15:30'::time
         ELSE '17:30'::time
      END
      AND '22:10'::time
ORDER BY
   random()
LIMIT 1

But when I add the condition account.residence_details::jsonb ?& array['city', 'state', 'streetName'] to it, the full query becomes:

SELECT
   phone_number.id,
   phone_number.phone_number,
   phone_number.account_id,
   phone_number.used AS used,
   (now() AT TIME ZONE account.timezone)::time AS local_time
FROM
   phone_number
   INNER JOIN account
      ON account.id = phone_number.account_id
WHERE
   phone_number.used = false
   AND phone_number.account_id IN
   (
      SELECT
         phone_number.account_id
      FROM
         phone_number
      WHERE
         insert_timestamp < (now() - interval '10 hours')
   )
   AND (now() AT TIME ZONE account.timezone)::time BETWEEN
      CASE
         WHEN EXTRACT(DOW FROM now() AT TIME ZONE account.timezone) IN (6, 0)
         THEN '15:30'::time
         ELSE '17:30'::time
      END
      AND '22:10'::time
   AND account.residence_details::jsonb ?& array['city', 'state', 'streetName']
ORDER BY
   random()
LIMIT 1

This query takes about 1 minute to complete.

Below is the EXPLAIN ANALYZE output for the query without account.residence_details::jsonb ?& array['city', 'state', 'streetName']:

Limit  (cost=15795.97..15795.97 rows=1 width=45) (actual time=382.995..382.995 rows=0 loops=1)
  ->  Sort  (cost=15795.97..15796.18 rows=85 width=45) (actual time=382.993..382.993 rows=0 loops=1)
        Sort Key: (random())
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop  (cost=8742.24..15795.54 rows=85 width=45) (actual time=382.640..382.640 rows=0 loops=1)
              Join Filter: (phone_number.account_id = account.id)
              ->  Hash Join  (cost=8741.96..15403.38 rows=850 width=37) (actual time=347.011..368.677 rows=2099 loops=1)
                    Hash Cond: (phone_number.account_id = phone_number_1.account_id)
                    ->  Seq Scan on phone_number  (cost=0.00..6649.74 rows=850 width=29) (actual time=14.499..33.591 rows=2453 loops=1)
                          Filter: (NOT used)
                          Rows Removed by Filter: 190152
                    ->  Hash  (cost=8629.44..8629.44 rows=9001 width=8) (actual time=332.368..332.369 rows=9581 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 503kB
                          ->  HashAggregate  (cost=8539.43..8629.44 rows=9001 width=8) (actual time=320.550..326.757 rows=9581 loops=1)
                                Group Key: phone_number_1.account_id
                                ->  Seq Scan on phone_number phone_number_1  (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.010..169.126 rows=191615 loops=1)
                                      Filter: (insert_timestamp < (now() - '10:00:00'::interval))
                                      Rows Removed by Filter: 990
              ->  Index Scan using account_id_idx on account  (cost=0.29..0.45 rows=1 width=25) (actual time=0.006..0.006 rows=0 loops=2099)
                    Index Cond: (id = phone_number_1.account_id)
                    Filter: (((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
                    Rows Removed by Filter: 1
Planning time: 2.025 ms
Execution time: 383.794 ms

Below is the EXPLAIN ANALYZE output for the query with account.residence_details::jsonb ?& array['city', 'state', 'streetName']:

Limit  (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.686..258768.696 rows=1 loops=1)
  ->  Sort  (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.684..258768.685 rows=1 loops=1)
        Sort Key: (random())
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Nested Loop Semi Join  (cost=0.29..15916.81 rows=1 width=45) (actual time=495.076..258755.141 rows=1715 loops=1)
              Join Filter: (account.id = phone_number_1.account_id)
              Rows Removed by Join Filter: 167271743
              ->  Nested Loop  (cost=0.29..7634.96 rows=1 width=54) (actual time=65.620..229.670 rows=1737 loops=1)
                    ->  Seq Scan on phone_number  (cost=0.00..6649.74 rows=850 width=29) (actual time=59.234..98.326 rows=3772 loops=1)
                          Filter: (NOT used)
                          Rows Removed by Filter: 190333
                    ->  Index Scan using account_id_idx on account  (cost=0.29..1.16 rows=1 width=25) (actual time=0.029..0.029 rows=0 loops=3772)
                          Index Cond: (id = phone_number.account_id)
                          Filter: ((residence_details ?& '{city,state,streetName}'::text[]) AND ((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
                          Rows Removed by Filter: 1
              ->  Seq Scan on phone_number phone_number_1  (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.004..87.357 rows=96300 loops=1737)
                    Filter: (insert_timestamp < (now() - '10:00:00'::interval))
                    Rows Removed by Filter: 21
Planning time: 1.712 ms
Execution time: 258768.781 ms

I can't figure out why adding account.residence_details::jsonb ?& array['city', 'state', 'streetName'] makes the query so much slower.

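I'd say that the additional condition makes PostgreSQL badly underestimate the result count of the first join (estimated rows=1, actual rows=1737), so it mistakenly chooses a nested loop for the second join. That is where all the time is spent: the sequential scan on phone_number_1 is executed 1737 times.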
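
One way to test the explanation above is to look at the jsonb condition in isolation and then see how the slow query behaves when nested loops are discouraged. This is only a diagnostic sketch using the table and column names from the question; turning off enable_nestloop is meant for experimenting in a single session, not as a permanent fix.

```sql
-- Compare the planner's estimate ("rows=" in the cost part) with the
-- actual row count for the jsonb condition on its own.
EXPLAIN (ANALYZE)
SELECT id
FROM account
WHERE residence_details::jsonb ?& array['city', 'state', 'streetName'];

-- Diagnostic only: discourage nested loop joins for this session,
-- re-run the slow query under EXPLAIN (ANALYZE), then restore the default.
SET enable_nestloop = off;
-- ... run EXPLAIN (ANALYZE) on the slow query here ...
RESET enable_nestloop;
```

If the estimated row count for the condition is far below the actual one, and the query becomes fast again once nested loops are discouraged, that would support the misestimate as the cause.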

Perhaps an index on the expression will help to get a better estimate:

CREATE INDEX ON account USING gin ((residence_details::jsonb));  -- an expression needs its own parentheses
ANALYZE account;  -- to calculate statistics for the indexed expression
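
To check whether this changed anything, one could look at the statistics that ANALYZE collected for expression indexes on account and then compare estimated and actual row counts again. This is a sketch rather than a guaranteed recipe: it relies on pg_stats listing statistics for index expressions under the index's name, and whether the estimate for ?& actually improves may depend on the PostgreSQL version.

```sql
-- Statistics gathered by ANALYZE for expression indexes on "account".
-- In pg_stats they are listed under the index name, not the table name.
SELECT s.tablename AS index_name,
       s.attname,
       s.null_frac,
       s.n_distinct
FROM pg_stats AS s
JOIN pg_class AS c
   ON c.relname = s.tablename
JOIN pg_namespace AS n
   ON n.oid = c.relnamespace
   AND n.nspname = s.schemaname
JOIN pg_index AS i
   ON i.indexrelid = c.oid
WHERE i.indrelid = 'account'::regclass;

-- Compare the estimated and actual row counts for the condition again.
EXPLAIN (ANALYZE)
SELECT id
FROM account
WHERE residence_details::jsonb ?& array['city', 'state', 'streetName'];
```

An empty result from the first query would mean that no expression statistics were collected for any index on account.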