使用 :jsonb ?& 运算符时查询变得很慢
query gets very slow when :jsonb ?& operator is used
我有以下 SQL 运行速度很快的查询
select
phone_number.id,
phone_number.phone_number,
phone_number.account_id,
phone_number.used AS used,
(
now() AT TIME ZONE account.timezone
)
::time AS local_time
from
phone_number
INNER JOIN
account
ON account.id = phone_number.account_id
where
phone_number.used = false
AND phone_number.account_id IN
(
SELECT
phone_number.account_id
FROM
phone_number
WHERE
insert_timestamp < (now() - interval '10 hours')
)
AND
(
now() AT TIME ZONE account.timezone
)
::time BETWEEN
CASE
WHEN
EXTRACT(DOW
FROM
now() AT TIME ZONE account.timezone) IN
(
6,
0
)
THEN
'15:30'::time
ELSE
'17:30'::time
END
AND '22:10'::time
order by
random() limit 1
但是当我将其添加到其中时 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
将完整查询变为
select
phone_number.id,
phone_number.phone_number,
phone_number.account_id,
phone_number.used AS used,
(
now() AT TIME ZONE account.timezone
)
::time AS local_time
from
phone_number
INNER JOIN
account
ON account.id = phone_number.account_id
where
phone_number.used = false
AND phone_number.account_id IN
(
SELECT
phone_number.account_id
FROM
phone_number
WHERE
insert_timestamp < (now() - interval '10 hours')
)
AND
(
now() AT TIME ZONE account.timezone
)
::time BETWEEN
CASE
WHEN
EXTRACT(DOW
FROM
now() AT TIME ZONE account.timezone) IN
(
6,
0
)
THEN
'15:30'::time
ELSE
'17:30'::time
END
AND '22:10'::time
AND account.residence_details::jsonb ?& array['city', 'state', 'streetName']
order by
random() limit 1
查询大约需要 1 分钟才能完成
下面是 EXPLAIN ANALYZE
的查询,没有 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
Limit (cost=15795.97..15795.97 rows=1 width=45) (actual time=382.995..382.995 rows=0 loops=1)
-> Sort (cost=15795.97..15796.18 rows=85 width=45) (actual time=382.993..382.993 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=8742.24..15795.54 rows=85 width=45) (actual time=382.640..382.640 rows=0 loops=1)
Join Filter: (phone_number.account_id = account.id)
-> Hash Join (cost=8741.96..15403.38 rows=850 width=37) (actual time=347.011..368.677 rows=2099 loops=1)
Hash Cond: (phone_number.account_id = phone_number_1.account_id)
-> Seq Scan on phone_number (cost=0.00..6649.74 rows=850 width=29) (actual time=14.499..33.591 rows=2453 loops=1)
Filter: (NOT used)
Rows Removed by Filter: 190152
-> Hash (cost=8629.44..8629.44 rows=9001 width=8) (actual time=332.368..332.369 rows=9581 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 503kB
-> HashAggregate (cost=8539.43..8629.44 rows=9001 width=8) (actual time=320.550..326.757 rows=9581 loops=1)
Group Key: phone_number_1.account_id
-> Seq Scan on phone_number phone_number_1 (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.010..169.126 rows=191615 loops=1)
Filter: (insert_timestamp < (now() - '10:00:00'::interval))
Rows Removed by Filter: 990
-> Index Scan using account_id_idx on account (cost=0.29..0.45 rows=1 width=25) (actual time=0.006..0.006 rows=0 loops=2099)
Index Cond: (id = phone_number_1.account_id)
Filter: (((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
Rows Removed by Filter: 1
Planning time: 2.025 ms
Execution time: 383.794 ms
下面是 EXPLAIN ANALYZE
用于 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
的查询
Limit (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.686..258768.696 rows=1 loops=1)
-> Sort (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.684..258768.685 rows=1 loops=1)
Sort Key: (random())
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop Semi Join (cost=0.29..15916.81 rows=1 width=45) (actual time=495.076..258755.141 rows=1715 loops=1)
Join Filter: (account.id = phone_number_1.account_id)
Rows Removed by Join Filter: 167271743
-> Nested Loop (cost=0.29..7634.96 rows=1 width=54) (actual time=65.620..229.670 rows=1737 loops=1)
-> Seq Scan on phone_number (cost=0.00..6649.74 rows=850 width=29) (actual time=59.234..98.326 rows=3772 loops=1)
Filter: (NOT used)
Rows Removed by Filter: 190333
-> Index Scan using account_id_idx on account (cost=0.29..1.16 rows=1 width=25) (actual time=0.029..0.029 rows=0 loops=3772)
Index Cond: (id = phone_number.account_id)
Filter: ((residence_details ?& '{city,state,streetName}'::text[]) AND ((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
Rows Removed by Filter: 1
-> Seq Scan on phone_number phone_number_1 (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.004..87.357 rows=96300 loops=1737)
Filter: (insert_timestamp < (now() - '10:00:00'::interval))
Rows Removed by Filter: 21
Planning time: 1.712 ms
Execution time: 258768.781 ms
我不明白为什么添加 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
后速度变慢了
我想说附加条件使 PostgreSQL 严重低估了第一个连接的结果计数,以至于它错误地为第二个连接选择了一个嵌套循环,这是所有时间都花在了哪里。
也许表达式的索引将有助于获得更好的估计:
CREATE INDEX ON account USING gin (residence_details::jsonb);
ANALYZE account; -- to calculate statistics for the indexed expression
我有以下 SQL 运行速度很快的查询
select
phone_number.id,
phone_number.phone_number,
phone_number.account_id,
phone_number.used AS used,
(
now() AT TIME ZONE account.timezone
)
::time AS local_time
from
phone_number
INNER JOIN
account
ON account.id = phone_number.account_id
where
phone_number.used = false
AND phone_number.account_id IN
(
SELECT
phone_number.account_id
FROM
phone_number
WHERE
insert_timestamp < (now() - interval '10 hours')
)
AND
(
now() AT TIME ZONE account.timezone
)
::time BETWEEN
CASE
WHEN
EXTRACT(DOW
FROM
now() AT TIME ZONE account.timezone) IN
(
6,
0
)
THEN
'15:30'::time
ELSE
'17:30'::time
END
AND '22:10'::time
order by
random() limit 1
但是当我将其添加到其中时 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
将完整查询变为
select
phone_number.id,
phone_number.phone_number,
phone_number.account_id,
phone_number.used AS used,
(
now() AT TIME ZONE account.timezone
)
::time AS local_time
from
phone_number
INNER JOIN
account
ON account.id = phone_number.account_id
where
phone_number.used = false
AND phone_number.account_id IN
(
SELECT
phone_number.account_id
FROM
phone_number
WHERE
insert_timestamp < (now() - interval '10 hours')
)
AND
(
now() AT TIME ZONE account.timezone
)
::time BETWEEN
CASE
WHEN
EXTRACT(DOW
FROM
now() AT TIME ZONE account.timezone) IN
(
6,
0
)
THEN
'15:30'::time
ELSE
'17:30'::time
END
AND '22:10'::time
AND account.residence_details::jsonb ?& array['city', 'state', 'streetName']
order by
random() limit 1
查询大约需要 1 分钟才能完成
下面是 EXPLAIN ANALYZE
的查询,没有 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
Limit (cost=15795.97..15795.97 rows=1 width=45) (actual time=382.995..382.995 rows=0 loops=1)
-> Sort (cost=15795.97..15796.18 rows=85 width=45) (actual time=382.993..382.993 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=8742.24..15795.54 rows=85 width=45) (actual time=382.640..382.640 rows=0 loops=1)
Join Filter: (phone_number.account_id = account.id)
-> Hash Join (cost=8741.96..15403.38 rows=850 width=37) (actual time=347.011..368.677 rows=2099 loops=1)
Hash Cond: (phone_number.account_id = phone_number_1.account_id)
-> Seq Scan on phone_number (cost=0.00..6649.74 rows=850 width=29) (actual time=14.499..33.591 rows=2453 loops=1)
Filter: (NOT used)
Rows Removed by Filter: 190152
-> Hash (cost=8629.44..8629.44 rows=9001 width=8) (actual time=332.368..332.369 rows=9581 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 503kB
-> HashAggregate (cost=8539.43..8629.44 rows=9001 width=8) (actual time=320.550..326.757 rows=9581 loops=1)
Group Key: phone_number_1.account_id
-> Seq Scan on phone_number phone_number_1 (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.010..169.126 rows=191615 loops=1)
Filter: (insert_timestamp < (now() - '10:00:00'::interval))
Rows Removed by Filter: 990
-> Index Scan using account_id_idx on account (cost=0.29..0.45 rows=1 width=25) (actual time=0.006..0.006 rows=0 loops=2099)
Index Cond: (id = phone_number_1.account_id)
Filter: (((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
Rows Removed by Filter: 1
Planning time: 2.025 ms
Execution time: 383.794 ms
下面是 EXPLAIN ANALYZE
用于 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
Limit (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.686..258768.696 rows=1 loops=1)
-> Sort (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.684..258768.685 rows=1 loops=1)
Sort Key: (random())
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop Semi Join (cost=0.29..15916.81 rows=1 width=45) (actual time=495.076..258755.141 rows=1715 loops=1)
Join Filter: (account.id = phone_number_1.account_id)
Rows Removed by Join Filter: 167271743
-> Nested Loop (cost=0.29..7634.96 rows=1 width=54) (actual time=65.620..229.670 rows=1737 loops=1)
-> Seq Scan on phone_number (cost=0.00..6649.74 rows=850 width=29) (actual time=59.234..98.326 rows=3772 loops=1)
Filter: (NOT used)
Rows Removed by Filter: 190333
-> Index Scan using account_id_idx on account (cost=0.29..1.16 rows=1 width=25) (actual time=0.029..0.029 rows=0 loops=3772)
Index Cond: (id = phone_number.account_id)
Filter: ((residence_details ?& '{city,state,streetName}'::text[]) AND ((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
Rows Removed by Filter: 1
-> Seq Scan on phone_number phone_number_1 (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.004..87.357 rows=96300 loops=1737)
Filter: (insert_timestamp < (now() - '10:00:00'::interval))
Rows Removed by Filter: 21
Planning time: 1.712 ms
Execution time: 258768.781 ms
我不明白为什么添加 account.residence_details::jsonb ?& array['city', 'state', 'streetName']
我想说附加条件使 PostgreSQL 严重低估了第一个连接的结果计数,以至于它错误地为第二个连接选择了一个嵌套循环,这是所有时间都花在了哪里。
也许表达式的索引将有助于获得更好的估计:
CREATE INDEX ON account USING gin (residence_details::jsonb);
ANALYZE account; -- to calculate statistics for the indexed expression