Performance of descendants search using Postgres intarray field
I have a Postgres (version 9.5.4) table geo containing 738,884 records of geographic data for a country, with the following structure:
                              Table "public.geo"
   Column   |          Type          | Modifiers | Storage  | Stats target | Description
------------+------------------------+-----------+----------+--------------+-------------
 id         | integer                | not null  | plain    |              |
 kind       | character varying(255) |           | extended |              |
 name       | character varying(255) |           | extended |              |
 is_owner   | integer                |           | plain    |              |
 path_array | integer[]              |           | extended |              |
Indexes:
    "geo_pkey" PRIMARY KEY, btree (id)
    "kind_index" btree (kind)
    "path_array_idx" gin (path_array gin__int_ops)
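Records form a hierarchy by the kind field: country -> province -> area -> locality. This hierarchy is stored in the path_array field as an array of the row's ancestor IDs followed by the row's own ID.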
Example:
17239123 locality Moscow 1 {17073865,17073877,17073958,17239123}
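To make the encoding concrete, here is a minimal sketch using the path of the example row above. Checking whether a row sits under a given ancestor comes down to an array-overlap test, which is the operator the query below relies on. The column alias is made up for illustration.

```sql
-- && is true when the two arrays share at least one element.
-- 17073958 is one of the ancestors in the Moscow row's path_array,
-- so the Moscow row counts as a descendant of 17073958.
SELECT '{17073865,17073877,17073958,17239123}'::integer[]
       && '{17073958}'::integer[] AS is_descendant_or_self;
```

Because path_array also contains the row's own ID, a locality ID passed in the input list matches that locality itself.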
I have installed the intarray extension and added the appropriate index on the path_array field.
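Now I have a set of record IDs that can be of any kind (from country to locality), and I need to select all of their descendants of kind locality (i.e. locality records that have any of these IDs in their path_array).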
Here is my query:
SELECT
id
FROM geo
WHERE
kind = 'locality'
AND is_owner = 1
AND path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[]
Here is the EXPLAIN ANALYZE output:
Bitmap Heap Scan on geo (cost=1418.04..1532.99 rows=8 width=4) (actual time=685.183..723.330 rows=20984 loops=1)
Recheck Cond: ((is_owner = 1) AND (path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[]))
Filter: ((kind)::text = 'locality'::text)
Rows Removed by Filter: 2037
Heap Blocks: exact=17106
-> BitmapAnd (cost=1418.04..1418.04 rows=29 width=0) (actual time=681.154..681.154 rows=0 loops=1)
-> Bitmap Index Scan on is_owner_index (cost=0.00..544.24 rows=29309 width=0) (actual time=5.493..5.493 rows=29201 loops=1)
Index Cond: (is_owner = 1)
-> Bitmap Index Scan on path_array_idx (cost=0.00..873.54 rows=739 width=0) (actual time=667.888..667.888 rows=607440 loops=1)
Index Cond: (path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[])
Planning time: 0.212 ms
Execution time: 727.370 ms
The query above takes about 700 ms, which seems slow to me. Am I right, or am I asking too much?
I have since created a partial GIN index on path_array, restricted to rows where is_owner = 1:
CREATE INDEX path_array_owner_idx ON geo USING gin (path_array gin__int_ops) WHERE is_owner = 1
EXPLAIN ANALYZE output with this index in place:
Bitmap Heap Scan on geo (cost=436.04..550.99 rows=8 width=4) (actual time=30.292..68.778 rows=20984 loops=1)
Recheck Cond: ((path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[]) AND (is_owner = 1))
Filter: ((kind)::text = 'locality'::text)
Rows Removed by Filter: 2037
Heap Blocks: exact=17106
-> Bitmap Index Scan on path_array_owner_idx (cost=0.00..436.04 rows=29 width=0) (actual time=25.923..25.923 rows=23021 loops=1)
Index Cond: (path_array && '{17073888,17073984,17073885,17073905,17073958,17073927,17073908,17073952,17073948,17073947,17073917,17073944,17073919,17073922,17073914,17073937,17073895,17073904,17073911,17073949,17073938,17073957,17073900,17073915,17073936,17073887,17073933,17073939,17073956,17073884,17073901,17073881,17153202,17073916,17073945,17073883,17073943,17073909,17073950,17073942,17073906,17073886,17073910,17073882,17073941,17073891,17073929,17073928,17073903,17073912,17073930,17073898,17073899,17073954}'::integer[])
Planning time: 0.219 ms
Execution time: 72.956 ms
Now the query above takes about 70 ms, which is good.
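One further variation that might be worth trying is a sketch only, not something I have measured on this data. It assumes the query always filters on both is_owner = 1 and kind = 'locality'; the index name is hypothetical. Moving the kind condition into the partial index predicate would keep non-locality rows out of the index entirely.

```sql
-- Hypothetical index name; assumes every query that should use it
-- contains both is_owner = 1 AND kind = 'locality' in its WHERE clause,
-- since a partial index is only considered when the query implies its predicate.
CREATE INDEX path_array_owner_locality_idx
    ON geo USING gin (path_array gin__int_ops)
    WHERE is_owner = 1 AND kind = 'locality';
```

In the plan above only 2,037 rows are removed by the kind filter, so any gain from this is probably small compared with the step from 700 ms to 70 ms.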