PostGIS nearest neighbours query
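I want to retrieve all points that lie within a given range of another set of points. For example, find all shops within 500 meters of any subway station.
I wrote this query, but it is quite slow, and I would like to optimize it:
SELECT DISTINCT ON (locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500, false);
I'm running the latest versions of Postgres and PostGIS (Postgres 9.5, PostGIS 2.2.1).
Here is the table metadata: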
                          Table "public.locations"
   Column    |   Type   |                       Modifiers
-------------+----------+--------------------------------------------------------
 id          | integer  | not null default nextval('locations_id_seq'::regclass)
 coordinates | geometry |
Indexes:
    "locations_coordinates_index" gist (coordinates)

                          Table "public.pois"
   Column    |   Type   |                     Modifiers
-------------+----------+---------------------------------------------------
 id          | integer  | not null default nextval('pois_id_seq'::regclass)
 coordinates | geometry |
 poi_kind_id | integer  |
Indexes:
    "pois_pkey" PRIMARY KEY, btree (id)
    "pois_coordinates_index" gist (coordinates)
    "pois_poi_kind_id_index" btree (poi_kind_id)
Foreign-key constraints:
    "pois_poi_kind_id_fkey" FOREIGN KEY (poi_kind_id) REFERENCES poi_kinds(id)
Here is the output of EXPLAIN (ANALYZE, BUFFERS):
 Unique  (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.080..3338.252 rows=918 loops=1)
   Buffers: shared hit=559
   ->  Sort  (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.079..3338.145 rows=963 loops=1)
         Sort Key: locations.id
         Sort Method: quicksort  Memory: 70kB
         Buffers: shared hit=559
         ->  Nested Loop  (cost=0.00..2407390.71 rows=2 width=4) (actual time=2.466..3337.835 rows=963 loops=1)
               Join Filter: (((pois.coordinates)::geography && _st_expand((locations.coordinates)::geography, 500::double precision)) AND ((locations.coordinates)::geography && _st_expand((pois.coordinates)::geography, 500::double precision)) AND _st_dwithin((pois.coordinates)::geography, (locations.coordinates)::geography, 500::double precision, false))
               Rows Removed by Join Filter: 4531356
               Buffers: shared hit=559
               ->  Seq Scan on locations  (cost=0.00..791.68 rows=24168 width=36) (actual time=0.005..3.100 rows=24237 loops=1)
                     Buffers: shared hit=550
               ->  Materialize  (cost=0.00..10.47 rows=187 width=32) (actual time=0.000..0.009 rows=187 loops=24237)
                     Buffers: shared hit=6
                     ->  Seq Scan on pois  (cost=0.00..9.54 rows=187 width=32) (actual time=0.015..0.053 rows=187 loops=1)
                           Filter: (poi_kind_id = 3)
                           Rows Removed by Filter: 96
                           Buffers: shared hit=6
 Planning time: 0.184 ms
 Execution time: 3338.304 ms
(20 rows)
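I think you should switch to a different solution. PostGIS still runs its queries inside a structured (relational) database; it is powerful, but not fast for specialized needs like this one. You may need Elasticsearch.
Elasticsearch is good at geolocation search but not at geographic data processing, so I think you need both.
Because of the fourth parameter, I think you are using the geography version of ST_DWithin.
Try changing your query to the following:
SELECT DISTINCT ON (locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500);
If that still does not solve it, please post the EXPLAIN ANALYZE output again.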
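One caveat with the three-argument form: on geometry columns ST_DWithin measures the distance in the units of the spatial reference system, so for lon/lat data 500 would mean 500 degrees rather than 500 meters. A possible alternative that keeps the meter semantics is to index the geography cast itself, since the plan above filters on (coordinates)::geography and the existing GiST indexes on the plain geometry column cannot serve that expression. The following is only a sketch: it assumes the coordinates are lon/lat (SRID 4326), the index names are made up, and poi_kind_id = 3 is taken from the plan in the question.

```sql
-- Assumption: coordinates are lon/lat (SRID 4326), so the cast to geography is valid.
-- Index names are hypothetical.
CREATE INDEX locations_coordinates_geog_index
    ON locations USING gist ((coordinates::geography));
CREATE INDEX pois_coordinates_geog_index
    ON pois USING gist ((coordinates::geography));

-- Refresh planner statistics so the new indexes are considered.
ANALYZE locations;
ANALYZE pois;

-- Same question as before, with the casts written out so that they match
-- the indexed expression; 500 is in meters for geography arguments.
SELECT DISTINCT ON (l.id) l.id
FROM locations l
JOIN pois p
  ON ST_DWithin(l.coordinates::geography, p.coordinates::geography, 500, false)
WHERE p.poi_kind_id = 3;
```

Whether the planner actually picks an index scan depends on the data, so it is worth re-running EXPLAIN (ANALYZE, BUFFERS) after creating the indexes.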
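I eventually came to the conclusion that I could not compute the distances between thousands of points of interest and thousands of locations in real time within a realistic time frame (< 1 second).
So I precomputed everything: each time a location or POI is created or updated, I store the minimum distance between each location and each kind of POI, so that I can answer the question "which locations are closer than X meters from this kind of POI".
Here is the module I wrote for that (it is in Elixir, but the main part is raw SQL):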
defmodule My.POILocationDistanceService do
  alias Ecto.Adapters.SQL
  alias My.Repo

  # Remove the precomputed rows for one location (call before re-inserting).
  def delete_distance_for_location(location_id) do
    run_query!("DELETE FROM poi_location_distance WHERE location_id = $1::integer", [location_id])
  end

  # Remove the precomputed rows for one kind of POI.
  def delete_distance_for_poi_kind(poi_kind_id) do
    run_query!("DELETE FROM poi_location_distance WHERE poi_kind_id = $1::integer", [poi_kind_id])
  end

  # Store, for one location, the nearest POI of each kind within max_distance.
  def insert_distance_for_location(location_id) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON (p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM locations l, pois p
    WHERE
      l.id = $1
      AND ST_DWithin(l.coordinates, p.coordinates, $2, FALSE)
    GROUP BY p.poi_kind_id, p.id, l.id
    ORDER BY p.poi_kind_id, distance;
    """
    run_query!(sql, [location_id, max_distance()])
  end

  # Store, for one kind of POI, the nearest POI for each location in a batch.
  def insert_distance_for_poi_kind(poi_kind_id, offset \\ 0, limit \\ 10_000_000) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON (l.id, p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM pois p, (SELECT * FROM locations OFFSET $1 LIMIT $2) as l
    WHERE
      p.poi_kind_id = $3
      AND ST_DWithin(l.coordinates, p.coordinates, $4, FALSE)
    GROUP BY l.id, p.poi_kind_id, p.id
    -- DISTINCT ON keeps the first row per (location, kind), so sort by distance
    ORDER BY l.id, p.poi_kind_id, distance;
    """
    run_query!(sql, [offset, limit, poi_kind_id, max_distance()])
  end

  defp run_query!(query, params) do
    SQL.query!(Repo, query, params)
  end

  # Maximum distance (in meters) that is precomputed and stored.
  def max_distance, do: 5000
end
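For readers who want to try the same approach, here is a rough sketch of what the poi_location_distance table and the final lookup could look like. The table definition is not shown in the module, so the column types, the primary key and the index name below are assumptions inferred from the INSERT column list; poi_kind_id = 3 is the subway kind seen in the EXPLAIN output.

```sql
-- Hypothetical schema, inferred from the INSERT column list in the module.
CREATE TABLE poi_location_distance (
    poi_kind_id integer NOT NULL REFERENCES poi_kinds(id),
    location_id integer NOT NULL,
    poi_id      integer NOT NULL REFERENCES pois(id),
    distance    double precision NOT NULL,  -- meters, from ST_Distance_Sphere
    PRIMARY KEY (location_id, poi_kind_id)  -- one nearest POI per location and kind
);

-- Supports "kind = ? AND distance <= ?" lookups; the index name is made up.
CREATE INDEX poi_location_distance_kind_distance_index
    ON poi_location_distance (poi_kind_id, distance);

-- "Which locations are closer than 500 meters from a subway station?"
SELECT location_id
FROM poi_location_distance
WHERE poi_kind_id = 3
  AND distance <= 500;
```

Because the module only stores distances up to max_distance (5000 meters), a lookup like this can only answer questions for X up to that limit.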