How do I reverse geocode a large number of points in python?
I currently have a GeoJSON file and the following function using shapely:
it takes a coordinate pair and returns the neighborhood name.
def get_neighb(lat, lon):
    """Input latitude and longitude, return the neighborhood name."""
    point = Point(lon, lat)
    for feature in geo_data['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return feature['properties']['neighborhood']
    return 'NA'
# Initialize list
tn = [''] * data.shape[0]
for i in range(len(tn)):
    tn[i] = get_neighb(data.latitude[i], data.longitude[i])
This works, but it is really slow. Any ideas on how to speed it up? I am currently running it on 4,000,000 rows.
You have to find a strategy that avoids checking every polygon for every row. The simplest approach is probably to dump all the shapes into a spatially aware database and query that: something like PostGIS or Elasticsearch.
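For the PostGIS route, the query itself is short. A sketch, assuming the neighborhoods were loaded into a hypothetical table `hoods(name text, geom geometry)` (the table and column names are illustrative):

```sql
-- a GiST index is what makes the lookup fast
CREATE INDEX hoods_geom_idx ON hoods USING GIST (geom);

-- find the neighborhood containing a given point (lon/lat order)
SELECT name
FROM hoods
WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(:lon, :lat), 4326));
```

With the index in place, PostGIS only runs the exact containment test on polygons whose bounding boxes match, rather than scanning the whole table.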
Another strategy would be to compute the centroid of every neighborhood and then use a KD-tree to filter down to only the neighborhoods whose centroids are nearby.
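A minimal sketch of that idea using scipy's cKDTree. The unit-square "neighborhoods", the `contains` helper, and the `k=4` cutoff are all stand-ins: with real data you would build the tree from the feature centroids, keep shapely's `polygon.contains` as the exact test, and pick `k` large enough (or fall back to a full scan) since the containing polygon's centroid is not guaranteed to be among the nearest few.

```python
from itertools import product
from scipy.spatial import cKDTree

# toy "neighborhoods": a 10x10 grid of unit squares, keyed by (x0, y0)
squares = [(x0, y0) for x0, y0 in product(range(10), range(10))]
centroids = [(x0 + 0.5, y0 + 0.5) for x0, y0 in squares]
tree = cKDTree(centroids)

def contains(square, x, y):
    # exact test; with real data use shapely's polygon.contains(point)
    x0, y0 = square
    return x0 <= x <= x0 + 1 and y0 <= y <= y0 + 1

def get_neighb(x, y, k=4):
    # ask the tree for the k nearest centroids, then run the exact
    # containment test only on those candidates
    _, idxs = tree.query((x, y), k=k)
    for i in idxs:
        if contains(squares[i], x, y):
            return squares[i]
    return 'NA'

print(get_neighb(3.2, 7.9))  # -> (3, 7)
```

Each query now touches `k` polygons instead of all of them, which is what turns the 4,000,000-row loop from quadratic-ish into roughly O(n log m).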
If you want to avoid the heavy machinery of a PostGIS database, the rtree
package may be interesting; as its documentation puts it, it is a "cheapo spatial database". The idea is roughly the following:
#!/usr/bin/env python
from itertools import product
from random import uniform, seed

from rtree import index
from shapely.geometry import Point, Polygon
from shapely.affinity import translate

seed(666)

# generate random polygons; in your case, the polygons are stored
# in geo_data['features']
P = Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5), (0, 0)])
polygons = []
for dx, dy in product(range(0, 100), range(0, 100)):
    polygons.append(translate(P, dx, dy))

# construct the spatial index and insert the bounding box of every polygon
idx = index.Index()
for pid, P in enumerate(polygons):
    idx.insert(pid, P.bounds)

delta = 0.5
for i in range(0, 1000):
    # generate a random query point
    x, y = uniform(0, 10), uniform(0, 10)
    pnt = Point(x, y)
    # create a region around the point of interest
    bounds = (x - delta, y - delta, x + delta, y + delta)
    # also possible, but much slower:
    # bounds = pnt.buffer(delta).bounds
    # the index tells us which polygons are worth checking, i.e., those
    # whose bounding box intersects the region constructed above
    for candidate in idx.intersection(bounds):
        P = polygons[candidate]
        # run the exact test only on these candidates
        if P.contains(pnt):
            print(pnt, P)
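If installing rtree is not an option, the same bounding-box prefilter can be approximated with nothing but the standard library by hashing each polygon's bounding box into coarse grid cells. A sketch with toy polygons; the `CELL` size and the ray-casting helper are illustrative stand-ins (with real data, keep shapely's `polygon.contains` as the exact test):

```python
from collections import defaultdict

CELL = 1.0  # grid resolution; pick something close to a typical polygon size

def bbox(poly):
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def build_grid(polygons):
    # register each polygon id in every grid cell its bounding box overlaps
    grid = defaultdict(list)
    for pid, poly in enumerate(polygons):
        x0, y0, x1, y1 = bbox(poly)
        for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
            for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
                grid[(cx, cy)].append(pid)
    return grid

def point_in_poly(x, y, poly):
    # ray casting; stand-in for shapely's polygon.contains(point)
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y) and x < (x1 - x0) * (y - y0) / (y1 - y0) + x0:
            inside = not inside
    return inside

def lookup(grid, polygons, x, y):
    # only polygons registered in the point's cell are tested exactly
    for pid in grid[(int(x // CELL), int(y // CELL))]:
        if point_in_poly(x, y, polygons[pid]):
            return pid
    return None

polygons = [[(0, 0), (2, 0), (2, 2), (0, 2)],
            [(3, 3), (5, 3), (5, 5), (3, 5)]]
grid = build_grid(polygons)
print(lookup(grid, polygons, 4.0, 4.2))  # -> 1
```

This is cruder than an R-tree (overlapping boxes get registered in many cells), but for roughly uniform neighborhood sizes it reduces each of the 4,000,000 lookups to a handful of exact tests.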