检查一个点是否包含在 polygon/multipolygon 中 - 对于很多点
Checking if a point is contained in a polygon/multipolygon - for many points
我有一个二维地图,分为矩形网格 - 大约 45,000 个 - 以及从 shapefile 派生的一些 polygons/multipolygons(我目前使用 shapefile 库 pyshp 读取它们)。不幸的是,这些多边形中的一些相当复杂并且由大量点组成(其中一个有 640,000 个点)并且其中可能有洞。
我想做的是检查每个多边形的单元中心(我的网格的单元格)落在该特定多边形内。然而,拥有大约 45,000 个单元中心和 150 个多边形需要相当长的时间才能使用 shapely 检查所有内容。这就是我正在做的,或多或少:
# nx and ny are the number of cells in the x and y direction respectively
# x, y are the coordinates of the cell centers in the grid - numpy arrays
# shapes is a list of shapely shapes - either Polygon or MultiPolygon
# Point is a shapely.geometry.Point class
for j in range(ny):
for i in range(nx):
p = Point([x[i], y[j]])
for s in shapes:
if s.contains(p):
# The shape s contains the point p - remove the cell
根据所讨论的多边形,每次检查需要 200 微秒到 80 毫秒,如果要进行超过 600 万次检查,运行时间很容易就会耗费数小时。
我想一定有更聪明的方法来处理这个问题 - 如果可能的话,我宁愿保持身材匀称,但我们非常感谢任何建议。
提前致谢。
如果您想执行较少的比较操作,可以尝试使用 shapely str-tree 功能。考虑以下代码:
from shapely.strtree import STRtree
from shapely.geometry import Polygon, Point
# this is the grid. It includes a point for each rectangle center.
points = [Point(i, j) for i in range(10) for j in range(10)]
tree = STRtree(points). # create a 'database' of points
# this is one of the polygons.
p = Polygon([[0.5, 0], [2, 0.2], [3, 4], [0.8, 0.5], [0.5, 0]])
res = tree.query(p). # run the query (a single shot).
# do something with the results
print([o.wkt for o in res])
这段代码的结果是:
['POINT (3 0)', 'POINT (2 0)', 'POINT (1 0)', 'POINT (1 1)', 'POINT (3 1)',
'POINT (2 1)', 'POINT (2 2)', 'POINT (3 2)', 'POINT (1 2)', 'POINT (2 3)',
'POINT (3 3)', 'POINT (1 3)', 'POINT (2 4)', 'POINT (1 4)', 'POINT (3 4)']
在接受的答案中,在 shapely 中使用 rtrees 仅支持多边形的扩展或边界框。正如 shapely 文档中所述:
https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree
从 1.7.1 开始,rtree.query 之后需要检查多边形是否实际包含该点。这意味着额外的检查,但如果多边形不是非常混乱的配置,仍然可以加快算法速度
也就是说,代码应该是这样的
from shapely.strtree import STRtree
from shapely.geometry import Polygon, Point
# this is the grid. It includes a point for each rectangle center.
points = [Point(i, j) for i in range(10) for j in range(10)]
tree = STRtree(points) # create a 'database' of points
# this is one of the polygons.
p = Polygon([[0.5, 0], [2, 0.2], [3, 4], [0.8, 0.5], [0.5, 0]])
res = [o for o in tree.query(p) if p.contains(o)] # run the query (a single shot) - and test if the found points are actually inside the polygon.
# do something with the results
print([o.wkt for o in res])
果然,现在的结果是
['POINT (2 1)', 'POINT (2 2)']
这里可以看出是多边形内部的点:
-- 奖金编辑 ---
我观察到通过在以下意义上定位问题,速度得到了更好的改进:
在该区域上制作了一个正方形网格,STRtree 结构由点和多边形组成。在每个单独的网格多边形上查询点和多边形 - 在每个网格多边形内,检查查询的点是否包含在查询的多边形内。从而减少对单个多边形的检查量。
from shapely.strtree import STRtree
from shapely.geometry import Polygon, Point
import random
# build a grid spread over the area.
grid = [Polygon([[i, j],[i+1,j],[i,j+1],[i+1,j+1]]) for i in range(10) for j in range(10)]
pointtree = STRtree([Point(random.uniform(1,10),random.uniform(1,10)) for q in range(50)]) # create a 'database' of 50 random points - and build a STRtree structure over it
# take the same polygon as above, but put in an STRtree
polytree = STRtree([Polygon([[0.5, 0], [2, 0.2], [3, 4], [0.8, 0.5], [0.5, 0]])])
res = []
#do nested queries inside the grid
for g in grid:
polyquery = polytree.query(g)
pointquery = pointtree.query(g)
for point in pointquery:
for poly in polyquery:
if poly.contains(point):
res.append(point)
# do something with the results
print([o.wkt for o in res])
我有一个二维地图,分为矩形网格 - 大约 45,000 个 - 以及从 shapefile 派生的一些 polygons/multipolygons(我目前使用 shapefile 库 pyshp 读取它们)。不幸的是,这些多边形中的一些相当复杂并且由大量点组成(其中一个有 640,000 个点)并且其中可能有洞。
我想做的是检查每个多边形的单元中心(我的网格的单元格)落在该特定多边形内。然而,拥有大约 45,000 个单元中心和 150 个多边形需要相当长的时间才能使用 shapely 检查所有内容。这就是我正在做的,或多或少:
# nx and ny are the number of cells in the x and y direction respectively
# x, y are the coordinates of the cell centers in the grid - numpy arrays
# shapes is a list of shapely shapes - either Polygon or MultiPolygon
# Point is a shapely.geometry.Point class
for j in range(ny):
for i in range(nx):
p = Point([x[i], y[j]])
for s in shapes:
if s.contains(p):
# The shape s contains the point p - remove the cell
根据所讨论的多边形,每次检查需要 200 微秒到 80 毫秒,如果要进行超过 600 万次检查,运行时间很容易就会耗费数小时。
我想一定有更聪明的方法来处理这个问题 - 如果可能的话,我宁愿保持身材匀称,但我们非常感谢任何建议。
提前致谢。
如果您想执行较少的比较操作,可以尝试使用 shapely str-tree 功能。考虑以下代码:
from shapely.strtree import STRtree
from shapely.geometry import Polygon, Point
# this is the grid. It includes a point for each rectangle center.
points = [Point(i, j) for i in range(10) for j in range(10)]
tree = STRtree(points). # create a 'database' of points
# this is one of the polygons.
p = Polygon([[0.5, 0], [2, 0.2], [3, 4], [0.8, 0.5], [0.5, 0]])
res = tree.query(p). # run the query (a single shot).
# do something with the results
print([o.wkt for o in res])
这段代码的结果是:
['POINT (3 0)', 'POINT (2 0)', 'POINT (1 0)', 'POINT (1 1)', 'POINT (3 1)',
'POINT (2 1)', 'POINT (2 2)', 'POINT (3 2)', 'POINT (1 2)', 'POINT (2 3)',
'POINT (3 3)', 'POINT (1 3)', 'POINT (2 4)', 'POINT (1 4)', 'POINT (3 4)']
在接受的答案中,在 shapely 中使用 rtrees 仅支持多边形的扩展或边界框。正如 shapely 文档中所述:
https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree
从 1.7.1 开始,rtree.query 之后需要检查多边形是否实际包含该点。这意味着额外的检查,但如果多边形不是非常混乱的配置,仍然可以加快算法速度
也就是说,代码应该是这样的
from shapely.strtree import STRtree
from shapely.geometry import Polygon, Point
# this is the grid. It includes a point for each rectangle center.
points = [Point(i, j) for i in range(10) for j in range(10)]
tree = STRtree(points) # create a 'database' of points
# this is one of the polygons.
p = Polygon([[0.5, 0], [2, 0.2], [3, 4], [0.8, 0.5], [0.5, 0]])
res = [o for o in tree.query(p) if p.contains(o)] # run the query (a single shot) - and test if the found points are actually inside the polygon.
# do something with the results
print([o.wkt for o in res])
果然,现在的结果是
['POINT (2 1)', 'POINT (2 2)']
这里可以看出是多边形内部的点:
-- 奖金编辑 ---
我观察到通过在以下意义上定位问题,速度得到了更好的改进:
在该区域上制作了一个正方形网格,STRtree 结构由点和多边形组成。在每个单独的网格多边形上查询点和多边形 - 在每个网格多边形内,检查查询的点是否包含在查询的多边形内。从而减少对单个多边形的检查量。
from shapely.strtree import STRtree
from shapely.geometry import Polygon, Point
import random
# build a grid spread over the area.
grid = [Polygon([[i, j],[i+1,j],[i,j+1],[i+1,j+1]]) for i in range(10) for j in range(10)]
pointtree = STRtree([Point(random.uniform(1,10),random.uniform(1,10)) for q in range(50)]) # create a 'database' of 50 random points - and build a STRtree structure over it
# take the same polygon as above, but put in an STRtree
polytree = STRtree([Polygon([[0.5, 0], [2, 0.2], [3, 4], [0.8, 0.5], [0.5, 0]])])
res = []
#do nested queries inside the grid
for g in grid:
polyquery = polytree.query(g)
pointquery = pointtree.query(g)
for point in pointquery:
for poly in polyquery:
if poly.contains(point):
res.append(point)
# do something with the results
print([o.wkt for o in res])