Efficient way to update values in a GeoDataFrame based on the result of the GeoDataFrame.within method

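I have two large GeoDataFrames:

The first comes from a shapefile in which every polygon has a float value named 'asapp'.

The second holds the centroids of a 3x3 m fishnet grid, with the 'asapp' column initialized to zero.

I need to fill 'asapp' in the second one wherever a centroid lies within a polygon of the first.

The code below does this, but it is ridiculously slow: about 15 polygons per second (one of the smallest shapefiles has more than 20000 polygons).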

# fishnet_grid is a dict (fid -> geometry) created with GDAL from a raster with a 3 m pixel size
cells_in_wsg = np.array([(self.__convert_geom_sirgas(geom, ogr_transform), int(fid), 0.0) for fid, geom in fishnet_grid.items()])

# turn the grid cells (square polygons) into a GeoDataFrame of points, using the centroid of each cell
fishnet_base = gpd.GeoDataFrame({'geometry': cells_in_wsg[..., 0], 'id': cells_in_wsg[..., 1], 'asapp': cells_in_wsg[..., 2]})
fishnet = gpd.GeoDataFrame({'geometry': fishnet_base.centroid, 'id': fishnet_base['id'], 'asapp': fishnet_base['asapp']})

# as_applied_data is the GeoDataFrame of polygons
# the loop below runs one full within() test over the whole fishnet per polygon, which is why it is so slow
for _, as_applied in as_applied_data.iterrows():
    fishnet.loc[fishnet.within(as_applied['geometry']), 'asapp'] += as_applied['asapp']

Is there another way to do this with better performance?

Tys!

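I solved the problem.

I read about using geopandas.overlay (https://geopandas.org/en/stable/docs/user_guide/set_operations.html) to handle large numbers of polygons, but the catch is that it only works with polygons, and I have polygons and points.

So my solution was to create very small polygons from the points (squares with 2 cm sides) and then use overlay.

Final code: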
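To make the idea concrete, here is a minimal sketch on toy data. The sample points, the sample polygons and the `half_side` name are made up for illustration, and the coordinates are assumed to be in meters, so it shows the pattern rather than the exact code used on the real data:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical centroids (coordinates assumed to be in meters)
points = gpd.GeoDataFrame(
    {"id": [0, 1, 2]},
    geometry=[Point(1.5, 1.5), Point(4.5, 1.5), Point(20.0, 20.0)],
)
# Hypothetical polygons carrying the 'asapp' value; the two boxes overlap
polygons = gpd.GeoDataFrame(
    {"asapp": [10.0, 30.0]},
    geometry=[box(0, 0, 5, 5), box(3, 0, 8, 5)],
)

# Replace every point with a 2 cm square centered on it
half_side = 0.01
squares = points.copy()
squares["geometry"] = [
    box(p.x - half_side, p.y - half_side, p.x + half_side, p.y + half_side)
    for p in points.geometry
]

# overlay now sees polygons on both sides
intersection = gpd.overlay(squares, polygons, how="intersection")

# One square can hit several polygons, so average 'asapp' per square id
mean_asapp = intersection.groupby("id")["asapp"].mean()
```

The point with id 1 sits inside both boxes, so its value is the mean of the two; the point with id 2 touches no polygon and does not appear in the result.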

# fishnet is now a GeoDataFrame of little squares
fishnet = gpd.GeoDataFrame({'geometry': cells_in_wsg[..., 0], 'id': cells_in_wsg[..., 1]})

# intersection has only the little squares that intersect the as_applied_data polygons, together with the values of those polygons
intersection = gpd.overlay(fishnet, as_applied_data, how='intersection')

# now it is just a matter of computing the mean per square and putting it back in the fishnet with a merge
# (selecting 'asapp' keeps the non-numeric geometry column out of mean())
values = fishnet.merge(intersection.groupby('id', as_index=False)['asapp'].mean())
# values has the little squares, their id and the mean value of the intersections

Works great!
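A possible alternative that skips the little squares is a spatial join, since geopandas.sjoin accepts points on one side and polygons on the other. This is a different technique from the overlay approach above, shown only as a sketch on made-up toy data. It assumes geopandas 0.10 or newer (older versions call the `predicate` argument `op`) and that the point frame has no 'asapp' column of its own before the join, otherwise sjoin adds suffixes to the clashing names:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical centroids and polygons, same layout as in the question
points = gpd.GeoDataFrame(
    {"id": [0, 1, 2]},
    geometry=[Point(1.5, 1.5), Point(4.5, 1.5), Point(20.0, 20.0)],
)
polygons = gpd.GeoDataFrame(
    {"asapp": [10.0, 30.0]},
    geometry=[box(0, 0, 5, 5), box(3, 0, 8, 5)],
)

# One output row per (point, containing polygon) pair;
# how="left" keeps points that fall in no polygon, with asapp = NaN
joined = gpd.sjoin(
    points, polygons[["asapp", "geometry"]], how="left", predicate="within"
)

# Sum per point to mirror the += of the original loop (NaN is skipped, giving 0.0)
summed = joined.groupby("id")["asapp"].sum()
points["asapp"] = points["id"].map(summed).fillna(0.0)
```

Swapping `.sum()` for `.mean()` would give the per-point average that the overlay version computes, although points outside every polygon would then need their own handling because their mean is NaN.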