仅当它们在附近时才从点列表创建多边形

Create polygon from list of points only if they are nearby

我有一个点列表(经度和纬度),以及它们在地理数据框中的关联点几何图形。所有的点都应该能够细分为单独的多边形,因为这些点通常聚集在几个区域中。我想做的是有某种算法在这些点上循环并检查前一个点和当前点之间的距离。如果距离足够小,它会将这些点组合在一起。这个过程将一直持续到当前点太远为止。它会用这些接近的点制作一个多边形,然后继续处理下一组点。

gdf
longitude   latitude    geometry
0   -76.575249  21.157229   POINT (-76.57525 21.15723)
1   -76.575035  21.157453   POINT (-76.57503 21.15745)
2   -76.575255  21.157678   POINT (-76.57526 21.15768)
3   -76.575470  21.157454   POINT (-76.57547 21.15745)
5   -112.973177 31.317333   POINT (-112.97318 31.31733)
... ... ... ...
2222    -113.492501 47.645914   POINT (-113.49250 47.64591)
2223    -113.492996 47.643609   POINT (-113.49300 47.64361)
2225    -113.492379 47.643557   POINT (-113.49238 47.64356)
2227    -113.487443 47.643142   POINT (-113.48744 47.64314)
2230    -105.022627 48.585669   POINT (-105.02263 48.58567)

所以在上面的数据中,前 4 个点将组合在一起并变成一个多边形。然后,它将移动到下一组,依此类推。每组点不是均匀分布的,即下一组可能是 7 对点,接下来可能是 3 对。理想情况下,最终输出将是另一个地理数据框,它只是一堆多边形。

看起来像是 k-means clustering 的工作。

您可能需要注意如何定义距离(“穿过”地球的实际距离,或最短路径?)

将每个簇变成多边形取决于您想要做什么...只需将它们链接起来或寻找它们的凸包络...

您可以尝试 DBSCAN 聚类,因为它会自动找到最佳数量的聚类,并且您可以指定点之间的最大距离 ( ε )。

使用您的示例,该算法识别出两个集群。

import pandas as pd
from sklearn.cluster import DBSCAN

df = pd.DataFrame(
    [
        [-76.575249, 21.157229, (-76., 21.15723)],
        [-76.575035, 21.157453, (-76.57503, 21.15745)],
        [-76.575255, 21.157678, (-76.57526, 21.15768)],
        [-76.575470, 21.157454, (-76.57547, 21.15745)],
        [-112.973177, 31.317333, (-112.97318, 31.31733)],
        [-113.492501, 47.645914, (-113.49250, 47.64591)],
        [-113.492996, 47.643609, (-113.49300, 47.64361)],
        [-113.492379, 47.643557, (-113.49238, 47.64356)],
        [-113.487443, 47.643142, (-113.48744, 47.64314)],
        [-105.022627, 48.585669, (-105.02263, 48.58567)]
    ], columns=["longitude", "latitude", "geometry"])

clustering = DBSCAN(eps=0.3, min_samples=4).fit(df[['longitude','latitude']].values)
gdf = pd.concat([df, pd.Series(clustering.labels_, name='label')], axis=1)
print(gdf)
gdf.plot.scatter(x='longitude', y='latitude', c='label')

    longitude   latitude                geometry  label
0  -76.575249  21.157229       (-76.0, 21.15723)      0
1  -76.575035  21.157453   (-76.57503, 21.15745)      0
2  -76.575255  21.157678   (-76.57526, 21.15768)      0
3  -76.575470  21.157454   (-76.57547, 21.15745)      0
4 -112.973177  31.317333  (-112.97318, 31.31733)     -1 # not in cluster
5 -113.492501  47.645914   (-113.4925, 47.64591)      1
6 -113.492996  47.643609    (-113.493, 47.64361)      1
7 -113.492379  47.643557  (-113.49238, 47.64356)      1
8 -113.487443  47.643142  (-113.48744, 47.64314)      1
9 -105.022627  48.585669  (-105.02263, 48.58567)     -1 # not in cluster

如果我们将随机数据添加到您的数据集,运行 聚类算法,并过滤掉那些不在聚类中的数据点,您就会更清楚地了解它的工作原理。

import numpy as np
rng = np.random.default_rng(seed=42)
arr2 = pd.DataFrame(rng.random((3000, 2)) * 100, columns=['latitude', 'longitude'])
randdf = pd.concat([df[['latitude', 'longitude']], arr2]).reset_index()
clustering = DBSCAN(eps=1, min_samples=4).fit(randdf[['longitude','latitude']].values)
labels =  pd.Series(clustering.labels_, name='label')
gdf = pd.concat([randdf[['latitude', 'longitude']], labels], axis=1)

subgdf = gdf[gdf['label']> -1]
subgdf.plot.scatter(x='longitude', y='latitude', c='label', colormap='viridis', figsize=(20,10))

print(gdf['label'].value_counts())
-1     2527
 16      10
 3        8
 10       8
 50       8
       ... 
 57       4
 64       4
 61       4
 17       4
 0        4
Name: label, Length: 99, dtype: int64

从此数据框中获取聚类点会相对简单。像这样:

subgdf['point'] = subgdf.apply(lambda x: (x['latitude'], x['longitude']), axis=1)
subgdf.groupby(['label'])['point'].apply(list)

label
0     [(21.157229, -76.575249), (21.157453, -76.5750...
1     [(47.645914, -113.492501), (47.643609, -113.49...
2     [(46.67210037270342, 4.380376578722878), (46.5...
3     [(85.34030732681661, 23.393948586534073), (86....
4     [(81.40203846660347, 16.697291990770392), (82....
                            ...                        
93    [(61.419880354359925, 23.25522624430636), (61....
94    [(50.893415175135424, 90.70863269095085), (52....
95    [(88.80586950148697, 81.17523712192651), (88.6...
96    [(34.23624333000541, 40.8156668231013), (35.86...
97    [(16.10456828199399, 67.41443008931344), (15.9...
Name: point, Length: 98, dtype: object

尽管您可能需要进行某种排序以确保在绘制多边形时连接的是最近的点。

Similar SO question

DBSCAN from sklearn

Haversine Formula in Python (Bearing and Distance between two GPS points)

https://gis.stackexchange.com/questions/121256/creating-a-circle-with-radius-in-metres

您可以使用半正弦公式对一定距离内的点进行分组。使用公式为每个点(下面的函数)创建多边形,然后从点的主列表中过滤内部的点并重复直到没有更多的点。

#import modules
import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame, GeoSeries
from shapely import geometry
from shapely.geometry import Polygon, Point
from functools import partial
import pyproj
from shapely.ops import transform

#function to create polygons on radius
def polycir(lat, lon, radius):
    
    local_azimuthal_projection = """+proj=aeqd +R=6371000 +units=m +lat_0={} +lon_0= 
                                  {}""".format(lat, lon)
    
    wgs84_to_aeqd = partial(
        pyproj.transform,
        pyproj.Proj("+proj=longlat +datum=WGS84 +no_defs"),
        pyproj.Proj(local_azimuthal_projection),
    )
    
    aeqd_to_wgs84 = partial(
        pyproj.transform,
        pyproj.Proj(local_azimuthal_projection),
        pyproj.Proj("+proj=longlat +datum=WGS84 +no_defs"),
    )

    center = Point(float(lon), float(lat))
    point_transformed = transform(wgs84_to_aeqd, center)
    buffer = point_transformed.buffer(radius)
    # Get the polygon with lat lon coordinates
    circle_poly = transform(aeqd_to_wgs84, buffer)
    
    return circle_poly

#Convert df to gdf
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))

#Create circle polygons col
gdf['polycir'] = [polycir(x, y, <'Radius in Meters'>) for x, y in zip(gdf.latitude, 
                  gdf.longitude)]

gdf.set_geometry('polycir', inplace=True)


#You should be able to loop through the polygons and find the geometries that overlap with 
# gdf_filtered = gdf[gdf.polycir.within(gdf.iloc[0,4])]