Is there a way to calculate the nearest lat_longs and then club them together in Python?

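What I am trying to do is take the lat_longs whose haversine distance from each other is less than 1 km, club them into a single lat_long, and push the results into a list together with the lat_longs that are not within 1 km of any other point.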

I have already calculated the haversine distance using the haversine package.

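    def get_dist(loc_1, loc_2):
        """Return the haversine distance between two "lat,long" strings."""
        # Split each "lat,long" string into its two parts
        loc_1 = loc_1.split(",")
        loc_2 = loc_2.split(",")

        # Convert the parts to (lat, long) tuples of floats
        loc_1 = (float(loc_1[0]), float(loc_1[1]))
        loc_2 = (float(loc_2[0]), float(loc_2[1]))

        # hs is the haversine package (import haversine as hs)
        val = hs.haversine(loc_1, loc_2)

        return val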
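
For reference, the haversine formula itself can be sketched with the standard library alone. This is a minimal illustration, not the haversine package's own code; the names `haversine_km`, `parse_lat_long` and `EARTH_RADIUS_KM` are made up for this example, and the Earth radius is an assumed mean value in kilometres.

```python
import math

# Assumed mean Earth radius in kilometres (hypothetical constant name)
EARTH_RADIUS_KM = 6371.0088


def parse_lat_long(text):
    """Turn a "lat,long" string into a (lat, long) tuple of floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Calling `haversine_km(*parse_lat_long("12.97,77.59"), *parse_lat_long("12.98,77.60"))` returns the distance in kilometres, which can then be compared against the 1 km threshold.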

So basically my aim is to cluster geospatial locations in order to find the natural-gas pumps in the database.

I used DBSCAN for this.

Code:

    # Split the "lat,long" strings into separate latitude and longitude columns
    final_df[['latitude', 'longitude']] = final_df['start_cord'].str.split(",", expand=True)
    print(len(final_df))
    
    del final_df['start_cord']
    
    final_df['latitude'] = pd.to_numeric(final_df['latitude'])
    final_df['longitude'] = pd.to_numeric(final_df['longitude'])
    final_df = final_df.reset_index(drop=True)
    
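    # Use only the coordinate columns, in (lat, long) order as the haversine metric expects
    coords = final_df[['latitude', 'longitude']].to_numpy()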
    
    kms_per_radian = 6371.0088
    epsilon = 0.3 / kms_per_radian
    db = DBSCAN(eps=epsilon, min_samples=10,
                algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
    cluster_labels = db.labels_
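    # DBSCAN labels noise points as -1, so do not count that label as a cluster
    num_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)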
    clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
    print('Number of clusters: {}'.format(num_clusters))

Imports:

    import pandas as pd, numpy as np, matplotlib.pyplot as plt
    from sklearn.cluster import DBSCAN
    from geopy.distance import great_circle
    from shapely.geometry import MultiPoint
    import haversine as hs
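
To make the goal concrete (merge points that lie within 1 km of each other into one lat_long, and keep isolated points as they are), here is a rough standard-library sketch. It is one possible reading of the requirement, not a definitive solution: all function names are hypothetical, the sample coordinates are invented, the merged point is a plain arithmetic mean (reasonable only for points a few kilometres apart and away from the antimeridian), and the pairwise loop is O(n²), so it only suits small inputs. Note that grouping is transitive: if A is within 1 km of B and B is within 1 km of C, all three end up in one group even when A and C are farther apart. DBSCAN with `min_samples=1` groups points the same way, since every point then counts as a core point.

```python
import math
from collections import defaultdict

# Assumed mean Earth radius in kilometres
EARTH_RADIUS_KM = 6371.0088


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, long) tuples in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlam = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def group_within(points, max_km=1.0):
    """Group points that are linked by hops of at most max_km (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that is close enough
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= max_km:
                parent[find(i)] = find(j)

    # Collect the points under their root; isolated points form 1-element groups
    groups = defaultdict(list)
    for i, point in enumerate(points):
        groups[find(i)].append(point)
    return list(groups.values())


def centroid(group):
    """Arithmetic mean of a group's latitudes and longitudes."""
    lat = sum(p[0] for p in group) / len(group)
    lon = sum(p[1] for p in group) / len(group)
    return (lat, lon)


# Invented sample data: two nearby points and one far away
points = [(12.9716, 77.5946), (12.9720, 77.5950), (13.0827, 80.2707)]
groups = group_within(points, max_km=1.0)
merged = [centroid(g) for g in groups]
```

Here `merged` holds one lat_long per group: the averaged position for the two nearby points, and the third point unchanged because nothing else is within 1 km of it.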