How to measure pairwise distances between two sets of points?

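I have two datasets (CSV files). Each contains the latitude and longitude of a set of points (220 points in one, 4400 in the other). I want to measure the pairwise distances, in miles, between the two sets (220 x 4400). How can I do this in Python? It is similar to this problem: https://gist.github.com/rochacbruno/2883505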

The best option is sklearn, which does exactly what you are asking for.

Say we have some sample data:

import numpy as np
import pandas as pd

towns = pd.DataFrame({
    "name" : ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat" : [36.01, 41.32, 40.84],
    "long" : [-76.7, -89.20, -73.15]
})

museum = pd.DataFrame({
    "name" : ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
    "lat" : [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long" : [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531]
})

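You can use sklearn's distance metrics, which include an implementation of the haversine distance:

# DistanceMetric lives in sklearn.metrics since scikit-learn 1.0;
# on older versions, import it from sklearn.neighbors instead.
from sklearn.metrics import DistanceMetric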

dist = DistanceMetric.get_metric('haversine')

After extracting the values as NumPy arrays
places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values

you just do

EARTH_RADIUS = 6371.009  # mean Earth radius in km

haversine_distances = dist.pairwise(np.radians(places_gps), np.radians(museum_gps) )
haversine_distances *= EARTH_RADIUS

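to get the distances in km. The haversine metric expects [lat, long] pairs in radians and returns angles in radians, hence the conversion and the multiplication by the Earth's radius. If you need miles, multiply by a constant (1 km is about 0.621371 miles).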
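As a cross-check on the sklearn result, here is a minimal NumPy-only sketch of the same haversine computation that returns miles directly. The function name `pairwise_haversine_miles` is just an illustrative choice, and the coordinates are a subset of the sample data above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.009   # mean Earth radius in km
KM_TO_MILES = 0.621371       # 1 km is about 0.621371 statute miles


def pairwise_haversine_miles(a_deg, b_deg):
    """Pairwise great-circle distances in miles.

    a_deg has shape (n, 2) and b_deg has shape (m, 2); each row is
    [lat, long] in degrees. The result has shape (n, m).
    """
    a = np.radians(np.asarray(a_deg, dtype=float))[:, None, :]  # (n, 1, 2)
    b = np.radians(np.asarray(b_deg, dtype=float))[None, :, :]  # (1, m, 2)
    dlat = b[..., 0] - a[..., 0]
    dlon = b[..., 1] - a[..., 1]
    # Haversine formula: h is the haversine of the central angle
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[..., 0]) * np.cos(b[..., 0]) * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point overshoot before the sqrt/arcsin
    central_angle = 2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
    return central_angle * EARTH_RADIUS_KM * KM_TO_MILES


# Subset of the sample data: three towns, two museums
towns_gps = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])
museums_gps = np.array([[38.644302, -90.261154], [40.778965, -73.962311]])

miles = pairwise_haversine_miles(towns_gps, museums_gps)  # shape (3, 2)
```

With the sklearn array from above, the equivalent conversion is simply `haversine_distances * 0.621371`, applied after the multiplication by `EARTH_RADIUS`. Broadcasting builds the full n x m grid in memory, which is fine for 220 x 4400 points.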

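If you are only interested in the nearest few points, or in all points within a given radius, have a look at sklearn's BallTree, which also supports the haversine metric. It is much faster than computing the full distance matrix.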
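A minimal sketch of the BallTree approach, assuming you want the single nearest museum for each town plus everything within 100 km. The 100 km radius and the three-museum subset are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.009  # mean Earth radius in km

# [lat, long] in degrees, taken from the sample data above
towns_gps = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])
museums_gps = np.array([
    [38.644302, -90.261154],  # World Chess Hall Of Fame, St.Louis
    [38.887806, -77.019844],  # National Air and Space Museum, Washington
    [40.778965, -73.962311],  # The Metropolitan Museum of Art
])

# The haversine metric expects [lat, long] in radians
tree = BallTree(np.radians(museums_gps), metric="haversine")

# Nearest museum for each town; distances come back in radians
dist_rad, idx = tree.query(np.radians(towns_gps), k=1)
nearest_km = dist_rad[:, 0] * EARTH_RADIUS_KM
nearest_idx = idx[:, 0]

# All museums within 100 km of each town; the radius must be in radians too
radius_km = 100.0
within = tree.query_radius(np.radians(towns_gps), r=radius_km / EARTH_RADIUS_KM)
```

`nearest_idx` holds row positions in `museums_gps`, and `within` is an array of index arrays, one per town. The tree is built once on the larger set and queried with the smaller one, so the full 220 x 4400 matrix is never materialized.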


Edit: to convert the output to a DataFrame, use for example

pd_distances = pd.DataFrame(haversine_distances, columns=museum["name"], index=towns["name"])
pd_distances
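Once the distances are in a labelled DataFrame, pulling out the closest museum per town is a one-liner. The sketch below is self-contained and, as an alternative to `DistanceMetric`, uses the `haversine_distances` function from `sklearn.metrics.pairwise`; the `nearest` table and its column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.009  # mean Earth radius in km

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["World Chess Hall Of Fame, St.Louis",
             "National Air and Space Museum, Washington",
             "The Metropolitan Museum of Art"],
    "lat": [38.644302, 38.887806, 40.778965],
    "long": [-90.261154, -77.019844, -73.962311],
})

# Inputs are [lat, long] in radians; the result is in radians, so scale to km
km = haversine_distances(
    np.radians(towns[["lat", "long"]].values),
    np.radians(museum[["lat", "long"]].values),
) * EARTH_RADIUS_KM

pd_distances = pd.DataFrame(km, index=towns["name"].values,
                            columns=museum["name"].values)

# For each town (row), the column label and value of the smallest distance
nearest = pd.DataFrame({
    "nearest_museum": pd_distances.idxmin(axis=1),
    "distance_km": pd_distances.min(axis=1),
})
```

`idxmin(axis=1)` returns the column label of the minimum in each row, so the result reads as town to museum name rather than as integer positions.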