Group GPS points with Pandas

Question

I have a Pandas DataFrame of towers, for example:

site       lat      lon
18ALOP01   11.1278  14.3578
18ALOP02   11.1278  14.3578
18ALOP12   11.1288  14.3575
18PENO01   11.1580  14.2898

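I need to group them when they are too close together (within 50 m). So I wrote a script that performs a "self cross join", computes the distance between every combination of sites, and assigns the same id to sites whose distance is below the threshold. With n sites it computes n^2 - n combinations, which makes it a poor algorithm. Is there a better way?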
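For reference, the pairwise "self cross join" approach described above might look roughly like this. It is a hypothetical reconstruction, not the asker's actual script: it assumes pandas 1.2+ for merge(how="cross"), computes haversine distances with an assumed mean Earth radius of 6,371 km, and uses a small union-find (an illustrative choice) so that chains of close sites share one id. The helper names haversine_m and group_by_cross_join are made up for this sketch. It still materializes all n * n pairs, which is exactly the cost the question wants to avoid.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; inputs are latitudes/longitudes in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def group_by_cross_join(df, threshold_m=50.0):
    """Hypothetical O(n^2) baseline: compare every pair of sites."""
    left = df.assign(pos=np.arange(len(df)))
    pairs = left.merge(left, how="cross", suffixes=("_a", "_b"))  # n * n rows
    pairs = pairs[pairs["pos_a"] < pairs["pos_b"]]  # keep each unordered pair once
    dist = haversine_m(pairs["lat_a"], pairs["lon_a"], pairs["lat_b"], pairs["lon_b"])
    close = pairs[dist < threshold_m]

    # Union-find: sites linked by a chain of close pairs end up with one root
    parent = list(range(len(df)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in zip(close["pos_a"], close["pos_b"]):
        parent[find(a)] = find(b)

    roots = [find(i) for i in range(len(df))]
    # Relabel roots as 0, 1, 2, ... in order of first appearance
    return pd.Series(pd.factorize(np.array(roots))[0], index=df.index, name="group_id")


sites = pd.DataFrame({
    "site": ["18ALOP01", "18ALOP02", "18ALOP12", "18PENO01"],
    "lat": [11.1278, 11.1278, 11.1288, 11.1580],
    "lon": [14.3578, 14.3578, 14.3575, 14.2898],
})
sites["group_id"] = group_by_cross_join(sites)
print(sites)
```

With this sample, only 18ALOP01 and 18ALOP02 (identical coordinates) share an id; 18ALOP12 is roughly 116 m from them, so it stays separate at a 50 m threshold.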

Answer 1

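Assuming the number of sites and their "true" locations are unknown, you could try the MeanShift clustering algorithm. While it is a general-purpose algorithm and not highly scalable, it will be faster than implementing your own clustering algorithm in Python, and you could experiment with bin_seeding=True as an optimization, if binning data points into a grid is an acceptable shortcut for pruning the starting seeds. (Note: if binning data points into a grid, rather than computing Euclidean distances between points, is an acceptable "full" solution, then that seems like the fastest way to solve your problem.)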
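If grid binning is acceptable as the full solution, a minimal sketch could look like the following. It assumes an equirectangular projection around the data's mean position (reasonable only over small areas) and a mean Earth radius of 6,371 km; the names to_local_meters and group_by_grid are hypothetical. It runs in O(n), but it is only an approximation of "within 50 m": two points 1 m apart can land on opposite sides of a cell edge and get different ids, while two points in the same 50 m cell can be up to about 71 m apart (the cell diagonal).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def to_local_meters(df):
    """Approximate x/y in meters with an equirectangular projection.

    Centered on the data's mean position; only reasonable over small areas.
    """
    lat0 = np.radians(df["lat"].mean())
    x = np.radians(df["lon"] - df["lon"].mean()) * EARTH_RADIUS_M * np.cos(lat0)
    y = np.radians(df["lat"] - df["lat"].mean()) * EARTH_RADIUS_M
    return x, y


def group_by_grid(df, cell_m=50.0):
    """One id per occupied cell_m x cell_m grid cell."""
    x, y = to_local_meters(df)
    cells = pd.DataFrame({
        "cx": np.floor(x / cell_m).astype(int),
        "cy": np.floor(y / cell_m).astype(int),
    }, index=df.index)
    # sort=False numbers the cells in order of first appearance
    return cells.groupby(["cx", "cy"], sort=False).ngroup()


sites = pd.DataFrame({
    "site": ["18ALOP01", "18ALOP02", "18ALOP12", "18PENO01"],
    "lat": [11.1278, 11.1278, 11.1288, 11.1580],
    "lon": [14.3578, 14.3578, 14.3575, 14.2898],
})
sites["group_id"] = group_by_grid(sites)
print(sites)
```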

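Below is an example using the scikit-learn implementation of MeanShift, where the x/y coordinates are in meters and the algorithm uses a bandwidth (kernel radius) of 50 m to form the clusters.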

In [2]: from sklearn.cluster import MeanShift

In [3]: import numpy as np

In [4]: X = np.array([
   ...:     [0, 1], [51, 1], [100, 1], [151, 1],
   ...: ])

In [5]: clustering = MeanShift(bandwidth=50).fit(X)  # OR speed up with bin_seeding=True

In [6]: print(clustering.labels_)
[1 0 0 2]

In [7]: print(clustering.cluster_centers_)
[[ 75.5   1. ]
 [  0.    1. ]
 [151.    1. ]]
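To apply the same idea to the lat/lon columns from the question, the degrees first need to be converted to meters so that bandwidth=50 really means 50 m. The sketch below assumes a simple equirectangular projection around the mean position (adequate over a few kilometres, not for large areas) and a mean Earth radius of 6,371 km; projecting to a proper metric CRS would be more accurate. As in the session above, bin_seeding=True can be passed to MeanShift to speed up seeding.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

sites = pd.DataFrame({
    "site": ["18ALOP01", "18ALOP02", "18ALOP12", "18PENO01"],
    "lat": [11.1278, 11.1278, 11.1288, 11.1580],
    "lon": [14.3578, 14.3578, 14.3575, 14.2898],
})

# Approximate local x/y in meters (equirectangular, centered on the mean position)
lat0 = np.radians(sites["lat"].mean())
X = np.column_stack([
    np.radians(sites["lon"] - sites["lon"].mean()) * EARTH_RADIUS_M * np.cos(lat0),
    np.radians(sites["lat"] - sites["lat"].mean()) * EARTH_RADIUS_M,
])

# bandwidth is in the same units as X, i.e. meters
clustering = MeanShift(bandwidth=50).fit(X)  # OR speed up with bin_seeding=True
sites["group_id"] = clustering.labels_
print(sites)
```

Here 18ALOP01 and 18ALOP02 fall into one cluster and the other two sites each form their own; which of those two singleton clusters gets which label number can depend on the scikit-learn version.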
