Best method to cluster coordinates around set centroids (Improving Scikit K-Means output? Naive methods?)

Question

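So basically I have two lists of coordinates: one of "home" points (essentially centroids) and one of "destination" points. I want to cluster the "destination" coordinates around their nearest "home" point (as if the "home" points were the centroids). Below is an example of what I want: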

Input:
[home_coords_1, home_coords_2, home_coords_3]
[destination_coords_1, destination_coords_2, destination_coords_3, destination_coords_4, destination_coords_5]

Output:
[[home_coords_1, destination_coords_2, destination_coords_5], [home_coords_2, destination_coords_4], [home_coords_3, destination_coords_1, destination_coords_3]]
given that the "destination" coordinates are closest to the "home" coordinate in their sub-array.

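I have already done this by passing the home coordinates as the initial centroids to the K-Means clustering function in the scikit-learn Python package. However, I've noticed some flaws in the clustering. It also feels like an improper use of K-Means, since only one iteration takes place (see the line of code below).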

km = KMeans(n_clusters=len(home_coords_list), n_init=1, init=home_coords).fit(destination_coords)
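One likely source of the "flaws": K-Means does not keep the initial centroids fixed. In scikit-learn, `n_init` is the number of independent initializations, not the number of iterations; each run repeats assign-then-update steps (bounded by `max_iter`, default 300), and every update moves each centroid to the mean of its assigned points. The following is a minimal NumPy sketch of a single such (Lloyd) step on small hypothetical data, showing that the centers drift away from the home points after one update:

```python
import numpy as np

# Hypothetical sample data: 3 fixed "home" points and 6 destinations
home = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
destination = np.array([
    [1.0, 0.0], [0.0, 2.0],
    [6.0, 5.0], [5.0, 7.0],
    [9.0, 1.0], [11.0, 0.0],
])

def lloyd_step(centers, points):
    """One K-Means (Lloyd) iteration: assign points, then move each center to its cluster mean."""
    # Squared Euclidean distances, shape (n_points, n_centers)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[labels == k]
        if len(members):  # leave an empty cluster's center where it was
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers

labels, centers = lloyd_step(home, destination)
print(labels)   # assignment relative to the original home points
print(centers)  # centers after one update: no longer equal to `home`
```

Because later iterations assign points to these moved centers, the final labels are not guaranteed to match "nearest home point".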

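This brings me to my question: what is the best way to cluster a list of coordinates around a preset list of coordinates? Another option I'm considering is to loop through the list of "home" coordinates and, one by one, pick the n closest "destination" coordinates. That seems much more naive, though. Any thoughts or suggestions? Any help is appreciated. Thanks!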
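A quick sketch of the "pick the n closest destinations for each home" idea, on hypothetical data, shows a drawback: each home chooses independently, so the same destination can be claimed by several homes while others are never assigned. The function name `n_nearest_per_home` is purely illustrative:

```python
import numpy as np

# Hypothetical sample data (not from the question)
home = np.array([[0.0, 0.0], [2.0, 0.0]])
destination = np.array([[1.0, 0.0], [0.9, 0.0], [5.0, 0.0], [-3.0, 0.0]])

def n_nearest_per_home(home, destination, n):
    """For each home point independently, return indices of its n nearest destinations."""
    # Euclidean distances, shape (n_home, n_destination)
    dist = np.linalg.norm(home[:, None, :] - destination[None, :, :], axis=2)
    return [np.argsort(row)[:n].tolist() for row in dist]

picks = n_nearest_per_home(home, destination, n=2)
print(picks)  # both homes claim destinations 0 and 1; 2 and 3 are left out
```

Assigning each destination to its single nearest home avoids both problems, since every destination gets exactly one label.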

Answer 1

You can use, for example, scipy.spatial.KDTree:

from scipy.spatial import KDTree
import numpy as np

# sample arrays with home and destination coordinates
np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

kd_tree = KDTree(home)
labels = kd_tree.query(destination)[1]
print(labels)

This gives an array with, for each destination point, the index of its nearest home point:

[9 0 8 8 1 2 2 8 1 5 2 4 0 7 2 1 4 7 1 1 7 4 7 4 4 4 5 4 7 7 2 8 1 7 6 2 8
 7 7 4 5 9 2 1 3 3 5 5 5 5]
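The KDTree lookup computes the same nearest-home assignment as a brute-force distance matrix followed by a per-row argmin; the tree just avoids building the full (n_destination × n_home) matrix. A small sketch comparing the two on the same random sample arrays:

```python
import numpy as np
from scipy.spatial import KDTree

np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

# Brute force: full (n_destination, n_home) distance matrix, then argmin per row
dist = np.linalg.norm(destination[:, None, :] - home[None, :, :], axis=2)
brute_labels = dist.argmin(axis=1)

# KDTree.query returns (distances, indices); the indices are the labels
kd_dist, kd_labels = KDTree(home).query(destination)

print(np.array_equal(brute_labels, kd_labels))
```

For a few hundred points the brute-force version is perfectly adequate; the KDTree pays off as the number of points grows.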

Then, for any given home point, you can find the coordinates of all the destination points clustered with it:

# destination points clustered with `home[0]`
destination[labels == 0]

which gives:

array([[0.46147936, 0.78052918],
       [0.66676672, 0.67063787]])
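To get exactly the nested layout from the question (each sub-list starting with a home point followed by its destinations), the boolean-mask selection can be applied to every home index. A minimal sketch, reusing the same random sample arrays:

```python
import numpy as np
from scipy.spatial import KDTree

np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

labels = KDTree(home).query(destination)[1]

# Build the nested layout from the question:
# [[home_i, dest, dest, ...], ...], one sub-list per home point
clusters = [
    [home[i].tolist()] + destination[labels == i].tolist()
    for i in range(len(home))
]

for i, group in enumerate(clusters):
    print(i, len(group) - 1, "destination(s)")
```

Homes with no nearby destination simply end up as one-element sub-lists.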


python

numpy

coordinates

k-means

scikit-learn