Generate 'K' Nearest Neighbours to a datapoint

Question

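I need to generate the K nearest neighbours of a given datapoint. I read the sklearn.neighbors module of sklearn, but it seems to generate neighbours between two sets of data. What I want is probably a list of the 100 datapoints closest to the datapoint I pass in.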

In any case, any KNN algorithm must find these K datapoints under the hood. Is there any way to return these K points as output?
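The point that any KNN method has to locate the K points internally can be seen in a brute-force version. This is a minimal sketch, assuming the data is a 2-D NumPy array with one row per point, Euclidean distance, and 1 <= k <= number of rows; the helper `k_nearest` is a hypothetical name, not a library API. It measures the distance from the query to every row, keeps the k smallest with `np.argpartition`, and sorts only those k.

```python
import numpy as np

def k_nearest(data, query, k):
    """Hypothetical helper (not a library API): return (points, indices, distances)
    of the k rows of `data` closest to `query`, by brute-force Euclidean distance."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Distance from the query to every row: O(n * d) work per query
    dists = np.linalg.norm(data - query, axis=1)
    # argpartition moves the k smallest distances to the front, in arbitrary order
    idx = np.argpartition(dists, k - 1)[:k]
    # Sort only those k candidates, nearest first
    idx = idx[np.argsort(dists[idx])]
    return data[idx], idx, dists[idx]

# Example data: a 5x6 grid of (x, y) points
x, y = np.mgrid[0:5, 2:8]
grid = np.column_stack([x.ravel(), y.ravel()])
points, idx, dists = k_nearest(grid, [2.1, 2.9], k=3)
print(points)
```

This scans all n points for every query, which is fine for small data; the tree-based indexes in the answers below avoid the full scan.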

Answer 1

You don't need to look under the hood.

Use a kd-tree for nearest-neighbor lookup. Once you have the index built, you query it for the k nearest neighbours.

For example:

>>> import numpy as np
>>> from scipy import spatial
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
>>> pts = np.array([[0, 0], [2.1, 2.9]])
>>> tree.query(pts)
(array([ 2.        ,  0.14142136]), array([ 0, 13]))
>>> tree.query(pts[0])
(2.0, 0)
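The example above uses the default k=1, so each query returns only the single nearest point. `KDTree.query` also takes a `k` argument. A minimal sketch for the question's case, assuming the data sits in a NumPy array with one row per point (`data` and `K` are illustrative names), uses the returned indices to pull out the neighbour points themselves:

```python
import numpy as np
from scipy import spatial

# Same 5x6 grid as in the example above, one (x, y) row per point
x, y = np.mgrid[0:5, 2:8]
data = np.column_stack([x.ravel(), y.ravel()])

tree = spatial.KDTree(data)               # build the index once
K = 3                                     # number of neighbours wanted
dist, idx = tree.query([2.1, 2.9], k=K)   # distances and row indices, nearest first
neighbours = data[idx]                    # the K nearest points, shape (K, 2)
print(neighbours)
```

If the query point is itself one of the rows, it comes back as its own nearest neighbour at distance 0, so ask for K+1 and drop the first result if you want K other points.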

Answer 2

from sklearn.neighbors import NearestNeighbors

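This gives you the indices of the k nearest neighbours in your dataset. kneighbors returns a pair: the first value is the distances, the second is the indices of the neighbours. From the documentation: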

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]])) 
(array([[0.5]]), array([[2]]))
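To get the question's list of K points rather than a single neighbour, set `n_neighbors` to K and index the fitted data with the returned indices. A minimal sketch, assuming the data is a NumPy array (the names `X`, `query` and `K` and the random data are illustrative only):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((1000, 3))              # illustrative data: 1000 points in 3-D
query = np.array([[0.5, 0.5, 0.5]])    # kneighbors expects 2-D input: one row per query

K = 100
nn = NearestNeighbors(n_neighbors=K).fit(X)   # fit() returns the fitted estimator
distances, indices = nn.kneighbors(query)     # both shape (1, K), nearest first
k_points = X[indices[0]]                      # the K nearest points, shape (K, 3)
```

If `X` was built from a pandas DataFrame `df` (for example with `df.to_numpy()`), `df.iloc[indices[0]]` selects the same K rows with their original labels.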

Tags: python, nearest-neighbor, knn, pandas, scikit-learn