Compute all n-th closest points of all points in a dataset

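I have a dataset P of 1000 points in the plane. I have enumerated all possible pairs of points in P and computed the distance for every pair. What I want to do is: for a given n, compute the n-th closest point for every point p in P.

What I have done so far: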

P_pairs = [((33, 9), (34, 13)), ((33, 9), (62, 119)), ((33, 9), (33, 7)), ((33, 9), (48, 123)), ...]

listofdistances =  [{'((33, 9), (34, 13))': 4.123105625617661}, {'((33, 9), (62, 119))': 113.75851616472501}, {'((33, 9), (33, 7))': 2.0}, ...]

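At this point I am stuck on sorting listofdistances so that, for each point, only the n smallest distances remain.

Maybe I should compute the n-th closest points directly instead of computing the distances between all points, but I am not sure how.

Building a list of all possible pairs and then a list of single-key dictionaries with the distance as the value does make sorting awkward. Instead, I would vectorize the work and use numpy.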

import numpy as np

P = np.array([(33, 9), (34, 13), (62, 119), ...])

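# Returns the distances from p to its n closest points in P, nearest first
def n_closest_points(p, P, n):
    # Repeat p once per row of P so both arrays have the same shape
    p_vector = np.tile(p, (len(P), 1))
    dists = np.linalg.norm(P - p_vector, axis=1)
    sorted_dists = np.sort(dists)

    # Exclude the 0th element as the distance from p to itself is 0
    # (this assumes p is one of the points in P)
    return sorted_dists[1:n+1]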
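
The function above returns distances rather than the points themselves. As a rough sketch, assuming p is one of the rows of P and has no exact duplicate in P, a variant based on np.argsort can return the n-th closest point together with its distance. The name nth_closest_point is made up for this illustration, and the five sample points are taken from the question:

```python
import numpy as np

def nth_closest_point(p, P, n):
    """Return (point, distance) for the n-th closest point to p in P (n = 1 is the nearest).

    Assumption: p is itself a row of P and has no exact duplicate,
    so position 0 of the sorted order is p at distance 0.
    """
    P = np.asarray(P)
    # Broadcasting subtracts p from every row, so np.tile is not needed
    dists = np.linalg.norm(P - np.asarray(p), axis=1)
    order = np.argsort(dists)   # indices of P, nearest first
    j = order[n]                # order[0] is p itself, so order[n] is the n-th neighbour
    return P[j], dists[j]

P = np.array([(33, 9), (34, 13), (62, 119), (33, 7), (48, 123)])
n = 2
# One (point, distance) pair per point in P
results = [nth_closest_point(p, P, n) for p in P]
for p, (q, d) in zip(P, results):
    print(p, "->", q, round(float(d), 3))
```

This calls the function once per point, which should be manageable for 1000 points; n must be at most len(P) - 1.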

import numpy as np

P = [(33, 9), (34, 13), (62, 119), (33, 7), (48, 123)]
P = np.array(P)

x, y = P[:,0], P[:,1]
# Create a distance table of point (row) vs point (column)
dist = np.sqrt((x - x[:,None])**2 + (y - y[:,None])**2)
# The diagonals are 0, as the distance of a point to itself is 0,
# but we want that to have a large value so it comes last in sorting
np.fill_diagonal(dist, np.inf)
# Get the sorted index for each row
idx = dist.argsort(axis=1)

Now, if you want the n nearest neighbours, say n = 3, you can get them with idx = idx[:,:3]. For the first point, you can now do

P[0]             # the point itself
P[idx[0]]        # its nearest neighbours
dist[0,idx[0]]   # their distances
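
To pull out only the n-th nearest neighbour of every point, rather than the first n, column n - 1 of the sorted index should be enough. A minimal sketch of that step for all points at once, wrapped in a hypothetical helper nth_nearest (not part of the answer above) and assuming 1 <= n <= len(P) - 1:

```python
import numpy as np

def nth_nearest(P, n):
    """Return (points, distances): the n-th nearest neighbour of every row of P.

    Hypothetical helper for illustration; assumes 1 <= n <= len(P) - 1.
    """
    P = np.asarray(P)
    x, y = P[:, 0], P[:, 1]
    # Distance table of point (row) vs point (column)
    dist = np.sqrt((x - x[:, None]) ** 2 + (y - y[:, None]) ** 2)
    # Push each point's distance to itself to the end of the sort order
    np.fill_diagonal(dist, np.inf)
    idx = dist.argsort(axis=1)
    # Column n - 1 holds the index of the n-th nearest neighbour of each row
    col = idx[:, n - 1]
    rows = np.arange(len(P))
    return P[col], dist[rows, col]

P = np.array([(33, 9), (34, 13), (62, 119), (33, 7), (48, 123)])
points, distances = nth_nearest(P, 3)
for p, q, d in zip(P, points, distances):
    print(p, "->", q, round(float(d), 3))
```

If only that one column is needed, np.argpartition(dist, n - 1, axis=1)[:, n - 1] should yield the same distances without fully sorting each row, which may matter for larger datasets.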