Find distance to nearest neighbor in 2d array

Question

I have a 2D array, and for each (x, y) point I want to find the distance to its nearest neighbor as quickly as possible.

import numpy as np
from scipy.spatial.distance import cdist

# Random data
data = np.random.uniform(0., 1., (1000, 2))
# Distance between the array and itself
dists = cdist(data, data)
# Sort each row in ascending order
dists.sort()
# Take column 1: after sorting, column 0 is always 0
# (the distance of each point to itself)
nn_dist = dists[:, 1]

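This works, but I feel it does too much work. KDTree should be able to handle this, but I'm not sure how. I'm not interested in the coordinates of the nearest neighbor; I only want the distance (and as fast as possible).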
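Much of the extra work in the brute-force version above is the full row sort: only the smallest non-self distance in each row is needed. A minimal sketch that keeps `cdist` but replaces the sort with a row minimum (the helper name `nn_dist_bruteforce` is hypothetical): set the diagonal, i.e. each point's zero distance to itself, to infinity, then take `min(axis=1)`. This is still O(N²) in time and memory, because `cdist` builds the full N × N float64 matrix (8 MB for 1000 points), so it only removes the O(N² log N) sorting cost.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nn_dist_bruteforce(points):
    """Distance from each point to its nearest other point (hypothetical helper)."""
    # Full N x N matrix of pairwise Euclidean distances
    dists = cdist(points, points)
    # Exclude each point's zero distance to itself
    np.fill_diagonal(dists, np.inf)
    # The row minimum is the nearest-neighbor distance; no sort needed
    return dists.min(axis=1)

# Random data, as in the question
data = np.random.uniform(0., 1., (1000, 2))
nn_dist = nn_dist_bruteforce(data)
```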

Answer 1

KDTree can do this, and the procedure is almost the same as with cdist. However, cdist is much faster than KDTree. As pointed out in the comments, cKDTree is faster still:

import numpy as np
from scipy.spatial.distance import cdist
from scipy.spatial import KDTree
from scipy.spatial import cKDTree
import timeit

# Random data
data = np.random.uniform(0., 1., (1000, 2))

def scipy_method():
    # Distance between the array and itself
    dists = cdist(data, data)
    # Sort each row in ascending order
    dists.sort()
    # Take column 1: after sorting, column 0 is always 0
    # (the distance of each point to itself)
    nn_dist = dists[:, 1]
    return nn_dist

def KDTree_method():
    # You have to create the tree to use this method.
    tree = KDTree(data)
    # Query the two closest points, since the first is always the point itself
    dists = tree.query(data, 2)
    nn_dist = dists[0][:, 1]
    return nn_dist

def cKDTree_method():
    tree = cKDTree(data)
    dists = tree.query(data, 2)
    nn_dist = dists[0][:, 1]
    return nn_dist

print(timeit.timeit('cKDTree_method()', number=100, globals=globals()))
print(timeit.timeit('scipy_method()', number=100, globals=globals()))
print(timeit.timeit('KDTree_method()', number=100, globals=globals()))

Output (total seconds for 100 calls each, in the order printed: cKDTree, cdist, KDTree):

0.34952507635557595
7.904083715193579
20.765962179145546

Once again, proof that C is great!
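The tree-based query above can be wrapped in a small reusable helper and cross-checked against the question's brute-force result. A minimal sketch, assuming SciPy's `cKDTree` as used in the answer (the helper name `nn_dist_kdtree` is hypothetical): `query(points, k=2)` returns a `(distances, indices)` tuple; column 0 of the distances is each point's zero distance to itself, so column 1 is the nearest-neighbor distance and the indices can be discarded.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

def nn_dist_kdtree(points):
    """Nearest-neighbor distance for each point via a k-d tree (hypothetical helper)."""
    tree = cKDTree(points)
    # k=2: the closest hit is the point itself, the second is its nearest neighbor
    dists, _ = tree.query(points, k=2)
    return dists[:, 1]

data = np.random.uniform(0., 1., (1000, 2))
fast = nn_dist_kdtree(data)

# Cross-check against the brute-force cdist + sort approach from the question
slow = cdist(data, data)
slow.sort()
print(np.allclose(fast, slow[:, 1]))
```

Building the tree costs O(N log N) and each query is roughly O(log N) for well-spread 2D data, which is why the tree approach pulls further ahead of the O(N²) `cdist` version as N grows.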
