Find distance to nearest neighbor in 2d array
I have a 2D array, and for each (x, y) point I want to find the distance to its nearest neighbor as fast as possible.
I can do this with scipy.spatial.distance.cdist:
import numpy as np
from scipy.spatial.distance import cdist
# Random data
data = np.random.uniform(0., 1., (1000, 2))
# Distance between the array and itself
dists = cdist(data, data)
# Sort each row in ascending order (in place)
dists.sort()
# Take column 1: column 0 always holds 0, the distance of each point to itself
nn_dist = dists[:, 1]
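This works, but it feels like it does too much work. I suspect a KDTree should be able to handle this, but I'm not sure how. I'm not interested in the coordinates of the nearest neighbor; I only want the distance (and as fast as possible).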
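Even before reaching for a tree, the full row sort in the cdist approach does more work than needed, because only the smallest non-self distance per row is used. The sketch below masks the diagonal with infinity and then takes the row minimum. The helper name `nn_dist_bruteforce` is hypothetical. The approach still builds the full n x n distance matrix, so memory grows as O(n²).

```python
import numpy as np
from scipy.spatial.distance import cdist


def nn_dist_bruteforce(points):
    """Distance from each point to its nearest *other* point (hypothetical helper)."""
    d = cdist(points, points)      # full n x n Euclidean distance matrix
    np.fill_diagonal(d, np.inf)    # exclude each point's zero distance to itself
    return d.min(axis=1)           # one O(n^2) scan instead of sorting every row


# Usage on random data shaped like the question's
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, (1000, 2))
nn_dist = nn_dist_bruteforce(data)
```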
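A KDTree can do this, and the procedure is almost the same as with cdist. In the benchmark below, however, cdist is much faster than KDTree. As pointed out in the comments, cKDTree is faster still: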
import numpy as np
from scipy.spatial.distance import cdist
from scipy.spatial import KDTree
from scipy.spatial import cKDTree
import timeit

# Random data
data = np.random.uniform(0., 1., (1000, 2))

def scipy_method():
    # Distance between the array and itself
    dists = cdist(data, data)
    # Sort each row in ascending order (in place)
    dists.sort()
    # Take column 1: column 0 always holds 0, the distance of each point to itself
    nn_dist = dists[:, 1]
    return nn_dist

def KDTree_method():
    # You have to build the tree before you can query it.
    tree = KDTree(data)
    # Ask for the two closest points, since the first one is the point itself.
    # query() returns a (distances, indices) tuple.
    dists = tree.query(data, 2)
    nn_dist = dists[0][:, 1]
    return nn_dist

def cKDTree_method():
    # Same procedure as KDTree_method, using the C implementation
    tree = cKDTree(data)
    dists = tree.query(data, 2)
    nn_dist = dists[0][:, 1]
    return nn_dist

# Total time in seconds for 100 calls of each method
print(timeit.timeit('cKDTree_method()', number=100, globals=globals()))
print(timeit.timeit('scipy_method()', number=100, globals=globals()))
print(timeit.timeit('KDTree_method()', number=100, globals=globals()))
Output (total seconds for 100 calls, in the order cKDTree, cdist, KDTree):
0.34952507635557595
7.904083715193579
20.765962179145546
Once again, proof that C is awesome!
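A more compact variant is possible if we assume SciPy 1.6 or later. According to the SciPy documentation, `KDTree` and `cKDTree` are functionally identical from that release on, so the KDTree/cKDTree gap measured above should largely disappear. Instead of asking for the two nearest neighbors and discarding the first, the sketch passes `k=[2]` so that `query` returns only the second-nearest neighbor of each point. That neighbor is the nearest *other* point. The helper name `nn_dist_kdtree` is hypothetical, and the cross-check against the cdist result is only there to show that both approaches agree.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist


def nn_dist_kdtree(points):
    """Nearest-neighbor distance per point via a k-d tree (hypothetical helper).

    Assumes SciPy >= 1.6, where KDTree is backed by the compiled cKDTree code.
    """
    tree = KDTree(points)
    # k=[2] requests only the 2nd-nearest neighbor; the 1st is the point itself.
    dist, _ = tree.query(points, k=[2])
    return dist[:, 0]  # result has shape (n, 1); flatten to (n,)


# Cross-check against the brute-force cdist approach on random data
rng = np.random.default_rng(42)
data = rng.uniform(0.0, 1.0, (1000, 2))
brute = np.sort(cdist(data, data), axis=1)[:, 1]
assert np.allclose(nn_dist_kdtree(data), brute)
```

The tree never materializes the n x n matrix, so for larger inputs it avoids the O(n²) memory cost that both cdist variants pay.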