为什么 Scipy 的 KDTree 这么慢？

Question

假设我有大约 100 组 100 个点，我想找出哪些点彼此之间的给定距离内。我有两种实现方式，一种使用 k-d 树，另一种简单地获取成对距离：

from scipy.spatial.distance import cdist
from scipy.spatial import KDTree
from itertools import combinations
import numpy
import time

pts = [numpy.random.randn(100,2) for x in range(100)]


start = time.time()

for p1, p2 in combinations(pts,2):
    numpy.argwhere(cdist(p1, p2) < 0.5)

print(time.time() - start)


start = time.time()

trees = [KDTree(x) for x in pts]

for p1, p2 in combinations(trees,2):
    p1.query_ball_tree(p2,0.5,eps=1)

print(time.time() - start)

在我的机器上 cdist 需要 0.5 秒，而 KDTree 执行需要整整一分钟。构建树需要 0.03 秒。我希望 KDTree 方法更快，因为它不需要考虑每一对可能的点。

所以，我误解了什么，可以更快地完成吗？

Answer 1

很纯粹python。替代实现 cKDTree 快得多。

为什么 Scipy 的 KDTree 这么慢？

Why is Scipy's KDTree so slow?

python

numpy

kdtree

scipy