Scikit-learn KDTree query_radius: return both count and ind?

Question

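I am trying to return both count (the number of neighbors) and ind (the indices of those neighbors), but I can't unless I call query_radius twice. Although that is computationally expensive, it is actually faster than iterating in Python and computing the size of each row! This seems very inefficient, so I'm wondering whether there is a way to return both in a single call.

After calling query_radius, I tried to access count and ind attributes on the tree, but they don't exist. There is no efficient way to do this in numpy, is there?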

>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> array = np.array([[1,2,3], [2,3,4], [6,2,3]])
>>> tree = KDTree(array)
>>> neighbors = tree.query_radius(array, 1)
>>> tree.ind
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'sklearn.neighbors.kd_tree.KDTree' object has no attribute 'ind'
>>> tree.count
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'sklearn.neighbors.kd_tree.KDTree' object has no attribute 'count'

Answer 1

Not sure why you think you need to do this twice:

a = np.random.rand(100,3)*10
tree = KDTree(a)
neighbors = tree.query_radius(a, 1)

%timeit counts = tree.query_radius(a, 1, count_only = 1)
1000 loops, best of 3: 231 µs per loop

%timeit counts = np.array([arr.size for arr in neighbors])
The slowest run took 5.66 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 22.5 µs per loop

Simply reading the size of each array object in neighbors is much faster than running tree.query_radius a second time.
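To make the single-query idea concrete, here is a minimal sketch that wraps it in a helper. The name query_radius_with_counts is hypothetical and not part of scikit-learn; it only assumes that query_radius returns an object array holding one index array per query point, and it derives the counts from those arrays.

```python
import numpy as np
from sklearn.neighbors import KDTree


def query_radius_with_counts(tree, points, r):
    """Hypothetical helper (not a scikit-learn API): query once, derive counts."""
    # One index array per query point, stored in an object array.
    ind = tree.query_radius(points, r)
    # The neighbor count of each point is the length of its index array.
    counts = np.fromiter((len(i) for i in ind), dtype=np.intp, count=len(ind))
    return counts, ind


points = np.array([[1, 2, 3], [2, 3, 4], [6, 2, 3]], dtype=float)
tree = KDTree(points)
counts, ind = query_radius_with_counts(tree, points, r=2)
```

The counts obtained this way should match what count_only=True returns, so the second tree traversal can be skipped.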

Answer 2

Consider this dataset:

array = np.random.random((10**5, 3))*10
tree = KDTree(array)

You identified 3 options in your question:

1) Call tree.query_radius twice to get the neighbors and their counts.

neighbors = tree.query_radius(array, 1)
counts = tree.query_radius(array, 1, count_only=1)

This takes 8.347 seconds.

2) Get only the neighbors, then obtain the counts by iterating over them:

neighbors = tree.query_radius(array, 1)
counts = []
for i in range(len(neighbors)):
    counts.append(len(neighbors[i]))

This is much faster than the first approach, taking 4.697 seconds.

3) Now, we can speed up the loop that computes counts.

neighbors = tree.query_radius(array, 1)
len_array = np.frompyfunc(len, 1, 1)
counts = len_array(neighbors)

This is the fastest, at 4.449 seconds.
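One detail worth knowing about option 3: a ufunc built with np.frompyfunc returns an array of dtype object, so a cast is usually needed before using the counts numerically. The sketch below illustrates this on a small hand-built object array that merely stands in for the output of query_radius; the index values are made up for illustration.

```python
import numpy as np

# Hand-built stand-in for the object array that query_radius returns:
# one integer index array per query point (values are illustrative only).
neighbors = np.empty(3, dtype=object)
neighbors[0] = np.array([0, 1])
neighbors[1] = np.array([0, 1, 2])
neighbors[2] = np.array([2])

len_array = np.frompyfunc(len, 1, 1)
counts_obj = len_array(neighbors)     # element-wise len(), dtype is object
counts = counts_obj.astype(np.intp)   # cast to a plain integer array
```

After the cast, counts behaves like the integer array returned with count_only=1, for example in sums or boolean masks.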
