Get index of data point in training set with shortest distance to input matrix with NumPy

Question

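I want to build a function npbatch(U, X) that compares the data points in an input matrix (U) with the data points in a training matrix (X) and returns, for each data point in U, the index of the row of X with the shortest Euclidean distance to it. I want to avoid any loops to improve performance, and I would like to use scipy.spatial.distance.cdist to compute the distances.

Example input: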

U
array([[0.69646919, 0.28613933, 0.22685145],
       [0.55131477, 0.71946897, 0.42310646],
       [0.9807642 , 0.68482974, 0.4809319 ]])

X
array([[0.24875591, 0.16306678, 0.78364326],
       [0.80852339, 0.62562843, 0.60411363],
       [0.8857019 , 0.75911747, 0.18110506]])

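--> Expected output: an array of three indices, one for each of the three data points in U, each giving the data point in X with the shortest distance to it.

My overall goal is to use the indices I get to look up the labels of the corresponding data points. An example label input would be: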

Y
array([1, 0, 0])

Thanks for any hints!

Answer 1

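With scipy.spatial.distance.cdist you have already chosen a well-suited function for this task. cdist(X, U) returns a matrix whose entry (i, j) is the distance between X[i] and U[j], so to get the indices we just need to apply numpy.argmin along axis 0 (or along axis 1 for cdist(U, X)):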

ix = numpy.argmin(scipy.spatial.distance.cdist(X, U), 0)

Getting the labels is then straightforward:

Y[ix]
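Putting the pieces together, here is a minimal runnable sketch using the sample data from the question. The function name npbatch comes from the question; returning the minimum distances alongside the indices is an assumption about what might be useful, not something the question strictly asks for.

```python
import numpy as np
from scipy.spatial.distance import cdist

def npbatch(U, X):
    """For each row of U, return the index of the nearest row of X
    and the corresponding Euclidean distance (the latter is an optional extra)."""
    D = cdist(U, X)              # D[i, j] = distance between U[i] and X[j]
    ix = np.argmin(D, axis=1)    # nearest row of X for each row of U
    return ix, D[np.arange(len(U)), ix]

U = np.array([[0.69646919, 0.28613933, 0.22685145],
              [0.55131477, 0.71946897, 0.42310646],
              [0.9807642 , 0.68482974, 0.4809319 ]])
X = np.array([[0.24875591, 0.16306678, 0.78364326],
              [0.80852339, 0.62562843, 0.60411363],
              [0.8857019 , 0.75911747, 0.18110506]])
Y = np.array([1, 0, 0])

ix, dist = npbatch(U, X)
labels = Y[ix]                   # integer-array indexing picks one label per row of U
print(ix)                        # [2 1 1]
print(labels)                    # [0 0 0]
```

Note that cdist builds the full len(U) x len(X) distance matrix in memory, which is fine for moderately sized inputs but may become a constraint for very large ones.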

python

numpy

scipy

euclidean-distance