Cosine distance between identical vectors is not equal to 0

Question

I am trying to retrieve the nearest neighbors of a vector from a list of vectors, using:

from sklearn.neighbors import NearestNeighbors

neigh = NearestNeighbors(metric='cosine')
neigh.fit(list)

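From what I have read and observed, if vector1 and vector2 have exactly the same values in every dimension, the distance retrieved between the two vectors should be 0. I am using the kneighbors method to compute the distances.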

neigh.kneighbors(vector_input)

However, in some cases (not all), the retrieved distance is not 0 even though the two vectors are equal; instead it is some very small number such as 2.34e-16.

len([i for i, j in zip(vector_from_list, vector_input) if i == j]) returns the dimension of the vectors, meaning that every element of one vector equals the element at the same index in the other. So, unless I'm mistaken, the vectors are exactly equal.

The dtype of all the vectors is np.float64.

Is the distance computation inconsistent? Or am I overlooking something in the scikit method (a parameter, for example)?

Answer 1

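I think this is expected behavior.

If you want to test whether the distance is zero, consider using numpy.isclose instead of an exact comparison. For example,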
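As a rough illustration of why a tiny nonzero value can show up, here is a minimal sketch in plain Python. It assumes cosine distance is computed as 1 - (u·v) / (|u| |v|); the helper `cosine_distance` and the sample vector are hypothetical, and scikit-learn's internal computation may differ in detail. The dot product and the product of the two norms are each rounded separately, so their ratio is not guaranteed to be exactly 1.0 even when both vectors are the same.

```python
import math

def cosine_distance(u, v):
    """Hypothetical helper: cosine distance as 1 - (u . v) / (|u| * |v|)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

vec = [0.1, 0.2, 0.3, 0.4, 0.7]  # arbitrary sample values
d = cosine_distance(vec, vec)

# Depending on the values, d may be exactly 0.0 or a rounding-sized number.
print(d)

# A tolerance-based check treats both outcomes as "zero".
print(math.isclose(d, 0.0, abs_tol=1e-12))
```

Whether the printed distance is exactly 0.0 depends on the particular values, which would be consistent with the problem appearing only for some vectors.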

import numpy as np

a = 2.34e-16
b = 1.7e-14 # both tiny values, almost zero
print(a==b) # prints False
print(np.isclose(a,b)) # prints True

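You can control how close the values need to be by setting the function's other parameters. See the documentation for more information.

Alternatively, you can use Python's built-in math.isclose function. See the documentation. For example,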
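The parameters in question are rtol and atol. numpy.isclose(a, b) tests |a - b| <= atol + rtol * |b|, so when comparing against 0 only atol matters. The sketch below uses a made-up array of distances (not real kneighbors output) to show how the choice of atol changes which entries count as zero.

```python
import numpy as np

# Made-up distances for illustration; the first three are rounding-sized.
distances = np.array([0.0, 2.34e-16, 1.7e-14, 3.0e-6, 0.12])

# Default tolerances: rtol=1e-05, atol=1e-08.
default_mask = np.isclose(distances, 0.0)

# A stricter absolute tolerance rejects 1.7e-14.
strict_mask = np.isclose(distances, 0.0, atol=1e-15)

# A looser absolute tolerance also accepts 3.0e-6.
loose_mask = np.isclose(distances, 0.0, atol=1e-5)

print(default_mask)
print(strict_mask)
print(loose_mask)
```

An atol well above the rounding noise (around 1e-16 for float64) but well below any distance you consider meaningful is usually a reasonable choice.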

import math

a = 2.34e-16
b = 1.7e-14 # both tiny values, almost zero
print(math.isclose(a,b, abs_tol=1e-10)) # True

python

numpy

nearest-neighbor

scikit-learn