Input dimensions for distance function for nearest neighbors

Question

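In the context of scikit-learn's unsupervised nearest neighbors, I implemented my own distance function to handle my uncertain points (i.e. each point is represented as a normal distribution):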

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    # combined covariance is built from the variance entries x[2:] and y[2:]
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    # the means x[:2] and y[:2] are the points being compared
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
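As an aside on what this function computes: because both covariance matrices are diagonal, the matrix form of the Mahalanobis distance reduces to a per-coordinate sum. The sketch below checks that equivalence using NumPy only. The function names `mahalanobis_matrix_form` and `mahalanobis_diagonal_form` are made up for illustration, and it assumes the layout from the docstring (means in the first two entries, variances in the last two).

```python
import numpy as np


def mahalanobis_matrix_form(x, y):
    """Distance between the means, using the inverse of the summed covariances."""
    delta = x[:2] - y[:2]
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    return float(np.sqrt(np.dot(delta, np.dot(cov_inv, delta))))


def mahalanobis_diagonal_form(x, y):
    """Same quantity, written as a sum over coordinates (valid for diagonal covariances)."""
    delta = x[:2] - y[:2]
    return float(np.sqrt(np.sum(delta ** 2 / (x[2:] + y[2:]))))


x = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([3.0, 4.0, 1.0, 1.0])
d_matrix = mahalanobis_matrix_form(x, y)
d_diagonal = mahalanobis_diagonal_form(x, y)
print(d_matrix, d_diagonal)  # both equal sqrt(12.5)
```

The diagonal form avoids the matrix inversion, which may matter when the metric is called once per pair of points.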

However, when I set up the nearest neighbors:

nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

where X is an (N, 4) array, i.e. (n_samples, n_features), and I print x and y inside my_mahalanobis_distance, I get a shape of (10,) instead of the (4,) I expected.

Example:

I added the following line to my_mahalanobis_distance:

print(x.shape)

Then in my main:

n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

The result is:

(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)

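I fully understand the error, but I don't understand why my x.shape is (10,) when the number of features in X is 4.

I am using Python 2.7.10 and scikit-learn 0.16.1.

Edit:

Replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) with return 1, just to test the return value, gives: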

(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)

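So only the first call to my_mahalanobis_distance is wrong. Looking at the values of x and y in that first call, my observations are:

x and y are identical
if I run my code several times, x and y are still identical to each other, but their values change from one run to the next
the values appear to come from a numpy.random function.

I would conclude that this first call is a piece of debugging code that has not been removed yet.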
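The observations above (one extra call, identical x and y, random values, length 10) are also consistent with the library probing the callable once on a random test vector before using it on real data. The following is a minimal sketch of that pattern under that assumption. It is not scikit-learn's actual code, and `probe_metric` and `recording_distance` are hypothetical names invented for illustration.

```python
import numpy as np


def probe_metric(func, probe_size=10):
    """Hypothetical check: call func once on a random vector and require a float result."""
    x = np.random.random(probe_size)
    return float(func(x, x))  # same array passed twice, so x and y are identical


seen_shapes = []


def recording_distance(x, y):
    """Stand-in metric that records the shape of its first argument."""
    seen_shapes.append(x.shape)
    return 1.0


# One probe call on a length-10 random vector...
probe_metric(recording_distance)

# ...followed by ordinary calls on rows of the (10, 4) data array.
X = np.random.rand(10, 4)
for row in X[1:]:
    recording_distance(X[0], row)

print(seen_shapes[0], seen_shapes[1])
```

Under this reading, a metric that assumes exactly four features fails on the probe call rather than on the data.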

Answer 1

This is not an answer, but it is too long for a comment. I cannot reproduce the error.

Using:

Python 3.5.2 and sklearn 0.18.1

with the code:

from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)


def my_mahalanobis_distance(x, y):
    # variances are stored in x[2:] and y[2:]; the means are x[:2] and y[:2]
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    print(x.shape)
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)

nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

the output is

(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)

Answer 2

I customized my my_mahalanobis_distance to handle this issue:

import warnings

import numpy as np
import scipy as sp
import scipy.spatial.distance  # makes sp.spatial.distance available


def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    if (x.size, y.size) == (4, 4):
        # means are x[:2], y[:2]; the diagonal covariances are x[2:], y[2:]
        cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
        return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
    else:
        # to handle the buggy first call when calling NearestNeighbors.fit()
        warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
        return sp.spatial.distance.euclidean(x, y)
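Since the fallback branch exists only to survive the stray first call, one way to sanity-check the metric itself is to compute nearest neighbors by brute force, without going through the library. This is a minimal sketch, assuming diagonal covariances stored as in the docstring. `uncertain_distance` and `brute_force_nearest` are illustrative names and not part of scikit-learn.

```python
import numpy as np


def uncertain_distance(x, y):
    """Mahalanobis distance between the means, with the two diagonal covariances summed."""
    delta = x[:2] - y[:2]
    return float(np.sqrt(np.sum(delta ** 2 / (x[2:] + y[2:]))))


def brute_force_nearest(X, metric):
    """For each row of X, return the index of the closest other row under metric."""
    n = X.shape[0]
    nearest = np.empty(n, dtype=int)
    for i in range(n):
        # a point is never its own neighbor here, so its self-distance is set to infinity
        dists = [metric(X[i], X[j]) if j != i else np.inf for j in range(n)]
        nearest[i] = int(np.argmin(dists))
    return nearest


X = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [5.0, 5.0, 1.0, 1.0],
])
neighbors = brute_force_nearest(X, uncertain_distance)
print(neighbors)  # [1 0 1]
```

This loop is quadratic in the number of samples, so it is only meant for checking results on small inputs. Note that it excludes each point from its own neighbor list, which may differ from how a library query on the training data behaves.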
