Input dimensions for distance function for nearest neighbors

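Using scikit-learn's unsupervised nearest neighbors, I implemented my own distance function to handle my uncertain points (i.e. each point is represented as a normal distribution):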

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    # The variances live in the last two entries, so the summed covariance
    # is built from x[2:] and y[2:]; the means are x[:2] and y[:2].
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
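
To make explicit what this function is meant to compute, here is a minimal sketch, assuming the layout in the docstring (two means followed by two variances). With diagonal covariances the Mahalanobis distance reduces to a per-axis sum, which avoids the matrix inversion. The names `uncertain_distance`, `uncertain_distance_closed_form` and the sample values are hypothetical, for illustration only.

```python
import numpy as np
from scipy.spatial import distance


def uncertain_distance(x, y):
    """Mahalanobis distance between the means of two uncertain 2-D points.

    Assumed layout: [mu_1, mu_2, var_1, var_2]. The covariance used is the
    sum of the two diagonal covariances.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.diag(x[2:]) + np.diag(y[2:])
    return distance.mahalanobis(x[:2], y[:2], np.linalg.inv(cov))


def uncertain_distance_closed_form(x, y):
    """Same quantity without inverting a matrix (diagonal case only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:2] - y[:2]
    return float(np.sqrt(np.sum(diff ** 2 / (x[2:] + y[2:]))))


# Hypothetical sample points: means (0, 0) and (3, 4), variances (1, 2) each.
x = np.array([0.0, 0.0, 1.0, 2.0])
y = np.array([3.0, 4.0, 1.0, 2.0])
d_scipy = uncertain_distance(x, y)
d_closed = uncertain_distance_closed_form(x, y)
print(d_scipy, d_closed)
```

Both functions should agree on any input with strictly positive variances; the closed form only holds because the covariances are diagonal.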

However, when I set up the nearest neighbors:

nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

where X is an (N, 4) (n_samples, n_features) array, and I print the shapes of x and y inside my_mahalanobis_distance, I get (10,) instead of the (4,) I expected.

Example:

I added the following line to my_mahalanobis_distance:

print(x.shape)

Then in my main:

n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

The result is:

(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)

I fully understand the error, but I don't understand why x.shape is (10,) when the number of features in X is 4.

I am using Python 2.7.10 and scikit-learn 0.16.1.

EDIT:

Replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) with return 1, just for testing, gives:

(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)

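So only the first call to my_mahalanobis_distance is wrong. Looking at the values of x and y in that first iteration, my observations are:

I would conclude that this first call is a piece of debugging code that has not been removed.

This is not an answer, but it is too long for a comment. I cannot reproduce the error.

Using:

Python 3.5.2 and sklearn 0.18.1

With the code: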

from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp
import scipy.spatial.distance  # ensures sp.spatial.distance is loaded


def my_mahalanobis_distance(x, y):
    # Variances are the last two entries; means are the first two.
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    print(x.shape)
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)

n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
# pass the callable directly as the metric
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)

The output is:

(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
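
A quick way to check which shapes a given installation passes to the metric is to record them instead of printing. This is only a sketch: `recording_metric` and `seen_shapes` are hypothetical names, and it assumes the callable is passed directly as `metric`, as in the code above. A plain Euclidean distance is returned so that the function accepts an input of any length.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Collects every distinct argument shape the metric is called with.
seen_shapes = set()


def recording_metric(x, y):
    # Hypothetical helper: remember the shapes, then return a Euclidean
    # distance so that unexpected input lengths do not raise.
    seen_shapes.add(x.shape)
    seen_shapes.add(y.shape)
    return float(np.sqrt(np.sum((x - y) ** 2)))


X = np.random.rand(10, 4)
nn = NearestNeighbors(n_neighbors=1, metric=recording_metric).fit(X)
nn.kneighbors(X[:1])  # force at least one query through the metric
print(sorted(seen_shapes))
```

If anything other than (4,) shows up in the printed list, that version calls the metric with something other than rows of X.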

I customized my_mahalanobis_distance to handle this problem:

import warnings

import numpy as np
import scipy as sp
import scipy.spatial.distance  # ensures sp.spatial.distance is loaded


def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    if (x.size, y.size) == (4, 4):
        # expected case: means in x[:2], variances in x[2:]
        cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
        return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)

    # to handle the buggy first call when calling NearestNeighbors.fit()
    else:
        warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
        return sp.spatial.distance.euclidean(x, y)
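
An alternative to the Euclidean fallback, offered only as a sketch: write the metric so that it works for any even-length vector laid out as [means..., variances...]. It then also tolerates the length-10 call seen above. This assumes the unexpected input is even-length with strictly positive entries, which is not guaranteed. `halves_mahalanobis` is a hypothetical name.

```python
import numpy as np


def halves_mahalanobis(x, y):
    """Diagonal-covariance Mahalanobis distance for [means..., variances...].

    The first half of each vector is treated as the mean and the second
    half as the per-axis variance; the two variances are summed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size % 2 != 0:
        raise ValueError("x and y must have the same even length")
    k = x.size // 2
    diff = x[:k] - y[:k]
    var = x[k:] + y[k:]  # assumed strictly positive
    return float(np.sqrt(np.sum(diff ** 2 / var)))


# Expected 4-feature case (hypothetical sample values).
d4 = halves_mahalanobis([0.0, 0.0, 1.0, 2.0], [3.0, 4.0, 1.0, 2.0])

# A length-10 vector compared with itself, like the unexpected first call.
probe = np.linspace(0.1, 1.0, 10)
d10 = halves_mahalanobis(probe, probe)
print(d4, d10)
```

The trade-off is that a wrongly sized input no longer produces a warning, so the explicit size check above may still be preferable while debugging.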