Mahalanobis distance not equal to Euclidean distance after PCA

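I am trying to compute the Mahalanobis distance as the Euclidean distance after a PCA transformation, but I do not get the same result. The following code: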

import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.decomposition import PCA

X = [[1,2], [2,2], [3,3]]

mean = np.mean(X, axis=0)
cov = np.cov(X, rowvar=False)
covI = np.linalg.inv(cov)

maha = mahalanobis(X[0], mean, covI)
print(maha)

pca = PCA()

X_transformed = pca.fit_transform(X)

stdev = np.std(X_transformed, axis=0)
X_transformed /= stdev

print(np.linalg.norm(X_transformed[0]))

prints

1.1547005383792515
1.4142135623730945

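As I understand it, PCA decorrelates the dimensions, and dividing by the standard deviation weights each dimension equally, so the Euclidean distance should equal the Mahalanobis distance. Where am I going wrong?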

According to this discussion, the relationship between PCA and the Mahalanobis distance only holds for PCA components with unit variance. These can be obtained by whitening the PCA output (more information here).
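The identity behind this can be sketched in a few lines. The symbols here are assumptions introduced for illustration: Σ is the sample covariance (assumed invertible), V the orthogonal matrix of its eigenvectors, and Λ the diagonal matrix of its eigenvalues, with all components kept.

```latex
\Sigma = V \Lambda V^{\top}
\quad\Longrightarrow\quad
\Sigma^{-1} = V \Lambda^{-1} V^{\top}

d_M(x, y)^2
  = (x - y)^{\top} \Sigma^{-1} (x - y)
  = (x - y)^{\top} V \Lambda^{-1} V^{\top} (x - y)
  = \bigl\lVert \Lambda^{-1/2} V^{\top} (x - y) \bigr\rVert^2
```

Here V^T (x - y) is the difference of the PCA scores, and the factor Λ^{-1/2} is the whitening step. The equality relies on the scaling using the same eigenvalues as Σ, that is, the same variance normalization as the covariance estimate.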

Once this is done, the Mahalanobis distance in the original space equals the Euclidean distance in the PCA space. The code below demonstrates this:

import numpy as np
from scipy.spatial.distance import mahalanobis, euclidean
from sklearn.decomposition import PCA

X = np.array([[1, 2], [2, 2], [3, 3]])

# Inverse of the sample covariance matrix (np.cov divides by N - 1)
cov = np.cov(X, rowvar=False)
covI = np.linalg.inv(cov)

# Mahalanobis distance between the first two points in the original space
maha = mahalanobis(X[0], X[1], covI)

# whiten=True rescales every principal component to unit variance
pca = PCA(whiten=True)
X_transformed = pca.fit_transform(X)

print('Mahalanobis distance: ' + str(maha))
print('Euclidean distance: ' + str(euclidean(X_transformed[0], X_transformed[1])))

The output is:

Mahalanobis distance: 2.0
Euclidean distance: 2.0000000000000004
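As a supplementary sketch of why the manual rescaling in the question falls short: np.cov divides by N - 1 by default, while np.std divides by N (ddof=0) by default, so the two variance estimates do not match. The variable names below (scores, norm_ddof0, norm_ddof1) are introduced here for illustration only.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.decomposition import PCA

X = np.array([[1, 2], [2, 2], [3, 3]], dtype=float)

# Mahalanobis distance from the first point to the mean.
# np.cov uses the N - 1 normalization by default.
mean = X.mean(axis=0)
covI = np.linalg.inv(np.cov(X, rowvar=False))
maha = mahalanobis(X[0], mean, covI)

# Plain PCA: the scores are centered and decorrelated, but not rescaled.
scores = PCA().fit_transform(X)

# np.std defaults to ddof=0 (divide by N), which does not match np.cov.
norm_ddof0 = np.linalg.norm(scores[0] / scores.std(axis=0))

# ddof=1 (divide by N - 1) uses the same normalization as np.cov.
norm_ddof1 = np.linalg.norm(scores[0] / scores.std(axis=0, ddof=1))

print(maha, norm_ddof0, norm_ddof1)
```

With this data, the first and third values should agree (about 1.1547), while the second reproduces the 1.4142 from the question. The two differ by a factor of sqrt(N / (N - 1)), which is sqrt(3/2) for N = 3.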