PCA matrix with sklearn
I ran PCA on some data and I want to extract the PCA matrix. Here is my code (not including loading the data):
from sklearn.decomposition import PCA
pca = PCA(n_components=5)
pca_result = pca.fit_transform(recon.data.cpu().numpy())
M = pca.components_
I thought M should be the PCA matrix. However, when I print pca_result (the first few rows), I get this:
[-21.08167 , -5.67821 , 0.17554353, -0.732398 ,0.04658243],
[-25.936056 , -6.535223 , 0.6887493 , -0.8394666 ,0.06557591],
[-30.755266 , -6.0098953 , 1.1643354 , -0.82322127,0.07585468]
But when I print np.transpose(np.matmul(M, np.transpose(recon))), I get this:
[-27.78438 , -2.5913327 , 0.87771094, -1.0819707 ,0.1037216 ],
[-32.63887 , -3.4483302 , 1.3909296 , -1.1890743 ,0.12274324],
[-37.45802 , -2.9229708 , 1.8665184 , -1.1728177 ,0.13301012]
What am I doing wrong, and how do I get the actual PCA matrix? Thanks!
In a PCA you go from an n-dimensional space to a different (rotated) n-dimensional space. This change is done using an n×n matrix.
This is indeed the matrix returned by pca.components_; when multiplied by the PCA-transformed data, it gives a reconstruction of the original data X.
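As for the discrepancy in the question: sklearn's PCA centers the data (subtracts the per-feature mean, stored in pca.mean_) before projecting, so multiplying the raw data by M will not reproduce pca_result. A minimal sketch of the equivalence, using the iris data as a stand-in for the question's recon array:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data  # stand-in for the question's `recon` array
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# The forward transform is (X - mean) @ M.T, not X @ M.T:
# PCA centers the data before projecting onto the components.
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_pca, manual))  # True
```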
Here is a demonstration with the iris data:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
X = load_iris().data
mu = np.mean(X, axis=0) # mean value
pca = PCA()
X_pca = pca.fit_transform(X)
M = pca.components_
M
# result:
array([[ 0.36138659, -0.08452251, 0.85667061, 0.3582892 ],
[ 0.65658877, 0.73016143, -0.17337266, -0.07548102],
[-0.58202985, 0.59791083, 0.07623608, 0.54583143],
[-0.31548719, 0.3197231 , 0.47983899, -0.75365743]])
i.e. it is indeed a 4x4 matrix (the iris data have 4 features).
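The reason multiplying by this matrix undoes the transform is that the rows of pca.components_ are orthonormal, so M @ M.T is the identity matrix. A quick check:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data
pca = PCA()
pca.fit(X)
M = pca.components_

# The rows of M are orthonormal, so M @ M.T = I; the inverse of the
# rotation is therefore simply its transpose.
print(np.allclose(M @ M.T, np.eye(4)))  # True
```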
Let's reconstruct the original data using all the PCs:
X_hat = np.matmul(X_pca, M)
X_hat = X_hat + mu # add back the mean
print(X_hat[0]) # reconstructed
print(X[0])      # original
Result:
[5.1 3.5 1.4 0.2]
[5.1 3.5 1.4 0.2]
i.e. a perfect reconstruction.
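Incidentally, this manual reconstruction is exactly what sklearn's pca.inverse_transform does (a quick sanity check, assuming the default whiten=False):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data
pca = PCA()
X_pca = pca.fit_transform(X)

# Manual reconstruction: rotate back with the components and re-add the mean.
X_hat = X_pca @ pca.components_ + pca.mean_
print(np.allclose(X_hat, pca.inverse_transform(X_pca)))  # True
print(np.allclose(X_hat, X))  # True: perfect reconstruction with all PCs
```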
Reconstructing with fewer PCs, say 2 (out of 4):
n_comp = 2
X_hat2 = np.matmul(X_pca[:,:n_comp], pca.components_[:n_comp,:])
X_hat2 = X_hat2 + mu
print(X_hat2[0])
Result:
[5.08303897 3.51741393 1.40321372 0.21353169]
i.e. the reconstruction is less accurate because of the truncation of the PCs used (2 instead of all 4), as we would expect.
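How close the truncated reconstruction is depends on how much variance the kept PCs capture, which pca.explained_variance_ratio_ reports. For the iris data the first 2 PCs capture the vast majority of the variance, which is why the 2-PC reconstruction above is already so close:

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data
pca = PCA().fit(X)

# Fraction of the total variance explained by each PC, in order.
print(pca.explained_variance_ratio_)
# The first 2 PCs together explain well over 95% of the variance.
print(pca.explained_variance_ratio_[:2].sum())
```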
(Code adapted from the great Cross Validated thread How to reverse PCA and reconstruct original variables from several principal components?.)