What does sklearn PCA do to the input array when the number of components is kept the same as the number of features?

For example, we have:

from sklearn.decomposition import PCA
import numpy as np 

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
pca.fit_transform(xx)

Output:

array([[ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385],
       [-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385]])

In this case I have not reduced the dimensionality, yet the array has changed... why?

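PCA applies a linear transformation to your feature space: it centers the data and then re-expresses it along the principal axes, which amounts to a rotation (in general, possibly combined with a reflection). Centering changes nothing here because each feature already has zero mean. In your case, taking feature 1 along x and feature 2 along y, the resulting transformation is the same as rotating the feature vectors by an angle theta ~ 2.565 radians. Below I define such a rotation matrix and show that it gives the same result: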

import numpy as np

def rot_matrix(theta):
    # Return the 2-D matrix for a counterclockwise rotation by theta (radians)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 2.565
rot = rot_matrix(theta)
# Rotate every row of xx (defined in the question) and return one row per sample
np.dot(rot, xx.T).T

The result is (approximately) the output of the PCA transform:

array([[ 1.38349574,  0.29315446],
       [ 2.22182084, -0.25201619],
       [ 3.60531658,  0.04113827],
       [-1.38349574, -0.29315446],
       [-2.22182084,  0.25201619],
       [-3.60531658, -0.04113827]])
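
To check this without guessing the angle, you can read the transformation straight off the fitted model. The sketch below assumes the default `whiten=False` and uses the `components_` and `mean_` attributes of the fitted `PCA` object; the variable names (`W`, `manual`, `theta`) are only for illustration. Note that the signs of the components, and therefore the recovered angle, can differ between sklearn versions, so treat `theta` as indicative rather than fixed.

```python
import numpy as np
from sklearn.decomposition import PCA

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
transformed = pca.fit_transform(xx)

# Rows of components_ are the principal axes (unit vectors).
W = pca.components_

# The axes are orthonormal, so W is an orthogonal matrix: W @ W.T == I.
is_orthogonal = np.allclose(W @ W.T, np.eye(2))

# With whiten=False the transform is "center, then project onto the axes".
manual = (xx - pca.mean_) @ W.T
matches_manual = np.allclose(manual, transformed)

# An orthogonal map preserves each sample's distance from the mean.
norms_preserved = np.allclose(
    np.linalg.norm(xx - pca.mean_, axis=1),
    np.linalg.norm(transformed, axis=1),
)

# det(W) is +1 for a pure rotation and -1 if a reflection is involved;
# which one you get depends on the sign convention sklearn applies.
det = np.linalg.det(W)

# If W is a pure rotation [[cos, -sin], [sin, cos]], its angle is:
theta = np.arctan2(W[1, 0], W[0, 0])

print(is_orthogonal, matches_manual, norms_preserved)
print("det:", det, "theta (radians):", theta)
```

So keeping all components does not discard any information: the points keep their distances from the mean and from each other, and only the coordinate axes change.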