What does sklearn PCA do to the input array when the number of components is kept the same?

Question

For example, we have:

from sklearn.decomposition import PCA
import numpy as np 

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
pca.fit_transform(xx)

Output:

array([[ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385],
       [-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385]])

In this case I did not reduce the dimensionality, yet the array was changed... why?

Answer 1

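PCA applies a linear transformation (here, a rotation) to your feature space. With all components kept, nothing is discarded; the data is only re-expressed in the coordinate system of the principal axes, which is why the values change even though the shape does not. In your case, taking feature 1 along x and feature 2 along y, the resulting transformation is the same as rotating the feature vectors by an angle theta ~ 2.565 radians. Below I define such a rotation matrix and show that it gives the same result: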

import numpy as np

def rot_matrix(theta):
    # Return the 2-D rotation matrix for an angle theta (in radians)
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta)],
                                [np.sin(theta),  np.cos(theta)]])
    return rotation_matrix

theta = 2.565
rot = rot_matrix(theta)
np.dot(rot, xx.T).T

The result is (close to) the output of the PCA transform:

array([[ 1.38349574,  0.29315446],
       [ 2.22182084, -0.25201619],
       [ 3.60531658,  0.04113827],
       [-1.38349574, -0.29315446],
       [-2.22182084,  0.25201619],
       [-3.60531658, -0.04113827]])
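
As a cross-check that does not depend on guessing the angle, the transform can be rebuilt from the fitted estimator itself. This is a minimal sketch, assuming the default settings (no whitening), under which the transform amounts to subtracting `pca.mean_` and projecting onto the rows of `pca.components_`. The names `manual`, `gram` and `det` are only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

xx = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

pca = PCA()  # default n_components keeps every component
transformed = pca.fit_transform(xx)

# Rebuild the transform by hand: center the data, then project it
# onto the principal axes (the rows of components_).
manual = (xx - pca.mean_) @ pca.components_.T

# The principal axes are orthonormal, so components_ times its
# transpose should be the identity matrix.
gram = pca.components_ @ pca.components_.T

# Determinant +1 means a pure rotation; -1 means rotation plus reflection.
det = np.linalg.det(pca.components_)

print(np.allclose(transformed, manual))     # manual projection matches PCA
print(np.allclose(gram, np.eye(2)))         # axes are orthonormal
print(det)
print(pca.explained_variance_ratio_.sum())  # share of variance retained
```

Because the matrix is orthogonal, distances between points are preserved and all of the variance is retained; only the coordinate axes change. The mean of this particular data set is already zero, so the centering step has no visible effect here.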


machine-learning

pca

scikit-learn