Principal Component Analysis (PCA) explained variance remains the same after changing dataframe column position

I have a dataframe in which A and B are used to predict C.

from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

df = df[['A', 'B', 'C']]
array = df.values

X = array[:, 0:-1]  # features: A and B
Y = array[:, -1]    # target: C

# Feature importance
model = GradientBoostingClassifier()
model.fit(X, Y)
print("Importance:")
print(model.feature_importances_ * 100)

# PCA
pca = PCA(n_components=len(df.columns) - 1)
fit = pca.fit(X)

print("Explained Variance")
print(fit.explained_variance_ratio_)

This prints:

Importance:
[ 53.37975706  46.62024294]
Explained Variance
[ 0.98358394  0.01641606]

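However, when I reorder the dataframe so that A and B are swapped, only the importances change; the explained variance stays the same. Why doesn't the explained variance change accordingly, to [0.01641606 0.98358394]?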

df = df[['B','A','C']]


Importance:
[ 46.40771024  53.59228976]
Explained Variance
[ 0.98358394  0.01641606]

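The explained variance does not refer to A or B, or to any column of the dataframe. It refers to the principal components identified by PCA, which are linear combinations of the columns. These components are sorted in order of decreasing variance, as the documentation says: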

components_ : array, shape (n_components, n_features) Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_.

explained_variance_ : array, shape (n_components,) The amount of variance explained by each of the selected components. Equal to n_components largest eigenvalues of the covariance matrix of X.

explained_variance_ratio_ : array, shape (n_components,) Percentage of variance explained by each of the selected components.
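As a rough illustration of the quoted definition, the sketch below computes the eigenvalues of the covariance matrix directly with NumPy for two column orders. The data is synthetic and the helper name `explained_variance_ratio` is made up for this example; it is not part of scikit-learn.

```python
import numpy as np

# Synthetic data (an assumption for illustration): column "A" has a much
# larger spread than column "B", and the two are mildly correlated.
rng = np.random.default_rng(0)
a = rng.normal(scale=10.0, size=500)
b = 0.1 * a + rng.normal(scale=1.0, size=500)

X_ab = np.column_stack([a, b])  # column order A, B
X_ba = X_ab[:, ::-1]            # column order B, A


def explained_variance_ratio(X):
    """Covariance-matrix eigenvalues, largest first, as fractions of the total."""
    eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    eigenvalues = eigenvalues[::-1]  # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()


ratio_ab = explained_variance_ratio(X_ab)
ratio_ba = explained_variance_ratio(X_ba)
print(ratio_ab)
print(ratio_ba)
```

Swapping the columns only permutes the rows and columns of the covariance matrix, which leaves its eigenvalues unchanged, so both calls should print the same ratios in the same (descending) order.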

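Therefore, the order of the features does not affect the order of the returned components. It does affect the array components_, which is a matrix that can be used to map the principal components to the feature space.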
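A minimal sketch of this with scikit-learn, using synthetic data in place of your dataframe (the variable names and the data are assumptions, not your actual A and B):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-ins for columns A and B (assumed for illustration).
rng = np.random.default_rng(0)
a = rng.normal(scale=10.0, size=500)
b = 0.1 * a + rng.normal(scale=1.0, size=500)

pca_ab = PCA(n_components=2).fit(np.column_stack([a, b]))  # order A, B
pca_ba = PCA(n_components=2).fit(np.column_stack([b, a]))  # order B, A

# Same ratios in both cases: they describe the components, not the columns.
print(pca_ab.explained_variance_ratio_)
print(pca_ba.explained_variance_ratio_)

# Each row of components_ holds one component's weights on the input columns,
# so swapping the input columns swaps the entries within each row.
print(pca_ab.components_)
print(pca_ba.components_)
```

If you want to know how much each original column contributes to a given component, look at the corresponding row of components_ rather than at explained_variance_ratio_.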