Find out which features are in my components after PCA

Question

I ran a principal component analysis (PCA) on my data. The data looks like this:

df
Out[60]: 
        Drd1_exp1  Drd1_exp2  Drd1_exp3  ...  M7_pppp  M7_puuu  Brain_Region
0            -1.0       -1.0       -1.0  ...      0.0      0.0          BaGr
3            -1.0       -1.0       -1.0  ...      0.0      0.0          BaGr
4            -1.0       -1.0       -1.0  ...      0.0      0.0          BaGr
          ...        ...        ...  ...      ...      ...           ...
150475       -1.0       -1.0       -1.0  ...      0.0      0.0          BaGr
150478       -1.0       -1.0       -1.0  ...      0.0      0.0          BaGr
150479       -1.0       -1.0       -1.0  ...      0.0      0.0          BaGr

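I am using every column before 'Brain_Region' as a feature, and I standardized them. The features are different experiments that give me information about a 3D image of a brain. Here is my code: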

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# listend1 (defined elsewhere) selects the feature columns
x = df.loc[:, listend1].values
y = df.loc[:, 'Brain_Region'].values

# standardize each feature to zero mean and unit variance
x = StandardScaler().fit_transform(x)

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
# reuse df's index so the concat below lines up row by row
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'],
                           index=df.index)

finalDf = pd.concat([principalDf, df[['Brain_Region']]], axis=1)

Then I plotted finalDf:

My question now is: how can I find out which features contribute to my components, and how do I interpret the result?

Answer 1

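You can use pca.components_ (note the trailing underscore that scikit-learn uses for fitted attributes). It has shape (n_components, n_features), in your case (2, n_features). Each row is one principal axis, i.e. a direction of maximum variance in the data, expressed as one weight per original feature. The magnitude of each weight reflects how much that feature contributes to the component (higher magnitude, higher importance); the sign only indicates direction. You will get something like this: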

[[0.522 0.26 0.58 0.56],
 [0.37 0.92 0.02 0.06]]

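This means that for the first component (first row) the first, third and last features have the highest importance, while for the second component (second row) the second feature dominates.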
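To tie those rows back to column names, one option is to wrap pca.components_ in a DataFrame and rank the features by absolute weight. The following is only a minimal sketch: the toy data and the feature names (feat_a to feat_d) are made up for illustration, and in the question's setting the column names would come from listend1 instead.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for df[listend1] from the question.
rng = np.random.default_rng(0)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]
toy = pd.DataFrame(rng.normal(size=(200, 4)), columns=feature_names)
# Make feat_c almost a copy of feat_a so the two share one direction of variance.
toy["feat_c"] = toy["feat_a"] + 0.1 * rng.normal(size=200)

x = StandardScaler().fit_transform(toy.values)
pca = PCA(n_components=2).fit(x)

# One row per component, one column per original feature.
loadings = pd.DataFrame(pca.components_, columns=feature_names, index=["PC1", "PC2"])

# Rank the features by absolute weight within each component.
for pc in loadings.index:
    ranked = loadings.loc[pc].abs().sort_values(ascending=False)
    print(pc, ranked.round(3).to_dict())

# Share of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```

In this toy setup the correlated pair feat_a and feat_c should carry the largest weights on PC1. Checking pca.explained_variance_ratio_ alongside the weights helps judge how much a component matters before reading too much into it.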

Have a look at the sklearn PCA attributes description or at this post.

By the way, you could also train a Random Forest classifier that includes the labels and, after training, explore the feature importances, e.g. as in this post.
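A rough sketch of that Random Forest idea, under stated assumptions: the data, the feature names and the two label values below are invented for illustration, and in the question's setting X would be the feature columns and y the Brain_Region labels.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: only feat_a carries information about the label.
rng = np.random.default_rng(0)
feature_names = ["feat_a", "feat_b", "feat_c"]
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=feature_names)
y = np.where(X["feat_a"] > 0, "region_1", "region_2")  # made-up labels

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(forest.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

These importances answer a different question than the PCA weights: they describe how useful a feature is for predicting the labels, whereas PCA ignores the labels and only looks at variance.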
