How to find the most contributing features to PCA?

Question

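I ran PCA on my data (~250 features) and found that all the points cluster into 3 blobs.

Is it possible to see which of the 250 features contribute most to the result? If so, how?

(Using the Scikit-learn implementation.)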
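For readers who want to reproduce the setup, here is a minimal sketch. The asker's real data isn't available, so it assumes a synthetic stand-in built with scikit-learn's `make_blobs` (300 samples, 250 features, 3 clusters); the sample count and `random_state` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in (assumption): 300 samples, 250 features, 3 clusters
X, y = make_blobs(n_samples=300, n_features=250, centers=3, random_state=0)

# Project onto the first two principal components, e.g. for a scatter plot
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

print(X2.shape)                       # (300, 2)
print(pca.explained_variance_ratio_)  # share of total variance per component
```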

Answer 1

Let's see what Wikipedia says:

PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
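The ordering described in the quote can be checked numerically. This sketch assumes toy Gaussian data with deliberately unequal per-feature scales; it projects the data with scikit-learn and shows that the variance of each new coordinate matches `pca.explained_variance_` and is non-increasing from the first component onward.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data (assumption): 5 features with very different variances
X = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

pca = PCA(svd_solver="full").fit(X)
Z = pca.transform(X)  # data expressed in the new coordinate system

# Variance along each principal axis; ddof=1 matches how sklearn computes explained_variance_
var_per_component = Z.var(axis=0, ddof=1)
print(var_per_component)
print(pca.explained_variance_)
```

The first coordinate carries the largest variance, the second the next largest, and so on, as the definition states.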

To see how "influential" the vectors from the original space are in the smaller one, you have to project them as well. This is done by:

import numpy as np
res = pca.transform(np.eye(D))  # pca: the fitted PCA object; D: number of original features

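np.eye(n) creates an n x n identity matrix (ones on the diagonal, zeros elsewhere).
So np.eye(D) represents your D features as unit vectors in the original feature space,
and res is the projection of those features into the lower-dimensional space.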
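A minimal sketch of the projection step, assuming toy random data with a deliberately non-zero mean (D, d and the data are illustrative choices, not from the original). One detail worth knowing: `PCA.transform` subtracts `pca.mean_` before projecting, so each column of `res` is shifted by a constant. Subtracting the projection of the origin recovers the raw loadings stored in `pca.components_`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
D, d = 10, 3                          # toy sizes: D original features, d kept components
X = rng.normal(size=(200, D)) + 5.0   # non-zero mean on purpose

pca = PCA(n_components=d).fit(X)

# The answer's trick: project each unit vector e_i (row i of the identity)
res = pca.transform(np.eye(D))
print(res.shape)  # (10, 3): one row per original feature, one column per component

# transform() centers with pca.mean_ first, so undo that constant shift per column
offset = pca.transform(np.zeros((1, D)))
print(np.allclose(res - offset, pca.components_.T))  # True
```

In practice `pca.components_.T` gives the same D x d matrix directly, without the centering offset.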

Interestingly, res is a D x d matrix, where res[i][j] represents "how much feature i contributes to component j".
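To illustrate the res[i][j] reading, this sketch assumes toy data in which feature 2 has by far the largest variance, so it should dominate component 0. It uses `pca.components_.T`, which has the same D x d layout as `res` (minus the per-column centering shift that `transform` introduces).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Toy data (assumption): feature 2 is scaled 10x, so it carries most of the variance
scales = np.array([1.0, 1.0, 10.0, 1.0])
X = rng.normal(size=(1000, 4)) * scales

pca = PCA(n_components=2).fit(X)
loadings = pca.components_.T  # shape (D, d); loadings[i, j] = weight of feature i in component j

print(np.round(loadings, 2))
print(np.argmax(np.abs(loadings[:, 0])))  # 2: feature 2 dominates the first component
```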

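You can then sum each row across its columns (i.e. over the d components) to get a D x 1 matrix (call it contribution), where each contribution[i] is the total contribution of feature i.

Sort it and you will find the most contributing features :)

Not sure if this is clear; I can add any kind of additional information.

Hope this helps, pltrdy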
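A sketch of the summing-and-sorting step, assuming toy data where features 3 and 7 carry most of the variance. One deviation from the answer, labeled here as our own assumption: it sums absolute loadings so that positive and negative weights on different components don't cancel each other out. It reads the loadings from `pca.components_.T` rather than `res`, which avoids the centering offset of `transform`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Toy data (assumption): features 3 and 7 have much larger variance than the rest
scales = np.ones(10)
scales[[3, 7]] = [8.0, 5.0]
X = rng.normal(size=(1000, 10)) * scales

pca = PCA(n_components=2).fit(X)
loadings = pca.components_.T                 # shape (D, d)

# One score per feature: sum over components (axis=1); abs() is our assumption
contribution = np.abs(loadings).sum(axis=1)  # shape (D,)

# Feature indices ordered from most to least contributing
ranking = np.argsort(contribution)[::-1]
print(ranking[:2])  # features 3 and 7, in some order
```

If some components matter more than others, a variant is to weight each column by `pca.explained_variance_ratio_` before summing; that is a design choice, not part of the original answer.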


pca

scikit-learn