Calculate PCA using numpy

Question

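I'm new to Python programming and have a question about PCA (principal component analysis) in numpy. I have a dataset stored as a 2D numpy array. How can I use numpy to perform PCA on this dataset, and what is the best way to do it?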

Output of the array:

[[  9.59440303 -30.33995167  -9.56393401 ...,  20.47675724  21.32716639
    4.72543396]
 [  9.51383834 -29.91598995 -15.53265741 ...,  29.3551776   22.27276737
    0.21362916]
 [  9.51410643 -29.76027936 -14.61218821 ...,  26.02439054   4.7944802
   -4.97069797]
 ..., 
 [ 10.18460025 -25.08264383  -8.48524125 ...,  -3.86304594  -7.48117144
    0.49041786]
 [ 10.11421507 -27.23984612  -8.57355611 ...,   1.86266657  -5.25912341
    4.07026804]
 [ 11.86344836 -29.08311293  -6.40004177 ...,   3.81287345  -8.21500311
   18.31793505]]

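The data shown is only a sample; the real dataset is much longer, and its columns may be correlated. You can use the iris dataset or some other dummy data.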

Answer 1

Use sklearn.decomposition.PCA(n_components=2).fit(data).
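
A minimal usage sketch, assuming scikit-learn is installed. Since the question allows dummy data, it uses randomly generated correlated data as a stand-in for the real dataset. Rows are samples and columns are features, which is the layout scikit-learn expects. `fit_transform` fits the model and returns the projected data in one call.

```python
import numpy as np
from sklearn.decomposition import PCA

# Dummy data (assumption): 200 samples, 5 features, built from 2 hidden
# factors plus a little noise, so the columns are strongly correlated
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

pca = PCA(n_components=2)
projected = pca.fit_transform(data)  # fit, then project onto 2 components

print(projected.shape)                # (200, 2)
print(pca.components_.shape)          # (2, 5): one principal axis per row
print(pca.explained_variance_ratio_)  # fraction of variance per component
```

Because the dummy data has only two underlying factors, the two components should capture nearly all of the variance.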

Answer 2

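As Nils suggested, the simplest solution is to use the PCA class from the scikit-learn package. If for some reason you cannot use scikit-learn, the PCA algorithm itself is fairly simple. You can find it in the scikit-learn source code here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L408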
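
To illustrate how simple the algorithm is, here is a numpy-only sketch of PCA through eigendecomposition of the covariance matrix. This route differs from the SVD that scikit-learn uses, but it is mathematically equivalent. The helper name `pca_eig` is hypothetical, and the sketch assumes rows are samples and columns are features.

```python
import numpy as np

def pca_eig(data, n_components):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    # Center each feature (column) on its own mean
    centered = data - data.mean(axis=0)
    # Sample covariance between features; rowvar=False treats columns as variables
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder from largest to smallest eigenvalue (most variance first)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T     # one principal axis per row
    explained_variance = eigvals[order]  # variance along each axis
    projected = centered @ components.T  # coordinates in the new basis
    return projected, components, explained_variance

# Dummy correlated data for illustration
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                             [1.0, 1.0, 0.0],
                                             [0.0, 0.0, 0.2]])
projected, components, explained_variance = pca_eig(data, n_components=2)
print(projected.shape)  # (100, 2)
```

The variance of each projected column equals the corresponding eigenvalue. Eigenvector signs are arbitrary, so results may differ in sign from other implementations.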

As a simplified summary (with samples as rows and features as columns of data):

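import numpy as np
# Center each feature (column) on its mean; np.mean(data) alone would subtract one global mean
centered_data = data - np.mean(data, axis=0)
# Thin SVD: centered_data = U @ np.diag(S) @ Vt
U, S, Vt = np.linalg.svd(centered_data, full_matrices=False)
components = Vt  # rows are the principal axes
# Projections of each sample onto the components (same as np.dot(U, np.diag(S)))
coefficients = U * S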
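
Wrapped in a small function, the SVD summary above might look like the following sketch. The name `pca_svd` and its `n_components` parameter are hypothetical. The explained-variance formula S**2 / (n_samples - 1) follows the sample-covariance convention, and the sign of each component is arbitrary.

```python
import numpy as np

def pca_svd(data, n_components):
    """Hypothetical helper: PCA of `data` (rows = samples) via thin SVD."""
    n_samples = data.shape[0]
    centered = data - data.mean(axis=0)              # per-feature centering
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]                   # principal axes (rows)
    scores = U[:, :n_components] * S[:n_components]  # equals centered @ components.T
    explained_variance = S[:n_components] ** 2 / (n_samples - 1)
    explained_ratio = S[:n_components] ** 2 / (S ** 2).sum()
    return scores, components, explained_variance, explained_ratio

# Dummy data for illustration
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 4))
scores, components, var, ratio = pca_svd(data, n_components=2)

# The scores are just the centered data projected onto the components
centered = data - data.mean(axis=0)
print(np.allclose(scores, centered @ components.T))  # True
```

If n_components equals the number of features, scores @ components reconstructs the centered data exactly. Keeping fewer components gives the best low-rank approximation in the least-squares sense.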
