PCA for KNN in numpy

Question

My assignment is to use my PCA code to transform data into a two-dimensional space for a KNN task. My PCA code creates an array of eigenvectors called PCevecs.

def __PCA(data):
   #Normalize data
   data_cent = data-np.mean(data)

   #calculate covariance
   covarianceMatrix = np.cov(data_cent, bias=True)

   #Find eigenvector and eigenvalue
   eigenvalue, eigenvector= np.linalg.eigh(covarianceMatrix)

   #Sorting the eigenvectors and eigenvalues:
   PCevals = eigenvalue[::-1]
   PCevecs = eigenvector[:,::-1]

   return PCevals, PCevecs

The assignment transforms the training data using PCA. The returned PCevecs has shape (88, 88), as reported by print(PCevecs.shape). The training data has shape (88, 4).

np.dot(trainingFeatures, PCevecs[:, 0:2])

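When the code is run, I get the error message "ValueError: shapes (88,4) and (88,2) not aligned: 4 (dim 1) != 88 (dim 0)". I can see that the arrays do not match, but I cannot see what I did wrong in my PCA implementation. I looked at similar questions on Stack Overflow, but I have not seen anyone sort the eigenvectors and eigenvalues the same way.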

Answer 1

(Edited, with additional information from the comments)

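While the PCA implementation is fine overall, np.cov() treats each row as a variable by default, so for data of shape (88, 4) it returns an (88, 88) covariance matrix. Either compute the PCA on the transposed data, or tell np.cov() which axis holds your dimensions via the rowvar parameter.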
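
To make the effect of rowvar concrete, here is a minimal sketch. The array training_features is made-up random data with the same shape as in the question, not the asker's actual data.

```python
import numpy as np

# Hypothetical stand-in for the training data: 88 samples, 4 features.
rng = np.random.default_rng(0)
training_features = rng.normal(size=(88, 4))

# Default rowvar=True: each ROW is treated as a variable -> 88 x 88 matrix.
cov_rows = np.cov(training_features, bias=True)

# rowvar=False: each COLUMN is treated as a variable -> 4 x 4 matrix.
cov_cols = np.cov(training_features, rowvar=False, bias=True)

# Transposing the data has the same effect as rowvar=False.
cov_transposed = np.cov(training_features.T, bias=True)

print(cov_rows.shape)
print(cov_cols.shape)
print(np.allclose(cov_cols, cov_transposed))
```

The 88 x 88 result is what produces the (88, 88) eigenvector array in the question, which then cannot be multiplied with the (88, 4) data.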

The following will work as you expect:

import numpy as np


def __PCA_fixed(data, rowvar=False):
   # Shift data by its global mean (np.cov() centers each variable itself)
   data_cent = data - np.mean(data)

   # calculate covariance (pass `rowvar` to `np.cov()`)
   covarianceMatrix = np.cov(data_cent, rowvar=rowvar, bias=True)

   # Find eigenvector and eigenvalue
   eigenvalue, eigenvector = np.linalg.eigh(covarianceMatrix)

   # Sorting the eigenvectors and eigenvalues:
   PCevals = eigenvalue[::-1]
   PCevecs = eigenvector[:,::-1]

   return PCevals, PCevecs

Testing with some random numbers:

data = np.random.randint(0, 100, (100, 10))
PCevals, PCevecs = __PCA_fixed(data)
print(PCevecs.shape)
# (10, 10)
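
Applied to data shaped like the training set in the question, the projection onto the first two components then goes through. The following is only a sketch: pca_eigh is an illustrative helper name, the data is random, and centering with per-feature means before projecting is an assumption about what the assignment expects.

```python
import numpy as np


def pca_eigh(data):
    """Eigen-decomposition PCA with features along columns (illustrative helper)."""
    covariance = np.cov(data, rowvar=False, bias=True)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # eigh returns eigenvalues in ascending order; reverse for descending.
    return eigenvalues[::-1], eigenvectors[:, ::-1]


# Made-up data with the same shape as the training set in the question.
rng = np.random.default_rng(0)
training_features = rng.normal(size=(88, 4))

pc_evals, pc_evecs = pca_eigh(training_features)

# Center with per-feature means, then project onto the first two components.
centered = training_features - training_features.mean(axis=0)
projected = np.dot(centered, pc_evecs[:, 0:2])

print(pc_evecs.shape)   # (4, 4)
print(projected.shape)  # (88, 2)
```

The variance of each projected column equals the corresponding eigenvalue, which is a quick way to check that the sorting is in the intended order.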

Also note that, more generally, singular value decomposition (np.linalg.svd() in NumPy) might be a better approach for principal component analysis than the eigenvalue decomposition you use; the two are related in a simple way, up to a transposition.
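
A minimal sketch of that relationship, using random data and assuming features are along the columns: the right singular vectors of the centered data are the principal directions, and the squared singular values divided by the number of samples match the eigenvalues of the biased covariance matrix.

```python
import numpy as np

# Made-up data: 100 samples, 10 features.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 10))
centered = data - data.mean(axis=0)

# Thin SVD: centered == u @ np.diag(s) @ vt, singular values in descending order.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Rows of vt are the principal directions; transpose to get them as columns.
components = vt.T

# Squared singular values over n give the eigenvalues of the biased covariance.
svd_evals = s ** 2 / centered.shape[0]

# Compare with the eigen-decomposition route (eigh sorts ascending, so reverse).
covariance = np.cov(centered, rowvar=False, bias=True)
eig_evals = np.linalg.eigh(covariance)[0][::-1]

print(np.allclose(svd_evals, eig_evals))
```

The eigenvectors from the two routes agree only up to sign, so compare projections or absolute values rather than the raw vectors.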

As a general coding style note, it may be a good idea to follow the recommendations of PEP-8, many of which can be readily checked by an automated tool such as autopep8.
