Python PCA - projection into lower dimensional space

I am trying to implement PCA. It works well for intermediate results such as the eigenvalues and eigenvectors, but when I project the (3-dimensional) data onto the 2D principal-component space, the result is wrong. I have spent a lot of time comparing my code with other implementations, for example:

http://sebastianraschka.com/Articles/2014_pca_step_by_step.html

Still, after all that effort I have made no progress and cannot find the error. Since the intermediate results are correct, I assume the problem is a simple coding mistake. Thanks in advance to anyone who actually reads this question, and even more to those who help with comments/answers.

My code is as follows:

import numpy as np

class PCA():
    def __init__(self, X):
        # center the data
        X = X - X.mean(axis=0)
        # covariance matrix of X, where data points are represented as rows
        C = np.cov(X, rowvar=False)
        # get eigenvalues and eigenvectors
        d, u = np.linalg.eigh(C)
        # sort eigenvectors and eigenvalues in descending order of eigenvalue;
        # np.linalg.eigh returns them ascending, so both are reversed
        self.U = np.asarray(u).T[::-1]
        self.D = d[::-1]

**problem starts here**

    def project(self, X, m):
        # use the top m eigenvectors (highest eigenvalues) as the transformation matrix
        Z = np.dot(X, np.asmatrix(self.U[:m]).T)
        return Z

The result of my code is:

 my result
 ([[ 0.03463706, -2.65447128],
   [-1.52656731,  0.20025725],
   [-3.82672364,  0.88865609],
   [ 2.22969475,  0.05126909],
   [-1.56296316, -2.22932369],
   [ 1.59059825,  0.63988429],
   [ 0.62786254, -0.61449831],
   [ 0.59657118,  0.51004927]])

correct result - e.g. from sklearn.decomposition.PCA
([[ 0.26424835, -2.25344912],
 [-1.29695602,  0.60127941],
 [-3.59711235,  1.28967825],
 [ 2.45930604,  0.45229125],
 [-1.33335186, -1.82830153],
 [ 1.82020954,  1.04090645],
 [ 0.85747383, -0.21347615],
 [ 0.82618248,  0.91107143]])

The input is defined as follows: 
X = np.array([
[-2.133268233289599,0.903819474847349,2.217823388231679,-0.444779660856219,-0.661480010318842,-0.163814281248453,-0.608167714051449, 0.949391996219125],
[-1.273486742804804,-1.270450725314960,-2.873297536940942, 1.819616794091556,-2.617784834189455, 1.706200163080549,0.196983250752276,0.501491995499840],
[-0.935406638147949,0.298594472836292,1.520579082270122,-1.390457671168661,-1.180253547776717,-0.194988736923602,-0.645052874385757,-1.400566775105519]]).T 
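Incidentally, the two outputs above differ by the same constant row everywhere, which is the fingerprint of a missing centering step: projecting uncentered data shifts every projected point by `mean(X) @ U[:2].T`. A minimal NumPy sketch (reusing the `X` defined above) that checks this:

```python
import numpy as np

X = np.array([
    [-2.133268233289599, 0.903819474847349, 2.217823388231679, -0.444779660856219,
     -0.661480010318842, -0.163814281248453, -0.608167714051449, 0.949391996219125],
    [-1.273486742804804, -1.270450725314960, -2.873297536940942, 1.819616794091556,
     -2.617784834189455, 1.706200163080549, 0.196983250752276, 0.501491995499840],
    [-0.935406638147949, 0.298594472836292, 1.520579082270122, -1.390457671168661,
     -1.180253547776717, -0.194988736923602, -0.645052874385757, -1.400566775105519]]).T

mu = X.mean(axis=0)
C = np.cov(X - mu, rowvar=False)
d, u = np.linalg.eigh(C)
U = u.T[::-1]                 # eigenvectors as rows, descending eigenvalue order

Z_wrong = X @ U[:2].T         # projecting without centering (the bug)
Z_right = (X - mu) @ U[:2].T  # projecting centered data

# the error is the same constant offset mu @ U[:2].T in every row
offset = Z_wrong - Z_right
assert np.allclose(offset, mu @ U[:2].T)
```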

Before projecting the data onto the new basis, you need to center it by subtracting the mean:

mu = X.mean(0)
C = np.cov(X - mu, rowvar=False)
d, u = np.linalg.eigh(C)
U = u.T[::-1]
Z = np.dot(X - mu, U[:2].T)

print(Z)
# [[ 0.26424835 -2.25344912]
#  [-1.29695602  0.60127941]
#  [-3.59711235  1.28967825]
#  [ 2.45930604  0.45229125]
#  [-1.33335186 -1.82830153]
#  [ 1.82020954  1.04090645]
#  [ 0.85747383 -0.21347615]
#  [ 0.82618248  0.91107143]]
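Folding this back into the original class, one way to fix it is to remember the training mean in `__init__` and subtract it again in `project` (a sketch; `np.asmatrix` is also dropped, since plain arrays suffice here):

```python
import numpy as np

class PCA:
    def __init__(self, X):
        # remember the mean so projection can center new data the same way
        self.mu = X.mean(axis=0)
        C = np.cov(X - self.mu, rowvar=False)
        d, u = np.linalg.eigh(C)
        # eigh returns ascending eigenvalues; reverse both for descending order
        self.U = u.T[::-1]
        self.D = d[::-1]

    def project(self, X, m):
        # center with the stored mean, then project onto the top m eigenvectors
        return np.dot(X - self.mu, self.U[:m].T)
```

With the same `X` as above, `PCA(X).project(X, 2)` now reproduces the reference result (up to the usual sign ambiguity of eigenvectors).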