Why is the result different between PCA and IncrementalPCA

Question

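Why are the results of PCA and IncrementalPCA so different?

I use PCA and IncrementalPCA to fit the same data.

But when I call transform, the outputs of the two methods differ considerably.

Could you help me understand why? Thanks a lot!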

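import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

# 100,000 samples with 512 uniformly distributed random features
data = np.random.random([100000, 512])

# Reduce to 256 dimensions with both estimators
pca_obj = PCA(n_components=256)
ipca_obj = IncrementalPCA(n_components=256, batch_size=1000)  # fits 1,000 rows at a time
pca_obj.fit(data)
ipca_obj.fit(data)

# Project the first sample with each fitted model and compare
print(pca_obj.transform(np.expand_dims(data[0], axis=0)))
print(ipca_obj.transform(np.expand_dims(data[0], axis=0)))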
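
Each principal component is only determined up to its sign, so comparing raw transform outputs entry by entry can overstate the difference between the two estimators. A minimal sketch of a sign-insensitive comparison, assuming a small synthetic dataset with a clear low-rank structure (the sizes, the seed, the factor scales and the 0.1 noise level are arbitrary illustrative choices, not taken from the question):

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

rng = np.random.default_rng(0)
n_samples, n_features, k = 5000, 20, 3

# Illustrative data: 3 latent factors with very different variances,
# mixed into 20 features, plus a little isotropic noise.
latent = rng.normal(size=(n_samples, k)) * np.array([10.0, 5.0, 2.0])
mixing = rng.normal(size=(k, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Exact PCA as the reference, IncrementalPCA fitted in batches of 500 rows.
pca = PCA(n_components=k, svd_solver="full").fit(X)
ipca = IncrementalPCA(n_components=k, batch_size=500).fit(X)

# Rows of components_ are unit vectors; take the absolute cosine between
# matching rows so that a flipped sign does not count as a difference.
alignment = np.abs(np.sum(pca.components_ * ipca.components_, axis=1))
print(alignment)

# Projections of the first sample, with the sign of each coordinate dropped.
p = pca.transform(X[:1])
q = ipca.transform(X[:1])
print(np.abs(p))
print(np.abs(q))
```

On data with well-separated components like this, the absolute cosines should come out very close to 1; the question's uniform random data has no such structure.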

Answer 1

From the docs:

IPCA builds a low-rank approximation for the input data using an amount of memory which is independent of the number of input data samples.
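
The memory property quoted above comes from fitting one mini-batch at a time. A minimal sketch with `partial_fit`, assuming a hypothetical generator `batches` that stands in for reading chunks from disk or a stream (the batch count and sizes are arbitrary):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def batches(n_batches, batch_size, n_features, seed=0):
    # Hypothetical data source: yields one chunk at a time, so the full
    # dataset never has to be held in memory at once.
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        yield rng.random((batch_size, n_features))

ipca = IncrementalPCA(n_components=16)
for chunk in batches(n_batches=20, batch_size=500, n_features=64):
    # Update the fitted components and mean using only this chunk.
    ipca.partial_fit(chunk)

print(ipca.n_samples_seen_)    # 10000
print(ipca.components_.shape)  # (16, 64)
```

Only the current chunk and the running low-rank summary are kept between calls, which is why memory use does not grow with the total number of samples.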

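IPCA is really only meant for massive datasets, because it effectively downsamples the data. The larger the dataset, the more the IPCA projection resembles the PCA projection, but it is always an approximation, and the difference is more noticeable on small datasets.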
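
The approximation is likely to be most visible when the leading components are poorly separated, as with data like the question's: every feature is an independent uniform variable with the same variance (1/12), so no direction carries clearly more variance than another. A sketch under that assumption, on a smaller array so it runs quickly (5000 x 50 and 10 components are arbitrary choices), prints the explained variance ratios and the overlap of the two fitted subspaces:

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

rng = np.random.default_rng(0)
# Uniform random data shaped like the question's, but smaller.
X = rng.random((5000, 50))

pca = PCA(n_components=10, svd_solver="full").fit(X)
ipca = IncrementalPCA(n_components=10, batch_size=500).fit(X)

# Nearly flat spectrum: the 10 leading ratios are all close to 1/50.
print(pca.explained_variance_ratio_)

# Cosines of the principal angles between the two 10-dimensional subspaces:
# singular values of V_pca @ V_ipca.T, where 1 means the directions coincide.
cosines = np.linalg.svd(pca.components_ @ ipca.components_.T, compute_uv=False)
print(cosines)
```

When the variance ratios are this close, the ordering of the components is essentially arbitrary, so the coordinate-by-coordinate outputs of transform from the two estimators should not be expected to match.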
