PCA doesn't reduce the dimensionality of my data

Question

I want to apply PCA to my heatmaps, which have 18 dimensions:

heatmaps.shape
(224, 224, 18)

Since PCA only accepts data with at most 2 dimensions, I reshape my heatmaps as follows:

heatmaps=heatmaps.reshape(-1,18)
heatmaps.shape
(50176, 18)
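To see what this reshape does, here is a small sketch on a toy array (the 4x5 pixel grid is an arbitrary stand-in for 224x224). Each row of the reshaped matrix is the 18-value vector of one pixel, so PCA treats the 50176 pixels as samples and the 18 dimensions as features.

```python
import numpy as np

# Toy stand-in for the (224, 224, 18) stack: 4 x 5 pixels, 18 values per pixel.
heatmaps = np.random.RandomState(0).rand(4, 5, 18)

# One row per pixel (sample), one column per dimension (feature).
X = heatmaps.reshape(-1, 18)
print(X.shape)  # (20, 18)

# Row r holds the 18 values of pixel (r // 5, r % 5).
r = 7
print(np.array_equal(X[r], heatmaps[r // 5, r % 5]))  # True
```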

Now I apply PCA and keep the first components that retain 95% of the variance.

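from sklearn.decomposition import PCA
pca = PCA(n_components=18)  # n_components=18 keeps all 18 components
reduced_heatmaps = pca.fit_transform(heatmaps)  # fit first; calling transform alone raises NotFittedError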

However, reduced_heatmaps has the same shape as the original heatmaps: (50176, 18).

My question is: how can I reduce the dimensionality of my heatmaps while retaining 95% of the variance?
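In scikit-learn, passing a float between 0 and 1 as n_components (for example PCA(n_components=0.95, svd_solver='full')) asks PCA to keep the smallest number of components whose cumulative explained variance exceeds that fraction. The numpy-only sketch below reproduces that selection rule with an SVD so the mechanics are visible; pca_keep_variance is a hypothetical helper name, and the data is synthetic (18 features driven by 3 hidden factors), not the heatmaps.

```python
import numpy as np

def pca_keep_variance(X, threshold=0.95):
    """Hypothetical helper: project X (n_samples, n_features) onto the fewest
    principal components whose cumulative explained-variance ratio exceeds
    `threshold`, mimicking PCA(n_components=threshold)."""
    Xc = X - X.mean(axis=0)                        # PCA operates on centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)                # explained-variance ratio per component
    k = int(np.searchsorted(np.cumsum(ratio), threshold, side="right")) + 1
    k = min(k, len(ratio))                         # guard against rounding near 1.0
    return Xc.dot(Vt[:k].T), ratio[:k]

# Synthetic data: 18 features driven by 3 hidden factors plus a little noise.
rng = np.random.RandomState(0)
latent = rng.randn(1000, 3)
mixing = rng.randn(3, 18)
X = latent.dot(mixing) + 0.05 * rng.randn(1000, 18)

Z, kept = pca_keep_variance(X, 0.95)
print(Z.shape)          # (1000, k) with k <= 3 instead of 18 columns
print(kept.cumsum())
```

The reduction only happens because these synthetic features are strongly correlated; on data whose variance is spread evenly, the same rule keeps almost every component.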

Strange thing:

pca.explained_variance_ratio_.cumsum()
array([ 0.05744624,  0.11482341,  0.17167621,  0.22837643,  0.284996  ,
        0.34127299,  0.39716828,  0.45296374,  0.50849681,  0.56382308,
        0.61910508,  0.67425335,  0.72897448,  0.78361028,  0.83813329,
        0.89247688,  0.94636864,  1.        ])

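This means that 17 components only explain about 94.6% of the variance, so keeping 95% requires all 18 components and I still end up with 18 dimensions.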

What's wrong?
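The printed ratios can be checked directly. This sketch copies the cumulative values from the output above, applies the "first cumulative ratio above the threshold" rule, and looks at the variance explained by each individual component. Every increment lies close to 1/18 ≈ 0.056, meaning the variance is spread almost evenly across the 18 directions.

```python
import numpy as np

# Cumulative explained-variance ratios copied from the output above.
cumsum = np.array([0.05744624, 0.11482341, 0.17167621, 0.22837643, 0.284996,
                   0.34127299, 0.39716828, 0.45296374, 0.50849681, 0.56382308,
                   0.61910508, 0.67425335, 0.72897448, 0.78361028, 0.83813329,
                   0.89247688, 0.94636864, 1.0])

# Smallest number of components whose cumulative ratio exceeds 95%.
k = int(np.searchsorted(cumsum, 0.95, side="right")) + 1
print(k)  # 18: 17 components only reach about 94.6%

# Variance explained by each individual component.
per_component = np.diff(np.concatenate(([0.0], cumsum)))
print(per_component.min(), per_component.max())  # both close to 1/18
print(1.0 / 18)
```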

EDIT: following Eric Yang's suggestion:

heatmaps=heatmaps.reshape(18,-1)
heatmaps.shape
(18,50176)
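One caution about this reshape, sketched on a toy array (4x5 pixels with 3 channels standing in for 224x224x18): with NumPy's default row-major order, reshape(18, -1) does not put each of the 18 dimensions in its own row. It cuts the flat buffer into 18 consecutive chunks, each mixing values from all dimensions. Moving that axis first with transpose (or taking .T of the (50176, 18) matrix) gives one dimension per row.

```python
import numpy as np

# Toy stand-in: 4 x 5 pixels with 3 channels instead of 224 x 224 x 18.
a = np.arange(4 * 5 * 3).reshape(4, 5, 3)

mixed = a.reshape(3, -1)                            # consecutive chunks of the buffer
per_channel = a.transpose(2, 0, 1).reshape(3, -1)   # channel axis moved first

print(mixed[0][:6])        # [0 1 2 3 4 5]: channels 0, 1, 2 interleaved
print(per_channel[0][:6])  # [ 0  3  6  9 12 15]: channel 0 only

# Same result starting from the (n_pixels, n_channels) matrix of the first attempt.
print(np.array_equal(per_channel, a.reshape(-1, 3).T))  # True
```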

Then I apply PCA as follows:

pca = PCA(n_components=11)
reduced_heatmaps=pca.fit_transform(heatmaps)
pca.explained_variance_ratio_.cumsum()
which gives:
array([ 0.21121199,  0.33070526,  0.44827572,  0.55748779,  0.64454442,
        0.72588593,  0.7933346 ,  0.85083687,  0.89990991,  0.9306283 ,
        0.9596194 ], dtype=float32)

11 components are needed to explain 95% of the variance of my data.

reduced_heatmaps.shape
(18, 11)

So we go from (18, 50176) to (18, 11).
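For reading the (18, 11) result: fit_transform returns one row per sample and one column per kept component, so each of the 18 rows is now described by 11 numbers. With only 18 samples, the centered data has rank at most 17, so PCA can find at most 17 directions with nonzero variance however many columns each row has. A quick numpy check on random data (200 columns standing in for 50176):

```python
import numpy as np

# 18 samples (rows) with many features each; 200 is a stand-in for 50176.
rng = np.random.RandomState(0)
X = rng.randn(18, 200)

Xc = X - X.mean(axis=0)            # centering makes the rows sum to zero
s = np.linalg.svd(Xc, compute_uv=False)

print(np.linalg.matrix_rank(Xc))   # 17
print(s[-1] / s[0] < 1e-10)        # True: the 18th direction carries no variance
```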

Thanks for your help

Answer 1

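How much you can reduce the dimensionality depends on your data. If you have an N-dimensional Gaussian where each dimension is an independent N(0,1), each dimension explains 1/N of your variance, so your ability to reduce the dimensionality with PCA will be minimal. So the PCA result doesn't look wrong.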
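A small simulation of this point, as a numpy sketch with synthetic data (the sample size, the 2-factor model and the helper names explained_variance_ratio and n_components_for are illustrative choices): for 18 independent N(0,1) dimensions every component explains roughly 1/18 of the variance and reaching 95% needs almost all of them, while 18 dimensions driven by 2 hidden factors collapse to 2 or fewer components.

```python
import numpy as np

def explained_variance_ratio(X):
    """Explained-variance ratio of each principal component of X (rows = samples)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

def n_components_for(ratio, threshold=0.95):
    """Fewest components whose cumulative ratio exceeds `threshold`."""
    k = int(np.searchsorted(np.cumsum(ratio), threshold, side="right")) + 1
    return min(k, len(ratio))

rng = np.random.RandomState(0)
N = 18

# Independent N(0, 1) dimensions: every component explains about 1/N.
iso = rng.randn(50000, N)
r_iso = explained_variance_ratio(iso)
print(r_iso.round(3))
print(n_components_for(r_iso))   # (almost) all 18

# Correlated data: 18 dimensions driven by 2 hidden factors plus small noise.
corr = rng.randn(50000, 2).dot(rng.randn(2, N)) + 0.05 * rng.randn(50000, N)
print(n_components_for(explained_variance_ratio(corr)))  # 2 or fewer
```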

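Now, based on a rough reading of your question, you have 18 images of size 224x224, correct? If so, your dimensionality is 224x224, not 18, and what you are really asking is: what is the smallest number of pixels that explains the variance among my 18 images? (I may be wrong, though, if that assumption doesn't hold and what you actually have is 18 channels of a single image.)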

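There is another possibility: you have a series of similar images (so your dimensionality would be 18) and you are looking for eigenimages. If the images differ too much from one another, you will get only minimal dimensionality reduction.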
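For the eigenimage reading, a minimal numpy sketch under the assumption that the 18 images are similar: each image is flattened into one row, PCA is run across the 18 rows, and each principal axis is reshaped back to image size. The toy images (8x8 instead of 224x224, a shared pattern with a random brightness factor plus small noise) are synthetic.

```python
import numpy as np

rng = np.random.RandomState(0)
n_images, H, W = 18, 8, 8

# Synthetic "similar" images: one shared pattern, a varying brightness, small noise.
base = rng.rand(H, W)
images = np.array([(1.0 + 0.3 * rng.randn()) * base + 0.01 * rng.randn(H, W)
                   for _ in range(n_images)])

X = images.reshape(n_images, -1)          # one flattened image per row
mean_image = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean_image, full_matrices=False)

ratio = s ** 2 / np.sum(s ** 2)           # explained-variance ratio per eigenimage
eigenimages = Vt.reshape(-1, H, W)        # each principal axis viewed as an image
print(ratio[:3].round(3))                 # the first eigenimage dominates
print(eigenimages.shape)                  # (18, 8, 8)
```

Because these toy images vary along essentially one direction, a single eigenimage captures nearly all the variance; images that differ more would spread it over many components.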


Tags: heatmap, pca, python-2.7, scikit-learn