sklearn PCA returns components arrays close to zero

Question

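I am trying to use sklearn's decomposition.PCA:

The input is 100 face images of size 4096x4096x3 (RGB), read with cv2 as numpy uint8 arrays with values in the range [0, 255].

I flatten each of them into a 2-D array of shape [1, 4096x4096x3], for example: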

[255. 128. 128. ... 255. 128. 128.]
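The flattening step can be sketched on a much smaller scale. This is only an illustration under stated assumptions: the three random 4x4 RGB arrays and the names `images`, `rows`, and `data` are hypothetical stand-ins for the real 4096x4096x3 photos.

```python
import numpy as np

# Hypothetical stand-ins: three tiny 4x4 RGB images instead of 4096x4096x3 photos.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8) for _ in range(3)]

# Flatten each image to a 1-D vector, then stack the vectors row by row
# to obtain an (n_images, n_features) matrix, the layout PCA expects.
rows = [img.reshape(-1) for img in images]
data = np.stack(rows, axis=0).astype(np.float64)

print(data.shape)  # (3, 48)
```

Reshaping a row back with `data[0].reshape(4, 4, 3)` recovers the original pixel layout, which is what makes it possible to view a PCA output row as an image later.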

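Then I pass all of these arrays to sklearn's PCA() with n_components=20 to find the 20 principal features.

The computation succeeds, but all components in PCA.components_ are very similar, close-to-zero arrays.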

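Here is everything I have checked so far:

1. About 24% of the entries in an input image matrix differ by more than 10 (on the [0, 255] scale) from the corresponding entries of another input image.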
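A check of this kind could look roughly like the sketch below. It is an assumption about how such a comparison might be done, not the code that produced the 24% figure; `img_a` and `img_b` are hypothetical random images.

```python
import numpy as np

# Hypothetical stand-ins for two of the input images.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

# Cast to a signed type first: subtracting uint8 arrays wraps around modulo 256.
diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))

# Fraction of entries whose absolute difference exceeds 10.
fraction = np.mean(diff > 10)
print(f"{fraction:.1%} of entries differ by more than 10")
```

The cast matters: computing `img_a - img_b` directly on uint8 data would silently turn small negative differences into large positive values.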

pca.mean_ looks normal: it is an array that resembles the input:

[255. 128. 128. ... 255. 128. 128.]

I can successfully reconstruct a face image from it.
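Turning the mean back into an image might be done along these lines. This is a sketch on hypothetical random data with an assumed tiny image size (`h`, `w`, `c`); it only shows that `mean_` is the per-feature average of the training rows and can be reshaped like any flattened image.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
h, w, c = 4, 4, 3  # hypothetical tiny image size
data = rng.integers(0, 256, size=(10, h * w * c)).astype(np.float64)

pca = PCA(n_components=3).fit(data)

# mean_ is a float vector in pixel units, so round, clip, and cast before
# treating it as an 8-bit image.
mean_img = np.clip(np.rint(pca.mean_), 0, 255).astype(np.uint8).reshape(h, w, c)
print(mean_img.shape)  # (4, 4, 3)
```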

However, I found that all the components are arrays of floats very close to 0, such as:

[[ 1.4016776e-08 4.3943277e-08 2.7873748e-08]

[ 4.1034184e-08 -1.2753417e-08 6.2264380e-09]

[-6.7606822e-09 4.9416444e-09 5.4486654e-10]

...

[-0.0000000e+00 -0.0000000e+00 -0.0000000e+00]

[-0.0000000e+00 -0.0000000e+00 -0.0000000e+00]]

In fact, none of them is greater than 1.

2. I tried changing the parameters, for example:

pca=PCA(n_components=20,svd_solver="randomized", whiten=True)

But the result is the same: the components are still very similar.
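One possible reason this parameter makes no visible difference: in recent scikit-learn versions, `whiten=True` rescales the output of `transform`, not the rows of `components_`. The sketch below illustrates this on hypothetical random data; it uses `svd_solver="full"` rather than `"randomized"` so that the two fits are directly comparable.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(40, 60))  # hypothetical data: 40 samples, 60 features

plain = PCA(n_components=5, svd_solver="full").fit(data)
white = PCA(n_components=5, svd_solver="full", whiten=True).fit(data)

# The principal axes are the same with and without whitening.
same_axes = np.allclose(plain.components_, white.components_)

# Whitening shows up in the projected data: each column has unit sample variance.
scores_std = white.transform(data).std(axis=0, ddof=1)

print(same_axes)
print(scores_std)
```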

Why does this happen, and how can I fix it? Thanks for any ideas!

Code:

from sklearn.decomposition import PCA
import numpy as np
import cv2
import os

folder_path = "./normal"
images = []
for i in range(1, 101):
    if i % 10 == 0:
        print("loading", i, "th image")
    if i == 60:
        continue  # special case, should be skipped

    image_path = folder_path + f"/total_matrix_tangent {i}.png"
    img = cv2.imread(image_path)
    images.append(img.reshape(-1))
print("Loaded", len(images), "images")
# stack into a numpy matrix, one flattened image per row
all_image = np.stack(images, axis=0)
# convert to float64; values stay in [0, 255] (no rescaling to 0-1 happens here)
all_image = all_image.astype(np.float64)

### shape: #_of_images x image_RGB_pixel_num (50331648 for normal case)
# print(all_image)
# print(all_image.shape)

# PCA, keeps 20 features
pca=PCA(n_components=20)
pca.fit_transform(all_image)
print("finished PCA")

result=pca.components_
print("PCA mean:",pca.mean_)

result=result.reshape(-1,4096,4096,3)
# result shape: #_of_components x 4096 x 4096 x 3
# print(result.shape)

dst=result/np.linalg.norm(result,axis=(3),keepdims=True)
saving_path = "./principle64"
for i in range(20):
    reconImage=(dst)[i]
    cv2.imwrite(os.path.join(saving_path,("p"+str(i)+".png")),reconImage)
print("Saved",i+1,"principle imgs")

Answer 1

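pca.components_ is not the list of transformed inputs. It holds the principal components that are kept: an array with one row per component, so 20 rows in your case.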
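It may also help to know that each row of `components_` is a unit-length vector. With 4096 * 4096 * 3 = 50331648 entries per row, the root-mean-square entry is 1/sqrt(50331648), roughly 1.4e-4, so values that look close to zero are expected rather than a sign of failure. The sketch below, on hypothetical random data with an assumed tiny image size, checks the unit norm and shows one possible way to rescale a component to [0, 255] for viewing; the min-max rescaling is an illustrative choice, not something prescribed by scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
h, w, c = 8, 8, 3  # hypothetical tiny image size
data = rng.integers(0, 256, size=(30, h * w * c)).astype(np.float64)

pca = PCA(n_components=5, svd_solver="full").fit(data)
print(pca.components_.shape)  # (5, 192)

# Every row is a direction in feature space with Euclidean length 1.
norms = np.linalg.norm(pca.components_, axis=1)

# Min-max rescale one component so that it becomes visible as an 8-bit image.
comp = pca.components_[0].reshape(h, w, c)
span = comp.max() - comp.min()
vis = ((comp - comp.min()) / span * 255).astype(np.uint8)
```

An array like `vis` can then be written to disk with an image library, whereas writing the raw component values would produce an almost uniformly black picture.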

To get the reduced-dimensionality images, you need to use the transform or fit_transform method:

# PCA, keeps 20 features
pca = PCA(n_components=20)

# Transform all_image
result = pca.fit_transform(all_image)
# result shape: num_of_images, 20

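Note that the transform reduces the number of dimensions from 4096 * 4096 * 3 to 20, so the reshape operation that follows in your code is meaningless and will not work.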
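The shape mismatch can be reproduced on a small scale. The sketch below uses hypothetical random data with 30 images of an assumed 8x8x3 size; the `reshape_ok` flag is only there to record whether the reshape succeeded.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
h, w, c = 8, 8, 3  # hypothetical tiny image size
all_image = rng.integers(0, 256, size=(30, h * w * c)).astype(np.float64)

pca = PCA(n_components=20)
result = pca.fit_transform(all_image)
print(result.shape)  # (30, 20)

# Each image is now described by 20 numbers, which cannot be laid out
# as an h x w x c pixel grid.
try:
    result.reshape(-1, h, w, c)
    reshape_ok = True
except ValueError:
    reshape_ok = False
print(reshape_ok)  # False
```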

If you want to try to reconstruct the original images from the retained information, you can call inverse_transform, i.e.

reconImages = pca.inverse_transform(result)
# reconImages shape: num_of_images, 4096 * 4096 * 3

reconImages = reconImages.reshape(-1, 4096, 4096, 3)
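A complete round trip might look like the following sketch. It runs on hypothetical random data with an assumed tiny image size; the rounding and clipping step and the mean-squared-error check are illustrative additions, not part of the original code.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
h, w, c = 8, 8, 3  # hypothetical tiny image size
all_image = rng.integers(0, 256, size=(30, h * w * c)).astype(np.float64)

pca = PCA(n_components=20)
result = pca.fit_transform(all_image)

# Map the 20-dimensional codes back to pixel space, then restore the image layout.
recon = pca.inverse_transform(result).reshape(-1, h, w, c)

# The reconstruction is float-valued and may leave [0, 255] slightly,
# so round and clip before casting to an 8-bit image type.
recon_u8 = np.clip(np.rint(recon), 0, 255).astype(np.uint8)

# Mean squared reconstruction error; it shrinks as n_components grows.
mse = np.mean((recon.reshape(len(all_image), -1) - all_image) ** 2)
print(recon_u8.shape, mse)
```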

numpy

image

pca

scikit-learn