Why do I need to specify the number of components to keep in Principal Component Analysis?
I found that to use pca I have to specify the number of components to keep up front, as in the following code:
# Initialize
model = pca(n_components=3, normalize=True)
Is there a way to specify only the variance and let the algorithm give me the most important components?
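You don't necessarily need to specify the number of components in advance. You can extract all of the components and keep only those that explain a given fraction of the cumulative variance. See the code below for an example.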
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import make_spd_matrix
from sklearn.preprocessing import StandardScaler
# generate the data
np.random.seed(100)
N = 1000 # number of samples
K = 10 # number of features
mean = np.zeros(K)
cov = make_spd_matrix(K)
X = np.random.multivariate_normal(mean, cov, N)
print(X.shape)
# (1000, 10)
# rescale the data
scaler = StandardScaler()
X = scaler.fit_transform(X)
# perform the PCA
pca = PCA(n_components=None)
pca.fit(X)
# find the smallest number of components which together
# explain at least a fraction p (e.g. 0.80) of the variance
p = 0.80
n_components = 1 + np.argmax(np.cumsum(pca.explained_variance_ratio_) >= p)
print(n_components)
# 6
# extract the values of the selected components
Z = pca.transform(X)[:, :n_components]
print(Z.shape)
# (1000, 6)
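As a possible shortcut, scikit-learn's PCA also accepts a float between 0 and 1 for n_components when svd_solver="full", in which case it selects the number of components itself so that the explained variance exceeds that fraction. The sketch below assumes the same kind of synthetic data as above; the names pca_auto and Z_auto are only illustrative. Note that this variant requires the cumulative variance to be strictly greater than p, so it could differ by one component from the manual approach in the rare case where the cumulative sum hits p exactly.

```python
import numpy as np
from sklearn.datasets import make_spd_matrix
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# synthetic data, generated the same way as in the example above
np.random.seed(100)
X = np.random.multivariate_normal(np.zeros(10), make_spd_matrix(10), 1000)
X = StandardScaler().fit_transform(X)

# target fraction of variance to explain
p = 0.80

# a float n_components in (0, 1) with svd_solver="full" asks PCA to keep
# just enough components for the explained variance to exceed p
pca_auto = PCA(n_components=p, svd_solver="full")
Z_auto = pca_auto.fit_transform(X)

# number of components that were actually kept
print(pca_auto.n_components_)

# fraction of variance explained by the kept components
print(pca_auto.explained_variance_ratio_.sum())

# shape of the reduced data: (n_samples, n_components_)
print(Z_auto.shape)
```

After fitting, pca_auto.n_components_ holds the number of components that were kept, so there is no need to slice the transformed array by hand.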