Explanation of the percentage value in scikit-learn PCA method
In scikit-learn there is a class called PCA. It accepts a percentage as a parameter, and this site explains that parameter as follows:
> Notice the code below has .95 for the number of components parameter. It means that scikit-learn chooses the minimum number of principal components such that 95% of the variance is retained.
> from sklearn.decomposition import PCA
> # Make an instance of the Model
> pca = PCA(.95)
I'm a bit confused by this explanation. Suppose the PCA output is as follows:
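- PC1 explains 70% of the total variance
- PC2 explains 15% of the total variance
- PC3 explains 10% of the total variance
- PC4 explains 4% of the total variance
- PC5 explains 1% of the total variance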
Would PCA(0.71) return PC1 and PC5 (since together they explain 71% of the variance), or PC1 and PC2? And what happens if I ask for 0.5% of the variance, i.e. which PCs would PCA(0.005) return?
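Generally, when we select the number of principal components (e.g. for dimensionality reduction, visualization, etc.), we select a number k, which implicitly means "start from PC1 and continue sequentially, up to (and including) PCk". This is the concept behind the preProcess function of R's caret package.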
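In other words, at least to my knowledge, in a situation like the one you describe we never select PCs by cherry-picking (i.e. choosing, for example, PC2, PC4 and PC5). Instead, we always choose a k < n, where n is the total number of PCs, and then take all of the first k PCs.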
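A minimal pure-Python sketch of "take the first k PCs", assuming the explained-variance ratios are already sorted in descending order (as PCA reports them). The helper name leading_components is hypothetical and not part of scikit-learn; the ratios are the example numbers from the question.

```python
def leading_components(explained_variance_ratio, k):
    """Return the labels of the first k components and the variance they retain.

    Assumes the ratios are already sorted in descending order; the selected
    components are always the prefix PC1..PCk, never an arbitrary subset.
    """
    if not 1 <= k <= len(explained_variance_ratio):
        raise ValueError("k must be between 1 and the number of components")
    labels = [f"PC{i}" for i in range(1, k + 1)]
    retained = sum(explained_variance_ratio[:k])
    return labels, retained


# Hypothetical ratios taken from the example in the question.
ratios = [0.70, 0.15, 0.10, 0.04, 0.01]
for k in range(1, len(ratios) + 1):
    labels, retained = leading_components(ratios, k)
    print(k, labels, f"{retained:.0%}")
```

Each possible k corresponds to exactly one set of components, so "PC1 and PC5" is never a candidate.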
Here is what the documentation says about 0 < n_components < 1:
> if 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
More precisely, I would read it as:
> if 0 < n_components < 1 and svd_solver == 'full', select the minimum number of components from the sorted list (descending order) according to their respective explained variance values such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
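A minimal pure-Python sketch of this rule, assuming the explained-variance ratios are already known. The function name n_components_for_ratio is hypothetical, not a scikit-learn API. As far as I can tell, scikit-learn does something similar internally (a cumulative sum followed by a right-sided searchsorted plus one); here bisect.bisect_right plays that role.

```python
from bisect import bisect_right
from itertools import accumulate


def n_components_for_ratio(explained_variance_ratio, threshold):
    """Smallest k whose first k components explain MORE than `threshold`.

    The ratios are sorted in descending order first, so the result always
    means "keep PC1..PCk", never a cherry-picked subset.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must satisfy 0 < threshold < 1")
    ratios = sorted(explained_variance_ratio, reverse=True)
    cumulative = list(accumulate(ratios))
    # bisect_right counts the cumulative sums that are <= threshold;
    # one more component is needed to strictly exceed it.
    k = bisect_right(cumulative, threshold) + 1
    # Guard against rounding leaving the total just below the threshold.
    return min(k, len(ratios))


ratios = [0.70, 0.15, 0.10, 0.04, 0.01]  # example numbers from the question
for t in (0.5, 0.8, 0.9):
    print(t, n_components_for_ratio(ratios, t))  # prints k = 1, 2, 3
```

Because only a count k is computed, the "which PCs" question reduces to "how many of the leading PCs".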
So, in your example:
- PCA(0.71) would return PC1 and PC2
- PCA(0.005) (an unlikely case) would return PC1
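As a check by hand, let $r_i$ be the fraction of variance explained by PC$i$ in the question's example and $C_k$ the cumulative sum of the first $k$ ratios. The rule keeps the smallest $k$ whose $C_k$ is strictly greater than the threshold $t$:

$$
\begin{aligned}
C_k &= \sum_{i=1}^{k} r_i, \qquad k^{*}(t) = \min\{\,k : C_k > t\,\},\\
C_1 &= 0.70,\; C_2 = 0.85,\; C_3 = 0.95,\; C_4 = 0.99,\; C_5 = 1.00,\\
t = 0.71 &: \; C_1 = 0.70 \le 0.71 < 0.85 = C_2 \;\Rightarrow\; k^{*} = 2 \;(\text{PC1, PC2}),\\
t = 0.005 &: \; C_1 = 0.70 > 0.005 \;\Rightarrow\; k^{*} = 1 \;(\text{PC1}).
\end{aligned}
$$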