PCA scaling not applied to individuals?
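This question applies to my own data, but for reproducibility, my issue/question also shows up in the FactoExtra vignette, or here, so I will use that for simplicity.
First, a simple PCA was generated (scale = T) and the variable coordinates for the first 4 axes were extracted: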
head(var$coord) # coordinates of variables
                  Dim.1       Dim.2       Dim.3       Dim.4
Sepal.Length  0.8901688 -0.36082989  0.27565767  0.03760602
Sepal.Width  -0.4601427 -0.88271627 -0.09361987 -0.01777631
Petal.Length  0.9915552 -0.02341519 -0.05444699 -0.11534978
Petal.Width   0.9649790 -0.06399985 -0.24298265  0.07535950
The same was done for the "individuals". Here is the output:
head(ind$coord) # coordinates of individuals
      Dim.1      Dim.2       Dim.3        Dim.4
1 -2.257141 -0.4784238  0.12727962  0.024087508
2 -2.074013  0.6718827  0.23382552  0.102662845
3 -2.356335  0.3407664 -0.04405390  0.028282305
4 -2.291707  0.5953999 -0.09098530 -0.065735340
5 -2.381863 -0.6446757 -0.01568565 -0.035802870
6 -2.068701 -1.4842053 -0.02687825  0.006586116
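Since the PCA was generated with scale = T, I am very confused as to why the individual coordinates are not scaled (-1 to 1?). For example, "individual 1" has a Dim.1 score of -2.257141, but I have no way to compare that with the variable coordinates, which range from -0.46 to 0.991. How can a score of -2.25 be interpreted against a scaled PCA range of -1 to 1?
Am I missing something?
Thanks for your time!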
UPDATED with all the relevant code:
> library(factoextra)  # provides get_pca_ind() and get_pca_var()
> data(iris)
> res.pca <- prcomp(iris[, -5], scale = TRUE)
> ind <- get_pca_ind(res.pca)
> print(ind)
> var <- get_pca_var(res.pca)
> print(var)
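The scaling done by prcomp(..., scale = T) scales the input variables to unit variance.
I don't think it does anything to range-standardize the individual coordinates, unless possibly by using center = .... However, that is easy to do post hoc (or beforehand). Here is a related post:
Range standardization (0 to 1) in R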
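To make the point above concrete, here is a minimal sketch in base R. It checks that scale = TRUE standardizes the input columns rather than the scores, and then rescales the scores post hoc. The helper name `rescale_to_unit` is made up for this example, and dividing by the largest absolute score is only one possible way to map each component into [-1, 1]; it is not something prcomp or factoextra does for you.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# scale = TRUE standardizes the input columns, not the scores.
scaled_input <- scale(iris[, -5])
apply(scaled_input, 2, sd)   # each input variable has sd 1

# The scores (individual coordinates) have a spread equal to res.pca$sdev.
apply(res.pca$x, 2, sd)
range(res.pca$x[, 1])        # extends beyond [-1, 1]

# Hypothetical helper: divide each component by its largest absolute score,
# which maps every column into [-1, 1] while keeping zero and signs intact.
rescale_to_unit <- function(scores) {
  sweep(scores, 2, apply(abs(scores), 2, max), "/")
}
ind_rescaled <- rescale_to_unit(res.pca$x)
range(ind_rescaled)
```

Note that this changes the relative spread between components, so it is mainly useful for plotting individuals on the same axes as the variables.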
I asked the author of FactoExtra this question. Here is his reply:
Scale = TRUE will normalize the variables to make them comparable. This is particularly recommended when variables are measured in different scales (e.g: kilograms, kilometers, centimeters, …);(http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/).
In this case, the correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations.
So, the coordinates of individuals are not expected to be between -1 and 1, even if scale = TRUE.
It’s only possible to interpret the relative position of individuals and variables by creating a biplot as described at: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/.
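The distinction in the reply (projections for individuals, correlations for variables) can be checked with base R alone. This is a sketch rather than a description of factoextra's internals; it only assumes the same prcomp call as in the question.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Individuals: projections of the standardized data onto the loadings.
ind_coord <- scale(iris[, -5]) %*% res.pca$rotation
all.equal(unname(ind_coord), unname(res.pca$x))

# Variables: correlations between each original variable and each component.
var_coord <- cor(iris[, -5], res.pca$x)

# Equivalent form: loadings multiplied by the component standard deviations.
var_coord_alt <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")
all.equal(unname(var_coord), unname(var_coord_alt))

range(var_coord)   # correlations, so always within [-1, 1]
range(ind_coord)   # projections, not bounded by [-1, 1]
```

Up to sign, the first column of `var_coord` and the first rows of `ind_coord` should agree with the head(var$coord) and head(ind$coord) values posted in the question.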
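A biplot wasn't a good option for me, but I tried rescaling and it worked. Also, I think I can take an individual and project them into the PCA to see where they fall.
Anyway, that's it for now. Thanks for your help @Hack-r!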
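For the idea of projecting an individual into an existing PCA, base R's predict() method for prcomp objects is one way to do it. The sketch below is only an illustration: the measurements in `new_ind` are invented, and the column names must match the variables used to fit the PCA.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Sanity check: projecting an existing row reproduces its stored scores.
predict(res.pca, newdata = iris[1, -5])
res.pca$x[1, ]

# Hypothetical new flower; these measurements are invented for illustration.
new_ind <- data.frame(Sepal.Length = 6.0, Sepal.Width = 3.0,
                      Petal.Length = 4.5, Petal.Width = 1.5)

# predict() applies the stored centering and scaling, then the rotation.
new_coord <- predict(res.pca, newdata = new_ind)
new_coord

# The same computation written out by hand.
manual <- scale(as.matrix(new_ind), center = res.pca$center,
                scale = res.pca$scale) %*% res.pca$rotation
all.equal(unname(new_coord), unname(manual))
```

The resulting coordinates are on the same scale as res.pca$x, so the new point can be placed directly on a plot of the individuals.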