PCA scaling not applied to individuals?

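This question concerns my own data, but for reproducibility the same issue also shows up in the factoextra vignette, or here, so I will use that example for simplicity.

First, a simple PCA was run (scale = T) and the variable coordinates for the first 4 axes were extracted: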

head(var$coord) # coordinates of variables
>                   Dim.1       Dim.2       Dim.3       Dim.4
> Sepal.Length  0.8901688 -0.36082989  0.27565767  0.03760602
> Sepal.Width  -0.4601427 -0.88271627 -0.09361987 -0.01777631
> Petal.Length  0.9915552 -0.02341519 -0.05444699 -0.11534978
> Petal.Width   0.9649790 -0.06399985 -0.24298265  0.07535950

The same was done for the individuals. Here is the output:

head(ind$coord) # coordinates of individuals
>       Dim.1      Dim.2       Dim.3        Dim.4
> 1 -2.257141 -0.4784238  0.12727962  0.024087508
> 2 -2.074013  0.6718827  0.23382552  0.102662845
> 3 -2.356335  0.3407664 -0.04405390  0.028282305
> 4 -2.291707  0.5953999 -0.09098530 -0.065735340
> 5 -2.381863 -0.6446757 -0.01568565 -0.035802870
> 6 -2.068701 -1.4842053 -0.02687825  0.006586116

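Since the PCA was generated with scale = T, I am very confused about why the individual coordinates are not scaled (to -1 to 1?). For example, individual 1 has a Dim.1 score of -2.257141, but I cannot compare that with the variable coordinates on Dim.1, which range from -0.46 to 0.991. How should a score of -2.25 be interpreted against a scaled PCA range of -1 to 1?

Am I missing something? Thank you for your time!

Updated with all the relevant code: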

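library(factoextra)  # provides get_pca_ind() and get_pca_var()

data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)  # drop Species, standardize the variables
ind <- get_pca_ind(res.pca)                  # results for individuals
print(ind)
var <- get_pca_var(res.pca)                  # results for variables
print(var)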

The scaling done by prcomp(..., scale = T) rescales the input variables to unit variance.
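A quick way to see this, as a minimal sketch in base R on the same iris data (the object names `X` and `res.manual` are only illustrative): standardizing the columns by hand gives the same scores, and it is the inputs, not the scores, that end up with unit variance.

```r
# Base R only; iris ships with R.
X <- iris[, -5]
res.pca <- prcomp(X, scale = TRUE)

# scale = TRUE is equivalent to standardizing the columns first.
res.manual <- prcomp(scale(X))
all.equal(abs(res.pca$x), abs(res.manual$x))  # compare magnitudes, signs may flip

# The input variables have unit variance ...
apply(scale(X), 2, sd)
# ... while the scores have the component standard deviations.
apply(res.pca$x, 2, sd)
res.pca$sdev

# So the scores on the first component are not confined to [-1, 1].
range(res.pca$x[, 1])
```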

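I don't think it does anything to standardize the range of the individual coordinates, except possibly through center = .... However, that is easy to do post hoc (or beforehand). Here is a related post: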

Range standardization (0 to 1) in R
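If a fixed range is really wanted, a post-hoc rescaling might look like the sketch below. `rescale_pm1` is a hypothetical helper written for this example, not a function from prcomp or factoextra, and the target range of -1 to 1 is an assumption taken from the question.

```r
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Hypothetical helper: linearly map a numeric vector onto [-1, 1].
rescale_pm1 <- function(v) 2 * (v - min(v)) / (max(v) - min(v)) - 1

# Rescale each component's scores separately.
scores_rescaled <- apply(res.pca$x, 2, rescale_pm1)
apply(scores_rescaled, 2, range)  # every column now spans -1 to 1
```

Note that rescaling each component separately discards the differences in spread between components, so distances in the rescaled space no longer reflect how much variance each component explains.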

I asked the author of factoextra this question. Here is his reply:

Scale = TRUE will normalize the variables to make them comparable. This is particularly recommended when variables are measured in different scales (e.g: kilograms, kilometers, centimeters, …);(http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/).

In this case, the correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations.

So, the coordinates of individuals are not expected to be between -1 and 1, even if scale = TRUE.

It’s only possible to interpret the relative position of individuals and variables by creating a biplot as described at: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/.
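The distinction drawn in that reply can be checked in base R. The sketch below assumes the same prcomp fit as in the question; `ind_coord` and `var_coord` are illustrative names, and the variable coordinates computed this way should correspond to the var$coord values shown above (possibly up to sign).

```r
X <- iris[, -5]
res.pca <- prcomp(X, scale = TRUE)

# Individuals: projections of the standardized data onto the principal axes.
ind_coord <- scale(X) %*% res.pca$rotation
all.equal(unname(ind_coord), unname(res.pca$x))

# Variables: correlations between each original variable and each component.
var_coord <- cor(X, res.pca$x)
# For a scaled PCA this equals the loadings times the component standard deviations.
all.equal(var_coord, sweep(res.pca$rotation, 2, res.pca$sdev, "*"))

range(var_coord)  # correlations, so always within [-1, 1]
range(ind_coord)  # projections, with no such bound
```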

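A biplot was not a good option for me, but I tried rescaling and it worked. Also, I think I could take an individual and project them into the PCA to see where they land.

Anyway, that's it. Thanks for your help, @Hack-r!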
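Projecting an individual into an existing PCA can be sketched with predict() on the prcomp object. The measurements in `new_ind` are made up for illustration; the only requirement is that the column names match the variables used to fit the PCA.

```r
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Hypothetical new observation (values invented for this example).
new_ind <- data.frame(Sepal.Length = 5.9, Sepal.Width = 3.0,
                      Petal.Length = 4.2, Petal.Width = 1.3)

# predict() applies the stored centering and scaling, then projects onto the components.
new_scores <- predict(res.pca, newdata = new_ind)
new_scores

# Sanity check: projecting a training row reproduces its stored score.
all.equal(unname(predict(res.pca, newdata = iris[1, -5])[1, ]),
          unname(res.pca$x[1, ]))
```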