为什么 "Varimax" 和 "varimax" 之间的 psych::principal 有差异?

Why are there differences in psych::principal between "Varimax" and "varimax"?

related question中,我曾问过为什么stats::varimaxGPArotation::Varimax之间存在差异,两者都是psych::principal调用,具体取决于为[设置的选项=17=].

这两者之间的差异(参见其他问题)解释了与 psych::principal 的部分差异,但不是全部差异。 psych::principal 似乎 加剧了 这些差异。 (我有一个简单的理论为什么,我想得到证实)。

library(GPArotation)
library(psych)
data("Thurstone")

principal.unrotated <- principal(r = Thurstone, nfactors = 4, rotate = "none")  # find unrotated PCs first
loa <- unclass(principal.unrotated$loadings)

# just to compare that the rot.mat is correct
varimax.stats <- stats::varimax(x = loa, normalize = TRUE)
varimax.GPA <- GPArotation::Varimax(L = loa, normalize = TRUE)
# notice we're here NOT interested in the difference between stats and GPA, that's the other question

diff.from.rot.meth <- unclass(varimax.stats$loadings - varimax.GPA$loadings)  # very small differences, see this question: 
mean(abs(diff.from.rot.meth))
#> [1] 8.036863e-05

principal.varimax.stats <- principal(r = Thurstone, nfactors = 4, rotate = "varimax")
principal.Varimax.GPA <- principal(r = Thurstone, nfactors = 4, rotate = "Varimax")

diff.from.princ <- principal.Varimax.GPA$rot.mat - principal.varimax.stats$rot.mat  # quite a substantial change, because Theta is NOT a rotmat, that makes sense
mean(abs(diff.from.princ))
#> [1] 0.021233

mean(abs(diff.from.rot.meth)) - mean(abs(diff.from.princ))  # principal has MUCH bigger differences
#> [1] -0.02115263

这对于浮点人工制品或其他东西来说似乎太大了。

我的假设是(额外的)差异源于 GPArotation::Varimax 默认为 (Kaiser) normalize == FALSE,而 **stats::varimax 默认到 (Kaiser) normalize == TRUE不能 与 `principal::psych`` 中的设置不同。

stats::varimax 手册:

## varimax with normalize = TRUE is the default

GPArotation::Varimax / GPArotation::GPForth 手册:

The argument normalize gives an indication of if and how any normalization should be done before rotation, and then undone after rotation. If normalize is FALSE (the default) no normalization is done. If normalize is TRUE then Kaiser normalization is done. (So squared row entries of normalized A sum to 1.0. This is sometimes called Horst normalization.)

此外,他们 psych::Kaiser 手册警告:

The GPArotation package does not (by default) normalize, nor does the fa function. Then, to make it more confusing, varimax in stats does,Varimax in GPArotation does not.

谁能确认差异实际上是由归一化选项造成的?

这似乎证实了应用 psych::kaiser(我认为它是为此目的而构建)将差异缩小回 stats::varimaxGPArotation::Varimax 之间的原始差异:

principal.Varimax.GPA.kaiser <- kaiser(f = principal.unrotated, rotate = "Varimax")
diff.statsvari.gpavar.bothkaiser <- unclass(principal.Varimax.GPA.kaiser$loadings - principal.varimax.stats$loadings)
mean(abs(diff.statsvari.gpavar.bothkaiser))
#> [1] 8.036863e-05

这几乎是相同的结果,所以我认为假设得到证实

来自 psych::principal 的更大差异是因为 normalize 的默认值不同。


更新

对于各自的旋转矩阵(或 Th 是什么),差异也(再次)小得多:

principal.Varimax.GPA.kaiser$Th - principal.varimax.stats$rot.mat  # those differences are very small now, too
#>               [,1]         [,2]          [,3]          [,4]
#> [1,]  1.380279e-04 1.380042e-05 -2.214319e-04 -2.279170e-06
#> [2,]  9.631517e-05 2.391296e-05  1.531723e-04 -3.371868e-05
#> [3,]  1.758299e-04 7.917460e-05  6.788867e-05  1.099072e-04
#> [4,] -9.548010e-05 6.500162e-05 -1.679753e-05 -5.213475e-05

由于程序的精度不同而出现加载不同的问题(当然,由于在psych::principal中没有评估normalize选项,所有其他程序必须与该选项一起使用切换为 TRUE)。虽然可以配置 stats::varimax 和 GPArotation::Varimax 中的精度(参数 eps),但在 psych::principal 中忽略了这一点,并且似乎隐含地等于 stats::varimax with eps =1e-5。

如果我们将 stats::varimax 和 GPArotations::Varimax 的准确度提高到 eps=1e-15,那么我们会得到与我的 MatMate 中同时实现的结果相同的结果(至少最多 8 位数字)-我测试过的程序也非常准确地等于 SPSS 计算。

psych::principal 中缺少对显式选项 eps 的处理似乎是一个错误,其糟糕的隐式默认值肯定不能令人满意。

有趣的是,GPArotation::Varimax 需要与 eps=1e-15 进行多次旋转,请参阅最后的输出;因此,要么实施了另一个内部程序,要么在决定何时可以停止迭代时对 eps 参数进行不同的评估。一个示例(见本答案末尾处)可能暗示了这样一种效果。

低于比较协议;仅显示载荷的前两行。

The first three computations with low accuracy (eps=1e-5) 
all procedures give equal or comparable results, 
except GPArotation, which is already near the exact answer
--------------------------------
>  principal(r = Thurstone, nfactors = 4, rotate = "varimax")$loadings    *1000000
>  stats::varimax(x = loa, normalize = TRUE,eps=1e-5)$loadings            *1000000
>  GPArotation::Varimax(L = loa, normalize = TRUE,eps=1e-5)$loadings      *1000000

The second three computations with (attempted) high accuracy (eps=1e-15) 
all procedures except psych::principal give equal results, and they
agree also with external crosscheck using MatMate
--------------------------------
>  principal(r = Thurstone, nfactors = 4, rotate = "varimax",eps=1e-15)$loadings*1000000
>  stats::varimax(x = loa, normalize = TRUE,eps=1e-15)$loadings*1000000
>  GPArotation::Varimax(L = loa, normalize = TRUE,eps=1e-10)$loadings*1000000

尝试 little/default 具有大 eps 或没有

的准确结果
# ===== Numerical documentation (only first two rows are displayed)================
> principal(r = Thurstone, nfactors = 4, rotate = "varimax")$loadings*1000000
                RC1       RC2       RC3       RC4      
Sentences       871655.72 216638.46 198427.07 175202.57
Vocabulary      855609.28 294166.99 153181.45 180525.99


> stats::varimax(x = loa, normalize = TRUE,eps=1e-5)$loadings         *1000000
                PC1       PC2       PC3       PC4      
Sentences       871655.72 216638.46 198427.07 175202.57
Vocabulary      855609.28 294166.99 153181.45 180525.99

> GPArotation::Varimax(L = loa, normalize = TRUE,eps=1e-5)$loadings   *1000000
                     PC1      PC2      PC3       PC4
Sentences       871717.3 216618.7 198176.3 175204.47
Vocabulary      855663.1 294146.3 152930.7 180517.21
# =============================================================================

现在尝试用更小的eps

获得更准确的结果
> principal(r = Thurstone, nfactors = 4, rotate = "varimax",eps=1e-15)$loadings  *1000000
                RC1       RC2       RC3       RC4      
Sentences       871655.72 216638.46 198427.07 175202.57
Vocabulary      855609.28 294166.99 153181.45 180525.99

> stats::varimax(x = loa, normalize = TRUE,eps=1e-15)$loadings                   *1000000
                PC1       PC2       PC3       PC4      
Sentences       871716.83 216619.69 198174.31 175207.86
Vocabulary      855662.58 294147.47 152928.77 180519.37

> GPArotation::Varimax(L = loa, normalize = TRUE,eps=1e-10)$loadings             *1000000
                     PC1      PC2       PC3       PC4
Sentences       871716.8 216619.7 198174.31 175207.86
Vocabulary      855662.6 294147.5 152928.77 180519.37

Warnmeldung:
In GPForth(L, Tmat = Tmat, method = "varimax", normalize = normalize,  :
  convergence not obtained in GPForth. 1000 iterations used.

# Result by MatMate: --------------------------------------------------------
 lad = cholesky(Thurstone) 
 pc = rot(lad,"pca")
 pc4 = pc[*,1..4]                           // arrive at the first four pc's
     t = gettrans( normzl(pc4),"varimax")   // get rotation-matrix for row-normalized pc's
 vmx = pc4 * t                              // rotate pc4 by rotation-matrix 
 display = vmx     * 1000000
                     PC1      PC2       PC3       PC4
Sentences       871716.83    216619.68   198174.31   175207.87
Vocabulary      855662.58    294147.46   152928.77   180519.37


# ===============================================================================> 

通过在 stat:: 中将 eps 设置为 1e-12 并在 GPArotation 中设置为 1e-6 可以实现 stats::varimax 和 GPArotation::Varimax 结果的更好匹配,其中一个值为对方的广场。然后我们得到

> GPArotation::Varimax(L = loa, normalize = TRUE,eps=1e-6)$loadings*1000000
                     PC1      PC2       PC3       PC4
Sentences       871716.8 216619.8 198174.49 175207.63
Vocabulary      855662.5 294147.6 152928.94 180519.21

> stats::varimax(x = loa, normalize = TRUE,eps=1e-12)$loadings*1000000
                PC1       PC2       PC3       PC4      
Sentences       871716.80 216619.74 198174.40 175207.85
Vocabulary      855662.55 294147.52 152928.86 180519.36