与 In pf(F.stat, qr(Lc)$rank, nu.F) 交互项失败 lmerTest：产生 NaN

Question

我正在尝试对两个单独的数据集执行 lmerTest，但出于某种原因，其中一个数据集出现以下错误。

In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced

This dataset 毫无问题地给出了 habitat 和 soil 之间交互项的 p 值。

anova(lmer(sqrt(abs) ~ habitat*soil + (1|species), data=frl_light, REML=T))

Analysis of Variance Table of type III  with  Satterthwaite 
approximation for degrees of freedom
               Sum Sq  Mean Sq NumDF  DenDF F.value  Pr(>F)  
habitat      0.057617 0.028809     2 8.8434  1.0880 0.37805  
soil         0.232708 0.232708     1 2.6732  8.7888 0.06848 .
habitat:soil 0.308003 0.154001     2 2.7134  5.8163 0.10443  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This dataset 具有相似的结构，但会抛出错误，并且无法给出 habitat 和 light 之间交互作用的 p 值。密度自由度测量也是0，估计是这个问题。

anova(lmer(sqrt(abs) ~ habitat*light + (1|species), data=frl_soil, REML=T))

Analysis of Variance Table of type III  with  Satterthwaite 
approximation for degrees of freedom
               Sum Sq  Mean Sq NumDF  DenDF F.value Pr(>F)
habitat       0.00845 0.004223     2 7.9751  0.3494 0.7154
light         0.01634 0.016336     1 1.9241  1.3517 0.3689
habitat:light 0.42813 0.214067     2 0.0000 17.7124       
Warning message:
In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced

我不知道为什么 lmerTest 适用于一个数据集而不适用于另一个数据集，因为至少在我看来，这两个数据集之间几乎没有区别。如果有哪位大侠能解答一下，请帮忙。

更新 1：我尝试了 Ben Bolker 的建议，改用 Kenward-Roger 估计。然而，我们的答案似乎有所不同。我是运行 R 3.3.1、lme4 1.1-12 和 lmerTest 2.0-32。这是我的输出

anova(lmer(sqrt(abs) ~ habitat*light + (1|species), + data = frl_soil, REML = T), + ddf="Kenward-Roger")

anova from lme4 is returned
some computational error has occurred in lmerTest
Analysis of Variance Table
              Df  Sum Sq  Mean Sq F value
habitat        2 0.00244 0.001219  0.1009
light          1 0.00476 0.004763  0.3941
habitat:light  2 0.42813 0.214067 17.7124

更新 1.1：这是使用 SAS 进行混合模型分析的输出，我在其中添加了一个带有 abs 平方根的附加列，如 sqrtabs。

FILENAME REFFILE '/folders/myfolders/frl_soil.csv';

PROC IMPORT DATAFILE=REFFILE DBMS=CSV OUT=WORK.FRLSOIL; GETNAMES=YES; RUN;

PROC CONTENTS DATA=WORK.FRLSOIL; RUN;

%web_open_table(WORK.FRLSOIL);

PROC MIXED data = WORK.FRLSOIL; CLASS species habitat light sqrtabs; model sqrtabs = habitat light habitat*light / DDFM=KENWARDROGER; random intercept species; run;

Type 3 Tests of Fixed Effects
Effect         Num DF   Den DF  F Value     Pr > F
habitat             2       10     1.11     0.3681
light               1       10     0.45     0.5159
habitat*light       2       10     0.27     0.7716

Answer 1

我不能确切地告诉你为什么 Satterthwaite 近似在这里给你一个零 df 估计（这确实是你得到错误并且没有 $p$ 交互值的原因）；您必须详细研究代码才能看到（键入 lmerTest:::calcSatterthMultDF 并开始挖掘...）我做了一些 little 的挖掘；其中的关键行是

E <- sum((nu.m/(nu.m - 2)) * as.numeric(nu.m > 2))
nu.F <- 2 * E * as.numeric(E > q)/(E - q)

其中（我认为）nu.m ($\nu_m$) 是 Welch-Satterthwaite approximation 估计的自由度数。我不知道为什么 (1) lmerTest 在 nu.m <= 2 时将 E 和 nu.F 设置为零； (2) 一个数据集中组内方差的特定组合给出 nu.m < 2 而在另一个数据集中没有...

不过，与此同时，您可以根据需要使用 Kenward-Roger 近似值（它的计算成本更高，但通常稍微更准确......）使用第二个 link 的数据集:

frl_soil <- read.csv("frl_soil.csv")
library(lmerTest)
head(frl_soil,2)
##   X species habitat light        abs
## 1 1  ANI2GR     gen  G.cs 2.67477395
## 2 2  DIPTAC     gen  G.cs 0.09549154
anova(lmer(sqrt(abs) ~ habitat*light + (1|species), 
            data=frl_soil, REML=TRUE), 
     ddf="Kenward-Roger")
## Analysis of Variance Table of type III  with  Kenward-Roger 
## approximation for degrees of freedom
##                Sum Sq  Mean Sq NumDF  DenDF F.value  Pr(>F)  
## habitat       0.00842 0.004208     2 8.1220  0.3482 0.71602  
## light         0.01568 0.015679     1 2.0712  1.2973 0.36928  
## habitat:light 0.40886 0.204432     2 2.0713 16.9152 0.05212 .
sessionInfo()
## other attached packages:
## [1] lmerTest_2.0-32 lme4_1.1-13     Matrix_1.2-6

与 In pf(F.stat, qr(Lc)$rank, nu.F) 交互项失败 lmerTest：产生 NaN

Interaction term failing lmerTest with In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced

r

lme4