Summary() returns NaN values when I try to include a variable as linear in a glmm

Question

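I am trying to fit a model using glmmTMB. When I include avgt60, it does strange things to the model, and I am not sure why. When I include it as a non-poly() term, I get NaN values. When I include it as a poly() term, it throws off the whole model. When I exclude it, everything seems fine... I am new to this kind of work, so any advice would be greatly appreciated!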

m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + avgt60 + (1|year) + (1|site),
              family = "nbinom2", data = weather1)

I get:

Family: nbinom2  ( log )
Formula:          dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60,      degree = 2) + avgt60 + (1 | year) + (1 | site)
Data: weather1

     AIC      BIC   logLik deviance df.resid 
  1647.9   1687.9   -813.0   1625.9      269 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 year   (Intercept) 5.883e-24 2.426e-12
 site   (Intercept) 6.396e-07 7.997e-04
Number of obs: 280, groups:  year, 3; site, 6

Dispersion parameter for nbinom2 family (): 0.232 

Conditional model:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -7.8560        NaN     NaN      NaN
poly(rh60, degree = 2)1      47.9631        NaN     NaN      NaN
poly(rh60, degree = 2)2      -5.4370        NaN     NaN      NaN
poly(wndspd60, degree = 2)1  61.7092        NaN     NaN      NaN
poly(wndspd60, degree = 2)2 -74.9432        NaN     NaN      NaN
poly(raintt60, degree = 2)1  27.2669        NaN     NaN      NaN
poly(raintt60, degree = 2)2 -72.9072        NaN     NaN      NaN
avgt60                        0.4384        NaN     NaN      NaN

However, without the avgt60 variable...

m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + (1|year) + (1|site),
              family = "nbinom2", data = weather1)


 Family: nbinom2  ( log )
Formula:          dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60,      degree = 2) + (1 | year) + (1 | site)
Data: weather1

     AIC      BIC   logLik deviance df.resid 
  1648.2   1684.6   -814.1   1628.2      270 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 year   (Intercept) 2.052e-10 1.433e-05
 site   (Intercept) 4.007e-10 2.002e-05
Number of obs: 280, groups:  year, 3; site, 6

Dispersion parameter for nbinom2 family (): 0.23 

Conditional model:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   1.3677     0.3482   3.928 8.56e-05 ***
poly(rh60, degree = 2)1      23.8058     9.6832   2.458 0.013953 *  
poly(rh60, degree = 2)2      -0.3452     4.2197  -0.082 0.934810    
poly(wndspd60, degree = 2)1  34.4332    10.1328   3.398 0.000678 ***
poly(wndspd60, degree = 2)2 -61.2044     6.5179  -9.390  < 2e-16 ***
poly(raintt60, degree = 2)1  12.0109     6.4949   1.849 0.064417 .  
poly(raintt60, degree = 2)2 -57.2197     6.0502  -9.457  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If I keep avgt60 as a poly() term, it throws off the whole model and the results make no sense. Any ideas?

Here is a link to the data set, with site names redacted: https://docs.google.com/spreadsheets/d/1mFDK_YEshvgGPHpvqu4o6TbFfwKRgHaVUVGOIZnsq7c/edit?usp=sharing

Answer 1

Your data set has 280 rows, but only 10 unique combinations of predictor values:

nrow(unique(subset(weather1, select = -c(dsi))))

This determines how complex a model you can actually fit.

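You are trying to estimate 8 fixed-effect parameters (length(fixef(m1)$cond) or ncol(model.matrix(m1))), two random-effect parameters (the among-site and among-year variances), and one dispersion parameter (for the negative binomial), for a total of 11 (or length(m1$fit$par)). That is more parameters than you have unique combinations of predictor values!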
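The count above can be sketched without the real data. This is only an illustration: toy is a made-up data frame with the same column names as weather1, and its site/year layout and weather values are invented assumptions, not the actual data.

```r
# Hypothetical toy data mimicking the structure described above:
# 10 site/year combinations, each with one set of weather values,
# repeated to give 280 rows. None of these values come from weather1.
set.seed(101)
combos <- data.frame(
  site = c(1, 3, 1, 2, 3, 4, 1, 2, 5, 6),
  year = c(2019, 2019, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021),
  rh60 = runif(10, 60, 95),
  wndspd60 = runif(10, 0, 10),
  raintt60 = runif(10, 0, 50),
  avgt60 = runif(10, 10, 25)
)
toy <- combos[rep(seq_len(nrow(combos)), each = 28), ]
toy$dsi <- rnbinom(nrow(toy), mu = 5, size = 0.25)

# Unique predictor combinations (everything except the response)
n_unique <- nrow(unique(subset(toy, select = -c(dsi))))

# Fixed-effect parameters = columns of the fixed-effect design matrix
X <- model.matrix(~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) +
                    poly(raintt60, degree = 2) + avgt60, data = toy)
n_fixed <- ncol(X)

# Add two random-effect variances and one dispersion parameter
n_total <- n_fixed + 2 + 1

c(rows = nrow(toy), unique_combos = n_unique, fixed = n_fixed, total = n_total)
```

Only base R is used here, so the parameter count is assembled by hand rather than read off a fitted glmmTMB object.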

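Murtaugh (2007) points out that if you have a nested design (predictor values vary only between groups, not within groups), you get the same estimates of the effects whether you fit the full data set or aggregate the response variable within each group (the site/year combinations in your case) to its mean. (If you have unbalanced groups, as in this case, you need to fit the model with weights, and this approach doesn't work for non-Gaussian responses, but the principle is similar.)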
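A minimal sketch of that equivalence, under simplifying assumptions: the data are simulated (not weather1), the response is Gaussian, and ordinary least squares via lm() stands in for a mixed model to keep the illustration short. The variable names (d, agg, group_x, n_per_group) are hypothetical.

```r
# Hypothetical simulated data: a Gaussian response and a predictor x that is
# constant within each group (a nested design), with unequal group sizes.
set.seed(42)
n_per_group <- c(5, 12, 8, 20, 3, 15)
group_x <- c(1.0, 2.5, 3.1, 4.7, 5.2, 6.8)
d <- data.frame(
  group = factor(rep(seq_along(n_per_group), times = n_per_group)),
  x = rep(group_x, times = n_per_group)
)
d$y <- 2 + 0.5 * d$x + rnorm(nrow(d))

# Fit to the full data set
fit_full <- lm(y ~ x, data = d)

# Collapse to one row per group: predictor value, mean response, group size
agg <- data.frame(
  x = group_x,
  y = as.numeric(tapply(d$y, d$group, mean)),
  n = n_per_group
)

# Weighted fit to the group means, weighting by group size
fit_means <- lm(y ~ x, data = agg, weights = n)

# The point estimates agree; standard errors and degrees of freedom do not
all.equal(coef(fit_full), coef(fit_means))
```

The coefficients match because both fits solve the same normal equations once each group mean is weighted by its group size. The aggregated fit makes it plain that only six data points inform the slope.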

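If you leave out avgt60, you have "only" 10 parameters. I still wouldn't trust this model much; it is badly overparameterized (as a rule of thumb you want (# observations)/(# parameters) to be at least 10, preferably 20...). Honestly, I'm not even sure why it works. I think it's because the site and year variances collapse to essentially zero, which drops them from the model, so you "only" have 8 parameters to estimate?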

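Looking at the data:

- dsi values for sites 5 and 6 are always zero (these sites were measured only in 2021)
- dsi values in 2019 are very high, and only two sites (1 and 3) were measured that year
- otherwise there is no particular pattern, and certainly none that is not confounded with site and year

I would probably try to draw only qualitative conclusions from these data, or very simple quantitative ones...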

library(tidyverse); theme_set(theme_bw())
w3 <- (weather1
    |> as_tibble()
    |> select(-date)
    |> pivot_longer(-c(site, year, dsi), names_to = "var")
    |> mutate(across(c(year,site), factor))
)

theme_set(theme_bw(base_size = 20)  + theme(panel.spacing = grid::unit(0, "lines")))
(ggplot(w3)
    + aes(x = value, y = dsi, colour = site, shape = year)
    + stat_sum(alpha = 0.6)
    + stat_summary(fun = mean)
    + stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "gray")
    + facet_wrap(~var, scales = "free_x")
    + scale_y_sqrt()
)
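Alongside the plot, a per-site/year summary table is one way to get the "very simple quantitative" picture. The sketch below uses an invented stand-in data frame (toy) rather than weather1; the dsi values in it are made up, and only the column names site, year, and dsi are taken from the question.

```r
# Hypothetical toy data standing in for weather1 (values are invented)
set.seed(1)
toy <- data.frame(
  site = rep(c(1, 3, 5), each = 4),
  year = rep(c(2019, 2019, 2021), each = 4),
  dsi = c(rnbinom(8, mu = 40, size = 0.5), rep(0, 4))
)

# One row per site/year: number of observations, mean dsi, share of zeros
smry <- aggregate(
  dsi ~ site + year, data = toy,
  FUN = function(x) c(n = length(x), mean = mean(x), prop_zero = mean(x == 0))
)

# aggregate() returns the three statistics as a matrix column;
# flatten it into ordinary columns dsi.n, dsi.mean, dsi.prop_zero
smry <- do.call(data.frame, smry)
smry
```

With the real data, replacing toy with weather1 should give one row for each site/year combination, which makes the all-zero sites and the sparsely sampled year easy to spot.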

Murtaugh, Paul A. "Simplicity and Complexity in Ecological Data Analysis." Ecology 88, no. 1 (2007): 56–62.

r

glmmtmb