summary() returns NaN values when I try to include a variable as a linear term in a GLMM
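I am trying to run a model using glmmTMB. When I include avgt60, it does strange things to the model, and I'm not quite sure why. When I include it as a non-poly() term, I get NaN values. When I include it as a poly() term, it throws off the whole model. When I exclude it, the model seems fine... I am new to this type of work, so any advice would be appreciated!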
m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + avgt60 + (1|year) + (1|site),
family = "nbinom2", data = weather1)
I get:
Family: nbinom2 ( log )
Formula: dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + avgt60 + (1 | year) + (1 | site)
Data: weather1
AIC BIC logLik deviance df.resid
1647.9 1687.9 -813.0 1625.9 269
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
year (Intercept) 5.883e-24 2.426e-12
site (Intercept) 6.396e-07 7.997e-04
Number of obs: 280, groups: year, 3; site, 6
Dispersion parameter for nbinom2 family (): 0.232
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.8560 NaN NaN NaN
poly(rh60, degree = 2)1 47.9631 NaN NaN NaN
poly(rh60, degree = 2)2 -5.4370 NaN NaN NaN
poly(wndspd60, degree = 2)1 61.7092 NaN NaN NaN
poly(wndspd60, degree = 2)2 -74.9432 NaN NaN NaN
poly(raintt60, degree = 2)1 27.2669 NaN NaN NaN
poly(raintt60, degree = 2)2 -72.9072 NaN NaN NaN
avgt60 0.4384 NaN NaN NaN
However, without the avgt60 variable...
m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + (1|year) + (1|site),
family = "nbinom2", data = weather1)
Family: nbinom2 ( log )
Formula: dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + (1 | year) + (1 | site)
Data: weather1
AIC BIC logLik deviance df.resid
1648.2 1684.6 -814.1 1628.2 270
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
year (Intercept) 2.052e-10 1.433e-05
site (Intercept) 4.007e-10 2.002e-05
Number of obs: 280, groups: year, 3; site, 6
Dispersion parameter for nbinom2 family (): 0.23
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.3677 0.3482 3.928 8.56e-05 ***
poly(rh60, degree = 2)1 23.8058 9.6832 2.458 0.013953 *
poly(rh60, degree = 2)2 -0.3452 4.2197 -0.082 0.934810
poly(wndspd60, degree = 2)1 34.4332 10.1328 3.398 0.000678 ***
poly(wndspd60, degree = 2)2 -61.2044 6.5179 -9.390 < 2e-16 ***
poly(raintt60, degree = 2)1 12.0109 6.4949 1.849 0.064417 .
poly(raintt60, degree = 2)2 -57.2197 6.0502 -9.457 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
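If I keep avgt60 as a poly() term, it throws off the whole model and the results don't make any sense. Any ideas?
Here is a link to the data set, with site names redacted: https://docs.google.com/spreadsheets/d/1mFDK_YEshvgGPHpvqu4o6TbFfwKRgHaVUVGOIZnsq7c/edit?usp=sharing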
You have 280 rows in your data set, but only 10 unique combinations of predictor values:
nrow(unique(subset(weather1, select = -c(dsi))))
This limits how complex a model you can realistically fit.
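You are trying to estimate 8 fixed-effect parameters (length(fixef(m1)$cond) or ncol(model.matrix(m1))), two random-effect parameters (the among-site and among-year variances), and one dispersion parameter (for the negative binomial), for a total of 11 (or length(m1$fit$par)). That is more parameters than you have unique combinations of predictor values!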
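A rough way to see why this matters, sketched on a made-up data set (the names groups, toy, x1, x2, and y are hypothetical, not columns of weather1): the rank of the fixed-effect model matrix can never exceed the number of distinct predictor rows. The toy case is deliberately sharper than the one in the question, where it is the total parameter count (11), not the fixed effects alone, that exceeds the 10 unique combinations.

```r
# Hypothetical design: 4 groups, predictors constant within each group
groups <- data.frame(
  g  = factor(1:4),
  x1 = c(1, 2, 3, 4),
  x2 = c(2.5, 1.0, 4.0, 3.5)
)
# 10 replicate observations per group, so 40 rows in total
toy <- groups[rep(seq_len(nrow(groups)), each = 10), ]
set.seed(1)
toy$y <- rpois(nrow(toy), lambda = 3)

# Same idiom as above: unique predictor rows, ignoring the response
n_unique <- nrow(unique(subset(toy, select = -c(y))))

# Fixed-effect design matrix: intercept + two quadratic polynomials
X <- model.matrix(~ poly(x1, 2) + poly(x2, 2), data = toy)
n_coef <- ncol(X)
X_rank <- qr(X)$rank  # numerical rank; bounded by the number of distinct rows

c(rows = nrow(toy), unique_rows = n_unique, coefficients = n_coef, rank = X_rank)
```

Having 40 rows does not help here: with fewer distinct predictor rows than coefficients, at least one coefficient cannot be estimated separately from the others.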
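Murtaugh (2007) points out that if you have a nested design (the predictor values vary only between groups, not within them), you will get the same effect estimates if you aggregate the response variable to its mean within each group (each site/year combination, in your case). (If the groups are unbalanced, as they are here, you need to fit the model with weights, and the approach does not work exactly for non-Gaussian responses, but the principle is similar.)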
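The aggregation point can be checked on simulated data. This is only a sketch under simplifying assumptions: a Gaussian response, ordinary least squares via lm() rather than a mixed model, and made-up names (grp, raw, x, y). It compares the coefficients from the raw data with those from the group means, first for balanced groups and then for unbalanced groups with weights equal to the group sizes.

```r
set.seed(42)
# Hypothetical nested design: 6 groups, predictor x constant within each group
grp <- data.frame(g = factor(1:6), x = c(1, 2, 3, 4, 5, 6))
raw <- grp[rep(seq_len(nrow(grp)), each = 5), ]
raw$y <- 2 + 0.5 * raw$x + rnorm(nrow(raw))

# Balanced case: raw observations vs. one mean per group
fit_raw <- lm(y ~ x, data = raw)
agg     <- aggregate(y ~ g + x, data = raw, FUN = mean)
fit_agg <- lm(y ~ x, data = agg)

# Unbalanced case: drop 3 observations from group 1, then weight means by n
raw_u     <- raw[-(1:3), ]
fit_raw_u <- lm(y ~ x, data = raw_u)
means_u   <- aggregate(y ~ g + x, data = raw_u, FUN = mean)
means_u$n <- aggregate(y ~ g + x, data = raw_u, FUN = length)$y
fit_w     <- lm(y ~ x, data = means_u, weights = n)

rbind(raw = coef(fit_raw), aggregated = coef(fit_agg))
rbind(raw = coef(fit_raw_u), weighted_means = coef(fit_w))
```

In both comparisons the two rows should agree up to floating-point error, which is the sense in which the group means carry all the information about the predictor effects.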
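If you leave out avgt60, you have "only" 10 parameters. I still wouldn't really trust this model, as it is badly overparameterized (you would usually aim for a ratio of (# data points)/(# parameters) of at least 10, preferably 20...). Honestly, I'm not even sure why it works. I think it is because the site and year variances collapse to essentially zero, which effectively removes them from the model, so you "only" have 8 parameters to estimate?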
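To put numbers on that rule of thumb, here is a small piece of arithmetic using only the counts quoted above (280 rows, 10 unique predictor combinations, and 11, 10, or 8 parameters); the variable names are made up for illustration.

```r
n_rows   <- 280  # rows in the data set
n_unique <- 10   # unique combinations of predictor values

# Parameter counts discussed above
n_par <- c(with_avgt60 = 11, without_avgt60 = 10, variances_at_zero = 8)

ratios <- data.frame(
  parameters       = n_par,
  rows_per_param   = n_rows / n_par,
  unique_per_param = n_unique / n_par
)
ratios
```

Counting rows, every version of the model clears the 10-to-20 guideline, but counting unique predictor combinations none of them comes close, and that second count is the relevant one when the predictors vary only between site/year groups.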
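Looking at the data:
- dsi values for sites 5 and 6 (measured only in 2021) are always zero
- dsi values in 2019, when only two sites (1 and 3) were measured, are very high
- otherwise there is no particular pattern, and certainly none that isn't confounded with site and year
I would probably try to draw only qualitative conclusions, or very simple quantitative ones, from these data...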
library(tidyverse); theme_set(theme_bw())
w3 <- (weather1
|> as_tibble()
|> select(-date)
|> pivot_longer(-c(site, year, dsi), names_to = "var")
|> mutate(across(c(year,site), factor))
)
theme_set(theme_bw(base_size = 20) + theme(panel.spacing = grid::unit(0, "lines")))
(ggplot(w3)
+ aes(x = value, y = dsi, colour = site, shape = year)
+ stat_sum(alpha = 0.6)
+ stat_summary(fun = mean)
+ stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "gray")
+ facet_wrap(~var, scales = "free_x")
+ scale_y_sqrt()
)
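As a companion to the plot, a "very simple quantitative" summary could be a per-group table of counts, means, and proportions of zeros. The sketch below runs on simulated data (toy and its values are hypothetical; only the column names site, year, and dsi mirror the question) and uses base R only.

```r
set.seed(101)
# Hypothetical stand-in for weather1: 4 site/year groups, 5 observations each
toy <- data.frame(
  site = factor(rep(c(1, 3, 1, 2), each = 5)),
  year = factor(rep(c(2019, 2019, 2020, 2020), each = 5)),
  dsi  = rnbinom(20, mu = 5, size = 0.25)
)

# One row per site/year combination: sample size, mean, proportion of zeros
summ <- aggregate(
  dsi ~ site + year, data = toy,
  FUN = function(v) c(n = length(v), mean = mean(v), prop_zero = mean(v == 0))
)
# aggregate() returns the three statistics as a matrix column; flatten it
summ <- do.call(data.frame, summ)
summ
```

The same call on the real data (with data = weather1) should give one row per site/year combination, which makes the always-zero and very-high groups easy to spot.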
Murtaugh, Paul A. "Simplicity and Complexity in Ecological Data Analysis." Ecology 88, no. 1 (2007): 56–62.