Why am I getting NAs in the model summary output? Zero-inflated GLMM with glmmTMB()
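I am trying to fit a zero-inflated negative binomial GLMM with glmmTMB, but I get NAs for the z and p values in the model summary output. I am not sure what the cause is; I have followed the vignette and the online help, but I suspect there is a problem with my data or with the technique I am trying to use.
My data are similar to the Salamanders example used in the supporting documentation: negative binomially distributed, zero-inflated, and with the same data structure.
Where is the problem? Are these data suitable for family = nbinom2?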
Data:
> head(abun_data)
depl_ID Keyword_1 depl_dur logging n AmbientTemperature ElNino
1 B1-1-14_1 Bearded Pig 82 pre-logging 3 23.33333 before
2 B1-1-14_1 Malayan Porcupine 82 pre-logging 0 24.33333 before
3 B1-1-14_1 Pig-tailed Macaque 82 pre-logging 3 24.33333 before
4 B1-1-14_1 Sambar Deer 82 pre-logging 0 24.00000 before
5 B1-1-14_1 Red Muntjac 82 pre-logging 2 24.00000 before
6 B1-1-14_1 Lesser Mouse-deer 82 pre-logging 1 23.00000 before
> str(abun_data)
'data.frame': 1860 obs. of 7 variables:
$ depl_ID : Factor w/ 315 levels "B1-1-14_1","B1-1-14_2",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Keyword_1 : Factor w/ 6 levels "Bearded Pig",..: 1 2 3 4 5 6 1 2 3 4 ...
$ depl_dur : num 82 82 82 82 82 82 26 26 26 26 ...
$ logging : Factor w/ 3 levels "logging","post-logging",..: 3 3 3 3 3 3 3 3 3 3 ...
$ n : int 3 0 3 0 2 1 2 0 0 0 ...
$ AmbientTemperature: num 23.3 24.3 24.3 24 24 ...
$ ElNino : Factor w/ 3 levels "after","before",..: 2 2 2 2 2 2 2 2 2 2 ...
My model:
> zinb <- glmmTMB(n ~ Keyword_1 * logging + (1|depl_ID), zi = ~ Keyword_1 * logging,
+ data = abun_data, family = "nbinom2")
Warning message:
In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
> summary(zinb)
Family: nbinom2 ( log )
Formula: n ~ Keyword_1 * logging + (1 | depl_ID)
Zero inflation: ~Keyword_1 * logging
Data: abun_data
AIC BIC logLik deviance df.resid
NA NA NA NA 1822
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
depl_ID (Intercept) 0.5413 0.7358
Number of obs: 1860, groups: depl_ID, 310
Overdispersion parameter for nbinom2 family (): 1.29
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.99965 NA NA NA
Keyword_1Malayan Porcupine -1.30985 NA NA NA
Keyword_1Pig-tailed Macaque -0.90110 NA NA NA
Keyword_1Sambar Deer -1.34268 NA NA NA
Keyword_1Red Muntjac -0.76250 NA NA NA
Keyword_1Lesser Mouse-deer -16.21798 NA NA NA
loggingpost-logging 0.83935 NA NA NA
loggingpre-logging 0.58252 NA NA NA
Keyword_1Malayan Porcupine:loggingpost-logging -0.53276 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpost-logging -5.52093 NA NA NA
Keyword_1Sambar Deer:loggingpost-logging -0.73450 NA NA NA
Keyword_1Red Muntjac:loggingpost-logging 0.04825 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpost-logging -9.74912 NA NA NA
Keyword_1Malayan Porcupine:loggingpre-logging -0.18893 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpre-logging -0.08802 NA NA NA
Keyword_1Sambar Deer:loggingpre-logging 0.72087 NA NA NA
Keyword_1Red Muntjac:loggingpre-logging 0.51223 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpre-logging 15.10588 NA NA NA
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3469 NA NA NA
Keyword_1Malayan Porcupine -11.7164 NA NA NA
Keyword_1Pig-tailed Macaque 1.5618 NA NA NA
Keyword_1Sambar Deer 0.6967 NA NA NA
Keyword_1Red Muntjac -17.6199 NA NA NA
Keyword_1Lesser Mouse-deer 18.7331 NA NA NA
loggingpost-logging -19.2344 NA NA NA
loggingpre-logging -2.1708 NA NA NA
Keyword_1Malayan Porcupine:loggingpost-logging 32.6525 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpost-logging -1.2560 NA NA NA
Keyword_1Sambar Deer:loggingpost-logging 19.1848 NA NA NA
Keyword_1Red Muntjac:loggingpost-logging -3.4218 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpost-logging 7.4168 NA NA NA
Keyword_1Malayan Porcupine:loggingpre-logging 14.3338 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpre-logging -22.1736 NA NA NA
Keyword_1Sambar Deer:loggingpre-logging 1.6785 NA NA NA
Keyword_1Red Muntjac:loggingpre-logging 17.0664 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpre-logging -14.3445 NA NA NA
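The first clue is the warning
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
This means that the model has not converged, or does not think it has converged, to a solution where the log-likelihood surface curves downward (i.e., a true maximum). That is why the standard errors cannot be computed (if you did the usual calculation, they would come out negative or complex). The log-likelihood could be computed, but the model fit is suspect, so glmmTMB returns NA instead.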
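To make the "negative or complex" remark concrete, here is a minimal base-R sketch of the usual Wald standard-error calculation applied to a Hessian that is not positive definite. The matrix H is a made-up toy example, not taken from the fitted model.

```r
# Toy Hessian of the negative log-likelihood at the point where the optimizer stopped.
# One direction curves the wrong way (negative eigenvalue), so this is not a true optimum.
H <- matrix(c(2.0,  0.0,
              0.0, -0.5), nrow = 2, byrow = TRUE)

# A Hessian is positive definite only if all of its eigenvalues are positive.
eig   <- eigen(H, symmetric = TRUE)$values
is_pd <- all(eig > 0)

# Usual calculation: invert the Hessian, then take square roots of the diagonal.
vc <- solve(H)
se <- suppressWarnings(sqrt(diag(vc)))  # sqrt of a negative variance gives NaN

print(is_pd)
print(se)
```

For a fitted glmmTMB object, the corresponding flag should be available as zinb$sdr$pdHess, which is the check used in the troubleshooting vignette.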
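The next question: why? Sometimes this is mysterious and hard to diagnose, but in this case we have a good clue: when you see extreme parameter values (e.g., |beta| > 10) in a GLM with a non-identity link, it almost always means that some form of complete separation is occurring. That is, there are some covariate combinations (e.g., Keyword_1 == Lesser Mouse-deer) where the counts are always zero. With the default treatment contrasts, that coefficient refers to Lesser Mouse-deer at the reference level of logging ("logging"); the pre-logging interaction of about +15 nearly cancels it. On the log scale, this means the density is infinitely lower than in covariate combinations with a positive mean. The estimated parameter is about -16, which corresponds to an expected multiplicative difference in density of exp(-16), or roughly 1e-07. That is not infinitesimal, but it is small enough that the changes in the log-likelihood become small enough for the optimizer to stop. However, because the likelihood surface is almost flat there, glmmTMB cannot compute the curvature, and hence the standard errors.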
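One way to look for this kind of separation is to cross-tabulate the response by the categorical predictors and flag cells that contain no positive counts. The sketch below uses base R only; the data frame toy is a hypothetical stand-in with invented values, shaped like abun_data.

```r
# Hypothetical toy data in the same shape as abun_data (values invented for illustration).
toy <- data.frame(
  Keyword_1 = factor(rep(c("Bearded Pig", "Lesser Mouse-deer"), each = 6)),
  logging   = factor(rep(c("logging", "post-logging", "pre-logging"), times = 4)),
  n         = c(3, 1, 2, 0, 4, 1,
                0, 0, 1, 0, 2, 0)
)

# Number of non-zero counts in each species x logging cell.
nonzero <- with(toy, tapply(n > 0, list(Keyword_1, logging), sum))
print(nonzero)

# Cells with no positive counts at all are candidates for complete separation
# in the conditional (count) part of the model.
sep_cells <- which(nonzero == 0, arr.ind = TRUE)
print(sep_cells)
```

On the real data, the same check would be with(abun_data, tapply(n > 0, list(Keyword_1, logging), sum)); any cell equal to 0 is a combination where the conditional model has nothing but zeros to fit.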
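You could lump or drop categories, or use some form of regularization (e.g., see here or here ...); it may also make sense to treat your Keyword_1 variable as a random effect, which would also regularize the estimates.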
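As a rough sketch of the random-effect suggestion, the model could be rewritten with Keyword_1 as a random intercept in both the conditional and zero-inflation parts. This is an assumption about how one might set it up, not a tested fit: the formulas below drop the species-by-logging interaction, the names cond_formula, zi_formula and zinb_re are hypothetical, and the model is only fitted if glmmTMB is installed and the question's abun_data is in the workspace.

```r
# Sketch only: species (Keyword_1) becomes a random intercept, so species-level
# estimates are shrunk toward the overall mean instead of running off to +/- infinity.
cond_formula <- n ~ logging + (1 | Keyword_1) + (1 | depl_ID)
zi_formula   <- ~ logging + (1 | Keyword_1)

# Fit only when the package and the question's data are actually available.
if (requireNamespace("glmmTMB", quietly = TRUE) && exists("abun_data")) {
  zinb_re <- glmmTMB::glmmTMB(cond_formula,
                              ziformula = zi_formula,
                              data      = abun_data,
                              family    = glmmTMB::nbinom2)
  print(summary(zinb_re))
}
```

Whether this converges on the real data is not known here. With only six species the among-species variance will be estimated imprecisely, and if species-specific logging effects matter, a term such as (1 | Keyword_1:logging) is one possible way to bring them back in regularized form.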